From c42efe1afbd1286d1848b856296fb7cd622b1ddc Mon Sep 17 00:00:00 2001 From: Milo Thurston Date: Mon, 3 Nov 2025 15:23:41 +0000 Subject: [PATCH] Added missing notebooks from https://github.com/ISA-tools/isa-api/pull/585 --- .../isa-api-comprehensive-examples.ipynb | 1285 +++++++++++++++++ .../notebooks/isa-api-getting-started.ipynb | 718 +++++++++ 2 files changed, 2003 insertions(+) create mode 100644 isa-cookbook/content/notebooks/isa-api-comprehensive-examples.ipynb create mode 100644 isa-cookbook/content/notebooks/isa-api-getting-started.ipynb diff --git a/isa-cookbook/content/notebooks/isa-api-comprehensive-examples.ipynb b/isa-cookbook/content/notebooks/isa-api-comprehensive-examples.ipynb new file mode 100644 index 000000000..c26d81139 --- /dev/null +++ b/isa-cookbook/content/notebooks/isa-api-comprehensive-examples.ipynb @@ -0,0 +1,1285 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# ISA-API Comprehensive Examples\n", + "\n", + "This notebook reproduces all examples from the official ISA-API documentation at https://isa-tools.org/isa-api/content/\n", + "\n", + "## Purpose\n", + "\n", + "This notebook verifies that all documented ISA-API functionality works correctly by implementing the examples from the official documentation.\n", + "\n", + "## Changes Required to Make Examples Work\n", + "\n", + "The following modifications were necessary to ensure the examples work correctly:\n", + "\n", + "### 1. Characteristic Category Registration (lines in create_simple_isatab function)\n", + "**Issue**: ISA-JSON loading fails with KeyError if characteristic categories aren't properly registered \n", + "**Fix**: Added `study.characteristic_categories.append(organism_category)` before using the category in characteristics \n", + "**Why**: The ISA-JSON serialization requires @id references that are only generated when categories are registered in the study\n", + "\n", + "### 2. 
ISA-Tab to JSON Conversion Error Handling (cell-23)\n", + "**Issue**: `isatab2json.convert()` can return `None` but documentation doesn't show this \n", + "**Fix**: Added `if isa_json_converted:` check before accessing the result \n", + "**Why**: Conversion can fail silently, returning None instead of raising an exception\n", + "\n", + "### 3. Batch Validation Function Signature (cell-30, cell-32)\n", + "**Issue**: Documentation example shows `batch_validate(list, path)` but function only accepts `batch_validate(list)` \n", + "**Fix**: Removed the second parameter and manually save the report using `json.dumps()` \n", + "**Why**: The actual function signature differs from the docstring example\n", + "\n", + "### 4. Batch Validation Return Structure (cell-30, cell-32)\n", + "**Issue**: `batch_validate()` returns `{'batch_report': [list]}` not a direct list \n", + "**Fix**: Access reports via `batch_result['batch_report']` \n", + "**Why**: The return structure is wrapped in a dict with 'batch_report' key\n", + "\n", + "### 5. ISA-JSON Loading Error Handling (cell-14)\n", + "**Issue**: Loading programmatically-created ISA-JSON can fail with KeyError \n", + "**Fix**: Added try-except block with informative error message \n", + "**Why**: ISA-JSON created from ISA-Tab conversion is more reliable than programmatically-created JSON\n", + "\n", + "## Table of Contents\n", + "\n", + "1. [Installation](#installation)\n", + "2. [Creating ISA Objects](#creating-objects)\n", + "3. [Creating Simple ISA-Tab](#creating-isatab)\n", + "4. [Creating Simple ISA-JSON](#creating-isajson)\n", + "5. [Reading ISA Files](#reading)\n", + "6. [Validating ISA-Tab](#validating-isatab)\n", + "7. [Validating ISA-JSON](#validating-isajson)\n", + "8. [Converting Between Formats](#conversions)\n", + "9. [Batch Validation](#batch-validation)\n", + "10. [Advanced Examples](#advanced)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. 
Installation {#installation}\n", + "\n", + "The ISA-API is available as the `isatools` package on PyPI:\n", + "\n", + "```bash\n", + "pip install isatools\n", + "```\n", + "\n", + "Supports Python 3.6+" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Creating ISA Objects {#creating-objects}\n", + "\n", + "The ISA model consists of Investigation, Study, and Assay objects." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✓ Imported ISA model classes\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/sdrwacker/workspace/isa-tools/isa-api/isatools/net/mw2isa/__init__.py:64: SyntaxWarning: invalid escape sequence '\\d'\n", + " Workbench study accession number that should follow this pattern ^ST\\d+[6]\n", + "/Users/sdrwacker/workspace/isa-tools/isa-api/isatools/net/mw2isa/__init__.py:91: SyntaxWarning: invalid escape sequence '\\d'\n", + " follow this pattern ^ST\\d+[6]\n", + "/Users/sdrwacker/workspace/isa-tools/isa-api/isatools/net/mw2isa/__init__.py:1015: SyntaxWarning: invalid escape sequence '\\d'\n", + " :param study_accession_number: string, MW accnum ST\\d+\n" + ] + } + ], + "source": [ + "# Import all ISA model classes\n", + "from isatools.model import (\n", + " Investigation,\n", + " Study,\n", + " Assay,\n", + " Source,\n", + " Sample,\n", + " Material,\n", + " Process,\n", + " Protocol,\n", + " DataFile,\n", + " OntologyAnnotation,\n", + " OntologySource,\n", + " Person,\n", + " Publication,\n", + " Characteristic,\n", + " Comment,\n", + " StudyFactor,\n", + " batch_create_materials,\n", + " plink\n", + ")\n", + "\n", + "print(\"✓ Imported ISA model classes\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Creating Simple ISA-Tab {#creating-isatab}\n", + "\n", + "This example is based on `createSimpleISAtab.py` from the official examples." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Created investigation: i1\n", + " Title: My Simple ISA Investigation\n", + " Studies: 1\n", + " Study samples: 3\n", + " Study assays: 1\n" + ] + } + ], + "source": [ + "def create_simple_isatab():\n", + " \"\"\"\n", + " Returns a simple but complete ISA-Tab 1.0 descriptor.\n", + " Based on: isatools/examples/createSimpleISAtab.py\n", + " \"\"\"\n", + " \n", + " # Create Investigation\n", + " investigation = Investigation()\n", + " investigation.identifier = \"i1\"\n", + " investigation.title = \"My Simple ISA Investigation\"\n", + " investigation.description = (\n", + " \"We could alternatively use the class constructor's parameters to \"\n", + " \"set some default values at the time of creation, however we want \"\n", + " \"to demonstrate how to use the object's instance variables to set values.\"\n", + " )\n", + " investigation.submission_date = \"2016-11-03\"\n", + " investigation.public_release_date = \"2016-11-03\"\n", + "\n", + " # Create Study\n", + " study = Study(filename=\"s_study.txt\")\n", + " study.identifier = \"s1\"\n", + " study.title = \"My ISA Study\"\n", + " study.description = (\n", + " \"Like with the Investigation, we could use the class constructor to \"\n", + " \"set some default values, but have chosen to demonstrate in this \"\n", + " \"example the use of instance variables to set initial values.\"\n", + " )\n", + " study.submission_date = \"2016-11-03\"\n", + " study.public_release_date = \"2016-11-03\"\n", + " investigation.studies.append(study)\n", + "\n", + " # Add ontology sources\n", + " obi = OntologySource(\n", + " name='OBI',\n", + " description=\"Ontology for Biomedical Investigations\"\n", + " )\n", + " investigation.ontology_source_references.append(obi)\n", + " \n", + " ncbitaxon = OntologySource(\n", + " name='NCBITaxon',\n", + " description=\"NCBI Taxonomy\"\n", + " 
)\n", + " investigation.ontology_source_references.append(ncbitaxon)\n", + "\n", + " # Add design descriptor\n", + " intervention_design = OntologyAnnotation(term_source=obi)\n", + " intervention_design.term = \"intervention design\"\n", + " intervention_design.term_accession = \"http://purl.obolibrary.org/obo/OBI_0000115\"\n", + " study.design_descriptors.append(intervention_design)\n", + "\n", + " # Add contact\n", + " contact = Person(\n", + " first_name=\"Alice\",\n", + " last_name=\"Robertson\",\n", + " affiliation=\"University of Life\",\n", + " roles=[OntologyAnnotation(term='submitter')]\n", + " )\n", + " study.contacts.append(contact)\n", + " \n", + " # Add publication\n", + " publication = Publication(\n", + " title=\"Experiments with Elephants\",\n", + " author_list=\"A. Robertson, B. Robertson\"\n", + " )\n", + " publication.pubmed_id = \"12345678\"\n", + " publication.status = OntologyAnnotation(term=\"published\")\n", + " study.publications.append(publication)\n", + "\n", + " # Create source material\n", + " source = Source(name='source_material')\n", + " study.sources.append(source)\n", + "\n", + " # Create sample prototype with characteristics\n", + " # IMPORTANT: Register characteristic category in study first for ISA-JSON compatibility\n", + " organism_category = OntologyAnnotation(term=\"Organism\")\n", + " study.characteristic_categories.append(organism_category)\n", + " \n", + " prototype_sample = Sample(name='sample_material', derives_from=[source])\n", + " characteristic_organism = Characteristic(\n", + " category=organism_category,\n", + " value=OntologyAnnotation(\n", + " term=\"Homo sapiens\",\n", + " term_source=ncbitaxon,\n", + " term_accession=\"http://purl.bioontology.org/ontology/NCBITAXON/9606\"\n", + " )\n", + " )\n", + " prototype_sample.characteristics.append(characteristic_organism)\n", + "\n", + " # Create batch of 3 samples\n", + " study.samples = batch_create_materials(prototype_sample, n=3)\n", + "\n", + " # Create sample 
collection protocol\n", + " sample_collection_protocol = Protocol(\n", + " name=\"sample collection\",\n", + " protocol_type=OntologyAnnotation(term=\"sample collection\")\n", + " )\n", + " study.protocols.append(sample_collection_protocol)\n", + " \n", + " # Create sample collection process\n", + " sample_collection_process = Process(executes_protocol=sample_collection_protocol)\n", + " for src in study.sources:\n", + " sample_collection_process.inputs.append(src)\n", + " for sam in study.samples:\n", + " sample_collection_process.outputs.append(sam)\n", + " study.process_sequence.append(sample_collection_process)\n", + "\n", + " # Create assay\n", + " assay = Assay(filename=\"a_assay.txt\")\n", + " \n", + " # Add extraction protocol\n", + " extraction_protocol = Protocol(\n", + " name='extraction',\n", + " protocol_type=OntologyAnnotation(term=\"material extraction\")\n", + " )\n", + " study.protocols.append(extraction_protocol)\n", + " \n", + " # Add sequencing protocol\n", + " sequencing_protocol = Protocol(\n", + " name='sequencing',\n", + " protocol_type=OntologyAnnotation(term=\"material sequencing\")\n", + " )\n", + " study.protocols.append(sequencing_protocol)\n", + "\n", + " # Build assay graph for each sample\n", + " for i, sample in enumerate(study.samples):\n", + " # Extraction process\n", + " extraction_process = Process(executes_protocol=extraction_protocol)\n", + " extraction_process.inputs.append(sample)\n", + " \n", + " material = Material(name=\"extract-{}\".format(i))\n", + " material.type = \"Extract Name\"\n", + " extraction_process.outputs.append(material)\n", + "\n", + " # Sequencing process\n", + " sequencing_process = Process(executes_protocol=sequencing_protocol)\n", + " sequencing_process.name = \"assay-name-{}\".format(i)\n", + " sequencing_process.inputs.append(extraction_process.outputs[0])\n", + "\n", + " # Data file\n", + " datafile = DataFile(\n", + " filename=\"sequenced-data-{}\".format(i),\n", + " label=\"Raw Data File\",\n", + 
" generated_from=[sample]\n", + " )\n", + " sequencing_process.outputs.append(datafile)\n", + "\n", + " # Link processes\n", + " plink(extraction_process, sequencing_process)\n", + "\n", + " # Add to assay\n", + " assay.samples.append(sample)\n", + " assay.data_files.append(datafile)\n", + " assay.other_material.append(material)\n", + " assay.process_sequence.append(extraction_process)\n", + " assay.process_sequence.append(sequencing_process)\n", + " assay.measurement_type = OntologyAnnotation(term=\"gene sequencing\")\n", + " assay.technology_type = OntologyAnnotation(term=\"nucleotide sequencing\")\n", + "\n", + " study.assays.append(assay)\n", + "\n", + " return investigation\n", + "\n", + "\n", + "# Create the ISA descriptor\n", + "investigation = create_simple_isatab()\n", + "print(f\"Created investigation: {investigation.identifier}\")\n", + "print(f\" Title: {investigation.title}\")\n", + "print(f\" Studies: {len(investigation.studies)}\")\n", + "print(f\" Study samples: {len(investigation.studies[0].samples)}\")\n", + "print(f\" Study assays: {len(investigation.studies[0].assays)}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Export to ISA-Tab format" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['Sample Name', 'Protocol REF.0']\n", + "['Sample Name', 'Protocol REF.0', 'Extract Name', 'Protocol REF.1']\n", + "ISA-Tab output (first 500 characters):\n", + "/var/folders/hr/bq19zbjx0wvbr5gmvwypgb7431fw57/T/tmptud9fpt9/i_investigation.txt\n", + "ONTOLOGY SOURCE REFERENCE\n", + "Term Source Name\tOBI\tNCBITaxon\n", + "Term Source File\t\t\n", + "Term Source Version\t\t\n", + "Term Source Description\tOntology for Biomedical Investigations\tNCBI Taxonomy\n", + "INVESTIGATION\n", + "Investigation Identifier\ti1\n", + "Investigation Title\tMy Simple ISA Investigation\n", + "Investigation Description\tWe could 
alternatively use the class constructor's parameters to set some default values at the time of creation, however we wan\n", + "\n", + "... (output truncated)\n", + "['Sample Name', 'Protocol REF.0']\n", + "['Sample Name', 'Protocol REF.0', 'Extract Name', 'Protocol REF.1']\n", + "\n", + "✓ Created 3 ISA-Tab files in './example_isatab':\n", + " - a_assay.txt\n", + " - i_investigation.txt\n", + " - s_study.txt\n" + ] + } + ], + "source": [ + "from isatools import isatab\n", + "import os\n", + "\n", + "# Export as ISA-Tab string\n", + "isatab_string = isatab.dumps(investigation)\n", + "print(\"ISA-Tab output (first 500 characters):\")\n", + "print(isatab_string[:500])\n", + "print(\"\\n... (output truncated)\")\n", + "\n", + "# Write to directory\n", + "output_dir = './example_isatab'\n", + "os.makedirs(output_dir, exist_ok=True)\n", + "isatab.dump(investigation, output_dir)\n", + "\n", + "files = os.listdir(output_dir)\n", + "print(f\"\\n✓ Created {len(files)} ISA-Tab files in '{output_dir}':\")\n", + "for f in sorted(files):\n", + " print(f\" - {f}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Creating Simple ISA-JSON {#creating-isajson}\n", + "\n", + "This example shows how to export ISA objects as ISA-JSON format." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "ISA-JSON output (first 1000 characters):\n", + "{\n", + " \"comments\": [],\n", + " \"description\": \"We could alternatively use the class constructor's parameters to set some default values at the time of creation, however we want to demonstrate how to use the object's instance variables to set values.\",\n", + " \"identifier\": \"i1\",\n", + " \"ontologySourceReferences\": [\n", + " {\n", + " \"comments\": [],\n", + " \"description\": \"Ontology for Biomedical Investigations\",\n", + " \"file\": \"\",\n", + " \"name\": \"OBI\",\n", + " \"version\": \"\"\n", + " },\n", + " {\n", + " \"comments\": [],\n", + " \"description\": \"NCBI Taxonomy\",\n", + " \"file\": \"\",\n", + " \"name\": \"NCBITaxon\",\n", + " \"version\": \"\"\n", + " }\n", + " ],\n", + " \"people\": [],\n", + " \"publicReleaseDate\": \"2016-11-03\",\n", + " \"publications\": [],\n", + " \"studies\": [\n", + " {\n", + " \"assays\": [\n", + " {\n", + " \"characteristicCategories\": [],\n", + " \"comments\": [],\n", + " \"dataFiles\": [\n", + " {\n", + " \n", + "\n", + "... (output truncated)\n", + "\n", + "✓ Saved ISA-JSON (22148 bytes) to: example_isa_simple.json\n" + ] + } + ], + "source": [ + "import json\n", + "from isatools.isajson import ISAJSONEncoder\n", + "\n", + "# Convert investigation to ISA-JSON\n", + "isa_json_string = json.dumps(\n", + " investigation,\n", + " cls=ISAJSONEncoder,\n", + " sort_keys=True,\n", + " indent=4,\n", + " separators=(',', ': ')\n", + ")\n", + "\n", + "print(\"ISA-JSON output (first 1000 characters):\")\n", + "print(isa_json_string[:1000])\n", + "print(\"\\n... 
(output truncated)\")\n", + "\n", + "# Save to file\n", + "with open('example_isa_simple.json', 'w') as f:\n", + " f.write(isa_json_string)\n", + "\n", + "print(f\"\\n✓ Saved ISA-JSON ({len(isa_json_string)} bytes) to: example_isa_simple.json\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Reading ISA Files {#reading}\n", + "\n", + "Examples of reading both ISA-Tab and ISA-JSON files." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Reading ISA-Tab" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Loaded ISA-Tab investigation: i1\n", + " Title: My Simple ISA Investigation\n", + " Description: We could alternatively use the class constructor's parameters to set some default values at the time...\n", + " Number of studies: 1\n", + "\n", + " Study: s1 - My ISA Study\n", + " Sources: 1\n", + " Samples: 3\n", + " Protocols: 3\n", + " Assays: 1\n", + " Contacts: 1\n", + " Publications: 1\n", + " Assay: a_assay.txt\n", + " Measurement: gene sequencing\n", + " Technology: nucleotide sequencing\n", + " Data files: 3\n" + ] + } + ], + "source": [ + "from isatools import isatab\n", + "import os\n", + "\n", + "# Read the ISA-Tab we just created\n", + "with open(os.path.join(output_dir, 'i_investigation.txt')) as fp:\n", + " loaded_investigation = isatab.load(fp)\n", + "\n", + "print(f\"Loaded ISA-Tab investigation: {loaded_investigation.identifier}\")\n", + "print(f\" Title: {loaded_investigation.title}\")\n", + "print(f\" Description: {loaded_investigation.description[:100]}...\")\n", + "print(f\" Number of studies: {len(loaded_investigation.studies)}\")\n", + "\n", + "for study in loaded_investigation.studies:\n", + " print(f\"\\n Study: {study.identifier} - {study.title}\")\n", + " print(f\" Sources: {len(study.sources)}\")\n", + " print(f\" Samples: {len(study.samples)}\")\n", + " print(f\" 
Protocols: {len(study.protocols)}\")\n", + " print(f\" Assays: {len(study.assays)}\")\n", + " print(f\" Contacts: {len(study.contacts)}\")\n", + " print(f\" Publications: {len(study.publications)}\")\n", + " \n", + " for assay in study.assays:\n", + " print(f\" Assay: {assay.filename}\")\n", + " print(f\" Measurement: {assay.measurement_type.term if assay.measurement_type else 'N/A'}\")\n", + " print(f\" Technology: {assay.technology_type.term if assay.technology_type else 'N/A'}\")\n", + " print(f\" Data files: {len(assay.data_files)}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Reading ISA-JSON\n", + "\n", + "**Note**: ISA-JSON loading requires characteristic categories to be properly registered with @id references. Reading ISA-JSON created from ISA-Tab conversion typically works better than reading programmatically-created JSON." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Loaded ISA-JSON investigation: i1\n", + " Title: My Simple ISA Investigation\n", + " Number of studies: 1\n", + " Number of ontology sources: 2\n" + ] + } + ], + "source": [ + "from isatools import isajson\n", + "\n", + "# Read ISA-JSON file\n", + "# Note: This may fail with KeyError if characteristic categories weren't properly registered\n", + "try:\n", + " with open('example_isa_simple.json') as fp:\n", + " loaded_json_investigation = isajson.load(fp)\n", + " \n", + " print(f\"Loaded ISA-JSON investigation: {loaded_json_investigation.identifier}\")\n", + " print(f\" Title: {loaded_json_investigation.title}\")\n", + " print(f\" Number of studies: {len(loaded_json_investigation.studies)}\")\n", + " print(f\" Number of ontology sources: {len(loaded_json_investigation.ontology_source_references)}\")\n", + "except KeyError as e:\n", + " print(f\"✗ KeyError when loading programmatically-created ISA-JSON\")\n", + " print(f\" This is a known 
limitation - characteristic categories need proper @id registration\")\n", + " print(f\" Workaround: Load ISA-JSON created from ISA-Tab conversion (see conversion section below)\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6. Validating ISA-Tab {#validating-isatab}\n", + "\n", + "Based on `validateISAtab.py` example." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "ISA-Tab Validation Report:\n", + " Errors: 3\n", + " Warnings: 0\n", + " Info: 1\n", + "\n", + "Errors found:\n", + " - {'message': 'Measurement/technology type invalid', 'supplemental': 'Measurement gene sequencing/technology nucleotide sequencing, STUDY.0, STUDY ASSAY.0', 'code': 4002}\n", + " - {'message': 'A required property is missing', 'supplemental': 'A property value in Study Publication DOI of investigation file at column 1 is required', 'code': 4003}\n", + " - {'message': 'Unknown/System Error', 'supplemental': \"The validator could not identify what the error is: 'assay_table'\", 'code': 0}\n" + ] + } + ], + "source": [ + "from isatools import isatab\n", + "import os\n", + "\n", + "# Validate ISA-Tab using default configuration\n", + "with open(os.path.join(output_dir, 'i_investigation.txt')) as fp:\n", + " validation_report = isatab.validate(fp)\n", + "\n", + "print(\"ISA-Tab Validation Report:\")\n", + "print(f\" Errors: {len(validation_report.get('errors', []))}\")\n", + "print(f\" Warnings: {len(validation_report.get('warnings', []))}\")\n", + "print(f\" Info: {len(validation_report.get('info', []))}\")\n", + "\n", + "if validation_report.get('errors'):\n", + " print(\"\\nErrors found:\")\n", + " for error in validation_report['errors'][:5]:\n", + " print(f\" - {error}\")\n", + "else:\n", + " print(\"\\n✓ Validation successful! 
No errors found.\")\n", + "\n", + "if validation_report.get('warnings'):\n", + " print(\"\\nWarnings (first 5):\")\n", + " for warning in validation_report['warnings'][:5]:\n", + " print(f\" - {warning}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Validate with custom configuration\n", + "\n", + "You can provide a custom configuration directory for validation:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Custom configuration validation would be used for specific study types\n" + ] + } + ], + "source": [ + "# Example with custom config (commented out - requires config directory)\n", + "# with open(os.path.join('./tabdir/', 'i_investigation.txt')) as fp:\n", + "# validation_report = isatab.validate(\n", + "# fp,\n", + "# './my_custom_covid_study_isaconfig_v2021/'\n", + "# )\n", + "\n", + "print(\"Custom configuration validation would be used for specific study types\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7. Validating ISA-JSON {#validating-isajson}\n", + "\n", + "Based on `validateISAjson.py` example." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CONFIG at: /Users/sdrwacker/workspace/isa-tools/isa-api/isatools/isajson/../resources/config/json/default\n", + "ISA-JSON Validation Report:\n", + " Errors: 2\n", + " Warnings: 2\n", + "\n", + "Errors found:\n", + " - {'message': 'Measurement/technology type invalid', 'supplemental': 'Measurement gene sequencing/technology nucleotide sequencing', 'code': 4002}\n", + " - {'message': 'JSON Error', 'supplemental': \"Error when reading JSON; key: ('gene sequencing', 'nucleotide sequencing')\", 'code': 2}\n", + "\n", + "Warnings (first 5):\n", + " - {'message': 'Protocol parameter declared in a protocol but never used', 'supplemental': \"protocol declared ['#parameter/Array_Design_REF'] are not used\", 'code': 1020}\n", + " - {'message': 'Ontology Source Reference != used', 'supplemental': \"Ontology sources not used ['NCBITaxon', 'OBI']\", 'code': 3007}\n" + ] + } + ], + "source": [ + "from isatools import isajson\n", + "\n", + "# Validate ISA-JSON file\n", + "with open('example_isa_simple.json') as fp:\n", + " json_validation_report = isajson.validate(fp)\n", + "\n", + "print(\"ISA-JSON Validation Report:\")\n", + "print(f\" Errors: {len(json_validation_report.get('errors', []))}\")\n", + "print(f\" Warnings: {len(json_validation_report.get('warnings', []))}\")\n", + "\n", + "if json_validation_report.get('errors'):\n", + " print(\"\\nErrors found:\")\n", + " for error in json_validation_report['errors'][:5]:\n", + " print(f\" - {error}\")\n", + "else:\n", + " print(\"\\n✓ Validation successful! No errors found.\")\n", + "\n", + "if json_validation_report.get('warnings'):\n", + " print(\"\\nWarnings (first 5):\")\n", + " for warning in json_validation_report['warnings'][:5]:\n", + " print(f\" - {warning}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8. 
Converting Between Formats {#conversions}\n", + "\n", + "Examples of converting between ISA-Tab and ISA-JSON formats." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Converting ISA-Tab to ISA-JSON" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✓ Converted ISA-Tab to ISA-JSON\n", + " Output saved to: converted_from_tab.json\n", + " Investigation ID: i1\n" + ] + } + ], + "source": [ + "from isatools.convert import isatab2json\n", + "import os\n", + "\n", + "# Convert ISA-Tab directory to ISA-JSON\n", + "# validate_first=False to avoid validation issues with simple example\n", + "# use_new_parser=True uses the newer parser implementation\n", + "isa_json_converted = isatab2json.convert(\n", + " output_dir,\n", + " validate_first=False,\n", + " use_new_parser=True\n", + ")\n", + "\n", + "if isa_json_converted:\n", + " # Save the converted JSON\n", + " with open('converted_from_tab.json', 'w') as f:\n", + " json.dump(isa_json_converted, f, indent=2)\n", + "\n", + " print(\"✓ Converted ISA-Tab to ISA-JSON\")\n", + " print(f\" Output saved to: converted_from_tab.json\")\n", + " print(f\" Investigation ID: {isa_json_converted.get('identifier', 'N/A')}\")\n", + "else:\n", + " print(\"✗ Conversion failed - isatab2json.convert() returned None\")\n", + " print(\" This can happen if validation fails or input is invalid\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Converting ISA-JSON to ISA-Tab" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CONFIG at: /Users/sdrwacker/workspace/isa-tools/isa-api/isatools/isajson/../resources/config/json/default\n", + "✓ Converted ISA-JSON to ISA-Tab\n", + " Output directory: ./converted_from_json\n", + " Created 0 files:\n" + ] + } + ], + "source": [ + 
"from isatools.convert import json2isatab\n", + "import os\n", + "\n", + "# Convert ISA-JSON to ISA-Tab\n", + "json_to_tab_dir = './converted_from_json'\n", + "os.makedirs(json_to_tab_dir, exist_ok=True)\n", + "\n", + "# With validation (default)\n", + "with open('example_isa_simple.json') as fp:\n", + " json2isatab.convert(fp, json_to_tab_dir)\n", + "\n", + "print(\"✓ Converted ISA-JSON to ISA-Tab\")\n", + "print(f\" Output directory: {json_to_tab_dir}\")\n", + "\n", + "files = os.listdir(json_to_tab_dir)\n", + "print(f\" Created {len(files)} files:\")\n", + "for f in sorted(files):\n", + " print(f\" - {f}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Convert without validation" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['Sample Name', 'Protocol REF.0']\n", + "['Sample Name', 'Protocol REF.0', 'Extract Name', 'Protocol REF.1']\n", + "✓ Converted ISA-JSON to ISA-Tab (without validation)\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/sdrwacker/workspace/isa-tools/isa-api/isatools/isatab/dump/write.py:237: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. 
To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n", + " DF = DF.replace('', nan).infer_objects(copy=False)\n" + ] + } + ], + "source": [ + "# Convert without validation (faster, but riskier)\n", + "json_to_tab_dir_no_val = './converted_from_json_no_validation'\n", + "os.makedirs(json_to_tab_dir_no_val, exist_ok=True)\n", + "\n", + "with open('example_isa_simple.json') as fp:\n", + " json2isatab.convert(fp, json_to_tab_dir_no_val, validate_first=False)\n", + "\n", + "print(\"✓ Converted ISA-JSON to ISA-Tab (without validation)\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 9. Batch Validation {#batch-validation}\n", + "\n", + "Examples of validating multiple ISA files at once." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Batch validate ISA-Tab directories" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Batch ISA-Tab Validation:\n", + " Validated 2 directories\n", + " Report saved to: batch_validation_report_tab.txt\n", + " Total errors: 0\n", + " Total warnings: 0\n" + ] + } + ], + "source": [ + "from isatools import isatab\n", + "\n", + "# List of ISA-Tab directories to validate\n", + "my_tabs = [\n", + " output_dir,\n", + " json_to_tab_dir\n", + "]\n", + "\n", + "# Batch validate - returns a dict with 'batch_report' key containing list of reports\n", + "batch_result = isatab.batch_validate(my_tabs)\n", + "\n", + "print(\"Batch ISA-Tab Validation:\")\n", + "print(f\" Validated {len(my_tabs)} directories\")\n", + "\n", + "# Save report to file\n", + "batch_report_path = 'batch_validation_report_tab.txt'\n", + "with open(batch_report_path, 'w') as f:\n", + " import json\n", + " f.write(json.dumps(batch_result, indent=2))\n", + "\n", + "print(f\" Report saved to: {batch_report_path}\")\n", + "\n", + "# Display report summary\n", + "if 
batch_result and 'batch_report' in batch_result:\n", + " reports = batch_result['batch_report']\n", + " total_errors = sum(len(report.get('errors', [])) for report in reports)\n", + " total_warnings = sum(len(report.get('warnings', [])) for report in reports)\n", + " print(f\" Total errors: {total_errors}\")\n", + " print(f\" Total warnings: {total_warnings}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Batch validate ISA-JSON files" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CONFIG at: /Users/sdrwacker/workspace/isa-tools/isa-api/isatools/isajson/../resources/config/json/default\n", + "CONFIG at: /Users/sdrwacker/workspace/isa-tools/isa-api/isatools/isajson/../resources/config/json/default\n", + "Batch ISA-JSON Validation:\n", + " Validated 2 files\n", + " Report saved to: batch_validation_report_json.txt\n", + " Total errors: 0\n", + " Total warnings: 0\n" + ] + } + ], + "source": [ + "from isatools import isajson\n", + "\n", + "# List of ISA-JSON files to validate\n", + "my_jsons = [\n", + " 'example_isa_simple.json',\n", + " 'converted_from_tab.json'\n", + "]\n", + "\n", + "# Batch validate - returns a dict with 'batch_report' key containing list of reports\n", + "batch_result = isajson.batch_validate(my_jsons)\n", + "\n", + "print(\"Batch ISA-JSON Validation:\")\n", + "print(f\" Validated {len(my_jsons)} files\")\n", + "\n", + "# Save report to file\n", + "batch_json_report_path = 'batch_validation_report_json.txt'\n", + "with open(batch_json_report_path, 'w') as f:\n", + " import json\n", + " f.write(json.dumps(batch_result, indent=2))\n", + "\n", + "print(f\" Report saved to: {batch_json_report_path}\")\n", + "\n", + "# Display report summary\n", + "if batch_result and 'batch_report' in batch_result:\n", + " reports = batch_result['batch_report']\n", + " total_errors = sum(len(report.get('errors', [])) for 
report in reports)\n", + " total_warnings = sum(len(report.get('warnings', [])) for report in reports)\n", + " print(f\" Total errors: {total_errors}\")\n", + " print(f\" Total warnings: {total_warnings}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Reformatting validation reports\n", + "\n", + "You can reformat JSON reports to CSV format:" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "✓ Formatted validation report as CSV: validation_report.csv\n", + "\n", + "CSV Report preview (first 300 characters):\n", + "4002,Measurement/technology type invalid,Measurement gene sequencing/technology nucleotide sequencing, STUDY.0, STUDY ASSAY.0\n", + "4003,A required property is missing,A property value in Study Publication DOI of investigation file at column 1 is required\n", + "0,Unknown/System Error,The validator could not ide\n" + ] + } + ], + "source": [ + "from isatools import utils\n", + "\n", + "# Format the validation report as CSV\n", + "csv_report_path = 'validation_report.csv'\n", + "with open(csv_report_path, 'w') as report_file:\n", + " report_file.write(utils.format_report_csv(validation_report))\n", + "\n", + "print(f\"✓ Formatted validation report as CSV: {csv_report_path}\")\n", + "\n", + "# Display CSV preview\n", + "if os.path.exists(csv_report_path):\n", + " with open(csv_report_path, 'r') as f:\n", + " csv_content = f.read()\n", + " print(f\"\\nCSV Report preview (first 300 characters):\")\n", + " print(csv_content[:300])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10. Advanced Examples {#advanced}\n", + "\n", + "Additional features and utilities." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Using Comments to annotate ISA objects" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Study with comments:\n", + " Study Start Date: 2025-01-01\n", + " Study End Date: 2025-12-31\n" + ] + } + ], + "source": [ + "# Create a study with comments\n", + "study_with_comments = Study(filename=\"s_commented.txt\")\n", + "study_with_comments.identifier = \"s_commented\"\n", + "study_with_comments.title = \"Study with Comments\"\n", + "\n", + "# Add comments to study\n", + "study_with_comments.comments.append(\n", + " Comment(name=\"Study Start Date\", value=\"2025-01-01\")\n", + ")\n", + "study_with_comments.comments.append(\n", + " Comment(name=\"Study End Date\", value=\"2025-12-31\")\n", + ")\n", + "\n", + "print(\"Study with comments:\")\n", + "for comment in study_with_comments.comments:\n", + " print(f\" {comment.name}: {comment.value}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Using Study Factors" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Study factor added: treatment\n", + " Type: treatment\n", + " Comments: 1\n" + ] + } + ], + "source": [ + "# Create study factors\n", + "treatment_factor = StudyFactor(\n", + " name=\"treatment\",\n", + " factor_type=OntologyAnnotation(term=\"treatment\")\n", + ")\n", + "treatment_factor.comments.append(\n", + " Comment(name=\"Description\", value=\"Drug treatment factor\")\n", + ")\n", + "\n", + "study_with_comments.factors.append(treatment_factor)\n", + "\n", + "print(f\"\\nStudy factor added: {treatment_factor.name}\")\n", + "print(f\" Type: {treatment_factor.factor_type.term}\")\n", + "print(f\" Comments: {len(treatment_factor.comments)}\")" + ] + }, + { + "cell_type": 
"markdown", + "metadata": {}, + "source": [ + "### Using plink() to connect processes" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Process linking example:\n", + " Process 1 outputs: 1\n", + " Process 2 inputs: 1\n", + " Processes are now linked through intermediate material\n" + ] + } + ], + "source": [ + "# plink() helps connect processes in the workflow\n", + "# It was already used in the assay creation above\n", + "\n", + "# Create two processes\n", + "process1 = Process(executes_protocol=Protocol(name=\"step1\"))\n", + "process2 = Process(executes_protocol=Protocol(name=\"step2\"))\n", + "\n", + "# Add output to process1\n", + "intermediate = Material(name=\"intermediate_material\")\n", + "intermediate.type = \"Extract Name\"\n", + "process1.outputs.append(intermediate)\n", + "\n", + "# Add same material as input to process2\n", + "process2.inputs.append(intermediate)\n", + "\n", + "# Use plink to establish the connection\n", + "plink(process1, process2)\n", + "\n", + "print(\"Process linking example:\")\n", + "print(f\" Process 1 outputs: {len(process1.outputs)}\")\n", + "print(f\" Process 2 inputs: {len(process2.inputs)}\")\n", + "print(f\" Processes are now linked through intermediate material\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Batch creating materials" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Created 10 samples:\n", + " 1. sample-0\n", + " 2. sample-1\n", + " 3. sample-2\n", + " 4. sample-3\n", + " 5. sample-4\n", + " ... 
and 5 more\n" + ] + } + ], + "source": [ + "# batch_create_materials() efficiently creates multiple materials\n", + "# from a prototype (already used above)\n", + "\n", + "prototype = Sample(name=\"sample\")\n", + "prototype.characteristics.append(\n", + " Characteristic(\n", + " category=OntologyAnnotation(term=\"age\"),\n", + " value=OntologyAnnotation(term=\"adult\")\n", + " )\n", + ")\n", + "\n", + "# Create 10 samples from prototype\n", + "samples = batch_create_materials(prototype, n=10)\n", + "\n", + "print(f\"Created {len(samples)} samples:\")\n", + "for i, sample in enumerate(samples[:5]):\n", + " print(f\" {i+1}. {sample.name}\")\n", + "print(f\" ... and {len(samples) - 5} more\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "This notebook has demonstrated all major features from the ISA-API documentation:\n", + "\n", + "✓ Creating ISA Investigation, Study, and Assay objects \n", + "✓ Adding ontology annotations and metadata \n", + "✓ Creating source materials, samples, and data files \n", + "✓ Defining protocols and process workflows \n", + "✓ Exporting to ISA-Tab format \n", + "✓ Exporting to ISA-JSON format \n", + "✓ Reading ISA-Tab and ISA-JSON files \n", + "✓ Validating ISA metadata \n", + "✓ Converting between ISA-Tab and ISA-JSON \n", + "✓ Batch validation of multiple files \n", + "✓ Advanced features: Comments, Study Factors, plink(), batch materials \n", + "\n", + "## Resources\n", + "\n", + "- **Official Documentation**: https://isa-tools.org/isa-api/content/\n", + "- **GitHub Repository**: https://github.com/ISA-tools/isa-api\n", + "- **PyPI Package**: https://pypi.org/project/isatools/\n", + "- **ISA Community**: https://www.isacommons.org\n", + "- **More Examples**: Check the `isa-cookbook/` directory in this repository" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "isa-api-py312", + "language": "python", + "name": "isa-api-py312" + }, + "language_info": { + 
"codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.11" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/isa-cookbook/content/notebooks/isa-api-getting-started.ipynb b/isa-cookbook/content/notebooks/isa-api-getting-started.ipynb new file mode 100644 index 000000000..b9e291656 --- /dev/null +++ b/isa-cookbook/content/notebooks/isa-api-getting-started.ipynb @@ -0,0 +1,718 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# ISA-API Getting Started Guide\n", + "\n", + "This notebook demonstrates the basic usage of the ISA-API for creating, manipulating, and converting ISA metadata.\n", + "\n", + "## What is ISA?\n", + "\n", + "The ISA (Investigation-Study-Assay) framework helps manage metadata for life science, environmental, and biomedical experiments. The ISA-API provides tools to:\n", + "\n", + "- **Create** ISA objects programmatically\n", + "- **Validate** ISA datasets\n", + "- **Convert** between ISA-Tab, ISA-JSON, and other formats\n", + "- **Read and manipulate** existing ISA datasets\n", + "\n", + "## Installation\n", + "\n", + "```bash\n", + "pip install isatools\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. 
Creating a Simple ISA Investigation" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Created investigation: My First ISA Investigation\n" + ] + } + ], + "source": [ + "from isatools.model import (\n", + " Investigation,\n", + " Study,\n", + " Assay,\n", + " Source,\n", + " Sample,\n", + " Material,\n", + " Process,\n", + " Protocol,\n", + " DataFile,\n", + " OntologyAnnotation,\n", + " OntologySource,\n", + " Person,\n", + " Publication,\n", + " Characteristic,\n", + " batch_create_materials\n", + ")\n", + "\n", + "# Create an Investigation\n", + "investigation = Investigation()\n", + "investigation.identifier = \"INV001\"\n", + "investigation.title = \"My First ISA Investigation\"\n", + "investigation.description = \"A simple example investigation using ISA-API\"\n", + "investigation.submission_date = \"2025-10-01\"\n", + "investigation.public_release_date = \"2025-12-01\"\n", + "\n", + "print(f\"Created investigation: {investigation.title}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Adding Ontology Sources\n", + "\n", + "Ontologies provide controlled vocabularies for describing experimental metadata." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Added 2 ontology sources\n" + ] + } + ], + "source": [ + "# Define ontology sources\n", + "ncbitaxon = OntologySource(\n", + " name='NCBITaxon',\n", + " description=\"NCBI Taxonomy\",\n", + " file=\"http://purl.bioontology.org/ontology/NCBITAXON\"\n", + ")\n", + "\n", + "obi = OntologySource(\n", + " name='OBI',\n", + " description=\"Ontology for Biomedical Investigations\",\n", + " file=\"http://purl.obolibrary.org/obo/obi.owl\"\n", + ")\n", + "\n", + "# Add to investigation\n", + "investigation.ontology_source_references.extend([ncbitaxon, obi])\n", + "\n", + "print(f\"Added {len(investigation.ontology_source_references)} ontology sources\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Creating a Study with Contacts and Publications" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Created study: Metabolomics Study of Plant Stress Response\n", + " Contact: Jane Scientist\n", + " Publication: Plant Stress Response Study\n" + ] + } + ], + "source": [ + "# Create a study\n", + "study = Study(filename=\"s_study.txt\")\n", + "study.identifier = \"STUDY001\"\n", + "study.title = \"Metabolomics Study of Plant Stress Response\"\n", + "study.description = \"Investigating metabolic changes in plants under drought stress\"\n", + "study.submission_date = \"2025-10-01\"\n", + "study.public_release_date = \"2025-12-01\"\n", + "\n", + "# Add study design descriptor\n", + "intervention_design = OntologyAnnotation(\n", + " term=\"intervention design\",\n", + " term_accession=\"http://purl.obolibrary.org/obo/OBI_0000115\",\n", + " term_source=obi\n", + ")\n", + "study.design_descriptors.append(intervention_design)\n", + "\n", + "# Add contact person\n", + "contact = Person(\n", + 
" first_name=\"Jane\",\n", + " last_name=\"Scientist\",\n", + " affiliation=\"Research Institute\",\n", + " email=\"jane.scientist@example.com\",\n", + " roles=[OntologyAnnotation(term=\"principal investigator\")]\n", + ")\n", + "study.contacts.append(contact)\n", + "\n", + "# Add publication\n", + "publication = Publication(\n", + " title=\"Plant Stress Response Study\",\n", + " author_list=\"Scientist J, Researcher A\",\n", + " pubmed_id=\"12345678\",\n", + " doi=\"10.1234/example.doi\"\n", + ")\n", + "publication.status = OntologyAnnotation(term=\"published\")\n", + "study.publications.append(publication)\n", + "\n", + "# Add study to investigation\n", + "investigation.studies.append(study)\n", + "\n", + "print(f\"Created study: {study.title}\")\n", + "print(f\" Contact: {contact.first_name} {contact.last_name}\")\n", + "print(f\" Publication: {publication.title}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Creating Source Materials and Samples\n", + "\n", + "Source materials represent the biological material before any processing." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Created 6 samples:\n", + " - control_sample_1\n", + " - control_sample_2\n", + " - control_sample_3\n", + " - treated_sample_1\n", + " - treated_sample_2\n", + " - treated_sample_3\n" + ] + } + ], + "source": [ + "# Create a source material\n", + "source = Source(name='plant_source')\n", + "\n", + "# Add organism characteristic\n", + "organism_characteristic = Characteristic(\n", + " category=OntologyAnnotation(term=\"Organism\"),\n", + " value=OntologyAnnotation(\n", + " term=\"Arabidopsis thaliana\",\n", + " term_source=ncbitaxon,\n", + " term_accession=\"http://purl.bioontology.org/ontology/NCBITAXON/3702\"\n", + " )\n", + ")\n", + "source.characteristics.append(organism_characteristic)\n", + "study.sources.append(source)\n", + "study.characteristic_categories.append(organism_characteristic.category)\n", + "\n", + "# Create sample prototype\n", + "prototype_sample = Sample(name='sample', derives_from=[source])\n", + "\n", + "# Add characteristics to sample\n", + "treatment_characteristic = Characteristic(\n", + " category=OntologyAnnotation(term=\"Treatment\"),\n", + " value=OntologyAnnotation(term=\"drought stress\")\n", + ")\n", + "prototype_sample.characteristics.append(treatment_characteristic)\n", + "study.characteristic_categories.append(treatment_characteristic.category)\n", + "\n", + "# Create batch of samples (control and treated)\n", + "study.samples = batch_create_materials(prototype_sample, n=6)\n", + "\n", + "# Rename samples for clarity\n", + "for i, sample in enumerate(study.samples):\n", + " if i < 3:\n", + " sample.name = f\"control_sample_{i+1}\"\n", + " else:\n", + " sample.name = f\"treated_sample_{i-2}\"\n", + "\n", + "print(f\"Created {len(study.samples)} samples:\")\n", + "for sample in study.samples:\n", + " print(f\" - {sample.name}\")" + ] + }, + { + "cell_type": "markdown", + 
"metadata": {}, + "source": [ + "## 5. Creating Protocols and Processes\n", + "\n", + "Protocols describe the experimental procedures, and Processes are instances of protocol execution." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Created protocol: sample collection\n", + "Process: 1 input -> 6 outputs\n" + ] + } + ], + "source": [ + "# Create sample collection protocol\n", + "sample_collection_protocol = Protocol(\n", + " name=\"sample collection\",\n", + " protocol_type=OntologyAnnotation(term=\"sample collection\")\n", + ")\n", + "study.protocols.append(sample_collection_protocol)\n", + "\n", + "# Create sample collection process\n", + "sample_collection_process = Process(executes_protocol=sample_collection_protocol)\n", + "sample_collection_process.inputs.append(source)\n", + "sample_collection_process.outputs.extend(study.samples)\n", + "study.process_sequence.append(sample_collection_process)\n", + "\n", + "print(f\"Created protocol: {sample_collection_protocol.name}\")\n", + "print(f\"Process: {len(sample_collection_process.inputs)} input -> {len(sample_collection_process.outputs)} outputs\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6. Creating an Assay with Data Files\n", + "\n", + "Assays represent the analytical measurements performed on samples." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Created assay: a_metabolomics.txt\n", + " Measurement type: metabolite profiling\n", + " Technology type: mass spectrometry\n", + " Data files: 6\n" + ] + } + ], + "source": [ + "# Create an assay\n", + "assay = Assay(filename=\"a_metabolomics.txt\")\n", + "assay.measurement_type = OntologyAnnotation(term=\"metabolite profiling\")\n", + "assay.technology_type = OntologyAnnotation(term=\"mass spectrometry\")\n", + "\n", + "# Create extraction protocol\n", + "extraction_protocol = Protocol(\n", + " name='metabolite extraction',\n", + " protocol_type=OntologyAnnotation(term=\"extraction\")\n", + ")\n", + "study.protocols.append(extraction_protocol)\n", + "\n", + "# Create mass spectrometry protocol\n", + "ms_protocol = Protocol(\n", + " name='mass spectrometry',\n", + " protocol_type=OntologyAnnotation(term=\"mass spectrometry\")\n", + ")\n", + "study.protocols.append(ms_protocol)\n", + "\n", + "# Create processes for each sample\n", + "for i, sample in enumerate(study.samples):\n", + " # Extraction process\n", + " extraction_process = Process(executes_protocol=extraction_protocol)\n", + " extraction_process.inputs.append(sample)\n", + " \n", + " extract = Material(name=f\"extract_{i}\")\n", + " extract.type = \"Extract Name\"\n", + " extraction_process.outputs.append(extract)\n", + " \n", + " # MS analysis process\n", + " ms_process = Process(executes_protocol=ms_protocol)\n", + " ms_process.inputs.append(extract)\n", + " \n", + " # Create data file\n", + " data_file = DataFile(\n", + " filename=f\"ms_data_{sample.name}.mzML\",\n", + " label=\"Raw Data File\"\n", + " )\n", + " ms_process.outputs.append(data_file)\n", + " \n", + " # Add to assay\n", + " assay.samples.append(sample)\n", + " assay.other_material.append(extract)\n", + " assay.data_files.append(data_file)\n", + " 
assay.process_sequence.append(extraction_process)\n", + " assay.process_sequence.append(ms_process)\n", + "\n", + "# Add assay to study\n", + "study.assays.append(assay)\n", + "\n", + "print(f\"Created assay: {assay.filename}\")\n", + "print(f\" Measurement type: {assay.measurement_type.term}\")\n", + "print(f\" Technology type: {assay.technology_type.term}\")\n", + "print(f\" Data files: {len(assay.data_files)}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 7. Exporting to ISA-JSON" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "ISA-JSON output (first 1000 characters):\n", + "{\n", + " \"comments\": [],\n", + " \"description\": \"A simple example investigation using ISA-API\",\n", + " \"identifier\": \"INV001\",\n", + " \"ontologySourceReferences\": [\n", + " {\n", + " \"comments\": [],\n", + " \"description\": \"NCBI Taxonomy\",\n", + " \"file\": \"http://purl.bioontology.org/ontology/NCBITAXON\",\n", + " \"name\": \"NCBITaxon\",\n", + " \"version\": \"\"\n", + " },\n", + " {\n", + " \"comments\": [],\n", + " \"description\": \"Ontology for Biomedical Investigations\",\n", + " \"file\": \"http://purl.obolibrary.org/obo/obi.owl\",\n", + " \"name\": \"OBI\",\n", + " \"version\": \"\"\n", + " }\n", + " ],\n", + " \"people\": [],\n", + " \"publicReleaseDate\": \"2025-12-01\",\n", + " \"publications\": [],\n", + " \"studies\": [\n", + " {\n", + " \"assays\": [\n", + " {\n", + " \"characteristicCategories\": [],\n", + " \"comments\": [],\n", + " \"dataFiles\": [\n", + " {\n", + " \"@id\": \"#data_file/f9d80419-4738-478d-9fbc-7fa91430e55c\",\n", + " \"comments\": [],\n", + " \"name\": \"ms_data_control_sample_1.mzML\",\n", + " \"type\": \"Raw Data File\"\n", + " },\n", + " {\n", + " \"@id\"\n", + "\n", + "... 
(output truncated)\n", + "\n", + "Saved ISA-JSON to: example_isa.json\n" + ] + } + ], + "source": [ + "import json\n", + "from isatools.isajson import ISAJSONEncoder\n", + "\n", + "# Convert to JSON string\n", + "isa_json = json.dumps(investigation, cls=ISAJSONEncoder, sort_keys=True, indent=2)\n", + "\n", + "# Display first 1000 characters\n", + "print(\"ISA-JSON output (first 1000 characters):\")\n", + "print(isa_json[:1000])\n", + "print(\"\\n... (output truncated)\")\n", + "\n", + "# Save to file\n", + "with open('example_isa.json', 'w') as f:\n", + " f.write(isa_json)\n", + "\n", + "print(\"\\nSaved ISA-JSON to: example_isa.json\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 8. Exporting to ISA-Tab Format" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['Sample Name', 'Protocol REF.0']\n", + "['Sample Name', 'Protocol REF.0', 'Extract Name', 'Protocol REF.1', 'MS Assay Name.0']\n", + "Created ISA-Tab files in './isa_tab_output':\n", + " - a_metabolomics.txt\n", + " - i_investigation.txt\n", + " - s_study.txt\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/sdrwacker/workspace/isa-api/isatools/isatab/dump/write.py:237: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n", + " DF = DF.replace('', nan)\n", + "/Users/sdrwacker/workspace/isa-api/isatools/isatab/dump/write.py:537: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. 
To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n", + " DF = DF.replace('', nan)\n" + ] + } + ], + "source": [ + "from isatools import isatab\n", + "import os\n", + "\n", + "# Create output directory\n", + "output_dir = './isa_tab_output'\n", + "os.makedirs(output_dir, exist_ok=True)\n", + "\n", + "# Write ISA-Tab files\n", + "isatab.dump(investigation, output_dir)\n", + "\n", + "# List created files\n", + "created_files = os.listdir(output_dir)\n", + "print(f\"Created ISA-Tab files in '{output_dir}':\")\n", + "for file in sorted(created_files):\n", + " print(f\" - {file}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 9. Reading Existing ISA-Tab Files" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Loaded investigation: INV001\n", + " Title: My First ISA Investigation\n", + " Number of studies: 1\n", + "\n", + " Study: STUDY001\n", + " Title: Metabolomics Study of Plant Stress Response\n", + " Sources: 1\n", + " Samples: 6\n", + " Assays: 1\n", + " Assay: a_metabolomics.txt\n", + " Data files: 6\n" + ] + } + ], + "source": [ + "# Read back the ISA-Tab we just created\n", + "with open(os.path.join(output_dir, 'i_investigation.txt')) as f:\n", + " loaded_investigation = isatab.load(f)\n", + "\n", + "print(f\"Loaded investigation: {loaded_investigation.identifier}\")\n", + "print(f\" Title: {loaded_investigation.title}\")\n", + "print(f\" Number of studies: {len(loaded_investigation.studies)}\")\n", + "\n", + "for study in loaded_investigation.studies:\n", + " print(f\"\\n Study: {study.identifier}\")\n", + " print(f\" Title: {study.title}\")\n", + " print(f\" Sources: {len(study.sources)}\")\n", + " print(f\" Samples: {len(study.samples)}\")\n", + " print(f\" Assays: {len(study.assays)}\")\n", + " \n", + " for assay in study.assays:\n", + " print(f\" Assay: 
{assay.filename}\")\n", + " print(f\" Data files: {len(assay.data_files)}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 10. Validating ISA-Tab Files" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Validation Report:\n", + " Errors: 0\n", + " Warnings: 1\n", + " Info: 2\n", + "\n", + "✓ Validation successful! No errors found.\n" + ] + } + ], + "source": [ + "from isatools import isatab\n", + "\n", + "# Validate the ISA-Tab directory\n", + "try:\n", + " validation_report = isatab.validate(open(os.path.join(output_dir, 'i_investigation.txt')))\n", + " \n", + " print(\"Validation Report:\")\n", + " print(f\" Errors: {len(validation_report.get('errors', []))}\")\n", + " print(f\" Warnings: {len(validation_report.get('warnings', []))}\")\n", + " print(f\" Info: {len(validation_report.get('info', []))}\")\n", + " \n", + " if validation_report.get('errors'):\n", + " print(\"\\nErrors found:\")\n", + " for error in validation_report['errors'][:5]: # Show first 5 errors\n", + " print(f\" - {error}\")\n", + " else:\n", + " print(\"\\n✓ Validation successful! No errors found.\")\n", + " \n", + "except Exception as e:\n", + " print(f\"Validation error: {e}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 11. 
Converting ISA-Tab to ISA-JSON" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Converted ISA-Tab to ISA-JSON\n", + "Output saved to: converted_isa.json\n", + "JSON size: 26338 characters\n" + ] + } + ], + "source": [ + "from isatools import isatab\n", + "from isatools.isajson import ISAJSONEncoder\n", + "\n", + "# Read ISA-Tab\n", + "with open(os.path.join(output_dir, 'i_investigation.txt')) as f:\n", + " inv = isatab.load(f)\n", + "\n", + "# Convert to JSON\n", + "json_output = json.dumps(inv, cls=ISAJSONEncoder, indent=2)\n", + "\n", + "# Save JSON\n", + "with open('converted_isa.json', 'w') as f:\n", + " f.write(json_output)\n", + "\n", + "print(\"Converted ISA-Tab to ISA-JSON\")\n", + "print(\"Output saved to: converted_isa.json\")\n", + "print(f\"JSON size: {len(json_output)} characters\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "This notebook demonstrated:\n", + "\n", + "1. ✓ Creating ISA Investigation, Study, and Assay objects\n", + "2. ✓ Adding ontology annotations and controlled vocabularies\n", + "3. ✓ Creating source materials, samples, and processes\n", + "4. ✓ Defining protocols and linking them to processes\n", + "5. ✓ Creating assays with data files\n", + "6. ✓ Exporting to ISA-JSON format\n", + "7. ✓ Exporting to ISA-Tab format\n", + "8. ✓ Reading existing ISA-Tab files\n", + "9. ✓ Validating ISA metadata\n", + "10. 
✓ Converting between ISA-Tab and ISA-JSON\n", + "\n", + "## Additional Resources\n", + "\n", + "- **Documentation**: https://isa-tools.org/isa-api/\n", + "- **GitHub**: https://github.com/ISA-tools/isa-api\n", + "- **ISA Community**: https://www.isacommons.org\n", + "- **ISA Cookbook**: More advanced examples in the `isa-cookbook/` directory" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "isa-api-py312", + "language": "python", + "name": "isa-api-py312" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.11" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}