isa-api's Introduction


The open source ISA metadata tracking tools help to manage an increasingly diverse set of life science, environmental and biomedical experiments that employ one or a combination of technologies.

Built around the ‘Investigation’ (the project context), ‘Study’ (a unit of research) and ‘Assay’ (analytical measurement) general-purpose tabular format, the ISA tools help you to provide a rich description of the experimental metadata (i.e. sample characteristics, technology and measurement types, sample-to-data relationships) so that the resulting data and discoveries are reproducible and reusable.

To find out more about ISA, see https://isa-tools.org/

To find out who's using ISA and about the ISA development and user community, see www.isacommons.org

The ISA API aims to provide you, the developer, with a set of tools to help you easily and quickly build your own ISA objects, validate, and convert between serializations of ISA-formatted datasets and other formats/schemas (e.g. SRA schemas). The ISA API is published on PyPI as the isatools package.

isatools currently supports Python 3.6+.


Read the Publication...

Read our open access publication "ISA API: An open platform for interoperable life science experimental metadata", published in GigaScience as a technical note

David Johnson, Dominique Batista, Keeva Cochrane, Robert P. Davey, Anthony Etuk, Alejandra Gonzalez-Beltran, Kenneth Haug, Massimiliano Izzo, Martin Larralde, Thomas N. Lawson, Alice Minotto, Pablo Moreno, Venkata Chandrasekhar Nainala, Claire O'Donovan, Luca Pireddu, Pierrick Roger, Felix Shaw, Christoph Steinbeck, Ralf J. M. Weber, Susanna-Assunta Sansone, Philippe Rocca-Serra. ISA API: An open platform for interoperable life science experimental metadata. 2020.11.13.382119; doi: 10.1093/gigascience/giab060



Authors: The ISA team.

License: This code is licensed under the CPAL License.

Repository: https://github.com/ISA-tools/isa-api

ISA team email: [email protected]

ISA discussion group: https://groups.google.com/forum/#!forum/isaforum

Github issue tracker: https://github.com/ISA-tools/isa-api/issues


Using the ISA-API

The documentation to install and use the ISA-API (v0.12 and above) can be found here.

For the previous versions (up to v0.11) check the documentation here.

Contributing

The ISA-API is still in development. We would be very happy to receive any help and contributions (testing, feature requests, pull requests). Please feel free to contact our development team at [email protected], or ask a question, report a bug or file a feature request in the Github issue tracker at https://github.com/ISA-tools/isa-api/issues.

isa-api's People

Contributors

agbeltran, alfieabdulrahman, djcomlab, drj11, jshoyer, nokome, nsoranzo, ntk73, pcm32, pkrog, proccaserra, rabuono, terazus, vedina, yjarosz, zigur

isa-api's Issues

Characteristics of Sources appear in Samples list of generated JSON

I've noticed that Characteristics[] columns from the Sources section of study ISAtab files show up in the characteristics of samples after the isatab2json conversion.

I am unsure if this is normal behaviour, as the samples list in the study tab files does not repeat these. Do samples inherit source characteristics?

Data file pointers

Is it possible to extend the examples to include data objects with file names? It's not clear at the moment where one should specify the data file names in the data object.

Also, I've noticed the BII-I-1 JSON examples include pointers to tab-delimited files (i_*.txt, s_*.txt, a_*.txt), even though all the metadata from these files should already be in the JSON itself.

Parameters in the process schema

Could you clarify where numeric values should be stored? The parameters_schema.json consists only of an ontology annotation (parameterType) and a unit.

isatab2json conversion throws AttributeError in createExecuteStudyProtocol()

Converting BII-I-1 is unsuccessful, throwing up the following error when running test_isatab2json.py and elsewhere:

Error
Traceback (most recent call last):
  File "/PycharmProjects/isa-api/tests/test_isatab2json.py", line 19, in test_bii_i_1_conversion
    isa_json = self.isatab2json.convert(test_data_dir, self.sample_data_dir)
  File "/PycharmProjects/isa-api/isatools/convert/isatab2json.py", line 49, in convert
    ("studies", self.createStudies(isa_tab.studies))
  File "//PycharmProjects/isa-api/isatools/convert/isatab2json.py", line 215, in createStudies
    ("processSequence", self.createProcessSequence(study.process_nodes, source_dict, sample_dict, data_dict)),
  File "/PycharmProjects/isa-api/isatools/convert/isatab2json.py", line 270, in createProcessSequence
    ("executesProtocol", self.createExecuteStudyProtocol(process_node_name, process_nodes[process_node_name])),
  File "/PycharmProjects/isa-api/isatools/convert/isatab2json.py", line 305, in createExecuteStudyProtocol
    ("name", process_node.protocol)
AttributeError: 'ProcessNodeRecord' object has no attribute 'protocol'

Study filenames missing from conversion to JSON

Currently the study filename properties are missing from the tab2json conversions.

We are already aware of this; it was caused by previous changes. The properties will need to be reintroduced to allow more accurate conversions back to ISAtab from JSON.

typo in publication_schema.json

"#ref" is used instead of "$ref"

Therefore:
"#ref": "ontology_annotation_schema.json#"
should be
"$ref": "ontology_annotation_schema.json#"

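A check like the following could catch this kind of typo before it is committed. It is only a sketch: the `find_bad_ref_keys` helper and the `status` property in the sample fragment are hypothetical, not part of the ISA schemas or of isatools.

```python
import json


def find_bad_ref_keys(node, path="$"):
    """Return the JSON paths of every "#ref" key found in a parsed schema."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "#ref":
                hits.append(child)
            hits.extend(find_bad_ref_keys(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_bad_ref_keys(item, f"{path}[{index}]"))
    return hits


# Hypothetical fragment that mimics the reported typo.
schema = json.loads(
    '{"properties": {"status": {"#ref": "ontology_annotation_schema.json#"}}}'
)
print(find_bad_ref_keys(schema))  # ['$.properties.status.#ref']
```
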
isatab2CEDAR tests reference non-existent datasets in repository scope

tests/test_isatab2cedar.py, which tests the CEDAR converter, includes a test that uses MetaboLights datasets, but these are not present in the repo.

Consider removing the test (as it will always fail without the data) or implementing a way to automatically fetch and use some test data from MetaboLights or another repository.
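
A third option would be to skip the test whenever the data is absent, so that the suite stays green on a fresh checkout. A minimal sketch using the standard unittest skip decorator follows; the `METABOLIGHTS_DIR` path and the test body are assumptions, not the repository's actual layout or converter call.

```python
import os
import unittest

# Hypothetical location of locally downloaded MetaboLights studies.
METABOLIGHTS_DIR = os.path.join("tests", "data", "metabolights")


class MetabolightsConversionTest(unittest.TestCase):

    @unittest.skipUnless(
        os.path.isdir(METABOLIGHTS_DIR),
        "MetaboLights test data not available locally",
    )
    def test_convert_metabolights(self):
        # Placeholder assertion; the real test would invoke the CEDAR
        # converter on each study directory found here.
        self.assertTrue(os.listdir(METABOLIGHTS_DIR))
```

With this in place the test is reported as skipped rather than failed when the directory does not exist.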

ideally-canonical.json invalid JSON

ideally-canonical.json is not well-formed JSON, so it does not load in the Python JSON parser. We need to take care to run files through a validator before committing.
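
As an illustration, a pre-commit check can be as small as the sketch below, which relies only on the standard library json module. The `json_error` helper name is made up for this example.

```python
import json


def json_error(text):
    """Return None if text parses as JSON, otherwise a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


print(json_error('{"identifier": "BII-I-1"}'))   # None
print(json_error('{"identifier": "BII-I-1",}'))  # an error message (trailing comma)
```

Running `python -m json.tool <file>` from the command line performs the same well-formedness check on a single file.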

Representation of experimental data extracted from literature

Representing experimental data extracted from published literature is a very common use case, but it is not straightforward to fit into the ISA model.

The samples, protocols and assays will differ across papers. Since the goal of the data-gathering exercise is to combine data from different papers, one investigation per paper is not necessarily the right approach. If everything is combined into a single investigation, I am not aware of a means to assign publications to particular assays or samples.

What would be the recommended approach?

JSON files in ISA archive

Will ISA with JSON serialisation use the same approach of separate i_*, s_* and a_* files as ISA-TAB, or is a single JSON file for everything envisaged?

Sorry if this is already addressed in the documents; I could not find a reference.

Unpopulated objects appear in generated JSON

When generating JSON using isatab2json for BII-I-1, there are several instances where objects appear but have no values in their properties. For example, under protocols, a few components and parameters lists contain a single component or parameter object with every property set to an empty string.

The same happens for every process listed under processSequence in the assays list.
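
Until the converter itself is fixed, one possible workaround is to prune such placeholders after conversion. The sketch below is an assumption about how that could look, not the converter's behaviour; the property names in the sample are illustrative only, and whether empty values should be dropped or kept is a design decision for the schema authors.

```python
EMPTY_VALUES = ("", None, {}, [])


def prune_empty(node):
    """Recursively drop empty strings, None, and containers that end up empty."""
    if isinstance(node, dict):
        cleaned = {key: prune_empty(value) for key, value in node.items()}
        return {k: v for k, v in cleaned.items() if v not in EMPTY_VALUES}
    if isinstance(node, list):
        cleaned = [prune_empty(item) for item in node]
        return [item for item in cleaned if item not in EMPTY_VALUES]
    return node


# Illustrative protocol object with unpopulated component and parameter entries.
protocol = {
    "name": "extraction",
    "parameters": [{"parameterName": "", "unit": ""}],
    "components": [{"componentName": ""}],
}
print(prune_empty(protocol))  # {'name': 'extraction'}
```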

ISA v1 JSON examples

In addition to the schema, I am looking for ISA-JSON examples (both v1 and v2).

We've tried https://github.com/ISA-tools/isa-rest-api locally and it converts some of our ISA-TAB archives to JSON v1, while others fail. I am wondering how complete the converter is at this moment. For example, I can't find the input/output chain in the JSON, either in our tests or in this BII-I-1 JSON.

The reason I am looking for examples is that it is not clear to me from the schema how the chains will be represented. The input/output nodes of the process chain refer to the same schemas (material, data) that describe sources and samples. Are these going to be repeated in the input/output chain, or referred to by some identifier?

I would guess that the JSON v1 schema closely follows the Java object model used before, but in the Java model the references simply point to the same objects, and I am not sure how this will look in JSON.

Characteristic category and Factor Value factor names missing in JSON schemas and tab2json conversion

For each Characteristic or Factor Value, there is an additional typing that is not currently present in the schemas.

Example in BII-I-1 tabular:

Characteristics[organism] column is present, but 'organism' part is lost in conversion.
Factor Value[limiting nutrient] column is present, but 'limiting nutrient' part is lost in conversion.

I've already done some work in the isatab package loader to parse these into model objects, so it will probably be straightforward to modify the JSON schemas and tab2json converter to fix this.
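
For illustration, the bracketed part of such a column header can be recovered with a small regular expression. This is a sketch only; `split_header` is a hypothetical helper, not the isatab package loader's API.

```python
import re

# Matches headers such as "Characteristics[organism]" or
# "Factor Value[limiting nutrient]".
HEADER_PATTERN = re.compile(r"^(Characteristics|Factor Value)\s*\[(.+)\]$")


def split_header(header):
    """Return (column kind, category or factor name), or None if not qualified."""
    match = HEADER_PATTERN.match(header.strip())
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


print(split_header("Characteristics[organism]"))        # ('Characteristics', 'organism')
print(split_header("Factor Value[limiting nutrient]"))  # ('Factor Value', 'limiting nutrient')
print(split_header("Sample Name"))                      # None
```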

Check ISA v1 JSON schemas

This may already have been completed as part of the JSON schema validation work. We need to validate the schemas against converted ISA-Tab files.

Tab to JSON converter fails on parsing Affiliations

test_isa_v1_parser.py on feat/reader branch throws up an exception:

Error
Traceback (most recent call last):
  File "/Users/dj/PycharmProjects/isa-api/tests/test_isa_v1_parser.py", line 166, in test_jsonToIsatab_writer
    mywriter.parsingJson(self._json_dir, output_dir)
  File "/Users/dj/PycharmProjects/isa-api/isatools/io/json_to_isatab.py", line 78, in parsingJson
    self.writeJsonInvestigationToIsatab(i_filenames, output_dir)
  File "/Users/dj/PycharmProjects/isa-api/isatools/io/json_to_isatab.py", line 96, in writeJsonInvestigationToIsatab
    investigation_str = self.processInvestigationJson(jsonData)
  File "/Users/dj/PycharmProjects/isa-api/isatools/io/json_to_isatab.py", line 129, in processInvestigationJson
    my_str = self.writeSectionInvestigation(my_str, "INVESTIGATION CONTACTS", jsonData["investigation"]["investigationContacts"], self._isatab_i_investigation_contacts_sec)
  File "/Users/dj/PycharmProjects/isa-api/isatools/io/json_to_isatab.py", line 159, in writeSectionInvestigation
    my_str = my_str + "\"" + b[self.commonFunctions.makeAttributeName(i)] + "\"" + "\t"
KeyError: 'investigationPersonAffiliation'

This seems to be caused by the json_to_isatab.py JsonToIsatabWriter() class converter being unable to handle looking up investigationPersonAffiliation in ISA-JSON.

Incompatibilities with Python 3

JsonToIsatabWriter does not work under Python 3, which removed some of the string processing functions it uses, in particular in the common functions package.

There may be more cases of this throughout the codebase, and more stringent testing is needed to find them. Note that the changes from Python 2 to 3 (and even between 3.x versions) are not just syntactic; some library functions have been removed entirely.

needed minor fixes in isa_model_version_1_0_schemas

To generate Java classes automatically from the ISA v1.0 schemas, we used the jsonschema2pojo CLI application. The generation failed (several exceptions were thrown). The three problems appear to be due to the following small typos/errors:

(1) in material_schema.json

  "items" : {
    "$ref": "material_attribute_schema.json#characteristic"
  }

changed to:

  "items" : {
    "$ref": "material_attribute_schema.json"
  }

(2) in process_schema.json

  "executesProtocol": {
    "$ref": "protocol.json#"
  },

changed to:

  "executesProtocol": {
    "$ref": "protocol_schema.json#"
  },

(3) in sample_schema.json

  "items" : {
    "$ref" : "factor_value.json#"
  }

changed to:

  "items" : {
    "$ref" : "factor_value_schema.json#"
  }

Problems (2) and (3) are caused by a missing "_schema" in the "$ref" values.

I am not sure about case (1).
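
Broken file names like those in (2) and (3) could be detected mechanically. The sketch below assumes that all schemas live in one flat directory and that every "$ref" is a relative file name with an optional "#fragment"; the helper names are hypothetical. It only checks that the target file exists, so it would not flag a fragment problem such as case (1).

```python
import json
import os
import tempfile


def iter_refs(node):
    """Yield every string "$ref" value found anywhere in a parsed schema."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def missing_ref_targets(schema_dir):
    """Return (schema file, $ref) pairs whose target file is not in schema_dir."""
    problems = []
    for name in sorted(os.listdir(schema_dir)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(schema_dir, name), encoding="utf-8") as handle:
            schema = json.load(handle)
        for ref in iter_refs(schema):
            target = ref.split("#", 1)[0]  # drop the fragment, keep the file name
            if target and not os.path.isfile(os.path.join(schema_dir, target)):
                problems.append((name, ref))
    return problems


# Demonstration on a throwaway directory that reproduces problem (2).
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "protocol_schema.json"), "w", encoding="utf-8") as fh:
        json.dump({"type": "object"}, fh)
    with open(os.path.join(tmp, "process_schema.json"), "w", encoding="utf-8") as fh:
        json.dump({"properties": {"executesProtocol": {"$ref": "protocol.json#"}}}, fh)
    demo_result = missing_ref_targets(tmp)

print(demo_result)  # [('process_schema.json', 'protocol.json#')]
```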

derives relation between material?

A suggestion from the COPO project, as an alternative to relying on Protocol Applications:
the COPO folks were asking about a way to link sources and samples (both ways I suppose, but maybe mainly from samples back to sources).

The current model allows this only through a process (which can be implicit in the ISA-tab files).

Do we want to allow more direct links? I think this would make the model less consistent but we can consider that...
