Hi @ptth222 thanks for the detailed report.
The behaviour you observed comes from the fact that, in the ISA-Tab format, the s_ (study) file can have only one Source Name node (the graph must start with a Source node) and one Sample Name node.
Sample Name nodes are, however, allowed in the a_ (assay) files to represent aliquots and fractions of a sample.
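To illustrate the shape being described, here is a minimal sketch with hypothetical column headers (these example lists are illustrative only, not taken from any real ISA-Tab file): the study file carries a single Source Name and a single Sample Name, while an assay file may repeat Sample Name to represent aliquots.

```python
# Hypothetical ISA-Tab headers illustrating the structure described above.
# These column lists are illustrative, not from a real dataset.
s_file_header = ["Source Name", "Protocol REF", "Sample Name"]
a_file_header = ["Sample Name", "Protocol REF", "Sample Name",  # aliquot of the sample
                 "Protocol REF", "Extract Name", "Raw Data File"]

# Per the description above: the s_ file has exactly one Sample Name column...
print(s_file_header.count("Sample Name"))  # 1
# ...while the a_ file may repeat Sample Name for aliquoting.
print(a_file_header.count("Sample Name"))  # 2
```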
Can you open a PR so we can review and acknowledge your contribution to the code base?
many thanks and hi to Hunter.
from isa-api.
I assume the PR you are talking about relates to issue #501; I have created a PR for that issue.
For this issue, I am not convinced that the study file can only have one Sample Name node while assay files can have multiple. If anything, it seems to be the opposite. At the very least, the documentation and code are at odds with what you have said, and they should be reconciled.
First, I will reiterate what the documentation says:
The last sentence here https://isa-specs.readthedocs.io/en/latest/isatab.html#study-table-file suggests to me that it should be possible:
"Node properties, such as Characteristics (for Material nodes), Parameter Value (for Process nodes) and additional Name columns for special cases of Process node to disambiguate Protocol REF entries of MUST follow the named node of context."
This is specifically saying there can be additional Name columns in the study file.
Secondly, as I said previously, you can generate a study file with multiple Sample Name columns using the JSON-to-Tab converter. If you look at part of the code in the write_study_table_files function, you can see that it specifically counts Sample Name nodes:
sample_in_path_count = 0
protocol_in_path_count = 0
longest_path = _longest_path_and_attrs(paths, s_graph.indexes)
for node_index in longest_path:
node = s_graph.indexes[node_index]
if isinstance(node, Source):
olabel = "Source Name"
columns.append(olabel)
columns += flatten(
map(lambda x: get_characteristic_columns(olabel, x),
node.characteristics))
columns += flatten(
map(lambda x: get_comment_column(
olabel, x), node.comments))
elif isinstance(node, Process):
olabel = "Protocol REF.{}".format(protocol_in_path_count)
columns.append(olabel)
protocol_in_path_count += 1
if node.executes_protocol.name not in protnames.keys():
protnames[node.executes_protocol.name] = protrefcount
protrefcount += 1
columns += flatten(map(lambda x: get_pv_columns(olabel, x),
node.parameter_values))
if node.date is not None:
columns.append(olabel + ".Date")
if node.performer is not None:
columns.append(olabel + ".Performer")
columns += flatten(
map(lambda x: get_comment_column(
olabel, x), node.comments))
elif isinstance(node, Sample):
olabel = "Sample Name.{}".format(sample_in_path_count)
columns.append(olabel)
sample_in_path_count += 1
columns += flatten(
map(lambda x: get_characteristic_columns(olabel, x),
node.characteristics))
columns += flatten(
map(lambda x: get_comment_column(
olabel, x), node.comments))
columns += flatten(map(lambda x: get_fv_columns(olabel, x),
node.factor_values))
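The counting behaviour above can be distilled into a few lines (a simplified sketch, not the actual isatools code; node objects are stood in for by plain strings and only the counter logic is kept):

```python
# Simplified sketch of the column-labelling loop in write_study_table_files.
def study_column_labels(path):
    labels = []
    sample_in_path_count = 0
    protocol_in_path_count = 0
    for node in path:
        if node == "Source":
            labels.append("Source Name")
        elif node == "Process":
            labels.append("Protocol REF.{}".format(protocol_in_path_count))
            protocol_in_path_count += 1
        elif node == "Sample":
            # Each Sample node in the path gets a distinct, numbered label,
            # so a second Sample Name column is written without complaint.
            labels.append("Sample Name.{}".format(sample_in_path_count))
            sample_in_path_count += 1
    return labels

print(study_column_labels(["Source", "Process", "Sample", "Process", "Sample"]))
# ['Source Name', 'Protocol REF.0', 'Sample Name.0', 'Protocol REF.1', 'Sample Name.1']
```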
The write_assay_table_files function, however, does not count Sample Name nodes. You can see that at one point it did, but that code has been commented out:
for study_obj in inv_obj.studies:
for assay_obj in study_obj.assays:
a_graph = assay_obj.graph
if a_graph is None:
break
protrefcount = 0
protnames = dict()
def flatten(current_list):
return [item for sublist in current_list for item in sublist]
columns = []
# start_nodes, end_nodes = _get_start_end_nodes(a_graph)
paths = _all_end_to_end_paths(
a_graph, [x for x in a_graph.nodes()
if isinstance(a_graph.indexes[x], Sample)])
if len(paths) == 0:
log.info("No paths found, skipping writing assay file")
continue
if _longest_path_and_attrs(paths, a_graph.indexes) is None:
raise IOError(
"Could not find any valid end-to-end paths in assay graph")
for node_index in _longest_path_and_attrs(paths, a_graph.indexes):
node = a_graph.indexes[node_index]
if isinstance(node, Sample):
olabel = "Sample Name"
# olabel = "Sample Name.{}".format(sample_in_path_count)
# sample_in_path_count += 1
columns.append(olabel)
columns += flatten(
map(lambda x: get_comment_column(olabel, x),
node.comments))
if write_factor_values:
columns += flatten(
map(lambda x: get_fv_columns(olabel, x),
node.factor_values))
elif isinstance(node, Process):
olabel = "Protocol REF.{}".format(
node.executes_protocol.name)
columns.append(olabel)
if node.executes_protocol.name not in protnames.keys():
protnames[node.executes_protocol.name] = protrefcount
protrefcount += 1
if node.date is not None:
columns.append(olabel + ".Date")
if node.performer is not None:
columns.append(olabel + ".Performer")
columns += flatten(map(lambda x: get_pv_columns(olabel, x),
node.parameter_values))
if node.executes_protocol.protocol_type:
oname_label = get_column_header(
node.executes_protocol.protocol_type.term,
protocol_types_dict
)
if oname_label is not None:
columns.append(oname_label)
elif node.executes_protocol.protocol_type.term.lower() \
in protocol_types_dict["nucleic acid hybridization"][SYNONYMS]:
columns.extend(
["Hybridization Assay Name",
"Array Design REF"])
columns += flatten(
map(lambda x: get_comment_column(olabel, x),
node.comments))
for output in [x for x in node.outputs if
isinstance(x, DataFile)]:
columns.append(output.label)
columns += flatten(
map(lambda x: get_comment_column(output.label, x),
output.comments))
elif isinstance(node, Material):
olabel = node.type
columns.append(olabel)
columns += flatten(
map(lambda x: get_characteristic_columns(olabel, x),
node.characteristics))
columns += flatten(
map(lambda x: get_comment_column(olabel, x),
node.comments))
elif isinstance(node, DataFile):
pass # handled in process
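The consequence of the commented-out counter can be sketched the same way: every Sample node in an assay path gets the identical label, so anything downstream that is keyed by column label can only hold one of them (again a simplified sketch, not the actual isatools code):

```python
# Simplified sketch of the assay labelling: the Sample counter is disabled,
# so repeated Sample nodes all produce the same column label.
def assay_column_labels(path):
    labels = []
    for node in path:
        if node == "Sample":
            labels.append("Sample Name")  # counter suffix is commented out upstream
        elif node == "Process":
            labels.append("Protocol REF")  # simplified; the real code suffixes the protocol name
        elif node == "Extract":
            labels.append("Extract Name")
    return labels

labels = assay_column_labels(["Sample", "Process", "Sample", "Process", "Extract"])
print(labels)
# ['Sample Name', 'Protocol REF', 'Sample Name', 'Protocol REF', 'Extract Name']
print(len(set(labels)) < len(labels))  # True: duplicate labels collide
```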
I also modified an example to have multiple Sample Name columns in an assay file, and it does not get converted to JSON correctly. No error is raised, but the samples from the second Sample Name column cannot be found anywhere in the JSON.
It should also be noted that multiple Sample Name columns in the study or assay files do not produce any validation error or warning.
I hope I have demonstrated that both the documentation and the code contradict what you said about Sample Name columns. If you are confident about how multiple Sample Name columns are supposed to work, then the documentation and the code, both validation and conversion, should be changed to reflect that. If what I have shown is correct, however, then the ProcessSequenceFactory code needs to be changed to look for more than one Sample Name column. It may need to change regardless, because it builds one set of samples from the study file and uses that as ground truth for the assays, so any new samples introduced in an assay (via additional Sample Name columns) will not be found. Either way, the code and its behaviour do not agree with what you have said, and this needs to be reconciled. I don't mind attempting the code changes myself, but I need to be sure how things are supposed to function.
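For the ProcessSequenceFactory side, a fix along these lines would collect samples from every Sample Name column rather than only the first. The sketch below runs over a table parsed into (header, values) pairs; all of the names here are hypothetical and none come from the real isatools code.

```python
# Hypothetical sketch: gather sample identifiers from every "Sample Name"
# column of a parsed ISA-Tab table, not just the first one.
def collect_samples(columns):
    """columns: list of (header, values) pairs, in file order."""
    samples = []
    for header, values in columns:
        if header == "Sample Name":
            for value in values:
                if value and value not in samples:
                    samples.append(value)
    return samples

table = [
    ("Sample Name", ["sample-1", "sample-2"]),
    ("Protocol REF", ["aliquoting", "aliquoting"]),
    ("Sample Name", ["sample-1.1", "sample-2.1"]),  # aliquots: new samples
]
print(collect_samples(table))
# ['sample-1', 'sample-2', 'sample-1.1', 'sample-2.1']
```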
I worked on another project for a while, but I am now moving back to the one that involves this package. I really need this resolved so that I can move forward. Please consider this a gentle reminder; if a meeting would be better, I would be happy to meet.