usegalaxy-eu / ena-upload-cli
ENA upload tool - script your Open Data upload to the European Nucleotide Archive
License: MIT License
Because of a timestamp in the filename, some files contain a colon (:) character, which Windows does not accept in file names. This prevents installing and using the script on Windows.
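A minimal sketch, assuming the timestamp is generated in Python: format it without colons (ISO 8601 basic style) so the resulting names are also valid on Windows. The helper names and the file name pattern are hypothetical.

```python
from datetime import datetime

# Characters that Windows does not allow in file names.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')


def windows_safe_timestamp(moment: datetime) -> str:
    """Format a timestamp without colons, e.g. 20230715T142530."""
    return moment.strftime("%Y%m%dT%H%M%S")


def is_windows_safe(filename: str) -> bool:
    """Return True if the name contains none of the characters Windows rejects."""
    return not any(ch in WINDOWS_FORBIDDEN for ch in filename)


if __name__ == "__main__":
    stamp = windows_safe_timestamp(datetime(2023, 7, 15, 14, 25, 30))
    name = f"ENA_submission_{stamp}.xml"  # hypothetical naming scheme
    print(name, is_windows_safe(name))
```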
Need to check whether XML templates and XSDs exist for other sample types.
Hi all,
It seems that the upload tool is not compatible with the latest pandas version (2.0.3). If I run it with that version (using the example data and metadata from the repository), I get the following:
Traceback (most recent call last):
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/eval.py", line 301, in lookup_attr
val = getattr(obj, key)
^^^^^^^^^^^^^^^^^
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/pandas/core/generic.py", line 5989, in __getattr__
return object.__getattribute__(self, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Series' object has no attribute 'iteritems'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3653, in get_loc
return self._engine.get_loc(casted_key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'iteritems'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/eval.py", line 307, in lookup_attr
val = obj[key]
~~~^^^^^
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/pandas/core/series.py", line 1007, in __getitem__
return self._get_value(key)
^^^^^^^^^^^^^^^^^^^^
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/pandas/core/series.py", line 1116, in _get_value
loc = self.index.get_loc(label)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3655, in get_loc
raise KeyError(key) from err
KeyError: 'iteritems'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/bin/ena-upload-cli", line 10, in <module>
sys.exit(main())
^^^^^^
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/ena_upload/ena_upload.py", line 953, in main
schema_xmls = run_construct(
^^^^^^^^^^^^^^
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/ena_upload/ena_upload.py", line 298, in run_construct
schema_xmls[schema] = construct_xml(schema, stream, xsds[schema])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/ena_upload/ena_upload.py", line 235, in construct_xml
xml_string = stream.render(method='xml', encoding='utf-8')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/core.py", line 184, in render
return encode(generator, method=method, encoding=encoding, out=out)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/output.py", line 59, in encode
return _encode(''.join(list(iterator)))
^^^^^^^^^^^^^^
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/output.py", line 243, in __call__
for kind, data, pos in stream:
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/output.py", line 674, in __call__
for kind, data, pos in stream:
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/output.py", line 779, in __call__
for kind, data, pos in chain(stream, [(None, None, None)]):
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/output.py", line 598, in __call__
for ev in stream:
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/core.py", line 292, in _ensure
for event in stream:
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/base.py", line 641, in _include
for event in stream:
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/markup.py", line 326, in _match
for event in stream:
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/base.py", line 581, in _flatten
for kind, data, pos in stream:
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/directives.py", line 369, in __call__
iterable = _eval_expr(self.expr, ctxt, vars)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/base.py", line 290, in _eval_expr
retval = expr.evaluate(ctxt)
^^^^^^^^^^^^^^^^^^^
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/eval.py", line 160, in evaluate
return eval(self.code, _globals, {'__data__': data})
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/ena_upload/templates/ENA_template_runs.xml", line 14, in <Expression 'iter(run_groups.iteritems())'>
<py:for each="alias, experiment_alias in run_groups.iteritems()">
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/eval.py", line 309, in lookup_attr
val = cls.undefined(key, owner=obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/eval.py", line 397, in undefined
raise UndefinedError(key, owner=owner)
genshi.template.eval.UndefinedError: alias
run_alias_1a [experiment_alias_7a]
run_alias_3c [experiment_alias_9c]
Name: experiment_alias, dtype: object has no member named "iteritems"
I downgraded to pandas 1.5.3 and now it seems to work fine. I used
ena-upload-cli --action add --center 'CRG' --study ena_templates/example_tables/ENA_template_studies.tsv --sample ena_templates/example_tables/ENA_template_samples.tsv --experiment ena_templates/example_tables/ENA_template_experiments.tsv --run ena_templates/example_tables/ENA_template_runs.tsv --data ena_templates/example_data/*gz --dev --secret ena_templates/.secret.yml --draft --no_data_upload
for running the tool.
Thanks,
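Pandas 2.0 removed `Series.iteritems()`, which the expression in `ENA_template_runs.xml` (line 14 in the traceback) still calls. Instead of pinning pandas below 2.0, a likely fix is to call `.items()`, which yields the same `(index, value)` pairs and also exists in pandas 1.x; in the template the loop would read `run_groups.items()`. The Python sketch below illustrates the replacement; the helper name is hypothetical and this is not necessarily the fix the maintainers adopted.

```python
# pandas 2.0 removed Series.iteritems(); Series.items() yields the same
# (index, value) pairs and is also available in pandas 1.x.
try:
    import pandas as pd
except ImportError:  # keep the sketch importable without pandas
    pd = None


def iter_run_groups(run_groups):
    """Yield (run alias, experiment alias) pairs.

    Works for a pandas Series as well as a plain dict, because both
    provide .items(); .iteritems() no longer exists in pandas >= 2.0.
    """
    for alias, experiment_alias in run_groups.items():
        yield alias, experiment_alias


if __name__ == "__main__" and pd is not None:
    run_groups = pd.Series(
        {"run_alias_1a": "experiment_alias_7a", "run_alias_3c": "experiment_alias_9c"},
        name="experiment_alias",
    )
    for alias, experiment_alias in iter_run_groups(run_groups):
        print(alias, experiment_alias)
```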
Status, accession, submission_date and taxon ID. These columns can be added if not already present.
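A minimal sketch of adding these columns when they are missing, working on the TSV text with only the standard library. The exact column names (`status`, `accession`, `submission_date`, `taxon_id`) are assumptions here and should be matched to the headers of the template tables.

```python
import csv
import io

# Assumed column names; adjust to the headers used in the ENA template tables.
OPTIONAL_COLUMNS = ("status", "accession", "submission_date", "taxon_id")


def add_missing_columns(tsv_text: str, columns=OPTIONAL_COLUMNS) -> str:
    """Return the TSV with any missing columns appended as empty fields."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = list(reader.fieldnames or [])
    fieldnames += [c for c in columns if c not in fieldnames]
    out = io.StringIO()
    # restval="" fills the newly added columns with empty values.
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```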
Currently we only parse the attributes that are used by ENA. Of course, the XML can hold more information, and this is also stored on ENA's side (although not displayed).
Hello,
I'm attempting to submit non-viral raw reads to ENA using the test Webin submission portal. The tool appears to be functioning correctly, but it isn't creating a new project on the ENA portal. The files only become accessible the next day under the "Unsubmitted files" section. Is this normal? How can I successfully submit a project?
Let me know if you need more information.
Thank you.
Add support for tables that do not contain "add" or "modify" in the status column.
ENA supports the submission of other analysis spreadsheets/XMLs, following the analysis XSD format.
When running the example command, the following error is thrown:
ena-upload-cli --action add --center 'VIB-UGENT' --study example_tables/ENA_template_studies.tsv --sample example_tables/ENA_template_samples.tsv --experiment example_tables/ENA_template_experiments.tsv --run example_tables/ENA_template_runs.tsv --data example_data/*gz --dev --secret .secret.yml --no_upload
No files will be uploaded, remove `--no_upload' argument to perform upload.
No valid checksums found, generate now...
Traceback (most recent call last):
File "/home/bedro/.local/bin/ena-upload-cli", line 8, in <module>
sys.exit(main())
File "/home/bedro/.local/lib/python3.8/site-packages/ena_upload/ena_upload.py", line 733, in main
md5 = df['file_name'].apply(lambda x: file_md5[x]).values
File "/home/bedro/.local/lib/python3.8/site-packages/pandas/core/series.py", line 4138, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas/_libs/lib.pyx", line 2467, in pandas._libs.lib.map_infer
File "/home/bedro/.local/lib/python3.8/site-packages/ena_upload/ena_upload.py", line 733, in <lambda>
md5 = df['file_name'].apply(lambda x: file_md5[x]).values
KeyError: 'ENA_TEST2.R1.fastq.gz'
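The `KeyError` suggests that a `file_name` listed in the run table (`ENA_TEST2.R1.fastq.gz`) has no entry in the checksum dictionary built from the `--data` files, for example because the glob did not match it or the name differs. A sketch of a check that reports all such mismatches at once; the function names are hypothetical, and matching on the file's basename is an assumption.

```python
import hashlib
import os


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_checksums(data_paths) -> dict:
    """Map the basename of every --data file to its MD5 checksum."""
    return {os.path.basename(path): md5sum(path) for path in data_paths}


def checksums_for_run_table(file_names, file_md5: dict) -> list:
    """Look up the checksum for each file_name listed in the run table.

    Reports every unmatched name at once instead of failing with a bare
    KeyError on the first one.
    """
    missing = [name for name in file_names if name not in file_md5]
    if missing:
        raise ValueError(
            "Run table lists files that were not passed via --data: "
            + ", ".join(missing)
        )
    return [file_md5[name] for name in file_names]
```

Used as `checksums_for_run_table(df['file_name'], build_checksums(paths))`, the error would name every missing file up front.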
For the field file_format in the ENA run table, the possible values would be the ones listed in ENA_template_FILE.xml.
However, if any of the following values are used, an error message is generated.
<MESSAGES>
<ERROR>In run, alias:"run_1", accession:"", In filename:"1.bam", filetype:"454_native". Invalid file type "454_native".</ERROR>
<ERROR>In run, alias:"run_10", accession:"", In filename:"10.bam", filetype:"Illumina_native". Invalid file type "Illumina_native".</ERROR>
<ERROR>In run, alias:"run_11", accession:"", In filename:"11.bam", filetype:"Illumina_native_int". Invalid file type "Illumina_native_int".</ERROR>
<ERROR>In run, alias:"run_12", accession:"", In filename:"12.bam", filetype:"Illumina_native_prb". Invalid file type "Illumina_native_prb".</ERROR>
<ERROR>In run, alias:"run_13", accession:"", In filename:"13.bam", filetype:"Illumina_native_qseq". Invalid file type "Illumina_native_qseq".</ERROR>
<ERROR>In run, alias:"run_14", accession:"", In filename:"14.bam", filetype:"Illumina_native_scarf". Invalid file type "Illumina_native_scarf".</ERROR>
<ERROR>In run, alias:"run_15", accession:"", In filename:"15.bam", filetype:"Illumina_native_seq". Invalid file type "Illumina_native_seq".</ERROR>
<ERROR>In run, alias:"run_16", accession:"", In filename:"16.tar", filetype:"OxfordNanopore_native". Invalid file suffix for file "16.tar". File compression is required for file type "OxfordNanopore_native". Supported compression formats are: BZIP2, GZIP with file suffixes: .bz2, .gz.</ERROR>
<ERROR>In run, alias:"run_19", accession:"", In filename:"19.bam", filetype:"SOLiD_native". Invalid file type "SOLiD_native".</ERROR>
<ERROR>In run, alias:"run_2", accession:"", In filename:"2.bam", filetype:"454_native_qual". Invalid file type "454_native_qual".</ERROR>
<ERROR>In run, alias:"run_20", accession:"", In filename:"20.bam", filetype:"SOLiD_native_csfasta". Invalid file type "SOLiD_native_csfasta".</ERROR>
<ERROR>In run, alias:"run_21", accession:"", In filename:"21.bam", filetype:"SOLiD_native_qual". Invalid file type "SOLiD_native_qual".</ERROR>
<ERROR>In run, alias:"run_22", accession:"", In filename:"22.fastq", filetype:"sra". Invalid file type "sra".</ERROR>
<ERROR>In run, alias:"run_24", accession:"", In filename:"24.fastq", filetype:"tab". Invalid file type "tab".</ERROR>
<ERROR>In run, alias:"run_3", accession:"", In filename:"3.bam", filetype:"454_native_seq". Invalid file type "454_native_seq".</ERROR>
<ERROR>In run, alias:"run_7", accession:"", In filename:"7.fastq", filetype:"fasta". Invalid file type "fasta".</ERROR>
<ERROR>In run, alias:"run_9", accession:"", In filename:"9.bam", filetype:"Helicos_native". Invalid file type "Helicos_native".</ERROR>
<ERROR>In run, alias:"run_1", accession:"". Invalid group of files: 1 "454_native" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
<ERROR>In run, alias:"run_10", accession:"". Invalid group of files: 1 "Illumina_native" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
<ERROR>In run, alias:"run_11", accession:"". Invalid group of files: 1 "Illumina_native_int" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
<ERROR>In run, alias:"run_12", accession:"". Invalid group of files: 1 "Illumina_native_prb" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
<ERROR>In run, alias:"run_13", accession:"". Invalid group of files: 1 "Illumina_native_qseq" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
<ERROR>In run, alias:"run_14", accession:"". Invalid group of files: 1 "Illumina_native_scarf" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
<ERROR>In run, alias:"run_15", accession:"". Invalid group of files: 1 "Illumina_native_seq" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
<ERROR>In run, alias:"run_19", accession:"". Invalid group of files: 1 "SOLiD_native" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
<ERROR>In run, alias:"run_2", accession:"". Invalid group of files: 1 "454_native_qual" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
<ERROR>In run, alias:"run_20", accession:"". Invalid group of files: 1 "SOLiD_native_csfasta" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
<ERROR>In run, alias:"run_21", accession:"". Invalid group of files: 1 "SOLiD_native_qual" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
<ERROR>In run, alias:"run_22", accession:"". Invalid group of files: 1 "sra" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
<ERROR>In run, alias:"run_24", accession:"". Invalid group of files: 1 "tab" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
<ERROR>In run, alias:"run_3", accession:"". Invalid group of files: 1 "454_native_seq" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
<ERROR>In run, alias:"run_7", accession:"". Invalid group of files: 1 "fasta" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
<ERROR>In run, alias:"run_9", accession:"". Invalid group of files: 1 "Helicos_native" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
<INFO>This submission is a TEST submission and will be discarded within 24 hours</INFO>
</MESSAGES>
ENA might be using another list of file formats to validate. Their documentation at Accepted Read Data Formats either does not list the values leading to the error, or flags them as deprecated.
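A client-side pre-check could catch these errors before submission. The sketch below encodes only the file groupings and the compression rule quoted in the receipt above; whether that list is complete or stable on ENA's side is an assumption, and the function name is hypothetical.

```python
from collections import Counter

# Supported groupings as listed in the ENA validation messages above:
# file type -> (minimum number of files, maximum number or None for "at least").
SUPPORTED_GROUPS = {
    "CompleteGenomics_native": (1, None),
    "fastq": (1, None),
    "OxfordNanopore_native": (1, 1),
    "PacBio_HDF5": (1, None),
    "bam": (1, 1),
    "cram": (1, 1),
    "sff": (1, 1),
    "srf": (1, 1),
}
# Compression suffixes ENA requires for OxfordNanopore_native files.
COMPRESSED_SUFFIXES = (".bz2", ".gz")


def check_run_files(files):
    """Return a list of problems for one run, given (file_name, file_type) pairs."""
    if not files:
        return ["run has no files"]
    counts = Counter(file_type for _, file_type in files)
    # Every supported grouping contains a single file type.
    if len(counts) > 1:
        return [f"run mixes file types: {sorted(counts)}"]
    file_type, n = next(iter(counts.items()))
    if file_type not in SUPPORTED_GROUPS:
        return [f'invalid file type "{file_type}"']
    problems = []
    low, high = SUPPORTED_GROUPS[file_type]
    if n < low or (high is not None and n > high):
        problems.append(f'invalid group of files: {n} "{file_type}" files')
    if file_type == "OxfordNanopore_native":
        problems += [
            f'"{name}" must be compressed (.bz2 or .gz)'
            for name, _ in files
            if not name.endswith(COMPRESSED_SUFFIXES)
        ]
    return problems
```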
I don’t think that the success attribute of the XML receipt can be used as an indicator of a successful submission. You would still need to parse the body of the receipt for errors and for successfully allocated accession numbers; see, for example, the corresponding implementation in ENA's webin-cli.
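A sketch of a stricter check with Python's standard `xml.etree.ElementTree`. The receipt layout assumed here (a `RECEIPT` root with a `success` attribute, object elements such as `SAMPLE` or `RUN` carrying `alias`/`accession` attributes, and `ERROR` elements under `MESSAGES`) is based on the messages quoted above and on typical Webin receipts; it is not a reimplementation of webin-cli's logic.

```python
import xml.etree.ElementTree as ET


def parse_receipt(receipt_xml: str):
    """Summarise an ENA receipt: overall verdict, error messages and accessions.

    Assumes a <RECEIPT success="..."> root whose direct children (e.g. SAMPLE,
    EXPERIMENT, RUN, SUBMISSION) carry alias/accession attributes and whose
    <MESSAGES> block holds <ERROR> elements.
    """
    root = ET.fromstring(receipt_xml)
    success_flag = root.get("success", "").lower() == "true"
    errors = [e.text or "" for e in root.iter("ERROR")]
    accessions = {
        (child.tag, child.get("alias")): child.get("accession")
        for child in root
        if child.get("accession")  # skips empty accession="" attributes
    }
    # Only trust the submission if the flag, the messages and the
    # allocated accessions all agree.
    ok = success_flag and not errors and bool(accessions)
    return ok, errors, accessions
```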
Currently, the release date can only be changed manually through the website.
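If the tool were to support this, one option would be a submission XML with a HOLD action. The sketch below assumes the `HOLD` element with `target` and `HoldUntilDate` attributes from the SRA/ENA submission schema; check it against the current Webin documentation before relying on it. The function name is hypothetical and the accession in the usage line is a placeholder.

```python
import xml.etree.ElementTree as ET
from datetime import date


def hold_submission_xml(target_accession: str, hold_until: date) -> str:
    """Build a submission XML asking ENA to change the release date.

    Assumes the HOLD action with target and HoldUntilDate attributes from
    the SRA submission schema.
    """
    submission = ET.Element("SUBMISSION")
    actions = ET.SubElement(submission, "ACTIONS")
    action = ET.SubElement(actions, "ACTION")
    ET.SubElement(action, "HOLD", target=target_accession,
                  HoldUntilDate=hold_until.isoformat())
    return ET.tostring(submission, encoding="unicode")
```

For example, `hold_submission_xml("PRJEB00000", date(2030, 1, 1))` produces a `SUBMISSION` document whose `HOLD` element carries `HoldUntilDate="2030-01-01"`.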
Since v3.0.1, ENA Webin uses FTPS.
Plain FTP does not secure the connection between client and server, and account credentials are sent as plain text. Depending on how the traffic is routed, this might not be an issue, but it is not good practice for general use cases.
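A sketch of an upload over explicit FTPS with Python's standard `ftplib.FTP_TLS`: `login()` upgrades the control connection to TLS before sending the credentials, and `prot_p()` also encrypts the data connection. The function name and the injectable `ftp_factory` parameter (handy for testing) are illustrative; the Webin host is passed in by the caller.

```python
import os
from ftplib import FTP_TLS


def upload_via_ftps(path, user, password, host, ftp_factory=FTP_TLS):
    """Upload one file over explicit FTPS so credentials and data are encrypted."""
    ftps = ftp_factory(host)
    try:
        # FTP_TLS.login() issues AUTH TLS first, securing the control channel.
        ftps.login(user=user, passwd=password)
        ftps.prot_p()  # switch the data channel to TLS as well
        with open(path, "rb") as handle:
            return ftps.storbinary(f"STOR {os.path.basename(path)}", handle)
    finally:
        ftps.quit()
```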
The tool would benefit from different verbose modes and real logging.
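A sketch of verbosity levels with the standard `logging` and `argparse` modules. The `-v`/`-q` flags and the function name are hypothetical, not existing ena-upload-cli options; repeated `-v` flags raise the level from WARNING to INFO to DEBUG.

```python
import argparse
import logging

# Number of -v flags -> logging level (hypothetical CLI design).
LEVELS = [logging.WARNING, logging.INFO, logging.DEBUG]


def configure_logging(argv=None) -> int:
    """Parse -v/-q flags, configure the root logger and return the chosen level."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="increase verbosity (-v: info, -vv: debug)")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="only report errors")
    args, _ = parser.parse_known_args(argv)
    if args.quiet:
        level = logging.ERROR
    else:
        level = LEVELS[min(args.verbose, len(LEVELS) - 1)]
    # force=True (Python 3.8+) replaces any handlers configured earlier.
    logging.basicConfig(level=level, format="%(levelname)s: %(message)s", force=True)
    return level
```

Messages that are currently printed could then go through `logging.getLogger(__name__).info(...)` and similar calls.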
Checklists from https://www.ebi.ac.uk/ena/browser/checklists
The contents of Library Name (experiment metadata) appear to be ignored; the value is instead generated automatically by concatenating "library", "Experiment Alias" and "Sample Alias".
This tricks the validation into passing even though no library name is present.
The controlled vocabulary of the sample checklists is now checked on ENA's side; this could also be done on our side.
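A hedged sketch of such a client-side check: collect the allowed values of each field from a checklist XML and flag sample values outside them. The checklist layout assumed here (`FIELD` elements with a `NAME` and `FIELD_TYPE/TEXT_CHOICE_FIELD/TEXT_VALUE/VALUE` entries) should be verified against a checklist downloaded from the ENA checklist browser; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def controlled_vocabularies(checklist_xml: str) -> dict:
    """Collect the allowed values per field from an ENA checklist XML.

    Assumes fields look like <FIELD><NAME>..</NAME><FIELD_TYPE>
    <TEXT_CHOICE_FIELD><TEXT_VALUE><VALUE>..</VALUE>...
    """
    root = ET.fromstring(checklist_xml)
    vocab = {}
    for field in root.iter("FIELD"):
        name = field.findtext("NAME")
        values = [v.text for v in field.iterfind(
            "FIELD_TYPE/TEXT_CHOICE_FIELD/TEXT_VALUE/VALUE")]
        if name and values:  # only fields with a controlled vocabulary
            vocab[name] = set(values)
    return vocab


def vocabulary_errors(row: dict, vocab: dict) -> list:
    """Return messages for non-empty sample values outside the vocabulary."""
    return [
        f'{field}: "{value}" is not an allowed value'
        for field, value in row.items()
        if field in vocab and value not in ("", None) and value not in vocab[field]
    ]
```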
Hi,
we're currently preparing a submission for the (now discontinued) linked-read sequencing from 10X Genomics. Similar to the current 10X single-cell RNA-Seq datasets, our datasets consist of two paired-end FASTQ files plus an index file. The data is whole-genome sequencing, so I suppose ENA is the correct place to submit them.
However, for each set of RUNs that includes an index file, ena-upload-cli returns the following error. Is this something I can mitigate on my end, or is it an issue with the tool or even with ENA?
Error:
Oops:
In run, alias: "<...>". Read type information missing in run.
Thanks and best,
Fritjof
Hi there,
I am unable to get the following command to run on my Ubuntu VM. The tool was installed with pip (pip install ena-upload-cli). My Ubuntu VM already has FTP port 21 open by default. Any thoughts?
ena-upload-cli --action add --center 'BioCommons Australia' --study ENA_template_studies.tsv --sample ENA_template_samples.tsv --experiment ENA_template_experiments.tsv --run ENA_template_runs.tsv --data *gz -d --secret .secret.yml
Check if all required columns are present in the study table.
Check if all required columns are present in the sample table.
Check if all required columns are present in the experiment table.
Check if all required columns are present in the run table.
No valid checksums found, generate now... done.
Connecting to ftp.webin2.ebi.ac.uk....
uploading /home/ubuntu/ena/ENA_TEST1.R1.fastq.gz
ERROR: The read operation timed out
ERROR: If your connection times out at this stage, it propably is because of a firewall that is in place. FTP is used in passive mode and connection will be opened to one of the ports: 40000 and 50000.
Traceback (most recent call last):
File "/home/ubuntu/.local/bin/ena-upload-cli", line 11, in
load_entry_point('ena-upload-cli==0.6.1', 'console_scripts', 'ena-upload-cli')()
File "/home/ubuntu/.local/lib/python3.8/site-packages/ena_upload/ena_upload.py", line 925, in main
submit_data(file_paths, password, webin_id)
File "/home/ubuntu/.local/lib/python3.8/site-packages/ena_upload/ena_upload.py", line 424, in submit_data
print(ftps.storbinary(f'STOR {filename}', open(path, 'rb')))
File "/usr/lib/python3.8/ftplib.py", line 504, in storbinary
conn.unwrap()
File "/usr/lib/python3.8/ssl.py", line 1285, in unwrap
s = self._sslobj.shutdown()
socket.timeout: The read operation timed out
Many thanks,
Thanks for the nice helper. However, I noticed that ena-upload-cli currently supports only the mandatory ERC000033 fields in its XML templates. For myself, I came up with a little hack with updated XML forms (https://github.com/avilab/ena-upload-cli/tree/location) to get some additional metadata uploaded, e.g. age and geographic location (locality plus lon/lat). I appreciate that under the current implementation there is no simple fix to include any of the optional checklist fields, given that the ENA database may not accept empty fields (not sure).
One way to fix this, very briefly, would be to check imported tables against schemas and serialise dictionaries to XML, so that any combination of non-mandatory fields can be included.
I am happy to be corrected and pointed in the right direction if there is a way to include optional/recommended (virus) metadata fields using this app as it is.
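The serialisation idea can be sketched with `xml.etree.ElementTree`: every non-empty column of a sample row becomes a `SAMPLE_ATTRIBUTE` with `TAG`, `VALUE` and optional `UNITS` children (the element names of the SRA sample schema), and nothing is emitted when no field is filled, since an empty `SAMPLE_ATTRIBUTES` element fails XSD validation. Checking the tag names against the checklist is left out; the function names are hypothetical.

```python
import math
import xml.etree.ElementTree as ET


def _is_empty(value) -> bool:
    """Treat None, blank strings and NaN (pandas' blank cells) as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return str(value).strip() == ""


def sample_attributes_xml(row: dict, units=None):
    """Serialise the non-empty checklist fields of a row to <SAMPLE_ATTRIBUTES>.

    Returns None when no field has a value, so no empty
    <SAMPLE_ATTRIBUTES> element is produced.
    """
    units = units or {}
    filled = {k: v for k, v in row.items() if not _is_empty(v)}
    if not filled:
        return None
    attributes = ET.Element("SAMPLE_ATTRIBUTES")
    for tag, value in filled.items():
        attribute = ET.SubElement(attributes, "SAMPLE_ATTRIBUTE")
        ET.SubElement(attribute, "TAG").text = str(tag)
        ET.SubElement(attribute, "VALUE").text = str(value)
        if tag in units:
            ET.SubElement(attribute, "UNITS").text = units[tag]
    return ET.tostring(attributes, encoding="unicode")
```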
ena_upload --action add --center 'your_center_name' --study example_tables/ENA_template_studies.tsv --sample example_tables/ENA_template_samples.tsv --experiment example_tables/ENA_template_experiments.tsv --run example_tables/ENA_template_runs.tsv --data example_data/*gz --dev --secret .secret.yml
This throws the error:
Traceback (most recent call last):
File "c:\Python39\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\Python39\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\rabuo\ena-upload-cli\Scripts\ena-upload-cli.exe\__main__.py", line 7, in <module>
File "c:\users\rabuo\ena-upload-cli\lib\site-packages\ena_upload\ena_upload.py", line 771, in main
schema_xmls = run_construct(
File "c:\users\rabuo\ena-upload-cli\lib\site-packages\ena_upload\ena_upload.py", line 243, in run_construct
schema_xmls[schema] = construct_xml(schema, stream, xsds[schema])
File "c:\users\rabuo\ena-upload-cli\lib\site-packages\ena_upload\ena_upload.py", line 182, in construct_xml
validate_xml(xsd, xml_string)
File "c:\users\rabuo\ena-upload-cli\lib\site-packages\ena_upload\ena_upload.py", line 133, in validate_xml
return xmlschema.assertValid(doc)
File "src\lxml\etree.pyx", line 3623, in lxml.etree._Validator.assertValid
lxml.etree.DocumentInvalid: Element 'SAMPLE_ATTRIBUTES': Missing child element(s). Expected is ( SAMPLE_ATTRIBUTE )., line 10
This is because the default checklist ERC000011 has only optional fields: if no value is given for any of them, the template creates a SAMPLE_ATTRIBUTES element without SAMPLE_ATTRIBUTE children.
Solution: add an extra if statement for the <SAMPLE_ATTRIBUTES> element in the template that checks whether the row contains any optional field.
This way no mapping table is needed
This would be for testing purposes
Hi,
when I processed some of our data, the data validation failed with an error message relating to missing "Illumina" elements in the XML. I could solve this by adding the following lines to
ENA_template_PLATFORM.XML (l.28):
<INSTRUMENT_MODEL py:when="row.instrument_model.lower().strip() == 'illumina novaseq 6000'">Illumina NovaSeq 6000</INSTRUMENT_MODEL>
SRA.common.xsd (l.911):
<xs:enumeration value="Illumina NovaSeq 6000"/>
In the course of doing this, I also noticed that the NextSeq and HiSeq X platforms are listed without the Illumina prefix, e.g.
<INSTRUMENT_MODEL py:when="row.instrument_model.lower().strip() == 'illumina hiseq 4000'">Illumina HiSeq 4000</INSTRUMENT_MODEL>
<INSTRUMENT_MODEL py:when="row.instrument_model.lower().strip() == 'nextseq 550'">NextSeq 550</INSTRUMENT_MODEL>
I'm happy to provide a PR for this; however, I wonder whether this is the clean way to do it, or whether the files were originally fetched from ENA and should rather be fixed there.
Best
Fritjof
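A possible alternative to adding one `py:when` branch per model is to normalise the spreadsheet value in Python before rendering. The lookup table below contains only the three spellings quoted in this issue and is meant to be extended; accepting a missing or extra "Illumina " prefix is a design assumption, not documented ENA behaviour, and the function name is hypothetical.

```python
# Canonical spellings taken from the template/XSD snippets quoted above;
# extend this table with the rest of the SRA.common.xsd enumeration.
CANONICAL_MODELS = {
    "illumina novaseq 6000": "Illumina NovaSeq 6000",
    "illumina hiseq 4000": "Illumina HiSeq 4000",
    "nextseq 550": "NextSeq 550",
}
PREFIX = "illumina "


def normalise_instrument_model(raw: str) -> str:
    """Map a free-text instrument model to the spelling the XSD enumerates.

    Matching is case- and whitespace-insensitive (like the template's
    lower().strip()) and tolerates a missing or extra "Illumina " prefix.
    """
    key = " ".join(raw.lower().split())
    alternative = key[len(PREFIX):] if key.startswith(PREFIX) else PREFIX + key
    for candidate in (key, alternative):
        if candidate in CANONICAL_MODELS:
            return CANONICAL_MODELS[candidate]
    raise ValueError(f"Unknown instrument model: {raw!r}")
```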
Hi,
I think there might be a problem with how version 0.6.3 of the upload client is built, since I can't install it using pip. I have no problem installing version 0.6.2. Here is the installation log:
I would also like to take the chance to ask about uploading an analysis. I have a count table with samples as columns and genome accessions as rows and, apart from that, the unprocessed reads from deep shotgun sequencing. I know how to proceed with the reads, but what should I do with the count table? Is it possible to add this table to the study?
Also, does the sample table allow custom fields besides the fields from the ENA checklist?
If these last two enquiries do not belong here, I am happy to move them elsewhere.
Thank you.
If rows are given without values, errors like the following are thrown:
genshi.template.eval.UndefinedError: nan has no member named "lower"
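Blank cells are read by pandas as float `NaN`, which has no `lower()` method, hence the error. A hedged sketch of a guard that could be applied before string operations; the helper name is hypothetical. Alternatively, reading the tables with `pandas.read_csv(..., keep_default_na=False)` or calling `DataFrame.fillna("")` keeps blank cells as empty strings.

```python
import math


def clean_text(value) -> str:
    """Return a lower-cased, stripped string, mapping empty cells to "".

    pandas reads empty TSV cells as float NaN, which has no .lower().
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    return str(value).lower().strip()
```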
A feature brought up in issue #48 which is worth thinking about!