usegalaxy-eu / ena-upload-cli

ENA upload tool - script your Open Data upload to the European Nucleotide Archive

License: MIT License

Python 100.00%

ena-upload-cli's Issues

Windows compatibility issues

Because the generated filenames contain a timestamp, they include a colon (:) character, which Windows does not accept in filenames. This prevents installing and using the script on Windows.
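One possible way around the colon, as a minimal sketch: format the timestamp with hyphens instead of colons before it goes into a filename. The helper names `safe_timestamp` and `receipt_filename` and the `.xml` suffix are hypothetical and are not taken from the tool's code.

```python
from datetime import datetime


def safe_timestamp(now=None):
    """Return a timestamp string that contains no colon characters."""
    now = now or datetime.now()
    # Hyphens replace the colons that ISO 8601 puts between time fields.
    return now.strftime("%Y-%m-%dT%H-%M-%S")


def receipt_filename(prefix, now=None):
    """Build a hypothetical output filename that Windows will accept."""
    return f"{prefix}_{safe_timestamp(now)}.xml"
```

The same format string works on Linux and macOS, so no platform check is needed.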

Add GitHub Action to check if new xsd templates are added to ENA

Support for ERC000033 checklist

Need to check whether XML and XSD files exist for other sample types.

Name: experiment_alias, dtype: object has no member named "iteritems"

Hi all,
It seems that the upload tool is not compatible with the latest pandas version (2.0.3). If I run it with that version (using the example data and metadata from the repository), I get the following:

Traceback (most recent call last):
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/eval.py", line 301, in lookup_attr
    val = getattr(obj, key)
          ^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/pandas/core/generic.py", line 5989, in __getattr__
    return object.__getattribute__(self, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Series' object has no attribute 'iteritems'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3653, in get_loc
    return self._engine.get_loc(casted_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'iteritems'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/eval.py", line 307, in lookup_attr
    val = obj[key]
          ~~~^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/pandas/core/series.py", line 1007, in __getitem__
    return self._get_value(key)
           ^^^^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/pandas/core/series.py", line 1116, in _get_value
    loc = self.index.get_loc(label)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3655, in get_loc
    raise KeyError(key) from err
KeyError: 'iteritems'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/bin/ena-upload-cli", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/ena_upload/ena_upload.py", line 953, in main
    schema_xmls = run_construct(
                  ^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/ena_upload/ena_upload.py", line 298, in run_construct
    schema_xmls[schema] = construct_xml(schema, stream, xsds[schema])
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/ena_upload/ena_upload.py", line 235, in construct_xml
    xml_string = stream.render(method='xml', encoding='utf-8')
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/core.py", line 184, in render
    return encode(generator, method=method, encoding=encoding, out=out)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/output.py", line 59, in encode
    return _encode(''.join(list(iterator)))
                           ^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/output.py", line 243, in __call__
    for kind, data, pos in stream:
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/output.py", line 674, in __call__
    for kind, data, pos in stream:
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/output.py", line 779, in __call__
    for kind, data, pos in chain(stream, [(None, None, None)]):
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/output.py", line 598, in __call__
    for ev in stream:
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/core.py", line 292, in _ensure
    for event in stream:
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/base.py", line 641, in _include
    for event in stream:
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/markup.py", line 326, in _match
    for event in stream:
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/base.py", line 581, in _flatten
    for kind, data, pos in stream:
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/directives.py", line 369, in __call__
    iterable = _eval_expr(self.expr, ctxt, vars)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/base.py", line 290, in _eval_expr
    retval = expr.evaluate(ctxt)
             ^^^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/eval.py", line 160, in evaluate
    return eval(self.code, _globals, {'__data__': data})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/ena_upload/templates/ENA_template_runs.xml", line 14, in <Expression 'iter(run_groups.iteritems())'>
    <py:for each="alias, experiment_alias in run_groups.iteritems()">
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/eval.py", line 309, in lookup_attr
    val = cls.undefined(key, owner=obj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/eval.py", line 397, in undefined
    raise UndefinedError(key, owner=owner)
genshi.template.eval.UndefinedError: alias
run_alias_1a    [experiment_alias_7a]
run_alias_3c    [experiment_alias_9c]
Name: experiment_alias, dtype: object has no member named "iteritems"

I downgraded to pandas 1.5.3 and now it seems to work fine. I used

ena-upload-cli --action add --center 'CRG' --study ena_templates/example_tables/ENA_template_studies.tsv --sample ena_templates/example_tables/ENA_template_samples.tsv --experiment ena_templates/example_tables/ENA_template_experiments.tsv --run ena_templates/example_tables/ENA_template_runs.tsv --data ena_templates/example_data/*gz --dev --secret ena_templates/.secret.yml --draft --no_data_upload

for running the tool.
Thanks,
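pandas removed `Series.iteritems()` in version 2.0 in favour of `Series.items()`, which is consistent with the traceback above. The likely fix is to change `run_groups.iteritems()` to `run_groups.items()` in the template. Where both old and new objects have to be supported, a small compatibility helper could look like the sketch below; `iter_pairs` is a hypothetical name, not part of the tool.

```python
def iter_pairs(series_like):
    """Return an iterator of (label, value) pairs from a Series or mapping."""
    items = getattr(series_like, "items", None)
    if callable(items):
        # pandas Series (old and new versions) and plain dicts expose .items()
        return iter(items())
    # Fallback for objects that only offer the legacy .iteritems()
    return iter(series_like.iteritems())
```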

Dropping unnecessary columns

Status, accession, submission_date and taxon ID. These columns can be added if not already present.

Add support for custom attributes

Currently we only parse the attributes that are being used by ENA. Of course, the XML can hold more information, which is also stored on ENA's side (although not displayed).

Submitting new project to ENA

Hello,
I'm attempting to submit non-viral raw reads to ENA using the test Webin submission portal. The tool appears to be functioning correctly, but it isn't creating a new project on the ENA portal. The files only become accessible the next day under the "Unsubmitted files" section. Is this normal? How can I successfully submit a project?
Let me know if you need more information.
Thank you.

If table is given without action values, an error is thrown

Add support for tables that do not contain add or modify in the status column

Add support for the submission of analysis objects

ENA supports the submission of other analysis spreadsheets/XMLs.

Following the analysis xsd formatting

--no_upload parameter not working

When running the example command, the following error is thrown:

ena-upload-cli --action add --center 'VIB-UGENT' --study example_tables/ENA_template_studies.tsv --sample example_tables/ENA_template_samples.tsv --experiment example_tables/ENA_template_experiments.tsv --run example_tables/ENA_template_runs.tsv --data example_data/*gz --dev --secret .secret.yml --no_upload
No files will be uploaded, remove `--no_upload' argument to perform upload.
No valid checksums found, generate now... Traceback (most recent call last):
  File "/home/bedro/.local/bin/ena-upload-cli", line 8, in <module>
    sys.exit(main())
  File "/home/bedro/.local/lib/python3.8/site-packages/ena_upload/ena_upload.py", line 733, in main
    md5 = df['file_name'].apply(lambda x: file_md5[x]).values
  File "/home/bedro/.local/lib/python3.8/site-packages/pandas/core/series.py", line 4138, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2467, in pandas._libs.lib.map_infer
  File "/home/bedro/.local/lib/python3.8/site-packages/ena_upload/ena_upload.py", line 733, in <lambda>
    md5 = df['file_name'].apply(lambda x: file_md5[x]).values
KeyError: 'ENA_TEST2.R1.fastq.gz'

Possible values for file_format are not clear

For the field file_format in the ENA run table.
Possible values for the field would be the ones listed in ENA_template_FILE.xml.
However, if any of the following values are used, an error message is generated.

454_native
454_native_qual
454_native_seq
fasta
helicos_native
illumina_native
illumina_native_int
illumina_native_prb
illumina_native_qseq
illumina_native_scarf
illumina_native_seq
solid_native
solid_native_csfasta
solid_native_qual
sra
tab

<MESSAGES>
          <ERROR>In run, alias:"run_1", accession:"", In filename:"1.bam", filetype:"454_native". Invalid file type "454_native".</ERROR>
          <ERROR>In run, alias:"run_10", accession:"", In filename:"10.bam", filetype:"Illumina_native". Invalid file type "Illumina_native".</ERROR>
          <ERROR>In run, alias:"run_11", accession:"", In filename:"11.bam", filetype:"Illumina_native_int". Invalid file type "Illumina_native_int".</ERROR>
          <ERROR>In run, alias:"run_12", accession:"", In filename:"12.bam", filetype:"Illumina_native_prb". Invalid file type "Illumina_native_prb".</ERROR>
          <ERROR>In run, alias:"run_13", accession:"", In filename:"13.bam", filetype:"Illumina_native_qseq". Invalid file type "Illumina_native_qseq".</ERROR>
          <ERROR>In run, alias:"run_14", accession:"", In filename:"14.bam", filetype:"Illumina_native_scarf". Invalid file type "Illumina_native_scarf".</ERROR>
          <ERROR>In run, alias:"run_15", accession:"", In filename:"15.bam", filetype:"Illumina_native_seq". Invalid file type "Illumina_native_seq".</ERROR>
          <ERROR>In run, alias:"run_16", accession:"", In filename:"16.tar", filetype:"OxfordNanopore_native". Invalid file suffix for file "16.tar". File compression is required for file type "OxfordNanopore_native". Supported compression formats are: BZIP2, GZIP with file suffixes: .bz2, .gz.</ERROR>
          <ERROR>In run, alias:"run_19", accession:"", In filename:"19.bam", filetype:"SOLiD_native". Invalid file type "SOLiD_native".</ERROR>
          <ERROR>In run, alias:"run_2", accession:"", In filename:"2.bam", filetype:"454_native_qual". Invalid file type "454_native_qual".</ERROR>
          <ERROR>In run, alias:"run_20", accession:"", In filename:"20.bam", filetype:"SOLiD_native_csfasta". Invalid file type "SOLiD_native_csfasta".</ERROR>
          <ERROR>In run, alias:"run_21", accession:"", In filename:"21.bam", filetype:"SOLiD_native_qual". Invalid file type "SOLiD_native_qual".</ERROR>
          <ERROR>In run, alias:"run_22", accession:"", In filename:"22.fastq", filetype:"sra". Invalid file type "sra".</ERROR>
          <ERROR>In run, alias:"run_24", accession:"", In filename:"24.fastq", filetype:"tab". Invalid file type "tab".</ERROR>
          <ERROR>In run, alias:"run_3", accession:"", In filename:"3.bam", filetype:"454_native_seq". Invalid file type "454_native_seq".</ERROR>
          <ERROR>In run, alias:"run_7", accession:"", In filename:"7.fastq", filetype:"fasta". Invalid file type "fasta".</ERROR>
          <ERROR>In run, alias:"run_9", accession:"", In filename:"9.bam", filetype:"Helicos_native". Invalid file type "Helicos_native".</ERROR>
          <ERROR>In run, alias:"run_1", accession:"". Invalid group of files: 1 "454_native" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_10", accession:"". Invalid group of files: 1 "Illumina_native" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_11", accession:"". Invalid group of files: 1 "Illumina_native_int" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_12", accession:"". Invalid group of files: 1 "Illumina_native_prb" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_13", accession:"". Invalid group of files: 1 "Illumina_native_qseq" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_14", accession:"". Invalid group of files: 1 "Illumina_native_scarf" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_15", accession:"". Invalid group of files: 1 "Illumina_native_seq" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_19", accession:"". Invalid group of files: 1 "SOLiD_native" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_2", accession:"". Invalid group of files: 1 "454_native_qual" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_20", accession:"". Invalid group of files: 1 "SOLiD_native_csfasta" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_21", accession:"". Invalid group of files: 1 "SOLiD_native_qual" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_22", accession:"". Invalid group of files: 1 "sra" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_24", accession:"". Invalid group of files: 1 "tab" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_3", accession:"". Invalid group of files: 1 "454_native_seq" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_7", accession:"". Invalid group of files: 1 "fasta" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_9", accession:"". Invalid group of files: 1 "Helicos_native" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <INFO>This submission is a TEST submission and will be discarded within 24 hours</INFO>
</MESSAGES>

ENA might be using another list of file formats to validate. Their documentation at Accepted Read Data Formats either does not list the values leading to the error, or flags them as deprecated.
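Until the accepted values are documented, a local pre-check can catch these rows before submission. The sketch below hard-codes the file types named in the "Supported file grouping(s)" part of the error messages above; that list may change on ENA's side, and the function name `find_unsupported` is hypothetical.

```python
# File types copied from the "Supported file grouping(s)" error text above.
SUPPORTED_FILE_TYPES = frozenset({
    "CompleteGenomics_native",
    "fastq",
    "OxfordNanopore_native",
    "PacBio_HDF5",
    "bam",
    "cram",
    "sff",
    "srf",
})


def find_unsupported(runs):
    """Return the (alias, file_type) pairs whose file type is not listed.

    `runs` is an iterable of (alias, file_type) tuples taken from the run table.
    """
    return [(alias, file_type) for alias, file_type in runs
            if file_type not in SUPPORTED_FILE_TYPES]
```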

Correct criteria for a successful submission

I don’t think that the success attribute of the XML receipt can be used as an indicator of a successful submission. You would still need to parse the content to look for errors and for successfully allocated accession numbers in the body of the receipt; see, for example, the corresponding implementation in the webin-cli by ENA.
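A stricter check could be sketched as follows. It treats a submission as successful only if the `success` attribute is `true`, no `ERROR` messages are present, and at least one accession was allocated. The assumed receipt layout (a root element carrying `success`, `ERROR` elements under `MESSAGES`, and object elements carrying `alias` and `accession` attributes) is an assumption for illustration, and `evaluate_receipt` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def evaluate_receipt(receipt_xml):
    """Return (ok, errors, accessions) for a receipt given as an XML string."""
    root = ET.fromstring(receipt_xml)
    # Collect the text of every ERROR element anywhere in the receipt.
    errors = [(err.text or "").strip() for err in root.iter("ERROR")]
    # Map alias -> accession for every element that carries both attributes.
    accessions = {
        el.get("alias"): el.get("accession")
        for el in root.iter()
        if el.get("alias") and el.get("accession")
    }
    ok = root.get("success") == "true" and not errors and bool(accessions)
    return ok, errors, accessions
```

Returning the errors and accessions alongside the verdict lets the caller print a useful message instead of only a pass/fail flag.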

Changing release date through API

Currently, the release date can only be changed manually through the website.

FTPS instead of FTP

Since v3.0.1 ENA Webin makes use of FTPS

Possibility to give the taxon_id through the table

Support FTPS connection to protect account credentials

The FTP protocol does not support secure connections between the client and the server, and account credentials are sent as plain text. This might not be an issue depending on how the traffic is routed, but it is not good practice for general use cases.
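With Python's standard library, an FTPS upload can be sketched using `ftplib.FTP_TLS`. This is a minimal illustration, not the tool's actual implementation; the function name `upload_over_ftps` and the 60-second default timeout are assumptions.

```python
import os
from ftplib import FTP_TLS


def upload_over_ftps(host, user, password, path, timeout=60):
    """Upload one file over explicit FTPS and return the server response."""
    with FTP_TLS(host=host, timeout=timeout) as ftps:
        # login() secures the control channel before sending credentials.
        ftps.login(user=user, passwd=password)
        # Switch the data channel to TLS as well.
        ftps.prot_p()
        with open(path, "rb") as handle:
            return ftps.storbinary(f"STOR {os.path.basename(path)}", handle)
```

`FTP_TLS` uses passive mode by default, so the client still needs outbound access to the server's passive port range.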

Include script used to generate tabular metadata templates

Catching errors better + more clear error messages

The tool would benefit from different verbose modes and real logging.

Adding support for all official ENA checklists

Checklists from https://www.ebi.ac.uk/ena/browser/checklists

Library name generation

The contents of Library Name (Experiment metadata) appear to be ignored; the value is instead generated automatically by concatenating "library", "Experiment Alias" and "Sample Alias".

nan is filled in when column is mandatory and cell is empty

This tricks the validation, although no value is actually present.
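A check for this could run before the XML is built, as in the sketch below. It treats `None`, a float NaN, a blank string and the literal text "nan" as missing; treating the literal "nan" as missing is an assumption, and both function names are hypothetical.

```python
import math


def is_missing(value):
    """Return True if a table cell should count as empty."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    # Blank strings and a stringified NaN also count as empty.
    return str(value).strip().lower() in ("", "nan")


def missing_mandatory(row, mandatory_columns):
    """Return the mandatory column names that have no value in this row."""
    return [col for col in mandatory_columns if is_missing(row.get(col))]
```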

Adding controlled vocabulary to the sample checklists

The controlled vocabulary for the sample checklists is now checked on ENA's side; this could also be done on our side.

Submitting paired reads plus their indices

Hi,

we're currently preparing a submission for the (now discontinued) linked-read sequencing from 10X Genomics. Similar to the current 10X single-cell RNA-Seq datasets, our datasets consist of two paired-end FASTQ files plus an index file. The data is whole-genome sequencing, so I suppose ENA is the correct place to submit them.

However, for each set of RUNs that includes an index file, ena-upload-cli returns the following error. Is this something I can mitigate on my end, or is it an issue with the tool or even with ENA?

Error:

Oops:
In run, alias: "<...>". Read type information missing in run.

Thanks and best,
Fritjof

The read operation timed out

Hi there,
I am unable to get the following command to run on my Ubuntu VM. The tool was installed using pip (pip install ena-upload-cli). My Ubuntu VM already has FTP port 21 open by default. Any thoughts?

ena-upload-cli --action add --center 'BioCommons Australia' --study ENA_template_studies.tsv --sample ENA_template_samples.tsv --experiment ENA_template_experiments.tsv --run ENA_template_runs.tsv --data *gz -d --secret .secret.yml
Check if all required columns are present in the study table.
Check if all required columns are present in the sample table.
Check if all required columns are present in the experiment table.
Check if all required columns are present in the run table.
No valid checksums found, generate now... done.

Connecting to ftp.webin2.ebi.ac.uk....
uploading /home/ubuntu/ena/ENA_TEST1.R1.fastq.gz
ERROR: The read operation timed out
ERROR: If your connection times out at this stage, it propably is because of a firewall that is in place. FTP is used in passive mode and connection will be opened to one of the ports: 40000 and 50000.
Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/ena-upload-cli", line 11, in <module>
    load_entry_point('ena-upload-cli==0.6.1', 'console_scripts', 'ena-upload-cli')()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/ena_upload/ena_upload.py", line 925, in main
    submit_data(file_paths, password, webin_id)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/ena_upload/ena_upload.py", line 424, in submit_data
    print(ftps.storbinary(f'STOR {filename}', open(path, 'rb')))
  File "/usr/lib/python3.8/ftplib.py", line 504, in storbinary
    conn.unwrap()
  File "/usr/lib/python3.8/ssl.py", line 1285, in unwrap
    s = self._sslobj.shutdown()
socket.timeout: The read operation timed out

Many thanks,

Supporting all optional fields in the run/experiment and study xml

Possibility to upload non-mandatory (virus) metadata

Thanks for the nice helper. However, I noticed that ena-upload-cli currently supports only the mandatory ERC000033 fields in its XML templates. I came up with a little hack using updated XML forms https://github.com/avilab/ena-upload-cli/tree/location to get some additional metadata uploaded, e.g. age and geographic location (locality plus lon/lat). I appreciate that under the current implementation there is no simple fix to include any of the optional checklist fields, given that the ENA database may not accept empty fields (not sure).

One way to fix this possible issue would be, very briefly, to check imported tables against schemas and serialise dictionaries to XML, so that any combination of non-mandatory fields can be included.

I am happy to be corrected and directed to the right path if there is a way to include optional/recommended (virus) metadata fields using this app as it is.

Most minimal example not working

ena_upload --action add --center 'your_center_name' --study example_tables/ENA_template_studies.tsv --sample example_tables/ENA_template_samples.tsv --experiment example_tables/ENA_template_experiments.tsv --run example_tables/ENA_template_runs.tsv --data example_data/*gz --dev --secret .secret.yml

This throws the error:

Traceback (most recent call last):
  File "c:\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\rabuo\ena-upload-cli\Scripts\ena-upload-cli.exe\__main__.py", line 7, in <module>
  File "c:\users\rabuo\ena-upload-cli\lib\site-packages\ena_upload\ena_upload.py", line 771, in main
    schema_xmls = run_construct(
  File "c:\users\rabuo\ena-upload-cli\lib\site-packages\ena_upload\ena_upload.py", line 243, in run_construct
    schema_xmls[schema] = construct_xml(schema, stream, xsds[schema])
  File "c:\users\rabuo\ena-upload-cli\lib\site-packages\ena_upload\ena_upload.py", line 182, in construct_xml
    validate_xml(xsd, xml_string)
  File "c:\users\rabuo\ena-upload-cli\lib\site-packages\ena_upload\ena_upload.py", line 133, in validate_xml
    return xmlschema.assertValid(doc)
  File "src\lxml\etree.pyx", line 3623, in lxml.etree._Validator.assertValid
lxml.etree.DocumentInvalid: Element 'SAMPLE_ATTRIBUTES': Missing child element(s). Expected is ( SAMPLE_ATTRIBUTE )., line 10

This is because the default checklist ERC000011 has only optional fields. If no value is given for any of them, the template creates a SAMPLE_ATTRIBUTES object without SAMPLE_ATTRIBUTE children (since they are all optional), which the XSD rejects.

Solution: extra if statement for the <SAMPLE_ATTRIBUTES> object in the template that checks if the row contains an optional field
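The same guard can be illustrated outside the template language. The sketch below builds a sample element with `xml.etree.ElementTree` and only emits the SAMPLE_ATTRIBUTES container when at least one attribute has a value. The function name `build_sample` and the TAG/VALUE child layout are assumptions for illustration; the real fix would live in the Genshi template.

```python
import xml.etree.ElementTree as ET


def build_sample(alias, attributes):
    """Build a SAMPLE element, omitting SAMPLE_ATTRIBUTES when it would be empty."""
    sample = ET.Element("SAMPLE", alias=alias)
    # Keep only the attributes that actually carry a value.
    filled = {tag: value for tag, value in attributes.items()
              if value not in (None, "")}
    if filled:
        container = ET.SubElement(sample, "SAMPLE_ATTRIBUTES")
        for tag, value in filled.items():
            attribute = ET.SubElement(container, "SAMPLE_ATTRIBUTE")
            ET.SubElement(attribute, "TAG").text = tag
            ET.SubElement(attribute, "VALUE").text = str(value)
    return sample
```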

Using the labels from ENA in the tsv table headers instead of the underscore version

This way no mapping table is needed

Adding flag to not do a submission but just to generate the XMLs

This would be for testing purposes

Missing / Wrong sequencer identifiers in templates

Hi,

when I processed some of our data, the data validation failed with an error message relating to missing "Illumina" elements in the XML. I could solve this by adding the following lines to

ENA_template_PLATFORM.XML (l.28):
<INSTRUMENT_MODEL py:when="row.instrument_model.lower().strip() == 'illumina novaseq 6000'">Illumina NovaSeq 6000</INSTRUMENT_MODEL>

SRA.common.xsd (l.911):
<xs:enumeration value="Illumina NovaSeq 6000"/>

In the course of doing this, I also noticed that the NextSeq and HiSeq X platforms are listed without the Illumina prefix, e.g.

<INSTRUMENT_MODEL py:when="row.instrument_model.lower().strip() == 'illumina hiseq 4000'">Illumina HiSeq 4000</INSTRUMENT_MODEL>
<INSTRUMENT_MODEL py:when="row.instrument_model.lower().strip() == 'nextseq 550'">NextSeq 550</INSTRUMENT_MODEL>

I'm happy to provide a PR for this. However, I wonder whether this is the clean way to do it, or whether the files were originally fetched from ENA and should rather be fixed there.

Best

Fritjof

The license_file parameter is deprecated, use license_files instead.

Hi,

I think there might be a problem with how version 0.6.3 of the upload client is built, since I can't install it using pip. I have no problem installing version 0.6.2. Here is the installation log:

ena.log

I would like to take the chance to ask about uploading an analysis. I have a count table with samples as columns and genome accessions as rows and, apart from that, the unprocessed reads from deep shotgun sequencing. I know how to proceed with the reads, but what should I do with the count table? Is it possible to add this table to the study?

Also, does the sample table allow custom fields besides the fields from the ENA checklist?

If these last two enquiries do not belong here, I am happy to move them elsewhere.

Thank you.

Empty rows give error

If rows are given without values, errors like the following are thrown:

genshi.template.eval.UndefinedError: nan has no member named "lower"
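One way to avoid this is to drop rows that contain no values when the table is read. The sketch below uses the standard `csv` module on a tab-separated table; the function name `read_non_empty_rows` and the sample column names are hypothetical, and the tool itself may read its tables differently.

```python
import csv
import io


def read_non_empty_rows(handle):
    """Read a TSV table and skip rows in which every cell is blank."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader
            if any(isinstance(v, str) and v.strip() for v in row.values())]


# Usage: the second line holds only a tab, so it is skipped.
sample = io.StringIO("alias\tinstrument_model\n\t\nrun_1\tIllumina HiSeq 4000\n")
rows = read_non_empty_rows(sample)
```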

[Discussion] Adding support for Nextcloud file transfer to ENA ftp server ?

A feature brought up in issue #48 which is worth thinking about!
