
ena-upload-cli's Issues

Windows compatibility issues

Some files have a timestamp in their filename, and the colon (:) character it contains is not accepted in filenames by Windows. This prevents installing and using the script on Windows.
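One way to avoid the colon is to format the timestamp with a different separator. The sketch below is only an illustration: the helper names and the "receipt_" file name pattern are assumptions, not taken from the tool.

```python
from datetime import datetime

# Characters that Windows rejects in file names.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')


def safe_timestamp(moment: datetime) -> str:
    """Format a timestamp without ':' so it is valid in a Windows file name."""
    return moment.strftime("%Y-%m-%dT%H-%M-%S")


def is_windows_safe(name: str) -> bool:
    """Return True if the name contains no character that Windows forbids."""
    return not (set(name) & WINDOWS_FORBIDDEN)


stamp = safe_timestamp(datetime(2023, 7, 1, 12, 30, 5))
filename = f"receipt_{stamp}.xml"  # hypothetical file name pattern
print(filename, is_windows_safe(filename))
```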

Name: experiment_alias, dtype: object has no member named "iteritems"

Hi all,
It seems that the upload tool is not compatible with the latest pandas version (2.0.3). If I run it with that version (using the example data and metadata from the repository), I get the following:

Traceback (most recent call last):
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/eval.py", line 301, in lookup_attr
    val = getattr(obj, key)
          ^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/pandas/core/generic.py", line 5989, in __getattr__
    return object.__getattribute__(self, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Series' object has no attribute 'iteritems'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3653, in get_loc
    return self._engine.get_loc(casted_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'iteritems'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/eval.py", line 307, in lookup_attr
    val = obj[key]
          ~~~^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/pandas/core/series.py", line 1007, in __getitem__
    return self._get_value(key)
           ^^^^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/pandas/core/series.py", line 1116, in _get_value
    loc = self.index.get_loc(label)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3655, in get_loc
    raise KeyError(key) from err
KeyError: 'iteritems'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/bin/ena-upload-cli", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/ena_upload/ena_upload.py", line 953, in main
    schema_xmls = run_construct(
                  ^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/ena_upload/ena_upload.py", line 298, in run_construct
    schema_xmls[schema] = construct_xml(schema, stream, xsds[schema])
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/ena_upload/ena_upload.py", line 235, in construct_xml
    xml_string = stream.render(method='xml', encoding='utf-8')
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/core.py", line 184, in render
    return encode(generator, method=method, encoding=encoding, out=out)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/output.py", line 59, in encode
    return _encode(''.join(list(iterator)))
                           ^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/output.py", line 243, in __call__
    for kind, data, pos in stream:
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/output.py", line 674, in __call__
    for kind, data, pos in stream:
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/output.py", line 779, in __call__
    for kind, data, pos in chain(stream, [(None, None, None)]):
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/output.py", line 598, in __call__
    for ev in stream:
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/core.py", line 292, in _ensure
    for event in stream:
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/base.py", line 641, in _include
    for event in stream:
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/markup.py", line 326, in _match
    for event in stream:
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/base.py", line 581, in _flatten
    for kind, data, pos in stream:
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/directives.py", line 369, in __call__
    iterable = _eval_expr(self.expr, ctxt, vars)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/base.py", line 290, in _eval_expr
    retval = expr.evaluate(ctxt)
             ^^^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/eval.py", line 160, in evaluate
    return eval(self.code, _globals, {'__data__': data})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/ena_upload/templates/ENA_template_runs.xml", line 14, in <Expression 'iter(run_groups.iteritems())'>
    <py:for each="alias, experiment_alias in run_groups.iteritems()">
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/eval.py", line 309, in lookup_attr
    val = cls.undefined(key, owner=obj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/bi/fduarte/miniconda3/envs/ena-iupload/lib/python3.11/site-packages/genshi/template/eval.py", line 397, in undefined
    raise UndefinedError(key, owner=owner)
genshi.template.eval.UndefinedError: alias
run_alias_1a    [experiment_alias_7a]
run_alias_3c    [experiment_alias_9c]
Name: experiment_alias, dtype: object has no member named "iteritems"

I downgraded to pandas 1.5.3 and now it seems to work fine. I used

ena-upload-cli --action add --center 'CRG' --study ena_templates/example_tables/ENA_template_studies.tsv --sample ena_templates/example_tables/ENA_template_samples.tsv --experiment ena_templates/example_tables/ENA_template_experiments.tsv --run ena_templates/example_tables/ENA_template_runs.tsv --data ena_templates/example_data/*gz --dev --secret ena_templates/.secret.yml --draft --no_data_upload

for running the tool.
Thanks,
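
For context: pandas 2.0 removed Series.iteritems(), while Series.items() exists in both the 1.x and 2.x series, so changing the template expression from run_groups.iteritems() to run_groups.items() is the likely fix. The sketch below shows a version-agnostic fallback; iter_pairs is a hypothetical helper, and a plain dict stands in for the pandas Series so the example runs without pandas.

```python
def iter_pairs(mapping_like):
    """Return an iterator of (key, value) pairs.

    Prefers items(), which pandas 1.x and 2.x both provide, and falls
    back to the legacy iteritems() for objects that only offer that.
    """
    method = getattr(mapping_like, "items", None)
    if method is None:
        method = mapping_like.iteritems
    return iter(method())


# A dict standing in for the run_groups Series from the traceback.
run_groups = {
    "run_alias_1a": ["experiment_alias_7a"],
    "run_alias_3c": ["experiment_alias_9c"],
}
pairs = list(iter_pairs(run_groups))
for alias, experiment_alias in pairs:
    print(alias, experiment_alias)
```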

Add support for custom attributes

Currently we only parse the attributes that are being used by ENA. Of course, the XML can hold more information, and this is also stored on ENA's side (although not displayed).

Submitting new project to ENA

Hello,
I'm attempting to submit non-viral raw reads to ENA using the test Webin submission portal. The tool appears to be functioning correctly, but it isn't creating a new project on the ENA portal. The files only become accessible the next day under the "Unsubmitted files" section. Is this normal? How can I successfully submit a project?
Let me know if you need more information.
Thank you.

--no_upload parameter not working

When running the example command, the following error is thrown:

ena-upload-cli --action add --center 'VIB-UGENT' --study example_tables/ENA_template_studies.tsv --sample example_tables/ENA_template_samples.tsv --experiment example_tables/ENA_template_experiments.tsv --run example_tables/ENA_template_runs.tsv --data example_data/*gz --dev --secret .secret.yml --no_upload
No files will be uploaded, remove `--no_upload' argument to perform upload.
No valid checksums found, generate now... Traceback (most recent call last):
  File "/home/bedro/.local/bin/ena-upload-cli", line 8, in <module>
    sys.exit(main())
  File "/home/bedro/.local/lib/python3.8/site-packages/ena_upload/ena_upload.py", line 733, in main
    md5 = df['file_name'].apply(lambda x: file_md5[x]).values
  File "/home/bedro/.local/lib/python3.8/site-packages/pandas/core/series.py", line 4138, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2467, in pandas._libs.lib.map_infer
  File "/home/bedro/.local/lib/python3.8/site-packages/ena_upload/ena_upload.py", line 733, in <lambda>
    md5 = df['file_name'].apply(lambda x: file_md5[x]).values
KeyError: 'ENA_TEST2.R1.fastq.gz'
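
The report does not establish why the checksum lookup has no entry for this file. As a sketch only, the lookup could name the files that lack a checksum instead of raising a bare KeyError. All function names here are hypothetical, and keying the checksums by base name is an assumption about how the run table refers to files.

```python
import hashlib
import os
import tempfile


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums_by_name(paths):
    """Map each file's base name to its MD5 (assumed to match file_name values)."""
    return {os.path.basename(p): md5_of(p) for p in paths}


def lookup_checksums(file_names, file_md5):
    """Return checksums in table order, naming any file without a checksum."""
    missing = [name for name in file_names if name not in file_md5]
    if missing:
        raise ValueError(f"No checksum available for: {', '.join(missing)}")
    return [file_md5[name] for name in file_names]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ENA_TEST2.R1.fastq.gz")
    with open(path, "wb") as handle:
        handle.write(b"example")
    file_md5 = checksums_by_name([path])
    found = lookup_checksums(["ENA_TEST2.R1.fastq.gz"], file_md5)
```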

Possible values for file_format are not clear

This concerns the file_format field in the ENA run table.
The possible values for this field appear to be the ones listed in ENA_template_FILE.xml.
However, if any of the following values is used, an error message is generated.

  • 454_native
  • 454_native_qual
  • 454_native_seq
  • fasta
  • helicos_native
  • illumina_native
  • illumina_native_int
  • illumina_native_prb
  • illumina_native_qseq
  • illumina_native_scarf
  • illumina_native_seq
  • solid_native
  • solid_native_csfasta
  • solid_native_qual
  • sra
  • tab

<MESSAGES>
          <ERROR>In run, alias:"run_1", accession:"", In filename:"1.bam", filetype:"454_native". Invalid file type "454_native".</ERROR>
          <ERROR>In run, alias:"run_10", accession:"", In filename:"10.bam", filetype:"Illumina_native". Invalid file type "Illumina_native".</ERROR>
          <ERROR>In run, alias:"run_11", accession:"", In filename:"11.bam", filetype:"Illumina_native_int". Invalid file type "Illumina_native_int".</ERROR>
          <ERROR>In run, alias:"run_12", accession:"", In filename:"12.bam", filetype:"Illumina_native_prb". Invalid file type "Illumina_native_prb".</ERROR>
          <ERROR>In run, alias:"run_13", accession:"", In filename:"13.bam", filetype:"Illumina_native_qseq". Invalid file type "Illumina_native_qseq".</ERROR>
          <ERROR>In run, alias:"run_14", accession:"", In filename:"14.bam", filetype:"Illumina_native_scarf". Invalid file type "Illumina_native_scarf".</ERROR>
          <ERROR>In run, alias:"run_15", accession:"", In filename:"15.bam", filetype:"Illumina_native_seq". Invalid file type "Illumina_native_seq".</ERROR>
          <ERROR>In run, alias:"run_16", accession:"", In filename:"16.tar", filetype:"OxfordNanopore_native". Invalid file suffix for file "16.tar". File compression is required for file type "OxfordNanopore_native". Supported compression formats are: BZIP2, GZIP with file suffixes: .bz2, .gz.</ERROR>
          <ERROR>In run, alias:"run_19", accession:"", In filename:"19.bam", filetype:"SOLiD_native". Invalid file type "SOLiD_native".</ERROR>
          <ERROR>In run, alias:"run_2", accession:"", In filename:"2.bam", filetype:"454_native_qual". Invalid file type "454_native_qual".</ERROR>
          <ERROR>In run, alias:"run_20", accession:"", In filename:"20.bam", filetype:"SOLiD_native_csfasta". Invalid file type "SOLiD_native_csfasta".</ERROR>
          <ERROR>In run, alias:"run_21", accession:"", In filename:"21.bam", filetype:"SOLiD_native_qual". Invalid file type "SOLiD_native_qual".</ERROR>
          <ERROR>In run, alias:"run_22", accession:"", In filename:"22.fastq", filetype:"sra". Invalid file type "sra".</ERROR>
          <ERROR>In run, alias:"run_24", accession:"", In filename:"24.fastq", filetype:"tab". Invalid file type "tab".</ERROR>
          <ERROR>In run, alias:"run_3", accession:"", In filename:"3.bam", filetype:"454_native_seq". Invalid file type "454_native_seq".</ERROR>
          <ERROR>In run, alias:"run_7", accession:"", In filename:"7.fastq", filetype:"fasta". Invalid file type "fasta".</ERROR>
          <ERROR>In run, alias:"run_9", accession:"", In filename:"9.bam", filetype:"Helicos_native". Invalid file type "Helicos_native".</ERROR>
          <ERROR>In run, alias:"run_1", accession:"". Invalid group of files: 1 "454_native" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_10", accession:"". Invalid group of files: 1 "Illumina_native" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_11", accession:"". Invalid group of files: 1 "Illumina_native_int" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_12", accession:"". Invalid group of files: 1 "Illumina_native_prb" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_13", accession:"". Invalid group of files: 1 "Illumina_native_qseq" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_14", accession:"". Invalid group of files: 1 "Illumina_native_scarf" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_15", accession:"". Invalid group of files: 1 "Illumina_native_seq" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_19", accession:"". Invalid group of files: 1 "SOLiD_native" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_2", accession:"". Invalid group of files: 1 "454_native_qual" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_20", accession:"". Invalid group of files: 1 "SOLiD_native_csfasta" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_21", accession:"". Invalid group of files: 1 "SOLiD_native_qual" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_22", accession:"". Invalid group of files: 1 "sra" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_24", accession:"". Invalid group of files: 1 "tab" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_3", accession:"". Invalid group of files: 1 "454_native_seq" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_7", accession:"". Invalid group of files: 1 "fasta" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <ERROR>In run, alias:"run_9", accession:"". Invalid group of files: 1 "Helicos_native" file. Supported file grouping(s) are: [ at least 1 "CompleteGenomics_native" files],[ at least 1 "fastq" files],[1 "OxfordNanopore_native" file],[ at least 1 "PacBio_HDF5" files],[1 "bam" file],[1 "cram" file],[1 "sff" file],[1 "srf" file].</ERROR>
          <INFO>This submission is a TEST submission and will be discarded within 24 hours</INFO>
</MESSAGES>

ENA might be using another list of file formats to validate. Their documentation at Accepted Read Data Formats either does not list the values leading to the error, or flags them as deprecated.
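
A local pre-check could catch these values before submission. The sketch below uses the file types that the error message above names as supported; that list comes from a single response of the test service and may change. The function name and the row layout are hypothetical, and run_5 is an invented example row.

```python
# File types named as supported in the Webin error message quoted above.
SUPPORTED_FILE_TYPES = {
    "CompleteGenomics_native",
    "fastq",
    "OxfordNanopore_native",
    "PacBio_HDF5",
    "bam",
    "cram",
    "sff",
    "srf",
}


def unsupported_file_types(rows):
    """Return (alias, file_format) pairs whose file_format is not supported."""
    return [
        (row["alias"], row["file_format"])
        for row in rows
        if row["file_format"] not in SUPPORTED_FILE_TYPES
    ]


rows = [
    {"alias": "run_1", "file_format": "454_native"},
    {"alias": "run_5", "file_format": "fastq"},  # invented row for contrast
    {"alias": "run_22", "file_format": "sra"},
]
problems = unsupported_file_types(rows)
for alias, file_format in problems:
    print(f"{alias}: unsupported file_format {file_format!r}")
```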

Support FTPS connection to protect account credentials

The FTP protocol does not support secure connections between the client and the server, and account credentials are sent as plain text. This might not be an issue depending on how the traffic is routed, but it is not good practice for general use cases.
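
Python's standard library offers ftplib.FTP_TLS for explicit FTPS. The following is a minimal sketch, assuming the server accepts explicit FTPS; open_ftps and upload are hypothetical helper names, and the host, user and password are left as parameters.

```python
import ftplib


def open_ftps(host, user, password, timeout=60):
    """Open an explicit FTPS session with control and data channels encrypted."""
    ftps = ftplib.FTP_TLS(host, timeout=timeout)
    # login() secures the control connection with TLS before sending credentials.
    ftps.login(user, password)
    # prot_p() switches the data connection to TLS as well.
    ftps.prot_p()
    return ftps


def upload(ftps, local_path, remote_name):
    """Upload one file in binary mode and return the server's response line."""
    with open(local_path, "rb") as handle:
        return ftps.storbinary(f"STOR {remote_name}", handle)
```

Nothing is connected at import time; a caller would pass its own server name and Webin credentials to open_ftps and then call upload for each file.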

Library name generation

The contents of Library Name (experiment metadata) appear to be ignored; the value is instead generated automatically by concatenating "library", "Experiment Alias" and "Sample Alias".

Submitting paired reads plus their indices

Hi,

We're currently preparing a submission for the (now discontinued) linked-read sequencing from 10X Genomics. Similar to the current 10X single-cell RNA-Seq datasets, our datasets consist of two paired-end FASTQ files plus an index file. The data is whole-genome sequencing, so I suppose ENA is the correct place to submit them.

However, for each set of RUNs that includes an index file, ena-upload-cli returns the following error. Is this something I can mitigate on my end, or is it an issue with the tool, or even with ENA?

Error:

Oops:
In run, alias: "<...>". Read type information missing in run.

Thanks and best,
Fritjof

The read operation timed out

Hi there,
I am unable to get the following command to run on my Ubuntu VM. The tool was installed using pip (pip install ena-upload-cli). My Ubuntu VM already has FTP port 21 open by default. Any thoughts?

ena-upload-cli --action add --center 'BioCommons Australia' --study ENA_template_studies.tsv --sample ENA_template_samples.tsv --experiment ENA_template_experiments.tsv --run ENA_template_runs.tsv --data *gz -d --secret .secret.yml
Check if all required columns are present in the study table.
Check if all required columns are present in the sample table.
Check if all required columns are present in the experiment table.
Check if all required columns are present in the run table.
No valid checksums found, generate now... done.

Connecting to ftp.webin2.ebi.ac.uk....
uploading /home/ubuntu/ena/ENA_TEST1.R1.fastq.gz
ERROR: The read operation timed out
ERROR: If your connection times out at this stage, it propably is because of a firewall that is in place. FTP is used in passive mode and connection will be opened to one of the ports: 40000 and 50000.

Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/ena-upload-cli", line 11, in
    load_entry_point('ena-upload-cli==0.6.1', 'console_scripts', 'ena-upload-cli')()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/ena_upload/ena_upload.py", line 925, in main
    submit_data(file_paths, password, webin_id)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/ena_upload/ena_upload.py", line 424, in submit_data
    print(ftps.storbinary(f'STOR {filename}', open(path, 'rb')))
  File "/usr/lib/python3.8/ftplib.py", line 504, in storbinary
    conn.unwrap()
  File "/usr/lib/python3.8/ssl.py", line 1285, in unwrap
    s = self._sslobj.shutdown()
socket.timeout: The read operation timed out

Many thanks,

Possibility to upload non-mandatory (virus) metadata

Thanks for the nice helper. However, I noticed that ena-upload-cli currently supports only the mandatory ERC000033 fields in its XML templates. I came up with a little hack using updated XML templates (https://github.com/avilab/ena-upload-cli/tree/location) to get some additional metadata uploaded, e.g. age and geographic location (locality plus longitude/latitude). I appreciate that under the current implementation there is no simple fix to include any of the optional checklist fields, given that the ENA database may not accept empty fields (not sure).

One way to fix this would be, very briefly, to check the imported tables against the schemas and serialise dictionaries to XML, so that any combination of non-mandatory fields can be included.

I am happy to be corrected and pointed in the right direction if there is a way to include optional/recommended (virus) metadata fields using this app as it is.
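
The dictionary-to-XML idea could look roughly like the sketch below. It is an illustration, not the tool's implementation: build_sample_attributes is a hypothetical helper, the field names are invented examples, the TAG/VALUE child layout follows the ENA sample XML format as I understand it, and the schema check mentioned above is left out.

```python
import xml.etree.ElementTree as ET


def build_sample_attributes(fields):
    """Build a SAMPLE_ATTRIBUTES element from a dict, skipping empty values.

    Returns None when no field has a value, so that no empty
    SAMPLE_ATTRIBUTES element is emitted.
    """
    filled = {tag: value for tag, value in fields.items() if value not in (None, "")}
    if not filled:
        return None
    parent = ET.Element("SAMPLE_ATTRIBUTES")
    for tag, value in filled.items():
        attribute = ET.SubElement(parent, "SAMPLE_ATTRIBUTE")
        ET.SubElement(attribute, "TAG").text = tag
        ET.SubElement(attribute, "VALUE").text = str(value)
    return parent


# Invented example row: one optional field is left empty and is skipped.
element = build_sample_attributes(
    {"host age": 35, "locality": "", "collection date": "2020-03-01"}
)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

Because only filled fields are serialised, any combination of optional columns can appear in the table without producing empty elements.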

Most minimal example not working

ena_upload --action add --center 'your_center_name' --study example_tables/ENA_template_studies.tsv --sample example_tables/ENA_template_samples.tsv --experiment example_tables/ENA_template_experiments.tsv --run example_tables/ENA_template_runs.tsv --data example_data/*gz --dev --secret .secret.yml

This throws the error:

Traceback (most recent call last):
  File "c:\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\rabuo\ena-upload-cli\Scripts\ena-upload-cli.exe\__main__.py", line 7, in <module>
  File "c:\users\rabuo\ena-upload-cli\lib\site-packages\ena_upload\ena_upload.py", line 771, in main
    schema_xmls = run_construct(
  File "c:\users\rabuo\ena-upload-cli\lib\site-packages\ena_upload\ena_upload.py", line 243, in run_construct
    schema_xmls[schema] = construct_xml(schema, stream, xsds[schema])
  File "c:\users\rabuo\ena-upload-cli\lib\site-packages\ena_upload\ena_upload.py", line 182, in construct_xml
    validate_xml(xsd, xml_string)
  File "c:\users\rabuo\ena-upload-cli\lib\site-packages\ena_upload\ena_upload.py", line 133, in validate_xml
    return xmlschema.assertValid(doc)
  File "src\lxml\etree.pyx", line 3623, in lxml.etree._Validator.assertValid
lxml.etree.DocumentInvalid: Element 'SAMPLE_ATTRIBUTES': Missing child element(s). Expected is ( SAMPLE_ATTRIBUTE )., line 10

This is because the default checklist ERC000011 has only optional fields. If no value is given for any of them, the template creates a SAMPLE_ATTRIBUTES element without SAMPLE_ATTRIBUTE children, which fails schema validation.

Solution: add an extra if statement around the <SAMPLE_ATTRIBUTES> element in the template that checks whether the row contains a value for at least one optional field.

Missing / Wrong sequencer identifiers in templates

Hi,

When I processed some of our data, the data validation failed with an error message relating to missing "Illumina" elements in the XML. I was able to solve this by adding the following lines:

ENA_template_PLATFORM.XML (l.28):
<INSTRUMENT_MODEL py:when="row.instrument_model.lower().strip() == 'illumina novaseq 6000'">Illumina NovaSeq 6000</INSTRUMENT_MODEL>

SRA.common.xsd (l.911):
<xs:enumeration value="Illumina NovaSeq 6000"/>

In the course of doing this, I also noticed that the NextSeq and HiSeq X platforms are listed without the Illumina prefix, e.g.

<INSTRUMENT_MODEL py:when="row.instrument_model.lower().strip() == 'illumina hiseq 4000'">Illumina HiSeq 4000</INSTRUMENT_MODEL>
<INSTRUMENT_MODEL py:when="row.instrument_model.lower().strip() == 'nextseq 550'">NextSeq 550</INSTRUMENT_MODEL>

I'm happy to provide a PR for this. However, I wonder whether this is the clean way to do it, or whether the files were originally fetched from ENA and should rather be fixed there.

Best

Fritjof
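
To illustrate the matching that the py:when lines perform, the sketch below expresses the same normalisation (lower-case, stripped) as a lookup table in Python. It is a hypothetical alternative, not how the templates work today, and it lists only the three models quoted above.

```python
# Canonical instrument names keyed by their normalised (lower-case, stripped) form.
# Only the models quoted in this issue are listed; a real table would need the
# full enumeration from the schema.
INSTRUMENT_MODELS = {
    "illumina novaseq 6000": "Illumina NovaSeq 6000",
    "illumina hiseq 4000": "Illumina HiSeq 4000",
    "nextseq 550": "NextSeq 550",
}


def canonical_instrument(raw):
    """Return the canonical spelling for a user-supplied instrument model."""
    key = raw.lower().strip()
    try:
        return INSTRUMENT_MODELS[key]
    except KeyError:
        raise ValueError(f"Unknown instrument model: {raw!r}") from None


print(canonical_instrument("  ILLUMINA NovaSeq 6000 "))
```

Keeping the normalised key and the canonical spelling side by side makes a missing or inconsistently prefixed entry easier to spot.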

The license_file parameter is deprecated, use license_files instead.

Hi,

I think there might be a problem with how version 0.6.3 of the upload client is built, since I can't install it using pip. I have no problem installing version 0.6.2. Here is the installation log:

ena.log

I would also like to take the chance to ask about uploading an analysis. I have a count table with samples as columns and genome accessions as rows, in addition to the unprocessed reads from deep shotgun sequencing. I know how to proceed with the reads, but what should I do with the count table? Is it possible to add this table to the study?

Also, does the sample table allow custom fields besides those from the ENA checklist?

If these last two enquiries do not belong here, I am happy to move them elsewhere.

Thank you.

Empty rows give error

If rows are given without values, errors like the following are thrown:

genshi.template.eval.UndefinedError: nan has no member named "lower"
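
The error suggests that an empty cell arrives as a float NaN, which has no lower() method. A minimal sketch of a guard follows; safe_lower and drop_empty_rows are hypothetical helpers, and rows are modelled as plain dicts. With pandas, DataFrame.dropna(how="all") would be an alternative way to discard fully empty rows before rendering.

```python
import math


def safe_lower(value):
    """Lower-case and strip a cell value, mapping None and NaN to an empty string."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    return str(value).strip().lower()


def drop_empty_rows(rows):
    """Keep only rows in which at least one cell holds a non-empty value."""
    return [row for row in rows if any(safe_lower(cell) for cell in row.values())]


rows = [
    {"alias": float("nan"), "instrument_model": None},
    {"alias": "run_1", "instrument_model": " Illumina NovaSeq 6000 "},
]
kept = drop_empty_rows(rows)
print(len(kept), safe_lower(kept[0]["instrument_model"]))
```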
