sanger-pathogens / seroba

k-mer based Pipeline to identify the Serotype from Illumina NGS reads

Home Page: https://sanger-pathogens.github.io/seroba/

License: Other

Shell 2.71% Python 96.62% Dockerfile 0.67%

genomics sequencing next-generation-sequencing research bioinformatics bioinformatics-pipeline global-health infectious-diseases pathogen

seroba's Issues

Default for runSerotyping coverage not being set correctly

Dear seroba team,

I have run into an issue where serotyping won't run unless I explicitly set coverage.

root@d2d570a8775d:/# seroba runSerotyping seroba/database/ data/testsample_1.fastq.gz data/testsample_2.fastq.gz data/TESTcontainer
Traceback (most recent call last):
  File "/usr/local/bin/seroba", line 4, in <module>
    __import__('pkg_resources').run_script('seroba==1.0.2', 'seroba')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 650, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1453, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python3.8/dist-packages/seroba-1.0.2-py3.8.egg/EGG-INFO/scripts/seroba", line 86, in <module>
  File "/usr/local/lib/python3.8/dist-packages/seroba-1.0.2-py3.8.egg/seroba/tasks/sero_run.py", line 13, in run
  File "/usr/local/lib/python3.8/dist-packages/seroba-1.0.2-py3.8.egg/seroba/serotyping.py", line 34, in __init__
TypeError: unsupported operand type(s) for /: 'NoneType' and 'float'

It seems like a default value for cov is not being set? I can get it to run by explicitly specifying coverage:

root@d2d570a8775d:/# seroba runSerotyping --coverage 20 /seroba/database/ /data/ERR1438805_1.fastq.gz /data/ERR1438805_2.fastq.gz /data/TESTcontainer

I am using the latest docker container from sangerpathogens/seroba (b4f4e60ee092)
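A minimal sketch of how a missing default can produce this TypeError, and how a default avoids it. Only the `--coverage` option name comes from the report above; the parser layout, the default of 20 and the division by 100.0 are assumptions for illustration, not taken from seroba's source.

```python
import argparse

def build_parser():
    # Hypothetical parser; only the --coverage option name comes from the report.
    parser = argparse.ArgumentParser(prog="runSerotyping-sketch")
    # Without default=..., argparse stores None when the flag is omitted,
    # and later arithmetic such as `coverage / 100.0` raises TypeError.
    parser.add_argument("--coverage", type=int, default=20,
                        help="k-mer coverage threshold (assumed default: 20)")
    return parser

def coverage_fraction(args):
    # Fail with a clear message instead of a NoneType arithmetic error.
    if args.coverage is None:
        raise ValueError("coverage must be set")
    return args.coverage / 100.0

args_default = build_parser().parse_args([])
args_explicit = build_parser().parse_args(["--coverage", "40"])
```

With a default in place, omitting the flag behaves like the explicit `--coverage 20` workaround.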

Names for forwards and reverse reads does not match. Cannot continue

I am getting this error:

Names for forwards and reverse reads does not match. Cannot continue

My R1 file:

@NB551233:47:H5WGVAFXY:1:11101:24878:1062 1:N:0:CGAGGCTG+NCTCTAGG

My R2 file:

@NB551233:47:H5WGVAFXY:1:11101:24878:1062 2:N:0:CGAGGCTG+NCTCTAGG

CC: @aunderwo
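For diagnosing this kind of report, a small sketch that compares only the pair-independent part of two FASTQ headers. This is not how seroba or its dependencies compare names; the helper names `read_id` and `names_match` are hypothetical. It simply shows that the two headers above are identical once the comment after the first space is ignored.

```python
def read_id(header):
    """Return the pair-independent part of a FASTQ header line."""
    name = header.lstrip("@").split()[0]   # drop the comment after the first space
    if name.endswith(("/1", "/2")):        # drop old-style mate suffixes
        name = name[:-2]
    return name

def names_match(header_1, header_2):
    # True when both mates carry the same read identifier.
    return read_id(header_1) == read_id(header_2)

r1 = "@NB551233:47:H5WGVAFXY:1:11101:24878:1062 1:N:0:CGAGGCTG+NCTCTAGG"
r2 = "@NB551233:47:H5WGVAFXY:1:11101:24878:1062 2:N:0:CGAGGCTG+NCTCTAGG"
```

If such a check passes for the first records but the tool still complains, the mismatch may be further down the files, for example when one file was filtered or trimmed independently of the other.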

AttributeError: 'Namespace' object has no attribute 'database_dir'

% seroba getPneumocat db

Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/bin/seroba", line 4, in <module>
    __import__('pkg_resources').run_script('seroba==0.1.4', 'seroba')
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 742, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1510, in run_script
    exec(script_code, namespace, namespace)
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/seroba-0.1.4-py3.6.egg/EGG-INFO/scripts/seroba", line 86, in <module>
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/seroba-0.1.4-py3.6.egg/seroba/tasks/getPneumocat.py", line 6, in run
AttributeError: 'Namespace' object has no attribute 'database_dir'

Can't build DB? False | ariba prepare_ref | no such file

False
no such file
ariba prepareref -f /home/linuxbrew/db/temp_aribaX5nkguz0r/temp_fasta_ref.fasta -m /home/linuxbrew/db/temp_aribaX5nkguz0r/temp_meta_ref.tsv --cdhit_clusters /home/linuxbrew/db/temp_aribaX5nkguz0r/cdhit_cluster_ref seroba/ariba_db/01/ref
False
no such file
ariba prepareref -f /home/linuxbrew/db/temp_aribaXxkf8ld23/temp_fasta_ref.fasta -m /home/linuxbrew/db/temp_aribaXxkf8ld23/temp_meta_ref.tsv --cdhit_clusters /home/linuxbrew/db/temp_aribaXxkf8ld23/cdhit_cluster_ref seroba/ariba_db/02/ref
False
no such file
ariba prepareref -f /home/linuxbrew/db/temp_aribaXhf22fy64/temp_fasta_ref.fasta -m /home/linuxbrew/db/temp_aribaXhf22fy64/temp_meta_ref.tsv --cdhit_clusters /home/linuxbrew/db/temp_aribaXhf22fy64/cdhit_cluster_ref seroba/ariba_db/03/ref
False
no such file
ariba prepareref -f /home/linuxbrew/db/temp_aribaXkkhn_2b7/temp_fasta_ref.fasta -m /home/linuxbrew/db/temp_aribaXkkhn_2b7/temp_meta_ref.tsv --cdhit_clusters /home/linuxbrew/db/temp_aribaXkkhn_2b7/cdhit_cluster_ref seroba/ariba_db/04/ref
False

Dependencies issues when using seroba through conda

Hello! I have encountered a few issues while trying to run seroba in a conda environment. One is the biopython problem reported in issue #59; the other involves bowtie2. In the latter case, seroba starts properly but then stops with an error about not being able to get the bowtie2 --version. The specific error that bowtie2 throws is this:

error while loading shared libraries: libtbb.so.2: cannot open shared object file: No such file or directory

I tracked down both errors, and it turns out that in order to use seroba you need to list biopython=1.74 (see this forum) and tbb=2020.3 (see this other forum) as dependencies. I ran a few tests and this seems to work fine for me (with seroba 1.0.0 and 1.0.2).

It would be really helpful to have these dependencies properly documented (I assume the error is not unique to the conda version, but I did not test this) or simply included in the installation.

The "cd_cluster.tsv" created recently is different from the previous one

cd_cluster_old.txt
cd_cluster_new.txt

cd_cluster_old.txt is the previous file, and cd_cluster_new.txt is the new cd_cluster.tsv that the program created when I built another copy of the database on a different device.
Is this due to the newer version of KMC or Python 3 that I used on the new device?

The new cd_cluster.tsv appears to have some problems and causes seroba serotyping to end with errors for some serotypes.
Please let me know whether you can reproduce the problem and whether there is a solution.

Josh

Readme mentions SVN on a git database

For SeroBA version 0.1.3 and greater, download the database provided within this git repository:

Install svn
svn checkout "https://github.com/sanger-pathogens/seroba/trunk/database"

serotyping from assemblies as input

Dear Friends,

I don't have reads, only assemblies (one or more contigs) of S. pneumoniae.
How can I assign the serotype?
SeroBA doesn't appear to support FASTA input, right?

Bests,
Alex

No detailed output file

Hello.

Nice tool! Thank you.

I seem to have run into unexpected behaviour. When I run version 1.0.1 on my test case, I only get a pred.tsv file with three columns. I see no detailed_serogroup_info.txt. Admittedly, I have only run it on a single sample, and the third column suggests it might be contaminated. I am wondering whether the detailed_serogroup_info file is missing because the sample appears contaminated.

Thank you.

Anders.

TypeError: argument of type 'NoneType' is not iterable

On a sample, SeroBA encounters a fatal Python error

cluster detected 1 threads available to it
cluster reported completion
cluster_3 detected 1 threads available to it
cluster_3 reported completion
cluster_4 detected 1 threads available to it
cluster_4 reported completion
cluster_6 detected 1 threads available to it
cluster_6 reported completion

0.013121071707115657
/seroba-1.0.2/build/kmc_tools simple /home/ubuntu/local-repo/gps-unified-pipeline/work/64/bd269cc67f479cb03ad3532be5311d/temp.kmcl69cg5cc/NP-0087-IDRL-AKU_S92_trimmed seroba/database/kmer_db/11F/11F intersect /home/ubuntu/local-repo/gps-unified-pipeline/work/64/bd269cc67f479cb03ad3532be5311d/temp.kmcl69cg5cc/inter
0.03264454465908326
/seroba-1.0.2/build/kmc_tools simple /home/ubuntu/local-repo/gps-unified-pipeline/work/64/bd269cc67f479cb03ad3532be5311d/temp.kmcl69cg5cc/NP-0087-IDRL-AKU_S92_trimmed seroba/database/kmer_db/06C/06C intersect /home/ubuntu/local-repo/gps-unified-pipeline/work/64/bd269cc67f479cb03ad3532be5311d/temp.kmcl69cg5cc/inter
0.034005116366132154
/seroba-1.0.2/build/kmc_tools simple /home/ubuntu/local-repo/gps-unified-pipeline/work/64/bd269cc67f479cb03ad3532be5311d/temp.kmcl69cg5cc/NP-0087-IDRL-AKU_S92_trimmed seroba/database/kmer_db/10A/10A intersect /home/ubuntu/local-repo/gps-unified-pipeline/work/64/bd269cc67f479cb03ad3532be5311d/temp.kmcl69cg5cc/inter
0.018366189193022266
15C
{'15A': 0, '15B': 0, '15C': 0, '15F': 16}
15A
{'genes': [], 'pseudo': [], 'allele': [], 'snps': []}
15B
{'genes': [], 'pseudo': [], 'allele': [], 'snps': []}
15C
{'genes': [], 'pseudo': [], 'allele': [], 'snps': []}
15C
15F
{'genes': [], 'pseudo': [], 'allele': [], 'snps': []}
{'15A': -1, '15B': 0, '15C': -2.5, '15F': 15}
{'15A': {'genes': [], 'pseudo': [], 'allele': [], 'snps': []}, '15B': {'genes': [], 'pseudo': [], 'allele': [], 'snps': []}, '15C': {'genes': [], 'pseudo': ['wciZ'], 'allele': [], 'snps': []}, '15F': {'genes': [], 'pseudo': [], 'allele': [], 'snps': []}}
['15A', '15C']
15A
15B/15C
15C
None
Traceback (most recent call last):
  File "/usr/local/bin/seroba", line 4, in <module>
    __import__('pkg_resources').run_script('seroba==1.0.2', 'seroba')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1445, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python3.6/dist-packages/seroba-1.0.2-py3.6.egg/EGG-INFO/scripts/seroba", line 86, in <module>
  File "/usr/local/lib/python3.6/dist-packages/seroba-1.0.2-py3.6.egg/seroba/tasks/sero_run.py", line 19, in run
  File "/usr/local/lib/python3.6/dist-packages/seroba-1.0.2-py3.6.egg/seroba/serotyping.py", line 481, in run
  File "/usr/local/lib/python3.6/dist-packages/seroba-1.0.2-py3.6.egg/seroba/serotyping.py", line 453, in _prediction
  File "/usr/local/lib/python3.6/dist-packages/seroba-1.0.2-py3.6.egg/seroba/serotyping.py", line 397, in _find_serotype
TypeError: argument of type 'NoneType' is not iterable

The relevant part in the code is

seroba/seroba/serotyping.py

Lines 392 to 397 in 8138dc8

if mixed_serotype != None:
    for key in min_keys:
        print(key)
        print(mixed_serotype)
        if key not in mixed_serotype:
            mixed_serotype = None

I think this piece of code has a logic flaw.

While iterating through min_keys in line 393:

when if key not in mixed_serotype at line 396 is true, mixed_serotype is set to None.
In the next iteration, if key not in mixed_serotype at line 396 effectively becomes if key not in None, which leads to the Python error TypeError: argument of type 'NoneType' is not iterable

Fixing this seems trivial, but I am not sure which approach is the right one:

loop through all min_keys, and set mixed_serotype to None only when none of the keys is in mixed_serotype
loop through all min_keys, and as soon as any key is not in mixed_serotype, set mixed_serotype to None and exit the loop
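The two candidate fixes can be sketched as standalone functions. The function names are hypothetical, and which behaviour matches the intended serotyping logic is exactly the open question of this issue; the sketch only shows that neither variant can hit `key not in None`.

```python
def keep_if_any_key_present(mixed_serotype, min_keys):
    """Option 1: discard only when none of the keys occurs in mixed_serotype."""
    if mixed_serotype is None:
        return None
    if all(key not in mixed_serotype for key in min_keys):
        return None
    return mixed_serotype

def keep_if_all_keys_present(mixed_serotype, min_keys):
    """Option 2: discard as soon as one key is missing, then stop looping."""
    if mixed_serotype is None:
        return None
    for key in min_keys:
        if key not in mixed_serotype:
            return None  # returning here ends the loop before None is tested
    return mixed_serotype

# Values taken from the log output above.
min_keys = ['15A', '15C']
mixed = '15B/15C'
```

For the sample in the log, option 1 keeps '15B/15C' (because '15C' is present) while option 2 discards it (because '15A' is absent), so the two fixes would give different results for this sample.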

pkg_resources.ResolutionError: No script named 'seroba'

I reinstalled from git (git clone seroba && cd seroba && python3 setup.py install):

% seroba

Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/bin/seroba", line 4, in <module>
    __import__('pkg_resources').run_script('seroba==0.1.5', 'seroba')
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 748, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1509, in run_script
    raise ResolutionError("No script named %r" % script_name)
pkg_resources.ResolutionError: No script named 'seroba'

error

Hello,
I installed seroba and downloaded the database from PneumoCaT, but I am not able to create the database.
(py36)[ctsui@grl-salk Strep_sero]$ seroba createDBs pneumoDB 71
Traceback (most recent call last):
  File "/home/ctsui/.conda/envs/py36/bin/seroba", line 86, in <module>
    args.func(args)
  File "/home/ctsui/.conda/envs/py36/lib/python3.6/site-packages/seroba/tasks/createDBs.py", line 10, in run
    ref_db.run()
  File "/home/ctsui/.conda/envs/py36/lib/python3.6/site-packages/seroba/ref_db_creator.py", line 237, in run
    os.makedirs(os.path.join(self.out_dir,'ariba_db'))
  File "/home/ctsui/.conda/envs/py36/lib/python3.6/os.py", line 220, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists: 'pneumoDB/ariba_db'

I would appreciate any advice! Thanks,

Clement
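The traceback shows `os.makedirs` failing because `pneumoDB/ariba_db` already exists, presumably left over from an earlier run. A minimal sketch of the standard-library behaviour involved; the helper name `ensure_dir` and the temporary directory are illustrative only, and whether reusing a partially built database directory is safe for seroba is not established here.

```python
import os
import tempfile

def ensure_dir(path):
    # exist_ok=True turns the call into a no-op when the directory exists,
    # instead of raising FileExistsError as in the traceback above.
    os.makedirs(path, exist_ok=True)
    return path

with tempfile.TemporaryDirectory() as out_dir:
    target = os.path.join(out_dir, "ariba_db")
    ensure_dir(target)   # first call creates the directory
    ensure_dir(target)   # second call succeeds silently
    created = os.path.isdir(target)
```

Without changing any code, deleting the leftover ariba_db directory (or building into a fresh output directory) before rerunning createDBs should avoid the same exception.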

Dodgy readme formatting

https://github.com/sanger-pathogens/seroba#debian-testingubuntu-1604-xenial

After kmc, error "Stopping! Signal received: 13"

I was wondering what this error means?

<snip>
0.02010796221322537
19A
cluster detected 1 threads available to it
cluster reported completion
Stopping! Signal received: 13

provide a pip3 install https:// install line

This would make it much easier

Can't build the ariba database

Hi,
After installing with conda install -c bioconda seroba, the ariba database cannot be built with seroba createDBs my_database/ 71, because the installed tbb=2021.2.0 is incompatible with bowtie2 (needed by ariba). After downgrading to tbb=2020.2 the bowtie2 issue is gone, but I get another error while building the database (see below and attached): the prepareref problem from issue #37, which persists despite PR #38, followed by a segmentation fault (core dumped). Any help would be greatly appreciated.
See also attached log.
False no such file ariba prepareref -f /mnt/c/Users/wille/Documents/Pneumococcus/ngsserotyping/db/temp_aribaXjuhh_i6n/temp_fasta_ref.fasta -m /mnt/c/Users/wille/Documents/Pneumococcus/ngsserotyping/db/temp_aribaXjuhh_i6n/temp_meta_ref.tsv --max_noncoding_length 50000 --cdhit_clusters /mnt/c/Users/wille/Documents/Pneumococcus/ngsserotyping/db/temp_aribaXjuhh_i6n/cdhit_cluster_ref /mnt/c/Users/wille/Documents/Pneumococcus/ngsserotyping/db/ariba_db/01/ref False

-ci1 -m1 -t1 -fm /home/wmiellet/anaconda3/envs/env-seroba/bin/kmc -k71 -ci1 -m1 -t1 -fm /mnt/c/Users/wille/Documents/Pneumococcus/ngsserotyping/db/kmer_db/01/01.fasta /mnt/c/Users/wille/Documents/Pneumococcus/ngsserotyping/db/kmer_db/01/01 /mnt/c/Users/wille/Documents/Pneumococcus/ngsserotyping/db/kmer_db/01 Segmentation fault (core dumped)
seroba_log.txt

Run Seroba with PneumoCat v1.2.1??

Hi,
Noticed an issue while setting up the database for Seroba. The program is downloading an outdated version of the Pneumocat database (v1.1). The subsequent versions of Pneumocat have included important revisions, particularly on serotype 15A and serogroup 19.

Is there a way to run Seroba using the Pneumocat database v1.2.1??
Thanks.

Swiss_NT

Hello
I am trying to understand the difference between "Swiss_NT" and "untypable" serotypes in Seroba. I have a set of non-encapsulated S. pneumoniae strains from the published literature which I serotyped again using Seroba. Some are called "Swiss_NT" while some are predicted as "untypable" by Seroba. I can not find any information on "Swiss_NT" in the Seroba manual. Is there any paper that explains this serotype?

Thanks
Tauqeer

Names for forwards and reverse reads does not match. Cannot continue

seroba runSerotyping databases=/data1/tools/seroba_db/database/ read1=1_S1_1.fq.gz read2= 1_S1_2.fq.gz prefix=/home/kdyoung/seroba/

version=1.0.2
sample names = 1_S1_1.fq.gz 1_S1_2.fq.gz

What is going wrong?

Please help.

Empty summary.tsv file

Hi,

I downloaded seroba via conda and ran it as recommended. However, although I get pred.tsv output for all genomes, I don't get a summary.tsv output. I used a for loop to run the commands for all my genome files since it was easier that way (see below):

#define serotype
for f in ./*_1P.fastq
do
 base=$(basename $f "_1P.fastq")
 basebam=$(basename $f "_L1_out_1P.fastq")
 seroba runSerotyping $out/Pneumocat-dir/ ${base}_1P.fastq ${base}_2P.fastq $out/summary_out/seroBA_${basebam} &&
 seroba summary $out/summary_out/seroBA_${basebam}
done

When I realized that I didn't get the summary output file, I decided to run the seroba summary command for just one output folder (instead of the for loop), but that still didn't work. As you can see in the snapshot below, the size of the file remains 0 KB.

Is there something I did wrong? Thanks for your help in advance.
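One possible explanation, stated as an assumption rather than a confirmed diagnosis: if `seroba summary` expects the parent folder that holds all per-sample output folders, then pointing it at a single sample folder would find no `pred.tsv` one level down and write an empty file. The sketch below imitates that lookup; `collect_predictions`, the sample names and the serotype calls are made up for illustration and do not reproduce seroba's own code.

```python
import csv
import os
import tempfile

def collect_predictions(parent_dir):
    """Gather the rows of every <parent_dir>/<sample>/pred.tsv file."""
    rows = []
    for sample in sorted(os.listdir(parent_dir)):
        pred_path = os.path.join(parent_dir, sample, "pred.tsv")
        if not os.path.isfile(pred_path):
            continue  # not a per-sample output folder
        with open(pred_path, newline="") as handle:
            rows.extend(csv.reader(handle, delimiter="\t"))
    return rows

# Demonstration on a throwaway layout with made-up sample names and calls.
with tempfile.TemporaryDirectory() as parent:
    for sample, call in (("seroBA_s1", "19A"), ("seroBA_s2", "23F")):
        os.makedirs(os.path.join(parent, sample))
        with open(os.path.join(parent, sample, "pred.tsv"), "w") as handle:
            handle.write(f"{sample}\t{call}\n")
    from_parent = collect_predictions(parent)
    from_sample = collect_predictions(os.path.join(parent, "seroBA_s1"))
```

Under that assumption, running the summary step once after the loop, on `$out/summary_out`, would be the thing to try.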

KeyError:'24B'

Hey everyone,

I am getting a KeyError when I am running seroba on my test data.

Traceback (most recent call last):
  File "/home/user/.local/bin/seroba", line 86, in <module>
    args.func(args)
  File "/home/user/.local/lib/python3.8/site-packages/seroba/tasks/sero_run.py", line 19, in run
    sero.run()
  File "/home/user/.local/lib/python3.8/site-packages/seroba/serotyping.py", line 479, in run
    cluster = self.serotype_cluster_dict[self.best_serotype]
KeyError: '24B'

I am not sure what it means or how to fix it. Maybe someone had the same or a similar problem ?

Thank you in advance!

Typo in docs

Copy the databse to a directory:

=> database

"Bash Run_nucmer.sh" Error

Dear Developer,

In what case will seroba run run_nucmer.sh? And where does the assembly file that nucmer requires, "/data/xxxxxxx/result/assemblies.fa", come from?

Do I need to provide assembly for seroba?

This probably means that very few reads were mapped at all. No local assemblies will be run
WARNING: not enough proper read pairs (found 0) to determine insert size.
This probably means that very few reads were mapped at all. No local assemblies will be run
The following command failed with exit code 255
bash run_nucmer.sh

The output was:

bash: warning: setlocale: LC_ALL: cannot change locale (en_US.utf8)
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = "en_US.utf8",
	LC_COLLATE = "C",
	LANG = "en_AU.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
1: PREPARING DATA
2,3: RUNNING mummer AND CREATING CLUSTERS
reading input file "p.ntref" of length 8097 
construct suffix tree for sequence of length 8097
(maximum reference length is 536870908)
(maximum query length is 4294967295)
CONSTRUCTIONTIME /usr/bin/mummer p.ntref 0.00
/usr/bin/mummer: cannot open file "/data/xxxxxxxx/result/assemblies.fa" or file "/data/xxxxxxx/result/assemblies.fa" is empty
ERROR: mummer and/or mgaps returned non-zero
ERROR: Could not parse delta file, p.delta
error no: 400
ERROR: Could not parse delta file, p.delta.filter
error no: 402
ERROR: Could not parse delta file, p.delta.filter
error no: 402

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pymummer/syscall.py", line 20, in run
    output = subprocess.check_output(cmd, shell=True, stderr=subprocess.STDOUT)
  File "/usr/lib/python3.7/subprocess.py", line 395, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.7/subprocess.py", line 487, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'bash run_nucmer.sh' returned non-zero exit status 255.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/seroba", line 4, in <module>
    __import__('pkg_resources').run_script('seroba==1.0.1', 'seroba')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1453, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python3.7/dist-packages/seroba-1.0.1-py3.7.egg/EGG-INFO/scripts/seroba", line 86, in <module>
  File "/usr/local/lib/python3.7/dist-packages/seroba-1.0.1-py3.7.egg/seroba/tasks/sero_run.py", line 19, in run
  File "/usr/local/lib/python3.7/dist-packages/seroba-1.0.1-py3.7.egg/seroba/serotyping.py", line 481, in run
  File "/usr/local/lib/python3.7/dist-packages/seroba-1.0.1-py3.7.egg/seroba/serotyping.py", line 453, in _prediction
  File "/usr/local/lib/python3.7/dist-packages/seroba-1.0.1-py3.7.egg/seroba/serotyping.py", line 269, in _find_serotype
  File "/usr/lib/python3/dist-packages/pymummer/nucmer.py", line 144, in run
    syscall.run('bash ' + script, verbose=self.verbose)
  File "/usr/lib/python3/dist-packages/pymummer/syscall.py", line 26, in run
    raise Error('Error running command:', cmd)
pymummer.syscall.Error: ('Error running command:', 'bash run_nucmer.sh')

Josh

seroba stopping signal received 28

cluster detected 1 threads available to it
cluster reported completion
The following command failed with exit code 1
rm -rf /media/crlkims/Data_Vol_1/ROSE/Wghole_genome_sequencing_pneumoniae/RawData_385/ERR1638455/fastq_files/seroba_out/ref/ariba.tmp.1_w_k7u0/cluster

The output was:

rm: cannot remove '/media/crlkims/Data_Vol_1/ROSE/Wghole_genome_sequencing_pneumoniae/RawData_385/ERR1638455/fastq_files/seroba_out/ref/ariba.tmp.1_w_k7u0/cluster': Directory not empty

Stopping! Signal received: 28
Traceback (most recent call last):
  File "/home/crlkims/anaconda2/bin/seroba", line 4, in <module>
    __import__('pkg_resources').run_script('seroba==1.0.1', 'seroba')
  File "/home/crlkims/.local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/crlkims/.local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1460, in run_script
    exec(script_code, namespace, namespace)
  File "/home/crlkims/anaconda2/lib/python3.6/site-packages/seroba-1.0.1-py3.6.egg/EGG-INFO/scripts/seroba", line 86, in <module>
  File "/home/crlkims/anaconda2/lib/python3.6/site-packages/seroba-1.0.1-py3.6.egg/seroba/tasks/sero_run.py", line 19, in run
  File "/home/crlkims/anaconda2/lib/python3.6/site-packages/seroba-1.0.1-py3.6.egg/seroba/serotyping.py", line 480, in run
  File "/home/crlkims/anaconda2/lib/python3.6/site-packages/seroba-1.0.1-py3.6.egg/seroba/serotyping.py", line 94, in _run_ariba_on_cluster
  File "/home/crlkims/anaconda2/lib/python3.6/shutil.py", line 120, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: 'seroba_out/genes/assembled_genes.fa.gz'

I am getting report.tsv, but it is not complete.

YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated,

seroba  getPneumocat seroba
--2019-04-03 09:34:36--  https://github.com/phe-bioinformatics/PneumoCaT/archive/v1.1.tar.gz
Resolving github.com (github.com)... 192.30.255.113, 192.30.255.112
Connecting to github.com (github.com)|192.30.255.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/phe-bioinformatics/PneumoCaT/tar.gz/v1.1 [following]
--2019-04-03 09:34:37--  https://codeload.github.com/phe-bioinformatics/PneumoCaT/tar.gz/v1.1
Resolving codeload.github.com (codeload.github.com)... 192.30.255.120, 192.30.255.121
Connecting to codeload.github.com (codeload.github.com)|192.30.255.120|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: 'v1.1.tar.gz'

v1.1.tar.gz                  [              <=>               ] 320.07M  3.77MB/s    in 78s     

2019-04-03 09:35:56 (4.09 MB/s) - 'v1.1.tar.gz' saved [335618305]

/home/linuxbrew/.linuxbrew/opt/python/lib/python3.7/site-packages/seroba/get_pneumocat_data.py:47: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  allele_snp=yaml.load( open( os.path.join(serogroup_dir,subdir,'mutationdb.yml'), "rb" ) )

Got an error message through docker with Ubuntu 20.04 LTS

When I run seroBA using docker 20.10.1 on Ubuntu 20.04 LTS, I get the following error message:

Traceback (most recent call last):
  File "/usr/local/bin/seroba", line 4, in <module>
    __import__('pkg_resources').run_script('seroba==1.0.2', 'seroba')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 650, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1453, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python3.8/dist-packages/seroba-1.0.2-py3.8.egg/EGG-INFO/scripts/seroba", line 86, in <module>
  File "/usr/local/lib/python3.8/dist-packages/seroba-1.0.2-py3.8.egg/seroba/tasks/sero_run.py", line 13, in run
  File "/usr/local/lib/python3.8/dist-packages/seroba-1.0.2-py3.8.egg/seroba/serotyping.py", line 34, in __init__
TypeError: unsupported operand type(s) for /: 'NoneType' and 'float'

How can I solve this problem?

latest build on docker hub is non functional

Greetings,

The latest build on dockerhub (tag latest) is non-functional. The program is exiting with the following error:

(base) cimendes@lobo-1:~/pneumo/in_silico_serotype/seroba_fabio_2021$ srun --nodes=1 --ntasks=1 --cpus-per-task=4 shifter --image=sangerpathogens/seroba:latest seroba runSerotyping /seroba/database/ /home/cimendes/pneumo/in_silico_serotype/seroba_fabio_2021/concatenated_reads/2017PP664_1.fastq.gz /home/cimendes/pneumo/in_silico_serotype/seroba_fabio_2021/concatenated_reads/2017PP664_2.fastq.gz /mnt/nfs/lobo/ONEIDA-NFS/cimendes/pneumo/in_silico_serotype/seroba_fabio_2021/outdir/2017PP664/seroba;
Traceback (most recent call last):
  File "/usr/local/bin/seroba", line 4, in <module>
    __import__('pkg_resources').run_script('seroba==1.0.2', 'seroba')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 650, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1453, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python3.8/dist-packages/seroba-1.0.2-py3.8.egg/EGG-INFO/scripts/seroba", line 86, in <module>
  File "/usr/local/lib/python3.8/dist-packages/seroba-1.0.2-py3.8.egg/seroba/tasks/sero_run.py", line 13, in run
  File "/usr/local/lib/python3.8/dist-packages/seroba-1.0.2-py3.8.egg/seroba/serotyping.py", line 34, in __init__
TypeError: unsupported operand type(s) for /: 'NoneType' and 'float'
srun: error: compute-1: task 0: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=2113593.0

Running the same command with build sangerpathogens/seroba:remove_sanger_pathogen_email works as expected.

Hi..any suggestions on how this error was resolved for you?

Hi, any suggestions on how this error was resolved for you? I am using a Singularity image on CentOS 7 without admin privileges.

Originally posted by @sreerampeela in #58 (comment)

errors during installation via conda/mamba

Dear Lennard,

Is the tool seroba still maintained?
We had issues with the installation via conda and mamba.

#Standard installation from readme.md
conda install -c bioconda seroba
--> error with incompatible glibc versions.

#Standard installation from bioconda
mamba install seroba
--> seroba does not exist

Could not solve for environment specs
The following package could not be installed
└─ seroba does not exist (perhaps a typo or a missing channel).

#Installation with custom yml file with information from this post
#conda conda env create -f seroba.yml
--> worked, but after setting up databases, we got this error:

ERROR: I tried to get the version of nucmer with: "/mnt/localdata/homes/user/miniconda3/envs/seroba/bin/nucmer --version" and the output didn't match this regular expression: "^NUCmer (NUCleotide MUMmer) version ([0-9.]+)"
Something wrong with at least one dependency. Please see the above error message(s)
Traceback (most recent call last):
  File "/mnt/localdata/homes/user/miniconda3/envs/seroba/bin/seroba", line 3, in <module>
    import seroba
  File "/mnt/localdata/homes/user/miniconda3/envs/seroba/lib/python3.6/site-packages/seroba/__init__.py", line 16, in <module>
    from seroba import *
  File "/mnt/localdata/homes/user/miniconda3/envs/seroba/lib/python3.6/site-packages/seroba/kmc.py", line 6, in <module>
    ext_progs = external_progs.ExternalProgs()
  File "/mnt/localdata/homes/user/miniconda3/envs/seroba/lib/python3.6/site-packages/seroba/external_progs.py", line 90, in __init__
    raise Error('Dependency error(s). Cannot continue')

We then removed lines 15 and 27 in the script external_progs.py.

Now we can call seroba, but we are not sure whether it really works.

Thanks for your time and input on this topic!

All the best,
Markus

install from source

Would it be possible to install seroBA from source?

Thanks

Job runs for 17+ hours without completion

Hi there,

I have run a few hundred samples through SeroBA v1.0.2 using the sangerpathogens/seroba docker image. Six of them failed to complete, producing the following log output and continuing to run for 17+ hours before I aborted the job. QC of the reads and assemblies from these samples looked fine. Do you know what might be causing this issue and how we might avoid it?

Thanks in advance, Emma

Stage 1: 0% Stage 1: 26% Stage 1: 52% Stage 1: 78% Stage 1: 100%
Stage 2: 100%
1st stage: 3.11065s
2nd stage: 0.071989s
Total : 3.18264s
Tmp size : 0MB

Stats:
No. of k-mers below min. threshold : 0
No. of k-mers above max. threshold : 0
No. of unique k-mers : 0
No. of unique counted k-mers : 0
Total no. of k-mers : 0
Total no. of reads : 1476720
Total no. of super-k-mers : 0
in1: 0% in1: 0% in2: 0%

biopython issue!

I installed seroba with conda and encountered this message.

$ seroba
Traceback (most recent call last):
  File "/home/ctsui/.conda/envs/seroBA/bin/seroba", line 3, in <module>
    import seroba
  File "/home/ctsui/.conda/envs/seroBA/lib/python3.6/site-packages/seroba/__init__.py", line 16, in <module>
    from seroba import *
  File "/home/ctsui/.conda/envs/seroBA/lib/python3.6/site-packages/seroba/tasks/__init__.py", line 10, in <module>
    from seroba.tasks import *
  File "/home/ctsui/.conda/envs/seroBA/lib/python3.6/site-packages/seroba/tasks/getPneumocat.py", line 2, in <module>
    from seroba import get_pneumocat_data
  File "/home/ctsui/.conda/envs/seroBA/lib/python3.6/site-packages/seroba/get_pneumocat_data.py", line 6, in <module>
    from Bio.Alphabet import generic_dna
  File "/home/ctsui/.conda/envs/seroBA/lib/python3.6/site-packages/Bio/Alphabet/__init__.py", line 21, in <module>
    "Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information."
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.

Error:

Hi,
I have been using seroba v1.0.1 recently, but after changing to seroba v1.0.2 I started getting the error "min memory must be at least 2GB". The same thing happens even after reverting to v1.0.1.

Error: Wrong parameret: min memory must be at least 2GB
/phe/tools/miniconda3/envs/phetype/bin/kmc_tools simple /scratch/iidlleo/Spneumoniae/temp.kmctmwj04if/2212515304 /phe/tools/seroba/database/kmer_db/35B/35B intersect /scratch/iidlleo/Spneumoniae/temp.kmctmwj04if/inter
Error: Cannot open file /scratch/iidlleo/Spneumoniae/temp.kmctmwj04if/2212515304.kmc_pre
Error: Cannot open file /scratch/iidlleo/Spneumoniae/temp.kmctmwj04if/inter.kmc_pre
Traceback (most recent call last):
  File "/phe/tools/miniconda3/envs/phetype/bin/seroba", line 86, in <module>
    args.func(args)
  File "/phe/tools/miniconda3/envs/phetype/lib/python3.6/site-packages/seroba/tasks/sero_run.py", line 19, in run
    sero.run()
  File "/phe/tools/miniconda3/envs/phetype/lib/python3.6/site-packages/seroba/serotyping.py", line 468, in run
    self._run_kmc()
  File "/phe/tools/miniconda3/envs/phetype/lib/python3.6/site-packages/seroba/serotyping.py", line 68, in _run_kmc
    with open( temp_hist, 'r') as fobj:
FileNotFoundError: [Errno 2] No such file or directory: '/scratch/iidlleo/Spneumoniae/temp.kmctmwj04if/hist'

Is there something missing in my dependencies?

11A/11C Misidentification

Hello!

I work at the Minnesota Department of Health. Internally, we have been using Seroba as a replacement for our conventional/molecular serotyping of Strep pneumo for a few months now.

We recently sequenced a handful of 11As (previously serotyped by quellung) that Seroba calls 11C. We've tried running Seroba in a number of different environments/containers and it consistently predicts 11C. Manually mapping the reads to each cps locus also seems to give better results for 11A than for 11C.

Any idea what might be happening here? I'd be happy to share the sequence data privately.

Thanks!

nucmer dependency error

Hi,

I have installed KMC and Mummer. They are in the path and work. However, when I try to run seroba, it complains:

ERROR: I tried to get the version of nucmer with: "/apps/mummer/4.0.0.beta2/bin/nucmer --version" and the output didn't match this regular expression: "^NUCmer \(NUCleotide MUMmer\) version ([0-9\.]+)"

When I run /apps/mummer/4.0.0.beta2/bin/nucmer --version I get 4.0.0beta2, which indeed does not match what seroba expects.
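
The mismatch can be reproduced in isolation. The sketch below assumes the expected pattern escapes literal parentheses around `NUCleotide MUMmer`, and that an older MUMmer prints a banner such as `NUCmer (NUCleotide MUMmer) version 3.1`. The banner string, the fallback pattern `LOOSE`, and the function name `parse_nucmer_version` are illustrative assumptions, not the version checker's actual code.

```python
import re

# Pattern as quoted in the error message (parentheses escaped).
STRICT = re.compile(r"^NUCmer \(NUCleotide MUMmer\) version ([0-9\.]+)")
# Hypothetical fallback: accept a bare version string such as "4.0.0beta2".
LOOSE = re.compile(r"^([0-9]+(?:\.[0-9]+)*)")


def parse_nucmer_version(output):
    """Return the numeric version found in `nucmer --version` output, or None."""
    for line in output.splitlines():
        match = STRICT.search(line) or LOOSE.search(line.strip())
        if match:
            return match.group(1)
    return None


# Banner format assumed for an older MUMmer release.
print(parse_nucmer_version("NUCmer (NUCleotide MUMmer) version 3.1"))
# Output reported above for MUMmer 4.0.0beta2.
print(parse_nucmer_version("4.0.0beta2"))
```

With the strict pattern alone, the second call would find nothing, which is what the error message describes.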

Seroba test error while installed with conda

I installed seroba in a new conda environment. Running the test as described failed. The full STDOUT of the test run is attached:
seroba_error.txt

gzip: No such file or directory

Hey everybody,

Because I couldn't fix the problem executing seroba installed from source, I tried to run it using the provided docker image.
Unfortunately I am still not able to get it to work. I get the following error message. Could someone help me with that?

sudo docker run --rm -it -v /home/user/Test:/data sangerpathogens/seroba seroba runSerotyping seroba/database '/home/user/RKI4410_S1_L001_R1.fastq.gz' '/home/user/RKI4410_S1_L001_R2.fastq.gz' '/home/user/Test/output_test'
gzip: /home/user/RKI4410_S1_L001_R1.fastq.gz: No such file or directory
Traceback (most recent call last):
  File "/usr/local/bin/seroba", line 4, in <module>
    __import__('pkg_resources').run_script('seroba==1.0.2', 'seroba')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1469, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python3.7/dist-packages/seroba-1.0.2-py3.7.egg/EGG-INFO/scripts/seroba", line 86, in <module>
  File "/usr/local/lib/python3.7/dist-packages/seroba-1.0.2-py3.7.egg/seroba/tasks/sero_run.py", line 19, in run
  File "/usr/local/lib/python3.7/dist-packages/seroba-1.0.2-py3.7.egg/seroba/serotyping.py", line 468, in run
  File "/usr/local/lib/python3.7/dist-packages/seroba-1.0.2-py3.7.egg/seroba/serotyping.py", line 60, in _run_kmc
  File "/usr/local/lib/python3.7/dist-packages/seroba-1.0.2-py3.7.egg/seroba/kmc.py", line 10, in run_kmc
  File "/usr/local/lib/python3.7/dist-packages/seroba-1.0.2-py3.7.egg/seroba/common.py", line 42, in detect_sequence_format
  File "/usr/lib/python3/dist-packages/pyfastaq/utils.py", line 15, in open_file_read
    raise Error("Error opening for reading gzipped file '" + filename + "'")
pyfastaq.utils.Error: Error opening for reading gzipped file '/home/user/RKI4410_S1_L001_R1.fastq.gz'

Thank you very much in advance!

Karsten
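
The `gzip` error is consistent with host paths being handed to a process that only sees the container's filesystem. With `-v /home/user/Test:/data`, only files under `/home/user/Test` are visible inside the container, and they appear under `/data`. A minimal sketch of that mapping follows; the helper name `to_container_path` is hypothetical and not part of seroba or Docker.

```python
from pathlib import PurePosixPath


def to_container_path(host_path, host_dir, container_dir):
    """Translate a host path into the path seen inside the container.

    Raises ValueError if host_path is not under the mounted host_dir,
    because such a file is not visible inside the container at all.
    """
    relative = PurePosixPath(host_path).relative_to(PurePosixPath(host_dir))
    return str(PurePosixPath(container_dir) / relative)


# Mount taken from the command in the report: -v /home/user/Test:/data
print(to_container_path("/home/user/Test/output_test", "/home/user/Test", "/data"))

try:
    to_container_path("/home/user/RKI4410_S1_L001_R1.fastq.gz", "/home/user/Test", "/data")
except ValueError as err:
    # The read files sit outside the mounted directory.
    print("not mounted:", err)
```

Under this reading, the reads would need to sit in a directory passed with `-v`, and the command would refer to them by their `/data/...` paths rather than their host paths.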

Conda install 'createDBs' and 'getPneumocat' errors

Hello!

I was thinking about including the databases in the bioconda recipe, but I'm running into some errors.

I installed Seroba using mamba

mamba create -n test-seroba -c conda-forge -c bioconda seroba

I'm getting the following errors:

`createDBs`

seroba createDBs database 71
Traceback (most recent call last):
  File "/home/robert_petit/miniconda3/envs/test-seroba/bin/seroba", line 86, in <module>
    args.func(args)
  File "/home/robert_petit/miniconda3/envs/test-seroba/lib/python3.8/site-packages/seroba/tasks/createDBs.py", line 10, in run
    ref_db.run()
  File "/home/robert_petit/miniconda3/envs/test-seroba/lib/python3.8/site-packages/seroba/ref_db_creator.py", line 236, in run
    self.meta_dict = self._read_meta_data_tsv(self.meta_data_tsv)
  File "/home/robert_petit/miniconda3/envs/test-seroba/lib/python3.8/site-packages/seroba/ref_db_creator.py", line 183, in _read_meta_data_tsv
    with open(meta_data_tsv,'r') as fobj:
FileNotFoundError: [Errno 2] No such file or directory: 'database/meta.tsv'

`getPneumocat`

seroba getPneumocat database
--2022-01-26 02:36:35--  https://github.com/phe-bioinformatics/PneumoCaT/archive/v1.1.tar.gz
Resolving github.com (github.com)... 140.82.113.3
Connecting to github.com (github.com)|140.82.113.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/phe-bioinformatics/PneumoCaT/tar.gz/v1.1 [following]
--2022-01-26 02:36:35--  https://codeload.github.com/phe-bioinformatics/PneumoCaT/tar.gz/v1.1
Resolving codeload.github.com (codeload.github.com)... 140.82.114.9
Connecting to codeload.github.com (codeload.github.com)|140.82.114.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘v1.1.tar.gz’

v1.1.tar.gz    [ <=> ] 320.07M  24.1MB/s    in 13s

2022-01-26 02:36:48 (24.0 MB/s) - ‘v1.1.tar.gz’ saved [335618305]

Traceback (most recent call last):
  File "/home/robert_petit/miniconda3/envs/test-seroba/bin/seroba", line 86, in <module>
    args.func(args)
  File "/home/robert_petit/miniconda3/envs/test-seroba/lib/python3.8/site-packages/seroba/tasks/getPneumocat.py", line 7, in run
    pneumo.run()
  File "/home/robert_petit/miniconda3/envs/test-seroba/lib/python3.8/site-packages/seroba/get_pneumocat_data.py", line 105, in run
    self._pneumocat_db_2_tsv(self.serogroup_dir,self.out_file)
  File "/home/robert_petit/miniconda3/envs/test-seroba/lib/python3.8/site-packages/seroba/get_pneumocat_data.py", line 47, in _pneumocat_db_2_tsv
    allele_snp=yaml.load( open( os.path.join(serogroup_dir,subdir,'mutationdb.yml'), "rb" ) )
TypeError: load() missing 1 required positional argument: 'Loader'
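
This `TypeError` matches the behaviour of PyYAML 6, where `yaml.load` requires an explicit `Loader` argument. Below is a minimal sketch of a call that names its loader, assuming the `mutationdb.yml` files contain plain data that `yaml.safe_load` can parse. The function name `load_mutation_db` and the sample content are illustrative, not seroba's code or PneumoCaT's data.

```python
import io

import yaml  # PyYAML, a third-party package


def load_mutation_db(stream):
    """Parse a YAML stream without relying on yaml.load's old default Loader."""
    # safe_load is equivalent to yaml.load(stream, Loader=yaml.SafeLoader).
    return yaml.safe_load(stream)


# Made-up content standing in for a mutationdb.yml file opened in "rb" mode.
sample = io.BytesIO(b"example_key: [a, b]\nnested: {x: 1}\n")
print(load_mutation_db(sample))
```

If the files rely on Python-specific YAML tags, `safe_load` would reject them and a different loader would have to be chosen deliberately.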

Here's my conda env if you think it might be helpful

conda env export
name: test-seroba
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=1_gnu
  - _sysroot_linux-64_curr_repodata_hack=3=h5bd9786_13
  - ariba=2.14.6=py38hc37a69a_2
  - bcftools=1.14=hde04aa1_1
  - beautifulsoup4=4.10.0=pyha770c72_0
  - biopython=1.77=py38h1e0a361_1
  - bowtie2=2.2.5=py38h8c62d01_8
  - brotli=1.0.9=h7f98852_6
  - brotli-bin=1.0.9=h7f98852_6
  - bzip2=1.0.8=h7f98852_4
  - c-ares=1.18.1=h7f98852_0
  - ca-certificates=2021.10.26=h06a4308_2
  - cd-hit=4.8.1=h2e03b76_5
  - certifi=2021.10.8=py38h578d9bd_1
  - cycler=0.11.0=pyhd8ed1ab_0
  - dendropy=4.5.2=pyh3252c3a_0
  - fonttools=4.29.0=py38h497a2fe_0
  - freetype=2.11.0=h70c0345_0
  - gettext=0.21.0=hf68c758_0
  - giflib=5.2.1=h516909a_2
  - gsl=2.7=he838d99_0
  - htslib=1.14=h5138463_1
  - icu=69.1=h9c3ff4c_0
  - jpeg=9d=h516909a_0
  - kernel-headers_linux-64=3.10.0=h4a8ded7_13
  - kiwisolver=1.3.2=py38h1fd1430_1
  - kmc=3.2.1=h95f258a_1
  - krb5=1.19.2=hcc1bbae_3
  - lcms2=2.12=hddcbb42_0
  - ld_impl_linux-64=2.36.1=hea4e1c9_2
  - libblas=3.9.0=13_linux64_openblas
  - libbrotlicommon=1.0.9=h7f98852_6
  - libbrotlidec=1.0.9=h7f98852_6
  - libbrotlienc=1.0.9=h7f98852_6
  - libcblas=3.9.0=13_linux64_openblas
  - libcurl=7.81.0=h2574ce0_0
  - libdeflate=1.9=h7f98852_0
  - libedit=3.1.20210714=h7f8727e_0
  - libev=4.33=h516909a_1
  - libffi=3.4.2=h7f98852_5
  - libgcc-ng=11.2.0=h1d223b6_12
  - libgfortran-ng=11.2.0=h69a702a_12
  - libgfortran5=11.2.0=h5c6108e_12
  - libgomp=11.2.0=h1d223b6_12
  - libiconv=1.16=h516909a_0
  - libidn2=2.3.2=h7f98852_0
  - liblapack=3.9.0=13_linux64_openblas
  - libnghttp2=1.46.0=hce63b2e_0
  - libnsl=2.0.0=h7f98852_0
  - libopenblas=0.3.18=pthreads_h8fe5266_0
  - libpng=1.6.37=hed695b0_2
  - libssh2=1.10.0=ha56f1ee_2
  - libstdcxx-ng=11.2.0=he4da1e4_12
  - libtiff=4.2.0=hf544144_3
  - libunistring=0.9.10=h14c3975_0
  - libwebp=1.2.0=h3452ae3_0
  - libwebp-base=1.2.0=h7f98852_2
  - libxml2=2.9.12=h885dcf4_1
  - libzlib=1.2.11=h36c2ea0_1013
  - llvm-openmp=8.0.1=hc9558a2_0
  - lz4-c=1.9.3=h9c3ff4c_1
  - matplotlib-base=3.5.1=py38hf4fb855_0
  - mummer=3.23=pl5321h1b792b2_13
  - munkres=1.1.4=pyh9f0ad1d_0
  - ncurses=6.2=h58526e2_4
  - numpy=1.22.1=py38h6ae9a64_0
  - olefile=0.46=pyh9f0ad1d_1
  - openmp=8.0.1=0
  - openssl=1.1.1m=h7f8727e_0
  - packaging=21.3=pyhd8ed1ab_0
  - perl=5.32.1=1_h7f98852_perl5
  - pillow=8.4.0=py38h5aabda8_0
  - pip=21.3.1=pyhd8ed1ab_0
  - pyfastaq=3.17.0=py_2
  - pymummer=0.10.3=py_2
  - pyparsing=3.0.7=pyhd8ed1ab_0
  - pysam=0.17.0=py38h104f7d5_1
  - python=3.8.12=hb7a2778_2_cpython
  - python-dateutil=2.8.2=pyhd8ed1ab_0
  - python_abi=3.8=2_cp38
  - pyyaml=6.0=py38h497a2fe_3
  - readline=8.1=h46c0cb4_0
  - samtools=1.14=hb421002_0
  - seroba=1.0.2=py_0
  - setuptools=60.5.0=py38h578d9bd_0
  - six=1.16.0=pyh6c4a22f_0
  - soupsieve=2.3.1=pyhd8ed1ab_0
  - spades=3.15.3=h95f258a_1
  - sqlite=3.37.0=h9cd32fc_0
  - sysroot_linux-64=2.17=h4a8ded7_13
  - tk=8.6.11=h27826a3_1
  - unicodedata2=14.0.0=py38h497a2fe_0
  - wget=1.20.3=ha56f1ee_1
  - wheel=0.37.1=pyhd8ed1ab_0
  - xz=5.2.5=h516909a_1
  - yaml=0.2.5=h7f98852_2
  - zlib=1.2.11=h36c2ea0_1013
  - zstd=1.5.2=ha95c52a_0
prefix: /home/robert_petit/miniconda3/envs/test-seroba

Make a git tag for 1.0.2 ?

A git tag v1.0.2 && git push --tags would be great so we can fix/update the conda package.

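	# Drop the mixed-serotype call if any key in min_keys is missing from it.
	if mixed_serotype is not None:
		for key in min_keys:
			print(key)
			print(mixed_serotype)
			if key not in mixed_serotype:
				mixed_serotype = None
				# Stop here: testing `key not in None` on the next pass raises TypeError.
				break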
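
The check in the snippet above can also be sketched as a small function. This assumes `mixed_serotype` is either `None` or a container that supports `in`, and that `min_keys` is an iterable of keys. The function name `keep_mixed_serotype` and the sample values are made up for illustration.

```python
def keep_mixed_serotype(mixed_serotype, min_keys):
    """Return mixed_serotype only if it contains every key in min_keys."""
    if mixed_serotype is None:
        return None
    # all() stops at the first missing key, so None is never tested with `in`.
    if all(key in mixed_serotype for key in min_keys):
        return mixed_serotype
    return None


# Illustrative values only.
print(keep_mixed_serotype(["06A", "06B"], ["06A"]))
print(keep_mixed_serotype(["06A", "06B"], ["06A", "19F"]))
```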
