khyox / recentrifuge

Recentrifuge: robust comparative analysis and contamination removal for metagenomics

Home Page: http://www.recentrifuge.org

License: Other

Python 62.44% JavaScript 37.56%

metagenomics low-biomass centrifuge lmat clark kraken comparative-genomics ngs robustness contamination nanopore

recentrifuge's Issues

Test files?

Are there any test files that I can use to check if my installation is working? I tried using the files in the test/ folder, but I got an error.

Command:

./recentrifuge.py -f test/ctrl1.mck -f test/ctrl2.mck -f test/ctrl3.mck -f test/smpl1.mck -f test/smpl2.mck -f test/smpl3.mck

Output:


=-= ./recentrifuge.py =-= v0.21.1 - Sep 2018 =-= by Jose Manuel Martí =-=

Loading NCBI nodes... OK! 
Loading NCBI names... OK! 
Building dict of parent to children taxa... OK! 

Please, wait, processing files in parallel...

Error parsing line: (# Homo sapiens
) in test/ctrl1.mck. Ignoring line!
Error parsing line: (9606	600
) in test/ctrl1.mck. Ignoring line!
Error parsing line: (# Cutibacterium acnes
) in test/ctrl1.mck. Ignoring line!
Error parsing line: (1747	250
) in test/ctrl1.mck. Ignoring line!
Error parsing line: (# E. coli
) in test/ctrl1.mck. Ignoring line!
Error parsing line: (562	50
) in test/ctrl1.mck. Ignoring line!
Error parsing line: (# Zea mays
) in test/ctrl1.mck. Ignoring line!
Error parsing line: (4577	25
) in test/ctrl1.mck. Ignoring line!
Error parsing line: (# Triticum aestivum
) in test/ctrl1.mck. Ignoring line!
Error parsing line: (4565	3
) in test/ctrl1.mck. Ignoring line!
Error parsing line: (# Malassezia globosa CBS 7966
) in test/ctrl1.mck. Ignoring line!
Error parsing line: (425265	25
) in test/ctrl1.mck. Ignoring line!
Error parsing line: (# Pan troglodytes (chimpanzee) 
) in test/ctrl1.mck. Ignoring line!
Error parsing line: (9598	25
) in test/ctrl1.mck. Ignoring line!
Error parsing line: (# Propionibacterium phage SKKY
) in test/ctrl1.mck. Ignoring line!
Error parsing line: (1655020	15
) in test/ctrl1.mck. Ignoring line!
Error parsing line: (# Methanosarcina mazei
) in test/ctrl1.mck. Ignoring line!
Error parsing line: (2209	5
) in test/ctrl1.mck. Ignoring line!
Error parsing line: (# Lactobacillus (genus)
) in test/ctrl1.mck. Ignoring line!
Error parsing line: (1578	2
) in test/ctrl1.mck. Ignoring line!
Warning! test/ctrl1.mck seems truncated!
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/Users/sminot/Documents/GitHub/recentrifuge/recentrifuge/taxclass.py", line 74, in process_output
    log, stat, counts, scores = read_method(target_file, scoring, minscore)
  File "/Users/sminot/Documents/GitHub/recentrifuge/recentrifuge/centrifuge.py", line 127, in read_output
    + f'Cannot read any sequence from"{output_file}"')
Exception: 
ERROR! Cannot read any sequence from"test/ctrl1.mck"
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./recentrifuge.py", line 671, in <module>
    main()
  File "./recentrifuge.py", line 644, in main
    read_samples()
  File "./recentrifuge.py", line 373, in read_samples
    input_files, [r.get() for r in async_results]):
  File "./recentrifuge.py", line 373, in <listcomp>
    input_files, [r.get() for r in async_results]):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 608, in get
    raise self._value
Exception: 
ERROR! Cannot read any sequence from"test/ctrl1.mck"

These are probably just the wrong files to use. Can you provide an example command to process the test files in your installation documentation? Thanks!
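For reference, the lines quoted in the error messages above suggest that the .mck files are plain text made of `#` comment lines followed by `taxid<TAB>count` lines. Below is a minimal sketch for reading such a file, assuming exactly that layout. The function name `parse_mck` is hypothetical and not part of recentrifuge; it only illustrates the apparent format, not how recentrifuge itself consumes these files.

```python
from typing import Dict, Iterable


def parse_mck(lines: Iterable[str]) -> Dict[int, int]:
    """Parse mock-sample lines into {taxid: count}.

    Assumed layout (inferred from the error log, not from official docs):
    '#' starts a comment line; data lines hold a taxid and a count
    separated by whitespace (a tab in the quoted lines).
    """
    counts: Dict[int, int] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f'Unexpected line: {raw!r}')
        taxid, count = int(fields[0]), int(fields[1])
        # Accumulate in case the same taxid is listed more than once
        counts[taxid] = counts.get(taxid, 0) + count
    return counts


# Lines copied from the error messages for test/ctrl1.mck
sample = [
    '# Homo sapiens\n',
    '9606\t600\n',
    '# Cutibacterium acnes\n',
    '1747\t250\n',
]
print(parse_mck(sample))  # {9606: 600, 1747: 250}
```

With a real file, the same function could be called as `parse_mck(open('test/ctrl1.mck'))`.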

Building nt database - no multithreading at centrifuge-build step

First, thanks a lot for your detailed description of how to build the nt database using centrifuge (more detailed than centrifuge's own docs).

However, I cannot get the last step to work. I have nt.fna, nt.map, the dusted nt fna, etc. But even on a Scaleway Ubuntu 22.04 instance with 96 cores and 384 GB of RAM, I cannot get centrifuge-build to use more than a single thread.

I tried installing centrifuge in every way I could think of (binaries, building from source, the generic Ubuntu package via sudo apt-get install centrifuge) and ran
make THREADS=96 nt (which gave me several 1.2 TB nt.* files) or centrifuge-build -p 96 ... directly,
but it always takes forever just to start, then finds files with gaps, and uses just one core.

Finally, since I have seen this asked in many forums: do you, or anybody reading this, know of a source for reasonably current (2020 or later) nt indices (nt.1.cf, nt.2.cf, nt.3.cf)? I guess many people struggle with this, and not everybody needs only p+h+v; some need other mammalian species as well. I have spent days (and fees) on Scaleway/AWS instances just to fail at the last step. I have used centrifuge successfully before and really liked it, and I would love to use recentrifuge afterwards for further analysis, but without a recent nt index that is unfortunately impossible.

Any suggestions?

Thanks a lot.

Krona generation fails: all seqs are orphans?

I am using kraken2 output in recentrifuge and am not getting the results I expected:

Command:
$ rcf -n ../taxdump/ --debug -k .

Input file: (see attached)
krakenout.krk.zip
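For anyone reproducing this, one quick sanity check is whether the taxids in the Kraken 2 output are present in the nodes.dmp that -n points to. The sketch below assumes the standard NCBI nodes.dmp layout (fields separated by `\t|\t`, taxid first) and the default Kraken 2 output format (classification flag in column 1, numeric taxid in column 3, so no --use-names). The helper names are hypothetical and not part of recentrifuge, and this check does not by itself explain the orphan warnings in the output below.

```python
from typing import Iterable, Set


def load_node_taxids(nodes_lines: Iterable[str]) -> Set[int]:
    """Collect every taxid defined in NCBI nodes.dmp-style lines."""
    taxids: Set[int] = set()
    for line in nodes_lines:
        first = line.split('\t|\t')[0].strip()
        if first:
            taxids.add(int(first))
    return taxids


def kraken_taxids(krk_lines: Iterable[str]) -> Set[int]:
    """Collect taxids assigned to classified reads ('C' in column 1)."""
    taxids: Set[int] = set()
    for line in krk_lines:
        cols = line.rstrip('\n').split('\t')
        if len(cols) >= 3 and cols[0] == 'C':
            taxids.add(int(cols[2]))  # assumes a bare numeric taxid
    return taxids


# Tiny made-up inputs, only to show the shape of the data
nodes = [
    '1\t|\t1\t|\tno rank\t|\n',
    '2\t|\t131567\t|\tsuperkingdom\t|\n',
]
krk = [
    'C\tread1\t2\t1500\t2:10\n',
    'C\tread2\t562\t900\t562:5\n',
    'U\tread3\t0\t700\t0:3\n',
]
missing = kraken_taxids(krk) - load_node_taxids(nodes)
print(sorted(missing))  # [562]
```

On real data the two helpers could be fed open file handles, for example `load_node_taxids(open('../taxdump/nodes.dmp'))` and `kraken_taxids(open('krakenout.krk'))`; an empty `missing` set would suggest the taxonomy dump itself covers the classifier's taxids.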

Output:
=-= /gpfs0/home/gdlauberlab/rwr002/.local/bin/rcf =-= v0.28.8 - Apr 2019 =-= by Jose Manuel Martí =-=

INFO: Debugging mode activated
INFO: Active parameters:
nodespath = ../taxdump/
kraken = ['.']
extra = FULL
controls = 0
scoring = SHEL
summary = add
debug = True
Kraken .krk files to analyze: ['krakenout.krk']
Loading NCBI nodes... OK!
Loading NCBI names... OK!
Building dict of parent to children taxa... OK!

Please, wait, processing files in parallel...

Processing sample krakenout.krk ...
Loading output file krakenout.krk... OK!
Seqs read: 15_884 [25.51 Mnt]
Seqs clas: 15_884 (0.00% unclassified)
Seqs pass: 15_884 (0.00% rejected)
Scores SHEL: min = 35.0, max = 35.0, avr = 35.0
Coverage(%): min = 0.0, max = 0.0, avr = 0.0
Read length: min = 460 nt, max = 6.08 knt, avr = 1.60 knt
TaxIds: by classifier = 186, by filter = 186
Building from raw data with mintaxa = 4 ...
Building ontology tree with all-in-1... OK!
Check for more seqs lost ([in/ex]clude affects)...
Info: 15884 additional seqs discarded (100.000% of accepted)
Checking taxid loss (orphans)... Warning! Orphan taxid Streptococcus dysgalactiae subsp. equisimilis AC-2713 (taxid 759913)
Warning! Orphan taxid Shigella boydii (taxid 621)
Warning! Orphan taxid Staphylococcus epidermidis (taxid 1282)
Warning! Orphan taxid Pseudomonas koreensis (taxid 198620)
Warning! Orphan taxid Pseudomonas aeruginosa group (taxid 136841)
Warning! Orphan taxid Pseudomonas aeruginosa B136-33 (taxid 1280938)
Warning! Orphan taxid Lactobacillus (taxid 1578)
Warning! Orphan taxid Shigella flexneri 2002017 (taxid 591020)
Warning! Orphan taxid Streptococcus pyogenes MGAS15252 (taxid 798300)
Warning! Orphan taxid Acinetobacter sp. FDAARGOS_493 (taxid 2420300)
Warning! Orphan taxid Bacillus cereus group (taxid 86661)
Warning! Orphan taxid Enterobacter bugandensis (taxid 881260)
Warning! Orphan taxid Pseudomonadaceae (taxid 135621)
Warning! Orphan taxid Escherichia coli P12b (taxid 910348)
Warning! Orphan taxid Neisseria meningitidis M7124 (taxid 1095685)
Warning! Orphan taxid Clostridium botulinum (taxid 1491)
Warning! Orphan taxid Lactobacillus sp. HSLZ-75 (taxid 1720083)
Warning! Orphan taxid Acinetobacter radioresistens (taxid 40216)
Warning! Orphan taxid Pseudomonas fluorescens (taxid 294)
Warning! Orphan taxid Yersinia pestis Angola (taxid 349746)
Warning! Orphan taxid Escherichia coli ABU 83972 (taxid 655817)
Warning! Orphan taxid Staphylococcus sp. AntiMn-1 (taxid 1715860)
Warning! Orphan taxid Escherichia coli Nissle 1917 (taxid 316435)
Warning! Orphan taxid Lactobacillus delbrueckii subsp. delbrueckii (taxid 83684)
Warning! Orphan taxid Bacteria (taxid 2)
Warning! Orphan taxid Salmonella enterica subsp. enterica serovar Napoli (taxid 1151001)
Warning! Orphan taxid Clostridium perfringens str. 13 (taxid 195102)
Warning! Orphan taxid Streptococcus mutans (taxid 1309)
Warning! Orphan taxid Streptococcus iniae (taxid 1346)
Warning! Orphan taxid Clostridium baratii (taxid 1561)
Warning! Orphan taxid Edwardsiella ictaluri (taxid 67780)
Warning! Orphan taxid Bacillus thuringiensis serovar finitimus YBT-020 (taxid 930170)
Warning! Orphan taxid Lactobacillus delbrueckii subsp. lactis (taxid 29397)
Warning! Orphan taxid Bacteroides vulgatus (taxid 821)
Warning! Orphan taxid Salmonella enterica subsp. enterica serovar Montevideo str. USDA-ARS-USMARC-1901 (taxid 1454598)
Warning! Orphan taxid Acinetobacter baumannii TYTH-1 (taxid 1100841)
Warning! Orphan taxid Streptococcus suis (taxid 1307)
Warning! Orphan taxid Staphylococcus argenteus (taxid 985002)
Warning! Orphan taxid Actinomyces pacaensis (taxid 1852377)
Warning! Orphan taxid Bacillus cereus D17 (taxid 1454382)
Warning! Orphan taxid Pseudomonas protegens (taxid 380021)
Warning! Orphan taxid Klebsiella michiganensis (taxid 1134687)
Warning! Orphan taxid Escherichia coli UMNF18 (taxid 1050617)
Warning! Orphan taxid Staphylococcus haemolyticus (taxid 1283)
Warning! Orphan taxid Cutibacterium acnes (taxid 1747)
Warning! Orphan taxid Streptococcus parauberis NCFD 2020 (taxid 873447)
Warning! Orphan taxid Staphylococcus aureus (taxid 1280)
Warning! Orphan taxid Streptococcus pyogenes (taxid 1314)
Warning! Orphan taxid Enterococcaceae (taxid 81852)
Warning! Orphan taxid Pseudomonas (taxid 286)
Warning! Orphan taxid Clostridium beijerinckii NCIMB 8052 (taxid 290402)
Warning! Orphan taxid Staphylococcus aureus subsp. aureus TCH60 (taxid 548473)
Warning! Orphan taxid Acinetobacter calcoaceticus/baumannii complex (taxid 909768)
Warning! Orphan taxid Pseudomonas litoralis (taxid 797277)
Warning! Orphan taxid Lactobacillus reuteri (taxid 1598)
Warning! Orphan taxid Lelliottia amnigena (taxid 61646)
Warning! Orphan taxid Rhodobacter sphaeroides (taxid 1063)
Warning! Orphan taxid Escherichia coli O104:H4 (taxid 1038927)
Warning! Orphan taxid Acinetobacter sp. SWBY1 (taxid 2079596)
Warning! Orphan taxid Porphyromonas gingivalis (taxid 837)
Warning! Orphan taxid Lactobacillus helveticus (taxid 1587)
Warning! Orphan taxid Clostridium bornimense (taxid 1216932)
Warning! Orphan taxid Bacillus anthracis str. Vollum (taxid 261591)
Warning! Orphan taxid Streptococcus sp. FDAARGOS_522 (taxid 2420310)
Warning! Orphan taxid Streptococcus halotolerans (taxid 1814128)
Warning! Orphan taxid Clostridium autoethanogenum DSM 10061 (taxid 1341692)
Warning! Orphan taxid Pseudomonas aeruginosa LESB65 (taxid 1408273)
Warning! Orphan taxid Streptococcus (taxid 1301)
Warning! Orphan taxid Staphylococcus epidermidis ATCC 12228 (taxid 176280)
Warning! Orphan taxid Clostridium beijerinckii NRRL B-598 (taxid 1428454)
Warning! Orphan taxid Bifidobacterium adolescentis (taxid 1680)
Warning! Orphan taxid Escherichia coli 536 (taxid 362663)
Warning! Orphan taxid Streptococcus agalactiae GD201008-001 (taxid 1203670)
Warning! Orphan taxid Escherichia coli ACN001 (taxid 1311757)
Warning! Orphan taxid Salmonella enterica subsp. enterica serovar Senftenberg (taxid 28150)
Warning! Orphan taxid Bifidobacterium adolescentis ATCC 15703 (taxid 367928)
Warning! Orphan taxid Klebsiella pneumoniae (taxid 573)
Warning! Orphan taxid Staphylococcus simulans (taxid 1286)
Warning! Orphan taxid Candidatus Annandia adelgestsuga (taxid 1302411)
Warning! Orphan taxid Clostridium saccharobutylicum (taxid 169679)
Warning! Orphan taxid Streptococcus pneumoniae SPN032672 (taxid 869311)
Warning! Orphan taxid Clostridium sp. JN-9 (taxid 2507159)
Warning! Orphan taxid Acinetobacter pittii PHEA-2 (taxid 871585)
Warning! Orphan taxid Streptococcus suis T15 (taxid 1340847)
Warning! Orphan taxid Escherichia coli (taxid 562)
Warning! Orphan taxid Staphylococcus pasteuri (taxid 45972)
Warning! Orphan taxid Lactobacillus plantarum (taxid 1590)
Warning! Orphan taxid Salmonella enterica subsp. enterica (taxid 59201)
Warning! Orphan taxid Acinetobacter baumannii ZW85-1 (taxid 1400867)
Warning! Orphan taxid Salmonella enterica subsp. enterica serovar Typhimurium (taxid 90371)
Warning! Orphan taxid Acinetobacter baumannii PKAB07 (taxid 1413216)
Warning! Orphan taxid Acinetobacter baumannii (taxid 470)
Warning! Orphan taxid Streptococcus pluranimalium (taxid 82348)
Warning! Orphan taxid Pseudomonas aeruginosa LES431 (taxid 1408272)
Warning! Orphan taxid Streptococcus sp. Z15 (taxid 2173853)
Warning! Orphan taxid Rhodobacter sphaeroides 2.4.1 (taxid 272943)
Warning! Orphan taxid Shigella dysenteriae (taxid 622)
Warning! Orphan taxid Clostridium cellulovorans 743B (taxid 573061)
Warning! Orphan taxid Streptococcus equi (taxid 1336)
Warning! Orphan taxid Clostridium acetobutylicum (taxid 1488)
Warning! Orphan taxid Acinetobacter nosocomialis (taxid 106654)
Warning! Orphan taxid Streptococcus intermedius C270 (taxid 862966)
Warning! Orphan taxid Rhodobacter sphaeroides ATCC 17029 (taxid 349101)
Warning! Orphan taxid Aerococcus (taxid 1375)
Warning! Orphan taxid Enterobacteriaceae (taxid 543)
Warning! Orphan taxid Cronobacter sakazakii SP291 (taxid 956149)
Warning! Orphan taxid Acinetobacter oleivorans DR1 (taxid 436717)
Warning! Orphan taxid Enterococcus faecalis (taxid 1351)
Warning! Orphan taxid Clostridium sp. MF28 (taxid 1702238)
Warning! Orphan taxid Bacillus (taxid 1386)
Warning! Orphan taxid Neisseria meningitidis WUE 2594 (taxid 942513)
Warning! Orphan taxid Bacillus thuringiensis MC28 (taxid 1195464)
Warning! Orphan taxid Staphylococcus succinus (taxid 61015)
Warning! Orphan taxid Streptococcus dysgalactiae subsp. equisimilis (taxid 119602)
Warning! Orphan taxid Acinetobacter bereziniae (taxid 106648)
Warning! Orphan taxid Enterobacter cloacae (taxid 550)
Warning! Orphan taxid Streptococcus pyogenes str. Manfredo (taxid 160491)
Warning! Orphan taxid Acinetobacter venetianus VE-C3 (taxid 1197884)
Warning! Orphan taxid Streptococcus sp. FDAARGOS_520 (taxid 2420308)
Warning! Orphan taxid Acinetobacter (taxid 469)
Warning! Orphan taxid Staphylococcus schleiferi (taxid 1295)
Warning! Orphan taxid Salmonella enterica subsp. enterica serovar Bareilly str. CFSAN000191 (taxid 1173457)
Warning! Orphan taxid Escherichia coli LY180 (taxid 1335916)
Warning! Orphan taxid Neisseria meningitidis M04-240196 (taxid 935593)
Warning! Orphan taxid Bacillus cereus (taxid 1396)
Warning! Orphan taxid Lactobacillus gasseri ATCC 33323 = JCM 1131 (taxid 324831)
Warning! Orphan taxid Bacillus anthracis (taxid 1392)
Warning! Orphan taxid Lactobacillus lindneri (taxid 53444)
Warning! Orphan taxid Enterobacter asburiae (taxid 61645)
Warning! Orphan taxid Bacillus thuringiensis (taxid 1428)
Warning! Orphan taxid Clostridium beijerinckii (taxid 1520)
Warning! Orphan taxid Clostridium sp. CT4 (taxid 2320868)
Warning! Orphan taxid Clostridium estertheticum subsp. estertheticum (taxid 1552)
Warning! Orphan taxid Staphylococcus aureus subsp. aureus NCTC 8325 (taxid 93061)
Warning! Orphan taxid Staphylococcus auricularis (taxid 29379)
Warning! Orphan taxid Staphylococcus caprae (taxid 29380)
Warning! Orphan taxid Staphylococcus aureus 08BA02176 (taxid 1229492)
Warning! Orphan taxid Staphylococcus muscae (taxid 1294)
Warning! Orphan taxid Bacteroides vulgatus ATCC 8482 (taxid 435590)
Warning! Orphan taxid Helicobacter pylori OK113 (taxid 1248725)
Warning! Orphan taxid Deinococcus radiodurans R1 (taxid 243230)
Warning! Orphan taxid Enterobacterales (taxid 91347)
Warning! Orphan taxid Pseudomonas entomophila (taxid 312306)
Warning! Orphan taxid Clostridium perfringens (taxid 1502)
Warning! Orphan taxid Clostridium drakei (taxid 332101)
Warning! Orphan taxid Escherichia coli APEC IMT5155 (taxid 1329907)
Warning! Orphan taxid Helicobacter pylori (taxid 210)
Warning! Orphan taxid Clostridium argentinense (taxid 29341)
Warning! Orphan taxid Staphylococcus (taxid 1279)
Warning! Orphan taxid Pseudomonas putida (taxid 303)
Warning! Orphan taxid Staphylococcus capitis (taxid 29388)
Warning! Orphan taxid Acinetobacter haemolyticus (taxid 29430)
Warning! Orphan taxid Escherichia coli APEC O1 (taxid 405955)
Warning! Orphan taxid Escherichia coli O104:H4 str. 2011C-3493 (taxid 1133852)
Warning! Orphan taxid Pseudomonas aeruginosa PA7 (taxid 381754)
Warning! Orphan taxid Lactobacillus helveticus H10 (taxid 767462)
Warning! Orphan taxid Rhodobacter (taxid 1060)
Warning! Orphan taxid Staphylococcus condimenti (taxid 70255)
Warning! Orphan taxid Enterococcus faecalis ARO1/DG (taxid 565651)
Warning! Orphan taxid Acinetobacter johnsonii (taxid 40214)
Warning! Orphan taxid Pseudomonas chlororaphis (taxid 587753)
Warning! Orphan taxid Streptococcus agalactiae ILRI005 (taxid 1309807)
Warning! Orphan taxid Bacillus thuringiensis serovar kurstaki str. HD-1 (taxid 1261129)
Warning! Orphan taxid Staphylococcus aureus subsp. aureus MSSA476 (taxid 282459)
Warning! Orphan taxid Salmonella enterica subsp. enterica serovar Heidelberg str. B182 (taxid 1160717)
Warning! Orphan taxid Bacillus cereus biovar anthracis str. CI (taxid 637380)
Warning! Orphan taxid Staphylococcus aureus subsp. aureus (taxid 46170)
Warning! Orphan taxid Streptococcus mutans UA159 (taxid 210007)
Warning! Orphan taxid Streptococcus agalactiae (taxid 1311)
Warning! Orphan taxid Enterobacter ludwigii (taxid 299767)
Warning! Orphan taxid Lactobacillus johnsonii (taxid 33959)
Warning! Orphan taxid Acinetobacter baumannii BJAB0868 (taxid 1096997)
Warning! Orphan taxid Clostridium (taxid 1485)
Warning! Orphan taxid Staphylococcus stepanovicii (taxid 643214)
Warning! Orphan taxid Morganella morganii (taxid 582)
Warning! Orphan taxid Lactobacillus murinus (taxid 1622)
Warning! Orphan taxid Pseudomonas xinjiangensis (taxid 487184)
Warning! Orphan taxid Escherichia coli O157:H7 str. EDL933 (taxid 155864)
Warning! Orphan taxid Pseudomonas aeruginosa (taxid 287)
Warning! Orphan taxid Staphylococcus lugdunensis (taxid 28035)
Warning! Orphan taxid Bacillus cereus ATCC 10987 (taxid 222523)
Warning! Orphan taxid Neisseria meningitidis (taxid 487)
Warning! Orphan taxid Bacillus velezensis (taxid 492670)
Warning! Orphan taxid Bacilli (taxid 91061)
Warning! Orphan taxid Salmonella enterica (taxid 28901)
Warning! Orphan taxid Staphylococcus nepalensis (taxid 214473)
WARNING! 186 orphan taxids (100.00% of accepted)
and 15884 orphan sequences (100.000% of accepted)
Assess accumulation due to "folding the tree"...
Info: Folded TaxID Bacteroides vulgatus ATCC 8482 (taxid 435590) (Unnamed) with 2501 original seqs
Info: Folded TaxID Cutibacterium acnes (taxid 1747) (Unnamed) with 652 original seqs
Info: Folded TaxID Lactobacillus gasseri ATCC 33323 = JCM 1131 (taxid 324831) (Unnamed) with 617 original seqs
Info: Folded TaxID Porphyromonas gingivalis (taxid 837) (Unnamed) with 1707 original seqs
Info: Folded TaxID Streptococcus (taxid 1301) (Unnamed) with 547 original seqs
Info: Folded TaxID Clostridium beijerinckii NCIMB 8052 (taxid 290402) (Unnamed) with 298 original seqs
Info: Folded TaxID Escherichia coli (taxid 562) (Unnamed) with 491 original seqs
Info: Folded TaxID Deinococcus radiodurans R1 (taxid 243230) (Unnamed) with 792 original seqs
Info: Folded TaxID Streptococcus mutans (taxid 1309) (Unnamed) with 429 original seqs
Info: Folded TaxID Staphylococcus pasteuri (taxid 45972) (Unnamed) with 73 original seqs
Info: Folded TaxID Acinetobacter (taxid 469) (Unnamed) with 942 original seqs
Info: Folded TaxID Neisseria meningitidis (taxid 487) (Unnamed) with 1047 original seqs
Info: Folded TaxID Clostridium beijerinckii (taxid 1520) (Unnamed) with 206 original seqs
Info: Folded TaxID Rhodobacter sphaeroides (taxid 1063) (Unnamed) with 523 original seqs
Info: Folded TaxID Helicobacter pylori (taxid 210) (Unnamed) with 515 original seqs
Info: Folded TaxID Staphylococcus aureus (taxid 1280) (Unnamed) with 300 original seqs
Info: Folded TaxID Bifidobacterium adolescentis (taxid 1680) (Unnamed) with 338 original seqs
Info: Folded TaxID Acinetobacter baumannii (taxid 470) (Unnamed) with 58 original seqs
Info: Folded TaxID Rhodobacter (taxid 1060) (Unnamed) with 12 original seqs
Info: Folded TaxID Clostridium botulinum (taxid 1491) (Unnamed) with 18 original seqs
Info: Folded TaxID Bacteroides vulgatus (taxid 821) (Unnamed) with 840 original seqs
Info: Folded TaxID Salmonella enterica subsp. enterica serovar Typhimurium (taxid 90371) (Unnamed) with 2 original seqs
Info: Folded TaxID Enterococcus faecalis ARO1/DG (taxid 565651) (Unnamed) with 6 original seqs
Info: Folded TaxID Clostridium perfringens (taxid 1502) (Unnamed) with 5 original seqs
Info: Folded TaxID Staphylococcus (taxid 1279) (Unnamed) with 339 original seqs
Info: Folded TaxID Acinetobacter nosocomialis (taxid 106654) (Unnamed) with 1 original seqs
Info: Folded TaxID Lactobacillus (taxid 1578) (Unnamed) with 174 original seqs
Info: Folded TaxID Pseudomonas (taxid 286) (Unnamed) with 299 original seqs
Info: Folded TaxID Streptococcus mutans UA159 (taxid 210007) (Unnamed) with 209 original seqs
Info: Folded TaxID Enterobacteriaceae (taxid 543) (Unnamed) with 366 original seqs
Info: Folded TaxID Staphylococcus epidermidis ATCC 12228 (taxid 176280) (Unnamed) with 126 original seqs
Info: Folded TaxID Acinetobacter venetianus VE-C3 (taxid 1197884) (Unnamed) with 26 original seqs
Info: Folded TaxID Staphylococcus succinus (taxid 61015) (Unnamed) with 16 original seqs
Info: Folded TaxID Salmonella enterica subsp. enterica serovar Bareilly str. CFSAN000191 (taxid 1173457) (Unnamed) with 6 original seqs
Info: Folded TaxID Clostridium (taxid 1485) (Unnamed) with 241 original seqs
Info: Folded TaxID Streptococcus agalactiae ILRI005 (taxid 1309807) (Unnamed) with 2 original seqs
Info: Folded TaxID Bifidobacterium adolescentis ATCC 15703 (taxid 367928) (Unnamed) with 51 original seqs
Info: Folded TaxID Staphylococcus aureus subsp. aureus (taxid 46170) (Unnamed) with 158 original seqs
Info: Folded TaxID Bacillus cereus (taxid 1396) (Unnamed) with 3 original seqs
Info: Folded TaxID Enterococcus faecalis (taxid 1351) (Unnamed) with 184 original seqs
Info: Folded TaxID Bacillus (taxid 1386) (Unnamed) with 51 original seqs
Info: Folded TaxID Staphylococcus capitis (taxid 29388) (Unnamed) with 4 original seqs
Info: Folded TaxID Staphylococcus argenteus (taxid 985002) (Unnamed) with 1 original seqs
Info: Folded TaxID Escherichia coli ACN001 (taxid 1311757) (Unnamed) with 22 original seqs
Info: Folded TaxID Klebsiella pneumoniae (taxid 573) (Unnamed) with 8 original seqs
Info: Folded TaxID Escherichia coli LY180 (taxid 1335916) (Unnamed) with 15 original seqs
Info: Folded TaxID Escherichia coli O157:H7 str. EDL933 (taxid 155864) (Unnamed) with 55 original seqs
Info: Folded TaxID Staphylococcus epidermidis (taxid 1282) (Unnamed) with 88 original seqs
Info: Folded TaxID Actinomyces pacaensis (taxid 1852377) (Unnamed) with 39 original seqs
Info: Folded TaxID Rhodobacter sphaeroides ATCC 17029 (taxid 349101) (Unnamed) with 3 original seqs
Info: Folded TaxID Clostridium autoethanogenum DSM 10061 (taxid 1341692) (Unnamed) with 3 original seqs
Info: Folded TaxID Enterobacterales (taxid 91347) (Unnamed) with 35 original seqs
Info: Folded TaxID Streptococcus pyogenes (taxid 1314) (Unnamed) with 21 original seqs
Info: Folded TaxID Escherichia coli APEC O1 (taxid 405955) (Unnamed) with 3 original seqs
Info: Folded TaxID Pseudomonas aeruginosa (taxid 287) (Unnamed) with 91 original seqs
Info: Folded TaxID Clostridium sp. MF28 (taxid 1702238) (Unnamed) with 14 original seqs
Info: Folded TaxID Staphylococcus sp. AntiMn-1 (taxid 1715860) (Unnamed) with 18 original seqs
Info: Folded TaxID Staphylococcus caprae (taxid 29380) (Unnamed) with 11 original seqs
Info: Folded TaxID Shigella flexneri 2002017 (taxid 591020) (Unnamed) with 14 original seqs
Info: Folded TaxID Streptococcus agalactiae (taxid 1311) (Unnamed) with 7 original seqs
Info: Folded TaxID Neisseria meningitidis WUE 2594 (taxid 942513) (Unnamed) with 3 original seqs
Info: Folded TaxID Streptococcus suis T15 (taxid 1340847) (Unnamed) with 1 original seqs
Info: Folded TaxID Lactobacillus murinus (taxid 1622) (Unnamed) with 1 original seqs
Info: Folded TaxID Lactobacillus lindneri (taxid 53444) (Unnamed) with 1 original seqs
Info: Folded TaxID Streptococcus sp. FDAARGOS_522 (taxid 2420310) (Unnamed) with 10 original seqs
Info: Folded TaxID Lactobacillus helveticus (taxid 1587) (Unnamed) with 1 original seqs
Info: Folded TaxID Staphylococcus stepanovicii (taxid 643214) (Unnamed) with 5 original seqs
Info: Folded TaxID Staphylococcus aureus subsp. aureus MSSA476 (taxid 282459) (Unnamed) with 1 original seqs
Info: Folded TaxID Staphylococcus aureus 08BA02176 (taxid 1229492) (Unnamed) with 3 original seqs
Info: Folded TaxID Streptococcus sp. FDAARGOS_520 (taxid 2420308) (Unnamed) with 2 original seqs
Info: Folded TaxID Streptococcus pluranimalium (taxid 82348) (Unnamed) with 4 original seqs
Info: Folded TaxID Escherichia coli O104:H4 (taxid 1038927) (Unnamed) with 7 original seqs
Info: Folded TaxID Clostridium bornimense (taxid 1216932) (Unnamed) with 5 original seqs
Info: Folded TaxID Acinetobacter radioresistens (taxid 40216) (Unnamed) with 2 original seqs
Info: Folded TaxID Bacillus velezensis (taxid 492670) (Unnamed) with 5 original seqs
Info: Folded TaxID Acinetobacter johnsonii (taxid 40214) (Unnamed) with 7 original seqs
Info: Folded TaxID Staphylococcus condimenti (taxid 70255) (Unnamed) with 2 original seqs
Info: Folded TaxID Staphylococcus auricularis (taxid 29379) (Unnamed) with 3 original seqs
Info: Folded TaxID Acinetobacter calcoaceticus/baumannii complex (taxid 909768) (Unnamed) with 16 original seqs
Info: Folded TaxID Streptococcus sp. Z15 (taxid 2173853) (Unnamed) with 1 original seqs
Info: Folded TaxID Clostridium argentinense (taxid 29341) (Unnamed) with 6 original seqs
Info: Folded TaxID Enterococcaceae (taxid 81852) (Unnamed) with 1 original seqs
Info: Folded TaxID Streptococcus parauberis NCFD 2020 (taxid 873447) (Unnamed) with 5 original seqs
Info: Folded TaxID Pseudomonas koreensis (taxid 198620) (Unnamed) with 8 original seqs
Info: Folded TaxID Pseudomonas aeruginosa group (taxid 136841) (Unnamed) with 2 original seqs
Info: Folded TaxID Pseudomonas aeruginosa LES431 (taxid 1408272) (Unnamed) with 2 original seqs
Info: Folded TaxID Clostridium beijerinckii NRRL B-598 (taxid 1428454) (Unnamed) with 1 original seqs
Info: Folded TaxID Pseudomonas chlororaphis (taxid 587753) (Unnamed) with 2 original seqs
Info: Folded TaxID Enterobacter bugandensis (taxid 881260) (Unnamed) with 3 original seqs
Info: Folded TaxID Streptococcus iniae (taxid 1346) (Unnamed) with 2 original seqs
Info: Folded TaxID Clostridium cellulovorans 743B (taxid 573061) (Unnamed) with 4 original seqs
Info: Folded TaxID Staphylococcus lugdunensis (taxid 28035) (Unnamed) with 2 original seqs
Info: Folded TaxID Enterobacter ludwigii (taxid 299767) (Unnamed) with 1 original seqs
Info: Folded TaxID Bacillus anthracis str. Vollum (taxid 261591) (Unnamed) with 1 original seqs
Info: Folded TaxID Bacillus cereus D17 (taxid 1454382) (Unnamed) with 2 original seqs
Info: Folded TaxID Pseudomonas entomophila (taxid 312306) (Unnamed) with 3 original seqs
Info: Folded TaxID Pseudomonas protegens (taxid 380021) (Unnamed) with 1 original seqs
Info: Folded TaxID Neisseria meningitidis M7124 (taxid 1095685) (Unnamed) with 3 original seqs
Info: Folded TaxID Staphylococcus aureus subsp. aureus NCTC 8325 (taxid 93061) (Unnamed) with 3 original seqs
Info: Folded TaxID Streptococcus pneumoniae SPN032672 (taxid 869311) (Unnamed) with 2 original seqs
Info: Folded TaxID Staphylococcus muscae (taxid 1294) (Unnamed) with 1 original seqs
Info: Folded TaxID Escherichia coli O104:H4 str. 2011C-3493 (taxid 1133852) (Unnamed) with 5 original seqs
Info: Folded TaxID Salmonella enterica (taxid 28901) (Unnamed) with 1 original seqs
Info: Folded TaxID Lactobacillus johnsonii (taxid 33959) (Unnamed) with 2 original seqs
Info: Folded TaxID Clostridium sp. CT4 (taxid 2320868) (Unnamed) with 4 original seqs
Info: Folded TaxID Bacillus thuringiensis MC28 (taxid 1195464) (Unnamed) with 2 original seqs
Info: Folded TaxID Staphylococcus schleiferi (taxid 1295) (Unnamed) with 1 original seqs
Info: Folded TaxID Lelliottia amnigena (taxid 61646) (Unnamed) with 1 original seqs
Info: Folded TaxID Clostridium perfringens str. 13 (taxid 195102) (Unnamed) with 1 original seqs
Info: Folded TaxID Neisseria meningitidis M04-240196 (taxid 935593) (Unnamed) with 2 original seqs
Info: Folded TaxID Cronobacter sakazakii SP291 (taxid 956149) (Unnamed) with 2 original seqs
Info: Folded TaxID Clostridium sp. JN-9 (taxid 2507159) (Unnamed) with 1 original seqs
Info: Folded TaxID Acinetobacter sp. FDAARGOS_493 (taxid 2420300) (Unnamed) with 1 original seqs
Info: Folded TaxID Bacillus anthracis (taxid 1392) (Unnamed) with 4 original seqs
Info: Folded TaxID Morganella morganii (taxid 582) (Unnamed) with 1 original seqs
Info: Folded TaxID Salmonella enterica subsp. enterica serovar Senftenberg (taxid 28150) (Unnamed) with 2 original seqs
Info: Folded TaxID Pseudomonas litoralis (taxid 797277) (Unnamed) with 1 original seqs
Info: Folded TaxID Acinetobacter haemolyticus (taxid 29430) (Unnamed) with 2 original seqs
Info: Folded TaxID Pseudomonas fluorescens (taxid 294) (Unnamed) with 2 original seqs
Info: Folded TaxID Clostridium acetobutylicum (taxid 1488) (Unnamed) with 1 original seqs
Info: Folded TaxID Clostridium baratii (taxid 1561) (Unnamed) with 1 original seqs
Info: Folded TaxID Staphylococcus aureus subsp. aureus TCH60 (taxid 548473) (Unnamed) with 1 original seqs
Info: Folded TaxID Acinetobacter baumannii PKAB07 (taxid 1413216) (Unnamed) with 1 original seqs
Info: Folded TaxID Pseudomonas aeruginosa PA7 (taxid 381754) (Unnamed) with 3 original seqs
Info: Folded TaxID Clostridium drakei (taxid 332101) (Unnamed) with 2 original seqs
Info: Folded TaxID Shigella dysenteriae (taxid 622) (Unnamed) with 3 original seqs
Info: Folded TaxID Acinetobacter bereziniae (taxid 106648) (Unnamed) with 1 original seqs
Info: Folded TaxID Edwardsiella ictaluri (taxid 67780) (Unnamed) with 1 original seqs
Info: Folded TaxID Acinetobacter oleivorans DR1 (taxid 436717) (Unnamed) with 2 original seqs
Info: Folded TaxID Streptococcus intermedius C270 (taxid 862966) (Unnamed) with 3 original seqs
Info: Folded TaxID Bacteria (taxid 2) (Unnamed) with 2 original seqs
Info: Folded TaxID Escherichia coli UMNF18 (taxid 1050617) (Unnamed) with 1 original seqs
Info: Folded TaxID Bacillus thuringiensis (taxid 1428) (Unnamed) with 2 original seqs
Info: Folded TaxID Bacillus thuringiensis serovar finitimus YBT-020 (taxid 930170) (Unnamed) with 1 original seqs
Info: Folded TaxID Lactobacillus reuteri (taxid 1598) (Unnamed) with 1 original seqs
Info: Folded TaxID Shigella boydii (taxid 621) (Unnamed) with 2 original seqs
Info: Folded TaxID Staphylococcus haemolyticus (taxid 1283) (Unnamed) with 2 original seqs
Info: Folded TaxID Acinetobacter pittii PHEA-2 (taxid 871585) (Unnamed) with 1 original seqs
Info: Folded TaxID Streptococcus agalactiae GD201008-001 (taxid 1203670) (Unnamed) with 1 original seqs
Info: Folded TaxID Escherichia coli 536 (taxid 362663) (Unnamed) with 1 original seqs
Info: Folded TaxID Enterobacter asburiae (taxid 61645) (Unnamed) with 1 original seqs
Info: Folded TaxID Clostridium saccharobutylicum (taxid 169679) (Unnamed) with 6 original seqs
Info: Folded TaxID Streptococcus pyogenes MGAS15252 (taxid 798300) (Unnamed) with 2 original seqs
Info: Folded TaxID Clostridium estertheticum subsp. estertheticum (taxid 1552) (Unnamed) with 1 original seqs
Info: Folded TaxID Aerococcus (taxid 1375) (Unnamed) with 1 original seqs
Info: Folded TaxID Pseudomonadaceae (taxid 135621) (Unnamed) with 1 original seqs
Info: Folded TaxID Escherichia coli P12b (taxid 910348) (Unnamed) with 1 original seqs
Info: Folded TaxID Pseudomonas aeruginosa B136-33 (taxid 1280938) (Unnamed) with 2 original seqs
Info: Folded TaxID Streptococcus pyogenes str. Manfredo (taxid 160491) (Unnamed) with 1 original seqs
Info: Folded TaxID Pseudomonas aeruginosa LESB65 (taxid 1408273) (Unnamed) with 1 original seqs
Info: Folded TaxID Staphylococcus simulans (taxid 1286) (Unnamed) with 2 original seqs
Info: Folded TaxID Salmonella enterica subsp. enterica serovar Napoli (taxid 1151001) (Unnamed) with 1 original seqs
Info: Folded TaxID Lactobacillus delbrueckii subsp. lactis (taxid 29397) (Unnamed) with 1 original seqs
Info: Folded TaxID Bacillus cereus biovar anthracis str. CI (taxid 637380) (Unnamed) with 1 original seqs
Info: Folded TaxID Acinetobacter baumannii TYTH-1 (taxid 1100841) (Unnamed) with 1 original seqs
Info: Folded TaxID Lactobacillus plantarum (taxid 1590) (Unnamed) with 2 original seqs
Info: Folded TaxID Salmonella enterica subsp. enterica (taxid 59201) (Unnamed) with 3 original seqs
Info: Folded TaxID Escherichia coli APEC IMT5155 (taxid 1329907) (Unnamed) with 1 original seqs
Info: Folded TaxID Pseudomonas xinjiangensis (taxid 487184) (Unnamed) with 1 original seqs
Info: Folded TaxID Yersinia pestis Angola (taxid 349746) (Unnamed) with 1 original seqs
Info: Folded TaxID Escherichia coli Nissle 1917 (taxid 316435) (Unnamed) with 1 original seqs
Info: Folded TaxID Rhodobacter sphaeroides 2.4.1 (taxid 272943) (Unnamed) with 1 original seqs
Info: Folded TaxID Streptococcus halotolerans (taxid 1814128) (Unnamed) with 1 original seqs
Info: Folded TaxID Helicobacter pylori OK113 (taxid 1248725) (Unnamed) with 1 original seqs
Info: Folded TaxID Acinetobacter baumannii BJAB0868 (taxid 1096997) (Unnamed) with 1 original seqs
Info: Folded TaxID Lactobacillus helveticus H10 (taxid 767462) (Unnamed) with 1 original seqs
Info: Folded TaxID Bacillus cereus ATCC 10987 (taxid 222523) (Unnamed) with 1 original seqs
Info: Folded TaxID Salmonella enterica subsp. enterica serovar Montevideo str. USDA-ARS-USMARC-1901 (taxid 1454598) (Unnamed) with 1 original seqs
Info: Folded TaxID Enterobacter cloacae (taxid 550) (Unnamed) with 1 original seqs
Info: Folded TaxID Acinetobacter baumannii ZW85-1 (taxid 1400867) (Unnamed) with 1 original seqs
Info: Folded TaxID Streptococcus dysgalactiae subsp. equisimilis AC-2713 (taxid 759913) (Unnamed) with 1 original seqs
Info: Folded TaxID Escherichia coli ABU 83972 (taxid 655817) (Unnamed) with 1 original seqs
Info: Folded TaxID Staphylococcus nepalensis (taxid 214473) (Unnamed) with 1 original seqs
Info: Folded TaxID Klebsiella michiganensis (taxid 1134687) (Unnamed) with 1 original seqs
Info: Folded TaxID Streptococcus dysgalactiae subsp. equisimilis (taxid 119602) (Unnamed) with 2 original seqs
Info: Folded TaxID Lactobacillus sp. HSLZ-75 (taxid 1720083) (Unnamed) with 1 original seqs
Info: Folded TaxID Candidatus Annandia adelgestsuga (taxid 1302411) (Unnamed) with 1 original seqs
Info: Folded TaxID Bacilli (taxid 91061) (Unnamed) with 1 original seqs
Info: Folded TaxID Pseudomonas putida (taxid 303) (Unnamed) with 1 original seqs
Info: Folded TaxID Acinetobacter sp. SWBY1 (taxid 2079596) (Unnamed) with 1 original seqs
Info: Folded TaxID Streptococcus equi (taxid 1336) (Unnamed) with 1 original seqs
Info: Folded TaxID Salmonella enterica subsp. enterica serovar Heidelberg str. B182 (taxid 1160717) (Unnamed) with 1 original seqs
Info: Folded TaxID Lactobacillus delbrueckii subsp. delbrueckii (taxid 83684) (Unnamed) with 1 original seqs
Info: Folded TaxID Bacillus cereus group (taxid 86661) (Unnamed) with 1 original seqs
Info: Folded TaxID Streptococcus suis (taxid 1307) (Unnamed) with 1 original seqs
Info: Folded TaxID Bacillus thuringiensis serovar kurstaki str. HD-1 (taxid 1261129) (Unnamed) with 1 original seqs
INFO: 186 TaxIDs folded (100.00% of TAF —TaxIDs after filtering—)
INFO: Final assigned TaxIDs: 0 (reduced to 0.00% of number of TAF)
krakenout sample VOID
Load elapsed time: 2.69 sec

Building the taxonomy multiple tree... Traceback (most recent call last):
File "/gpfs0/home/gdlauberlab/rwr002/.local/bin/rcf", line 812, in <module>
main()
File "/gpfs0/home/gdlauberlab/rwr002/.local/bin/rcf", line 792, in main
generate_krona()
File "/gpfs0/home/gdlauberlab/rwr002/.local/bin/rcf", line 570, in generate_krona
for sample in samples
ValueError: min() arg is an empty sequence

parameter -y/--minscore in rextract

By the way, do you have an idea what score (-y) I should feed to rextract for filtering Centrifuge results? I have 150 bp FASTQ files from Illumina sequencing.

My recommendation is to avoid minscore values (-y flag) that are too low, so that low-scoring sequences are filtered out. Also, if you have control sequences, you may want to lower ctrlminscore (-z flag) so that more sequences are kept in the controls and therefore more are removed by the robust contamination removal algorithm. So, --minscore 35 and --ctrlminscore 25 could be good values to start with.

Originally posted by @khyox in #30 (comment)

rextract: zipped fastq files

Hi, is it possible to add an option for taking zipped fastq files as input and writing zipped files as output? This would be useful when the data is too large. Thanks!!
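If the installed version does not accept compressed input, one possible workaround is to decompress before running rextract and recompress its output afterwards. The following is a minimal sketch using only the Python standard library; the helper names `gunzip` and `gzip_file` are hypothetical and are not part of Recentrifuge.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src: Path, dst: Path) -> None:
    """Decompress the gzip file src into dst, streaming in chunks."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def gzip_file(src: Path, dst: Path) -> None:
    """Compress the plain file src into the gzip file dst, streaming in chunks."""
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```

Both helpers stream the data, so memory use stays small even for large FASTQ files; the cost is the temporary disk space for the uncompressed copy.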

export .png?

Hi Jose,
I was wondering if I had missed the instructions somewhere in the Wiki, but is there a way to generate a .png (or .jpeg, or .pdf) from the .html file directly, without having to open the browser and select "snapshot"? I was hoping to generate a high-quality image from the command line, though obviously it's much cooler to be able to manipulate the object and take a series of photos.
Related: I can't seem to get the snapshot feature to work at all. I've tried it on your example page, but all it does is move me to a new, empty tab in the same browser.
Thanks

Rextract from Kraken2 output

Question

I tried to run rextract with a kraken2 output, but I get the following warning message: No matching read found! Exiting.
Is there a way to extract reads from kraken2 output, or is it only made to run with Centrifuge output files?

what does unassigned mean?

Hi,
Thanks for your work.
Could you please tell me what "unassigned" means? I noticed that, in the example shown on GitHub, the count was equal to the unassigned number. Does that mean that all the reads classified under that taxid were multiple alignments?

Data from article "Recentrifuge: Robust comparative analysis and contamination removal for metagenomics"

Hi !
I'm very interested in Recentrifuge, and I would like to test it on the data from the article "Recentrifuge: Robust comparative analysis and contamination removal for metagenomics". However, I noticed that these data are not on GitHub, and that the "http://som1.uv.es/plasmaCFS" link in the article no longer works.
So, would it be possible to get access to these data through an updated link?
Thank you in advance!

divided by zero error - fault by using krakenuniq?

Bug report

Bug summary

I am trying to run "rextract" on my data and it always gets a "division by zero" error.

Running Recentrifuge

Command line

rextract command

/mnt/sfb900nfs/groups/tuemmler/erik/rare_validation/recentrifuge/recentrifuge/rextract -f ${file}_output.txt -i 28132 -q $file -n /mnt/sfb900nfs/groups/tuemmler/erik/rare_validation/recentrifuge/recentrifuge/taxdump

rcf command

/mnt/sfb900nfs/groups/tuemmler/erik/rare_validation/recentrifuge/recentrifuge/rcf -k $input -n db/taxonomy/

Data

I have used the output generated by krakenuniq. It has failed with every file I have tried.

Actual outcome

=-= /mnt/sfb900nfs/groups/tuemmler/erik/rare_validation/recentrifuge/recentrifuge/rextract =-= v0.28.13 - October 2019 =-= by Jose Manuel Martí =-=

Loading NCBI nodes... OK! 
Loading NCBI names... OK! 
Building dict of parent to children taxa... OK! 
List of taxa (and below) to be explicitly included:
		Id	Scientific Name
		28132	Prevotella melaninogenica
Building taxonomy tree... OK!
Filtering taxa... OK!
  5 taxid selected in 2 different taxonomical levels:
  Number of different SPECIES: 1
  Number of different NO_RANK: 4
Loading output file TrackCF_01_S1_R1.fastq_output.txt... OK!
  Load elapsed time: 0.33 sec
Traceback (most recent call last):
  File "/mnt/sfb900nfs/groups/tuemmler/erik/rare_validation/recentrifuge/recentrifuge/rextract", line 347, in <module>
    main()
  File "/mnt/sfb900nfs/groups/tuemmler/erik/rare_validation/recentrifuge/recentrifuge/rextract", line 241, in main
    print(f'  \033[90mMatching reads: \033[0m{len(records):_d} \033[90m\t'
ZeroDivisionError: division by zero

Versions

Operating system: Ubuntu 18.04.3 LTS
Python version: Python 3.7.4
Recentrifuge version: recentrifuge v0.28.13
Release of Centrifuge, LMAT, CLARK, Kraken, etc. (version used to generate the input data): KrakenUniq version 0.5.7
Pandas version (if applicable):
Other libraries (if applicable):
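The traceback ends in a print statement that reports the matching reads, so the division presumably computes a ratio whose denominator was zero for this input (for instance, if no records were parsed from the KrakenUniq output). That reading is an assumption. The sketch below only illustrates the general guard, with a hypothetical helper name; it is not the actual rextract code.

```python
def percent(part: int, total: int) -> str:
    """Format part/total as a percentage, tolerating total == 0."""
    if total == 0:
        # Nothing to compare against: report that explicitly instead of crashing.
        return "n/a"
    return f"{part / total:.2%}"


print("Matching reads:", percent(0, 0))    # no ZeroDivisionError here
print("Matching reads:", percent(25, 100))
```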

Enhancement: provide control samples in a different directory

Suggested in #37 as a possible enhancement for datasets that contain a significant number of negative control samples. When a metagenomic/metatranscriptomic study has many controls, it may be a good idea to keep them in a different directory, separated from the regular samples, so that you don't need to rename all of them to be able to use the -c option with the number of control samples.

taxonomy files for common 16S datasets?

Thank you for creating this tool - it is extremely useful!
I have been using some of the Kraken2 16S databases that are pre-compiled (RDP, SILVA, etc), and unfortunately these outputs are not compatible with recentrifuge because they do not use the NCBI taxonomy. Do you have any suggestions for using recentrifuge with analyses that have been done on 16S reference databases?

recentrifuge.rank.UnsupportedTaxLevelError: Unknown tax level section

Bug report

Bug summary

Error running rcf, similar to the previous subcohort issue:
f'Unknown tax level {_rank}') recentrifuge.rank.UnsupportedTaxLevelError: Unknown tax level section

It only works when using old nodes and names files.

Running Recentrifuge

Command line

/home/recentrifuge/recentrifuge-master/rcf -f /home/600219/Classification/barcode01_classification.out -e FULL


Data

centrifuge classification.out file

Actual outcome

File "/home/recentrifuge/recentrifuge-master/recentrifuge/taxonomy.py", line 40, in __init__
    self.read_nodes(nodes_file)
  File "/home/recentrifuge/recentrifuge-master/recentrifuge/taxonomy.py", line 82, in read_nodes
    f'Unknown tax level {_rank}')
recentrifuge.rank.UnsupportedTaxLevelError: Unknown tax level section

Expected outcome


Versions

Operating system:
Python version:
Recentrifuge version:
Release of Centrifuge, LMAT, CLARK, Kraken, etc. (version used to generate the input data):
Pandas version (if applicable):
Other libraries (if applicable):

retest fails on MacOS

Bug report

Bug summary

retest fails on my Mac. I have tried running it several different ways. The method below gets the furthest, but still fails.

Running Recentrifuge

Command line

conda create -n recentrifuge python=3.7
source activate recentrifuge
git clone git@github.com:khyox/recentrifuge.git
cd recentrifuge
conda install --file=requirements.txt
./retest -d -l -r

Note that I tried using pip as well, but it fails early on due to a standard pip/matplotlib/MacOS bug.

Actual outcome

It gets as far as rextract, then I get:

>>> CHECKING ./rextract RESULTS AGAINST STANDARD ... 6001 fastq seqs OK!

>>> ANALYZING ROBUST CONTAMINATION REMOVAL ... 2019-02-16 16:36:13.156 python3[77444:17304867] -[NSApplication _setup:]: unrecognized selector sent to instance 0x7fa86e17cfc0
2019-02-16 16:36:13.158 python3[77444:17304867] *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '-[NSApplication _setup:]: unrecognized selector sent to instance 0x7fa86e17cfc0'
*** First throw call stack:
(
	0   CoreFoundation                      0x00007fff3991d43d __exceptionPreprocess + 256
	1   libobjc.A.dylib                     0x00007fff6582a720 objc_exception_throw + 48
	2   CoreFoundation                      0x00007fff3999a255 -[NSObject(NSObject) __retain_OA] + 0
	3   CoreFoundation                      0x00007fff398bcad0 ___forwarding___ + 1486
	4   CoreFoundation                      0x00007fff398bc478 _CF_forwarding_prep_0 + 120
	5   libtk8.6.dylib                      0x0000000117b7b31d TkpInit + 413
	6   libtk8.6.dylib                      0x0000000117ad317e Initialize + 2622
	7   _tkinter.cpython-37m-darwin.so      0x00000001178fba0f _tkinter_create + 1183
	8   python3                             0x00000001041f28b6 _PyMethodDef_RawFastCallKeywords + 230
	9   python3                             0x000000010432fba2 call_function + 306
	10  python3                             0x000000010432d852 _PyEval_EvalFrameDefault + 46114
	11  python3                             0x00000001043211fe _PyEval_EvalCodeWithName + 414
	12  python3                             0x00000001041f1587 _PyFunction_FastCallDict + 231
	13  python3                             0x0000000104273c31 slot_tp_init + 193
	14  python3                             0x000000010427dc01 type_call + 241
	15  python3                             0x00000001041f2283 _PyObject_FastCallKeywords + 179
	16  python3                             0x000000010432fc35 call_function + 453
	17  python3                             0x000000010432d946 _PyEval_EvalFrameDefault + 46358
	18  python3                             0x00000001041f2075 function_code_fastcall + 117
	19  python3                             0x000000010432fb27 call_function + 183
	20  python3                             0x000000010432d852 _PyEval_EvalFrameDefault + 46114
	21  python3                             0x00000001043211fe _PyEval_EvalCodeWithName + 414
	22  python3                             0x00000001041f1587 _PyFunction_FastCallDict + 231
	23  python3                             0x00000001041f54a2 method_call + 130
	24  python3                             0x00000001041f2ef2 PyObject_Call + 130
	25  python3                             0x000000010432da9d _PyEval_EvalFrameDefault + 46701
	26  python3                             0x00000001043211fe _PyEval_EvalCodeWithName + 414
	27  python3                             0x00000001041f1587 _PyFunction_FastCallDict + 231
	28  python3                             0x000000010432da9d _PyEval_EvalFrameDefault + 46701
	29  python3                             0x00000001043211fe _PyEval_EvalCodeWithName + 414
	30  python3                             0x00000001041f2783 _PyFunction_FastCallKeywords + 195
	31  python3                             0x000000010432fb27 call_function + 183
	32  python3                             0x000000010432d946 _PyEval_EvalFrameDefault + 46358
	33  python3                             0x00000001043211fe _PyEval_EvalCodeWithName + 414
	34  python3                             0x00000001041f2783 _PyFunction_FastCallKeywords + 195
	35  python3                             0x000000010432fb27 call_function + 183
	36  python3                             0x000000010432d88d _PyEval_EvalFrameDefault + 46173
	37  python3                             0x00000001043211fe _PyEval_EvalCodeWithName + 414
	38  python3                             0x00000001041f2783 _PyFunction_FastCallKeywords + 195
	39  python3                             0x000000010432fb27 call_function + 183
	40  python3                             0x000000010432d88d _PyEval_EvalFrameDefault + 46173
	41  python3                             0x00000001043211fe _PyEval_EvalCodeWithName + 414
	42  python3                             0x0000000104384760 PyRun_FileExFlags + 256
	43  python3                             0x0000000104383bd7 PyRun_SimpleFileExFlags + 391
	44  python3                             0x00000001043b17bf pymain_main + 9583
	45  python3                             0x00000001041c4bbd main + 125
	46  libdyld.dylib                       0x00007fff668f8085 start + 1
)
libc++abi.dylib: terminating with uncaught exception of type NSException
Abort trap: 6

Expected outcome

retest should return 0.

Versions

Operating system: MacOS 10.14
Python version: 3.6 and 3.7
Recentrifuge version: 0.28.2 from pip, or local version from git
All other packages:

$ conda list
# packages in environment at /Users/benkaehler/miniconda3/envs/recentrifuge:
#
# Name                    Version                   Build  Channel
biopython                 1.72             py37h6440ff4_0  
blas                      1.0                         mkl  
ca-certificates           2019.1.23                     0  
certifi                   2018.11.29               py37_0  
cycler                    0.10.0                   py37_0  
et_xmlfile                1.0.1                    py37_0  
freetype                  2.9.1                hb4e5f40_0  
intel-openmp              2019.1                      144  
jdcal                     1.4                      py37_0  
kiwisolver                1.0.1            py37h0a44026_0  
libcxx                    4.0.1                hcfea43d_1  
libcxxabi                 4.0.1                hcfea43d_1  
libedit                   3.1.20181209         hb402a30_0  
libffi                    3.2.1                h475c297_4  
libgfortran               3.0.1                h93005f0_2  
libpng                    1.6.36               ha441bb4_0  
matplotlib                3.0.2            py37h54f8f79_0  
mkl                       2019.1                      144  
mkl_fft                   1.0.10           py37h5e564d8_0  
mkl_random                1.0.2            py37h27c97d8_0  
ncurses                   6.1                  h0a44026_1  
numpy                     1.15.4           py37hacdab7b_0  
numpy-base                1.15.4           py37h6575580_0  
openpyxl                  2.5.14                     py_0  
openssl                   1.1.1a               h1de35cc_0  
pandas                    0.24.1           py37h0a44026_0  
pip                       19.0.1                   py37_0  
pyparsing                 2.3.1                    py37_0  
python                    3.7.2                haf84260_0  
python-dateutil           2.7.5                    py37_0  
pytz                      2018.9                   py37_0  
readline                  7.0                  h1de35cc_5  
setuptools                40.7.3                   py37_0  
six                       1.12.0                   py37_0  
sqlite                    3.26.0               ha441bb4_0  
tk                        8.6.8                ha441bb4_0  
tornado                   5.1.1            py37h1de35cc_0  
wheel                     0.32.3                   py37_0  
xlrd                      1.2.0                    py37_0  
xz                        5.2.4                h1de35cc_4  
zlib                      1.2.11               h1de35cc_3
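The stack trace above passes through `_tkinter` and `libtk8.6.dylib`, which suggests that a Tk GUI was being initialized when the crash happened, plausibly by matplotlib selecting a Tk-based backend. That diagnosis is an assumption. If it holds, forcing matplotlib's non-interactive `Agg` backend before it is first imported may be worth trying; a minimal sketch:

```python
import os

# Must be set before matplotlib is first imported in the process.
os.environ["MPLBACKEND"] = "Agg"

try:
    import matplotlib
    backend = matplotlib.get_backend()
except ImportError:
    # matplotlib is not installed in this environment.
    backend = None

print("matplotlib backend:", backend)
```

Setting the same variable in the shell before launching retest would have the same effect on any matplotlib import in that process, without touching the code.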

Excel output with empty columns

Hi there! Thank you very much for recentrifuge, the HTML output is incredibly useful! However, I have a problem with the Excel output. It has empty columns everywhere. Could you check it, please? Thanks a lot!

The last stable release of pandas is breaking retest

Bug report

Bug summary

CI build is failing: The last stable release of pandas is breaking retest.

Command line

retest -l -r -d -m -s

Actual outcome

After a minimal code update to avoid an (uncaught) exception:

(...)

>>> COMPARING ./rcf RESULTS WITH STANDARD:
>> TEST FOR _sample_stats... FAILED!
Test results are:
            ctrl1  ctrl2  ctrl3  smpl1  smpl2  smpl3  smpl4  smplH
Length max    200    200    200    200    200    200    200    200
Length max  200.0  200.0  200.0  200.0  200.0  200.0  200.0  200.0
... but standard results are:
               ctrl1     ctrl2     ctrl3  ...     smpl3     smpl4     smplH
Length max       200       200       200  ...       200       200       200
Seqs. read  100000.0  100000.0  100000.0  ...  100000.0  100000.0  600000.0

[1 rows x 8 columns]

Process finished with exit code 8

Expected outcome

(...)

Process finished with exit code 0

Versions

Pandas version: 0.25.0

CSV output

Thanks for Recentrifuge! It would be nice to have CSV output as an alternative to the Excel output. Any plans for this?

Using centrifuge premade (.cf) database

Hi-

I'm looking to use recentrifuge with an existing pre-built nt centrifuge database (i.e. files nt.1.cf, nt.2.cf, nt.3.cf, and nt.4.cf). From here you seem to point to being able to use this directly. Am I missing a step to get the .dmp files from this cf database?

Thanks,
Ben

More than 10 mins on a centrifuge result (21.3 MB)

Hi,

The latest version (v1.2.1) took more than 10 mins on a small result.
ion_test.fq.zip
The input was a centrifuge result as attached (21.3 MB after decompressing).

Python: Python 3.6.10 :: Anaconda, Inc.
Centrifuge: 1.0.4
Hardware: CPU 24C/48T, 128 GB RAM.

Thanks in advance for your help.

-k for multiple directories of .krk files?

Hi,
I have >500 samples to compare and >20 controls. To keep the script simple, I'm wondering whether I can put all my control .krk files in one directory and all my sample .krk files in another, and then simply point to the two directories with the -k option, with -c giving the number of files in the first -k directory. From the main page I see this can be done with real samples in a single directory, but can it be done with multiple directories and in combination with -c? I'm asking before testing because I don't want to waste time moving all the files if it is not possible.

original script (from main page)
rcf -n /my/tax/dir -k CTRL1.krk -k CTRL2.krk -k X1.krk -k X2.krk -k X3.krk -c 2 -o Xsamples.rcf.html -s KRAKEN -y 25 -x 9606

modified script (from main page)
rcf -n /my/tax/dir -k ./controls -k ./real_samples -c 2 -o Xsamples.rcf.html -s KRAKEN -y 25 -x 9606

cheers
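Whether -k accepts several directories together with -c is not confirmed in this thread. One way to sidestep the question is to expand both directories into explicit -k arguments, controls first, and derive -c from the number of control files. The sketch below assumes that layout; the function name, directory names and `.krk` glob are illustrative, and only flags that appear in the command quoted above are used.

```python
from pathlib import Path
from typing import List


def build_rcf_args(ctrl_dir: Path, smpl_dir: Path, taxdir: str,
                   out_html: str) -> List[str]:
    """List control files first, then samples, and set -c to the control count."""
    controls = sorted(ctrl_dir.glob("*.krk"))
    samples = sorted(smpl_dir.glob("*.krk"))
    if not controls or not samples:
        raise ValueError("need at least one control and one sample .krk file")
    args = ["rcf", "-n", taxdir]
    for path in controls + samples:
        args += ["-k", str(path)]
    args += ["-c", str(len(controls)), "-o", out_html,
             "-s", "KRAKEN", "-y", "25", "-x", "9606"]
    return args
```

The returned list can be passed to `subprocess.run(args, check=True)`. Nothing has to be renamed or moved, because the ordering is fixed by the script rather than by file names.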

interpreting log file for contamination removal

Hi @khyox,
I'm trying to interpret the log file so that I can remove contaminants manually based on taxonomic IDs. I would like to remove those that have been flagged as "critical". However, it is difficult to work out what these are from the attached log file. Searching for "critical" gives 18 hits at different taxonomic levels. Some of these critical hits are class-level taxa (e.g. Actinobacteria and Gammaproteobacteria), and I cannot imagine that these should be removed from the dataset; I'm guessing a lower taxonomic rank is actually being flagged, but that is not easy to see. Would a separate contamination output file, giving a simple result for interpretation and downstream processing, be a welcome addition?
Recentrifuge17_log.txt

regards

Contamination removal help and too large HTML files

Hi,
I'm running recentrifuge with kraken2 with the main goal being removal of contaminants and for this I have negative control(s). However, what I'm seeing in my results is that the "Exclusive" and "control" for the real samples are identical, so recentrifuge is removing everything "shared" between the negative and real sample, rather than removing some and reducing the signal for others, as I was expecting (and also what I see from the paper)?

I am running on samples separately and then on pooled samples. My commands are below. I am running the latest version of the software.

rcf -n /media/ubuntu/Elements/reference_genomes/Recentrifuge/taxdump -k 0_TRJE-N1_NEG.krk -k "$b".krk -c 1 -o ./FINAL/"$b"_after.html -d -s KRAKEN -x 9606 > ./FINAL/"$b"_after_log.txt

rcf -n /media/ubuntu/Elements/reference_genomes/Recentrifuge/taxdump \
-k /media/ubuntu/Elements/NEWPIPELINE_MetaAIR/RAW_DATA/clean_data_1_17b/RECENTRIFUGE \
-c 7 -o OUTPUT.html -d -s KRAKEN -x 9606 > log.txt &

FYI: I now see the second works better on pooled samples and the control and exclusive for the real samples are different. The problem however is that in some cases the html file is so large it is not possible to open?

regards

EXAMPLE.zip

Can command "rextract" support to extract reads from fasta files?

Thank you for this convenient software!
I have used this software to investigate the species taxonomy of assembled sequences. The Centrifuge command is:
"centrifuge -x $centrifuge_db -f 04.Assembly/trinity_denovo/Trinity.fasta
-p 6 -S tmp/centrifuge_fa_result
--report-file tmp/centrifuge_fa_report.tsv "

I successfully obtained the HTML taxonomy with Recentrifuge (the "rcf" command). However, I cannot extract the reads I want from the FASTA sequences using the "rextract" command. I tried many of the options it provides:
"rextract -n $recentrifuge_taxa -i 10239 -f tmp/centrifuge_fa_result
-1/-2/-q 04.Assembly/trinity_denovo/Trinity.fasta "

I read the description of this command, and it does not support extracting sequences from FASTA files. I am not familiar with Python scripting. Could you add this feature to the command? I hope you can address this request. Thanks again.
Zhenzhi Han

No sequence passed the filter error

Dear all

I ran rcf based on kraken2 outputs and encountered the following error:

Loading NCBI nodes... OK!
Loading NCBI names... OK!
Building dict of parent to children taxa... OK!

Please, wait, processing files in parallel...

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/mhyleung/workspace/anaconda3/envs/py36/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/mhyleung/workspace/anaconda3/envs/py36/lib/python3.6/site-packages/recentrifuge/taxclass.py", line 86, in process_output
    log, stat, counts, scores = read_method(target_file, scoring, minscore)
  File "/home/mhyleung/workspace/anaconda3/envs/py36/lib/python3.6/site-packages/recentrifuge/kraken.py", line 135, in read_kraken_output
    raise Exception(red('\nERROR! ') + 'No sequence passed the filter!')
Exception:
ERROR! No sequence passed the filter!
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/mhyleung/workspace/anaconda3/envs/py36/bin/rcf", line 812, in <module>
    main()
  File "/home/mhyleung/workspace/anaconda3/envs/py36/bin/rcf", line 771, in main
    read_samples()
  File "/home/mhyleung/workspace/anaconda3/envs/py36/bin/rcf", line 450, in read_samples
    input_files, [r.get() for r in async_results]):
  File "/home/mhyleung/workspace/anaconda3/envs/py36/bin/rcf", line 450, in <listcomp>
    input_files, [r.get() for r in async_results]):
  File "/home/mhyleung/workspace/anaconda3/envs/py36/lib/python3.6/multiprocessing/pool.py", line 670, in get
    raise self._value
Exception:
ERROR! No sequence passed the filter!

My command is

rcf -k control1.krk -k control2.krk -k control3.krk -k control4.krk -k sample1.krk -k sample2.krk -k /sample3.krk -k sample4.krk -c 4 -o rcf_output.html -s KRAKEN -y 25

Thank you so much

Regards

Marcus
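When no sequence passes the filter, a quick first check is whether the Kraken output contains any classified reads at all, before looking at the -y threshold. The sketch below relies only on the standard Kraken output layout, where the first tab-separated column is `C` (classified) or `U` (unclassified); the function name is hypothetical and the sketch does not reproduce how Recentrifuge scores reads.

```python
from collections import Counter
from typing import Iterable


def kraken_status_counts(lines: Iterable[str]) -> Counter:
    """Count 'C' (classified) and 'U' (unclassified) records by first column."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0] in ("C", "U"):
            counts[fields[0]] += 1
    return counts
```

Called as `kraken_status_counts(open("control1.krk"))`, a result with zero `C` records would point at the classification step rather than at the minscore filter.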

KeyError: 'STRAIN'

Hi, can you help me with this issue?

Bug report

=-= /home/abomba/miniconda3/bin/rcf =-= v1.0.3 - May 2020 =-= by Jose Manuel Martí =-=

Loading NCBI nodes...Traceback (most recent call last):
  File "/home/abomba/miniconda3/lib/python3.6/site-packages/recentrifuge/taxonomy.py", line 79, in read_nodes
    rank = Rank[_rank.upper().replace(" ", "_")]
  File "/home/abomba/miniconda3/lib/python3.6/enum.py", line 329, in __getitem__
    return cls._member_map_[name]
KeyError: 'STRAIN'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/abomba/miniconda3/bin/rcf", line 813, in <module>
    main()
  File "/home/abomba/miniconda3/bin/rcf", line 737, in main
    collapse, excluding, including, args.debug)
  File "/home/abomba/miniconda3/lib/python3.6/site-packages/recentrifuge/taxonomy.py", line 40, in __init__
    self.read_nodes(nodes_file)
  File "/home/abomba/miniconda3/lib/python3.6/site-packages/recentrifuge/taxonomy.py", line 82, in read_nodes
    f'Unknown tax level {_rank}')
recentrifuge.rank.UnsupportedTaxLevelError: Unknown tax level strain

Command line

> rcf -f s.tsv -n ~/miniconda3/bin/taxdump

Data

Versions

Operating system: Ubuntu 16.04
Python version: 3.6
Recentrifuge version: rcf version 1.0.3 released in May 2020

pandas and Excel output

Hi Jose,
I'm running recentrifuge in a virtual environment created as follows:

conda create -n py36 python=3.6 openpyxl biopython pandas

However, the recentrifuge program throws an error at the final step where it would generate an Excel file because it does not recognize Pandas as being installed.

The error:

Building the taxonomy multiple tree... OK!
Generating final plot (/mnt/lustre/macmaneslab/devon/pore604/centrifuge/pore604_results_unclassified.rcf.html)... OK!
WARNING! Pandas not installed: Excel cannot be created.

This is odd to me because that virtual environment suggests pandas is indeed present:

(py36) [devon@premise]$ pip freeze
biopython==1.70
certifi==2018.4.16
et-xmlfile==1.0.1
jdcal==1.4
mmtf-python==1.1.0
msgpack==0.5.6
numpy==1.12.1
olefile==0.45.1
ont-fast5-api==0.4.1
openpyxl==2.4.0b1
pandas==0.22.0
Pillow==4.2.1
progressbar33==2.4
python-dateutil==2.7.2
pytz==2018.4
reportlab==3.4.0
six==1.11.0

I thought maybe pandas was too new, so I downgraded to an older version (0.20.3) and the issue is the same. Do you have any suggestions on how to troubleshoot further?

Thanks!

Versions

Operating system: x86_64 GNU/Linux; Scientific Linux release 7.3 (Nitrogen)
Python version: Python 3.6.5
Recentrifuge version: v0.18.6
Centrifuge or LMAT version used to generate the input data: 1.0.3-beta
Pandas version (if applicable): 0.20.3
Other libraries (if applicable): see above

The instructions for installing are extremely unclear

The instructions for how to install recentrifuge must be leaving a lot out.

I ran python3.8 -m pip install recentrifuge, then I installed all the dependencies the same way.
But rcf is not a command my computer can find.
What installation step is missing?
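A common cause of this symptom is that pip placed the `rcf` script in a directory that is not on `PATH`; whether that is what happened here is an assumption. The sketch below, run with the same interpreter that was used for the install (python3.8 in this report), prints where console scripts usually land so they can be compared with `PATH`.

```python
import os
import shutil
import site
import sysconfig

found = shutil.which("rcf")  # None if rcf is not on PATH
candidates = [
    sysconfig.get_path("scripts"),            # scripts directory of this interpreter
    os.path.join(site.getuserbase(), "bin"),  # usual `pip install --user` location on Linux/macOS
]

print("rcf on PATH:", found)
path_dirs = os.environ.get("PATH", "").split(os.pathsep)
for directory in candidates:
    print(f"{directory}  (on PATH: {directory in path_dirs})")
```

If `rcf` exists in one of the printed directories but that directory is reported as not on `PATH`, adding it to `PATH` or calling the script by its full path should be enough.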

Recentrifuge fails because of new NCBI taxonomy rank: subcohort

Bug report

Bug summary

Recentrifuge fails while loading NCBI data from recent versions of the NCBI taxdump files. A recent change (in the last few weeks) in the NCBI taxdump files has triggered this error. The problem is the inclusion of a new taxonomic rank in NCBI Taxonomy: subcohort.

Actual outcome

Loading NCBI nodes...Traceback (most recent call last):
  File "/software/Python/Python-3.7.1/lib/python3.7/site-packages/recentrifuge/taxonomy.py", line 79, in read_nodes
    rank = Rank[_rank.upper().replace(" ", "_")]
  File "/software/Python/Python-3.7.1/lib/python3.7/enum.py", line 351, in __getitem__
    return cls._member_map_[name]
KeyError: 'SUBCOHORT'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/software/Python/Python-3.7.1/bin/rcf", line 812, in <module>
    main()
  File "/software/Python/Python-3.7.1/bin/rcf", line 736, in main
    collapse, excluding, including, args.debug)
  File "/software/Python/Python-3.7.1/lib/python3.7/site-packages/recentrifuge/taxonomy.py", line 40, in __init__
    self.read_nodes(nodes_file)
  File "/software/Python/Python-3.7.1/lib/python3.7/site-packages/recentrifuge/taxonomy.py", line 82, in read_nodes
    f'Unknown tax level {_rank}')
recentrifuge.rank.UnsupportedTaxLevelError: Unknown tax level subcohort

Expected outcome

Loading NCBI nodes... OK!
Loading NCBI names... OK!

Workaround

While I solve the issue, you can use a previous version of the NCBI taxonomy files from January. You can download it from HERE.

Acknowledgements

Thanks to Shashi Kanth from Theranosis for reporting this issue through email.
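The traceback shows the failure pattern: a rank string from nodes.dmp is upper-cased and looked up by name in an enumeration, so any rank that NCBI adds later raises `KeyError`. One defensive alternative is to fall back to a generic member for unknown ranks. The sketch below uses a tiny stand-in enumeration and a hypothetical `parse_rank` helper; it is not Recentrifuge's `Rank` class and not necessarily how the issue was fixed.

```python
from enum import Enum


class Rank(Enum):
    """Tiny stand-in for a rank enumeration; not Recentrifuge's real class."""
    SPECIES = 1
    GENUS = 2
    NO_RANK = 99


def parse_rank(raw: str) -> Rank:
    """Map an NCBI rank string to Rank, falling back to NO_RANK if unknown."""
    key = raw.strip().upper().replace(" ", "_")
    try:
        return Rank[key]
    except KeyError:
        # Unknown rank (e.g. one added to the taxonomy later): degrade gracefully.
        return Rank.NO_RANK


print(parse_rank("species"), parse_rank("subcohort"))
```

The trade-off is that a silent fallback hides new ranks instead of reporting them, so logging a warning in the `except` branch may be preferable in practice.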

abundance table as output

Hi @khyox,

Which options do I need to use to get an abundance table as output, so that I can process it downstream in another program (e.g. R)? For example, an output similar to the kraken2 report (attached), but after contamination removal. Put another way: what input is used to build the graphical HTML files?

regards
SL342519_REPORT.Kraken2.txt

(question) produced output

Hi @khyox,

I used the tool with Kraken files. After installing it, I ran the following commands in my current directory:

$ retaxdump

$ rcf -k CRTL1.kraken -k abudl_testextract_barcode26.kraken -c 1 -o Xsamples.rcf.html -s KRAKEN -y 25 -x 9606

From the output and the instructions, I can see (in the linked chart) that the confidence level increased, but I have some questions.

Does this mean that any organism present in the control will be discarded from the samples? What if a sample genuinely contains a discarded organism (not from contamination)? In the Excel file that is produced, is there a way to see which organisms were selected for filtering?

I also see that the output can be CSV, FULL, etc. Can I produce a Kraken report.txt file for each sample as well?

Thank you for the help

unclassified reads in rextract

Hi, thanks for the tool. I have a question regarding the usage of rextract.

I have the classification result from centrifuge and want to use rextract to remove some contaminant reads (mainly bacteria) from my dataset so that I can do de novo genome assembly for my species (Eukaryota). It seems that neither option "-i" nor "-x" includes any of the unclassified reads in the extraction. Is there a reason for this? Does it make more sense to exclude the unclassified reads for de novo assembly? Thank you.

rextract: read retrieval failed

Bug report

rextract fails with "this object should be subclassed" error when trying to extract reads of oomycetes

Running Recentrifuge

Command line

scratch/software/miniconda3/bin/rextract \
 -f /tmp/annew/centrifuge_Pf_Sb3.results \
 -i 4762 \
 -n /scratch/public_data/nt-centrifuge_12jan2021 \
 -1 /tmp/annew/al-conc-mate.1.fastq \
 -2 /tmp/annew/al-conc-mate.2.fastq

Data

-1 and -2 are fastq files as output by centrifuge v 1.0.4, these are all those reads which could be classified
-f is the centrifuge output produced by the centrifuge run (not the summary report)

-n is the directory where the taxdump files live

Actual outcome

# stdout:
=-= /scratch/software/miniconda3/bin/rextract =-= v1.3.1 - Jan 2021 =-= by Jose Manuel Martí =-=

Loading NCBI nodes... OK! 
Loading NCBI names... OK! 
Building dict of parent to children taxa... OK! 
List of taxa (and below) to be explicitly included:
                Id      Scientific Name
                4762    Oomycota
Building taxonomy tree... OK!
Filtering taxa... OK!
  3383 taxid selected in 13 different taxonomical levels:
  Number of different PHYLUM: 1
  Number of different ORDER: 11
  Number of different FAMILY: 19
  Number of different GENUS: 81
  Number of different SPECIES_GROUP: 1
  Number of different SPECIES: 3124
  Number of different SUBSPECIES: 3
  Number of different FORMA_SPECIALIS: 7
  Number of different VARIETY: 18
  Number of different FORMA: 6
  Number of different STRAIN: 40
  Number of different ISOLATE: 3
  Number of different NO_RANK: 69
Loading output file /tmp/annew/centrifuge_Pf_Sb3.results... OK!
  Load elapsed time: 868 sec
  Matching reads: 11_344_941        (10.2115% of sample)
Loading FASTQ files /tmp/annew/al-conc-mate.1.fastq and /tmp/annew/al-conc-mate.2.fastq...
Mseqs: 0.........1.........2.........3.........4.........5.........6.........7.........8.........9.........10.........11.........12.........13.........14.........15.........16.........17.........18.........19.........20.........21.........22.........23.........24.........25.........26.........27.........28.........29.........30.........31.........32.........33.........34.........35.........36.........37.........38.........39.........40.........41.........42.........43.........44.........45.........46.........47.........48.........49.........50.........51.........52.........53.........54.........55.........56.........57.........58.........59.........60.........61.........62.........63.........64.........65.........66.........67.........68.........69.........70.........71.........72.........73.........74.........75.........76. 76.2 Mseqs OK! 


# Stderr:
Traceback (most recent call last):
  File "/scratch/software/miniconda3/bin/rextract", line 347, in <module>
    main()
  File "/scratch/software/miniconda3/bin/rextract", line 333, in main
    SeqIO.write(seqs1, filename1, 'quickfastq')
  File "/scratch/software/miniconda3/lib/python3.7/site-packages/Bio/SeqIO/__init__.py", line 561, in write
    count = writer_class(fp).write_file(sequences)
  File "/scratch/software/miniconda3/lib/python3.7/site-packages/Bio/SeqIO/Interfaces.py", line 139, in write_file
    raise NotImplementedError("This object should be subclassed")
NotImplementedError: This object should be subclassed

Expected outcome

I expected two fastq files with paired reads identified as taxon 4762 or members of that order (all oomycetes).
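
To make the expected outcome concrete, here is a minimal pure-Python sketch of selecting read pairs from two mate files by read ID. It is not rextract's implementation and does not use Biopython. It assumes well-formed 4-line FASTQ records, mates stored in the same order in both files, and that the set of wanted read IDs has already been derived from the classifier output; all function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Set, Tuple

# One FASTQ record: header, sequence, separator, quality
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from an iterable of text lines."""
    block: List[str] = []
    for line in lines:
        block.append(line.rstrip("\n"))
        if len(block) == 4:
            yield (block[0], block[1], block[2], block[3])
            block = []


def read_id(header: str) -> str:
    """Return the read ID: the header without '@' and without description."""
    return header[1:].split()[0]


def select_pairs(
    mate1: Iterable[str], mate2: Iterable[str], wanted: Set[str]
) -> Tuple[List[FastqRecord], List[FastqRecord]]:
    """Keep the pairs whose first-mate read ID is in `wanted`."""
    out1: List[FastqRecord] = []
    out2: List[FastqRecord] = []
    for rec1, rec2 in zip(read_fastq(mate1), read_fastq(mate2)):
        if read_id(rec1[0]) in wanted:
            out1.append(rec1)
            out2.append(rec2)
    return out1, out2
```

With real data, `mate1` and `mate2` would be open file handles for the two FASTQ files, and each selected record would be written back out as four lines.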

Versions

Operating system: debian stretch
Python version: Python 3.7.4
Recentrifuge version: rextract release 1.3.1
Release of Centrifuge, LMAT, CLARK, Kraken, etc. (version used to generate the input data): Centrifuge v 1.0.4
Pandas version (if applicable):
Other libraries (if applicable):

rextract: add option to get the unclassified reads

Suggested in #27 as a spin-off.

retest error: "remock: error: one of the arguments -m/--mock is required"

Bug report

Bug summary

I am getting the following error when running retest:

And when I am running remock:

Running Recentrifuge

Command line

> retest
> remock -m recentrifuge/test/ -r 35 -d

Data

Format of input data (Centrifuge, LMAT, CLARK, Kraken, or other —generic interface—)?
If the generic interface was used, please provide some lines selected from the data files as a sample of the format:
Control samples (if any):
Regular samples:

Actual outcome

# If applicable, paste the console output here
#
#

Expected outcome

Versions

Operating system:
Python version:
Recentrifuge version:
Release of Centrifuge, LMAT, CLARK, Kraken, etc. (version used to generate the input data):
Pandas version (if applicable):
Other libraries (if applicable):

export PDF from recentrifuge results?

Recentrifuge is a great help for analyzing our metagenomic data, but I have two questions. First, can I directly export a PDF for publication from the Recentrifuge results? If that is not available, could you add this feature? Thank you so much!
Second, I analyzed the Centrifuge results with both Recentrifuge and Pavian, and the numbers of raw reads differ between the two; as a consequence, the species percentages differ too. Could you explain where this difference comes from?

centrifuge db

Hi,

How can I use with recentrifuge the same libraries/db that I already downloaded for centrifuge?

Thanks!

rextract: ZeroDivisionError

Hi @khyox ,
I was trying to run rextract.py with a sample dataset, but it terminates before completing and throws the following error:

Traceback (most recent call last):
  File "/home/galaxy/.local/bin/rextract", line 347, in <module>
    main()
  File "/home/galaxy/.local/bin/rextract", line 241, in main
    print(f' \033[90mMatching reads: \033[0m{len(records):_d} \033[90m\t'
ZeroDivisionError: division by zero

I observe that the number of records and the number of sequences are both zero for some taxon IDs. I think there should be a check for this condition so that the program exits gracefully at that point.
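
The kind of check suggested above could look like the following minimal sketch. It assumes the percentage is computed as matching reads over total reads loaded; `matching_summary` is a hypothetical helper, not a function from rextract.

```python
def matching_summary(matching: int, total: int) -> str:
    """Build the 'matching reads' message without dividing by zero."""
    if total == 0:
        # Nothing was parsed from the classifier output, e.g. wrong file type
        return "No reads found in the classifier output; nothing to extract."
    if matching == 0:
        return f"No reads match the selected taxa (0 of {total:_d})."
    return f"Matching reads: {matching:_d} ({matching / total:.4%} of sample)"


print(matching_summary(0, 0))
print(matching_summary(1, 4))  # Matching reads: 1 (25.0000% of sample)
```

A caller could print the message and exit with a non-zero status in the first two cases instead of continuing to the FASTQ loading step.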

ImportError: cannot import name 'SequentialSequenceWriter' from 'Bio.SeqIO.Interfaces'

Hello,

I could not run recentrifuge after installing it with conda.
It fails in all of these cases:

fresh install: mamba create -n rcf -c bioconda recentrifuge=1.10.0
update into an env: mamba install -c bioconda recentrifuge=1.10.0
install into an env: mamba install -c bioconda recentrifuge

This is the error:

rcf --help

Traceback (most recent call last):
  File "/home/pierre/miniconda3/envs/rcf/bin/rcf", line 33, in <module>
    from recentrifuge import __version__, __author__, __date__
  File "/home/pierre/miniconda3/envs/rcf/lib/python3.8/site-packages/recentrifuge/__init__.py", line 41, in <module>
    from . import lmat_io  # LMAT support
  File "/home/pierre/miniconda3/envs/rcf/lib/python3.8/site-packages/recentrifuge/lmat_io.py", line 11, in <module>
    from Bio.SeqIO.Interfaces import SequentialSequenceWriter
ImportError: cannot import name 'SequentialSequenceWriter' from 'Bio.SeqIO.Interfaces' (/home/pierre/miniconda3/envs/rcf/lib/python3.8/site-packages/Bio/SeqIO/Interfaces.py)

Bug Report: PermissionError: [Errno 13] Permission denied

Here is a bug report for recentrifuge.
The error is listed below:
-------->
=-= /home/Software/Miniconda3/envs/poliolab-ngs-env/bin/rcf =-= v1.1.0 - Jun 2020 =-= by Jose Manuel Martí =-=

Loading NCBI nodes... OK!
Loading NCBI names... OK!
Building dict of parent to children taxa... OK!

Please, wait, processing files in parallel...

Traceback (most recent call last):
  File "/home/Software/Miniconda3/envs/poliolab-ngs-env/bin/rcf", line 836, in <module>
    main()
  File "/home/Software/Miniconda3/envs/poliolab-ngs-env/bin/rcf", line 795, in main
    read_samples()
  File "/home/Software/Miniconda3/envs/poliolab-ngs-env/bin/rcf", line 461, in read_samples
    len(input_files))) as pool:
  File "/home/Software/Miniconda3/envs/poliolab-ngs-env/lib/python3.6/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "/home/Software/Miniconda3/envs/poliolab-ngs-env/lib/python3.6/multiprocessing/pool.py", line 156, in __init__
    self._setup_queues()
  File "/home/Software/Miniconda3/envs/poliolab-ngs-env/lib/python3.6/multiprocessing/pool.py", line 249, in _setup_queues
    self._inqueue = self._ctx.SimpleQueue()
  File "/home/Software/Miniconda3/envs/poliolab-ngs-env/lib/python3.6/multiprocessing/context.py", line 112, in SimpleQueue
    return SimpleQueue(ctx=self.get_context())
  File "/home/Software/Miniconda3/envs/poliolab-ngs-env/lib/python3.6/multiprocessing/queues.py", line 315, in __init__
    self._rlock = ctx.Lock()
  File "/home/Software/Miniconda3/envs/poliolab-ngs-env/lib/python3.6/multiprocessing/context.py", line 67, in Lock
    return Lock(ctx=self.get_context())
  File "/home/Software/Miniconda3/envs/poliolab-ngs-env/lib/python3.6/multiprocessing/synchronize.py", line 162, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/home/Software/Miniconda3/envs/poliolab-ngs-env/lib/python3.6/multiprocessing/synchronize.py", line 59, in __init__
    unlink_now)
PermissionError: [Errno 13] Permission denied
----------->

The command line is:
-------->
rcf -n /Database/taxdump/ -k 03.Reads.classify/kraken2_classification_result -e FULL -o 03.Reads.classify/classification.kraken2.html
--------->

I have tested versions v1.1.0 and v1.1.1, and both show this error. Can you provide a solution?
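
The traceback ends inside the standard library while creating a multiprocessing lock, before any Recentrifuge file is read, so the problem can be reproduced without rcf. A minimal check is sketched below. On Linux, one commonly reported cause is a shared-memory mount (such as /dev/shm) that the current user cannot write to; that is an assumption here, not something this traceback confirms.

```python
import multiprocessing


def can_create_lock() -> bool:
    """Return True if this environment can create a multiprocessing lock."""
    try:
        multiprocessing.Lock()
    except (OSError, ImportError):
        # PermissionError is a subclass of OSError; ImportError is raised
        # on platforms that lack a working semaphore implementation.
        return False
    return True


print("multiprocessing locks available:", can_create_lock())
```

If this prints False in the same environment where rcf fails, the fix lies in the system configuration rather than in the rcf command line.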

definition of contaminant level for removal?

Hi,
Is it possible to get some guidelines for the robust contamination removal? At what level should taxa be removed, and where should this be considered? For example, "critical" should be removed, but what about taxa identified as "severe" or "mild": where is the line drawn? Or would you define a cut-off using the score value in the Excel file? Also, for the HTML output, what is actually removed from the final plot, and at which level? Is it possible to control this as a user? Say I want to keep "mild" in the final plot.

I ask because, with default settings, I am getting 40 taxa identified as "severe" contaminants.

regards

ZeroDivisionError

Bug report

Hello @khyox,
This is the same error as post #22. I believe I am using the correct centrifuge file.

Bug summary

ZeroDivisionError: division by zero when using rextract with centrifuge output file.

Running Centrifuge/Recentrifuge

Command line - centrifuge

>centrifuge --verbose -p 22 -x /home/Data1/Centrifuge_index/centrifuge-abv-univec-bp -U /home/Desktop/seq_analysis/results_162853/combined/barcode_07.fastq > /home/Desktop/seq_analysis/results_162853/centrifuge/barcode_07/barcode_07.out

Command line - rextract

>rextract -f /home/Desktop/seq_analysis/results_162853/centrifuge/barcode_07/barcode_07.out -n /home/Data1/Centrifuge_Index/taxonomy -i 621 -q /home/Desktop/seq_analysis/results_162853/combined/barcode_07.fastq

Data

Format of input data (Centrifuge, LMAT, CLARK, Kraken, or other —generic interface—)? Centrifuge

Actual outcome

Centrifuge output file:

head -n 20 /home/Desktop/seq_analysis/results_162853/centrifuge/barcode_07/barcode_07.out
Input bt2 file: "/home/Data1/Centrifuge_Index/centrifuge-abv-univec-bp"
Query inputs (DNA, FASTQ):
  /home/Desktop/seq_analysis/results_162853/combined/barcode_07.fastq
Quality inputs:
Output file: ""
Local endianness: little
Sanity checking: disabled
Assertions: disabled
Trying /home/Data1/Centrifuge_Index/centrifuge-abv-univec-bp
readID	seqID	taxID	score	2ndBestScore	hitLength	queryLength	numMatches
d03898bd-1221-486a-a664-7b1d07c8e9c0	CP049598.1	1406	81	81	24	530	2
d03898bd-1221-486a-a664-7b1d07c8e9c0	CP049783.1	1406	81	81	24	530	2
a35cca09-8af6-4ead-9c4e-be8ec5c77b8d	genus	561	121	121	26	686	5
a35cca09-8af6-4ead-9c4e-be8ec5c77b8d	species	32630	121	121	26	686	5
a35cca09-8af6-4ead-9c4e-be8ec5c77b8d	species	703	121	121	26	686	5
a35cca09-8af6-4ead-9c4e-be8ec5c77b8d	genus	186777	121	121	26	686	5
a35cca09-8af6-4ead-9c4e-be8ec5c77b8d	genus	620	121	121	26	686	5
14f82998-810e-49ee-bd85-2006c73d5f03	species	1423	4225	0	121	849	1
2f6f45b8-b733-4fa7-9e5a-1de37b12f042	CP057475.1	562	289	0	32	2301	1
443fd5a8-f51e-4152-a3cb-ca370335bdb2	CP013187.1	161398	361	121	34	3941	1

rextract output:

 =-= /home/miniconda3/envs/recentrifuge/bin/rextract =-= v1.3.3 - May 2021 =-= by Jose Manuel Martí =-=

Loading NCBI nodes... OK! 
Loading NCBI names... OK! 
Building dict of parent to children taxa... OK! 
List of taxa (and below) to be explicitly included:
		Id	Scientific Name
		621	Shigella boydii
Building taxonomy tree... OK!
Filtering taxa... OK!
  15 taxid selected in 2 different taxonomical levels:
  Number of different SPECIES: 1
  Number of different NO_RANK: 14
Loading output file /home/Desktop/seq_analysis/results_162853/centrifuge/barcode_07/barcode_07.out... OK!
  Load elapsed time: 0.00303 sec
Traceback (most recent call last):
  File "/home/miniconda3/envs/recentrifuge/bin/rextract", line 347, in <module>
    main()
  File "/home/miniconda3/envs/recentrifuge/bin/rextract", line 241, in main
    print(f'  \033[90mMatching reads: \033[0m{len(records):_d} \033[90m\t'
ZeroDivisionError: division by zero

Expected outcome

Versions

Operating system: Ubuntu 20.04
Python version: 3.6.13
Recentrifuge version: 1.3.3
Release of Centrifuge, LMAT, CLARK, Kraken, etc. (version used to generate the input data): Centrifuge v1.0.4
Pandas version (if applicable): 1.1.5
Other libraries (if applicable):

min() arg is an empty sequence [error]

Hi,

I tried to run recentrifuge after reinstalling and got the following error. How can I resolve this problem?

Thanks,
Chris

$ rcf -n /nfs/data/bsd/je0/taxdump -k /nfs/data/bsd/je0/schadt/kraken2/trimmed/all_krk/ -c 2 -o /nfs/data/bsd/je0/schadt/recentrifuge/trimmed/all_quality_trimmed_seqs.html -s KRAKEN -y 25

=-= /nfs/data/bsd/je0/miniconda3/bin/rcf =-= v0.28.14 - December 2019 =-= by Jose Manuel Martí =-=

Kraken .krk files to analyze: ['/nfs/data/bsd/je0/schadt/kraken2/trimmed/all_krk/P1-S1.krk', '/nfs/data/bsd/je0/schadt/kraken2/trimmed/all_krk/P2-S2.krk']
Control(s) sample(s) for subtractions:
/nfs/data/bsd/je0/schadt/kraken2/trimmed/all_krk/P1-S1.krk
/nfs/data/bsd/je0/schadt/kraken2/trimmed/all_krk/P2-S2.krk
Loading NCBI nodes... OK!
Loading NCBI names... OK!
Building dict of parent to children taxa... OK!

Please, wait, processing files in parallel...

Loading output file /nfs/data/bsd/je0/schadt/kraken2/trimmed/all_krk/P1-S1.krk... OK!
Seqs read: 11_788 [5.41 Mnt]
Seqs clas: 2_197 (81.36% unclassified)
Seqs pass: 48 (97.82% rejected)
Scores SHEL: min = 102.0, max = 589.0, avr = 197.6
Coverage(%): min = 26.3, max = 100.0, avr = 47.8
Read length: min = 261 nt, max = 783 nt, avr = 364 nt
TaxIds: by classifier = 529, by filter = 22
Building from raw data with mintaxa = 2 ...
Check for more seqs lost ([in/ex]clude affects)... OK!
/nfs/data/bsd/je0/schadt/kraken2/trimmed/all_krk/P1-S1 ctrl OK!
Load elapsed time: 0.114 sec

Loading output file /nfs/data/bsd/je0/schadt/kraken2/trimmed/all_krk/P2-S2.krk... OK!
Seqs read: 11_915 [5.45 Mnt]
Seqs clas: 2_241 (81.19% unclassified)
Seqs pass: 52 (97.68% rejected)
Scores SHEL: min = 97.0, max = 623.0, avr = 232.1
Coverage(%): min = 25.0, max = 100.0, avr = 57.8
Read length: min = 266 nt, max = 622 nt, avr = 365 nt
TaxIds: by classifier = 545, by filter = 20
Building from raw data with mintaxa = 2 ...
Check for more seqs lost ([in/ex]clude affects)... OK!
/nfs/data/bsd/je0/schadt/kraken2/trimmed/all_krk/P2-S2 ctrl OK!
Load elapsed time: 0.125 sec

Please, wait. Performing cross analysis in parallel...

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/nfs/data/bsd/je0/miniconda3/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/nfs/data/bsd/je0/miniconda3/lib/python3.8/site-packages/recentrifuge/core.py", line 393, in process_rank
    shared_analysis()
  File "/nfs/data/bsd/je0/miniconda3/lib/python3.8/site-packages/recentrifuge/core.py", line 172, in shared_analysis
    min_taxa=get_shared_mintaxa(),
  File "/nfs/data/bsd/je0/miniconda3/lib/python3.8/site-packages/recentrifuge/core.py", line 72, in get_shared_mintaxa
    return min([mintaxas[smpl] for smpl in raws[controls:]])
ValueError: min() arg is an empty sequence
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/nfs/data/bsd/je0/miniconda3/bin/rcf", line 813, in <module>
    main()
  File "/nfs/data/bsd/je0/miniconda3/bin/rcf", line 782, in main
    analyze_samples()
  File "/nfs/data/bsd/je0/miniconda3/bin/rcf", line 495, in analyze_samples
    [r.get() for r in async_results]):
  File "/nfs/data/bsd/je0/miniconda3/bin/rcf", line 495, in <listcomp>
    [r.get() for r in async_results]):
  File "/nfs/data/bsd/je0/miniconda3/lib/python3.8/multiprocessing/pool.py", line 768, in get
    raise self._value
ValueError: min() arg is an empty sequence
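
In the run above, the command passes -c 2 while only two .krk files are found, and the log lists both files as controls. The slice `raws[controls:]` in the traceback is therefore empty, which is what `min()` complains about. The sketch below shows the same computation with an explicit check. It assumes, as the slice suggests, that `raws` lists control samples first; the function is a hypothetical rewrite, not the code in core.py.

```python
from typing import Dict, List


def shared_mintaxa(mintaxas: Dict[str, int], raws: List[str], controls: int) -> int:
    """Smallest mintaxa among the non-control samples.

    Assumes `raws` lists the control samples first, then regular samples.
    """
    samples = raws[controls:]
    if not samples:
        raise ValueError(
            f"{controls} control(s) requested but only {len(raws)} sample(s) "
            "given: at least one non-control sample is needed."
        )
    return min(mintaxas[smpl] for smpl in samples)


# Two files and one control leaves one regular sample to analyze
print(shared_mintaxa({"P1-S1": 2, "P2-S2": 2}, ["P1-S1", "P2-S2"], controls=1))
```

With controls=2 and the same two files, this sketch raises a ValueError that names the actual problem instead of the generic min() message.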

ZeroDivisionError/Centrifuge

Bug report

Hi @khyox,

Thanks for the nice format, this is related to post #18

Bug summary

ZeroDivisionError: division by zero

How I got here

Command line (running centrifuge)

> centrifuge -q -x /home/Staff/uqgni1/tools/centrifuge/hvc -U Run02_filtered.fastq -p 16 --report-file centrifuge-hvc.txt

Command line (running rextract)

> rextract -f centrifuge-hvc.txt -i 694009 -q Run02_filtered.fastq -n ~/miniconda2/envs/recentrifuge/bin/taxdump/

centrifuge output (centrifuge-hvc.txt)

head centrifuge-hvc.txt
name	taxID	taxRank	genomeSize	numReads	numUniqueReads	abundance
Homo sapiens	9606	species	3272089205	11164	8884	0.0
Human alphaherpesvirus 2	10310	species	154675	1	1	0.0
Cercopithecine alphaherpesvirus 2	10317	species	150715	1	0	0.0
Bovine alphaherpesvirus 1	10320	species	135301	8	0	0.0
Suid alphaherpesvirus 1	10345	species	143461	1	1	0.0
Murid betaherpesvirus 1	10366	species	230278	1	1	0.0
Tupaiid betaherpesvirus 1	10397	species	195859	1	1	0.0
Ovine gammaherpesvirus 2	10398	species	135135	1	1	0.0
Human adenovirus 2	10515	leaf	35937	11	0	0.0

Error message outcome (Slurm system)

Loading NCBI nodes... OK!
Loading NCBI names... OK!
Building dict of parent to children taxa... OK!
List of taxa (and below) to be explicitly included:
		Id	Scientific Name
		694009	Severe acute respiratory syndrome-related coronavirus
Building taxonomy tree... OK!
Filtering taxa... OK!
  261 taxid selected in 2 different taxonomical levels:
  Number of different SPECIES: 1
  Number of different NO_RANK: 260
Loading output file centrifuge-hvc.txt... OK!
  Load elapsed time: 0.0314 sec
Traceback (most recent call last):
  File "/home/Staff/uqgni1/miniconda2/envs/recentrifuge/bin/rextract", line 347, in <module>
    main()
  File "/home/Staff/uqgni1/miniconda2/envs/recentrifuge/bin/rextract", line 241, in main
    print(f'  \033[90mMatching reads: \033[0m{len(records):_d} \033[90m\t'
ZeroDivisionError: division by zero

refasplit output files with non-padded zeros

Bug report

Bug summary

refasplit output files are saved without zero-padding.

Running Recentrifuge

Command line

Any run of refasplit with more than 10 output files, e.g:

refasplit -d -i refasplit_test.fa.gz -o ~/tmp/refasplit_ -n 256 --compress

Data

Not relevant

Actual outcome

For 128 files:

refasplit_0.fa.gz
(...)
refasplit_9.fa.gz
refasplit_10.fa.gz
(...)
refasplit_99.fa.gz
refasplit_100.fa.gz
(...)
refasplit_128.fa.gz

Expected outcome

refasplit_000.fa.gz
(...)
refasplit_009.fa.gz
refasplit_010.fa.gz
(...)
refasplit_099.fa.gz
refasplit_100.fa.gz
(...)
refasplit_128.fa.gz
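
A minimal sketch of the zero-padded naming described in the expected outcome: the pad width is derived from the largest index so that the names sort correctly as plain strings. `split_names` is a hypothetical helper, not part of refasplit, and it assumes indices run from 0 to count - 1.

```python
from typing import List


def split_names(prefix: str, count: int, suffix: str = ".fa.gz") -> List[str]:
    """Return `count` output names with zero-padded, sortable indices."""
    width = len(str(count - 1))  # digits needed for the largest index
    return [f"{prefix}{i:0{width}d}{suffix}" for i in range(count)]


names = split_names("refasplit_", 256)
print(names[0], names[9], names[10], names[255])
# refasplit_000.fa.gz refasplit_009.fa.gz refasplit_010.fa.gz refasplit_255.fa.gz
```

Because every index has the same width, a shell glob or `sorted()` returns the files in numeric order.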

Versions

3.9.0 (refasplit was introduced in this release)

multiprocessing.pool.RemoteTraceback:

Hi,

I am trying to use recentrifuge and I get the following error:

`=-= recentrifuge.py =-= v0.18.3 =-= Mar 2018 =-=

Loading NCBI nodes... OK!
Loading NCBI names... OK!
Building dict of parent to children taxa... OK!

Please, wait, processing files in parallel...

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/damientully/anaconda3/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/Users/damientully/Downloads/recentrifuge-master/recentrifuge/centrifuge.py", line 102, in process_report
    tree.prune(mintaxa, None, collapse, debug)
  File "/Users/damientully/Downloads/recentrifuge-master/recentrifuge/trees.py", line 471, in prune
    and self[tid].prune(min_taxa, min_rank, collapse, debug)):
  File "/Users/damientully/Downloads/recentrifuge-master/recentrifuge/trees.py", line 471, in prune
    and self[tid].prune(min_taxa, min_rank, collapse, debug)):
  File "/Users/damientully/Downloads/recentrifuge-master/recentrifuge/trees.py", line 471, in prune
    and self[tid].prune(min_taxa, min_rank, collapse, debug)):
  [Previous line repeated 3 more times]
  File "/Users/damientully/Downloads/recentrifuge-master/recentrifuge/trees.py", line 484, in prune
    self.score = ((self.score * self.counts
TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "recentrifuge.py", line 624, in <module>
    main()
  File "recentrifuge.py", line 597, in main
    read_samples()
  File "recentrifuge.py", line 329, in read_samples
    input_files, [r.get() for r in async_results]):
  File "recentrifuge.py", line 329, in <listcomp>
    input_files, [r.get() for r in async_results]):
  File "/Users/damientully/anaconda3/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'`

OverflowError when trying passing centrifuge input test

Hello,

I tried to update the recentrifuge Galaxy tool from 1.12.1 to 1.13.1, but when I tested Centrifuge input, the tool failed with an error.
Inputs are from https://github.com/galaxyproject/tools-iuc/tree/main/tools/recentrifuge/test-data

test-db is a light taxa database
the centrifuge folder contains files that work with version 1.12
Command used:
rcf -n ./test-data/test-db -f rcf_test/centrifuge_1.out

Output observed :

=-= /opt/conda/envs/recentrifuge/bin/rcf =-= v1.13.1 - Jan 2024 =-= by Jose Manuel Martí =-=

Loading NCBI nodes... OK! 
Loading NCBI names... OK! 
Building dict of parent to children taxa... OK! 

Please, wait, processing files in parallel...

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/conda/envs/recentrifuge/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/recentrifuge/lib/python3.11/site-packages/recentrifuge/taxclass.py", line 128, in process_output
    tree.allin1(ontology=ontology, counts=counts, scores=scores,
  File "/opt/conda/envs/recentrifuge/lib/python3.11/site-packages/recentrifuge/trees.py", line 276, in allin1
    child_acc: Union[int, None] = self[tid].allin1(
                                  ^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/recentrifuge/lib/python3.11/site-packages/recentrifuge/trees.py", line 276, in allin1
    child_acc: Union[int, None] = self[tid].allin1(
                                  ^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/recentrifuge/lib/python3.11/site-packages/recentrifuge/trees.py", line 276, in allin1
    child_acc: Union[int, None] = self[tid].allin1(
                                  ^^^^^^^^^^^^^^^^^
  [Previous line repeated 3 more times]
  File "/opt/conda/envs/recentrifuge/lib/python3.11/site-packages/recentrifuge/trees.py", line 305, in allin1
    update_score_and_acc(chld, child_acc)
  File "/opt/conda/envs/recentrifuge/lib/python3.11/site-packages/recentrifuge/trees.py", line 268, in update_score_and_acc
    self[tid].score = swmean(
                      ^^^^^^^
  File "/opt/conda/envs/recentrifuge/lib/python3.11/site-packages/recentrifuge/trees.py", line 263, in swmean
    (cnt1 * 10**sco1 + cnt2 * 10**sco2) / (cnt1 + cnt2)
            ~~^^~~~~
OverflowError: (34, 'Numerical result out of range')
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/recentrifuge/bin/rcf", line 928, in <module>
    main()
  File "/opt/conda/envs/recentrifuge/bin/rcf", line 887, in main
    read_samples()
  File "/opt/conda/envs/recentrifuge/bin/rcf", line 492, in read_samples
    input_files, [r.get() for r in async_results]):
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/recentrifuge/bin/rcf", line 492, in <listcomp>
    input_files, [r.get() for r in async_results]):
                  ^^^^^^^
  File "/opt/conda/envs/recentrifuge/lib/python3.11/multiprocessing/pool.py", line 774, in get
    raise self._value
OverflowError: (34, 'Numerical result out of range')
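
The failing expression raises 10 to the power of a score, which overflows a float once the score exceeds roughly 308. If the intent is the log10 of a count-weighted mean of 10**score (an assumption: the traceback only shows the inner expression, not how its result is used), the same value can be computed by factoring out the larger exponent first. Both functions below are illustrative sketches, not the code in trees.py.

```python
import math


def swmean_naive(cnt1: int, sco1: float, cnt2: int, sco2: float) -> float:
    """Direct form: overflows for large float scores."""
    return math.log10((cnt1 * 10**sco1 + cnt2 * 10**sco2) / (cnt1 + cnt2))


def swmean_stable(cnt1: int, sco1: float, cnt2: int, sco2: float) -> float:
    """Same quantity, with the largest power of ten factored out."""
    top = max(sco1, sco2)
    # Both exponents are now <= 0, so each power is at most 1.0
    scaled = cnt1 * 10 ** (sco1 - top) + cnt2 * 10 ** (sco2 - top)
    return top + math.log10(scaled / (cnt1 + cnt2))


print(swmean_stable(1, 2.0, 3, 3.0))      # agrees with swmean_naive here
print(swmean_stable(1, 400.0, 1, 399.0))  # no OverflowError
```

This is the usual log-sum-exp idea in base 10. It assumes both counts are positive, as in the original expression.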

float deprecated in numpy 1.24

Bug report

Bug summary

I believe there is a clash between the latest version of NumPy, which removed the deprecated float alias in favor of the builtin float, and the installed openpyxl version.

Running Recentrifuge

Command line

~/recentrifuge/rcf -c 1 -f $NEGDIR -f $S1 -f $S2 -f $S3 -x 0 -x 9606 -n $NODES -o "$EOUT/samples-out.html" -e "$EOUT/samples-out.csv" -p

Data

Centrifuge troubleshooting files - 3 samples, 1 negative control

Actual outcome

Traceback (most recent call last):
  File "recentrifuge/rcf", line 63, in <module>
    import openpyxl
  File ".pyenv/versions/3.8.3/lib/python3.8/site-packages/openpyxl/__init__.py", line 4, in <module>
    from openpyxl.compat.numbers import NUMPY, PANDAS
  File ".pyenv/versions/3.8.3/lib/python3.8/site-packages/openpyxl/compat/__init__.py", line 3, in <module>
    from .numbers import NUMERIC_TYPES
  File ".pyenv/versions/3.8.3/lib/python3.8/site-packages/openpyxl/compat/numbers.py", line 41, in <module>
    numpy.float,
  File ".pyenv/versions/3.8.3/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])

AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

Expected outcome

Do I need to update openpyxl or use a container?

Versions

Operating system: Linux
Python version: 3.8.3
Recentrifuge version: 1.12.0
Release of Centrifuge: 1.0.4
Pandas version (if applicable): 1.5.3
Other libraries (if applicable): openpyxl v3.0.5

Score understanding

Hello,
thanks for developing this software. I find it really useful in my work: clinical diagnosis with shotgun metagenomics. I am running rcf on kraken2 results from some of my RNA libraries.
This is my command:

rcf -n /DATA/share/microbio/taxdump -k tneg_revelo.krk -k influenza_revelo.krk -k metapn_revelo.krk -c 1 -o revelo_mintaxa500.html  -s KRAKEN  -e CSV -y 50 -m 500 -d

So I have chosen KRAKEN as the score (% k-mer coverage), a minimum score of 50 (my understanding is that this rejects reads with less than 50% k-mer coverage), and a min taxa of 500 (taxa with fewer than 500 reads will be folded).
My question: in the stats results I see a score limit of 50, but a minimum score of 102, a mean score of 227, and a maximum score of 269. How come? I was expecting percentage-like scores since I chose KRAKEN as the score, so I do not understand what the min, mean, and max scores in the stats file mean. Can you shed some light on this?

Again, thanks :)

Issue with nodes/names missing unclassified readID (0)

Bug report

Bug summary

Nodes/names file are missing an unclassified taxID (0) that is present in the troubleshooting file. I am not sure if this is normal behaviour. A simple fix for this is to edit the "-x 0" out of the command line call. After doing this, I had no issues with running recentrifuge for centrifuge.

Running Recentrifuge

Command line

~/recentrifuge/rcf -c 1 -f $NEGDIR -f $S1 -f $S2 -f $S3 -x 0 -x 9606 -n $NODES -o "$EOUT/samples-out.html" -e "CSV"

Data

Centrifuge troubleshooting files - 3 samples, 1 negative control
Data sample from centrifuge with unclassified read/taxID:
readID seqID taxID score 2ndBestScore hitLength queryLength numMatches
a32cfebc-f87b-4ccb-8468-ced0c4007f4c unclassified 0 0 0 0 334 1

Actual outcome

=-= /recentrifuge/rcf =-= v1.2.0 - Sep 2020 =-= by Jose Manuel Martí =-=

Control(s) sample(s) for subtractions:
        /mnt/usersData/DNA/analysis//sample_data/20230106_aDNA_Plain-medium-16S_0CFU_94_12//centrifuge/20230106_aDNA_Plain-medium-16S_0CFU_94_12_v_f_b2_no_host_centrifuge_troubleshooting_report.tsv
Loading NCBI nodes... OK!
Loading NCBI names... OK!
Building dict of parent to children taxa... OK!
List of taxa (and below) to be excluded:
                Id      Scientific Name
Traceback (most recent call last):
  File "/recentrifuge/rcf", line 836, in <module>
    main()
  File "/recentrifuge/rcf", line 759, in main
    ncbi: Taxonomy = Taxonomy(nodesfile, namesfile, plasmidfile,
  File "/recentrifuge/recentrifuge/taxonomy.py", line 63, in __init__
    print(f'\t\t{taxid}\t{self.names[taxid]}')
KeyError: '0'
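
The traceback fails while printing the name of an excluded taxid that has no entry in the names table. A minimal sketch of a lookup that tolerates such ids (for example the unclassified taxid 0) is shown below. `describe_taxa` and the placeholder text are hypothetical, not the code in taxonomy.py.

```python
from typing import Dict, Iterable, List


def describe_taxa(taxids: Iterable[str], names: Dict[str, str]) -> List[str]:
    """Format 'Id / Scientific Name' lines, tolerating unknown taxids."""
    lines = []
    for taxid in taxids:
        # dict.get avoids KeyError for ids missing from names.dmp
        name = names.get(taxid, "(not in names.dmp)")
        lines.append(f"\t\t{taxid}\t{name}")
    return lines


names = {"9606": "Homo sapiens"}
for line in describe_taxa(["0", "9606"], names):
    print(line)
```

Whether an unknown taxid should be reported and skipped, or rejected with a clear error, is a design decision for the tool; the sketch only shows the lookup that avoids the crash.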

Expected outcome

Versions

Operating system: Linux
Python version: 3.8.3
Recentrifuge version: 1.12.0
Release of Centrifuge: 1.0.4
Pandas version (if applicable): 1.5.3
Other libraries (if applicable): openpyxl v3.1.x

Compatibility with Ganon?

Hi
I'm using Ganon (https://github.com/pirovc/ganon) to classify reads, and this program seems to be gaining popularity. As such, I'm wondering a) whether recentrifuge will be updated to use the results of ganon directly, as it does for centrifuge, clark, etc., and b) if not, which ganon output would be most compatible as input to recentrifuge?