
filter_classified_reads's People

Contributors

peterk87, sourcery-ai-bot

Forkers

sourcery-ai-bot

filter_classified_reads's Issues

Kraken2 paired read mode results cannot be parsed

  • filter_classified_reads version: 0.1.0
  • Python version: 3.6.7
  • Operating System: Ubuntu 16.04

Description

filter_classified_reads cannot parse Kraken2 results generated from paired-end reads with the --paired flag.

What I Did

Kraken2 was run on paired-end reads with the --paired flag, producing classification results that filter_classified_reads could not parse:

$ filter_classified_reads -i reads_1.fastp.fastq.gz -I reads_2.fastp.fastq.gz \
    -o reads_1.viral_unclassified.fastq -O reads_2.viral_unclassified.fastq \
    -c reads-centrifuge_results.tsv -C reads-kreport.tsv \
    -k reads-kraken2_results.tsv -K reads-kraken2_report.tsv \
    --taxids 10239
...
2019-09-05 13:52:49,389 INFO: Parsing kraken2 results into DataFrame [in target_classified_reads.py:49]
Traceback (most recent call last):
  File "pandas/_libs/parsers.pyx", line 1191, in pandas._libs.parsers.TextReader._convert_tokens
TypeError: Cannot cast array from dtype('O') to dtype('uint16') according to the rule 'safe'

Snippet from Kraken2 classification results output file:

U	M04594:80:000000000-G37TN:1:1101:16267:2330	0	149|151	0:115 |:| 0:117
U	M04594:80:000000000-G37TN:1:1101:11949:2338	0	150|151	0:116 |:| 0:117
C	M04594:80:000000000-G37TN:1:1101:14888:2339	9606	151|151	0:52 9606:1 0:64 |:| 0:117

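filter_classified_reads expects specific data types in certain fields, e.g. uint16 for the sequence length (4th field). In --paired mode, Kraken2 writes the lengths of both mates into that field separated by a pipe (e.g. 149|151), and the k-mer mappings of the two mates in the 5th field are separated by |:|, so the length column cannot be cast to uint16.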
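A minimal sketch of a parser that accepts both single-end and --paired Kraken2 per-read output, assuming the standard five-column format with plain integer taxids (i.e. Kraken2 was not run with --use-names). The length field is split on "|" and the k-mer mapping field on "|:|". The names Kraken2Result, parse_lengths and parse_kraken2_results are hypothetical and are not part of filter_classified_reads.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, List, Optional, TextIO, Tuple


@dataclass
class Kraken2Result:
    """One line of Kraken2 per-read output (hypothetical container)."""
    classified: bool
    read_id: str
    taxid: int
    length_1: int
    length_2: Optional[int]  # None for single-end results
    kmer_lca: List[str]      # k-mer:taxid mappings, one string per mate


def parse_lengths(field: str) -> Tuple[int, Optional[int]]:
    """Parse '151' (single-end) or '149|151' (paired-end, --paired)."""
    parts = field.split("|")
    if len(parts) == 1:
        return int(parts[0]), None
    if len(parts) == 2:
        return int(parts[0]), int(parts[1])
    raise ValueError(f"Unexpected sequence length field: {field!r}")


def parse_kraken2_results(handle: TextIO) -> Iterator[Kraken2Result]:
    """Yield parsed records; assumes integer taxids (no --use-names)."""
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        status, read_id, taxid, lengths, kmer_lca = row[:5]
        length_1, length_2 = parse_lengths(lengths)
        yield Kraken2Result(
            classified=(status == "C"),
            read_id=read_id,
            taxid=int(taxid),
            length_1=length_1,
            length_2=length_2,
            # Mappings for mate 1 and mate 2 are separated by "|:|"
            kmer_lca=[part.strip() for part in kmer_lca.split("|:|")],
        )


if __name__ == "__main__":
    # Two lines copied from the Kraken2 output snippet in this issue
    sample = (
        "U\tM04594:80:000000000-G37TN:1:1101:16267:2330\t0\t149|151\t0:115 |:| 0:117\n"
        "C\tM04594:80:000000000-G37TN:1:1101:14888:2339\t9606\t151|151\t0:52 9606:1 0:64 |:| 0:117\n"
    )
    for rec in parse_kraken2_results(io.StringIO(sample)):
        print(rec.read_id, rec.classified, rec.taxid, rec.length_1, rec.length_2)
```

Keeping the second mate's length optional lets the same code path handle single-end results, where the field holds a single integer.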

Add CLI command for getting taxids of species with at least N specific reads

  • Add a CLI option for the minimum number of reads classified specifically to a species taxid. Only species taxids with N or more reads classified to that taxid will be output.
  • Allow specifying base taxids to search under
    • e.g. to find viruses, search under taxid=10239 (Viruses)
  • Allow excluding specific child taxids under the base taxids
    • e.g. exclude descendants of Caudovirales (rank=order, taxid=28883) to ignore bacteriophages when only eukaryotic viruses are of interest
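A minimal sketch of the requested filter, assuming the standard six-column Kraken2 report (percentage, clade reads, reads assigned directly to the taxon, rank code, taxid, name indented by two spaces per level) produced without --report-minimizer-data. Ancestry is reconstructed from the name indentation, so no separate taxonomy database is needed. The function names parse_kreport and species_taxids, and the synthetic report in the test, are hypothetical; this is not how filter_classified_reads implements the feature.

```python
import io
from dataclasses import dataclass
from typing import Iterable, List, Set, TextIO


@dataclass
class ReportNode:
    taxid: int
    rank: str          # Kraken2 rank code, e.g. "D", "O", "F", "S"
    name: str
    direct_reads: int  # reads assigned directly to this taxid
    depth: int         # derived from name indentation (2 spaces per level)


def parse_kreport(handle: TextIO) -> List[ReportNode]:
    """Parse a standard six-column Kraken2 report (hypothetical helper)."""
    nodes = []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue
        _pct, _clade_reads, direct, rank, taxid, name = line.split("\t")[:6]
        indent = len(name) - len(name.lstrip(" "))
        nodes.append(ReportNode(int(taxid), rank.strip(), name.strip(),
                                int(direct), indent // 2))
    return nodes


def species_taxids(nodes: Iterable[ReportNode], base_taxids: Set[int],
                   exclude_taxids: Set[int], min_reads: int) -> List[int]:
    """Species taxids under any base taxid and under no excluded taxid,
    with at least min_reads reads assigned directly to the species."""
    selected = []
    lineage: List[int] = []  # taxids from the top-level node down to the current one
    for node in nodes:
        del lineage[node.depth:]  # drop entries that are not ancestors of this node
        lineage.append(node.taxid)
        if node.rank != "S" or node.direct_reads < min_reads:
            continue
        path = set(lineage)
        if (path & base_taxids) and not (path & exclude_taxids):
            selected.append(node.taxid)
    return selected


if __name__ == "__main__":
    # Hypothetical usage: viral species with >= 10 reads, excluding Caudovirales
    with open("reads-kraken2_report.tsv") as fh:
        print(species_taxids(parse_kreport(fh), {10239}, {28883}, 10))
```

Only nodes with rank code "S" are considered, so strain-level entries (e.g. "S1") are skipped and their reads are not added to the parent species' direct count.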
