Comments (6)

tseemann commented on July 21, 2024

This can be done in Unix like this:

cat in.fq | paste - - - - | grep -v -f -F name.lst | tr "\t" "\n" > out.fq

See my blog post for more details: http://thegenomefactory.blogspot.com.au/2012/05/cool-use-of-unix-paste-with-ngs.html

from seqtk.

conchoecia commented on July 21, 2024

For FASTA: I think a legal FASTA file can have its sequences wrapped over several lines, so the fixed four-line paste trick does not carry over. This is safer until seqtk has a -v option:

bioawk -cfastx '{printf(">%s\t%s\n", $name, $seq)}' in.fa | \
  grep -v -f name.lst | tr "\t" "\n" > out.fa
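
If bioawk is not installed, the same idea can be sketched with plain awk. This is a substitute for the bioawk step, not the command above: the helper name fasta_exclude and the toy data are made up for illustration, it keeps the whole header line (bioawk's $name is only the first word), and it assumes headers contain no tab characters.

```shell
# Hypothetical helper: drop FASTA records whose header contains a listed name.
# $1 = FASTA file (sequences may be wrapped), $2 = file of names to drop.
fasta_exclude() {
    # Join each record into one "header<TAB>sequence" line,
    # filter with grep, then split the tab back into a newline.
    awk '
        /^>/ { if (hdr != "") print hdr "\t" seq; hdr = $0; seq = ""; next }
        { seq = seq $0 }
        END { if (hdr != "") print hdr "\t" seq }
    ' "$1" | grep -v -F -f "$2" | tr "\t" "\n"
}

# Toy data (invented), written to a scratch directory.
demo=$(mktemp -d)
printf '>seq1 desc\nACGT\nACGT\n>seq2\nTTTT\n' > "$demo/in.fa"
printf 'seq2\n' > "$demo/name.lst"

# Only seq1 should remain, with its sequence on a single line.
fasta_exclude "$demo/in.fa" "$demo/name.lst"
rm -r "$demo"
```

Note that the output sequences are unwrapped, and that grep matches the listed strings anywhere on the joined line, not only in the name.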

edwwlui commented on July 21, 2024

> This can be done in Unix like this:
>
> cat in.fq | paste - - - - | grep -v -f -F name.lst | tr "\t" "\n" > out.fq
>
> See my blog post for more details: http://thegenomefactory.blogspot.com.au/2012/05/cool-use-of-unix-paste-with-ngs.html

Great solution! By the way, I suggest editing the command to cat in.fq | paste - - - - | grep -v -F -f name.lst | tr "\t" "\n" > out.fq, because with grep -f -F the -F is taken as the pattern file name that -f expects, which can make grep fail with an error.
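
To see the corrected flag order in action, here is a minimal sketch on a made-up two-read file. The helper name fastq_exclude and the toy reads are invented for illustration, and it assumes every FASTQ record is exactly four lines, which is what paste - - - - relies on.

```shell
# Hypothetical helper wrapping the corrected pipeline.
# $1 = FASTQ file with 4-line records, $2 = file of names to drop.
fastq_exclude() {
    # paste joins every 4 lines into one tab-separated line,
    # grep -v -F -f drops lines containing any fixed string from $2,
    # tr turns the tabs back into newlines.
    paste - - - - < "$1" | grep -v -F -f "$2" | tr "\t" "\n"
}

# Toy data (invented), written to a scratch directory.
demo=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGG\n+\nIIII\n' > "$demo/in.fq"
printf 'read2\n' > "$demo/name.lst"

# Only the read1 record should remain.
fastq_exclude "$demo/in.fq" "$demo/name.lst"
rm -r "$demo"
```

One caveat: grep -F matches substrings anywhere on the joined line, so a listed name such as read1 would also drop read10, and a name could in principle match inside a quality string.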

deprekate commented on July 21, 2024

That is a cool trick. I always seem to find new ways to use paste. But as a fix I added a fork; we will see if it gets accepted. It would be valuable, as the highest rated method is to use a QIIME script, which requires installing the whole QIIME package. I just like the ability to do the same/opposite task with the same tool:

seqtk subseq in.fa good.list
seqtk exclude in.fa bad.list

I chose exclude over remove since remove is ambiguous: technically speaking, subseq is also removing (it removes the reads in name.lst from in.fa). That is why I would rename subseq to extract, as the term is more apt (subseq is more descriptive of splicing a read). But that changes the behavior of the original code, and most developers do not like changing already established functionality.

tseemann commented on July 21, 2024

The traditional way in Unix to do these two types of subsets is to have a single command, but have a -v flag to "invert" the behaviour:

seqtk subset in.fq good.list > good.fq
seqtk subset in.fq bad.list > bad.fq

And the inverse:

seqtk subset -v in.fq good.list > bad.fq
seqtk subset -v in.fq bad.list > good.fq

It would be good if samtools faidx had a -v option.
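
Until something like that exists, the -v behaviour can be approximated with a small shell function. This is only a sketch: fq_subset is a hypothetical helper, not a seqtk command, it assumes four-line FASTQ records, and it matches the first word of each header exactly rather than as a substring.

```shell
# Hypothetical helper: fq_subset [-v] in.fq name.lst
# Without -v it keeps the listed reads; with -v it keeps all the others.
fq_subset() {
    invert=0
    if [ "$1" = "-v" ]; then
        invert=1
        shift
    fi
    # awk reads name.lst first to build a lookup table, then filters the FASTQ.
    awk -v invert="$invert" '
        FILENAME == ARGV[1] { if (NF) wanted[$1] = 1; next }
        FNR % 4 == 1 {
            name = substr($1, 2)          # strip the leading "@"
            listed = (name in wanted)
            show = (invert ? !listed : listed)
        }
        show { print }
    ' "$2" "$1"
}

# Toy data (invented), written to a scratch directory.
demo=$(mktemp -d)
printf '@read1 x\nACGT\n+\nIIII\n@read10\nTTGG\n+\nJJJJ\n' > "$demo/in.fq"
printf 'read1\n' > "$demo/good.list"

fq_subset    "$demo/in.fq" "$demo/good.list"   # keeps the read1 record only
fq_subset -v "$demo/in.fq" "$demo/good.list"   # keeps the read10 record only
rm -r "$demo"
```

Because the match is on the exact name, read1 in the list does not affect read10, unlike the grep -F approach.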

y9c commented on July 21, 2024

There is a -M (mask) argument in the seqtk seq command, but the reads are only soft-masked, not removed from the file.
