

simoncchu commented on July 30, 2024

For your first question, it doesn't matter whether or not you set the -M or -Y option: xTea re-does the alignment for the clipped reads (primary alignments only).
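As background for this answer: in bwa mem, -M marks shorter split hits as secondary (FLAG 0x100) instead of supplementary (0x800), and -Y uses soft clipping instead of hard clipping for supplementary alignments, so both options concern the non-primary records of a split alignment. The minimal Python sketch below shows how a "primary alignment with soft-clipped bases" can be recognized from the SAM FLAG and CIGAR fields. It is a generic filter based on the SAM specification, not xTea's own read-selection code, and the `min_clip` threshold is a hypothetical parameter.

```python
import re

# FLAG bits defined by the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100       # bwa mem -M marks shorter split hits with this bit
FLAG_SUPPLEMENTARY = 0x800   # bwa mem default bit for the other parts of a split hit

# A soft clip can only be preceded (left end) or followed (right end) by a hard clip.
LEFT_SOFT = re.compile(r"^(?:\d+H)?(\d+)S")
RIGHT_SOFT = re.compile(r"(\d+)S(?:\d+H)?$")


def is_primary(flag: int) -> bool:
    """True for a mapped record that is neither secondary nor supplementary."""
    return (flag & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY)) == 0


def soft_clip_lengths(cigar: str) -> tuple[int, int]:
    """Return (left, right) soft-clip lengths of a CIGAR string."""
    left = LEFT_SOFT.search(cigar)
    right = RIGHT_SOFT.search(cigar)
    return (int(left.group(1)) if left else 0,
            int(right.group(1)) if right else 0)


def is_primary_clipped(flag: int, cigar: str, min_clip: int = 1) -> bool:
    """Primary alignment with at least `min_clip` soft-clipped bases on either end."""
    if not is_primary(flag):
        return False
    return max(soft_clip_lengths(cigar)) >= min_clip
```

For example, `is_primary_clipped(0, "70M30S")` is true, while the same CIGAR with FLAG 2048 (supplementary) is rejected, which is why changing how supplementary records are flagged or clipped should not, in principle, change which primary clipped reads are selected.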
For the second question, you need to allow some slack when comparing breakpoints. A single insertion actually has two breakpoints, with a target site duplication (usually short) between them. Tools, including xTea, report only one breakpoint, and depending on the settings, different breakpoints may be reported for the same insertion. Thus, you cannot require the positions to be exactly the same. Also, the distance between two different insertions is usually much larger than the slack you set, so the slack will not affect the comparison results much.
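The slack-based comparison can be sketched as a window match: a reported insertion counts as matching a reference insertion if both are on the same chromosome and their breakpoints differ by at most `slack` bp. This is a minimal Python sketch; the function name `match_calls`, the `(chrom, pos)` tuple input, and the default slack of 100 bp are illustrative assumptions, not values recommended by xTea.

```python
from bisect import bisect_left


def match_calls(truth, calls, slack=100):
    """Count truth insertions with a call on the same chromosome within +/- slack bp.

    truth, calls: iterables of (chrom, pos) tuples (hypothetical input format).
    """
    # Index call positions per chromosome, sorted for binary search.
    by_chrom = {}
    for chrom, pos in calls:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    matched = 0
    for chrom, pos in truth:
        positions = by_chrom.get(chrom, [])
        # First call position >= pos - slack; it matches if it is also <= pos + slack.
        i = bisect_left(positions, pos - slack)
        if i < len(positions) and positions[i] <= pos + slack:
            matched += 1
    return matched
```

Because insertions are usually far apart compared with the slack, this sketch does not enforce one-to-one matching; a stricter comparison would remove each call once it has been matched.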


MarcelloMalpighi commented on July 30, 2024

Thanks a lot!
But regarding the first question, I got different results when using bwa -Y. The commands and results are as follows:
bwa command without -Y
@PG ID:bwa PN:bwa VN:0.7.17-r1188 CL:bwa mem -M -R @RG\tID:SRR1264615\tLB:SRR1264615\tPL:Illumina\tPU:SRR1264615\tSM:SRR1264615 -t 20...
xTea result without -Y
awk '{print $1}' SRR1264615_sorted_ALU.vcf | uniq -c | tail -23
110 chr1
129 chr2
83 chr3
88 chr4
85 chr5
110 chr6
67 chr7
73 chr8
62 chr9
72 chr10
64 chr11
79 chr12
62 chr13
54 chr14
41 chr15
23 chr16
22 chr17
38 chr18
14 chr19
19 chr20
12 chr21
3 chr22
23 chrX

bwa command with -Y
@PG ID:bwa PN:bwa VN:0.7.17-r1188 CL:bwa mem -M -Y -t 20 -R @RG\tID:SRR1264615\tLB:SRR1264615\tPL:Illumina\tPU:SRR1264615\tSM:SRR1264615...
xTea result with -Y
awk '{print $1}' test_Y_SRR1264615_sorted_ALU.vcf | uniq -c | tail -23
108 chr1
126 chr2
80 chr3
82 chr4
82 chr5
108 chr6
66 chr7
72 chr8
63 chr9
72 chr10
63 chr11
72 chr12
58 chr13
52 chr14
37 chr15
22 chr16
21 chr17
39 chr18
14 chr19
18 chr20
13 chr21
4 chr22
21 chrX
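To compare the two runs chromosome by chromosome, a small Python sketch can count VCF records per CHROM column and report the differences. Unlike `awk '{print $1}' | uniq -c`, it skips `#` header lines and does not rely on the file being sorted by chromosome, since `uniq -c` only merges adjacent identical lines. The helper names `count_per_chrom` and `diff_counts` are hypothetical.

```python
from collections import Counter


def count_per_chrom(lines):
    """Count VCF records per CHROM (first tab-separated column), skipping headers."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # '##' meta lines and the '#CHROM' header line
        counts[line.split("\t", 1)[0]] += 1
    return counts


def diff_counts(a, b):
    """Per-chromosome difference b - a, listing only chromosomes whose counts differ."""
    return {
        chrom: b.get(chrom, 0) - a.get(chrom, 0)
        for chrom in sorted(set(a) | set(b))
        if b.get(chrom, 0) != a.get(chrom, 0)
    }
```

For example, passing the opened files `SRR1264615_sorted_ALU.vcf` and `test_Y_SRR1264615_sorted_ALU.vcf` to `count_per_chrom` and the two results to `diff_counts` lists, per chromosome, how many more (positive) or fewer (negative) ALU calls the -Y run produced.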

I also want to extract the clipped and discordant reads from tmp/cns/temp_clip.sam and tmp/cns/temp_disc.sam, as mentioned in #36, but I could not fully understand these two files. What do the ~-separated numbers mean, i.e., ~R~1~ and ~1~0~0~1~1~ in chr1~890330~R~1~890451~1~0 (from temp_clip.sam) and SRR1264615.423066703~1~0~0~1~1~31020097~chr12~31020266~0 (from temp_disc.sam)? How can I get the complete sequences of the clipped reads? (They are incomplete in temp_clip.sam, and the read identifiers are not given.)
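While the meaning of each `~`-separated field is not documented here, the names can at least be split mechanically. In the temp_disc.sam example the first field looks like the original read ID (SRR1264615.423066703), which could then be looked up in the original BAM to recover the full read. This is an inference from the single example above, not a documented xTea format, and the helper names `split_name` and `disc_read_ids` are hypothetical. The temp_clip.sam example starts with a chromosome instead, so the same trick does not apply to it.

```python
def split_name(qname: str) -> list[str]:
    """Split an xTea temporary read name into its '~'-separated fields (meanings not assumed)."""
    return qname.split("~")


def disc_read_ids(sam_lines):
    """Collect the first '~' field of each QNAME in SAM text, skipping '@' header lines.

    Assumption: in temp_disc.sam this first field is the original read ID.
    """
    ids = set()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        qname = line.split("\t", 1)[0]
        ids.add(split_name(qname)[0])
    return sorted(ids)
```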

