Comments (12)
I had faced a similar issue. I think the MC flag is not written in mem_aln2sam. You can add it back using https://github.com/lh3/bwa/blob/master/bwamem.c#L908
from bwa-mem2.
@arun-sub, thank you! There is no such line after https://github.com/bwa-mem2/bwa-mem2/blob/master/src/bwamem.cpp#L1565
@yuk12, could you fix it, please?
from bwa-mem2.
No. The outputs should match exactly.
Can you please provide the input data to reproduce the issue?
from bwa-mem2.
We used test files from here:
https://github.com/hartwigmedical/testdata/tree/master/100k_reads_hiseq/TESTX
We're aligning lanes separately and merging to sam after with Picard's MergeSamFiles
.
If you need our test results from these inputs - we can provide them, their size is 35 MB.
from bwa-mem2.
While we are here, @yuk12, do we need both .bwt.2bit.64
and .bwt.8bit.32
files for aligning? Their size is near 40 GB each, and we use a lot of networks for reference transfer - so can we use only one, or we need both?
from bwa-mem2.
I think the .bwt.2bit.64
index file is used only for systems without AVX2, but @yuk12 can confirm. You can check out:
Line 41 in 7798a1e
from bwa-mem2.
2bit..64 is required for SSE4.1 machine. Else for AVX machines, it uses 8bit.32.
from bwa-mem2.
Yes, the MC CIGAR tag is missing. The output of BWA-MEM2 is identical to bwa version 0.7.15. As far as I know, that version of bwa also does not have this CIGAR tag. It was added in version 0.7.17. We will try to incorporate it.
from bwa-mem2.
@serge2016 likely you already know this, but in the meantime you could use one of the following tools to add the mate cigar after alignment (you can pipe bwa
output into these tools):
- fgbio's
SetMateInformation
- Picard's
MergeBamAlignment
,FixMateInformation
, orRevertOriginalBaseQualitiesAndAddMateCigar OQ=false
Likely there are others, but these are some alternative.
from bwa-mem2.
@nh13, thank you! Yes, I do. But it's time-consuming, so I hope to see the fix in bwa-mem2
soon!
from bwa-mem2.
@nh13 Thanks for your suggestion, but these tools also add MQ:
field which is undesired for now, because output from bwa
doesn't contain it.
I've manually edited bwamem.cpp, but have some problems:
- If I only add this add_cigar function:
static inline void add_cigar(const mem_opt_t *opt, mem_aln_t *p, kstring_t *str, int which)
{
int i;
if (p->n_cigar) { // aligned
for (i = 0; i < p->n_cigar; ++i) {
int c = p->cigar[i]&0xf;
if (!(opt->flag&MEM_F_SOFTCLIP) && !p->is_alt && (c == 3 || c == 4))
c = which? 4 : 3; // use hard clipping for supplementary alignments
kputw(p->cigar[i]>>4, str); kputc("MIDSH"[c], str);
}
} else kputc('*', str); // having a coordinate but unaligned (e.g. when copy_mate is true)
}
and this line:
if (m && m->n_cigar) { kputsn("\tMC:Z:", 6, str); add_cigar(opt, m, str, which); }
everything works fine.
- But if I change old code:
if (p->rid >= 0) { // with coordinate
kputs(bns->anns[p->rid].name, str); kputc('\t', str); // RNAME
kputl(p->pos + 1, str); kputc('\t', str); // POS
kputw(p->mapq, str); kputc('\t', str); // MAPQ
if (p->n_cigar) { // aligned
for (i = 0; i < p->n_cigar; ++i) {
int c = p->cigar[i]&0xf;
if (!(opt->flag&MEM_F_SOFTCLIP) && !p->is_alt && (c == 3 || c == 4))
c = which? 4 : 3; // use hard clipping for supplementary alignments
kputw(p->cigar[i]>>4, str); kputc("MIDSH"[c], str);
}
} else kputc('*', str); // having a coordinate but unaligned (e.g. when copy_mate is true)
} else kputsn("*\t0\t0\t*", 7, str); // without coordinte
kputc('\t', str);
to this (like it was changed in bwa 0.7.15 -> 0.7.17)
if (p->rid >= 0) { // with coordinate
kputs(bns->anns[p->rid].name, str); kputc('\t', str); // RNAME
kputl(p->pos + 1, str); kputc('\t', str); // POS
kputw(p->mapq, str); kputc('\t', str); // MAPQ
add_cigar(opt, p, str, which);
} else kputsn("*\t0\t0\t*", 7, str); // without coordinte
kputc('\t', str);
for some reason I got differences in CIGAR string in resulting files.
Before change (same as bwa 0.7.17
):
HISEQ_HU01:89:H7YRLADXX:1:1101:1332:5389 69 chr17 57913232 0 * = 57913232 0 TGACCTTCAGCCACAAGAGAGAGATTAACTGAGGCAAAAAACTGGAATACATGCTAAAATATTATCATTCAGTGTTAGTATCTAGAACTACTTCTCTTTTT <@CFFFFFHHHHHJJIJIIEHGIEIGGHGICIGIJJIJJIIJJIGIHIIJJIIJGIGIIJJIJIHHHHHGHGFDBDFF@CFEEEDCDDDDDDCDDDDDCDD MC:Z:18M1D83M AS:i:0 XS:i:0 RG:Z:H7YRLADXX_L001
After change:
HISEQ_HU01:89:H7YRLADXX:1:1101:1332:5389 69 chr17 57913232 0 I = 57913232 0 TGACCTTCAGCCACAAGAGAGAGATTAACTGAGGCAAAAAACTGGAATACATGCTAAAATATTATCATTCAGTGTTAGTATCTAGAACTACTTCTCTTTTT <@CFFFFFHHHHHJJIJIIEHGIEIGGHGICIGIJJIJJIIJJIGIHIIJJIIJGIGIIJJIJIHHHHHGHGFDBDFF@CFEEEDCDDDDDDCDDDDDCDD MC:Z:18M1D83M AS:i:0 XS:i:0 RG:Z:H7YRLADXX_L001
As you can see difference is *
and I
in CIGAR strings. And I don't see any reason for that.
from bwa-mem2.
Added support for MC flag in the latest commit.
from bwa-mem2.
Related Issues (20)
- Does bwa-mem2 preserve a read order from fastq file? HOT 1
- Precompiled AVX512 binary not working for Zen 4 Genoa HOT 7
- Allocation of 40.69 GB for suffix_array failed. HOT 4
- Mapping quality: bad and normal HOT 1
- Index building error
- Read group tag RG:Z for every read is missing HOT 2
- Question regarding the mismatch penalty -B
- Update paths in bwa-mem2-lisa
- Tag New Release HOT 6
- Difference in mapping quality and alignment for bwa mem2 compared to bwa mem HOT 5
- Is bwa really haplotype aware in non-human?
- error while index building HOT 1
- BWA-MEM2-LISA crashed with code 139 HOT 3
- [Question] How to install bwa-mem2-lisa
- Problems creating index with bwa-mem2 HOT 1
- MacOS M2 chip support
- Segmentation fault with valle-inclan dataset - bwa-mem2 2.2.1 avx512bw HOT 1
- bwa and bwa-mem2 relative indexing memory usage HOT 2
- Segmentation fault using bwa-mem2:2.2.1--hd03093a_5 HOT 7
- Add option to align multiple .fastq files at once? HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from bwa-mem2.