Giter Club home page Giter Club logo

Comments (12)

arun-sub avatar arun-sub commented on May 29, 2024 1

I had faced a similar issue. I think the MC flag is not written in mem_aln2sam. You can add it back using https://github.com/lh3/bwa/blob/master/bwamem.c#L908

from bwa-mem2.

serge2016 avatar serge2016 commented on May 29, 2024 1

@arun-sub, thank you! There is no such line after https://github.com/bwa-mem2/bwa-mem2/blob/master/src/bwamem.cpp#L1565
@yuk12, could you fix it, please?

from bwa-mem2.

yuk12 avatar yuk12 commented on May 29, 2024

No. The outputs should match exactly.
Can you please provide the input data to reproduce the issue?

from bwa-mem2.

Stikus avatar Stikus commented on May 29, 2024

We used test files from here:
https://github.com/hartwigmedical/testdata/tree/master/100k_reads_hiseq/TESTX

We're aligning lanes separately and merging to sam after with Picard's MergeSamFiles.

If you need our test results from these inputs - we can provide them, their size is 35 MB.

from bwa-mem2.

Stikus avatar Stikus commented on May 29, 2024

While we are here, @yuk12, do we need both .bwt.2bit.64 and .bwt.8bit.32 files for aligning? Their size is near 40 GB each, and we use a lot of networks for reference transfer - so can we use only one, or we need both?

from bwa-mem2.

arun-sub avatar arun-sub commented on May 29, 2024

I think the .bwt.2bit.64 index file is used only for systems without AVX2, but @yuk12 can confirm. You can check out:

#if ((!__AVX2__))

from bwa-mem2.

yuk12 avatar yuk12 commented on May 29, 2024

2bit..64 is required for SSE4.1 machine. Else for AVX machines, it uses 8bit.32.

from bwa-mem2.

sanchit-misra avatar sanchit-misra commented on May 29, 2024

Yes, the MC CIGAR tag is missing. The output of BWA-MEM2 is identical to bwa version 0.7.15. As far as I know, that version of bwa also does not have this CIGAR tag. It was added in version 0.7.17. We will try to incorporate it.

from bwa-mem2.

nh13 avatar nh13 commented on May 29, 2024

@serge2016 likely you already know this, but in the meantime you could use one of the following tools to add the mate cigar after alignment (you can pipe bwa output into these tools):

  • fgbio's SetMateInformation
  • Picard's MergeBamAlignment, FixMateInformation, orRevertOriginalBaseQualitiesAndAddMateCigar OQ=false

Likely there are others, but these are some alternative.

from bwa-mem2.

serge2016 avatar serge2016 commented on May 29, 2024

@nh13, thank you! Yes, I do. But it's time-consuming, so I hope to see the fix in bwa-mem2 soon!

from bwa-mem2.

Stikus avatar Stikus commented on May 29, 2024

@nh13 Thanks for your suggestion, but these tools also add MQ: field which is undesired for now, because output from bwa doesn't contain it.

I've manually edited bwamem.cpp, but have some problems:

static inline void add_cigar(const mem_opt_t *opt, mem_aln_t *p, kstring_t *str, int which)
{
	int i;
	if (p->n_cigar) { // aligned
		for (i = 0; i < p->n_cigar; ++i) {
			int c = p->cigar[i]&0xf;
			if (!(opt->flag&MEM_F_SOFTCLIP) && !p->is_alt && (c == 3 || c == 4))
				c = which? 4 : 3; // use hard clipping for supplementary alignments
			kputw(p->cigar[i]>>4, str); kputc("MIDSH"[c], str);
		}
	} else kputc('*', str); // having a coordinate but unaligned (e.g. when copy_mate is true)
}

and this line:

if (m && m->n_cigar) { kputsn("\tMC:Z:", 6, str); add_cigar(opt, m, str, which); }

everything works fine.

  • But if I change old code:
	if (p->rid >= 0) { // with coordinate
		kputs(bns->anns[p->rid].name, str); kputc('\t', str); // RNAME
		kputl(p->pos + 1, str); kputc('\t', str); // POS
		kputw(p->mapq, str); kputc('\t', str); // MAPQ
		if (p->n_cigar) { // aligned
			for (i = 0; i < p->n_cigar; ++i) {
				int c = p->cigar[i]&0xf;
				if (!(opt->flag&MEM_F_SOFTCLIP) && !p->is_alt && (c == 3 || c == 4))
					c = which? 4 : 3; // use hard clipping for supplementary alignments
				kputw(p->cigar[i]>>4, str); kputc("MIDSH"[c], str);
			}
		} else kputc('*', str); // having a coordinate but unaligned (e.g. when copy_mate is true)
	} else kputsn("*\t0\t0\t*", 7, str); // without coordinte
	kputc('\t', str);

to this (like it was changed in bwa 0.7.15 -> 0.7.17)

	if (p->rid >= 0) { // with coordinate
		kputs(bns->anns[p->rid].name, str); kputc('\t', str); // RNAME
		kputl(p->pos + 1, str); kputc('\t', str); // POS
		kputw(p->mapq, str); kputc('\t', str); // MAPQ
		add_cigar(opt, p, str, which);
	} else kputsn("*\t0\t0\t*", 7, str); // without coordinte
	kputc('\t', str);

for some reason I got differences in CIGAR string in resulting files.

Before change (same as bwa 0.7.17):

HISEQ_HU01:89:H7YRLADXX:1:1101:1332:5389	69	chr17	57913232	0	*	=	57913232	0	TGACCTTCAGCCACAAGAGAGAGATTAACTGAGGCAAAAAACTGGAATACATGCTAAAATATTATCATTCAGTGTTAGTATCTAGAACTACTTCTCTTTTT	<@CFFFFFHHHHHJJIJIIEHGIEIGGHGICIGIJJIJJIIJJIGIHIIJJIIJGIGIIJJIJIHHHHHGHGFDBDFF@CFEEEDCDDDDDDCDDDDDCDD	MC:Z:18M1D83M	AS:i:0	XS:i:0	RG:Z:H7YRLADXX_L001

After change:

HISEQ_HU01:89:H7YRLADXX:1:1101:1332:5389	69	chr17	57913232	0	I	=	57913232	0	TGACCTTCAGCCACAAGAGAGAGATTAACTGAGGCAAAAAACTGGAATACATGCTAAAATATTATCATTCAGTGTTAGTATCTAGAACTACTTCTCTTTTT	<@CFFFFFHHHHHJJIJIIEHGIEIGGHGICIGIJJIJJIIJJIGIHIIJJIIJGIGIIJJIJIHHHHHGHGFDBDFF@CFEEEDCDDDDDDCDDDDDCDD	MC:Z:18M1D83M	AS:i:0	XS:i:0	RG:Z:H7YRLADXX_L001

As you can see difference is * and I in CIGAR strings. And I don't see any reason for that.

from bwa-mem2.

yuk12 avatar yuk12 commented on May 29, 2024

Added support for MC flag in the latest commit.

from bwa-mem2.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.