Giter Club home page Giter Club logo

bwa-mem2's Introduction

GitHub Downloads BioConda Install

Important Information

We are happy to announce that the index size on disk is down by 8 times and in memory by 4 times due to moving to only one type of FM-index (2bit.64 instead of 2bit.64 and 8bit.32) and 8x compression of suffix array. For example, for human genome, index size on disk is down to ~10GB from ~80GB and memory footprint is down to ~10GB from ~40GB. There is a substantial reduction in index IO time due to the reduction and hardly any performance impact on read mapping. Due to this change in index structure (in commit #4b59796, 10th October 2020), you will need to rebuild the index.

Added MC flag in the output sam file in commit a591e22. Output should match original bwa-mem version 0.7.17.

As of commit e0ac59e, we have a git submodule safestringlib. To get it, use --recursive while cloning or use "git submodule init" and "git submodule update" in an already cloned repository (See below for more details).

Getting Started

# Use precompiled binaries (recommended)
curl -L https://github.com/bwa-mem2/bwa-mem2/releases/download/v2.2.1/bwa-mem2-2.2.1_x64-linux.tar.bz2 \
  | tar jxf -
bwa-mem2-2.2.1_x64-linux/bwa-mem2 index ref.fa
bwa-mem2-2.2.1_x64-linux/bwa-mem2 mem ref.fa read1.fq read2.fq > out.sam

# Compile from source (not recommended for general users)
# Get the source
git clone --recursive https://github.com/bwa-mem2/bwa-mem2
cd bwa-mem2
# Or
git clone https://github.com/bwa-mem2/bwa-mem2
cd bwa-mem2
git submodule init
git submodule update
# Compile and run
make
./bwa-mem2

Introduction

The tool bwa-mem2 is the next version of the bwa-mem algorithm in bwa. It produces alignment identical to bwa and is ~1.3-3.1x faster depending on the use-case, dataset and the running machine.

The original bwa was developed by Heng Li (@lh3). Performance enhancement in bwa-mem2 was primarily done by Vasimuddin Md (@yuk12) and Sanchit Misra (@sanchit-misra) from Parallel Computing Lab, Intel. bwa-mem2 is distributed under the MIT license.

Installation

For general users, it is recommended to use the precompiled binaries from the release page. These binaries were compiled with the Intel compiler and runs faster than gcc-compiled binaries. The precompiled binaries also indirectly support CPU dispatch. The bwa-mem2 binary can automatically choose the most efficient implementation based on the SIMD instruction set available on the running machine. Precompiled binaries were generated on a CentOS7 machine using the following command line:

make CXX=icpc multi

Usage

The usage is exactly same as the original BWA MEM tool. Here is a brief synopsys. Run ./bwa-mem2 for available commands.

# Indexing the reference sequence (Requires 28N GB memory where N is the size of the reference sequence).
./bwa-mem2 index [-p prefix] <in.fasta>
Where 
<in.fasta> is the path to reference sequence fasta file and 
<prefix> is the prefix of the names of the files that store the resultant index. Default is in.fasta.

# Mapping 
# Run "./bwa-mem2 mem" to get all options
./bwa-mem2 mem -t <num_threads> <prefix> <reads.fq/fa> > out.sam
Where <prefix> is the prefix specified when creating the index or the path to the reference fasta file in case no prefix was provided.

Performance

Datasets:
Reference Genome: human_g1k_v37.fasta

Alias Dataset source No. of reads Read length
D1 Broad Institute 2 x 2.5M bp 151bp
D2 SRA: SRR7733443 2 x 2.5M bp 151bp
D3 SRA: SRR9932168 2 x 2.5M bp 151bp
D4 SRA: SRX6999918 2 x 2.5M bp 151bp

Machine details:
Processor: Intel(R) Xeon(R) 8280 CPU @ 2.70GHz
OS: CentOS Linux release 7.6.1810
Memory: 100GB

We followed the steps below to collect the performance results:
A. Data download steps:

  1. Download SRA toolkit from https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software#header-global
  2. tar xfzv sratoolkit.2.10.5-centos_linux64.tar.gz
  3. Download D2: sratoolkit.2.10.5-centos_linux64/bin/fastq-dump --split-files SRR7733443
  4. Download D3: sratoolkit.2.10.5-centos_linux64/bin/fastq-dump --split-files SRR9932168
  5. Download D4: sratoolkit.2.10.5-centos_linux64/bin/fastq-dump --split-files SRX6999918

B. Alignment steps:

  1. git clone https://github.com/bwa-mem2/bwa-mem2.git
  2. cd bwa-mem2
  3. make CXX=icpc (using intel C/C++ compiler)
    or make (using gcc compiler)
  4. ./bwa-mem2 index <ref.fa>
  5. ./bwa-mem2 mem [-t <#threads>] <ref.fa> <in_1.fastq> [<in_2.fastq>] > <output.sam>

For example, in our double socket (56 threads each) and double numa compute node, we used the following command line to align D2 to human_g1k_v37.fasta reference genome.

numactl -m 0 -C 0-27,56-83 ./bwa-mem2 index human_g1k_v37.fasta  
numactl -m 0 -C 0-27,56-83 ./bwa-mem2 mem -t 56 human_g1k_v37.fasta SRR7733443_1.fastq SRR7733443_2.fastq > d2_align.sam





bwa-mem2 seeding phase accelerated using LISA (Learned-Indexes for Sequence Analysis)

bwa-mem2-lisa is an accelerated version of bwa-mem2 where we apply learned-indexes to the seeding phase. bwa-mem2-lisa branch contains the source code of the implementation. Following are the features of bwa-mem2-lisa:

  1. Exact same output as bwa-mem2.
  2. All command-lines for creating an index and the read mapping are exactly same as bwa-mem2.
  3. bwa-mem2-lisa accelerates seeding phase (one of the major bottlenecks in bwa-mem2) by up to 4.5x compared to bwa-mem2.
  4. The memory footprint of bwa-mem2-lisa index is ~120GB for human genome.
  5. The code is present in bwa-mem2-lisa branch: https://github.com/bwa-mem2/bwa-mem2/tree/bwa-mem2-lisa

bwa-mem2 seeding speedup with Enumerated Radix Trees (Code in ert branch)

The ert branch of bwa-mem2 repository contains codebase of enuerated radix tree based acceleration of bwa-mem2. The ert code is built on the top of bwa-mem2 (thanks to the hard work by @arun-sub). The following are the highlights of the ert based bwa-mem2 tool:

  1. Exact same output as bwa-mem(2)
  2. The tool has two additional flags to enable the use of ert solution (for index creation and mapping), else it runs in vanilla bwa-mem2 mode
  3. It uses 1 additional flag to create ert index (different from bwa-mem2 index) and 1 additional flag for using that ert index (please see the readme of ert branch)
  4. The ert solution is 10% - 30% faster (tested on above machine configuration) in comparison to vanilla bwa-mem2 -- users are adviced to use option -K 1000000 to see the speedups
  5. The memory foot print of the ert index is ~60GB
  6. The code is present in ert branch: https://github.com/bwa-mem2/bwa-mem2/tree/ert

Citation

Vasimuddin Md, Sanchit Misra, Heng Li, Srinivas Aluru. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. IEEE Parallel and Distributed Processing Symposium (IPDPS), 2019. 10.1109/IPDPS.2019.00041

bwa-mem2's People

Contributors

arun-sub avatar edharry avatar hyphaltip avatar klmr avatar lh3 avatar mp15 avatar mr-c avatar sanchit-misra avatar saurabhkalikar avatar teepean avatar yuk12 avatar zamaudio avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

bwa-mem2's Issues

Support for Bam output

Would be nice to have as to not require pipes to samtools in order to generate the file.

bwa-mem2 always returns status code 1

bwa-mem2 reproducibly returns the status code 1, which causes scripts and Makefiles to assume failure.

To reproduce:

wget http://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa.gz
bwa-mem2 index Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa.gz
echo $?
# 1

Same when running on a gunzip-ped FASTA file instead.

Or, using human (sorry):

wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
bwa-mem2 index hs37d5.fa.gz
echo $?
# 1

The same is true when invoking bwa-mem2 mem.

bwa-mem2 version
# 2.0pre1
uname -srvio
# Linux 4.15.0-55-generic #60-Ubuntu SMP Tue Jul 2 18:22:20 UTC 2019 x86_64 GNU/Linux
lsb_release -a
# No LSB modules are available.
# Distributor ID: Ubuntu
# Description:    Ubuntu 18.04.2 LTS
# Release:        18.04
# Codename:       bionic

segmentation fault (core dumped) error and difference with bwa-mem and bwa-mem2

pre-information is below
computer spec = Ubuntu, 80core, 760GB ram
bwa version = 2.0pre1
testfiles = 1.7G, 1.7G (paired)
Ref genome = GRCh38

cmd doesn't work.
bwa-mem2 mem -t 75 ref/grch38.fa ../test1.fastq.gz ../test2.fastq.gz > out.sam


-----------------------------
Executing in AVX2 mode!!
-----------------------------
Ref file: ref/grch38.fa
Entering FMI_search
reference seq len = 6176572803
count
0,      1
1,      1811087986
2,      3088286402
3,      4365484818
4,      6176572803

Reading other elements of the index from files ref/grch38.fa
prefix: ref/grch38.fa
[M::bwa_idx_load_ele] read 0 ALT contigs
Done reading Index!!
Reading reference genome..
Binary seq file = ref/grch38.fa.0123
Reference genome size: 6176572802 bp
Done readng reference genome !!

readLen: 151

[0000] 1: Calling process()

Threads used (compute): 75
Projected #read in a task: 4966897
------------------------------------------
Memory pre-allocation for chaining: 159218850232
Memory pre-allocation for BSW: 17971276800
Memory pre-allocation for BWT: 4638883200
------------------------------------------
No. of pipeline threads: 2
[0000] read_chunk: 750000000, work_chunk_size: 750000088, nseq: 4966888
        [0000][ M::kt_pipeline] read 4966888 sequences (750000088 bp)...
[0000] 2. Calling mem_process_seqs.., task: 0
[0000] 3. Calling kt_for - worker_bwt
Segmentation fault (core dumped)

but it did work well
bwa-mem2 mem -t 40 ref/grch38.fa ../test1.fastq.gz ../test2.fastq.gz > out.sam (20G)

also 'bwa-mem' works too. but file size with made from 'bwa-mem' is 26G. Is it just difference from algorithm? is it negligible?

Thanks

Compilation error using gcc

src/bwtbuild.cpp: In function โ€˜int build_index(const char*)โ€™:
src/bwtbuild.cpp:710:37: error: expected โ€˜)โ€™ before โ€˜PRIu64โ€™
  710 |     fprintf(stderr, "init ticks = %" PRIu64 "\n", __rdtsc() - startTick);
      |            ~                        ^~~~~~~
      |                                     )
src/bwtbuild.cpp:742:43: error: expected โ€˜)โ€™ before โ€˜PRIu64โ€™
  742 |     fprintf(stderr, "binary seq ticks = %" PRIu64 "\n", __rdtsc() - startTick);
      |            ~                              ^~~~~~~
      |                                           )
src/bwtbuild.cpp:752:44: error: expected โ€˜)โ€™ before โ€˜PRIu64โ€™
  752 |     fprintf(stderr, "build index ticks = %" PRIu64 "\n", __rdtsc() - startTick);
      |            ~                               ^~~~~~~
      |                                            )
make: *** [Makefile:74: src/bwtbuild.o] Error 1

Add the __STDC_FORMAT_MACROS before inttypes.h in src/bwtbuild.cpp fixed the issue:

#define __STDC_FORMAT_MACROS 1

#include <inttypes.h>

Potential memory leak?

I've been running some tests on a set of 30x WGS samples. I have been trying to run the sample on a c5d.9xlarge on AWS which has ~69 GB free RAM when I launch (~2 GB per core). So far, all of my attempts have failed to complete the mapping. The log doesn't contain any OOM errors but I have made the following observations:

  1. The run will complete without issue on a m5d.12xlarge which has ~190 GB of RAM (~4 GB per core).
  2. On the c5d.9xlarge, when mapping starts, it is initially using ~59 GB of RAM. Every few minutes, the RAM usage increases monotonically by about 1 GB. Towards the end of the run it is very close to the 69 GB total free - leading me to believe it is terminating because it runs out of RAM.

I then ran Intel's Inspector tool on the executable to look for potential memory leaks. The tool identified a few potential issues:

  1. In bwamem.cpp line 397, there is a calloc call and the resulting buffer is assigned to c->seeds. However, it doesn't appear, at least in this location, that any previous buffer was freed.
  2. In bwamem.cpp line 400 there is a realloc call that is then assigned to c->seeds. Realloc should free the buffer it's trying to reallocate if needed but...
  3. ... on my test run of just around 1,000,000 reads, Intel Inspector lists potential memory leads from the calloc call 73 times and the from the realloc call 66 times affecting 15,039,104 bytes.

I tried the naive thing of just checking if c->seed is not NULL and if it's not, calling
free(c->seed)
just before we set c->seed to the new auxSeedBuf buffer, right around line 398, but it looks like that's not quite right since I get an error message about free being called against an invalid pointer.

In case it is helpful, the results of the Intel Inspector run can be found in the tar file here:
https://dl.dnanex.us/F/D/Qbk7Jz49kB51828xbYgB9KQGXVx4gK71VGvzBvqv/bwa_intel_inspector_project.tar.gz

support for uBam input?

Hi - I know the team isn't currently focused on adding features, but it would be great to add support for uBam input. This is increasingly the recommended workflow (e.g. by GATK/Picard team) since uBams can store metadata e.g. UMIs that are not easily stored or parsed in fastqs, and is increasingly supported by other popular aligners e.g. Novoalign and Bowtie 2.

bwamem2 as a library and linking to own project

Hi,

I was wondering how one might go about using bwa-mem2 as a library in the same way you can in the original bwa;

https://github.com/lh3/bwa/blob/master/example.c

or is there now another way to interface our own applications with bwa-mem2?

I've had some success creating an archive of objects and linking to a basic program however I'm struggling to work out how to load an index and create a trivial example similar to the example in bwa.

$ ar -csr libbwa.a utils.o kthread.o kstring.o ksw.o bwa.o bwamem.o bwamem_pair.o bwamem_extra.o bwtbuild.o bwtindex.o bntseq.o
$ cat makefile
CC = g++ -std=gnu++14 
LIB = -lpthread -lz -lm -llzma -lbz2 -lrt
INC = -I./bwa-mem2/src
LD =  bwa-mem2/src/libbwa.a

all : hello

hello : hello.cpp
	$(CC) -g -O3 -fpermissive $^ $(LD) $(INC) $(LIB) -o $@

clean:
	rm hello

A trivial example that builds the index, however I can't then access functions such as bwa_idx_load_bwt because they are not being compiled

#if 0

$ cat hello.cpp
#include <stdio.h>
#include <zlib.h>
#include <string.h>
#include <errno.h>
#include <assert.h>
#include "bwa.h"
#include "bwamem.h"
#include "kseq.h" // for the FASTA/Q parser
KSEQ_DECLARE(gzFile)

int main(int argc, char *argv[])
{
	//bwaidx_t *idx;
	gzFile fp;
	kseq_t *ks;
	mem_opt_t *opt;

	if (argc < 2) {
		fprintf(stderr, "Usage: hello <idx.base>\n");
		return 1;
	}

        bwa_idx_build(argv[1], argv[1]);

        free(opt);
        kseq_destroy(ks);
        gzclose(fp);

	return 0;
}

I realise that I'm probably attempting to access the bwa-mem2 API incorrectly to build the index, I'm just not clear how this should be done. Any help would be greatly appreciated.

the sam file cannot be recognized

nohup ./bwa-mem2
the sam file cannot be recognized when I use samtools to convert to bam file

[W::sam_read1] Parse error at line 1 [main_samview] truncated file.

bwa-mem2: Assertion `ret->n_seqs <= nreads' failed.

Hi I tried to align reads to a genome with 2 small fragments but when I start bwa-mem2 I get:

Reference genome size: 526 bp
Done readng reference genome !!

readLen: 69

[0000] 1: Calling process()

Threads used (compute): 40
Projected #read in a task: 5797111

Memory pre-allocation for chaining: 185832190216
Memory pre-allocation for BSW: 9584680960
Memory pre-allocation for BWT: 1130583040

No. of pipeline threads: 2
bwa-mem2: src/fastmap.cpp:253: ktp_data_t *kt_pipeline(void *, int, void *, mem_opt_t *, worker_t &): Assertion `ret->n_seqs <= nreads' failed.

Sam header missing newline between user provided @RG line and @PG line

I'm specifying read group information at the command line and am noticing that in the SAM header output, there is no new line inserted between my read group information line and the following @pg line. Looking at the code, I see that there appears to be a #if statement that would lead to a path of code that would include the newline and another path that results in no newline:

bwa-mem2/src/bwa.cpp

Lines 543 to 553 in 30a633d

#if ORIG
if (hdr_line) err_printf("%s\n", hdr_line);
if (bwa_pg) err_printf("%s\n", bwa_pg);
#else
if (hdr_line) {
err_fputs(hdr_line, fp);
// err_printf("%s\n", hdr_line);
}
//printf("%s\n%s\n", hdr_line, bwa_pg);
if (bwa_pg) err_fputs(bwa_pg, fp);
#endif

Was there a reason for omitting the newline in the second path?

Missing CIGAR strings

Hello!
Thank you for this new version! We have started testing it and we have found that in the alignment produced by bwa-mem2 CIGAR tags like MC:Z:101M are missing comparing to the alignment produced by the original version.
Is this behaviour expected?

Mapping to a portion of reference genome

I have a one question on splitted reference genome.

I firstly mapped PE-WG sequence files to whole genome. Its output is about 80GB.
and then i splitted reference genome into only chr22 (the shortest chromosome) and mapped files to splitted genome.
But Its output is about 80GB at the same above.

Do i need to process something?

Thanks.

`bwa-mem2': free(): invalid next size (normal): 0x00007f3a1d66fb5

Setting:

  • Binary built using commit 091cf8a
  • Ran on AWS c5d.18xlarge with Linux 4.4.0-1099-aws and Ubuntu 16.04.6 LTS
  • Executing in AVX512 mode

Command:

โŸซ bwa-mem2 mem -t 8 genome/hs38DH.fa /home/dnanexus/in/reads_fastqgzs/0/ERR3242672_1.fastq.gz /home/dnanexus/in/reads2_fastqgzs/0/ERR3242672_2.fastq.gz -K 100000000 -Y -R '@RG\tID:ERR3242672_1\tPL:ILLUMINA\tPU:ERR3242672_1\tLB:ERR3242672\tSM:ER^C242672' > /dev/null

Error message:

...
[M::mem_pestat] (25, 50, 75) percentile: (827, 2626, 3427)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 8627)
[M::mem_pestat] mean and std.dev: (2383.79, 1436.55)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 11227)
[M::mem_pestat] skip orientation RR as there are not enough pairs
[0000] 10. Calling kt_for - worker_sam
*** Error in `bwa-mem2': free(): invalid next size (normal): 0x00007f3a1d66fb50 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f48411137e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f484111c37a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f484112053c]
bwa-mem2[0x44cdf8]
bwa-mem2[0x445869]
bwa-mem2[0x42658c]
bwa-mem2[0x46a631]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f4841f286ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f48411a341d]
======= Memory map: ========
00400000-0049d000 r-xp 00000000 00:2a 75543 ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย /bwa-mem2/bwa-mem2.avx512bw
0069c000-006b0000 rw-p 0009c000 00:2a 75543 ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย /bwa-mem2/bwa-mem2.avx512bw
006b0000-006d1000 rw-p 00000000 00:00 0 
025dd000-03a99000 rw-p 00000000 00:00 0 ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย [heap]
7f38f4000000-7f38f8000000 rw-p 00000000 00:00 0 
7f38f8000000-7f38fbffd000 rw-p 00000000 00:00 0 
7f38fbffd000-7f38fc000000 ---p 00000000 00:00 0 
7f390c000000-7f3910000000 rw-p 00000000 00:00 0
...

Below are the files we used:
Reference genome (hs38DH.fa as a tar file, built after 6743183): https://dl.dnanex.us/F/D/19XXQfV51qBYpP4jgY46YzpX7ZB6GQYv78Zg6B4x/hs38DH.bwa-index.avx512.v2.tar.gz
ERR3242672_1.fastq.gz (12.03 GiB): https://dl.dnanex.us/F/D/FXyYZk9kqk8Y0qVGPV1VkBpY0byfQyB1Q2zf7v84/ERR3242672_1.fastq.gz
ERR3242672_2.fastq.gz (13.77 GiB): https://dl.dnanex.us/F/D/8y5qbZbkbVYyJJZXqB1PPg1YVZ5JFK5J7QjYx82j/ERR3242672_2.fastq.gz

Error in AVX2 mode

Hi,
First of all, thank you for creating bwa-mem2. It is a very cool tool.
I met a problem when I tried to execute in AVX2 mode.
make arch=avxw
./bwa-mem2 mem hg38.fa XXX.fastq
Executing in AVX2 mode!
Ref file:.....
Entering FMI_search
reference seq_len =
segmentation fault (core dumped)

Error during mapping: Assertion `numPairsRight < 512 * 500' failed

Ubuntu 14.04.5 LTS
Dual Intel Xeon E5-2620 v4 with 512GB RAM
Mapping NA12878 to HG38
Running in AVX2 mode

bwa-mem2 mem \
	-t 32 \
	-o ${OUT} \
	-R "@RG\tID:bench\tLB:bench\tSM:bench\tPU:bench\tPL:ILLUMINA" \
	Homo_sapiens_assembly38.fasta \
	ERR194147_1.fastq.gz \
	ERR194147_2.fastq.gz

Attempting the above with the precompiled binaries fails fairly quickly with:

[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RR
[0000] 10. Calling kt_for - worker_sam
[0000] read_chunk: 320000000, work_chunk_size: 320000118, nseq: 3168318
	[0000][ M::kt_pipeline] read 3168318 sequences (320000118 bp)...
	[0000][ M::mem_process_seqs] Processed 3168318 reads in 650.924 CPU sec, 22.033 real sec
[0000] 2. Calling mem_process_seqs.., task: 20
[0000] 3. Calling kt_for - worker_bwt
ERROR! seedBuf count exceeds size. count = 256000, tmp.m = 1, size = 256000
Reserved memory allocated for storing seeds has falling short!!!
Please increase the value of macros: AVG_SEEDS_PER_READ & AVG_AUX_SEEDS_PER_READ, in src/macro.h, re-compile and re-run.
Thu Jul 25 11:33:23 EDT 2019

However, if I compile myself (GCC, I don't have a license for icpc) with all the current commits it gets much farther, but still fails with this:

[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RR
[0000] 10. Calling kt_for - worker_sam
[0000] read_chunk: 320000000, work_chunk_size: 320000118, nseq: 3168318
	[0000][ M::kt_pipeline] read 3168318 sequences (320000118 bp)...
	[0000][ M::mem_process_seqs] Processed 3168318 reads in 566.920 CPU sec, 19.277 real sec
[0000] 2. Calling mem_process_seqs.., task: 398
[0000] 3. Calling kt_for - worker_bwt
[0000] read_chunk: 320000000, work_chunk_size: 320000118, nseq: 3168318
	[0000][ M::kt_pipeline] read 3168318 sequences (320000118 bp)...
[0000] 3. Calling kt_for - worker_aln
bwa-mem2: src/bwamem.cpp:2170: void mem_chain2aln_across_reads_V2(const mem_opt_t*, const bntseq_t*, const uint8_t*, bseq1_t*, int, mem_chain_v*, mem_alnreg_v*, mem_cache*, int64_t, int64_t, int64_t, int): Assertion `numPairsRight < 512 * 500' failed.
Thu Jul 25 19:40:16 EDT 2019

AMD Compatibility with AVX2

Hello,

I'm not that knowledgeable about AVX compilation processes so I was wondering what parameters do I change in the Makefile to make it compatible with AVX2 AMD CPUs?
Using the precompiled binaries gives me an error about using compatible AVX CPUs, and the makefile fails to get AVX2 to build.

Best,
Chang

c_bcast_array error

Hi, if I just run make on my amd3900x, there are some warnings and an error message (log shown at the end). make arch=avx2 ran smoothly.

Warning and error messages:
src/bwamem.cpp: In function โ€˜void mem_chain2aln_across_reads_V2(const mem_opt_t*, const bntseq_t*, const uint8_t*, bseq1_t*, int, mem_chain_v*, mem_alnreg_v*, mem_cache*, int64_t, int64_t, int64_t, int)โ€™:
src/bwamem.cpp:2016:37: warning: invalid conversion from โ€˜intโ€™ to โ€˜_mm_hintโ€™ [-fpermissive]
2016 | _mm_prefetch((const char*) query, 0);
| ^
| |
| int
In file included from /usr/lib/gcc/x86_64-redhat-linux/9/include/emmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/pmmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/tmmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/smmintrin.h:32,
from src/bandedSWA.h:42,
from src/bwamem.h:39,
from src/bwamem.cpp:31:
/usr/lib/gcc/x86_64-redhat-linux/9/include/xmmintrin.h:52:46: note: initializing argument 2 of โ€˜void _mm_prefetch(const void*, _mm_hint)โ€™
52 | _mm_prefetch (const void __P, enum _mm_hint __I)
| ~~~~~~~~~~~~~~^~~
src/bwamem.cpp:2034:51: warning: invalid conversion from โ€˜intโ€™ to โ€˜_mm_hintโ€™ [-fpermissive]
2034 | _mm_prefetch((const char
) (srtg + spos + 64), 0);
| ^
| |
| int
In file included from /usr/lib/gcc/x86_64-redhat-linux/9/include/emmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/pmmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/tmmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/smmintrin.h:32,
from src/bandedSWA.h:42,
from src/bwamem.h:39,
from src/bwamem.cpp:31:
/usr/lib/gcc/x86_64-redhat-linux/9/include/xmmintrin.h:52:46: note: initializing argument 2 of โ€˜void _mm_prefetch(const void*, _mm_hint)โ€™
52 | _mm_prefetch (const void __P, enum _mm_hint __I)
| ~~~~~~~~~~~~~~^~~
src/bwamem.cpp:2035:40: warning: invalid conversion from โ€˜intโ€™ to โ€˜_mm_hintโ€™ [-fpermissive]
2035 | _mm_prefetch((const char
) (lim_g), 0);
| ^
| |
| int
In file included from /usr/lib/gcc/x86_64-redhat-linux/9/include/emmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/pmmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/tmmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/smmintrin.h:32,
from src/bandedSWA.h:42,
from src/bwamem.h:39,
from src/bwamem.cpp:31:
/usr/lib/gcc/x86_64-redhat-linux/9/include/xmmintrin.h:52:46: note: initializing argument 2 of โ€˜void _mm_prefetch(const void*, _mm_hint)โ€™
52 | _mm_prefetch (const void __P, enum _mm_hint __I)
| ~~~~~~~~~~~~~~^~~
src/bwamem.cpp:2072:37: warning: invalid conversion from โ€˜intโ€™ to โ€˜_mm_hintโ€™ [-fpermissive]
2072 | _mm_prefetch((const char
) rseq, 0);
| ^
| |
| int
In file included from /usr/lib/gcc/x86_64-redhat-linux/9/include/emmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/pmmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/tmmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/smmintrin.h:32,
from src/bandedSWA.h:42,
from src/bwamem.h:39,
from src/bwamem.cpp:31:
/usr/lib/gcc/x86_64-redhat-linux/9/include/xmmintrin.h:52:46: note: initializing argument 2 of โ€˜void _mm_prefetch(const void*, _mm_hint)โ€™
52 | _mm_prefetch (const void __P, enum _mm_hint __I)
| ~~~~~~~~~~~~~~^~~
g++ -c -g -O3 -fpermissive -msse4.1 -DENABLE_PREFETCH -DPAIRED_END=1 -DMAINY=0 -DSAIS=1 -DDEB=0 -DRDT=0 -DMAXI=0 -DNEW=1 -DSORT_PAIRS=0 src/profiling.cpp -o src/profiling.o
g++ -c -g -O3 -fpermissive -msse4.1 -DENABLE_PREFETCH -DPAIRED_END=1 -DMAINY=0 -DSAIS=1 -DDEB=0 -DRDT=0 -DMAXI=0 -DNEW=1 -DSORT_PAIRS=0 src/bandedSWA.cpp -o src/bandedSWA.o
src/bandedSWA.cpp: In member function โ€˜void BandedPairWiseSW::smithWatermanBatchWrapper16(SeqPair
, uint8_t*, uint8_t*, int32_t, uint16_t, int8_t)โ€™:
src/bandedSWA.cpp:3755:63: warning: invalid conversion from โ€˜intโ€™ to โ€˜_mm_hintโ€™ [-fpermissive]
3755 | _mm_prefetch((const char*) seqBufRef + (int64_t)spf.idr, 0);
| ^
| |
| int
In file included from /usr/lib/gcc/x86_64-redhat-linux/9/include/emmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/pmmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/tmmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/smmintrin.h:32,
from src/bandedSWA.h:42,
from src/bandedSWA.cpp:30:
/usr/lib/gcc/x86_64-redhat-linux/9/include/xmmintrin.h:52:46: note: initializing argument 2 of โ€˜void _mm_prefetch(const void*, _mm_hint)โ€™
52 | _mm_prefetch (const void __P, enum _mm_hint __I)
| ~~~~~~~~~~~~~~^~~
src/bandedSWA.cpp:3756:68: warning: invalid conversion from โ€˜intโ€™ to โ€˜_mm_hintโ€™ [-fpermissive]
3756 | _mm_prefetch((const char
) seqBufRef + (int64_t)spf.idr + 64, 0);
| ^
| |
| int
In file included from /usr/lib/gcc/x86_64-redhat-linux/9/include/emmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/pmmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/tmmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/smmintrin.h:32,
from src/bandedSWA.h:42,
from src/bandedSWA.cpp:30:
/usr/lib/gcc/x86_64-redhat-linux/9/include/xmmintrin.h:52:46: note: initializing argument 2 of โ€˜void _mm_prefetch(const void*, _mm_hint)โ€™
52 | _mm_prefetch (const void __P, enum _mm_hint __I)
| ~~~~~~~~~~~~~~^~~
src/bandedSWA.cpp:3799:63: warning: invalid conversion from โ€˜intโ€™ to โ€˜_mm_hintโ€™ [-fpermissive]
3799 | _mm_prefetch((const char
) seqBufQer + (int64_t)spf.idq, 0);
| ^
| |
| int
In file included from /usr/lib/gcc/x86_64-redhat-linux/9/include/emmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/pmmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/tmmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/smmintrin.h:32,
from src/bandedSWA.h:42,
from src/bandedSWA.cpp:30:
/usr/lib/gcc/x86_64-redhat-linux/9/include/xmmintrin.h:52:46: note: initializing argument 2 of โ€˜void _mm_prefetch(const void*, _mm_hint)โ€™
52 | _mm_prefetch (const void __P, enum _mm_hint __I)
| ~~~~~~~~~~~~~~^~~
src/bandedSWA.cpp:3800:68: warning: invalid conversion from โ€˜intโ€™ to โ€˜_mm_hintโ€™ [-fpermissive]
3800 | _mm_prefetch((const char
) seqBufQer + (int64_t)spf.idq + 64, 0);
| ^
| |
| int
In file included from /usr/lib/gcc/x86_64-redhat-linux/9/include/emmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/pmmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/tmmintrin.h:31,
from /usr/lib/gcc/x86_64-redhat-linux/9/include/smmintrin.h:32,
from src/bandedSWA.h:42,
from src/bandedSWA.cpp:30:
/usr/lib/gcc/x86_64-redhat-linux/9/include/xmmintrin.h:52:46: note: initializing argument 2 of โ€˜void _mm_prefetch(const void*, _mm_hint)โ€™
52 | _mm_prefetch (const void *__P, enum _mm_hint __I)
| ~~~~~~~~~~~~~~^~~
g++ -c -g -O3 -fpermissive -msse4.1 -DENABLE_PREFETCH -DPAIRED_END=1 -DMAINY=0 -DSAIS=1 -DDEB=0 -DRDT=0 -DMAXI=0 -DNEW=1 -DSORT_PAIRS=0 src/FMI_search.cpp -o src/FMI_search.o
src/FMI_search.cpp: In destructor โ€˜FMI_search::~FMI_search()โ€™:
src/FMI_search.cpp:129:14: error: โ€˜c_bcast_arrayโ€™ was not declared in this scope
129 | _mm_free(c_bcast_array);
| ^~~~~~~~~~~~~
make: *** [Makefile:75: src/FMI_search.o] Error 1

Error during mapping: query_pos_ar[pos] < len' failed

First, thank you for your work on this important program. I am looking forward to the improvements in BWA MEM2.

Running BWA MEM2 with the pre-built binaries results in the following for me:

bwa-mem2: src/bwamem.cpp:662: smem_struct *mem_collect_smem(const mem_opt_t *, const bseq1_t *, int, smem_struct *, int *, short *, unsigned char *, int *, long &): Assertion `query_pos_ar[pos] < len' failed.

The machine is Intel Skylake n1-standard-32 on GCP (32 core, 120GB RAM).

Reference genome: GRCh38.no_alt_analysis_set.fa.gz
Index generated: ./bwa-mem2 index GRCh38.no_alt_analysis_set.fa.gz
Input files - NA12878-I30 NovaSeq WGS from BaseSpace (the first 10,000 reads of each pair can reproduce the issue). These files are attached.

Command and error:

time ./bwa-mem2 mem ../references/GRCh38.no_alt_analysis_set.fa.gz NA12878-I30.10000r.R1.fastq.gz NA12878-I30.10000r.R2.fastq.gz -t 32 -R '@RG\tID:NA12878-I30\tLB:NA12878-I30\tPL:ILLUMINA\tPU:NONE\tSM:NA12878' | samtools sort -O BAM -o NA12878.bam -

Entering FMI_search
reference seq len = 6199845083
count
0,      1
1,      1817861264
2,      3099922542
3,      4381983820
4,      6199845083

Reading other elements of the index from files ../references/GRCh38.no_alt_analysis_set.fa.gz
prefix: ../references/GRCh38.no_alt_analysis_set.fa.gz
[M::bwa_idx_load_ele] read 0 ALT contigs
Done reading Index!!
Reading reference genome..
Binary seq file = ../references/GRCh38.no_alt_analysis_set.fa.gz.0123
Reference genome size: 6199845082 bp
Done readng reference genome !!

readLen: 150

[0000] 1: Calling process()

Threads used (compute): 32
Projected #read in a task: 2133343
------------------------------------------
Memory pre-allocation for chaining: 68386443208
Memory pre-allocation for BSW: 7667744768
Memory pre-allocation for BWT: 1966149632
------------------------------------------
No. of pipeline threads: 2
[0000] read_chunk: 320000000, work_chunk_size: 3008555, nseq: 20000
        [0000][ M::kt_pipeline] read 20000 sequences (3008555 bp)...
[0000] 2. Calling mem_process_seqs.., task: 0
[0000] 3. Calling kt_for - worker_bwt
[0000] read_chunk: 320000000, work_chunk_size: 0, nseq: 0
bwa-mem2: src/bwamem.cpp:662: smem_struct *mem_collect_smem(const mem_opt_t *, const bseq1_t *, int, smem_struct *, int *, short *, unsigned char *, int *, long &): Assertion `query_pos_ar[pos] < len' failed.

[NA12878-I30.10000r.R2.fastq.gz](https://github.com/bwa-mem2/bwa-mem2/files/3205601/NA12878-I30.10000r.R2.fastq.gz)
[NA12878-I30.10000r.R1.fastq.gz](https://github.com/bwa-mem2/bwa-mem2/files/3205602/NA12878-I30.10000r.R1.fastq.gz)

Release schedule?

Is there a timeline for when this will be considered production ready?

A ~50% savings in time / money / energy is pretty significant for such a widely used tool.

Need help with bwa mem2

Hi

First I thank you for for the job this is great keep going ;)
I write because I'm blocked with the index creation and the alignment part.

I checkout the source files and make with CXX=icpc multi

Then I start to build a reference form hg19.fa so i did

bwa-mem2 index hg19.fa

it generates the following files

hg19.amb
hg19.fa
hg19.fa.ann
hg19.fa.fai
hg19.fasta.map
hg19.rpac
hg19.ann
hg19.fa.0123
hg19.fa.bwt
hg19.fa.pac
hg19.pac
hg19.rsa
hg19.bwt
hg19.fa.amb
hg19.fa.bwt.8bit.32
hg19.fa.sa
hg19.rbwt
hg19.sa

with bwa-mem2.avx2

and finally I start the alignment with

bwa-mem -t 8 $REF $FASTQ1 $FASTQ2 > $SAM

and it used the bwa-mem2.avx2 version

but when I look the sam file all my reads are aligned in chr1 at position 1 and when I compare with the original bwa this is wrong...I think it's a matter of how the index is created but I'm not sure... Have you an idea when it comes from please?

I also tried the precompiled lib but no help...
I run on Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz

Bests

frederic

Dec 12th commit ICPC pre-compiled?

Dear authors,

I just used the latest commit (Dec 12, 2019), as gcc compiled version, for our GATK pipeline and it finally runs stable and produced comparable results to our bwa-mem pipeline. I have tried the previous versions with very issues posted here. Would it be possible to offer the latest commit as ICPC compiled binaries to optimize the speed?

Cheers,
Thomas

Pairing information causing errors?

I just gave bwa-mem2 (binaries; building from source doesn't change the effect) a quick try, using paired reads in two separate files, and I got the following error:

[0000] 1: Calling process()

Threads used (compute): 1
Projected #read in a task: 79375
------------------------------------------
Memory pre-allocation for chaining: 2544445000
Memory pre-allocation for BSW: 239617024
Memory pre-allocation for BWT: 51611776
------------------------------------------
No. of pipeline threads: 2
bwa-mem2: src/fastmap.cpp:253: ktp_data_t *kt_pipeline(void *, int, void *, mem_opt_t *, worker_t &): Assertion `ret->n_seqs <= nreads' failed.
Aborted (core dumped)

If I run the same command with only the left or right set of reads there isn't an issue. I think an important point here is the way the headers of the fastq encode the left and right read. The read set that I'm working that causes this error arises from mapping followed by subsetting of the bam to retain only mapped reads, and then output with samtools fastq with the -N flag (so left and right are distinguished by the /1 and /2 suffix). If I use the original reads that retain the @<instrument>:<run number>:<flowcell ID>:<lane>:<tile>:<x-pos>:<y-pos> <read>:<is filtered>:<control number>:<sample number> format then bwa-mem2 seems to work fine. Note that both read sets work with bwa-mem so I assume the error is just arising as a result of the pairing information? Please let me know if additional information would be useful.

src/bwamem.cpp:662 error

Hi,
I get the Error message:
readLen: 120

[0000] 1: Calling process()

Threads used (compute): 32
Projected #read in a task: 2666676

Memory pre-allocation for chaining: 85482965856
Memory pre-allocation for BSW: 7667744768
Memory pre-allocation for BWT: 1572933632

No. of pipeline threads: 2
[0000] read_chunk: 320000000, work_chunk_size: 320000191, nseq: 2320440
[0000][ M::kt_pipeline] read 2320440 sequences (320000191 bp)...
[0000] 2. Calling mem_process_seqs.., task: 0
[0000] 3. Calling kt_for - worker_bwt
bwa-mem2: src/bwamem.cpp:662: SMEM* mem_collect_smem(const mem_opt_t*, const bseq1_t*, int, SMEM*, int32_t*, int16_t*, uint8_t*, int32_t*, int64_t&): Assertion `query_pos_ar[pos] < len' failed.

nothing to output but the core file ~130G (core dumped). How can I fix this error?

EXIT==0 on failure to load index files

Should give non-zero exit codes if there is a problem with input data:

~> /opt/wtsi-cgp/bin/bwa-mem2 mem -T 30 -Y -p -t 6 /.../genome.fa /.../tmpMap_PD23336b/split/1/148698_i.fq_000000.gz > /dev/null
-----------------------------
Executing in AVX512 mode!!
-----------------------------
Ref file: /.../genome.fa
Entering FMI_search
ERROR! Unable to open the file: /.../genome.fa.bwt.8bit.32
~> echo $?
0

segmentation fault building index with commit 97a2db0

Hi,
I get a segmentation fault when building an index with commit 97a2db0, which I don't get with commit a9fd42d

Here's the assembly I was indexing, which now gives me a segmentation fault on commit 97a2db0

wget https://www.dropbox.com/s/jpozyb4avje01p8/abyss.fa.gz

gunzip abyss.fa.gz

./bwa-mem2 index -p abyss ../../abyss.fa
[bwa_index] Pack FASTA... Segmentation fault (core dumped)

The CPU architecture is Intel SSE4.1

ERROR! Unable to open the file: 2.bwt.8bit.32

Hello,

I am encountering this issue

./bwa-mem2-2.0pre1_x64-linux/bwa-mem2 index coli.fa
./bwa-mem2-2.0pre1_x64-linux/bwa-mem2 mem 2 -p -t 8 coli.fa paired.fq                                     
-----------------------------
Executing in AVX2 mode!!
-----------------------------
Ref file: 2
Entering FMI_search
ERROR! Unable to open the file: 2.bwt.8bit.32

The 2.bwt.8bit.32 file is there and not empty.

ls coli.fa*
coli.fa  coli.fa.0123  coli.fa.amb  coli.fa.ann  coli.fa.bwt.2bit.64  coli.fa.bwt.8bit.32  coli.fa.pac

I am running Ubuntu 18.04 and my processor is
Intel(R) Xeon(R) CPU E3-1505M v5 @ 2.80GHz
thank you

`bwa-mem2': malloc(): memory corruption: 0x00007fa911bc31b0

Setting:

  • Binary built using commit 091cf8a
  • Ran on AWS c5d.18xlarge with Linux 4.4.0-1099-aws and Ubuntu 16.04.6 LTS
  • Executing in AVX512 mode

Command:

โŸซ bwa-mem2 mem -t 8 genome/hs38DH.fa /home/dnanexus/in/reads_fastqgzs/0/ERR3242459_1.fastq.gz /home/dnanexus/in/reads2_fastqgzs/0/ERR3242459_2.fastq.gz -K 100000000 -Y -R '@RG\tID:ERR3242459_1\tPL:ILLUMINA\tPU:ERR3242459_1\tLB:ERR3242459\tSM:ERR3242459' > /dev/null

Error message:

...
[M::mem_pestat] low and high boundaries for proper pairs: (1, 9676)
[M::mem_pestat] skip orientation RR as there are not enough pairs
[0000] 10. Calling kt_for - worker_sam
*** Error in `bwa-mem2': malloc(): memory corruption: 0x00007fa911bc31b0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fb835a857e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8213e)[0x7fb835a9013e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7fb835a92184]
bwa-mem2[0x4720d0]
bwa-mem2[0x44d083]
bwa-mem2[0x4458b5]
bwa-mem2[0x42658c]
bwa-mem2[0x46a631]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fb83689a6ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb835b1541d]
======= Memory map: ========
00400000-0049d000 r-xp 00000000 00:2a 75544 ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย /bwa-mem2/bwa-mem2.avx512bw
0069c000-006b0000 rw-p 0009c000 00:2a 75544 ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย /bwa-mem2/bwa-mem2.avx512bw
006b0000-006d1000 rw-p 00000000 00:00 0
01f8c000-04448000 rw-p 00000000 00:00 0 ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  ย [heap]
7fa878000000-7fa87bffc000 rw-p 00000000 00:00 0
7fa87bffc000-7fa87c000000 ---p 00000000 00:00 0
...

Below are the files we used:
Reference genome (hs38DH.fa as a tar file, built after 6743183): https://dl.dnanex.us/F/D/19XXQfV51qBYpP4jgY46YzpX7ZB6GQYv78Zg6B4x/hs38DH.bwa-index.avx512.v2.tar.gz
ERR3242459_1.fastq.gz (11.34 GiB): https://dl.dnanex.us/F/D/bBp5FZBgk9q8BpvFk1KXk06XGgbXGpfq9GQ1jBxY/ERR3242459_1.fastq.gz
ERR3242459_2.fastq.gz (11.92 GiB): https://dl.dnanex.us/F/D/8Bjk0vVV9k6f7v7Bx4X25JYZ2P3493P348yv7Xkv/ERR3242459_2.fastq.gz

Occasionally hitting Segmentation Fault

Setting:

  • Binary built using commit 091cf8a
  • Ran on AWS c5d.9xlarge with Linux-4.4.0-1092-aws-x86_64 and Ubuntu-16.04-xenial
  • Executing in AVX512 mode

Note:

  • A rerun may or may not hit Segmentation Fault. It was around 50:50 chance on finishing the mapping successfully or hitting Segmentation Fault.

Log (Segmentation Fault):

โŸซ bwa-mem2 mem -t 36 genome/hs38DH.fa /home/dnanexus/in/reads_fastqgzs/0/ERR2990063_1.fastq.gz /home/dnanexus/in/reads2_fastqgzs/0/ERR2990063_2.fastq.gz -K 100000000 -Y -R '@RG\tID:ERR2990063_1\tPL:ILLUMINA\tPU:ERR2990063_1\tLB:ERR2990063\tSM:ERR2990063' > /dev/null
-----------------------------
Executing in AVX512 mode!!
-----------------------------
Ref file: genome/hs38DH.fa
Entering FMI_search
reference seq len = 6434693835
... [Deleted for brievity]
[0000] 3. Calling kt_for - worker_bwt
[0000] read_chunk: 100000000, work_chunk_size: 7276690, nseq: 48190
        [0000][ M::kt_pipeline] read 48190 sequences (7276690 bp)...
[0000] 3. Calling kt_for - worker_aln
Inferring insert size distribution from data, l_pac: 3217346917, n: 662252
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (3, 279551, 4, 4)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (240, 282, 341)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (38, 543)
[M::mem_pestat] mean and std.dev: (294.94, 74.25)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 644)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[0000] 10. Calling kt_for - worker_sam
        [0000][ M::mem_process_seqs] Processed 662252 reads in 104.068 CPU sec, 3.060 real sec
[0000] 2. Calling mem_process_seqs.., task: 46
[0000] 3. Calling kt_for - worker_bwt
[0000] 3. Calling kt_for - worker_aln
Segmentation fault
139

Below are the files we used:
Reference genome (hs38DH.fa as a tar file): https://dl.dnanex.us/F/D/jG73Z8595GxZYXpKJbqGZzJyX15Q3V0YQG96BxJ7/hs38DH.bwa-index.avx512.v2.tar.gz
ERR2990063_1.fastq.gz: https://dl.dnanex.us/F/D/7Z6ygpJ3xVzf7bG04gx4JVPZ2y4G842g52fJJ9B9/ERR2990063_1.fastq.gz
ERR2990063_2.fastq.gz: https://dl.dnanex.us/F/D/8545B424zVg53Y3BKFyzXxXbJzJV1f3X6KP8V84g/ERR2990063_2.fastq.gz
STDOUT/STDERR of the run:
https://dl.dnanex.us/F/D/pv3VqJyv1842gkk0G45Q0qv362PqvP12J20G61ZX/ERR2990063.log

subtle differences in alignment when compared to bwa 0.7.17-r1188

Hi there,

It appears that there are some small differences in alignment when invoking bwa-mem2 vs bwa version 0.7.17-r1188.

To demonstrate, I obtained a pair of NA12878 FASTQ files from 1000 Genomes:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA12878/sequence_read/ERR001713_1.filt.fastq.gz
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA12878/sequence_read/ERR001713_2.filt.fastq.gz

I then invoked both the bwa and bwa-mem2 as follows:

$ bwa mem -K 10000000 -t 8 hs37d5.fa ERR001713_1.filt.fastq.gz ERR001713_2.filt.fastq.gz  | samtools view -u - > bwa.bam
$ bwa-mem2 mem -K 10000000 -t 8 hs37d5.fa ERR001713_1.filt.fastq.gz ERR001713_2.filt.fastq.gz  | samtools view -u - > bwa-mem2.bam

Now, when looking at the "samtools stats" output for each BAM file, there are some subtle differences:

$ diff <(samtools stats bwa.bam | grep '^SN') <(samtools stats bwa-mem2.bam | grep '^SN')
20c20
< SN	bases mapped (cigar):	284761502	# more accurate
---
> SN	bases mapped (cigar):	284761501	# more accurate
23,24c23,24
< SN	mismatches:	1854434	# from NM fields
< SN	error rate:	6.512235e-03	# mismatches / bases mapped (cigar)
---
> SN	mismatches:	1854433	# from NM fields
> SN	error rate:	6.512232e-03	# mismatches / bases mapped (cigar)

And here's one example of a slightly different alignment:

$ samtools view bwa.bam  | grep '^ERR001713.5425457'
ERR001713.5425457	83	6	83350583	0	7S29M	=	83350484	-128	ACTAGAAAAATCCAATAGCATATCAAAAAGATAATC	(4&;-7>>>>4;,>>9>1<&>;<-7><>>><;<>;<	NM:i:1	MD:Z:28A0	MC:Z:36M	AS:i:28	XS:i:29
ERR001713.5425457	163	6	83350484	0	36M	=	83350583	128	TACCAAAACCAGGAATGGACATAACAAAAAAAGAAA	<;><;<>>><>7.<89.<49;9>,87;>><5.8&5+	NM:i:1	MD:Z:15A20	MC:Z:7S29M	AS:i:31	XS:i:31
$ samtools view bwa-mem2.bam  | grep '^ERR001713.5425457'
ERR001713.5425457	83	13	102092425	0	8S28M	=	102092325	-128	ACTAGAAAAATCCAATAGCATATCAAAAAGATAATC	(4&;-7>>>>4;,>>9>1<&>;<-7><>>><;<>;<	NM:i:0	MD:Z:28	AS:i:28	XS:i:29
ERR001713.5425457	163	13	102092325	0	36M	=	102092425	128	TACCAAAACCAGGAATGGACATAACAAAAAAAGAAA	<;><;<>>><>7.<89.<49;9>,87;>><5.8&5+	NM:i:1	MD:Z:15A20	AS:i:31	XS:i:31

That is, in the first read of "ERR001713.5425457", the soft clipping differs by one basepair.

Are these subtle differences in alignments between bwa and bwa-mem2 to be expected?

Segmentation fault

Dear professor,
I get an error when using mem2, can you give me some advice, thanks a lot
image

Refgenome indexing suffix array failed

Hi,

I am migrating from the original bwa to bwa-mem2 and I am trying unsuccessfully to generate the indexes for the reference genome GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna.gz, available at:

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna.gz

The output is:

$ bwa-mem2 index GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna 
[bwa_index] Pack FASTA... 9.78 sec
init ticks = 177097768626
ref seq len = 6211430126
binary seq ticks = 121988005759
Allocation of 46.28 GB for suffix_array failed.
Current Allocation = 52.06 GB

The output is the same in different machines indexing this reference genome.

Indexing the same genome with original bwa does not report any error.

Assertion failed on mem_collect_smem

Hi,

We compiled the latest commit of bwa-mem2 with g++ (our icpc does not work somehow):
make -j20 multi ARCH_FLAGS=-mavx2
and used it in bwa-meth pipeline which pipes C-to-T converted fastq on the fly into bwa.

Note the index has been built with the same bwa-mem2 on a c-to-T converted genome "v37_decoy_plus_phage.fasta.bwameth.c2t"

We got an assertion failure which (by reading the readme) might due to memory allocation problem.

The error log is:

///

Executing in AVX2 mode!!

Ref file: /gpfs/genomedb/b37/b37_bwamem2/v37_decoy_plus_phage.fasta.bwameth.c2t
Entering FMI_search
reference seq len = 12550012029
count
0, 1
1, 4851497105
2, 6275006015
3, 7698514925
4, 12550012029

Reading other elements of the index from files /gpfs/genomedb/b37/b37_bwamem2/v37_decoy_plus_phage.fasta.bwameth.c2t
prefix: /gpfs/genomedb/b37/b37_bwamem2/v37_decoy_plus_phage.fasta.bwameth.c2t
[M::bwa_idx_load_ele] read 0 ALT contigs
Done reading Index!!
Reading reference genome..
Binary seq file = /gpfs/genomedb/b37/b37_bwamem2/v37_decoy_plus_phage.fasta.bwameth.c2t.0123
Reference genome size: 12550012028 bp
Done readng reference genome !!

readLen: 97

[0000] 1: Calling process()

Threads used (compute): 16
Projected #read in a task: 1649494

Memory pre-allocation for chaining: 52876179664
Memory pre-allocation for BSW: 3833872384
Memory pre-allocation for BWT: 635734016

No. of pipeline threads: 2
[0000] read_chunk: 160000000, work_chunk_size: 160000216, nseq: 1357552
[0000][ M::kt_pipeline] read 1357552 sequences (160000216 bp)...
[0000] 2. Calling mem_process_seqs.., task: 0
[M::kt_pipeline] 0 single-end sequences; 1357552 paired-end sequences.....
[0000] 3. Calling kt_for - worker_bwt
bwa-mem2: src/bwamem.cpp:662: smem_struct *mem_collect_smem(const mem_opt_t *, const bseq1_t *, int, smem_struct *, int *, short *, unsigned char *, int *, long &): Assertion `query_pos_ar[pos] < len' failed.

///

The readme file states that one could increase the values of corresponding macros but we would like to know exactly where and how to tweak...

Thanks a lot in ahead.

Best,
Yi

Assertion `numPairsRight < BATCH_SIZE * SEEDS_PER_READ' failed.

So, this is similar to #10, but a different error while mapping. Hopefully I have provided enough information below to make the error reproducible. I mapped the same reads to a different reference without bwa-mem2 stopping/failing during mapping as it does below.

Using def6c75 and sambamba

# Get reads
wget https://www.dropbox.com/s/vfn16yck7xohcnq/chicago-lib_001_R1.fq.gz
wget https://www.dropbox.com/s/1lwgr3sfbbyokcq/chicago-lib_002_R1.fq.gz
wget https://www.dropbox.com/s/e6lmcak6clhackc/chicago-lib_003_R1.fq.gz
wget https://www.dropbox.com/s/0o6jvk66uh6oe1x/hi-c-lib_003_R1.fq.gz
wget https://www.dropbox.com/s/h8oa2lryrpufk88/hi-c-lib_002_R1.fq.gz
wget https://www.dropbox.com/s/5c62ujxx5puad1q/hi-c-lib_001_R1.fq.gz

# Concatenate reads
cat *.fq.gz > chicago-hi-c_1.fastq.gz

# Get reference sequence
wget https://www.dropbox.com/s/le19d5bpfhahplh/dromedary.flye-corr.fa.gz

# Unzip reference
gunzip dromedary.flye-corr.fa.gz

# Index reference
bwa-mem2 index -p test dromedary.flye-corr.fa

# map reads    
bwa-mem2 mem -t 75 test fastq/chicago-hi-c_1.fastq.gz 2> test.log |sambamba view -q -t 75 -S -f bam /dev/stdin > test_1.bam  &

# standard error
-----------------------------
Executing in SSE4.1 mode!!
-----------------------------
Ref file: test
Entering FMI_search
reference seq len = 4025262461
count
0,	1
1,	1178846653
2,	2012631231
3,	2846415809
4,	4025262461

Reading other elements of the index from files test
prefix: test
[M::bwa_idx_load_ele] read 0 ALT contigs
Done reading Index!!
Reading reference genome..
Binary seq file = test.0123
Reference genome size: 4025262460 bp
Done readng reference genome !!

[0000] 1: Calling process()

Threads used (compute): 75
Info: projected #read in a task: 4966897
------------------------------------------
Memory pre-allocation for chaining: 10450.3513 MB
Memory pre-allocation for BSW: 17971.2768 MB
Memory pre-allocation for BWT: 5798.5632 MB
------------------------------------------
No. of pipeline threads: 2
[0000] read_chunk: 750000000, work_chunk_size: 750000088, nseq: 4966888
	[0000][ M::kt_pipeline] read 4966888 sequences (750000088 bp)...
[0000] 2. Calling mem_process_seqs.., task: 0
[0000] 3. Calling kt_for - worker_bwt
[0000] 3. Calling kt_for - worker_aln
[0000] 10. Calling kt_for - worker_sam
	[0000][ M::mem_process_seqs] Processed 4966888 reads in 2968.276 CPU sec, 166.577 real sec
[0000] read_chunk: 750000000, work_chunk_size: 750000088, nseq: 4966888
	[0000][ M::kt_pipeline] read 4966888 sequences (750000088 bp)...
[0000] 2. Calling mem_process_seqs.., task: 1
[0000] 3. Calling kt_for - worker_bwt
[0000] 3. Calling kt_for - worker_aln
[0000] 10. Calling kt_for - worker_sam
	[0000][ M::mem_process_seqs] Processed 4966888 reads in 2904.072 CPU sec, 87.171 real sec
[0000] read_chunk: 750000000, work_chunk_size: 750000088, nseq: 4966888
	[0000][ M::kt_pipeline] read 4966888 sequences (750000088 bp)...
[0000] 2. Calling mem_process_seqs.., task: 2
[0000] 3. Calling kt_for - worker_bwt
[0000] 3. Calling kt_for - worker_aln
[0000] read_chunk: 750000000, work_chunk_size: 750000088, nseq: 4966888
	[0000][ M::kt_pipeline] read 4966888 sequences (750000088 bp)...
[0000] 10. Calling kt_for - worker_sam
	[0000][ M::mem_process_seqs] Processed 4966888 reads in 3518.648 CPU sec, 65.854 real sec
[0000] 2. Calling mem_process_seqs.., task: 3
[0000] 3. Calling kt_for - worker_bwt
[0000] 3. Calling kt_for - worker_aln
[0000] 10. Calling kt_for - worker_sam
	[0000][ M::mem_process_seqs] Processed 4966888 reads in 3089.436 CPU sec, 74.955 real sec
[0000] read_chunk: 750000000, work_chunk_size: 750000088, nseq: 4966888
	[0000][ M::kt_pipeline] read 4966888 sequences (750000088 bp)...
[0000] 2. Calling mem_process_seqs.., task: 4
[0000] 3. Calling kt_for - worker_bwt
[0000] 3. Calling kt_for - worker_aln
[0000] 10. Calling kt_for - worker_sam
	[0000][ M::mem_process_seqs] Processed 4966888 reads in 3153.928 CPU sec, 105.077 real sec
[0000] read_chunk: 750000000, work_chunk_size: 750000088, nseq: 4966888
	[0000][ M::kt_pipeline] read 4966888 sequences (750000088 bp)...
[0000] 2. Calling mem_process_seqs.., task: 5
[0000] 3. Calling kt_for - worker_bwt
[0000] 3. Calling kt_for - worker_aln
[0000] 10. Calling kt_for - worker_sam
	[0000][ M::mem_process_seqs] Processed 4966888 reads in 3100.744 CPU sec, 64.002 real sec
[0000] read_chunk: 750000000, work_chunk_size: 750000088, nseq: 4966888
	[0000][ M::kt_pipeline] read 4966888 sequences (750000088 bp)...
[0000] 2. Calling mem_process_seqs.., task: 6
[0000] 3. Calling kt_for - worker_bwt
[0000] 3. Calling kt_for - worker_aln
bwa-mem2: src/bwamem.cpp:2170: void mem_chain2aln_across_reads_V2(const mem_opt_t*, const bntseq_t*, const uint8_t*, bseq1_t*, int, mem_chain_v*, mem_alnreg_v*, mem_cache*, int64_t, int64_t, int64_t, int): Assertion `numPairsRight < BATCH_SIZE * SEEDS_PER_READ' failed.

Assertion `((*auxSeedBufCount) + c->m) <= auxSeedBufSize' failed.`

I'm attempting to align a single lane of Novaseq-sequenced COLO829 paired-end using the pre-built binaries (https://github.com/bwa-mem2/bwa-mem2/releases/download/v2.0pre1/bwa-mem2-2.0pre1_x64-linux.tar.bz2 per the README). I indexed the reference genome using bwa-mem2 index with no arguments and am running under a GCP instance with 408GB RAM. I've tried with various values for the -t parameter with the same results.

$ bwa-mem2 mem -o colo.bam -t 8 reference_genome/Homo_sapiens.GRCh37.GATK.illumina.fasta R1_001.fastq R2_001.fastq
-----------------------------
Executing in AVX512 mode!!
-----------------------------
Ref file: reference_genome/Homo_sapiens.GRCh37.GATK.illumina.fasta
Entering FMI_search
reference seq len = 6191387963
count
0,      1
1,      1808064624
2,      3095693982
3,      4383323340
4,      6191387963

Reading other elements of the index from files reference_genome/Homo_sapiens.GRCh37.GATK.illumina.fasta
prefix: reference_genome/Homo_sapiens.GRCh37.GATK.illumina.fasta
[M::bwa_idx_load_ele] read 0 ALT contigs
Done reading Index!!
Reading reference genome..
Binary seq file = reference_genome/Homo_sapiens.GRCh37.GATK.illumina.fasta.0123
Reference genome size: 6191387962 bp
Done readng reference genome !!

readLen: 151

[0000] 1: Calling process()

Threads used (compute): 8
Projected #read in a task: 529811
------------------------------------------
Memory pre-allocation for chaining: 16983621416
Memory pre-allocation for BSW: 1916936192
Memory pre-allocation for BWT: 494814208
------------------------------------------
No. of pipeline threads: 2
[0000] read_chunk: 80000000, work_chunk_size: 80000102, nseq: 529802
        [0000][ M::kt_pipeline] read 529802 sequences (80000102 bp)...
[0000] 2. Calling mem_process_seqs.., task: 0
[0000] 3. Calling kt_for - worker_bwt
bwa-mem2: src/bwamem.cpp:394: int test_and_merge(const mem_opt_t *, long, mem_chain_t *, const abc *, int, abc *, long *, long): Assertion `((*auxSeedBufCount) + c->m) <= auxSeedBufSize' failed.

Segmentation fault (core dumped) when indexing hg19, hg38

Hi,

I tried with pre-compiled and compiled (hg19 and hg38). The same output:

bwa-mem2/bwa-mem2 index Homo_sapiens_assembly19.fasta                                                                                                                 
[bwa_index] Pack FASTA... 13.35 sec
init ticks = 204490443602
ref seq len = 6203953124
binary seq ticks = 117021996235
build index ticks = 6870572410362
ref_seq_len = 6203953124
count = 0, 1811627992, 3101976562, 4392325132, 6203953124
Segmentation fault (core dumped)

Is this being worked on? Where can I get the index files for the reference genome suitable for bwa-mem2.

Thanks.

Non-zero exit code when building genome index

I'm trying out the pre-compiled bwa-mem2 binaries on an AWS r5d.12xlarge (384 GB of RAM, 900 GB storage, 48 cores) and keep exiting with a non-zero exit code while it is writing out the .2bit.64 file. No error message is presented, so I'm not sure exactly what the underlying problem is. It is reproducible and deterministic though. Running on the same reference genome two times in row will result in the same failure with the output files in the exact same state, as determined by running md5sum on all of the output files upon the program exiting:

> md5sum hs37d5.fa.*
4a509f50e9ca3027fdb211af51e02bd2  hs37d5.fa.0123
c5287824ea836aed6e34dd2400f83643  hs37d5.fa.amb
b9445ac9206ed3710dce7c6bffcf54a6  hs37d5.fa.ann
c2bb6afb52e2efff51288d33f2b5bcd2  hs37d5.fa.bwt.2bit.64
191da2e42cf9d6f9b7553a9d3cbbdbef  hs37d5.fa.bwt.8bit.32
cbc647939a64ce1e5600f17e499db4f1  hs37d5.fa.pac

My command is simply

./bwa-mem2 index hs37d5.fa

Do you have any suggestions on what I might try?

What is the expected speed improvement?

I wanted to test speed improvements of BWA-MEM2 over BWA-MEM and run the same WGS 30x sample on AWS instance c5.9xlarge (36CPUS, 72GB):

bwa mem -R '@RG\tID:1\tLB:FDA_GARVAN\tPL:Illumina-HiSeq-XTen\tSM:NA12878' -t 36 human_g1k_v37_decoy.fasta NA12878-Garvan-Vial1_R1.fastq.gz NA12878-Garvan-Vial1_R2.fastq.gz  > NA12878-Garvan-Vial1_R.sam 

bwa-mem2 mem -R '@RG\tID:1\tLB:FDA_GARVAN\tPL:Illumina-HiSeq-XTen\tSM:NA12878'  -t 36 human_g1k_v37_decoy.fasta NA12878-Garvan-Vial1_R1.fastq.gz NA12878-Garvan-Vial1_R2.fastq.gz > NA12878-Garvan-Vial1_R.sam

BWA-MEM finished in 4 hours, 12 minutes and BWA-MEM2 in 3 hours, 40 minutes.
Is that expected?
BWA-MEM2 log is attached..

`bwa-mem2': free(): invalid next size (normal): 0x00007f4344068230

Setting:

  • Binary built using commit 1038fe3
  • Ran on AWS c5d.9xlarge with Linux-4.4.0-1092-aws-x86_64 and Ubuntu-16.04-xenial
  • Executing in AVX512 mode

Note:

  • Same error happens when using AWS c5d.18xlarge. Executing in AVX512 mode.
  • Same binary run successfully without error on AWS c3.8xlarge. Executing in SSE4.1 mode.

Command:

bwa-mem2 mem -t 36 genome/hs38DH.fa /home/dnanexus/in/reads_fastqgzs/0/150420_HG005_29-40_R1.fastq.gz /home/dnanexus/in/reads2_fastqgzs/0/150420_HG005_29-40_R2.fastq.gz -K 100000000 -Y -R '@RG\tID:150420_HG005_29-40_1\tPL:ILLUMINA\tPU:150420_HG005_29-40_1\tLB:150420_HG005_29-40\tSM:150420_HG005_29-40'

Error message:

*** Error in `bwa-mem2': free(): invalid next size (normal): 0x00007f4344068230 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f527cb417e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f527cb4a37a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f527cb4e53c]
bwa-mem2[0x44a38d]
bwa-mem2[0x42acba]
bwa-mem2[0x40a501]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f527d9566ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f527cbd141d]

We were mapping downsampled HG005 WGS FASTQ as input. We realized that there are specific 12 consecutive read pairs triggering the error. When further removing the read pairs from the top or bottom, mapping succeed without any error.

Below are the files we used:
Reference genome (hs38DH.fa as a tar file): https://dl.dnanex.us/F/D/jG73Z8595GxZYXpKJbqGZzJyX15Q3V0YQG96BxJ7/hs38DH.bwa-index.avx512.v2.tar.gz
150420_HG005_29-40_R1.fastq.gz: https://dl.dnanex.us/F/D/J8J0fp049fG3Yzfyxf0KG7k03YJ6J1P1q4vyz27y/150420_HG005_29-40_R1.fastq.gz
150420_HG005_29-40_R2.fastq.gz: https://dl.dnanex.us/F/D/vbPkzZkjyq6JZQf1j2Jgjjb1xV248fvx10jv254K/150420_HG005_29-40_R2.fastq.gz
STDOUT/STDERR of the run:
https://dl.dnanex.us/F/D/8KzbQy2j5zZ10V6bvK3bjf2Pg9Gqf1bg5qbzbbX7/150420_HG005_29-40.log

What do we do of the different executables?

Hello,
I guess a simple question but what would be the difference between running

bwa-mem2 mem
VS
bwa-mem2.avx2 mem

On a proc supporting AVX2 instructions? Is the first executable a kind of wrapper that would identify the correct architecture?

Thank you

Everything aligns to chr1 pos 1, but doesnt in standard bwa

Hi Bwa-mem2 team,

first of all thank you guys for creating this project, it looks really promissing and exciting. Halving the runtime of a well known tools is no small task!

I ran some experiments on Bwa-mem2 vs bwa mem, and initially it looked really promissing with the diff between 2 files only beeing the expected difference in the header of the sam file. This was done on only a single human chromosome with reads known to come from this chromosome.

However when i alligned reads from a different file against the full human reference genome there was an apperent problem where every read was mapped to chr1:1, while the same reads mapped to the same reference genome with the original dont map there at all.

to reproduce:
Input file:
SRR7890883_1k.fastq.gz

code:

#Unzip input file
gunzip SRR7890883_1k.fastq.gz
# get latest release
curl -L https://github.com/bwa-mem2/bwa-mem2/releases/download/v2.0pre1/bwa-mem2-2.0pre1_x64-linux.tar.bz2 | tar jxf - 
#update path to add bwa-mem2
PATH="$PATH:$(pwd)/bwa-mem2-2.0pre1_x64-linux/"
# get reference genome
gsutil cp gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta .
 # index file
bwa-mem2 index Homo_sapiens_assembly38.fasta
# align 1000 reads
bwa-mem2 mem -t 16 -o rerun2.sam Homo_sapiens_assembly38.fasta SRR7890883_1k.fastq

Please note:
ran on gcp (n1-highmem-16 (16 vCPUs, 104 GB memory) )
release: VN:2.0pre1
exit code: 1 (appears to be an issue in the release according to #2 )
SRR7890883_1k.fastq contains the first 1000 reads from another file

Output:
samtools view rerun2.sam | head

SRR7890883.2    0       chr1    1       60      51M199S *       0       0       GGGATGGCGGGCGCCGGGAGCGAAGCCCGGTTCGCCGGGCTGTCGCTGGTGNAGCTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGNNNNNNNNNNNNNNNNNNNNGAGNTGANCCGAGCCCGCCTTAAGCCCAGATCGGCNNNNNNNNNNNNNNNTCTGNANNNNNNNNNNNNNNNNNNNNNNNNNCTNCAGCAGCTAGTTGAGCTGCACCAGCGACAGCCCGGCGAACCGGGCTTCGNNNNNN     AAFFFKKKKKKFKKAFFAAFKAKKKKKKKAKFKKK7FAAFKKKKAKKKFKK#KKKK##################################F####################A7,#<A7#<FKKA(AAFFFKKKKKKKKKK,7,FKK###############KKFF#,#########################FF#KKKK7FKK,A,AFKKKKKAAKKFFKFKK<FKK,,,AA777AFFF(7A,,######        NM:i:32 MD:Z:0C2G0A0A0C0A1C0T1T2T0T4T0C0T1T0G0G1T0G0A0T1G0G0T1A2C0C0T1A0A0A1C1       AS:i:51 XS:i:0  SA:Z:chr1,1,+,210S34M6S,60,21;
SRR7890883.2    2048    chr1    1       60      210H34M6H       *       0       0       CTGCACCAGCGACAGCCCGGCGAACCGGGCTTCG   KKKAAKKFFKFKK<FKK,,,AA777AFFF(7A,,      NM:i:21 MD:Z:1G1G1A4T0C0T0C0C0T0T0G0A2T0C0T0G0T3T0G0A0T1     AS:i:34 XS:i:23 SA:Z:chr1,1,+,51M199S,60,32;
SRR7890883.3    0       chr1    1       60      196S47M7S       *       0       0       GCCATTCATATACTAAGCAACAAAACATCAGCAGGATGCGGAAGGTCCCGANAGTANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCAANTGTNCATACAACAAAATAAGTCAGGTTAATTNNNNNNNNNNNNNNNTCCANNNNNNNNNNNNNNNNNNNNNNNNNNNTTNNTGAGATACAGGTTGATGGACGGATGGCTACATGGATGTGATGGAGAAGNNNNNN     <<AFAFFKF<F,FKFAAFKKFKKKFFKKKAKK7AK7FKFKKKA,F,AFAAF#,<<K#######################################################KK,#<AK#FKKKA,AAFFAKKKKKKA<<FK7FFKK###############F,AF###########################FK##F7A77AFKKAAFAFKKKKFFAAFKK,,A,A7FFFFKFKK,,<A,A,,<######        NM:i:33      MD:Z:0C1G2A0C0A0G0C0T0C1C0C1T2G0C0T0C2T0G0G0G0T0G0A0T1G0G0T0G0A0C1C0C0T3        AS:i:47 XS:i:0
     SA:Z:chr1,1,+,12S39M199S,60,31;
SRR7890883.3    2048    chr1    1       60      12H39M199H      *       0       0       CTAAGCAACAAAACATCAGCAGGATGCGGAAGGTCCCGA      FKFAAFKKFKKKFFKKKAKK7AK7FKFKKKA,F,AFAAF NM:i:31 MD:Z:1G0G0G0A0A0C1G0C0T0C0T1C1T0G0A0G0C0T0C0T0G0T0G2T0G0A0T0G0G0G0T2 AS:i:39 XS:i:28 SA:Z:chr1,1,+,196S47M7S,60,33;
SRR7890883.4    0       chr1    1       60      51M199S *       0       0       CACTAAAGTGTTCATGACTTGTTCTACATAAAACTAATTCAACCTGTATGANAGGANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTCNACCNGTTTTGAGCAATGCAGCAAATCTAGCCNNNNNNNNNNNNNNNTCTANNNNNNNNNNNNNNNNNNNNNNNNNNNTANNCAAACAGGTAGAAAACATGGCCTGAAGGATTGTCTCTGCCACCACCTCNNNNNN     AAFFFKFKKKKKKKKKAKKKF,FAAFKKKKKAFFKKKFKFAKF,7,F<AAK#KKK,#######################################################,,,#F<F#,,,7<,AAFFFFFKAFKKFKKKFKFAK###############,,<7###########################,7##,7A,<AKKKKFKFK<FK<FK,7AA,AAFKAKKK7FKAFKAAKAKKKFA######        NM:i:42 MD:Z:1G0G0G2C0A0G0C1C0T0C0C0T0T0G0A0G0C1C0T0G0T0G0G0G0T0G1T0G0G0G0T0G0A1T0C1T0G0A0A1G0C0G0   AS:i:51 XS:i:0
     SA:Z:chr1,1,+,197S47M6S,60,37;
SRR7890883.4    2048    chr1    1       60      197H47M6H       *       0       0       AAACAGGTAGAAAACATGGCCTGAAGGATTGTCTCTGCCACCACCTC      7A,<AKKKKFKFK<FK<FK,7AA,AAFKAKKK7FKAFKAAKAKKKFA NM:i:37 MD:Z:0C0G0G0G1A0C0A0G0C0T0C0T0C1T2A0G2C0T0G0T1G0G2A0T0G0G0G0T0G0A0C0T1C0T0G0A0A0     AS:i:47 XS:i:0  SA:Z:chr1,1,+,51M199S,60,42;
SRR7890883.5    0       chr1    1       60      51M199S *       0       0       GGTGACTGTCACAAAGCAATGCCTGTTGCCAAACCTGTGGCTCCTCCTTCTNCTTCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTNGNGTCNGATCGGGACCTCAACTGCAGGAAAGGNNNNNNNNNNNNNNNNGGACNNNNNNNNNNNNNNNNNNNNNNNNNNNAGNNGGAGGAGCCACAGGTTTGGCAACAGGCATTGCTTTGTGACAGTCACCANNNNNN     ,A<A<F<FKKKKKKKKKKKKKKKKKFKKKKKKKKKF7KFAKKKKKKKKFKK#K<FK#######################################################7#,#FKK#K,F<AAAA<FAA7FAKKKKKKKF<AK################KFFK###########################KK##KKFKKFFFKKKKKKKKKKKKKKKKKKAKKKKKKKKKFKF7<AAAAKAF######        NM:i:37 MD:Z:0C1G2A0C0A0G1T1T0C0C0T0T0G1G0C0T4G1G0T0G1T0G0G0G0T0G0A0C0T0C1T0G0A0A0A0G1G0     AS:i:51 XS:i:0  SA:Z:chr1,1,+,196S48M6S,60,31;
SRR7890883.5    2048    chr1    1       60      196H48M6H       *       0       0       GGAGGAGCCACAGGTTTGGCAACAGGCATTGCTTTGTGACAGTCACCA     KKFKKFFFKKKKKKKKKKKKKKKKKKAKKKKKKKKKFKF7<AAAAKAF        NM:i:31      MD:Z:0C1G1A1C0A0G0C0T0C0T0C0C3A0G0C0T1T1T0G0G0G2A1G0G5T0C0C0T0G0A0A1    AS:i:48 XS:i:0  SA:Z:chr1,1,+,51M199S,60,37;
SRR7890883.6    0       chr1    1       60      51M199S *       0       0       GCACAACTGACGACTGAGAGGACATGAACACACACGTGAATAAGTGCATGGNTTCTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNANCNCNCNTCACATACAAATAGGAAACCGTCTTGNNNNNNNNNNNNNNNNAGCCNNNNNNNNNNNNNNNNNNNNNNNNNNNNCNNCTCAGTCGTCAGTTGTGCAGATCGGAAGAGCGTCGTGTAGGGATAGAGNNNNNN     AAAFFKKFKKKKKKKKKKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK#FFKK#######################################################K#F#<#,#<FFF<AAAFFFKKKKKKKKKKKKKKK################AAFK############################F##KFKKKKKKKKKFKKKKKFFKFKKKKKFFAFFFKFFKFFAAAA<,<F,A######        NM:i:34 MD:Z:0C0G0G0G3A1C0T0C0T1C0T0T3C0T1T0G0T0G0G0G0T0G1T0G0G4C1C0C0T0G0A0A1G0C1   AS:i:51 XS:i:0

Output from same sequences aligned with bwa mem (original)

    AS:i:54 XS:i:22 SA:Z:chr6,50729317,+,192S52M6S,60,2;
SRR7890883.1    2048    chr6    50729317        60      192H52M6H       *      00       GANTTTTCAGTTTAGCAAAATAATGCTGGTAGGTTGGCGTGTCTTTGAAACT    A,#F,,F<AA,7F,FAFF,A<7<A,FFK<,,A<AK,A,<FFF<<7AKKF7<F    NM:i:2  MD:Z:2C26A22    AS:i:45 XS:i:0 SA:Z:chr6,50729331,-,194S56M,60,1;
SRR7890883.2    16      chr12   122896010       60      194S56M *       0      0NNNNNNCGAAGCCCGGTTCGCCGGGCTGTCGCTGGTGCAGCTCAACTAGCTGCTGNAGNNNNNNNNNNNNNNNNNNNNNNNNNTNCAGANNNNNNNNNNNNNNNGCCGATCTGGGCTTAAGGCGGGCTCGGNTCANCTCNNNNNNNNNNNNNNNNNNNNCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAGCTNCACCAGCGACAGCCCGGCGAACCGGGCTTCGCTCCCGGCGCCCGCCATCCC      ######,,A7(FFFA777AA,,,KKF<KKFKFFKKAAKKKKKFA,A,KKF7KKKK#FF#########################,#FFKK###############KKF,7,KKKKKKKKKKFFFAA(AKKF<#7A<#,7A####################F##################################KKKK#KKFKKKAKKKKFAAF7KKKFKAKKKKKKKAKFAAFFAKKFKKKKKKFFFAA      NM:i:1  MD:Z:4G51
    AS:i:54 XS:i:0  SA:Z:chr12,122895994,+,192S52M6S,60,2;
SRR7890883.2    2048    chr12   122895994       60      192H52M6H       *      00       CTNCAGCAGCTAGTTGAGCTGCACCAGCGACAGCCCGGCGAACCGGGCTTCG    FF#KKKK7FKK,A,AFKKKKKAAKKFFKFKK<FKK,,,AA777AFFF(7A,,    NM:i:2  MD:Z:2C8C40     AS:i:45 XS:i:0 SA:Z:chr12,122896010,-,194S56M,60,1;
SRR7890883.3    16      chr16   74377111        0       194S56M *       0      0NNNNNNCTTCTCCATCACATCCATGTAGCCATCCGTCCATCAACCTGTATCTCANNAANNNNNNNNNNNNNNNNNNNNNNNNNNNTGGANNNNNNNNNNNNNNNAATTAACCTGACTTATTTTGTTGTATGNACANTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTACTNTCGGGACCTTCCGCATCCTGCTGATGTTTTGTTGCTTAGTATATGAATGGC      ######<,,A,A<,,KKFKFFFF7A,A,,KKFAAFFKKKKFAFAAKKFA77A7F##KF###########################FA,F###############KKFF7KF<<AKKKKKKAFFAA,AKKKF#KA<#,KK#######################################################K<<,#FAAFA,F,AKKKFKF7KA7KKAKKKFFKKKFKKFAAFKF,F<FKFFAFA<<      NM:i:2  MD:Z:4A39G11    AS:i:49 XS:i:49 SA:Z:chr16_KI270853v1_alt,541677,+,196S46M8S,0,0;
SRR7890883.3    2048    chr16_KI270853v1_alt    541677  0       196H46M8H      *0       0       TGAGATACAGGTTGATGGACGGATGGCTACATGGATGTGATGGAGA  F7A77AFKKAAFAFKKKKFFAAFKK,,A,A7FFFFKFKK,,<A,A,  NM:i:0  MD:Z:46 AS:i:46 XS:i:46 SA:Z:chr16,74377111,-,194S56M,0,2;
SRR7890883.4    16      chr12   70521176        60      194S56M *       0      0NNNNNNGAGGTGGTGGCAGAGACAATCCTTCAGGCCATGTTTTCTACCTGTTTGNNTANNNNNNNNNNNNNNNNNNNNNNNNNNNTAGANNNNNNNNNNNNNNNGGCTAGATTTGCTGCATTGCTCAAAACNGGTNGAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTCCTNTCATACAGGTTGAATTAGTTTTATGTAGAACAAGTCATGAACACTTTAGTG      ######AFKKKAKAAKFAKF7KKKAKFAA,AA7,KF<KF<KFKFKKKKA<,A7,##7,###########################7<,,###############KAFKFKKKFKKFAKFFFFFAA,<7,,,#F<F#,,,#######################################################,KKK#KAA<F,7,FKAFKFKKKFFAKKKKKFAAF,FKKKAKKKKKKKKKFKFFFAA      NM:i:1  MD:Z:4G51
    AS:i:54 XS:i:0  SA:Z:chr12,70521109,+,197S47M6S,60,0;
SRR7890883.4    2048    chr12   70521109        60      197H47M6H       *      00       AAACAGGTAGAAAACATGGCCTGAAGGATTGTCTCTGCCACCACCTC 7A,<AKKKKFKFK<FK<FK,7AA,AAFKAKKK7FKAFKAAKAKKKFA NM:i:0  MD:Z:47 AS:i:47 XS:i:0  SA:Z:chr12,70521176,-,194S56M,60,1;
SRR7890883.5    0       chr12   2963243 60      56M194S *       0       0      GGTGACTGTCACAAAGCAATGCCTGTTGCCAAACCTGTGGCTCCTCCTTCTNCTTCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTNGNGTCNGATCGGGACCTCAACTGCAGGAAAGGNNNNNNNNNNNNNNNNGGACNNNNNNNNNNNNNNNNNNNNNNNNNNNAGNNGGAGGAGCCACAGGTTTGGCAACAGGCATTGCTTTGTGACAGTCACCANNNNNN       ,A<A<F<FKKKKKKKKKKKKKKKKKFKKKKKKKKKF7KFAKKKKKKKKFKK#K<FK#######################################################7#,#FKK#K,F<AAAA<FAA7FAKKKKKKKF<AK################KFFK###########################KK##KKFKKFFFKKKKKKKKKKKKKKKKKKAKKKKKKKKKFKF7<AAAAKAF######      NM:i:1  MD:Z:51C4
    AS:i:54 XS:i:0  SA:Z:chr12,2963243,-,7S47M196S,60,0;
SRR7890883.5    2064    chr12   2963243 60      7H47M196H       *       0      0GGTGACTGTCACAAAGCAATGCCTGTTGCCAAACCTGTGGCTCCTCC AKAAAA<7FKFKKKKKKKKKAKKKKKKKKKKKKKKKKKKFFFKKFKK NM:i:0  MD:Z:47 AS:i:47 XS:i:0  SA:Z:chr12,2963243,+,56M194S,60,1;```

Assertion `reference_seq_len > 0' failed

Hi,

I am using the precompiled version 2.0pre2 and built the index using bwa-mem2 index for Grch38. I am now trying to align but get the error:

-----------------------------
Executing in AVX512 mode!!
-----------------------------
Ref file: /reference/GRCh38_full_analysis_set_plus_decoy_hla.fa
Entering FMI_search
bwa-mem2: src/FMI_search.cpp:56: FMI_search::FMI_search(char *): Assertion `reference_seq_len > 0' failed.

The command looks like:

docker run \
    -v /media/maxh/fast_storage/1x_reads:/data \
    -v /media/maxh/slow_storage/human_data/reference_genome/mem2_index:/reference \
    cannabinoid/bwa-mem2:0.1 bwa-mem2 mem \
        /reference/GRCh38_full_analysis_set_plus_decoy_hla.fa \
        /data/ERR3239279_R1_1x.fastq.gz \
        /data/ERR3239279_R2_1x.fastq.gz 

It is detecting the indices because if I run with a malformed reference path it gives a different error.

I am going to try and rebuild the indices, will update when this is complete.

Wheat 17G genome build index Segmentation fault

Hi
I want to use bwa-mem2 in wheat genome (17G) .
bwa-mem2 was compiled from source, but the program had an segmentation fault when building index.

bwa-mem2 index -p CS_parts_bwamem2 161010_Chinese_Spring_v1.0_pseudomolecules_parts.fasta

[bwa_index] Pack FASTA... 85.80 sec
init ticks = 763897684114
ref seq len = 29094523130
binary seq ticks = 497034160160
build index ticks = 21694227971367
ref_seq_len = 29094523130
count = 0, 7837288578, 14547261565, 21257234552, 29094523130
BWT[12766107114] = 4
CP_SHIFT = 5, CP_MASK = 31
sizeof CP_OCC = 64
Segmentation fault

When segementation fault, I got 5 output files:

  28G CS_parts_bwamem2.0123
  12M CS_parts_bwamem2.amb
 2.1K CS_parts_bwamem2.ann
  55G CS_parts_bwamem2.bwt.8bit.32
 3.4G CS_parts_bwamem2.pac

My system info : Linux debian 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux ; 512G memory

By the way, bwa-mem works well. The genome file : ftp://ftp.ensemblgenomes.org/pub/plants/release-43/fasta/triticum_aestivum/dna/

error "SSE4.1 instruction set not enabled"

Hi,
I get an error when make the code,

g++ -c -g -O3 -fpermissive -DENABLE_PREFETCH -DPAIRED_END=1 -DMAINY=0 -DSAIS=1 -DDEB=0 -DRDT=0 -DMAXI=0 -DNEW=1 -DSORT_PAIRS=0 src/fastmap.cpp -o src/fastmap.o
In file included from src/bandedSWA.h:42:0,
from src/bwamem.h:39,
from src/fastmap.h:45,
from src/fastmap.cpp:39:
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/include/smmintrin.h:31:3: error: #error "SSE4.1 instruction set not enabled"

error "SSE4.1 instruction set not enabled"

^
make: *** [src/fastmap.o] Error 1

What is the problem?
Thank you!

avx512bw fails where sse41/avx2 succeed

We have attempted to use bwa-mem2 to process a large dataset.

  • ~18 Terabases of processing was successful
    • 549 "lanes"
    • ~34 Gbases/lane
  • 174 "lanes" failed with various memory errors or non-zero exit codes
  • Mapping to this version of GRCh38
  • Execution was using the pre-compiled binaries from 2.0pre2 (on Ubuntu Bionic)
  • Mapping is successful under legacy bwa mem
  • All test runs were provided 120GB of memory on an otherwise empty host (didn't exceed 60GB)
  • ./bin/bwa-mem2 mem -p ref/genome.fa data/mini.xi.gz > /dev/null

Taking 1 of the cases the error messages change depending on number of CPUs:

  • 1 thread : double free or corruption (!prev)
  • 8 threads: free(): invalid next size (normal)
  • 1 thread + valgrind = success - forced to AVX2

As it was working with AVX2, I ran tests on each of the binaries in isolation:

  • sse41 - works
  • avx2 - works
  • avx512bw - fails

The host used supports avx512bw.

casm3-os0000001: grep avx512bw /proc/cpuinfo | uniq -c
     60 flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat pku ospke

With help from @mp15 we have a dataset that reliably reproduces this error with just 212 read pairs (151 length).

@mp15 has traced the problem with "Address Sanitiser" :

Just a brief update as to where I have got to. It turns out that whilst Valgrind doesn't work on AVX-512 instructions, Address Sanitiser does and you can breakpoint the process at __asan_on_error@plt to see what is about to start writing over your stack/heap. I have managed to trace the issue we are seeing to the following line:

src/kswv.cpp:326
mySeq1SoA[k * SIMD_WIDTH8 + j] = (seq1[k] == AMBIG_ ? AMBR:seq1[k]);

Which as far as I can tell is part of a AVX-512 banded smith-waterman implementation and I believe part of the mate pair rescue feature?

The array it is indexing into is a subunit of:

#define MAX_SEQ_LEN_REF_SAM 2048
#define SIMD_WIDTH8 64

uint8_t *seq1SoA = NULL;
seq1SoA = (uint8_t *)_mm_malloc(MAX_SEQ_LEN_REF_SAM * SIMD_WIDTH8 * numThreads * sizeof(uint8_t), 64);

via:

//#pragma omp parallel num_threads(numThreads)
  {
    int32_t i;
    // uint16_t tid = omp_get_thread_num();
uint16_t tid = 0;
    uint8_t *mySeq1SoA = seq1SoA + tid * MAX_SEQ_LEN_REF_SAM * SIMD_WIDTH8;

Apparently the thread number here will always be 0 and I assume numThreads will always be 1? In anycase that makes the maximum array size: 2048641=131072. The line in question is part of a loop:

for(k = 0; k < sp.len1; k++)
        {
          // mySeq1SoA[k * SIMD_WIDTH8 + j] = (seq1[k] == AMBIG_?0xFF:seq1[k]);
mySeq1SoA[k * SIMD_WIDTH8 + j] = (seq1[k] == AMBIG_ ? AMBR:seq1[k]);
        }

And when it errors, k is 2048 and j is 0, with sp.len1 equal to 20577 and past the end of the array, therefore it is likely to be sp.len1 at fault. I am now trying to figure out why this is happening but we may need to pass this to Intel and ask them to figure it out.

Unfortunately, I am unable to attach the data directly to the issue due to DAA requirements.
To apply for access please see here, I believe it requires you to be a member of an academic
institution though:

https://www.sanger.ac.uk/legal/DAA/MasterController

Please let me know if there is any further information I can provide.

P.S.

The following have been observed on other datasets, I am still confirming if these are specific to avx512bw, so far all 174 are shown to work on sse41 and 81 on avx2 (incomplete on remainder):

bwa-mem2: malloc.c:2927: __libc_malloc: Assertion `!victim || chunk_is_mmapped (mem2chunk (victim)) || ar_ptr == arena_for_chunk (mem2chunk (victim))' failed.
bwa-mem2: malloc.c:3567: _int_malloc: Assertion `(fwd->size & NON_MAIN_ARENA) == 0' failed.
bwa-mem2: malloc.c:3722: _int_malloc: Assertion `(unsigned long) (size) >= (unsigned long) (nb)' failed.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.