abra2's People

Contributors

alanhoyle, dmcmanam, lmose, mozack

abra2's Issues

Unknown format conversion exception

khanabadosh@KhanaBadosh:/mnt/c/Users/Abhishek Kumar/Desktop/NGS_Data_Analysis/Test1$ java -Xmx2g -jar abra2-2.23.jar --in Sorted_out.bam --out
Sorted_out.abra.bam --ref S288C_reference_sequence_R64-2-1_20150113.fsa threads 4 --tmpdir tmpdir > abra.log
INFO Sat Aug 26 18:25:56 IST 2023 Abra version: 2.23
Exception in thread "main" java.util.UnknownFormatConversionException: Conversion = 'K'
at java.util.Formatter$FormatSpecifier.conversion(Formatter.java:2691)
at java.util.Formatter$FormatSpecifier.<init>(Formatter.java:2720)
at java.util.Formatter.parse(Formatter.java:2560)
at java.util.Formatter.format(Formatter.java:2501)
at java.util.Formatter.format(Formatter.java:2455)
at java.lang.String.format(String.java:2940)
at abra.Logger.info(Logger.java:45)
at abra.ReAligner.run(ReAligner.java:1739)
at abra.Abra.main(Abra.java:12)
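
One plausible reading of this trace, offered as an assumption rather than a confirmed diagnosis: the working directory contains a space (`Abhishek Kumar`), and if a URL-encoded form of that path (`Abhishek%20Kumar`) ends up inside a string that is used as the format string of `String.format`, then `%20K` is parsed as width 20 followed by the unknown conversion `K`. The sketch below reproduces that behaviour with the plain JDK. `FormatPitfall`, `logUnsafe` and `logSafe` are hypothetical names and not ABRA2 code.

```java
import java.util.UnknownFormatConversionException;

final class FormatPitfall {
    // Uses the message itself as the format string, so any '%' in it is interpreted.
    static String logUnsafe(String message) {
        return String.format(message);
    }

    // Passes the message as an argument; its contents are never parsed as a format.
    static String logSafe(String message) {
        return String.format("%s", message);
    }

    public static void main(String[] args) {
        // Hypothetical log message containing a URL-encoded space.
        String message = "Abra params: /mnt/c/Users/Abhishek%20Kumar/abra2-2.23.jar";
        try {
            logUnsafe(message);
        } catch (UnknownFormatConversionException e) {
            System.out.println("unsafe logging failed, conversion = " + e.getConversion());
        }
        System.out.println(logSafe(message));
    }
}
```

If this is indeed the cause, running from a directory whose path has no spaces or percent signs would be a simple thing to try.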

crash with java IndexOutOfBoundsException

Hi,

I'm running ABRA2 on WGS data (samples at 30x to 80x coverage) and I get this error multiple times, each time in a different region.

INFO Wed Mar 07 01:29:48 CET 2018 PROCESS_REGION_MSECS: 1_23599001_23599401 1 0 0 0
ERROR Wed Mar 07 01:29:48 CET 2018 Error parsing assembled contigs. Line: [>1_121484601_121485001_21]
[...]
java.lang.ArrayIndexOutOfBoundsException: 4
at abra.ScoredContig.convertAndFilter(ScoredContig.java:53)
at abra.ReAligner.assemble(ReAligner.java:1096)
at abra.ReAligner.processRegion(ReAligner.java:1262)
at abra.ReAligner.processChromosomeChunk(ReAligner.java:342)
at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
at abra.AbraRunnable.run(AbraRunnable.java:20)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

I saw a previous issue describing this error, but that was with version 2.9, and gkl is now disabled by default.

I'm using ABRA2 version 2.14, compiled on CentOS 6.2 with jdk1.8.0_162.
ABRA2 was launched with default options (except for 8 threads), and both normal and tumor BAMs were provided.
Alignments were made with BWA aln on GRCh37.
4 of the 8 samples have already crashed with this error, and the remaining 4 have been running for 7 days (8 CPUs and 30 GB of RAM allocated).
Can you also tell me whether this running time seems normal for WGS, and whether I can speed it up (by removing centromere regions, for example)?

Thanks in advance,

Anne-Sophie

MD tags for realigned reads

I am using abra2.2.19. Thanks a lot for this great tool.

Is it possible to update the MD tags of realigned reads?
They currently carry the tags from the original alignments.
Also, some realigned reads have no MD tag at all.

Thanks,

Error when running abra2: Inappropriate call if not paired read

Hi Mozack,

I used novoalign3 to map paired-end DNA reads and samtools to sort them into a BAM file. I then used abra2 to realign the reads, but got the following error after the reads were processed:

java.lang.IllegalStateException: Inappropriate call if not paired read
at htsjdk.samtools.SAMRecord.requireReadPaired(SAMRecord.java:871)
at htsjdk.samtools.SAMRecord.getFirstOfPairFlag(SAMRecord.java:929)
at abra.SortedSAMWriter.getOriginalReadInfo(SortedSAMWriter.java:300)
at abra.SortedSAMWriter.processChromosome(SortedSAMWriter.java:208)
at abra.SortedSAMWriter.outputFinal(SortedSAMWriter.java:150)
at abra.SortedSAMWriterRunnable.go(SortedSAMWriterRunnable.java:18)
at abra.AbraRunnable.run(AbraRunnable.java:20)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Do you have any idea how this error should be resolved?

Thanks,
Tenghui
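
For context, the exception is raised because the "first of pair" question is only defined for records whose paired flag is set. The sketch below illustrates that rule using only the SAM flag bits (0x1 for paired, 0x40 for first in pair) rather than htsjdk itself. `SamFlags` and its methods are hypothetical names, and this is not the ABRA2 fix, only an illustration of the kind of guard that avoids the error on unpaired records.

```java
final class SamFlags {
    static final int PAIRED = 0x1;         // template has multiple segments
    static final int FIRST_OF_PAIR = 0x40; // first segment in the template

    static boolean isPaired(int flag) {
        return (flag & PAIRED) != 0;
    }

    // Strict accessor: refuses to answer for unpaired records, like the trace above.
    static boolean firstOfPairStrict(int flag) {
        if (!isPaired(flag)) {
            throw new IllegalStateException("Inappropriate call if not paired read");
        }
        return (flag & FIRST_OF_PAIR) != 0;
    }

    // Guarded accessor: checks the paired bit first and returns false for unpaired records.
    static boolean firstOfPairOrFalse(int flag) {
        return isPaired(flag) && (flag & FIRST_OF_PAIR) != 0;
    }

    public static void main(String[] args) {
        int[] flags = {99, 147, 0, 16}; // two paired records, two unpaired records
        for (int flag : flags) {
            System.out.println(flag + " paired=" + isPaired(flag)
                    + " firstOfPair=" + firstOfPairOrFalse(flag));
        }
    }
}
```

Checking the flag column of the input BAM for records without the 0x1 bit would show whether unpaired reads are present in an otherwise paired-end file.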

Score threshold and content of the printed INFO

Hello,

The pipeline I am working on processes, in parallel, consensus reads (based on unique molecular indexes) and classical non-consensus reads.
So for each sample I have both the bwa-aligned reads and the bwa-aligned consensus reads (duplicate reads coming from the same initial fragment are merged into a single consensus read, and the base quality is adjusted).

Using ABRA2, I was able to correctly adjust the CIGAR for a 71 bp deletion event. However, ABRA2 does not give the same results depending on the input reads (classical or merged reads). More precisely, for a given read with exactly the same alignment/CIGAR and sequence before ABRA2 realignment, only the deletion present in the consensus read (with higher baseQ) is corrected.

Example for 2 identical reads, one coming from the basic process and the other from the consensus process:

1 - For basic reads (no difference before and after ABRA2)
before ABRA2 --------------------------------------
A00514:721:HHC55DRXY:1:2126:13856:21167 147 chr7 116411750 60 102M26S = 116411564 -288 AACACAGTCATTACAGTTTAAGATTGTCGTCGATTCTTGTGTGCTGTCTTATATGTAGTCCATAAAACCCATGAGTTCTGGGCACTGGGTCAAAGTCTCCTGCGCTACGATGCAAGAGTACACACTCC FFFFFFFF:FFFF:FF:FFFFF:FFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:102 AS:i:102 XS:i:20 RG:Z:D721R130

after ABRA2--------------------------------------
A00514:721:HHC55DRXY:1:2126:13856:21167 147 chr7 116411750 60 102M26S = 116411564 -288 AACACAGTCATTACAGTTTAAGATTGTCGTCGATTCTTGTGTGCTGTCTTATATGTAGTCCATAAAACCCATGAGTTCTGGGCACTGGGTCAAAGTCTCCTGCGCTACGATGCAAGAGTACACACTCC FFFFFFFF:FFFF:FF:FFFFF:FFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:102 AS:i:102 XS:i:20 RG:Z:D721R130

2 - For consensus reads (71 bp DEL correctly aligned after ABRA2)
before ABRA2--------------------------------------
Illumina:608133 147 chr7 116411750 60 102M26S = 116411564 -288 AACACAGTCATTACAGTTTAAGATTGTCGTCGATTCTTGTGTGCTGTCTTATATGTAGTCCATAAAACCCATGAGTTCTGGGCACTGGGTCAAAGTCTCCTGCGCTACGATGCAAGAGTACACACTCC NNNNNNNNNNNNNNNNMNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NM:i:0 MD:Z:102 AS:i:102 XS:i:20 RG:Z:D721R130

after ABRA2--------------------------------------
Illumina:608133 147 chr7 116411750 60 102M71D26M = 116411564 -385 AACACAGTCATTACAGTTTAAGATTGTCGTCGATTCTTGTGTGCTGTCTTATATGTAGTCCATAAAACCCATGAGTTCTGGGCACTGGGTCAAAGTCTCCTGCGCTACGATGCAAGAGTACACACTCC NNNNNNNNNNNNNNNNMNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN YA:Z:chr7:116411008:844M71D663M MD:Z:102 RG:Z:D721R130 NM:i:71 YM:i:0 YO:Z:chr7:116411750:-:102M26S AS:i:102 XS:i:20 YX:i:20

The only difference I found was the base quality (baseQ is increased when the duplicate reads agree well), but in the command I launched I specified --mapq 0 --mbq 0 to make sure reads are not filtered out because of poor quality.
Here is the complete command line I used:

"""
java -Xmx35g -Djava.io.tmpdir=$TMPDIR -jar ${ABRA2JAR} \
  --ref hg19_base.fa \
  --in ${INPUT_BAM} \
  --targets $BED \
  --out ${OUTPUT_BAM} \
  --mapq 0 --mbq 0 \
  2> Abra2.log
"""

Finally, I wanted to compare the INFO lines for this region between the 2 realignments, but I could not work out how to interpret the differences I saw:

for basic reads------------------------
INFO Fri Nov 05 20:32:32 UTC 2021 PROCESS_REGION_MSECS: chr7_116411546_116412048 63 0 13 0

for consensus reads------------------------
INFO Fri Nov 05 22:17:42 UTC 2021 PROCESS_REGION_MSECS: chr7_116411546_116412048 148 0 24 0

What is the meaning of the last 4 numbers? I could not find this information...

If you could explain what might cause such differences and how to interpret the INFO lines, that would be great!
I can provide more information if needed!

Camille
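
As a side note on the records above, the change in the template length column for the consensus read (-288 before, -385 after) is consistent with the new CIGAR: `102M26S` covers 102 reference bases, while `102M71D26M` covers 102 + 71 + 26 = 199. A small sketch of that arithmetic, assuming standard SAM CIGAR semantics in which M, D, N, = and X consume reference bases. `CigarSpan` is a hypothetical helper (not part of ABRA2 or htsjdk), and `templateLength` makes the simplifying assumption that this read is the rightmost segment and its mate starts the template.

```java
final class CigarSpan {
    // Number of reference bases covered by a CIGAR string.
    static int referenceLength(String cigar) {
        int total = 0;
        int count = 0;
        for (char c : cigar.toCharArray()) {
            if (Character.isDigit(c)) {
                count = count * 10 + (c - '0');
            } else {
                if ("MDN=X".indexOf(c) >= 0) {
                    total += count; // only these operations consume the reference
                }
                count = 0;
            }
        }
        return total;
    }

    // Signed template length for a rightmost read whose mate starts the template.
    static int templateLength(int readStart, String cigar, int mateStart) {
        int readEnd = readStart + referenceLength(cigar) - 1;
        return -(readEnd - mateStart + 1);
    }

    public static void main(String[] args) {
        System.out.println(templateLength(116411750, "102M26S", 116411564));
        System.out.println(templateLength(116411750, "102M71D26M", 116411564));
    }
}
```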

Fusions

Hello,
I am interested in using ABRA2 as part of a cancer DNA/RNA-seq pipeline. However, I'm curious how ABRA2 behaves where structural variants and fusion transcripts are concerned. Do you have any experience with that?
Thanks.

what's wrong with that ref

INFO    Tue Feb 26 11:47:24 CST 2019    Processing chromosome chunk: chrM_1_16571
INFO    Tue Feb 26 11:47:24 CST 2019    Processing chromosome chunk: chr1_1_25000000
INFO    Tue Feb 26 11:47:24 CST 2019    Processing chromosome chunk: chr1_25000001_50000000
INFO    Tue Feb 26 11:47:24 CST 2019    Processing chromosome chunk: chr1_50000001_75000000
java.lang.IllegalArgumentException: Invalid reference index -1

IndexOutOfBoundsException at AltContigGenerator.java#L273

In running abra2, I'm hitting this runtime exception:

INFO	Thu Nov 02 15:50:04 UTC 2017	Abra version: 2.11
INFO	Thu Nov 02 15:50:04 UTC 2017	Abra params: [/hemp/abra2-2.11.jar --tmpdir /hemp/javatmpdir --ref /hg38/Homo_sapiens_assembly38.fasta --dist 1000 --in /hemp/NA12878_chr22.bam --threads 4 --gkl --targets chr22.bed --log error --out output-chr22.bam]
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
	at java.util.ArrayList.rangeCheck(ArrayList.java:657)
	at java.util.ArrayList.get(ArrayList.java:433)
	at abra.AltContigGenerator.getAltContigs(AltContigGenerator.java:273)
	at abra.ReAligner.processRegion(ReAligner.java:1233)
	at abra.ReAligner.processChromosomeChunk(ReAligner.java:336)
	at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
	at abra.AbraRunnable.run(AbraRunnable.java:20)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Indel prev = indel.components.get(0);

In reviewing the code it looks like there is a path where indelComponents could be an empty ArrayList based upon the for-loop over CigarElements:

Indel indel = new Indel('C', read.getReferenceName(), indelComponents, firstIdx, SAMRecordUtils.sumBaseQuals(read));

I apologize up front, I haven't narrowed down a smaller recreate yet. This occurs consistently about 40 minutes into a run.
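
To make the suspected path concrete: if no CIGAR element contributes a component, the list stays empty and `get(0)` fails as in the trace above. The following is a minimal sketch of a guard under that assumption. `IndelGuard`, `Component`, `indelComponents` and `firstComponent` are hypothetical stand-ins, since the real `Indel` and `AltContigGenerator` code is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class IndelGuard {
    // Hypothetical stand-in for one insertion or deletion element of a read.
    static final class Component {
        final char op;
        final int length;

        Component(char op, int length) {
            this.op = op;
            this.length = length;
        }
    }

    // Collects only the I and D elements of a CIGAR string; the result may be empty.
    static List<Component> indelComponents(String cigar) {
        List<Component> out = new ArrayList<>();
        int count = 0;
        for (char c : cigar.toCharArray()) {
            if (Character.isDigit(c)) {
                count = count * 10 + (c - '0');
            } else {
                if (c == 'I' || c == 'D') {
                    out.add(new Component(c, count));
                }
                count = 0;
            }
        }
        return out;
    }

    // Guarded replacement for components.get(0): empty input yields Optional.empty().
    static Optional<Component> firstComponent(List<Component> components) {
        return components.isEmpty() ? Optional.empty() : Optional.of(components.get(0));
    }

    public static void main(String[] args) {
        System.out.println(firstComponent(indelComponents("100M")).isPresent());
        System.out.println(firstComponent(indelComponents("102M71D26M")).isPresent());
    }
}
```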

crash when writing files

Hi,

I used abra2 to realign trio WGS germline DNA data. The server has 118G of memory, and I set -Xmx50G. After realignment finished, the program crashed while writing the files. Here is the log:

INFO Fri Jun 02 21:45:29 CST 2017 Waiting on 3 queued threads.
java.lang.OutOfMemoryError: Java heap space
at java.lang.String.<init>(Unknown Source)
at htsjdk.samtools.util.StringUtil.bytesToString(StringUtil.java:301)
at htsjdk.samtools.util.StringUtil.bytesToString(StringUtil.java:288)
at htsjdk.samtools.BinaryTagCodec.readNullTerminatedString(BinaryTagCodec.java:423)
at htsjdk.samtools.BinaryTagCodec.readSingleValue(BinaryTagCodec.java:318)
at htsjdk.samtools.BinaryTagCodec.readTags(BinaryTagCodec.java:282)
at htsjdk.samtools.BAMRecord.decodeAttributes(BAMRecord.java:313)
at htsjdk.samtools.BAMRecord.getAttribute(BAMRecord.java:293)
at htsjdk.samtools.SAMRecord.getAttribute(SAMRecord.java:1110)
at htsjdk.samtools.SAMRecord.getStringAttribute(SAMRecord.java:1220)
at abra.SortedSAMWriter.getOriginalReadInfo(SortedSAMWriter.java:255)
at abra.SortedSAMWriter.processChromosome(SortedSAMWriter.java:186)
at abra.SortedSAMWriter.outputFinal(SortedSAMWriter.java:132)
at abra.SortedSAMWriterRunnable.go(SortedSAMWriterRunnable.java:18)
at abra.AbraRunnable.run(AbraRunnable.java:20)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
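
When looking into a heap error like this one, it can help to confirm which heap limit the JVM actually received, since `-Xmx` only takes effect when it appears before `-jar` on the command line. The following is a minimal check using only the JDK. `HeapCheck` is a hypothetical class name and is unrelated to ABRA2.

```java
final class HeapCheck {
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Upper bound the JVM will try to use for the heap, in MiB.
    static long maxHeapMiB() {
        return toMiB(Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (MiB):  " + maxHeapMiB());
        System.out.println("used heap (MiB): " + toMiB(rt.totalMemory() - rt.freeMemory()));
    }
}
```

Running it as `java -Xmx50G HeapCheck` on the same machine shows the limit that a 50G setting yields there.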

compilation terminated during make

Hi,

I keep running into this issue when running make:

rm -rf target
mvn clean
[INFO] Scanning for projects...
[INFO]
[INFO] ----------------------------< abra2:abra2 >-----------------------------
[INFO] Building abra 2.19
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ abra2 ---
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.197 s
[INFO] Finished at: 2019-07-20T23:40:00-06:00
[INFO] ------------------------------------------------------------------------
mkdir target
g++ -g -O2 -Isrc/main/c -I/Users/tzhou_admin/anaconda3/include -I/Users/tzhou_admin/anaconda3/include/linux -shared -fPIC src/main/c/assembler.cpp src/main/c/sg_aligner.cpp -o target/libAbra.so
src/main/c/assembler.cpp:1:19: fatal error: stdio.h: No such file or directory
#include <stdio.h>
^
compilation terminated.
src/main/c/sg_aligner.cpp:1:19: fatal error: stdio.h: No such file or directory
#include <stdio.h>
^
compilation terminated.

Could you please help me figure out what is going wrong during the installation?

Thanks very much!

Expected memory requirements

We have been using 60GB of ram for ABRA2 on deeply-sequenced bams (20,000x total, 1,200x unique, ~10 GB in compressed size).

We would like to be able to realign as many bams together as possible, and have made attempts with 4-6 of these deeply-sequenced bams together, which has caused some memory issues. Is it reasonable to expect memory to scale linearly with the number of bams to be realigned? If there are any benchmarking results for such tests we would be interested to see them.

Thank you kindly

Log4j2 error message

I got the error message below when running abra2 with --log error. Is there any way to turn this off?

ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.

max_paths_from_root

Hello,
I have abra2 integrated in a snakemake workflow for variant calling and it usually does a great job. With my current dataset on Manihot esculenta, however, abra2 exits with errors. The attached screenshot shows tail -n 15 of 3 log files. The input is a set of BAM files in 3 different sample sets mapped against the same reference genome: "resistant", "susceptible", and "all_samples", where "all_samples" is a combination of the other two.

"susceptible": abra2 finished and exited cleanly

"resistant": abra2 exits with

TOO_MANY_PATHS_FROM_ROOT: NC_035164.1_3550000_3550400 - ATTCTAATGCCTATTCT
TOO_MANY_PATHS_FROM_ROOT: NC_035164.1_3550000_3550400 - CTTATCGGCCATCAAAAGCAT

all_samples: abra2 exits with
INFO Thu Aug 12 21:27:26 CEST 2021 PROCESS_REGION_MSECS: NC_035170.1_25785600_25786000 478 26 24 0
java: src/main/c/sparsehash/internal/densehashtable.h:930: std::pair<google::dense_hashtable_iterator<V, K, HF, ExK, SetK, EqK, A>, bool> google::dense_hashtable<Value, Key, HashFcn, ExtractKey, SetKey, EqualKey, Alloc>::insert_noresize(google::dense_hashtable<Value, Key, HashFcn, ExtractKey, SetKey, EqualKey, Alloc>::const_reference) [with Value = int; Key = int; HashFcn = std::tr1::hash; ExtractKey = google::dense_hash_set<int, std::tr1::hash, eqint>::Identity; SetKey = google::dense_hash_set<int, std::tr1::hash, eqint>::SetKey; EqualKey = eqint; Alloc = google::libc_allocator_with_realloc; google::dense_hashtable<Value, Key, HashFcn, ExtractKey, SetKey, EqualKey, Alloc>::const_reference = const int&]: Assertion `(!settings.use_empty() || !equals(get_key(obj), get_key(val_info.emptyval))) && "Inserting the empty key"' failed.


Any help would be greatly appreciated. In particular, where can I set max_path_from_root, and what is the reason for the error with the sample set "all_samples"? I realise that I can skip particular regions; however, I do not know what causes the error in "all_samples".

All this is running on an Ubuntu box with 40 cores and 188 GB RAM. I am happy to provide additional information if needed.

Thanks a lot for your time,
Norman

Screen Shot 2021-08-23 at 14 38 07

java.lang.OutOfMemoryError: Java heap space

Hello,

I run abra2 2.22 as part of the nucleo CWL pipeline.
abra2 runs for a few days and then crashes:

INFO	Fri Sep 09 09:01:34 UTC 2022	PROCESS_REGION_MSECS:	chr9_5391847_5392563	802613	0	1834	0
INFO	Fri Sep 09 09:01:40 UTC 2022	PROCESS_REGION_MSECS:	chr9_5394500_5394751	27	0	6	0
INFO	Fri Sep 09 09:01:44 UTC 2022	PROCESS_REGION_MSECS:	chr9_5395989_5396107	21	0	0	0
INFO	Fri Sep 09 09:01:53 UTC 2022	PROCESS_REGION_MSECS:	chr9_5402364_5402613	8967	0	1	0
INFO	Fri Sep 09 09:01:58 UTC 2022	PROCESS_REGION_MSECS:	chr9_5403925_5404129	5748	0	1	0
INFO	Fri Sep 09 09:03:56 UTC 2022	PROCESS_REGION_MSECS:	chr9_5404631_5405449	117556	5	109	0
INFO	Fri Sep 09 09:04:10 UTC 2022	PROCESS_REGION_MSECS:	chr9_5405647_5406088	14397	1	19	0
INFO	Fri Sep 09 09:04:47 UTC 2022	PROCESS_REGION_MSECS:	chr9_5406273_5406759	36378	0	44	0
INFO	Fri Sep 09 09:04:52 UTC 2022	PROCESS_REGION_MSECS:	chr9_5414156_5414311	5786	0	4	0
INFO	Fri Sep 09 09:04:59 UTC 2022	PROCESS_REGION_MSECS:	chr9_5418517_5418643	6388	0	2	0
INFO	Fri Sep 09 09:04:59 UTC 2022	PROCESS_REGION_MSECS:	chr9_5418888_5419047	27	0	3	0
INFO	Fri Sep 09 09:20:44 UTC 2022	chr10:43114788 : 	Curr reads size: 575009
INFO	Fri Sep 09 10:00:55 UTC 2022	chr1:156874308 : 	Curr reads size: 554415
INFO	Fri Sep 09 10:25:22 UTC 2022	chr4:1808187 : 	Curr reads size: 408300
INFO	Fri Sep 09 11:20:25 UTC 2022	chr6:117321312 : 	Curr reads size: 618944
INFO	Fri Sep 09 12:42:03 UTC 2022	PROCESS_REGION_MSECS:	chr9_5420197_5421057	1331503	1	2437	0
INFO	Fri Sep 09 12:42:08 UTC 2022	PROCESS_REGION_MSECS:	chr9_5421423_5421585	5741	0	1	0
INFO	Fri Sep 09 12:42:14 UTC 2022	PROCESS_REGION_MSECS:	chr9_5422622_5422717	5211	0	1	0
INFO	Fri Sep 09 12:52:30 UTC 2022	chr1:156874311 : 	Curr reads size: 556194
INFO	Fri Sep 09 15:12:08 UTC 2022	chr1:156874314 : 	Curr reads size: 558110
INFO	Fri Sep 09 17:33:53 UTC 2022	chr6:117321315 : 	Curr reads size: 619345
INFO	Fri Sep 09 18:16:03 UTC 2022	PROCESS_REGION_MSECS:	chr2_29222739_29223539	3495717	1	3989	0
INFO	Fri Sep 09 19:09:00 UTC 2022	chr1:156874318 : 	Curr reads size: 559856
INFO	Fri Sep 09 19:16:38 UTC 2022	chr14:104773477 : 	Curr reads size: 542026
java.lang.OutOfMemoryError: Java heap space
	at java.lang.String.<init>(String.java:325)
	at htsjdk.samtools.util.StringUtil.bytesToString(StringUtil.java:301)
	at htsjdk.samtools.util.StringUtil.bytesToString(StringUtil.java:288)
	at htsjdk.samtools.BinaryTagCodec.readNullTerminatedString(BinaryTagCodec.java:423)
	at htsjdk.samtools.BinaryTagCodec.readSingleValue(BinaryTagCodec.java:318)
	at htsjdk.samtools.BinaryTagCodec.readTags(BinaryTagCodec.java:282)
	at htsjdk.samtools.BAMRecord.decodeAttributes(BAMRecord.java:313)
	at htsjdk.samtools.BAMRecord.getAttribute(BAMRecord.java:293)
	at htsjdk.samtools.SAMRecord.getAttribute(SAMRecord.java:1110)
	at htsjdk.samtools.SAMRecord.getStringAttribute(SAMRecord.java:1220)
	at abra.SortedSAMWriter.addAlignment(SortedSAMWriter.java:104)
	at abra.ReAligner.remapReads(ReAligner.java:779)
	at abra.ReAligner.processChromosomeChunk(ReAligner.java:424)
	at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
	at abra.AbraRunnable.run(AbraRunnable.java:20)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
java.lang.OutOfMemoryError: Java heap space
	at abra.SimpleMapper.getPositionMismatches(SimpleMapper.java:80)
	at abra.SimpleMapper.map(SimpleMapper.java:157)
	at abra.ReadEvaluator.getImprovedAlignment(ReadEvaluator.java:70)
	at abra.ReadEvaluator.getImprovedAlignment(ReadEvaluator.java:34)
	at abra.ReAligner.remapRead(ReAligner.java:592)
	at abra.ReAligner.remapReads(ReAligner.java:771)
	at abra.ReAligner.processChromosomeChunk(ReAligner.java:424)
	at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
	at abra.AbraRunnable.run(AbraRunnable.java:20)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
INFO [job abra2_2_22] Max memory used: 24762MiB

When I ran it on a small input, it finished successfully. But for bigger fastq files (6 GB) it was very slow and did not finish.
The command is:
/usr/src/abra2-2.22.jar --threads 16 --tmpdir /tmp --cons --ca 10,1 --in gatk_uncollapsed_MD.bam --mad 1000 --mmr 0.1 --no-edge-ci --nosort --out UBG_abra2_uncollapsed_IR.bam --ref hg38.fasta --sga 8,32,48,1 --sc 100,30,80,15 --targets gatk_uncollapsed_MD.bed --ws 800,700

What could be the reason for such high memory use, and why does each interval take so long to process?

Thank you in advance for your help,
Adily

UnsatisfiedLinkError on Mac OS

Hi,

when trying to run the pre-compiled abra2-2.11.jar on a Mac I get this error:

INFO	Fri Oct 06 16:02:15 BST 2017	Abra version: 2.11
INFO	Fri Oct 06 16:02:15 BST 2017	Abra params: [/Users/pmca/Software/ngs/abra2/abra2-2.11.jar --in ind123.bwa.clean.bam --out ind123.bwa.clean.abra.bam --mapq 10 --ref /Users/pmca/Scripts/test_data/ngs-data/snp1-illumina/refseqs.3contigs.100kb.fas]
INFO	Fri Oct 06 16:02:15 BST 2017	ABRA version: 2.11
INFO	Fri Oct 06 16:02:15 BST 2017	input0: ind123.bwa.clean.bam
INFO	Fri Oct 06 16:02:15 BST 2017	output0: ind123.bwa.clean.abra.bam
INFO	Fri Oct 06 16:02:15 BST 2017	regions: null
INFO	Fri Oct 06 16:02:15 BST 2017	reference: /Users/pmca/Scripts/test_data/ngs-data/snp1-illumina/refseqs.3contigs.100kb.fas
INFO	Fri Oct 06 16:02:15 BST 2017	num threads: 4
INFO	Fri Oct 06 16:02:15 BST 2017	minEdgeFrequency: 0
minNodeFrequncy: 1
minContigLength: -1
minBaseQuality: 20
minReadCandidateFraction: 0.01
maxAverageRegionDepth: 1000
minEdgeRatio: 0.01

INFO	Fri Oct 06 16:02:15 BST 2017	paired end: true
INFO	Fri Oct 06 16:02:15 BST 2017	isSkipAssembly: false
INFO	Fri Oct 06 16:02:15 BST 2017	useSoftClippedReads: true
INFO	Fri Oct 06 16:02:15 BST 2017	SW scoring: [8, 32, 48, 1]
INFO	Fri Oct 06 16:02:15 BST 2017	Soft clip params: [16, 13, 80, 15]
INFO	Fri Oct 06 16:02:15 BST 2017	Java version: 1.8.0_60
INFO	Fri Oct 06 16:02:15 BST 2017	hostname: MacBook-Pro.local
INFO	Fri Oct 06 16:02:15 BST 2017	SG match,mismatch,gap_open_penalty,gap_extend_penalty: 8,-32,-48,-1
INFO	Fri Oct 06 16:02:15 BST 2017	Using temp directory: /var/folders/wb/_zmb2_3n1cj6clc4_x1yv5300000gn/T/abra2_2fdaca04-d870-4fa6-9539-dbd4f569928a6991233934775758563
INFO	Fri Oct 06 16:02:15 BST 2017	Loading native library from: /var/folders/wb/_zmb2_3n1cj6clc4_x1yv5300000gn/T/abra2_2fdaca04-d870-4fa6-9539-dbd4f569928a6991233934775758563/libAbra.so
ERROR	Fri Oct 06 16:02:15 BST 2017	Error loading: libAbra.so from : /var/folders/wb/_zmb2_3n1cj6clc4_x1yv5300000gn/T/abra2_2fdaca04-d870-4fa6-9539-dbd4f569928a6991233934775758563
java.lang.UnsatisfiedLinkError: /private/var/folders/wb/_zmb2_3n1cj6clc4_x1yv5300000gn/T/abra2_2fdaca04-d870-4fa6-9539-dbd4f569928a6991233934775758563/libAbra.so: dlopen(/private/var/folders/wb/_zmb2_3n1cj6clc4_x1yv5300000gn/T/abra2_2fdaca04-d870-4fa6-9539-dbd4f569928a6991233934775758563/libAbra.so, 1): no suitable image found.  Did find:
	/private/var/folders/wb/_zmb2_3n1cj6clc4_x1yv5300000gn/T/abra2_2fdaca04-d870-4fa6-9539-dbd4f569928a6991233934775758563/libAbra.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
	at java.lang.ClassLoader$NativeLibrary.load(Native Method)
	at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1938)
	at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1821)
	at java.lang.Runtime.load0(Runtime.java:809)
	at java.lang.System.load(System.java:1086)
	at abra.NativeLibraryLoader.load(NativeLibraryLoader.java:45)
	at abra.ReAligner.init(ReAligner.java:1533)
	at abra.ReAligner.reAlign(ReAligner.java:159)
	at abra.ReAligner.run(ReAligner.java:1711)
	at abra.Abra.main(Abra.java:12)
Exception in thread "main" java.lang.RuntimeException: java.lang.UnsatisfiedLinkError: /private/var/folders/wb/_zmb2_3n1cj6clc4_x1yv5300000gn/T/abra2_2fdaca04-d870-4fa6-9539-dbd4f569928a6991233934775758563/libAbra.so: dlopen(/private/var/folders/wb/_zmb2_3n1cj6clc4_x1yv5300000gn/T/abra2_2fdaca04-d870-4fa6-9539-dbd4f569928a6991233934775758563/libAbra.so, 1): no suitable image found.  Did find:
	/private/var/folders/wb/_zmb2_3n1cj6clc4_x1yv5300000gn/T/abra2_2fdaca04-d870-4fa6-9539-dbd4f569928a6991233934775758563/libAbra.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
	at abra.NativeLibraryLoader.load(NativeLibraryLoader.java:57)
	at abra.ReAligner.init(ReAligner.java:1533)
	at abra.ReAligner.reAlign(ReAligner.java:159)
	at abra.ReAligner.run(ReAligner.java:1711)
	at abra.Abra.main(Abra.java:12)
Caused by: java.lang.UnsatisfiedLinkError: /private/var/folders/wb/_zmb2_3n1cj6clc4_x1yv5300000gn/T/abra2_2fdaca04-d870-4fa6-9539-dbd4f569928a6991233934775758563/libAbra.so: dlopen(/private/var/folders/wb/_zmb2_3n1cj6clc4_x1yv5300000gn/T/abra2_2fdaca04-d870-4fa6-9539-dbd4f569928a6991233934775758563/libAbra.so, 1): no suitable image found.  Did find:
	/private/var/folders/wb/_zmb2_3n1cj6clc4_x1yv5300000gn/T/abra2_2fdaca04-d870-4fa6-9539-dbd4f569928a6991233934775758563/libAbra.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
	at java.lang.ClassLoader$NativeLibrary.load(Native Method)
	at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1938)
	at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1821)
	at java.lang.Runtime.load0(Runtime.java:809)
	at java.lang.System.load(System.java:1086)
	at abra.NativeLibraryLoader.load(NativeLibraryLoader.java:45)
	... 4 more

The command line to call ABRA2 was:

java -jar ~/Software/ngs/abra2/abra2-2.11.jar --in ind123.bwa.clean.bam --out ind123.bwa.clean.abra.bam --mapq 10 --ref /Users/pmca/Scripts/test_data/ngs-data/snp1-illumina/refseqs.3contigs.100kb.fas

If I do an ls in the directory where ABRA2 claims not to find the library, I see:

> ls -l /private/var/folders/wb/_zmb2_3n1cj6clc4_x1yv5300000gn/T/abra2_2fdaca04-d870-4fa6-9539-dbd4f569928a6991233934775758563/
total 1512
-rw-r--r--  1 pmca  staff  770248  6 Oct 16:02 libAbra.so

My system is:
Darwin MacBook-Pro.local 15.6.0 Darwin Kernel Version 15.6.0: Sun Jun 4 21:43:07 PDT 2017; root:xnu-3248.70.3~1/RELEASE_X86_64 x86_64

Thanks for the help.
Pedro
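
The "first eight bytes" in the message are informative: `0x7F 0x45 0x4C 0x46` is the ELF magic number, so the `libAbra.so` unpacked from the jar is a Linux shared object, whereas `dlopen` on macOS expects a Mach-O image. The sketch below classifies a library header by its magic number. `NativeImageKind` is a hypothetical helper and not part of ABRA2. Note that the fat-binary magic `0xCAFEBABE` is shared with Java class files, so this is a rough check only.

```java
final class NativeImageKind {
    // Classifies a native library by the first four bytes of its header.
    static String describe(byte[] header) {
        if (header.length < 4) {
            return "unknown";
        }
        int magic = ((header[0] & 0xFF) << 24) | ((header[1] & 0xFF) << 16)
                | ((header[2] & 0xFF) << 8) | (header[3] & 0xFF);
        switch (magic) {
            case 0x7F454C46: // 0x7F 'E' 'L' 'F'
                return "ELF";
            case 0xFEEDFACE: // Mach-O 32-bit
            case 0xFEEDFACF: // Mach-O 64-bit
            case 0xCEFAEDFE: // Mach-O 32-bit, byte-swapped
            case 0xCFFAEDFE: // Mach-O 64-bit, byte-swapped
            case 0xCAFEBABE: // Mach-O universal (fat) binary
                return "Mach-O";
            default:
                return "unknown";
        }
    }

    public static void main(String[] args) {
        // The eight bytes reported in the error message above.
        byte[] fromLog = {0x7F, 0x45, 0x4C, 0x46, 0x02, 0x01, 0x01, 0x00};
        System.out.println(describe(fromLog));
    }
}
```

Under this reading, the pre-compiled jar would need a native library built for macOS before it could run on that machine.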

abra2 error java.lang.NumberFormatException

When I try to run ABRA2, I get the following error. Would you please look into it? Thanks.

cmd=java -Xmx16G -jar target/abra2-2.19-jar-with-dependencies.jar --in /path_to/filesample_dedupSort.bam --out /path_to/filesample_abraRealign.bam --ref /path_to/hg19.fasta --threads 8 --targets /path_to/uploads_3091241_Covered-2.bed --tmpdir /path_to/tmpDir

INFO Fri Jan 18 10:34:36 CET 2019 Abra version: 2.19
INFO Fri Jan 18 10:34:36 CET 2019 Abra params: [/usr/local/bioinfo/Tools/abra2/target/abra2-2.19-jar-with-dependencies.jar --in /mnt/analyses/KDM-PROD/IOV_baseline/ResVarCall/IOV-KDM_N12enz/alignment/ALN_IOV_IOV_OKDM_IOV_N12enz_dedupSort.bam --out /mnt/analyses/KDM-PROD/IOV_baseline/ResVarCall/IOV-KDM_N12enz/alignment/ALN_IOV_IOV_OKDM_IOV_N12enz_20181012091021_abraRealign.bam --ref /mnt/analyses/Results/Reference/hg19.fasta --threads 8 --targets /mnt/analyses/PROJECT/kdm_IOV/uploads_3091241_Covered-ModifForAbraTest.bed --tmpdir /usr/local/bioinfo/Tools/abra2/tmpRim]
INFO Fri Jan 18 10:34:36 CET 2019 ABRA version: 2.19
INFO Fri Jan 18 10:34:36 CET 2019 input0: /mnt/analyses/KDM-PROD/IOV_baseline/ResVarCall/IOV-KDM_N12enz/alignment/ALN_IOV_IOV_OKDM_IOV_N12enz_dedupSort.bam
INFO Fri Jan 18 10:34:36 CET 2019 output0: /mnt/analyses/KDM-PROD/IOV_baseline/ResVarCall/IOV-KDM_N12enz/alignment/ALN_IOV_IOV_OKDM_IOV_N12enz_20181012091021_abraRealign.bam
INFO Fri Jan 18 10:34:36 CET 2019 regions: /mnt/analyses/PROJECT/kdm_IOV/uploads_3091241_Covered-ModifForAbraTest.bed
INFO Fri Jan 18 10:34:36 CET 2019 reference: /mnt/analyses/Results/Reference/hg19.fasta
INFO Fri Jan 18 10:34:36 CET 2019 num threads: 8
INFO Fri Jan 18 10:34:36 CET 2019 minEdgeFrequency: 0
minNodeFrequncy: 1
minContigLength: -1
minBaseQuality: 20
minReadCandidateFraction: 0.01
maxAverageRegionDepth: 1000
minEdgeRatio: 0.01

INFO Fri Jan 18 10:34:36 CET 2019 paired end: true
INFO Fri Jan 18 10:34:36 CET 2019 isSkipAssembly: false
INFO Fri Jan 18 10:34:36 CET 2019 useSoftClippedReads: true
INFO Fri Jan 18 10:34:36 CET 2019 SW scoring: [8, 32, 48, 1]
INFO Fri Jan 18 10:34:36 CET 2019 Soft clip params: [16, 13, 80, 15]
INFO Fri Jan 18 10:34:36 CET 2019 Java version: 1.8.0_191
INFO Fri Jan 18 10:34:36 CET 2019 hostname: bioit-dev
INFO Fri Jan 18 10:34:36 CET 2019 SG match,mismatch,gap_open_penalty,gap_extend_penalty: 8,-32,-48,-1
INFO Fri Jan 18 10:34:36 CET 2019 Using temp directory: /usr/local/bioinfo/Tools/abra2/tmpRim/abra2_358def45-c903-44a8-a674-5bc2c317367f1665716855282894176
INFO Fri Jan 18 10:34:36 CET 2019 Loading native library from: /usr/local/bioinfo/Tools/abra2/tmpRim/abra2_358def45-c903-44a8-a674-5bc2c317367f1665716855282894176/libAbra.so
INFO Fri Jan 18 10:34:36 CET 2019 Loading reference map: /mnt/analyses/Results/Reference/hg19.fasta
INFO Fri Jan 18 10:36:10 CET 2019 Done loading ref map. Elapsed secs: 93
INFO Fri Jan 18 10:36:10 CET 2019 Reading Input SAM Header and identifying read length
INFO Fri Jan 18 10:36:10 CET 2019 Identifying header and determining read length
INFO Fri Jan 18 10:36:13 CET 2019 Min insert length: 0
INFO Fri Jan 18 10:36:13 CET 2019 Max insert length: 230110226
INFO Fri Jan 18 10:36:13 CET 2019 Max read length is: 150
INFO Fri Jan 18 10:36:13 CET 2019 Min contig length: 151
INFO Fri Jan 18 10:36:13 CET 2019 Read length: 150
INFO Fri Jan 18 10:36:13 CET 2019 Loading target regions
INFO Fri Jan 18 10:36:13 CET 2019 Loading target regions from : /mnt/analyses/PROJECT/kdm_IOV/uploads_3091241_Covered-ModifForAbraTest.bed
INFO Fri Jan 18 10:36:13 CET 2019 Collapsed regions from 1160 to 1029
INFO Fri Jan 18 10:36:13 CET 2019 Num regions: 1300
INFO Fri Jan 18 10:36:13 CET 2019 Total junctions input: 0
INFO Fri Jan 18 10:36:13 CET 2019 Final Junctions: 0, Variant Junctions: 0
INFO Fri Jan 18 10:36:13 CET 2019 Intel deflater disabled
INFO Fri Jan 18 10:36:13 CET 2019 Processing chromosome chunk: chr1_1_25000000
INFO Fri Jan 18 10:36:13 CET 2019 Processing chromosome chunk: chr1_25000001_50000000
INFO Fri Jan 18 10:36:13 CET 2019 Processing chromosome chunk: chr1_50000001_75000000
INFO Fri Jan 18 10:36:13 CET 2019 Processing chromosome chunk: chr1_75000001_100000000
INFO Fri Jan 18 10:36:13 CET 2019 Processing chromosome chunk: chr1_100000001_125000000
INFO Fri Jan 18 10:36:13 CET 2019 Processing chromosome chunk: chr1_125000001_150000000
INFO Fri Jan 18 10:36:13 CET 2019 Processing chromosome chunk: chr1_175000001_200000000
INFO Fri Jan 18 10:36:13 CET 2019 Processing chromosome chunk: chr1_150000001_175000000
java.lang.NumberFormatException: For input string: "-3,265707"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at abra.ScoredContig.convertAndFilter(ScoredContig.java:53)
at abra.ReAligner.assemble(ReAligner.java:1114)
at abra.ReAligner.processRegion(ReAligner.java:1293)
at abra.ReAligner.processChromosomeChunk(ReAligner.java:361)
at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
at abra.AbraRunnable.run(AbraRunnable.java:20)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
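
The input string in the trace, "-3,265707", uses a comma as the decimal separator, and `Double.parseDouble` only accepts a period. The sketch below reproduces that behaviour in isolation. It is not ABRA2 code: the class and method names are hypothetical, and the trace does not show where the comma-formatted score is produced.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class LocaleScoreParsing {

    /** Returns true if Double.parseDouble accepts the string (period decimal separator only). */
    static boolean isStrictlyParsable(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Parses a number written in the given locale's format, e.g. comma decimals for Locale.FRANCE. */
    static double parseWithLocale(String s, Locale locale) {
        try {
            return NumberFormat.getInstance(locale).parse(s).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number in locale " + locale + ": " + s, e);
        }
    }

    /** Formats a score with a period decimal separator regardless of the default locale. */
    static String formatScore(double score) {
        return String.format(Locale.ROOT, "%.6f", score);
    }

    public static void main(String[] args) {
        String score = "-3,265707"; // string taken from the stack trace above
        System.out.println("strict parse ok: " + isStrictlyParsable(score));
        System.out.println("locale-aware parse: " + parseWithLocale(score, Locale.FRANCE));
        System.out.println("locale-independent format: " + formatScore(-3.265707));
    }
}
```

If the cause is the system locale, running the JVM with an English locale (for example `-Duser.language=en -Duser.country=US`) might serve as a workaround, but that is an assumption and has not been verified against ABRA2.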

ABRA versions starting from v2.07 crash with single end data

Hi,

I have been successfully using ABRA 2 on single end data, but versions 2.07 and above crash with this error:

java.lang.IllegalStateException: Inappropriate call if not paired read
	at htsjdk.samtools.SAMRecord.requireReadPaired(SAMRecord.java:871)
	at htsjdk.samtools.SAMRecord.getFirstOfPairFlag(SAMRecord.java:929)
	at abra.ReAligner.subsetReads(ReAligner.java:722)
	at abra.ReAligner.processRegion(ReAligner.java:1126)
	at abra.ReAligner.processChromosomeChunk(ReAligner.java:474)
	at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
	at abra.AbraRunnable.run(AbraRunnable.java:20)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)

Flags in my BAM file are all 0 or 16 as expected for single end data.

In case it helps, version 2.07 outputs this message just before crashing:

INFO	Wed Sep 27 15:51:03 CEST 2017	Waiting for writer threads to complete
INFO	Wed Sep 27 15:51:03 CEST 2017	Finishing: test2.abra.bam
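
For context on the exception above: in the SAM flag field, bit 0x1 marks a read as paired and bit 0x40 marks it as first in pair, so flags 0 and 16 describe unpaired reads. As the trace shows, htsjdk's `getFirstOfPairFlag` throws when the read is not paired. The sketch below illustrates the kind of guard that avoids this, using plain integers rather than htsjdk; the class and method names are hypothetical, and this is not the actual ABRA2 fix.

```java
public class SamFlagGuard {
    // Bit values from the SAM specification's FLAG field.
    static final int PAIRED = 0x1;
    static final int REVERSE_STRAND = 0x10;
    static final int FIRST_OF_PAIR = 0x40;

    /** True if the read is flagged as part of a pair. */
    static boolean isPaired(int flags) {
        return (flags & PAIRED) != 0;
    }

    /** True only for paired reads marked first-of-pair; unpaired reads yield false instead of an exception. */
    static boolean isFirstOfPairSafe(int flags) {
        return isPaired(flags) && (flags & FIRST_OF_PAIR) != 0;
    }

    public static void main(String[] args) {
        int[] examples = {0, 16, 99, 147}; // 0 and 16 are the single-end flags from this report
        for (int flags : examples) {
            System.out.println(flags + ": paired=" + isPaired(flags)
                    + ", firstOfPair=" + isFirstOfPairSafe(flags));
        }
    }
}
```

With htsjdk itself, the equivalent guard would be to check `getReadPairedFlag()` before calling `getFirstOfPairFlag()`.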

v2.24 gives "Error loading: libAbra.so", older versions do not

On an Ubuntu base image, if I run the precompiled abra2-2.24.jar with the command:

java -jar abra2-2.24.jar --in input.bam --out out.abra.bam --ref reference.fa --threads 2 --tmpdir /tmp

I get the following error:

ERROR   Tue Sep 14 23:20:22 CDT 2021    Error loading: libAbra.so from : /tmp/abra2_e0d4e0c6-313a-4445-bb03-44d341dd5c338049499640746259132
java.lang.RuntimeException: Unable to load library: libAbra.so from path [/libAbra.so] into tempdir: [/tmp/abra2_e0d4e0c6-313a-4445-bb03-44d341dd5c338049499640746259132]
        at abra.NativeLibraryLoader.load(NativeLibraryLoader.java:50)
        at abra.ReAligner.init(ReAligner.java:1627)
        at abra.ReAligner.reAlign(ReAligner.java:163)
        at abra.ReAligner.run(ReAligner.java:1810)
        at abra.Abra.main(Abra.java:12)
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to load library: libAbra.so from path [/libAbra.so] into tempdir: [/tmp/abra2_e0d4e0c6-313a-4445-bb03-44d341dd5c338049499640746259132]
        at abra.NativeLibraryLoader.load(NativeLibraryLoader.java:57)
        at abra.ReAligner.init(ReAligner.java:1627)
        at abra.ReAligner.reAlign(ReAligner.java:163)
        at abra.ReAligner.run(ReAligner.java:1810)
        at abra.Abra.main(Abra.java:12)
Caused by: java.lang.RuntimeException: Unable to load library: libAbra.so from path [/libAbra.so] into tempdir: [/tmp/abra2_e0d4e0c6-313a-4445-bb03-44d341dd5c338049499640746259132]
        at abra.NativeLibraryLoader.load(NativeLibraryLoader.java:50)
        ... 4 more

If I fall back to an older release abra2-2.19.jar, the identical command works just fine, which leads me to suspect something may be wrong with the latest release.

Identical runs give slightly different results

My team recently performed two separate, identical runs of abra2 (version 2.15) on copies of the same set of BAM files. We expected the resulting processed BAMs to be identical in size between the two sets. However, they ended up with slightly different file sizes. I looked at the log file (from stdout/stderr) and retrieved the values corresponding to assembledContigCount and nonAssembledContigCount in src/main/java/abra/ReAligner.java for each region. I found some variation in these values between the two runs (see the attached plot for the assembledContigCount variable; note that the values have been log10 transformed).

This suggests that abra2 is non-deterministic. We also found that downstream variant calling was different between the two runs, possibly as a result of the varying output from abra2.

Do you know what might be the cause of this non-determinism, and is there any way to make abra2 deterministic?

assembled_contigs_plot

Bed file format

My bed file with its one-line header made abra2 crash with the following error:
Loading target regions from : target.bed
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at abra.RegionLoader.load(RegionLoader.java:49)
at abra.ReAligner.getRegions(ReAligner.java:1347)
at abra.ReAligner.loadRegions(ReAligner.java:1379)
at abra.ReAligner.reAlign(ReAligner.java:180)
at abra.ReAligner.run(ReAligner.java:1833)
at abra.Abra.main(Abra.java:12)
Once I removed the header line, it ran well. Many BED files have headers, so it would be useful to allow them.
Thanks in advance
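
As an illustration of the request above, here is a minimal sketch of a BED reader that skips header lines instead of failing on them. It is not ABRA2's RegionLoader; the class and method names are hypothetical. It also assumes the header is one of the conventional forms (`track`, `browser`, or a `#` comment line), and the report does not say which kind of header the file had.

```java
import java.util.ArrayList;
import java.util.List;

public class BedHeaderSkipping {

    /** True for blank lines, '#' comments, and 'track' / 'browser' header lines. */
    static boolean isHeaderOrBlank(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return true;
        }
        String firstToken = trimmed.split("\\s+")[0];
        return firstToken.equals("track") || firstToken.equals("browser");
    }

    /** Returns {chrom, start, end} for every data line, ignoring headers. */
    static List<String[]> parse(List<String> lines) {
        List<String[]> regions = new ArrayList<>();
        for (String line : lines) {
            if (isHeaderOrBlank(line)) {
                continue;
            }
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 3) {
                // Report the offending line instead of an ArrayIndexOutOfBoundsException.
                throw new IllegalArgumentException("Expected at least 3 BED columns: " + line);
            }
            regions.add(new String[] {fields[0], fields[1], fields[2]});
        }
        return regions;
    }
}
```

A header made of bare column names (for example `chrom start end`) would not be caught by these checks, so it would still have to be removed by hand or detected some other way.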

Abra2 creating large (around 100bp) deletions to alignment

Hello,

I have been using abra2 for viral reference-based assembly for a while now and have not had any issues. However, I recently noticed that in a few of my assemblies, abra2 is adding extremely large gaps to the alignment. It appears that the tool is taking a few bases from the ends of a read and moving them to a distant position. It is also odd because these regions do not contain any mismatches from the reference...

Below are screenshots of my alignment before and after running abra2.

Before:
image

After:
image

I am currently running abra2 version 2.23 installed via conda.

Let me know what other information I can provide! Thank you!

Best use cases?

Hey Lisle,

Great to see a new implementation of ABRA in the works! Just curious: for which applications would you recommend switching from ABRA to ABRA2 as it stands currently? I'm not sure how alpha certain aspects are, or how niche the development and overall use case are meant to be.

Appreciate any feedback!

Supplementary alignment behavior question

Thanks for the software, it has been incredibly helpful. Hopefully this question is not a repeat and this is an appropriate place to ask it.

I’m curious about some behavior I’m seeing in regards to supplementary alignments.

The first case is fairly standard (input BAM: out.perfect.bam, output: abra.perfect.bam). We have a perfect alignment from hg19, chr12 start=12043874 stop=12044029, except that I put a 34 bp insertion into this region, 87 bases in. So the "perfect" CIGAR is 87M34I35M; however, after alignment I get primary CIGAR 87M69S and supplementary CIGAR 121H35M. After running abra, I get primary CIGAR 87M34I35M (perfect, exactly what we expect), but the supplementary CIGAR is left unmodified (121H35M).

My first question is: why is the supplementary read kept? It seems that, since the primary read was fixed and overlaps with this supplementary read, the supplementary read should be removed completely. Is there a flag that would produce this removal behavior, or is this expected behavior?

The second case is a bit more odd (input BAM: out_noise.bam, output: abra.noise.bam). The input is approximately the same (a 34 bp insertion), but there is some noise on the two sides (primary CIGAR: 37M3D47M73S, supplementary CIGAR: 118H14M2D25M). The primary alignment is again exactly what I would expect, 37M3D47M34I14M2D25M (perfect); however, in the supplementary alignment the hard clips were trimmed off (14M2D25M), which I found odd.

Why would Abra2 modify the supplementary CIGAR in this way? It seems it should have been modified like this in case 1 as well, or just removed.

This analysis was run with very vanilla options (java -Xmx6G -jar /usr/bin/abra2.jar --in out.perfect.bam --out abra.perfect.bam --ref /usr/share/archer/reference/hg19/hg19.fa --threads 1 --tmpdir tmp), so I might just be missing a parameter. Attached are the input/output BAMs. Let me know if any logs or additional information would be useful.

Thank you for your help and your software.

abra_inputs_and_outputs.tar.gz

[root@5e9b723c8cbd tests]# java -Xmx6G -jar /usr/bin/abra2.jar
INFO    Tue Mar 12 18:21:24 UTC 2019    Abra version: 2.19
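
As a sanity check on the CIGAR strings quoted in this report, the sketch below sums the operation lengths that make up the original read (M, I, S, =, X, plus hard clips H) and those that consume the reference (M, D, N, =, X), following the SAM specification. The class and method names are hypothetical; the only thing this computes is arithmetic on the strings given above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CigarLengths {
    private static final Pattern OP = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Sums the lengths of all CIGAR operations whose code appears in ops. */
    static int sum(String cigar, String ops) {
        Matcher m = OP.matcher(cigar);
        int total = 0;
        while (m.find()) {
            if (ops.indexOf(m.group(2).charAt(0)) >= 0) {
                total += Integer.parseInt(m.group(1));
            }
        }
        return total;
    }

    /** Length of the original read, counting hard-clipped bases. */
    static int readLength(String cigar) {
        return sum(cigar, "MIS=XH");
    }

    /** Number of reference bases spanned by the alignment. */
    static int referenceSpan(String cigar) {
        return sum(cigar, "MDN=X");
    }

    public static void main(String[] args) {
        String[] cigars = {
            "87M34I35M", "87M69S", "121H35M",                              // case 1
            "37M3D47M34I14M2D25M", "37M3D47M73S", "118H14M2D25M", "14M2D25M" // case 2
        };
        for (String c : cigars) {
            System.out.println(c + " read=" + readLength(c) + " refSpan=" + referenceSpan(c));
        }
    }
}
```

By this count every record in case 1 accounts for 156 read bases and the pre-ABRA records in case 2 for 157, whereas the post-ABRA supplementary CIGAR 14M2D25M accounts for only 39, which is consistent with the hard clip having been dropped rather than converted.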

Generating BED targets

Hi,
do you have a recommended way of generating the BED targets file to pass to ABRA2 via --targets? I know that
"The targets argument is not required. When omitted, the entire genome will be eligible for realignment."
but I guess that specifying a targets file could potentially speed up the ABRA2 process.

v2.08 crash

I tried version 2.08, and it crashed. Here's the log:

ERROR Thu Aug 24 21:46:29 CST 2017 Error parsing assembled contigs. Line: [>chr4_49108801_49109201_1]

Contigs: [

chr4_49108801_49109201_0_-2.495560
ATTCCATTTCATTCCACTGGTGTTTATTCCATTCCACACCATTAAATTCCATTCCATTCTATTCCATTCGGGTTTATTCCAATTCATTCCATTCCATTCCATTCCATTCCATTCGGGTTTCATTCCATTCCACATTCCATTCCGTGTTGATTCCATTCCATTCCATTCCATTCCATTCCACTCCATTCCAATCCATTACATTCCACTCGGGTTGAATCCTTTCCATTCCATTCCAGTGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCCATTCCATTCCACTCGGGTTGTTCCATTCCATTCCATTCCTTTCCATTCCATTCCATTCCGTTCCACTCGGCTTGA
chr4_49108801_49109201_1_-3.251221
ATTCCATTTCATTCCACTGGTGTTTATTCCATTCCACACCATTAAATTCCATTCCATTCTATTCCATTCGGGTTTATTCCAATTCATTCCATTCCATTCCATTCCATTCCATTCGGGTTTCATTCCATTCCATCAATCAAµ�ATTACATTCCACTCGGGTTTCCTTTCCATTCCATTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCCGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGTGTTGATTCCATTGAATTCCATCCCATTCCATTCCATTCCATTCTATTGCATTCCATTCTATTCCATTCCACTCGTGTTGATTCCCTTCCATTCCATTCCATTCCATTCCACTTGGGTTGATTCCATTCCGTTCCTTTCCATTGCTTTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCGTTCC
chr4_49108801_49109201_2_-1.858778
TCCATTCCATTCCATTCCATTCCATTCGGGTTT¸õ_�TTTGTGTTGAATCCTTTCCATTCCTTTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCCATTCCATACCGTTCTGTTCCACTCCGTTCCATTGCATTCCATACCATTCTATTCCACGCGGGTTGATTCCATTGTATTCCATTCCATTCCATTCTATTGCATTCCATTCCATTCCATTCCACCA
chr4_49108801_49109201_3_-2.588145
TCCATTCCATTCCATTCCATTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCCATTCCATTCCATTCCATTCCTTTCCATTTGTGTTGATTCCATTCCATTCCAATCCATTCCATTTCATTCCACTCCATTCCAATCCATTACATTCCACCTGGGTTGAATCCTTTCCATTC
chr4_49108801_49109201_4_-3.234833
TCCATTCATTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTGCATTCCATTCCATTCCATTTGTGTTGATTCCATTCCATTGCATTCCATTCCATTCCACTCCATTCCAATCCATTACATTCCACTCGGGTTGAATCTTTTCCATTCCATTCCAATGCCTTCCCTTCCATTCAATTCCACTCGGATTCAGTCAATTCCATTCTATTCCATTCCATTACATTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGGGTT
chr4_49108801_49109201_5_-2.855427
TCCATTCATTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTGCATTCATTCCTTTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCGATTGCATTCCATTCCTTTCCAATTCAATCTATTCCATTCCATTCCATTCCGTTCTGTTCCACTCCATTCCATTGCATTCCGTACCATTCCATTCCACTCGGGTTGATTCCATTGAATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTC
chr4_49108801_49109201_6_-2.855427
TCCATTCATTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTGCATTCATTCCTTTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCGATTGCATTCCATTCCTTTCCATTCTATTCCATTCCGTTCCATTCCATTCCATTTGTGTTGATTCCATT
chr4_49108801_49109201_7_-2.980366
TCCATTCATTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTGCATTCATTCCTTTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCGATTGCATTCCATTCCTTTCCATTCTATTCCATTCCGTTCCATTCCATTCCATTTGTGTTGATTCCATTTGCATTCCATTCCATTCCACTCCATTCCAATCCATTACATTCCACTCGGGTTGAATCCTTTCCATTCCTTTGCAATGCATTCCCTTCCATTCAATTCCACTCGAATTCAATCTATTCCATTCCATTCCATTCCGTTCTGTTCCACTCCATTCCATTGCATTCCGTACCATTCCATTCCACT
chr4_49108801_49109201_8_-3.202215
TCCATTCATTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTGCATTCATTCCTTTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCGATTGCATTCCATTCCTTTCCATTCTATTCCATTCCGTTCCATTCCATTCCATTTGTGTTGATTCCATTTGCATTCCATTCCATTCCACTCCATTi
chr4_49108801_49109201_9_-2.498880
TCCATTCATTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTGCATTCATTCCTTTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCGATTGCATTCCATTCCTTTCCATTCTATTCCATTCCGTTCCATTCCATTCCATTTGTGTTGATTCCATTCCATTGCATTCCATTCCATTCCACTCCATTCCAATCCATTACATTCCACTCGGGTTGAATCTTTTCCATTCCATTCCAATGCCTTCCCTTCCATTCAATTCCACTCGGATTCAGTCAATTCCATTCTATTCCATTCCATTACATTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGGGTT
chr4_49108801_49109201_10_-2.748390
TTCCATTCCATTCCATTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTGCATTCCATTCC�6TTGCTTCCATTCCATTCCATTCCATTCCACTCCATTCCAATCCATTACATTCCACTCGGGTTGAATCATTTCCATTCCATTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCC
chr4_49108801_49109201_11_-3.162363
TTCCATTCCATTCCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCCATCCCATTTGTGTTGCTTCCATTCCATTCCATTCCATTCCACTCCATTCCAATCCATTGCATTCCACTCGGGTTGAATCCTTTCCATTCCATTCCATTGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCTGTTTCACTCCATTCCCTTGCATTCCATACCATTCCATTCCACT
chr4_49108801_49109201_12_-2.685242
TTCCATTCCATTCCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCCATCCCATTTGTGTTGCTTCCATTCCATTCCATTCCATTCCACTCCATTCCAATCCATTGCATTCCACTCGGGTTGAATCCTTTCCATTCCATTCCATTGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCTGTTTCACTCCATTCCCTTGCATTCCATACCATTCCATTCCACTCGGGTTGATTCCATTGAGTTCCATCCCATTCCATTCCATTACATTCTATTGCATTCCATTCCATTCCATACCATTCCACTTGGGTTCATTCCATACCATTCCATTCCATTCCATTCCAT
chr4_49108801_49109201_13_-2.999463
ATTCAATCTATTCCATTCCATTCCATTCCGTTCTGTTCCACTCCATTCCATTGCATTCCGTACCATTCCATTCCACTCGGGTTGATTCCATTGAATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTC
chr4_49108801_49109201_14_-2.999463
TTCCATTCCATTCTTCTCGGGTTTC�ATTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCCATTCCTTTCCATTTGTGTTGATTCCATT chr4_49108801_49109201_15_-3.124402 TTCCATTCCATTCTTCTCGGGTTTC�ATTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCCATTCCTTTCCATTTGTGTTGATTCCATTTGCATTCCATTCCATTCCACTCCATTCCAATCCATTACATTCCACTCGGGTTGAATCCTTTCCATTCCTTTGCAATGCATTCCCTTCCATTCAATTCCACTCGAATTCAATCTATTCCATTCCATTCCATTCCGTTCTGTTCCACTCCATTCCATTGCATTCCGTACCATTCCATTCCACT
chr4_49108801_49109201_16_-3.268308
TTCCATTCCATTCTTCTCGGGTTTTCCATTCCATTCCACATTCTATTCCATTCCATTCCATTCCTTTCCATTTGTGTTGATTCCATTCCATTCCAATCCATTCCATTACATTCCACTCCATTCCTATCCATTACATTCCATTCGGGTTGAATCCTTTCCATTCCTTTCCAATGCTTTCTCTTCCATTCAATTCCACTCAGATTCAATCATTTCCATTCCATTCCATTCCGTTCTGTTCCACTCCATTCCGTTGCATTCCATACCATTCCATTCCACCCGG
chr4_49108801_49109201_17_-3.143369
TTCCATTCCATTCTTCTCGGGTTTTCCATTCCATTCCACATTCTATTCCATTCCATTCCATTCCTTTCCATTTGTGTTGATTCCATTCCATTCCAATCCATTCCATTACATTCCACTCCATTCCTATCCATTACATTCCATTCGGGTTGAATCCTTTCCATTCCTTTCCAATGCTTTCTCTTCCATTCAATTCCACTCAGATTCAATCATTTCCATTCCATTCCATTCCGTTCTGTTCCACTCCATTCCGTTGCATTCCATACCATTCCATTCCACTCGGGTTGATTCCATTTCATTCCATTCCATTCCATTGCATTCCAATCCATTACATTCCACTCGGGTTG
chr4_49108801_49109201_18_-3.197528
TTCCATTCCATTCTTCTCGGGTTTTCCATTCCATTCCACATTCTATTCCATTCCATTCCATTCCTTTCCATTTGTGTTGATTCCATTCCATTCCAATCCATTCCATTACATTCCACTCCATTCCTATCCATTACATTCCATTCGGGTTGAATCCTTTCCATTCCTTTCCAATGCTTTCTCTTCCATTCAATTCCACTCAGATTCAATCATTTCCATTCCATTCCATTCCGTTCTGTTCCACTCCATTCCGTTGCATTCCATACCATTCCATT
chr4_49108801_49109201_19_-3.021436
TTCCATTCCATTCTTCTCGGGTTTTCCATTCCATTCCACATTCTATTCCATTCCATTCCATTCCTTTCCATTTGTGTTGATTCCATTCCATTCCAATCCATTCCATTACATTCCACTCCATTCCTATCCATTACATTCCATTCGGGTTGAATCCTTTCCATTCCTTTCCAATGCTTTCTCTTCCATTCAATTCCACTCAGATTCAATCATTTCCATTCCATTCCATTCCGTTCTGTTCCACTCCATTCCGTTGCATTCCATACCATTCCATT
chr4_49108801_49109201_20_-3.031124
TTCCATTCCATTCTTCTCGGGTTTTCCATTCCATTCCACATTCTATTCCATTCCATTCCATTCCTTTCCATTTGTGTTGATTCCATTCCATTCCAATCCATTCCATTACATTCCACTCCATTCCTATCCATTACATTCCATTCGGGTTGAATCCTTTCCATTCCTTTCCAATGCTTTCTCTTCCATTCAATTCCACTCAGATTCAATCATTTCCATTCCATTCCATTCCGTTCTGTTCCACTCCATTCCGTTGCATTCCATACCEUiCh
chr4_49108801_49109201_21_-2.934214
TTCCATTCCATTCTTCTCGGGTTTTCCATTCCATTCCACATTCTATTCCATTCCATTCCATTCCTTTCCATTTGTGTTGATTCCATTCCATTCCAATCCATTCCATTACATTCCACTCCATTCCTATCCATTACATTCCATTCGGGTTGAATCCTTTCCATTCCTTTCCAATGCTTTCTCTTCCATTCAATTCCACTCAGATTCAATCATTTCCATTCCATTCCATTCCGTTCTGTTCCACTCCATTCCGTTGCATTCCATACCATTC
chr4_49108801_49109201_22_-3.052313
TTCCATTCCATTCTTCTCGGGTTTTCCATTCCATTCCACATTCTATTCCATTCCATTCCATTCCTTTCCATTTGTGTTGATTCCATTCCATTCCAATCCATTCCATTACATTCCACTCCATTCCTATCCATTACATTCCATTCGGGTTGAATCCTTTCCATTCCTTTCCAATGCTTTCTCTTCCATTCAATTCCACTCAGATTCAATCATTTCCATTCCATTCCATTCCGTTCTGTTCCACTCCATTCCGTTGCATTCCATACCEUiCCATTCCøh
chr4_49108801_49109201_23_-3.098070
TTCCATTCCATTCTTCTCGGGTTTTCCATTCCATTCCACATTCTATTCCATTCCATTCCATTCCTTTCCATTTGTGTTGATTCCATTCCATTCCAATCCATTCCATTACATTCCACTCCATTCCTATCCATTACATTCCATTCGGGTTGAATCCTTTCCATTCCTTTCCAATGCTTTCTCTTCCATTCAATTCCACTCAGATTCAATCATTTCCATTCCATTCCATTCCGTTCTGTTCCACTCCATTCCGTTGCATTCCATACC!0iCCATTh
chr4_49108801_49109201_24_-3.190414
TTCCATTCCATTCTTCTTTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCCATTC��TGAATCCTTTCCATTCCGTTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAACCAATTCCATTCTATTAAATTCCGTTCTGTTCCACTCCATTCCATTGCATTCCATn�TCGTGTTTCC chr4_49108801_49109201_25_-2.968565 TTCCATTCCATTCTTCTTTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCCATTC��TGAATCCTTTCCATTCCGTTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAACCAATTCCATTCTATTAAATTCCGTTCTGTTCCACTCCATTCCATTGCATTCCATn�TCGTGTTTCCCCATTCCATTCCATTCCATTCCACTTGGGTTGATTCCATTCCGTTCCTTTCCATTGCATTCCATTCCATTCCATTCCATTCCATTCAATTCCATTCCATTCCATTCGTGTTGATGCCATTCCAATCCATA
chr4_49108801_49109201_26_-3.269595
TTCCATTCCATTCTTCTTTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCCATTC��TGAATCCTTTCCATTCCGTTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAACCAATTCCATTCTATTAAATTCCGTTCTGTTCCACTCCATTCCATTGCATTCCATn�TCGTGTTTCCCCATTC chr4_49108801_49109201_27_-3.269595 TTCCATTCCATTCTTCTTTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCCATTC��TGAATCCTTTCCATTCCGTTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAACCAATTCCATTCTATTAAATTCCGTTCTGTTCCACTCCATTCCATTGCATTCCATn�TCGTGTTTCCCCATTCCATTCCATTCCATTCCACTTGGGTTGATTCCATTCCGTTCCTTTCCATTGCATTCCATTCCATTCCATTCCATTCCATTCCATTCCACTCCATTCCAT
chr4_49108801_49109201_28_-3.269595
TTCCATTCCATTCTTCTTTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCCATTC��TGAATCCTTTCCATTCCGTTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAACCAATTCCATTCTATTAAATTCCGTTCTGTTCCACTCCATTCCATTGCATTCCATn`�TCGTGTTTCCCCATTCCATTCCATTCCATTCCACTTGGGTTGATTCCATTCCGTTCCTTTCCATTGCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATT
chr4_49108801_49109201_29_-2.772878
TTCCATTCCATTCTTCTTTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCCATTC��TGAATCCTTTCCATTCCGTTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAACCAATTCCATTCTATTAAATTCCGTTCTGTTCCACTCCATTCCATTGCATTCCAT
chr4_49108801_49109201_30_-2.427795
TTCCATTCCATTCTTCTGATTCCATTTCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCCATTCCATTCCATTTGTGTTGATTCCATTCCATTCCATTCCATTGCATTCCATTCCACTCCATTCCAATCGATTACATTCCACTCGGTTTGAATCCT
chr4_49108801_49109201_31_-2.752935
TTCCATTCCATTCTTCTTCGGGTTGAGTU�CATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCCGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGTGTTGATTCCATTGAATTCCATCCCATTCCATTCCATTCCATTCTATTGCATTCCATTCTATTCCATTCCACTCGTGTTGATTCCCTTCCATTCCATTCCATTCCATTCCACTTGGGTTGATTCCATTCCGTTCCTTTCCATTGCTTTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCGTTCC
chr4_49108801_49109201_32_-2.974784
TTCCATTCCATTCTTCTTCGGGTTGAGTU�CATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCCGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGTGTTGATTCCATTGAATTCCATCCCATTCCATTCCATTCCATTCTATTGCATTCCATTCTATTCCATTCCACTCGTGTTGATTCCCTTCCATTCCATTCCATTCCATTCCATTCCACTTGGGTTGATTCCATTCCGTTCCTTTCCATTGCTTTCCATTCCATTCCATTCTATTCC
chr4_49108801_49109201_33_-3.278945
TCAATCAAµ�ATCCATTACATTCCACTCGGGTTGAATCCTTTCCATTCCATTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTGAGTTCTATTCCATTCCGTTCTGTTGCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGGGTTGATTCCATTGAATTCCATCCCATTCCCTTGCATTCCATGCTATTGT
chr4_49108801_49109201_34_-3.043925
TCAATCAAµ�ATTACATTCCACTCGGGTTTCCTTTCCATTCCATTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCCGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGTGTTGATTCCATTGAATTCCATCCCATTCCATTCCATTCCATTCTATTGCATTCCATTCTATTCCATTCCACTCGTGTTGATTCCCTTCCATTCCATTCCATTCCATTCCACTTGGGTTGATTCCATTCCGTTCCTTTCCATTGCTTTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCGTTCC
chr4_49108801_49109201_35_-3.265774
TCAATCAAµ�ATTACATTCCACTCGGGTTTCCTTTCCATTCCATTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCCGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGTGTTGATTCCATTGAATTCCATCCCATTCCATTCCATTCCATTCTATTGCATTCCATTCTATTCCATTCCACTCGTGTTGATTCCCTTCCATTCCATTCCATTCCATTCCATTCCACTTGGGTTGATTCCATTCCGTTCCTTTCCATTGCTTTCCATTCCATTCCATTCTATTCC
chr4_49108801_49109201_36_-2.795212
TTCCATTCCATTCTTCTTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTACCATTCCATTCCACTGCATTCCAACCCATTACATTGCACTCGGGTTGAATC©hTCCATTCCATTCCAATGCATTCCCTTTCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCTGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCA}°h
chr4_49108801_49109201_37_-3.021895
TTCCATTCCATTCTTCTTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTACCATTCCATTCCACTGCATTCCAACCCATTACATTGCACTCGGGTTGAATCTGTTGATCATTCCATTGTTCTGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGGG��
chr4_49108801_49109201_38_-3.197987
TTCCATTCCATTCTTCTTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTACCATTCCATTCCACTGCATTCCAACCCATTACATTGCACTCGGGTTGAATCTGTTGATCATTCCATTCCATTCCATTCCACTTGGGTTGATTCCATTCCGTTCCTTTCCATTGCATTCCATTCCATTCCATTCCATTTCATTCCATTCCGTTCCTTTCCATTCGTGTTGATGCCATTCCAATCCATAC
chr4_49108801_49109201_39_-2.800047
TTCCATTCCATTCTTCTTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTACCATTCCATTCCACTGCATTCCAACCCATTACATTGCACTCGGGTTGAATCU¥hTCTATTCCATTCCGTTCTGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGGG��
chr4_49108801_49109201_40_-3.261299
TTCCATTCCATTCTTCTTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTACCATTCCATTCCACTGCATTCCAACCCATTACATTGCACTCGGGTTGAATC3¥hTCCATTCCATTCCAATGCATTCCCTTTCATTCAATTCC�i
chr4_49108801_49109201_41_-2.767398
TTCCATTCCATTCTTCTTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTACCATTCCATTCCACTGCATTCCAACCCATTACATTGCACTCGGGTTGAATCTCTATTCCATTCCGTTCTGTTCCACTCCATTCCATTGTTCC�i
chr4_49108801_49109201_42_-2.988822
TTCCATTCCATTCTTCTTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTACCATTCCATTCCACTGCATTCCAACCCATTACATTGCACTCGGGTTGAATCTCTATGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGGGTTGATTCCATTGAATTCCATCCCATTCCATTCCATTCCATTTTATTGCATTCC��
chr4_49108801_49109201_43_-3.143724
TTCCATTCCATTCTTCTTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTACCATTCCATTCCACTGCATTCCAACCCATTACATTGCACTCGGGTTGAATCC�CCATTCCATTCCACTCGGGTTGATTCCATTGAATTCCATCCCATTCCATTCCATTCCATTCTATTGCATTCCATTCAATTCCATTGCACTCGTGTTGATTCCCTTACATTCCATTCCATTCCATTCCACTCGGGTTGATTCCATTCCTTTCCTTTCCATTGCATTCTATTC
chr4_49108801_49109201_44_-3.226215
TTCCATTCCATTCTTCTTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTACCATTCCATTCCACTGCATTCCAACCCATTACATTGCACTCGGGTTGAATCC�GCATTCCATACATTCTATTGCATTCCATTCCATTCCATTGi
chr4_49108801_49109201_45_-2.742254
TTCCATTCCATTCTTCTTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTACCATTCCATTCCACTGCATTCCAACCCATTACATTGCACTCGGGTTGAATC
©hCATTCCATACCATTCCATTCCACTCG}°hTTCAATTCC�i
chr4_49108801_49109201_46_-2.975396
TCCATTTCCATTCTTCTTCGGGTTTATTCCATTTCATTCCATTAAATTCCATTCCATTCTATTCCATTCGGGTTTATTCCAATTCATTCCATTCC
#�CCATTCCATTCCACTGCATTCCAACCCATTACATTGCACTCGGGTTGAATCCTTTCCATTCCATTCCAATGCATTCCCTTTCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTT~©hTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCA}°h
chr4_49108801_49109201_47_-3.202080
TCCATTTCCATTCTTCTTCGGGTTTATTCCATTTCATTCCATTAAATTCCATTCCATTCTATTCCATTCGGGTTTATTCCAATTCATTCCATTCC
TGTTGATCATTCCATTGTTCTGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGGG��ATGCATTCCCTTTCATTCAATTCC�i
chr4_49108801_49109201_48_-2.980231
TCCATTTCCATTCTTCTTCGGGTTTATTCCATTTCATTCCATTAAATTCCATTCCATTCTATTCCATTCGGGTTTATTCCAATTCATTCCATTCC
U¥hTCTATTCCATTCCGTTCTGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGGG��ATGCATTCCCTTTCATTCAATTCC�i
chr4_49108801_49109201_49_-2.947582
TCCATTTCCATTCTTCTTCGGGTTTATTCCATTTCATTCCATTAAATTCCATTCCATTCTATTCCATTCGGGTTTATTCCAATTCATTCCATTCC
TCTATTCCATTCCGTTCTGTTCCACTCCATTCCATTGGCACTCGGGTTGAATCCTTTCCATTCCATTCCAATGCATTCCCTTTCATTCAATTCC�i
chr4_49108801_49109201_50_-3.169007
TCCATTTCCATTCTTCTTCGGGTTTATTCCATTTCATTCCATTAAATTCCATTCCATTCTATTCCATTCGGGTTTATTCCAATTCATTCCATTCC
TCTATGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGGGTTGATTCCATTGAATTCCATCCCATTCCATTCCATTCCATTTTATTGCATTCC��
chr4_49108801_49109201_51_-2.922439
TCCATTTCCATTCTTCTTCGGGTTTATTCCATTTCATTCCATTAAATTCCATTCCATTCTATTCCATTCGGGTTTATTCCAATTCATTCCATTCC
~©hCATTCCATACCATTCCATTCCACTCG}°hACATTGCACTCGGGTTGAATCCTTTCCATTCCATTCCAATGCATTCCCTTTCATTCAATTCC�i
chr4_49108801_49109201_52_-2.831805
CCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTGCATTCCATTCTGTTCCACTCCATTCCATTGCATTCCATACAATTCCCTTCCACTCGGGTTGATTCCATTGAATTCCATCCCATTCCATTTCATTCCATTCTATTGCATTCCATTCCATTCCATTCCACTCGTGTTGATTCCTTT
chr4_49108801_49109201_53_-2.954651
ATCCATTACATÆn�TAAATCCTTTCCATTCCATTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCCATTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGGGTTGATTCCATTTAATTCCATCCCATTCCA
chr4_49108801_49109201_54_-3.080724
ATCCATTACATÆn�TTCCTTTCCATTCCATTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCCGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGTGTTGATTCCATTGAATTCCATCCCATTCCATTCCATTCCATTCTATTGCATTCCATTCTATTCCATTCCACTCGTGTTGATTCCCTTCCATTCCATTCCATTCCATTCCACTTGGGTTGATTCCATTCCGTTCCTTTCCATTGCTTTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCGTTCC
chr4_49108801_49109201_55_-3.268197
ATCCATTACATÆn�TATCCTTTCCATTAATTCCATTCTATTCCATTCCGTTCTGTTTCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGGGTTGATTCCATTGAGTTCCATCCCATTCCATTCCATTACATTCTATTGCATTCCATTCCATTCCATACCATTCCACTTGGGTTCATTCCATACCATTCCATTCCATTCCATTCCAT
chr4_49108801_49109201_56_-3.240169
ATCCATTACATÆn�TATCCTTTCCATTCTTTCCATTCTATTCCATTCCATTCCATTCCATTTGTGhTTCGGGTTTATTCCAATTCATTCCATTCC
chr4_49108801_49109201_57_-3.057595
ATCCATTACATÆn�TATCCTTCCATTCCTTTCCATTCTATTCCATTCCATTCCATTCCATTTGTGhTTCGGGTTTATTCCAATTCATTCCATTCC
chr4_49108801_49109201_58_-2.595500
TCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCCATTCCATTTGTGTTGATTCCATTCCATTCCATTCCATTCCGCTCCTTTGCAATCCATTACATTCCACTCGTGTTGAATCCTTTCCATTCCATTCCAAAGCATTCCCTTCCATTCAATTCCACTCGGAT
chr4_49108801_49109201_59_-2.072622
TCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCT6®ATTAAATTCCATTCCATTCTATTCCATTCGGGTTTATTCCAATTCATTCCATTCC
chr4_49108801_49109201_60_-2.698163
CCAATGCCATTCAATTCCACTCGGATTCAATCAATTCCATTCCATTCCATTCCGTTCTGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTACACTCGGGTTGATTCCATTGAATTCCATTCCGTTCCATTCCATTCCATTCTATTGCATTCCATTCCATTCCATTATACTCGAGTTCATTCCATTCCATTCCTTTCCACTCCACTCCGGTTGATTCCATTCCATTAAATTCCATTCCATTGCATTCCATTCCATTCCAT
chr4_49108801_49109201_61_-2.698163
CCAATGCCATTCAATTCCACTCGGATTCAATCAATTCCATTCCATTCCATTCCGTTCTGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTACACTCGGGTTGATTCCATTGAATTCCATTCCGTTCCATTCCATTCCATTCTATTGCATTCCATTCCATTCCATTCTAC
chr4_49108801_49109201_62_-2.397133
CCAATGCCATTCAATTCCACTCGGATTCAATCAATTCCATTCCATTCCATTCCGTTCTGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTACACTCGGGTTGATTCCATTGAATTCCATTCCATTCCATTCCATTCCATTCTACTGCATTCCATTCCATTCCATTCCTCTGCGGTTCATTCCATT
chr4_49108801_49109201_63_-2.841816
CCAATGCCATTCAATTCCACTCGGATTCAATCAATTCCATTCCATTCCATTCCGTTCTGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTTGGGTTGATTCCATTGAATTCCGTTCCATTCCATTCCATTCAATTCTATTCCATTCCA
chr4_49108801_49109201_64_-2.540786
CCAATGCCATTCAATTCCACTCGGATTCAATCAATTCCATTCCATTCCATTCCGTTCTGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTTGGGTTGATTCCATTGAATTCCGTTCCATTCCATTCCATTCAATTCTATTGCATTCCATTCCATTCAATTCTATTGCATTCCATTCCATTCCATTCCACTGGGGTTGATTCCATTCC
chr4_49108801_49109201_65_-2.870462
CATTCCATTCCATTCCATTCTATTGCATACCATTCCAATCCATTCCACTGGGGTTGATTGCATTCCATTCCATTCTACTCCACTGCGGTTGATTCCATTTCCATTCCATACCATTGCATACAó5GGG�®h
chr4_49108801_49109201_66_-2.870462
CATTCCATTCCATTCCATTCTATTGCATACCATTCCAATCCATTCCACTGGGGTTGATTGCATTCCATTCCATTCTACTCCACTGCAGTTCCCTTCCTTTCCATTCCATACCATTGCATACAó5GGG�®h
chr4_49108801_49109201_67_-2.702228
CCAATGCCATTCAATTCCACTCGGATAGTTCCATTCCATTCCATTCCATTCCATTCCATTCCATTTGTGTTGATACCATTCCATTCCATTCCCTTCCTTTCCATTCCATACCATTGCATACAó5GGG�®h
#�CTTCTGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGGGTTGATTCCATTGAATTCCATTCCATTCCATTCCATTCCATTCTTTTGCATTCCATTCCATTCCATTCCACTGCGGTTGATTCCATTCCAT
chr4_49108801_49109201_68_-3.012322
CCAATGCCATTCAATTCCACTCGGATAGTTCCATTCCATTCCATTCCATTCCATTCCATTCCATTTGTGTTGATACCATTCCATTCCATTCCCTTCCTTTCCATTCCATACCATTGCATACAó5GGG�®h
#�CTTCTGTTCCACTCCATTCATTCCATACCATTCCATTCCACTCGGGTTGATTCCATTGAATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCTATTGCATTCCATTCCATTCCATTGCAATGG
chr4_49108801_49109201_69_-2.808202
CCAATGCCATTCAATTCCACTCGGATAGTTCCATTCCATTCCATTCCATTCCATTCCATTCCATTTGTGTTGATACCATTCCATTCCATTCCCTTCCTTTCCATTCCATACCATTGCATACAó5GGG�®h
#�CTTCTGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGGGTTGATTCCATTGAATTCCATTCCATTCCATTCCATTCCATTCCATTCTATTGCATTCCATTCCATTCCATTCCACTGCGGTTCATTCCATTCCATTCCATTCTACTCCACTCGGGTTGATTCCATTCCATTCCATTCCA
chr4_49108801_49109201_70_-0.477121
CCATTGCATTCCATTCCTTTCCATTCCGTTCCATTCCATTCCATTCCATTCCATTCCATTCCATTTGTGTTGATACCATTCCATTCCATTCTCGGATTCAATCAATTCCATTCCATTCCATTCCGTTTTGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTACACTCGGGTTGATTCCATTGAATTCCATTCCGTTCCATTCCATTCCATTCTATTGCATTCCATTCCATTCCATTATACTCGAGTTCATTCCATTCCATTCCTTTCCACTCCACTCCGGTTGATTCCATTCCATTAAATTCCATTCCATTGCATTCCATTCCATTCCAT
chr4_49108801_49109201_71_-2.152610
ACTGGTGTTTATTCCTTTCCACTCCATTCCATTCCATTCCATTCGGGTTTATTCCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTTGTGTTGATTCCC�TACATTCCACTCGGGTTGAATCCTTTCCATTCCTTTCCAATGCATTCCCTTCCATTCAATTCCACTCGGTTTCAATCAATTCCATTCCATTCCATTCCGTTCTGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCTGTCGGGATGATTCCGTTCCATTCCATTATATTCTGTTCCATTCCATTCCATTCCATTTAACTCGGGTTGATTCCATTC
chr4_49108801_49109201_72_-2.358327
ACTGGTGTTTATTCCTTTCCACTCCATTCCATTCCATTCCATTCGGGTTTATTCCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTTGTGTTGATTCCACTTCACATTCCACTCGGGTTGAATCCTTTCCATTCCTTTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCTTTCCATTCCATTCCGTTCTGTTCCACTCCATTCCATTGTATTCCATACCATTCCATTCCACTCGGGTTGATTCCGT
chr4_49108801_49109201_73_-2.068520
ACTGGTGTTTATTCCTTTCCACTCCATTCCATTCCATTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCCATCCCATTTGTGTTGCTTCCATTCCATTCCATTCCATTCCACTCCATTCCAATCCATTGCATTCCACTCGGGTTTCCTTTCCATTCCATTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCCGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGTGTTGATTCCATTGAATTCCATCCCATTCCATTCCATTCCATTCTATTGCATTCCATTCTATTCCATTCCACTCGTGTTGATTCCCTTCCATTCCATTCCATTCCATTCCACTTGGGTTGATTCCATTCCGTTCCTTTCCATTGCTTTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCGTTCC
chr4_49108801_49109201_74_-2.290369
ACTGGTGTTTATTCCTTTCCACTCCATTCCATTCCATTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCCATCCCATTTGTGTTGCTTCCATTCCATTCCATTCCATTCCACTCCATTCCAATCCATTGCATTCCACTCGGGTTTCCTTTCCATTCCATTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCCGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGTGTTGATTCCATTGAATTCCATCCCATTCCATTCCATTCCATTCTATTGCATTCCATTCTATTCCATTCCACTCGTGTTGATTCCCTTCCATTCCATTCCATTCCATTCCATTCCACTTGGGTTGATTCCATTCCGTTCCTTTCCATTGCTTTCCATTCCATTCCATTCTATTCC
chr4_49108801_49109201_75_-1.710205
ACTGGTGTTTATTCCTTTCCACTCCATTCCATTCCATTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCCATCCCATTTGTGTTGCTTCCATTCCATTCCATTCCATTCCACTCCATTCCAATCCATTGCATTCCACTCGGGTTGAATCCTTTCCATTCCATTCCATTGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCTGTTTCACTCCATTCCCTTGCATTCCATACCATTCCATTCCACT
chr4_49108801_49109201_76_-1.233084
ACTGGTGTTTATTCCTTTCCACTCCATTCCATTCCATTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCCATCCCATTTGTGTTGCTTCCATTCCATTCCATTCCATTCCACTCCATTCCAATCCATTGCATTCCACTCGGGTTGAATCCTTTCCATTCCATTCCATTGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCTGTTTCACTCCATTCCCTTGCATTCCATACCATTCCATTCCACTCGGGTTGATTCCATTGAGTTCCATCCCATTCCATTCCATTACATTCTATTGCATTCCATTCCATTCCATACCATTCCACTTGGGTTCATTCCATACCATTCCATTCCATTCCATTCCAT
chr4_49108801_49109201_77_-1.869203
°�iGTGTTGCTCCACTCCATTû�iCATTCCATTCCATTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCT|°h
ATCCATTACATTCCACTCGGGTTGAATCCTTTCCATTCCATTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTGAGTTCTATTCCATTCCGTTCTGTTGCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGGGTTGATTCCATTGAATTCCATCCCATTCCCTTGCATTCCATGCTATTGT
chr4_49108801_49109201_78_-2.394722
°�iGTGTTGCTCCACTCCATTû�iCATTCCATTCCATTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCT|°h
ATTACATTCCACTCGGGTTGAATCCTTTCCATTCCATTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATGCAATTCCATTCCGTTCTGTTCCACTCCATTCCATTGCATTCCATAGCATTCCATTCCACTCGGGTTGATTCCATTGAATTCC
chr4_49108801_49109201_79_-2.188785
°�iGTGTTGCTCCACTCCATTû�iCATTCCATTCCATTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCT|°h
ATTACATTCCACTCGGGTTGAATCCTTTCCATTCCATTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTACGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGGGTTGATTCCATTGAATTCCATCCCATTCCATTCCATTC
chr4_49108801_49109201_80_-1.634183
°�iGTGTTGCTCCACTCCATTû�iCATTCCATTCCATTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCT|°h
ATTACATTCCACTCGGGTTTCCTTTCCATTCCATTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCCGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGTGTTGATTCCATTGAATTCCATCCCATTCCATTCCATTCCATTCTATTGCATTCCATTCTATTCCATTCCACTCGTGTTGATTCCCTTCCATTCCATTCCATTCCATTCCACTTGGGTTGATTCCATTCCGTTCCTTTCCATTGCTTTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCGTTCC
chr4_49108801_49109201_81_-1.856032
°�iGTGTTGCTCCACTCCATTû�iCATTCCATTCCATTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCT|°h
ATTACATTCCACTCGGGTTTCCTTTCCATTCCATTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCCGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGTGTTGATTCCATTGAATTCCATCCCATTCCATTCCATTCCATTCTATTGCATTCCATTCTATTCCATTCCACTCGTGTTGATTCCCTTCCATTCCATTCCATTCCATTCCATTCCACTTGGGTTGATTCCATTCCGTTCCTTTCCATTGCTTTCCATTCCATTCCATTCTATTCC
chr4_49108801_49109201_82_-2.258810
°�iGTGTTGCTCCACTCCATTû�iCATTCCATTCCATTCCATTCGGGTTTATTCCATTTCATTCCATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCT|°h
ATTACATTCCACTCGGGTTGTCCTTTCCATTAATTCCATTCTATTCCATTCCGTTCTGTTTCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGGGTTGATTCCATTGAGTTCCATCCCATTCCATTCCATTACATTCTATTGCATTCCATTCCATTCCATACCATTCCACTTGGGTTCATTCCATACCATTCCATTCCATTCCATTCCAT
chr4_49108801_49109201_83_-2.337121
TGTTGATCATTCCATTCCATTCCATTCCACTTGGGTTGATTCCATTCCGTTCCTTTCCATTGCATTCCATTCCATTCCATTCCATTTCATTCCATTCCGTTCCTTTCCATTCGTGTTGATGCCATTCCAATCCATAC
chr4_49108801_49109201_84_-2.127957
TGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGGGTTGATTCCATTGAATTCCATCCCATTCCATTCCATTCCATTTTATTGCATTCC���CATTCTATTCCATTCCATTCT|°h
chr4_49108801_49109201_85_-2.282859
C�CCATTCCATTCCACTCGGGTTGATTCCATTGAATTCCATCCCATTCCATTCCATTCCATTCTATTGCATTCCATTCAATTCCATTGCACTCGTGTTGATTCCCTTACATTCCATTCCATTCCATTCCACTCGGGTTGATTCCATTCCTTTCCTTTCCATTGCATTCTATTC
chr4_49108801_49109201_86_-2.114431
°�iTCCATTCCGTGTTGA6CCATCCACTCCATTû�i
ATTACATTCCACTCGGGTTTTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCCGTTGCACTCCATTCCATTGCATTCCACCCCATTCCATTCCACTCGGGTTGATTCCATTGAATTCCATCCCATTCCATTCCATTCCATCCTATTGCATTCCATTCCACTCGTGTTGATTCCGTTCCATTCCATTCCATTCCATTCCACTTCGGTTGATTCCATTCCGTTCCTTTCCATTGCATTCCATTCCATTCC
chr4_49108801_49109201_87_-1.328814
°�iTCCATTCCGTGTTGA6CCATCCACTCCATTû�i
ATTACATTCCACTCGGGTTTCCTTTCCATTCCATTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCCGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGTGTTGATTCCATTGAATTCCATCCCATTCCATTCCATTCCATTCTATTGCATTCCATTCTATTCCATTCCACTCGTGTTGATTCCCTTCCATTCCATTCCATTCCATTCCACTTGGGTTGATTCCATTCCGTTCCTTTCCATTGCTTTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCGTTCC
chr4_49108801_49109201_88_-1.550663
°�iTCCATTCCGTGTTGA6CCATCCACTCCATTû�i
ATTACATTCCACTCGGGTTTCCTTTCCATTCCATTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCCGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGTGTTGATTCCATTGAATTCCATCCCATTCCATTCCATTCCATTCTATTGCATTCCATTCTATTCCATTCCACTCGTGTTGATTCCCTTCCATTCCATTCCATTCCATTCCATTCCACTTGGGTTGATTCCATTCCGTTCCTTTCCATTGCTTTCCATTCCATTCCATTCTATTCC
chr4_49108801_49109201_89_-1.401401
ACTCCATTCCATTCCATTCCATTGCATTCCATTCCATTCCATTGCATTCCATTGCATTCCATTCGATTCCATTCGGGTTTGCATTTTATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCCGTTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTTGTGTTGATACCATTCCATTCCATTCCCTTCCTTTCCATTCCATACCATTGCATACAó5GGGTTGAATCCTTTCCATTCCATTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCCATTCCATTCCACTCGGGTTGTTCCATTCCATTCCATTCCTTTCCATTCCATTCCATTCCGTTCCACTCGGCTTGA
chr4_49108801_49109201_90_-1.146128
ACTCCATTCCATTCCATTCCATTGCATTCCATTCCATTCCATTGCATTCCATTGCATTCCATTCGATTCCATTCGGGTTTGCATTTTATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCCGTTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTTGTGTTGATACCATTCCATTCCATTCTCGGATTCAATCAATTCCATTCCATTCCATTCCGTTTTGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTACACTCGGGTTGATTCCATTGAATTCCATTCCGTTCCATTCCATTCCATTCTATTGCATTCCATTCCATTCCATTATACTCGAGTTCATTCCATTCCATTCCTTTCCACTCCACTCCGGTTGATTCCATTCCATTAAATTCCATTCCATTGCATTCCATTCCATTCCAT
chr4_49108801_49109201_91_-1.146128
ACTCCATTCCATTCCATTCCATTGCATTCCATTCCATTCCATTGCATTCCATTGCATTCCATTCGATTCCATTCGGGTTTGCATTTTATTCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCCGTTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTTGTGTTGATACCATTCCATTCCATTCTCGGATTCAATCAATTCCATTCCATTCCATTCCGTTTTGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTACACTCGGGTTGATTCCATTGAATTCCATTCCGTTCCATTCCATTCCATTCTATTGCATTCCATTCCATTCCATTCTAC
chr4_49108801_49109201_92_-1.460522
ACTCCATTCCATTCCATTCCATTGCATTCCATTCCATTCCATTGCATTCCATTGCATTCCATTCGATTCCATTCGGGTTà�M�TTCCATTCCATTCCATTCCATTCCATTCCATTTGTGTTGATACCATTCCATTCCATTCCCTTCCTTTCCATTCCATACCATTGCATACAó5GGGTTGAATCCTTTCCATTCCATTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCCATTCCATTCCGTTTTGTTCCACTCCATTCCATTGCATTCCATACCATTCCA
chr4_49108801_49109201_93_-0.924279
ACTCCATTCCATTCCATTCCATTGCATTCCATTCCATTCCATTGCATTCCATTGCATTCCATTCGATTCCATTCGGGTTà�M�TTCCATTCCATTCCATTCCATTCCATTCCATTTGTGTTGATACCATTCCATTCCATTCCCTTCCTTTCCATTCCATACCATTGCATACAó5GGGTTGAATCCTTTCCATTCCATTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCCATTCCATTCCACTCGGGTTGTTCCATTCCATTCCATTCCTTTCCATTCCATTCCATTCCGTTCCACTCGGCTTGA
chr4_49108801_49109201_94_-0.669007
ACTCCATTCCATTCCATTCCATTGCATTCCATTCCATTCCATTGCATTCCATTGCATTCCATTCGATTCCATTCGGGTTà�M�TTCCATTCCATTCCATTCCATTCCATTCCATTTGTGTTGATACCATTCCATTCCATTCTCGGATTCAATCAATTCCATTCCATTCCATTCCGTTTTGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTACACTCGGGTTGATTCCATTGAATTCCATTCCGTTCCATTCCATTCCATTCTATTGCATTCCATTCCATTCCATTATACTCGAGTTCATTCCATTCCATTCCTTTCCACTCCACTCCGGTTGATTCCATTCCATTAAATTCCATTCCATTGCATTCCATTCCATTCCAT
chr4_49108801_49109201_95_-0.669007
ACTCCATTCCATTCCATTCCATTGCATTCCATTCCATTCCATTGCATTCCATTGCATTCCATTCGATTCCATTCGGGTTà�M�TTCCATTCCATTCCATTCCATTCCATTCCATTTGTGTTGATACCATTCCATTCCATTCTCGGATTCAATCAATTCCATTCCATTCCATTCCGTTTTGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTACACTCGGGTTGATTCCATTGAATTCCATTCCGTTCCATTCCATTCCATTCTATTGCATTCCATTCCATTCCATTCTAC
chr4_49108801_49109201_96_-1.643060
TGTTGATCATTCCATTCCATTCCATTCCACTTGGGTTGATTCCATTCCGTTCCTTTCCATTGCATTCCATTCCATTCCATTCCATTTCATTCCATTCCGTTCCTTTCCATTCGTGTTGATGCCATTCCAATCCATAC
chr4_49108801_49109201_97_-1.522879
CTATTCCATTCCATTCCATTCCATTCCATTTGTGTTGATTCCTTTTGCATTCCATTCCATTCCACTCCATTCCAATCCATGTTGAATCCTTTCCATTCCTTTGCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCCATTCCATTCCGTTCTGTTCCACTGCATTCATTCCATACCATTCCATTCCACTCGGGTTGATTCCATTGAATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCTATTGCATTCCATTCCATTCCATTGCAATGG
chr4_49108801_49109201_98_-0.176091
CCATTACATTCCACTCGGGTTGAATCCTTTCCATTCCGTTCCAATGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCCGTTGCACTCCATTCCATTGCATTCCACCCCATTCCATTCCACTCGGGTTGATTCCATTGAATTCCATCCCATTCCATTCCATTCCATCCTATTGCATTCCATTCCACTCGTGTTGATTCCGTTCCATTCCATTCCATTCCATTCCACTTCGGTTGATTCCATTCCGTTCCTTTCCATTGCATTCCATTCCATTCC
chr4_49108801_49109201_99_-0.176091
TCCATTCCATTCCATTCCACTCCATTCCAATCCATTACATTCCACTCGGTTTGAATCTTTTCCATTCCATTCCAATGCATTCCCTTCCATTCAATTCAACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCCGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTTCACTGTGGTTGATTCCATTGAATTCCATCCCATTCCATTCCTTTCCATTCTATTGCATTCCATTCCATTCCATTCCACTCGTGTTGATTCCCTTCCATTCCATTCCATTCCTTTCCACTTGGGTTGATTCCAT
chr4_49108801_49109201_100_-0.454932
TCCATTCCATTCCATTCCATTGCATTCCATTCCTTTCCATTCTATTCCATTCCATTCCATCCCATTTGTGTTGCTTCCATTCCATTCCATTCCATTCCACTCCATTCCAATCCATTGCATTCCACTCGGGTTGAATCCTTTCCATTCCATTCCATTGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCTGTTTCACTCCATTCCCTTGCATTCCATACCATTCCATTCCACTCGGGTTGATTCCATTGAGTTCCATCCCATTCCATTCCATTACATTCTATTGCATTCCATTCCATTCCATACCATTCCACTTGGGTTCATTCCATACCATTCCATTCCATTCCATTCCAT
chr4_49108801_49109201_101_-0.490844
TCCATTCCAAAGCATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCTATTCCATTCCGTTCCGTTCCACTCCATTCCATTGCATTCCATACCATTCCATTCCACTCGTGTTGATTCCATTGAATTCCATCCCATTCCATTCCATTCCATTCTATTGCATTCCATTCTATTCCATTCCACTCGTGTTGATTCCCTTCCATTCCATTCCATTCCATTCCACTTGGGTTGATTCCATTCCGTTCCTTTCCATTGCTTTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCGTTCC
chr4_49108801_49109201_102_0.000000
CATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTGGTGTTGATTCCATTCCATTCCATTCCACTCCTCTCCATTGCAATCCATTACATTCCACTCGGGTTGAATCCTTTCCATTCCATTCCAATGAATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCCATTCCATTCTGTTCTGTTCCACTGCATTCCATTGCATTTCATACCATTCCATTCCACTCGGTTTGATTCCATTGAATTCCATTGAATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCTATTGCATTCCATTCCATTCCATTCC
chr4_49108801_49109201_103_0.000000
TTTATTCCATTTCATTCTATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTGGTGTTGATTCCATTCCATTCCATTCCACTCCTCTCCATTGCAATCCATTACATTCCACTCGGGTTGAATCCTTTCCATTCCATTCCAATGAATTCCCTTCCATTCAATTCCACTCGGATTCAATCAATTCCATTCCATTCCATTCTGTTCTGTTCCACTGCATTCCATTGCATTTCATACCATTCCATTCCACTCGGTTTGATTCCATTGAATTCCATTGAATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCTATTGCATTCCATTCCATTCCATTCC
chr4_49108801_49109201_1
]
java.lang.ArrayIndexOutOfBoundsException: 4
at abra.ScoredContig.convertAndFilter(ScoredContig.java:53)
at abra.ReAligner.assemble(ReAligner.java:1030)
at abra.ReAligner.processRegion(ReAligner.java:1196)
at abra.ReAligner.processChromosomeChunk(ReAligner.java:339)
at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
at abra.AbraRunnable.run(AbraRunnable.java:20)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Abra2 output BAM processing

Thanks for this software. Very useful in my WES/targeted sequencing workflow!

I have a question about the processing requirements of the BAM file generated by ABRA2. Per documentation of ABRA (original version), it states "It is currently necessary to sort and index the output. At present, the mate information may not be 100% accurate. Samtools fixmate or Picard Tools FixMateInformation may optionally be used to correct this."
Is this still a requirement for the realigned BAM file generated by ABRA2? I did not see this in the documentation for ABRA2.

Thank you!

Negative Array Size Exception

Any thoughts on how to get around this error? It's executed as:
abra2-2.09.jar --in B.bam --out B_realigned.bam --ref hg19.fa --targets targets.bed --threads 12 --tmpdir ABRA_tmp_B --mad 999999 --mrr 999999

and here is the stack trace with the error:

INFO Tue Sep 26 11:10:41 EDT 2017 PROCESS_REGION_MSECS: chr3_121351913_121352018 1 0 0 0
INFO Tue Sep 26 11:10:41 EDT 2017 PROCESS_REGION_MSECS: chr3_129389372_129389772 268 7 6 0
java.lang.NegativeArraySizeException
at java.lang.AbstractStringBuilder.(AbstractStringBuilder.java:68)
at java.lang.StringBuffer.(StringBuffer.java:128)
at abra.CompareToReference2.getSequence(CompareToReference2.java:393)
at abra.KmerSizeEvaluator.getBases(KmerSizeEvaluator.java:44)
at abra.KmerSizeEvaluator.identifyMinKmer(KmerSizeEvaluator.java:98)
at abra.NativeAssembler.assembleContigs(NativeAssembler.java:268)
at abra.ReAligner.assemble(ReAligner.java:1026)
at abra.ReAligner.processRegion(ReAligner.java:1200)
at abra.ReAligner.processChromosomeChunk(ReAligner.java:336)
at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
at abra.AbraRunnable.run(AbraRunnable.java:20)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
java.lang.RuntimeException: java.lang.NegativeArraySizeException
at abra.NativeAssembler.assembleContigs(NativeAssembler.java:390)
at abra.ReAligner.assemble(ReAligner.java:1026)
at abra.ReAligner.processRegion(ReAligner.java:1200)
at abra.ReAligner.processChromosomeChunk(ReAligner.java:336)
at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
at abra.AbraRunnable.run(AbraRunnable.java:20)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NegativeArraySizeException
at java.lang.AbstractStringBuilder.(AbstractStringBuilder.java:68)
at java.lang.StringBuffer.(StringBuffer.java:128)
at abra.CompareToReference2.getSequence(CompareToReference2.java:393)
at abra.KmerSizeEvaluator.getBases(KmerSizeEvaluator.java:44)
at abra.KmerSizeEvaluator.identifyMinKmer(KmerSizeEvaluator.java:98)
at abra.NativeAssembler.assembleContigs(NativeAssembler.java:268)
... 10 more
INFO Tue Sep 26 11:10:41 EDT 2017 PROCESS_REGION_MSECS: chr3_42259734_42260134 508 20 13 0

glibc2.14

I tried to run abra2-2.19.jar but get the following error:

Caused by: java.lang.UnsatisfiedLinkError:................./lib64/libc.so.6: version `GLIBC_2.14' not found

Updating glibc without root access seems like quite a hassle and isn't advised. Is it possible to run ABRA without this version of glibc? Would it work if I built ABRA from source? I don't have experience building Java tools from source, so I will only try that if I'm fairly sure it is the solution and it is possible without sudo rights.

A previous version (ABRA0.96) did work on the system, but had some memory issues that seem to be fixed in this new version.

Hope you can help

Alternate or duplicate alignment in output

Hi,

I've got a question about duplicate reads seen in Abra2 (v. 2.23) output during regression testing. For context, I've recently replaced Abra with Abra2 in a long standing RNA/DNA variant calling workflow. The move to Abra2 was prompted by a switch to STAR for RNA alignment. All of the DNA workflow, with exception of Abra, remained the same. A handful of DNA based regression tests "failed" with slight improvements in variant allele frequencies. However, one test failed with a lost variant. After a little investigation, I found that a post-alignment counting step was balking at duplicate reads created by Abra2.

In the example below, a read pair from the input BAM is output by Abra2 as a pair plus an alternate, sense-strand read with an upstream-aligned YA tag added (YA:Z:chr8:117878503:354M16I431M).

Input Bam
---------
ATTTCCGTACAATCAACACTTAGGTTTT_molbar_1	147	chr8	117878856	30	150M	=	117878784	-222	ACTAAATTACACTCGAACACATGGGCTTTGGTTAGCTTCTTATCCCAATGGGCCGCTAGCCAAATTTTGGCCAGAGGCCCTCTTTTACTGAGAACAAAATGTGCGTAGAACATTGTTCTGGCTGGCTATGAAAACAGAAGAAAACCTAAG	GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGHHCGGFFFFGHGGGGGECGHHHGGGHHHHHGGHHHHHGGGGGHGHHHHHHFHHHHHHHHHHHHHHHHHHH!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:0T149	YS:i:-53	YT:Z:CP	AS:i:295	RG:Z:Sample_11_R1	MQ:i:30	MC:Z:73M16I40M

ATTTCCGTACAATCAACACTTAGGTTTT_molbar_1	99	chr8	117878784	30	73M16I40M	=	117878856	222	ACAATCAACAAAGTGAATAAAAATGTCAACATCAAACATACCTTTGGTGAGATGATACTCTCCACGCTGCTCTCTAAATTACACTCGAACTAAATTACACTCGAACACATGGGCTTTGGTTAGCTTCTT	GHHHHHHHHHHHHHHHHHHHHHHHHHHHFHHHHHHGHGGGGGHHHHHGGHHHHHGGGHHHGCEGGGGGHGFFFFGGCHHGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG	                                        XN:i:0	XM:i:0	XO:i:1	XG:i:16	NM:i:16	MD:Z:113	YS:i:-5	YT:Z:CP	AS:i:205	RG:Z:Sample_11_R1	MQ:i:30	MC:Z:150M

Output Bam
----------
ATTTCCGTACAATCAACACTTAGGTTTT_molbar_1	147	chr8	117878856	30	150M	=	117878784	-222	ACTAAATTACACTCGAACACATGGGCTTTGGTTAGCTTCTTATCCCAATGGGCCGCTAGCCAAATTTTGGCCAGAGGCCCTCTTTTACTGAGAACAAAATGTGCGTAGAACATTGTTCTGGCTGGCTATGAAAACAGAAGAAAACCTAAG	GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGHHCGGFFFFGHGGGGGECGHHHGGGHHHHHGGHHHHHGGGGGHGHHHHHHFHHHHHHHHHHHHHHHHHHH!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:0T149	YS:i:-53	YT:Z:CP	AS:i:295	RG:Z:Sample_11_R1	MQ:i:30	MC:Z:73M16I40M

ATTTCCGTACAATCAACACTTAGGTTTT_molbar_1	99	chr8	117878784	30	73M16I40M	=	117878857	222	ACAATCAACAAAGTGAATAAAAATGTCAACATCAAACATACCTTTGGTGAGATGATACTCTCCACGCTGCTCTCTAAATTACACTCGAACTAAATTACACTCGAACACATGGGCTTTGGTTAGCTTCTT	GHHHHHHHHHHHHHHHHHHHHHHHHHHHFHHHHHHGHGGGGGHHHHHGGHHHHHGGGHHHGCEGGGGGHGFFFFGGCHHGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG	YA:Z:chr8:117878503:354M16I431M	MC:Z:1I149M	MD:Z:113	RG:Z:Sample_11_R1	XG:i:16	NM:i:16	XM:i:0XN:i:0	XO:i:1	MQ:i:30	AS:i:205	YS:i:-5	YT:Z:CP

ATTTCCGTACAATCAACACTTAGGTTTT_molbar_1	99	chr8	117878784	30	73M16I40M	=	117878856	222	ACAATCAACAAAGTGAATAAAAATGTCAACATCAAACATACCTTTGGTGAGATGATACTCTCCACGCTGCTCTCTAAATTACACTCGAACTAAATTACACTCGAACACATGGGCTTTGGTTAGCTTCTT	GHHHHHHHHHHHHHHHHHHHHHHHHHHHFHHHHHHGHGGGGGHHHHHGGHHHHHGGGHHHGCEGGGGGHGFFFFGGCHHGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG	                                        XN:i:0	XM:i:0	XO:i:1	XG:i:16	NM:i:16	MD:Z:113	YS:i:-5	YT:Z:CP	AS:i:205	RG:Z:Sample_11_R1	MQ:i:30	MC:Z:150M

Is this behavior expected? I noticed Issue 29, posted by a former colleague of mine, but the data in that case seems to have had alternate reads/alignments in the input.

The DNA command line used for Abra2 is identical to the Abra command line, after accounting for the change from --working to --tmpdir and a larger java heap.

java -Xmx16G -jar abra2.jar --in Sample_11_R1.molbar.trimmed.deduped.masked.snv.nuc_counts.variants.bam --out Sample_11_R1.molbar.trimmed.deduped.masked.snv.nuc_counts.variants.abra2.bam --ref hg19.fa --targets tmp0BgLKG.abraroi.bed --threads 1 --tmpdir tmpCUTlEb_IndelRealigner

I've attached the Abra2 debug log which records specific processing at the alternate, upstream alignment position of chr8:117878503.

abra2_debug.log
.
.
.
At the moment, I'm planning to roll back the DNA workflow to Abra but continue using Abra2 with duplicate filtering for RNA. I assume the same read duplication is possible for RNA? And is filtering on the YA tag a reasonable solution?
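
For illustration only, here is a minimal sketch of what a YA-based duplicate filter could look like. It works on SAM text records and drops a YA-tagged record only when an untagged record with the same QNAME and FLAG is also present. The class and method names are hypothetical, and which copy is the right one to keep is an assumption that has not been confirmed for ABRA2 output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical post-processing helper; not part of ABRA2. */
final class YaDuplicateFilter {

    /** QNAME plus FLAG identifies one segment of a template. */
    static String key(String samRecord) {
        String[] f = samRecord.split("\t", 3);
        return f[0] + "\t" + f[1];
    }

    /** Optional tags start at column 12 (index 11) of a SAM record. */
    static boolean hasTag(String samRecord, String tag) {
        String[] f = samRecord.split("\t");
        for (int i = 11; i < f.length; i++) {
            if (f[i].startsWith(tag + ":")) {
                return true;
            }
        }
        return false;
    }

    /** Drops YA-tagged records that duplicate an untagged record (assumed policy). */
    static List<String> dropYaTaggedDuplicates(List<String> samLines) {
        Set<String> untaggedKeys = new HashSet<>();
        for (String line : samLines) {
            if (!line.startsWith("@") && !hasTag(line, "YA")) {
                untaggedKeys.add(key(line));
            }
        }
        List<String> kept = new ArrayList<>();
        for (String line : samLines) {
            boolean isRecord = !line.startsWith("@"); // header lines pass through
            if (isRecord && hasTag(line, "YA") && untaggedKeys.contains(key(line))) {
                continue; // an untagged copy of this segment exists
            }
            kept.add(line);
        }
        return kept;
    }

    public static void main(String[] args) {
        String tagged = "r1\t99\tchr8\t100\t30\t4M\t=\t200\t104\tACGT\tFFFF\tYA:Z:chr8:50:10M";
        String plain = "r1\t99\tchr8\t100\t30\t4M\t=\t200\t104\tACGT\tFFFF\tNM:i:0";
        for (String line : dropYaTaggedDuplicates(Arrays.asList(tagged, plain))) {
            System.out.println(line);
        }
    }
}
```

A YA-tagged record with no untagged twin is kept, so reads that ABRA2 realigned without duplicating are not lost.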

crash with java IndexOutOfBoundsException

Tried several input configurations but keep getting stuck at this error:

java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at abra.AltContigGenerator.getAltContigs(AltContigGenerator.java:273)
at abra.ReAligner.processRegion(ReAligner.java:1222)
at abra.ReAligner.processChromosomeChunk(ReAligner.java:339)
at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
at abra.AbraRunnable.run(AbraRunnable.java:20)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

workflow of data pre-processing of RNAseq

Thank you for creating this software. This seems to improve the accuracy of somatic variant calling with my RNAseq data.

I have a question about data pre-processing workflow for RNAseq data.
According to GATK best practice, data processing workflow consists of these steps;

  1. STAR 2-pass mapping
  2. MergeBamAlignment (Picard); merge with unmapped BAM
  3. MarkDuplicates (Picard)
  4. SplitNCigarReads (GATK)
  5. BaseRecalibrator & ApplyBQSR (GATK)

Because I want to use non-GATK variant caller such as VarScan2, I'd like to perform indel realignment using ABRA2 in addition to GATK workflow.
Is it necessary to perform SplitNCigarReads and BaseRecalibration after ABRA2? (Can I use ABRA2 between step3 and step4 described above?)

Thanks!

Using abra/abra2 for local assembly of pacbio reads?

Hi

Just stumbled on your interesting project and was wondering if you had thoughts regarding its applicability to pacbio DNA reads. Essentially, I have 13x WGS pacbio data for a mouse strain and, having tried to map to mm10, I get a large gap of little/no mapping in a locus of interest. The gap is approx 400Kb wide and my average read length is 15Kb.

I want to find out whether this is a large deletion or maybe just a repetitive region, or a combination of both. I thought I'd try locally assembling the pacbio reads to find out whether, without the mm10 reference, they assemble in an uninterrupted contig or not. The latter would perhaps indicate a very repetitive region.

Would abra/abra2 be of help in this scenario? Thanks in any case!

Invalid read written

Hi,

I have re-analyzed our samples with ABRA2 (they were initially processed with ABRA) and encountered one case where it runs through, but the output BAM file is not valid:

#make indel_realign
java -jar /mnt/share/opt/abra2_2.17/abra2-2.17.jar --in SIBPlatte2A11_01.bam --out SIBPlatte2A11_01_realign.bam --target-kmers kmers.bed --threads 4 --ref /tmp/local_ngs_data//GRCh37.fa --mer 0.1 > indel_realign.log 2>&1
samtools index SIBPlatte2A11_01_realign.bam
[E::bam_read1] CIGAR and query sequence lengths differ for D00388:195:CAUD5ANXX:6:2208:3316:13407
samtools index: failed to create index for "SIBPlatte2A11_01_realign.bam"

You can download a minimal example that reproduces the problem here:
https://drive.google.com/open?id=1IGFE24lNmTrElvCLqwHPEcdf4-zrwwMk
You just have to adapt the paths for abra2, samtools and GRCh37.

Best,
Marc
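
The samtools message above refers to the SAM rule that the lengths of the query-consuming CIGAR operators (M, I, S, = and X) must add up to the length of SEQ. A minimal sketch of that check, useful for locating offending records before indexing, could look like this. The class and method names are hypothetical and not part of ABRA2 or samtools.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical validation helper; not part of ABRA2 or samtools. */
final class CigarLengthCheck {

    private static final Pattern OP = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Sums the lengths of operators that consume read bases: M, I, S, = and X. */
    static int queryLength(String cigar) {
        int total = 0;
        Matcher m = OP.matcher(cigar);
        while (m.find()) {
            char op = m.group(2).charAt(0);
            if (op == 'M' || op == 'I' || op == 'S' || op == '=' || op == 'X') {
                total += Integer.parseInt(m.group(1));
            }
        }
        return total;
    }

    /** A record with CIGAR or SEQ set to "*" carries nothing to compare. */
    static boolean isConsistent(String cigar, String seq) {
        if (cigar.equals("*") || seq.equals("*")) {
            return true;
        }
        return queryLength(cigar) == seq.length();
    }

    public static void main(String[] args) {
        System.out.println(queryLength("73M16I40M"));
        System.out.println(isConsistent("5M", "ACGT"));
    }
}
```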

abra 2.07 removes half of pair completely

Thank you for writing abra. I am currently in the process of upgrading to abra 2.07 from abra 1.XXX.

For many of my samples, abra 2 works OK, but for at least one, it removes one half of a paired read completely. This leaves a read with RNEXT/PNEXT set, but its mate doesn't appear in the file.

This is uncaught by default, and unfixable generally, for picard FixMateInformation. This can cause various picard tools to fail downstream of any abra operation.

Is this behaviour by design? Recommendations to make this bam work, short of removing the erroneous reads by hand? Thanks! CCH

Option to prevent reads ending with complex indel

When using manta downstream of ABRA2 realigned alignments, manta will abort if it encounters an alignment with a CIGAR string ending in a combination of D/I operations. As they are not planning to change this behavior (Illumina/manta#137), could you implement an option to soft-clip the complex indel at the end?

Example data:
in.sam.txt
out.sam.txt

Output generated with:

java -jar abra2-2.18.jar --in in.bam --out out.bam --threads 1 --mer 0.1 --mad 250 --ref GRCh37.fa

Read with ID 'NB501582:124:HLMWFBGX7:3:21604:17458:10789' will overlap/end with the complex indel (CIGAR: 73M4D2I).
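
To make the request concrete, here is a minimal sketch of the CIGAR rewrite being asked for. It is not ABRA2 code, and the names are hypothetical. Trailing insertions become soft-clipped bases and trailing deletions are dropped, so the query length is unchanged. Because only the 3' end of the alignment is touched, POS stays the same. Tags such as NM and MD would also need updating, which this sketch does not attempt, and CIGARs ending in a hard clip are returned unchanged.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of the requested behavior; not part of ABRA2. */
final class TrailingIndelClipper {

    private static final Pattern OP = Pattern.compile("(\\d+)([MIDNSHP=X])");

    static String softClipTrailingIndel(String cigar) {
        Deque<int[]> ops = new ArrayDeque<>(); // each entry: {length, operator char}
        Matcher m = OP.matcher(cigar);
        while (m.find()) {
            ops.addLast(new int[] {Integer.parseInt(m.group(1)), m.group(2).charAt(0)});
        }
        int clipped = 0;
        while (!ops.isEmpty()) {
            int[] last = ops.peekLast();
            char op = (char) last[1];
            if (op == 'S' || op == 'I') {
                clipped += last[0]; // read bases are kept, but as soft clip
            } else if (op != 'D') {
                break;              // reached an operator that stays as it is
            }
            ops.removeLast();       // a trailing D consumes no read bases
        }
        StringBuilder out = new StringBuilder();
        for (int[] o : ops) {
            out.append(o[0]).append((char) o[1]);
        }
        if (clipped > 0) {
            out.append(clipped).append('S');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(softClipTrailingIndel("73M4D2I")); // prints 73M2S
    }
}
```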

Compilation on Mac

Hi, I found the tool useful and easy to use, therefore I wanted to include it in a pipeline that I would like to make available in Mac and Linux. The compiler gives this error:

$ make
rm -rf target
mvn clean
[INFO] Scanning for projects...
[INFO]
[INFO] ----------------------------< abra2:abra2 >-----------------------------
[INFO] Building abra 2.19
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ abra2 ---
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  0.303 s
[INFO] Finished at: 2019-08-15T23:12:11+02:00
[INFO] ------------------------------------------------------------------------
mkdir target
clang -g -O2 -Isrc/main/c -I/Users/pbiodiv/miniconda2/include -I/Users/pbiodiv/miniconda2/include/linux -shared -fPIC src/main/c/assembler.cpp src/main/c/sg_aligner.cpp -o target/libAbra.so
In file included from src/main/c/assembler.cpp:11:
src/main/c/sparsehash/sparse_hash_map:94:10: fatal error: 'tr1/functional' file not found
#include HASH_FUN_H                 // for hash<>
         ^
src/main/c/sparsehash/internal/sparseconfig.h:10:20: note: expanded from macro 'HASH_FUN_H'
#define HASH_FUN_H <tr1/functional>
                   ^~~~~~~~~~~~~~~~
1 error generated.
make: *** [native] Error 1

After doing some research, I found that this is caused by the use of libstdc++, which has been deprecated on Mac since version 10.8. Is it possible to modify ABRA2 so that it compiles with libc++, or would that basically mean rewriting it?

Thanks

Edgardo

Abra and log4j

Hi,

Abra is being picked up by a log4j vulnerability detection tool. I would guess the threat level is low (not web based), but am not sure.

Is there a version without Log4j, or with a Log4j version high enough not to be affected?

java -jar log4j-detector-2021.12.13.jar tools/
-- Analyzing paths (could take a long time).
-- Note: specify the '--verbose' flag to have every file examined printed to STDERR.
/mnt/ngsnfs/tools/abra2/abra2-2.11.jar contains Log4J-2.x   >= 2.0-beta9 (< 2.10.0) _VULNERABLE_ :-(


Thanks.

Crash when target region BED file contains header lines

Hi,

I found a small bug when using a BED file with browser/track header lines for k-mer size calculation, e.g. this file:

browser position chr1:12081-12251
track name="Covered" description="Agilent SureSelect DNA - SureSelect Clinical Research Exome V2 -Genomic regions covered by probes" color=0,0,128
chr1 12080 12251
chr1 12595 12802
chr1 13163 13658

I then tried to calculate k-mer sizes:

java -Xmx16G -cp /mnt/share/opt/abra2_2.05/abra2-2.05.jar abra.KmerSizeEvaluator ... [bed file]

It then crashes when trying to access the second tab-separated entry of the first line, which does not exist:

INFO Thu Nov 09 14:47:19 CET 2017 Loading reference map: /tmp/local_ngs_data//GRCh37.fa
INFO Thu Nov 09 14:49:58 CET 2017 Done loading ref map. Elapsed secs: 158
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at abra.RegionLoader.load(RegionLoader.java:49)
at abra.ReAligner.getRegions(ReAligner.java:839)
at abra.KmerSizeEvaluator.run(KmerSizeEvaluator.java:50)
at abra.KmerSizeEvaluator.main(KmerSizeEvaluator.java:240)

This is not a big problem since one can remove the lines, but I think ABRA should handle this more gracefully, e.g. by ignoring all lines with fewer than three tab-separated parts.

Best,
Marc
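
A minimal sketch of the suggested handling, assuming a tab-separated BED file: header, comment and blank lines are skipped, and so is any line with fewer than three fields. The class and method names are hypothetical. This is not the actual code in abra.RegionLoader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical BED pre-filter; not the actual abra.RegionLoader code. */
final class BedLineFilter {

    /** True for lines that carry no interval: blank, comment, browser or track. */
    static boolean isHeaderOrBlank(String line) {
        String t = line.trim();
        return t.isEmpty() || t.startsWith("#")
                || t.startsWith("browser") || t.startsWith("track");
    }

    /** Keeps only lines with at least three tab-separated fields (chrom, start, end). */
    static List<String[]> parseIntervals(List<String> lines) {
        List<String[]> intervals = new ArrayList<>();
        for (String line : lines) {
            if (isHeaderOrBlank(line)) {
                continue;
            }
            String[] fields = line.split("\t");
            if (fields.length < 3) {
                continue; // malformed line: skip rather than index past the end
            }
            intervals.add(fields);
        }
        return intervals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "browser position chr1:12081-12251",
                "track name=\"Covered\" color=0,0,128",
                "chr1\t12080\t12251",
                "chr1\t12595\t12802");
        for (String[] f : parseIntervals(lines)) {
            System.out.println(f[0] + ":" + f[1] + "-" + f[2]);
        }
    }
}
```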

Realigned output bam TLEN field plus/minus sign when FLAG == 147

Hi there!
We have recently become very interested in abra2 for fast and accurate reassembly/realignment of indels.
When using other tools with the realigned BAM from abra2, we discovered the following potential issue. Please see this example read pair:

A00337:46:HHGVNDMXX:1:1441:31946:25316:CTGCAGTA:CTGCAGTA:GA:AA  147     chr16   3727646 60      139M    =       3727648 139     TTCCTAGATGCCTGGATTTTCAGTACAAAAGGTCCAAGAACATGAAAGGGGAAAGGTGATGCTCTCACAATGCTACAAGCCCTCCACAAACTTCTCTAGCGTGTCCCCCGTGGTGTCCCCGACCAGGGACAGTTCGCTG     :FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF     YA:Z:chr16:3727129:964M MD:Z:48A88      RG:Z:4  NM:i:3  YM:i:2  YO:Z:chr16:3727648:-:2S137M     AS:i:132        XS:i:23 YX:i:3
A00337:46:HHGVNDMXX:1:1441:31946:25316:CTGCAGTA:CTGCAGTA:GA:AA  99      chr16   3727648 60      8S130M  =       3727646 -139    TTTTTATTC
CTAGATGCCTGGATTTTCAGTACAAAAGGTCCAAGAACATGAAAGGGGAAAGGTGATGCTCTCACAATGCTACAAGCCCTCCACAAACTTCTCTAGCGTGTCCCCCGTGGTGTCCCCGACCAGGGACAG      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF      YA:Z:chr16:3727129:964M MD:Z:48A81      RG:Z:4  NM:i:1  AS:i:125        XS:i:23

In this example, for the FLAG == 147 read, the POS column (col4, here 3727646) is less than PNEXT (col 8, here 3727648), and the TLEN (col 9, here 139) receives a plus sign.

However, when I check other BAM files that were not realigned/reassembled, in this situation (FLAG == 147 & POS < PNEXT), TLEN always has a minus sign.

According to the SAM format specification, for TLEN, the leftmost segment has a plus sign and the rightmost has a minus sign. For FLAG == 147 (second of a pair / reverse-complemented), when POS < PNEXT, the segment should still be the rightmost.

Please don't hesitate to let me know if the TLEN sign should be modified.
Thanks a lot!!

Julie

YO tag format

Thank you for your work in developing this tool,

We are wondering about samples which have YO:Z:N/A for the YO tag. Is this expected behavior and if so would you be able to provide any detail on the reason for an N/A value?

Thank you!

java.lang.IllegalArgumentException: Invalid reference index -1

This is my command:

java -Xmx1000g -jar /.../abra2-2.24-0/abra2.jar --sa --cons --amq 32 --in in.bam --out out.bam --ref ~/.../XENLA_9.2_genome.fa --threads 50 --targets /.../X9.2_opt_SORTED.bed --tmpdir /.../tmp > abra_s10.log

After running for 30 minutes to an hour (depending on the settings), it stops with the message below. Any idea why this is happening?

java.lang.IllegalArgumentException: Invalid reference index -1
at htsjdk.samtools.QueryInterval.<init>(QueryInterval.java:24)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:528)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:390)
at abra.MultiSamReader.<init>(MultiSamReader.java:40)
at abra.ReAligner.processChromosomeChunk(ReAligner.java:262)
at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
at abra.AbraRunnable.run(AbraRunnable.java:20)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
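
The trace would be consistent with a contig name being queried that is not present in the BAM header, although that cause is an assumption and has not been confirmed for this run. One way to rule it out is to compare the contig names in the targets BED against the sequence names in the BAM header. A minimal sketch follows, with hypothetical class and method names and with the header names supplied by the caller (for example, taken from the @SQ lines).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical diagnostic helper; not part of ABRA2 or htsjdk. */
final class ContigNameCheck {

    /** Returns BED contig names that do not occur in the given header names. */
    static Set<String> unknownContigs(List<String> bedLines, Set<String> headerNames) {
        Set<String> unknown = new LinkedHashSet<>();
        for (String line : bedLines) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#")
                    || t.startsWith("track") || t.startsWith("browser")) {
                continue; // not an interval line
            }
            String contig = t.split("\\s+")[0];
            if (!headerNames.contains(contig)) {
                unknown.add(contig);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        // Made-up names, for illustration only.
        Set<String> header = new HashSet<>(Arrays.asList("ctgA", "ctgB"));
        List<String> bed = Arrays.asList("ctgA\t10\t20", "ctgC\t5\t50");
        System.out.println(unknownContigs(bed, header)); // prints [ctgC]
    }
}
```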

java.lang.IllegalArgumentException: Contig too long

Hello,
I am using abra2 as an indel realigner in my workflow, after alignment with bwa and before variant calling with freebayes. It has worked quite nicely until now. In my most recent run, the workflow crashed at the abra2 step, apparently because abra2 hit a contig length limit. The tail of my error log is attached. MAX_CONTIG_LEN seems hardcoded at 2000-1, while the contig in question is 2035. I am now wondering what my options are to get my workflow to run. Can I override this somehow, or skip it? Why is there a length limit in the first place? Thanks a lot! This is somewhat urgent.

Num reads to scan for read length/insert length/mapq

Hi Lisle,
I'm applying Abra2 to RNA data in which the read lengths vary dramatically from the first record to the last.
I noticed in the Abra logs that it had underestimated the read length; this seems to be caused by a hardcoded number of reads to scan in ReAligner.java (see line 1455).

Would it be possible to allow the user to pass in the number of reads to scan for estimating the various read-specific quantities? Or perhaps a boolean indicating that all reads should be scanned?
Thanks,
Dan

Explosion on CIGARS using = and X operators

I was trying out ABRA2 on a set of alignments produced by RTG, which use the = and X CIGAR operators to indicate match/mismatch instead of the generic M operator. These CIGARs make ABRA fail with:

java.lang.UnsupportedOperationException: Unhandled cigar operator: = in: 2195212 : 124=
at abra.SAMRecordWrapper.getSpanningRegions(SAMRecordWrapper.java:203)
at abra.Feature.findAllOverlappingRegions(Feature.java:126)
at abra.ReAligner.processChromosomeChunk(ReAligner.java:314)
at abra.ReAlignerRunnable.go(ReAlignerRunnable.java:21)
at abra.AbraRunnable.run(AbraRunnable.java:20)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

BTW, this is using the 2.12 release.
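
For reference, in the SAM specification the = and X operators consume both query and reference bases, exactly like M. The sketch below shows one way to compute the reference span of a CIGAR string while treating =, X and M alike. It is an illustration only, not the code in SAMRecordWrapper.getSpanningRegions; the class and method names are hypothetical, and it parses the CIGAR as plain text instead of using htsjdk.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CigarSpanSketch {

    // One CIGAR element: a length followed by a SAM operator character.
    private static final Pattern ELEMENT = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Returns the number of reference bases covered by the given CIGAR string. */
    static int referenceSpan(String cigar) {
        Matcher m = ELEMENT.matcher(cigar);
        int span = 0;
        int consumed = 0;
        while (m.find()) {
            // Reject any characters that are not part of a valid element.
            if (m.start() != consumed) {
                throw new IllegalArgumentException("Malformed CIGAR: " + cigar);
            }
            consumed = m.end();
            int length = Integer.parseInt(m.group(1));
            switch (m.group(2).charAt(0)) {
                case 'M': case '=': case 'X': // match or mismatch: consumes reference
                case 'D': case 'N':           // deletion or skipped region: consumes reference
                    span += length;
                    break;
                default:                      // I, S, H, P do not consume reference
                    break;
            }
        }
        if (consumed != cigar.length()) {
            throw new IllegalArgumentException("Malformed CIGAR: " + cigar);
        }
        return span;
    }
}
```

Under this handling, the 124= CIGAR from the trace above covers the same reference span as 124M would.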

Increase MAX_SAMPLES to 12 or 16

Hello, thanks for the amazing work with Abra! We have 16 whole-exome BAMs from the same patient, and running them through Abra 2.16 fails with the SIGSEGV (0xb) [libc.so.6] error detailed in the log below. Our fix was to increase MAX_SAMPLES to 16 in this line.

hs_err_pid54727.log.txt

Would you consider increasing this constant for your next release? What are the downsides? Thanks!
