
strelka's Introduction

Strelka2 Small Variant Caller

Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs. The germline caller employs an efficient tiered haplotype model to improve accuracy and provide read-backed phasing, adaptively selecting between assembly and a faster alignment-based haplotyping approach at each variant locus. The germline caller also analyzes input sequencing data using a mixture-model indel error estimation method to improve robustness to indel noise. The somatic calling model improves on the original Strelka method for liquid and late-stage tumor analysis by accounting for possible tumor cell contamination in the normal sample. A final empirical variant re-scoring step using random forest models trained on various call quality features has been added to both callers to further improve precision.

Compared with submissions to the recent PrecisionFDA Consistency and Truth challenges, the average indel F-score for Strelka2 running in its default configuration is 3.1% and 0.08% higher, respectively, than the best challenge submissions. Runtime on a 28-core server is ~40 minutes for 40x WGS germline analysis and ~3 hours for a 110x/40x WGS tumor-normal somatic analysis. More details on Strelka2 methods and benchmarking for both germline and somatic calling are described in:

Kim, S., Scheffler, K. et al. (2018) Strelka2: fast and accurate calling of germline and somatic variants. Nature Methods, 15, 591-594. doi:10.1038/s41592-018-0051-x

...and the corresponding open-access pre-print

Strelka accepts input read mappings from BAM or CRAM files, and optionally candidate and/or forced-call alleles from VCF. It reports all small variant predictions in VCF 4.1 format. Germline variant reporting uses the gVCF conventions to represent both variant and reference call confidence. For best somatic indel performance, Strelka is designed to be run with the Manta structural variant and indel caller, which provides additional indel candidates up to a given maximum indel size (49 by default). By design, Manta and Strelka run together with default settings provide complete coverage over all indel sizes (in addition to SVs and SNVs). See the user guide for a full description of capabilities and limitations.

Getting Started

To get started installing and using Strelka, please consult the quick start guide.

Data Analysis and Interpretation

After completing installation and reviewing the quick start guide, see the Strelka user guide for full instructions on how to run Strelka, interpret results and estimate hardware requirements/compute cost, in addition to a high-level methods overview.

License

Strelka source code is provided under the GPLv3 license. Strelka includes several third-party packages provided under other open source licenses; please see COPYRIGHT.txt for additional details.

Strelka Code Development

For Strelka code development and debugging details, see the Strelka developer guide. This includes details on Strelka's development protocols, special build instructions, recommended workflows for investigating calls, and internal documentation details.

strelka's People

Contributors

ahalpern, ctsa, eunhonoh, haifangge, jdlogicman, kscheffler, mbekritsky, mkallberg, nnariai, olest, pkrusche, x-chen, yeonbinkim

strelka's Issues

Strelka2 crashes on targeted sequencing experiment

Hello,

I've read your guide, and I knew beforehand that Strelka2 is not intended for large-scale germline analysis. I decided to give it a try anyway (I have a set of around 800 case samples for a small panel of 40 genes), and I would like to know whether the error I'm facing is related to this known limitation or to something else I'm missing.

Please find attached the last 50 lines of the run log (the full log is 35 MB).
strelka2.log

Thanks in advance,
Best,
Pedro

vcf output has either very high tumor and low normal coverage or the other way around

Hello!

I am using Strelka2 to call indels. The indels called by other software and by Strelka1 all look normal. However, when I look into the output of Strelka2, all variants have either very high tumor and very low normal coverage, or the other way around. They also have almost no overlap with the indels called by other software. After some obvious filtering, no valid mutations called by Strelka2 remain.

Please see attached file. It looks like there is a bug in strelka2.

somatic.indels.txt

Thanks!

Tao

Different result after trimming primers

I am calling forced genotypes (forcedGT) with Strelka.

On the first try, I trimmed only the adapters from the reads, and the given SNP was called correctly.

On the second try, I also trimmed the 5'/3' primers from read 1 and read 2. This time the site is reported as LowGQX with a depth of only 4, whereas the depth in the first call was 37877.

I viewed the BAM alignment in IGV, and the reads are aligned right there. Why does Strelka give such a different result?

This SNP is not the only one; I have seen many SNPs with the same problem.

About hard filtering in Strelka2

Hi Chris,

Ⅰ. When I checked the "PASS" results, I found that QSS_NT or QSI_NT is a constant 3070 for some variants (SNP or indel). Should those variants be filtered out? I would also like to know the formula behind these values.
Ⅱ. I want to apply some hard filters to the variant results from Strelka2, but I cannot find the SiteFilteredBasecallFrac, SpanningDeletionFraction and IndelWindowFilteredBasecallFrac parameters in the VCF FORMAT fields. How can I apply these filters?

Thanks a lot

Realigned BAM output producing an error

Hello,

I tried running the latest version with option for outputting realigned BAM set to 1 but sadly there is an error in the samtools command:

2017-10-09T14:41:23.315974622Z [2017-10-09T14:41:23.315688] [beb8080319ee] [13_1] [WorkflowRunner] [ERROR] Failed to complete command task: 'CallGenome+sortRealignedSegment_normal_chromId_080_GL000193_1_0000' launched from sub-workflow 'CallGenome', error code: 1, command: '"/opt/strelka-2.8.3/libexec/samtools" sort "StrelkaSomaticWorkflow/workspace/genomeSegment.tmpdir/normal.chromId_080_GL000193_1_0000.unsorted.realigned.bam" "StrelkaSomaticWorkflow/workspace/genomeSegment.tmpdir/normal.chromId_080_GL000193_1_0000.realigned" && rm -f "StrelkaSomaticWorkflow/workspace/genomeSegment.tmpdir/normal.chromId_080_GL000193_1_0000.unsorted.realigned.bam"'
...Some lines omitted...
[CallGenome+sortRealignedSegment_normal_chromId_080_GL000193_1_0000] Usage: samtools sort [options...] [in.bam]
...

As far as I can see the command line for samtools is not correct. Could it be that the -o argument is missing for the output filename or at least a redirection >?

Thank you in advance.
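
For reference, the two calling conventions of `samtools sort` can be sketched as below. Older samtools releases took an output prefix as a second positional argument (the form used in the failing command above), while newer releases print the `samtools sort [options...] [in.bam]` usage shown in the log and take the output file through `-o`. The helper name `sort_command` and its parameters are hypothetical and are not part of the Strelka workflow code; which samtools version the bundled binary corresponds to cannot be read from the log excerpt.

```python
def sort_command(samtools, unsorted_bam, output_prefix, legacy=False):
    """Build an argv list for 'samtools sort' (hypothetical helper).

    legacy=True  -> samtools sort <in.bam> <out.prefix>   (old positional form)
    legacy=False -> samtools sort -o <out.bam> <in.bam>   (output given via -o)
    """
    if legacy:
        # The old form appends ".bam" to the prefix itself.
        return [samtools, "sort", unsorted_bam, output_prefix]
    return [samtools, "sort", "-o", output_prefix + ".bam", unsorted_bam]


if __name__ == "__main__":
    print(" ".join(sort_command("samtools", "normal.unsorted.realigned.bam",
                                "normal.realigned")))
```

Passing the list to `subprocess.check_call` avoids shell quoting issues with the long workspace paths.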

Provide genotypes (GT) for somatic variant calls

Chris and Sangtae;
This is a follow-up to a bcbio discussion (bcbio/bcbio-nextgen#2112 (comment)) which I thought could use its own thread. In bcbio we're converting NT and SGT into FORMAT genotypes (GT) for the tumor and normal. Many downstream tools, such as validation tools, make use of these, so having a consistent output from different callers is very useful.

In mapping these over we've identified some edge cases where we can't cleanly do this due to 'SGT' not referencing alleles found in the VCF. Would it be possible to have strelka fill in all possible alternative alleles and place genotypes directly so we don't need to post-process?

I recognize there are some trickier edge cases with multiallelic positions and multiple low frequency reads. We've also been discussing these with MuTect2 output (broadinstitute/gatk#3564) and suggested normalizing to report multiple variants at these positions.

Thanks for considering this and starting the discussion.
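
To make the edge case concrete, here is a minimal sketch of mapping an SNV `SGT` value onto allele indices. It is not the bcbio implementation; the function name `sgt_to_genotypes` is hypothetical, and it assumes the SNV form of `SGT` (two bases for the normal, `->`, two bases for the tumor). Bases that are absent from REF and ALT are emitted as `.`, which is exactly the situation that cannot be converted cleanly.

```python
def sgt_to_genotypes(ref, alts, sgt):
    """Map an SNV SGT string such as 'CC->CT' to (normal_gt, tumor_gt).

    Alleles not present in REF/ALT become '.', since no index exists for them.
    """
    alleles = [ref] + list(alts)

    def to_gt(bases):
        indices = [str(alleles.index(b)) if b in alleles else "." for b in bases]
        # Known allele indices first (numerically), missing alleles last.
        indices.sort(key=lambda x: (x == ".", int(x) if x != "." else 0))
        return "/".join(indices)

    normal, tumor = sgt.split("->")
    return to_gt(normal), to_gt(tumor)


if __name__ == "__main__":
    print(sgt_to_genotypes("C", ["T"], "CC->CT"))
    print(sgt_to_genotypes("C", ["T"], "AC->AC"))
```

The second call shows the problem case: `A` is not among the alleles in the record, so both genotypes are only partially defined.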

Can Strelka2 export variants in haplotype form?

Hi, strelka team

I was really impressed by the model and performance of Strelka in calling somatic mutations, but I have a question about the form of the variants in the VCF file. Can Strelka report MNPs or complex variants such as:
--MNP--
chr1 100 AG GC
--Complex--
chr7 200 TTTCA AT

Thanks!

Error running demo germline data

I installed the software and I was able to run the demo data for the somatic version with no errors. When I tried to run it for the germline version I got an error.

Attached you can find the error log file reported.

workflow.error.log.txt

SGT and ref/alt don't match

Just want to ask if this is correct behaviour?
Running with --forceGT
This variant is not in the input VCF

19 3120981 . C T . LowEVS SOMATIC;QSS=2;TQSS=2;NT=ref;QSS_NT=2;TQSS_NT=2;SGT=AC->AC;DP=17;MQ=58.97;MQ0=0;ReadPosRankSum=0;SNVSB=0;SomaticEVS=0.75 DP:FDP:SDP:SUBDP:AU:CU:GU:TU 4:0:0:0:0,1:4,7:0,0:0,0 6:0:0:0:0,0:4,7:0,0:2,2
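
One way to inspect such a record is to unpack the per-base count fields (AU, CU, GU, TU), each of which holds a tier1,tier2 pair. The sketch below uses hypothetical helper names and assumes the somatic allele frequency convention of tier1 ALT count divided by tier1 REF plus ALT counts; treat that formula as an assumption to check against the user guide. In the record above, `A` is supported only by a single tier2 read in the normal sample, which may or may not be related to `A` appearing in SGT.

```python
def base_counts(format_field, sample_field):
    """Return {base: (tier1, tier2)} from the AU/CU/GU/TU sample values."""
    fields = dict(zip(format_field.split(":"), sample_field.split(":")))
    counts = {}
    for base in "ACGT":
        tier1, tier2 = fields[base + "U"].split(",")
        counts[base] = (int(tier1), int(tier2))
    return counts


def tier1_alt_frequency(ref, alt, counts):
    """Tier1 ALT / (tier1 REF + tier1 ALT); None when both counts are zero."""
    ref_n, alt_n = counts[ref][0], counts[alt][0]
    total = ref_n + alt_n
    return alt_n / total if total else None


if __name__ == "__main__":
    fmt = "DP:FDP:SDP:SUBDP:AU:CU:GU:TU"
    normal = base_counts(fmt, "4:0:0:0:0,1:4,7:0,0:0,0")
    tumor = base_counts(fmt, "6:0:0:0:0,0:4,7:0,0:2,2")
    print(normal, tumor, tier1_alt_frequency("C", "T", tumor))
```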

Compile error when building version 2.9.0

I'm trying to build Strelka 2.9.0 from source, but I get a compile error:

$ git clone https://github.com/Illumina/strelka.git && cd strelka
$ mkdir bin
$ mkdir build && cd build
$ export CC=gcc-7
$ export CXX=g++-7
$ ../configure --jobs=4 --prefix=../bin

cmake version 3.9.6 (>= 2.8.12) is already installed
Using existing cmake: cmake
-- ==== Initializing project cmake configuration ====
-- BUILD_TYPE: Release
-- CMAKE_PARALLEL: 4
-- TARGET_ARCHITECTURE: x86_64
-- install prefix: /data/apps/strelka/bin
-- Boost version: 1.65.1
-- Building external tools
-- zlib found
-- Verifying target directories access
-- No ccache found
-- Using compiler: g++ version 7
-- Building in developer mode: treating compiler warnings as errors
-- Adding c++ library subdirectory: blt_util
-- Adding c++ test subdirectory:    blt_util/test
-- Adding c++ library subdirectory: common
-- Adding c++ library subdirectory: htsapi
-- Adding c++ test subdirectory:    htsapi/test
-- Adding c++ library subdirectory: appstats
-- Adding c++ library subdirectory: options
-- Adding c++ library subdirectory: errorAnalysis
-- Adding c++ library subdirectory: calibration
-- Adding c++ library subdirectory: blt_common
-- Adding c++ test subdirectory:    blt_common/test
-- Adding c++ library subdirectory: assembly
-- Adding c++ test subdirectory:    assembly/test
-- Adding c++ library subdirectory: alignment
-- Adding c++ test subdirectory:    alignment/test
-- Adding c++ library subdirectory: starling_common
-- Adding c++ test subdirectory:    starling_common/test
-- Adding c++ library subdirectory: strelka_common
-- Adding c++ library subdirectory: DumpSequenceErrorCounts
-- Adding c++ library subdirectory: EstimateParametersFromErrorCounts
-- Adding c++ library subdirectory: EstimateVariantErrorRates
-- Adding c++ library subdirectory: GetChromDepth
-- Adding c++ library subdirectory: GetRegionDepth
-- Adding c++ library subdirectory: GetSequenceErrorCounts
-- Adding c++ library subdirectory: MergeRunStats
-- Adding c++ library subdirectory: MergeSequenceErrorCounts
-- Adding c++ library subdirectory: starling
-- Adding c++ test subdirectory:    starling/test
-- Adding c++ library subdirectory: strelka
-- Adding c++ test subdirectory:    strelka/test
-- Adding c++ library subdirectory: strelkaNoiseExtractor
-- Adding c++ program subdirectory: bin
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE) 
-- Doxygen: DOXYGEN_EXECUTABLE-NOTFOUND. Dot: DOXYGEN_DOT_EXECUTABLE-NOTFOUND.
-- Configuring done
-- Generating done
-- Build files have been written to: /data/apps/strelka/build

The build directory /data/apps/strelka/build was configured successfully

Type "make -C /data/apps/strelka/build" to build

$ make -j4 install

...

[ 30%] Building CXX object src/c++/lib/errorAnalysis/CMakeFiles/strelka_errorAnalysis.dir/SequenceErrorCounts.cpp.o
In file included from /usr/local/include/boost/serialization/map.hpp:25:0,
                 from /data/apps/strelka/src/c++/lib/errorAnalysis/BasecallErrorCounts.hh:27,
                 from /data/apps/strelka/src/c++/lib/errorAnalysis/SequenceErrorCounts.hh:26,
                 from /data/apps/strelka/src/c++/lib/errorAnalysis/SequenceErrorCounts.cpp:24:
/usr/local/include/boost/serialization/access.hpp: In instantiation of ‘static void boost::serialization::access::serialize(Archive&, T&, unsigned int) [with Archive = boost::archive::binary_iarchive; T = boost::array<unsigned int, 6>]’:
/usr/local/include/boost/serialization/serialization.hpp:68:22:   required from ‘void boost::serialization::serialize(Archive&, T&, unsigned int) [with Archive = boost::archive::binary_iarchive; T = boost::array<unsigned int, 6>]’
/usr/local/include/boost/serialization/serialization.hpp:126:14:   required from ‘void boost::serialization::serialize_adl(Archive&, T&, unsigned int) [with Archive = boost::archive::binary_iarchive; T = boost::array<unsigned int, 6>]’
/usr/local/include/boost/archive/detail/iserializer.hpp:188:40:   required from ‘void boost::archive::detail::iserializer<Archive, T>::load_object_data(boost::archive::detail::basic_iarchive&, void*, unsigned int) const [with Archive = boost::archive::binary_iarchive; T = boost::array<unsigned int, 6>]’
/usr/local/include/boost/archive/detail/iserializer.hpp:120:1:   required from ‘class boost::archive::detail::iserializer<boost::archive::binary_iarchive, boost::array<unsigned int, 6> >’
/usr/local/include/boost/archive/detail/iserializer.hpp:410:13:   required from ‘static void boost::archive::detail::load_non_pointer_type<Archive>::load_standard::invoke(Archive&, const T&) [with T = boost::array<unsigned int, 6>; Archive = boost::archive::binary_iarchive]’
/usr/local/include/boost/archive/detail/iserializer.hpp:462:22:   [ skipping 175 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]
/usr/local/include/boost/archive/detail/iserializer.hpp:625:18:   required from ‘void boost::archive::load(Archive&, T&) [with Archive = boost::archive::binary_iarchive; T = IndelErrorCounts]’
/usr/local/include/boost/archive/detail/common_iarchive.hpp:66:22:   required from ‘void boost::archive::detail::common_iarchive<Archive>::load_override(T&) [with T = IndelErrorCounts; Archive = boost::archive::binary_iarchive]’
/usr/local/include/boost/archive/basic_binary_iarchive.hpp:75:7:   required from ‘void boost::archive::basic_binary_iarchive<Archive>::load_override(T&) [with T = IndelErrorCounts; Archive = boost::archive::binary_iarchive]’
/usr/local/include/boost/archive/binary_iarchive_impl.hpp:58:9:   required from ‘void boost::archive::binary_iarchive_impl<Archive, Elem, Tr>::load_override(T&) [with T = IndelErrorCounts; Archive = boost::archive::binary_iarchive; Elem = char; Tr = std::char_traits<char>]’
/usr/local/include/boost/archive/detail/interface_iarchive.hpp:68:9:   required from ‘Archive& boost::archive::detail::interface_iarchive<Archive>::operator>>(T&) [with T = IndelErrorCounts; Archive = boost::archive::binary_iarchive]’
/data/apps/strelka/src/c++/lib/errorAnalysis/SequenceErrorCounts.cpp:88:11:   required from here
/usr/local/include/boost/serialization/access.hpp:116:11: error: ‘class boost::array<unsigned int, 6>’ has no member named ‘serialize’
         t.serialize(ar, file_version);
         ~~^~~~~~~~~
/usr/local/include/boost/serialization/access.hpp: In instantiation of ‘static void boost::serialization::access::serialize(Archive&, T&, unsigned int) [with Archive = boost::archive::binary_oarchive; T = boost::array<unsigned int, 6>]’:
/usr/local/include/boost/serialization/serialization.hpp:68:22:   required from ‘void boost::serialization::serialize(Archive&, T&, unsigned int) [with Archive = boost::archive::binary_oarchive; T = boost::array<unsigned int, 6>]’
/usr/local/include/boost/serialization/serialization.hpp:126:14:   required from ‘void boost::serialization::serialize_adl(Archive&, T&, unsigned int) [with Archive = boost::archive::binary_oarchive; T = boost::array<unsigned int, 6>]’
/usr/local/include/boost/archive/detail/oserializer.hpp:150:40:   required from ‘void boost::archive::detail::oserializer<Archive, T>::save_object_data(boost::archive::detail::basic_oarchive&, const void*) const [with Archive = boost::archive::binary_oarchive; T = boost::array<unsigned int, 6>]’
/usr/local/include/boost/archive/detail/oserializer.hpp:103:1:   required from ‘class boost::archive::detail::oserializer<boost::archive::binary_oarchive, boost::array<unsigned int, 6> >’
/usr/local/include/boost/archive/detail/oserializer.hpp:255:13:   required from ‘static void boost::archive::detail::save_non_pointer_type<Archive>::save_standard::invoke(Archive&, const T&) [with T = boost::array<unsigned int, 6>; Archive = boost::archive::binary_oarchive]’
/usr/local/include/boost/archive/detail/oserializer.hpp:310:22:   [ skipping 177 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]
/usr/local/include/boost/archive/detail/oserializer.hpp:534:18:   required from ‘void boost::archive::save(Archive&, T&) [with Archive = boost::archive::binary_oarchive; T = const IndelErrorCounts]’
/usr/local/include/boost/archive/detail/common_oarchive.hpp:70:22:   required from ‘void boost::archive::detail::common_oarchive<Archive>::save_override(T&) [with T = const IndelErrorCounts; Archive = boost::archive::binary_oarchive]’
/usr/local/include/boost/archive/basic_binary_oarchive.hpp:80:7:   required from ‘void boost::archive::basic_binary_oarchive<Archive>::save_override(const T&) [with T = IndelErrorCounts; Archive = boost::archive::binary_oarchive]’
/usr/local/include/boost/archive/binary_oarchive_impl.hpp:59:9:   required from ‘void boost::archive::binary_oarchive_impl<Archive, Elem, Tr>::save_override(T&) [with T = const IndelErrorCounts; Archive = boost::archive::binary_oarchive; Elem = char; Tr = std::char_traits<char>]’
/usr/local/include/boost/archive/detail/interface_oarchive.hpp:70:9:   required from ‘Archive& boost::archive::detail::interface_oarchive<Archive>::operator<<(const T&) [with T = IndelErrorCounts; Archive = boost::archive::binary_oarchive]’
/data/apps/strelka/src/c++/lib/errorAnalysis/SequenceErrorCounts.cpp:68:11:   required from here
/usr/local/include/boost/serialization/access.hpp:116:11: error: ‘class boost::array<unsigned int, 6>’ has no member named ‘serialize’
src/c++/lib/errorAnalysis/CMakeFiles/strelka_errorAnalysis.dir/build.make:110: recipe for target 'src/c++/lib/errorAnalysis/CMakeFiles/strelka_errorAnalysis.dir/SequenceErrorCounts.cpp.o' failed
make[2]: *** [src/c++/lib/errorAnalysis/CMakeFiles/strelka_errorAnalysis.dir/SequenceErrorCounts.cpp.o] Error 1
CMakeFiles/Makefile2:1094: recipe for target 'src/c++/lib/errorAnalysis/CMakeFiles/strelka_errorAnalysis.dir/all' failed
make[1]: *** [src/c++/lib/errorAnalysis/CMakeFiles/strelka_errorAnalysis.dir/all] Error 2

Am I doing something wrong?

Missing contigs in BAM files

I tried to run configureStrelkaSomaticWorkflow.py on WGS BAM files, but got an error:
CONFIGURATION ERROR:
'normal' BAM/CRAM file is missing reference fasta chromosome: 'hs37d5'
I guess that the BAM header excludes the decoy sequence. How can I work around this issue?

Thanks,
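
A quick way to confirm which reference contigs the BAM header lacks is to compare the `@SQ` lines (for example from `samtools view -H`) with the first column of the FASTA `.fai` index. The sketch below works on the text of those two files; the function names are hypothetical, and it only reports the mismatch rather than fixing it.

```python
def bam_contigs(sam_header_text):
    """Contig names from the SN: field of each @SQ header line."""
    names = []
    for line in sam_header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def fai_contigs(fai_text):
    """Contig names from the first column of a .fai index."""
    return [line.split("\t")[0] for line in fai_text.splitlines() if line.strip()]


def missing_from_bam(fai_text, sam_header_text):
    """Reference contigs (in .fai order) that the BAM header does not list."""
    in_bam = set(bam_contigs(sam_header_text))
    return [name for name in fai_contigs(fai_text) if name not in in_bam]


if __name__ == "__main__":
    header = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:1\tLN:249250621\n"
    fai = "1\t249250621\t52\t60\t61\nhs37d5\t35477943\t3153506529\t60\t61\n"
    print(missing_from_bam(fai, header))
```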

Missing tags in VCF header

I've used strelka on a tumor-normal pair to get somatic calls. Now I need to subset out just the tumor calls from the vcf, which I'm trying to do with bcftools view -s TUMOR. This gives me the following error:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT TUMOR
[W::vcf_parse] INFO 'QSS' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'TQSS' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'QSS_NT' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'TQSS_NT' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'DP' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'ReadPosRankSum' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'SNVSB' is not defined in the header, assuming Type=String
[W::vcf_parse_format] FORMAT 'FDP' is not defined in the header, assuming Type=String
[W::vcf_parse_format] FORMAT 'SDP' is not defined in the header, assuming Type=String
[W::vcf_parse_format] FORMAT 'SUBDP' is not defined in the header, assuming Type=String
[W::vcf_parse_format] FORMAT 'AU' is not defined in the header, assuming Type=String
[W::vcf_parse_format] FORMAT 'CU' is not defined in the header, assuming Type=String
[W::vcf_parse_format] FORMAT 'GU' is not defined in the header, assuming Type=String
[W::vcf_parse_format] FORMAT 'TU' is not defined in the header, assuming Type=String

In fact these tags really are missing. Here is my vcf header:

##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20171107
##source=strelka
##source_version=2.8.3
##startTime=Tue Nov 7 13:43:55 2017
##cmdline=/data/Phil/software/strelka/configureStrelkaSomaticWorkflow.py --normalBam patient_3/patient_3_blood_BQSR.bam --tumorBam patient_3/patient_3_tumor_DNA_BQSR.bam --referenceFasta /data/Phil/ref_phil/GATK_resource/b37/human_g1k_v37.fasta --runDir patient_3 --exome --callRegions ../../intervals/S07604624_Covered.bed.gz
##reference=file:///data/Phil/ref_phil/GATK_resource/b37/human_g1k_v37.fasta
##contig=<ID=1,length=249250621>
##contig=<ID=2,length=243199373>
##contig=<ID=3,length=198022430>
##contig=<ID=4,length=191154276>
##contig=<ID=5,length=180915260>
##contig=<ID=6,length=171115067>
##contig=<ID=7,length=159138663>
##contig=<ID=8,length=146364022>
##contig=<ID=9,length=141213431>
##contig=<ID=10,length=135534747>
##contig=<ID=11,length=135006516>
##contig=<ID=12,length=133851895>
##contig=<ID=13,length=115169878>
##contig=<ID=14,length=107349540>
##contig=<ID=15,length=102531392>
##contig=<ID=16,length=90354753>
##contig=<ID=17,length=81195210>
##contig=<ID=18,length=78077248>
##contig=<ID=19,length=59128983>
##contig=<ID=20,length=63025520>
##contig=<ID=21,length=48129895>
##contig=<ID=22,length=51304566>
##contig=<ID=X,length=155270560>
##contig=<ID=Y,length=59373566>
##contig=<ID=MT,length=16569>
##contig=<ID=GL000207.1,length=4262>
##contig=<ID=GL000226.1,length=15008>
##contig=<ID=GL000229.1,length=19913>
##contig=<ID=GL000231.1,length=27386>
##contig=<ID=GL000210.1,length=27682>
##contig=<ID=GL000239.1,length=33824>
##contig=<ID=GL000235.1,length=34474>
##contig=<ID=GL000201.1,length=36148>
##contig=<ID=GL000247.1,length=36422>
##contig=<ID=GL000245.1,length=36651>
##contig=<ID=GL000197.1,length=37175>
##contig=<ID=GL000203.1,length=37498>
##contig=<ID=GL000246.1,length=38154>
##contig=<ID=GL000249.1,length=38502>
##contig=<ID=GL000196.1,length=38914>
##contig=<ID=GL000248.1,length=39786>
##contig=<ID=GL000244.1,length=39929>
##contig=<ID=GL000238.1,length=39939>
##contig=<ID=GL000202.1,length=40103>
##contig=<ID=GL000234.1,length=40531>
##contig=<ID=GL000232.1,length=40652>
##contig=<ID=GL000206.1,length=41001>
##contig=<ID=GL000240.1,length=41933>
##contig=<ID=GL000236.1,length=41934>
##contig=<ID=GL000241.1,length=42152>
##contig=<ID=GL000243.1,length=43341>
##contig=<ID=GL000242.1,length=43523>
##contig=<ID=GL000230.1,length=43691>
##contig=<ID=GL000237.1,length=45867>
##contig=<ID=GL000233.1,length=45941>
##contig=<ID=GL000204.1,length=81310>
##contig=<ID=GL000198.1,length=90085>
##contig=<ID=GL000208.1,length=92689>
##contig=<ID=GL000191.1,length=106433>
##contig=<ID=GL000227.1,length=128374>
##contig=<ID=GL000228.1,length=129120>
##contig=<ID=GL000214.1,length=137718>
##contig=<ID=GL000221.1,length=155397>
##contig=<ID=GL000209.1,length=159169>
##contig=<ID=GL000218.1,length=161147>
##contig=<ID=GL000220.1,length=161802>
##contig=<ID=GL000213.1,length=164239>
##contig=<ID=GL000211.1,length=166566>
##contig=<ID=GL000199.1,length=169874>
##contig=<ID=GL000217.1,length=172149>
##contig=<ID=GL000216.1,length=172294>
##contig=<ID=GL000215.1,length=172545>
##contig=<ID=GL000205.1,length=174588>
##contig=<ID=GL000219.1,length=179198>
##contig=<ID=GL000224.1,length=179693>
##contig=<ID=GL000223.1,length=180455>
##contig=<ID=GL000195.1,length=182896>
##contig=<ID=GL000212.1,length=186858>
##contig=<ID=GL000222.1,length=186861>
##contig=<ID=GL000200.1,length=187035>
##contig=<ID=GL000193.1,length=189789>
##contig=<ID=GL000194.1,length=191469>
##contig=<ID=GL000225.1,length=211173>
##contig=<ID=GL000192.1,length=547496>
##content=strelka somatic indel calls
##priorSomaticIndelRate=1e-06
##INFO=<ID=QSI,Number=1,Type=Integer,Description="Quality score for any somatic variant, ie. for the ALT haplotype to be present at a significantly different frequency in the tumor and normal">
##INFO=<ID=TQSI,Number=1,Type=Integer,Description="Data tier used to compute QSI">
##INFO=<ID=NT,Number=1,Type=String,Description="Genotype of the normal in all data tiers, as used to classify somatic variants. One of {ref,het,hom,conflict}.">
##INFO=<ID=QSI_NT,Number=1,Type=Integer,Description="Quality score reflecting the joint probability of a somatic variant and NT">
##INFO=<ID=TQSI_NT,Number=1,Type=Integer,Description="Data tier used to compute QSI_NT">
##INFO=<ID=SGT,Number=1,Type=String,Description="Most likely somatic genotype excluding normal noise states">
##INFO=<ID=RU,Number=1,Type=String,Description="Smallest repeating sequence unit in inserted or deleted sequence">
##INFO=<ID=RC,Number=1,Type=Integer,Description="Number of times RU repeats in the reference allele">
##INFO=<ID=IC,Number=1,Type=Integer,Description="Number of times RU repeats in the indel allele">
##INFO=<ID=IHP,Number=1,Type=Integer,Description="Largest reference interrupted homopolymer length intersecting with the indel">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Somatic mutation">
##INFO=<ID=OVERLAP,Number=0,Type=Flag,Description="Somatic indel possibly overlaps a second indel.">
##INFO=<ID=SomaticEVS,Number=1,Type=Float,Description="Somatic Empirical Variant Score (EVS) expressing the phred-scaled probability of the call being a false positive observation.">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth for tier1">
##FORMAT=<ID=DP2,Number=1,Type=Integer,Description="Read depth for tier2">
##FORMAT=<ID=TAR,Number=2,Type=Integer,Description="Reads strongly supporting alternate allele for tiers 1,2">
##FORMAT=<ID=TIR,Number=2,Type=Integer,Description="Reads strongly supporting indel allele for tiers 1,2">
##FORMAT=<ID=TOR,Number=2,Type=Integer,Description="Other reads (weak support or insufficient indel breakpoint overlap) for tiers 1,2">
##FORMAT=<ID=DP50,Number=1,Type=Float,Description="Average tier1 read depth within 50 bases">
##FORMAT=<ID=FDP50,Number=1,Type=Float,Description="Average tier1 number of basecalls filtered from original read depth within 50 bases">
##FORMAT=<ID=SUBDP50,Number=1,Type=Float,Description="Average number of reads below tier1 mapping quality threshold aligned across sites within 50 bases">
##FORMAT=<ID=BCN50,Number=1,Type=Float,Description="Fraction of filtered reads within 50 bases of the indel.">
##FILTER=<ID=LowEVS,Description="Somatic Empirical Variant Score (SomaticEVS) is below threshold">
##FILTER=<ID=LowDepth,Description="Tumor sample read depth at this locus is below 2">
##bcftools_viewVersion=1.3.1+htslib-1.3.1
##bcftools_viewCommand=view -h patient_3_strelka.vcf.gz
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
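
The header above declares `##content=strelka somatic indel calls` and the indel tags (QSI, TIR, ...), while the warnings name SNV tags (QSS, AU, ...). That pattern would be consistent with SNV records sitting under an indel header, although the post does not confirm how the file was produced. A minimal sketch for listing the tags a record uses but the header does not declare (function names are hypothetical):

```python
import re


def defined_ids(header_lines, kind):
    """IDs declared by '##INFO=<ID=...' or '##FORMAT=<ID=...' header lines."""
    pattern = re.compile(r"^##" + re.escape(kind) + r"=<ID=([^,>]+)")
    ids = set()
    for line in header_lines:
        match = pattern.match(line)
        if match:
            ids.add(match.group(1))
    return ids


def undefined_tags(header_lines, record_line):
    """Return (INFO keys, FORMAT keys) used in a record but absent from the header."""
    cols = record_line.rstrip("\n").split("\t")
    info_keys = [] if cols[7] == "." else [item.split("=", 1)[0] for item in cols[7].split(";")]
    format_keys = cols[8].split(":") if len(cols) > 8 else []
    known_info = defined_ids(header_lines, "INFO")
    known_format = defined_ids(header_lines, "FORMAT")
    return ([k for k in info_keys if k not in known_info],
            [k for k in format_keys if k not in known_format])


if __name__ == "__main__":
    header = ['##INFO=<ID=NT,Number=1,Type=String,Description="x">',
              '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="x">']
    record = "19\t3120981\t.\tC\tT\t.\tLowEVS\tQSS=2;NT=ref\tDP:AU\t4:0,1\t6:0,0"
    print(undefined_tags(header, record))
```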

Difference between strelka2 and strelka1

We've just run Strelka2 on the DREAM challenge WGS datasets. Using "PASS" as the filter, I found that the number of SNVs is far lower than with Strelka1. For example, Strelka2 returns only 427 SNVs for DREAM dataset 1, while Strelka1 returns 5573. I ran Manta first and then ran Strelka2, as described in the user guide.
Thanks,

Multi-allelic sites

Hi there.

I just noticed a weird issue, which may or may not be a bug. I attached a single variant in a code block below. The workflow is pretty standard, call variants on samples, merge to generate site list, then recall with forced sites. I notice that some samples are getting GT calls in the format field, even though they have a "NotGenotyped" filter associated with them (see below). It's straightforward enough for me to just set those genotypes to "./.", but I figured I'd let you know in case it is a bug.

Best,
Stefan
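
The workaround mentioned above (setting those genotypes to "./.") can be sketched as a per-sample rewrite. The function name `mask_not_genotyped` is hypothetical; it assumes the FORMAT column carries both GT and FT, as in the record shown in this post, and leaves every other field untouched.

```python
import re


def mask_not_genotyped(format_field, sample_field):
    """Replace GT with missing alleles when the sample's FT contains NotGenotyped."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    if "GT" not in keys or "FT" not in keys:
        return sample_field
    gt_i, ft_i = keys.index("GT"), keys.index("FT")
    if ft_i >= len(values):
        # Trailing fields were dropped for this sample; nothing to check.
        return sample_field
    if "NotGenotyped" in values[ft_i].split(";"):
        # Keep the ploidy of the original call, e.g. '2/2' -> './.'.
        ploidy = len(re.split(r"[/|]", values[gt_i]))
        values[gt_i] = "/".join(["."] * ploidy)
    return ":".join(values)


if __name__ == "__main__":
    fmt = "GT:AD:ADF:ADR:DPI:FT:GQ:GQX:PL"
    print(mask_not_genotyped(fmt, "2/2:.:.:.:.:NotGenotyped:.:.:999"))
```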

2/2:.:.:.:.:NotGenotyped:.:.:999
[quser13::/projects/b1059/workflows/strelka-nf/results] 🏀  bcftools view --regions III:3529601-3550320 pruned_recalled.vcf.gz | grep 3545101 
III	3545101	.	CA	C,CAA	3070	.	CIGAR=1M1D,1M1I;IDREP=8,10;MQ=50;REFREP=9,9;RU=A,A	GT:AD:ADF:ADR:DPI:FT:GQ:GQX:PL	0/0:22,0:16,0:6,0:23:PASS:61:61:.	0/0:32,0:25,0:7,0:38:PASS:91:91:.	0/0:29,0:25,0:4,0:36:PASS:84:84:.	0/0:47,0:36,0:11,0:50:PASS:99:137:.	0/0:24,0:15,0:9,0:25:PASS:68:68:.	0/0:61,0:47,0:14,0:63:PASS:99:176:.	0/0:12,0:9,0:3,0:12:PASS:33:33:.	0/0:30,0:21,0:9,0:33:PASS:85:85:.	0/0:26,0:15,0:11,0:27:PASS:75:75:.	0/0:43,0:30,0:13,0:46:PASS:99:123:.	./.:.:.:.:.:NotGenotyped:.:.:0	0/0:39,0:28,0:11,0:43:PASS:99:112:.	0/0:19,0:10,0:9,0:20:PASS:53:53:.	2/2:.:.:.:.:NotGenotyped:.:.:999	2/2:.:.:.:.:NotGenotyped:.:.:811	2/2:.:.:.:.:NotGenotyped:.:.:840	0/0:11,0:10,0:1,0:12:PASS:30:30:.	2/2:.:.:.:.:NotGenotyped:.:.:881	0/0:38,0:23,0:15,0:42:PASS:99:110:.	0/0:20,0:13,0:7,0:20:PASS:57:57:.	2/2:.:.:.:.:NotGenotyped:.:.:999	2/2:.:.:.:.:NotGenotyped:.:.:425	2/2:.:.:.:.:NotGenotyped:.:.:447	0/0:38,0:29,0:9,0:39:PASS:99:109:.	0/0:41,0:26,0:15,0:44:PASS:99:119:.	2/2:.:.:.:.:NotGenotyped:.:.:899	0/0:11,0:9,0:2,0:12:PASS:30:30:.	0/0:47,0:29,0:18,0:51:PASS:99:136:.	0/2:21,0:14,0:7,0:23:PASS:60:60:.	2/2:.:.:.:.:NotGenotyped:.:.:322	0/0:35,0:29,0:6,0:39:PASS:99:102:.	0/0:20,0:16,0:4,0:22:PASS:56:56:.	0/0:20,0:10,0:10,0:26:PASS:57:57:.	0/0:11,0:9,0:2,0:11:PASS:30:30:.	2/2:.:.:.:.:NotGenotyped:.:.:264	0/0:28,0:17,0:11,0:29:PASS:81:81:.	0/0:33,0:17,0:16,0:42:PASS:96:96:.	0/0:28,0:16,0:12,0:31:PASS:81:81:.	0/0:26,0:13,0:13,0:27:PASS:72:72:.	0/0:27,0:15,0:12,0:28:PASS:75:75:.	0/0:63,0:32,0:31,0:66:PASS:99:177:.	0/0:30,0:16,0:14,0:32:PASS:81:81:.	0/0:46,0:27,0:19,0:49:PASS:99:129:.	0/0:20,0:9,0:11,0:22:PASS:57:57:.	0/0:21,0:13,0:8,0:22:PASS:57:57:.	0/0:78,0:38,0:40,0:84:PASS:99:225:.	0/0:33,0:14,0:19,0:34:PASS:96:96:.	0/0:77,1:44,1:33,0:80:PASS:99:208:.	0/0:73,0:34,0:39,0:77:PASS:99:216:.	0/0:23,0:11,0:12,0:23:PASS:66:66:.	0/0:35,0:19,0:16,0:39:PASS:97:97:.	0/0:52,0:22,0:30,0:62:PASS:99:146:.	0/0:52,0:32,0:20,0:56:PASS:99:148:.	0/0:30,0:17,0:13,0:31:PASS:87:87:.	
0/0:37,0:16,0:21,0:37:PASS:99:108:.	0/0:34,0:16,0:18,0:38:PASS:96:96:.	0/0:22,0:8,0:14,0:24:PASS:60:60:.	0/0:25,0:9,0:16,0:31:PASS:72:72:.	0/0:55,0:29,0:26,0:60:PASS:99:162:.	0/0:31,0:15,0:16,0:32:PASS:90:90:.	0/0:61,0:27,0:34,0:64:PASS:99:174:.	0/0:40,0:25,0:15,0:42:PASS:99:117:.	0/0:27,0:14,0:13,0:31:PASS:75:75:.	./.:.:.:.:.:NotGenotyped:.:.:0	0/0:75,0:35,0:40,0:80:PASS:99:212:.	0/0:64,0:32,0:32,0:76:PASS:99:182:.	0/0:29,0:17,0:12,0:32:PASS:82:82:.0/0:21,0:11,0:10,0:21:PASS:58:58:.	0/2:2,0:0,0:2,0:2:LowDepth;LowGQX:5:5:.	0/0:24,0:17,0:7,0:24:PASS:69:69:.	2/2:.:.:.:.:NotGenotyped:.:.:189	0/0:28,0:20,0:8,0:32:PASS:78:78:.	0/0:56,0:43,0:13,0:61:PASS:99:161:.	2/2:.:.:.:.:NotGenotyped:.:.:999	2/2:.:.:.:.:NotGenotyped:.:.:902	0/0:14,0:10,0:4,0:14:PASS:38:38:.	2/2:.:.:.:.:NotGenotyped:.:.:999	0/0:36,0:29,0:7,0:38:PASS:99:104:.	0/0:37,0:20,0:17,0:40:PASS:99:105:.	0/0:49,0:35,0:14,0:55:PASS:99:141:.	0/0:13,0:8,0:5,0:14:PASS:36:36:.	0/0:25,0:18,0:7,0:27:PASS:68:68:.	2/2:.:.:.:.:NotGenotyped:.:.:807	0/0:47,0:34,0:13,0:47:PASS:99:132:.	0/0:42,0:29,0:13,0:50:PASS:99:121:.	0/0:38,0:25,0:13,0:39:PASS:99:108:.	0/0:25,0:18,0:7,0:26:PASS:71:71:.	2/2:.:.:.:.:NotGenotyped:.:.:999	0/0:29,0:20,0:9,0:31:PASS:81:81:.	0/0:42,0:32,0:10,0:50:PASS:99:121:.	0/0:16,0:12,0:4,0:17:PASS:44:44:.	0/0:53,0:44,0:9,0:61:PASS:99:155:.	2/2:.:.:.:.:NotGenotyped:.:.:254	0/2:20,0:13,0:7,0:21:PASS:57:57:.	0/1:17,4:7,4:10,0:22:LowGQX:3:0:.	0/0:48,0:41,0:7,0:56:PASS:99:140:.	2/2:.:.:.:.:NotGenotyped:.:.:999	0/0:29,1:21,1:8,0:34:PASS:68:68:.	0/0:39,0:28,0:11,0:40:PASS:99:111:.	2/2:.:.:.:.:NotGenotyped:.:.:853	0/0:52,0:36,0:16,0:61:PASS:99:151:.	0/0:16,0:12,0:4,0:16:PASS:43:43:.	0/0:23,0:12,0:11,0:24:PASS:63:63:.	0/0:40,0:33,0:7,0:46:PASS:99:115:.	0/0:54,0:36,0:18,0:59:PASS:99:156:.	0/0:39,0:26,0:13,0:42:PASS:99:112:.	0/0:14,0:10,0:4,0:15:PASS:39:39:.	0/0:36,0:29,0:7,0:40:PASS:99:105:.	0/0:26,0:16,0:10,0:28:PASS:74:74:.	0/0:80,0:64,0:16,0:88:PASS:99:235:.	0/0:30,0:26,0:4,0:34:PASS:86:86:.	
0/0:18,0:9,0:9,0:19:PASS:50:50:.	0/0:26,0:19,0:7,0:31:PASS:75:75:.	2/2:.:.:.:.:NotGenotyped:.:.:812	0/0:65,0:50,0:15,0:70:PASS:99:188:.	0/0:30,0:23,0:7,0:34:PASS:84:84:.	0/0:30,0:22,0:8,0:33:PASS:87:87:.	2/2:.:.:.:.:NotGenotyped:.:.:999	2/2:.:.:.:.:NotGenotyped:.:.:604	0/0:41,0:29,0:12,0:42:PASS:99:119:.	0/0:25,0:20,0:5,0:30:PASS:72:72:.	2/2:.:.:.:.:NotGenotyped:.:.:427	0/0:38,0:29,0:9,0:43:PASS:99:109:.	0/0:16,0:10,0:6,0:16:PASS:44:44:.	0/0:41,0:30,0:11,0:45:PASS:99:117:.	0/0:14,0:9,0:5,0:15:PASS:39:39:.	0/0:38,0:33,0:5,0:41:PASS:99:109:.	0/0:17,0:12,0:5,0:18:PASS:47:47:.	2/2:.:.:.:.:NotGenotyped:.:.:999	2/2:.:.:.:.:NotGenotyped:.:.:248	2/2:.:.:.:.:NotGenotyped:.:.:315	0/0:34,0:27,0:7,0:38:PASS:97:97:.	0/0:27,0:22,0:5,0:33:PASS:77:77:.	0/0:25,0:20,0:5,0:28:PASS:71:71:.	2/2:.:.:.:.:NotGenotyped:.:.:933	0/0:42,0:33,0:9,0:47:PASS:99:121:.	0/0:34,0:27,0:7,0:44:PASS:98:98:.	0/0:40,0:31,0:9,0:45:PASS:99:116:.	0/0:32,0:27,0:5,0:35:PASS:91:91:.	0/0:24,0:16,0:8,0:26:PASS:69:69:.	0/1:13,1:9,1:4,0:16:LowGQX:26:0:.	2/2:.:.:.:.:NotGenotyped:.:.:999	0/0:21,0:14,0:7,0:24:PASS:58:58:.	0/0:32,0:25,0:7,0:35:PASS:92:92:.	0/0:36,0:27,0:9,0:40:PASS:99:102:.	0/0:55,0:45,0:10,0:59:PASS:99:159:.	0/0:35,0:28,0:7,0:38:PASS:99:100:.	0/0:20,0:17,0:3,0:20:PASS:55:55:.	0/0:58,0:48,0:10,0:67:PASS:99:169:.	0/0:15,0:12,0:3,0:20:PASS:41:41:.	0/0:33,0:31,0:2,0:39:PASS:93:93:.	0/0:50,0:40,0:10,0:56:PASS:99:146:.	0/0:36,0:32,0:4,0:41:PASS:99:105:.	0/0:37,0:29,0:8,0:40:PASS:99:107:.	2/2:.:.:.:.:NotGenotyped:.:.:781	0/0:39,0:37,0:2,0:43:PASS:99:112:.	0/0:31,0:21,0:10,0:35:PASS:88:88:.	0/0:37,0:33,0:4,0:39:PASS:99:107:.	0/0:29,0:24,0:5,0:34:PASS:82:82:.	0/0:18,0:14,0:4,0:24:PASS:51:51:.	0/0:37,0:18,0:19,0:41:PASS:99:107:.	0/0:40,0:26,0:14,0:44:PASS:99:113:.	0/0:36,0:29,0:7,0:42:PASS:99:103:.	2/2:.:.:.:.:NotGenotyped:.:.:339	2/2:.:.:.:.:NotGenotyped:.:.:72	0/0:6,0:4,0:2,0:6:LowGQX:14:14:.	0/2:25,0:14,0:11,0:26:PASS:72:72:.	0/0:15,0:10,0:5,0:15:PASS:40:40:.	0/0:9,0:5,0:4,0:11:PASS:24:24:.	
2/2:.:.:.:.:NotGenotyped:.:.:221	2/2:.:.:.:.:NotGenotyped:.:.:317	0/0:22,0:14,0:8,0:25:PASS:61:61:.	0/2:10,0:6,0:4,0:10:PASS:26:26:.	0/0:21,0:8,0:13,0:24:PASS:59:59:.	2/2:.:.:.:.:NotGenotyped:.:.:194	2/2:.:.:.:.:NotGenotyped:.:.:999	0/2:1,0:1,0:0,0:2:LowDepth;LowGQX:3:2:.	0/0:7,0:4,0:3,0:7:PASS:18:18:.	0/0:7,0:4,0:3,0:7:PASS:18:18:.	./2:.:.:.:.:NotGenotyped:.:.:875	0/0:47,0:33,0:14,0:49:PASS:99:135:.	0/0:14,0:7,0:7,0:16:PASS:39:39:.	0/0:27,0:14,0:13,0:31:PASS:74:74:.	0/0:20,0:10,0:10,0:23:PASS:55:55:.	0/0:49,0:40,0:9,0:55:PASS:99:141:.	0/0:40,0:35,0:5,0:44:PASS:99:115:.	0/0:35,0:28,0:7,0:35:PASS:99:100:.	0/0:16,0:10,0:6,0:16:PASS:45:45:.	0/0:15,0:8,0:7,0:16:PASS:41:41:.	2/2:.:.:.:.:NotGenotyped:.:.:999	0/0:25,0:19,0:6,0:25:PASS:69:69:.	0/0:43,0:35,0:8,0:44:PASS:99:123:.	0/0:26,0:20,0:6,0:27:PASS:73:73:.	2/2:.:.:.:.:NotGenotyped:.:.:495	0/0:16,0:8,0:8,0:20:PASS:45:45:.	2/2:.:.:.:.:NotGenotyped:.:.:680	2/2:.:.:.:.:NotGenotyped:.:.:460	./2:.:.:.:.:NotGenotyped:.:.:999	0/1:61,2:45,2:16,0:70:LowGQX:49:2:.	2/2:.:.:.:.:NotGenotyped:.:.:999	0/0:11,0:9,0:2,0:12:PASS:30:30:.	0/0:8,0:5,0:3,0:10:PASS:20:20:.	0/0:23,0:15,0:8,0:25:PASS:66:66:.	2/2:.:.:.:.:NotGenotyped:.:.:999	0/0:43,0:29,0:14,0:48:PASS:99:123:.	0/0:41,0:26,0:15,0:47:PASS:99:118:.	2/2:.:.:.:.:NotGenotyped:.:.:627	0/0:22,0:17,0:5,0:24:PASS:63:63:.	0/0:40,0:32,0:8,0:41:PASS:99:114:.	0/0:42,0:32,0:10,0:46:PASS:99:122:.	0/0:16,0:11,0:5,0:17:PASS:45:45:.	0/0:42,0:30,0:12,0:45:PASS:99:120:.	2/2:.:.:.:.:NotGenotyped:.:.:999	0/0:7,0:3,0:4,0:9:PASS:18:18:.	0/0:38,0:28,0:10,0:45:PASS:99:109:.	0/0:50,0:31,0:19,0:59:PASS:99:143:.	0/0:42,0:28,0:14,0:47:PASS:99:122:.	0/0:53,0:38,0:15,0:56:PASS:99:155:.	0/0:44,0:32,0:12,0:49:PASS:99:126:.	0/0:34,0:22,0:12,0:38:PASS:95:95:.	0/0:11,0:7,0:4,0:12:PASS:30:30:.	0/0:24,0:18,0:6,0:27:PASS:67:67:.	0/0:28,0:20,0:8,0:32:PASS:80:80:.	0/0:57,0:37,0:20,0:65:PASS:99:164:.	0/0:33,0:25,0:8,0:40:PASS:94:94:.	0/0:32,0:23,0:9,0:32:PASS:91:91:.	0/0:30,0:18,0:12,0:31:PASS:85:85:.	
0/0:25,0:23,0:2,0:28:PASS:72:72:.	0/0:49,0:44,0:5,0:56:PASS:99:142:.	0/0:30,0:15,0:15,0:36:PASS:83:83:.	0/0:14,0:9,0:5,0:15:PASS:39:39:.	0/0:25,0:16,0:9,0:28:PASS:70:70:.	0/0:28,0:25,0:3,0:29:PASS:80:80:.	0/0:28,0:17,0:11,0:30:PASS:80:80:.	2/2:.:.:.:.:NotGenotyped:.:.:222	0/0:36,0:24,0:12,0:38:PASS:99:103:.	2/2:.:.:.:.:NotGenotyped:.:.:497	0/0:13,0:6,0:7,0:16:PASS:35:35:.	2/2:.:.:.:.:NotGenotyped:.:.:894	2/2:.:.:.:.:NotGenotyped:.:.:417	0/0:8,0:6,0:2,0:8:PASS:20:20:.	2/2:.:.:.:.:NotGenotyped:.:.:391	2/2:.:.:.:.:NotGenotyped:.:.:931	2/2:.:.:.:.:NotGenotyped:.:.:999	2/2:.:.:.:.:NotGenotyped:.:.:999	2/2:.:.:.:.:NotGenotyped:.:.:999	2/2:.:.:.:.:NotGenotyped:.:.:241	0/0:14,0:9,0:5,0:17:PASS:38:38:.	0/0:16,0:12,0:4,0:17:PASS:45:45:.	0/0:37,0:24,0:13,0:41:PASS:99:106:.	0/0:16,0:14,0:2,0:19:PASS:44:44:.	0/0:21,0:12,0:9,0:24:PASS:60:60:.	2/2:.:.:.:.:NotGenotyped:.:.:999	2/2:.:.:.:.:NotGenotyped:.:.:859	2/2:.:.:.:.:NotGenotyped:.:.:730	2/2:.:.:.:.:NotGenotyped:.:.:351	2/2:.:.:.:.:NotGenotyped:.:.:273	0/0:21,0:13,0:8,0:22:PASS:58:58:.	0/0:53,0:38,0:15,0:63:PASS:99:154:.	0/0:10,0:8,0:2,0:11:PASS:27:27:.	0/0:71,0:59,0:12,0:81:PASS:99:209:.	2/2:.:.:.:.:NotGenotyped:.:.:798	2/2:.:.:.:.:NotGenotyped:.:.:566	0/0:34,0:27,0:7,0:37:PASS:99:99:.	0/0:17,0:13,0:4,0:19:PASS:47:47:.	0/0:35,0:29,0:6,0:37:PASS:98:98:.	0/0:20,0:18,0:2,0:27:PASS:56:56:.	0/0:40,0:34,0:6,0:44:PASS:99:116:.	0/0:36,0:28,0:8,0:41:PASS:99:105:.	0/0:21,0:17,0:4,0:25:PASS:59:59:.	0/0:43,0:32,0:11,0:45:PASS:99:125:.	2/2:.:.:.:.:NotGenotyped:.:.:459	2/2:.:.:.:.:NotGenotyped:.:.:480	0/0:30,0:16,0:14,0:30:PASS:84:84:.	2/2:.:.:.:.:NotGenotyped:.:.:565	0/0:39,1:20,0:19,1:43:PASS:99:108:.	0/0:44,1:23,0:21,1:47:PASS:99:113:.	0/0:24,0:11,0:13,0:26:PASS:66:66:.	
2/2:.:.:.:.:NotGenotyped:.:.:999	2/2:.:.:.:.:NotGenotyped:.:.:193	2/2:.:.:.:.:NotGenotyped:.:.:474	2/2:.:.:.:.:NotGenotyped:.:.:917	./2:.:.:.:.:NotGenotyped:.:.:38	2/2:.:.:.:.:NotGenotyped:.:.:911	2/2:.:.:.:.:NotGenotyped:.:.:506	2/2:.:.:.:.:NotGenotyped:.:.:411	2/2:.:.:.:.:NotGenotyped:.:.:632	0/0:37,0:22,0:15,0:40:PASS:99:103:.	0/0:76,0:41,0:35,0:84:PASS:99:214:.	0/0:28,0:15,0:13,0:29:PASS:77:77:.	0/0:84,0:43,0:41,0:88:PASS:99:238:.	0/0:35,0:17,0:18,0:38:PASS:99:99:.	0/0:60,0:30,0:30,0:61:PASS:99:171:.	0/0:32,0:18,0:14,0:34:PASS:88:88:.	0/0:21,0:13,0:8,0:24:PASS:57:57:.	0/0:74,0:35,0:39,0:76:PASS:99:211:.	0/0:40,0:19,0:21,0:43:PASS:99:111:.	0/0:43,0:22,0:21,0:46:PASS:99:122:.	0/0:57,0:30,0:27,0:62:PASS:99:163:.	0/0:48,0:25,0:23,0:50:PASS:99:136:.	1/1:1,58:1,38:0,20:65:PASS:99:17:.	2/2:.:.:.:.:NotGenotyped:.:.:529	0/0:15,0:9,0:6,0:15:PASS:42:42:.	0/0:18,0:14,0:4,0:20:PASS:51:51:.	0/0:44,0:26,0:18,0:47:PASS:99:127:0/0:61,0:48,0:13,0:66:PASS:99:176:.	2/2:.:.:.:.:NotGenotyped:.:.:319	0/0:36,0:30,0:6,0:38:PASS:99:101:.	2/2:.:.:.:.:NotGenotyped:.:.:999	0/0:33,0:27,0:6,0:35:PASS:92:92:.	0/0:44,0:33,0:11,0:44:PASS:99:123:.	0/0:69,0:49,0:20,0:77:PASS:99:201:.	0/0:35,0:23,0:12,0:36:PASS:99:100:.	0/0:21,0:14,0:7,0:21:PASS:60:60:.	0/0:45,0:31,0:14,0:55:PASS:99:130:.	0/0:41,0:25,0:16,0:49:PASS:99:118:.	2/2:.:.:.:.:NotGenotyped:.:.:370	0/0:16,1:11,0:5,1:17:PASS:43:43:.	0/0:46,1:22,1:24,0:52:PASS:99:119:.	0/0:26,0:22,0:4,0:31:PASS:74:74:.	0/0:17,0:15,0:2,0:20:PASS:48:48:.	0/0:29,0:22,0:7,0:32:PASS:80:80:.	0/0:25,0:21,0:4,0:27:PASS:72:72:.	0/0:4,0:3,0:1,0:4:LowGQX:9:9:.	0/0:16,0:7,0:9,0:17:PASS:43:43:.	0/0:4,0:4,0:0,0:4:LowGQX:9:9:.	0/2:8,0:4,0:4,0:8:PASS:21:21:.	0/0:8,0:5,0:3,0:8:PASS:21:21:.	0/2:18,0:6,0:12,0:20:PASS:51:51:.	2/2:.:.:.:.:NotGenotyped:.:.:53

starling2 segfault

I have a recurring problem with strelka/starling2.
The workflow, executed independently on two different machines, fails while processing the same area of chromosome 21 (log fragment below). The BAM file was produced by the latest version of Isaac.
What could be causing these problems?
strelka_error_selected.txt

big genome problem

I have used isaac4 to align my reads to a big genome, but I can't successfully process the resulting BAM with Strelka.
So, could you help me solve the problem?

'force' call for every position in provided bed file

For a gene panel (regions available as bed file) we know all mutated positions (ground truth dataset).
I'd like Strelka to call all positions so that I can develop my own filter that optimizes sensitivity and specificity. Is that already possible? I'm not sure whether it can be achieved with the forcedGT option.
Thanks for helping!
Regards, Christian

Unknown htslib error value in sam_read1

I have three samples (A, B, C). After alignment, A with B runs fine, but C with B outputs an error, shown below:

[ERROR] [2018-01-04T12:30:20.971184] [localhost.localdomain] [15578_1] [CallGenome+callGenomeSegment_chromId_011_Chr12_0000] FATAL_ERROR: strelka EXCEPTION: ERROR: Unknown htslib error value in sam_read1 '-2' while attempting to read BAM/CRAM file:

add option to supply a list of regions instead of one at a time

Instead of doing this in Strelka --region chr2:100-2000 --region chr3:2500-3000 to limit the analysis to a region of the genome for debugging purposes, it would be nice if we could supply these regions in a file (e.g. in bed format).

Is there a reason why one should NOT do this in a context other than debugging? For example, if we know in advance that we only care about variants in certain regions.
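Until such an option exists, the repeated `--region` arguments could be generated from a BED file. The following is a minimal sketch, not part of Strelka: the helper name `bed_to_region_args` is hypothetical, and it assumes that region strings are samtools-style (1-based, inclusive) while BED is 0-based, half-open.

```python
from typing import Iterable, List


def bed_to_region_args(bed_lines: Iterable[str]) -> List[str]:
    """Turn BED records into a flat list of '--region chrom:start-end' arguments.

    Hypothetical helper; assumes 1-based inclusive region strings.
    """
    args: List[str] = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and optional BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        # BED start is 0-based, so add 1; BED end is exclusive, so keep it.
        args.extend(["--region", f"{chrom}:{int(start) + 1}-{int(end)}"])
    return args


example_bed = ["chr2\t99\t2000", "chr3\t2499\t3000"]
print(" ".join(bed_to_region_args(example_bed)))
# prints: --region chr2:100-2000 --region chr3:2500-3000
```

The resulting list can be appended to the configuration command line as it is.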

Filtering variants

When analyzing an exome sample for mosaic variants (with the exome option enabled), I get roughly 40,000 variants even after keeping only those with the "PASS" flag. That is far too many; I would like to get around 10 variants per sample. Could you please recommend other filters to apply to the variants.vcf result file?

Thanks in advance

FATAL_ERROR: Attempting to lookup basecall quality score 72 which exceeds the maximum cached basecall quality score of 70

Hi,

I am running Strelka on RNA-seq data. I have had no problems running it before but for this particular dataset, it is throwing an error. Here is my command line:

configureStrelkaGermlineWorkflow.py --bam=sample.sorted.deduped.bam --referenceFasta=Homo_sapiens_assembly19.fasta --rna --runDir=outdir

outdir/runWorkflow.py -m local -j 8

Here are the bam, fasta files as well as the resulting output directory: s3://strelka-test

Please let me know if you need more information.

Thanks
Komal
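To check whether a BAM really contains base qualities above 70, the maximum quality can be read from the SAM text records. This is a minimal sketch assuming standard Phred+33 encoding in the 11th SAM column; the function name `max_base_quality` is hypothetical and not part of Strelka or htslib.

```python
from typing import Iterable


def max_base_quality(sam_lines: Iterable[str], offset: int = 33) -> int:
    """Return the highest Phred base quality found in SAM text records.

    Returns -1 if no record carries a quality string.
    """
    highest = -1
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 11 is QUAL; '*' means qualities were not stored.
        if len(fields) < 11 or fields[10] in ("", "*"):
            continue
        highest = max(highest, max(ord(c) - offset for c in fields[10]))
    return highest


# Made-up record for illustration: 'I' encodes Q40 and 'i' encodes Q72.
record = "read1\t0\tchr1\t100\t60\t3M\t*\t0\t0\tACG\tIIi"
print(max_base_quality([record]))  # prints: 72
```

In practice the lines could be piped in from `samtools view` on a subset of the BAM.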

Recall germline variants error : Assertion `index < size()' failed

Hi there,

I am running into a confusing issue while running the germline workflow. Here is what I am doing (in Nextflow):

Call sample variants:

    python2 ${strelka_path}/bin/configureStrelkaGermlineWorkflow.py \\
      --bam \${SM_use} \\
      --referenceFasta ${reference_handle_uncompressed} \\
      --runDir .

    python2 runWorkflow.py -m local -j ${task.cpus-1}

    bcftools view \\
    -Oz \\
    -o ${SM}_strelka.vcf.gz \\
    results/variants/variants.vcf.gz

    bcftools index ${SM}_strelka.vcf.gz 

Note that SM_use is a bam file that is subsampled to 100x depth if it has >100x depth genome-wide.
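The subsampling rule amounts to keeping a fraction of reads equal to the target depth divided by the observed depth. A minimal sketch of that arithmetic follows; the function name is hypothetical and the 100x default simply mirrors the threshold described above.

```python
def subsample_fraction(coverage: float, target: float = 100.0) -> float:
    """Fraction of reads to keep so that mean depth is roughly `target`."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    # At or below the target depth, keep every read.
    return min(1.0, target / coverage)


print(subsample_fraction(250.0))  # prints: 0.4
print(subsample_fraction(80.0))   # prints: 1.0
```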

Then I merge sample VCFs:

    awk '{ if (\$0 !~ />/) {print toupper(\$0)} else {print \$0} }' ${reference_handle_uncompressed} > uppercase_ref.fa
    samtools faidx uppercase_ref.fa

    bcftools merge -m both --missing-to-ref -Oz ${merged_deletion_vcf} | \\
    bcftools norm -m -any -Oz | \\
    bcftools norm --fasta-ref uppercase_ref.fa -Oz -o merged_strelka.vcf.gz 

    tabix -p vcf merged_strelka.vcf.gz

As a side note, I am changing the FASTA file to uppercase because Strelka2 throws an error if it sees mixed case in the REF/ALT columns of the VCF.
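For reference, the uppercasing step can also be sketched in Python. This is an illustrative equivalent, not the command used in the pipeline; unlike the awk pattern, which leaves any line containing `>` unchanged, it treats only lines that start with `>` as headers.

```python
from typing import Iterable, Iterator


def uppercase_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with sequence uppercased and headers untouched."""
    for line in lines:
        # Header lines start with '>' and are passed through unchanged.
        yield line if line.startswith(">") else line.upper()


example = [">chrI some description\n", "acgtNnac\n"]
print("".join(uppercase_fasta(example)), end="")
```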

Finally, recall using identified sites in the population:

    python2 ${strelka_path}/bin/configureStrelkaGermlineWorkflow.py \\
      --bam \${SM_use} \\
      --referenceFasta ${reference_handle_uncompressed} \\
      --forcedGT ${joint_vcf} \\
      --runDir .

    python2 runWorkflow.py -m local -j ${task.cpus-1}

    bcftools view -Oz -o ${SM}_strelka_recalled.vcf.gz results/variants/variants.vcf.gz 

    bcftools index ${SM}_strelka_recalled.vcf.gz 

Where ${joint_vcf} is the output of the merge command. Again, note that SM_use is a subsampled bam file, if depth is >100x.

The error I am getting from the recall step is below. I am confused by it because Strelka2 has no issue recalling variants with the above pipeline for more than 50 of my samples, yet I keep getting the error for one sample; I am not sure whether any other samples will throw the same error. I assume my BAM file for this sample is fine because the first pass of calling worked as expected. Note that this particular sample does not require subsampling, so it isn't an issue with subsampling.

Any help is greatly appreciated.
Best,
S

Command executed:

  # Subsample high-depth bams
     coverage=`goleft covstats QG2836.bam | awk 'NR > 1 { printf "%5.0f", $1 }'`
  
     if [ ${coverage} -gt 100 ];
     then
  
         # Add a trap to remove temp files
         function finish {
             rm "QG2836.subsample.bam"
         }
         trap finish EXIT
  
         echo "Coverage is above 100x; Subsampling to 100x"
         # Calculate fraction of reads to keep
         frac_keep=`echo "100.0 / ${coverage}" | bc -l | awk '{printf "%0.2f", $0 }'`
         SM_use="QG2836.subsample.bam"
         sambamba view --nthreads=8 --show-progress --format=bam --with-header --subsample=${frac_keep} QG2836.bam > ${SM_use}
         sambamba index --nthreads 8 ${SM_use}
     else
         echo "Coverage is below 100x; No subsampling"
         SM_use="QG2836.bam"
     fi;
  
  export SM_use
  
     python2 ~/.pyenv/versions/miniconda3-4.3.27/envs/py2-2018-03-09/share/strelka-2.9.2-0/bin/configureStrelkaGermlineWorkflow.py \
       --bam ${SM_use} \
       --referenceFasta /projects/b1059/data/genomes/c_elegans/WS245/WS245.fa \
       --forcedGT merged_strelka.vcf.gz \
       --runDir .
  
     python2 runWorkflow.py -m local -j 7
  
     bcftools view -Oz -o QG2836_strelka_recalled.vcf.gz results/variants/variants.vcf.gz 
  
     bcftools index QG2836_strelka_recalled.vcf.gz

Command exit status:
  1

Command output:
  Coverage is below 100x; No subsampling
  
  Successfully created workflow run script.
  To execute the workflow, run the following script and set appropriate options:
  
  runWorkflow.py

Command error:
  [2018-03-23T16:55:04.331149Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] Completed command task: 'CallGenome+compressGenomeSegment_chromId_000_I_0000_gVCF_S1' launched from sub-workflow 'CallGenome'
  [2018-03-23T16:55:04.333726Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] Launching command task: 'CallGenome+callGenomeSegment_chromId_006_MtDNA_0000' from sub-workflow 'CallGenome'
  [2018-03-23T16:55:04.340385Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskRunner:CallGenome+callGenomeSegment_chromId_006_MtDNA_0000] Task initiated on local node
  [2018-03-23T16:55:10.353593Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] Completed command task: 'CallGenome+callGenomeSegment_chromId_006_MtDNA_0000' launched from sub-workflow 'CallGenome'
  [2018-03-23T16:55:10.355513Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] Launching command task: 'CallGenome+compressGenomeSegment_chromId_006_MtDNA_0000_gVCF_S1' from sub-workflow 'CallGenome'
  [2018-03-23T16:55:10.360215Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskRunner:CallGenome+compressGenomeSegment_chromId_006_MtDNA_0000_gVCF_S1] Task initiated on local node
  [2018-03-23T16:55:10.470456Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] Completed command task: 'CallGenome+compressGenomeSegment_chromId_006_MtDNA_0000_gVCF_S1' launched from sub-workflow 'CallGenome'
  [2018-03-23T16:55:10.473088Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] Launching command task: 'CallGenome+callGenomeSegment_chromId_000_I_0001' from sub-workflow 'CallGenome'
  [2018-03-23T16:55:10.479057Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskRunner:CallGenome+callGenomeSegment_chromId_000_I_0001] Task initiated on local node
  [2018-03-23T16:55:12.643132Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] Completed command task: 'CallGenome+callGenomeSegment_chromId_003_IV_0000' launched from sub-workflow 'CallGenome'
  [2018-03-23T16:55:12.645692Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] Launching command task: 'CallGenome+compressGenomeSegment_chromId_003_IV_0000_variants' from sub-workflow 'CallGenome'
  [2018-03-23T16:55:12.652749Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskRunner:CallGenome+compressGenomeSegment_chromId_003_IV_0000_variants] Task initiated on local node
  [2018-03-23T16:55:16.619594Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] Completed command task: 'CallGenome+compressGenomeSegment_chromId_003_IV_0000_variants' launched from sub-workflow 'CallGenome'
  [2018-03-23T16:55:16.621815Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] Launching command task: 'CallGenome+compressGenomeSegment_chromId_006_MtDNA_0000_variants' from sub-workflow 'CallGenome'
  [2018-03-23T16:55:16.628144Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskRunner:CallGenome+compressGenomeSegment_chromId_006_MtDNA_0000_variants] Task initiated on local node
  [2018-03-23T16:55:16.736578Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] Completed command task: 'CallGenome+compressGenomeSegment_chromId_006_MtDNA_0000_variants' launched from sub-workflow 'CallGenome'
  [2018-03-23T16:55:16.738933Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] Launching command task: 'CallGenome+compressGenomeSegment_chromId_003_IV_0000_gVCF_S1' from sub-workflow 'CallGenome'
  [2018-03-23T16:55:16.744519Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskRunner:CallGenome+compressGenomeSegment_chromId_003_IV_0000_gVCF_S1] Task initiated on local node
  [2018-03-23T16:55:18.393563Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskRunner:CallGenome+callGenomeSegment_chromId_004_V_0001] Task initiated on local node
  [2018-03-23T16:55:18.407326Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] [ERROR] Failed to complete command task: 'CallGenome+callGenomeSegment_chromId_004_V_0001' launched from sub-workflow 'CallGenome', error code: 1, command: '/home/szs315/.pyenv/versions/miniconda3-4.3.27/envs/py2-2018-03-09/share/strelka-2.9.2-0/libexec/starling2 --region V:10462091-20924180 --ref /projects/b1059/data/genomes/c_elegans/WS245/WS245.fa --max-indel-size 49 --force-output-vcf merged_strelka.vcf.gz --min-mapping-quality 20 --gvcf-output-prefix workspace/genomeSegment.tmpdir/segment.chromId_004_V_0001. --gvcf-min-gqx 15 --gvcf-min-homref-gqx 15 --gvcf-max-snv-strand-bias 10 --enable-read-backed-phasing --stats-file workspace/genomeSegment.tmpdir/runStats.chromId_004_V_0001.xml --snv-scoring-model-file /home/szs315/.pyenv/versions/miniconda3-4.3.27/envs/py2-2018-03-09/share/strelka-2.9.2-0/share/config/germlineSNVScoringModels.json --indel-scoring-model-file /home/szs315/.pyenv/versions/miniconda3-4.3.27/envs/py2-2018-03-09/share/strelka-2.9.2-0/share/config/germlineIndelScoringModels.json --align-file QG2836.bam --gvcf-skip-header --chrom-depth-file workspace/chromDepth.tsv --indel-error-models-file workspace/sequenceErrorModel.Sample000.json --theta-file /home/szs315/.pyenv/versions/miniconda3-4.3.27/envs/py2-2018-03-09/share/strelka-2.9.2-0/share/config/theta.json'
  [2018-03-23T16:55:18.407607Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] [ERROR] [CallGenome+callGenomeSegment_chromId_004_V_0001] Error Message:
  [2018-03-23T16:55:18.407677Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] [ERROR] [CallGenome+callGenomeSegment_chromId_004_V_0001] Anomalous task wrapper stderr output. Wrapper signal file: 'workspace/pyflow.data/logs/tmp/taskWrapperLogs/000/115/pyflowTaskWrapper.signal.txt'
  [2018-03-23T16:55:18.407734Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] [ERROR] [CallGenome+callGenomeSegment_chromId_004_V_0001] Logging 6 line(s) of task wrapper log output below:
  [2018-03-23T16:55:18.407790Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] [ERROR] [CallGenome+callGenomeSegment_chromId_004_V_0001] [taskWrapper-stderr] [2018-03-23T16:51:00.728318Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [pyflowTaskWrapper:CallGenome+callGenomeSegment_chromId_004_V_0001] [wrapperSignal] wrapperStart
  [2018-03-23T16:55:18.407844Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] [ERROR] [CallGenome+callGenomeSegment_chromId_004_V_0001] [taskWrapper-stderr] [2018-03-23T16:51:00.741129Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [pyflowTaskWrapper:CallGenome+callGenomeSegment_chromId_004_V_0001] [wrapperSignal] taskStart
  [2018-03-23T16:55:18.407899Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] [ERROR] [CallGenome+callGenomeSegment_chromId_004_V_0001] [taskWrapper-stderr] [2018-03-23T16:52:49.277194Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [pyflowTaskWrapper:CallGenome+callGenomeSegment_chromId_004_V_0001] [wrapperSignal] taskExitCode -6
  [2018-03-23T16:55:18.407951Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] [ERROR] [CallGenome+callGenomeSegment_chromId_004_V_0001] [taskWrapper-stderr] [2018-03-23T16:52:49.279489Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [pyflowTaskWrapper:CallGenome+callGenomeSegment_chromId_004_V_0001] [wrapperSignal] taskStderrTail 2
  [2018-03-23T16:55:18.408001Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] [ERROR] [CallGenome+callGenomeSegment_chromId_004_V_0001] [taskWrapper-stderr] Last 1 stderr lines from task (of 1 total lines):
  [2018-03-23T16:55:18.409728Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] [ERROR] [CallGenome+callGenomeSegment_chromId_004_V_0001] [taskWrapper-stderr] [2018-03-23T16:52:49.270843Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [CallGenome+callGenomeSegment_chromId_004_V_0001] starling2: /builder/src/c++/lib/starling_common/OrthogonalVariantAlleleCandidateGroup.hh:59: const AlleleIter_t& OrthogonalVariantAlleleCandidateGroup::iter(unsigned int) const: Assertion `index < size()' failed.
  [2018-03-23T16:55:18.409802Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] [ERROR] Shutting down task submission. Waiting for remaining tasks to complete.
  [2018-03-23T16:55:19.026735Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] [ERROR] Failed to complete sub-workflow task: 'CallGenome' launched from master workflow, failed sub-workflow classname: 'CallWorkflow'
  [2018-03-23T16:55:21.009055Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskRunner:CallGenome+callGenomeSegment_chromId_002_III_0000] Task initiated on local node
  [2018-03-23T16:55:21.047621Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] Completed command task: 'CallGenome+callGenomeSegment_chromId_002_III_0000' launched from sub-workflow 'CallGenome'
  [2018-03-23T16:55:21.047825Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] Completed command task: 'CallGenome+callGenomeSegment_chromId_003_IV_0001' launched from sub-workflow 'CallGenome'
  [2018-03-23T16:55:23.969515Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] Completed command task: 'CallGenome+compressGenomeSegment_chromId_003_IV_0000_gVCF_S1' launched from sub-workflow 'CallGenome'
  [2018-03-23T16:56:41.670945Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] Completed command task: 'CallGenome+callGenomeSegment_chromId_004_V_0000' launched from sub-workflow 'CallGenome'
  [2018-03-23T16:59:05.126732Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] Completed command task: 'CallGenome+callGenomeSegment_chromId_000_I_0001' launched from sub-workflow 'CallGenome'
  [2018-03-23T16:59:46.013245Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [TaskManager] Completed command task: 'CallGenome+callGenomeSegment_chromId_001_II_0000' launched from sub-workflow 'CallGenome'
  [2018-03-23T17:00:03.858058Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [WorkflowRunner] [ERROR] Workflow terminated due to the following task errors:
  [2018-03-23T17:00:03.870423Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [WorkflowRunner] [ERROR] Failed to complete command task: 'CallGenome+callGenomeSegment_chromId_004_V_0001' launched from sub-workflow 'CallGenome', error code: 1, command: '/home/szs315/.pyenv/versions/miniconda3-4.3.27/envs/py2-2018-03-09/share/strelka-2.9.2-0/libexec/starling2 --region V:10462091-20924180 --ref /projects/b1059/data/genomes/c_elegans/WS245/WS245.fa --max-indel-size 49 --force-output-vcf merged_strelka.vcf.gz --min-mapping-quality 20 --gvcf-output-prefix workspace/genomeSegment.tmpdir/segment.chromId_004_V_0001. --gvcf-min-gqx 15 --gvcf-min-homref-gqx 15 --gvcf-max-snv-strand-bias 10 --enable-read-backed-phasing --stats-file workspace/genomeSegment.tmpdir/runStats.chromId_004_V_0001.xml --snv-scoring-model-file /home/szs315/.pyenv/versions/miniconda3-4.3.27/envs/py2-2018-03-09/share/strelka-2.9.2-0/share/config/germlineSNVScoringModels.json --indel-scoring-model-file /home/szs315/.pyenv/versions/miniconda3-4.3.27/envs/py2-2018-03-09/share/strelka-2.9.2-0/share/config/germlineIndelScoringModels.json --align-file QG2836.bam --gvcf-skip-header --chrom-depth-file workspace/chromDepth.tsv --indel-error-models-file workspace/sequenceErrorModel.Sample000.json --theta-file /home/szs315/.pyenv/versions/miniconda3-4.3.27/envs/py2-2018-03-09/share/strelka-2.9.2-0/share/config/theta.json'
  [2018-03-23T17:00:03.870560Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [WorkflowRunner] [ERROR] [CallGenome+callGenomeSegment_chromId_004_V_0001] Error Message:
  [2018-03-23T17:00:03.870637Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [WorkflowRunner] [ERROR] [CallGenome+callGenomeSegment_chromId_004_V_0001] Anomalous task wrapper stderr output. Wrapper signal file: 'workspace/pyflow.data/logs/tmp/taskWrapperLogs/000/115/pyflowTaskWrapper.signal.txt'
  [2018-03-23T17:00:03.870710Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [WorkflowRunner] [ERROR] [CallGenome+callGenomeSegment_chromId_004_V_0001] Logging 6 line(s) of task wrapper log output below:
  [2018-03-23T17:00:03.870780Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [WorkflowRunner] [ERROR] [CallGenome+callGenomeSegment_chromId_004_V_0001] [taskWrapper-stderr] [2018-03-23T16:51:00.728318Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [pyflowTaskWrapper:CallGenome+callGenomeSegment_chromId_004_V_0001] [wrapperSignal] wrapperStart
  [2018-03-23T17:00:03.870850Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [WorkflowRunner] [ERROR] [CallGenome+callGenomeSegment_chromId_004_V_0001] [taskWrapper-stderr] [2018-03-23T16:51:00.741129Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [pyflowTaskWrapper:CallGenome+callGenomeSegment_chromId_004_V_0001] [wrapperSignal] taskStart
  [2018-03-23T17:00:03.871315Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [WorkflowRunner] [ERROR] [CallGenome+callGenomeSegment_chromId_004_V_0001] [taskWrapper-stderr] [2018-03-23T16:52:49.277194Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [pyflowTaskWrapper:CallGenome+callGenomeSegment_chromId_004_V_0001] [wrapperSignal] taskExitCode -6
  [2018-03-23T17:00:03.871410Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [WorkflowRunner] [ERROR] [CallGenome+callGenomeSegment_chromId_004_V_0001] [taskWrapper-stderr] [2018-03-23T16:52:49.279489Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [pyflowTaskWrapper:CallGenome+callGenomeSegment_chromId_004_V_0001] [wrapperSignal] taskStderrTail 2
  [2018-03-23T17:00:03.871484Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [WorkflowRunner] [ERROR] [CallGenome+callGenomeSegment_chromId_004_V_0001] [taskWrapper-stderr] Last 1 stderr lines from task (of 1 total lines):
  [2018-03-23T17:00:03.871565Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [WorkflowRunner] [ERROR] [CallGenome+callGenomeSegment_chromId_004_V_0001] [taskWrapper-stderr] [2018-03-23T16:52:49.270843Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [CallGenome+callGenomeSegment_chromId_004_V_0001] starling2: /builder/src/c++/lib/starling_common/OrthogonalVariantAlleleCandidateGroup.hh:59: const AlleleIter_t& OrthogonalVariantAlleleCandidateGroup::iter(unsigned int) const: Assertion `index < size()' failed.
  [2018-03-23T17:00:03.871640Z] [qnode5160.quest.it.northwestern.edu] [1533_1] [WorkflowRunner] [ERROR] Failed to complete sub-workflow task: 'CallGenome' launched from master workflow, failed sub-workflow classname: 'CallWorkflow'
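Logs like the one above are long and repetitive, so it can help to reduce them to the failing task, its region, and the assertion text. The sketch below does this with regular expressions written against the message wording visible in this log; the function name is hypothetical and the patterns are assumptions rather than a documented pyflow log format.

```python
import re
from typing import Dict, Iterable, List

# Patterns follow the wording seen in the log above (an assumption).
FAILED_TASK = re.compile(r"Failed to complete command task: '([^']+)'")
REGION = re.compile(r"--region (\S+)")
ASSERTION = re.compile(r"Assertion `([^']+)' failed")


def summarize_failures(log_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect unique failed task names, their regions, and assertion texts."""
    summary: Dict[str, List[str]] = {"tasks": [], "regions": [], "assertions": []}

    def add(key: str, value: str) -> None:
        if value not in summary[key]:
            summary[key].append(value)

    for line in log_lines:
        task = FAILED_TASK.search(line)
        if task:
            add("tasks", task.group(1))
            # The region is taken only from the failed task's own command.
            region = REGION.search(line)
            if region:
                add("regions", region.group(1))
        assertion = ASSERTION.search(line)
        if assertion:
            add("assertions", assertion.group(1))
    return summary


# Two lines abbreviated from the log above.
log = [
    "[TaskManager] [ERROR] Failed to complete command task: "
    "'CallGenome+callGenomeSegment_chromId_004_V_0001' launched from sub-workflow "
    "'CallGenome', error code: 1, command: 'starling2 --region V:10462091-20924180 --ref WS245.fa'",
    "[taskWrapper-stderr] starling2: OrthogonalVariantAlleleCandidateGroup.hh:59: "
    "Assertion `index < size()' failed.",
]
print(summarize_failures(log))
```

With the failing region known, the forced-genotype VCF records in that interval are a natural place to start looking.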

Assertion `callRegion.end > callRegion.begin' failed

Hello,

where would I start to debug this error ?

The call is:

    $STRELKA_DIR/configureStrelkaSomaticWorkflow.py \
      --targeted \
      --normalBam $STRELKA_BAM_CONTROL \
      --tumorBam $STRELKA_BAM_TUMOR_DIR_P1/$STRELKA_INPUT/$STRELKA_BAM_TUMOR_DIR_P2 \
      --referenceFasta $STRELKA_FASTA \
      --callRegions $STRELKA_CALL_REGION \
      --runDir ./${STRELKA_INPUT}_${STRELKA_OUTPUT_DIR}

So this is a somatic call using a --callRegions BED file. The BED file is built to contain only exon locations in GRCh38 v91 (bgzip-compressed and tabix-indexed). The file was built following something like this: http://crazyhottommy.blogspot.ca/2013/05/find-exons-introns-and-intergenic.html

From the pyflowTaskWrapper.signal.txt error file:
[2018-03-13T21:55:38.141622] [cp2329.m] [15635_1] [WorkflowRunner] [ERROR] [CallGenome+callGenomeSegment_chromId_002_11_0008] [taskWrapper-stderr] [2018-03-13T20:30:01.442615] [cp2329.m] [15635_1] [CallGenome+callGenomeSegment_chromId_002_11_0008] strelka2: /builder/src/c++/lib/starling_common/starling_pos_processor_util.cpp:104: void getSubRegionsFromBedTrack(const string&, const string&, const known_pos_range2&, std::vector<known_pos_range2>&): Assertion `callRegion.end > callRegion.begin' failed.
[2018-03-13T21:55:58.998848] [cp2329.m] [15635_1] [WorkflowRunner] [ERROR] Failed to complete sub-workflow task: 'CallGenome' launched from master workflow, failed sub-workflow classname: 'CallWorkflow'

Any help would be appreciated,

Thanks,

B.
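The assertion text suggests a region whose end is not greater than its start. Whether the BED file is the actual cause here is an assumption, but a first debugging step could be to scan it for empty or inverted intervals. The helper name below is hypothetical.

```python
from typing import Iterable, List, Tuple


def find_empty_intervals(bed_lines: Iterable[str]) -> List[Tuple[int, str, int, int]]:
    """Return (line number, chrom, start, end) for records with end <= start."""
    bad: List[Tuple[int, str, int, int]] = []
    for number, line in enumerate(bed_lines, start=1):
        # Skip optional BED header/comment lines.
        if line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:  # blank or malformed line
            continue
        start, end = int(fields[1]), int(fields[2])
        if end <= start:
            bad.append((number, fields[0], start, end))
    return bad


# Made-up records for illustration only.
example_bed = ["11\t100\t200", "11\t300\t300", "11\t500\t450"]
print(find_empty_intervals(example_bed))
# prints: [(2, '11', 300, 300), (3, '11', 500, 450)]
```

The uncompressed BED can be passed in as an open file handle before it is bgzip-compressed and indexed.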

Confusing version numbers (strelka "1")

Hi

When downloading the tar.gz from ftp://ftp.illumina.com/v1-branch/v1.0.14/ we get a strelka_workflow-1.0.14 directory with

 $ cat version.txt 
1.0.14

But then the "main" c++ file is called strelka/src/bin/strelka2.cpp and in strelka/src/lib/strelka/strelka_info.hh:

    const char* version() const {
        static const char VERSION[] = "2.0.17.strelka1";
        return VERSION;
    }

What does all this mean?

Thanks

CC: @julia326

Different results after upgrading from Strelka2.7.1 to 2.9.2

Dear developers,

After I upgraded from Strelka 2.7.1 to 2.9.2 (via conda), I am getting quite different somatic indel calls. Even though the total number is close (200 vs 217), only 14 overlap when I compare coordinates. I also checked the configureStrelkaSomaticWorkflow files; they are identical, the only difference being that 2.9.2 has the "maxIndelSize = 49" parameter.

What could be the reason for that?

Thank you

Vlad
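A comparison like the one described can be made reproducible by keying each call on chromosome, position, REF and ALT and intersecting the two sets. This is a minimal sketch with a hypothetical helper name, assuming plain-text VCF lines. Indels that are represented differently between versions (for example, not left-aligned the same way) would not match under this key.

```python
from typing import Iterable, Set, Tuple

VariantKey = Tuple[str, int, str, str]


def variant_keys(vcf_lines: Iterable[str], pass_only: bool = True) -> Set[VariantKey]:
    """Collect (chrom, pos, ref, alt) keys from VCF records."""
    keys: Set[VariantKey] = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt, _qual, flt = line.rstrip("\n").split("\t")[:7]
        if pass_only and flt != "PASS":
            continue
        # Multi-allelic records contribute one key per ALT allele.
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


# Made-up records for illustration only.
old_calls = ["chr1\t100\t.\tCA\tC\t.\tPASS\tSOMATIC", "chr1\t200\t.\tG\tGT\t.\tPASS\tSOMATIC"]
new_calls = ["chr1\t100\t.\tCA\tC\t.\tPASS\tSOMATIC", "chr1\t300\t.\tT\tTA\t.\tPASS\tSOMATIC"]
shared = variant_keys(old_calls) & variant_keys(new_calls)
print(len(shared))  # prints: 1
```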

"PASS" somatic variant has no supporting reads

Hi
Here is a record output by Strelka in somatic calling mode:
chr16 50327364 . C T . PASS SOMATIC;QSS=53;TQSS=1;NT=ref;QSS_NT=53;TQSS_NT=1;SGT=CC->CC;DP=52;MQ=60.00;MQ0=0;ReadPosRankSum=0.00;SNVSB=0.00;SomaticEVS=20.72 DP:FDP:SDP:SUBDP:AU:CU:GU:TU 51:0:0:0:0,0:47,48:0,0:4,4 0:0:0:0:0,0:0,0:0,0:0,0
The last field corresponds to the read count information for the tumor sample. However, it contains no reads at all, yet the call is marked as a "PASS" somatic SNV.
How does this happen?
My Strelka version is strelka-2.7.1.centos5_x86_64.

thanks!
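The per-base counts in such a record can be pulled out programmatically. The sketch below reads the AU/CU/GU/TU values, each of which holds a "tier1,tier2" pair, and returns the tier 1 count per base. The helper name is hypothetical, and the two sample strings are copied from the record above.

```python
from typing import Dict


def tier1_counts(format_field: str, sample_field: str) -> Dict[str, int]:
    """Map each base to its tier 1 count from the AU/CU/GU/TU sample values."""
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    # Each value looks like 'tier1,tier2'; keep the first number.
    return {base: int(values[base + "U"].split(",")[0]) for base in "ACGT"}


fmt = "DP:FDP:SDP:SUBDP:AU:CU:GU:TU"
first_sample = "51:0:0:0:0,0:47,48:0,0:4,4"
last_sample = "0:0:0:0:0,0:0,0:0,0:0,0"
print(tier1_counts(fmt, first_sample))  # prints: {'A': 0, 'C': 47, 'G': 0, 'T': 4}
print(tier1_counts(fmt, last_sample))   # prints: {'A': 0, 'C': 0, 'G': 0, 'T': 0}
```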

No variants reported from RNA-seq data

Hi,

I am trying to run Strelka germline calling on an RNA-seq dataset. I have successfully run it on a different dataset and had variants called, but for this other dataset I am getting 0 variants reported in the variants.vcf.gz file (for all the samples in the dataset). There are also no errors or warnings reported.

Can you please look into this and let me know what could be the issue? Here is my command:

# configure workflow
configureStrelkaGermlineWorkflow.py --bam=sample_1stpass.bam --referenceFasta=Homo_sapiens_assembly19.fasta --rna --runDir=outdir

# run workflow
outdir/runWorkflow.py -m local -j 8

For your reference, I have put the bam, fasta reference and results folder on this s3 bucket: s3://strelka-test/

P.S.: There are two bam files in the bucket, one has the suffix _1stpass.bam that successfully runs but has no variants reported. The other has the suffix .deduped.sorted.bam which actually throws an error while running. I'll post that as a separate issue.

Please let me know if you need any other information.

Thanks,
Komal

All somatic SNVs filtered for some reason (mostly as "LowEVS" or "IRC")

We are trying out Strelka2 for the first time. The indel results seem plausible, but all SNVs fail the filtering steps (almost all are marked "LowEVS", "IRC", or both).

Any ideas on what might be going wrong here?

Here are some example records from variants that we called by both Mutect and Varscan:

chr3	196487542	.	T	A	.	IRC	DP=161;MQ=60;MQ0=0;NT=ref;QSS=94;QSS_NT=94;ReadPosRankSum=-0.15;SGT=TT->AT;SNVSB=0;SOMATIC;SomaticEVS=42.38;TQSS=2;TQSS_NT=2	GT:DP:FDP:SDP:SUBDP:AU:CU:GU:TU	./.:42:0:0:0:0,0:0,0:0,0:42,42	./.:118:2:0:0:29,29:0,0:0,0:87,90
chr5	88195147	.	G	A	.	IRC	DP=79;MQ=60;MQ0=0;NT=ref;QSS=102;QSS_NT=102;ReadPosRankSum=-0.64;SGT=GG->AG;SNVSB=0;SOMATIC;SomaticEVS=44.82;TQSS=1;TQSS_NT=1	GT:DP:FDP:SDP:SUBDP:AU:CU:GU:TU	./.:42:0:0:0:0,0:0,0:42,42:0,0	./.:37:0:0:0:13,13:0,0:24,24:0,0
chr5	103555912	.	T	C	.	IRC	DP=116;MQ=60;MQ0=0;NT=ref;QSS=83;QSS_NT=83;ReadPosRankSum=0.32;SGT=TT->CT;SNVSB=0;SOMATIC;SomaticEVS=38.06;TQSS=1;TQSS_NT=1	GT:DP:FDP:SDP:SUBDP:AU:CU:GU:TU	./.:46:0:0:0:0,0:0,0:0,0:46,47	./.:69:0:0:0:0,0:13,13:0,0:56,56
chr5	132543517	.	G	C	.	IRC	DP=94;MQ=60;MQ0=0;NT=ref;QSS=78;QSS_NT=78;ReadPosRankSum=-0.48;SGT=GG->CG;SNVSB=0;SOMATIC;SomaticEVS=41.82;TQSS=1;TQSS_NT=1	GT:DP:FDP:SDP:SUBDP:AU:CU:GU:TU	./.:33:0:0:0:0,0:0,0:33,33:0,0	./.:61:0:0:0:0,0:13,13:48,48:0,0
chr9	83867656	.	G	C	.	IRC	DP=69;MQ=45.9;MQ0=2;NT=ref;QSS=68;QSS_NT=68;ReadPosRankSum=-0.14;SGT=GG->CG;SNVSB=0;SOMATIC;SomaticEVS=28.74;TQSS=1;TQSS_NT=1	GT:DP:FDP:SDP:SUBDP:AU:CU:GU:TU	./.:22:0:0:0:0,0:0,0:22,22:0,0	./.:45:0:0:0:0,0:9,11:36,36:0,0
chr19	10154788	.	G	A	.	IRC	DP=273;MQ=60;MQ0=0;NT=ref;QSS=130;QSS_NT=130;ReadPosRankSum=1.09;SGT=GG->AG;SNVSB=0;SOMATIC;SomaticEVS=39.73;TQSS=1;TQSS_NT=1	GT:DP:FDP:SDP:SUBDP:AU:CU:GU:TU	./.:94:0:0:0:0,0:0,0:94,96:0,0	./.:177:0:0:0:37,37:0,0:138,138:2,2
chr20	409866	.	T	A	.	IRC	DP=263;MQ=59.94;MQ0=0;NT=ref;QSS=103;QSS_NT=103;ReadPosRankSum=-2.67;SGT=TT->AT;SNVSB=0;SOMATIC;SomaticEVS=36.96;TQSS=1;TQSS_NT=1	GT:DP:FDP:SDP:SUBDP:AU:CU:GU:TU	./.:78:0:0:0:0,0:1,1:0,0:77,78	./.:184:0:0:0:35,35:0,0:0,0:149,149
chr20	33667668	.	G	A	.	IRC	DP=405;MQ=60;MQ0=0;NT=ref;QSS=159;QSS_NT=157;ReadPosRankSum=0.49;SGT=GG->AG;SNVSB=0;SOMATIC;SomaticEVS=36.88;TQSS=2;TQSS_NT=2	GT:DP:FDP:SDP:SUBDP:AU:CU:GU:TU	./.:137:0:0:0:0,0:0,0:137,138:0,0	./.:264:3:0:0:54,54:0,0:207,213:0,0
chrX	100408064	.	G	C	.	IRC	DP=393;MQ=60;MQ0=0;NT=ref;QSS=158;QSS_NT=157;ReadPosRankSum=0.54;SGT=GG->CG;SNVSB=0;SOMATIC;SomaticEVS=37.41;TQSS=1;TQSS_NT=1	GT:DP:FDP:SDP:SUBDP:AU:CU:GU:TU	./.:103:0:0:0:0,0:0,0:103,103:0,0	./.:288:2:0:0:0,0:71,74:215,216:0,0
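
To see how the failures are distributed, the FILTER column can be tallied across the whole file. The following is a minimal Python sketch, assuming tab-separated VCF text. The first two sample records are truncated copies of the examples above; the last two are made up for illustration, and the function name is hypothetical.

```python
from collections import Counter


def tally_filters(vcf_lines):
    """Count how often each FILTER value occurs across VCF records."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        filter_field = line.split("\t")[6]
        # A record may carry several filters separated by semicolons.
        counts.update(filter_field.split(";"))
    return counts


records = [
    "chr3\t196487542\t.\tT\tA\t.\tIRC",   # from the examples, first 7 columns
    "chr5\t88195147\t.\tG\tA\t.\tIRC",    # from the examples, first 7 columns
    "chrN\t1\t.\tA\tC\t.\tLowEVS;IRC",    # made-up record
    "chrN\t2\t.\tA\tC\t.\tPASS",          # made-up record
]
filter_counts = tally_filters(records)
print(filter_counts.most_common())
```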

Too many somatic mutations in the vcf output

Hi,
I want to use Strelka to call somatic SNVs and indels in my WGS data. I have already called somatic SNVs with Mutect, which reports ~8000 somatic SNVs.
However, there are about 100,000 indels in results/variants/somatic.indel.vcf and about 150,000 SNVs in results/variants/somatic.snv.vcf, which is clearly far too many somatic mutations. Since the workflow is easy to run (I have to say it is the most convenient and user-friendly software I have used, THANKS!), I don't think my command line is wrong, so there must be some filtering that I have not applied to the final results. I saw on this page (https://sites.google.com/site/strelkasomaticvariantcaller/home/somatic-variant-output) that there should be an all.somatic.vcf and a pass.somatic.vcf in the variants directory. I wonder whether my output VCFs with so many somatic mutations correspond to all.somatic.vcf, but I cannot find pass.somatic.vcf. Could you give me some advice?
Thanks a lot!
Yang
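
If the output files hold both passing and filtered records, the passing subset can be selected on the FILTER column. This is a minimal Python sketch, assuming standard tab-separated VCF text. The function name and the sample records are made up for illustration.

```python
def passing_records(vcf_lines):
    """Keep header lines plus records whose FILTER column is exactly PASS."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # preserve the header
        elif line.strip() and line.split("\t")[6] == "PASS":
            kept.append(line)
    return kept


# Made-up records for illustration only.
all_lines = [
    "##fileformat=VCFv4.1",
    "chr1\t100\t.\tA\tG\t.\tPASS\tSOMATIC",
    "chr1\t200\t.\tC\tT\t.\tLowEVS\tSOMATIC",
    "chr2\t300\t.\tG\tA\t.\tPASS\tSOMATIC",
]
pass_lines = passing_records(all_lines)
print(len(pass_lines) - 1, "passing records")
```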

Stuck on IndelErrorModel::checkSampleIndex

Hello, I've run the somatic workflow in Strelka2 version 2.9.2 on a tumour-only-calling assay, but I get the following error: "Requested indel error rates for sample index 1 when only 1 samples are defined", "FATAL_ERROR: 2018-May-10 22:01:06 /builder/src/c++/lib/calibration/IndelErrorModel.cpp(136): Throw in function void IndelErrorModel::checkSampleIndex(unsigned int) const". See the pyflow_log.txt snippet below. I don't understand what the problem is.

Here are the commands I'm running:

 nohup ~/anaconda3/envs/mutation_detection/share/strelka-2.9.2-0/bin/configureStrelkaSomaticWorkflow.py \
  --exome \
  --tumorBam ~/Data/A/exome/A_004_UNAL_Exome_HHYWCCCXY_L2_pe_sorted_filtered_deduplicated.bam \
  --referenceFasta ~/mapping_tophat/index/bwa_GRCh37/GCF_000001405.25_GRCh37.p13_genomic.fna \
  --indelCandidates ~/Data/A/exome/manta/results/variants/candidateSmallIndels.vcf.gz \
  --runDir ~/Data/A/exome/strelka 2> config_strelka.log &

Here are the last few lines of the 'pyflow_log.txt' file from my second try on one of the samples:
[2018-05-12T18:18:34.711111Z] [localhost.localdomain] [112359_1] [WorkflowRunner] [ERROR] [2018-05-12T18:18:12.170792Z] [localhost.localdomain] [112359_1] [CallGenome+callGenomeSegment_chromId_000_NC_000001_10_0007] bam record RNAME: NC_000001.10
[2018-05-12T18:18:34.711111Z] [localhost.localdomain] [112359_1] [WorkflowRunner] [ERROR] [2018-05-12T18:18:12.229205Z] [localhost.localdomain] [112359_1] [CallGenome+callGenomeSegment_chromId_000_NC_000001_10_0007] bam record POS: 83094170
[2018-05-12T18:18:34.711111Z] [localhost.localdomain] [112359_1] [WorkflowRunner] [ERROR] [2018-05-12T18:18:12.270980Z] [localhost.localdomain] [112359_1] [CallGenome+callGenomeSegment_chromId_000_NC_000001_10_0007] FATAL_ERROR: 2018-May-12 13:18:11 /builder/src/c++/lib/calibration/IndelErrorModel.cpp(136): Throw in function void IndelErrorModel::checkSampleIndex(unsigned int) const
[2018-05-12T18:18:34.711111Z] [localhost.localdomain] [112359_1] [WorkflowRunner] [ERROR] [2018-05-12T18:18:12.287534Z] [localhost.localdomain] [112359_1] [CallGenome+callGenomeSegment_chromId_000_NC_000001_10_0007] Dynamic exception type: boost::exception_detail::clone_impl<illumina::common::GeneralException>
[2018-05-12T18:18:34.711111Z] [localhost.localdomain] [112359_1] [WorkflowRunner] [ERROR] [2018-05-12T18:18:12.337546Z] [localhost.localdomain] [112359_1] [CallGenome+callGenomeSegment_chromId_000_NC_000001_10_0007] std::exception::what: Requested indel error rates for sample index 1 when only 1 samples are defined
[2018-05-12T18:18:34.711111Z] [localhost.localdomain] [112359_1] [WorkflowRunner] [ERROR] [2018-05-12T18:18:12.370861Z] [localhost.localdomain] [112359_1] [CallGenome+callGenomeSegment_chromId_000_NC_000001_10_0007]
[2018-05-12T18:18:34.711111Z] [localhost.localdomain] [112359_1] [WorkflowRunner] [ERROR] [2018-05-12T18:18:12.404326Z] [localhost.localdomain] [112359_1] [CallGenome+callGenomeSegment_chromId_000_NC_000001_10_0007] cmdline: /home/diaamayaram/anaconda3/envs/mutation_detection/share/strelka-2.9.2-0/libexec/strelka2 --region NC_000001.10:83083544-94952620 --ref /home/diaamayaram/mapping_tophat/index/bwa_GRCh37/GCF_000001405.25_GRCh37.p13_genomic.fna --max-indel-size 49 --candidate-indel-input-vcf /home/diaamayaram/Data/A/exome/manta/results/variants/candidateSmallIndels.vcf.gz --min-mapping-quality 20 --somatic-snv-rate 0.0001 --shared-site-error-rate 0.0000000005 --shared-site-error-strand-bias-fraction 0.0 --somatic-indel-rate 0.000001 --shared-indel-error-factor 2.2 --tier2-min-mapping-quality 0 --strelka-snv-max-filtered-basecall-frac 0.4 --strelka-snv-max-spanning-deletion-frac 0.75 --strelka-snv-min-qss-ref 15 --strelka-indel-max-window-filtered-basecall-frac 0.3 --strelka-indel-min-qsi-ref 40 --ssnv-contam-tolerance 0.15 --indel-contam-tolerance 0.15 --somatic-snv-scoring-model-file /home/diaamayaram/anaconda3/envs/mutation_detection/share/strelka-2.9.2-0/share/config/somaticSNVScoringModels.json --somatic-indel-scoring-model-file /home/diaamayaram/anaconda3/envs/mutation_detection/share/strelka-2.9.2-0/share/config/somaticIndelScoringModels.json --tumor-align-file /home/diaamayaram/Data/A/exome/A_004_UNAL_Exome_HHYWCCCXY_L2_pe_sorted_filtered_deduplicated.bam --somatic-snv-file /home/diaamayaram/Data/A/exome/strelka/workspace/genomeSegment.tmpdir/somatic.snvs.unfiltered.chromId_000_NC_000001_10_0007.vcf --somatic-indel-file /home/diaamayaram/Data/A/exome/strelka/workspace/genomeSegment.tmpdir/somatic.indels.unfiltered.chromId_000_NC_000001_10_0007.vcf --stats-file /home/diaamayaram/Data/A/exome/strelka/workspace/genomeSegment.tmpdir/runStats.chromId_000_NC_000001_10_0007.xml --strelka-skip-header
[2018-05-12T18:18:34.711111Z] [localhost.localdomain] [112359_1] [WorkflowRunner] [ERROR] [2018-05-12T18:18:12.437466Z] [localhost.localdomain] [112359_1] [CallGenome+callGenomeSegment_chromId_000_NC_000001_10_0007] version: 2.9.2
[2018-05-12T18:18:34.711111Z] [localhost.localdomain] [112359_1] [WorkflowRunner] [ERROR] [2018-05-12T18:18:12.462577Z] [localhost.localdomain] [112359_1] [CallGenome+callGenomeSegment_chromId_000_NC_000001_10_0007] buildTime: 2018-03-02T22:08:15.960987Z
[2018-05-12T18:18:34.711111Z] [localhost.localdomain] [112359_1] [WorkflowRunner] [ERROR] [2018-05-12T18:18:12.545928Z] [localhost.localdomain] [112359_1] [CallGenome+callGenomeSegment_chromId_000_NC_000001_10_0007] compiler: g++-6.3.1

Thanks in advance for your help

Higher number of somatic variants on strelka2 vs strelka

I am evaluating strelka2 vs strelka on tumor/normal pairs, and find that the number of missense variants called by strelka2 is almost 2x the number on strelka for the same samples. Is strelka2 more sensitive, and less specific than strelka? Is using a filter on SomaticEVS recommended to filter out lower confidence variants? I noticed that "LowEVS" filters out variants with low SomaticEVS (< 6?), and was wondering if this could be changed to make it more stringent.
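
A stricter cutoff on SomaticEVS can also be applied downstream of the caller without changing its configuration. The following is a minimal Python sketch, assuming tab-separated VCF text with SomaticEVS in the INFO column. The function names, the sample records, and the cutoff of 15 are all hypothetical and are not recommended values.

```python
def somatic_evs(record):
    """Return the SomaticEVS value from the INFO column, or None if absent."""
    info = record.split("\t")[7]
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "SomaticEVS":
            return float(value)
    return None


def keep_high_evs(vcf_lines, min_evs):
    """Keep records whose SomaticEVS is at least min_evs; headers are dropped."""
    kept = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        evs = somatic_evs(line)
        if evs is not None and evs >= min_evs:
            kept.append(line)
    return kept


# Made-up records for illustration only.
records = [
    "chr1\t100\t.\tA\tG\t.\tPASS\tSOMATIC;SomaticEVS=7.28",
    "chr1\t200\t.\tC\tT\t.\tPASS\tSOMATIC;SomaticEVS=20.72",
    "chr2\t300\t.\tG\tA\t.\tPASS\tSOMATIC",
]
strict = keep_high_evs(records, min_evs=15.0)  # hypothetical cutoff
print(len(strict))
```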

Detecting indels > 50 bases in length

From the user guide:
"Strelka is capable of detecting SNVs and indels up to a predefined maximum size, currently defaulting to 50 bases or less. "

Is there a way to increase the default limit of 50?

demo directory not present with conda installation

Hi,

I am trying to install Strelka using conda:

$ conda install -c bioconda strelka

$ bash runStrelkaSomaticWorkflowDemo.bash

**** Starting demo configuration and run.
**** Configuration cmd: './configureStrelkaSomaticWorkflow.py --tumorBam='./../share/demo/strelka/data/NA12891_demo20.bam' --normalBam='./../share/demo/strelka/data/NA12892_demo20.bam' --referenceFasta='./../share/demo/strelka/data/demo20.fa' --callMemMb=1024 --exome --runDir=./strelkaSomaticDemoAnalysis'

Usage: configureStrelkaSomaticWorkflow.py [options]

configureStrelkaSomaticWorkflow.py: error: Can't find reference fasta file: '/mnt/isilon/cbmi/variome/rathik/tools/miniconda3/envs/mut-env/share/demo/strelka/data/demo20.fa'

ERROR: Demo configuration step failed

But there is no demo directory under share:

ls /mnt/isilon/cbmi/variome/rathik/tools/miniconda3/envs/mut-env/share/

doc  info  man  readline  strelka-2.9.2-0  tabset  terminfo

Thanks!!

Trouble reading VCF file with --ploidy option

Hi there,

I'm doing some variant calling on mitochondrial reads, and I want to specify that the MT chromosome is haploid using the --ploidy argument.

The config script runs fine, but the workflow script is failing to read the vcf file. My vcf file is attached (I set it up based on the example on the README).

[CallGenome+callGenomeSegment_chromId_000_1_0013] ERROR: Failed to load header for VCF file: '/data/nextseq-validation/benchmark-mito/benchmark_afk/ploidy_MT.vcf.gz'

Thanks,
Alex Koeppel

P.S. My apologies if I've just made an error creating the VCF file. I tried a number of different modifications but all resulted in the same error.

ploidy_MT.vcf.gz

@stephenturner

Stuck on CallGenome+callGenomeSegment_chromId_024_chrM_0000

I've run the somatic workflow in Strelka2 version 2.9.0 on 32 cancer cell lines against an unmatched normal cell line on an SGE cluster using 16 cores per run. 24 of the runs finished and produced the expected output but 8 of the runs seemed to get hung up on the 'CallGenome+callGenomeSegment_chromId_024_chrM_0000' task (see the pyflow_log.txt snippet below). I've let this step run for up to ~35 hours without it completing. I tried a second time to run these 8 samples and am still seeing the same thing. I ran Manta on these samples to generate indel candidates and it completed successfully. All 32 samples I aligned/processed in the same way.

Here are the commands I'm running:

 python $STRELKA_INSTALL_PATH/bin/configureStrelkaSomaticWorkflow.py \
  --normalBam $bamDir/$normalBam \
  --tumorBam $bamDir/$tumorBam \
  --referenceFasta $refSequence \
  --indelCandidates $outDir/Manta/$tumorPrefix/results/variants/candidateSmallIndels.vcf.gz \
  --runDir $outDir/Strelka/$tumorPrefix

  python $outDir/Strelka/$tumorPrefix/runWorkflow.py -m local -j 16 -g 24

Here are the last few lines of the 'pyflow_log.txt' file from my second try on one of the samples:

[2018-04-06T05:33:47.643303] [compute-094.cm.cluster] [31335_1] [TaskRunner:CallGenome+compressSegmentOutput_chromId_001_chr2_0019] Task initiated on local node
[2018-04-06T05:33:52.541374] [compute-094.cm.cluster] [31335_1] [TaskManager] Completed command task: 'CallGenome+compressSegmentOutput_chromId_005_chr6_0000' launched from sub-workflow 'CallGenome'
[2018-04-06T05:33:58.495792] [compute-094.cm.cluster] [31335_1] [TaskManager] Completed command task: 'CallGenome+compressSegmentOutput_chromId_003_chr4_0014' launched from sub-workflow 'CallGenome'
[2018-04-06T05:34:05.675754] [compute-094.cm.cluster] [31335_1] [TaskManager] Completed command task: 'CallGenome+callGenomeSegment_chromId_001_chr2_0011' launched from sub-workflow 'CallGenome'
[2018-04-06T05:34:10.914164] [compute-094.cm.cluster] [31335_1] [TaskManager] Launching command task: 'CallGenome+compressSegmentOutput_chromId_001_chr2_0011' from sub-workflow 'CallGenome'
[2018-04-06T05:34:16.381824] [compute-094.cm.cluster] [31335_1] [TaskManager] Completed command task: 'CallGenome+compressSegmentOutput_chromId_022_chrX_0009' launched from sub-workflow 'CallGenome'
[2018-04-06T05:34:22.054163] [compute-094.cm.cluster] [31335_1] [TaskRunner:CallGenome+compressSegmentOutput_chromId_001_chr2_0011] Task initiated on local node
[2018-04-06T05:34:27.987441] [compute-094.cm.cluster] [31335_1] [TaskManager] Completed command task: 'CallGenome+compressSegmentOutput_chromId_001_chr2_0019' launched from sub-workflow 'CallGenome'
[2018-04-06T05:34:53.859932] [compute-094.cm.cluster] [31335_1] [TaskManager] Completed command task: 'CallGenome+compressSegmentOutput_chromId_001_chr2_0011' launched from sub-workflow 'CallGenome'
[2018-04-06T05:36:01.709591] [compute-094.cm.cluster] [31335_1] [TaskManager] Completed command task: 'CallGenome+callGenomeSegment_chromId_008_chr9_0007' launched from sub-workflow 'CallGenome'
[2018-04-06T05:36:07.217912] [compute-094.cm.cluster] [31335_1] [TaskManager] Launching command task: 'CallGenome+compressSegmentOutput_chromId_008_chr9_0007' from sub-workflow 'CallGenome'
[2018-04-06T05:36:16.997267] [compute-094.cm.cluster] [31335_1] [TaskRunner:CallGenome+compressSegmentOutput_chromId_008_chr9_0007] Task initiated on local node
[2018-04-06T05:36:44.223167] [compute-094.cm.cluster] [31335_1] [TaskManager] Completed command task: 'CallGenome+compressSegmentOutput_chromId_008_chr9_0007' launched from sub-workflow 'CallGenome'
[2018-04-06T06:27:44.617031] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] ===== StrelkaSomaticWorkflow StatusUpdate =====
[2018-04-06T06:27:50.717555] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Workflow specification is complete?: True
[2018-04-06T06:27:56.745874] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Task status (waiting/queued/running/complete/error): 8/0/1/572/0
[2018-04-06T06:28:03.932609] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Longest ongoing queued task time (hrs): 0.0000
[2018-04-06T06:28:10.431732] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Longest ongoing queued task name: ''
[2018-04-06T06:28:17.494200] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Longest ongoing running task time (hrs): 2.6058
[2018-04-06T06:28:25.181594] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Longest ongoing running task name: 'CallGenome+callGenomeSegment_chromId_024_chrM_0000'
[2018-04-06T07:28:39.405537] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] ===== StrelkaSomaticWorkflow StatusUpdate =====
[2018-04-06T07:28:44.476434] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Workflow specification is complete?: True
[2018-04-06T07:28:50.662695] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Task status (waiting/queued/running/complete/error): 8/0/1/572/0
[2018-04-06T07:28:56.250609] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Longest ongoing queued task time (hrs): 0.0000
[2018-04-06T07:29:01.175093] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Longest ongoing queued task name: ''
[2018-04-06T07:29:06.575173] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Longest ongoing running task time (hrs): 3.6211
[2018-04-06T07:29:13.182473] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Longest ongoing running task name: 'CallGenome+callGenomeSegment_chromId_024_chrM_0000'
[2018-04-06T08:29:23.498677] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] ===== StrelkaSomaticWorkflow StatusUpdate =====
[2018-04-06T08:29:27.801686] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Workflow specification is complete?: True
[2018-04-06T08:29:32.924974] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Task status (waiting/queued/running/complete/error): 8/0/1/572/0
[2018-04-06T08:29:38.404786] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Longest ongoing queued task time (hrs): 0.0000
[2018-04-06T08:29:43.035196] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Longest ongoing queued task name: ''
[2018-04-06T08:29:48.820011] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Longest ongoing running task time (hrs): 4.6333
[2018-04-06T08:29:53.142311] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Longest ongoing running task name: 'CallGenome+callGenomeSegment_chromId_024_chrM_0000'
[2018-04-06T09:30:04.562687] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] ===== StrelkaSomaticWorkflow StatusUpdate =====
[2018-04-06T09:30:08.188054] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Workflow specification is complete?: True
[2018-04-06T09:30:12.008269] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Task status (waiting/queued/running/complete/error): 8/0/1/572/0
[2018-04-06T09:30:16.135301] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Longest ongoing queued task time (hrs): 0.0000
[2018-04-06T09:30:20.814257] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Longest ongoing queued task name: ''
[2018-04-06T09:30:24.682400] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Longest ongoing running task time (hrs): 5.6447
[2018-04-06T09:30:27.858536] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Longest ongoing running task name: 'CallGenome+callGenomeSegment_chromId_024_chrM_0000'
[2018-04-06T10:30:35.991724] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] ===== StrelkaSomaticWorkflow StatusUpdate =====
[2018-04-06T10:30:39.894089] [compute-094.cm.cluster] [31335_1] [WorkflowRunner] [StatusUpdate] Workflow specification is complete?: True

Here are some of the software versions I'm using:
python version: 2.7.9
pyFlow version: 1.1.19
strelka2 version: 2.9.0

Low QUAL yet PASS

Hi strelka team,

Thank you for developing a good tool 🥇.

I have a question. I ran strelka (v2.8.4) on a single sample whole-genome sequence obtained with Illumina X Ten. I noticed that there were some variants with low QUAL (< 10) with FILTER of PASS. Why does this happen? I expected that strelka's filtering would not let variants with low QUAL get FILTER of PASS. Please let me know why this is the case.

Thanks in advance,
Kwat

somatic indel of high AF can't be called by strelka2, but vardictJava can, why?

Hi,
I have high-depth amplicon sequencing data from a tumor-normal sample pair. I used strelka2 to call somatic mutations in LC1801, whose negative control is LC18NC, and found that strelka2 called far fewer mutations than vardictJava. In particular, the variant "chr7 55242464 rs121913421;163343;COSM6223 AGGAATTAAGAGAAGC A", with an AF of about 10%, is missed by strelka2 but called by vardictJava. I'm confused by these results and wonder whether there are any strelka2 parameters that can be tuned for higher sensitivity. I can send you the test data and run scripts to reproduce the issue if you tell me your email; my email is [email protected]. I have also uploaded the data to Baidu Netdisk, which is very popular in China; the download link is https://pan.baidu.com/s/1AdZVKoJjXQO3ldpkUcNn2g with password: yher
Could you please look into this? Thanks a lot.

Overlapping read pairs are double counted

Hi- It seems to me when the two reads in a pair overlap because of small insert size, strelka (v2.8.3) double counts them. Shouldn't this behavior be changed? I see calls where the alternate allele is supported by, say, 6 reads of which 2 or 4 come from the same pair.

Below is an example. The A allele is supported by 4 reads:

chr11 26772548 . G A . PASS SOMATIC;QSS=38;TQSS=2;NT=ref;QSS_NT=38;TQSS_NT=2;SGT=GG->AG;DP=388;MQ=60;MQ0=0;ReadPosRankSum=0.92;SNVSB=0;SomaticEVS=7.28 DP:FDP:SDP:SUBDP:AU:CU:GU:TU 219:0:0:0:0,0:1,1:218,223:0,0 163:0:0:0:4,4:0,0:159,160:0,0       

These are the four supporting reads, which in fact are just 2 pairs with mates overlapping:

K00319:42:HGYWGBBXX:1:1211:1945:10528  99  chr11 26772499 60 73M3S = 26772504 73  GCGGG[+71] AAFFF[+71] MC:Z:68M7S MD:Z:49G23 RG:Z:TT001T01 NM:i:1 AS:i:68 XS:i:19                                                                                    
K00319:42:HGYWGBBXX:1:1211:1945:10528  147 chr11 26772504 60 68M7S = 26772499 -73 ATCAG[+70] JJJF7[+70] MC:Z:73M3S MD:Z:44G23 RG:Z:TT001T01 NM:i:1 AS:i:63 XS:i:19                                                                                    
K00319:42:HGYWGBBXX:1:1202:17178:43216 99  chr11 26772499 60 73M3S = 26772504 73  GCGGG[+71] AAFFF[+71] MC:Z:68M8S MD:Z:49G23 RG:Z:TT001T01 NM:i:1 AS:i:68 XS:i:19                                                                                    
K00319:42:HGYWGBBXX:1:1202:17178:43216 147 chr11 26772504 60 68M8S = 26772499 -73 ATCAG[+71] <JJFJ[+71] MC:Z:73M3S MD:Z:44G23 RG:Z:TT001T01 NM:i:1 AS:i:63 XS:i:19                                                                                    
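
The difference between counting reads and counting fragments can be made concrete by grouping the supporting reads on their read name. This is a minimal Python sketch using the four read names and flags listed above; it illustrates the counting only and is not how Strelka itself handles overlapping mates.

```python
# (QNAME, FLAG) for the four supporting reads shown above.
supporting_reads = [
    ("K00319:42:HGYWGBBXX:1:1211:1945:10528", 99),
    ("K00319:42:HGYWGBBXX:1:1211:1945:10528", 147),
    ("K00319:42:HGYWGBBXX:1:1202:17178:43216", 99),
    ("K00319:42:HGYWGBBXX:1:1202:17178:43216", 147),
]

# Mates of a pair share a QNAME, so distinct names give distinct fragments.
fragments = {qname for qname, _flag in supporting_reads}

print("supporting reads:", len(supporting_reads))
print("supporting fragments:", len(fragments))
```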

error about genome size and bad lexical cast when using a reference genome larger than 2^32

I installed strelka and ran the demo script without any problems, and I have successfully run it on several of my own human genome sequencing datasets. But when I use strelka on the wheat genome (15 Gbp in size), the job always fails with the following error:

[2017-11-14T02:27:54.171311] [Memmery01] [21053_1] [WorkflowRunner] [ERROR] [2017-11-14T02:27:01.344216] [Memmery01] [21053_1] [CallGenome+callGenomeSegment_chromId_000_chr1A_part1_0002] ******** COMMAND-LINE ERROR:: argument after flag -genome-size (10323499860) cannot be parsed to expected type: bad lexical cast: source type value could not be interpreted as target ********

I then tried running on each chromosome individually (with separate fasta and bam files), and there were no problems.

Any solution to this issue?
Thanks very much.
Jingzhong
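
The 2^32 limit mentioned in the title can be checked with simple arithmetic. This is a minimal Python sketch, assuming (the log does not confirm this) that the -genome-size value is parsed into a 32-bit unsigned integer. The helper name is hypothetical. The human value is the -genome-size shown in another report on this page.

```python
UINT32_MAX = 2**32 - 1  # largest value a 32-bit unsigned integer can hold


def fits_uint32(value):
    """Return True if value is representable as a 32-bit unsigned integer."""
    return 0 <= value <= UINT32_MAX


wheat_genome_size = 10323499860   # value from the error message above
human_genome_size = 3043453468    # -genome-size from a human GRCh38 run

print(UINT32_MAX)
print(fits_uint32(wheat_genome_size), fits_uint32(human_genome_size))
```

Under that assumption, the wheat value overflows while the human value does not, which would be consistent with the per-chromosome runs succeeding.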

Tumour-only calling?

Are there any plans to implement tumour-only calling, perhaps with a panel of normals as Mutect2 does? I have a batch of tumour FFPE samples with no matched normals that I'd quite like to get Strelka calls on.

Error on reads containing 'Ns'?

I'm seeing error messages relating to certain reads that seem unremarkable except that they contain N's. Like this:

Z/r/job6pcRlf    [2017-07-05T15:44:45.538328] [blade15-1-16.gsc.wustl.edu] [21_1] [TaskRunner:CallGenome+callGenomeSegment_chromId_308_chr13_KI270840v1_alt_0000] Task initiated on local node
Z/r/job6pcRlf    [2017-07-05T15:44:47.843651] [blade15-1-16.gsc.wustl.edu] [21_1] [TaskManager] [ERROR] Failed to complete command task: 'CallGenome+callGenomeSegment_chromId_308_chr13_KI270840v1_alt_0000' launched from sub-workflow 'CallGenome', error code: 1, command: '/opt/strelka/libexec/strelka2 --region chr13_KI270840v1_alt:1-191684 -filter-unanchored -min-mapping-quality 20 -min-qscore 0 --ref /gscmnt/gc2764/cad/HCC1395/arvados/refseq/GRCh38DH/GRCh38_full_analysis_set_plus_decoy_hla.fa -max-window-mismatch 3 20 -genome-size 3043453468 -max-indel-size 50 -indel-nonsite-match-prob 0.5 --somatic-snv-rate 0.0001 --shared-site-error-rate 0.0000000005 --shared-site-error-strand-bias-fraction 0.0 --somatic-indel-rate 0.000001 --shared-indel-error-factor 2.2 --tier2-min-mapping-quality 0 --tier2-mismatch-density-filter-count 10 --tier2-no-filter-unanchored --tier2-indel-nonsite-match-prob 0.25 --tier2-include-singleton --tier2-include-anomalous --strelka-snv-max-filtered-basecall-frac 0.4 --strelka-snv-max-spanning-deletion-frac 0.75 --strelka-snv-min-qss-ref 15 --strelka-indel-max-window-filtered-basecall-frac 0.3 --strelka-indel-min-qsi-ref 40 --ssnv-contam-tolerance 0.15 --indel-contam-tolerance 0.15 --somatic-snv-scoring-model-file /opt/strelka/share/config/somaticVariantScoringModels.json --normal-align-file /gscmnt/gc2736/griffithlab_gms/Breast_cfDNA/cwl_toil_runs/results/somatic/NTN001_Baseline_tumor_breast_g-dna_Exome/tmpeNG7FE/stg39e32e18-d393-4993-b773-31ed60c547b4/final.cram --tumor-align-file /gscmnt/gc2736/griffithlab_gms/Breast_cfDNA/cwl_toil_runs/results/somatic/NTN001_Baseline_tumor_breast_g-dna_Exome/tmpeNG7FE/stg9e6a8b1e-8555-4f43-997d-17db835a2ea9/final.cram --somatic-snv-file /gscmnt/gc2736/griffithlab_gms/Breast_cfDNA/cwl_toil_runs/results/somatic/NTN001_Baseline_tumor_breast_g-dna_Exome/tmpwLWDBy/workspace/genomeSegment.tmpdir/somatic.snvs.unfiltered.chromId_308_chr13_KI270840v1_alt_0000.vcf --somatic-indel-file 
/gscmnt/gc2736/griffithlab_gms/Breast_cfDNA/cwl_toil_runs/results/somatic/NTN001_Baseline_tumor_breast_g-dna_Exome/tmpwLWDBy/workspace/genomeSegment.tmpdir/somatic.indels.unfiltered.chromId_308_chr13_KI270840v1_alt_0000.vcf --stats-file /gscmnt/gc2736/griffithlab_gms/Breast_cfDNA/cwl_toil_runs/results/somatic/NTN001_Baseline_tumor_breast_g-dna_Exome/tmpwLWDBy/workspace/genomeSegment.tmpdir/runStats.chromId_308_chr13_KI270840v1_alt_0000.xml --strelka-skip-header'
Z/r/job6pcRlf    [2017-07-05T15:44:47.854048] [blade15-1-16.gsc.wustl.edu] [21_1] [TaskManager] [ERROR] [CallGenome+callGenomeSegment_chromId_308_chr13_KI270840v1_alt_0000] Error Message:
Z/r/job6pcRlf    [2017-07-05T15:44:47.876409] [blade15-1-16.gsc.wustl.edu] [21_1] [TaskManager] [ERROR] [CallGenome+callGenomeSegment_chromId_308_chr13_KI270840v1_alt_0000] Last 7 stderr lines from task (of 7 total lines):
Z/r/job6pcRlf    [2017-07-05T15:44:47.876409] [blade15-1-16.gsc.wustl.edu] [21_1] [TaskManager] [ERROR] [2017-07-05T15:44:47.700428] [blade15-1-16.gsc.wustl.edu] [21_1] [CallGenome+callGenomeSegment_chromId_308_chr13_KI270840v1_alt_0000] ERROR: unsupported base(s) in read sequence: GGGCGGGGCCGCGGGTGTGGGGCGTAGGGGGGGGGGGGGGGATGGGGGGGGCGGGGGGGGGGTGGGGGGGGGGGGGGGGGGGGGTGGGGGGGGTGTGGGGGGGGCGCGGCTGTCAGGGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCN
Z/r/job6pcRlf    [2017-07-05T15:44:47.876409] [blade15-1-16.gsc.wustl.edu] [21_1] [TaskManager] [ERROR] [2017-07-05T15:44:47.704841] [blade15-1-16.gsc.wustl.edu] [21_1] [CallGenome+callGenomeSegment_chromId_308_chr13_KI270840v1_alt_0000]   bam_stream_label: /gscmnt/gc2736/griffithlab_gms/Breast_cfDNA/cwl_toil_runs/results/somatic/NTN001_Baseline_tumor_breast_g-dna_Exome/tmpeNG7FE/stg9e6a8b1e-8555-4f43-997d-17db835a2ea9/final.cram
Z/r/job6pcRlf    [2017-07-05T15:44:47.876409] [blade15-1-16.gsc.wustl.edu] [21_1] [TaskManager] [ERROR] [2017-07-05T15:44:47.706795] [blade15-1-16.gsc.wustl.edu] [21_1] [CallGenome+callGenomeSegment_chromId_308_chr13_KI270840v1_alt_0000]   bam_stream_selected_region: chr13_KI270840v1_alt:1-191734
Z/r/job6pcRlf    [2017-07-05T15:44:47.876409] [blade15-1-16.gsc.wustl.edu] [21_1] [TaskManager] [ERROR] [2017-07-05T15:44:47.708755] [blade15-1-16.gsc.wustl.edu] [21_1] [CallGenome+callGenomeSegment_chromId_308_chr13_KI270840v1_alt_0000]   bam_stream_record_no: 2
Z/r/job6pcRlf    [2017-07-05T15:44:47.876409] [blade15-1-16.gsc.wustl.edu] [21_1] [TaskManager] [ERROR] [2017-07-05T15:44:47.712734] [blade15-1-16.gsc.wustl.edu] [21_1] [CallGenome+callGenomeSegment_chromId_308_chr13_KI270840v1_alt_0000]   bam_record QNAME/read_number: K00193:51:H75J7BBXX:6:1207:31111:34582/2
Z/r/job6pcRlf    [2017-07-05T15:44:47.876409] [blade15-1-16.gsc.wustl.edu] [21_1] [TaskManager] [ERROR] [2017-07-05T15:44:47.715511] [blade15-1-16.gsc.wustl.edu] [21_1] [CallGenome+callGenomeSegment_chromId_308_chr13_KI270840v1_alt_0000]   bam record RNAME: chr13_KI270840v1_alt
Z/r/job6pcRlf    [2017-07-05T15:44:47.876409] [blade15-1-16.gsc.wustl.edu] [21_1] [TaskManager] [ERROR] [2017-07-05T15:44:47.716993] [blade15-1-16.gsc.wustl.edu] [21_1] [CallGenome+callGenomeSegment_chromId_308_chr13_KI270840v1_alt_0000]   bam record POS: 177225
Z/r/job6pcRlf    [2017-07-05T15:44:47.904768] [blade15-1-16.gsc.wustl.edu] [21_1] [TaskManager] [ERROR] Shutting down task submission. Waiting for remaining tasks to complete.

Is this expected? Or perhaps we are misinterpreting.
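
To test whether N is really the offending character, a read sequence can be checked against an allowed alphabet. This is a minimal Python sketch; the allowed set "ACGTN" is an assumption, since the log does not say which characters Strelka accepts, and the function name and sample strings are hypothetical.

```python
def unexpected_bases(sequence, allowed="ACGTN"):
    """Return the sorted characters in sequence that are not in allowed."""
    return sorted(set(sequence) - set(allowed))


# Made-up sequences for illustration only.
print(unexpected_bases("GGGCGGNNNNCN"))   # only allowed characters
print(unexpected_bases("ACGTR=acgt"))     # IUPAC code, '=', lowercase
```

If a flagged read yields an empty list under this check, the cause may lie elsewhere, for example in how the sequence was decoded from the CRAM file.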

Can't find configure when installing from source

Hi,

I tried installing Strelka from source following the instructions here: https://github.com/Illumina/strelka/blob/master/docs/userGuide/installation.md#build-procedure but can't find configure. However, I was able to install it successfully using the instructions here: https://github.com/Illumina/strelka/blob/master/docs/userGuide/quickStart.md#installation

These are the commands I followed for installing from source:

$ wget https://github.com/Illumina/strelka/releases/download/v2.9.2/strelka-2.9.2.centos6_x86_64.tar.bz2

$ tar -xjf strelka-2.9.2.centos6_x86_64.tar.bz2

$ tree strelka-2.9.2.centos6_x86_64
.
├── bin
│   ├── configureStrelkaGermlineWorkflow.py
│   ├── configureStrelkaGermlineWorkflow.py.ini
│   ├── configureStrelkaSomaticWorkflow.py
│   ├── configureStrelkaSomaticWorkflow.py.ini
│   ├── runStrelkaGermlineWorkflowDemo.bash
│   └── runStrelkaSomaticWorkflowDemo.bash
├── lib
│   └── python
│       ├── checkChromSet.py
│       ├── checkChromSet.pyc
│       ├── configBuildTimeInfo.py
│       ├── configBuildTimeInfo.pyc
│       ├── configureOptions.py
│       ├── configureOptions.pyc
│       ├── configureUtil.py
│       ├── configureUtil.pyc
│       ├── estimateHardware.py
│       ├── estimateHardware.pyc
│       ├── makeRunScript.py
│       ├── makeRunScript.pyc
│       ├── pyflow
│       │   ├── pyflowConfig.py
│       │   ├── pyflowConfig.pyc
│       │   ├── pyflow.py
│       │   ├── pyflow.pyc
│       │   ├── pyflowTaskWrapper.py
│       │   └── pyflowTaskWrapper.pyc
│       ├── sequenceErrorCountsWorkflow.py
│       ├── sequenceErrorCountsWorkflow.pyc
│       ├── sharedWorkflow.py
│       ├── sharedWorkflow.pyc
│       ├── snoiseWorkflow.py
│       ├── snoiseWorkflow.pyc
│       ├── strelkaGermlineWorkflow.py
│       ├── strelkaGermlineWorkflow.pyc
│       ├── strelkaSequenceErrorEstimation.py
│       ├── strelkaSequenceErrorEstimation.pyc
│       ├── strelkaSharedOptions.py
│       ├── strelkaSharedOptions.pyc
│       ├── strelkaSharedWorkflow.py
│       ├── strelkaSharedWorkflow.pyc
│       ├── strelkaSomaticWorkflow.py
│       ├── strelkaSomaticWorkflow.pyc
│       ├── workflowUtil.py
│       └── workflowUtil.pyc
├── libexec
│   ├── bgzf_cat
│   ├── bgzip
│   ├── bgzip9
│   ├── cat.py
│   ├── configureSequenceErrorCountsWorkflow.py
│   ├── configureSequenceErrorCountsWorkflow.py.ini
│   ├── configureStrelkaNoiseWorkflow.py
│   ├── configureStrelkaNoiseWorkflow.py.ini
│   ├── DumpSequenceErrorCounts
│   ├── EstimateParametersFromErrorCounts
│   ├── EstimateVariantErrorRates
│   ├── extractSmallIndelCandidates.py
│   ├── GetChromDepth
│   ├── GetRegionDepth
│   ├── GetSequenceErrorCounts
│   ├── htsfile
│   ├── mergeChromDepth.py
│   ├── MergeRunStats
│   ├── MergeSequenceErrorCounts
│   ├── samtools
│   ├── sortVcf.py
│   ├── starling2
│   ├── starlingSiteSimulator
│   ├── strelka2
│   ├── strelkaNoiseExtractor
│   ├── strelkaSiteSimulator
│   ├── tabix
│   ├── updateNoPassedVariantGTsFilter.py
│   └── vcfCmdlineSwapper.py
└── share
    ├── CHANGELOG.md
    ├── config
    │   ├── germlineIndelScoringModels.json
    │   ├── germlineSNVScoringModels.json
    │   ├── indelErrorModel.json
    │   ├── RNAIndelScoringModels.json
    │   ├── RNASNVScoringModels.json
    │   ├── somaticIndelScoringModels.json
    │   ├── somaticSNVScoringModels.json
    │   └── theta.json
    ├── COPYRIGHT.txt
    ├── demo
    │   └── strelka
    │       ├── data
    │       │   ├── demo20.fa
    │       │   ├── demo20.fa.fai
    │       │   ├── NA12891_demo20.bam
    │       │   ├── NA12891_demo20.bam.bai
    │       │   ├── NA12892_demo20.bam
    │       │   └── NA12892_demo20.bam.bai
    │       ├── expectedResults
    │       │   ├── somatic.indels.vcf.gz
    │       │   └── somatic.snvs.vcf.gz
    │       └── README.md
    ├── LICENSE.txt
    └── scoringModelTraining
        ├── germline
        │   ├── bin
        │   │   ├── evs_evaluate.py
        │   │   ├── evs_exportmodel.py
        │   │   ├── evs_learn.py
        │   │   ├── evs_pr.py
        │   │   ├── evs_qq.py
        │   │   ├── filterTrainingVcf.py
        │   │   └── parseAnnotatedTrainingVcf.py
        │   ├── lib
        │   │   └── evs
        │   │       ├── features
        │   │       │   ├── GermlineIndel.py
        │   │       │   ├── GermlineSNV.py
        │   │       │   ├── __init__.py
        │   │       │   ├── RNAIndel.py
        │   │       │   ├── RNASNV.py
        │   │       │   └── VcfFeatureSet.py
        │   │       ├── germline_rf.py
        │   │       ├── __init__.py
        │   │       └── tools
        │   │           ├── bedintervaltree.py
        │   │           ├── __init__.py
        │   │           ├── io.py
        │   │           └── vcf.py
        │   └── README.md
        └── somatic
            ├── bin
            │   ├── calc_features.py
            │   ├── evs_evaluate.py
            │   ├── evs_exportmodel.py
            │   ├── evs_learn.py
            │   ├── evs_pr.py
            │   ├── evs_random_sample_tpfp.py
            │   ├── evs_random_split_csv.py
            │   └── vcf_to_feature_csv.py
            ├── lib
            │   └── evs
            │       ├── features
            │       │   ├── __init__.py
            │       │   ├── PosAndAlleles.py
            │       │   ├── SomaticIndel.py
            │       │   ├── SomaticSNV.py
            │       │   └── VcfFeatureSet.py
            │       ├── __init__.py
            │       ├── somatic_rf.py
            │       ├── strelka_rf_indel.py
            │       └── tools
            │           ├── bedintervaltree.py
            │           ├── __init__.py
            │           ├── io.py
            │           └── vcf.py
            └── README.md

24 directories, 132 files
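The listing above has workflow configuration scripts under `bin/` but no top-level `configure` file. A minimal sketch for telling the two kinds of unpacked directory apart is shown below. It assumes that a source release carries a top-level `configure` script (as the build procedure implies) and that a binary distribution carries `bin/configureStrelkaGermlineWorkflow.py` (as in the tree above). The function name `classify_strelka_dir` is hypothetical and not part of Strelka.

```python
import os
import tempfile


def classify_strelka_dir(root):
    """Guess whether `root` is an unpacked source or binary distribution.

    Assumption: a source release has a top-level `configure` script, while a
    binary distribution has bin/configureStrelkaGermlineWorkflow.py instead.
    """
    if os.path.isfile(os.path.join(root, "configure")):
        return "source"
    workflow_script = os.path.join(root, "bin", "configureStrelkaGermlineWorkflow.py")
    if os.path.isfile(workflow_script):
        return "binary"
    return "unknown"


# Demonstration on a throwaway directory that mimics the layout listed above.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "bin"))
    script = os.path.join(tmp, "bin", "configureStrelkaGermlineWorkflow.py")
    open(script, "w").close()
    print(classify_strelka_dir(tmp))
```

If the result is "binary", there is nothing to build: the workflow scripts under `bin/` can be run directly.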

about proper pair filtering in strelka2

Continued from #40.

Hi Chris,

I checked these SAM records again and found that their SAM flags are 99 or 147, which means they are properly paired. Does Strelka check the 'proper pair' status from the SAM flags or from somewhere else?

cons:168234 99 chr7 55242432 60 100M = 55242432 100 GGGTGAGAAAGTTAAAATTCCCGTCGC
cons:168234 147 chr7 55242432 60 100M = 55242432 -100 GGGTGAGAAAGTTAAAATTCCCGTCGC
cons:168235 99 chr7 55242432 60 100M = 55242432 100 TGGTGNGAAAGTTAAAATTCCCGTCGC
cons:168235 147 chr7 55242432 60 100M = 55242432 -100 TGGTGNGAAAGTTAAAATTCCCGTCGC
cons:168236 99 chr7 55242432 60 33M15D52M15S = 55242433 100 TGGTGNGAANGTTAAAATT
cons:168236 147 chr7 55242433 60 16S32M15D52M = 55242432 -100 CTATCAATCCAATACTGGT
cons:168237 99 chr7 55242432 60 33M15D52M15S = 55242433 100 TGGNGAGAAAGTTAAAATT
cons:168237 147 chr7 55242433 60 16S32M15D52M = 55242432 -100 CTAACAACCAACCCCTGGT
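For reference, the flag values 99 and 147 in the records above can be decoded with the bit definitions from the SAM specification. The sketch below only interprets the FLAG field; it makes no claim about how Strelka itself decides whether a pair is proper. The helper names `decode_sam_flag` and `is_proper_pair` are illustrative.

```python
# Bit values of the SAM FLAG field, as defined in the SAM specification.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_sam_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def is_proper_pair(flag):
    """True if the aligner marked the read as paired and properly paired."""
    return bool(flag & 0x1) and bool(flag & 0x2)


for flag in (99, 147):
    print(flag, decode_sam_flag(flag), is_proper_pair(flag))
# 99 ['paired', 'proper_pair', 'mate_reverse_strand', 'first_in_pair'] True
# 147 ['paired', 'proper_pair', 'reverse_strand', 'second_in_pair'] True
```

Both values have the 0x2 bit set, so by the FLAG field alone the aligner did mark these reads as properly paired.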

SAMPLE in multisample calling

Hi,

Cheers for a professional package. I am testing the multisample variant calling on RNA-seq, which looks pretty decent.

However, one issue in the output file is the use of SAMPLE instead of the input filename or sample name. Can this be defined in the config file? Or could defaults (filenames) be used? Anything would be better than SAMPLE repeated multiple times.

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE SAMPLE SAMPLE SAMPLE SAMPLE SAMPLE SAMPLE SAMPLE

Thanks,
Colin
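As a possible post-processing workaround (not a Strelka option), the `#CHROM` header line can be rewritten after the run. The sketch below assumes the header is tab-delimited, as the VCF specification requires, and that the sample columns appear in the same order in which the input alignment files were given at configuration time. That ordering is an assumption to verify before relying on it. The function names and the example paths are hypothetical.

```python
import os

FIXED_COLUMNS = 9  # CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT


def sample_name_from_path(alignment_path):
    """Derive a sample name from a BAM/CRAM file name by dropping the extension."""
    base = os.path.basename(alignment_path)
    for ext in (".bam", ".cram"):
        if base.endswith(ext):
            return base[: -len(ext)]
    return base


def rename_samples(header_line, sample_names):
    """Replace the sample columns of a VCF #CHROM header line."""
    fields = header_line.rstrip("\n").split("\t")
    if fields[0] != "#CHROM":
        raise ValueError("not a #CHROM header line")
    n_samples = len(fields) - FIXED_COLUMNS
    if n_samples != len(sample_names):
        raise ValueError(
            "header has %d sample columns but %d names were given"
            % (n_samples, len(sample_names))
        )
    return "\t".join(fields[:FIXED_COLUMNS] + list(sample_names))


# Hypothetical input paths, in the order assumed to match the sample columns.
paths = ["/data/rna_rep1.bam", "/data/rna_rep2.bam"]
header = "\t".join(
    ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
     "SAMPLE", "SAMPLE"]
)
print(rename_samples(header, [sample_name_from_path(p) for p in paths]))
```

For compressed and indexed output, `bcftools reheader` with a sample list performs the same renaming without manual header editing.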
