receptive field calculation

SpliceAI: A deep learning-based tool to identify splice variants

This package annotates genetic variants with their predicted effect on splicing, as described in Jaganathan et al, Cell 2019 in press. The annotations for all possible substitutions, 1 base insertions, and 1-4 base deletions within genes are available here for download. These annotations are free for academic and not-for-profit use; other use requires a commercial license from Illumina, Inc.

License

SpliceAI source code is provided under the GPLv3 license. SpliceAI includes several third party packages provided under other open source licenses, please see NOTICE for additional details. The trained models used by SpliceAI (located in this package at spliceai/models) are provided under the CC BY NC 4.0 license for academic and non-commercial use; other use requires a commercial license from Illumina, Inc.

Installation

The simplest way to install SpliceAI is through pip or conda:

pip install spliceai
# or
conda install -c bioconda spliceai

Alternately, SpliceAI can be installed from the github repository:

git clone https://github.com/Illumina/SpliceAI.git
cd SpliceAI
python setup.py install

SpliceAI requires tensorflow>=1.2.0, which is best installed separately via pip or conda (see the TensorFlow website for other installation options):

pip install tensorflow
# or
conda install tensorflow

Usage

SpliceAI can be run from the command line:

spliceai -I input.vcf -O output.vcf -R genome.fa -A grch37
# or you can pipe the input and output VCFs
cat input.vcf | spliceai -R genome.fa -A grch37 > output.vcf

Required parameters:

-I: Input VCF with variants of interest.
-O: Output VCF with SpliceAI predictions ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL included in the INFO column (see table below for details). Only SNVs and simple INDELs (REF or ALT is a single base) within genes are annotated. Variants in multiple genes have separate predictions for each gene.
-R: Reference genome fasta file. Can be downloaded from GRCh37/hg19 or GRCh38/hg38.
-A: Gene annotation file. Can instead provide grch37 or grch38 to use GENCODE V24 canonical annotation files included with the package. To create custom gene annotation files, use spliceai/annotations/grch37.txt in repository as template.

Optional parameters:

-D: Maximum distance between the variant and gained/lost splice site (default: 50).
-M: Mask scores representing annotated acceptor/donor gain and unannotated acceptor/donor loss (default: 0).

Details of SpliceAI INFO field:

ID	Description
ALLELE	Alternate allele
SYMBOL	Gene symbol
DS_AG	Delta score (acceptor gain)
DS_AL	Delta score (acceptor loss)
DS_DG	Delta score (donor gain)
DS_DL	Delta score (donor loss)
DP_AG	Delta position (acceptor gain)
DP_AL	Delta position (acceptor loss)
DP_DG	Delta position (donor gain)
DP_DL	Delta position (donor loss)

Delta score of a variant, defined as the maximum of (DS_AG, DS_AL, DS_DG, DS_DL), ranges from 0 to 1 and can be interpreted as the probability of the variant being splice-altering. In the paper, a detailed characterization is provided for 0.2 (high recall), 0.5 (recommended), and 0.8 (high precision) cutoffs. Delta position conveys information about the location where splicing changes relative to the variant position (positive values are downstream of the variant, negative values are upstream).

Examples

A sample input file and the corresponding output file can be found at examples/input.vcf and examples/output.vcf respectively. The output T|RYR1|0.00|0.00|0.91|0.08|-28|-46|-2|-31 for the variant 19:38958362 C>T can be interpreted as follows:

The probability that the position 19:38958360 (=38958362-2) is used as a splice donor increases by 0.91.
The probability that the position 19:38958331 (=38958362-31) is used as a splice donor decreases by 0.08.

Similarly, the output CA|TTN|0.07|1.00|0.00|0.00|-7|-1|35|-29 for the variant 2:179415988 C>CA has the following interpretation:

The probability that the position 2:179415981 (=179415988-7) is used as a splice acceptor increases by 0.07.
The probability that the position 2:179415987 (=179415988-1) is used as a splice acceptor decreases by 1.00.

Frequently asked questions

1. Why are some variants not scored by SpliceAI?

SpliceAI only annotates variants within genes defined by the gene annotation file. Additionally, SpliceAI does not annotate variants if they are close to chromosome ends (5kb on either side), deletions of length greater than twice the input parameter -D, or inconsistent with the reference fasta file.

2. What are the differences between raw (-M 0) and masked (-M 1) precomputed files?

The raw files also include splicing changes corresponding to strengthening annotated splice sites and weakening unannotated splice sites, which are typically much less pathogenic than weakening annotated splice sites and strengthening unannotated splice sites. The delta scores of such splicing changes are set to 0 in the masked files. We recommend using raw files for alternative splicing analysis and masked files for variant interpretation.

3. Can SpliceAI be used to score custom sequences?

Yes, install SpliceAI and use the following script:

from keras.models import load_model
from pkg_resources import resource_filename
from spliceai.utils import one_hot_encode
import numpy as np

input_sequence = 'CGATCTGACGTGGGTGTCATCGCATTATCGATATTGCAT'
# Replace this with your custom sequence

context = 10000
paths = ('models/spliceai{}.h5'.format(x) for x in range(1, 6))
models = [load_model(resource_filename('spliceai', x)) for x in paths]
x = one_hot_encode('N'*(context//2) + input_sequence + 'N'*(context//2))[None, :]
y = np.mean([models[m].predict(x) for m in range(5)], axis=0)

acceptor_prob = y[0, :, 1]
donor_prob = y[0, :, 2]

Contact

Kishore Jaganathan: [email protected]

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=LowQual,Description="Low quality">
##FILTER=<ID=my_snp_filter,Description="QD < 2.0 \|\| FS > 60.0 \|\| MQ < 40.0 \|\| MQRankSum < -12.5 \|\| ReadPosRankSum < -8.0">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##GATKCommandLine.HaplotypeCaller=<ID=HaplotypeCaller,Version=3.7-0-gcfedb67,Date="Tue Jul 23 11:51:30 EDT 2019",Epoch=1563897090053,CommandLineOptions="analysis_type=HaplotypeCaller input_file=[Reorder_dedup_NGTS1801253601.sam.bam.bam] showFullBamList=false read_buffer_size=null read_filter=[] disable_read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=../../ucsc.hg19.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=500 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 static_quantized_quals=null round_down_quantized=false disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 secondsBetweenProgressUpdates=10 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=false bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=1 num_cpu_threads_per_data_thread=20 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=LINEAR variant_index_parameter=128000 reference_window_stop=0 phone_home= gatk_key=null tag=NA logging_level=INFO log_to_file=null help=false version=false out=/external/rprshnas01/wgs_data/ALS-WES/Analysis/Reorder_dedup_NGTS1801253601.sam.bam.bam.vcf likelihoodCalculationEngine=PairHMM heterogeneousKmerSizeResolution=COMBO_MIN dbsnp=(RodBinding name= source=UNBOUND) dontTrimActiveRegions=false maxDiscARExtension=25 maxGGAARExtension=300 paddingAroundIndels=150 paddingAroundSNPs=20 comp=[] annotation=[] excludeAnnotation=[] group=[StandardAnnotation, StandardHCAnnotation] debug=false useFilteredReadsForAnnotations=false emitRefConfidence=NONE bamOutput=null bamWriterType=CALLED_HAPLOTYPES emitDroppedReads=false disableOptimizations=false annotateNDA=false useNewAFCalculator=false heterozygosity=0.001 indel_heterozygosity=1.25E-4 heterozygosity_stdev=0.01 standard_min_confidence_threshold_for_calling=10.0 standard_min_confidence_threshold_for_emitting=30.0 max_alternate_alleles=6 max_genotype_count=1024 max_num_PL_values=100 input_prior=[] sample_ploidy=2 genotyping_mode=DISCOVERY alleles=(RodBinding name= source=UNBOUND) contamination_fraction_to_filter=0.0 contamination_fraction_per_sample_file=null p_nonref_model=null exactcallslog=null output_mode=EMIT_VARIANTS_ONLY allSitePLs=false gcpHMM=10 pair_hmm_implementation=VECTOR_LOGLESS_CACHING pair_hmm_sub_implementation=ENABLE_ALL always_load_vector_logless_PairHMM_lib=false phredScaledGlobalReadMismappingRate=45 noFpga=false sample_name=null kmerSize=[10, 25] dontIncreaseKmerSizesForCycles=false allowNonUniqueKmersInRef=false numPruningSamples=1 recoverDanglingHeads=false doNotRecoverDanglingBranches=false minDanglingBranchLength=4 consensus=false maxNumHaplotypesInPopulation=128 errorCorrectKmers=false minPruning=2 debugGraphTransformations=false allowCyclesInKmerGraphToGeneratePaths=false graphOutput=null kmerLengthForReadErrorCorrection=25 minObservationsForKmerToBeSolid=20 GVCFGQBands=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 70, 80, 90, 99] indelSizeToEliminateInRefModel=10 min_base_quality_score=10 includeUmappedReads=false useAllelesTrigger=false doNotRunPhysicalPhasing=true keepRG=null justDetermineActiveRegions=false dontGenotype=false dontUseSoftClippedBases=false captureAssemblyFailureBAM=false errorCorrectReads=false pcr_indel_model=CONSERVATIVE maxReadsInRegionPerSample=10000 minReadsPerAlignmentStart=10 mergeVariantsViaLD=false activityProfileOut=null activeRegionOut=null activeRegionIn=null activeRegionExtension=null forceActive=false activeRegionMaxSize=null bandPassSigma=null maxReadsInMemoryPerSample=30000 maxTotalReadsInMemory=10000000 maxProbPropagationDistance=50 activeProbabilityThreshold=0.002 min_mapping_quality_score=20 filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">
##GATKCommandLine.SelectVariants=<ID=SelectVariants,Version=3.7-0-gcfedb67,Date="Thu Jul 25 13:08:54 EDT 2019",Epoch=1564074534110,CommandLineOptions="analysis_type=SelectVariants input_file=[] showFullBamList=false read_buffer_size=null read_filter=[] disable_read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=../../ucsc.hg19.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 static_quantized_quals=null round_down_quantized=false disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 secondsBetweenProgressUpdates=10 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=false bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 reference_window_stop=0 phone_home= gatk_key=null tag=NA logging_level=INFO log_to_file=null help=false version=false variant=(RodBinding name=variant source=Reorder_dedup_NGTS1801253601.sam.bam.bam.vcf) discordance=(RodBinding name= source=UNBOUND) concordance=(RodBinding name= source=UNBOUND) out=/external/rprshnas01/wgs_data/ALS-WES/Analysis/rawsnps_Reorder_dedup_NGTS1801253601.sam.bam.bam.vcf.vcf sample_name=[] sample_expressions=null sample_file=null exclude_sample_name=[] exclude_sample_file=[] exclude_sample_expressions=[] selectexpressions=[] invertselect=false excludeNonVariants=false excludeFiltered=false preserveAlleles=false removeUnusedAlternates=false restrictAllelesTo=ALL keepOriginalAC=false keepOriginalDP=false mendelianViolation=false invertMendelianViolation=false mendelianViolationQualThreshold=0.0 select_random_fraction=0.0 remove_fraction_genotypes=0.0 selectTypeToInclude=[SNP] selectTypeToExclude=[] keepIDs=null excludeIDs=null fullyDecode=false justRead=false maxIndelSize=2147483647 minIndelSize=0 maxFilteredGenotypes=2147483647 minFilteredGenotypes=0 maxFractionFilteredGenotypes=1.0 minFractionFilteredGenotypes=0.0 maxNOCALLnumber=2147483647 maxNOCALLfraction=1.0 setFilteredGtToNocall=false ALLOW_NONOVERLAPPING_COMMAND_LINE_SAMPLES=false forceValidOutput=false filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">
##GATKCommandLine.VariantFiltration=<ID=VariantFiltration,Version=3.7-0-gcfedb67,Date="Thu Jul 25 14:08:22 EDT 2019",Epoch=1564078102578,CommandLineOptions="analysis_type=VariantFiltration input_file=[] showFullBamList=false read_buffer_size=null read_filter=[] disable_read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=../../ucsc.hg19.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 static_quantized_quals=null round_down_quantized=false disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 secondsBetweenProgressUpdates=10 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=false bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 reference_window_stop=0 phone_home= gatk_key=null tag=NA logging_level=INFO log_to_file=null help=false version=false variant=(RodBinding name=variant source=rawsnps_Reorder_dedup_NGTS1801253601.sam.bam.bam.vcf.vcf) mask=(RodBinding name= source=UNBOUND) out=/external/rprshnas01/wgs_data/ALS-WES/Analysis/filtered_snps_NGTS1801253601.vcf filterExpression=[QD < 2.0 \|\| FS > 60.0 \|\| MQ < 40.0 \|\| MQRankSum < -12.5 \|\| ReadPosRankSum < -8.0] filterName=[my_snp_filter] genotypeFilterExpression=[] genotypeFilterName=[] clusterSize=3 clusterWindowSize=0 maskExtension=0 maskName=Mask filterNotInMask=false missingValuesInExpressionsShouldEvaluateAsFailing=false invalidatePreviousFilters=false invertFilterExpression=false invertGenotypeFilterExpression=false setFilteredGtToNocall=false filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##INFO=<ID=SOR,Number=1,Type=Float,Description="Symmetric Odds Ratio of 2x2 contingency table to detect strand bias">
##contig=<ID=chrM,length=16571,assembly=hg19>
##contig=<ID=chr1,length=249250621,assembly=hg19>
##contig=<ID=chr2,length=243199373,assembly=hg19>
##contig=<ID=chr3,length=198022430,assembly=hg19>
##contig=<ID=chr4,length=191154276,assembly=hg19>
##contig=<ID=chr5,length=180915260,assembly=hg19>
##contig=<ID=chr6,length=171115067,assembly=hg19>
##contig=<ID=chr7,length=159138663,assembly=hg19>
##contig=<ID=chr8,length=146364022,assembly=hg19>
##contig=<ID=chr9,length=141213431,assembly=hg19>
##contig=<ID=chr10,length=135534747,assembly=hg19>
##contig=<ID=chr11,length=135006516,assembly=hg19>
##contig=<ID=chr12,length=133851895,assembly=hg19>
##contig=<ID=chr13,length=115169878,assembly=hg19>
##contig=<ID=chr14,length=107349540,assembly=hg19>
##contig=<ID=chr15,length=102531392,assembly=hg19>
##contig=<ID=chr16,length=90354753,assembly=hg19>
##contig=<ID=chr17,length=81195210,assembly=hg19>
##contig=<ID=chr18,length=78077248,assembly=hg19>
##contig=<ID=chr19,length=59128983,assembly=hg19>
##contig=<ID=chr20,length=63025520,assembly=hg19>
##contig=<ID=chr21,length=48129895,assembly=hg19>
##contig=<ID=chr22,length=51304566,assembly=hg19>
##contig=<ID=chrX,length=155270560,assembly=hg19>
##contig=<ID=chrY,length=59373566,assembly=hg19>
##contig=<ID=chr1_gl000191_random,length=106433,assembly=hg19>
##contig=<ID=chr1_gl000192_random,length=547496,assembly=hg19>
##contig=<ID=chr4_ctg9_hap1,length=590426,assembly=hg19>
##contig=<ID=chr4_gl000193_random,length=189789,assembly=hg19>
##contig=<ID=chr4_gl000194_random,length=191469,assembly=hg19>
##contig=<ID=chr6_apd_hap1,length=4622290,assembly=hg19>
##contig=<ID=chr6_cox_hap2,length=4795371,assembly=hg19>
##contig=<ID=chr6_dbb_hap3,length=4610396,assembly=hg19>
##contig=<ID=chr6_mann_hap4,length=4683263,assembly=hg19>
##contig=<ID=chr6_mcf_hap5,length=4833398,assembly=hg19>
##contig=<ID=chr6_qbl_hap6,length=4611984,assembly=hg19>
##contig=<ID=chr6_ssto_hap7,length=4928567,assembly=hg19>
##contig=<ID=chr7_gl000195_random,length=182896,assembly=hg19>
##contig=<ID=chr8_gl000196_random,length=38914,assembly=hg19>
##contig=<ID=chr8_gl000197_random,length=37175,assembly=hg19>
##contig=<ID=chr9_gl000198_random,length=90085,assembly=hg19>
##contig=<ID=chr9_gl000199_random,length=169874,assembly=hg19>
##contig=<ID=chr9_gl000200_random,length=187035,assembly=hg19>
##contig=<ID=chr9_gl000201_random,length=36148,assembly=hg19>
##contig=<ID=chr11_gl000202_random,length=40103,assembly=hg19>
##contig=<ID=chr17_ctg5_hap1,length=1680828,assembly=hg19>
##contig=<ID=chr17_gl000203_random,length=37498,assembly=hg19>
##contig=<ID=chr17_gl000204_random,length=81310,assembly=hg19>
##contig=<ID=chr17_gl000205_random,length=174588,assembly=hg19>
##contig=<ID=chr17_gl000206_random,length=41001,assembly=hg19>
##contig=<ID=chr18_gl000207_random,length=4262,assembly=hg19>
##contig=<ID=chr19_gl000208_random,length=92689,assembly=hg19>
##contig=<ID=chr19_gl000209_random,length=159169,assembly=hg19>
##contig=<ID=chr21_gl000210_random,length=27682,assembly=hg19>
##contig=<ID=chrUn_gl000211,length=166566,assembly=hg19>
##contig=<ID=chrUn_gl000212,length=186858,assembly=hg19>
##contig=<ID=chrUn_gl000213,length=164239,assembly=hg19>
##contig=<ID=chrUn_gl000214,length=137718,assembly=hg19>
##contig=<ID=chrUn_gl000215,length=172545,assembly=hg19>
##contig=<ID=chrUn_gl000216,length=172294,assembly=hg19>
##contig=<ID=chrUn_gl000217,length=172149,assembly=hg19>
##contig=<ID=chrUn_gl000218,length=161147,assembly=hg19>
##contig=<ID=chrUn_gl000219,length=179198,assembly=hg19>
##contig=<ID=chrUn_gl000220,length=161802,assembly=hg19>
##contig=<ID=chrUn_gl000221,length=155397,assembly=hg19>
##contig=<ID=chrUn_gl000222,length=186861,assembly=hg19>
##contig=<ID=chrUn_gl000223,length=180455,assembly=hg19>
##contig=<ID=chrUn_gl000224,length=179693,assembly=hg19>
##contig=<ID=chrUn_gl000225,length=211173,assembly=hg19>
##contig=<ID=chrUn_gl000226,length=15008,assembly=hg19>
##contig=<ID=chrUn_gl000227,length=128374,assembly=hg19>
##contig=<ID=chrUn_gl000228,length=129120,assembly=hg19>
##contig=<ID=chrUn_gl000229,length=19913,assembly=hg19>
##contig=<ID=chrUn_gl000230,length=43691,assembly=hg19>
##contig=<ID=chrUn_gl000231,length=27386,assembly=hg19>
##contig=<ID=chrUn_gl000232,length=40652,assembly=hg19>
##contig=<ID=chrUn_gl000233,length=45941,assembly=hg19>
##contig=<ID=chrUn_gl000234,length=40531,assembly=hg19>
##contig=<ID=chrUn_gl000235,length=34474,assembly=hg19>
##contig=<ID=chrUn_gl000236,length=41934,assembly=hg19>
##contig=<ID=chrUn_gl000237,length=45867,assembly=hg19>
##contig=<ID=chrUn_gl000238,length=39939,assembly=hg19>
##contig=<ID=chrUn_gl000239,length=33824,assembly=hg19>
##contig=<ID=chrUn_gl000240,length=41933,assembly=hg19>
##contig=<ID=chrUn_gl000241,length=42152,assembly=hg19>
##contig=<ID=chrUn_gl000242,length=43523,assembly=hg19>
##contig=<ID=chrUn_gl000243,length=43341,assembly=hg19>
##contig=<ID=chrUn_gl000244,length=39929,assembly=hg19>
##contig=<ID=chrUn_gl000245,length=36651,assembly=hg19>
##contig=<ID=chrUn_gl000246,length=38154,assembly=hg19>
##contig=<ID=chrUn_gl000247,length=36422,assembly=hg19>
##contig=<ID=chrUn_gl000248,length=39786,assembly=hg19>
##contig=<ID=chrUn_gl000249,length=38502,assembly=hg19>
##reference=file:///external/rprshnas01/wgs_data/ALS-WES/Analysis/../../ucsc.hg19.fasta
##source=SelectVariants
##INFO=<ID=SpliceAI,Number=.,Type=String,Description="SpliceAI variant annotation. These include delta scores (DS) and delta positions (DP) for acceptor gain (AG), acceptor loss (AL), donor gain (DG), and donor loss (DL). Format: ALLELE\|SYMBOL\|DS_AG\|DS_AL\|DS_DG\|DS_DL\|DP_AG\|DP_AL\|DP_DG\|DP_DL">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT
chrM	150	.	T	C	1685.78	PASS	AC=1;AF=0.5;AN=2;BaseQRankSum=1.606;ClippingRankSum=0;DP=53;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=33.72;ReadPosRankSum=-1.21;SOR=0.693;SpliceAI=C\|.\|.\|.\|.\|.\|.\|.\|.\|.	GT:AD:DP:GQ:PL
chrM	152	.	T	C	395.77	PASS	AC=1;AF=0.5;AN=2;BaseQRankSum=-0.703;ClippingRankSum=0;DP=53;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=7.92;ReadPosRankSum=0.42;SOR=0.743;SpliceAI=C\|.\|.\|.\|.\|.\|.\|.\|.\|.	GT:AD:DP:GQ:PL
chrM	184	.	G	A	371.77	PASS	AC=1;AF=0.5;AN=2;BaseQRankSum=-0.521;ClippingRankSum=0;DP=49;ExcessHet=3.0103;FS=1.257;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=8.08;ReadPosRankSum=-0.813;SOR=0.495;SpliceAI=A\|.\|.\|.\|.\|.\|.\|.\|.\|.	GT:AD:DP:GQ:PL
chrM	195	.	C	T	1827.77	PASS	AC=2;AF=1;AN=2;DP=50;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=34.04;SOR=0.874;SpliceAI=T\|.\|.\|.\|.\|.\|.\|.\|.\|.	GT:AD:DP:GQ:PL
chrM	199	.	T	C	552.77	PASS	AC=1;AF=0.5;AN=2;BaseQRankSum=0.582;ClippingRankSum=0;DP=49;ExcessHet=3.0103;FS=2.66;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=12.28;ReadPosRankSum=0.281;SOR=0.977;SpliceAI=C\|.\|.\|.\|.\|.\|.\|.\|.\|.	GT:AD:DP:GQ:PL
chrM	200	.	A	G	357.77	PASS	AC=1;AF=0.5;AN=2;BaseQRankSum=0.837;ClippingRankSum=0;DP=47;ExcessHet=3.0103;FS=4.97;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=8.32;ReadPosRankSum=0.291;SOR=1;SpliceAI=G\|.\|.\|.\|.\|.\|.\|.\|.\|.	GT:AD:DP:GQ:PL
chrM	204	.	T	C	271.77	PASS	AC=1;AF=0.5;AN=2;BaseQRankSum=0.832;ClippingRankSum=0;DP=46;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=6.47;ReadPosRankSum=0.177;SOR=0.936;SpliceAI=C\|.\|.\|.\|.\|.\|.\|.\|.\|.	GT:AD:DP:GQ:PL
chrM	235	.	A	G	243.77	PASS	AC=1;AF=0.5;AN=2;BaseQRankSum=2.066;ClippingRankSum=0;DP=41;ExcessHet=3.0103;FS=1.448;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=6.59;ReadPosRankSum=-0.394;SOR=0.404;SpliceAI=G\|.\|.\|.\|.\|.\|.\|.\|.\|.	GT:AD:DP:GQ:PL

illumina / spliceai Goto Github PK

spliceai's Introduction

SpliceAI: A deep learning-based tool to identify splice variants

License

Installation

Usage

Examples

Frequently asked questions

Contact

spliceai's People

Contributors

Stargazers

Watchers

Forkers

spliceai's Issues

TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 ... UserWarning: No training configuration found in save file: the model was not compiled. Compile it manually. warnings.warn('No training configuration found in save file: ' tensorflow/1.4.1

Hello! I am trying to use SpliceAI 1.3.1 on Ubuntu 14.04 operating system. When I was running the sample input file, I got an ImportError (vide infra). Thank you in advance!

Summary

Cause

Solution

Recommend Projects

Recommend Topics

Recommend Org

TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 ... UserWarning: No training configuration found in save file: the model was not compiled. Compile it manually.
warnings.warn('No training configuration found in save file: '
tensorflow/1.4.1

Hello! I am trying to use SpliceAI 1.3.1 on Ubuntu 14.04 operating system. When I was running the sample input file, I got an ImportError (vide infra).
Thank you in advance!