genedx / scramble
License: Other
Issue
If neither --eval-meis nor --eval-dels is set, the script runs without errors and without generating any output.
This makes the problem hard to debug.
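Until the script itself reports this, a wrapper can check the arguments before calling Rscript. The following is a minimal sketch, not part of SCRAMble; the function names `has_eval_flag` and `check_args` are hypothetical, and only the two flag names come from the report above.

```python
# Hypothetical pre-flight check for a wrapper around SCRAMble.R.
# Only the flag names --eval-meis and --eval-dels come from the issue text.

EVAL_FLAGS = ("--eval-meis", "--eval-dels")


def has_eval_flag(argv):
    """Return True if at least one evaluation flag is present in argv."""
    return any(flag in argv for flag in EVAL_FLAGS)


def check_args(argv):
    """Raise ValueError with a clear message when no evaluation flag is set."""
    if not has_eval_flag(argv):
        raise ValueError(
            "Neither --eval-meis nor --eval-dels was given; "
            "SCRAMble.R would finish without producing any output."
        )
    return argv
```

Calling `check_args(sys.argv[1:])` at the top of a wrapper turns the silent no-op into an explicit error.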
# Error while running:
Rscript --vanilla /path/to/scramble/cluster_analysis/bin/SCRAMble.R \
--out-name ...
Suggestions
Either:
Hi,
When running scramble on the test .bam file, I get the following error message from do.meis.R
Error in .Call2("XStringSet_align_pairwiseAlignment", pattern, subject, :
key 0 not in lookup table
Calls: do.meis ... QualityScaledXStringSet.pairwiseAlignment -> .Call2
Execution halted
Thanks for your help!
Hi~
I am trying to run SCRAMble.R.
I get an error report with each of two different --ref paths, and neither attempt succeeds.
1.
Rscript --vanilla ${bin}/SCRAMble.R \
--out-name $OUTPUT_PATH/SCRAMble_${hg}/${ID} \
--cluster-file $OUTPUT_PATH/SCRAMble_${hg}/clusters/${ID}.clusters.txt \
--install-dir ${bin} \
--mei-refs ${MEI_consensus_seqs} \
--ref /staging/biology/zxc898977/writeCodeing/debugs/ref_hg19/ucsc.hg19.blastdb.fa \
--eval-meis \
--eval-dels
2.
Rscript --vanilla ${bin}/SCRAMble.R \
--out-name $OUTPUT_PATH/SCRAMble_${hg}/${ID} \
--cluster-file $OUTPUT_PATH/SCRAMble_${hg}/clusters/${ID}.clusters.txt \
--install-dir ${bin} \
--mei-refs ${MEI_consensus_seqs} \
--ref /staging/biology/zxc898977/writeCodeing/debugs/ref_hg19/ucsc.hg19.blastdb \
--eval-meis \
--eval-dels
PS. ucsc.hg19.blastdb.fa is identical to ucsc.hg19.fasta.
[Screenshot: my full file structure]
I am curious what a successful report looks like.
I hope there is a way to produce correct reports.
Thanks a lot!
Hi, SCRAMBLE is a first-tier tool for MEI detection.
Can the output txt file be converted into a VCF file for further gene-based annotation or 1000G frequency annotation (dbRIP)?
Thank you!
Hi.
I'm trying to run scramble on some WGS data. Some of the samples processed correctly, but others give the following error during the analysis step:
Error in .Call2("PairwiseAlignmentsSingleSubject_align_aligned", x, gapCode, :
negative length vectors are not allowed
Calls: do.meis ... as.matrix -> aligned -> aligned -> .local -> .Call2
Execution halted
Please let me know if you need any further information. I will now try and see if I can isolate the row in the cluster file that causes this error.
Hi,
I'm running my samples through SCRAMble. 44 of 293 samples failed with the error below; the rest ran smoothly.
The output .vcf file has a size of 3321 and contains the header row.
Writing VCF file to sample.vcf...
Error in .Call2("C_solve_user_SEW", refwidths, start, end, width, translate.negative.coord, :
solving row 1: 'allow.nonnarrowing' is FALSE and the supplied start (0) is < 1
Calls: write.scramble.vcf ... make_IRanges_from_windows_args -> solveUserSEW -> .Call2
Execution halted
Hi,
I am working on a research project on MEIs and was wondering if this is still being maintained or developed for future use cases?
Best wishes,
Robert Wilson
Of the 359,984 clusters produced by processing the 1000 genomes cram for sample HG00150, 5 have nothing but "n" calls in the 4th field. Here are two examples:
chr1:24599821 left 7 nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn ccccccccccccccccccttcgttaacgacgctattaccaactaaa
chr2:232427156 left 5 nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn ttttttttttttttttttttttttttttttttttttctctccccccccccccccccctctccctttcttgtgtttttttttttttttttttttccgcctcccccccacctcccaggtt
When the R script attempts to find deletions using such a cluster, the analysis fails with this error:
Error in width(strings) : NAs in 'x' are not supported
Calls: do.dels ... .charToXStringSet -> solveUserSEW -> width -> width
My workaround is to clean the cluster file of these few bad guys before processing, but perhaps do.dels.R could exclude them.
Thanks for the software.
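The workaround described above can be sketched as a small pre-filter. This is an illustration rather than part of SCRAMble: it assumes the cluster file is whitespace-delimited with the clipped sequence in the 4th field, as in the two examples, and the helper names `is_all_n` and `filter_clusters` are hypothetical.

```python
# Hypothetical pre-filter for a cluster file: drop rows whose 4th field
# consists only of "n" calls. Assumes whitespace-delimited rows as shown above.

def is_all_n(seq):
    """Return True for a non-empty sequence made up entirely of n/N."""
    return len(seq) > 0 and set(seq.lower()) == {"n"}


def filter_clusters(lines):
    """Yield the cluster lines whose 4th field is not all 'n'."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 4 and is_all_n(fields[3]):
            continue  # skip the problematic cluster
        yield line

# Example use (file names are placeholders):
# with open("sample.clusters.txt") as src, open("sample.clean.txt", "w") as dst:
#     dst.writelines(filter_clusters(src))
```

Lines with fewer than four fields are passed through unchanged, so the filter only removes the rows that match the pattern reported here.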
Hi, I tried running scramble on some exome GIAB NA12878 sample data. It aborts with the following message:
Sample had 22 MEI(s)
Done analyzing MEIs
Loading required package: GenomicRanges
Error: subscript contains invalid names
Execution halted
The output file result.txt_MEIs.txt shows these 22 MEIs, but the corresponding VCF file result.txt.vcf contains only the header.
If I understand this correctly, it failed within an R script. The R version installed in the environment is 4.1.3; the GenomicRanges version used in that environment is 1.46.1.
What further information can I provide to enable you to propose a solution?
Thank you in advance both for helping and for having developed this amazing tool,
Vinzenz
Hi everyone,
Hope you are well-- would you all be up for me making a bioconda package for scramble?
We are adding support for MEI calling in bcbio (https://github.com/bcbio/bcbio-nextgen) using scramble-- we were originally going to use MELT, but the license is one of the most insane licenses I've ever seen for software, and made it so we couldn't use it at all in bcbio.
On the topic of licenses, we were also wondering if you would consider a change in the license. Right now it's non-commercial only, but a lot of companies use bcbio and it would be awesome to extend MEI calling to them. Titus had a nice blog post a while back about the benefits of having a freer license for academics: http://ivory.idyll.org/blog/2015-on-licensing-in-bioinformatics.html and managed to change Lior's mind about it here: https://liorpachter.wordpress.com/2017/08/03/i-was-wrong-part-2/ in case that might sway your feelings. As a huge open source contributor I understand wanting to get compensated somehow for lots of hard, thankless work in the ditches, I totally get it. I think most of the time, though, the benefits of having a tool get more widely adopted outweigh whatever amount of money you could squeeze out of companies.
Anyway, no worries if the license switch is a no, we can work around it as we support some other non-free software as well.
Thanks so much. Please let me know if I can help you all with anything.
Dear scramble developers,
I am running the cluster_identifier and get a Segmentation fault.
I tried to backtrace with gdb and got: "Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000055bf4e8e2c4e in handle_cluster.isra ()"
I had the same error on multiple BAM files and also using both the Docker and the installed versions (on Ubuntu 20.04 with all installed packages needed).
I would be very grateful for your help.
Mathieu
Issue
For a couple of my samples, writing the MEIs to VCF failed with what looks like a negative-coordinate issue. Library kit: 'Agilent SureSelect Human All Exon V8'. Could you please help me deal with this issue?
scramble.sh --ref /path/to/hs37d5.fa --out-name /path/to/targeted_seq_mei_calling/work/bwa.scramble.<SAMPLE_ID>-N1-DNA1-WES1/out/bwa.scramble.<SAMPLE_ID>-N1-DNA1-WES1 --cluster-file /path/to/targeted_seq_mei_calling/work/bwa.scramble.<SAMPLE_ID>-N1-DNA1-WES1/out/bwa.scramble.<SAMPLE_ID>-N1-DNA1-WES1_cluster.txt --nCluster 5 --mei-score 50 --indel-score 80 --poly-a-frac 0.75 --eval-meis
Running sample: /path/to/targeted_seq_mei_calling/work/bwa.scramble.<SAMPLE_ID>-N1-DNA1-WES1/out/bwa.scramble.<SAMPLE_ID>-N1-DNA1-WES1_cluster.txt
Running scramble with options:
INSTALL.DIR : /path/to/targeted_seq_mei_calling/.snakemake/conda/9154f892d04f9bfe82a4d010855d834d/share/scramble/bin
blastRef : /path/to/hs37d5.fa
clusterFile : /path/to/targeted_seq_mei_calling/work/bwa.scramble.<SAMPLE_ID>-N1-DNA1-WES1/out/bwa.scramble.<SAMPLE_ID>-N1-DNA1-WES1_cluster.txt
deletions : FALSE
indelScore : 80
mei.refs : /path/to/targeted_seq_mei_calling/.snakemake/conda/9154f892d04f9bfe82a4d010855d834d/share/scramble/resources/MEI_consensus_seqs.fa
meiScore : 50
meis : TRUE
minDelLen : 50
nCluster : 5
no.vcf : FALSE
outFilePrefix : /path/to/targeted_seq_mei_calling/work/bwa.scramble.<SAMPLE_ID>-N1-DNA1-WES1/out/bwa.scramble.<SAMPLE_ID>-N1-DNA1-WES1
pctAlign : 90
polyAFrac : 0.75
polyAdist : 100
Useful Functions Loaded
Loading required package: BiocGenerics
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:stats':
IQR, mad, sd, var, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, basename, cbind, colnames, dirname, do.call,
duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
tapply, union, unique, unsplit, which.max, which.min
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
I, expand.grid, unname
Loading required package: IRanges
Loading required package: XVector
Loading required package: GenomeInfoDb
Attaching package: 'Biostrings'
The following object is masked from 'package:base':
strsplit
Done analyzing l1
Done analyzing sva
Done analyzing alu
Done analyzing l1
Done analyzing sva
Done analyzing alu
Sample had 38 MEI(s)
Done analyzing MEIs
Writing VCF file to /path/to/targeted_seq_mei_calling/work/bwa.scramble.<SAMPLE_ID>-N1-DNA1-WES1/out/bwa.scramble.<SAMPLE_ID>-N1-DNA1-WES1.vcf...
Error in .Call2("C_solve_user_SEW", refwidths, start, end, width, translate.negative.coord, :
solving row 1: 'allow.nonnarrowing' is FALSE and the supplied start (0) is < 1
Calls: write.scramble.vcf ... make_IRanges_from_windows_args -> solveUserSEW -> .Call2
Hi
When running the scramble Docker image, it seems to fail randomly with a segmentation fault for some BAM files.
This doesn't happen for all BAM files, and I can't identify anything the failing files have in common that the successful ones lack.
I don't see any core dump file to include (where would these be saved?)
Docker command used:
docker run -v /mnt/qsg-results-3/:/LIfolder -it --rm scramble:latest bash
Command used:
root@2704cc4af852:/# /app/cluster_identifier/src/build/cluster_identifier /LIfolder/LI6073/Sorted_LI6073.bam
output:
chr1:189991284 right 6 cctgccacaaccactccccagtgcctttaagagtttctacacctgcatccagatgtttaaatacaggaaactgctgt ttttc
chr1:200911586 right 6 cctgccactcctgctcagaagacagtggctctgacgtctccagcatctcccaccccacttcgccgggcagcagcagccccgacatctcctttctgca ccc
chr1:201048681 right 10 ccagccgctgtacagggagacgcagtggcctgccgctgagctgggaccaggccaagcttggcaagtcatcctcacccggtctgtggaccgggagg cccc
chr1:201066388 right 9 ccagccatatcctgccctccacaccagctgcctttctgcctgaaaacactcccaccttccccttccctttcctccagccgtgagagtgtgccc ccc
chr1:201782348 right 12 cctgccacaggcacagccactgtcatgcaaactggtggttcagccactctcagcaagatccagaagtcctcaggcatccctgtca ccc
chr1:203498891 right 10 ccctgccactaggacagtcaccaacactgtgttagtgccccccgtgttagctctttcctgttggctctcagtccctccagctgtcaaaggga ccc
chr1:204249208 right 6 ccagccggtcctgctccctcaccaccttgttctgctcacacaattttcccagcagcttctagggacccagagagtggagaaggagagggagaaa cc
chr1:204460083 right 7 tgaaggagggagagtcttt gccctgggaaacccattttctccctccctctcctcagctcacactctgatttaaaggagttcccactctttctatatgtcctgtgaagac
chr1:205065724 left 15 tcatgtctcatggcctgctgcactggtcagggccggtggtc tggcctgctgcactggtcagggccggtggtctgacctgctgcccctaactgtccccgtgtgcagaaggagaccattggggatctgacca
chr1:205065766 right 11 actggtcagggccggtggtctgacc ccatggatgtcatggagtaggggactcccaagcgctgcctcatgtctcatggcctgctgcactggtcagggccggtggtctggcctgctgc
chr1:205069461 right 7 aagccatatcggtgtcccttggcccttgacagccccctcggtggcaccctcaggactcagcggaggaggtggagcccccggagagct aaa
chr1:205339789 right 8 cctgccgcatatactcggtaatctaagaagaaaggccatacatgcccctggcttagctca ccc
chr1:205339790 right 9 cctgccgcatatactcggtaatctaagaagaaaggccatacatgcccctggcttagctcacaggtacagcaagacaggcccaccagtatctattg cccc
chr1:205662050 right 7 ccagccggtccatgaccagagagaagaccagggagatggcgcactgcaggaacagccccaggctgcccatccgaacgcctgcagagggagaggggcc cc
chr1:206730053 right 5 ccagccacaactctttgaccactccttgttatacaccgtactatgtgggtaagtccacagggggcccagggacctaggcttttcccagaactttt ccc
chr1:208103054 left 8 aggacaattcctct ctctctctgcaaagtactgtcatatcccatcattcccggaaagccccggtcttctgcatgccaaacccttttccagtataccccaaactta
chr1:208267465 left 10 tcataagattactttccagcagcagcagcagcagcagcagcagcagcagcagcag cagcagcagcagcagcagcagcagcagcgatgtaattgacccccatttacagatgatgcagctttaaggcagagaattccatggctg
chr1:209432319 right 6 agagtaactctg catggagctgacaaccatgaggcctcggcagccaccgccaccaccgccgccgccaccaccgtagcagcagcagcagcagcagcagcagca
chr1:209788627 right 5 gagccactactggaatgacctgttcaggacacagaacacaggtgtatcctctgaggaaaaggtatttttaaatagcacaatggacccaagatt cg
chr1:214464832 right 13 gcctatcctggcgcacacgcccctgagatggccttagcagtttcgtgactggaaaattacactatcacctgtgctcctccaggcaggga cct
chr1:214464835 right 8 gcctatcctggcgcacacgcccctgagatggccttagcagtttcgtgactggaaaattacactatcacctgtgctcctccaggcagggaaaagg ccgcct
chr1:215786813 right 12 cctgccacaatgttctgtggcttccatagatgctgggcagaggatcctgcactctttggtttcctgagtcaagtggcag cc
chr1:224114452 right 5 gaggagaaggga ttatgtcctacgacgaaattagccagctccgcctggtgaggcccccgcagaactcctgcctccctctccccccggccgaggtctgggagat
chr1:224330143 right 6 caagccacagctcggaccgccagctcctagtcaaccgggggcctcgtaggggttgcccgccgcgttcgccgggccagttgcacctgaaa ttt
chr1:227925467 right 14 gccgcctctggctccagggtcagcgggaggatggtcaggggctcgctgcccgtcagcctgggcacagagaggccagcatgagcccggcccc gggcg
chr1:229647493 left 5 ttttcctctatc ttattttgccctttagctcttaaaccgagaagcttctcaggagcagcctgtgtccctcacagtggtcgggcctgtcttagatgtcctggct
chr1:230710052 right 11 tcagggagcagccagtcttccatcctgtcacagcctgcatgaacctgtcaatcttctcagcagcaacatccagttctgtgaagtccagagagcgt cccg
chr1:232515127 right 5 ttgggcacagctggggtaccattagccggaccaccaccgccagtctcattggaattcgaggcatttaaagaagtagtgggtcccatgttgccat cccatt
chr1:233379357 right 5 gtggggaggccagcagccccccctccctgccactgtcaagtgccctgggcatcctctccacaccttctttctccacaaagtgcctgctgcagatggac g
chr1:233614150 left 10 cggcgttggccttggctttggct ttggcggcggcggtggagaagatgctgcagtccctggccggcagctcgtgcgtgcgcctggtggagcggcaccgctcggcctggtgctt
chr1:236540468 left 6 caagcacacgcacatatacttatgactgcctgtttgtctggggagagacag ggacgcaaggaaacatttaaatttggataataagttaatttattaactgtttttttttggtggcgggggggg
chr1:240207330 right 6 agccacgaacactctgtttcctctgcctttaaaaacagctgtaacatcccatctccaccacctctgccttgcacagag g
chr1:247434047 left 11 gtgttctgaggccttctctattcca gagctctctggtcagatgtgttctgatgctttctgcctctgttcttggcatgaaggttggggcgctgtggcctctcgcatgagtgctgctt
chr1:248858387 right 11 cctgccgttagggcctcagtttcctcatcagtgaactggggcaagactaaactatttcaatagcagtggcaggtgtggagccaaaccccgtcctt ccc
chr2:676478 left 12 cagaactcctgtaa gtgtcactactctctgctggggaccgcagcggcttctccagagccgcgccatgacataaggacacaagcgcatctactcccatcaatgcac
chr2:1638773 left 11 ggaataaaacgttatacg cagaaggttcgggcagggctgtgctgctgtggaatcttggagtggggggacacaggccgccaggcacctcacctgtgttctgaggtctg
chr2:6910472 right 7 acattagtgggtgcagcgcaccagcatggcac gtagaaagagagagagagaggagtttttaagtactgtatgtattttaaggagattgaataatctaaggtgaggagcatttaaaataata
chr2:9843465 right 7 gcttctccagcctttcccggaagctgcgctcgc aaggtttccctgccgcgcaggcgcacggaatcctaggcgcggatctcgcgtttgcggccggaag
chr2:10122796 right 5 ccactatgctctccctccgtgtcccgctcgcgcccatcacggacccgcagcagctgcagctctcgccgctgaaggggctcagcttggtcgacaagg ccg
chr2:15393610 left 6 gtctgagagaaatgaaagcgtatgtctacac aaatgaaagcgtatgtctacacaaacacttgcatatgaaaagtcatagcaactttatttgtaaaagccaaaactcaaaataacccaaat
chr2:15942065 right 6 gagccgatgccgagctgctccacgtccaccatgccgggcatgatctgcaagaacccagacctcgagtttgactcgctacagccctgcttct ggggggg
chr2:16239536 left 7 acaacaagctacaacagcagcagcagcagcag cagcagcagcagcagcagcagcagcagcagcagcagcagcagcagcagaatgaaggaatgaatgaatgaatgagcgagtgagtgag
chr2:20102729 left 11 atggggagaaggagaagaagaagaagaagaaga agaagaagaagaagaagaagaagaagaagaagaagacgacgacaacggtggtgagggggatggtaccagtctgaggttcgacaggcagtt
chr2:20618698 left 5 ggggctgggcatcta gttggcaaggcatccccacaccctccctccccttcatgtccacggggaataagacacattgggctctggctcctagggtgagagccgctcc
chr2:20640843 right 6 cctgcctcaccagcctggctgaagcctccaggctgcagaggcagctgtggacatgctcccactggggcacggcagcggggcctagttctgggc ccc
chr2:24054101 right 6 cccagccagtctagtgggaatgataaaggaggcttggaaggccaactctttccctcttctaccagcaagggccatccatggtgccagcttctaggt ccc
chr2:27205506 right 6 ccctgcaaggaaagcacagcaaccctgccacagaggccttctaaacccagcttgtccaacctgccttattttgttgttgc cccc
chr2:27629250 right 13 ggtcttcgaggatttggagggt ttttgtacaggtgacgtacacagcatgggtgtagtaggggagcgcaaaaggttgcctccggcaggcggaaggccaggaagaaagggaggga
chr2:28632222 right 5 tttttttttttttaaccatctctctccaagaggattcctgagggtggctttttccacattacctccttt t
chr2:29144009 right 6 acagacagtatg gatcgtgttgttattgcaggacagaaggtacagtaagtaactgcagtctctgaagccagggttgttatgtccatgacctatgttcaaggac
Segmentation fault (core dumped)
The BAM files were aligned against GRCh38 with novoalign.
When running SCRAMBLE-MEI on the example data, I'm getting the following warnings:
Sample had 1 MEI(s)
Warning messages:
1: In data.frame(df.all, alignments_fwd, stringsAsFactors = F) :
row names were found from a short variable and have been discarded
2: In data.frame(df.all, alignments_rev, stringsAsFactors = F) :
row names were found from a short variable and have been discarded
Done analyzing MEIs
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS
Hi,
Is it possible to add an option to specify the reference file in cluster_identifier? I'm trying to run it using a CRAM file and the tool looks for the reference but can't find it automatically.
Or do you know of a way I can specify this myself?
Thanks
-Nicolas
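One possible workaround, independent of cluster_identifier itself, is to convert the CRAM to BAM with samtools while passing the reference explicitly through `-T`, and then run cluster_identifier on the BAM. The sketch below only wraps standard samtools calls; the function names and file paths are placeholders. If cluster_identifier reads CRAM through htslib (an assumption, not confirmed here), setting htslib's `REF_PATH` or `REF_CACHE` environment variables may also help it locate the reference.

```python
# Sketch of a CRAM -> BAM conversion with an explicit reference.
# Function names and paths are placeholders; samtools must be on PATH.
import subprocess


def cram_to_bam_command(cram_path, reference_fasta, bam_path):
    """Build the samtools command that decodes a CRAM against a given FASTA."""
    return [
        "samtools", "view",
        "-b",                   # write BAM output
        "-T", reference_fasta,  # reference FASTA used to decode the CRAM
        "-o", bam_path,
        cram_path,
    ]


def convert(cram_path, reference_fasta, bam_path):
    """Run the conversion and index the resulting BAM."""
    subprocess.run(cram_to_bam_command(cram_path, reference_fasta, bam_path), check=True)
    subprocess.run(["samtools", "index", bam_path], check=True)
```

The conversion costs extra disk space and time, but it avoids depending on how the tool resolves the reference.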
Issue
We would like to include scramble in our MEI analysis workflow, but our workflow relies on Conda and requires VCF output.
The changes that fix VCF output are in 1.0.2, but that version has not been integrated into bioconda-recipes.
Suggestion
Update bioconda-recipes. We tried in bioconda/bioconda-recipes#36929, but something is missing. Perhaps the version defined in scramble/cluster_identifier/src/cluster_identifier.c?
Do you have any suggestions for data filtering, such as a coverage threshold (the minimum number of reads needed for a call to be considered reliable)?
Hi,
I'm using the Docker version of scramble.
I get the following error when I run this command on my clusters file.
/bin/SCRAMble.R \
--out-name ${PWD}/test \
--cluster-file ${PWD}/MEN_CGH200860-I.sorted.clusters.txt \
--install-dir /app/cluster_analysis/bin/ \
--mei-refs /app/cluster_analysis/resources/MEI_consensus_seqs.fa \
--ref /app/validation/test.fa \
--eval-meis
Done analyzing MEIs
Writing VCF file to /data/share/genmol/sacha/projects/ALU/test.vcf...
Error in write.table(fixed, paste0(outFilePrefix, ".vcf"), row.names = F, :
unimplemented type 'list' in 'EncodeElement'
Execution halted
It seems the fixed data frame contains a list in the REF column. Here is the output of print(fixed) and print(str(fixed)):
#CHROM POS ID REF ALT QUAL FILTER
11 chr13 18212144 INS:ME NULL <INS:ME:ALU> 79.24415 PASS
10 chr13 38878403 INS:ME NULL <INS:ME:ALU> 60.81471 PASS
9 chr14 57041288 INS:ME NULL <INS:ME:ALU> 78.32513 PASS
8 chr15 40808120 INS:ME NULL <INS:ME:ALU> 73.71777 PASS
7 chr16 89224981 INS:ME NULL <INS:ME:ALU> 78.32513 PASS
3 chr2 97185492 INS:ME NULL <INS:ME:L1> 57.59200 PASS
4 chr2 102706037 INS:ME NULL <INS:ME:L1> 62.65276 PASS
6 chr22 23928275 INS:ME NULL <INS:ME:ALU> 103.66561 PASS
5 chr22 43928709 INS:ME NULL <INS:ME:L1> 96.75457 PASS
2 chr4 185440770 INS:ME NULL <INS:ME:ALU> 86.15520 PASS
1 chr5 62561291 INS:ME NULL <INS:ME:ALU> 103.66561 PASS
INFO
11 MEINFO=chr13:18212144_ALU_Plus,18212144,18212145,+
10 MEINFO=chr13:38878403_ALU_Plus,38878403,38878404,+
9 MEINFO=chr14:57041288_ALU_Plus,57041288,57041289,+
8 MEINFO=chr15:40808120_ALU_Minus,40808120,40808121,-
7 MEINFO=chr16:89224981_ALU_Plus,89224981,89224982,+
3 MEINFO=chr2:97185492_L1_Plus,97185492,97185493,+
4 MEINFO=chr2:102706037_L1_Plus,102706037,102706038,+
6 MEINFO=chr22:23928275_ALU_Plus,23928275,23928276,+
5 MEINFO=chr22:43928709_L1_Minus,43928709,43928710,-
2 MEINFO=chr4:185440770_ALU_Plus,185440770,185440771,+
1 MEINFO=chr5:62561291_ALU_Plus,62561291,62561292,+
'data.frame': 11 obs. of 8 variables:
$ #CHROM: chr "chr13" "chr13" "chr14" "chr15" ...
$ POS : int 18212144 38878403 57041288 40808120 89224981 97185492 102706037 23928275 43928709 185440770 ...
$ ID : chr "INS:ME" "INS:ME" "INS:ME" "INS:ME" ...
$ REF :List of 11
..$ : NULL
..$ : NULL
..$ : NULL
..$ : NULL
..$ : NULL
..$ : NULL
..$ : NULL
..$ : NULL
..$ : NULL
..$ : NULL
..$ : NULL
$ ALT : chr "<INS:ME:ALU>" "<INS:ME:ALU>" "<INS:ME:ALU>" "<INS:ME:ALU>" ...
$ QUAL : num 79.2 60.8 78.3 73.7 78.3 ...
$ FILTER: chr "PASS" "PASS" "PASS" "PASS" ...
$ INFO : chr "MEINFO=chr13:18212144_ALU_Plus,18212144,18212145,+" "MEINFO=chr13:38878403_ALU_Plus,38878403,38878404,+" "MEINFO=chr14:57041288_ALU_Plus,57041288,57041289,+" "MEINFO=chr15:40808120_ALU_Minus,40808120,40808121,-" ...
chr1:931134 right 6 gtgcccccccccccccccccccccgggccaccggttgggtggggagggg tgggacgtgaacatctctttccgagaggcgtcctgcaggtaggagccgtgctgtgcgtgcataagagggggccgtgactcccc
chr1:939446 left 6 tgctccttgtgttggcccggtagcgcctctaccacctggg cctccccagccacggtgaggacccaccctggcatgatctcccctcatcacctccccagccacatgtactcggccattcctgttgctga
chr1:955902 right 9 atgccccccaccccgcgtaacagcgggaatacatttgcaccaataaaaaaaacaaaatatgtagaaatccaaaaatgt ctctgttgccatgtctctgtcctagccacaaggcctctggcttctcctgtgtgtggtcccgacccaccttccaccctacccccc
chr1:971019 right 10 ggggggggggggggggggggggggggggggggggggggggggg gctggctttaccacctggagaagcagacggccctcctcggggggccgcggcgctgccactcggcacccccacaggtcagtgccgggg
chr1:1046488 right 10 cgccccccccccccggggccccccccaaacccccacaaccccaaccccccacccccc ccagcactcacccgacatctgcctccgtgactgtgaccaccccagggctcctcctgagccaggcactgccggcccccccc
chr1:1046501 right 8 aggcgccccccaagaccccacccacccccacccccccaccccccacaaagcgaacgcggaccacaaaca cccgacatctgcctccgtgactgtgaccaccccagggctcctcctgagccaggcactgccggcccccccccgcgcccaccccc
chr1:1048421 left 6 tgtggccgtttttgttagtgggtatgggttccccccgcctttggtggggggggcggccgccggggggggccatgtttg ggggggggggctaagccaccatcaggctttgagttgggggcaggagcccggattaaggcggggtttcggccagatgcggtggc
chr1:1049076 left 5 gggggtattgtatttctggttttgggggttttttttgggcggggtgctgctcgggggggggggggggggcg ggggcgggggcagctcaggtgggcggggagggg
chr1:1050063 left 17 tgttttggggggggccccggggggggttggggccactttggccctccggggggggggggggggggctgggggggg gggggggggggggggttgaacgtttgggcgggtacaggttccaggtagcattgcagttaggatgcggctcagtctagtctgggttttgag
chr1:1050070 left 6 cggggcggggccccgggggggggtggggcccctttcgcccccccggggggggggggggggctcgggggggggggggtt ggggggggttgaacgtttgggcgggtacaggttccaggtagcattgcagttaggatgcggctcagtctagtctgggttttgag
We ran SCRAMble on NA12878 dataset and compared the output with NA12878 validation data.
Could you help us explain the remaining differences in calls? Could you recommend any additional filters to use for calls made by SCRAMble?
The VCF format specification (https://samtools.github.io/hts-specs/VCFv4.2.pdf) highly recommends (but does not require) that the header include tags describing the contigs referred to in the VCF file. With samtools, the output VCF of Scramble cannot be viewed correctly because of its contig definitions:
##contig=<ID=chr1>
Samtools command:
$ samtools view scramble.vcf
[main_samview] fail to read the header from "scramble.vcf"
Suggestion:
Add length and assembly as tags:
##contig=<ID=chr1,length=249250621,assembly=hg19>
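Until the header is changed upstream, the suggested tags can be added after the fact. The sketch below assumes a samtools faidx index (.fai) of the reference is available, where the first two tab-separated columns are the contig name and its length. The function names and the optional `assembly` label are illustrative, not part of SCRAMble.

```python
# Sketch: add length (and optionally assembly) to bare ##contig header lines,
# taking contig lengths from a .fai index. Names here are illustrative.
import re

CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+)>\s*$")


def read_fai_lengths(fai_lines):
    """Map contig name -> length from the lines of a .fai index."""
    lengths = {}
    for line in fai_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            lengths[parts[0]] = int(parts[1])
    return lengths


def add_contig_lengths(vcf_lines, lengths, assembly=None):
    """Yield VCF lines, rewriting bare ##contig=<ID=...> header lines."""
    for line in vcf_lines:
        match = CONTIG_RE.match(line)
        if match and match.group(1) in lengths:
            name = match.group(1)
            extra = f",length={lengths[name]}"
            if assembly:
                extra += f",assembly={assembly}"
            yield f"##contig=<ID={name}{extra}>\n"
        else:
            yield line  # records and other header lines pass through unchanged
```

Contigs that are missing from the index are left untouched, so the rewrite never invents a length.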
I have used the following commands to generate reference files (*.nhr, *.nin, and *.nsq files) for VCF creation for both GRCh37/38.
makeblastdb -in file.fasta -input_type fasta -dbtype nucl
However, when I run the Cluster analysis, I get the following error:
Done analyzing MEIs
Writing VCF file
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Error: subscript contains invalid names
Execution halted
This occurs with either reference (37 or 38).
Any ideas how I could go about troubleshooting this step?
Thanks in advance
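One thing that may be worth ruling out, as an assumption rather than a confirmed cause of this error, is a mismatch between the contig names in the cluster file and the sequence names in the reference FASTA (for example "chr1" versus "1"). A minimal diagnostic sketch follows, assuming the first field of each cluster row has the form contig:position as in the cluster output shown elsewhere on this page; the function names are hypothetical.

```python
# Diagnostic sketch: list contigs used in a cluster file that do not appear
# as sequence names in the reference FASTA. Function names are hypothetical.

def fasta_names(fasta_lines):
    """Collect sequence names (first word after '>') from FASTA lines."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">") and line[1:].strip():
            names.add(line[1:].split()[0])
    return names


def cluster_contigs(cluster_lines):
    """Collect contig names from the 'contig:position' first field."""
    names = set()
    for line in cluster_lines:
        fields = line.split()
        if fields:
            # rsplit keeps contig names that themselves contain ':'
            names.add(fields[0].rsplit(":", 1)[0])
    return names


def missing_contigs(cluster_lines, fasta_lines):
    """Return, sorted, the cluster contigs absent from the FASTA."""
    return sorted(cluster_contigs(cluster_lines) - fasta_names(fasta_lines))
```

An empty result means the names agree, in which case the cause has to be sought elsewhere.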