Comments (3)
There's a huge deletion on chromosome 4, ID=chr4-49657849-DEL-140440990. It spans many TR regions, and each region performs a pysam.VariantFile.fetch that has to parse it. The variant by itself in a gzipped VCF is 38M. Removing that variant from the VCF allows the job to complete.
Chromosome 9 also has some larger variants that might need to be pre-filtered:
# LEN ID
-140440990 chr4-49657849-DEL-140440990
-22543055 chr9-43222012-DEL-22543055
-20115032 chr9-42684836-DEL-20115032
-19608353 chr9-40910205-DEL-19608353
-4215131 chr21-5393558-DEL-4215131
-2828305 chr5-46867696-DEL-2828305
-2818263 chr9-62556860-DEL-2818263
-2240586 chr9-60559282-DEL-2240586
-1664861 chr9-40910205-DEL-1664861
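For anyone wanting to reproduce a table like the one above, here is a minimal sketch that lists the largest sequence-resolved variants in a VCF. It uses only the standard library, and the function names and the 1 Mbp default threshold are my own choices for illustration, not anything from Truvari. It assumes REF and ALT hold full sequences and skips symbolic alleles such as <DEL>.

```python
import gzip


def variant_length(ref: str, alt: str) -> int:
    """Signed length change: negative for deletions, positive for insertions."""
    return len(alt) - len(ref)


def large_variants(vcf_path: str, min_abs_len: int = 1_000_000):
    """Yield (length, ID) for sequence-resolved alleles at least min_abs_len long."""
    # bgzip output is multi-member gzip, which gzip.open can read as text.
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header lines
            fields = line.rstrip("\n").split("\t")
            var_id, ref, alts = fields[2], fields[3], fields[4]
            for alt in alts.split(","):
                # Symbolic or missing alleles carry no sequence to measure.
                if alt.startswith("<") or alt in (".", "*"):
                    continue
                length = variant_length(ref, alt)
                if abs(length) >= min_abs_len:
                    yield length, var_id
```

Calling `sorted(large_variants("pav.all.vcf.gz"))` would put the most negative lengths (the largest deletions) first, matching the ordering of the table.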
from truvari.
Thanks for looking into this! I never thought to check if large INDELs were slowing things down, since I assumed the --sizemax flag was excluding them from the analysis entirely.
from truvari.
Reopening this issue, but now with Truvari refine (Truvari bench now succeeds on this input in ~15 minutes, thanks!). I've filtered out all remotely large variants and all inversions as follows:
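# Keep SNPs and indels shorter than 1 kb, then drop every line containing "INV"
bcftools view \
    -i 'TYPE=="SNP" || (ILEN < 1000 && ILEN > -1000)' \
    pav.all.vcf.gz |
    grep -v "INV" > pav.most.vcf
# Compress and index the filtered VCF
bgzip -f pav.most.vcf
tabix -p vcf pav.most.vcf.gz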
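One caveat with the grep step: it drops any line containing "INV", header lines included. Below is a rough sketch of a stricter per-record check in Python. The function name is hypothetical, and it assumes inversions are marked either by SVTYPE=INV in INFO or by "-INV-" in the ID (a guess based on IDs like chr4-49657849-DEL-140440990). It only roughly mirrors the bcftools expression and is not a drop-in replacement.

```python
def keep_record(line: str, max_abs_len: int = 1000) -> bool:
    """Return True for header lines and for records passing the size/type filter."""
    if line.startswith("#"):
        return True  # keep every header line, even ones that mention INV
    fields = line.rstrip("\n").split("\t")
    var_id, ref, alts, info = fields[2], fields[3], fields[4], fields[7]
    # Assumption: inversions carry SVTYPE=INV in INFO or "-INV-" in the ID.
    if "SVTYPE=INV" in info.split(";") or "-INV-" in var_id:
        return False
    for alt in alts.split(","):
        # Symbolic alleles have no measurable length, so drop them conservatively.
        if alt.startswith("<"):
            return False
        # A SNP has length change 0 and always passes this check.
        if abs(len(alt) - len(ref)) >= max_abs_len:
            return False
    return True
```

It could be applied line by line to an uncompressed VCF stream, writing out only the lines for which it returns True.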
I ran Truvari WFA refine on the bench results, limiting the GIAB-TR regions to candidate.refine.bed. I've included my log file here: pav.log.
On this attempt, I set the default /tmp directory to a location on an external hard drive. It crashed after 2.5 hours with a MemoryError, having used 475GB of space in /x/tmp.
On a previous attempt, it hung for a few days after filling the /tmp directory with 185GB of data and then the current directory with another 600GB of data.
All of this data is located in files named tmp********, and they appear to be FASTA files auto-generated by samtools or something. How much space is this expected to take, and how long should the analysis run for?
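In case it helps with tracking this down, here is a small sketch for measuring how much space the tmp* files take while a run is in progress. The helper name and the "tmp*" pattern are my own, chosen to match the file names I see. Python's tempfile module does honor the TMPDIR environment variable, but whether these particular files are created through tempfile is an assumption on my part.

```python
from pathlib import Path


def temp_usage(directory: str, pattern: str = "tmp*"):
    """Return (file_count, total_bytes) for regular files matching pattern."""
    count = 0
    total = 0
    for path in Path(directory).glob(pattern):
        try:
            if path.is_file():
                size = path.stat().st_size
                count += 1
                total += size
        except FileNotFoundError:
            # The file was deleted between listing and stat; ignore it.
            continue
    return count, total
```

For example, `temp_usage("/x/tmp")` could be called periodically to see how the total grows over time.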
Thanks in advance, sorry for opening so many issues. I'd be happy to help provide any more info to get this working.
from truvari.