In the meantime, you can download the gnomAD v4 VCFs and use them via --custom.
gnomAD v4.0 has both exome and genome data, and AFAIK there is no combined frequency available, so if you want them merged you need to sum the counts and recalculate the AF.
I have some scripts to do this (and cut down the INFO fields to the dozen or so I want) if you want to take a look:
https://github.com/SACGF/variantgrid/blob/master/annotation/annotation_data/generate_annotation/gnomad4.0_download.sh
https://github.com/SACGF/variantgrid/blob/master/annotation/annotation_data/generate_annotation/gnomad_data.py
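A minimal sketch of the merge described above, assuming you already have the AC (allele count) and AN (allele number) values for one variant from the exome and genome VCFs. This is not the code from the linked scripts; AlleleCounts and combined_af are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlleleCounts:
    """Allele count (AC) and allele number (AN) for one variant in one gnomAD dataset."""
    ac: int
    an: int


def combined_af(exome: Optional[AlleleCounts], genome: Optional[AlleleCounts]) -> Optional[float]:
    """Merge exome and genome data by summing AC and AN, then recompute AF = AC / AN.

    Either side may be None when the variant is absent from that dataset. Treating a
    missing dataset as contributing nothing is an assumption: the site may still have
    been covered there, but its AN is not available from that VCF.
    Returns None when there is no allele number to divide by.
    """
    present = [c for c in (exome, genome) if c is not None]
    ac = sum(c.ac for c in present)
    an = sum(c.an for c in present)
    if an == 0:
        return None
    return ac / an


if __name__ == "__main__":
    # 3 alt alleles out of 1000 in exomes, 1 out of 500 in genomes.
    print(combined_af(AlleleCounts(3, 1000), AlleleCounts(1, 500)))
```

Summing the counts rather than averaging the two AFs matters because the datasets have different allele numbers: with 3/1000 in exomes and 1/500 in genomes, the combined AF is 4/1500 (about 0.00267), whereas a plain average of 0.003 and 0.002 gives 0.0025.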
Hi @davmlaw
Thank you for your help, I appreciate it.
I already downloaded the gnomAD v4 data with the gnomad4.0_download.sh script, but when I run gnomad_data.py I get a syntax error on line 131:
with (open(chrom_script, "w") as cs):
SyntaxError: invalid syntax
(the caret is under the w)
I'd appreciate your help.
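One likely cause, assuming the script was run with a Python older than 3.10: the parenthesised form with (open(...) as cs): is only officially supported from Python 3.10, so older interpreters (such as 3.8) reject it with this SyntaxError. Either run the script with Python 3.10 or newer, or drop the outer parentheses as in the sketch below. write_chrom_script and its arguments are hypothetical; only chrom_script comes from the error message.

```python
import os
import sys
import tempfile
from typing import List


def write_chrom_script(chrom_script: str, commands: List[str]) -> None:
    """Write one command per line to chrom_script.

    Uses the plain `with open(...) as cs:` form, which every Python 3 version accepts.
    `with (open(chrom_script, "w") as cs):` needs Python 3.10+.
    (typing.List is used instead of list[str] so this also runs on older Pythons.)
    """
    with open(chrom_script, "w") as cs:
        for command in commands:
            cs.write(command + "\n")


if __name__ == "__main__":
    print("Running on Python %d.%d" % sys.version_info[:2])
    path = os.path.join(tempfile.mkdtemp(), "chrom_script.sh")
    write_chrom_script(path, ["echo chr1", "echo chr2"])
    with open(path) as fh:
        print(fh.read(), end="")
```

Running python3 --version is a quick way to check which interpreter the script is picking up.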
Hi @ntm (and others who are interested).
We are not yet certain which Ensembl release will update to the latest gnomAD, but it's likely to be 113. We tend not to rush to incorporate the initial major release (4.0 in this case) because gnomAD typically provides an updated minor version shortly after each major release.
If you want to start using the data before we incorporate it into the cache, you can, as described in this thread, use the --custom option with the latest data (after a bit of pre-parsing to get it into the right shape).
Thanks for the info @jamie-m-a, very helpful for planning ahead and deciding our course of action.
While we're on the subject of variant frequencies, is there any progress on integrating ALFA, perhaps via a plugin as @helensch mentioned?
Thanks!
Hi @ntm, were you able to run @davmlaw's scripts? I'm having issues running them and would appreciate your help. Please see my comment above. Thank you.
Hi @trust-odia, I've removed the outer brackets, which may fix the script (hard to know, as it works on my machine).
Alternatively, you can just download the combined VCF and its index from here:
https://variantgrid.com/download/annotation/VEP/annotation_data/GRCh38/gnomad4.0_GRCh38_combined_af.vcf.bgz
https://variantgrid.com/download/annotation/VEP/annotation_data/GRCh38/gnomad4.0_GRCh38_combined_af.vcf.bgz.tbi
You don't need to process the data if you just download the already processed/combined VCF linked in the comment above yours.
VEP should seek directly to each position in the VCF using its tabix index, so I don't think the size of the VCF matters much.
If you want separate exome and genome VCF files, download the individual per-chromosome files from the gnomAD site and concatenate them; that shouldn't take much processing power either (an old laptop left on overnight would do).
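bcftools concat is the usual tool for this. If you would rather stay in Python, here is a minimal sketch of the same idea, assuming the per-chromosome files are already decompressed, share the same header and are supplied in chromosome order: keep the header from the first file and the records from all of them. concat_vcf_lines is a hypothetical helper. The output still has to be compressed with bgzip and indexed with tabix (tabix -p vcf) before --custom can use it.

```python
from typing import Iterable, Iterator, List


def concat_vcf_lines(per_chrom_files: List[Iterable[str]]) -> Iterator[str]:
    """Concatenate per-chromosome VCFs: header lines from the first file, records from all.

    Assumes every input has an identical header and that the inputs are already
    in the desired chromosome order (no sorting or header merging is done here).
    """
    for index, lines in enumerate(per_chrom_files):
        for line in lines:
            # Header lines ("##..." meta lines and the "#CHROM" line) are kept only once.
            if index == 0 or not line.startswith("#"):
                yield line


if __name__ == "__main__":
    chr1 = ["##fileformat=VCFv4.2\n", "#CHROM\tPOS\n", "chr1\t10\n"]
    chr2 = ["##fileformat=VCFv4.2\n", "#CHROM\tPOS\n", "chr2\t20\n"]
    print("".join(concat_vcf_lines([chr1, chr2])), end="")
```

For real files, each element of per_chrom_files could be an open text handle, e.g. from gzip.open(path, "rt").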
Hi Dave,
I used the combined gnomAD v4 data you sent me (thank you for this) to run VEP, only for the allele frequency, but the gnomADg_AF column does not have new values.
Please see my command:
vep -i /project/dshared/projects/VEP/chr20_NF.vcf.split.gz --tab -everything --buffer_size 1000 --offline --fork 4 --dir_plugins /project/shared/projects/PMBB_VEP_Annotations --dir_cache /home/todia --custom file=/project/shared/projects/gnomad4/gnomad4.0_GRCh38_combined_af.vcf.bgz,short_name=gnomad4,format=vcf -o /project/shared/projects/VEP/vep_out/testAnnot.chr20.txt
I'd appreciate your response.
Thank you
@trust-odia I think you need to add fields=AF as an argument to --custom; then it will add a column called gnomad4_AF (i.e. the custom short_name + "_" + the field name).
By default it just uses the VCF ID field and names the column after the custom short name (i.e. gnomad4 in your case).
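A small sketch of the naming rule described above, assuming the key=value --custom syntax used in the command earlier in this thread. build_custom_arg is hypothetical, and joining several fields with % is taken from the VEP --custom documentation, so check it against your VEP version.

```python
from typing import List, Tuple


def build_custom_arg(file: str, short_name: str, fields: List[str]) -> Tuple[str, List[str]]:
    """Build a value for VEP's --custom option and the output columns it should produce.

    With fields=..., each field becomes a column named <short_name>_<field>.
    Without it, VEP reports the VCF ID in a single column named after short_name.
    """
    arg = "file={},short_name={},format=vcf".format(file, short_name)
    if fields:
        # Multiple fields are joined with "%" (separator per the VEP --custom docs).
        arg += ",fields=" + "%".join(fields)
        columns = ["{}_{}".format(short_name, field) for field in fields]
    else:
        columns = [short_name]
    return arg, columns


if __name__ == "__main__":
    arg, columns = build_custom_arg("gnomad4.0_GRCh38_combined_af.vcf.bgz", "gnomad4", ["AF"])
    print("--custom " + arg)
    print(columns)
```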
@davmlaw Thank you.
Hi, it's hard to know without looking at your files, but it could be due to chromosome naming. I use the full contig names in the VCF, so you may need to use the --synonyms VEP option.
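A minimal sketch for spotting and working around a naming mismatch, assuming the input and custom VCFs simply use different names for the same chromosomes. The synonyms file is written tab-delimited with one pair of names per line; the chr20 / 20 pair is a hypothetical example, so use the names your files actually contain. contig_names and write_synonyms are hypothetical helpers.

```python
import csv
import os
import tempfile
from typing import Iterable, Set, Tuple


def contig_names(vcf_lines: Iterable[str]) -> Set[str]:
    """Return the set of CHROM values used in the data lines of a (decompressed) VCF."""
    return {
        line.split("\t", 1)[0]
        for line in vcf_lines
        if line.strip() and not line.startswith("#")
    }


def write_synonyms(pairs: Iterable[Tuple[str, str]], path: str) -> None:
    """Write a tab-delimited synonyms file: two names for the same chromosome per line."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        for name_a, name_b in pairs:
            writer.writerow([name_a, name_b])


if __name__ == "__main__":
    # Hypothetical example: the input VCF says "chr20" but the custom VCF says "20".
    input_contigs = contig_names(["#CHROM\tPOS\tID\tREF\tALT", "chr20\t100\t.\tA\tG"])
    custom_contigs = contig_names(["#CHROM\tPOS\tID\tREF\tALT", "20\t100\t.\tA\tG"])
    print("Only in input:", input_contigs - custom_contigs)
    path = os.path.join(tempfile.mkdtemp(), "synonyms.tsv")
    write_synonyms([("chr20", "20")], path)
    print("Wrote", path)
```

Any contig that appears only in the input is a candidate for a synonyms entry; the resulting file is then passed to VEP with --synonyms.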
Related Issues
- Error with dbNSFP plugin
- Inconsistent consequence annotation
- Fail to install ensembl-release-111 on MacOS 14.2.1 (23C71) `fatal error: 'lzma.h' file not found`
- WARNING: Ignoring non-supported 'five_prime_utr' feature_type from Homo_sapiens.GRCh38.111.gtf.bgzip.gz
- Empty fields HGVSc and HGVSp of INFO - CSQ after VCF annotation via GTF and Fasta
- Absence MANE and canonical annotation tags in output VCF.
- VEP custom anotation not working with gnomad 4.0 file
- ERROR: DBI module not found. VEP requires the DBI perl module to function
- Annotating with GNOMAD custom vcf makes frequencies become STRING and unable to filter
- VEP 111 HGVS C dot annotating dups as Insertions
- filter_vep not correctly filtering CADD_PHRED scores
- [Question] What is the definition of "coding_sequence_variant"? Why are frameshifts no coding sequence variant?
- can not call method "seq"
- Normalisation-based allele matching algorithm and custom file
- False warning messages with vep 111 when using the range input format
- All variants are intergenic with NCBI GFF
- filter_vep output file larger than input file
- WARNING: Chromosome 22 not found in annotation sources or synonyms on line 1
- VEP in Google Batch fails when more than 5 custom databases are passed
- trouble finding cache file "MSG: ERROR: Cache directory /..."