Comments (11)

nuno-agostinho commented on September 17, 2024

Hey @dennishendriksen,

Just to update you: I opened PR Ensembl/ensembl-variation#1095 to fix allele numbers for breakends. This will be available in the next version of VEP.

Thanks again for reporting this issue!

Cheers,
Nuno

from ensembl-vep.

nuno-agostinho commented on September 17, 2024

Hi @dennishendriksen,

The results you are obtaining for that breakpoint variant seem incorrect.

In VEP 111, we used the alternative allele of the breakpoint (in your case, [1:109650635[GG) to report all potential consequences. However, this is confusing when a breakpoint is composed of two or more chromosomal breakends.

As such, in VEP 112, we now separate the consequences of a breakpoint variant for each breakend:

  • [1:109650635[GG: consequences for the breakend located at chr1:109650635
  • .G: consequences for the original breakend in position chr22:29767384 (represented as detailed in the VCF 4.4 standard, chapter 5.4.9: Single breakends)

To answer your questions:

Q1: Is this intended? I would expect this field to always contain a ALT allele index.

Unfortunately, it seems that VEP 112 is returning nothing for the allele number for breakpoint variants. I am going to check how to fix it.

Q2: (...) Could you explain what the dot in the new output means?

The representation depicts a single breakend and its orientation:

  • 2 321681 bndW G G.: breakend occurring at position 321682 with at least position 321681 (and maybe 321680, 321679, etc.) attached
  • 13 123457 bndX A .A: breakend occurring at position 123456 with at least position 123457 (and maybe 123458, 123459, etc.) attached

More information at VCF 4.4 standard, chapter 5.4.9: Single breakends.
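To make the two notations concrete, here is a minimal Python sketch that classifies a breakend ALT string as mated (for example [1:109650635[GG) or single (G. or .A) and reports on which side the local sequence stays attached. The `Breakend` class and `parse_breakend_alt` function are hypothetical names for illustration and are not part of VEP; the sketch also assumes plain contig names without square or angle brackets.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Mated breakend ALT forms: t[p[  t]p]  ]p]t  [p[t  (p is "chrom:pos").
# Assumption: contig names contain no square or angle brackets.
_MATED = re.compile(
    r"^(?P<lead>[ACGTNacgtn]*)"
    r"(?P<bracket>[\[\]])(?P<chrom>[^\[\]]+):(?P<pos>\d+)(?P=bracket)"
    r"(?P<trail>[ACGTNacgtn]*)$"
)
# Single breakend ALT forms: "G." (bases then dot) or ".A" (dot then bases).
_SINGLE = re.compile(r"^(?:(?P<before>[ACGTNacgtn]+)\.|\.(?P<after>[ACGTNacgtn]+))$")


@dataclass(frozen=True)
class Breakend:
    kind: str                       # "mated" or "single"
    sequence: str                   # bases written in the ALT field
    extends: str                    # "left": POS and lower positions stay attached
                                    # "right": POS and higher positions stay attached
    mate_chrom: Optional[str] = None
    mate_pos: Optional[int] = None


def parse_breakend_alt(alt: str) -> Breakend:
    """Classify a breakend ALT string (hypothetical helper, not VEP code)."""
    mated = _MATED.match(alt)
    if mated:
        lead, trail = mated.group("lead"), mated.group("trail")
        if bool(lead) == bool(trail):
            raise ValueError(f"bases must be on exactly one side: {alt!r}")
        return Breakend(
            kind="mated",
            sequence=lead or trail,
            extends="left" if lead else "right",
            mate_chrom=mated.group("chrom"),
            mate_pos=int(mated.group("pos")),
        )
    single = _SINGLE.match(alt)
    if single:
        before = single.group("before")
        return Breakend(
            kind="single",
            sequence=before or single.group("after"),
            extends="left" if before else "right",
        )
    raise ValueError(f"not a breakend ALT: {alt!r}")


for example in ("G.", ".A", "[1:109650635[GG"):
    print(example, parse_breakend_alt(example))
```

In this sketch, G. is reported with `extends="left"` (position 321681 and lower stay attached) and .A with `extends="right"`, matching the two examples above.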

Q3: A last observation is that the number of consequences went down from 10 to 7. Could you explain this difference?

I'll also check if the changes are expected or not.

Thanks for reporting this issue! I'll report back as soon as possible.

Best regards,
Nuno

nuno-agostinho commented on September 17, 2024

Hey @dennishendriksen,

The bug fix for the allele number in breakpoint variants has now been merged and will be included in the next version of VEP (VEP 113).

I will close this issue but feel free to open a new one if you find further issues or have any suggestions.

Cheers,
Nuno

dennishendriksen commented on September 17, 2024

Hi @nuno-agostinho,

Thank you for this fix!

Q3: A last observation is that the number of consequences went down from 10 to 7. Could you explain this difference?

I'll also check if the changes are expected or not.

Did you get around to checking this?

Greetings,
@dennishendriksen

nuno-agostinho commented on September 17, 2024

Hi @dennishendriksen,

Sorry for closing the issue prematurely.

I was not able to replicate your results. Could you please send me the VEP command that you ran to get those results?

Thanks,
Nuno

dennishendriksen commented on September 17, 2024

Hi @nuno-agostinho,

From the previously attached vcf:

vep --allele_number --allow_non_variant --assembly GRCh38 --buffer_size 1000 --cache --compress_output bgzip --custom [PATH]/hg38.phyloP100way.bw,phyloP,bigwig,exact,0 --database 0 --dir_cache [PATH]/cache --dir_plugins [PATH]/plugins --dont_skip --exclude_predicted --fasta [PATH]/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz --flag_pick_allele --fork 4 --format vcf --hgvs --input_file GRCh37_normalized.vcf.gz --no_stats --numbers --offline --output_file GRCh37_annotated.vcf.gz --plugin Grantham --plugin SpliceAI,snv=[PATH]/spliceai_scores.masked.snv.hg38.vcf.gz,indel=[PATH]/spliceai_scores.masked.indel.hg38.vcf.gz --plugin Capice,GRCh37_capice_output.tsv.gz --plugin UTRannotator,[PATH]/uORF_5UTR_PUBLIC.txt --plugin Inheritance,[PATH]/inheritance_20240115.tsv --plugin VKGL,[PATH]/vkgl_consensus_20240401.tsv,1 --plugin gnomAD,[PATH]/gnomad.total.v4.1.sites.stripped.tsv.gz --plugin ClinVar,[PATH]/clinvar_20240603_stripped.tsv.gz --plugin AnnotSV,GRCh37_normalized.vcf.gz.tsv,AnnotSV_ranking_score;AnnotSV_ranking_criteria;ACMG_class --plugin AlphScore,[PATH]/AlphScore_final_20230825_stripped_GRCh38.tsv.gz --plugin ncER,[PATH]/GRCh38_ncER_perc.bed.gz --plugin FATHMM_MKL_NC,[PATH]/GRCh38_FATHMM-MKL_NC.tsv.gz --plugin ReMM,[PATH]/GRCh38_ReMM.tsv.gz --polyphen s --pubmed --refseq --safe --shift_3prime --sift s --symbol --total_length --use_given_ref --vcf

Greetings,
@dennishendriksen

nuno-agostinho commented on September 17, 2024

Hey @dennishendriksen,

I am confused by your command, as you are mixing GRCh37 and GRCh38 data.

For GRCh38, the alternative breakend [1:109650635[G should only return an intergenic variant [1], whereas there are Transcript consequences if you use --assembly GRCh37.

Could you check if the results make sense for you when using GRCh37 throughout the VEP command?
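As a rough illustration of how such a mix can be spotted, the following Python sketch scans a command line for assembly names and groups the arguments that mention each one. The `assemblies_mentioned` helper is hypothetical and not part of VEP, and it is only a name-based heuristic: a file name containing GRCh37 does not prove that the file content is GRCh37.

```python
import re
import shlex
from typing import Dict, List

# Assembly names and common aliases, mapped to one canonical label.
_CANONICAL = {"grch37": "GRCh37", "hg19": "GRCh37", "grch38": "GRCh38", "hg38": "GRCh38"}
_ASSEMBLY = re.compile(r"grch37|grch38|hg19|hg38", re.IGNORECASE)


def assemblies_mentioned(command: str) -> Dict[str, List[str]]:
    """Group command-line arguments by the assembly name they mention.

    Hypothetical helper for illustration: it only looks at names and
    cannot tell which assembly a file's content actually uses.
    """
    found: Dict[str, List[str]] = {}
    for token in shlex.split(command):
        for hit in _ASSEMBLY.findall(token):
            tokens = found.setdefault(_CANONICAL[hit.lower()], [])
            if token not in tokens:
                tokens.append(token)
    return found


command = (
    "vep --assembly GRCh38 --input_file GRCh37_normalized.vcf.gz "
    "--output_file GRCh37_annotated.vcf.gz"
)
mentions = assemblies_mentioned(command)
if len(mentions) > 1:
    for assembly, tokens in sorted(mentions.items()):
        print(assembly, "mentioned in:", ", ".join(tokens))
```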

Thanks,
Nuno

Footnotes

  1. However, the output only shows results for the reference breakend (.G). This is a bug: it should also show intergenic variants if there are no other consequences. I will try to fix this.

dennishendriksen commented on September 17, 2024

Hi @nuno-agostinho,

Apologies for the confusing filename; it is an artifact of the liftover from GRCh37 to GRCh38. Both the file content and the command should be GRCh38. I'm not an expert on breakend notation, but could it be that you missed the final G in G>[1:109650635[GG?

Greetings,
@dennishendriksen

nuno-agostinho commented on September 17, 2024

Hi @dennishendriksen,

could it be that you missed the final G in G>[1:109650635[GG?

Currently, the alternative sequence of a breakend is ignored by VEP. We intend to improve this in the future.

Upon further inspection, the difference may be related to updates to the Ensembl database. For instance, one of the consequences for the breakend [1:109650635[GG in GRCh38 is associated with regulatory feature ENSR00001170488, which is not available in the current version of Ensembl.

If you want the same results as in VEP 111, you can download the previous VEP cache from http://ftp.ensembl.org/pub/release-111/variation/vep and then run VEP with the option --db_version 111. However, I would suggest simply using the most up-to-date version of the VEP cache when possible.

Hope this makes it clearer, but tell me if you want to discuss this further. Thanks!

Cheers,
Nuno

dennishendriksen commented on September 17, 2024

Hi @nuno-agostinho,

Good to know that it is a change in database content (I had not thought of running VEP v112 with the 111 database). Case closed. Thank you for your effort and time; it is greatly appreciated.

Cheers,
@dennishendriksen

nuno-agostinho commented on September 17, 2024

Hi @dennishendriksen,

We are always here to help! Glad you reported the issue so that we could improve VEP.

Have a great day! 😄

Cheers,
Nuno
