Comments (5)

nakib103 commented on September 17, 2024

Hello @GSYongWu,

Thanks for your query and sorry for the late reply.

From the HGVS notation you provided, I can infer that you are using the GRCh37 assembly with the RefSeq cache. RefSeq transcripts do not always match the reference assembly. For that reason, when we provide VEP annotation we need alignment information, and RefSeq (an external source to Ensembl) provides it for us. See here -
https://www.ensembl.org/info/docs/tools/vep/script/vep_other.html#refseq_bam

In e112 we updated our cache with the new alignment file from NCBI, which is why you are seeing this change.

Best regards,
Nakib

from ensembl-vep.

GSYongWu commented on September 17, 2024

However, the coordinates provided by VEP e112 do not match those in the UCSC Genome Browser, nor can they be reconciled with the literature and other databases. Is this appropriate?

nakib103 commented on September 17, 2024

Hi @GSYongWu,

I am just replying to say that the HGVS output (mainly the HGVSp) looks dodgy to me too. I am looking at the alignment between Ensembl and RefSeq and will hopefully get back to you soon.

nakib103 commented on September 17, 2024

Hi @GSYongWu,

Sorry for the late reply. I have looked into the issue.

First of all, RefSeq transcripts can differ in sequence from the reference genome to which they map; this is because the transcript models are built from primary sequence data and not from the reference genome.
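
To make the idea of a transcript/reference mismatch concrete, here is a minimal Python sketch. It assumes the two sequences are already aligned without gaps and simply compares them base by base; the function name and both sequences are made up for illustration and have nothing to do with NM_015326.5 or with how VEP does the comparison internally.

```python
def find_mismatches(transcript_seq, reference_seq):
    """Return (1-based position, transcript base, reference base) for every
    position where the two sequences differ, over their shared length.

    Assumption: the sequences are ungapped and start at the same base.
    """
    pairs = zip(transcript_seq.upper(), reference_seq.upper())
    return [(pos, t, r) for pos, (t, r) in enumerate(pairs, start=1) if t != r]


# Hypothetical toy sequences, not real RefSeq or GRCh37 data.
toy_transcript = "ATGGCCTTAG"
toy_reference = "ATGGCGTTAG"

for pos, t_base, r_base in find_mismatches(toy_transcript, toy_reference):
    print(f"position {pos}: transcript has {t_base}, reference has {r_base}")
```

Real transcript-to-genome comparisons also have to deal with gaps and clipped ends, which is what the alignment file is for.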

Here, when the NM_015326.5 transcript is mapped to the GRCh37 assembly, its 5' UTR and part of its coding region are truncated -
https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr1%3A206516178%2D206516278&hgsid=2313969114_3KKuutXr6hD7XyWLPdDzR1IZDUJK

The difference between e111 and e112 comes from the new realignment file from NCBI, which now has a 909S (soft-clip) operation in the CIGAR string; this adds back the truncated sequence (83 + 909 = 982).
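
As a rough illustration of how a leading soft clip shifts transcript coordinates, here is a small Python sketch. It only relies on the SAM convention that `S` bases are present in the transcript sequence but not aligned to the genome. The helper names and the example CIGAR string (apart from the leading `909S`) are hypothetical, and this is not the code VEP uses.

```python
import re


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    pairs = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    if "".join(length + op for length, op in pairs) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in pairs]


def leading_soft_clip(cigar):
    """Number of soft-clipped bases at the start of the alignment (0 if none)."""
    ops = parse_cigar(cigar)
    # A hard clip (H) may precede the soft clip; skip it if present.
    if ops and ops[0][1] == "H":
        ops = ops[1:]
    if ops and ops[0][1] == "S":
        return ops[0][0]
    return 0


def to_full_transcript_position(aligned_pos, cigar):
    """Convert a 1-based position counted from the first aligned transcript
    base into a 1-based position in the full transcript sequence."""
    return aligned_pos + leading_soft_clip(cigar)


# Hypothetical CIGAR: only the leading 909S comes from the discussion above.
example_cigar = "909S2000M"
print(leading_soft_clip(example_cigar))
print(to_full_transcript_position(10, example_cigar))
```

With the clip present, every position counted along the transcript moves by the clip length, which is the kind of shift seen between the two releases.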

In any case, where a RefSeq transcript does not match the reference sequence, consequence calling with VEP is not reliable. If you use the GRCh38 assembly you should get better results.

Hope that answers the question.

Best regards,
Nakib
