Comments (2)
On the representation of alt contigs, I think we should develop a best practice before modifying the spec. What is the intended output from variant callers? Is it practical for callers to generate such output? How are downstream tools supposed to use the VCF?
Specifically, you proposed adding HT, but in my experience alt contigs frequently recombine with each other, which makes the tag inapplicable most of the time. In addition, how are we supposed to use ALTLOCS? If we know that a locus overlaps an alt contig, what can we do with it?
We will be clearer about the answers, and can then develop the right spec, once more researchers gain experience with h38. Tools determine the adoption of alt contigs. It is not urgent to change the spec.
from hts-specs.
I think this is a bit of a chicken-and-egg problem. If we want variant callers to be able to use the alt loci, we need to be able to express the variants in VCF. This doesn't work well with the current spec (see how dbSNP distributes data).
I think the issue is that there are multiple ways to use VCF: it is just a reporting tool. dbSNP uses it to dump data, so you want to report all genomic contexts for a given SNV. An argument could be made that in the context of an individual genome you may only want to report one context for a SNP, but how do you handle that when you have multiple samples in the VCF? I fear that decision making will be hard.
I agree that trying to define some best practices is useful.
To attempt to address some specific issues:
- Knowing that an alt-locus is allelic with a region on the chromosome lets you put the alt-locus in chromosome context, which is useful for reporting. Granted, this could be done as some sort of post-processing step, but then you'd have to convert the data to some other format (which may be OK, but is likely inconvenient for everyone who wants VCF).
- I agree there are few loci with named haplotypes, but for the ones that exist, this is useful information.
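As a rough illustration of the post-processing alternative mentioned in the first point, here is a minimal Python sketch that looks up the chromosome region an alt-locus is placed on for a given VCF data line. Everything specific in it is an assumption made for illustration: the contig name `alt_contig_A`, the coordinates, the `PLACEMENTS` table, and the function name `context_for_record` are all hypothetical and do not come from any spec or existing tool.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Placement:
    """Chromosome region that an alt-locus is allelic with (1-based, inclusive)."""
    chrom: str
    start: int
    end: int


# Hypothetical placement table; the name and coordinates are made up.
PLACEMENTS: Dict[str, Placement] = {
    "alt_contig_A": Placement("chr6", 1000, 5000),
}


def context_for_record(vcf_line: str, placements: Dict[str, Placement]) -> Optional[str]:
    """Return the chromosome region for a record called on an alt-locus.

    Returns None when the record's CHROM is not in the placement table,
    for example when it is already on a primary chromosome.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    contig = fields[0]  # CHROM column of the VCF data line
    placement = placements.get(contig)
    if placement is None:
        return None
    return f"{placement.chrom}:{placement.start}-{placement.end}"


if __name__ == "__main__":
    record = "alt_contig_A\t120\t.\tA\tG\t.\t.\t."
    print(context_for_record(record, PLACEMENTS))
```

Note that this only recovers the region as a whole, not a base-level position on the chromosome, and the result lives outside the VCF, which is the inconvenience described above.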
This is really meant to start the discussion about how we want to represent variation on GRCh38. It will be good to have some concrete examples.