Comments (10)
I’ve just dug up an old discussion on this:
http://sourceforge.net/p/vcftools/mailman/message/27623814/
Perhaps we should incorporate Richard’s suggestion, that these be converted to N’s, into the spec, so that the handling is clear across all implementations?
from hts-specs.
I think the concerns from the previous discussion remain valid. I agree the spec should highlight this situation.
from hts-specs.
I really don't like adding IUPAC ambiguity codes to the REF.
I'm fine with pretty much any other solution.
from hts-specs.
For those who are not subscribed to the vcftools-spec mailing list, here are some of the relevant emails:
http://sourceforge.net/p/vcftools/mailman/message/34096206/
http://sourceforge.net/p/vcftools/mailman/message/34102529/
http://sourceforge.net/p/vcftools/mailman/message/34101323/
The options discussed offline and on the mailing list were these:
How to treat ambiguity codes in the reference sequence
1. allow IUPAC codes in REF
2. replace with N
3. replace with one of the bases, for example, R=A/G becomes A
4. both 1 or 2 are possible
How to treat ambiguity codes in ALT
1. allow IUPAC codes in ALT
2. use symbolic alleles, such as <R>, <S>, etc.
Can people +1/-1 their preferred options?
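To make the options concrete, here is a minimal Python sketch of what each one would do to a REF or ALT string. The function names `convert_ref` and `alt_as_symbolic` are hypothetical, and picking the alphabetically first base for option 3 is an assumption that matches the R=A/G becomes A example; it is not taken from any existing tool.

```python
# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}


def convert_ref(ref, option):
    """Apply REF option 1, 2 or 3 from the list above to a REF string."""
    if option not in (1, 2, 3):
        raise ValueError("option must be 1, 2 or 3")
    out = []
    for base in ref:
        code = base.upper()
        if code not in IUPAC or option == 1:
            out.append(base)              # plain bases, or option 1: keep as-is
        elif option == 2:
            out.append("N")               # option 2: replace with N
        else:
            out.append(min(IUPAC[code]))  # option 3: assumed first base alphabetically
    return "".join(out)


def alt_as_symbolic(alt):
    """ALT option 2: write a single ambiguity code as a symbolic allele."""
    if len(alt) == 1 and alt.upper() in IUPAC:
        return "<" + alt.upper() + ">"
    return alt                            # anything else is left untouched


print(convert_ref("ARY", 2))   # ANN
print(convert_ref("ARY", 3))   # AAC
print(alt_as_symbolic("R"))    # <R>
```

Option 4 is not shown separately, since it only says that a file may use either of the first two behaviours.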
from hts-specs.
I like options 3 (for REF) and 2 (for ALT).
from hts-specs.
I think allowing ambiguity in the REF is asking for trouble.
I vote 3 for REF.
For ALT, I have a slight preference for 2, but don't feel very strongly about it.
On Thu, May 21, 2015 at 9:04 AM, Eric Banks wrote:
> I like options 3 (for REF) and 2 (for ALT).
Adam Auton
Assistant Professor,
Department of Genetics,
Albert Einstein College of Medicine,
1301 Morris Park Avenue,
Price Center, Room 353B,
Bronx, New York 10461
Tel: +1 (718) 678 1150
from hts-specs.
+1 to option 2: replace with N
from hts-specs.
I find point 1.4.1.4 a bit confusing from a tool developer's perspective. When it states that IUPAC codes in the REF could be "reduced", does that mean that tools should still accept them and run the transformation themselves? Otherwise, if this is the file author's responsibility, it could be described like this:
IUPAC ambiguity codes in the reference sequence must be reduced to a concrete base by using
the one that is first alphabetically (thus R=A/G as a reference base is converted to A in VCF.)
from hts-specs.
@cyenyxe You are right, "must be reduced" is what I meant. Thank you.
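Under the "must be reduced" reading, one possible consequence for tool developers is that a reader may treat an ambiguity code in REF as an error instead of converting it silently. The sketch below assumes that strict behaviour; `ref_is_reduced` and `check_record_ref` are hypothetical names and are not part of any existing library.

```python
# Bases the VCF REF field permits (case-insensitive).
ALLOWED_REF_BASES = set("ACGTN")


def ref_is_reduced(ref):
    """Return True if REF is non-empty and holds only A, C, G, T or N."""
    return len(ref) > 0 and set(ref.upper()) <= ALLOWED_REF_BASES


def check_record_ref(chrom, pos, ref):
    """Hypothetical strict check: raise instead of converting on the fly."""
    if not ref_is_reduced(ref):
        raise ValueError(
            f"{chrom}:{pos}: REF {ref!r} contains an unreduced ambiguity code"
        )


check_record_ref("chr1", 100, "ACGT")       # passes silently
try:
    check_record_ref("chr1", 200, "R")      # R=A/G should have been written as A
except ValueError as err:
    print(err)
```

A lenient tool could instead apply the reduction itself and emit a warning; which of the two is appropriate is exactly the question raised above.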
from hts-specs.
I've just run into this issue myself. My genome, assembled from short reads, has 9 sites with ambiguity codes. When I align the reads to the reference, it's clear what the true base should be. When I call variants using bcftools call -c --ploidy=1 -Oz, these variant sites are not output by bcftools.
I'm okay with either converting to N or converting to the first alphabetical nucleotide of the ambiguity code. My preference is to convert to N.
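For anyone who wants to check a reference for such sites before calling variants, a small Python sketch along these lines would list them. `ambiguity_sites` is a hypothetical helper that works on a plain sequence string; reading the FASTA file itself is left out.

```python
# IUPAC ambiguity codes other than N.
AMBIGUITY_CODES = set("RYSWKMBDHV")


def ambiguity_sites(sequence):
    """Return (1-based position, code) for each ambiguity code in sequence."""
    return [
        (pos, base.upper())
        for pos, base in enumerate(sequence, start=1)
        if base.upper() in AMBIGUITY_CODES
    ]


print(ambiguity_sites("ACGRTTYN"))   # [(4, 'R'), (7, 'Y')]
```

N is deliberately not reported here, since it is already a permitted REF base.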
from hts-specs.