Comments (2)
I asked for some clarity from the VCF spec on what `.` means so that we can possibly be less pedantic.
from hail.
I have an RFC proposal to just handle the ambiguity: https://github.com/hail-is/hail-rfcs/blob/main/rfc/0008-handle-vcf-array-field-ambiguity
I proposed a PR to fix this: #13465. However, I missed a key issue: many VCFs elide fields to indicate missingness. That is not ambiguous: a field that is entirely elided is clearly missing, not an array of one missing value. You can't do this in a FORMAT (aka entry aka genotype) field, but you can do this in an INFO field, a la:
##fileformat=VCFv4.2
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=NUMS,Number=*,Type=Float,Description="some numbers">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT ...
... AC=1,1;AN=1 ...
Here, the `NUMS` field should be read as missing. My PR considered it unacceptably ambiguous because it treated the elided field as if it had been `NUMS=.`.
I don't think we can fix this problem entirely from Python. We need to use Scala-side logic because, once the Scala parser has run, we can no longer tell whether a field was entirely elided or was written as a single missing dot.
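To make the distinction concrete, here is a minimal Python sketch that classifies a key in a raw INFO string before any array parsing happens. It is not Hail's parser: `InfoState` and `classify_info_field` are hypothetical names made up for illustration, and the sketch only assumes the usual `;`-separated `KEY=VALUE` layout of the INFO column.

```python
from enum import Enum


class InfoState(Enum):
    """Hypothetical labels for how a key can appear in an INFO column."""
    ELIDED = "elided"          # key absent from the INFO column entirely
    SINGLE_DOT = "single_dot"  # key present with the literal value "."
    PRESENT = "present"        # key present with any other value (or a flag)


def classify_info_field(info: str, key: str) -> InfoState:
    """Classify how `key` appears in a raw, unparsed VCF INFO string."""
    if info == ".":
        # The whole INFO column is missing, so every key is elided.
        return InfoState.ELIDED
    for entry in info.split(";"):
        name, sep, value = entry.partition("=")
        if name != key:
            continue
        if sep and value == ".":
            # This is the case that is ambiguous for array fields.
            return InfoState.SINGLE_DOT
        return InfoState.PRESENT
    # The key never appeared: unambiguously missing.
    return InfoState.ELIDED


print(classify_info_field("AC=1,1;AN=1", "NUMS"))         # NUMS is elided
print(classify_info_field("AC=1,1;NUMS=.;AN=1", "NUMS"))  # NUMS is a single dot
```

The point is that the two cases are only distinguishable while the raw INFO text is still available. Once both have been turned into the same missing value, no later stage can recover which one the file contained.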