Comments (7)
I think I see what is happening; htslib defines these:
#define bcf_int32_missing (-2147483647-1) /* INT32_MIN */
#define bcf_int32_vector_end (-2147483647) /* INT32_MIN + 1 */
so that jibes with what you see. You could write a helper function like:
proc reshape*[T](values: seq[T], n_samples: int): seq[seq[T]] =
that uses vector_end to find where each sample's values stop, and which would return something like @[[missing], [missing], ..., [255, 255, 0], ..., [255, 151, 0], ...]
If it were general enough, I'd accept a PR.
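To make the layout concrete, here is a minimal sketch that needs no VCF file. The flat array, the 3-slots-per-sample shape, and the `trimmed` helper name are made up for illustration; only the two sentinel values come from the htslib definitions quoted above.

```nim
# Sentinel values as defined by htslib for 32-bit integers.
const
  missing = int32.low        # bcf_int32_missing    == INT32_MIN
  vectorEnd = int32.low + 1  # bcf_int32_vector_end == INT32_MIN + 1

# Hypothetical flat PL array for 3 samples, 3 slots per sample.
# The first sample has no call: one missing value, then padding.
let flat = @[missing, vectorEnd, vectorEnd,
             255'i32, 255, 0,
             255, 151, 0]

proc trimmed(values: seq[int32], nSamples: int): seq[seq[int32]] =
  ## Split `values` into `nSamples` equal chunks, stopping each chunk
  ## at the first vector_end marker. (Hypothetical helper name.)
  result = newSeq[seq[int32]](nSamples)
  let nPer = values.len div nSamples
  for i in 0 ..< nSamples:
    for j in i * nPer ..< (i + 1) * nPer:
      if values[j] == vectorEnd: break
      result[i].add(values[j])

let perSample = trimmed(flat, 3)
doAssert perSample[0] == @[missing]          # padding dropped, missing kept
doAssert perSample[1] == @[255'i32, 255, 0]
echo perSample
```

The missing marker is kept in the output on purpose, so the caller can still decide how to treat a sample with no data.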
from hts-nim.
does this resolve your issue?
from hts-nim.
Yes thanks, that makes sense. Feel free to close.
With regard to a more general solution, I will have to think about it, as I imagine that how missing values should be handled will be project-specific. Given that there is a similar challenge with missing values in each of the INFO and FORMAT types, would implementing something like an is_missing proc for each of the relevant types be the way to go?
I am happy to have a go and see if I can figure out how to do something like this if you think it is worth exploring.
Thanks for your help.
from hts-nim.
yes, I guess:
proc is_missing[T: int32|int8|int16|int64](v: T): bool {.inline.} =
  v == T.low
could work, then similar for is_vector_end. But you're right, I'm not sure what to do beyond this. I'll think on it.
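The integer check above does not carry over to Float fields, because htslib encodes the float sentinels as specific NaN bit patterns (`bcf_float_missing` is 0x7F800001 and `bcf_float_vector_end` is 0x7F800002). A rough sketch of the equivalent test follows; the proc names are hypothetical and not part of hts-nim, and it assumes the value has not been converted through a wider float type, which could alter the NaN payload.

```nim
# Bit patterns htslib uses for 32-bit float sentinels. Both are NaNs,
# so they must be compared as bits, never with `==` on the floats.
const
  floatMissingBits = 0x7F800001'u32    # bcf_float_missing
  floatVectorEndBits = 0x7F800002'u32  # bcf_float_vector_end

# Hypothetical helper names, for illustration only.
proc isMissingFloat(v: float32): bool {.inline.} =
  cast[uint32](v) == floatMissingBits

proc isVectorEndFloat(v: float32): bool {.inline.} =
  cast[uint32](v) == floatVectorEndBits

let missingValue = cast[float32](floatMissingBits)
doAssert isMissingFloat(missingValue)
doAssert not isVectorEndFloat(missingValue)
doAssert missingValue != missingValue   # a NaN never equals itself
doAssert not isMissingFloat(1.5'f32)
```

Comparing bit patterns is the design choice here: an ordinary float comparison against the sentinel would always be false.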
from hts-nim.
something like this should be pretty close:
import hts/vcf

# htslib marks a missing integer with the type's minimum value
# (e.g. bcf_int32_missing == INT32_MIN).
proc is_missing[T: int32|int8|int16|int64](v: T): bool {.inline.} =
  v == T.low

# Short per-sample vectors are padded with minimum + 1
# (e.g. bcf_int32_vector_end == INT32_MIN + 1).
proc is_vector_end[T: int32|int8|int16|int64](v: T): bool {.inline.} =
  v == T.low + 1

proc show*[T: int32|int8|int16|int64](reshaped: seq[seq[T]]): string =
  ## Render one row per sample, printing missing values as '.'.
  result = newStringOfCap(255)
  result.add('[')
  for i in 0..<reshaped.len:
    result.add('[')
    for j in 0..<reshaped[i].len:
      let v = reshaped[i][j]
      # stop before emitting anything for the padding marker
      if v.is_vector_end: break
      if j > 0: result.add(',')
      if v.is_missing:
        result.add('.')
      else:
        result.add($v)
    result.add(']')
    if i < reshaped.high:
      result.add(",\n")
  result.add(']')

proc reshape*[T](values: seq[T], n_samples: int): seq[seq[T]] =
  ## Split the flat FORMAT array into one seq per sample,
  ## dropping the vector_end padding.
  result = newSeq[seq[T]](n_samples)
  let n_per = values.len div n_samples
  for i in 0..<n_samples:
    let off = i * n_per
    for j in off..<off+n_per:
      if values[j].is_vector_end: break
      result[i].add(values[j])

when isMainModule:
  var v: VCF
  doAssert(open(v, "missing_value.vcf"))
  var pls = new_seq[int32](0)
  for rec in v:
    doAssert rec.format.get("PL", pls) == Status.OK
    var r = pls.reshape(v.n_samples)
    echo r
    echo "show:"
    echo r.show
from hts-nim.