Comments (7)

brentp commented on June 3, 2024

I think I see what is happening; htslib defines these:

#define bcf_int32_missing    (-2147483647-1) /* INT32_MIN */

#define bcf_int32_vector_end (-2147483647)  /* INT32_MIN + 1 */

So that jibes with what you see. You could write a helper function like:

proc reshape*[T](values: seq[T], n_samples:int): seq[seq[T]] =

that uses vector_end and would return something like @[[missing], [missing], ..., [255, 255, 0], ..., [255, 151, 0], ...].

If it were general enough, I'd accept a PR.
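For reference, here is a Nim mirror of those sentinels for each integer width. The int8/int16 values come from htslib's vcf.h and are not quoted above, and the constant names are hypothetical (not part of hts-nim). The asserts check that "missing" is always the type's minimum and "vector end" is the minimum + 1, which is what a generic `T.low` test relies on:

```nim
# Hypothetical Nim mirrors of htslib's integer sentinels (values from vcf.h).
const
  bcfInt8Missing = int8(-128)               # INT8_MIN
  bcfInt8VectorEnd = int8(-127)             # INT8_MIN + 1
  bcfInt16Missing = int16(-32768)           # INT16_MIN
  bcfInt16VectorEnd = int16(-32767)         # INT16_MIN + 1
  bcfInt32Missing = int32(-2147483647 - 1)  # INT32_MIN
  bcfInt32VectorEnd = int32(-2147483647)    # INT32_MIN + 1

# For every width: missing == T.low and vector end == T.low + 1.
doAssert bcfInt8Missing == int8.low and bcfInt8VectorEnd == int8.low + 1
doAssert bcfInt16Missing == int16.low and bcfInt16VectorEnd == int16.low + 1
doAssert bcfInt32Missing == int32.low and bcfInt32VectorEnd == int32.low + 1
```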

from hts-nim.

brentp commented on June 3, 2024

Does this resolve your issue?

cassimons commented on June 3, 2024

Yes thanks, that makes sense. Feel free to close.

With regard to a more general solution, I will have to have a think about it, as I imagine how missing values should be handled will be project-specific. Given there is a similar challenge with missing values for each of the INFO and FORMAT types, would implementing something like an is_missing proc for each of the relevant types be the way to go?
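For the floating-point INFO/FORMAT types a `T.low` test does not apply: htslib instead reserves two NaN bit patterns, 0x7F800001 for missing and 0x7F800002 for vector end (`bcf_float_missing` / `bcf_float_vector_end` in vcf.h). A per-type overload therefore has to compare raw bits, because a NaN never compares equal to anything. A minimal sketch; the proc names are hypothetical, not hts-nim API, and it assumes the sentinel bits survive being passed around as `float32` (true on typical x86-64/ARM64 builds):

```nim
# htslib's float32 sentinels are NaN bit patterns (see vcf.h).
const
  floatMissingBits = 0x7F800001'u32
  floatVectorEndBits = 0x7F800002'u32

proc isMissing(v: float32): bool {.inline.} =
  ## Compare raw bits: `==` on NaN is always false, so it cannot detect
  ## the sentinel.
  cast[uint32](v) == floatMissingBits

proc isVectorEnd(v: float32): bool {.inline.} =
  cast[uint32](v) == floatVectorEndBits

when isMainModule:
  # Build the sentinels at run time from their bit patterns.
  var bits = floatMissingBits
  let missing = cast[float32](bits)
  bits = floatVectorEndBits
  let vecEnd = cast[float32](bits)
  echo missing.isMissing, " ", vecEnd.isVectorEnd, " ", isMissing(0.5'f32)
```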

I am happy to have a go and see if I can figure out how to do something like this if you think it is worth exploring.

Thanks for your help.

brentp commented on June 3, 2024

Yes, I guess something like:

proc is_missing[T:int32|int8|int16|int64](v:T): bool {.inline.} =
    v == T.low

could work, and then something similar for is_vector_end. But you're right, I'm not sure what to do beyond this. I'll think on it.
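One possible step beyond the two predicates, while still leaving the project-specific decision to the caller, is to turn each sample's values into `Option`s: vector-end padding is dropped and the missing sentinel becomes `none`. This is only a sketch using Nim's std/options; `toOptions` is a hypothetical name:

```nim
import std/options

proc isMissing[T: SomeSignedInt](v: T): bool {.inline.} = v == T.low
proc isVectorEnd[T: SomeSignedInt](v: T): bool {.inline.} = v == T.low + 1

proc toOptions[T: SomeSignedInt](sample: openArray[T]): seq[Option[T]] =
  ## Stops at the first vector-end value and maps htslib's missing
  ## sentinel to none(T), so callers decide later how to treat it.
  for v in sample:
    if v.isVectorEnd: break
    result.add(if v.isMissing: none(T) else: some(v))

when isMainModule:
  let m = int32.low        # missing
  let e = int32.low + 1    # vector end
  echo toOptions([m, e, e])
  echo toOptions([255'i32, 151, 0])
```

Callers can then pick a default with `get(default)` or skip `none` entries, instead of every helper baking in one policy.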

cassimons commented on June 3, 2024

brentp commented on June 3, 2024

Something like this should be pretty close:

import hts/vcf


proc is_missing[T:int32|int8|int16|int64](v:T): bool {.inline.} =
  ## htslib encodes a missing integer as the type's minimum value.
  v == T.low

proc is_vector_end[T:int32|int8|int16|int64](v:T): bool {.inline.} =
  ## htslib pads short per-sample vectors with (minimum value + 1).
  v == T.low + 1


proc show*[T:int32|int8|int16|int64](reshaped:seq[seq[T]]): string =
  ## Renders the reshaped values, printing "." for missing entries.
  result = newStringOfCap(255)
  result.add('[')
  for i in 0..<reshaped.len:
    result.add('[')
    for j in 0..<reshaped[i].len:
      # stop before emitting any padding value
      if reshaped[i][j].is_vector_end: break
      if j > 0: result.add(',')
      if reshaped[i][j].is_missing:
        result.add('.')
      else:
        result.add($reshaped[i][j])
    result.add(']')
    if i < reshaped.high:
      result.add(",\n")

  result.add(']')


proc reshape*[T:int32|int8|int16|int64](values: seq[T], n_samples:int): seq[seq[T]] =
  ## Splits the flat FORMAT array (n_samples * n_per values) into one
  ## seq per sample, dropping the vector-end padding.
  result = newSeq[seq[T]](n_samples)
  let n_per = values.len div n_samples
  for i in 0..<n_samples:
    let off = i * n_per
    for j in off..<off+n_per:
      if values[j].is_vector_end: break
      result[i].add(values[j])

when isMainModule:

  var v:VCF
  doAssert(open(v, "missing_value.vcf"))

  var pls = new_seq[int32](0)

  for rec in v:
    doAssert rec.format.get("PL", pls) == Status.OK
    var r = pls.reshape(v.n_samples)
    echo r
    echo "show:"
    echo r.show
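If VCF-like text is wanted rather than the bracketed `show` output, a shorter variant could build each sample's field with `mapIt` and `join`. A sketch only: `cell`, `formatSample` and `formatAll` are hypothetical names, it assumes the padding was already dropped (e.g. by `reshape`), and tab-separating the samples is just a readability choice:

```nim
import std/[sequtils, strutils]

proc isMissing[T: SomeSignedInt](v: T): bool {.inline.} = v == T.low

proc cell[T: SomeSignedInt](v: T): string =
  ## "." for htslib's missing sentinel, the number otherwise.
  if v.isMissing: "." else: $v

proc formatSample[T: SomeSignedInt](sample: seq[T]): string =
  ## One sample as comma-separated values.
  sample.mapIt(cell(it)).join(",")

proc formatAll[T: SomeSignedInt](reshaped: seq[seq[T]]): string =
  ## All samples on one line, tab-separated.
  var fields: seq[string]
  for sample in reshaped:
    fields.add(formatSample(sample))
  fields.join("\t")

when isMainModule:
  echo formatAll(@[@[int32.low], @[255'i32, 255, 0], @[255'i32, 151, 0]])
```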

cassimons commented on June 3, 2024
