brentp / hts-nim Goto Github PK
View Code? Open in Web Editor NEWnim wrapper for htslib for parsing genomics data files
Home Page: https://brentp.github.io/hts-nim/
License: MIT License
nim wrapper for htslib for parsing genomics data files
Home Page: https://brentp.github.io/hts-nim/
License: MIT License
Hi Brent,
I have been looking at the source code of hts-nim
and hts-lib
libraries and trying to find whether such information is recorded (Haven't got a solution).
To be clearer, for example, I have the following code :
for rec in ivcf.query(chrom):
# how to get the row index for this rec (row index correspond to the ith row in the supplied VCF file
Best regards,
Ruqian
Hi Brent,
Could we also have a set_tag
or append_tag()
function?
Thanks,
Ruqian
version 0.3.9 was tagged but it still says 0.3.8 in the nimble file :)
Hi Brent,
I'm trying to use hts nim to read a VCF and then write out a new VCF.
I was just wondering if there's a better way to clear the entire INFO field for a variant other than iterating through each field and calling v.info.delete(keyName) on all keys that you know of? (feel like this could get pretty expensive for large vcf files like gnomad)
For context, I'd like to clear the INFO field and then add my own stuff to it and write out the VCF.
Cheers!
It looks like Op
stores both the CigarOp
and length, but why not rename Op
to CigarElem
or CigarTuple
. Op
seems too general, and doesn't imply length. Thoughts?
Hi Brent,
Is there a way to get the list of keys for INFO and FORMAT definitions from the header of a VCF using hts-nim?
The idea I'm working on is to get all the INFO and FORMAT keys defined in the header from 2 VCFs so I can compute the intersection and output a new VCF containing only the shared fields for both.
Thanks!
I'm trying to use the new version but it seems that Nimble won't install it. From looking closer, the version tagged 0.3.13's hts.nimble file has the version as 0.3.12:
Line 3 in 3c4ce10
~/D/P/R/test ❯❯❯ nimble install [email protected]
Reading official package list
Downloading https://github.com/brentp/hts-nim using git
Cloning latest tagged version: v0.3.13
Error: Downloaded package's version does not satisfy requested version range: wanted 0.3.13 got 0.3.12.
htslib 1.10 is coming and it breaks ABI compat.
I will likely tag a last hts-nim release that supports 1.9 and then just have master support htslib 1.10 about 1 month after it is release.
meanwhile (once I have time), the dev branch will start on 1.10 support.
I have random failures with the faitest
part of test/all
. (When I run it repeatedly, sometimes is passes, sometimes it fails.) Same with vcftest
, though less often.
#0 0x0000000000405ad7 in rawAlloc_yn9c8RLaS8vgVBeMBfmkdUg (a=0x6ad600 <gch_IcYaEuuWivYAS86vFMTS3Q+96>, requestedSize=65)
at /mnt/software/n/nim/0.17.2/lib/system/alloc.nim:561
#1 0x000000000040bc5c in rawNewObj_QBPo5VW2O56Eeh8hPbQ7Pg (typ=0x6afa00 <strDesc_D0UzA4zsDu5tgpJQ9a9clXPg>, size=49,
gch=0x6ad5a0 <gch_IcYaEuuWivYAS86vFMTS3Q>) at /mnt/software/n/nim/0.17.2/lib/system/gc.nim:466
#2 0x0000000000407199 in newObj (typ=0x6afa00 <strDesc_D0UzA4zsDu5tgpJQ9a9clXPg>, size=49)
at /mnt/software/n/nim/0.17.2/lib/system/gc.nim:495
#3 0x0000000000407f76 in rawNewString (space=32) at /mnt/software/n/nim/0.17.2/lib/system/sysstr.nim:66
#4 0x000000000040826c in nimIntToStr (x=32) at /mnt/software/n/nim/0.17.2/lib/system/sysstr.nim:293
#5 0x00000000004703a4 in setForegroundColor_Tw7yxMupneOmAicqiYt3NA (f=0x2aaaab79a400 <_IO_2_1_stdout_>, fg=32 ' ',
bright=0 '\000') at /mnt/software/n/nim/0.17.2/lib/pure/terminal.nim:531
#6 0x000000000044eddf in testEnded_AsoOYSa1sGgPTBsUoT9cxsw (formatter=0x2aaaaaafc048,
testResult=0x6be150 <testResult_YxIndn6uRV6zIf6okXYQIw_4>) at /mnt/software/n/nim/0.17.2/lib/pure/terminal.nim:620
#7 0x0000000000401e54 in testEnded_azSnnJ3hAiHHuUdLzViJlw (formatter=0x2aaaaaafc048,
testResult=0x6be150 <testResult_YxIndn6uRV6zIf6okXYQIw_4>) at /mnt/software/n/nim/0.17.2/lib/pure/unittest.nim:137
#8 0x00000000004509c1 in testEnded_xoIS1BdhYWQ6hhMP22XXiw (testResult=0x6be150 <testResult_YxIndn6uRV6zIf6okXYQIw_4>)
at /mnt/software/n/nim/0.17.2/lib/pure/unittest.nim:328
#9 0x000000000043c142 in hts_bgzftestInit000 () at /mnt/software/n/nim/0.17.2/lib/pure/unittest.nim:425
#10 0x0000000000402130 in PreMainInner () at /mnt/software/n/nim/0.17.2/lib/system.nim:2861
#11 0x0000000000402168 in PreMain () at /mnt/software/n/nim/0.17.2/lib/system.nim:2872
#12 0x0000000000402182 in NimMain () at /mnt/software/n/nim/0.17.2/lib/system.nim:2884
#13 0x00000000004021d5 in main (argc=1, args=0x7fffffffe068, env=0x7fffffffe078)
at /mnt/software/n/nim/0.17.2/lib/system.nim:2894
I haven't seen it fail on its own. I have to include at least a couple tests before it:
import cigartest, bgzftest, faitest
[Suite] flag cigar-suite
[OK] test op
[OK] ref coverage
[Suite] bgzf-suite
[OK] test-write
[OK] test-read
[OK] write-with-index
Segmentation fault
I use nim-0.17.2 and htslib-1.6. I linked statically with htslib.
You already have a long name for base_qualities
. I also don't see in the API (https://brentp.github.io/hts-nim/hts/bam.html) how to retrieve the bases or base qualities.
Hello.
When I ran mosdepth
on our bam file I've got error which origin is from hts-nim
- that's why I've created issue here.
Here is my error:
root@b79bb970b062:/outputs# mosdepth -t 20 . /inputs/330003740807_S10.bam
bam.nim(390) open
Error: unhandled exception: invalid bgzf file [ValueError]
And here is output of samtools view
:
root@b79bb970b062:/outputs# samtools view -H /inputs/330003740807_S10.bam | head
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
@HD VN:1.4 SO:coordinate
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
@SQ SN:chr3 LN:198295559
@SQ SN:chr4 LN:190214555
@SQ SN:chr5 LN:181538259
@SQ SN:chr6 LN:170805979
@SQ SN:chr7 LN:159345973
@SQ SN:chr8 LN:145138636
@SQ SN:chr9 LN:138394717
And picard's ValidateSamFiles
root@0ea62a6ac57e:/outputs# java -jar $PICARD ValidateSamFile -I /inputs/330003740807_S10.bam -R /ref/GRCh38.d1.vd1/GRCh38.d1.vd1.fa -QUIET true
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/tmp
10:36:39.816 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/soft/picard-2.25.0.jar!/com/intel/gkl/native/libgkl_compression.so
ERROR::TRUNCATED_FILE:Read name /inputs/330003740807_S10.bam, BAM file has defective last gzip block
...
I'll try to recreate this bam, but our bioinformatics pipeline worked fine with this bam (various variant callers and HLA-typers) and samtools
throws warning - not error. Maybe hts-nim
should throw warning too? As you can see - the file is totally readable.
I really enjoy this package and it has brought me to use Nim, so thanks for doing this. Let's say I have a task of wanting to loop over every record in a fasta (with fai) and process each contig to get some metrics. First, I think there should be some more exposure of htslib fai stuff:
proc faidx_iseq*(fai: ptr faidx_t; i: cint): cstring {.cdecl,
importc: "faidx_iseq", dynlib: "libhts.so".}
proc faidx_seq_len*(fai: ptr faidx_t; seq: cstring): cint {.cdecl,
importc: "faidx_seq_len", dynlib: "libhts.so".}
Now I can do something like:
for x in 0..<len(fai):
res = faidx_iseq(fai.cptr, cint(x))
refLen = faidx_seq_len(fai.cptr, res).int
refBases = toUpper(fai.get($res))
lastWindowStart = refLen - windowSize
var state = new CalculateGcState
state.init = true
for i in 1..<lastWindowStart:
var windowEnd = i + windowSize
var gcBin = calculateGc(refBases, i, windowEnd, state)
if gcBin != -1:
result[gcBin] += 1
echo result
free(res)
(ok I've use Nim for less than a month so don't judge me here). Since I don't know what contigs are in the fasta, exposing the faidx_iseq
helps get the string name of the contig so I can use your get
function and then secondarily use the faidx_seq_len
to get its length.
Now I have found that the fai.get()
is extremely slow... I mean even the python htslib is faster for FastaFile.fetch()
... is there something I can do better here? Would it be better for me to do a lot of fetches instead? Yes, they are large contigs, but still it seems crazy slow for the fetch.
It doesn't look as it the new GC mechanisms are support yet. This is a bummer since I'm getting a pretty significant speed boost when using them in my code but can't use it anymore if I want to use hts-nim.
~/D/P/R/viroid-search ❯❯❯ cat src/test_htslib.nim (viroid-search)
import hts
~/D/P/R/viroid-search ❯❯❯ nim c -d:danger src/test_htslib.nim (viroid-search)
Hint: used config file '/usr/local/Cellar/nim/1.4.0/nim/config/nim.cfg' [Conf]
Hint: used config file '/usr/local/Cellar/nim/1.4.0/nim/config/config.nims' [Conf]
.........................
/Users/BenjaminLee/Desktop/Python/Research/viroid-search/src/test_htslib.nim(1, 8) Warning: imported and not used: 'hts' [UnusedImport]
CC: stdlib_assertions.nim
CC: stdlib_system.nim
CC: ../../../../../.nimble/pkgs/hts-0.3.12/hts/private/hts_concat.nim
CC: stdlib_parseutils.nim
CC: stdlib_strutils.nim
CC: ../../../../../.nimble/pkgs/hts-0.3.12/hts/vcf.nim
CC: ../../../../../.nimble/pkgs/hts-0.3.12/hts.nim
CC: test_htslib.nim
Hint: [Link]
Hint: 47780 lines; 1.759s; 49.484MiB peakmem; Dangerous Release build; proj: /Users/BenjaminLee/Desktop/Python/Research/viroid-search/src/test_htslib.nim; out: /Users/BenjaminLee/Desktop/Python/Research/viroid-search/src/test_htslib [SuccessX]
~/D/P/R/viroid-search ❯❯❯ nim c -d:danger --gc:orc src/test_htslib.nim (viroid-search)
Hint: used config file '/usr/local/Cellar/nim/1.4.0/nim/config/nim.cfg' [Conf]
Hint: used config file '/usr/local/Cellar/nim/1.4.0/nim/config/config.nims' [Conf]
....................
/Users/BenjaminLee/.nimble/pkgs/hts-0.3.12/hts/vcf.nim(409, 8) Error: cannot bind another '=destroy' to: Variant:ObjectType; previous declaration was constructed here implicitly: /Users/BenjaminLee/.nimble/pkgs/hts-0.3.12/hts/vcf.nim(180, 16)
~/D/P/R/viroid-search ❯❯❯
In addition to reverse
, how about forward
?
In addition to qcfail
, how about pf
?
I really like what you have started in hts-nim, and I have found having the logically opposite getter (reverse vs. forward, paired vs fragment, mate_unmapped vs mate_mapped) to really powerful. See the scala BAM API in fgbio for some ideas to make it much nicer than htsjdk or the htslib C-API.
Hint: bgzi [Processing]
Hint: csi [Processing]
Hint: stats [Processing]
CC: stdlib_assertions.nim
CC: stdlib_io.nim
CC: stdlib_system.nim
CC: hts/private/hts_concat.nim
CC: stdlib_parseutils.nim
CC: stdlib_strutils.nim
CC: hts/vcf.nim
CC: hts.nim
Hint: [Link]
Hint: LOC: 47556; sec: 2.215; peakmem: 57.086MiB; build: Debug; proj: d:\a\1\s\pkgstemp\hts\src\hts.nim; out: d:\a\1\s\pkgstemp\hts\htss [SuccessX]
Hint: d:\a\1\s\pkgstemp\hts\htss [Exec]
could not load: libhts.dll
Error: execution of an external program failed: 'd:\a\1\s\pkgstemp\hts\htss '
PASS: https://github.com/johnnovak/illwill C (27.63287544 secs)
PASS: https://github.com/AndreiRegiani/INim C ( 5.49948382 secs)
PASS: https://github.com/narimiran/itertools C ( 4.59094095 secs)
PASS: https://github.com/def-/iterutils C ( 6.16629267 secs)
PASS: https://github.com/LemonBoy/jstin C ( 6.41412973 secs)
PASS: https://github.com/pragmagic/karax C (17.10266876 secs)
Dear @brentp
I'm trying to get the query length from the CIGAR string, which is in htslib as int bam_cigar2qlen(int n_cigar, const uint32_t *cigar)
, and is exported from hts/private/hts_concat
, but the fields of Cigar
aren't exported. Could you add the following (or something similar) to cigar.nim
(seems too tiny for a PR).
proc qlen*(c: Cigar): int {. inline .} =
result = int(bam_cigar2qlen(c.n.cint, cast[ptr uint32](c.cig)))
On trying to run the first example given in the README, the code compilation fails with following message:
$ nim c test.nim
Hint: used config file '/usr/local/Cellar/nim/0.19.0/nim/config/nim.cfg' [Conf]
Hint: system [Processing]
Hint: test [Processing]
Hint: hts [Processing]
Hint: utils [Processing]
Hint: hts_concat [Processing]
Hint: bam [Processing]
Hint: strutils [Processing]
Hint: parseutils [Processing]
Hint: math [Processing]
Hint: bitops [Processing]
Hint: algorithm [Processing]
Hint: unicode [Processing]
Hint: simpleoption [Processing]
Hint: typetraits [Processing]
../../.nimble/pkgs/hts-0.2.7/hts/bam.nim(409, 6) Hint: 'bam.main()[declared in ../../.nimble/pkgs/hts-0.2.7/hts/bam.nim(409, 5)]' is declared but not used [XDeclaredButNotUsed]
Hint: vcf [Processing]
Hint: sequtils [Processing]
Hint: macros [Processing]
Hint: fai [Processing]
Hint: bgzf [Processing]
Hint: os [Processing]
Hint: times [Processing]
Hint: options [Processing]
Hint: strformat [Processing]
Hint: posix [Processing]
Hint: ospaths [Processing]
Hint: bgzi [Processing]
Hint: csi [Processing]
Hint: stats [Processing]
test.nim(20, 28) Error: attempting to call undeclared routine: 'aux'
$ nim --version
Nim Compiler Version 0.19.0 [MacOSX: amd64]
Compiled at 2018-09-28
Copyright (c) 2006-2018 by Andreas Rumpf
active boot switches: -d:release -d:useLinenoise
Is there a way to iterate over a VCF by region OR in its entirety?
I'm trying to make supplying a region optional. So something like this:
doAssert open(v, vcf)
var records = if region.len > 0:
v.query(region)
else:
v
for rec in records:
echo rec
However, I keep running into type errors. How can I return a 'blank' query that returns all records?
Hi,
I was trying to build this repo (https://bitbucket.org/andreas-wilm/decontanimate/src/master/) with the 'hts_nim_static_builder' software. However, the final product was not statically linked.
Could there something wrong with the nimble file?
Here are the commands and outputs:
ubuntu@ip-10-0-3-37:~/hts-nim$ ./hts_nim_static_builder -s ../decontanimate/src/decontanimate.nim -n ../decontanimate/decontanimate.nimble
{"--debug": false, "--nim-src-file": ../decontanimate/src/decontanimate.nim, "<nim_compiler_args>": [], "--": false, "--nimble-file": ../decontanimate/decontanimate.nimble, "--tag": latest, "--deps": []}
docker run -v /home/ubuntu/decontanimate:/home/ubuntu/decontanimate -v /home/ubuntu/hts-nim:/load/ brentp/musl-hts-nim:latest /usr/local/bin/nsb -n /home/ubuntu/decontanimate/decontanimate.nimble -s /home/ubuntu/decontanimate/src/decontanimate.nim -- -d:danger -d:release
Unable to find image 'brentp/musl-hts-nim:latest' locally
latest: Pulling from brentp/musl-hts-nim
050382585609: Pull complete
abfd771006f0: Pull complete
8e509104d7b9: Pull complete
f0da7d0f3c7b: Pull complete
70d99ed0e009: Pull complete
8cb505a3e512: Pull complete
Digest: sha256:c9c9456eff1b935e32fd94c7d68d07b46d3c73bd009f77f8a348178c85704e0c
Status: Downloaded newer image for brentp/musl-hts-nim:latest
@["-n", "/home/ubuntu/decontanimate/decontanimate.nimble", "-s", "/home/ubuntu/decontanimate/src/decontanimate.nim", "--", "-d:danger", "-d:release"]
{"<nim_compiler_args>": ["-d:danger", "-d:release"], "--nim-src-file": /home/ubuntu/decontanimate/src/decontanimate.nim, "--nimble-file": /home/ubuntu/decontanimate/decontanimate.nimble, "--": true, "--deps": []}
Verifying dependencies for [email protected]
Installing cligen@>= 0.9.11
Downloading https://github.com/c-blake/cligen.git using git
Verifying dependencies for [email protected]
Installing [email protected]
Success: cligen installed successfully.
Info: Dependency on hts@>= 0.2.4 already satisfied
Verifying dependencies for [email protected]
Installing zip@>= 0.2.1
Downloading https://github.com/nim-lang/zip using git
Verifying dependencies for [email protected]
Installing [email protected]
Success: zip installed successfully.
Hint: used config file '/nim/config/nim.cfg' [Conf]
Hint: used config file '/nim/config/config.nims' [Conf]
Hint: system [Processing]
Hint: widestrs [Processing]
Hint: io [Processing]
Hint: decontanimate [Processing]
Hint: osproc [Processing]
Hint: strutils [Processing]
Hint: parseutils [Processing]
Hint: math [Processing]
Hint: bitops [Processing]
Hint: macros [Processing]
Hint: algorithm [Processing]
Hint: unicode [Processing]
Hint: os [Processing]
Hint: pathnorm [Processing]
Hint: osseps [Processing]
Hint: posix [Processing]
Hint: times [Processing]
Hint: options [Processing]
Hint: typetraits [Processing]
Hint: strtabs [Processing]
Hint: hashes [Processing]
Hint: streams [Processing]
Hint: cpuinfo [Processing]
Hint: linux [Processing]
Hint: bam [Processing]
Hint: hts_concat [Processing]
Hint: strformat [Processing]
Hint: simpleoption [Processing]
Hint: tables [Processing]
Hint: gzipfiles [Processing]
Hint: zlib [Processing]
Hint: logging [Processing]
/home/ubuntu/decontanimate/src/decontanimate.nim(190, 23) Warning: ref_coverage is deprecated [Deprecated]
Hint: cligen [Processing]
Hint: parseopt3 [Processing]
Hint: critbits [Processing]
Hint: argcvt [Processing]
Hint: sets [Processing]
Hint: textUt [Processing]
Hint: terminal [Processing]
Hint: colors [Processing]
Hint: termios [Processing]
Hint: wordwrap [Processing]
Hint: sysUt [Processing]
Hint: macUt [Processing]
CC: stdlib_assertions.nim
CC: stdlib_dollars.nim
CC: stdlib_io.nim
CC: stdlib_system.nim
CC: stdlib_parseutils.nim
CC: stdlib_math.nim
CC: stdlib_unicode.nim
CC: stdlib_strutils.nim
CC: stdlib_pathnorm.nim
CC: stdlib_posix.nim
CC: stdlib_times.nim
CC: stdlib_os.nim
CC: stdlib_hashes.nim
CC: stdlib_strtabs.nim
CC: stdlib_streams.nim
CC: stdlib_osproc.nim
CC: ../../../../root/.nimble/pkgs/hts-0.2.18/hts/private/hts_concat.nim
CC: ../../../../root/.nimble/pkgs/hts-0.2.18/hts/bam.nim
CC: stdlib_tables.nim
CC: ../../../../root/.nimble/pkgs/zip-0.2.1/zip/zlib.nim
CC: ../../../../root/.nimble/pkgs/zip-0.2.1/zip/gzipfiles.nim
CC: stdlib_logging.nim
CC: stdlib_critbits.nim
CC: ../../../../root/.nimble/pkgs/cligen-0.9.38/cligen/parseopt3.nim
CC: stdlib_terminal.nim
CC: stdlib_wordwrap.nim
CC: ../../../../root/.nimble/pkgs/cligen-0.9.38/cligen/textUt.nim
CC: ../../../../root/.nimble/pkgs/cligen-0.9.38/cligen/argcvt.nim
CC: ../../../../root/.nimble/pkgs/cligen-0.9.38/cligen/sysUt.nim
CC: ../../../../root/.nimble/pkgs/cligen-0.9.38/cligen.nim
CC: decontanimate.nim
Hint: [Link]
Hint: operation successful (74494 lines compiled; 5.011 sec total; 90.461MiB peakmem; Dangerous Release Build) [SuccessX]
wrote executable: decontanimate
ubuntu@ip-10-0-3-37:~/hts-nim$ file decontanimate
decontanimate: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, with debug_info, not stripped
Right now if I try to write a bam file, it will only output uncompressed SAM. Is it possible to support writing BAMs and possibly CRAM too?
Hi Brent,
I nimble install hts
hts-0.3.4 and doAssert minor >= 10
failed in hts.nim
Error: unhandled exception: ~/.nimble/pkgs/hts-0.3.4/hts.nim(31, 12) `minor >= 10` [hts/nim] error this version of hts-nim requires htslib >=1.10, got version: 1.9 [AssertionError]
I temporally disabled this assertion to get jigv to work on macOS which the static binary release does support.
Thanks,
Richard
Hi Brent,
I've been wondering about whether it would be possible to open two VCF such that instead of iterating through the items like such
var v_genome: VCF
var v_exome: VCF
discard v_genome.open(hg19_genome)
discard v_exome.open(hg19_exome)
for record in v_genome:
...
I can store the items iterator in a variable -- but I think that requires a {.closure.}
pragma? (new to nim). I tried this out by copying the items proc in vcf.nim and just adding a closure pragma
iterator item_closure*(v:VCF): Variant {.closure.} =
## Each returned Variant has a pointer in the underlying iterator
## that is updated each iteration; use .copy to keep it in memory
# all iterables share the same variant
var variant: Variant
new(variant, destroy_variant)
while true:
if bcf_read(v.hts, v.header.hdr, v.c) == -1:
break
discard bcf_unpack(v.c, 1 or 2)
variant.vcf = v
variant.c = v.c
yield variant
# allow an error code of 1 (CTG_UNDEF) because we can have a contig
# undefined in the reader in the absence of an index or a full header
# but that's not an error in this library.
if v.c.errcode > 1:
stderr.write_line "hts-nim/vcf bcf_read error:" & $v.c.errcode
stderr.write_line "last read variant:", variant.tostring()
quit(2)
I then was able to do the following:
var v_genome: VCF
var v_exome: VCF
discard v_genome.open(hg19_genome)
discard v_exome.open(hg19_exome)
var i1 = item_closure
var i2 = item_closure
while true:
let v_g: Variant = i1(v_genome)
let v_e: Variant = i2(v_exome)
if finished(i1):
echo "Genome Finished"
if finished(i2):
echo "Exome Finished"
if finished(i1) and finished(i2):
break
And it seems to work as I intended...
Would you consider adding a seperate iterator specifically just to allow this? Or is there another way to do this that i'm missing (again new to nim!)
I don't see it in the SAM spec. Is it related to long reads (cigars)? samtools/hts-specs#40
Hi @brentp,
I was looking into merging two sorted VCF files using hts-nim. However I found myself having trouble openning one iterator for each files and moving forward alternatively the iterators depending on the variant positions in each VCF file.
I finally used the concept of "Anonymous Functions" as I used to in Perl, but it's not very sexy. Do you happen to have a better pattern (cf. code bellow) ?
Thanks in advance,
Jérôme.
import hts
type var_it = (proc(): Variant)
proc get_vcf_it(vcf: VCF): var_it =
return proc(): Variant =
for i in vcf:
return i
var
vcf: VCF
v: Variant
vcf_it: var_it
doAssert(open(vcf, "DP28102008_S16.vcf"))
vcf_it = get_vcf_it(vcf)
v = vcf_it()
echo(v)
v = vcf_it()
echo(v)
Here is my command history:
ubuntu $ ./mosdepth pref http://s3.amazonaws.com/1000genomes/phase3/data/NA21103/alignment/NA21103.chrom11.ILLUMINA.bwa.GIH.low_coverage.20120522.bam
[E::knet_seek] SEEK_END is not supported for HTTP. Offset is unchanged
../../../../root/.nimble/pkgs/hts-0.2.8/hts/bam.nim(297) open
Error: unhandled exception: invalid bgzf file [ValueError]
ubuntu $ samtools view http://s3.amazonaws.com/1000genomes/phase3/data/NA21103/alignment/NA21103.chrom11.ILLUMINA.bwa.GIH.low_coverage.20120522.bam >/dev/null
samtools is built with htslib-1.9. mosdepth can read this file just OK if I copy it from S3 first. I put this issue on hts-nim since it seems like it is in bam.nim
and has nothing to do with mosdepth itself.
I'm happy to attempt a fix if you have any idea what might be going on. Thanks!
I admit, I have zero experience in NIM, but I am just looking through the API, and had a thought: could there be a get
method that returns an Option[T]
with the correct type? I'd find this much more convenient than having to check if the return value is nil
. Or at least it satisfies my functional programming style since I can map/filter on it (I think). I'd love to see the following methods:
# assignment of a tag, overwrites an existing value
rec[int]("NM") = 2
# comparison of a tag, required to exist
rec[int]("NM") == 2
# could be none or some(2)
rec.get[int]("NM")
Hopefully this makes sense. It likely can be applied to the VCF API as well.
Is there a way to fetch the list of chromosomes from the header of a VCF along with their lengths?
Hi Brent,
not an issue as such, just wanted to ask whether you would consider adding a little (and not particularly sophisticated) FastQ decontamination tool I wrote a while back to your examples listed in the wiki.
Thanks, Andreas
Hi Brent,
I am trying to understand how missing FORMAT values are handled for vcfs, specifically for int fields.
I am not capable of following the htslib code, but as far as I can figure missing values in the vcf (".") should be represented as int32.low ( is this correct? ).
Assuming that is correct, in some circumstances I have also observed these fields to be encoded as int32.low + 1. Specifically I have seen this for format values with multiple values per sample. For example:
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
Where there are multiple fields per sample the first missing value for each sample is -2147483648 (int32.low) and the rest seem to be -2147483647 (int32.low +1).
Is this an expected or undefined behaviour? Or is there a better way I should be testing for missing data?
I have attached an example of what I see.
Thanks for your help and thanks for this and your other great tools.
Cheers.
Hi Brent,
what's the recommended way to parse internally generated BAM streams? The use case here is that I call BWA internally using osproc
, returning lines on stdout as iterator. open()
can't be used, unless I use a named pipe as intermediate. Is there a way to generate a Record from a SAM/BAM line?
Thanks,
Andreas
see #7
can have each in e.g. scripts/example-bam.nim
and then update there and cat into README upon change. or have a utility to test literal blocks in the readme.
Hi Brent,
I'm trying to read in Variants from an input VCF and write to a output VCF after modifying the GT field. The code snippet I have works for one input and one output. However, when I put that into a for loop to loop through multiple chroms, it errors out as:
free(): invalid pointer
Traceback (most recent call last)
/mnt/mcfiles/rlyu/Projects/sgcocaller/tests/read_write_vcf.nim(41) read_write_vcf
/mnt/mcfiles/rlyu/Projects/sgcocaller/tests/read_write_vcf.nim(20) write_linked_blocks
/mnt/mcfiles/rlyu/.nimble/pkgs/hts-0.3.12/hts/vcf.nim(743) copy
/usr/local/bin/nim-1.4.0/lib/system/gc.nim(439) newObj
/usr/local/bin/nim-1.4.0/lib/system/gc_common.nim(430) prepareDealloc
/mnt/mcfiles/rlyu/.nimble/pkgs/hts-0.3.12/hts/vcf.nim(423) destroy_vcf
SIGABRT: Abnormal termination.
Code:
## test open new vcf and close
import tables
import math
import sequtils
import hts
let threads = 1
proc write_linked_blocks(ivcf:VCF, ovcf:VCF):int =
ovcf.header = ivcf.header
discard ovcf.header.add_string("""##sgcocaller_v0.1=phaseBlocks""")
discard ovcf.write_header()
var gt_string:seq[int32]
for v in ivcf.query("*"):
gt_string = @[int32(2),int32(5)]
if v.format().set("GT",gt_string) != Status.OK:
quit "set GT failed"
if not ovcf.write_variant(v) :
quit "write vcf failed for " & $voff
return 0
var chrs = map(toSeq(1..2), proc (x:int): string = "chr" & $x)
var ivcf,ovcf: VCF
var inFile,outFile:string
for chrName in chrs:
echo "writing for " & chrName
inFile = "data/FVB_NJ.mgp.v5.snps.dbSNP142.homo.alt." & chrName & ".vcf.gz"
outFile = "tests/" & chrName & "_block_phased.vcf.gz"
if not open(ivcf, inFile, threads=threads):
quit "couldn't open input vcf file"
if not open(ovcf, outFile, threads=threads, mode ="w"):
quit "couldn't open output vcf file"
discard write_linked_blocks(ivcf,ovcf)
ivcf.close()
ovcf.close()
I might not be handling some pointer operation right. Thanks for any advice.
Best,
Ruqian
Is there a way to use hts-him to index a BAM file (generate the .bai file)? I looked at the API but didn't find any.
excuse me but i don't know how to install and use this package, would you please help?
I had been retrieving only empty strings for extra fields of type 'A' with the most recent commit to master
. The pull request #18 introduces retrieval of such char-type fields. I think it's a comprehensive fix, but docs still have to be regenerated.
Hello Brent!
I tried compiling a static binary with your docker solution, and it doesn't work from the AMD nodes in the cluster I use.
(i.e. the help screen is displayed, but as soon as I try parsing a BAM file I get SIGILL: Illegal operation.
).
Since I don't know if this is unavoidable or just remoing some optimizations could make the binary working also with old AMD processors I put a note here (worse case scenario this can be mentioned in the README).
Thanks again,
Andrea
PS: here the /proc/cpuinfo bit:
processor : 31
vendor_id : AuthenticAMD
cpu family : 16
model : 9
model name : AMD Opteron(tm) Processor 6134
stepping : 1
microcode : 0x10000d9
cpu MHz : 1400.000
cache size : 512 KB
physical id : 3
siblings : 8
core id : 3
cpu cores : 8
apicid : 71
initial apicid : 55
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc art rep_good nopl nonstop_tsc extd_apicid amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr hw_pstate retpoline_amd ibp_disable vmmcall npt lbrv svm_lock nrip_save pausefilter
bogomips : 4600.48
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
I don't see it in samtools.github.io/hts-specs/SAMv1.pdf
Hi Brent,
I'm wondering if there is any easy way to modify a record.qname as I would like to add additional information to it.
Cheers,
Ruqian
Hey Brent,
it might be something obvious that I'm overlooking, but what would be considered best practice to parse a vcf from stdin or a string (generated from within Nim). Will I have to convert it to a temp file first to be able to use vcf.open?
Thanks,
Andreas
Hi @brentp ,
I have been trying to build static binaries of my CLT [sscocaller] (https://gitlab.svi.edu.au/biocellgen-public/sscocaller.git) some of the instructions from your hts_nim_static_builder
. On top of hts-nim, sscocaller
also depends on the libRmath-nim
library and I have been getting the undefined reference error
.
I have used the docker image:
docker run -it -v $PWD:/mnt/mcfiles/ brentp/musl-hts-nim /bin/bash
and then nsb
/usr/local/bin/nsb -s src/sscocaller.nim -n sscocaller.nimble -- --d:release --threads:on
@m..@s..@s..@sroot@s.nimble@[email protected]@[email protected]:(.text+0x15): undefined reference to `dbinom'
/usr/lib/gcc/x86_64-alpine-linux-musl/9.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: /root/.cache/nim/sscocaller_r/@m..@s..@s..@sroot@s.nimble@[email protected]@[email protected]: in function
dbinom__UYyjXXE0GgmIfuGpc2rPqA': @m..@s..@s..@[email protected]@[email protected]@[email protected]:(.text+0x15): undefined reference to
dbinom'
collect2: error: ld returned 1 exit status
Error: execution of an external program failed: 'gcc -o /mnt/mcfiles/src/xx_exe_out /root/.cache/nim/sscocaller_r/stdlib_assertions.nim.c.o /root/.cache/nim/sscocaller_r/stdlib_dollars.nim.c.o /root/.cache/nim/sscocaller_r/stdlib_locks.nim.c.o /root/.cache/nim/sscocaller_r/stdlib_sharedlist.nim.c.o /root/.cache/nim/sscocaller_r/stdlib_formatfloat.nim.c.o /root/.cache/nim/sscocaller_r/stdlib_io.nim.c.o /root/.cache/nim/sscocaller_r/stdlib_system.nim.c.o /root/.cache/nim/sscocaller_r/stdlib_parseutils.nim.c.o /root/.cache/nim/sscocaller_r/stdlib_math.nim.c.o /root/.cache/nim/sscocaller_r/stdlib_algorithm.nim.c.o /root/.cache/nim/sscocaller_r/stdlib_unicode.nim.c.o /root/.cache/nim/sscocaller_r/stdlib_strutils.nim.c.o /root/.cache/nim/sscocaller_r/stdlib_times.nim.c.o /root/.cache/nim/sscocaller_r/stdlib_os.nim.c.o /root/.cache/nim/sscocaller_r/stdlib_hashes.nim.c.o /root/.cache/nim/sscocaller_r/stdlib_tables.nim.c.o /root/.cache/nim/sscocaller_r/stdlib_sequtils.nim.c.o /root/.cache/nim/sscocaller_r/stdlib_sets.nim.c.o /root/.cache/nim/sscocaller_r/@m..@s..@s..@sroot@s.nimble@[email protected]@sunicodedb@sproperties_data.nim.c.o /root/.cache/nim/sscocaller_r/@m..@s..@s..@sroot@s.nimble@[email protected]@[email protected] /root/.cache/nim/sscocaller_r/@m..@s..@s..@sroot@s.nimble@[email protected]@[email protected] /root/.cache/nim/sscocaller_r/@m..@s..@s..@sroot@s.nimble@[email protected]@[email protected] /root/.cache/nim/sscocaller_r/@m..@s..@s..@sroot@s.nimble@[email protected]@[email protected] /root/.cache/nim/sscocaller_r/@m..@s..@s..@sroot@s.nimble@[email protected]@sunicodedb@stypes_data.nim.c.o /root/.cache/nim/sscocaller_r/@m..@s..@s..@sroot@s.nimble@[email protected]@[email protected] /root/.cache/nim/sscocaller_r/@m..@s..@s..@sroot@s.nimble@spkgs@[email protected] /root/.cache/nim/sscocaller_r/@m..@s..@s..@sroot@s.nimble@[email protected]@[email protected] /root/.cache/nim/sscocaller_r/@m..@s..@s..@sroot@s.nimble@[email protected]@[email protected] /root/.cache/nim/sscocaller_r/@m..@s..@s..@sroot@s.nimble@[email protected]@[email protected] /root/.cache/nim/sscocaller_r/@m..@s..@s..@sroot@s.nimble@[email protected]@[email protected] /root/.cache/nim/sscocaller_r/@m..@s..@s..@sroot@s.nimble@[email protected]@[email protected] /root/.cache/nim/sscocaller_r/@m..@s..@s..@sroot@s.nimble@[email protected]@[email protected] /root/.cache/nim/sscocaller_r/@m..@s..@s..@sroot@s.nimble@spkgs@[email protected] /root/.cache/nim/sscocaller_r/@m..@s..@s..@sroot@s.nimble@[email protected]@shts@sprivate@shts_concat.nim.c.o /root/.cache/nim/sscocaller_r/stdlib_strformat.nim.c.o /root/.cache/nim/sscocaller_r/@m..@s..@s..@sroot@s.nimble@[email protected]@[email protected] /root/.cache/nim/sscocaller_r/@m..@s..@s..@sroot@s.nimble@[email protected]@[email protected] /root/.cache/nim/sscocaller_r/@m..@s..@s..@sroot@s.nimble@spkgs@[email protected] /root/.cache/nim/sscocaller_r/@m..@s..@s..@sroot@s.nimble@[email protected]@[email protected] /root/.cache/nim/sscocaller_r/stdlib_streams.nim.c.o /root/.cache/nim/sscocaller_r/@msscocaller.nim.c.o -pthread -lm -lrt -static -no-pie -fno-pie /usr/lib/libm.a /usr/local/lib/libhts.a /usr/local/lib/libdeflate.a /usr/local/lib/libz.a /usr/local/lib64/libzip.a /usr/lib/libbz2.a /usr/lib/liblzma.a /usr/lib/libpthread.a /usr/lib/libopenblas.a /usr/lib/libgfortran.a /usr/lib/libquadmath.a /usr/lib/libssl.a /usr/lib/libcrypto.a /usr/lib/libdl.a /usr/lib/libc.a /usr/lib/librt.a /usr/lib/libpcre.a /usr/lib/libd4binding.a -ldl'
error compiling code
I was wondering if you any idea/suggestion/direction of where to look at to fix this.
Thank you very much.
Ruqian
Hello,
I may be missing something obvious but the documentation doesn't seem to have a method for closing an Fai object once you've opened it. Is that intentional or is there anyway to expose such a method?
Thanks a bunch for the library, it's been super helpful!
open
instead of Open
{. borrow .}
: https://nim-lang.org/docs/manual.html#distinct-type-modelling-currencies to avoid re-implementing methods$
proc).I seem to be running into the error at https://github.com/brentp/hts-nim/blob/master/src/hts/bam.nim#L242 when I try to open a SAM file. Am I doing something silly, or is SAM not supported yet?
Thanks!
A
Hi @brentp,
I would like to know if there is a way to compile a static binary that integrates the hts-lib avoiding the dynamic link to libhts.so ?
Best regards,
Jérôme.
On tip of master
:
% nimble --verbose test .
bamtest.nim(48) bamtest
bam.nim(327) load_index
Unhandled exception: [bam] load_index error opening index https://github.com/brentp/hts-nim/raw/master/tests/sa.bam.bai for bam tests/sa.bam. Protocol not supported [IOError]
[FAILED] load non-standard index
Any ideas?
Hello,
I'm running into a problem using hts-nim. Upon downloading the git repo and using nimble install
it seems to install fine, but upon running nimble test
, or attempting to import hts
in any of my nim scripts, I'm getting the following error:
/Users/nproach/Documents/src/hts-nim/src/hts/vcf.nim(202, 35) Error: undeclared identifier: 'csize_t'
stack trace: (most recent call last)
/private/var/folders/84/b9w2y49j5v55sc9sq9tn4qkm0000gn/T/nimblecache/nimscriptapi.nim(165, 16)
/Users/nproach/Documents/src/hts-nim/hts_1286.nims(25, 8) testTask
/usr/local/Cellar/nim/1.0.0/nim/lib/system/nimscript.nim(252, 7) exec
/usr/local/Cellar/nim/1.0.0/nim/lib/system/nimscript.nim(252, 7) Error: unhandled exception: FAILED: nim c -d:useSysAssert -d:useGcAssert --lineDir:on --debuginfo -r tests/all [OSError]
Error: Exception raised during nimble script execution
I'd appreciate any help you can provide on fixing this issue. Thanks!
So I was wondering if you had some good examples of building new variant objects? My case is having n vcfs that i'm trying to merge into 1. Even just a naive merge where i only keep info/format from 1 vcf cause I can assume in this case all vcfs have the same loci in it just different samples. Trying to build that new variant from scratch is not clear to me and I'm not confident copying one variant and trying to update variant.c
variant.vcf
etc is safe really. I guess I was thinking that I would be able to make a new empty variant from a vcf object or vcf header object, since I can easily make my new VCF with the right header etc. Anyways, if you could point me to some examples of building new variants i'd appreciate it.
Hi, currently I can not write a brand new VCF. Do you have plans to be able to initialize an empty VCF header and empty Variant objects that can be modified and then written?
Hello,
I've been working with hts-nim
at GIS with @andreas-wilm.
I believe the documentation needs an update. I've encountered a couple of undocumented (but implemented) functions (e.g. base_at
from the bam.nim module and the []
operator in the fai.nim module). The dynamic functionalities of the doc pages (search, group by...) are broken as well because the HTML relies on a missing a JavaScript file. However, this seems to be a problem with Nim's docgen
package rather than with hts-nim
. I've opened an issue aiming to get this fixed and it is probably a good idea to update the docs when they to :).
One more thing regarding the []
operator for Fai
objects. It produces some strange results. Let's say we have a file called sample.fa with the following content:
>ref
ACGGTCGA
The following two programs execute normally:
import hts
var fai: Fai
discard open(fai, "sample.fa")
echo fai[0] # ouputs 'ref'
import hts
var fai: Fai
discard open(fai, "sample.fa")
echo fai.get("ref", 0, 2) # outputs 'ACG'
Hovewer, the following program fails with an error as if the []
call has consumed the reference somehow:
import hts
var fai: Fai
discard open(fai, "sample.fa")
echo fai[0]
echo fai.get("ref", 0, 2)
Similar strange behavior can be observed like this:
echo fai[0] # outputs 'ref'
echo fai[0] # outputs an empty string
echo fai[0] # outputs some junk, subsequent calls eventually detect a segmentation fault
Thanks,
Filip
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.