gmod / bam-js Goto Github PK

18.0 18.0 9.0 37.62 MB

Parse BAM and BAM index files in javascript for node and the browser

License: MIT License

JavaScript 3.05% TypeScript 96.95%

bam-js's People

Contributors

Stargazers

Watchers

Forkers

pkerpedjiev jsonbrooks manzt e- healthvivo sehilyi adamjohnwright alexpreynolds kibertoad

bam-js's Issues

Port intra-slice merging from tabix

From GMOD/tabix-js@e9bb883

Htsget unable to fetch header from some endpoints

Ref ga4gh endpoint here

samtools/hts-specs#530

The endpoint rejects the bogus refname that we currently pass.

We would have to stop specifying a refname, but then the endpoint returns very large data chunks, and we have to range-request the results it gives back.

Delay evaluation of bin index until used

Currently bins are processed upfront which could incur large memory usage on a large BAI file
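
One possible shape for this, as a rough sketch only: parse the bins for a reference the first time they are requested and cache the result. `LazyBinIndex` and `parseBinsForRef` are hypothetical names, not part of bam-js.

```javascript
// Hypothetical sketch: parse bins per reference on first use, then cache them.
class LazyBinIndex {
  constructor(parseBinsForRef) {
    this.parseBinsForRef = parseBinsForRef // hypothetical parser supplied by the caller
    this.cache = new Map()
  }

  // Nothing is parsed until a reference is actually queried.
  getBins(refId) {
    if (!this.cache.has(refId)) {
      this.cache.set(refId, this.parseBinsForRef(refId))
    }
    return this.cache.get(refId)
  }
}

// Usage with a stand-in parser that counts how often it runs.
let binParseCalls = 0
const lazyBinIndex = new LazyBinIndex(refId => {
  binParseCalls += 1
  return { refId, bins: [] }
})
```

With this layout, references that are never queried never have their bins materialized, at the cost of keeping the raw index bytes (or a way to re-read them) available.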

use with ES6

Hi Colin,

I am trying to use bam-js with import syntax like this:

import BamFile from '@gmod/bam';

this.bam = new BamFile({
  bamUrl: url,
  baiUrl: url + '.bai',
});

but I got the following error:

TypeError: __WEBPACK_IMPORTED_MODULE_3__gmod_bam___default.a is not a constructor

can you suggest how to fix it?
I just installed it from npm today, so it should be the latest version. Thanks.
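
This error is typical of a default import from a package that only has named exports. Another report on this page loads the class with `const {BamFile} = require('@gmod/bam')`, which suggests that `import { BamFile } from '@gmod/bam'` is the form to try. The sketch below reproduces the failure mode with a stand-in object; `fakeModule` is hypothetical and only mimics a module without a default export.

```javascript
// Stand-in for a module that only has a named export (no default export).
const fakeModule = { BamFile: class BamFile {} }

// What a default import resolves to when there is no default export.
const defaultImport = fakeModule.default // undefined

let defaultImportError = null
try {
  new defaultImport()
} catch (e) {
  defaultImportError = e // a TypeError: the value is not a constructor
}

// The named export works as expected.
const { BamFile } = fakeModule
const bamInstance = new BamFile()
```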

Optimize some view as pairs circumstances

Duplicate of GMOD/cram-js#9

need explanation for usage

I tried out bam-js but have problems.
The output of getRecordsForRange() is an array of BamRecords like the one shown below. How can I get the seq and qual fields?

BamRecord {
data: {
start: 42137714,
_flag_nc: 9633793,
_n_cigar_op: 1,
_bin_mq_nl: 475282471,
_l_read_name: 39,
seq_length: 99,
length_on_ref: 99,
cigar: '99M',
end: 42137813
},
bytes: {
start: 904612,
end: 904917,
byteArray: <Buffer 2b 01 00 00 14 00 00 00 9d bf 82 02 27 3c 8a 03 01 00 53 00 64 00 00 00 14 00 00 00 7f bf 82 02 7e ff ff ff 41 30 30 36 38 35 3a 32 36 3a 48 4b 48 54 ... 1225259 more bytes>
},
flags: 147,
_id: 1458877625505,
_refID: 20,
_tagOffset: undefined,
_tagList: [],
_allTagsParsed: false
}
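
Another report on this page reads fields through `feature.get('seq')`, so an accessor of that shape is probably the intended route, though what it returns may depend on the version. For reference, the sketch below shows how the BAM format itself encodes these two fields: bases are packed two per byte, and qualities are raw Phred scores. It follows the SAM/BAM specification rather than bam-js internals, and the function names are hypothetical.

```javascript
// BAM packs two bases per byte, high nibble first, using this lookup table.
const SEQ_CODES = '=ACMGRSVTWYHKDBN'

function decodePackedSeq(packed, seqLength) {
  let seq = ''
  for (let i = 0; i < seqLength; i++) {
    const byte = packed[i >> 1]
    // Even positions use the high nibble, odd positions the low nibble.
    const code = i % 2 === 0 ? byte >> 4 : byte & 0x0f
    seq += SEQ_CODES[code]
  }
  return seq
}

// Qualities are raw Phred scores; SAM text adds 33 to get printable ASCII.
function decodeQual(rawQual) {
  return Array.from(rawQual, q => String.fromCharCode(q + 33)).join('')
}

// 0x12 0x48 encodes A(1) C(2) G(4) T(8).
const exampleSeq = decodePackedSeq(Uint8Array.of(0x12, 0x48), 4)
const exampleQual = decodeQual(Uint8Array.of(0, 40))
```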

LocalFile is not a constructor

Hello!

I am developing an application with node v18.9.0 and webpack v5.74.0. When I try to parse a new Bamfile with the code

const t = new BamFile({ bamPath: "../test_alignment.bam" });

I get an error

Uncaught TypeError: generic_filehandle__WEBPACK_IMPORTED_MODULE_3__.LocalFile is not a constructor

Is there something basic that I am missing? Thanks for your help!

htsget auth support

Hello @cmdcolin,

We are deploying JBrowse2 @umccr, primarily for the htsget support, so that different web-based clients work fine using one data access interface (htsget).

Unfortunately we hit a snag with the Bearer token support, which would allow our users to access private data via htsget, namely:

bam-js/src/htsget.ts

Lines 65 to 68 in 25ba6e8

 const base = `${this.baseUrl}/${this.trackId}` 

 const url = `${base}?referenceName=${chr}&start=${min}&end=${max}&format=BAM` 

 const chrId = this.chrToIndex && this.chrToIndex[chr] 

 const result = await fetch(url, { ...opts })

Do you have plans to add Bearer token support to htsget in JBrowse2?
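
A minimal sketch of what passing a token could look like, assuming the caller is allowed to supply request headers. `buildHtsgetRequest` is a hypothetical helper, the endpoint is a placeholder, and the URL layout simply mirrors the snippet quoted above.

```javascript
// Hypothetical helper: build the htsget URL and fetch options with a Bearer token.
function buildHtsgetRequest({ baseUrl, trackId, chr, min, max, token }) {
  const base = `${baseUrl}/${trackId}`
  // encodeURIComponent is an addition here; the quoted source interpolates chr directly.
  const url = `${base}?referenceName=${encodeURIComponent(chr)}&start=${min}&end=${max}&format=BAM`
  const headers = token ? { Authorization: `Bearer ${token}` } : {}
  return { url, opts: { headers } }
}

const htsgetRequest = buildHtsgetRequest({
  baseUrl: 'https://htsget.example.org/reads', // placeholder endpoint
  trackId: 'sample1',
  chr: 'chr1',
  min: 0,
  max: 1000,
  token: 'example-token',
})
// The request itself would then be: fetch(htsgetRequest.url, htsgetRequest.opts)
```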

@andrewpatto @victorskl

/cc @ohofmann @mlin

question on seq_id of returned feature

Hi Colin,

I am curious about the seq_id in the returned BamRecord.

In the screenshot below you can see that the BamRecord I got has a seq_id of 6.

But in the header, 6 is chr6, and the read should be on chr7 since I am querying a region on chr7.

Do I need to add 1 to the seq_id to get the right chr? Thanks.

By the way, why is the seq property undefined? There should be a read sequence. Thank you.
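
Assuming seq_id is the BAM refID, it is a 0-based index into the reference list in the header, so index 6 points at the seventh entry. The sketch below uses a hypothetical header order (chr1 through chr7); `refIdToName` is an illustrative helper, not bam-js API.

```javascript
// Reference names in header order (hypothetical example header).
const referenceNames = ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7']

// refID is a 0-based index; -1 means the read has no reference.
function refIdToName(names, refId) {
  return refId >= 0 ? names[refId] : undefined
}

const resolvedName = refIdToName(referenceNames, 6)
```

Under that assumption no `+ 1` is needed: the id is used directly as an array index rather than as a chromosome number.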

Support AN (alternative naming)

Alternative naming conventions can be embedded in the BAM file via the SAM header
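
In the SAM header, the `AN` field of an `@SQ` line holds a comma-separated list of alternative names for the sequence named by `SN`. A rough sketch of building an alias lookup from raw header text follows; `parseAlternativeNames` is a hypothetical helper and does not reflect how bam-js represents the header.

```javascript
// Build a map from every known name (SN or AN alias) to the canonical SN.
function parseAlternativeNames(samHeaderText) {
  const aliasToName = new Map()
  for (const line of samHeaderText.split('\n')) {
    if (!line.startsWith('@SQ')) continue
    // Each field looks like "SN:chr1"; split it into tag and value.
    const fields = new Map(
      line.split('\t').slice(1).map(f => [f.slice(0, 2), f.slice(3)]),
    )
    const name = fields.get('SN')
    if (name === undefined) continue
    aliasToName.set(name, name)
    const aliases = fields.get('AN')
    if (aliases) {
      for (const alias of aliases.split(',')) aliasToName.set(alias, name)
    }
  }
  return aliasToName
}

const exampleHeader =
  '@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:1000\tAN:1,NC_000001.11\n@SQ\tSN:chr2\tLN:900'
const aliasMap = parseAlternativeNames(exampleHeader)
```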

Add indexCov

Fails to get range when forcing to use pakoUnzip in node and browser

I noticed that a section of volvox-sorted.bam was failing as I was integrating bam-js into JBrowse: somewhere in the region ctgA:30000-35000 produced the error "unknown compression method" (a narrower range can produce the same error, but that is the general region).

My tests in node.js pass for getRecordsForRange here, but if I manually switch the node.js code to use pakoUnzip, then the tests fail in node.js land and give the same "unknown compression method" error

I guess I am curious whether it is something the bam-js code is doing that causes the unzip error or if it is a pako specific bug

CC @rbuels

CG field

This is a stretch goal, but the CG field enables very long reads to be stored despite the limit on the number of CIGAR operations
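
For context: BAM stores the CIGAR operation count in a 16-bit field, so a record holds at most 65,535 operations, and the SAM specification has longer CIGARs stored in a `CG` tag as an array of 32-bit values that use the same packing as the regular CIGAR field. The sketch below decodes that packing; `decodeCigar` is a hypothetical helper and says nothing about how bam-js would integrate it.

```javascript
// Each 32-bit value packs the operation length (upper 28 bits) and op code (lower 4 bits).
const CIGAR_OPS = 'MIDNSHP=X'

function decodeCigar(values) {
  let cigar = ''
  for (const v of values) {
    cigar += `${v >>> 4}${CIGAR_OPS[v & 0xf]}`
  }
  return cigar
}

// 5 soft-clipped bases (op 4) followed by 94 matches (op 0).
const exampleCigar = decodeCigar(Uint32Array.of((5 << 4) | 4, (94 << 4) | 0))
```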

Restructure record fetching with a callback

Currently returns a big block of features
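
A sketch of one way this could look, using an async generator with a callback wrapper on top so that records are handed over chunk by chunk. `streamRecords`, `forEachRecord`, and `readChunk` are hypothetical names for illustration only.

```javascript
// Hypothetical sketch: yield records chunk by chunk rather than as one big array.
async function* streamRecords(chunks, readChunk) {
  for (const chunk of chunks) {
    const records = await readChunk(chunk) // hypothetical async chunk reader
    yield* records
  }
}

// Callback-style wrapper around the generator.
async function forEachRecord(chunks, readChunk, onRecord) {
  for await (const record of streamRecords(chunks, readChunk)) {
    onRecord(record)
  }
}
```

The caller can then process or discard each record as it arrives, which keeps peak memory closer to one chunk than to the whole range.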

Add big endianness

Typing issues

Hi,

I'm trying to validate types (I'm using TS through JSDoc) in my codebase using tsc, but I stumbled upon a couple of issues in @GMOD/[email protected].

The first one:

../../node_modules/@gmod/bam/dist/util.d.ts:4:44 - error TS2304: Cannot find name 'Long'.

4 export declare function longToNumber(long: Long): number;
                                             ~~~~

There's no "long" package imported in util.d.ts (https://cdn.jsdelivr.net/npm/@gmod/[email protected]/dist/util.d.ts), although it's in the original source file:

bam-js/src/util.ts

Line 1 in 492c145

import Long from 'long'

🤷

If I checkout v1.1.18 of this repo and compile the project, the long import is missing from util.d.ts. If I checkout HEAD and compile, then it's there. Maybe there's some problem in the old [email protected] (?)

And then there's the other issue:

../../node_modules/@gmod/bam/dist/htsget.d.ts:14:5 - error TS2416: Property '_readChunk' in type 'HtsgetFile' is not assignable to the same property in base type 'BamFile'.
  Type '(params: { chunk: { buffer: Buffer; chunk: Chunk; }; opts: BaseOpts; }) => Promise<{ data: Buffer; cpositions: null; dpositions: null; chunk: Chunk; }>' is not assignable to type '({ chunk, opts }: { chunk: Chunk; opts: BaseOpts; }) => Promise<{ data: any; cpositions: any; dpositions: any; chunk: Chunk; }>'.
    Types of parameters 'params' and '__0' are incompatible.
      Type '{ chunk: Chunk; opts: BaseOpts; }' is not assignable to type '{ chunk: { buffer: Buffer; chunk: Chunk; }; opts: BaseOpts; }'.
        Types of property 'chunk' are incompatible.
          Type 'Chunk' is missing the following properties from type '{ buffer: Buffer; chunk: Chunk; }': buffer, chunk

14     _readChunk(params: {
       ~~~~~~~~~~

Indeed, they are different:

bam-js/src/bamFile.ts

Line 401 in 492c145

async _readChunk({ chunk, opts }: { chunk: Chunk; opts: BaseOpts }) {

bam-js/src/htsget.ts

Line 110 in 492c145

async _readChunk({ chunk }: { chunk: Chunk; opts: BaseOpts }) {

However, I don't have enough knowledge of TypeScript to say whether these should be compatible or not. 🤔

And btw, I'm using [email protected].

expand README

The README needs a lot of expansion.

In particular, need to document the options to getRecordsForRange: viewAsPairs, pairAcrossChr, maxInsertSize, and signal
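
As a starting point for that documentation, here is a sketch of passing the four options. The call shape `getRecordsForRange(chr, start, end, opts)` matches the snippets elsewhere on this page; the comments describe my reading of the option names and the values are illustrative, not documented defaults. `fetchPairedRecords` is a hypothetical wrapper.

```javascript
// Sketch: `bam` is any object exposing getRecordsForRange(chr, start, end, opts).
async function fetchPairedRecords(bam, chr, start, end, signal) {
  return bam.getRecordsForRange(chr, start, end, {
    viewAsPairs: true, // assumption: also look up mates of reads in the range
    pairAcrossChr: false, // assumption: do not follow mates onto other chromosomes
    maxInsertSize: 200000, // illustrative value limiting how far away a mate is fetched
    signal, // AbortSignal that lets the caller cancel the request
  })
}

// Typical caller side: keep the controller so the request can be aborted later.
const pairAbortController = new AbortController()
```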

Add check for incorrect BAI files for larger chromosomes

See GMOD/tabix-js#17

Not fetching all reads

samtools reports 34 reads in a small region of a PacBio BAM file:

% samtools view https://s3.amazonaws.com/jbrowse.org/genomes/hg19/pacbio/m64011_181218_235052.8M.HG002.hs37d5.11kb.bam 1:10000-10600 | wc
34 771 777929

A small script using bam-js returns fewer:

const fetch = require('cross-fetch');

const {RemoteFile} = require('generic-filehandle')
const {BamFile} = require('@gmod/bam')

const b = new BamFile({
  bamFilehandle: new RemoteFile('https://s3.amazonaws.com/jbrowse.org/genomes/hg19/pacbio/m64011_181218_235052.8M.HG002.hs37d5.11kb.bam', {fetch}),
  baiFilehandle: new RemoteFile('https://s3.amazonaws.com/jbrowse.org/genomes/hg19/pacbio/m64011_181218_235052.8M.HG002.hs37d5.11kb.bam.bai', {fetch})
});

(async () => {
  await b.getHeader()
  const ret = await b.getRecordsForRange('1', 9999, 10600)
  console.log(ret.length)
})()

//outputs 12

git bisect narrows down squarely to #36

Removing the final vestige of the chunk merging code makes this problem go away for the most part, but actually returns 37 reads instead of 34

https://github.com/GMOD/bam-js/tree/remove_more_chunk_merging

Will need to make it match up to par with what samtools returns
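
When comparing counts against samtools, it helps to pin down the coordinate conversion and the overlap rule explicitly. The sketch below assumes, as the script above implies, that the query range is 0-based half-open while samtools regions are 1-based inclusive. Both helpers are hypothetical, and the record shape (`start`, `end`) is a simplification.

```javascript
// Convert a samtools-style region (1-based, inclusive) to 0-based half-open.
function parseSamtoolsRegion(region) {
  const match = /^(.+):(\d+)-(\d+)$/.exec(region)
  if (!match) throw new Error(`unrecognized region: ${region}`)
  return {
    refName: match[1],
    start: Number(match[2]) - 1,
    end: Number(match[3]),
  }
}

// Keep only records that overlap [start, end) in 0-based half-open terms.
function overlapsRange(record, start, end) {
  return record.start < end && record.end > start
}

const parsedRegion = parseSamtoolsRegion('1:10000-10600')
```

Filtering a result set with `overlapsRange` is one way to check whether extra records are reads that merely share a chunk with the region rather than reads that overlap it.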

error in getHeader()

Hi Colin,

I am trying to use the getHeader() function but get the following error:

My code looks like this:

this.bam = new BamFile({
  bamUrl: url,
  baiUrl: url + '.bai',
});

const header = await this.bam.getHeader();
console.log(header);

can you suggest how to fix this? Thanks.

fetch size limit error with `pairAcrossChr`

Thank you for this awesome library!

I was using this in HiGlass/Gosling.js with the opt.pairAcrossChr option turned on

bamFile.getRecordsForRange(..., {
  viewAsPairs: true,
  pairAcrossChr: true
})

but was getting the following error and warning messages:

Error: data size of 1,073,213,398 bytes exceeded fetch size limit of 500,000,000 bytes

Could you suggest how to work around this issue?

Perhaps, searching pairAcrossChr is an expensive operation and is not supported for large data? (e.g., I was using a 200GB BAM file w/ a BAI file).

All tags are stored in lower case, but technically lower case tags are distinct from upper case

Support BAI index aka BAM index index

Wrong negative number values being reported for some tags

In samtools

ms:i:47450 AS:i:47542

In JBrowse (2)

ms -18086
AS -17994
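
These numbers are consistent with an unsigned 16-bit tag value being read as signed: 47450 - 65536 = -18086 and 47542 - 65536 = -17994. BAM has separate signed and unsigned 16-bit tag types, while samtools prints both as `i`. The sketch below reproduces the difference with a DataView; it illustrates the arithmetic only and makes no claim about where the library reads the value.

```javascript
// Write 47450 as an unsigned 16-bit little-endian value (BAM is little-endian).
const tagBytes = new DataView(new ArrayBuffer(2))
tagBytes.setUint16(0, 47450, true)

// Reading the same two bytes as signed vs. unsigned gives different numbers.
const asSigned = tagBytes.getInt16(0, true)
const asUnsigned = tagBytes.getUint16(0, true)
```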

Large amount of short reads bogs jbrowse down

Issues like this show that large amounts of data with small genomes e.g. covid can bog down the browser

GMOD/jbrowse#1524

There are some things that this module could possibly do to speed it up

Merge adjacent blocks code could result in large allocation

See tabix discussion

Missing end position in read mate?

I am experimenting with the viewAsPairs option (which is great to have). My understanding is that if two reads have the same read name, then they are mate pairs.
In the example below, I have a read pair (extracted from the getRecordsForRange result). The problem is that the second read does not have an end position. Is this intended? I am using the latest version of the tool.

Thanks!

Add more hydrate / snapshot tests

Use @gmod/tabix techniques to decode bgzf blocks

Support for Content-Encoding:gzip(?)

Content-Encoding:gzip often confuses not just jbrowse but the web browser also

In some limited cases we can skip the pako unzip step, as this PR demonstrates. That decodes the BAM header, but it still fails on the actual BAM data. I believe the error at that point comes from the browser and not from any JS code, so this may be the limiting factor.

example of allowing the header to be read
https://github.com/GMOD/bam-js/compare/t1?expand=1

Rewrite record class for less lazy loading

Fix CSI refNameToId resolution

The current code does this in a weird way where a second invocation of index parsing adds this...

error with local uploaded file

Hi Colin,

I am using the latest version from npm. I upload both the BAM and the .bai through a file input and construct the bamFile from the two File objects. From console.log I can see that the bamFile is constructed successfully, but I get an error afterwards. Could you please take a look? Here are the test code, the test BAM file, and the index file. Thanks.

Feature sequence is undefined

I am using getRecordsForRange to extract data from a BAM file. I can successfully retrieve feature.get('MD') and feature.get('cigar'), but feature.get('seq') is undefined. I am using the latest version. With samtools view I can see the sequence. Any help is appreciated. Thanks!

