GMOD / bam-js
Parse BAM and BAM index files in JavaScript for Node and the browser
License: MIT License
Ref ga4gh endpoint here
Rejects the bogus refname provided that we use currently
Have to adjust to not specify any refname, but then it returns very large data chunks, and we have to range-request the results of what it gives back
Currently bins are processed upfront which could incur large memory usage on a large BAI file
Hi Colin,
I am trying to use bam-js with import syntax like this:
import BamFile from '@gmod/bam';
this.bam = new BamFile({
bamUrl: url,
baiUrl: url + '.bai',
});
but I got the following error:
TypeError: __WEBPACK_IMPORTED_MODULE_3__gmod_bam___default.a is not a constructor
Can you suggest how to fix it?
I just installed from npm today, so it should be the latest version. Thanks.
Duplicate of GMOD/cram-js#9
I tried out bam-js but have run into a problem.
The output of getRecordsForRange() is an array of BamRecords, as shown below. How can I get the seq and qual fields?
BamRecord {
data: {
start: 42137714,
_flag_nc: 9633793,
_n_cigar_op: 1,
_bin_mq_nl: 475282471,
_l_read_name: 39,
seq_length: 99,
length_on_ref: 99,
cigar: '99M',
end: 42137813
},
bytes: {
start: 904612,
end: 904917,
byteArray: <Buffer 2b 01 00 00 14 00 00 00 9d bf 82 02 27 3c 8a 03 01 00 53 00 64 00 00 00 14 00 00 00 7f bf 82 02 7e ff ff ff 41 30 30 36 38 35 3a 32 36 3a 48 4b 48 54 ... 1225259 more bytes>
},
flags: 147,
_id: 1458877625505,
_refID: 20,
_tagOffset: undefined,
_tagList: [],
_allTagsParsed: false
}
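For context, the SAM/BAM specification stores read bases packed two per byte as 4-bit codes and qualities as raw Phred values, so the bytes of a record need decoding before seq and qual look like samtools view output. The sketch below illustrates that decoding on a hand-made byte sequence. `decodeSeq` and `decodeQual` are hypothetical helper names, not bam-js API, and locating the sequence bytes inside a real record is not shown.

```javascript
// 4-bit base codes from the SAM/BAM specification, indexed by nibble value.
const SEQ_CODES = '=ACMGRSVTWYHKDBN';

// Hypothetical helper: unpack `seqLength` bases from packed bytes.
function decodeSeq(bytes, seqLength) {
  let seq = '';
  for (let i = 0; i < seqLength; i++) {
    const byte = bytes[i >> 1];
    // Even positions use the high nibble, odd positions the low nibble.
    const code = i % 2 === 0 ? byte >> 4 : byte & 0x0f;
    seq += SEQ_CODES[code];
  }
  return seq;
}

// Hypothetical helper: convert raw Phred scores to the Phred+33 text form.
function decodeQual(bytes) {
  return Array.from(bytes, q => String.fromCharCode(q + 33)).join('');
}

// Made-up input: five bases packed into three bytes, and two quality scores.
const packed = Uint8Array.from([0x12, 0x48, 0xf0]);
console.log(decodeSeq(packed, 5));
console.log(decodeQual(Uint8Array.from([30, 40])));
```

The sequence length has to come from the record (seq_length in the dump above), because the last byte may hold only one base.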
Hello!
I am developing an application with node v18.9.0 and webpack v5.74.0. When I try to create a new BamFile with the code
const t = new BamFile({ bamPath: "../test_alignment.bam" });
I get an error
Uncaught TypeError: generic_filehandle__WEBPACK_IMPORTED_MODULE_3__.LocalFile is not a constructor
Is there something basic that I am missing? Thanks for your help!
Hello @cmdcolin,
We are deploying JBrowse2 @umccr, primarily for the htsget support, so that different web-based clients work fine using one data access interface (htsget).
Unfortunately we hit a snag with the Bearer token support, which would allow our users to access private data via htsget, namely:
Lines 65 to 68 in 25ba6e8
Do you have plans to add Bearer token support to htsget in JBrowse2?
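One possible client-side shape for this, sketched under assumptions: generic-filehandle's RemoteFile is constructed elsewhere in these issues with a custom `fetch` option, so a wrapped fetch could attach an `Authorization: Bearer` header to each request. `withBearerToken`, the token, and the URL are hypothetical, and whether the htsget code path honors a custom fetch is not confirmed here.

```javascript
// Hypothetical helper: wrap a fetch implementation so every request carries
// a bearer token. Assumes headers are passed as a plain object.
function withBearerToken(fetchImpl, token) {
  return (url, init = {}) => {
    const headers = { ...(init.headers || {}), Authorization: `Bearer ${token}` };
    return fetchImpl(url, { ...init, headers });
  };
}

// Demonstration with a stand-in fetch that records what it was given.
const seen = [];
const fakeFetch = async (url, init) => {
  seen.push({ url, init });
  return { ok: true };
};

const authedFetch = withBearerToken(fakeFetch, 'example-token');
authedFetch('https://htsget.example.org/reads/sample', {
  headers: { Range: 'bytes=0-99' },
});
console.log(seen[0].init.headers);
```

Keeping the token in a wrapper rather than in each call site means existing headers such as Range pass through unchanged.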
Hi Colin,
I am curious about the seq_id in the returned BamRecord.
In the screenshot below you can see that the BamRecord I got has a seq_id of 6,
but in the header, 6 is chr6. The read should be on chr7, as I am querying a region on chr7.
Do I need to add 1 to the seq_id to get the right chromosome? Thanks.
By the way, why is the seq property undefined? It should contain the read sequence. Thank you.
Alternative naming conventions can be embedded in the BAM file via the SAM header
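On the numbering question: in the BAM format a record's reference ID is a 0-based index into the @SQ lines of the header, so the name is looked up by position rather than by adding 1 to a chromosome number. A minimal sketch, assuming a header whose @SQ lines run chr1, chr2, and so on in order (the header text and `refNamesFromHeader` are made up for illustration):

```javascript
// Hypothetical helper: list reference names in @SQ order from SAM header text.
function refNamesFromHeader(headerText) {
  return headerText
    .split('\n')
    .filter(line => line.startsWith('@SQ'))
    .map(line => {
      const sn = line.split('\t').find(field => field.startsWith('SN:'));
      return sn ? sn.slice(3) : undefined;
    });
}

// Made-up header with chr1..chr8; the lengths are placeholders.
const headerText = Array.from(
  { length: 8 },
  (_, i) => `@SQ\tSN:chr${i + 1}\tLN:1000`
).join('\n');

const refNames = refNamesFromHeader(headerText);
// Reference IDs are 0-based, so ID 6 is the seventh @SQ line.
console.log(refNames[6]);
```

Looking names up from the header also covers files that use other naming conventions, since nothing is assumed about what the names look like.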
I noticed that a section of volvox-sorted.bam was failing as I was integrating bam-js into JBrowse, e.g. somewhere in the region ctgA:30000-35000 produced the error "unknown compression method" (a narrower range can produce the same error, but that is the general region).
My tests in node.js pass for getRecordsForRange here, but if I manually switch the node.js code to use pakoUnzip, the tests fail in node.js and give the same "unknown compression method" error.
I am curious whether something the bam-js code is doing causes the unzip error or whether it is a pako-specific bug.
CC @rbuels
This is a stretch goal, but the CG tag allows very long reads to be stored despite the limit on the number of CIGAR operations in a BAM record
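For background, the SAM specification handles alignments with more than 65,535 CIGAR operations by storing the real CIGAR in a CG tag as an array of 32-bit integers, each packing the operation length in the upper 28 bits and the operation code in the lower 4. A rough sketch of turning such an array into a CIGAR string follows; `cigarFromCG` is a hypothetical name and this is not bam-js code.

```javascript
// CIGAR operation codes in BAM order.
const CIGAR_OPS = 'MIDNSHP=X';

// Hypothetical helper: decode packed uint32 CIGAR values into text.
function cigarFromCG(packedOps) {
  let cigar = '';
  for (const value of packedOps) {
    const length = value >>> 4; // upper 28 bits
    const op = CIGAR_OPS[value & 0xf]; // lower 4 bits
    cigar += `${length}${op}`;
  }
  return cigar;
}

// 99M, 2I, 10S encoded as (length << 4) | opIndex.
const example = Uint32Array.from([(99 << 4) | 0, (2 << 4) | 1, (10 << 4) | 4]);
console.log(cigarFromCG(example));
```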
Currently returns a big block of features
Hi,
I'm trying to validate types (I'm using TS through JSDoc) in my codebase using tsc, but I stumbled upon a couple of issues in @GMOD/[email protected].
The first one:
../../node_modules/@gmod/bam/dist/util.d.ts:4:44 - error TS2304: Cannot find name 'Long'.
4 export declare function longToNumber(long: Long): number;
~~~~
There's no "long" package imported in util.d.ts (https://cdn.jsdelivr.net/npm/@gmod/[email protected]/dist/util.d.ts), although it's in the original source file:
Line 1 in 492c145
🤷
If I check out v1.1.18 of this repo and compile the project, the long import is missing from util.d.ts. If I check out HEAD and compile, then it's there. Maybe there's some problem in the old [email protected] (?)
And then there's the other issue:
../../node_modules/@gmod/bam/dist/htsget.d.ts:14:5 - error TS2416: Property '_readChunk' in type 'HtsgetFile' is not assignable to the same property in base type 'BamFile'.
Type '(params: { chunk: { buffer: Buffer; chunk: Chunk; }; opts: BaseOpts; }) => Promise<{ data: Buffer; cpositions: null; dpositions: null; chunk: Chunk; }>' is not assignable to type '({ chunk, opts }: { chunk: Chunk; opts: BaseOpts; }) => Promise<{ data: any; cpositions: any; dpositions: any; chunk: Chunk; }>'.
Types of parameters 'params' and '__0' are incompatible.
Type '{ chunk: Chunk; opts: BaseOpts; }' is not assignable to type '{ chunk: { buffer: Buffer; chunk: Chunk; }; opts: BaseOpts; }'.
Types of property 'chunk' are incompatible.
Type 'Chunk' is missing the following properties from type '{ buffer: Buffer; chunk: Chunk; }': buffer, chunk
14 _readChunk(params: {
~~~~~~~~~~
Indeed, they are different:
Line 401 in 492c145
Line 110 in 492c145
However, I don't have enough knowledge of TypeScript to say whether these should be compatible or not.
And btw, I'm using [email protected].
The README needs a lot of expansion.
In particular, need to document the options to getRecordsForRange: viewAsPairs, pairAcrossChr, maxInsertSize, and signal
See GMOD/tabix-js#17
According to samtools, a small region of a PacBio BAM file contains 34 reads:
% samtools view https://s3.amazonaws.com/jbrowse.org/genomes/hg19/pacbio/m64011_181218_235052.8M.HG002.hs37d5.11kb.bam 1:10000-10600 | wc
34 771 777929
With a small script we get fewer:
const fetch = require('cross-fetch');
const { RemoteFile } = require('generic-filehandle');
const { BamFile } = require('@gmod/bam');

// Remote BAM file and its BAI index, both fetched with cross-fetch
const b = new BamFile({
  bamFilehandle: new RemoteFile('https://s3.amazonaws.com/jbrowse.org/genomes/hg19/pacbio/m64011_181218_235052.8M.HG002.hs37d5.11kb.bam', { fetch }),
  baiFilehandle: new RemoteFile('https://s3.amazonaws.com/jbrowse.org/genomes/hg19/pacbio/m64011_181218_235052.8M.HG002.hs37d5.11kb.bam.bai', { fetch }),
});

(async () => {
  // Load the header, then query the same region as the samtools command
  await b.getHeader();
  const ret = await b.getRecordsForRange('1', 9999, 10600);
  console.log(ret.length);
})();
// outputs 12
git bisect narrows down squarely to #36
Removing the final vestige of the chunk merging code makes this problem go away for the most part, but actually returns 37 reads instead of 34
https://github.com/GMOD/bam-js/tree/remove_more_chunk_merging
Will need to make it match what samtools returns
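When comparing counts like these, it helps to confirm that both tools are asked for the same interval. The samtools region 1:10000-10600 is 1-based and inclusive, while the script passes 9999 and 10600, which is what a 0-based half-open range for the same interval would look like. A small sketch of that conversion, where `parseRegion` is a hypothetical helper and the 0-based half-open convention for getRecordsForRange is an assumption drawn from the script above:

```javascript
// Hypothetical helper: convert a samtools-style region ("1:10000-10600",
// 1-based inclusive) into 0-based half-open coordinates.
function parseRegion(region) {
  const match = /^(.+):(\d+)-(\d+)$/.exec(region);
  if (!match) {
    throw new Error(`unrecognized region: ${region}`);
  }
  const [, refName, start, end] = match;
  // Only the start shifts: the inclusive end equals the exclusive 0-based end.
  return { refName, start: Number(start) - 1, end: Number(end) };
}

console.log(parseRegion('1:10000-10600'));
```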
Thank you for this awesome library!
I was using this in HiGlass/Gosling.js with the opt.pairAcrossChr option turned on
bamFile.getRecordsForRange(..., {
viewAsPairs: true,
pairAcrossChr: true
})
but was getting the following error and warning messages:
Error: data size of 1,073,213,398 bytes exceeded fetch size limit of 500,000,000 bytes
Could you suggest how to work around this issue?
Perhaps searching with pairAcrossChr is an expensive operation and is not supported for large data? (E.g., I was using a 200GB BAM file with a BAI file.)
In samtools
ms:i:47450 AS:i:47542
In JBrowse (2)
ms -18086
AS -17994
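The two sets of numbers differ by exactly 65,536 (47450 - 65536 = -18086 and 47542 - 65536 = -17994), which is consistent with an unsigned 16-bit value being read as signed. The BAM format distinguishes the two with tag types `S` (uint16) and `s` (int16). A small illustration with a DataView; it shows the arithmetic only, not where bam-js performs the read:

```javascript
// Write 47450 as an unsigned 16-bit little-endian value, as BAM type 'S' would.
const view = new DataView(new ArrayBuffer(2));
view.setUint16(0, 47450, true);

const asUnsigned = view.getUint16(0, true); // the value samtools reports
const asSigned = view.getInt16(0, true); // the negative value reported above
console.log(asUnsigned, asSigned);
```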
Issues like this show that large amounts of data with small genomes e.g. covid can bog down the browser
There are some things that this module could possibly do to speed it up
See tabix discussion
I am experimenting with the viewAsPairs option (which is great to have). My understanding is that if two reads have the same read name, then they are mate pairs.
In the example below, I have a read pair (extracted from the getRecordsForRange result). The problem is that the second read does not have an end position. Is this intended? I am using the latest version of the tool.
Thanks!
Content-Encoding: gzip often confuses not just JBrowse but the web browser as well
In some limited cases, we can skip the pako unzip step, as this PR demonstrates, and this will decode the BAM header, but it still fails on the actual BAM data. I believe the error at that point comes from the browser and not from any JS code, so this may be the limiting factor
Example of allowing the header to be read:
https://github.com/GMOD/bam-js/compare/t1?expand=1
The current code does this in a weird way where a second invocation of index parsing adds this...
Hi Colin,
I am using the latest version from npm. I am uploading both the BAM and .bai files through a file input and constructing the BamFile from the two File objects. From console.log I can see the BamFile is constructed successfully, but I get an error afterwards. Could you please take a look? Here are the test code, the test BAM file, and the index file. Thanks.
I am using getRecordsForRange to extract data from a BAM file. I can successfully retrieve feature.get('MD') and feature.get('cigar'), but feature.get('seq') is undefined. I am using the latest version. With samtools view I can see the sequence. Any help is appreciated. Thanks!
could use bam-js machinery or become a separate module
Mentioned here GMOD/tabix-js#2