crg-barcelona / bwtool
A tool for bigWig files.
Home Page: https://github.com/CRG-Barcelona/bwtool/wiki
License: Other
This may be more universal than just pasting or using aggregate, but the features using multiple bigWigs should do a check to make sure the internal chromosome sizes are all consistent among multiple bigWigs in case someone uses bigWigs from different assemblies/species by accident.
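A minimal sketch of such a check, assuming the chromosome names and sizes have already been read out of each bigWig into a simple table. The `chrom_size` struct and both function names are hypothetical and are not part of bwtool or the Kent library:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one chromosome name and its length in bases. */
struct chrom_size {
    const char *name;
    unsigned long size;
};

/* Look up a chromosome by name; returns NULL when it is absent. */
const struct chrom_size *find_chrom(const struct chrom_size *table,
                                    size_t n, const char *name)
{
    size_t i;
    assert(table != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/* Return 1 if every chromosome present in both tables has the same size,
 * 0 otherwise. Chromosomes missing from one table are not treated as
 * conflicts here; a stricter check could reject those as well. */
int chrom_sizes_consistent(const struct chrom_size *a, size_t na,
                           const struct chrom_size *b, size_t nb)
{
    size_t i;
    for (i = 0; i < na; i++) {
        const struct chrom_size *other = find_chrom(b, nb, a[i].name);
        if (other != NULL && other->size != a[i].size)
            return 0;
    }
    return 1;
}
```

The linear lookup is quadratic in the number of chromosomes, which is probably fine for typical assemblies; a hash would be the obvious replacement for assemblies with many scaffolds.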
This one is a little confusing. Doing the obvious thing to fix it results in failed checks.
Summarize regions not in bed. Low priority.
That's probably enough. Maybe later an example plot in R for the next milestone.
Anytime a negative number is given on the command-line it's interpreted as an option. I need to keep a central list of all the options so early in the program this ambiguity is resolved.
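One possible way to resolve the ambiguity, sketched under the assumption that no option name is itself a valid number. Both function names are hypothetical and this is not the existing option parser:

```c
#include <assert.h>
#include <stdlib.h>

/* Return 1 if arg parses entirely as a number (e.g. "-5", "-0.25", "1e3"). */
int is_number(const char *arg)
{
    char *end;
    assert(arg != NULL);
    if (*arg == '\0')
        return 0;
    (void)strtod(arg, &end);
    /* A number must consume at least one character and leave nothing over. */
    return end != arg && *end == '\0';
}

/* Return 1 if arg should be treated as an option: it starts with '-',
 * is longer than a bare "-", and is not a negative number. */
int looks_like_option(const char *arg)
{
    assert(arg != NULL);
    return arg[0] == '-' && arg[1] != '\0' && !is_number(arg);
}
```

Note that strtod also accepts spellings such as "-inf" and "-nan", so an option with one of those names would still need the central option list to take priority.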
At the moment, chromosome size does not change after shifting up (5'). Maybe it should be an option.
Major bug that needs fixing.
It's pretty good at the moment.
With one of our 250 MB bigwig test files we get this error message:
Fatal error: Exit code 1 ()
errAbort re-entered due to out-of-memory condition. Exiting.
What does it mean? This process was able to allocate 30GB of memory.
On some input files I have the following crash with bwtool matrix -cluster=10
./bwtool matrix -keep-be…” terminated by signal SIGSEGV (Address boundary error)
I traced the error to this code in beato/cluster.c
static int *k_means(struct cluster_bed_matrix *cbm, double t)
{
    /* output cluster label for each data point */
    int *labels; /* Labels for each cluster (size n) */
    int h, i, j; /* loop counters, of course :) */
    double old_error;
    double error = DBL_MAX; /* sum of squared euclidean distance */
    double **tmp_centroids; /* centroids and temp centroids (size k x m) */
    int n = cbm->n;
    int m = cbm->m;
    int k = cbm->k;
    AllocArray(labels, n);
    AllocArray(tmp_centroids, k);
    printf("k_means: 0\n");
    for (i = 0; i < k; i++)
        AllocArray(tmp_centroids[i], m);
    /* assert(data && k > 0 && k <= n && m > 0 && t >= 0); /\* for debugging *\/ */
    /* initialization */
    printf("k_means: 1\n");
    for (i = 0, h = cbm->num_na; i < k; h += (cbm->n-cbm->num_na) / k, i++)
    {
        printf("k_means: 1:%d\n", i);
        /* pick k points as initial centroids */
        for (j = 0; j < m; j++) {
            printf("k_means: 1:%d %d %d %d\n", i, j, m, h);
            cbm->centroids[i][j] = cbm->pbm->matrix[h][j];
        }
    }
    ...
For a working file:
do_kmeans_sort
do_kmeans_sort: 0
do_kmeans_sort float: 0.001000
k_means: 0
k_means: 1
k_means: 1:0
k_means: 1:0 0 10000 19982
k_means: 1:0 1 10000 19982
k_means: 1:0 2 10000 19982
k_means: 1:0 3 10000 19982
...
k_means: 1:9 9993 10000 19991
k_means: 1:9 9994 10000 19991
k_means: 1:9 9995 10000 19991
k_means: 1:9 9996 10000 19991
k_means: 1:9 9997 10000 19991
k_means: 1:9 9998 10000 19991
k_means: 1:9 9999 10000 19991
k_means: 2
k_means: 3
do_kmeans_sort: 1
do_kmeans_sort: 2
do_kmeans_sort: 3
do_kmeans_sort: 4
do_kmeans_sort: 5
output_matrix
For a non-working file:
do_kmeans_sort
do_kmeans_sort: 0
do_kmeans_sort float: 0.001000
k_means: 0
k_means: 1
k_means: 1:0
k_means: 1:0 0 10000 20000
Segmentation fault (core dumped)
I think h (=20000) is calculated wrongly:
I have a window size of 10000 and 20000 regions in my BED file.
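In the quoted loop h starts at cbm->num_na, so h = 20000 with 20000 regions would mean that every row was counted as NA and matrix[h] is one past the last row. That reading is an inference from the log output, not something confirmed in the source. A minimal sketch of a guard under that assumption, with hypothetical function names and with the NA rows assumed to occupy the first num_na positions:

```c
#include <assert.h>

/* Return 1 if k initial centroids can be picked from the rows that
 * follow the num_na NA rows, 0 otherwise. */
int kmeans_init_possible(int n, int num_na, int k)
{
    assert(n >= 0 && num_na >= 0 && num_na <= n);
    return k > 0 && (n - num_na) >= k;
}

/* Row index of the i-th initial centroid, mirroring the loop in the
 * snippet: start at num_na and advance by (n - num_na) / k per cluster. */
int kmeans_init_row(int n, int num_na, int k, int i)
{
    assert(kmeans_init_possible(n, num_na, k));
    assert(i >= 0 && i < k);
    return num_na + i * ((n - num_na) / k);
}
```

With n = 20000, num_na = 19982 and k = 10 this gives rows 19982 through 19991, which matches the working log. With num_na = 20000 the guard fails, so the caller could report that there are too few complete rows to cluster instead of crashing.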
I tried with lift on a big file and it dumped a 26 GB core file. Maybe find some smaller examples.
Seems like an infinite loop or something that it's just not working at all. Never mind whether the output is random-seeming. There's no output.
Pretty good already. More scripts or big examples could be useful.
Maybe allow bigWig-writing programs to end it at the wig-writing step.
The chrom sizes handed to the bigWig writer are the ones from the source bigWig and not the destination. Big problem.
We are trying to integrate bwtool into Galaxy https://github.com/galaxyproject/tools-iuc/compare/bwtool?expand=1, but unfortunately the latest release is crashing for every huge bigwig file. Compiling from master seems to work and a new release would make our life much easier.
Thanks,
Bjoern
Hey Andy,
I'm trying to run "bwtool agg" on ERCC spike-in sequences but it crashes. ERCC spike-ins are a set of 92 synthetic, unspliced sequences that one adds to an RNA mix before making a cDNA library. Conceptually, I guess each of those 92 sequences could be considered a separate chromosome. Their sequences are known, so one can map RNAseq reads onto them and generate BigWigs.
Here's the command I've used, followed by its standard error:
$ ~/bin/bwtool/bwtool agg -long-form -header -expanded 0:100:0 hsAll_Cap1_all_bothAdapters.ERCC.bw ERCC.bed /dev/stdout
Segmentation fault (core dumped)
You can get the two input files here:
http://genome.crg.es/~jlagarde/tmp/ERCC.bed
http://genome.crg.es/~jlagarde/tmp/hsAll_Cap1_all_bothAdapters.ERCC.bw
If you could look into this issue it would be great!
Cheers
Jakob recommended this because it's present in some of the programs but not others.
What determines the chromosome order in "bwtool window"? For example I get this order on my bigwigs (just look at the first column):
GL456396.1 21200 21225 111
GL456354.1 195950 195975 2
GL456382.1 23125 23150 2
JH584298.1 184150 184175 2
GL456367.1 42025 42050 0
GL456216.1 66625 66650 8
GL456381.1 25825 25850 2
JH584297.1 205750 205775 0
GL456366.1 47025 47050 0
GL456394.1 24275 24300 14
GL456379.1 72350 72375 2
JH584296.1 199325 199350 4
MT 16250 16275 33
GL456393.1 55675 55700 2
GL456378.1 31575 31600 6
JH584295.1 1950 1975 0
GL456392.1 23600 23625 63
GL456350.1 227925 227950 4
GL456213.1 39300 39325 2
JH584294.1 191875 191900 0
JH584304.1 114425 114450 842
GL456212.1 153575 153600 2
JH584293.1 207925 207950 4
JH584303.1 158050 158075 0
GL456239.1 40025 40050 48
GL456389.1 28725 28750 0
GL456390.1 24625 24650 29
GL456211.1 241700 241725 0
19 61431525 61431550 0
18 90702600 90702625 0
17 94987225 94987250 0
16 98207725 98207750 0
15 104043650 104043675 0
14 124902200 124902225 0
13 120421600 120421625 0
12 120128975 120129000 0
11 122082500 122082525 0
10 130694950 130694975 0
JH584292.1 14900 14925 4
JH584302.1 155800 155825 1
GL456210.1 169700 169725 0
JH584301.1 259850 259875 0
GL456359.1 22925 22950 8
GL456360.1 31675 31700 6
GL456387.1 24650 24675 0
JH584300.1 182300 182325 0
GL456372.1 28625 28650 14
GL456221.1 206925 206950 4
GL456385.1 35200 35225 2
GL456219.1 175925 175950 0
GL456370.1 26725 26750 0
Y 91744650 91744675 0
X 171031250 171031275 0
GL456233.1 336900 336925 0
9 124595075 124595100 0
8 129401175 129401200 0
7 145441425 145441450 0
6 149736500 149736525 0
5 151834650 151834675 0
4 156508075 156508100 0
3 160039650 160039675 0
2 182113175 182113200 0
1 195471925 195471950 0
The order seems a bit random. My problem is that I just want to use the "bwtool window" output from column 4 to the last column, and for example concatenate this output from multiple files. But then I need to be sure about the ordering of the regions/chromosomes. Can I be sure that for similar bigwigs the chromosome order is always the same?
Thanks!
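Whether bwtool guarantees a stable chromosome order is not answered here. One workaround is to put each output into a known order before concatenating. A minimal sketch, assuming the four-column rows shown above have already been parsed into a hypothetical `window_row` struct:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed row of "bwtool window" output: chrom, start, end, value. */
struct window_row {
    const char *chrom;
    long start;
    long end;
    double value;
};

/* qsort comparator: order by chromosome name (plain strcmp), then start, then end. */
int cmp_window_row(const void *pa, const void *pb)
{
    const struct window_row *a = pa;
    const struct window_row *b = pb;
    int c = strcmp(a->chrom, b->chrom);
    if (c != 0)
        return c;
    if (a->start != b->start)
        return a->start < b->start ? -1 : 1;
    return (a->end > b->end) - (a->end < b->end);
}

/* Sort rows in place so that every file ends up in the same order. */
void sort_window_rows(struct window_row *rows, size_t n)
{
    assert(rows != NULL || n == 0);
    if (n > 1)
        qsort(rows, n, sizeof rows[0], cmp_window_row);
}
```

The order is lexicographic, so "10" sorts before "2". That does not matter for concatenation as long as every file is sorted the same way and covers the same regions.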
This would make sense. After all, only one line is outputted with -total
In principle for the first milestone it's done.
It's good enough for this milestone.
It only outputs one? Strange.
This should be ok.
I think perhaps there are several other options that expect an argument and if one isn't given it fails similarly. The current error message makes sense to me but probably wouldn't for a typical user.
Jakob also suggested this for summary. With the sum of squares, the variance can be calculated. This perhaps makes the standard deviation redundant.
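For reference, a minimal sketch of that calculation using the identity Var(x) = E[x^2] - E[x]^2. The function name is hypothetical, and this computes the population variance (dividing by n), which may or may not be what summary should report:

```c
#include <assert.h>

/* Population variance from the count, the sum, and the sum of squares.
 * Uses Var(x) = E[x^2] - E[x]^2. */
double variance_from_sums(double sum, double sum_sq, unsigned long n)
{
    double mean, var;
    assert(n > 0);
    mean = sum / (double)n;
    var = sum_sq / (double)n - mean * mean;
    /* Rounding can push the result slightly below zero; clamp it. */
    return var < 0.0 ? 0.0 : var;
}
```

The standard deviation is the square root of this value. One caveat: this formula can lose precision when the mean is large relative to the spread of the data, so an independently computed standard deviation is not entirely redundant.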
It seems that bwtool is not compatible with the binary_format identifier declared in /usr/local/include/htslib/hts.h. How can I solve it?
make[1]: Entering directory `/home/user/repo/tools/libbeato/beato'
gcc -DHAVE_CONFIG_H -I. -I.. -Wall -Wformat -Wimplicit -Wreturn-type -Wuninitialized -fno-strict-aliasing -g -O0 -I/home/user/include -MT metaBigBam.o -MD -MP -MF .deps/metaBigBam.Tpo -c -o metaBigBam.o metaBigBam.c
metaBigBam.c:572:13: error: ‘binary_format’ redeclared as different kind of symbol
static void binary_format(char *s, int num)
^
In file included from /usr/local/include/htslib/sam.h:30:0,
from ../beato/metaBig.h:20,
from metaBigBam.c:16:
/usr/local/include/htslib/hts.h:86:5: note: previous definition of ‘binary_format’ was here
binary_format, text_format,
^
make[1]: *** [metaBigBam.o] Error 1
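The compiler note shows that hts.h declares binary_format as an enumerator. Enumerators and functions share C's ordinary identifier namespace, so a function of the same name cannot be declared in a file that includes that header. Renaming the local helper is one way around it. A minimal sketch: the enum below is a stand-in that reproduces only the two enumerators visible in the error output, and the new name `format_as_binary`, its signature, and its body are all assumptions for illustration, not the actual code in metaBigBam.c:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Stand-in for the htslib enum from the compiler note. */
enum example_format { binary_format, text_format };

/* Renamed helper. A function called binary_format would clash with the
 * enumerator above; any other name compiles. The body is illustrative
 * only: it writes num as a string of '0'/'1' characters into s, which
 * must hold sizeof(unsigned int) * CHAR_BIT + 1 bytes. */
void format_as_binary(char *s, unsigned int num)
{
    int bit;
    assert(s != NULL);
    for (bit = (int)(sizeof num * CHAR_BIT) - 1; bit >= 0; bit--)
        *s++ = ((num >> bit) & 1u) ? '1' : '0';
    *s = '\0';
}
```

In the real file, every call site of the old name would need the same rename.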
It needs a -header option. -expanded is confusing.
Seems ok for now.
When I run the command "make" in the bwtool folder, something goes wrong and I don't know how to fix it. Could you please help me fix this problem?
$ make
/Applications/Xcode.app/Contents/Developer/usr/bin/make all-recursive
Making all in tests
make[2]: Nothing to be done for `all'.
gcc -DHAVE_CONFIG_H -I. -I/sw/include -MT aggregate.o -MD -MP -MF .deps/aggregate.Tpo -c -o aggregate.o aggregate.c
aggregate.c:7:26: fatal error: jkweb/common.h: No such file or directory
compilation terminated.
make[2]: *** [aggregate.o] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2
I think the build of libbeato went normally.
Thank you!
Hello,
Just a question to clarify the license:
It seems like the resulting binary program will be under both "GPL3" (which specifically requires the program to be OK for all purposes, including commercial use) and Jim Kent's license (which forbids commercial use).
So basically, it looks like an impossible license... (since libjkweb is an integral part of the program).
I don't know when it started doing that but it doesn't seem to really matter except that it's annoying to wait for it again. It's probably some extra line in the Makefile.am or configure.ac triggering it. Hmm. It's probably easiest to figure out from the generated Makefile.
I think there was a problem with the outputting when inputting multiple beds. It seemed that the output would be consecutive as opposed to concurrent (3 columns). This is probably an easy fix.
Seems to affect both matrix and aggregate. Not yet reproducible in small examples. It's a bit worrying. Perhaps a rollback to an earlier cluster.c is necessary.
Seems not to make correct labels when clustering.
There are none. Just the command summary when no arguments are used.
This remains an issue because doubles are stored as signed values, and it's never clear whether all the other bits are zero anyway. Casting tricks aren't so desirable, and care must be taken to ensure that the cast-to integer has the same number of words/bits.
bwtool remove or other places ignoring zero may be changed to use epsilon values to approximate zero. I dunno.
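A minimal sketch of what an epsilon test could look like. The function name and the idea of passing the tolerance as a parameter are assumptions, and the right tolerance would depend on the data:

```c
#include <assert.h>

/* Return 1 if x lies within eps of zero, 0 otherwise.
 * A NaN input compares false on both sides and so returns 0. */
int near_zero(double x, double eps)
{
    assert(eps >= 0.0);
    return x >= -eps && x <= eps;
}
```

Calling near_zero(value, 0.0) reproduces the exact comparison with zero, so existing behaviour could stay the default.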
This is sort of a bug. It doesn't crash, it's just that -expanded doesn't do anything with -clusters. At the least, a message should be provided that says this enhancement isn't available yet.
It's good enough for the milestone.
Hi,
A tiny bug:
bwtool extract generates double commas after an NA (and -tabs does not suppress all commas).
This is due to line 71 in extractOutBed() from extract.c:
fprintf(out, "NA,%c", (tabs) ? '\t' : ',');
... which should be:
fprintf(out, "NA%c", (tabs) ? '\t' : ',');
Thanks,
Guy
Seen again with bwtool summary not finding the bigWig.
I should put this back in ifdef'd based on whether GSL is linked in or not.
This should also be ok.