lh3 / cgranges
A C/C++ library for fast interval overlap queries (with a "bedtools coverage" example)
License: MIT License
I noticed a difference in behaviour between the contain and overlap functions.
First, add a zero-length interval with cr_add(cr, "chr1", 10, 10, 0);
overlap ("chr1", 10, 20) finds nothing.
contain ("chr1", 10, 20) finds this interval.
#include <stdio.h>
#include <stdlib.h>
#include "cgranges.h"

int main(void)
{
    cgranges_t *cr = cr_init();
    cr_add(cr, "chr1", 10, 10, 0); /* zero-length interval */
    cr_index(cr);
    int64_t i, n, *b = 0, max_b = 0;
    /* query with cr_overlap */
    n = cr_overlap(cr, "chr1", 10, 11, &b, &max_b);
    for (i = 0; i < n; ++i)
        printf("%d\t%d\t%d\n", cr_start(cr, b[i]), cr_end(cr, b[i]), cr_label(cr, b[i]));
    free(b);
    cr_destroy(cr);
    return 0;
}
# no output
#include <stdio.h>
#include <stdlib.h>
#include "cgranges.h"

int main(void)
{
    cgranges_t *cr = cr_init();
    cr_add(cr, "chr1", 10, 10, 0); /* zero-length interval */
    cr_index(cr);
    int64_t i, n, *b = 0, max_b = 0;
    /* same query, this time with cr_contain */
    n = cr_contain(cr, "chr1", 10, 11, &b, &max_b);
    for (i = 0; i < n; ++i)
        printf("%d\t%d\t%d\n", cr_start(cr, b[i]), cr_end(cr, b[i]), cr_label(cr, b[i]));
    free(b);
    cr_destroy(cr);
    return 0;
}
10 10 0
This behaviour was somewhat surprising. Is this normal behaviour? I'm not very familiar with genomics, so I don't know exactly how it should behave.
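One possible reading of the result above, sketched under the assumption that intervals are treated as half-open [start, end): a zero-length interval covers no position, so it cannot overlap anything, yet both of its endpoints still fall inside the query. The two helpers below are hypothetical and are not part of cgranges; they only illustrate that reading.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (NOT the cgranges API): predicates on half-open
 * intervals [st, en), which is an assumption about the coordinate system. */

/* True if stored [st, en) shares at least one position with query [qst, qen). */
static int overlaps(int32_t st, int32_t en, int32_t qst, int32_t qen)
{
    assert(st <= en && qst <= qen);
    return st < qen && en > qst;
}

/* True if stored [st, en) lies entirely inside query [qst, qen). */
static int contained(int32_t st, int32_t en, int32_t qst, int32_t qen)
{
    assert(st <= en && qst <= qen);
    return st >= qst && en <= qen;
}
```

For the stored interval (10, 10) and the query (10, 11), overlaps() is false because en > qst fails (10 > 10), while contained() is true because 10 >= 10 and 10 <= 11. That matches the two outputs reported above, but whether it is the intended behaviour is for the maintainer to confirm.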
I am very interested in wrapping this library in Cython and implementing functionality to avoid the Python overhead by allowing for batch queries. This is because I might want to use CRanges in my PyRanges library.
Hi!
I am pleased to inform you that I have created a Ruby language binding for cgranges.
This is a native Ruby C extension, unlike ruby-minimap2, which uses ffi.
Understanding the algorithm is difficult for me, but I can call it anyway.
Have a nice day!
Getting a warning about variable shadowing.
../src/third-party/cgranges/IITree.h:48:35: warning: declaration shadows a field of 'IITree<S, T>' [-Wshadow]
bool operator()(const Interval &a, const Interval &b) const { return a.st < b.st; }
^
../src/third-party/cgranges/IITree.h:50:24: note: previous declaration is here
std::vector<Interval> a;
^
../src/third-party/cgranges/IITree.h:52:40: warning: declaration shadows a field of 'IITree<S, T>' [-Wshadow]
int index_core(std::vector<Interval> &a) {
^
../src/third-party/cgranges/IITree.h:50:24: note: previous declaration is here
std::vector<Interval> a;
Dear @lh3,
While running the sanitizers (address, undefined), I hit errors in IITree. I attempted debugging, but was unable to figure out the shifts. I would really appreciate it if you could take a look at this undefined behavior and at the variable name shadowing. Let me know if I can help in any way.
Best wishes,
Zev
+ SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../src/third-party/cgranges/IITree.h:84:41 in
+ ../src/third-party/cgranges/IITree.h:84:54: runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be represented in type 'long long'
+ SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../src/third-party/cgranges/IITree.h:84:54 in
+ ../src/third-party/cgranges/IITree.h:88:24: runtime error: shift exponent -1 is negative
+ SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../src/third-party/cgranges/IITree.h:88:24 in
+ ../src/third-party/cgranges/IITree.h:88:31: runtime error: shift exponent -1 is negative
+ SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../src/third-party/cgranges/IITree.h:88:31 in
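A "shift exponent -1 is negative" report would be expected if a level counter is -1 (for example, for an empty tree) and is then used as a shift amount. The sketch below assumes an implicit binary tree whose root sits at index (1 << max_level) - 1; the function name root_index and that layout are assumptions made for illustration, not code taken from IITree.h.

```c
#include <stdint.h>

/* Hypothetical helper (not from IITree.h): index of the root of an implicit
 * tree with levels 0..max_level, or -1 if no valid root exists. */
static int64_t root_index(int max_level)
{
    /* Guard first: 1 << -1 and 1 << 63 are both undefined for int64_t. */
    if (max_level < 0 || max_level > 62)
        return -1;
    return ((int64_t)1 << max_level) - 1;
}
```

Checking the level before shifting means the undefined expression is never evaluated, so the follow-on "signed integer overflow" report on the subtraction would not occur either.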
Hi,
6da723 (fixes #7) introduced if (max_level < 0) return false; into cpp/IITreeBFS.h, but max_level is only defined and used in the IITree class of cpp/IITree.h ...?
On current head (2fb5a2):
$ make
g++ -g -Wall -O3 -D_USE_BFS -std=c++98 -I.. -I../cpp bedcov-iitree.cpp -lz -o bedcov-iitree-bfs
In file included from bedcov-iitree.cpp:6:
../cpp/IITreeBFS.h: In member function ‘bool IITree<S, T>::overlap(const S&, const S&, std::vector<long unsigned int>&) const’:
../cpp/IITreeBFS.h:65:7: error: ‘max_level’ was not declared in this scope
65 | if (max_level < 0) return false;
| ^~~~~~~~~
make: *** [bedcov-iitree-bfs] Error 1
Thank you!
I noticed that I cannot delete a specific item in this tree. Please add this method…
Thanks for the library.
I am wondering whether you have other types of queries planned. GRanges allows for many kinds of genomic queries, such as finding the nearest interval and so on.
Hi, thanks for the nice library. I wrapped it for nim-lang, thinking it would be faster than my naive library.
See timings here:
https://github.com/brentp/nim-cgranges
My library (here: https://github.com/brentp/nim-lapper) has horrible worst-case performance given some huge intervals, because it's just sorted by start, with knowledge of the longest interval. But it does seem to work well for most genomic data types, which often do not have huge intervals.
I wonder if you could see whether a simple data structure like the one in nim-lapper holds up in your timings. (Or if you see something wrong with my code that would result in slowing down cgranges; I was surprised at the result.)
current timings here: https://github.com/brentp/nim-cgranges#speed
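The structure described above (intervals sorted by start, plus the length of the longest interval) can be sketched roughly as follows. This is a minimal illustration in C, not the nim-lapper code; the type iv_t, the function count_overlaps, and the half-open [st, en) convention are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int32_t st, en; } iv_t; /* half-open [st, en), assumed */

/* Count intervals overlapping [qst, qen).
 * `a` must be sorted by st; max_len is the longest (en - st) in `a`. */
static size_t count_overlaps(const iv_t *a, size_t n, int32_t max_len,
                             int32_t qst, int32_t qen)
{
    size_t lo = 0, hi = n, cnt = 0;
    /* An interval starting before qst - max_len must end before qst. */
    int64_t min_st = (int64_t)qst - max_len;
    /* Binary search for the first interval with st >= min_st. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid].st < min_st) lo = mid + 1;
        else hi = mid;
    }
    /* Linear scan until intervals start at or after the query end. */
    for (; lo < n && a[lo].st < qen; ++lo)
        if (a[lo].en > qst) ++cnt;
    return cnt;
}
```

The scan window grows with max_len, so a single huge interval forces every query to walk over many non-overlapping entries, which is the worst case mentioned above.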
@lh3 I would like to use cgranges via a conda install, but to add the recipe to bioconda a release is needed (see this link for why). This can be as simple as tagging the current main commit as 0.1.1. Any chance that this can be done?