passionlab / bella
BELLA: a Computationally-Efficient and Highly-Accurate Long-Read to Long-Read Aligner and Overlapper
License: Other
Hello,
I'm attempting to use BELLA but have encountered a problem in the alignment step.
The installation is on CentOS 3.10, using gcc 6.3.0, with a virtualenv pip install of simplesam. When I run the test files using ./bella -i test.txt -o bella_output -d 30
the produced output file 'bella_output.out' is empty.
Looking over the print out I find this 'Average length of successful alignment -nan bp'. Everything else seems fine.
paeruginosa30x_0001_5reads.fastq: 0 MB
K-mer counting: BELLA
Output filename: bella_output.out
K-mer length: 17
X-drop: 7
Depth: 30X
Compute alignment: true
Seeding: two-kmer
Running with up to 4 threads
Reading FASTQ file paeruginosa30x_0001_5reads.fastq
Initial parsing, error estimation, and k-mer loading took: 0.0749692s
Cardinality estimate is 27251
Table size is: 186907 bits, 0.0222811 MB
Optimal number of hash functions is: 5
First pass of k-mer counting took: 0.0391129s
Second pass of k-mer counting took: 0.0153589s
Entries within reliable range: 231
Error rate estimate is -nan
Reliable lower bound: 2
Reliable upper bound: 30
Deviation from expected alignment score: 0.2
Constant of adaptive threshold: -nan
Running with up to 4 threads
Reading FASTQ file paeruginosa30x_0001_5reads.fastq
Fastq(s) parsing fastq took: 0.0317728s
Total number of reads: 5
Old number of nonzeros before merging: 474
New number of nonzeros after merging: 474
Old number of nonzeros before merging: 474
New number of nonzeros after merging: 474
Sparse matrix construction took: 0.00283812s
Available RAM is assumed to be: 8000 MB
FLOPS is 255
nnz(output): 10 | free memory: 8.38861e+09 | required memory: 288
Stages: 1 | max nnz per stage: 291271111
Columns [0 - 5] overlap time: 0.00734909s
Creating or appending to output file with 0 MB
Columns [0 - 5] alignment time: 0.0329758s | alignment rate: 937294 bases/s | average read length: 5556.6 | read pairs aligned this stage: 10
Average length of successful alignment -nan bps
Average length of failed alignment 5788.6 bps
Outputted 0 lines in 0.0115177s
Total running time: 0.346414s
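A note on the "-nan" values in the log above: the run reports 0 output lines, so one plausible reading is that the average is computed as a total length divided by a count of zero successful alignments. Under IEEE 754 arithmetic, 0.0 / 0.0 yields NaN, which glibc on x86 typically prints as "-nan". The sketch below only illustrates that behaviour and a possible guard. The function names are hypothetical and this is not BELLA's actual code.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper (not part of BELLA): naive average, as it might be
// computed without checking the count. With totalLength == 0.0 and
// count == 0 this evaluates 0.0 / 0.0, which is NaN under IEEE 754.
double naiveMeanLength(double totalLength, std::size_t count)
{
    return totalLength / static_cast<double>(count);
}

// Hypothetical guarded variant: reports 0.0 when nothing was counted,
// so the summary line stays finite even if no alignment succeeded.
double meanLength(double totalLength, std::size_t count)
{
    if (count == 0)
    {
        return 0.0;
    }
    return totalLength / static_cast<double>(count);
}
```

A guard like this only tidies the printout. If this reading is right, the underlying question is still why none of the 10 read pairs passed the alignment threshold, and the "Error rate estimate is -nan" line earlier in the log may be related.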
SELF_TODO: implement lower/upper bound as input parameter.
I'm getting the same error as issue #32 with a copy of the repository cloned today, in the same system background as described in issue #32 - Ubuntu 20.04, nvcc 11.6, gcc 9.4. The comment in that issue that LOGAN may have issues with "large-ish" input is a concern, because I'd like to use bella-gpu on a file containing about 600 gigabases of long-read sequences.
Two questions: (1) Is there a fix for the compile failure other than to install different versions of nvcc and gcc?
(2) If I'm successful in compiling, will bella-gpu process 600 gigabases of input?
Thanks!
I was doing some testing on my computer and I couldn't type the letter 'b' in the control centre
gcc -O3 -fopenmp -c -o bound.o kmercode/bound.cpp
In file included from /usr/include/c++/4.9/array:35:0,
from kmercode/bound.cpp:11:
/usr/include/c++/4.9/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support is currently experimental, and must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
#error This file requires compiler and library support for the
^
make: *** [bound.o] Error 1
how can I solve the errors? thanks!
Hi,
I've been trying to use bella b150701 to assemble a small portion of the human genome. Whenever I try to use it though I run into the issue that bella apparently sometimes wants insane amounts of memory.
./bella -k 13 -i input.fastq -o test-bella -d 3
sometimes gives
@id1 : 133599645 MB
ACGT .... : 133599645 MB
: 133599645 MB
When I let this run, the memory fills up until I have to restart my computer.
Only sometimes do I get something more feasible (with the exact same command, mind you):
@id1 : 3 MB
ACGT .... : 3 MB
: 3 MB
I've tried this with different depth settings and with different k-mer lengths, but the problem seems to always occur.
Hi Giulia,
I've run into a problem when I try to run Bella using minimizers. Specifically, when I run the command
"bella -f input.txt -o output --paf -k 31 -w 17 -e 0.005 -l 2 -u 100",
I get the following output log:
INFO: src/../include/kmercount.hpp(108) InputFile = metagenome-anonymous.fastq
INFO: src/../include/kmercount.hpp(109) InputSize = 67.693069 MB
INFO: src/main.cpp(200) OutputFile = output.out
INFO: src/main.cpp(203) kmerSize = 31
INFO: src/main.cpp(206) GPUs = DISABLED
INFO: src/main.cpp(209) UserDefinedMemory = 8000.000000 MB
INFO: src/main.cpp(212) OutputPAF = 1
INFO: src/main.cpp(215) BinSize = 500
INFO: src/main.cpp(218) DeltaChernoff = 0.100000
INFO: src/main.cpp(221) RunPairwiseAlignment = 1
INFO: src/main.cpp(231) HOPC = DISABLED
INFO: src/main.cpp(235) xDrop = 7
INFO: src/main.cpp(238) KmerSplitCount = 1
INFO: src/main.cpp(243) useMinimizer = ENABLED
INFO: src/main.cpp(245) minimizerWindow = 17
INFO: src/main.cpp(276) numThreads = 64
INFO: src/../include/kmercount.hpp(711) ReadingFASTQ = metagenome-anonymous.fastq
*** Error in `bella': free(): invalid pointer: 0x00002aaacc076480 ***
Aborted
When I run without the "-w 17" option, it runs perfectly.
Thanks,
Gabe
Issue I neglected earlier
Data set:
/project/projectdirs/mp309/bella-spgemm/ecoli_hifi_29x.fastq
Parameters:
-k 31 -l 20 -u 23
Fix:
The issue appears to be in include/common/transpose.h, ~ line 35.
for (IT i=0; i <= n; i++)
{
cscColPtr[i+1] = atomicColPtr[i] + cscColPtr[i];
}
cscColPtr is of length n+1, I believe (n rows, n+1 stores nnz).
However, this loop writes at location n+1 (i.e. the (n+2)th element), causing an out-of-bounds access.
I believe the correct code would be
for (IT i=0; i < n; i++)
cscColPtr[i+1] already handles the +1 size correctly for the last index (n).
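The bounds argument above can be checked with a small standalone sketch. It assumes the loop is a running sum over per-column nonzero counts, as the snippet suggests. The function name and the use of std::vector are illustrative and do not come from include/common/transpose.h.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not BELLA's code): turns per-column nonzero counts
// (length n) into CSC-style column pointers (length n + 1).
std::vector<std::size_t> buildColPtr(const std::vector<std::size_t>& colCounts)
{
    const std::size_t n = colCounts.size();
    std::vector<std::size_t> colPtr(n + 1, 0);  // colPtr[0] = 0, colPtr[n] = total nnz

    // i stops at n - 1, so the last write goes to colPtr[n], the final valid slot.
    // With "i <= n" the loop would also read colCounts[n] and write colPtr[n + 1],
    // both one past the end.
    for (std::size_t i = 0; i < n; ++i)
    {
        colPtr[i + 1] = colPtr[i] + colCounts[i];
    }

    assert(colPtr.size() == n + 1);
    return colPtr;
}
```

For counts {2, 0, 3} this yields the pointers {0, 2, 2, 5}: four entries for three columns, with the last entry holding the total number of nonzeros.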
Hi,
When building bella-gpu (make bella-gpu), I get an error:
nvcc -arch=sm_70 -O3 -maxrregcount=32 -std=c++14 -Xcompiler -fopenmp -w -Iinclude/common/GTgraph/sprng2.0-lite/include -IloganGPU -Iseqan -o bella hash_funcs.o Kmer.o Buffer.o fq_reader.o optlist.o src/main.cu -L/home/boelle/Documents/bella/libbloom/build -lbloom -lpthread -lbz2 -lz -D__NVCC__
seqan/seqan/score/score_matrix_dyn.h:110:489: error: template argument 1 is invalid
110 | enum class AminoAcidScoreMatrixID : std::underlying_type_t<decltype(Find<impl::score::MatrixTags, ScoreSpecBlosum30>::VALUE)>
It seems like some template library is missing. Do you have an idea?
building without GPU is ok (make bella ); building LOGAN runs fine
platform : Ubuntu 20.04
gcc : 9.4.0
nvcc : 11.6 (quadro P1000 - architecture=sm_62)
Thanks a lot.
Hi,
I was working with BELLA's evaluation module and wanted to test the output from MECAT. To do so, I need to provide two files: MECAT's output file and MECAT's indices file. I am not sure what the "MECAT's indices" file is. Where can I find it?