raven's Issues

Raven: stuck at raven::Graph::Construct without any progress.

Hi,
I am trying to run Raven on CLR reads (around 50 million reads). It runs for a few hours and then simply hangs for hours and hours, seemingly not doing anything. At that point, the job does not show up when I run top on the command line. When I abort and resume it, it works for a while, and then the same thing happens again. There are no outputs other than raven.cereal and an empty output file. It also happens with a subset of the original data; the program just hangs.

Difference with Ra

Hello, a very simple question: how does Raven differ from Ra? Is Raven simply the evolution of Ra?
From what I understand they use the same modules to perform assembly.

Estimate hardware requirements

Hello,

do you have any benchmarks on the requirements of Raven in relation to the amount of data and the target genome size?

I am wondering if I can assemble a 5 Gb (assembly size) genome on a powerful laptop.

Thank you and keep safe.

mitogenome lost

Hi,
I just ran raven over some of my nanopore data, and the assembly looks great, with one caveat, it lost the complete mitogenome. The same reads assembled by Canu lead to more fragmented contigs, but a tandem repeat of 3-4 mitogenomes. NECAT assembled the mitogenome into a single copy contig, but Raven seems to have discarded all the mitogenome reads as I cannot find any of the mito sequences anywhere in the assembly.
Any idea what happened, or how to find out?
Cheers.
Raven.Cereal file

Assembly possibly getting stuck on mapping invalid reads

Hi Robert,
I've been running into some issues with assemblies not completing. It may be an issue with not giving the job enough time, but based on expectations and the logging, I'm not sure whether it is an issue with the data or some loop.

I'm assembling a 2.8 Gb mammal genome with about 60x coverage. I've included the logging output below. Everything proceeds fairly quickly until it gets "stuck" between minimizing and mapping invalid reads. There is no further output after the first iteration of this loop until the job time expires.

[raven::Graph::Construct] mapped sequences 978.772839s
[raven::Graph::Construct] minimized 3141332 - 3275448 / 5662746 16.137639s
[raven::Graph::Construct] mapped sequences 1060.586259s
[raven::Graph::Construct] minimized 3275448 - 3407851 / 5662746 16.112244s
...
[raven::Graph::Construct] mapped sequences 1794.347470s
[raven::Graph::Construct] minimized 5549681 - 5662746 / 5662746 12.567266s
[raven::Graph::Construct] mapped sequences 1453.411794s
[raven::Graph::Construct] annotated piles 56.984509s
[raven::Graph::Construct] removed contained sequences 40.598622s
[raven::Graph::Construct] removed chimeric sequences 136.707741s
[raven::Graph::Construct] reached checkpoint 89.431498s
[raven::Graph::Construct] cleared piles 1.909425s
[raven::Graph::Construct] minimized 0 - 48009 / 193871 19.858725s

After calling with --resume, the cereal file is reloaded, and then it appears to get stuck again on the first iteration of mapping invalid reads.

[raven::] loaded previous run 73.293433s
[raven::] loaded 5662746 sequences 2374.362824s
[raven::Graph::Construct] cleared piles 21.914438s
[raven::Graph::Construct] minimized 0 - 48009 / 193871 27.706757s

Both of these jobs expired after about 650 CPU hours each, compared to the 500 hours suggested for a human-scale genome at 44x.

I also tried a separate dataset with about 30x coverage, but it was the same story: it never got beyond the first iteration of mapping invalid reads.

This was based on building from the tip of the repository, but a conda-based install seems to have the same issue (stuck at the same spot after 400+ CPU hours).

Do you have any suggestions?

Thanks,
Alex

Raven is not performing so well on highly heterozygous regions as Ra

Hi,

Thank you for Ra and Raven.

Ra v0.2.1
Raven v0.0.3

I have a diploid genome for which other assemblers would juxtapose the two haplotypes instead of "crushing" them. I tested Ra and was very happy to find that it performed much better than the other long read assemblers I know on this aspect, when running it with the longest reads. However, Ra shortened repeated regions, so I tested Raven hoping that it could perhaps improve this aspect.
Sadly, when using Raven with the longest reads, as I did with Ra, I found juxtaposed haplotypes like I had with the other assemblers.

BUSCO

Hi.
I have assembled the same set of PacBio reads using different assembler programs. When I analysed the assemblies' quality, I noticed that the BUSCO score is relatively low for Raven in particular, despite contiguity similar to other assemblers such as Flye. Any idea why this is happening?
Thank you.


No fasta output

Hi!

I am trying to polish a set of nanopore amplicon sequences from a very small gene but the following command just gives me an empty file as output:

raven cluster_1.fasta > 1.fasta

This is what is printed in my terminal:

[raven::] loaded 42 sequences 0.004137s
[raven::Graph::Construct] minimized 0 - 42 / 42 0.024922s
[raven::Graph::Construct] mapped sequences 0.005538s
[raven::Graph::Construct] annotated piles 0.000533s
[raven::Graph::Construct] removed contained sequences 0.000017s
[raven::Graph::Construct] removed chimeric sequences 0.000060s
[raven::Graph::Construct] reached checkpoint 0.000276s
[raven::Graph::Construct] minimized 0 - 4 / 4 0.017390s
[raven::Graph::Construct] mapped valid sequences 0.000413s
[raven::Graph::Construct] updated overlaps 0.000002s
[raven::Graph::Construct] removed false overlaps 0.000047s
[raven::Graph::Construct] stored 8 nodes 0.000061s
[raven::Graph::Construct] stored 12 edges 0.000007s
[raven::Graph::Construct] reached checkpoint 0.000337s
[raven::Graph::Construct] 0.049604s
[raven::Graph::Assemble] removed transitive edges 0.000027s
[raven::Graph::Assemble] reached checkpoint 0.000200s
[raven::Graph::Assemble] removed tips and bubbles 0.000005s
[raven::Graph::Assemble] reached checkpoint 0.000223s
[raven::Graph::Assemble] removed long edges 0.000061s
[raven::Graph::Assemble] reached checkpoint 0.000146s
[raven::Graph::Assemble] 0.000712s
[raven::] 0.056928s

It seems Raven stops processing after the assembly stage. Is there any way to resolve this issue?

Raven gfa output

Hello,
is it possible to get the gfa out of Raven?
Thank you!

different results running the same dataset 3 times

Hi!

I just noticed that when running Raven three times with the same settings on the same ONT dataset, I get different results. Is that expected with the algorithm used?

I ran raven three times on each of three datasets (all reads >1kb, all >5kb, all >10kb).
The expected coverages of the three datasets were:

  • 1kb+: ~110x
  • 5kb+: ~80x
  • 10kb+: ~40x

I used these commands:

# RAVEN_run1_1kb:
raven -t 20 -p 0 reads_porechopped.1kb.combined.fastq.gz > RAVEN_run1_1kb.fasta 2> RAVEN_run1_1kb.log

# RAVEN_run2_1kb:
raven -t 20 -p 0 reads_porechopped.1kb.combined.fastq.gz > RAVEN_run2_1kb.fasta 2> RAVEN_run2_1kb.log

# RAVEN_run3_1kb:
raven -t 20 -p 0 reads_porechopped.1kb.combined.fastq.gz > RAVEN_run3_1kb.fasta 2> RAVEN_run3_1kb.log

# RAVEN_run1_5kb:
raven -t 20 -p 0 reads_porechopped.5kb.fasta.gz > RAVEN_run1_5kb.fasta 2> RAVEN_run1_5kb.log

# RAVEN_run2_5kb:
raven -t 20 -p 0 reads_porechopped.5kb.fasta.gz > RAVEN_run2_5kb.fasta 2> RAVEN_run2_5kb.log

# RAVEN_run3_5kb:
raven -t 20 -p 0 reads_porechopped.5kb.fasta.gz > RAVEN_run3_5kb.fasta 2> RAVEN_run3_5kb.log

# RAVEN_run1_10kb:
raven -t 20 -p 0 reads_porechopped.10kb.fasta.gz > RAVEN_run1_10kb.fasta 2> RAVEN_run1_10kb.log

# RAVEN_run2_10kb:
raven -t 20 -p 0 reads_porechopped.10kb.fasta.gz > RAVEN_run2_10kb.fasta 2> RAVEN_run2_10kb.log

# RAVEN_run3_10kb:
raven -t 20 -p 0 reads_porechopped.10kb.fasta.gz > RAVEN_run3_10kb.fasta 2> RAVEN_run3_10kb.log

The stats of the raw (unpolished) assemblies are as follows:
[image: RAVEN_stats table]

As you can see the stats are generally comparable between the runs for each dataset. However, what is quite striking is the huge difference between the longest contigs that were generated.

Any clue as to why this happens and how one can be sure one gets the best possible assembly?

Thanks
Michael

terminate called after throwing an instance of 'std::bad_alloc'

Hi,
I tried to run Raven on Nanopore data (quite a big file, roughly 85 GB) on an HPCC with a Torque scheduler. The run was aborted with the following stderr:

[raven::] loaded 9087491 sequences 640.684164s
[raven::Graph::Construct] minimized 0 - 244845 / 9087491 359.047807s
[raven::Graph::Construct] mapped sequences 1918.468984s
[raven::Graph::Construct] minimized 244845 - 485704 / 9087491 369.553781s
[raven::Graph::Construct] mapped sequences 4952.626360s
[raven::Graph::Construct] minimized 485704 - 730713 / 9087491 422.459109s
[raven::Graph::Construct] mapped sequences 8062.137459s
[raven::Graph::Construct] minimized 730713 - 970097 / 9087491 365.644221s
[raven::Graph::Construct] mapped sequences 11157.831874s
[raven::Graph::Construct] minimized 970097 - 1205546 / 9087491 408.545256s
[raven::Graph::Construct] mapped sequences 14233.668864s
[raven::Graph::Construct] minimized 1205546 - 1444410 / 9087491 425.380612s
[raven::Graph::Construct] mapped sequences 17397.713790s
[raven::Graph::Construct] minimized 1444410 - 1678744 / 9087491 356.319257s
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
/var/spool/torque/mom_priv/jobs/1752457.mgmt01.cluster.local.SC: line 4: 2272 Aborted "$@"

I am a bit at a loss as to why. Could it be that I ran out of memory? Any help is appreciated!
Thank you!

Best wishes

Raven not writing anything

Hello,
I wonder if, for large and complex genomes, it would be better to have Raven write some intermediate files that could be used to resume the run.
In my case, my job had been running for more than two weeks when it died due to the memory limit. It would have been nice to restart the process with new resources (e.g. fewer CPUs and more memory, then switching back to more CPUs) and make better use of the computing resources we have available.
I agree that for small genomes this is not an issue, but with mammalian and large plant genomes, time and costs start becoming relevant.
Any thoughts to go in this direction?

Dario

No fasta output

Another issue is that with the command

raven -p 3 -t 20 /home/Porechop/porechoped.fastq > contigs.fasta

contigs.fasta is an empty file, and the output is this:
raven.txt

Any solution or suggestion for this?

Thank you very much

QUESTION: CUDA Parameters Optimisation?

Trying to optimise the CUDA parameters. Using 24 GB GPUs (2x RTX TITANs).

This is essentially the command I'm using, but I'm not sure if I'm getting proper CUDA usage:

$ raven -t 124 -c 100 -a 100 input_file.fastq > raven_asm.fasta

[screenshot: 2021-03-02 09-37-12]

Transport selection via DSN is deprecated

Hello,

Since the last version update (1.5.1) the error I get is:

"Transport selection via DSN is deprecated. You should explicitly pass the transport class to Client() instead."

The job only runs when an older version is used.

[raven::] error: file

Hi, so this is my command :
raven -p 3 - t 20 /my path to file/my-file.fastq

and the output error.

[raven::] error: file - has unsupported format extension (valid extensions: .fasta, .fasta.gz, .fa, .fa.gz, .fastq, .fastq.gz, .fq, .fq.gz)!

Could you help me please?
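A likely cause, judging by the error text: the stray space in "- t" makes raven parse "-" as an input file, which is what triggers the extension error. A corrected invocation might look like this (the path is the reporter's placeholder, quoted here only because it contains spaces):

```shell
# "-t 20" must be a single flag with no space; quoting guards the spaces in the path
raven -p 3 -t 20 "/my path to file/my-file.fastq"
```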

Contigs and raw reads included in the GFA output

Hello Robert,

I noticed that contigs and some raw reads are included in the GFA output but not in the FASTA output (stdout). Also, it looks like they are not polished by Racon. What is the point of including them? Which one should I use as the final genome assembly, the GFA or the FASTA?

Thanks,
Chenxi
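As an aside, when only the sequences are needed, GFA S-lines can be converted to FASTA with a one-liner. This is a generic GFA trick, not a Raven feature; it assumes sequences are stored inline rather than as '*', and a toy graph (made-up names) stands in for real output:

```shell
# toy GFA with two segments -- illustrative stand-in for an assembler's output
cat > s2fa.gfa <<'EOF'
S Ctg1 ACGTACGT LN:i:8
S Utg2 TTTT LN:i:4
EOF
# print every S-line as a FASTA record, skipping segments without an inline sequence
awk '$1=="S" && $3!="*" {print ">"$2; print $3}' s2fa.gfa > s2fa.fasta
cat s2fa.fasta
```

On the toy graph this emits two FASTA records, Ctg1 and Utg2.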

Now in brew

FYI

brew install brewsci/bio/raven-assembler

How to cite raven?

Is there a citation for Raven and can you add it to the Readme?

Thanks

Rob

cannot find .fa.gz file

Hello,
Just FYI: I was running raven with a compressed fasta file and I got this error:
[bioparser::createParser] error: unable to open file reads.fa.gz!
Now, with an uncompressed input, it is running. I have raven v0.0.1.

[bioparser::FastaParser] error: invalid file format

Hi,
My input file is an ONT fastq file (I simply ran cat *.fastq > bacteria01.fastq on the reads I got from the sequencing center).
However, it is not recognized by Raven.
Command:
raven -t 50 bacteria01.fastq

There are no 0-length reads in the file.
I'll be happy for any help (converting the fastq to fasta didn't work either).
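One way to narrow this down is to check the concatenated file for structurally broken records, e.g. a read whose sequence wrapped onto a second line. A rough sketch (the toy file and its deliberate defect are made up for illustration):

```shell
# toy FASTQ: read2's sequence spills onto a second line, breaking the 4-line layout
cat > check_toy.fastq <<'EOF'
@read1
ACGT
+
!!!!
@read2
ACGT
ACGT
!!!!
EOF
# flag records whose header or '+' separator is not where the 4-line format expects it
awk 'NR%4==1 && $0 !~ /^@/  {print "bad header at line " NR}
     NR%4==3 && $0 !~ /^\+/ {print "bad separator at line " NR}' check_toy.fastq
```

On the toy file this reports a bad separator at line 7, pointing at the wrapped record.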

HiFi reads

Any idea of compatibility or performance with Pacbio HiFi reads?

Best
André

Small plasmid misassembly

Hello Robert,
I am using Raven v1.4.0 to assemble a bacterial genome (using nanopore data) which contains 1 chromosome, 1 large plasmid and 1 small plasmid. When I view the gfa file in Bandage, I can see 3 circular contigs are generated. The chromosome and large plasmid are the correct size, but the small plasmid (~12 kb) seems to assemble as a multimer (96.6 kb) and is not output to the fasta file. Do you have any suggestions that might allow this plasmid to assemble correctly? A large percentage of the reads (~30%) in this dataset map to the plasmid.
Thanks,
Scott

error at polishing stage

Hello,
Thanks for developing Raven. I am trying to assemble a 2.5Gb genome with Raven using ~200Gb of CLR reads. The assembly progresses to the polishing stage, but I end with the following error:

[racon::Polisher::Polish] minimized 11278 - 15451 / 15451 15.802887s
[racon::Polisher::Polish] mapped sequences 1652.453550s
[racon::Polisher::Polish] found 9617833 overlaps 6140.732715s
[racon::Polisher::Polish] reverse complemented sequences 1.986854s
[racon::Polisher::Polish] aligned 9617833 / 9617833 overlaps [================] 5401.589224s
[racon::Polisher::Polish] prepared 5396414 window placeholders 79.317244s
/var/spool/slurmd/job6175577/slurm_script: line 16: 92163 Illegal instruction     (core dumped) raven -t 48 Ma_subreads.fasta --resume

Do you have any suggestion?
Thanks a lot!

What is the raven.cereal file?

Hi,

I've looked through the documentation and I can't find an explanation of what the raven.cereal file is / what it does - would it be possible to explain?

Thanks!

QUESTION: How do I build with/for CUDA?

If I want to build with CUDA capacity (on Ubuntu 18.04 LTS), would the command be?:

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_BUILD_TYPE=racon_enable_cuda .. && make
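As written, the second -DCMAKE_BUILD_TYPE would just overwrite the first. CUDA support in racon (which raven builds on) is normally toggled with a dedicated option; assuming the standard racon option name carries over, the configure step would look roughly like:

```shell
# racon_enable_cuda is racon's CUDA switch -- verify the exact option in the CMakeLists.txt
cmake -DCMAKE_BUILD_TYPE=Release -Dracon_enable_cuda=ON .. && make
```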

Enabling GFA output

Hello Robert,

I am trying to use Raven to assemble a small bacterial genome and I am struggling to get the program to save the GFA file. I realise this is in the help, but I can't quite figure out the argument (I'm rather new to this). How would I structure this if my input is:

raven
-t 8
/home/concat.fastq.gz > raven.fasta

Any help is much appreciated.

Alan
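Recent Raven versions expose a flag for this; assuming a 1.x build, the invocation would look roughly like:

```shell
# write the assembly graph to raven.gfa while the contigs still go to stdout
raven -t 8 --graphical-fragment-assembly raven.gfa /home/concat.fastq.gz > raven.fasta
```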

Racon settings after Raven assembly

Dear Robert,

I would like to run Medaka after Raven. ONT suggests running one round of Racon polishing before starting with the Medaka polishing, and they recommend specific settings for that Racon run:
racon -m 8 -x -6 -g -8 -w 500 …

Is there a way to add the possibility to change the Racon setting during the Raven run?

Were any specific Racon settings hardcoded in Raven, or does it currently use defaults?

I do not want to run Racon through the normal Raven pipeline and then again with different settings; that might not yield the best consensus in the end.
Thanks for your help!

Michael
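Until that is configurable, one workaround is to take Raven's unpolished output (run with -p 0) and run Racon standalone with ONT's suggested scores. A sketch, assuming minimap2 for the read-to-assembly mapping (file names are illustrative):

```shell
# assemble without built-in polishing, then polish once with ONT's suggested Racon scores
raven -p 0 -t 8 reads.fastq.gz > draft.fasta
minimap2 -x map-ont draft.fasta reads.fastq.gz > overlaps.paf
racon -m 8 -x -6 -g -8 -w 500 reads.fastq.gz overlaps.paf draft.fasta > polished.fasta
```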

No FASTA output

Hello Mr. Vaser,

I tried out the Raven assembly (v1.3) for my assembly (2.3 Gbp) and ran it with this command:

raven -p 3 -t 30 <FASTQ.GZ>

When it completed, it did not output a FASTA file; instead I had to extract the sequences from my screen log using grep, as they were written to the terminal.

I am not sure why this happened, or if my assembly just crashed, but after examining the screen log I could not find any error messages. When using the --resume option, it just prints the entire(?) sequence without headers again.

Cheers

David

Does raven use the extra information in fastq file?

Hello,

I am currently switching to the genomics of organisms with considerably bigger genomes, so data storage space becomes a prominent question. I was wondering whether Raven uses the extra base-quality information in the fastq, or whether I could just convert my fastq to fasta if Raven doesn't care about qualities.

Thank you

EDIT: I was also wondering if you planned a publication, or at least a more in-depth description somewhere of how Raven works computationally.
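On the storage side: Raven accepts plain FASTA (see the supported-extension list elsewhere on this page), so conversion is at least possible; whether qualities help the polishing is a question for the authors. A minimal conversion sketch on a toy record:

```shell
# toy FASTQ record (illustrative)
cat > conv_toy.fastq <<'EOF'
@read1 some description
ACGT
+
!!!!
EOF
# FASTQ -> FASTA: turn '@' headers into '>' and keep only the sequence lines
awk 'NR%4==1 {sub(/^@/, ">"); print} NR%4==2 {print}' conv_toy.fastq > conv_toy.fasta
cat conv_toy.fasta
```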

More fragmented assembly after updating from version 1.3.0

Hello,
I have been using raven for a while and recently I reran an assembly of the same bacterial data with a newer version of raven and got a more fragmented genome. With version 1.3.0 I got the complete bacterial genome in one contig. With any later version I got the genome more fragmented and with a smaller total assembly size.
Is it possible to keep the improvements done in recent raven versions but restore the better contiguity observed in version 1.3.0?
Sorry but I can't share the data.
Thanks,
Ilya.

Circular contigs with Raven

Hi Robert,

I have a question about circular contigs for bacterial assembly, please. Is there a way to distinguish between circular and linear contigs in Raven output? I tried to visualise the GFA file in Bandage, but the contigs do not look circular. Contig self-alignment does not show evidence of overlapping ends. I am planning to try the Circlator pipeline as well.

Many thanks,
Valentine

Racon rounds shrink target contig

Hi!

First, congrats for your work on Ra and Raven. I've been following and testing your work for nanopore reads for quite a while.

I work in viral genomics (dsDNA viruses), studying new viruses, making reference genomes, their diversity, repeat distribution, etc.

I've been testing Raven for quite a while and, by comparison to the rest, it does a great job at assembling this type of data! However, I realized that the final contig size is quite sensitive to read-length pre-filtering and to the number of Racon iterations. Filtering to intermediate sizes (>5-10 kb) yields almost perfect contiguity. Accepting shorter sequences adds too much diversity (especially in the repeats) and the contiguity drops. If the filter is higher, there is not enough data to close the genome.

Interestingly though, increasing the number of Racon iterations tends to shrink the target contig. The size is not known exactly, but it is thought to be between 132-150 kbp (experimental data from the '80s): around 110-120 kb should be unique, followed by tandem repeats of a 1.5 kb unit repeated 15-20 times.

Here some data: Raven v.1.1.10, nanopore reads filtered at >Q12 + >10kbp:

Racon rounds    Target contig length (bp)
0               132197
2               132379
4               131799
5               130988
10              130082
20              128028
30              125151
40              122883
50              120623
80              106626
100             96830

Here some data: Raven v.1.1.10, Nanopore reads filtered at >Q12 + >5kbp:

Racon rounds    Target contig length (bp)
2               127475
10              124968
20              121798
30              120227
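For scale, the >10 kbp series above works out to a fairly steady loss per polishing round, as a quick back-of-the-envelope check shows:

```shell
# average shrinkage across the >10 kbp series: (132197 - 96830) bp over 100 rounds
awk 'BEGIN { printf "%.1f bp lost per Racon round\n", (132197 - 96830) / 100 }'
```

That is roughly 354 bp per round on average.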

What do you think might be the phenomenon behind it?

Joan

Chimera creation

Hi Raven team,
Thanks for the tool, I really like how fast it is. I have run Raven on a 3 Gb mammalian genome with 79 GB of ONT data. It runs fine, but it looks like it is creating a fairly large number of chimeric contigs. I suspect this could be resolved by adjusting the settings, but there seems to be no way to do this; obviously this causes incorrect N50 values etc.

Is there a way to adjust the overlap settings other than the 'weaken' true/false switch? I can turn that on, but of course the N50 drops dramatically, and it would be nice to be able to find the sweet spot for my data.

Cheers.

aborted job

Hello,
I am running Raven on a large genome, and after several days it dies. The informative parts of the lsf.o file are here:

[raven::Graph::construct] mapped sequences 19607.307358 s
[raven::Graph::construct] minimized 7039343 - 7056909 / 7056909 37.313087 s
[raven::Graph::construct] mapped sequences 19557.058535 s
[raven::Graph::construct] annotated piles 104.209236 s
[raven::Graph::construct] removed contained sequences 73.756697 s
*** Error in `raven': free(): invalid next size (fast): 0x00000031ad378ab0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81499)[0x2b5b65377499]
raven(_ZN5raven5Graph9constructERSt6vectorISt10unique_ptrIN3ram8SequenceESt14default_deleteIS4_EESaIS7_EE+0x1598)[0x427448]
raven(main+0x33b)[0x41589b]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b5b65318445]
raven[0x4162aa]
======= Memory map: ========
00400000-0048a000 r-xp 00000000 00:2e 87958378                           /cluster/home/copettid/bin/raven/bin/raven
00689000-0068a000 r--p 00089000 00:2e 87958378                           /cluster/home/copettid/bin/raven/bin/raven
0068a000-0068b000 rw-p 0008a000 00:2e 87958378                           /cluster/home/copettid/bin/raven/bin/raven
01a2b000-33edfe2000 rw-p 00000000 00:00 0                                [heap]
2b5b64695000-2b5b646b7000 r-xp 00000000 08:02 29360711                   /usr/lib64/ld-2.17.so
2b5b646b7000-2b5b646b9000 rw-p 00000000 00:00 0
2b5b646c8000-2b5b646f0000 rw-p 00000000 00:00 0
2b5b64711000-2b5b64735000 rw-p 00000000 00:00 0
2b5b648b6000-2b5b648b7000 r--p 00021000 08:02 29360711                   /usr/lib64/ld-2.17.so
2b5b648b7000-2b5b648b8000 rw-p 00022000 08:02 29360711                   /usr/lib64/ld-2.17.so
2b5b648b8000-2b5b648b9000 rw-p 00000000 00:00 0
2b5b648b9000-2b5b649a4000 r-xp 00000000 00:2d 68612723                   /cluster/apps/gcc/4.8.2/lib64/libstdc++.so.6.0.18
2b5b649a4000-2b5b64ba4000 ---p 000eb000 00:2d 68612723                   /cluster/apps/gcc/4.8.2/lib64/libstdc++.so.6.0.18
2b5b64ba4000-2b5b64bac000 r--p 000eb000 00:2d 68612723                   /cluster/apps/gcc/4.8.2/lib64/libstdc++.so.6.0.18
[...]
2b5f50000000-2b5f54021000 rw-p 00000000 00:00 0
2b5f54021000-2b5f58000000 ---p 00000000 00:00 0
7ffd5a8a9000-7ffd5a8cb000 rw-p 00000000 00:00 0                          [stack]
7ffd5a929000-7ffd5a92b000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
/cluster/shadow/.lsbatch/1569429955.100973420: line 8: 103861 Aborted                 (core dumped) raven -t 68 ../ra_work_ass/Rabiosa_G303_all_10kb_q7heads.fa

Is this a simple matter of memory? I am wondering if we reached a peak at that time, and whether there is a way to estimate how much more memory would be needed.

Exited with exit code 134.

Resource usage summary:

    CPU time :                                   21250032.00 sec.
    Max Memory :                                 449401 MB
    Average Memory :                             218468.00 MB
    Total Requested Memory :                     432000.00 MB
    Delta Memory :                               -17401.00 MB
    Max Swap :                                   31282 MB
    Max Processes :                              6
    Max Threads :                                75
    Run time :                                   1181580 sec.
    Turnaround time :                            1210107 sec.

I had requested 432 GB RAM on a single node, 36 threads in multithreading. The input file has ~147 Gb of reads (with minimal headers) longer than 10 kb, N50 ~25 kb and about 30x coverage of each haplotype.
Thanks,

Dario

Resume run error

Hi there,

I'm trying to resume a run that exited due to the job time limit. The run was in the polisher phase and had recently checkpointed.
Below are the last few time stamps.
[racon::Polisher::Polish] called consensus for 6761 / 13522 windows [========> ] 653.947463s
[racon::Polisher::Polish] called consensus for 7607 / 13522 windows [=========> ] 973.806961s
[racon::Polisher::Polish] called consensus for 8452 / 13522 windows [==========> ] 973.871454s
[racon::Polisher::Polish] called consensus for 9297 / 13522 windows [===========> ] 973.901760s
[racon::Polisher::Polish] called consensus for 10142 / 13522 windows [============> ] 973.928897s
[racon::Polisher::Polish] called consensus for 10987 / 13522 windows [=============> ] 973.953601s
[racon::Polisher::Polish] called consensus for 11832 / 13522 windows [==============> ] 973.987368s
[racon::Polisher::Polish] called consensus for 12677 / 13522 windows [===============>] 975.614038s
[racon::Polisher::Polish] called consensus for 13522 / 13522 windows [================] 989.959393s
[raven::Graph::Polish] reached checkpoint 6.142959s
[raven::] 2296.952221s

The job is running 20x PacBio data with 48 CPUs and 400 GB memory; the estimated genome size is 6.5 Gb.
raven --weaken -p 2 -t 48 ${datadir}/*.subreads.fastq.gz > ${outdir}/pb-raven-asm3.fasta
Thanks

raven_build_tests

My problem is due to my lack of skill; I'm hoping for a command that I can copy and paste into my terminal.

This is the command to install raven:

git clone https://github.com/lbcb-sci/raven && cd raven && mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release .. && make

Which works great. No issues here.

I want to try 'raven_build_tests' when I build raven.

Where exactly does this happen?

If I wanted to try building with GPU support, what are the installation instructions for that?
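If raven follows the usual lbcb-sci CMake convention, tests are switched on with an option at configure time; the option name below is an assumption, so verify it in the repository's CMakeLists.txt:

```shell
git clone https://github.com/lbcb-sci/raven && cd raven && mkdir build && cd build
# -Draven_build_tests=ON is the assumed option name; check CMakeLists.txt to confirm
cmake -DCMAKE_BUILD_TYPE=Release -Draven_build_tests=ON .. && make
```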

No output file?

Hi,
I ran raven with the following commands and I got nothing out of it.
#!/bin/bash
#SBATCH --time=120:00:00
#SBATCH --mem=170G
#SBATCH --ntasks=48
#SBATCH --job-name=Raven_ONTq-set1-1K
#SBATCH --account=PHS0338
module load python && source activate assembly-Y
raven ONTq-set1-1K.fa -t 48

Is there a way to get the output as a file (FASTA)?

There is just a file called raven.cereal, and that's it.

I don't even know the final assembly size or anything, to check for completeness or anything else.
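Since Raven prints the assembly to stdout (as the other reports on this page show), the fix is simply to redirect it; adapting the reporter's command:

```shell
# redirect stdout to capture the contigs; log messages still go to stderr
raven -t 48 ONTq-set1-1K.fa > ONTq-set1-1K.assembly.fasta
```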

How to cite raven?

Not an issue, a query/feature request.
Are you planning to publish raven? How do you want raven to be cited in its current version?

Shorter segments than overlaps in GFA

Hello,

While looking at the GFA output of raven, I noticed that some links have longer overlaps than the segment size allows. For instance:

S  ch227_read12486_template_pass_FAH31515     LN:i:15968       RC:i:1
S  ch96_read20376_template_pass_FAH42885      LN:i:5840        RC:i:1
L  ch227_read12486_template_pass_FAH31515     ch96_read20376_template_pass_FAH42885       -       6422M

Is this expected?

Cheers
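Such cases can be enumerated mechanically. The sketch below rebuilds the reported pair as a toy graph (orientations filled in to make the L-line well-formed, since one column is missing above) and flags any link whose overlap length exceeds either segment's LN tag:

```shell
# toy GFA reproducing the reported segment sizes and overlap
cat > graph_check.gfa <<'EOF'
S ch227_read12486_template_pass_FAH31515 * LN:i:15968
S ch96_read20376_template_pass_FAH42885 * LN:i:5840
L ch227_read12486_template_pass_FAH31515 + ch96_read20376_template_pass_FAH42885 - 6422M
EOF
# remember each segment's LN tag, then flag links whose overlap exceeds either segment
awk '$1=="S" { for (i = 3; i <= NF; i++) if ($i ~ /^LN:i:/) { split($i, a, ":"); len[$2] = a[3] } }
     $1=="L" { ov = $6; sub(/M$/, "", ov);
               if (ov + 0 > len[$2] + 0 || ov + 0 > len[$4] + 0)
                   print $2, $4, $6 }' graph_check.gfa
```

On the toy data it flags the link, since 6422 > 5840.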

Autopolyploid genome recommendation

Hi,

I'm working on an autotetraploid plant with a genome size of 1.7 Gb. For testing purposes I threw all the data I had at Raven on default settings.

The assembly was around 340 Mb, which is closer to the haploid length estimate. Do you think Raven can be used to assemble it, and are there any parameters that can be changed to accommodate the ploidy?

The genomescope for it is here: http://qb.cshl.edu/genomescope/genomescope2.0/analysis.php?code=1RdMr7FUNtBmC0zbxQTX

Effects on using scrubbed/corrected reads

Hello, @rvaser,

I would like to know whether there should be any improvement in assembly quality when using previously corrected/scrubbed reads, or whether the correction step in Raven would then possibly give worse results.

Best regards,
fsciammarella

question about the very large genome

Hello, after reading your paper I am curious whether it would be useful for my genome, so I have some questions. I have a very large genome, about 6 Gb, to be sequenced. Will the software support its assembly? How does it handle heterozygosity as high as 2.0%? And how do the run time and RAM consumption compare to wtdbg2? Thank you in advance.

raven.cereal

Hi,

would it be possible to not create the raven.cereal file by default, or to delete it after Raven is finished?

thanks,
Peter

Core dumped: Illegal instruction

Hi Robert,

Thanks for another amazing tool. I am trying to run it on a cluster (the compiling machine differs from the target machine) using the conda installation (v0.0.7). It results in a dumped core, with the error message from the script being "Illegal instruction". Could it be an issue with CMake activating -march=native?
I will try source installation as well.
