mikolmogorov / flye

De novo assembler for single molecule sequencing reads using repeat graphs

License: Other

Python 4.94% Makefile 0.74% C++ 42.90% C 44.52% Roff 1.59% JavaScript 1.72% M4 0.78% Shell 0.21% Java 0.12% Perl 1.95% Lua 0.37% Cython 0.16%

flye's Issues

can't start new thread

Hello everyone,
I'm trying to get Flye to run with parameters similar to those I used with ABruijn. I'm using CentOS 7.0 with 1.5 TB of RAM and 64 cores (128 threads), with the following command:
flye --pacbio-raw ${reads} -g 3g -o ${OD} -t 127 -i 3
The program runs for 15 hours and then fails with a series of errors, the first of which is the following:

[2018-01-08 10:03:43] INFO: Running Flye 2.3-release
[2018-01-08 10:03:43] INFO: Assembling reads
[2018-01-08 10:03:43] INFO: Reading sequences
[2018-01-08 10:20:58] INFO: Generating solid k-mer index
[2018-01-08 10:25:51] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-01-08 10:29:21] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-01-08 10:51:10] INFO: Filling index table
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-01-08 11:46:11] INFO: Extending reads
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-01-08 22:26:16] INFO: Assembled 226679 draft contigs
[2018-01-08 22:30:40] INFO: Generating contig sequences
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-01-08 23:28:59] INFO: Running Minimap2
[2018-01-09 01:13:13] INFO: Computing consensus
Process SyncManager-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 558, in _run_server
    server.serve_forever()
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 184, in serve_forever
    t.start()
  File "/usr/lib64/python2.7/threading.py", line 747, in start
    _start_new_thread(self.__bootstrap, ())
error: can't start new thread
(...)

It then continues to complain about all the threads that failed, then about missing or empty files, and then it finishes. I know that ABruijn was able to use all 128 threads available on the node. Is there a limit to the number of threads that Flye can start? Would it be better to drop the number to 64 or so?

Best regards,
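Python raises "can't start new thread" when the operating system refuses to create another thread. On Linux, one common cause (an assumption here, not confirmed for this run) is the per-user task limit reported by `ulimit -u`, because threads count against it. The sketch below reads that limit and caps a requested thread count accordingly. The helper `suggest_thread_count` and its `reserve` margin are hypothetical and not part of Flye.

```python
import os
import resource


def suggest_thread_count(requested, soft_limit, in_use=0, reserve=32):
    """Cap a requested thread count so it fits under a per-user task limit.

    soft_limit: soft RLIMIT_NPROC value (resource.RLIM_INFINITY if unlimited)
    in_use:     tasks the user already has running (assumed known by the caller)
    reserve:    arbitrary safety margin left for helper processes (an assumption)
    """
    if soft_limit == resource.RLIM_INFINITY:
        return requested
    headroom = soft_limit - in_use - reserve
    # Never suggest fewer than one thread.
    return max(1, min(requested, headroom))


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    print("max user processes (soft/hard):", soft, hard)
    print("logical CPUs:", os.cpu_count())
    print("suggested thread count:", suggest_thread_count(127, soft))
```

If the soft limit turns out to be low, raising it or lowering `-t` are both things to try; which one resolves this particular failure is not established by the log above.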

Wrong parameters in get_consensus?

From main.py:
https://github.com/fenderglass/Flye/blob/e5b1903e5ce7fbf687b44e2c46f50587453e50e7/flye/main.py#L179-L182

From consensus.py:

https://github.com/fenderglass/Flye/blob/e5b1903e5ce7fbf687b44e2c46f50587453e50e7/flye/consensus.py#L48-L49

self.args.threads should change place with self.args.min_overlap, as far as I can see.

Thank you.

Ole
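A swap like this is easy to miss because both values are integers, so the call still runs. The sketch below uses a hypothetical stand-in function (its name and parameter list are assumptions, not the actual signature in consensus.py) to show how passing the values by keyword removes the dependence on argument order.

```python
def get_consensus_sketch(alignment_path, contigs_path, num_threads, min_overlap):
    """Hypothetical stand-in: only reports how the arguments were bound."""
    return {"num_threads": num_threads, "min_overlap": min_overlap}


# Positional call with the last two values in the wrong order.
# Nothing fails, but 5000 is bound to num_threads and 16 to min_overlap.
swapped = get_consensus_sketch("aln.sam", "draft.fasta", 5000, 16)

# Keyword call: the order at the call site no longer matters.
named = get_consensus_sketch(
    "aln.sam", "draft.fasta", min_overlap=5000, num_threads=16
)

print(swapped)
print(named)
```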

Error in assemble binary on ONP data

I have:

ONP data (about 1.1Gbp), 9.4 pores
Genome size 7Mbp
Most recent ABruijn version (pulled and compiled today)
Many reads longer than 100kbp up to 700kbp max

I get the following error:
[2017-09-19 15:37:47] INFO: Extending reads
0% 10% 20%
[2017-09-19 17:48:35] ERROR: Error: Error in assemble binary: Command '['abruijn-assemble', '-k', '15', '-l', '/home/user/Documents/Abruijn_ko_fz/out/abruijn.log', '-t', '11', '-v', '5000', '/home/user/bioinf_archive/32_scmi_storage/onp/ko_onp_FZ1/extracted/twoBestMin30.fasta', '/home/user/Documents/Abruijn_ko_fz/out/draft_assembly.fasta', '150']' returned non-zero exit status -9

I don't think that I ran out of disk space or memory.

What went wrong?

Here the end of the log file:

	With 11 reads
	Start read: -2be4669b-93c2-4154-aa4b-625728fa7d06_runid=2b076ac8f6a448e848698ae57d8581ac75fc0637_read=7487_ch=298_start_time=2017-09-13T02:49:06Z_.poretools_tmp/20170912_1617_qc/fast5/pass/36/fz_i_177_20170912_fah18372_MN15037_sequencing_run_qc_40637_read_7487_ch_298_strand.fast5
	At position: 10
	leftTip: 0 rightTip: 0
	Suspicios: 0
	Mean overlaps: 256
	Inner reads: 10
[2017-09-19 15:40:10] DEBUG: Inner: 30804 covered: 42884 total: 55124
[2017-09-19 15:40:10] DEBUG: Discarded contig with 17 reads and 16 inner overlaps
[2017-09-19 15:40:10] DEBUG: Discarded contig with 13 reads and 12 inner overlaps
[2017-09-19 17:48:35] root: ERROR: Error: Error in assemble binary: Command '['abruijn-assemble', '-k', '15', '-l', '/home/user/Documents/Abruijn_ko_fz/out/abruijn.log', '-t', '11', '-v', '5000', '/home/user/archive/storage/onp/ko_onp_FZ1/extracted/twoBestMin30.fasta', '/home/user/Documents/Abruijn_ko_fz/out/draft_assembly.fasta', '150']' returned non-zero exit status -9
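When Python's subprocess module reports a negative exit status on a POSIX system, the child was terminated by the signal with that number, so -9 corresponds to SIGKILL. A process usually receives SIGKILL from outside (for example from the kernel's out-of-memory killer or a job scheduler), although this log alone does not show which. The helper below is a small sketch for decoding such codes; the function name is made up for illustration.

```python
import signal


def describe_exit_status(returncode):
    """Translate a subprocess return code into a short description."""
    if returncode is None:
        return "still running"
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = "signal %d" % signum
        return "killed by " + name
    return "exited with error code %d" % returncode


for code in (-9, -11, 1):
    print(code, "->", describe_exit_status(code))
```

The same decoding applies to the other reports on this page: -11 is SIGSEGV (a segmentation fault), while a positive status such as 1 is an ordinary error exit.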

Read extension jumping to 100% completion

Hi,

Many thanks for developing this great assembler.

I have recently updated to version 2.0 of ABruijn and am testing it on relatively complex nanopore metagenomic samples. The eukaryotic portion, which I'm most interested in, comprises 5-30% of the reads in a given dataset, so the assembly tends to include several bacterial genomes at high coverage. The two datasets that I'm working with are 2.7 Gbp and 16 Gbp.
With the new version of ABruijn I noticed that read extension appears to jump from 0%, 10% or 30% (example shown below) directly to 100% completion, depending on the dataset. The assembly appears to progress normally after this. The final assemblies I'm getting appear to have lower contiguity than those from earlier (pre-2.0) versions of ABruijn with the same dataset, or with a dataset of half the size.

Is the jump to completion in read extension expected behaviour with the new version, or could it indicate a problem with the assembly (lower assembly contiguity)? Sorry if these are vague questions, but I have a feeling that the assembler is running into issues, perhaps as a result of conflicts between the given genome size and the coverage estimated by the assembler.

The estimated genome size is around 30 Mbp, with coverage of around 20x in the dataset for the example shown below.

The launch code:
abruijn MinION_albacore1.0.3.chop.fasta /scratch2/jon/MinION/MinION_Abruijn_2.0/ 20 --platform nano --threads 10 --min-overlap 3000 --iterations 3


[2017-08-21 12:43:16] INFO: Running ABruijn
[2017-08-21 12:43:17] INFO: Assembling reads
[2017-08-21 12:43:17] INFO: Reading FASTA
[2017-08-21 12:45:28] INFO: Generating solid k-mer index
[2017-08-21 12:45:30] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-08-21 12:55:07] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-08-21 13:06:59] INFO: Filling index table
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-08-21 13:23:30] INFO: Extending reads
0% 10% 20% 30% 100%
[2017-08-21 15:56:40] INFO: Assembled 325 draft contigs
[2017-08-21 15:56:40] INFO: Generating contig sequences
[2017-08-21 16:05:24] INFO: Running BLASR
[2017-08-21 16:37:38] INFO: Computing rough consensus
[2017-08-21 18:12:30] INFO: Performing repeat analysis
[2017-08-21 18:12:32] INFO: Reading FASTA
[2017-08-21 18:14:25] INFO: Building repeat graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-08-21 18:17:48] INFO: Simplifying the graph
[2017-08-21 18:17:49] INFO: Aligning reads to the graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-08-21 19:02:42] INFO: Resolving repeats
[2017-08-21 19:02:59] INFO: Generating contigs
[2017-08-21 19:02:59] INFO: Generated 461 contigs
[2017-08-21 19:03:05] INFO: Running BLASR
[2017-08-21 19:30:03] INFO: Polishing genome (1/3)
[2017-08-21 19:32:03] INFO: Separating alignment into bubbles
[2017-08-21 21:19:35] INFO: Correcting bubbles
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-08-22 07:16:53] INFO: Running BLASR
[2017-08-22 07:44:43] INFO: Polishing genome (2/3)
[2017-08-22 07:46:42] INFO: Separating alignment into bubbles
[2017-08-22 09:43:09] INFO: Correcting bubbles
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-08-22 15:47:37] INFO: Running BLASR
[2017-08-22 16:14:38] INFO: Polishing genome (3/3)
[2017-08-22 16:16:32] INFO: Separating alignment into bubbles
[2017-08-22 18:15:51] INFO: Correcting bubbles
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-08-22 21:48:37] INFO: Done! Your assembly is in file: /scratch2/jon/MinION/Abruijn_2.0/polished_3.fasta

ABruijn won't generate reads_order.fasta file

Hi,

I tried to assemble the hi5 genome from PacBio reads; the genome size is about 400 Mbp. ABruijn was looking for the reads_order.fasta file, but there is only one log file in the folder.

Thanks

Jack

[11:34:53] root: INFO: Running ABruijn
[11:34:53] root: INFO: Assembling reads
-----------Begin assembly log------------
[11:34:53] DEBUG: Build date: Dec 16 2016 11:26:08
[11:34:53] DEBUG: Reading FASTA
[11:44:26] DEBUG: Hard threshold set to 6
[11:44:27] INFO: Counting kmers (1/2):
[12:19:43] INFO: Counting kmers (2/2):
[12:56:18] DEBUG: Genome size estimate: 348697701
[12:56:18] DEBUG: Filtered 4104784 repetitive kmers
[12:56:18] DEBUG: Estimated minimum kmer coverage: 12, 697822632 unique kmers selected
[12:56:18] INFO: Building kmer index
[14:08:30] root: ERROR: Error: Error in assemble binary: Command '['abruijn-assemble', '-k', '15', '-l', '/is2/projects/nanopore/scratch/hi5/hi5_32_abjuijn/abruijn.log', '-t', '16', '-v', '5000', '/is2/projects/pacbio/active/data/product/smrtportal/018/018932/data/filtered_subreads.fasta', '/is2/projects/nanopore/scratch/hi5/hi5_32_abjuijn/reads_order.fasta', '60']' returned non-zero exit status -9

No progress from flye-repeat after two weeks

I've been running Flye on some PacBio data that is high quality, but the genome is long and highly repetitive and I don't have a lot of coverage (cost constraints). Progress was fine up to the point where it started flye-repeat. Now it is just sitting at "Initializing edges", using a single CPU (even though 64 were specified), and it hasn't produced any logs since Feb 16th. There's ample RAM and CPU on the machine (3 TB and 72 cores), so I'm hoping there's some way to tell whether it is ever going to finish. The genome is 26 Gb and I've only got about 6x coverage. I have a whole lot of Illumina data too, but I was hoping to make some long runs from this that could help me join together the many, many contigs I got out of SOAPdenovo2.

Error during consensus

Hi,

I'm trying to assemble a eukaryotic genome of about 200-300 Mbp; the genome size was estimated from a miniasm assembly. Besides the eukaryotic genome, the dataset also contains a considerable number of associated prokaryotic genomes (both endosymbiont and extracellular). The total dataset is around 16 Gbp of ONT reads. An appreciable amount of the data is relatively short, so I decided to run with "--min-overlap 3000".

Launch-script

flye --nano-raw \
/scratch3/jon/MinION/Busselton/Busselton2_180218/TRIMMED_READS/Busselton2_MinION_180221_ALL.chop.fastq \
--genome-size 200m --out-dir Busselton2_Flye_200m_3000 --threads 20 --min-overlap 3000 --iterations 2 --resume
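The log that follows reports "Genome size: 209715200" for `--genome-size 200m`, which equals 200 x 1024 x 1024, and another report on this page shows 2147483648 for `2g`. Both values are consistent with the suffix being read as a binary multiple in Flye 2.3. The sketch below reproduces that arithmetic; it is not Flye's own parser, and the `k` suffix is an assumption added for symmetry.

```python
def parse_genome_size(text):
    """Convert a size such as '200m' or '2g' to bases using binary multiples."""
    multipliers = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = text.strip().lower()
    if text and text[-1] in multipliers:
        return int(float(text[:-1]) * multipliers[text[-1]])
    # No suffix: the value is already a plain number of bases.
    return int(text)


for value in ("200m", "2g", "1857551"):
    print(value, "->", parse_genome_size(value))
```

One practical consequence is that a coverage estimate based on this figure is slightly lower than one computed with a decimal 200,000,000.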

Log-file, start

[2018-02-22 08:41:05] root: DEBUG: Genome size: 209715200
[2018-02-22 08:41:05] root: DEBUG: Chosen k-mer size: 17
[2018-02-22 08:41:05] root: INFO: Running Flye 2.3-release
[2018-02-22 08:41:05] root: DEBUG: Cmd: /scratch2/software/python-2.7-env/bin/flye --nano-raw /scratch3/jon/MinION/Busselton/Busselton2_180218/TRIMMED_READS/Busselton2_MinION_180221_ALL.chop.fastq --genome-size 200m --out-dir Busselton2_Flye_200m_3000 --threads 20 --min-overlap 3000 --iterations 2
[2018-02-22 08:41:05] root: INFO: Assembling reads
[2018-02-22 08:41:05] root: DEBUG: -----Begin assembly log------
[2018-02-22 08:41:05] root: DEBUG: Running: flye-assemble -k 17 -l /misc/scratch3/jon/MinION/Busselton/ASSEMBLY/Flye/Busselton2_Flye_200m_3000/flye.log -t 20 -v 3000 /scratch3/jon/MinION/Busselton/Busselton2_180218/TRIMMED_READS/Busselton2_MinION_180221_ALL.chop.fastq /misc/scratch3/jon/MinION/Busselton/ASSEMBLY/Flye/Busselton2_Flye_200m_3000/0-assembly/draft_assembly.fasta 209715200 /scratch2/software/python-2.7-env/local/lib/python2.7/site-packages/flye/resource/asm_raw_reads.cfg
[2018-02-22 08:41:05] DEBUG: Build date: Jan  8 2018 12:26:55
[2018-02-22 08:41:05] DEBUG: Parameters:
[2018-02-22 08:41:05] DEBUG:    maximum_jump=1500
[2018-02-22 08:41:05] DEBUG:    maximum_overhang=1500
[2018-02-22 08:41:05] DEBUG:    hard_min_coverage_rate=10
[2018-02-22 08:41:05] DEBUG:    repeat_coverage_rate=10
[2018-02-22 08:41:05] DEBUG:    close_jump_rate=100
[2018-02-22 08:41:05] DEBUG:    far_jump_rate=2
[2018-02-22 08:41:05] DEBUG:    overlap_divergence_rate=5
[2018-02-22 08:41:05] DEBUG:    penalty_window=100
[2018-02-22 08:41:05] DEBUG:    max_coverage_drop_rate=5
[2018-02-22 08:41:05] DEBUG:    chimera_window=100
[2018-02-22 08:41:05] DEBUG:    min_reads_in_contig=4
[2018-02-22 08:41:05] DEBUG:    max_inner_reads=10
[2018-02-22 08:41:05] DEBUG:    max_inner_fraction=0.25
[2018-02-22 08:41:05] DEBUG:    max_separation=500
[2018-02-22 08:41:05] DEBUG:    tip_length_threshold=20000
[2018-02-22 08:41:05] DEBUG:    unique_edge_length=50000
[2018-02-22 08:41:05] DEBUG:    min_repeat_res_support=0.5
[2018-02-22 08:41:05] DEBUG:    out_paths_ratio=5
[2018-02-22 08:41:05] DEBUG:    graph_cov_drop_rate=10
[2018-02-22 08:41:05] DEBUG:    coverage_estimate_window=100
[2018-02-22 08:41:05] DEBUG:    low_cutoff_warning=1
[2018-02-22 08:41:05] DEBUG:    assemble_kmer_sample=1
[2018-02-22 08:41:05] DEBUG:    assemble_gap=500
[2018-02-22 08:41:05] DEBUG:    repeat_graph_kmer_sample=5
[2018-02-22 08:41:05] DEBUG:    repeat_graph_gap=100
[2018-02-22 08:41:05] DEBUG:    repeat_graph_max_kmer=500
[2018-02-22 08:41:05] DEBUG:    read_align_kmer_sample=1
[2018-02-22 08:41:05] DEBUG:    read_align_gap=500
[2018-02-22 08:41:05] DEBUG:    read_align_max_kmer=500
[2018-02-22 08:41:05] INFO: Reading sequences
[2018-02-22 10:17:47] DEBUG: Mean read length: 3639
[2018-02-22 10:17:47] DEBUG: Estimated coverage: 69
[2018-02-22 10:17:47] INFO: Generating solid k-mer index
[2018-02-22 10:17:47] DEBUG: Hard threshold set to 7
[2018-02-22 10:17:47] DEBUG: Started kmer counting
[2018-02-22 10:28:35] INFO: Counting kmers (1/2):
[2018-02-22 10:32:57] INFO: Counting kmers (2/2):
[2018-02-22 10:44:30] DEBUG: Filtered 363871 repetitive kmers
[2018-02-22 10:44:30] DEBUG: Estimated minimum kmer coverage: 10, 206931346 unique kmers selected
[2018-02-22 10:44:30] INFO: Filling index table
[2018-02-22 10:44:38] DEBUG: Solid kmers: 206931346
[2018-02-22 10:44:38] DEBUG: Kmer index size: 6149339332
[2018-02-22 11:02:03] DEBUG: Total chunks 1467 wasted space: 71130
[2018-02-22 11:11:41] INFO: Extending reads
[2018-02-22 11:17:22] DEBUG: Mean read coverage: 53
[2018-02-22 11:23:31] DEBUG: Assembled contig 1

Log-file end

[2018-02-23 01:46:19] DEBUG: Inner: 737088 covered: 1174293 total: 8006938
[2018-02-23 01:47:02] DEBUG: Discarded contig with 7 reads and 2 inner overlaps
[2018-02-23 01:49:04] INFO: Assembled 1496 draft contigs
[2018-02-23 01:49:11] INFO: Generating contig sequences
[2018-02-23 02:12:56] DEBUG: Writing FASTA
-----------End assembly log------------
[2018-02-23 02:13:40] root: INFO: Running Minimap2
[2018-02-23 02:13:40] root: DEBUG: Running: flye-minimap2 /misc/scratch3/jon/MinION/Busselton/ASSEMBLY/Flye/Busselton2_Flye_200m_3000/0-assembly/draft_assembly.fasta /scratch3/jon/MinION/Busselton/Busselton2_180218/TRIMMED_READS/Busselton2_MinION_180221_ALL.chop.fastq -a -Q -w5 -m100 -g10000 --max-chain-skip 25 -t 20 -k15
[2018-02-23 03:17:44] root: DEBUG: Sorting alignment file
[2018-02-23 04:01:37] root: INFO: Computing consensus
[2018-02-25 12:13:37] root: DEBUG: Genome size: 209715200
[2018-02-25 12:13:37] root: DEBUG: Chosen k-mer size: 17
[2018-02-25 12:13:37] root: INFO: Running Flye 2.3-release
[2018-02-25 12:13:37] root: DEBUG: Cmd: /scratch2/software/python-2.7-env/bin/flye --nano-raw /scratch3/jon/MinION/Busselton/Busselton2_180218/TRIMMED_READS/Busselton2_MinION_180221_ALL.chop.fastq --genome-size 200m --out-dir Busselton2_Flye_200m_3000 --threads 20 --min-overlap 3000 --iterations 2 --resume
[2018-02-25 12:13:37] root: INFO: Resuming previous run
[2018-02-25 12:13:37] root: INFO: Running Minimap2
[2018-02-25 12:13:37] root: DEBUG: Running: flye-minimap2 /misc/scratch3/jon/MinION/Busselton/ASSEMBLY/Flye/Busselton2_Flye_200m_3000/0-assembly/draft_assembly.fasta /scratch3/jon/MinION/Busselton/Busselton2_180218/TRIMMED_READS/Busselton2_MinION_180221_ALL.chop.fastq -a -Q -w5 -m100 -g10000 --max-chain-skip 25 -t 20 -k15
[2018-02-25 14:28:55] root: DEBUG: Sorting alignment file
[2018-02-25 16:13:47] root: INFO: Computing consensus

The assembly was running well and produced a draft sequence. Then we had a cluster crash during the consensus step (not specifically related to Flye, I think). I restarted using "--resume". During the consensus run I have received a large number of errors like the two instances shown below. Flye appears to be still running.

[2018-02-25 12:13:37] INFO: Running Flye 2.3-release
[2018-02-25 12:13:37] INFO: Resuming previous run
[2018-02-25 12:13:37] INFO: Running Minimap2
[2018-02-25 16:13:47] INFO: Computing consensus
Process Process-1020:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/scratch2/software/python-2.7-env/local/lib/python2.7/site-packages/flye/consensus.py", line 45, in _thread_worker
    error_queue.put(e)
  File "<string>", line 2, in put
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 755, in _callmethod
    self._connect()
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 742, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 175, in Client
    answer_challenge(c, authkey)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 428, in answer_challenge
    message = connection.recv_bytes(256)         # reject large message
EOFError
Process Process-1023:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/scratch2/software/python-2.7-env/local/lib/python2.7/site-packages/flye/consensus.py", line 45, in _thread_worker
    error_queue.put(e)
  File "<string>", line 2, in put
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 755, in _callmethod
    self._connect()
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 742, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 175, in Client
    answer_challenge(c, authkey)
  File "/usr/lib/python2.7/multiprocessing/connection.py", line 428, in answer_challenge
    message = connection.recv_bytes(256)         # reject large message

When I inspect the processes on the node, there appear to be a large number of flye processes that are not using any resources. I launched with 20 threads.

Tasks: 1211 total,   2 running, 588 sleeping,   0 stopped, 621 zombie
%Cpu(s): 37.4 us,  0.3 sy,  0.0 ni, 62.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  79251590+total, 29567382+used, 49684204+free,    75548 buffers
KiB Swap:  7842748 total,        0 used,  7842748 free. 50032572 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
22028 jon       20   0   25896   4048   2560 R   1.3  0.0   0:01.36 top
 3571 jon       20   0   20836   5868   2752 S   0.0  0.0   0:00.07 bash
 3580 jon       20   0  374792 341216   6100 S   0.0  0.0   3:44.47 flye
 8248 jon       20   0 32.335g 774880   4028 S   0.0  0.1   0:07.17 flye
15421 jon       20   0  374568 338464   3552 S   0.0  0.0   0:00.00 flye
15424 jon       20   0       0      0      0 Z   0.0  0.0   0:00.01 flye
15427 jon       20   0       0      0      0 Z   0.0  0.0   0:00.01 flye
15430 jon       20   0       0      0      0 Z   0.0  0.0   0:00.03 flye
15433 jon       20   0       0      0      0 Z   0.0  0.0   0:00.02 flye
15436 jon       20   0       0      0      0 Z   0.0  0.0   0:00.02 flye
15439 jon       20   0       0      0      0 Z   0.0  0.0   0:00.03 flye
15442 jon       20   0       0      0      0 Z   0.0  0.0   0:00.02 flye
15445 jon       20   0       0      0      0 Z   0.0  0.0   0:00.03 flye
15448 jon       20   0       0      0      0 Z   0.0  0.0   0:00.04 flye
15451 jon       20   0       0      0      0 Z   0.0  0.0   0:00.03 flye
15454 jon       20   0       0      0      0 Z   0.0  0.0   0:00.03 flye
15457 jon       20   0       0      0      0 Z   0.0  0.0   0:00.00 flye
15460 jon       20   0       0      0      0 Z   0.0  0.0   0:00.01 flye
15463 jon       20   0       0      0      0 Z   0.0  0.0   0:00.01 flye
15466 jon       20   0       0      0      0 Z   0.0  0.0   0:00.01 flye
15469 jon       20   0       0      0      0 Z   0.0  0.0   0:00.04 flye
15472 jon       20   0       0      0      0 Z   0.0  0.0   0:00.00 flye
15475 jon       20   0       0      0      0 Z   0.0  0.0   0:00.04 flye
15478 jon       20   0       0      0      0 Z   0.0  0.0   0:00.04 flye
15481 jon       20   0       0      0      0 Z   0.0  0.0   0:00.01 flye
15484 jon       20   0       0      0      0 Z   0.0  0.0   0:00.01 flye
15487 jon       20   0       0      0      0 Z   0.0  0.0   0:00.01 flye
15490 jon       20   0       0      0      0 Z   0.0  0.0   0:00.00 flye
15493 jon       20   0       0      0      0 Z   0.0  0.0   0:00.01 flye

Any ideas on what these errors might mean or if they are benign?

Cheers
Jon
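In the `top` listing above, the idle flye entries are in state Z (zombie): they have already exited and hold no memory or CPU, but their parent has not yet collected their exit status. One plausible reading, not confirmed here, is that these are the workers whose tracebacks appear in the log. The sketch below counts zombie and live processes for a command from the text produced by `ps -eo pid,stat,comm`; the function name and the sample text are illustrative only.

```python
def count_zombies(ps_output, command="flye"):
    """Return (zombies, alive) for a command in `ps -eo pid,stat,comm` text."""
    zombies = 0
    alive = 0
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        _pid, stat, comm = fields
        # Zombies are typically listed as "flye <defunct>".
        if not comm.strip().startswith(command):
            continue
        if stat.startswith("Z"):
            zombies += 1
        else:
            alive += 1
    return zombies, alive


sample = """\
  PID STAT COMMAND
 3580 S    flye
15424 Z    flye <defunct>
15427 Z    flye <defunct>
22028 R+   top
"""
print(count_zombies(sample))
```

A zombie count that keeps growing while the live count stays constant would suggest workers are exiting early rather than finishing their share of the work.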

Single amplicon mode

Hi!

I'd like to try your approach to get a draft assembly for a 3kb amplicon. Most of the reads fully span the region with ~30x. Did you ever try this? It doesn't assemble a contig.

[17:40:19] INFO: Running ABruijn
[17:40:19] INFO: Assembling reads
[17:40:21] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[17:40:21] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[17:40:22] INFO: Building kmer index
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[17:40:22] INFO: Finding overlaps:
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[17:40:22] INFO: Extending reads
[17:40:22] INFO: Assembled 0 contigs
[17:40:22] INFO: Generating contig sequences
[17:40:22] INFO: Polishing genome (1/2)
[17:40:22] INFO: Running BLASR
[17:40:22] ERROR: While running...

The goal would be to create a draft consensus sequence.

Thank you,
Armin

Override min overlap restriction

Hi there,

I'm trying to assemble an older PacBio dataset in which many of the reads are ~1000bp. Flye does not allow the min overlap threshold to be set any lower than 1000bp - I think that this might be why my final assembly has many contigs of roughly this size.

Is there a good reason not to run Flye with a lower limit? And if not, how can I disable the error message?

Cheers,
Adam
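A read shorter than the minimum overlap cannot overlap another read by that length, so it helps to know what share of the dataset falls below the threshold before changing it. The following is a minimal sketch, independent of Flye, that computes read lengths from FASTA lines and the fraction below a cutoff; both helper names are hypothetical.

```python
def read_lengths(fasta_lines):
    """Yield the sequence length of each FASTA record (multi-line records allowed)."""
    length = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def fraction_below(lengths, min_overlap):
    """Fraction of reads strictly shorter than min_overlap."""
    lengths = list(lengths)
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n < min_overlap) / len(lengths)


example = [">r1", "ACGT", "ACGT", ">r2", "AC"]
print(list(read_lengths(example)))
print(fraction_below(read_lengths(example), 5))
```

For a real file, pass an open file handle instead of the `example` list, for instance `fraction_below(read_lengths(open("reads.fasta")), 1000)`, where the file name is a placeholder.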

Using Flye for Athena_meta

Hello,

I am trying to get Athena_meta to work, and part of the pipeline involves flye.
I have tried using version 2.3.4 and 2.3.1 but I always get the same error related to flye-polish.
Can you help me identify a solution for this issue?

launching Flye OLC assembly
cmd flye --subassemblies ./results/olc/flye-input-contigs.fa --out-dir ./results/olc/flye-asm-1 --genome-size 1857551 --threads 4 --min-overlap 1000
Traceback (most recent call last):
  File "/usr/local/devel/BCIS/kevin/Flye-2.3.1/bin/flye", line 31, in <module>
    sys.exit(main())
  File "/usr/local/devel/BCIS/kevin/Flye-2.3.1/flye/main.py", line 513, in main
    pol.check_binaries()
  File "/usr/local/devel/BCIS/kevin/Flye-2.3.1/flye/polish.py", line 41, in check_binaries
    raise PolishException(str(e))
flye.polish.PolishException: Command '['flye-polish', '-h']' returned non-zero exit status 1

Thanks in Advance,
Kevin N.

Update of install docs

Hi, just started here.
I think there are a couple of errors in the install docs

First, to build ABruijn, run:

python install.py build
I used
python setup.py build

ABruijn could be invoked with the following command:

bin/abruijn
This then worked, I haven't tested out the full algorithm with test or my data yet though

Additonally, you may install the package for the better OS integration:

python setup.pu install

Correction
python setup.py install

Any plans for supporting gfa format for assembly graphs?

Hello all,

I've been trying to assemble some small microbes with flye and so far the results look very encouraging!

I'm used to looking at assembly graphs with Bandage. Do you have any plans for providing gfa-formatted assembly graphs in the near future?

Cheers,
~Lina
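For reference, GFA 1 is a tab-separated text format that Bandage can open: an `H` header line, one `S` line per segment (name and sequence), and one `L` line per link (two segment names with orientations and an overlap CIGAR). The sketch below writes a minimal GFA 1 file from a hypothetical set of segments and links; it does not read Flye's graph files, and the segment names are invented.

```python
def to_gfa1(segments, links):
    """Serialise segments and links as GFA 1 text.

    segments: dict mapping segment name -> sequence
    links:    list of (from_name, from_orient, to_name, to_orient),
              with orientations given as '+' or '-'
    """
    lines = ["H\tVN:Z:1.0"]
    for name, sequence in segments.items():
        lines.append("S\t%s\t%s" % (name, sequence))
    for src, src_orient, dst, dst_orient in links:
        # "0M" records a link with no overlap between the two segments.
        lines.append("L\t%s\t%s\t%s\t%s\t0M" % (src, src_orient, dst, dst_orient))
    return "\n".join(lines) + "\n"


gfa_text = to_gfa1(
    {"edge_1": "ACGT", "edge_2": "GGCC"},
    [("edge_1", "+", "edge_2", "-")],
)
print(gfa_text, end="")
```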

Opening assembly graph in gephi?

Hi!

When I try to open the assembly_graph.dot with gephi, I get this error below. It seems like gephi supports this file format, any ideas what might be going wrong here? Suggestions for a different viewer?

Thanks!
Lizzy

java.lang.IllegalArgumentException: The id can't be empty
at org.gephi.io.importer.impl.ImportContainerImpl.checkId(ImportContainerImpl.java:1045)
at org.gephi.io.importer.impl.ImportContainerImpl.nodeExists(ImportContainerImpl.java:209)
at org.gephi.io.importer.plugin.file.ImporterDOT.getOrCreateNode(ImporterDOT.java:197)
at org.gephi.io.importer.plugin.file.ImporterDOT.stmt(ImporterDOT.java:181)
at org.gephi.io.importer.plugin.file.ImporterDOT.stmtList(ImporterDOT.java:161)
at org.gephi.io.importer.plugin.file.ImporterDOT.graph(ImporterDOT.java:149)
at org.gephi.io.importer.plugin.file.ImporterDOT.importData(ImporterDOT.java:105)
at org.gephi.io.importer.plugin.file.ImporterDOT.execute(ImporterDOT.java:87)
Caused: java.lang.RuntimeException
at org.gephi.io.importer.plugin.file.ImporterDOT.execute(ImporterDOT.java:89)
at org.gephi.io.importer.impl.ImportControllerImpl.importFile(ImportControllerImpl.java:199)
at org.gephi.io.importer.impl.ImportControllerImpl.importFile(ImportControllerImpl.java:169)
at org.gephi.desktop.importer.DesktopImportControllerUI$4.run(DesktopImportControllerUI.java:341)
Caused: java.lang.RuntimeException
at org.gephi.desktop.importer.DesktopImportControllerUI$4.run(DesktopImportControllerUI.java:349)
[catch] at org.gephi.utils.longtask.api.LongTaskExecutor$RunningLongTask.run(LongTaskExecutor.java:274)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Flye extremely slow at generating contigs

I have a question about the following step in Flye. I am running Flye and it gets to this point:

[2018-07-31 06:48:39] INFO: Performing repeat analysis
[2018-07-31 06:48:39] INFO: Reading sequences
[2018-07-31 06:54:55] INFO: Building repeat graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-07-31 07:01:59] INFO: Sequence divergence stats: Q25 = 0.028, Q50 = 0.055, Q75 = 0.11
[2018-07-31 07:04:44] INFO: Aligning reads to the graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-07-31 07:58:56] INFO: Aligned read sequence: 16147937116 / 28135562892 (0.573933)
[2018-07-31 07:58:56] INFO: Sequence divergence stats: Q25 = 0.011, Q50 = 0.028, Q75 = 0.077
[2018-07-31 07:59:14] INFO: Mean edge coverage: 11
[2018-07-31 08:20:16] INFO: Resolving repeats
[2018-07-31 10:23:18] INFO: Generating contigs
[2018-07-31 12:26:35] INFO: Generated 43861 contigs

And then it doesn't produce any output and stays like this for 24 hrs. I have run strace on the PID and it does look like it's doing something on one thread. Is this step normally slow?

ERROR: parse error in 1-consensus/consensus.fasta on line 1: empty sequence

Hi again,

Because of the memory issue, I extracted the longest 50X reads using SelectLongestReads to run Flye. But there is another problem now:

[2018-05-09 10:40:18] INFO: Running Flye 2.3.3-g47cdd0b
[2018-05-09 10:40:18] INFO: Assembling reads
[2018-05-09 10:40:18] INFO: Running with k-mer size: 17
[2018-05-09 10:40:18] INFO: Reading sequences
[2018-05-09 11:19:00] INFO: Reads N50/90: 23770 / 18657
[2018-05-09 11:19:02] INFO: Selected minimum overlap 5000
[2018-05-09 11:19:04] INFO: Expected read coverage: 46
[2018-05-09 11:19:04] INFO: Generating solid k-mer index
[2018-05-09 11:19:28] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2018-05-09 11:33:48] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2018-05-09 13:04:42] INFO: Filling index table
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2018-05-09 15:40:36] INFO: Extending reads
[2018-05-09 16:12:50] INFO: Overlap-based coverage: 14
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2018-05-15 23:03:43] INFO: Assembled 9386 draft contigs
[2018-05-15 23:05:10] INFO: Generating contig sequences
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2018-05-16 01:04:56] INFO: Running Minimap2
[2018-05-16 04:55:10] INFO: Computing consensus
[2018-05-16 05:45:08] INFO: Alignment error rate: 0.0
[2018-05-16 05:45:08] INFO: Performing repeat analysis
[2018-05-16 05:45:09] INFO: Reading sequences
[2018-05-16 05:45:09] ERROR: parse error in /parastor300/niuyw/Project/Goqi_genome_180207/flye/run1.1/1-consensus/consensus.fasta on line 1: empty sequence
[2018-05-16 05:45:09] ERROR: Command '['flye-repeat', '-l', '/parastor300/niuyw/Project/Goqi_genome_180207/flye/run1.1/flye.log', '-t', '40', '-g', '/parastor300/niuyw/Project/Goqi_genome_180207/flye/run1.1/1-consensus/consensus.fasta', '/home/zhangll/Tasks/Gouqi/data/Pacbio/Pacbio_50x.fasta', '/parastor300/niuyw/Project/Goqi_genome_180207/flye/run1.1/2-repeat', '2147483648', '/home/niuyw/software/Flye-2.3.3/flye/resource/asm_raw_reads.cfg']' returned non-zero exit status 1

cmdline: flye --pacbio-raw Pacbio_50x.fasta --out-dir run1.1 --genome-size 2g --threads 40

Thank you in advance!
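The parse error points at a record in 1-consensus/consensus.fasta that has a header but no sequence. A quick way to see how many records are affected is to scan the file for empty entries, as in the sketch below. The helper is hypothetical and independent of Flye, and it does not explain why the consensus step produced an empty record.

```python
def find_empty_records(fasta_lines):
    """Return the headers of FASTA records that contain no sequence characters."""
    empty = []
    header = None
    seq_len = 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if header is not None and seq_len == 0:
                empty.append(header)
            header, seq_len = line[1:], 0
        else:
            seq_len += len(line)
    if header is not None and seq_len == 0:
        empty.append(header)
    return empty


example = [">contig_1", "", ">contig_2", "ACGT", ">contig_3"]
print(find_empty_records(example))
```

If every record comes back empty, the problem most likely lies upstream of the repeat stage, for example in the alignment or consensus step, rather than in the FASTA parser.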

Future support for BAM-formatted subreads

Hi guys, thanks for the fantastic piece of software, it works beautifully. I am just wondering if you plan to add support for subreads in the PacBio BAM format? Lots of information contained in the BAM file could probably be used to help improve the assembly.

Why the decision to use minimap2 as replacement for BLASR?

I am wondering why you decided to use minimap2 as a replacement for BLASR. I know that BLASR is designed for PacBio data, so it is not the right choice for ONP data. I am also aware that minimap2 is super fast.

However, can you elaborate a bit on why you did not choose GraphMap as a replacement? Isn't it more sensitive than minimap2?

It might even be an option to incorporate GraphMap as well and let the user choose? :-) Though this means quite some work on your side, I guess...

Polishing fails on ppc64le

Hi,

Thanks for developing this great software!

I have encountered an error in the polishing step when running ABruijn on a mixed/metagenomic 1D Nanopore dataset (many organisms with varying coverage). I have assembled similar datasets before without major issues. Oddly enough, ABruijn assembles the data and manages to polish a first iteration, but it then fails in the second iteration with the following error message. The requisite files appear to be present (i.e. bubbles_2.fasta). I am not sure what is going on here. If you have any suggestions as to what has gone wrong and how I could avoid it in the future, that would be great!

[13:34:45] INFO: Polishing genome (1/2)
[13:34:50] INFO: Running BLASR
[14:37:51] INFO: Separating draft genome into bubbles
[16:38:52] INFO: Correcting bubbles
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[15:25:21] INFO: Polishing genome (2/2)
[15:25:30] INFO: Running BLASR
[16:27:39] INFO: Separating draft genome into bubbles
[19:09:34] INFO: Correcting bubbles
0% [19:48:05] ERROR: Error: Error while running polish binary: Command '['abruijn-polish', '-t', '16', '/scratch2/jon/MinION/BMAN/assemblies/abruijn/BMAN_Abruijn/bubbles_2.fasta', '/scratch2/software/Python-2.7.13/lib/python2.7/site-packages/abruijn/resource/nano_substitutions.mat', '/scratch2/software/Python-2.7.13/lib/python2.7/site-packages/abruijn/resource/nano_homopolymers.mat', '/scratch2/jon/MinION/BMAN/assemblies/abruijn/BMAN_Abruijn/consensus_2.fasta']' returned non-zero exit status -11

segfault in flye-repeat

During assembly of a low-quality old (2012) dataset:

input: 4 FASTQ files available from the SRA:

SRR497965
SRR497966 
SRR497967
SRR497968

end of log:

[2018-01-05 14:29:03] INFO: Aligning reads to the graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-01-05 14:33:57] ERROR: Segmentation fault! Backtrace:
[2018-01-05 14:33:57] ERROR:    flye-repeat(_Z15segfaultHandleri+0x36) [0x47efd6]
[2018-01-05 14:33:57] ERROR:    /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7f07f7fc64b0]
[2018-01-05 14:33:57] ERROR:    flye-repeat(_Z3q75IiET_RSt6vectorIS0_SaIS0_EE+0x102) [0x434082]
[2018-01-05 14:33:57] ERROR:    flye-repeat(_ZN19MultiplicityInferer16estimateCoverageEv+0x1375) [0x431355]
[2018-01-05 14:33:57] ERROR:    flye-repeat(main+0xb91) [0x429f81]
[2018-01-05 14:33:57] ERROR:    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f07f7fb1830]
[2018-01-05 14:33:57] ERROR:    flye-repeat(_start+0x29) [0x42b059]
[2018-01-05 14:33:57] ERROR: Command '['flye-repeat', '-k', '15', '-l', '[..]/flye.log', '-t', '10', '-v', '5000', '-g', '[..]/fly$
_assembly/1-consensus/consensus.fasta', '../SRR497965.fastq,../SRR497966.fastq,../SRR497967.fastq,../SRR497968.fastq', '[..]/Flye/flye_assembly/2-repeat', '[..]/flye/resource/asm_raw_reads.cfg']' returned non-zero exit status 1

cmdline: time bin/flye --threads 10 --pacbio-raw ../SRR*.fastq -g 4M -o flye_assembly

Getting started...

Hi,

This is an exciting tool - thanks for developing it. I am eager to get it up and running.

I installed and tried a simple test with 37X coverage of simulated lambda reads. All are 15000 bp long and have no errors. I cannot seem to get abruijn.py to run.

If I try:

abruijn.py reads.fasta out 37

I get:

[16:52:16] INFO: Running ABruijn
[16:52:16] INFO: Assembling reads
[16:52:16] INFO: Indexing kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[16:52:21] INFO: Indexing kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[16:52:23] WARNING: Unable to choose minimum kmer count cutoff. Check if the coverage parameter is correct. Running with default parameter t = 4
[16:52:23] INFO: Building read index
[16:52:23] INFO: Finding overlaps:
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[16:52:41] INFO: Extending reads
[16:52:41] INFO: Assembled 0 contigs
[16:52:41] INFO: Generating contig sequences
[16:52:41] INFO: Running Blasr
ERROR, Fail to load FASTA file /gpfs/scratch/jurban/male-ilmn/lambda/abruijn/out/draft_assembly.fasta to virtual memory.
[16:52:41] ERROR: While running blasr: Command '['blasr', 'reads.fasta', '/gpfs/scratch/jurban/male-ilmn/lambda/abruijn/out/draft_assembly.fasta', '-bestn', '1', '-minMatch', '15', '-maxMatch', '25', '-m', '5', '-nproc', '1', '-out', '/gpfs/scratch/jurban/male-ilmn/lambda/abruijn/out/alignment.m5']' returned non-zero exit status 1
[16:52:41] ERROR: Error: Error in alignment module, exiting

Note that draft_assembly.fasta is an empty file.

I saw the warning: WARNING: Unable to choose minimum kmer count cutoff. Check if the coverage parameter is correct. Running with default parameter t = 4

So I tried adding in a minimum cutoff instead of default auto, but when I add any arguments it gives another error:

$ abruijn.py reads.fasta out 37 -m 10
[16:49:22] INFO: Running ABruijn
[16:49:22] INFO: Assembling reads
Traceback (most recent call last):
  File "/users/jurban/software/abruijn/ABruijn/abruijn.py", line 35, in <module>
    sys.exit(main())
  File "/gpfs_home/jurban/software/abruijn/ABruijn/abruijn/main.py", line 102, in main
    run(args)
  File "/gpfs_home/jurban/software/abruijn/ABruijn/abruijn/main.py", line 41, in run
    args.max_cov, args.coverage, args.debug, log_file)
  File "/gpfs_home/jurban/software/abruijn/ABruijn/abruijn/assemble.py", line 48, in assemble
    subprocess.check_call(cmdline)
  File "/gpfs/runtime/opt/python/2.7.3/lib/python2.7/subprocess.py", line 506, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/gpfs/runtime/opt/python/2.7.3/lib/python2.7/subprocess.py", line 493, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/gpfs/runtime/opt/python/2.7.3/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/gpfs/runtime/opt/python/2.7.3/lib/python2.7/subprocess.py", line 1249, in _execute_child
    raise child_exception
TypeError: execv() arg 2 must contain only strings

Any advice to help me get up and running would be appreciated.

best,
John
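
The `TypeError: execv() arg 2 must contain only strings` in the traceback above is what Python 2 raises when the argument list given to `subprocess` contains a non-string item. One plausible cause, which is an assumption and not confirmed in this report, is that the value of `-m` is parsed as an integer and placed in the command line unconverted. A minimal sketch of the usual workaround, with a hypothetical helper name `to_argv`:

```python
import subprocess
import sys


def to_argv(parts):
    """Convert every command-line item to a string before calling subprocess."""
    return [str(part) for part in parts]


# The integer 10 stands in for a numeric option value such as "-m 10".
# The child process here is just the Python interpreter echoing its arguments.
cmdline = to_argv([sys.executable, "-c", "import sys; print(sys.argv[1:])", "-m", 10])
subprocess.check_call(cmdline)
```

Without the conversion, the call fails before the child process is even started, which matches the traceback ending inside `_execute_child`.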

Segmentation fault during chimera detection

Hi,
I tried using ABruijn with Oxford Nanopore reads but I get a segmentation fault error during the chimera detection phase.

Here is the command line I used:
python abruijn.py $(pwd)/BAM_10X.fasta BAM_10X

And here is the log:

Running ABruijn
Assembling reads
[10:50:02] Reading FASTA
[10:50:05] Building kmer index
[10:50:05] First pass:
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[10:55:39] Second pass:
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[10:56:33] Trimming index
[10:56:36] Building read index
[10:56:37] Finding overlaps
[10:56:46] Detecting chimeric sequences
Error:  Error in assemble binary:
Command '['abruijn-assemble', '/env/cns/bigtmp1/ONT/ABruijn/BAM_10X.fasta', '/env/export/nfs6/bigtmp1/ONT/ABruijn/BAM_10X/read_edges.fasta']' returned non-zero exit status -11

If I run the faulty command manually, I get a segmentation fault.

abruijn-assemble /env/cns/bigtmp1/ONT/ABruijn/BAM_10X.fasta /env/export/nfs6/bigtmp1/ONT/ABruijn/BAM_10X/read_edges.fasta

Moreover, the output directory is empty.

Thanks for your help,
Benjamin
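
The "exit status -11" in the log above is how Python's `subprocess` module reports, on POSIX systems, that the child process was terminated by signal 11. On Linux that is SIGSEGV, which agrees with the segmentation fault seen when running the command by hand. A small sketch for decoding such return codes; the function name `describe_exit_status` is hypothetical:

```python
import signal


def describe_exit_status(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # Negative values mean "terminated by signal -returncode" (POSIX only).
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


print(describe_exit_status(-11))
print(describe_exit_status(1))
```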

ERROR: Caught unhandled exception: std::bad_alloc in both 2.3.2 and 2.3.3

Hi, I got these error messages when using version 2.3.2 and version 2.3.3.

The genome is about 2G, and default parameters were used.
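
In the logs below, the genome size is passed to flye-assemble as a plain number, for example '2147483648', which equals 2 × 1024³ exactly. That suggests the size suffix was expanded with binary multipliers in these runs. The sketch below only reproduces that arithmetic; it is inferred from the logged numbers, the name `parse_genome_size` is hypothetical, and it is not taken from Flye's source.

```python
# Assumed binary multipliers, inferred from the logged value 2147483648.
_MULTIPLIERS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_genome_size(text):
    """Expand a size such as '2g' or '2100m' into a number of bases."""
    text = text.strip().lower()
    if text and text[-1] in _MULTIPLIERS:
        return int(float(text[:-1]) * _MULTIPLIERS[text[-1]])
    return int(text)


print(parse_genome_size("2g"))
print(parse_genome_size("2100m"))
```

Under the same reading, the '2202009600' in the 2.3.2 log would correspond to a size of 2100m.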

version 2.3.2

[2018-04-18 18:38:40] INFO: Running Flye 2.3.2-release
[2018-04-18 18:38:40] INFO: Assembling reads
[2018-04-18 18:38:40] INFO: Reading sequences
[2018-04-18 21:39:09] INFO: Generating solid k-mer index
[2018-04-18 21:39:35] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2018-04-18 22:28:15] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2018-04-19 00:03:13] INFO: Filling index table
[2018-04-19 02:32:35] ERROR: Caught unhandled exception: std::bad_alloc
[2018-04-19 02:32:35] ERROR: 	flye-assemble(_Z16exceptionHandlerv+0xd0) [0x42f4a0]
[2018-04-19 02:32:35] ERROR: 	/home/software/gcc-4.9.3/lib64/libstdc++.so.6(+0x5e0e6) [0x2b08dbac10e6]
[2018-04-19 02:32:35] ERROR: 	/home/software/gcc-4.9.3/lib64/libstdc++.so.6(+0x5e131) [0x2b08dbac1131]
[2018-04-19 02:32:35] ERROR: 	/home/software/gcc-4.9.3/lib64/libstdc++.so.6(+0x5e349) [0x2b08dbac1349]
[2018-04-19 02:32:35] ERROR: 	/home/software/gcc-4.9.3/lib64/libstdc++.so.6(+0x5e869) [0x2b08dbac1869]
[2018-04-19 02:32:35] ERROR: 	/home/software/gcc-4.9.3/lib64/libstdc++.so.6(_Znam+0x9) [0x2b08dbac18c9]
[2018-04-19 02:32:35] ERROR: 	flye-assemble(_ZN11VertexIndex10buildIndexEii+0x9c6) [0x44f3d6]
[2018-04-19 02:32:35] ERROR: 	flye-assemble(main+0xaf8) [0x434378]
[2018-04-19 02:32:35] ERROR: 	/lib64/libc.so.6(__libc_start_main+0xfd) [0x3fbbe1ed5d]
[2018-04-19 02:32:35] ERROR: 	flye-assemble() [0x41d275]
[2018-04-19 02:32:57] ERROR: Command '['flye-assemble', '-l', '/home/zhangll/Tasks/Gouqi/Third_assembl/Flye/flye.log', '-t', '16', '-v', '5000', '/home/zhangll/Tasks/Gouqi/data/Pacbio/all.fasta', '/home/zhangll/Tasks/Gouqi/Third_assembl/Flye/0-assembly/draft_assembly.fasta', '2202009600', '/home/niuyw/software/Flye/flye/resource/asm_raw_reads.cfg']' returned non-zero exit status 1

version 2.3.3

[2018-04-28 05:04:03] INFO: Running Flye 2.3.3-g47cdd0b
[2018-04-28 05:04:03] INFO: Assembling reads
[2018-04-28 05:04:03] INFO: Running with k-mer size: 17
[2018-04-28 05:04:03] INFO: Reading sequences
[2018-04-28 05:59:54] ERROR: parse error in /parastor300/niuyw/Project/Goqi_genome_180207/data/Pacbio/all.fq.gz on line 37943506: Fastq fromat error
[2018-04-28 05:59:58] ERROR: Command '['flye-assemble', '-l', '/parastor300/niuyw/Project/Goqi_genome_180207/flye/run1/flye.log', '-t', '30', '/parastor300/niuyw/Project/Goqi_genome_180207/data/Pacbio/all.fq.gz', '/parastor300/niuyw/Project/Goqi_genome_180207/flye/run1/0-assembly/draft_assembly.fasta', '2147483648', '/home/niuyw/software/Flye-2.3.3/flye/resource/asm_raw_reads.cfg']' returned non-zero exit status 1
Finish time is 2018/04/28--05:59
niuyw@admin:/parastor300/niuyw/Project/Goqi_genome_180207/flye/run2$ cat ../flye.g.run1.e55155 
Start time is 2018/04/28--15:49
[2018-04-28 15:49:49] INFO: Running Flye 2.3.3-g47cdd0b
[2018-04-28 15:49:49] INFO: Assembling reads
[2018-04-28 15:49:49] INFO: Running with k-mer size: 17
[2018-04-28 15:49:49] INFO: Reading sequences
[2018-04-28 18:05:07] INFO: Reads N50/90: 16659 / 5780
[2018-04-28 18:05:23] INFO: Selected minimum overlap 5000
[2018-04-28 18:05:35] INFO: Expected read coverage: 102
[2018-04-28 18:05:35] INFO: Generating solid k-mer index
[2018-04-28 18:08:19] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2018-04-28 18:33:35] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2018-05-01 21:49:37] INFO: Filling index table
[2018-05-04 12:43:47] ERROR: Caught unhandled exception: std::bad_alloc
[2018-05-04 12:43:47] ERROR: 	flye-assemble(_Z16exceptionHandlerv+0xd0) [0x431590]
[2018-05-04 12:43:47] ERROR: 	/home/software/gcc-4.9.3/lib64/libstdc++.so.6(+0x5e0e6) [0x2b52528950e6]
[2018-05-04 12:43:47] ERROR: 	/home/software/gcc-4.9.3/lib64/libstdc++.so.6(+0x5e131) [0x2b5252895131]
[2018-05-04 12:43:47] ERROR: 	/home/software/gcc-4.9.3/lib64/libstdc++.so.6(+0x5e349) [0x2b5252895349]
[2018-05-04 12:43:47] ERROR: 	/home/software/gcc-4.9.3/lib64/libstdc++.so.6(+0x5e869) [0x2b5252895869]
[2018-05-04 12:43:47] ERROR: 	/home/software/gcc-4.9.3/lib64/libstdc++.so.6(_Znam+0x9) [0x2b52528958c9]
[2018-05-04 12:43:47] ERROR: 	flye-assemble(_ZN11VertexIndex10buildIndexEii+0x9c6) [0x452056]
[2018-05-04 12:43:47] ERROR: 	flye-assemble(main+0xbe5) [0x436595]
[2018-05-04 12:43:47] ERROR: 	/lib64/libc.so.6(__libc_start_main+0xfd) [0x3fbbe1ed5d]
[2018-05-04 12:43:47] ERROR: 	flye-assemble() [0x41dbc5]
[2018-05-04 12:46:17] ERROR: Command '['flye-assemble', '-l', '/parastor300/niuyw/Project/Goqi_genome_180207/flye/run1/flye.log', '-t', '40', '/parastor300/niuyw/Project/Goqi_genome_180207/data/Pacbio/all.fasta', '/parastor300/niuyw/Project/Goqi_genome_180207/flye/run1/0-assembly/draft_assembly.fasta', '2147483648', '/home/niuyw/software/Flye-2.3.3/flye/resource/asm_raw_reads.cfg']' returned non-zero exit status 1

BTW, I also ran Flye 2.3.3 on the Canu-corrected reads, and it ran successfully. Here are the logs in case they are useful.

[2018-04-28 05:15:35] INFO: Running Flye 2.3.3-g47cdd0b
[2018-04-28 05:15:35] INFO: Assembling reads
[2018-04-28 05:15:36] INFO: Running with k-mer size: 17
[2018-04-28 05:15:36] INFO: Reading sequences
[2018-04-28 05:49:20] INFO: Reads N50/90: 22994 / 18323
[2018-04-28 05:49:22] INFO: Selected minimum overlap 5000
[2018-04-28 05:49:24] INFO: Expected read coverage: 34
[2018-04-28 05:49:24] INFO: Generating solid k-mer index
[2018-04-28 05:49:47] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2018-04-28 05:55:09] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2018-04-28 08:55:36] INFO: Filling index table
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2018-04-28 17:32:09] INFO: Extending reads
[2018-04-28 18:19:00] INFO: Overlap-based coverage: 20
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2018-05-03 00:13:25] INFO: Assembled 6725 draft contigs
[2018-05-03 00:13:57] INFO: Generating contig sequences
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2018-05-03 01:03:32] INFO: Running Minimap2
[2018-05-03 10:15:46] INFO: Computing consensus
[2018-05-03 11:18:10] INFO: Alignment error rate: 0.0299390805236
[2018-05-03 11:18:34] INFO: Performing repeat analysis
[2018-05-03 11:18:35] INFO: Reading sequences
[2018-05-03 11:50:14] INFO: Building repeat graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2018-05-03 18:02:10] INFO: Aligning reads to the graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2018-05-04 02:18:05] INFO: Aligned sequence: 137844577112 / 149133911062 (0.924301)
[2018-05-04 02:18:34] INFO: Mean edge coverage: 38
[2018-05-04 02:20:09] INFO: Resolving repeats
[2018-05-04 11:02:04] INFO: Generating contigs
[2018-05-04 12:05:35] INFO: Generated 17311 contigs
[2018-05-04 14:08:03] INFO: Polishing genome (1/1)
[2018-05-04 14:08:03] INFO: Running Minimap2
[2018-05-04 21:32:38] INFO: Separating alignment into bubbles
[2018-05-05 03:50:13] INFO: Alignment error rate: 0.0230640593152
[2018-05-05 03:50:14] INFO: Correcting bubbles
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2018-05-05 08:18:15] INFO: Assembly statistics:

	Total length:	1886554189
	Contigs:	13177
	Scaffolds:	13049
	Scaffolds N50:	315687
	Largest scf:	2690265
	Mean coverage:	34

[2018-05-05 08:18:15] INFO: Final assembly: /parastor300/niuyw/Project/Goqi_genome_180207/flye/run2/scaffolds.fasta

Do you know what could have caused it? Thanks in advance!

Bests,
Yiwei Niu

Automatic expansion triggered when load factor was below minimum threshold

Hi there, I got this error message. Any ideas about what could have caused it?

[2018-02-22 08:36:13] INFO: Running Flye 2.3.2-gd46edb7
[2018-02-22 08:36:13] INFO: Assembling reads
[2018-02-22 08:36:13] INFO: Reading sequences
[2018-02-22 08:44:22] INFO: Generating solid k-mer index
[2018-02-22 08:46:32] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-02-22 08:54:26] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-02-22 09:21:11] INFO: Filling index table
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-02-22 10:00:35] INFO: Extending reads
0% [2018-02-22 10:14:55] ERROR: Caught unhandled exception: Automatic expansion triggered when load factor was below minimum threshold
[2018-02-22 10:14:55] ERROR: flye-assemble(_Z16exceptionHandlerv+0x2d) [0x43c73d]
[2018-02-22 10:14:55] ERROR: /usr/lib64/libstdc++.so.6(+0x96706) [0x2aaaab277706]
[2018-02-22 10:14:55] ERROR: /usr/lib64/libstdc++.so.6(+0x96751) [0x2aaaab277751]
[2018-02-22 10:14:55] ERROR: /usr/lib64/libstdc++.so.6(+0xc1708) [0x2aaaab2a2708]
[2018-02-22 10:14:55] ERROR: /lib64/libpthread.so.0(+0x8744) [0x2aaaab789744]
[2018-02-22 10:14:55] ERROR: /lib64/libc.so.6(clone+0x6d) [0x2aaaaba87aad]
[2018-02-22 10:15:15] ERROR: Command '['flye-assemble', '-l', '/flush1/esc003/Flye_cynegetis_assembly/flye/flye.log', '-t', '20', '-v', '5000', '/flush2/esc003/Pacbio_subreads_smartbellremoved.fasta', '/flush1/esc003/Flye_cynegetis_assembly/flye/0-assembly/draft_assembly.fasta', '576716800', '/data/esc003/apps/Flye/flye/resource/asm_raw_reads.cfg']' returned non-zero exit status 1

selected kmers = 0

Hi, I recently attempted to assemble a human genome with raw read coverage of 30X. I got an error in the BLASR step, and the log file shows that it happened because 0 k-mers were selected to build the index:

[2017-11-12 07:52:55] root: INFO: Running ABruijn
[2017-11-12 07:52:55] root: DEBUG: Estimated genome size: 3342172801
[2017-11-12 07:52:55] root: DEBUG: Chosen k-mer size: 17
[2017-11-12 07:52:55] root: INFO: Assembling reads
[2017-11-12 07:52:55] root: DEBUG: -----Begin assembly log------
[2017-11-12 07:52:55] DEBUG: Build date: Nov  8 2017 12:30:29
[2017-11-12 07:52:55] INFO: Reading sequences
[2017-11-12 08:10:16] DEBUG: Mean read length: 5658
[2017-11-12 08:10:16] INFO: Generating solid k-mer index
[2017-11-12 08:10:16] DEBUG: Hard threshold set to 2
[2017-11-12 08:10:16] DEBUG: Started kmer counting
[2017-11-12 08:10:24] INFO: Counting kmers (1/2):
[2017-11-12 11:48:22] INFO: Counting kmers (2/2):
[2017-11-12 12:08:29] DEBUG: Genome size estimate: -980663202
[2017-11-12 12:08:29] DEBUG: Filtered 10646768 repetitive kmers
[2017-11-12 12:08:29] DEBUG: Estimated minimum kmer coverage: 251, 0 unique kmers selected
[2017-11-12 12:08:29] INFO: Filling index table
[2017-11-12 12:10:12] DEBUG: Kmer index size: 0
[2017-11-12 12:22:17] INFO: Extending reads
[2017-11-12 13:54:03] INFO: Assembled 0 draft contigs
[2017-11-12 13:54:03] INFO: Generating contig sequences
[2017-11-12 13:54:03] DEBUG: Writing FASTA
-----------End assembly log------------
[2017-11-12 13:55:10] root: INFO: Running BLASR
[2017-11-12 13:55:11] root: ERROR: Command '['blasr', '/data/Bioinfo/bioinfo-proj-jmontenegro/DENOVO/Human/Data/Merged/simon.fastq', '/data/Bioinfo/bioinfo-proj-jmontenegro/DENOVO/Human/Results/Assembly/Abruijn/blasr_ref_0.fasta', '--bestn', '1', '--minMatch', '15', '--maxMatch', '20', '-m', '5', '--nproc', '128', '--out', '/data/Bioinfo/bioinfo-proj-jmontenegro/DENOVO/Human/Results/Assembly/Abruijn/blasr_0.m5', '--advanceHalf', '--advanceExactMatches', '10', '--fastSDP', '--aggressiveIntervalCut']' returned non-zero exit status 1

The original command was as follows:
abruijn -t 128 -i 5 -p pacbio -o 2000 /data/Bioinfo/bioinfo-proj-jmontenegro/DENOVO/Human/Data/Merged/simon.fastq /data/Bioinfo/bioinfo-proj-jmontenegro/DENOVO/Human/Results/Assembly/Abruijn 25

So I did not specify a k-mer size and the program chose k=17 automatically, but it could not find any unique k-mers. Should I manually increase the k-mer size to try to find unique k-mers?
I look forward to hearing back from you.
Best regards,

Retrieving reads that correspond to an edge in the graph

Given an edge ID, we are trying to retrieve the reads for that edge. The documentation does not describe how to do this.

Grepping for the ID didn't turn up anything.

Could you please add this to the docs?

MinOverlap value question

I am running ABruijn on nanopore reads with an average length of 2,000 bp. Since the minimum overlap length in ABruijn must be within the [3000, 10000] range, no overlaps were found for my reads, and "blasr" failed. I modified the code to accept a lower overlap, and then ABruijn worked fine. That said, I was wondering whether there is a reason why ABruijn requires such a high overlap value?
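
An overlap between two reads cannot be longer than the shorter of the two, so any read shorter than the minimum overlap can never take part in an accepted overlap. The sketch below illustrates this with made-up read lengths around 2,000 bp; the function name and the numbers are hypothetical and are not taken from the dataset in question.

```python
def fraction_long_enough(read_lengths, min_overlap):
    """Fraction of reads that are at least as long as the minimum overlap."""
    if not read_lengths:
        return 0.0
    usable = sum(1 for length in read_lengths if length >= min_overlap)
    return usable / len(read_lengths)


# Made-up read lengths with a mean close to 2,000 bp.
lengths = [1500, 1800, 2000, 2200, 2500, 3500]
for cutoff in (1000, 3000):
    share = fraction_long_enough(lengths, cutoff)
    print("min overlap %d: %.0f%% of reads can overlap" % (cutoff, 100 * share))
```

With a cutoff of 3000, only the single 3,500 bp read in this toy set remains usable, which is consistent with no overlaps being found for short reads.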

Error when installing Abruijn on SLC6

Hi,

I tried to install ABruijn on my SLC6 distribution but I get an error.

Before launching the install commands, I did:

scl enable devtoolset-3 bash
export PATH=/cm/shared/apps/miniconda2/bin/:/cm/shared/apps/pitchfork/deployment/bin/:$PATH
export LD_LIBRARY_PATH=/cm/shared/apps/pitchfork/deployment/lib:/usr/lib/:$LD_LIBRARY_PATH

So, thanks to these paths, I have:

python --version
Python 2.7.12 :: Continuum Analytics, Inc.

cmake --version
cmake version 3.4.1

make --version
GNU Make 3.81

gcc --version
gcc (GCC) 4.9.1 20140922 (Red Hat 4.9.1-10)

blasr --version
blasr 5.3.

Then, I launch the first installation command:

python setup.py build

And I get :

running build
make release -C /cm/shared/apps/ABruijn/assemble
make[1]: Entering directory `/cm/shared/apps/ABruijn/assemble'
g++ -c -I/cm/shared/apps/ABruijn/libcuckoo -I/cm/shared/apps/ABruijn/include -Wall -pthread -std=c++11 -D_LOG -O3 -DNDEBUG chimera.cpp -o chimera.o
In file included from overlap.h:11:0,
from chimera.h:7,
from chimera.cpp:10:
/cm/shared/apps/ABruijn/libcuckoo/cuckoohash_map.hh: In instantiation of ‘cuckoohash_map<Key, T, Hash, Pred, Alloc, SLOT_PER_BUCKET>::BucketContainer<N>::BucketContainer(const cuckoohash_map<Key, T, Hash, Pred, Alloc, SLOT_PER_BUCKET>*, Args&& ...) [with Args = {const long unsigned int&, const long unsigned int&}; long unsigned int N = 2ul; Key = Kmer; T = std::vector<VertexIndex::ReadPosition>*; Hash = DefaultHasher<Kmer>; Pred = std::equal_to<Kmer>; Alloc = std::allocator<std::pair<const Kmer, std::vector<VertexIndex::ReadPosition>*> >; long unsigned int SLOT_PER_BUCKET = 4ul]’:
/cm/shared/apps/ABruijn/libcuckoo/cuckoohash_map.hh:786:39: required from ‘cuckoohash_map<Key, T, Hash, Pred, Alloc, SLOT_PER_BUCKET>::TwoBuckets cuckoohash_map<Key, T, Hash, Pred, Alloc, SLOT_PER_BUCKET>::lock_two(size_t, size_t, size_t) const [with Key = Kmer; T = std::vector<VertexIndex::ReadPosition>*; Hash = DefaultHasher<Kmer>; Pred = std::equal_to<Kmer>; Alloc = std::allocator<std::pair<const Kmer, std::vector<VertexIndex::ReadPosition>*> >; long unsigned int SLOT_PER_BUCKET = 4ul; cuckoohash_map<Key, T, Hash, Pred, Alloc, SLOT_PER_BUCKET>::TwoBuckets = cuckoohash_map<Kmer, std::vector<VertexIndex::ReadPosition>*>::BucketContainer<2ul>; size_t = long unsigned int]’
/cm/shared/apps/ABruijn/libcuckoo/cuckoohash_map.hh:830:43: required from ‘cuckoohash_map<Key, T, Hash, Pred, Alloc, SLOT_PER_BUCKET>::TwoBuckets cuckoohash_map<Key, T, Hash, Pred, Alloc, SLOT_PER_BUCKET>::snapshot_and_lock_two(size_t) const [with Key = Kmer; T = std::vector<VertexIndex::ReadPosition>*; Hash = DefaultHasher<Kmer>; Pred = std::equal_to<Kmer>; Alloc = std::allocator<std::pair<const Kmer, std::vector<VertexIndex::ReadPosition>*> >; long unsigned int SLOT_PER_BUCKET = 4ul; cuckoohash_map<Key, T, Hash, Pred, Alloc, SLOT_PER_BUCKET>::TwoBuckets = cuckoohash_map<Kmer, std::vector<VertexIndex::ReadPosition>*>::BucketContainer<2ul>; size_t = long unsigned int]’
/cm/shared/apps/ABruijn/libcuckoo/cuckoohash_map.hh:500:42: required from ‘bool cuckoohash_map<Key, T, Hash, Pred, Alloc, SLOT_PER_BUCKET>::contains(const key_type&) const [with Key = Kmer; T = std::vector<VertexIndex::ReadPosition>*; Hash = DefaultHasher<Kmer>; Pred = std::equal_to<Kmer>; Alloc = std::allocator<std::pair<const Kmer, std::vector<VertexIndex::ReadPosition>*> >; long unsigned int SLOT_PER_BUCKET = 4ul; cuckoohash_map<Key, T, Hash, Pred, Alloc, SLOT_PER_BUCKET>::key_type = Kmer]’
vertex_index.h:57:35: required from here
/cm/shared/apps/ABruijn/libcuckoo/cuckoohash_map.hh:673:37: internal compiler error: in process_init_constructor_array, at cp/typeck2.c:1224
: map(_map), i{{inds...}} {}
^
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
Preprocessed source stored into /tmp/ccI84QyY.out file, please attach this to your bugreport.
make[1]: *** [chimera.o] Error 1
make[1]: Leaving directory `/cm/shared/apps/ABruijn/assemble'
make: *** [all] Error 2
Compilation error: Command '['make']' returned non-zero exit status 2

That is the screen shot :

Do you know what this error is due to?

Thank you in advance for your help.

Best,
Amandine

BLASR error

I have been trying to run ABruijn but I get an error related to BLASR.

(myenv) stelo@H4:~/ABruijn$ ./abruijn.py reads.fa out_dir 50
[17:57:56] INFO: Running ABruijn
[17:57:56] INFO: Assembling reads
[17:58:13] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[17:58:27] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[17:59:04] INFO: Building kmer index
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[18:00:09] INFO: Finding overlaps:
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[18:04:25] INFO: Extending reads
[18:04:26] INFO: Assembled 1 contigs
[18:04:26] INFO: Generating contig sequences
[18:07:59] INFO: Polishing genome (1/2)
[18:07:59] INFO: Running BLASR
   Options for blasr
   Basic usage: 'blasr reads.{bam|fasta|bax.h5|fofn} genome.fasta [-options]
 option Description (default_value).
[... LISTS OF ALL BLASR PARAMETERS ...]
In release v5.1 of BLASR, command-line options will use the
single dash/double dash convention:
Character options are preceded by a single dash. (Example: -v)
Word options are preceded by a double dash. (Example: --verbose)
Please modify your scripts accordingly when BLASR v5.1 is released.

To cite BLASR, please use: Chaisson M.J., and Tesler G., Mapping
single molecule sequencing reads using Basic Local Alignment with
Successive Refinement (BLASR): Theory and Application, BMC
Bioinformatics 2012, 13:238.
Please report any bugs to 'https://github.com/PacificBiosciences/blasr/issues'.

ERROR: -bestn is not a valid option.
[18:07:59] ERROR: While running blasr: Command '['blasr', 'reads.fa', '/24-2/home/stelo/ABruijn/out_dir/blasr_ref_1.fasta', '-bestn', '1', '-minMatch', '15', '-maxMatch', '25', '-m', '5', '-nproc', '1', '-out', '/24-2/home/stelo/ABruijn/out_dir/blasr_1.m5']' returned non-zero exit status 1
[18:07:59] ERROR: Error: Error in alignment module, exiting

It seems that options now need the double dash.

My version of BLASR is

(myenv) stelo@H4:~/ABruijn$ blasr --version
blasr   5.2.def62de
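
The BLASR help text quoted above says that character options take a single dash and word options take a double dash. A minimal sketch of rewriting an old-style argument list accordingly follows; the function name `to_double_dash` is hypothetical, and the rule for what counts as a word option (a single dash followed by more than one character, starting with a letter) is an assumption based only on that help text.

```python
def to_double_dash(argv):
    """Rewrite single-dash word options (e.g. -bestn) as --bestn."""
    converted = []
    for token in argv:
        is_word_option = (
            len(token) > 2
            and token[0] == "-"
            and token[1] != "-"
            and token[1].isalpha()
        )
        # Single-character options such as -m are left unchanged.
        converted.append("-" + token if is_word_option else token)
    return converted


# Options copied from the failing command above; file names shortened.
old = ["blasr", "reads.fa", "blasr_ref_1.fasta", "-bestn", "1",
       "-minMatch", "15", "-maxMatch", "25", "-m", "5",
       "-nproc", "1", "-out", "blasr_1.m5"]
print(" ".join(to_double_dash(old)))
```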

IOError

Hi there,

I tried to use Flye 2.3.3-release to assemble human chromosome 6, but I encountered an IOError (log shown below).

Flye/2.3.3/bin/flye --pacbio-raw chr6.read.fq --out-dir assembly_result --genome-size 171m --threads 16
…
[2018-04-04 19:47:55] INFO: Generating contigs
[2018-04-04 19:48:04] INFO: Generated 144 contigs
[2018-04-04 19:48:22] INFO: Polishing genome (1/1)
[2018-04-04 19:48:22] INFO: Running Minimap2
[2018-04-04 19:56:45] INFO: Separating alignment into bubbles
Traceback (most recent call last):
  File "/short/te53/software/Flye/2.3.3/bin/flye", line 31, in <module>
    sys.exit(main())
  File "/short/te53/software/Flye/2.3.3/lib/python2.7/site-packages/flye/main.py", line 511, in main
    _run(args)
  File "/short/te53/software/Flye/2.3.3/lib/python2.7/site-packages/flye/main.py", line 348, in _run
    jobs[i].run()
  File "/short/te53/software/Flye/2.3.3/lib/python2.7/site-packages/flye/main.py", line 227, in run
    config.vals["min_aln_rate"], bubbles_file)
  File "/short/te53/software/Flye/2.3.3/lib/python2.7/site-packages/flye/bubbles.py", line 109, in make_bubbles
    raise error_queue.get()
IOError: bad message length
Any advice to help me get up and running would be appreciated.

Multiplicity and Repetitive

Hi,

Would you be so kind as to explain the Multiplicity and Repetitive values? I am looking at assembly_info.txt and the description here, and I have to admit I am a bit confused.

thx

Create intermediate files for recovery during k-mer counting or read extension?

Hi,

Flye nearly killed one of our nodes by using up RAM and swap. It was in the first phase and, from what I observed, only the initial folder structure and the logs were produced. Is it possible to dump some files during this stage? I assume that if no further files are generated, it will start from scratch after a crash?

Error in repeat binary

Hi,

I've attempted to assemble a genome using PacBio Sequel reads, and encountered an error on my first run and when I attempt to --resume the run. I have been running these as jobs on a PBS system on SUSE. I don't think it is a memory error since the job would be "killed" if I tried to use more than the amount I allotted.

I am using github commit 9c3f166 (v 2.1b) to assemble this.

For further information, I'm assembling a genome from a eukaryotic organism that does not have closely related species (< 100mya) genomes previously sequenced, so I don't have a strong idea of the exact genome size. I have used kmer-based genome estimates on corrected reads and assembled this genome with about 6 assemblers, so the consensus seems to be a genome of roughly 295-360MB in size (kmer estimates provide the lower range, many assemblers including abruijn's polished assembly provide the upper range). Using the lower range of that estimate, I have roughly 115x coverage including all the reads in my subreads. The stats of my raw reads are below (just in case you need this information to track down why this is occurring).

Number of contigs: 1247879
Shortest contig: 1001
Longest contig: 76985

N50: 13316
Median: 12214
Mean: 12937.225609213714
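
For readers unfamiliar with the statistics listed above, N50 is the length L such that sequences of length L or longer contain at least half of all bases. A minimal sketch of how these summary values can be computed from a list of read lengths; the function name `n50` and the toy lengths are hypothetical and unrelated to the dataset in this report.

```python
import statistics


def n50(lengths):
    """Return the length at which the running sum reaches half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths given")


# Toy read lengths, for illustration only.
toy_lengths = [2, 3, 4, 5, 6]
print("N50:", n50(toy_lengths))
print("Median:", statistics.median(toy_lengths))
print("Mean:", statistics.mean(toy_lengths))
```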

Below is the stderr output from the first run and the --resume run.

[2017-09-30 09:32:52] INFO: Running ABruijn
[2017-09-30 09:32:52] INFO: Assembling reads
[2017-09-30 09:32:52] INFO: Reading FASTA
[2017-09-30 09:42:22] INFO: Generating solid k-mer index
[2017-09-30 09:42:26] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2017-09-30 10:51:57] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2017-09-30 11:08:20] INFO: Filling index table
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2017-09-30 11:55:37] INFO: Extending reads
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2017-09-30 19:49:12] INFO: Assembled 3244 draft contigs
[2017-09-30 19:49:12] INFO: Generating contig sequences
[2017-09-30 20:25:11] INFO: Running BLASR
[2017-10-01 12:09:43] INFO: Computing rough consensus
[2017-10-01 12:45:26] INFO: Performing repeat analysis
[2017-10-01 12:45:27] INFO: Reading FASTA
[2017-10-01 12:55:09] INFO: Building repeat graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2017-10-01 21:06:40] INFO: Simplifying the graph
[2017-10-01 21:06:40] INFO: Aligning reads to the graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
terminate called without an active exception
[2017-10-02 04:36:49] ERROR: Error: Error in repeat binary: Command '['abruijn-repeat', '-k', '17', '-l', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/abruijn.log', '-t', '12', '-v', '5000', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/polished_0.fasta', '/home/user/genome_assembly/assembly_ready/species_subreads.fasta', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir']' returned non-zero exit status -6

[2017-10-02 22:36:10] INFO: Running ABruijn
[2017-10-02 22:36:10] INFO: Resuming previous run
[2017-10-02 22:36:10] INFO: Performing repeat analysis
[2017-10-02 22:36:10] INFO: Reading FASTA
[2017-10-02 22:45:31] INFO: Building repeat graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2017-10-03 04:02:13] INFO: Simplifying the graph
[2017-10-03 04:02:13] INFO: Aligning reads to the graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2017-10-03 09:33:50] ERROR: Resource temporarily unavailable
[2017-10-03 09:33:50] ERROR: Error: Error in repeat binary: Command '['abruijn-repeat', '-k', '17', '-l', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/abruijn.log', '-t', '12', '-v', '5000', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/polished_0.fasta', '/home/user/genome_assembly/assembly_ready/species_subreads.fasta', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir']' returned non-zero exit status 1

This is an excerpt from the log file (it's quite large). I've tried to include just the relevant portions and have used ellipses to abbreviate repetitive sections. If you want to see the whole log file, I can provide it.

[2017-09-30 09:32:52] root: INFO: Running ABruijn						
[2017-09-30 09:32:52] root: DEBUG: Estimated genome size: 303080041						
[2017-09-30 09:32:52] root: DEBUG: Chosen k-mer size: 17						
[2017-09-30 09:32:52] root: INFO: Assembling reads						
[2017-09-30 09:32:52] root: DEBUG: -----Begin assembly log------						
[2017-09-30 09:32:52] DEBUG: Build date: Sep 28 2017 21:15:18						
[2017-09-30 09:32:52] INFO: Reading FASTA						
[2017-09-30 09:42:22] DEBUG: Mean read length: 7481						
[2017-09-30 09:42:22] INFO: Generating solid k-mer index						
[2017-09-30 09:42:22] DEBUG: Hard threshold set to 11						
[2017-09-30 09:42:22] DEBUG: Started kmer counting						
[2017-09-30 09:42:26] INFO: Counting kmers (1/2):						
[2017-09-30 10:51:57] INFO: Counting kmers (2/2):						
[2017-09-30 11:08:20] DEBUG: Genome size estimate: 291937348						
[2017-09-30 11:08:20] DEBUG: Filtered 472253 repetitive kmers						
[2017-09-30 11:08:20] DEBUG: Estimated minimum kmer coverage: 18, 279329292 unique kmers selected						
[2017-09-30 11:08:20] INFO: Filling index table						
[2017-09-30 11:08:39] DEBUG: Kmer index size: 11425325554						
[2017-09-30 11:55:37] INFO: Extending reads						
[2017-09-30 11:56:05] DEBUG: Mean read coverage: 75						
[2017-09-30 11:56:17] DEBUG: Assembled contig						
	With 31 reads					
	Start read: -m54105_170625_161744/56558013/0_15235					
	At position: 15					
	leftTip: 0 rightTip: 0					
	Suspicios: 0					
	Mean overlaps: 118					
	Inner reads: 0					
[2017-09-30 11:56:17] DEBUG: Inner: 2120 covered: 3752 total: 9115006						
[2017-09-30 11:57:03] DEBUG: Assembled contig						
	With 97 reads					
	Start read: -m54105_170623_233908/28902028/43_14990					
	At position: 34					
	leftTip: 0 rightTip: 0					
	Suspicios: 1					
	Mean overlaps: 86					
	Inner reads: 0					
[2017-09-30 11:57:03] DEBUG: Inner: 7162 covered: 12394 total: 9115006						
[2017-09-30 11:57:25] DEBUG: Assembled contig						
…						
…						
[2017-09-30 19:48:42] DEBUG: Inner: 2628858 covered: 3768626 total: 9115006						
[2017-09-30 19:49:12] INFO: Assembled 3244 draft contigs						
[2017-09-30 19:49:12] INFO: Generating contig sequences						
-----------End assembly log------------						
[2017-09-30 20:25:11] root: INFO: Running BLASR						
[2017-09-30 20:25:11] root: DEBUG: Reading contigs file						
[2017-10-01 12:06:06] root: DEBUG: Sorting alignment file						
[2017-10-01 12:09:43] root: INFO: Computing rough consensus						
[2017-10-01 12:09:43] root: DEBUG: Reading contigs file						
[2017-10-01 12:45:26] root: INFO: Performing repeat analysis						
[2017-10-01 12:45:26] root: DEBUG: -----Begin repeat analyser log------						
[2017-10-01 12:45:27] DEBUG: Build date: Sep 28 2017 21:14:44						
[2017-10-01 12:45:27] INFO: Reading FASTA						
[2017-10-01 12:55:09] INFO: Building repeat graph						
[2017-10-01 12:55:09] DEBUG: Hard threshold set to 1						
[2017-10-01 12:55:09] DEBUG: Started kmer counting						
[2017-10-01 12:57:10] DEBUG: Kmer index size: 344619478						
[2017-10-01 20:48:04] DEBUG: Computing gluepoints						
[2017-10-01 20:48:25] DEBUG: Initializing edges						
[2017-10-01 21:06:24] DEBUG: *	5152	=+contig_430	0	455	455	
…						
…						
[2017-10-01 21:06:40] INFO: Simplifying the graph						
[2017-10-01 21:06:40] DEBUG: 12800 tips removed						
[2017-10-01 21:06:40] DEBUG: Removed 1140 fake loops						
[2017-10-01 21:06:40] DEBUG: Unrolled 447, removed 576						
[2017-10-01 21:06:40] DEBUG: Removed 6345 edges						
[2017-10-01 21:06:40] DEBUG: Added 2149 edges						
[2017-10-01 21:06:40] DEBUG: Unrolled 20, removed 38						
[2017-10-01 21:06:40] DEBUG: Removed 531 edges						
[2017-10-01 21:06:40] DEBUG: Added 518 edges						
[2017-10-01 21:06:40] DEBUG: Removed 1 chimeric junctions						
[2017-10-01 21:06:40] INFO: Aligning reads to the graph						
[2017-10-01 21:06:41] DEBUG: Hard threshold set to 1						
[2017-10-01 21:06:41] DEBUG: Started kmer counting						
[2017-10-01 21:08:21] DEBUG: Kmer index size: 323587749						
[2017-10-02 04:22:24] DEBUG: Aligned 6749792 / 9115006						
[2017-10-02 04:23:42] DEBUG: Mean edge coverage: 99						
[2017-10-02 04:23:42] DEBUG: *	21618	20897	0	1	105	1.06061
…						
…						
[2017-10-02 04:23:43] DEBUG: Unique coverage threshold 105						
[2017-10-02 04:23:45] DEBUG: Outputs: -14731 0						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: Outputs: -15258 1						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: Outputs: -18150 0						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: Outputs: 20552 3						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: Outputs: -13609 0						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: Outputs: 18386 0						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: Outputs: -16283 0						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: Outputs: -19265 1						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: Outputs: 15722 0						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: Outputs: -7893 1						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
…						
…						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: R   21617 1 -> 2 (2,1) 142010	99					
[2017-10-02 04:23:45] DEBUG: R   21615 1 -> 2 (2,0) 36996	100					
[2017-10-02 04:23:45] DEBUG: R   21613 1 -> 2 (2,1) 18727	95					
[2017-10-02 04:23:45] DEBUG: R   21605 1 -> 2 (1,2) 26980	81					
[2017-10-02 04:23:45] DEBUG: R   21601 1 -> 2 (1,2) 40593	106					
[2017-10-02 04:23:45] DEBUG: R   21600 1 -> 2 (2,1) 23606	113					
[2017-10-02 04:23:45] DEBUG: R   21599 1 -> 2 (0,2) 132657	111					
[2017-10-02 04:23:45] DEBUG: R   21595 1 -> 2 (2,1) 103909	87					
[2017-10-02 04:23:45] DEBUG: R   21586 1 -> 2 (0,2) 10054	109					
[2017-10-02 04:23:45] DEBUG: R   21584 1 -> 2 (2,0) 3451	89					
…						
…						
[2017-10-02 04:36:49] root: ERROR: Error: Error in repeat binary: Command '['abruijn-repeat', '-k', '17', '-l', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/abruijn.log', '-t', '12', '-v', '5000', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/polished_0.fasta', '/home/user/genome_assembly/assembly_ready/species_subreads.fasta', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir']' returned non-zero exit status -6						
[2017-10-02 22:36:10] root: INFO: Running ABruijn						
[2017-10-02 22:36:10] root: DEBUG: Estimated genome size: 303080041						
[2017-10-02 22:36:10] root: DEBUG: Chosen k-mer size: 17						
[2017-10-02 22:36:10] root: INFO: Resuming previous run						
[2017-10-02 22:36:10] root: INFO: Performing repeat analysis						
[2017-10-02 22:36:10] root: DEBUG: -----Begin repeat analyser log------						
[2017-10-02 22:36:10] DEBUG: Build date: Sep 28 2017 21:14:44						
[2017-10-02 22:36:10] INFO: Reading FASTA						
[2017-10-02 22:45:31] INFO: Building repeat graph						
[2017-10-02 22:45:31] DEBUG: Hard threshold set to 1						
[2017-10-02 22:45:31] DEBUG: Started kmer counting						
[2017-10-02 22:46:38] DEBUG: Kmer index size: 344619478						
[2017-10-03 03:43:14] DEBUG: Computing gluepoints						
[2017-10-03 03:43:35] DEBUG: Initializing edges						
[2017-10-03 04:01:59] DEBUG: *	14224	=+contig_472	0	9	9	
…						
…						
[2017-10-03 04:02:13] INFO: Simplifying the graph						
[2017-10-03 04:02:13] DEBUG: 12798 tips removed						
[2017-10-03 04:02:13] DEBUG: Removed 1160 fake loops						
[2017-10-03 04:02:13] DEBUG: Unrolled 441, removed 572						
[2017-10-03 04:02:13] DEBUG: Removed 6351 edges						
[2017-10-03 04:02:13] DEBUG: Added 2147 edges						
[2017-10-03 04:02:13] DEBUG: Unrolled 21, removed 42						
[2017-10-03 04:02:13] DEBUG: Removed 532 edges						
[2017-10-03 04:02:13] DEBUG: Added 518 edges						
[2017-10-03 04:02:13] DEBUG: Removed 1 chimeric junctions						
[2017-10-03 04:02:13] INFO: Aligning reads to the graph						
[2017-10-03 04:02:14] DEBUG: Hard threshold set to 1						
[2017-10-03 04:02:14] DEBUG: Started kmer counting						
[2017-10-03 04:03:39] DEBUG: Kmer index size: 324344284						
[2017-10-03 09:32:13] DEBUG: Aligned 6748592 / 9115006						
[2017-10-03 09:33:15] DEBUG: Mean edge coverage: 99						
[2017-10-03 09:33:15] DEBUG: *	-21645	20897	0	1	101	1.0202
…						
…						
[2017-10-03 09:33:15] DEBUG: Unique coverage threshold 105						
[2017-10-03 09:33:17] DEBUG: Outputs: 4330 0						
[2017-10-03 09:33:17] DEBUG: 	13662 2 0					
[2017-10-03 09:33:17] DEBUG: 						
[2017-10-03 09:33:17] DEBUG: Outputs: 21367 2						
[2017-10-03 09:33:17] DEBUG: 	21368 26 0					
[2017-10-03 09:33:17] DEBUG: 						
[2017-10-03 09:33:17] DEBUG: Outputs: 1047 0						
[2017-10-03 09:33:17] DEBUG: 	20778 1 0					
[2017-10-03 09:33:17] DEBUG: 						
[2017-10-03 09:33:17] DEBUG: Outputs: 7396 1						
[2017-10-03 09:33:17] DEBUG: 	7397 2 1					
[2017-10-03 09:33:17] DEBUG: 	7405 32 1					
[2017-10-03 09:33:17] DEBUG: 						
[2017-10-03 09:33:17] DEBUG: Outputs: 8526 0						
…						
…						
[2017-10-03 09:33:18] DEBUG: 						
[2017-10-03 09:33:18] DEBUG: R   21644 1 -> 2 (2,1) 142010	99					
[2017-10-03 09:33:18] DEBUG: R   21641 1 -> 2 (1,2) 68125	106					
[2017-10-03 09:33:18] DEBUG: R   21640 1 -> 2 (2,1) 22106	112					
[2017-10-03 09:33:18] DEBUG: R   21632 1 -> 2 (1,2) 143630	90					
[2017-10-03 09:33:18] DEBUG: R   21630 1 -> 2 (2,0) 3169	108					
[2017-10-03 09:33:18] DEBUG: R   21629 1 -> 2 (1,2) 12733	79					
[2017-10-03 09:33:18] DEBUG: R   21625 1 -> 2 (0,2) 7580	105					
[2017-10-03 09:33:18] DEBUG: R   21624 1 -> 2 (2,0) 116506	92					
…						
…						
[2017-10-03 09:33:18] DEBUG: R   16067 1 -> 2 (1,2) 10058	104					
[2017-10-03 09:33:18] DEBUG: R   16134 1 -> 2 (1,2) 5990	107					
[2017-10-03 09:33:50] ERROR: Resource temporarily unavailable						
-----------End assembly log------------						
[2017-10-03 09:33:50] root: ERROR: Error: Error in repeat binary: Command '['abruijn-repeat', '-k', '17', '-l', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/abruijn.log', '-t', '12', '-v', '5000', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/polished_0.fasta', '/home/user/genome_assembly/assembly_ready/species_subreads.fasta', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir']' returned non-zero exit status 1

If you could help me find out what is causing this issue, that would be really appreciated.

Thanks,
Zac.
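
A note on the second failure above: "Resource temporarily unavailable" is the standard error text for EAGAIN, which thread creation can return when a per-user process or thread limit is reached. Whether that is what happened in this run is not established in the thread. The sketch below only shows one way to inspect the limits that commonly matter, using Python's standard `resource` module (Unix only). The helper names `thread_related_limits` and `show` are made up for illustration.

```python
import errno
import os
import resource


def thread_related_limits():
    """Return {limit name: (soft, hard)} for limits that can block new threads."""
    limits = {}
    # RLIMIT_NPROC counts threads on Linux; stack and address-space limits
    # can also make thread creation fail. Not every platform defines all three.
    for label in ("RLIMIT_NPROC", "RLIMIT_STACK", "RLIMIT_AS"):
        const = getattr(resource, label, None)
        if const is not None:
            limits[label] = resource.getrlimit(const)
    return limits


def show(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    # The OS message for EAGAIN, for comparison with the log line.
    print("EAGAIN:", os.strerror(errno.EAGAIN))
    for name, (soft, hard) in sorted(thread_related_limits().items()):
        print("{}: soft={} hard={}".format(name, show(soft), show(hard)))
```

Limits inside a batch job can differ from those of a login shell, so it is probably most informative to run this in the same environment as the failing job.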

threading across multiple nodes

Hi,
I was just reading up on Flye and have a question.
Does Flye expect all of its threads to run on the same node? In other words, can Flye threads be spread across multiple nodes that share a common filesystem?

Kind regards

Question regarding repeat assembly

Hi, Mikhail. I've read your preprint and found the correspondence between the repeat graph construction problem and the assembly problem intriguing! I was left confused about one particular aspect: when assembling a linear genome ARBRC with a two-copy repeat R, my understanding of the algorithm goes like this:

select a read at random from UnprocessedReads, selecting a read from A
follow a random walk from that read which assembles the contig ARC
map the reads to that contig and remove those reads from UnprocessedReads
UnprocessedReads now contains only reads from B
select a read at random from UnprocessedReads, selecting a read from B
follow a random walk from that read which assembles the contig B, since UnprocessedReads contains no reads from ARC
map the reads to that contig and remove those reads from UnprocessedReads
UnprocessedReads now contains no reads. Stop assembling contigs, and identify repeats.

The assembled contigs are ARC and B. How is Flye able to identify R as a repeat? Thanks for the clarification!

Note: In Acknowledgments, Bahar Beshaz should be Bahar Behsaz. We worked together at the BC Cancer Genome Sciences Centre in Vancouver!

Floating point exception of flye-assemble

Hi there,

I tried to use flye-assemble to extend the contigs in flye-input-contigs.fa, but I encountered a "floating point exception" (log shown below). Do you think it is a problem with my contig sequences or a bug in flye-assemble?

flye-assemble -l /oak/stanford/groups/arend/Eric/meta/readclouds-l-gasseri-example/results/olc/flye-asm-1/flye.log -t 4 -s -v 1000 ./results/olc/flye-input-contigs.fa /oak/stanford/groups/arend/Eric/meta/readclouds-l-gasseri-example/results/olc/flye-asm-1/0-assembly/draft_assembly.fasta 1857551 /scratch/users/zhanglu2/software/Flye-2.3.3/flye/resource/asm_subasm.cfg
[2018-04-01 19:31:29] INFO: Running with k-mer size: 31
[2018-04-01 19:31:29] INFO: Reading sequences
[2018-04-01 19:31:30] INFO: Reads N50/90: 64830 / 21423
[2018-04-01 19:31:30] INFO: Selected minimum overlap 1000
[2018-04-01 19:31:30] INFO: Expected read coverage: 7
[2018-04-01 19:31:30] INFO: Generating solid k-mer index
[2018-04-01 19:31:47] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-04-01 19:31:48] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-04-01 19:31:49] INFO: Filling index table
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-04-01 19:31:50] INFO: Extending reads
Floating point exception

Flye with uncorrected reads reports only half of the genome, while working well with corrected reads

Hi, I've tried Flye on an inter-species hybrid genome (~15% divergence between the parental genomes) and it worked well with corrected MinION reads. However, when run on uncorrected reads, Flye reported an assembly nearly 2x smaller. Is it possible to alter options related to haplotype separation for uncorrected reads?

Error running Flye

I encountered an error today that I am having trouble understanding:


../../Flye/bin/flye --pacbio-raw m11111_111111_111111.subreads.extract.fasta m11111_111111_111112.subreads.extract.fasta m11111_111111_111113.subreads.extract.fasta --genome-size 200000 --threads 30 -o test
[2018-01-10 16:59:22] INFO: Running Flye 2.3-4-g77de267
[2018-01-10 16:59:22] INFO: Assembling reads
[2018-01-10 16:59:23] INFO: Reading sequences
[2018-01-10 17:01:13] INFO: Generating solid k-mer index
[2018-01-10 17:01:13] ERROR: Caught unhandled exception: Wrong hard threshold value: 817
[2018-01-10 17:01:13] ERROR: 	flye-assemble(_Z16exceptionHandlerv+0x9f) [0x42c38f]
[2018-01-10 17:01:13] ERROR: 	/software/lib64/libstdc++.so.6(+0x8f136) [0x7fedece04136]
[2018-01-10 17:01:13] ERROR: 	/software/lib64/libstdc++.so.6(+0x8f181) [0x7fedece04181]
[2018-01-10 17:01:13] ERROR: 	/software/lib64/libstdc++.so.6(+0x8f399) [0x7fedece04399]
[2018-01-10 17:01:13] ERROR: 	flye-assemble(_ZN11VertexIndex10countKmersEm+0x986) [0x439c96]
[2018-01-10 17:01:13] ERROR: 	flye-assemble(main+0x89b) [0x41429b]
[2018-01-10 17:01:13] ERROR: 	/software/lib64/libc.so.6(__libc_start_main+0xf1) [0x7fedec291181]
[2018-01-10 17:01:13] ERROR: 	flye-assemble(_start+0x29) [0x415659]
[2018-01-10 17:01:13] ERROR: Command '['flye-assemble', '-k', '15', '-l', '/scratch/beegfs/monthly/eschmid/Giannuzzi_Assembly/FLYE_attempt/test/flye.log', '-t', '30', '-v', '5000', 'm11111_111111_111111.subreads.extract.fasta,m11111_111111_111112.subreads.extract.fasta,m11111_111111_111113.subreads.extract.fasta', 'FLYE_attempt/test/0-assembly/draft_assembly.fasta', '200000', '/Flye/flye/resource/asm_raw_reads.cfg']' returned non-zero exit status 1

Any idea what could cause this threshold value error?

Missing date in log

Hi developer,

Date is not shown in the log. In logging.Formatter(), datefmt is set to "%H:%M:%S", so only hour:min:sec is shown:

[11:04:10] INFO: Running ABruijn
[11:04:10] INFO: Assembling reads
[11:04:17] INFO: Counting kmers (1/2):
...

It would be great to include the date (like "%Y-%m-%d %H:%M:%S"), especially for benchmarking long runs.

[2017-05-26 11:04:10] INFO: Running ABruijn
[2017-05-26 11:04:10] INFO: Assembling reads
[2017-05-26 11:04:17] INFO: Counting kmers (1/2):
...
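
The requested change can be sketched with Python's standard `logging` module. The `datefmt` values are the ones quoted in the issue; the `fmt` string is an assumption reconstructed from the look of the log lines above, and `make_formatter` is a hypothetical helper, not a function from ABruijn.

```python
import logging


def make_formatter(with_date=True):
    """Build a Formatter that prints '[timestamp] LEVEL: message'."""
    # "%H:%M:%S" is the current setting described in the issue;
    # "%Y-%m-%d %H:%M:%S" is the proposed one.
    datefmt = "%Y-%m-%d %H:%M:%S" if with_date else "%H:%M:%S"
    return logging.Formatter(fmt="[%(asctime)s] %(levelname)s: %(message)s",
                             datefmt=datefmt)


if __name__ == "__main__":
    logger = logging.getLogger("datefmt_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler()
    handler.setFormatter(make_formatter(with_date=True))
    logger.addHandler(handler)
    logger.info("Running ABruijn")
```

Only the `datefmt` argument changes between the two variants, so the rest of the log format stays as it is.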

memory issues

I am trying to assemble a PacBio dataset containing 52Gb of data (my genome is 620Mb). ABruijn has been stuck for two days at 0% below:

(myenv) stelo@H4:~/ABruijn$ ./abruijn.py -t 40 reads.fa out_dir 70
[09:10:42] INFO: Running ABruijn
[09:10:42] INFO: Assembling reads
[09:34:00] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[12:41:12] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[15:02:27] WARNING: Unable to choose minimum kmer count cutoff. Check if the coverage parameter is correct. Running with the default parameter t = 2
[15:02:27] INFO: Building kmer index
0%

It is currently using 483GB (resident) of the 512GB RAM I have on my server, but also quite a bit of swap (virtual 719GB). I think ABruijn is not making much progress because of a lot of swapping. Is there any way I can reduce the amount of RAM needed? Maybe changing params? Or perhaps I can filter out reads that ABruijn would not use (i.e., shorter than a threshold)? Thanks.
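
The read-filtering idea from the question can be sketched as follows. This is a minimal, standard-library-only illustration: the function names and the length threshold are hypothetical, and whether dropping short reads lowers ABruijn's memory use enough is an untested assumption. The coverage helper assumes that both figures in the post (52Gb of reads, 620Mb genome) are counts of bases.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_length):
    """Keep only records whose sequence has at least min_length bases."""
    for header, seq in records:
        if len(seq) >= min_length:
            yield header, seq


def estimate_coverage(total_bases, genome_size):
    """Coverage is the total number of read bases divided by the genome size."""
    return float(total_bases) / genome_size


if __name__ == "__main__":
    demo = io.StringIO(u">read1\nACGTACGT\n>read2\nACG\n>read3\nACGTAC\nGT\n")
    # min_length=5 is a toy threshold for the demo input only.
    for header, seq in filter_by_length(read_fasta(demo), min_length=5):
        print(">{}\n{}".format(header, seq))
    # Figures from the post: 52 Gb of reads, 620 Mb genome.
    print("coverage ~ {:.1f}x".format(estimate_coverage(52e9, 620e6)))
```

Summing the lengths of the kept reads and passing the total to `estimate_coverage` gives the coverage that remains after filtering, which is the value the coverage argument on the command line would need to reflect.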

ERROR: No contigs were assembled - are you using corrected input instead of raw?

I am getting the above mentioned error: "ERROR: No contigs were assembled - are you using corrected input instead of raw?"
even though I am running flye with the --pacbio-corr option. I am running it like this:
flye --pacbio-corr /path/to/Corrected/*.fasta --genome-size 200m -o /path/to/Pacbio/ -t 15

Do I need to provide the uncorrected reads as well? I thought I could also run it with just the corrected reads.

kind regards

final assembly much smaller than raw assembly

Hi all,
First of all, thanks for an amazing tool. I have been using Flye quite successfully to assemble mammalian genomes using relatively low-coverage PacBio reads (~30X). However, I recently tried an assembly of a non-model species with an expected haploid genome size of 2.2 Gbp (flow cytometry), using 35X coverage of PacBio reads with an average read length of 3.4 Kbp. I am using Flye 2.3 with the following command line:

flye --pacbio-raw ${reads} -g 2.5g -m 1000 -o ${OD} -t 64 -i 3

The raw assembly and consensus stages produced assemblies of ~2.1 Gbp, but after the repeat resolution and polishing steps the final assembly is only 781 Mbp (roughly 1/3). The genome is expected to be quite repetitive (~50% simple sequence repeats and transposable elements), but that still does not quite explain such a big difference between the raw assembly and the final polished assembly.

Is this expected behaviour? Is there any parameter that can be tweaked to improve the final assembly?

I look forward to hearing back from you; any suggestions would be more than welcome.

Kind regards,

Juan Montenegro

contig length smaller than PacBio read length

The ABruijn output "polished_1.fasta" contains many short contigs (less than 1k), and the PacBio reads are longer than these short contigs. Why does this happen?

Minimum coverage requirement

May I know the minimum coverage requirement for ABruijn to assemble a genome of about 300Mb?

Flye occasionally misjoined two different chromosomes (ONT data)

Hello,
I tested Flye on both PacBio and Nanopore datasets from budding yeast, and in general I found that Flye did a pretty good job with a much shorter processing time than Canu (especially on the Nanopore data). However, in my test with the Nanopore data, Flye misjoined two different chromosomes, presumably based on their shared chromosome-end structure (e.g. telomere repeats). I checked the available parameters but didn't find much room to tweak. I was wondering if you have any specific recommendations. I can send my test data for you to check if this would help with the future development of Flye. Thanks in advance!

Best,
Jia-Xing

ERROR: Caught unhandled exception: Can't open config file

Hi all,

I would like to use Flye to assemble a 7.5 megabase Streptomyces genome. I have 1D Oxford Nanopore data for this organism.

I installed the tool as follows:

cd ~

git clone https://github.com/fenderglass/Flye
cd Flye
python setup.py build
python setup.py install --user

And then called it:

flye --nano-raw 1d.fastq --genome-size 7.5m --out-dir flye_out --threads 40

Unfortunately, the program encountered an error because it couldn't open a config file:

[2018-01-05 14:54:49] INFO: Running Flye 2.3-release
[2018-01-05 14:54:49] INFO: Assembling reads
[2018-01-05 14:54:49] ERROR: Caught unhandled exception: Can't open config file: /home/lina/.local/lib/python2.7/site-packages/flye/resource/asm_raw_reads.cfg
[2018-01-05 14:54:49] ERROR:    flye-assemble(_Z16exceptionHandlerv+0xb4) [0x434f34]
[2018-01-05 14:54:49] ERROR:    /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d6b6) [0x7f77c0bc26b6]
[2018-01-05 14:54:49] ERROR:    /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d701) [0x7f77c0bc2701]
[2018-01-05 14:54:49] ERROR:    /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d919) [0x7f77c0bc2919]
[2018-01-05 14:54:49] ERROR:    flye-assemble(_ZN6Config4loadERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x1b11) [0x439941]
[2018-01-05 14:54:49] ERROR:    flye-assemble(main+0x2b6) [0x41cec6]
[2018-01-05 14:54:49] ERROR:    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f77c004f830]
[2018-01-05 14:54:49] ERROR:    flye-assemble(_start+0x29) [0x41e8d9]
[2018-01-05 14:54:49] ERROR: Command '['flye-assemble', '-k', '15', '-l', '/home/lina/flye_out/flye.log', '-t', '40', '-v', '5000', '/home/lina/1d.fastq', '/home/lina/flye_out/0-assembly/draft_assembly.fasta', '7864320', '/home/lina/.local/lib/python2.7/site-packages/flye/resource/asm_raw_reads.cfg']' returned non-zero exit status 1

I checked and this config file does not exist. Is it something that I can manually create? Or has it been installed somewhere but the path needs to be adjusted?

Thanks for any advice!
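
One way to check the "has it been installed somewhere" part of the question is to search both the source checkout and the install location for the file. Other logs on this page reference config files under a `flye/resource/` directory, so the checkout is a plausible place to look, though that is an inference rather than something confirmed in this thread. The helper `find_files` is hypothetical, and the two root paths are taken from the post.

```python
import os


def find_files(filename, roots):
    """Return every path under the given root directories whose basename is filename."""
    hits = []
    for root in roots:
        # os.walk silently yields nothing for roots that do not exist.
        for dirpath, _dirnames, filenames in os.walk(os.path.expanduser(root)):
            if filename in filenames:
                hits.append(os.path.join(dirpath, filename))
    return sorted(hits)


if __name__ == "__main__":
    # Roots taken from the post: the clone in ~/Flye and the --user install.
    roots = ["~/Flye", "~/.local/lib/python2.7/site-packages/flye"]
    for path in find_files("asm_raw_reads.cfg", roots):
        print(path)
```

If the file shows up only in the checkout, that would suggest the install step did not copy the resource files, and running Flye from the build directory may be a workaround worth trying.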

returned non-zero exit status -7

Dear Flye,

I hope this email finds you well.
While testing the program on PacBio data (genome size 1.5Gb) in a PBSpro environment, I have repeatedly run into the same issue: "returned non-zero exit status -7".
Please see the attached output file below.

Looking forward to your reply!
Flye_026T_Output.txt

Regards,

Taek
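
A note on reading this message: its wording matches Python's `subprocess.CalledProcessError`, and in `subprocess` a negative return code -N means the child process was terminated by signal N (POSIX only). Under that assumption, "exit status -7" says the binary was killed by signal 7 rather than exiting on its own. Signal numbering is platform dependent; on most Linux systems signal 7 is SIGBUS. The helper below is an illustrative sketch, not part of Flye.

```python
import signal


def describe_exit_status(status):
    """Return a human-readable description of a subprocess return code."""
    if status < 0:
        # Negative codes from subprocess mean "terminated by signal -status".
        try:
            name = signal.Signals(-status).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal {} ({})".format(-status, name)
    if status == 0:
        return "exited normally"
    return "exited with error code {}".format(status)


if __name__ == "__main__":
    for status in (-7, -6, 1):
        print(status, "->", describe_exit_status(status))
```

The signal name narrows down the kind of failure (for example a bus error versus an abort), which is useful to include when reporting the problem together with the attached log.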

it needs python2 to compile

with python3:

Flye$ python setup.py build
  File "setup.py", line 16
    print "Compilation error: ", e
                              ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(print "Compilation error: ", e)?
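
The traceback comes from a Python 2 `print` statement being parsed by Python 3. A minimal sketch of a form that both interpreters accept is shown below. The function names are illustrative, and `error` stands in for whatever exception object setup.py catches, which is not shown in the post.

```python
def format_compilation_error(error):
    """Build the message as a single string so print() gets one argument."""
    return "Compilation error: {}".format(error)


def report_compilation_error(error):
    # With exactly one argument, print(...) behaves the same on Python 2
    # (statement with a parenthesized expression) and Python 3 (function call).
    print(format_compilation_error(error))


if __name__ == "__main__":
    report_compilation_error(RuntimeError("make failed"))  # hypothetical error
```

An alternative with the same effect is to put `from __future__ import print_function` at the top of the file and call `print` with several arguments.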

How to add a supplementary polishing iteration

Hello,
First, many thanks for giving access to this great tool. It works just fine on my PacBio sequences from a plant genome (500Mb). It's the only one able to assemble the chloroplast and the mitochondrial genomes in one contig each and to give an L50 of fewer than 150 contigs.
I ran ABruijn with the default parameters, except for the k-mer size, which I raised to 17. The polishing step iterated twice and returned "Alignment error rate: 0.208880757523" (Polishing genome (1/2)), then "Alignment error rate: 0.165291041893" (Polishing genome (2/2)).
I suppose I will have to run more polishing iterations to improve the quality of the assembly.
How can I launch only the polishing part of ABruijn?
Thank you for your reply.
_(°-°)_/

Empirically determine substitutions and homopolymers based on a first assembly?

Do you think it would make a big difference to the assembly quality if one determined the parameters contained in 'nano_homopolymers.mat' and 'nano_substitutions.mat' empirically for a specific ONP run? One could do a first "preliminary assembly", map the ONP data back, determine the parameters, and then do a second assembly with refined tuning.

How did you generate the substitutions.mat?

Segmentation Fault when changing '--min-overlap' parameter

I am trying to assemble a bacterial genome from MinION sequencing data. Sequencing coverage is about 30X. When I run Flye with default parameters, it completes successfully, but the final assembly is very short. So I wanted to try changing the '-m' parameter (1000) to see if it improves my assembly, but I got a "Segmentation fault" error during the 'flye-repeat' step. Below is the output of my log file.

[2018-01-19 11:29:43] DEBUG: Build date: Jan 16 2018 12:39:30
[2018-01-19 11:29:43] DEBUG: Parameters:
[2018-01-19 11:29:43] DEBUG:    maximum_jump=1500
[2018-01-19 11:29:43] DEBUG:    maximum_overhang=1500
[2018-01-19 11:29:43] DEBUG:    hard_min_coverage_rate=10
[2018-01-19 11:29:43] DEBUG:    repeat_coverage_rate=10
[2018-01-19 11:29:43] DEBUG:    close_jump_rate=100
[2018-01-19 11:29:43] DEBUG:    far_jump_rate=2
[2018-01-19 11:29:43] DEBUG:    overlap_divergence_rate=5
[2018-01-19 11:29:43] DEBUG:    penalty_window=100
[2018-01-19 11:29:43] DEBUG:    max_coverage_drop_rate=5
[2018-01-19 11:29:43] DEBUG:    chimera_window=100
[2018-01-19 11:29:43] DEBUG:    min_reads_in_contig=4
[2018-01-19 11:29:43] DEBUG:    max_inner_reads=10
[2018-01-19 11:29:43] DEBUG:    max_inner_fraction=0.25
[2018-01-19 11:29:43] DEBUG:    max_separation=500
[2018-01-19 11:29:43] DEBUG:    tip_length_threshold=20000
[2018-01-19 11:29:43] DEBUG:    unique_edge_length=50000
[2018-01-19 11:29:43] DEBUG:    min_repeat_res_support=0.5
[2018-01-19 11:29:43] DEBUG:    out_paths_ratio=5
[2018-01-19 11:29:43] DEBUG:    graph_cov_drop_rate=10
[2018-01-19 11:29:43] DEBUG:    coverage_estimate_window=100
[2018-01-19 11:29:43] DEBUG:    low_cutoff_warning=1
[2018-01-19 11:29:43] DEBUG:    assemble_kmer_sample=1
[2018-01-19 11:29:43] DEBUG:    assemble_gap=500
[2018-01-19 11:29:43] DEBUG:    repeat_graph_kmer_sample=5
[2018-01-19 11:29:43] DEBUG:    repeat_graph_gap=100
[2018-01-19 11:29:43] DEBUG:    repeat_graph_max_kmer=500
[2018-01-19 11:29:43] DEBUG:    read_align_kmer_sample=1
[2018-01-19 11:29:43] DEBUG:    read_align_gap=500
[2018-01-19 11:29:43] DEBUG:    read_align_max_kmer=500
[2018-01-19 11:29:43] INFO: Reading sequences
[2018-01-19 11:29:44] INFO: Building repeat graph
[2018-01-19 11:29:44] DEBUG: Hard threshold set to 1
[2018-01-19 11:29:44] DEBUG: Started kmer counting
[2018-01-19 11:30:00] DEBUG: Solid kmers: 15643
[2018-01-19 11:30:00] DEBUG: Kmer index size: 36807
[2018-01-19 11:30:00] DEBUG: Total chunks 1 wasted space: 0
[2018-01-19 11:30:18] DEBUG: Found 148 overlaps
[2018-01-19 11:30:18] DEBUG: Left 18 overlaps after filtering
[2018-01-19 11:30:18] DEBUG: Building interval tree
[2018-01-19 11:30:18] DEBUG: Computing gluepoints
[2018-01-19 11:30:18] DEBUG: Created 28 gluepoints
[2018-01-19 11:30:18] DEBUG: Tandems removed: 0 left, 0 right, 0 both
[2018-01-19 11:30:18] DEBUG: Initializing edges
[2018-01-19 11:30:18] DEBUG: *  -2      +contig_1       171     30872   30701
[2018-01-19 11:30:18] DEBUG:    -1      +contig_1       30872   33181   2309
[2018-01-19 11:30:18] DEBUG:    -1      +contig_1       33181   33858   677
[2018-01-19 11:30:18] DEBUG:    -1      +contig_1       33858   43849   9991
[2018-01-19 11:30:18] DEBUG:    -1      +contig_1       43849   45000   1151
[2018-01-19 11:30:18] DEBUG:    -1      +contig_1       45000   46640   1640
[2018-01-19 11:30:18] DEBUG:    -1      +contig_1       46640   47575   935
[2018-01-19 11:30:18] DEBUG:    -1      +contig_1       47575   49824   2249
[2018-01-19 11:30:18] DEBUG:    -1      +contig_1       49824   50562   738
[2018-01-19 11:30:18] DEBUG:    -1      +contig_1       50562   52679   2117
[2018-01-19 11:30:18] DEBUG:    -1      +contig_1       52679   54164   1485
[2018-01-19 11:30:18] DEBUG:    -1      +contig_1       54164   55016   852
[2018-01-19 11:30:18] DEBUG:    -1      +contig_1       55016   55689   673
[2018-01-19 11:30:18] DEBUG: Total edges: 2
[2018-01-19 11:30:18] DEBUG: Writing Dot
[2018-01-19 11:30:18] DEBUG: 0 tips clipped
[2018-01-19 11:30:18] DEBUG: Removed 0 edges
[2018-01-19 11:30:18] DEBUG: Added 0 edges
[2018-01-19 11:30:18] DEBUG: Removed 0 chimeric junctions
[2018-01-19 11:30:18] DEBUG: Collapsed 0 bulges
[2018-01-19 11:30:18] DEBUG: Removed 0 edges
[2018-01-19 11:30:18] DEBUG: Added 0 edges
[2018-01-19 11:30:18] DEBUG: 0 tips clipped
[2018-01-19 11:30:18] INFO: Aligning reads to the graph
[2018-01-19 11:30:18] DEBUG: Hard threshold set to 1
[2018-01-19 11:30:18] DEBUG: Started kmer counting
[2018-01-19 11:30:34] DEBUG: Solid kmers: 15424
[2018-01-19 11:30:34] DEBUG: Kmer index size: 36465
[2018-01-19 11:30:34] DEBUG: Total chunks 1 wasted space: 0
[2018-01-19 11:36:07] DEBUG: Aligned 73 / 1354
[2018-01-19 11:36:07] DEBUG: Aligned length 1209883 / 5984822 0.202159
[2018-01-19 11:36:07] DEBUG: Mean edge coverage: 1
[2018-01-19 11:36:07] DEBUG: -2 30701   13      13
[2018-01-19 11:36:07] DEBUG: 2  30701   13      13
 DEBUG: -1 2068    15      15
[2018-01-19 11:36:07] DEBUG: 1  2068    15      15
[2018-01-19 11:36:07] ERROR: Segmentation fault! Backtrace:
[2018-01-19 11:36:07] ERROR:    flye-repeat(_Z15segfaultHandleri+0x1e) [0x47c1de]
[2018-01-19 11:36:07] ERROR:    /usr/lib64/libc.so.6(+0x35270) [0x7effb1dce270]
[2018-01-19 11:36:07] ERROR:    flye-repeat(_Z3q75IiET_RSt6vectorIS0_SaIS0_EE+0x11d) [0x432c5d]
[2018-01-19 11:36:07] ERROR:    flye-repeat(_ZN19MultiplicityInferer16estimateCoverageEv+0xe59) [0x42ff39]
[2018-01-19 11:36:07] ERROR:    flye-repeat(main+0x58c) [0x429b3c]
[2018-01-19 11:36:07] ERROR:    /usr/lib64/libc.so.6(__libc_start_main+0xf5) [0x7effb1dbac05]
[2018-01-19 11:36:07] ERROR:    flye-repeat() [0x42aa2f]
