mikolmogorov / flye
De novo assembler for single molecule sequencing reads using repeat graphs
License: Other
Hello everyone,
I'm trying to get Flye to run with parameters similar to those I used with ABruijn. I'm using a CentOS 7.0 machine with 1.5 TB of RAM and 64 cores (128 threads), with the following command:
flye --pacbio-raw ${reads} -g 3g -o ${OD} -t 127 -i 3
The program runs for 15 hours and then fails with a series of errors, the first of which is the following:
[2018-01-08 10:03:43] INFO: Running Flye 2.3-release
[2018-01-08 10:03:43] INFO: Assembling reads
[2018-01-08 10:03:43] INFO: Reading sequences
[2018-01-08 10:20:58] INFO: Generating solid k-mer index
[2018-01-08 10:25:51] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-01-08 10:29:21] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-01-08 10:51:10] INFO: Filling index table
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-01-08 11:46:11] INFO: Extending reads
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-01-08 22:26:16] INFO: Assembled 226679 draft contigs
[2018-01-08 22:30:40] INFO: Generating contig sequences
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-01-08 23:28:59] INFO: Running Minimap2
[2018-01-09 01:13:13] INFO: Computing consensus
Process SyncManager-1:
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib64/python2.7/multiprocessing/managers.py", line 558, in _run_server
server.serve_forever()
File "/usr/lib64/python2.7/multiprocessing/managers.py", line 184, in serve_forever
t.start()
File "/usr/lib64/python2.7/threading.py", line 747, in start
_start_new_thread(self.__bootstrap, ())
error: can't start new thread
(...)
It then just continues complaining about all the threads that failed, then about missing or empty files, and then it finishes. I know that ABruijn was able to use all 128 threads available on the node. Is there a limit to the number of threads that Flye can start? Would it be better to drop the number to 64 or something like that?
Best regards,
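For reference, `error: can't start new thread` is raised by Python when the operating system refuses to create another thread. One possible cause (an assumption here, not something the log confirms) is a per-user process/thread limit on the node. A minimal sketch for inspecting that limit on Linux before choosing a `-t` value; `suggest_threads` and its `headroom` parameter are hypothetical names, not part of Flye:

```python
import multiprocessing
import resource


def suggest_threads(requested, headroom=0.5):
    """Cap a requested thread count by CPU count and the soft RLIMIT_NPROC.

    `headroom` is the fraction of the per-user process/thread limit we are
    willing to use. It is an arbitrary illustrative value, not a Flye setting.
    """
    cpus = multiprocessing.cpu_count()
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    limit = requested
    if soft != resource.RLIM_INFINITY:
        limit = max(1, int(soft * headroom))
    return max(1, min(requested, cpus, limit))


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    print("max user processes (soft/hard):", soft, hard)
    print("suggested -t value:", suggest_threads(127))
```

If the soft limit turns out to be low, raising it (or lowering `-t`) is worth trying before rerunning a 15-hour job.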
From main.py:
https://github.com/fenderglass/Flye/blob/e5b1903e5ce7fbf687b44e2c46f50587453e50e7/flye/main.py#L179-L182
From consensus.py:
self.args.threads should change place with self.args.min_overlap, as far as I can see.
Thank you.
Ole
I have:
I get the following error:
[2017-09-19 15:37:47] INFO: Extending reads 0% 10% 20% [2017-09-19 17:48:35] ERROR: Error: Error in assemble binary: Command '['abruijn-assemble', '-k', '15', '-l', '/home/user/Documents/Abruijn_ko_fz/out/abruijn.log', '-t', '11', '-v', '5000', '/home/user/bioinf_archive/32_scmi_storage/onp/ko_onp_FZ1/extracted/twoBestMin30.fasta', '/home/user/Documents/Abruijn_ko_fz/out/draft_assembly.fasta', '150']' returned non-zero exit status -9
I don't think that I ran out of disk space or memory.
What went wrong?
Here is the end of the log file:
With 11 reads
Start read: -2be4669b-93c2-4154-aa4b-625728fa7d06_runid=2b076ac8f6a448e848698ae57d8581ac75fc0637_read=7487_ch=298_start_time=2017-09-13T02:49:06Z_.poretools_tmp/20170912_1617_qc/fast5/pass/36/fz_i_177_20170912_fah18372_MN15037_sequencing_run_qc_40637_read_7487_ch_298_strand.fast5
At position: 10
leftTip: 0 rightTip: 0
Suspicios: 0
Mean overlaps: 256
Inner reads: 10
[2017-09-19 15:40:10] DEBUG: Inner: 30804 covered: 42884 total: 55124
[2017-09-19 15:40:10] DEBUG: Discarded contig with 17 reads and 16 inner overlaps
[2017-09-19 15:40:10] DEBUG: Discarded contig with 13 reads and 12 inner overlaps
[2017-09-19 17:48:35] root: ERROR: Error: Error in assemble binary: Command '['abruijn-assemble', '-k', '15', '-l', '/home/user/Documents/Abruijn_ko_fz/out/abruijn.log', '-t', '11', '-v', '5000', '/home/user/archive/storage/onp/ko_onp_FZ1/extracted/twoBestMin30.fasta', '/home/user/Documents/Abruijn_ko_fz/out/draft_assembly.fasta', '150']' returned non-zero exit status -9
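A note on reading `returned non-zero exit status -9`: Python's `subprocess` module reports a negative return code -N when the child process was terminated by signal N, so -9 corresponds to SIGKILL. On Linux, SIGKILL is often sent by the kernel's out-of-memory killer or by a job scheduler, but the log alone does not say who sent it, so treat that as a hypothesis to check against the kernel log. A small sketch for decoding such codes; `describe_exit_status` is a hypothetical helper name:

```python
import signal


def describe_exit_status(returncode):
    """Explain a return code as reported by Python's subprocess module.

    A negative value -N means the child was terminated by signal N.
    """
    if returncode == 0:
        return "exited normally"
    if returncode > 0:
        return "exited with error code %d" % returncode
    try:
        name = signal.Signals(-returncode).name
    except ValueError:
        name = "unknown signal"
    return "killed by signal %d (%s)" % (-returncode, name)


if __name__ == "__main__":
    for code in (-9, -11, 1):
        print(code, "->", describe_exit_status(code))
```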
Hi,
Many thanks for developing this great assembler.
I have recently updated to version 2.0 of ABruijn and am testing it on relatively complex nanopore metagenomics samples. The eukaryotic portion, which I'm most interested in, comprises 5-30% of the reads in a given dataset. As a consequence, there tends to be high coverage of several bacterial genomes in the assembly. The two datasets that I'm working with are 2.7 Gbp and 16 Gbp.
With the new version of ABruijn I noticed that the read extension appears to jump from 0%, 10% or 30% (example shown below), depending on the dataset, directly to 100% completion. The assembly appears to progress normally after this. The final assemblies I'm getting appear to have lower contiguity than those from earlier versions (pre 2.0) of ABruijn with the same dataset or with a dataset of half the size.
Is the jump to completion in read extension expected behaviour with the new version or could there be an indication of a problem with the assembly (lower assembly contiguity)? Sorry if these are vague questions but I have a feeling that the assembler is running into issues, perhaps as a result of conflicts between the given genome size and the estimated coverage by the assembler.
The estimated genome size is around 30 Mbp, with coverage of around 20x in the dataset for the example shown below.
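To make the coverage figures easier to compare, here is a back-of-the-envelope sketch of the arithmetic, assuming the eukaryotic reads are spread uniformly over a 30 Mbp genome. The dataset sizes and the 5-30% range are the ones quoted above; `expected_coverage` is a hypothetical helper, and the result is only a rough estimate:

```python
def expected_coverage(total_bases, target_fraction, genome_size):
    """Rough depth of coverage for one organism in a mixed dataset.

    total_bases      total sequenced bases in the dataset
    target_fraction  share of those bases that belong to the target organism
    genome_size      size of the target genome in bases
    """
    return total_bases * target_fraction / genome_size


if __name__ == "__main__":
    genome_size = 30_000_000  # ~30 Mbp, as estimated in the post
    for total in (2_700_000_000, 16_000_000_000):
        for fraction in (0.05, 0.30):
            depth = expected_coverage(total, fraction, genome_size)
            print("%.1f Gbp, %d%% eukaryotic -> ~%.1fx"
                  % (total / 1e9, round(fraction * 100), depth))
```

The spread between the low and high ends of that range is wide, so the coverage the assembler estimates may differ considerably from a single assumed value.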
The launch code:
abruijn MinION_albacore1.0.3.chop.fasta /scratch2/jon/MinION/MinION_Abruijn_2.0/ 20 --platform nano --threads 10 --min-overlap 3000 --iterations 3
[2017-08-21 12:43:16] INFO: Running ABruijn
[2017-08-21 12:43:17] INFO: Assembling reads
[2017-08-21 12:43:17] INFO: Reading FASTA
[2017-08-21 12:45:28] INFO: Generating solid k-mer index
[2017-08-21 12:45:30] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-08-21 12:55:07] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-08-21 13:06:59] INFO: Filling index table
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-08-21 13:23:30] INFO: Extending reads
0% 10% 20% 30% 100%
[2017-08-21 15:56:40] INFO: Assembled 325 draft contigs
[2017-08-21 15:56:40] INFO: Generating contig sequences
[2017-08-21 16:05:24] INFO: Running BLASR
[2017-08-21 16:37:38] INFO: Computing rough consensus
[2017-08-21 18:12:30] INFO: Performing repeat analysis
[2017-08-21 18:12:32] INFO: Reading FASTA
[2017-08-21 18:14:25] INFO: Building repeat graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-08-21 18:17:48] INFO: Simplifying the graph
[2017-08-21 18:17:49] INFO: Aligning reads to the graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-08-21 19:02:42] INFO: Resolving repeats
[2017-08-21 19:02:59] INFO: Generating contigs
[2017-08-21 19:02:59] INFO: Generated 461 contigs
[2017-08-21 19:03:05] INFO: Running BLASR
[2017-08-21 19:30:03] INFO: Polishing genome (1/3)
[2017-08-21 19:32:03] INFO: Separating alignment into bubbles
[2017-08-21 21:19:35] INFO: Correcting bubbles
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-08-22 07:16:53] INFO: Running BLASR
[2017-08-22 07:44:43] INFO: Polishing genome (2/3)
[2017-08-22 07:46:42] INFO: Separating alignment into bubbles
[2017-08-22 09:43:09] INFO: Correcting bubbles
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-08-22 15:47:37] INFO: Running BLASR
[2017-08-22 16:14:38] INFO: Polishing genome (3/3)
[2017-08-22 16:16:32] INFO: Separating alignment into bubbles
[2017-08-22 18:15:51] INFO: Correcting bubbles
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-08-22 21:48:37] INFO: Done! Your assembly is in file: /scratch2/jon/MinION/Abruijn_2.0/polished_3.fasta
Hi,
I tried to assemble the hi5 genome from PacBio reads; the genome size is about 400M. It was looking for the reads_order.fasta file, but there is only one log file in the folder.
Thanks
Jack
[11:34:53] root: INFO: Running ABruijn
[11:34:53] root: INFO: Assembling reads
-----------Begin assembly log------------
[11:34:53] DEBUG: Build date: Dec 16 2016 11:26:08
[11:34:53] DEBUG: Reading FASTA
[11:44:26] DEBUG: Hard threshold set to 6
[11:44:27] INFO: Counting kmers (1/2):
[12:19:43] INFO: Counting kmers (2/2):
[12:56:18] DEBUG: Genome size estimate: 348697701
[12:56:18] DEBUG: Filtered 4104784 repetitive kmers
[12:56:18] DEBUG: Estimated minimum kmer coverage: 12, 697822632 unique kmers selected
[12:56:18] INFO: Building kmer index
[14:08:30] root: ERROR: Error: Error in assemble binary: Command '['abruijn-assemble', '-k', '15', '-l', '/is2/projects/nanopore/scratch/hi5/hi5_32_abjuijn/abruijn.log', '-t', '16', '-v', '5000', '/is2/projects/pacbio/active/data/product/smrtportal/018/018932/data/filtered_subreads.fasta', '/is2/projects/nanopore/scratch/hi5/hi5_32_abjuijn/reads_order.fasta', '60']' returned non-zero exit status -9
I've been running Flye on some PacBio data that is high quality, but for a long and highly repetitive genome, and I don't have a lot of coverage (cost constraints). Progress was fine up to the point where it started flye-repeat; now it is just sitting there using a single CPU (even though 64 were specified) and hasn't produced any log output since Feb 16th. It is stuck at "Initializing edges", and I have no way to tell how long this is likely to take. There's ample RAM and CPU on the machine (3 TB and 72 cores), so I'm just hoping there's some way to tell if it is ever going to finish. The genome is 26 Gb and I've only got about 6x coverage. I have a whole lot of Illumina data too, but I was hoping to make some long runs from this that could help me join together the many, many contigs I got out of SOAPdenovo2.
Hi,
I'm trying to assemble a eukaryotic genome of about 200-300 Mbp; the genome size was estimated from a miniasm assembly. Besides the eukaryotic genome, the dataset also contains a considerable amount of associated prokaryotic genomes (both endosymbiont and extracellular). The total dataset is around 16 Gbp of ONT reads. An appreciable amount of the data is relatively short, so I decided to run with "--min-overlap 3000".
Launch-script
flye --nano-raw \
/scratch3/jon/MinION/Busselton/Busselton2_180218/TRIMMED_READS/Busselton2_MinION_180221_ALL.chop.fastq \
--genome-size 200m --out-dir Busselton2_Flye_200m_3000 --threads 20 --min-overlap 3000 --iterations 2 --resume
Log-file, start
[2018-02-22 08:41:05] root: DEBUG: Genome size: 209715200
[2018-02-22 08:41:05] root: DEBUG: Chosen k-mer size: 17
[2018-02-22 08:41:05] root: INFO: Running Flye 2.3-release
[2018-02-22 08:41:05] root: DEBUG: Cmd: /scratch2/software/python-2.7-env/bin/flye --nano-raw /scratch3/jon/MinION/Busselton/Busselton2_180218/TRIMMED_READS/Busselton2_MinION_180221_ALL.chop.fastq --genome-size 200m --out-dir Busselton2_Flye_200m_3000 --threads 20 --min-overlap 3000 --iterations 2
[2018-02-22 08:41:05] root: INFO: Assembling reads
[2018-02-22 08:41:05] root: DEBUG: -----Begin assembly log------
[2018-02-22 08:41:05] root: DEBUG: Running: flye-assemble -k 17 -l /misc/scratch3/jon/MinION/Busselton/ASSEMBLY/Flye/Busselton2_Flye_200m_3000/flye.log -t 20 -v 3000 /scratch3/jon/MinION/Busselton/Busselton2_180218/TRIMMED_READS/Busselton2_MinION_180221_ALL.chop.fastq /misc/scratch3/jon/MinION/Busselton/ASSEMBLY/Flye/Busselton2_Flye_200m_3000/0-assembly/draft_assembly.fasta 209715200 /scratch2/software/python-2.7-env/local/lib/python2.7/site-packages/flye/resource/asm_raw_reads.cfg
[2018-02-22 08:41:05] DEBUG: Build date: Jan 8 2018 12:26:55
[2018-02-22 08:41:05] DEBUG: Parameters:
[2018-02-22 08:41:05] DEBUG: maximum_jump=1500
[2018-02-22 08:41:05] DEBUG: maximum_overhang=1500
[2018-02-22 08:41:05] DEBUG: hard_min_coverage_rate=10
[2018-02-22 08:41:05] DEBUG: repeat_coverage_rate=10
[2018-02-22 08:41:05] DEBUG: close_jump_rate=100
[2018-02-22 08:41:05] DEBUG: far_jump_rate=2
[2018-02-22 08:41:05] DEBUG: overlap_divergence_rate=5
[2018-02-22 08:41:05] DEBUG: penalty_window=100
[2018-02-22 08:41:05] DEBUG: max_coverage_drop_rate=5
[2018-02-22 08:41:05] DEBUG: chimera_window=100
[2018-02-22 08:41:05] DEBUG: min_reads_in_contig=4
[2018-02-22 08:41:05] DEBUG: max_inner_reads=10
[2018-02-22 08:41:05] DEBUG: max_inner_fraction=0.25
[2018-02-22 08:41:05] DEBUG: max_separation=500
[2018-02-22 08:41:05] DEBUG: tip_length_threshold=20000
[2018-02-22 08:41:05] DEBUG: unique_edge_length=50000
[2018-02-22 08:41:05] DEBUG: min_repeat_res_support=0.5
[2018-02-22 08:41:05] DEBUG: out_paths_ratio=5
[2018-02-22 08:41:05] DEBUG: graph_cov_drop_rate=10
[2018-02-22 08:41:05] DEBUG: coverage_estimate_window=100
[2018-02-22 08:41:05] DEBUG: low_cutoff_warning=1
[2018-02-22 08:41:05] DEBUG: assemble_kmer_sample=1
[2018-02-22 08:41:05] DEBUG: assemble_gap=500
[2018-02-22 08:41:05] DEBUG: repeat_graph_kmer_sample=5
[2018-02-22 08:41:05] DEBUG: repeat_graph_gap=100
[2018-02-22 08:41:05] DEBUG: repeat_graph_max_kmer=500
[2018-02-22 08:41:05] DEBUG: read_align_kmer_sample=1
[2018-02-22 08:41:05] DEBUG: read_align_gap=500
[2018-02-22 08:41:05] DEBUG: read_align_max_kmer=500
[2018-02-22 08:41:05] INFO: Reading sequences
[2018-02-22 10:17:47] DEBUG: Mean read length: 3639
[2018-02-22 10:17:47] DEBUG: Estimated coverage: 69
[2018-02-22 10:17:47] INFO: Generating solid k-mer index
[2018-02-22 10:17:47] DEBUG: Hard threshold set to 7
[2018-02-22 10:17:47] DEBUG: Started kmer counting
[2018-02-22 10:28:35] INFO: Counting kmers (1/2):
[2018-02-22 10:32:57] INFO: Counting kmers (2/2):
[2018-02-22 10:44:30] DEBUG: Filtered 363871 repetitive kmers
[2018-02-22 10:44:30] DEBUG: Estimated minimum kmer coverage: 10, 206931346 unique kmers selected
[2018-02-22 10:44:30] INFO: Filling index table
[2018-02-22 10:44:38] DEBUG: Solid kmers: 206931346
[2018-02-22 10:44:38] DEBUG: Kmer index size: 6149339332
[2018-02-22 11:02:03] DEBUG: Total chunks 1467 wasted space: 71130
[2018-02-22 11:11:41] INFO: Extending reads
[2018-02-22 11:17:22] DEBUG: Mean read coverage: 53
[2018-02-22 11:23:31] DEBUG: Assembled contig 1
Log-file end
[2018-02-23 01:46:19] DEBUG: Inner: 737088 covered: 1174293 total: 8006938
[2018-02-23 01:47:02] DEBUG: Discarded contig with 7 reads and 2 inner overlaps
[2018-02-23 01:49:04] INFO: Assembled 1496 draft contigs
[2018-02-23 01:49:11] INFO: Generating contig sequences
[2018-02-23 02:12:56] DEBUG: Writing FASTA
-----------End assembly log------------
[2018-02-23 02:13:40] root: INFO: Running Minimap2
[2018-02-23 02:13:40] root: DEBUG: Running: flye-minimap2 /misc/scratch3/jon/MinION/Busselton/ASSEMBLY/Flye/Busselton2_Flye_200m_3000/0-assembly/draft_assembly.fasta /scratch3/jon/MinION/Busselton/Busselton2_180218/TRIMMED_READS/Busselton2_MinION_180221_ALL.chop.fastq -a -Q -w5 -m100 -g10000 --max-chain-skip 25 -t 20 -k15
[2018-02-23 03:17:44] root: DEBUG: Sorting alignment file
[2018-02-23 04:01:37] root: INFO: Computing consensus
[2018-02-25 12:13:37] root: DEBUG: Genome size: 209715200
[2018-02-25 12:13:37] root: DEBUG: Chosen k-mer size: 17
[2018-02-25 12:13:37] root: INFO: Running Flye 2.3-release
[2018-02-25 12:13:37] root: DEBUG: Cmd: /scratch2/software/python-2.7-env/bin/flye --nano-raw /scratch3/jon/MinION/Busselton/Busselton2_180218/TRIMMED_READS/Busselton2_MinION_180221_ALL.chop.fastq --genome-size 200m --out-dir Busselton2_Flye_200m_3000 --threads 20 --min-overlap 3000 --iterations 2 --resume
[2018-02-25 12:13:37] root: INFO: Resuming previous run
[2018-02-25 12:13:37] root: INFO: Running Minimap2
[2018-02-25 12:13:37] root: DEBUG: Running: flye-minimap2 /misc/scratch3/jon/MinION/Busselton/ASSEMBLY/Flye/Busselton2_Flye_200m_3000/0-assembly/draft_assembly.fasta /scratch3/jon/MinION/Busselton/Busselton2_180218/TRIMMED_READS/Busselton2_MinION_180221_ALL.chop.fastq -a -Q -w5 -m100 -g10000 --max-chain-skip 25 -t 20 -k15
[2018-02-25 14:28:55] root: DEBUG: Sorting alignment file
[2018-02-25 16:13:47] root: INFO: Computing consensus
The assembly was running well and produced a draft sequence. Then we had a cluster crash during the consensus step (not specifically related to Flye, I think). I restarted using "--resume". During the consensus run I have received a large number of errors like the two instances shown below. Flye appears to be still running.
[2018-02-25 12:13:37] INFO: Running Flye 2.3-release
[2018-02-25 12:13:37] INFO: Resuming previous run
[2018-02-25 12:13:37] INFO: Running Minimap2
[2018-02-25 16:13:47] INFO: Computing consensus
Process Process-1020:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/scratch2/software/python-2.7-env/local/lib/python2.7/site-packages/flye/consensus.py", line 45, in _thread_worker
error_queue.put(e)
File "<string>", line 2, in put
File "/usr/lib/python2.7/multiprocessing/managers.py", line 755, in _callmethod
self._connect()
File "/usr/lib/python2.7/multiprocessing/managers.py", line 742, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python2.7/multiprocessing/connection.py", line 175, in Client
answer_challenge(c, authkey)
File "/usr/lib/python2.7/multiprocessing/connection.py", line 428, in answer_challenge
message = connection.recv_bytes(256) # reject large message
EOFError
Process Process-1023:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/scratch2/software/python-2.7-env/local/lib/python2.7/site-packages/flye/consensus.py", line 45, in _thread_worker
error_queue.put(e)
File "<string>", line 2, in put
File "/usr/lib/python2.7/multiprocessing/managers.py", line 755, in _callmethod
self._connect()
File "/usr/lib/python2.7/multiprocessing/managers.py", line 742, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/lib/python2.7/multiprocessing/connection.py", line 175, in Client
answer_challenge(c, authkey)
File "/usr/lib/python2.7/multiprocessing/connection.py", line 428, in answer_challenge
message = connection.recv_bytes(256) # reject large message
When I inspect the processes on the node, there appear to be a large number of flye processes that are not using any resources. I launched with 20 threads.
Tasks: 1211 total, 2 running, 588 sleeping, 0 stopped, 621 zombie
%Cpu(s): 37.4 us, 0.3 sy, 0.0 ni, 62.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 79251590+total, 29567382+used, 49684204+free, 75548 buffers
KiB Swap: 7842748 total, 0 used, 7842748 free. 50032572 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22028 jon 20 0 25896 4048 2560 R 1.3 0.0 0:01.36 top
3571 jon 20 0 20836 5868 2752 S 0.0 0.0 0:00.07 bash
3580 jon 20 0 374792 341216 6100 S 0.0 0.0 3:44.47 flye
8248 jon 20 0 32.335g 774880 4028 S 0.0 0.1 0:07.17 flye
15421 jon 20 0 374568 338464 3552 S 0.0 0.0 0:00.00 flye
15424 jon 20 0 0 0 0 Z 0.0 0.0 0:00.01 flye
15427 jon 20 0 0 0 0 Z 0.0 0.0 0:00.01 flye
15430 jon 20 0 0 0 0 Z 0.0 0.0 0:00.03 flye
15433 jon 20 0 0 0 0 Z 0.0 0.0 0:00.02 flye
15436 jon 20 0 0 0 0 Z 0.0 0.0 0:00.02 flye
15439 jon 20 0 0 0 0 Z 0.0 0.0 0:00.03 flye
15442 jon 20 0 0 0 0 Z 0.0 0.0 0:00.02 flye
15445 jon 20 0 0 0 0 Z 0.0 0.0 0:00.03 flye
15448 jon 20 0 0 0 0 Z 0.0 0.0 0:00.04 flye
15451 jon 20 0 0 0 0 Z 0.0 0.0 0:00.03 flye
15454 jon 20 0 0 0 0 Z 0.0 0.0 0:00.03 flye
15457 jon 20 0 0 0 0 Z 0.0 0.0 0:00.00 flye
15460 jon 20 0 0 0 0 Z 0.0 0.0 0:00.01 flye
15463 jon 20 0 0 0 0 Z 0.0 0.0 0:00.01 flye
15466 jon 20 0 0 0 0 Z 0.0 0.0 0:00.01 flye
15469 jon 20 0 0 0 0 Z 0.0 0.0 0:00.04 flye
15472 jon 20 0 0 0 0 Z 0.0 0.0 0:00.00 flye
15475 jon 20 0 0 0 0 Z 0.0 0.0 0:00.04 flye
15478 jon 20 0 0 0 0 Z 0.0 0.0 0:00.04 flye
15481 jon 20 0 0 0 0 Z 0.0 0.0 0:00.01 flye
15484 jon 20 0 0 0 0 Z 0.0 0.0 0:00.01 flye
15487 jon 20 0 0 0 0 Z 0.0 0.0 0:00.01 flye
15490 jon 20 0 0 0 0 Z 0.0 0.0 0:00.00 flye
15493 jon 20 0 0 0 0 Z 0.0 0.0 0:00.01 flye
Any ideas on what these errors might mean or if they are benign?
Cheers
Jon
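Regarding the process listing: state `Z` in top marks a zombie, i.e. a process that has already exited but whose parent has not yet collected its exit status. Zombies hold only a process-table entry, which matches the zero VIRT/RES values shown. A small sketch for tallying states from pasted top rows, assuming top's default column layout; `tally_states` is a hypothetical helper and the sample rows are copied from the listing above:

```python
from collections import Counter


def tally_states(top_rows, command="flye"):
    """Count process states (the `S` column) for rows whose COMMAND matches."""
    counts = Counter()
    for row in top_rows.splitlines():
        fields = row.split()
        # Default top layout:
        # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) >= 12 and fields[0].isdigit() and fields[11] == command:
            counts[fields[7]] += 1
    return counts


SAMPLE = """\
 3580 jon 20 0 374792 341216 6100 S 0.0 0.0 3:44.47 flye
15424 jon 20 0 0 0 0 Z 0.0 0.0 0:00.01 flye
15427 jon 20 0 0 0 0 Z 0.0 0.0 0:00.01 flye
22028 jon 20 0 25896 4048 2560 R 1.3 0.0 0:01.36 top
"""

if __name__ == "__main__":
    print(tally_states(SAMPLE))
```

Whether the zombies are benign depends on why the workers exited; the EOFError tracebacks suggest they failed while reporting an error back to the parent, which this tally cannot diagnose.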
Hi!
I'd like to try your approach to get a draft assembly for a 3 kb amplicon. Most of the reads fully span the region, with ~30x coverage. Did you ever try this? It doesn't assemble a contig.
[17:40:19] INFO: Running ABruijn
[17:40:19] INFO: Assembling reads
[17:40:21] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[17:40:21] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[17:40:22] INFO: Building kmer index
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[17:40:22] INFO: Finding overlaps:
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[17:40:22] INFO: Extending reads
[17:40:22] INFO: Assembled 0 contigs
[17:40:22] INFO: Generating contig sequences
[17:40:22] INFO: Polishing genome (1/2)
[17:40:22] INFO: Running BLASR
[17:40:22] ERROR: While running...
The goal would be to create a draft consensus sequence.
Thank you,
Armin
Hi there,
I'm trying to assemble an older PacBio dataset in which many of the reads are ~1000bp. Flye does not allow the min overlap threshold to be set any lower than 1000bp - I think that this might be why my final assembly has many contigs of roughly this size.
Is there a good reason not to run Flye with a lower limit? And if not, how can I disable the error message?
Cheers,
Adam
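One way to judge how much a 1000 bp minimum overlap costs on such a dataset is to look at the read length distribution: a read shorter than the minimum overlap cannot take part in any overlap of that length. A minimal sketch, assuming plain (uncompressed) FASTA input; the function names are hypothetical and this is not how Flye itself computes its read statistics:

```python
import sys


def read_lengths(fasta_path):
    """Yield the sequence length of every record in a FASTA file."""
    length = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if length is not None:
                    yield length
                length = 0
            elif length is not None:
                length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2.0
    running = 0
    for value in ordered:
        running += value
        if running >= half:
            return value
    return 0


def fraction_at_least(lengths, threshold):
    """Share of reads whose length is at least `threshold`."""
    lengths = list(lengths)
    if not lengths:
        return 0.0
    return sum(1 for value in lengths if value >= threshold) / len(lengths)


if __name__ == "__main__" and len(sys.argv) > 1:
    all_lengths = list(read_lengths(sys.argv[1]))
    print("reads:", len(all_lengths))
    print("N50:", n50(all_lengths))
    print("fraction >= 1000 bp:", fraction_at_least(all_lengths, 1000))
```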
Hello,
I am trying to get Athena_meta to work, and part of the pipeline involves flye.
I have tried using version 2.3.4 and 2.3.1 but I always get the same error related to flye-polish.
Can you help me identify a solution for this issue?
launching Flye OLC assembly
cmd flye --subassemblies ./results/olc/flye-input-contigs.fa --out-dir ./results/olc/flye-asm-1 --genome-size 1857551 --threads 4 --min-overlap 1000
Traceback (most recent call last):
File "/usr/local/devel/BCIS/kevin/Flye-2.3.1/bin/flye", line 31, in <module>
sys.exit(main())
File "/usr/local/devel/BCIS/kevin/Flye-2.3.1/flye/main.py", line 513, in main
pol.check_binaries()
File "/usr/local/devel/BCIS/kevin/Flye-2.3.1/flye/polish.py", line 41, in check_binaries
raise PolishException(str(e))
flye.polish.PolishException: Command '['flye-polish', '-h']' returned non-zero exit status 1
Thanks in Advance,
Kevin N.
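The traceback shows that the failure comes from probing `flye-polish -h`, and the exception message hides whatever the binary printed. A sketch for running the same kind of probe by hand and showing stderr; `check_binary` is a hypothetical helper, not Flye's own `check_binaries`, and the binary names listed are the ones that appear in Flye logs and tracebacks on this page:

```python
import shutil
import subprocess


def check_binary(name, help_flag="-h"):
    """Return (ok, message) describing whether `name` is on PATH and runs."""
    path = shutil.which(name)
    if path is None:
        return False, "%s not found on PATH" % name
    try:
        proc = subprocess.run([path, help_flag],
                              stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except OSError as exc:
        return False, "%s could not be executed: %s" % (path, exc)
    if proc.returncode != 0:
        detail = proc.stderr.decode(errors="replace").strip()
        return False, "%s exited with status %d: %s" % (
            path, proc.returncode, detail)
    return True, "%s looks usable" % path


if __name__ == "__main__":
    for binary in ("flye-assemble", "flye-repeat",
                   "flye-polish", "flye-minimap2"):
        ok, message = check_binary(binary)
        print("OK  " if ok else "FAIL", message)
```

If the binary is found but exits non-zero, the captured stderr (for example a missing shared library) is usually the quickest lead.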
Hi, just started here.
I think there are a couple of errors in the install docs
First, to build ABruijn, run:
python install.py build
I used
python setup.py build
ABruijn could be invoked with the following command:
bin/abruijn
This then worked, although I haven't yet tested the full algorithm with the test data or my own data.
Additonally, you may install the package for the better OS integration:
python setup.pu install
Correction
python setup.py install
Hello all,
I've been trying to assemble some small microbes with flye and so far the results look very encouraging!
I'm used to looking at assembly graphs with Bandage. Do you have any plans for providing gfa-formatted assembly graphs in the near future?
Cheers,
~Lina
Hi!
When I try to open assembly_graph.dot with Gephi, I get the error below. Gephi seems to support this file format, so any ideas what might be going wrong here? Suggestions for a different viewer are also welcome.
Thanks!
Lizzy
java.lang.IllegalArgumentException: The id can't be empty
at org.gephi.io.importer.impl.ImportContainerImpl.checkId(ImportContainerImpl.java:1045)
at org.gephi.io.importer.impl.ImportContainerImpl.nodeExists(ImportContainerImpl.java:209)
at org.gephi.io.importer.plugin.file.ImporterDOT.getOrCreateNode(ImporterDOT.java:197)
at org.gephi.io.importer.plugin.file.ImporterDOT.stmt(ImporterDOT.java:181)
at org.gephi.io.importer.plugin.file.ImporterDOT.stmtList(ImporterDOT.java:161)
at org.gephi.io.importer.plugin.file.ImporterDOT.graph(ImporterDOT.java:149)
at org.gephi.io.importer.plugin.file.ImporterDOT.importData(ImporterDOT.java:105)
at org.gephi.io.importer.plugin.file.ImporterDOT.execute(ImporterDOT.java:87)
Caused: java.lang.RuntimeException
at org.gephi.io.importer.plugin.file.ImporterDOT.execute(ImporterDOT.java:89)
at org.gephi.io.importer.impl.ImportControllerImpl.importFile(ImportControllerImpl.java:199)
at org.gephi.io.importer.impl.ImportControllerImpl.importFile(ImportControllerImpl.java:169)
at org.gephi.desktop.importer.DesktopImportControllerUI$4.run(DesktopImportControllerUI.java:341)
Caused: java.lang.RuntimeException
at org.gephi.desktop.importer.DesktopImportControllerUI$4.run(DesktopImportControllerUI.java:349)
[catch] at org.gephi.utils.longtask.api.LongTaskExecutor$RunningLongTask.run(LongTaskExecutor.java:274)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I have a question about the following step in Flye. I am running Flye and it gets to this point:
[2018-07-31 06:48:39] INFO: Performing repeat analysis
[2018-07-31 06:48:39] INFO: Reading sequences
[2018-07-31 06:54:55] INFO: Building repeat graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-07-31 07:01:59] INFO: Sequence divergence stats: Q25 = 0.028, Q50 = 0.055, Q75 = 0.11
[2018-07-31 07:04:44] INFO: Aligning reads to the graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-07-31 07:58:56] INFO: Aligned read sequence: 16147937116 / 28135562892 (0.573933)
[2018-07-31 07:58:56] INFO: Sequence divergence stats: Q25 = 0.011, Q50 = 0.028, Q75 = 0.077
[2018-07-31 07:59:14] INFO: Mean edge coverage: 11
[2018-07-31 08:20:16] INFO: Resolving repeats
[2018-07-31 10:23:18] INFO: Generating contigs
[2018-07-31 12:26:35] INFO: Generated 43861 contigs
And then it doesn't produce any output and stays like this for 24 hours. I have run strace on the PID and it does look like it's doing something on one thread. Is this step normally slow?
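As a lighter-weight complement to strace, the CPU time a process has consumed can be sampled from `/proc/<pid>/stat` on Linux. A sketch under that assumption; `cpu_ticks` and `cpu_seconds_used` are hypothetical helper names. A result close to the sampling interval means roughly one core is busy, which shows the process is alive but says nothing about how long it still needs:

```python
import os
import sys
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before indexing the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is the state (field 3); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def cpu_seconds_used(pid, interval=10.0):
    """CPU seconds consumed by `pid` during `interval` wall-clock seconds."""
    def sample():
        with open("/proc/%d/stat" % pid) as handle:
            return cpu_ticks(handle.read())

    before = sample()
    time.sleep(interval)
    after = sample()
    return (after - before) / float(os.sysconf("SC_CLK_TCK"))


if __name__ == "__main__" and len(sys.argv) > 1:
    target = int(sys.argv[1])
    print("CPU seconds used in 10 s:", cpu_seconds_used(target))
```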
Hi again,
Because of the memory issue, I extracted the longest 50X reads using SelectLongestReads to run Flye. But there is another problem now:
[2018-05-09 10:40:18] INFO: Running Flye 2.3.3-g47cdd0b
[2018-05-09 10:40:18] INFO: Assembling reads
[2018-05-09 10:40:18] INFO: Running with k-mer size: 17
[2018-05-09 10:40:18] INFO: Reading sequences
[2018-05-09 11:19:00] INFO: Reads N50/90: 23770 / 18657
[2018-05-09 11:19:02] INFO: Selected minimum overlap 5000
[2018-05-09 11:19:04] INFO: Expected read coverage: 46
[2018-05-09 11:19:04] INFO: Generating solid k-mer index
[2018-05-09 11:19:28] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-05-09 11:33:48] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-05-09 13:04:42] INFO: Filling index table
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-05-09 15:40:36] INFO: Extending reads
[2018-05-09 16:12:50] INFO: Overlap-based coverage: 14
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-05-15 23:03:43] INFO: Assembled 9386 draft contigs
[2018-05-15 23:05:10] INFO: Generating contig sequences
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-05-16 01:04:56] INFO: Running Minimap2
[2018-05-16 04:55:10] INFO: Computing consensus
[2018-05-16 05:45:08] INFO: Alignment error rate: 0.0
[2018-05-16 05:45:08] INFO: Performing repeat analysis
[2018-05-16 05:45:09] INFO: Reading sequences
[2018-05-16 05:45:09] ERROR: parse error in /parastor300/niuyw/Project/Goqi_genome_180207/flye/run1.1/1-consensus/consensus.fasta on line 1: empty sequence
[2018-05-16 05:45:09] ERROR: Command '['flye-repeat', '-l', '/parastor300/niuyw/Project/Goqi_genome_180207/flye/run1.1/flye.log', '-t', '40', '-g', '/parastor300/niuyw/Project/Goqi_genome_180207/flye/run1.1/1-consensus/consensus.fasta', '/home/zhangll/Tasks/Gouqi/data/Pacbio/Pacbio_50x.fasta', '/parastor300/niuyw/Project/Goqi_genome_180207/flye/run1.1/2-repeat', '2147483648', '/home/niuyw/software/Flye-2.3.3/flye/resource/asm_raw_reads.cfg']' returned non-zero exit status 1
cmdline: flye --pacbio-raw Pacbio_50x.fasta --out-dir run1.1 --genome-size 2g --threads 40
Thank you in advance!
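A side note on the numbers in the failing command: `flye-repeat` was passed 2147483648 for `--genome-size 2g`, which equals 2 × 1024³, and another log on this page records `Genome size: 209715200` for `--genome-size 200m` (200 × 1024²). The suffixes therefore appear to be expanded with 1024-based multipliers in this version. A sketch that reproduces those two values; `parse_size` is a hypothetical helper written to match the logged numbers, not Flye's actual parser:

```python
def parse_size(text):
    """Convert strings such as '200m' or '2g' to a base count.

    Uses 1024-based multipliers, chosen only because they reproduce the
    values printed in the logs quoted on this page.
    """
    multipliers = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = text.strip().lower()
    if text and text[-1] in multipliers:
        return int(float(text[:-1]) * multipliers[text[-1]])
    return int(text)


if __name__ == "__main__":
    for value in ("200m", "2g", "1857551"):
        print(value, "->", parse_size(value))
```

In practice this means the genome size the assembler works with is about 5-7% larger than the decimal figure one might have in mind.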
Hi guys, thanks for the fantastic piece of software, it works beautifully. I am just wondering if you plan to add support for subreads in the PacBio BAM format? Lots of information contained in the BAM file could probably be used to help improve the assembly.
I am wondering why you decided to use minimap2 as a replacement for BLASR. I know that BLASR is designed for PacBio data, so it is not the right choice for ONP data. I am also aware that minimap2 is super fast.
However, can you elaborate a bit on why you did not choose GraphMap as a replacement? Isn't it more sensitive than minimap2?
It might even be an option to incorporate GraphMap as well and let the user choose? :-) Though this would mean quite some work on your side, I guess...
Hi,
Thanks for developing this great software!
I have encountered an error in the polishing step when running ABruijn on a mixed/metagenomic 1D Nanopore dataset (many organisms with varying coverage). I have assembled similar datasets before without major issues. Oddly enough, ABruijn assembles the data and manages to polish a first iteration, but it then fails in the second iteration with the following error message. The requisite files appear to be present (i.e. bubbles_2.fasta). I'm not sure what is going on here. If you have suggestions as to what has gone wrong and how I could avoid this happening in the future, that would be great!
[13:34:45] INFO: Polishing genome (1/2)
[13:34:50] INFO: Running BLASR
[14:37:51] INFO: Separating draft genome into bubbles
[16:38:52] INFO: Correcting bubbles
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[15:25:21] INFO: Polishing genome (2/2)
[15:25:30] INFO: Running BLASR
[16:27:39] INFO: Separating draft genome into bubbles
[19:09:34] INFO: Correcting bubbles
0% [19:48:05] ERROR: Error: Error while running polish binary: Command '['abruijn-polish', '-t', '16', '/scratch2/jon/MinION/BMAN/assemblies/abruijn/BMAN_Abruijn/bubbles_2.fasta', '/scratch2/software/Python-2.7.13/lib/python2.7/site-packages/abruijn/resource/nano_substitutions.mat', '/scratch2/software/Python-2.7.13/lib/python2.7/site-packages/abruijn/resource/nano_homopolymers.mat', '/scratch2/jon/MinION/BMAN/assemblies/abruijn/BMAN_Abruijn/consensus_2.fasta']' returned non-zero exit status -11
During assembly of a low-quality old (2012) dataset:
input: 4 FASTQ files available from the SRA:
SRR497965
SRR497966
SRR497967
SRR497968
end of log:
[2018-01-05 14:29:03] INFO: Aligning reads to the graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-01-05 14:33:57] ERROR: Segmentation fault! Backtrace:
[2018-01-05 14:33:57] ERROR: flye-repeat(_Z15segfaultHandleri+0x36) [0x47efd6]
[2018-01-05 14:33:57] ERROR: /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7f07f7fc64b0]
[2018-01-05 14:33:57] ERROR: flye-repeat(_Z3q75IiET_RSt6vectorIS0_SaIS0_EE+0x102) [0x434082]
[2018-01-05 14:33:57] ERROR: flye-repeat(_ZN19MultiplicityInferer16estimateCoverageEv+0x1375) [0x431355]
[2018-01-05 14:33:57] ERROR: flye-repeat(main+0xb91) [0x429f81]
[2018-01-05 14:33:57] ERROR: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f07f7fb1830]
[2018-01-05 14:33:57] ERROR: flye-repeat(_start+0x29) [0x42b059]
[2018-01-05 14:33:57] ERROR: Command '['flye-repeat', '-k', '15', '-l', '[..]/flye.log', '-t', '10', '-v', '5000', '-g', '[..]/fly$
_assembly/1-consensus/consensus.fasta', '../SRR497965.fastq,../SRR497966.fastq,../SRR497967.fastq,../SRR497968.fastq', '[..]/Flye/flye_assembly/2-repeat', '[..]/flye/resource/asm_raw_reads.cfg']' returned non-zero exit status 1
cmdline: time bin/flye --threads 10 --pacbio-raw ../SRR*.fastq -g 4M -o flye_assembly
Hi,
This is an exciting tool - thanks for developing it. I am eager to get it up and running.
I installed and tried a simple test with 37X coverage of simulated lambda reads. All are 15000 bp long and have no errors. I cannot seem to get abruijn.py to run.
If I try:
abruijn.py reads.fasta out 37
I get:
[16:52:16] INFO: Running ABruijn
[16:52:16] INFO: Assembling reads
[16:52:16] INFO: Indexing kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[16:52:21] INFO: Indexing kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[16:52:23] WARNING: Unable to choose minimum kmer count cutoff. Check if the coverage parameter is correct. Running with default parameter t = 4
[16:52:23] INFO: Building read index
[16:52:23] INFO: Finding overlaps:
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[16:52:41] INFO: Extending reads
[16:52:41] INFO: Assembled 0 contigs
[16:52:41] INFO: Generating contig sequences
[16:52:41] INFO: Running Blasr
ERROR, Fail to load FASTA file /gpfs/scratch/jurban/male-ilmn/lambda/abruijn/out/draft_assembly.fasta to virtual memory.
[16:52:41] ERROR: While running blasr: Command '['blasr', 'reads.fasta', '/gpfs/scratch/jurban/male-ilmn/lambda/abruijn/out/draft_assembly.fasta', '-bestn', '1', '-minMatch', '15', '-maxMatch', '25', '-m', '5', '-nproc', '1', '-out', '/gpfs/scratch/jurban/male-ilmn/lambda/abruijn/out/alignment.m5']' returned non-zero exit status 1
[16:52:41] ERROR: Error: Error in alignment module, exiting
Note that draft_assembly.fasta is an empty file.
I saw the warning: WARNING: Unable to choose minimum kmer count cutoff. Check if the coverage parameter is correct. Running with default parameter t = 4
So I tried adding in a minimum cutoff instead of default auto, but when I add any arguments it gives another error:
$ abruijn.py reads.fasta out 37 -m 10
[16:49:22] INFO: Running ABruijn
[16:49:22] INFO: Assembling reads
Traceback (most recent call last):
File "/users/jurban/software/abruijn/ABruijn/abruijn.py", line 35, in <module>
sys.exit(main())
File "/gpfs_home/jurban/software/abruijn/ABruijn/abruijn/main.py", line 102, in main
run(args)
File "/gpfs_home/jurban/software/abruijn/ABruijn/abruijn/main.py", line 41, in run
args.max_cov, args.coverage, args.debug, log_file)
File "/gpfs_home/jurban/software/abruijn/ABruijn/abruijn/assemble.py", line 48, in assemble
subprocess.check_call(cmdline)
File "/gpfs/runtime/opt/python/2.7.3/lib/python2.7/subprocess.py", line 506, in check_call
retcode = call(*popenargs, **kwargs)
File "/gpfs/runtime/opt/python/2.7.3/lib/python2.7/subprocess.py", line 493, in call
return Popen(*popenargs, **kwargs).wait()
File "/gpfs/runtime/opt/python/2.7.3/lib/python2.7/subprocess.py", line 679, in __init__
errread, errwrite)
File "/gpfs/runtime/opt/python/2.7.3/lib/python2.7/subprocess.py", line 1249, in _execute_child
raise child_exception
TypeError: execv() arg 2 must contain only strings
Any advice to help me get up and running would be appreciated.
best,
John
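The `TypeError: execv() arg 2 must contain only strings` at the end of the traceback above is what Python 2's `subprocess` raises when the argument list handed to `check_call` contains a non-string item. The most likely culprit is an option value that was parsed as a number, though the traceback alone does not confirm this. Below is a minimal sketch of the pattern and a workaround. The helper `build_cmdline` and the commands are hypothetical and are not taken from the ABruijn source. Python 3 words the message differently but also raises `TypeError`.

```python
import subprocess
import sys

def build_cmdline(program, *args):
    """Return an argument list in which every item is a string (hypothetical helper)."""
    return [program] + [str(a) for a in args]

# An integer in the argument list is rejected before the child process starts.
bad_cmdline = [sys.executable, "-c", "print('ok')", 10]
try:
    subprocess.check_call(bad_cmdline)
    raised = False
except TypeError:
    raised = True

# Converting every argument to str avoids the error.
good_cmdline = build_cmdline(
    sys.executable, "-c", "import sys; print(sys.argv[1])", 10
)
output = subprocess.check_output(good_cmdline).decode().strip()
```

If this is indeed the cause, converting numeric option values with `str()` before building the command line should be enough to get past this error.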
Hi,
I tried using ABruijn with Oxford Nanopore reads but I get a segmentation fault error during the chimera detection phase.
Here is the command line I used:
python abruijn.py $(pwd)/BAM_10X.fasta BAM_10X
And here is the log:
Running ABruijn
Assembling reads
[10:50:02] Reading FASTA
[10:50:05] Building kmer index
[10:50:05] First pass:
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[10:55:39] Second pass:
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[10:56:33] Trimming index
[10:56:36] Building read index
[10:56:37] Finding overlaps
[10:56:46] Detecting chimeric sequences
Error: Error in assemble binary:
Command '['abruijn-assemble', '/env/cns/bigtmp1/ONT/ABruijn/BAM_10X.fasta', '/env/export/nfs6/bigtmp1/ONT/ABruijn/BAM_10X/read_edges.fasta']' returned non-zero exit status -11
If I run the faulty command manually, I get a segmentation fault.
abruijn-assemble /env/cns/bigtmp1/ONT/ABruijn/BAM_10X.fasta /env/export/nfs6/bigtmp1/ONT/ABruijn/BAM_10X/read_edges.fasta
Moreover, the output directory is empty.
Thanks for your help,
Benjamin
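The negative exit status in the error above fits the segmentation fault seen when running the binary by hand: on POSIX systems, Python's `subprocess` reports a return code of `-N` when the child was terminated by signal `N`, and on Linux signal 11 is `SIGSEGV`. A minimal sketch for decoding such codes follows. It assumes Python 3.5 or later, and `describe_exit_status` is a hypothetical helper, not part of ABruijn.

```python
import signal

def describe_exit_status(returncode):
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, a negative value -N means the child was killed by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal {} ({})".format(-returncode, name)
    if returncode == 0:
        return "exited normally"
    return "exited with error code {}".format(returncode)

print(describe_exit_status(-11))
print(describe_exit_status(1))
```

Signal numbers are platform dependent, so the name printed for a given number can differ between operating systems.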
Hi, I got these error messages when using version 2.3.2 and version 2.3.3.
The genome is about 2 Gb, and default parameters were used.
version 2.3.2
[2018-04-18 18:38:40] INFO: Running Flye 2.3.2-release
[2018-04-18 18:38:40] INFO: Assembling reads
[2018-04-18 18:38:40] INFO: Reading sequences
[2018-04-18 21:39:09] INFO: Generating solid k-mer index
[2018-04-18 21:39:35] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-04-18 22:28:15] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-04-19 00:03:13] INFO: Filling index table
[2018-04-19 02:32:35] ERROR: Caught unhandled exception: std::bad_alloc
[2018-04-19 02:32:35] ERROR: flye-assemble(_Z16exceptionHandlerv+0xd0) [0x42f4a0]
[2018-04-19 02:32:35] ERROR: /home/software/gcc-4.9.3/lib64/libstdc++.so.6(+0x5e0e6) [0x2b08dbac10e6]
[2018-04-19 02:32:35] ERROR: /home/software/gcc-4.9.3/lib64/libstdc++.so.6(+0x5e131) [0x2b08dbac1131]
[2018-04-19 02:32:35] ERROR: /home/software/gcc-4.9.3/lib64/libstdc++.so.6(+0x5e349) [0x2b08dbac1349]
[2018-04-19 02:32:35] ERROR: /home/software/gcc-4.9.3/lib64/libstdc++.so.6(+0x5e869) [0x2b08dbac1869]
[2018-04-19 02:32:35] ERROR: /home/software/gcc-4.9.3/lib64/libstdc++.so.6(_Znam+0x9) [0x2b08dbac18c9]
[2018-04-19 02:32:35] ERROR: flye-assemble(_ZN11VertexIndex10buildIndexEii+0x9c6) [0x44f3d6]
[2018-04-19 02:32:35] ERROR: flye-assemble(main+0xaf8) [0x434378]
[2018-04-19 02:32:35] ERROR: /lib64/libc.so.6(__libc_start_main+0xfd) [0x3fbbe1ed5d]
[2018-04-19 02:32:35] ERROR: flye-assemble() [0x41d275]
[2018-04-19 02:32:57] ERROR: Command '['flye-assemble', '-l', '/home/zhangll/Tasks/Gouqi/Third_assembl/Flye/flye.log', '-t', '16', '-v', '5000', '/home/zhangll/Tasks/Gouqi/data/Pacbio/all.fasta', '/home/zhangll/Tasks/Gouqi/Third_assembl/Flye/0-assembly/draft_assembly.fasta', '2202009600', '/home/niuyw/software/Flye/flye/resource/asm_raw_reads.cfg']' returned non-zero exit status 1
version 2.3.3
[2018-04-28 05:04:03] INFO: Running Flye 2.3.3-g47cdd0b
[2018-04-28 05:04:03] INFO: Assembling reads
[2018-04-28 05:04:03] INFO: Running with k-mer size: 17
[2018-04-28 05:04:03] INFO: Reading sequences
[2018-04-28 05:59:54] ERROR: parse error in /parastor300/niuyw/Project/Goqi_genome_180207/data/Pacbio/all.fq.gz on line 37943506: Fastq fromat error
[2018-04-28 05:59:58] ERROR: Command '['flye-assemble', '-l', '/parastor300/niuyw/Project/Goqi_genome_180207/flye/run1/flye.log', '-t', '30', '/parastor300/niuyw/Project/Goqi_genome_180207/data/Pacbio/all.fq.gz', '/parastor300/niuyw/Project/Goqi_genome_180207/flye/run1/0-assembly/draft_assembly.fasta', '2147483648', '/home/niuyw/software/Flye-2.3.3/flye/resource/asm_raw_reads.cfg']' returned non-zero exit status 1
Finish time is 2018/04/28--05:59
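When a parse error like the one above points at a line number in a large FASTQ file, it can help to check the file independently of the assembler. The following is a minimal sketch that reports the first malformed record. It assumes the common four-line FASTQ layout with unwrapped sequences, it does not reproduce Flye's own parser, and `first_bad_fastq_record` is a hypothetical helper.

```python
import gzip

def first_bad_fastq_record(path):
    """Return (line_number, reason) for the first malformed record, or None.

    Assumes four-line FASTQ records (header, sequence, '+', quality).
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        line_no = 0  # number of lines consumed by complete, valid records
        while True:
            header = handle.readline()
            if not header:
                return None  # clean end of file
            seq = handle.readline().rstrip("\n")
            plus = handle.readline()
            qual = handle.readline().rstrip("\n")
            if not header.startswith("@"):
                return (line_no + 1, "header does not start with '@'")
            if not plus.startswith("+"):
                return (line_no + 3, "separator does not start with '+'")
            if len(seq) != len(qual):
                return (line_no + 4, "sequence and quality lengths differ")
            line_no += 4
```

A truncated or corrupted gzip stream makes `gzip` itself raise an exception while reading, which is also useful to know before starting a long run.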
niuyw@admin:/parastor300/niuyw/Project/Goqi_genome_180207/flye/run2$ cat ../flye.g.run1.e55155
Start time is 2018/04/28--15:49
[2018-04-28 15:49:49] INFO: Running Flye 2.3.3-g47cdd0b
[2018-04-28 15:49:49] INFO: Assembling reads
[2018-04-28 15:49:49] INFO: Running with k-mer size: 17
[2018-04-28 15:49:49] INFO: Reading sequences
[2018-04-28 18:05:07] INFO: Reads N50/90: 16659 / 5780
[2018-04-28 18:05:23] INFO: Selected minimum overlap 5000
[2018-04-28 18:05:35] INFO: Expected read coverage: 102
[2018-04-28 18:05:35] INFO: Generating solid k-mer index
[2018-04-28 18:08:19] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-04-28 18:33:35] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-05-01 21:49:37] INFO: Filling index table
[2018-05-04 12:43:47] ERROR: Caught unhandled exception: std::bad_alloc
[2018-05-04 12:43:47] ERROR: flye-assemble(_Z16exceptionHandlerv+0xd0) [0x431590]
[2018-05-04 12:43:47] ERROR: /home/software/gcc-4.9.3/lib64/libstdc++.so.6(+0x5e0e6) [0x2b52528950e6]
[2018-05-04 12:43:47] ERROR: /home/software/gcc-4.9.3/lib64/libstdc++.so.6(+0x5e131) [0x2b5252895131]
[2018-05-04 12:43:47] ERROR: /home/software/gcc-4.9.3/lib64/libstdc++.so.6(+0x5e349) [0x2b5252895349]
[2018-05-04 12:43:47] ERROR: /home/software/gcc-4.9.3/lib64/libstdc++.so.6(+0x5e869) [0x2b5252895869]
[2018-05-04 12:43:47] ERROR: /home/software/gcc-4.9.3/lib64/libstdc++.so.6(_Znam+0x9) [0x2b52528958c9]
[2018-05-04 12:43:47] ERROR: flye-assemble(_ZN11VertexIndex10buildIndexEii+0x9c6) [0x452056]
[2018-05-04 12:43:47] ERROR: flye-assemble(main+0xbe5) [0x436595]
[2018-05-04 12:43:47] ERROR: /lib64/libc.so.6(__libc_start_main+0xfd) [0x3fbbe1ed5d]
[2018-05-04 12:43:47] ERROR: flye-assemble() [0x41dbc5]
[2018-05-04 12:46:17] ERROR: Command '['flye-assemble', '-l', '/parastor300/niuyw/Project/Goqi_genome_180207/flye/run1/flye.log', '-t', '40', '/parastor300/niuyw/Project/Goqi_genome_180207/data/Pacbio/all.fasta', '/parastor300/niuyw/Project/Goqi_genome_180207/flye/run1/0-assembly/draft_assembly.fasta', '2147483648', '/home/niuyw/software/Flye-2.3.3/flye/resource/asm_raw_reads.cfg']' returned non-zero exit status 1
BTW, I also ran Flye 2.3.3 on the Canu-corrected reads, and it ran successfully. Here are the logs in case they are useful.
[2018-04-28 05:15:35] INFO: Running Flye 2.3.3-g47cdd0b
[2018-04-28 05:15:35] INFO: Assembling reads
[2018-04-28 05:15:36] INFO: Running with k-mer size: 17
[2018-04-28 05:15:36] INFO: Reading sequences
[2018-04-28 05:49:20] INFO: Reads N50/90: 22994 / 18323
[2018-04-28 05:49:22] INFO: Selected minimum overlap 5000
[2018-04-28 05:49:24] INFO: Expected read coverage: 34
[2018-04-28 05:49:24] INFO: Generating solid k-mer index
[2018-04-28 05:49:47] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-04-28 05:55:09] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-04-28 08:55:36] INFO: Filling index table
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-04-28 17:32:09] INFO: Extending reads
[2018-04-28 18:19:00] INFO: Overlap-based coverage: 20
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-05-03 00:13:25] INFO: Assembled 6725 draft contigs
[2018-05-03 00:13:57] INFO: Generating contig sequences
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-05-03 01:03:32] INFO: Running Minimap2
[2018-05-03 10:15:46] INFO: Computing consensus
[2018-05-03 11:18:10] INFO: Alignment error rate: 0.0299390805236
[2018-05-03 11:18:34] INFO: Performing repeat analysis
[2018-05-03 11:18:35] INFO: Reading sequences
[2018-05-03 11:50:14] INFO: Building repeat graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-05-03 18:02:10] INFO: Aligning reads to the graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-05-04 02:18:05] INFO: Aligned sequence: 137844577112 / 149133911062 (0.924301)
[2018-05-04 02:18:34] INFO: Mean edge coverage: 38
[2018-05-04 02:20:09] INFO: Resolving repeats
[2018-05-04 11:02:04] INFO: Generating contigs
[2018-05-04 12:05:35] INFO: Generated 17311 contigs
[2018-05-04 14:08:03] INFO: Polishing genome (1/1)
[2018-05-04 14:08:03] INFO: Running Minimap2
[2018-05-04 21:32:38] INFO: Separating alignment into bubbles
[2018-05-05 03:50:13] INFO: Alignment error rate: 0.0230640593152
[2018-05-05 03:50:14] INFO: Correcting bubbles
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-05-05 08:18:15] INFO: Assembly statistics:
Total length: 1886554189
Contigs: 13177
Scaffolds: 13049
Scaffolds N50: 315687
Largest scf: 2690265
Mean coverage: 34
[2018-05-05 08:18:15] INFO: Final assembly: /parastor300/niuyw/Project/Goqi_genome_180207/flye/run2/scaffolds.fasta
Do you know what could have caused it? Thanks in advance!
Bests,
Yiwei Niu
Hi there, I got this error message. Any ideas about what could have caused it?
[2018-02-22 08:36:13] INFO: Running Flye 2.3.2-gd46edb7
[2018-02-22 08:36:13] INFO: Assembling reads
[2018-02-22 08:36:13] INFO: Reading sequences
[2018-02-22 08:44:22] INFO: Generating solid k-mer index
[2018-02-22 08:46:32] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-02-22 08:54:26] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-02-22 09:21:11] INFO: Filling index table
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-02-22 10:00:35] INFO: Extending reads
0% [2018-02-22 10:14:55] ERROR: Caught unhandled exception: Automatic expansion triggered when load factor was below minimum threshold
[2018-02-22 10:14:55] ERROR: flye-assemble(_Z16exceptionHandlerv+0x2d) [0x43c73d]
[2018-02-22 10:14:55] ERROR: /usr/lib64/libstdc++.so.6(+0x96706) [0x2aaaab277706]
[2018-02-22 10:14:55] ERROR: /usr/lib64/libstdc++.so.6(+0x96751) [0x2aaaab277751]
[2018-02-22 10:14:55] ERROR: /usr/lib64/libstdc++.so.6(+0xc1708) [0x2aaaab2a2708]
[2018-02-22 10:14:55] ERROR: /lib64/libpthread.so.0(+0x8744) [0x2aaaab789744]
[2018-02-22 10:14:55] ERROR: /lib64/libc.so.6(clone+0x6d) [0x2aaaaba87aad]
[2018-02-22 10:15:15] ERROR: Command '['flye-assemble', '-l', '/flush1/esc003/Flye_cynegetis_assembly/flye/flye.log', '-t', '20', '-v', '5000', '/flush2/esc003/Pacbio_subreads_smartbellremoved.fasta', '/flush1/esc003/Flye_cynegetis_assembly/flye/0-assembly/draft_assembly.fasta', '576716800', '/data/esc003/apps/Flye/flye/resource/asm_raw_reads.cfg']' returned non-zero exit status 1
Hi, I recently attempted to assemble a human genome with 30X raw read coverage. I got an error in the BLASR step, and the log file shows that it happened because 0 k-mers were selected to build the index:
[2017-11-12 07:52:55] root: INFO: Running ABruijn
[2017-11-12 07:52:55] root: DEBUG: Estimated genome size: 3342172801
[2017-11-12 07:52:55] root: DEBUG: Chosen k-mer size: 17
[2017-11-12 07:52:55] root: INFO: Assembling reads
[2017-11-12 07:52:55] root: DEBUG: -----Begin assembly log------
[2017-11-12 07:52:55] DEBUG: Build date: Nov 8 2017 12:30:29
[2017-11-12 07:52:55] INFO: Reading sequences
[2017-11-12 08:10:16] DEBUG: Mean read length: 5658
[2017-11-12 08:10:16] INFO: Generating solid k-mer index
[2017-11-12 08:10:16] DEBUG: Hard threshold set to 2
[2017-11-12 08:10:16] DEBUG: Started kmer counting
[2017-11-12 08:10:24] INFO: Counting kmers (1/2):
[2017-11-12 11:48:22] INFO: Counting kmers (2/2):
[2017-11-12 12:08:29] DEBUG: Genome size estimate: -980663202
[2017-11-12 12:08:29] DEBUG: Filtered 10646768 repetitive kmers
[2017-11-12 12:08:29] DEBUG: Estimated minimum kmer coverage: 251, 0 unique kmers selected
[2017-11-12 12:08:29] INFO: Filling index table
[2017-11-12 12:10:12] DEBUG: Kmer index size: 0
[2017-11-12 12:22:17] INFO: Extending reads
[2017-11-12 13:54:03] INFO: Assembled 0 draft contigs
[2017-11-12 13:54:03] INFO: Generating contig sequences
[2017-11-12 13:54:03] DEBUG: Writing FASTA
-----------End assembly log------------
[2017-11-12 13:55:10] root: INFO: Running BLASR
[2017-11-12 13:55:11] root: ERROR: Command '['blasr', '/data/Bioinfo/bioinfo-proj-jmontenegro/DENOVO/Human/Data/Merged/simon.fastq', '/data/Bioinfo/bioinfo-proj-jmontenegro/DENOVO/Human/Results/Assembly/Abruijn/blasr_ref_0.fasta', '--bestn', '1', '--minMatch', '15', '--maxMatch', '20', '-m', '5', '--nproc', '128', '--out', '/data/Bioinfo/bioinfo-proj-jmontenegro/DENOVO/Human/Results/Assembly/Abruijn/blasr_0.m5', '--advanceHalf', '--advanceExactMatches', '10', '--fastSDP', '--aggressiveIntervalCut']' returned non-zero exit status 1
The original command was as follows:
abruijn -t 128 -i 5 -p pacbio -o 2000 /data/Bioinfo/bioinfo-proj-jmontenegro/DENOVO/Human/Data/Merged/simon.fastq /data/Bioinfo/bioinfo-proj-jmontenegro/DENOVO/Human/Results/Assembly/Abruijn 25
I did not specify a k-mer size, so the program chose k=17 automatically, but it could not find any unique k-mers. Should I manually increase k to try to find unique k-mers?
I look forward to hearing back from you.
Best regards,
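The negative "Genome size estimate" in the log above stands out: the estimated genome size (3342172801) is larger than the biggest value a signed 32-bit integer can hold (2147483647). One plausible reading, which is an assumption and is not confirmed anywhere in this thread, is that the estimate was stored in a 32-bit signed integer and wrapped around. The sketch below only illustrates that arithmetic. The helper names are hypothetical.

```python
INT32_MAX = 2**31 - 1

def wrap_int32(value):
    """Interpret an integer as a signed 32-bit two's-complement value."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value

def unwrap_int32(value):
    """Recover the smallest non-negative value that wraps to `value`."""
    return value & 0xFFFFFFFF

estimated = 3342172801  # "Estimated genome size" from the log above
logged = -980663202     # "Genome size estimate" from the log above

print(estimated > INT32_MAX)   # the estimate does not fit in 32 signed bits
print(wrap_int32(estimated))   # what the estimate would look like after wrapping
print(unwrap_int32(logged))    # the value that would wrap to the logged number
```

Under that assumption, the logged value would correspond to 3314304094, which is a plausible size for a human genome. A nonsensical size estimate could in turn explain the very high minimum k-mer coverage and the empty index.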
Given an edge ID, we are trying to retrieve the reads for that edge. The documentation does not describe how to do this.
Grepping for the ID didn't turn up anything.
Could you please add this to the docs?
I am running ABruijn on nanopore reads with an average length of 2,000 bp. Since ABruijn's minimum overlap length must be within the [3000, 10000] range, no overlaps were found for my reads, and "blasr" failed. I modified the code to accept a lower overlap, and then ABruijn worked fine. That said, is there a reason why ABruijn requires such a high overlap value?
Hi,
I tried to install ABruijn on my SLC6 distribution but I get an error.
Before launching the install commands, I did:
scl enable devtoolset-3 bash
export PATH=/cm/shared/apps/miniconda2/bin/:/cm/shared/apps/pitchfork/deployment/bin/:$PATH
export LD_LIBRARY_PATH=/cm/shared/apps/pitchfork/deployment/lib:/usr/lib/:$LD_LIBRARY_PATH
So, thanks to these paths, I have:
python --version
Python 2.7.12 :: Continuum Analytics, Inc.
cmake --version
cmake version 3.4.1
make --version
GNU Make 3.81
gcc --version
gcc (GCC) 4.9.1 20140922 (Red Hat 4.9.1-10)
blasr --version
blasr 5.3.
Then I launched the first installation command:
python setup.py build
And I get:
running build
make release -C /cm/shared/apps/ABruijn/assemble
make[1]: Entering directory `/cm/shared/apps/ABruijn/assemble'
g++ -c -I/cm/shared/apps/ABruijn/libcuckoo -I/cm/shared/apps/ABruijn/include -Wall -pthread -std=c++11 -D_LOG -O3 -DNDEBUG chimera.cpp -o chimera.o
In file included from overlap.h:11:0,
from chimera.h:7,
from chimera.cpp:10:
/cm/shared/apps/ABruijn/libcuckoo/cuckoohash_map.hh: In instantiation of ‘cuckoohash_map<Key, T, Hash, Pred, Alloc, SLOT_PER_BUCKET>::BucketContainer<N>::BucketContainer(const cuckoohash_map<Key, T, Hash, Pred, Alloc, SLOT_PER_BUCKET>*, Args&& ...) [with Args = {const long unsigned int&, const long unsigned int&}; long unsigned int N = 2ul; Key = Kmer; T = std::vector<VertexIndex::ReadPosition>*; Hash = DefaultHasher<Kmer>; Pred = std::equal_to<Kmer>; Alloc = std::allocator<std::pair<const Kmer, std::vector<VertexIndex::ReadPosition>*> >; long unsigned int SLOT_PER_BUCKET = 4ul]’:
/cm/shared/apps/ABruijn/libcuckoo/cuckoohash_map.hh:786:39: required from ‘cuckoohash_map<Key, T, Hash, Pred, Alloc, SLOT_PER_BUCKET>::TwoBuckets cuckoohash_map<Key, T, Hash, Pred, Alloc, SLOT_PER_BUCKET>::lock_two(size_t, size_t, size_t) const [with Key = Kmer; T = std::vector<VertexIndex::ReadPosition>*; Hash = DefaultHasher<Kmer>; Pred = std::equal_to<Kmer>; Alloc = std::allocator<std::pair<const Kmer, std::vector<VertexIndex::ReadPosition>*> >; long unsigned int SLOT_PER_BUCKET = 4ul; cuckoohash_map<Key, T, Hash, Pred, Alloc, SLOT_PER_BUCKET>::TwoBuckets = cuckoohash_map<Kmer, std::vector<VertexIndex::ReadPosition>*>::BucketContainer<2ul>; size_t = long unsigned int]’
/cm/shared/apps/ABruijn/libcuckoo/cuckoohash_map.hh:830:43: required from ‘cuckoohash_map<Key, T, Hash, Pred, Alloc, SLOT_PER_BUCKET>::TwoBuckets cuckoohash_map<Key, T, Hash, Pred, Alloc, SLOT_PER_BUCKET>::snapshot_and_lock_two(size_t) const [with Key = Kmer; T = std::vector<VertexIndex::ReadPosition>*; Hash = DefaultHasher<Kmer>; Pred = std::equal_to<Kmer>; Alloc = std::allocator<std::pair<const Kmer, std::vector<VertexIndex::ReadPosition>*> >; long unsigned int SLOT_PER_BUCKET = 4ul; cuckoohash_map<Key, T, Hash, Pred, Alloc, SLOT_PER_BUCKET>::TwoBuckets = cuckoohash_map<Kmer, std::vector<VertexIndex::ReadPosition>*>::BucketContainer<2ul>; size_t = long unsigned int]’
/cm/shared/apps/ABruijn/libcuckoo/cuckoohash_map.hh:500:42: required from ‘bool cuckoohash_map<Key, T, Hash, Pred, Alloc, SLOT_PER_BUCKET>::contains(const key_type&) const [with Key = Kmer; T = std::vector<VertexIndex::ReadPosition>*; Hash = DefaultHasher<Kmer>; Pred = std::equal_to<Kmer>; Alloc = std::allocator<std::pair<const Kmer, std::vector<VertexIndex::ReadPosition>*> >; long unsigned int SLOT_PER_BUCKET = 4ul; cuckoohash_map<Key, T, Hash, Pred, Alloc, SLOT_PER_BUCKET>::key_type = Kmer]’
vertex_index.h:57:35: required from here
/cm/shared/apps/ABruijn/libcuckoo/cuckoohash_map.hh:673:37: internal compiler error: in process_init_constructor_array, at cp/typeck2.c:1224
: map(_map), i{{inds...}} {}
^
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
Preprocessed source stored into /tmp/ccI84QyY.out file, please attach this to your bugreport.
make[1]: *** [chimera.o] Error 1
make[1]: Leaving directory `/cm/shared/apps/ABruijn/assemble'
make: *** [all] Error 2
Compilation error: Command '['make']' returned non-zero exit status 2
Do you know what this error is due to?
Thank you in advance for your help.
Best,
Amandine
I have been trying to run ABruijn but I get an error related to BLASR.
(myenv) stelo@H4:~/ABruijn$ ./abruijn.py reads.fa out_dir 50
[17:57:56] INFO: Running ABruijn
[17:57:56] INFO: Assembling reads
[17:58:13] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[17:58:27] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[17:59:04] INFO: Building kmer index
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[18:00:09] INFO: Finding overlaps:
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[18:04:25] INFO: Extending reads
[18:04:26] INFO: Assembled 1 contigs
[18:04:26] INFO: Generating contig sequences
[18:07:59] INFO: Polishing genome (1/2)
[18:07:59] INFO: Running BLASR
Options for blasr
Basic usage: 'blasr reads.{bam|fasta|bax.h5|fofn} genome.fasta [-options]
option Description (default_value).
[... LISTS OF ALL BLASR PARAMETERS ...]
In release v5.1 of BLASR, command-line options will use the
single dash/double dash convention:
Character options are preceded by a single dash. (Example: -v)
Word options are preceded by a double dash. (Example: --verbose)
Please modify your scripts accordingly when BLASR v5.1 is released.
To cite BLASR, please use: Chaisson M.J., and Tesler G., Mapping
single molecule sequencing reads using Basic Local Alignment with
Successive Refinement (BLASR): Theory and Application, BMC
Bioinformatics 2012, 13:238.
Please report any bugs to 'https://github.com/PacificBiosciences/blasr/issues'.
ERROR: -bestn is not a valid option.
[18:07:59] ERROR: While running blasr: Command '['blasr', 'reads.fa', '/24-2/home/stelo/ABruijn/out_dir/blasr_ref_1.fasta', '-bestn', '1', '-minMatch', '15', '-maxMatch', '25', '-m', '5', '-nproc', '1', '-out', '/24-2/home/stelo/ABruijn/out_dir/blasr_1.m5']' returned non-zero exit status 1
[18:07:59] ERROR: Error: Error in alignment module, exiting
It seems that options now need the double dash.
My version of BLASR is
(myenv) stelo@H4:~/ABruijn$ blasr --version
blasr 5.2.def62de
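The BLASR usage text quoted above says that from v5.1 on, word options take a double dash while single-character options keep a single dash. That matches the `-bestn is not a valid option` failure with BLASR 5.2. A wrapper could choose the prefix from the output of `blasr --version`, as in the minimal sketch below. The helper names are hypothetical, the option values are copied from the failing command above, and the format of older BLASR version strings is an assumption.

```python
import re

def blasr_word_option_prefix(version_output):
    """Pick '-' or '--' for multi-letter BLASR options from `blasr --version` text."""
    match = re.search(r"blasr\s+(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version_output)
    major, minor = int(match.group(1)), int(match.group(2))
    # The usage text announces the double-dash convention for v5.1 and later.
    return "--" if (major, minor) >= (5, 1) else "-"

def build_blasr_options(version_output, threads=1):
    """Build the alignment options with the dash style the installed BLASR expects."""
    prefix = blasr_word_option_prefix(version_output)
    return [
        prefix + "bestn", "1",
        prefix + "minMatch", "15",
        prefix + "maxMatch", "25",
        "-m", "5",  # single-character option keeps the single dash
        prefix + "nproc", str(threads),
    ]

print(build_blasr_options("blasr 5.2.def62de"))
```

Detecting the version at run time lets the same wrapper work with both older and newer BLASR builds without hard-coding one style.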
Hi there,
I tried to use Flye 2.3.3-release to assemble human chromosome 6, but I encountered an IOError (log shown below).
Flye/2.3.3/bin/flye --pacbio-raw chr6.read.fq --out-dir assembly_result --genome-size 171m --threads 16
…
[2018-04-04 19:47:55] INFO: Generating contigs
[2018-04-04 19:48:04] INFO: Generated 144 contigs
[2018-04-04 19:48:22] INFO: Polishing genome (1/1)
[2018-04-04 19:48:22] INFO: Running Minimap2
[2018-04-04 19:56:45] INFO: Separating alignment into bubbles
Traceback (most recent call last):
File "/short/te53/software/Flye/2.3.3/bin/flye", line 31, in <module>
sys.exit(main())
File "/short/te53/software/Flye/2.3.3/lib/python2.7/site-packages/flye/main.py", line 511, in main
_run(args)
File "/short/te53/software/Flye/2.3.3/lib/python2.7/site-packages/flye/main.py", line 348, in _run
jobs[i].run()
File "/short/te53/software/Flye/2.3.3/lib/python2.7/site-packages/flye/main.py", line 227, in run
config.vals["min_aln_rate"], bubbles_file)
File "/short/te53/software/Flye/2.3.3/lib/python2.7/site-packages/flye/bubbles.py", line 109, in make_bubbles
raise error_queue.get()
IOError: bad message length
Any advice to help me get up and running would be appreciated.
Hi,
Could you please explain the Multiplicity and Repetitive values? I am looking at assembly_info.txt and at the description here, and I have to admit I am a bit confused.
Thanks
Hi,
Flye nearly killed one of our nodes by exhausting RAM and swap. It was in the first phase, and as far as I could observe only the initial folder structure and the logs had been produced. Is it possible to dump some intermediate files during this stage? I assume that if no further files are generated, a run will start from scratch after a crash?
Hi,
I've attempted to assemble a genome using PacBio Sequel reads, and encountered an error both on my first run and when I attempted to --resume the run. I have been running these as jobs on a PBS system on SUSE. I don't think it is a memory error, since the job would be "killed" if it tried to use more memory than I allotted.
I am using github commit 9c3f166 (v 2.1b) to assemble this.
For further information, I'm assembling the genome of a eukaryotic organism that has no closely related species (< 100 mya) with previously sequenced genomes, so I don't have a strong idea of the exact genome size. I have run k-mer-based genome size estimates on corrected reads and assembled this genome with about 6 assemblers, and the consensus seems to be a genome of roughly 295-360 Mb (k-mer estimates give the lower end of the range; many assemblers, including abruijn's polished assembly, give the upper end). Using the lower end of that estimate, I have roughly 115x coverage counting all the reads in my subreads. The stats of my raw reads are below (in case you need this information to track down why this is occurring).
Number of contigs: 1247879
Shortest contig: 1001
Longest contig: 76985
N50: 13316
Median: 12214
Mean: 12937.225609213714
Below is the stderr from the first run and from the --resume run:
[2017-09-30 09:32:52] INFO: Running ABruijn
[2017-09-30 09:32:52] INFO: Assembling reads
[2017-09-30 09:32:52] INFO: Reading FASTA
[2017-09-30 09:42:22] INFO: Generating solid k-mer index
[2017-09-30 09:42:26] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-09-30 10:51:57] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-09-30 11:08:20] INFO: Filling index table
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-09-30 11:55:37] INFO: Extending reads
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-09-30 19:49:12] INFO: Assembled 3244 draft contigs
[2017-09-30 19:49:12] INFO: Generating contig sequences
[2017-09-30 20:25:11] INFO: Running BLASR
[2017-10-01 12:09:43] INFO: Computing rough consensus
[2017-10-01 12:45:26] INFO: Performing repeat analysis
[2017-10-01 12:45:27] INFO: Reading FASTA
[2017-10-01 12:55:09] INFO: Building repeat graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-10-01 21:06:40] INFO: Simplifying the graph
[2017-10-01 21:06:40] INFO: Aligning reads to the graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
terminate called without an active exception
[2017-10-02 04:36:49] ERROR: Error: Error in repeat binary: Command '['abruijn-repeat', '-k', '17', '-l', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/abruijn.log', '-t', '12', '-v', '5000', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/polished_0.fasta', '/home/user/genome_assembly/assembly_ready/species_subreads.fasta', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir']' returned non-zero exit status -6
[2017-10-02 22:36:10] INFO: Running ABruijn
[2017-10-02 22:36:10] INFO: Resuming previous run
[2017-10-02 22:36:10] INFO: Performing repeat analysis
[2017-10-02 22:36:10] INFO: Reading FASTA
[2017-10-02 22:45:31] INFO: Building repeat graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-10-03 04:02:13] INFO: Simplifying the graph
[2017-10-03 04:02:13] INFO: Aligning reads to the graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-10-03 09:33:50] ERROR: Resource temporarily unavailable
[2017-10-03 09:33:50] ERROR: Error: Error in repeat binary: Command '['abruijn-repeat', '-k', '17', '-l', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/abruijn.log', '-t', '12', '-v', '5000', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/polished_0.fasta', '/home/user/genome_assembly/assembly_ready/species_subreads.fasta', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir']' returned non-zero exit status 1
This is an excerpt from the log file, which is quite large. I've tried to include just the relevant portions and have used ellipses to abbreviate repetitive sections. If you want to see the whole log file, I can provide it.
[2017-09-30 09:32:52] root: INFO: Running ABruijn
[2017-09-30 09:32:52] root: DEBUG: Estimated genome size: 303080041
[2017-09-30 09:32:52] root: DEBUG: Chosen k-mer size: 17
[2017-09-30 09:32:52] root: INFO: Assembling reads
[2017-09-30 09:32:52] root: DEBUG: -----Begin assembly log------
[2017-09-30 09:32:52] DEBUG: Build date: Sep 28 2017 21:15:18
[2017-09-30 09:32:52] INFO: Reading FASTA
[2017-09-30 09:42:22] DEBUG: Mean read length: 7481
[2017-09-30 09:42:22] INFO: Generating solid k-mer index
[2017-09-30 09:42:22] DEBUG: Hard threshold set to 11
[2017-09-30 09:42:22] DEBUG: Started kmer counting
[2017-09-30 09:42:26] INFO: Counting kmers (1/2):
[2017-09-30 10:51:57] INFO: Counting kmers (2/2):
[2017-09-30 11:08:20] DEBUG: Genome size estimate: 291937348
[2017-09-30 11:08:20] DEBUG: Filtered 472253 repetitive kmers
[2017-09-30 11:08:20] DEBUG: Estimated minimum kmer coverage: 18, 279329292 unique kmers selected
[2017-09-30 11:08:20] INFO: Filling index table
[2017-09-30 11:08:39] DEBUG: Kmer index size: 11425325554
[2017-09-30 11:55:37] INFO: Extending reads
[2017-09-30 11:56:05] DEBUG: Mean read coverage: 75
[2017-09-30 11:56:17] DEBUG: Assembled contig
With 31 reads
Start read: -m54105_170625_161744/56558013/0_15235
At position: 15
leftTip: 0 rightTip: 0
Suspicios: 0
Mean overlaps: 118
Inner reads: 0
[2017-09-30 11:56:17] DEBUG: Inner: 2120 covered: 3752 total: 9115006
[2017-09-30 11:57:03] DEBUG: Assembled contig
With 97 reads
Start read: -m54105_170623_233908/28902028/43_14990
At position: 34
leftTip: 0 rightTip: 0
Suspicios: 1
Mean overlaps: 86
Inner reads: 0
[2017-09-30 11:57:03] DEBUG: Inner: 7162 covered: 12394 total: 9115006
[2017-09-30 11:57:25] DEBUG: Assembled contig
…
…
[2017-09-30 19:48:42] DEBUG: Inner: 2628858 covered: 3768626 total: 9115006
[2017-09-30 19:49:12] INFO: Assembled 3244 draft contigs
[2017-09-30 19:49:12] INFO: Generating contig sequences
-----------End assembly log------------
[2017-09-30 20:25:11] root: INFO: Running BLASR
[2017-09-30 20:25:11] root: DEBUG: Reading contigs file
[2017-10-01 12:06:06] root: DEBUG: Sorting alignment file
[2017-10-01 12:09:43] root: INFO: Computing rough consensus
[2017-10-01 12:09:43] root: DEBUG: Reading contigs file
[2017-10-01 12:45:26] root: INFO: Performing repeat analysis
[2017-10-01 12:45:26] root: DEBUG: -----Begin repeat analyser log------
[2017-10-01 12:45:27] DEBUG: Build date: Sep 28 2017 21:14:44
[2017-10-01 12:45:27] INFO: Reading FASTA
[2017-10-01 12:55:09] INFO: Building repeat graph
[2017-10-01 12:55:09] DEBUG: Hard threshold set to 1
[2017-10-01 12:55:09] DEBUG: Started kmer counting
[2017-10-01 12:57:10] DEBUG: Kmer index size: 344619478
[2017-10-01 20:48:04] DEBUG: Computing gluepoints
[2017-10-01 20:48:25] DEBUG: Initializing edges
[2017-10-01 21:06:24] DEBUG: * 5152 =+contig_430 0 455 455
…
…
[2017-10-01 21:06:40] INFO: Simplifying the graph
[2017-10-01 21:06:40] DEBUG: 12800 tips removed
[2017-10-01 21:06:40] DEBUG: Removed 1140 fake loops
[2017-10-01 21:06:40] DEBUG: Unrolled 447, removed 576
[2017-10-01 21:06:40] DEBUG: Removed 6345 edges
[2017-10-01 21:06:40] DEBUG: Added 2149 edges
[2017-10-01 21:06:40] DEBUG: Unrolled 20, removed 38
[2017-10-01 21:06:40] DEBUG: Removed 531 edges
[2017-10-01 21:06:40] DEBUG: Added 518 edges
[2017-10-01 21:06:40] DEBUG: Removed 1 chimeric junctions
[2017-10-01 21:06:40] INFO: Aligning reads to the graph
[2017-10-01 21:06:41] DEBUG: Hard threshold set to 1
[2017-10-01 21:06:41] DEBUG: Started kmer counting
[2017-10-01 21:08:21] DEBUG: Kmer index size: 323587749
[2017-10-02 04:22:24] DEBUG: Aligned 6749792 / 9115006
[2017-10-02 04:23:42] DEBUG: Mean edge coverage: 99
[2017-10-02 04:23:42] DEBUG: * 21618 20897 0 1 105 1.06061
…
…
[2017-10-02 04:23:43] DEBUG: Unique coverage threshold 105
[2017-10-02 04:23:45] DEBUG: Outputs: -14731 0
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG: Outputs: -15258 1
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG: Outputs: -18150 0
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG: Outputs: 20552 3
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG: Outputs: -13609 0
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG: Outputs: 18386 0
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG: Outputs: -16283 0
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG: Outputs: -19265 1
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG: Outputs: 15722 0
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG: Outputs: -7893 1
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG:
…
…
[2017-10-02 04:23:45] DEBUG:
[2017-10-02 04:23:45] DEBUG: R 21617 1 -> 2 (2,1) 142010 99
[2017-10-02 04:23:45] DEBUG: R 21615 1 -> 2 (2,0) 36996 100
[2017-10-02 04:23:45] DEBUG: R 21613 1 -> 2 (2,1) 18727 95
[2017-10-02 04:23:45] DEBUG: R 21605 1 -> 2 (1,2) 26980 81
[2017-10-02 04:23:45] DEBUG: R 21601 1 -> 2 (1,2) 40593 106
[2017-10-02 04:23:45] DEBUG: R 21600 1 -> 2 (2,1) 23606 113
[2017-10-02 04:23:45] DEBUG: R 21599 1 -> 2 (0,2) 132657 111
[2017-10-02 04:23:45] DEBUG: R 21595 1 -> 2 (2,1) 103909 87
[2017-10-02 04:23:45] DEBUG: R 21586 1 -> 2 (0,2) 10054 109
[2017-10-02 04:23:45] DEBUG: R 21584 1 -> 2 (2,0) 3451 89
…
…
[2017-10-02 04:36:49] root: ERROR: Error: Error in repeat binary: Command '['abruijn-repeat', '-k', '17', '-l', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/abruijn.log', '-t', '12', '-v', '5000', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/polished_0.fasta', '/home/user/genome_assembly/assembly_ready/species_subreads.fasta', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir']' returned non-zero exit status -6
[2017-10-02 22:36:10] root: INFO: Running ABruijn
[2017-10-02 22:36:10] root: DEBUG: Estimated genome size: 303080041
[2017-10-02 22:36:10] root: DEBUG: Chosen k-mer size: 17
[2017-10-02 22:36:10] root: INFO: Resuming previous run
[2017-10-02 22:36:10] root: INFO: Performing repeat analysis
[2017-10-02 22:36:10] root: DEBUG: -----Begin repeat analyser log------
[2017-10-02 22:36:10] DEBUG: Build date: Sep 28 2017 21:14:44
[2017-10-02 22:36:10] INFO: Reading FASTA
[2017-10-02 22:45:31] INFO: Building repeat graph
[2017-10-02 22:45:31] DEBUG: Hard threshold set to 1
[2017-10-02 22:45:31] DEBUG: Started kmer counting
[2017-10-02 22:46:38] DEBUG: Kmer index size: 344619478
[2017-10-03 03:43:14] DEBUG: Computing gluepoints
[2017-10-03 03:43:35] DEBUG: Initializing edges
[2017-10-03 04:01:59] DEBUG: * 14224 =+contig_472 0 9 9
…
…
[2017-10-03 04:02:13] INFO: Simplifying the graph
[2017-10-03 04:02:13] DEBUG: 12798 tips removed
[2017-10-03 04:02:13] DEBUG: Removed 1160 fake loops
[2017-10-03 04:02:13] DEBUG: Unrolled 441, removed 572
[2017-10-03 04:02:13] DEBUG: Removed 6351 edges
[2017-10-03 04:02:13] DEBUG: Added 2147 edges
[2017-10-03 04:02:13] DEBUG: Unrolled 21, removed 42
[2017-10-03 04:02:13] DEBUG: Removed 532 edges
[2017-10-03 04:02:13] DEBUG: Added 518 edges
[2017-10-03 04:02:13] DEBUG: Removed 1 chimeric junctions
[2017-10-03 04:02:13] INFO: Aligning reads to the graph
[2017-10-03 04:02:14] DEBUG: Hard threshold set to 1
[2017-10-03 04:02:14] DEBUG: Started kmer counting
[2017-10-03 04:03:39] DEBUG: Kmer index size: 324344284
[2017-10-03 09:32:13] DEBUG: Aligned 6748592 / 9115006
[2017-10-03 09:33:15] DEBUG: Mean edge coverage: 99
[2017-10-03 09:33:15] DEBUG: * -21645 20897 0 1 101 1.0202
…
…
[2017-10-03 09:33:15] DEBUG: Unique coverage threshold 105
[2017-10-03 09:33:17] DEBUG: Outputs: 4330 0
[2017-10-03 09:33:17] DEBUG: 13662 2 0
[2017-10-03 09:33:17] DEBUG:
[2017-10-03 09:33:17] DEBUG: Outputs: 21367 2
[2017-10-03 09:33:17] DEBUG: 21368 26 0
[2017-10-03 09:33:17] DEBUG:
[2017-10-03 09:33:17] DEBUG: Outputs: 1047 0
[2017-10-03 09:33:17] DEBUG: 20778 1 0
[2017-10-03 09:33:17] DEBUG:
[2017-10-03 09:33:17] DEBUG: Outputs: 7396 1
[2017-10-03 09:33:17] DEBUG: 7397 2 1
[2017-10-03 09:33:17] DEBUG: 7405 32 1
[2017-10-03 09:33:17] DEBUG:
[2017-10-03 09:33:17] DEBUG: Outputs: 8526 0
…
…
[2017-10-03 09:33:18] DEBUG:
[2017-10-03 09:33:18] DEBUG: R 21644 1 -> 2 (2,1) 142010 99
[2017-10-03 09:33:18] DEBUG: R 21641 1 -> 2 (1,2) 68125 106
[2017-10-03 09:33:18] DEBUG: R 21640 1 -> 2 (2,1) 22106 112
[2017-10-03 09:33:18] DEBUG: R 21632 1 -> 2 (1,2) 143630 90
[2017-10-03 09:33:18] DEBUG: R 21630 1 -> 2 (2,0) 3169 108
[2017-10-03 09:33:18] DEBUG: R 21629 1 -> 2 (1,2) 12733 79
[2017-10-03 09:33:18] DEBUG: R 21625 1 -> 2 (0,2) 7580 105
[2017-10-03 09:33:18] DEBUG: R 21624 1 -> 2 (2,0) 116506 92
…
…
[2017-10-03 09:33:18] DEBUG: R 16067 1 -> 2 (1,2) 10058 104
[2017-10-03 09:33:18] DEBUG: R 16134 1 -> 2 (1,2) 5990 107
[2017-10-03 09:33:50] ERROR: Resource temporarily unavailable
-----------End assembly log------------
[2017-10-03 09:33:50] root: ERROR: Error: Error in repeat binary: Command '['abruijn-repeat', '-k', '17', '-l', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/abruijn.log', '-t', '12', '-v', '5000', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/polished_0.fasta', '/home/user/genome_assembly/assembly_ready/species_subreads.fasta', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir']' returned non-zero exit status 1
If you could give me a hand to find out what is causing this issue that would be really appreciated.
Thanks,
Zac.
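A note on reading the two failures above: the "returned non-zero exit status" wording matches the message format of Python's `subprocess.CalledProcessError`, and the `subprocess` module reports a negative status when the child process was terminated by a signal rather than exiting on its own. The sketch below only decodes such a status into a name; the helper `describe_exit_status` is hypothetical, not part of ABruijn or Flye, and it does not identify the root cause.

```python
import signal


def describe_exit_status(status):
    """Describe a subprocess-style return code in words."""
    if status == 0:
        return "exited normally"
    if status > 0:
        return "exited with error code %d" % status
    # A negative value -N means the child was terminated by signal N (POSIX).
    try:
        name = signal.Signals(-status).name
    except ValueError:
        name = "unknown signal %d" % -status
    return "terminated by " + name


# The two statuses reported in the log above.
for status in (-6, 1):
    print(status, "->", describe_exit_status(status))
```

On Linux, signal 6 is SIGABRT, which is typically raised when a program calls `abort()`. Status 1 in the second run is an ordinary error exit, consistent with the "Resource temporarily unavailable" message logged just before it.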
Hi
Just reading along, and I was wondering about the following.
Does Flye expect all of its threads to run on the same node? In other words, can Flye threads be spread across multiple nodes that share a common filesystem?
kind regards
Hi, Mikhail. I've read your preprint and found the correspondence between the repeat graph construction problem and the assembly problem intriguing! I was left confused about one particular aspect: when assembling a linear genome ARBRC with a two-copy repeat R, my understanding of the algorithm goes like this:
UnprocessedReads, selecting a read from A
ARC
UnprocessedReads
UnprocessedReads now contains only reads from B
UnprocessedReads, selecting a read from B
B, since UnprocessedReads contains no reads from ARC
UnprocessedReads
UnprocessedReads now contains no reads. Stop assembling contigs, and identify repeats.
The assembled contigs are ARC and B. How is Flye able to identify R as a repeat? Thanks for the clarification!
Note: In Acknowledgments, Bahar Beshaz should be Bahar Behsaz. We worked together at the BC Cancer Genome Sciences Centre in Vancouver!
Hi there,
I tried to use flye-assemble to extend the contigs in flye-input-contigs.fa, but I encountered a "floating point exception" (log shown below). Do you think it is a problem with my contig sequences or a bug in flye-assemble?
flye-assemble -l /oak/stanford/groups/arend/Eric/meta/readclouds-l-gasseri-example/results/olc/flye-asm-1/flye.log -t 4 -s -v 1000 ./results/olc/flye-input-contigs.fa /oak/stanford/groups/arend/Eric/meta/readclouds-l-gasseri-example/results/olc/flye-asm-1/0-assembly/draft_assembly.fasta 1857551 /scratch/users/zhanglu2/software/Flye-2.3.3/flye/resource/asm_subasm.cfg
[2018-04-01 19:31:29] INFO: Running with k-mer size: 31
[2018-04-01 19:31:29] INFO: Reading sequences
[2018-04-01 19:31:30] INFO: Reads N50/90: 64830 / 21423
[2018-04-01 19:31:30] INFO: Selected minimum overlap 1000
[2018-04-01 19:31:30] INFO: Expected read coverage: 7
[2018-04-01 19:31:30] INFO: Generating solid k-mer index
[2018-04-01 19:31:47] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-04-01 19:31:48] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-04-01 19:31:49] INFO: Filling index table
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2018-04-01 19:31:50] INFO: Extending reads
Floating point exception
Hi, I've tried Flye on an inter-species hybrid genome (~15% divergence between the parental genomes) and it worked well with corrected MinION reads. However, when run on uncorrected reads, Flye reported an assembly nearly 2x smaller. Is it possible to alter options related to haplotype separation for uncorrected reads?
I encountered an error today that I have trouble understanding:
../../Flye/bin/flye --pacbio-raw m11111_111111_111111.subreads.extract.fasta m11111_111111_111112.subreads.extract.fasta m11111_111111_111113.subreads.extract.fasta --genome-size 200000 --threads 30 -o test
[2018-01-10 16:59:22] INFO: Running Flye 2.3-4-g77de267
[2018-01-10 16:59:22] INFO: Assembling reads
[2018-01-10 16:59:23] INFO: Reading sequences
[2018-01-10 17:01:13] INFO: Generating solid k-mer index
[2018-01-10 17:01:13] ERROR: Caught unhandled exception: Wrong hard threshold value: 817
[2018-01-10 17:01:13] ERROR: flye-assemble(_Z16exceptionHandlerv+0x9f) [0x42c38f]
[2018-01-10 17:01:13] ERROR: /software/lib64/libstdc++.so.6(+0x8f136) [0x7fedece04136]
[2018-01-10 17:01:13] ERROR: /software/lib64/libstdc++.so.6(+0x8f181) [0x7fedece04181]
[2018-01-10 17:01:13] ERROR: /software/lib64/libstdc++.so.6(+0x8f399) [0x7fedece04399]
[2018-01-10 17:01:13] ERROR: flye-assemble(_ZN11VertexIndex10countKmersEm+0x986) [0x439c96]
[2018-01-10 17:01:13] ERROR: flye-assemble(main+0x89b) [0x41429b]
[2018-01-10 17:01:13] ERROR: /software/lib64/libc.so.6(__libc_start_main+0xf1) [0x7fedec291181]
[2018-01-10 17:01:13] ERROR: flye-assemble(_start+0x29) [0x415659]
[2018-01-10 17:01:13] ERROR: Command '['flye-assemble', '-k', '15', '-l', '/scratch/beegfs/monthly/eschmid/Giannuzzi_Assembly/FLYE_attempt/test/flye.log', '-t', '30', '-v', '5000', 'm11111_111111_111111.subreads.extract.fasta,m11111_111111_111112.subreads.extract.fasta,m11111_111111_111113.subreads.extract.fasta', 'FLYE_attempt/test/0-assembly/draft_assembly.fasta', '200000', '/Flye/flye/resource/asm_raw_reads.cfg']' returned non-zero exit status 1
Any idea what could cause this threshold value error?
Hi developer,
Date is not shown in the log. In logging.Formatter(), datefmt is set to "%H:%M:%S", so only hour:min:sec is shown:
[11:04:10] INFO: Running ABruijn
[11:04:10] INFO: Assembling reads
[11:04:17] INFO: Counting kmers (1/2):
...
It would be great to include the date (like "%Y-%m-%d %H:%M:%S"), especially for benchmarking long runs.
[2017-05-26 11:04:10] INFO: Running ABruijn
[2017-05-26 11:04:10] INFO: Assembling reads
[2017-05-26 11:04:17] INFO: Counting kmers (1/2):
...
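The change requested above can be sketched with the standard `logging` module. This is a minimal illustration, not the project's actual logging setup: the message layout in `LOG_FORMAT` and the helpers `make_formatter` and `demo_line` are assumptions chosen only to mimic the quoted log lines.

```python
import calendar
import logging
import time

# Assumed message layout, chosen only to mimic the log lines quoted above.
LOG_FORMAT = "[%(asctime)s] %(levelname)s: %(message)s"


def make_formatter(with_date):
    """Build a formatter that prints either time only or date and time."""
    datefmt = "%Y-%m-%d %H:%M:%S" if with_date else "%H:%M:%S"
    formatter = logging.Formatter(LOG_FORMAT, datefmt=datefmt)
    # Use UTC so the demonstration below is reproducible on any machine.
    formatter.converter = time.gmtime
    return formatter


def demo_line(with_date):
    """Format one fixed record so the two styles can be compared."""
    record = logging.LogRecord("root", logging.INFO, "example.py", 1,
                               "Running ABruijn", None, None)
    # Pin the timestamp to 2017-05-26 11:04:10 UTC for the demonstration.
    record.created = calendar.timegm((2017, 5, 26, 11, 4, 10, 0, 0, 0))
    return make_formatter(with_date).format(record)


print(demo_line(False))  # time only
print(demo_line(True))   # date and time
```

Only the `datefmt` argument differs between the two outputs, so the change would be a one-line edit wherever the formatter is created.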
I am trying to assemble a PacBio dataset containing 52Gb of data (my genome is 620Mb). ABruijn has been stuck for two days at 0% below:
(myenv) stelo@H4:~/ABruijn$ ./abruijn.py -t 40 reads.fa out_dir 70
[09:10:42] INFO: Running ABruijn
[09:10:42] INFO: Assembling reads
[09:34:00] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[12:41:12] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[15:02:27] WARNING: Unable to choose minimum kmer count cutoff. Check if the coverage parameter is correct. Running with the default parameter t = 2
[15:02:27] INFO: Building kmer index
0%
It is currently using 483GB (resident) of the 512GB RAM I have on my server, but also quite a bit of swap (virtual 719GB). I think ABruijn is not making much progress because of a lot of swapping. Is there any way I can reduce the amount of RAM needed? Maybe changing params? Or perhaps I can filter out reads that ABruijn would not use (i.e., shorter than a threshold)? Thanks.
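The read-filtering idea raised above could be tried with a small preprocessing script along these lines. This is a generic sketch, not an ABruijn feature: the helpers `read_fasta` and `filter_by_length` and the threshold are hypothetical, and the thread does not establish whether filtering reduces memory use enough or how it affects the assembly.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(in_handle, out_handle, min_length):
    """Copy reads of at least min_length bases; return (kept, dropped)."""
    kept = dropped = 0
    for header, seq in read_fasta(in_handle):
        if len(seq) >= min_length:
            out_handle.write(">%s\n%s\n" % (header, seq))
            kept += 1
        else:
            dropped += 1
    return kept, dropped


# Tiny in-memory demonstration; the threshold of 5 is purely illustrative.
demo_in = io.StringIO(">r1\nACGTACGT\n>r2\nACG\n>r3\nAC\nGTA\nC\n")
demo_out = io.StringIO()
print(filter_by_length(demo_in, demo_out, 5))
```

For real data the two handles would be opened files. Dropping reads also lowers the effective coverage, so the coverage value passed on the command line would presumably need to be re-estimated afterwards.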
I am getting the above mentioned error: "ERROR: No contigs were assembled - are you using corrected input instead of raw?"
even though I am running flye with the --pacbio-corr option. I am running it like this:
flye --pacbio-corr /path/to/Corrected/*.fasta --genome-size 200m -o /path/to/Pacbio/ -t 15
Do I need to provide the uncorrected reads as well? I thought I could run it with just the corrected reads.
kind regards
Hi all,
First of all, thanks for an amazing tool. I have been using Flye quite successfully for assembling mammalian genomes using relatively low-coverage PacBio reads (~30X). However, recently I tried an assembly of a non-model species with an expected haploid genome size of 2.2 Gbp (flow cytometry), using 35X coverage of PacBio reads with an average read length of 3.4 Kbp. I am using Flye 2.3 with the following command line:
flye --pacbio-raw ${reads} -g 2.5g -m 1000 -o ${OD} -t 64 -i 3
The raw assembly and consensus stages produced assemblies of ~2.1 Gbp, but after the repeat solving and polishing steps the final assembly is only 781 Mbp (roughly 1/3). The genome is expected to be quite repetitive (~50% simple sequence repeats and transposable elements), but that still does not quite explain such a big difference between the raw assembly and the final polished assembly.
Is this expected behaviour? Is there any parameter that can be tweaked to improve the final assembly?
I look forward to hearing back from you; any suggestions would be more than welcome.
Kind regards,
Juan Montenegro
The ABruijn output "polished_1.fasta" contains many short contigs (less than 1k), and the PacBio reads are longer than these short contigs. Why does this happen?
May I know the minimum coverage requirement for ABruijn to assemble a genome of about 300Mb?
Hello,
I tested Flye on both PacBio and Nanopore datasets from budding yeast, and in general I found that Flye did a pretty good job with a much shorter processing time than Canu (especially on the Nanopore data). However, in my test with the Nanopore data, Flye misjoined two different chromosomes, presumably based on their shared chromosome-end structure (e.g. telomere repeats). I checked the available parameters but didn't find much room to tweak. I was wondering if you have any specific recommendations. I can send my test data for you to check if this would help with the future development of Flye. Thanks in advance!
Best,
Jia-Xing
Hi all,
I would like to use Flye to assemble a 7.5 megabase Streptomyces genome. I have 1D Oxford Nanopore data for this organism.
I installed the tool as follows:
cd ~
git clone https://github.com/fenderglass/Flye
cd Flye
python setup.py build
python setup.py install --user
And then called it:
flye --nano-raw 1d.fastq --genome-size 7.5m --out-dir flye_out --threads 40
Unfortunately, the program encountered an error because it couldn't open a config file:
[2018-01-05 14:54:49] INFO: Running Flye 2.3-release
[2018-01-05 14:54:49] INFO: Assembling reads
[2018-01-05 14:54:49] ERROR: Caught unhandled exception: Can't open config file: /home/lina/.local/lib/python2.7/site-packages/flye/resource/asm_raw_reads.cfg
[2018-01-05 14:54:49] ERROR: flye-assemble(_Z16exceptionHandlerv+0xb4) [0x434f34]
[2018-01-05 14:54:49] ERROR: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d6b6) [0x7f77c0bc26b6]
[2018-01-05 14:54:49] ERROR: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d701) [0x7f77c0bc2701]
[2018-01-05 14:54:49] ERROR: /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d919) [0x7f77c0bc2919]
[2018-01-05 14:54:49] ERROR: flye-assemble(_ZN6Config4loadERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x1b11) [0x439941]
[2018-01-05 14:54:49] ERROR: flye-assemble(main+0x2b6) [0x41cec6]
[2018-01-05 14:54:49] ERROR: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f77c004f830]
[2018-01-05 14:54:49] ERROR: flye-assemble(_start+0x29) [0x41e8d9]
[2018-01-05 14:54:49] ERROR: Command '['flye-assemble', '-k', '15', '-l', '/home/lina/flye_out/flye.log', '-t', '40', '-v', '5000', '/home/lina/1d.fastq', '/home/lina/flye_out/0-assembly/draft_assembly.fasta', '7864320', '/home/lina/.local/lib/python2.7/site-packages/flye/resource/asm_raw_reads.cfg']' returned non-zero exit status 1
I checked and this config file does not exist. Is it something that I can manually create? Or has it been installed somewhere but the path needs to be adjusted?
Thanks for any advice!
Dear Flye,
Hope this email finds you well.
While testing the program on PacBio data (genome size 1.5 Gb) in a PBS Pro environment, I keep running into the same error: "returned non-zero exit status -7".
FYI, please see below for the output file.
Looking forward to your reply!
Flye_026T_Output.txt
Regards,
Taek
with python3:
Flye$ python setup.py build
File "setup.py", line 16
print "Compilation error: ", e
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(print "Compilation error: ", e)?
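The traceback above points at a Python 2 print statement, which Python 3 does not accept. A minimal sketch of the same message written with the print function follows; the helper names are hypothetical and this is not the project's actual fix. Under Python 2.7 the same call would additionally need `from __future__ import print_function` at the top of the file.

```python
import sys


def compilation_error_message(error):
    """Build the message text that the failing line was trying to print."""
    return "Compilation error: %s" % error


def report_compilation_error(error, stream=sys.stderr):
    """Print the message using the function form of print."""
    print(compilation_error_message(error), file=stream)


try:
    raise RuntimeError("make returned 2")  # hypothetical failure for the demo
except RuntimeError as e:
    report_compilation_error(e, stream=sys.stdout)
```

Fixing this one line may not be enough to build under Python 3, since other parts of the code base could also rely on Python 2 syntax.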
Hello,
First, many thanks for giving access to this great tool. It works just fine on my PacBio sequences from a plant genome (500 Mb). It's the only one able to assemble the chloroplast and the mitochondrial genomes in one contig each and to give L50 metrics of fewer than 150 contigs.
I've run ABruijn with the default parameters except for the k-mer size, which I raised to 17. The polishing part iterated twice and returned Alignment error rate: 0.208880757523 (Polishing genome (1/2)) then Alignment error rate: 0.165291041893 (Polishing genome (2/2)).
I suppose I will have to run more polishing iterations to improve the quality of the assembly.
How can I launch only the polishing part of ABruijn?
Thank you for your reply.
_(°-°)_/
Do you think it would make a big difference to the assembly quality if one determined the parameters contained in 'nano_homopolymers.mat' and 'nano_substitutions.mat' empirically for a specific ONP run? One could do a first "preliminary assembly", map the ONP data back, determine the parameters, and then do a second assembly with refined tuning.
How did you generate the substitutions.mat?
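To make the "map back and determine the parameters" step concrete, here is a rough sketch of estimating substitution frequencies from reads aligned to a preliminary assembly. It is illustrative only: the function `substitution_frequencies` is hypothetical, the layout Flye expects in 'nano_substitutions.mat' is not shown in this thread, and this is not a description of how the shipped matrices were produced. Homopolymer lengths are not handled here.

```python
from collections import Counter, defaultdict


def substitution_frequencies(aligned_pairs):
    """Estimate P(read base | reference base) from gapped alignments.

    aligned_pairs: iterable of (reference, read) strings of equal length,
    using '-' for gaps. Gap columns are skipped in this sketch.
    """
    counts = defaultdict(Counter)
    for ref, read in aligned_pairs:
        if len(ref) != len(read):
            raise ValueError("aligned strings must have equal length")
        for r, q in zip(ref.upper(), read.upper()):
            if r != "-" and q != "-":
                counts[r][q] += 1
    freqs = {}
    for r, row in counts.items():
        total = sum(row.values())
        freqs[r] = {q: n / total for q, n in row.items()}
    return freqs


# Toy alignments standing in for reads mapped back to a draft assembly.
demo = [("ACGTAC-A", "ACCTACGA"), ("AAAA", "AAAT")]
for ref_base, row in sorted(substitution_frequencies(demo).items()):
    print(ref_base, row)
```

In practice the aligned pairs would come from an aligner's output, and a real estimate would need far more data than this toy input.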
I am trying to assemble a bacterial genome from MinION sequencing data. Sequencing coverage is about 30X. When I ran Flye with default parameters, it completed successfully, but the final assembly is very short. So I wanted to try changing the '-m' parameter (to 1000) to see if it improves my assembly, but I got a "Segmentation fault" error during the 'flye-repeat' step. Below is the output of my log file.
[2018-01-19 11:29:43] DEBUG: Build date: Jan 16 2018 12:39:30
[2018-01-19 11:29:43] DEBUG: Parameters:
[2018-01-19 11:29:43] DEBUG: maximum_jump=1500
[2018-01-19 11:29:43] DEBUG: maximum_overhang=1500
[2018-01-19 11:29:43] DEBUG: hard_min_coverage_rate=10
[2018-01-19 11:29:43] DEBUG: repeat_coverage_rate=10
[2018-01-19 11:29:43] DEBUG: close_jump_rate=100
[2018-01-19 11:29:43] DEBUG: far_jump_rate=2
[2018-01-19 11:29:43] DEBUG: overlap_divergence_rate=5
[2018-01-19 11:29:43] DEBUG: penalty_window=100
[2018-01-19 11:29:43] DEBUG: max_coverage_drop_rate=5
[2018-01-19 11:29:43] DEBUG: chimera_window=100
[2018-01-19 11:29:43] DEBUG: min_reads_in_contig=4
[2018-01-19 11:29:43] DEBUG: max_inner_reads=10
[2018-01-19 11:29:43] DEBUG: max_inner_fraction=0.25
[2018-01-19 11:29:43] DEBUG: max_separation=500
[2018-01-19 11:29:43] DEBUG: tip_length_threshold=20000
[2018-01-19 11:29:43] DEBUG: unique_edge_length=50000
[2018-01-19 11:29:43] DEBUG: min_repeat_res_support=0.5
[2018-01-19 11:29:43] DEBUG: out_paths_ratio=5
[2018-01-19 11:29:43] DEBUG: graph_cov_drop_rate=10
[2018-01-19 11:29:43] DEBUG: coverage_estimate_window=100
[2018-01-19 11:29:43] DEBUG: low_cutoff_warning=1
[2018-01-19 11:29:43] DEBUG: assemble_kmer_sample=1
[2018-01-19 11:29:43] DEBUG: assemble_gap=500
[2018-01-19 11:29:43] DEBUG: repeat_graph_kmer_sample=5
[2018-01-19 11:29:43] DEBUG: repeat_graph_gap=100
[2018-01-19 11:29:43] DEBUG: repeat_graph_max_kmer=500
[2018-01-19 11:29:43] DEBUG: read_align_kmer_sample=1
[2018-01-19 11:29:43] DEBUG: read_align_gap=500
[2018-01-19 11:29:43] DEBUG: read_align_max_kmer=500
[2018-01-19 11:29:43] INFO: Reading sequences
[2018-01-19 11:29:44] INFO: Building repeat graph
[2018-01-19 11:29:44] DEBUG: Hard threshold set to 1
[2018-01-19 11:29:44] DEBUG: Started kmer counting
[2018-01-19 11:30:00] DEBUG: Solid kmers: 15643
[2018-01-19 11:30:00] DEBUG: Kmer index size: 36807
[2018-01-19 11:30:00] DEBUG: Total chunks 1 wasted space: 0
[2018-01-19 11:30:18] DEBUG: Found 148 overlaps
[2018-01-19 11:30:18] DEBUG: Left 18 overlaps after filtering
[2018-01-19 11:30:18] DEBUG: Building interval tree
[2018-01-19 11:30:18] DEBUG: Computing gluepoints
[2018-01-19 11:30:18] DEBUG: Created 28 gluepoints
[2018-01-19 11:30:18] DEBUG: Tandems removed: 0 left, 0 right, 0 both
[2018-01-19 11:30:18] DEBUG: Initializing edges
[2018-01-19 11:30:18] DEBUG: * -2 +contig_1 171 30872 30701
[2018-01-19 11:30:18] DEBUG: -1 +contig_1 30872 33181 2309
[2018-01-19 11:30:18] DEBUG: -1 +contig_1 33181 33858 677
[2018-01-19 11:30:18] DEBUG: -1 +contig_1 33858 43849 9991
[2018-01-19 11:30:18] DEBUG: -1 +contig_1 43849 45000 1151
[2018-01-19 11:30:18] DEBUG: -1 +contig_1 45000 46640 1640
[2018-01-19 11:30:18] DEBUG: -1 +contig_1 46640 47575 935
[2018-01-19 11:30:18] DEBUG: -1 +contig_1 47575 49824 2249
[2018-01-19 11:30:18] DEBUG: -1 +contig_1 49824 50562 738
[2018-01-19 11:30:18] DEBUG: -1 +contig_1 50562 52679 2117
[2018-01-19 11:30:18] DEBUG: -1 +contig_1 52679 54164 1485
[2018-01-19 11:30:18] DEBUG: -1 +contig_1 54164 55016 852
[2018-01-19 11:30:18] DEBUG: -1 +contig_1 55016 55689 673
[2018-01-19 11:30:18] DEBUG: Total edges: 2
[2018-01-19 11:30:18] DEBUG: Writing Dot
[2018-01-19 11:30:18] DEBUG: 0 tips clipped
[2018-01-19 11:30:18] DEBUG: Removed 0 edges
[2018-01-19 11:30:18] DEBUG: Added 0 edges
[2018-01-19 11:30:18] DEBUG: Removed 0 chimeric junctions
[2018-01-19 11:30:18] DEBUG: Collapsed 0 bulges
[2018-01-19 11:30:18] DEBUG: Removed 0 edges
[2018-01-19 11:30:18] DEBUG: Added 0 edges
[2018-01-19 11:30:18] DEBUG: 0 tips clipped
[2018-01-19 11:30:18] INFO: Aligning reads to the graph
[2018-01-19 11:30:18] DEBUG: Hard threshold set to 1
[2018-01-19 11:30:18] DEBUG: Started kmer counting
[2018-01-19 11:30:34] DEBUG: Solid kmers: 15424
[2018-01-19 11:30:34] DEBUG: Kmer index size: 36465
[2018-01-19 11:30:34] DEBUG: Total chunks 1 wasted space: 0
[2018-01-19 11:36:07] DEBUG: Aligned 73 / 1354
[2018-01-19 11:36:07] DEBUG: Aligned length 1209883 / 5984822 0.202159
[2018-01-19 11:36:07] DEBUG: Mean edge coverage: 1
[2018-01-19 11:36:07] DEBUG: -2 30701 13 13
[2018-01-19 11:36:07] DEBUG: 2 30701 13 13
DEBUG: -1 2068 15 15
[2018-01-19 11:36:07] DEBUG: 1 2068 15 15
[2018-01-19 11:36:07] ERROR: Segmentation fault! Backtrace:
[2018-01-19 11:36:07] ERROR: flye-repeat(_Z15segfaultHandleri+0x1e) [0x47c1de]
[2018-01-19 11:36:07] ERROR: /usr/lib64/libc.so.6(+0x35270) [0x7effb1dce270]
[2018-01-19 11:36:07] ERROR: flye-repeat(_Z3q75IiET_RSt6vectorIS0_SaIS0_EE+0x11d) [0x432c5d]
[2018-01-19 11:36:07] ERROR: flye-repeat(_ZN19MultiplicityInferer16estimateCoverageEv+0xe59) [0x42ff39]
[2018-01-19 11:36:07] ERROR: flye-repeat(main+0x58c) [0x429b3c]
[2018-01-19 11:36:07] ERROR: /usr/lib64/libc.so.6(__libc_start_main+0xf5) [0x7effb1dbac05]
[2018-01-19 11:36:07] ERROR: flye-repeat() [0x42aa2f]