Giter Club home page Giter Club logo

Comments (21)

mikolmogorov avatar mikolmogorov commented on August 27, 2024

Hi Zac,

Thanks for the detailed report! This seems to be a strange error, provided that all memory/performance bottlenecks have passed... In the second run, the error says "Resource temporarily unavailable", which might mean many things. My first suspicion would be that abruijn can't create an output file for contigs - this might happen because of lack of space or other system policies. It seems that this error also might arise from exceeding memory/thread limitations (but it sounds less likely to me).

Could you actually send me the full log file - it might help to figure where exactly error is happening? In the mean time, I suggest you to try the latest version from the devel branch (it contain a few updates that might be important). You still should be able to --resume though (however if it turns to be working I would suggest to restart from scratch later). Also check that there are no problems with creating files in your filesystem (maybe its full?).

Best,
Mikhail

from flye.

zkstewart avatar zkstewart commented on August 27, 2024

Thanks for the response Mikhail,

I'll re-run it with the latest commit - it took about 11 hrs to reach the point where it crashed when I resumed the run so it will take me a bit of time to see if the latest update changes anything. With regards to the filesystem, I should have no problems with file creation/filesystem limits, other assemblers that are currently running aren't having any issues. The HPC environment I use doesn't place limits on file use, but they would email me if I was abusing that.

Full log should be attached here.

abruijn.log

from flye.

mikolmogorov avatar mikolmogorov commented on August 27, 2024

Thanks! I am pretty sure the problem is related to generation of output files (might not be the filesystem though). If you will have a chance, could you also do 'ls -l' of the working directory?

from flye.

zkstewart avatar zkstewart commented on August 27, 2024

Okay, so I resumed the run using the latest version of ABruijn, and this time the program ends after 1 hr (instead of 11 hrs). The stderr says this:

[2017-10-04 22:15:50] INFO: Running ABruijn
[2017-10-04 22:15:50] INFO: Resuming previous run
[2017-10-04 22:15:50] INFO: Computing rough consensus
[2017-10-04 22:47:03] INFO: Performing repeat analysis
[2017-10-04 22:47:04] INFO: Reading FASTA
[2017-10-04 22:57:03] INFO: Building repeat graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-10-04 23:13:14] ERROR: Graph is not symmetric
[2017-10-04 23:13:14] ERROR: Command '['abruijn-repeat', '-k', '17', '-l', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/abruijn.log', '-t', '12', '-v', '5000', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/polished_0.fasta', '/home/user/genome_assembly/assembly_ready/species_subreads.fasta', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir']' returned non-zero exit status 1

Also, ls -l of the working directory is below.

total 88236068
-rw------- 1 user default    12121523 Oct  4 23:13 abruijn.log
-rw------- 1 user default          39 Oct  4 22:47 abruijn.save
-rw------- 1 user default 89614032792 Oct  1 12:09 blasr_0.m5
-rw------- 1 user default   368491296 Sep 30 20:20 draft_assembly.fasta
-rw------- 1 user default   359056296 Oct  4 22:47 polished_0.fasta

I'm assuming the "Graph is not symmetric" line is relatively important, but I'm not sure how that relates to my input data. Perhaps I need to re-run the program from start to finish and see what changes?

edit: Should have included the log. This is the new lines appended to the log I attached above.

[2017-10-04 22:15:50] root: INFO: Running ABruijn
[2017-10-04 22:15:50] root: DEBUG: Estimated genome size: 303080041
[2017-10-04 22:15:50] root: DEBUG: Chosen k-mer size: 17
[2017-10-04 22:15:50] root: INFO: Resuming previous run
[2017-10-04 22:15:50] root: INFO: Computing rough consensus
[2017-10-04 22:47:03] root: INFO: Performing repeat analysis
[2017-10-04 22:47:03] root: DEBUG: -----Begin repeat analyser log------
[2017-10-04 22:47:04] DEBUG: Build date: Oct  4 2017 13:37:13
[2017-10-04 22:47:04] INFO: Reading FASTA
[2017-10-04 22:57:03] INFO: Building repeat graph
[2017-10-04 22:57:03] DEBUG: Hard threshold set to 1
[2017-10-04 22:57:03] DEBUG: Started kmer counting
[2017-10-04 22:59:19] DEBUG: Kmer index size: 344619478
[2017-10-04 23:09:04] DEBUG: Found 6461012 overlaps
[2017-10-04 23:10:37] DEBUG: Left 1316792 overlaps after filtering
[2017-10-04 23:10:37] DEBUG: Computing gluepoints
[2017-10-04 23:12:42] DEBUG: Created 2720138 gluepoints
[2017-10-04 23:12:44] DEBUG: Tandems removed: 1172 left, 1175 right, 3732 both
[2017-10-04 23:12:44] DEBUG: Initializing edges
[2017-10-04 23:13:14] ERROR: Graph is not symmetric
-----------End assembly log------------
[2017-10-04 23:13:14] root: ERROR: Command '['abruijn-repeat', '-k', '17', '-l', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/abruijn.log', '-t', '12', '-v', '5000', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/polished_0.fasta', '/home/user/genome_assembly/assembly_ready/species_subreads.fasta', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir']' returned non-zero exit status 1

from flye.

mikolmogorov avatar mikolmogorov commented on August 27, 2024

Interesting. This error indicates an issue in the algorithm, rather than input data, so there is no need to re-run for now - I will try to fix this problem shortly.

from flye.

mikolmogorov avatar mikolmogorov commented on August 27, 2024

By the way, are you sure that you got the latest revision from devel branch (not the master)? This log looks a bit outdated..

from flye.

zkstewart avatar zkstewart commented on August 27, 2024

You're right, I updated it with the master branch. I'll run the devel branch and give an update regarding what happens.

from flye.

zkstewart avatar zkstewart commented on August 27, 2024

Error appears to be the same, but the output is different. Stderr:

[2017-10-05 11:15:56] INFO: Running ABruijn
[2017-10-05 11:15:56] INFO: Resuming previous run
[2017-10-05 11:15:56] INFO: Performing repeat analysis
[2017-10-05 11:15:56] INFO: Reading sequences
[2017-10-05 11:25:23] INFO: Building repeat graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-10-05 11:37:38] ERROR: Caught unhandled exception: Graph is not symmetric
[2017-10-05 11:37:38] ERROR:    abruijn-repeat(_Z16exceptionHandlerv+0xd0) [0x44c5b0]
[2017-10-05 11:37:38] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(+0x64296) [0x2aaaaab38296]
[2017-10-05 11:37:38] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(+0x642e1) [0x2aaaaab382e1]
[2017-10-05 11:37:38] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(+0x644f8) [0x2aaaaab384f8]
[2017-10-05 11:37:38] ERROR:    abruijn-repeat(_ZN11RepeatGraph15initializeEdgesERK16OverlapContainer+0x11a9) [0x443979]
[2017-10-05 11:37:38] ERROR:    abruijn-repeat(_ZN11RepeatGraph5buildEv+0x55f) [0x4483af]
[2017-10-05 11:37:38] ERROR:    abruijn-repeat(main+0x557) [0x41e407]
[2017-10-05 11:37:38] ERROR:    /lib64/libc.so.6(__libc_start_main+0xf5) [0x2aaaab2086e5]
[2017-10-05 11:37:38] ERROR:    abruijn-repeat(_start+0x29) [0x41f1b9]
[2017-10-05 11:37:39] ERROR: Command '['abruijn-repeat', '-k', '17', '-l', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/abruijn.log', '-t', '12', '-v', '5000', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/polished_0.fasta', '/home/user/genome_assembly/assembly_ready/species_subreads.fasta', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir']' returned non-zero exit status 1

Appended to abruijn.log

[2017-10-05 11:15:56] root: INFO: Running ABruijn
[2017-10-05 11:15:56] root: DEBUG: Estimated genome size: 303080041
[2017-10-05 11:15:56] root: DEBUG: Chosen k-mer size: 17
[2017-10-05 11:15:56] root: INFO: Resuming previous run
[2017-10-05 11:15:56] root: INFO: Performing repeat analysis
[2017-10-05 11:15:56] root: DEBUG: -----Begin repeat analyser log------
[2017-10-05 11:15:56] DEBUG: Build date: Oct  5 2017 10:02:26
[2017-10-05 11:15:56] INFO: Reading sequences
[2017-10-05 11:25:23] INFO: Building repeat graph
[2017-10-05 11:25:23] DEBUG: Hard threshold set to 1
[2017-10-05 11:25:23] DEBUG: Started kmer counting
[2017-10-05 11:26:27] DEBUG: Kmer index size: 344619478
[2017-10-05 11:34:42] DEBUG: Found 6461012 overlaps
[2017-10-05 11:36:18] DEBUG: Left 1316792 overlaps after filtering
[2017-10-05 11:36:18] DEBUG: Computing gluepoints
[2017-10-05 11:37:36] DEBUG: Created 2720012 gluepoints
[2017-10-05 11:37:38] DEBUG: Tandems removed: 1172 left, 1175 right, 3732 both
[2017-10-05 11:37:38] DEBUG: Initializing edges
[2017-10-05 11:37:38] ERROR: Caught unhandled exception: Graph is not symmetric
[2017-10-05 11:37:38] ERROR:    abruijn-repeat(_Z16exceptionHandlerv+0xd0) [0x44c5b0]
[2017-10-05 11:37:38] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(+0x64296) [0x2aaaaab38296]
[2017-10-05 11:37:38] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(+0x642e1) [0x2aaaaab382e1]
[2017-10-05 11:37:38] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(+0x644f8) [0x2aaaaab384f8]
[2017-10-05 11:37:38] ERROR:    abruijn-repeat(_ZN11RepeatGraph15initializeEdgesERK16OverlapContainer+0x11a9) [0x443979]
[2017-10-05 11:37:38] ERROR:    abruijn-repeat(_ZN11RepeatGraph5buildEv+0x55f) [0x4483af]
[2017-10-05 11:37:38] ERROR:    abruijn-repeat(main+0x557) [0x41e407]
[2017-10-05 11:37:38] ERROR:    /lib64/libc.so.6(__libc_start_main+0xf5) [0x2aaaab2086e5]
[2017-10-05 11:37:38] ERROR:    abruijn-repeat(_start+0x29) [0x41f1b9]

from flye.

mikolmogorov avatar mikolmogorov commented on August 27, 2024

Thank you! I got this error in one of my datasets as well - will try to fix shortly.

from flye.

mikolmogorov avatar mikolmogorov commented on August 27, 2024

Please try the latest version from the master branch, it should be resolved now.

from flye.

zkstewart avatar zkstewart commented on August 27, 2024

The original problem has been fixed now, but I unfortunately seem to have encountered another error. I'll rerun the program to see if it happens again, but because it took 12 hours to get to this point it will be a while before I can verify that it is repeatable/occurs at roughly the same time.

This is the stderr:

[2017-10-07 15:54:56] INFO: Running ABruijn
[2017-10-07 15:54:57] INFO: Resuming previous run
[2017-10-07 15:54:57] INFO: Performing repeat analysis
[2017-10-07 15:54:57] INFO: Reading sequences
[2017-10-07 16:04:14] INFO: Building repeat graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-10-07 16:20:31] INFO: Simplifying the graph
[2017-10-07 16:20:31] INFO: Aligning reads to the graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-10-08 04:11:24] ERROR: Caught unhandled exception: Resource temporarily unavailable
[2017-10-08 04:11:24] ERROR:    abruijn-repeat(_Z16exceptionHandlerv+0xd0) [0x450f60]
[2017-10-08 04:11:24] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(+0x64296) [0x2aaaaab38296]
[2017-10-08 04:11:24] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(+0x642e1) [0x2aaaaab382e1]
[2017-10-08 04:11:24] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(+0x644f8) [0x2aaaaab384f8]
[2017-10-08 04:11:24] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(_ZSt20__throw_system_errori+0x7f) [0x2aaaaab8e87f]
[2017-10-08 04:11:24] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(_ZNSt6thread15_M_start_threadESt10shared_ptrINS_10_Impl_baseEE+0x11d) [0x2aaaaab902cd]
[2017-10-08 04:11:24] ERROR:    abruijn-repeat(_Z17processInParallelImEvRKSt6vectorIT_SaIS1_EESt8functionIFvRKS1_EEmb+0x171) [0x45f561]
[2017-10-08 04:11:24] ERROR:    abruijn-repeat(_ZN15ContigGenerator18generateAlignmentsERK10ContigPath+0x236) [0x45d646]
[2017-10-08 04:11:24] ERROR:    abruijn-repeat(_ZN15ContigGenerator14generateLinearERK10ContigPath+0x28) [0x45da98]
[2017-10-08 04:11:24] ERROR:    abruijn-repeat(_ZNK15OutputGenerator23generateContigSequencesERSt6vectorI15UnbranchingPathSaIS1_EE+0x98e) [0x45660e]
[2017-10-08 04:11:24] ERROR: Command '['abruijn-repeat', '-k', '17', '-l', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/abruijn.log', '-t', '12', '-v', '5000', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/polished_0.fasta', '/home/user/genome_assembly/assembly_ready/species_subreads.fasta', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir']' returned non-zero exit status 1

ls -l:

total 88244196
-rw------- 1 user default    17391744 Oct  8 04:11 abruijn.log
-rw------- 1 user default          39 Oct  7 15:54 abruijn.save
-rw------- 1 user default 89614032792 Oct  1 12:09 blasr_0.m5
-rw------- 1 user default   368491296 Sep 30 20:20 draft_assembly.fasta
-rw------- 1 user default     1236467 Oct  8 04:11 graph_before_rr.dot
-rw------- 1 user default     1813629 Oct  7 16:20 graph_raw.dot
-rw------- 1 user default   359056296 Oct  4 22:47 polished_0.fasta

And the details from the latest run in the log file are attached.
abruijn.appended.log

The main thing I notice is that the program crashed at the same time the graph_before_rr.dot file was created, so it's possibly related.

from flye.

mikolmogorov avatar mikolmogorov commented on August 27, 2024

Thank you,
This is the same error as in the very beginning, but now we know where it happened. What happens is the program wants to create a new thread but the system does not allow it - seems that you are hitting the maximum thread limit (either of kernel or PBS system). How many threads are you using, and is it consistent with how many threads you are requesting through PBS submission? I suggest to try the following - try to run again, but request more threads with PBS (maybe twice more), than you will be actually setting for abruijn (-t argument).

Best,
Mikhail

from flye.

zkstewart avatar zkstewart commented on August 27, 2024

I have to admit I'm not 100% certain how the HPC system is set up, but to my knowledge it works by allocating the number of CPU cores and memory within the submission script, so I am not sure how that relates to the underlying kernel/threads. Previously, I was running the program with 12 abruijn threads and asking PBS for 12 cores. I'll try it with 12 threads but 16 cores (asking for more cores means a significantly longer queue time, and running it with less threads means longer run time), and if the same error occurs I'll lower the threads to 8 and ask for 16 cores again. Will update with results, and thanks for your help thus far.

from flye.

zkstewart avatar zkstewart commented on August 27, 2024

I've tried running the program with 16 cores/12 threads as well as 16 cores/8 threads with the same result as above. As soon as the program either starts writing the graph_before_rr.dot file or begins the step after writing this file, it crashes. At this point I'm not sure how to proceed other than by reducing the number of abruijn threads to 1 (which will take approximately 100-140 hours to run to the point where it is currently crashing), but I will try this to exhaust the possibilities that it's related to threading requirements.

Below is the result of ulimit -a if that might reveal some obvious problem with the HPC setup, although I've never encountered any problems with this previously.

core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 514210
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 16384
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 300000
cpu time               (seconds, -t) unlimited
max user processes              (-u) 514210
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

from flye.

mikolmogorov avatar mikolmogorov commented on August 27, 2024

Ok, my current best guess is that system might not like threads to be created/destroyed too often. I tried to address this in the latest devel commit - could you check? Otherwise, is there any chance I can take a look on the data - it would be much faster I I could reproduce it on my machine..

Best,
Mikhail

from flye.

zkstewart avatar zkstewart commented on August 27, 2024

I'm currently running the devel branch now, and will let you know how it goes. If it doesn't work, I'll retry the single threaded master branch and discuss with my supervisors if they have any problems with sharing the data (data is still private, but they would probably not have any major problems with this).

from flye.

mikolmogorov avatar mikolmogorov commented on August 27, 2024

Let me know if the devel branch does not work - I will prepare a special single-threaded version.

from flye.

zkstewart avatar zkstewart commented on August 27, 2024

Success! The changes to the devel branch have allowed the program to make it through the point where it was normally crashing. It's currently running BLASR after the repeat processing step has completed. I'll let you know how the final output looks, and thanks again for all your help.

from flye.

mikolmogorov avatar mikolmogorov commented on August 27, 2024

Yay! Now when it's working I would actually suggest to re-run assembly module as well (if you haven't done so yet after using results from v2.1) :)

Also, do you expect your genome to be highly diploid? I see some patterns of this in the log.. Currently ABruijn is a bit conservative and will not span alternative alleles if the structural difference is high, which might lead to fragmentation.

from flye.

zkstewart avatar zkstewart commented on August 27, 2024

I'll make sure to rerun it with the latest updates once this current run completes. With regards to its diploid nature... I'm not entirely certain since the animal is very distantly related to any relatives that have had a genome previously sequenced, and the organism was taken directly from the wild. I'll keep that in mind regarding ABruijn's assembly - perhaps something like Redundans might help? At the moment I'm trying everything I can get my hands on.

from flye.

zkstewart avatar zkstewart commented on August 27, 2024

Final update before I close this issue. I was able to run the assembly to completion, and will now begin to reassemble it completely using the latest updates. The final stats appear to reflect what you have suggested with respect to the genome having some heterozygosity, as the final genome size is a fair bit smaller after the final repeat and bubble correction compared to the polished_0.fasta file. I have been experimenting with programs like quickmerge which may make the ABruijn output useful as a 'backbone' assembly which can be filled in using other assemblies assuming that its repeat regions are better resolved than other assemblers. It will take me some time to figure out the optimal configuration of things. Thanks again for fixing these issues, best of luck with the future development of this program.

Zac.

from flye.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.