Comments (11)
I see this quite a few times in the log:
sh: 1: Cannot fork
This error suggests resource exhaustion, in particular running low on memory or hitting the maximum number of processes/threads. This can cause all kinds of issues and false reporting of error conditions. Can you share the output of ulimit -u from inside the container, and do you know how much memory was in use and how much was available at the time? The main log / screen output of the overall RepeatModeler run may also be helpful if you still have it. And finally, what was the full command line you used to run RepeatModeler?
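For anyone collecting the same information, here is a minimal sketch that prints the values most relevant to a "Cannot fork" error in one go. It assumes a Linux host or container with /proc mounted; the function names proc_value and report_fork_limits are made up for this example. The "Max processes" entry in /proc/self/limits is the same per-user limit that ulimit -u reports, while kernel.pid_max is a system-wide ceiling that is not visible through ulimit.

```shell
# Print one value from a /proc file using an awk program,
# or "unavailable" when the file cannot be read (e.g. not on Linux).
proc_value() {
    # $1 = file to read, $2 = awk program that selects the value
    if [ -r "$1" ]; then
        awk "$2" "$1"
    else
        echo unavailable
    fi
}

# Report the limits that typically matter for fork failures.
report_fork_limits() {
    # Soft per-user limit on processes/threads (what `ulimit -u` shows).
    printf 'max user processes (soft): %s\n' \
        "$(proc_value /proc/self/limits '/^Max processes/ {print $3}')"
    # Kernel-wide ceiling on process IDs; zombies count against it too.
    printf 'kernel.pid_max: %s\n' \
        "$(proc_value /proc/sys/kernel/pid_max '{print $1}')"
    # Memory the kernel estimates is available for new processes.
    printf 'MemAvailable: %s\n' \
        "$(proc_value /proc/meminfo '/^MemAvailable:/ {print $2, $3}')"
}
```

Calling report_fork_limits both on the host and inside the container makes it easier to see whether a limit is imposed by the container or by the server itself.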
Do you have any suggestions on how I could finish this run without starting from scratch?
You can run the LTRPipeline separately: LTRPipeline -pa 4 genome.fa (where 4 is the number of parallel threads to use). However, it is not easy to combine the results between RECON+RepeatScout and LTRPipeline if you do them separately. I will open a GitHub issue for RepeatModeler to properly support -recoverDir when only the LTR pipeline failed.
Note that the fork error could have impacted the previous rounds, so it may be worth re-doing the run anyway if that turns out to be a serious problem.
from tetools.
I ran the container on a shared server, so the fork error could indeed be caused by a lack of available threads during runtime. Memory was likely not the issue (I assume that there was at least 100G available at the time), nor were any limits placed on the container itself (ulimit -u and ulimit -m both gave unlimited as output when run inside the container).
The full command with which I ran RepeatModeler was: docker run -it --mount type=bind,source="/mnt/local_scratch/wijfj001/repeatmodeler_dir",target=/work --mount type=bind,source="/lustre/BIF/nobackup/wijfj001/Software/bin/TRF",target=/opt/trf,ro --user "$(id -u):$(id -g)" --workdir "/work" --env "HOME=/work" "rwijfjes/tetools:1.1" RepeatModeler -database Hinc_ctgs -pa 5 -LTRStruct >& repeat_modeler_run.log
Besides the message that no results were obtained after running LTR_retriever, the main log file does not contain any red flags as far as I can tell:
It seems that restarting the run on a system on which I do not have to share threads is the most practical solution for now. Thanks for the help up to this point!
Alright, I managed to complete a run through the container on a different server and the main log file does not seem to point out any issues. However, we had to increase the pid_max value of the server to 200,000 to make it happen, as RepeatModeler had generated 57,795 zombie processes by the time it reached the final RepeatClassifier stage. Any idea what could have caused this?
The full command that I used was: docker run -it --mount type=bind,source="/mnt/local_scratch/wijfj001/repeatmodeler_dir",target=/work --mount type=bind,source="/mnt/scratch/wijfj001/Software/bin/TRF",target=/opt/trf,ro --user "$(id -u):$(id -g)" --workdir "/work" --env "HOME=/work" "rwijfjes/tetools:1.1" RepeatModeler -database Hinc_scfs -pa 6 -LTRStruct
as RepeatModeler had generated 57,795 zombie processes by the time it reached the final RepeatClassifier stage. Any idea what could have caused this?
That is definitely wrong, and it's possible it is the cause of, or otherwise related to, the cannot fork error you experienced earlier. However, RepeatModeler does wait on its children, so I am surprised that you would accumulate any zombies, let alone such an absurd number of them. Do you know if the zombies were of RepeatModeler itself or of another script/program that it runs?
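One way to answer that question is to group the zombies by parent PID and command name. The following is only a sketch: count_zombies is a name invented for this example, and it expects lines of the form "state ppid command" on standard input, which is what ps -eo stat=,ppid=,comm= prints on common Linux systems.

```shell
# Count zombie (defunct) processes, grouped by parent PID and command.
# Input: one process per line as "STAT PPID COMMAND".
# Output: "<count> <ppid> <command>", largest group first.
count_zombies() {
    awk '$1 ~ /^Z/ { n[$2 " " $3]++ }          # state Z marks a zombie
         END { for (k in n) print n[k], k }' |
        sort -rn
}
```

Typical usage would be ps -eo stat=,ppid=,comm= | count_zombies, run on the host while the container is still running. The parent PID in the largest group can then be looked up with ps -p to see which program failed to reap its children.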
The zombie processes were generated while RepeatModeler was running LTR_retriever and seemed to stick around until the complete run had finished. Nevertheless, the main log file reported that LTR_retriever had finished (see attachment). There was unfortunately no log file (besides makeblastdb.log) present in the working directory of the LTRPipeline part of the run, so I could not check what the stderr output was during that time.
Does running LTRPipeline -pa X genome.fa by itself also create the zombie processes? I have not been able to reproduce it yet but this does help narrow it down.
I tried out the following command: docker run -it --mount type=bind,source="/mnt/local_scratch/wijfj001/repeatmodeler_dir",target=/work --mount type=bind,source="/mnt/scratch/wijfj001/Software/bin/TRF",target=/opt/trf,ro --user "$(id -u):$(id -g)" --workdir "/work" --env "HOME=/work" "rwijfjes/tetools:1.1" LTRPipeline -pa 6 scaffold_sequences_short_ids.fa >& LTRPipeline_scaffold_rerun.out
This command had spawned ~57,000 [bash] <defunct> processes before I killed it. They were all generated between 14:17 and 14:19, which corresponded to the end of the modules 2-5 part of the pipeline (see log):
Hope this helps!
Thanks. And I realize I should have asked these right away and forgot to, in case there is a known bug in those versions:
- Host operating system and version (from uname -a, lsb_release -a, etc.)
- Version of docker and where you got it from (OS package manager or from source)
- Whether and how you modified the Dockerfile to build rwijfjes/tetools:1.1
You can disregard the previous message - I was able to reproduce this after all.
When you run a command directly as in docker run ... LTRPipeline ..., that command becomes PID 1 inside the container, so child processes whose parents die are reparented to LTRPipeline, which does not reap adopted orphans automatically (and it does not usually need to).
One simple change that should work for you is to use sh -c as an intermediate, which does reap adopted children:
docker run -it --mount type=bind,source="/mnt/local_scratch/wijfj001/repeatmodeler_dir",target=/work --mount type=bind,source="/lustre/BIF/nobackup/wijfj001/Software/bin/TRF",target=/opt/trf,ro --user "$(id -u):$(id -g)" --workdir "/work" --env "HOME=/work" "rwijfjes/tetools:1.1" sh -c 'RepeatModeler -database Hinc_ctgs -pa 5 -LTRStruct' >& repeat_modeler_run.log
This is often not a huge problem, but LTR_retriever indirectly spawns a large number of processes so this issue became noticeable.
EDIT: Docker's --init flag may also work or even be preferred as a workaround. I will continue to test and make sure the issue is resolved on our side for anyone who uses the dfam-tetools.sh script.
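As an illustration of the --init variant, here is a hedged sketch of a small wrapper around docker run. The function name tetools_run and the DRY_RUN switch are made up for this example and are not part of TETools; --rm is added here only to clean up the container afterwards, and the -it flag and the TRF bind mount from the commands above are left out for brevity and would need to be added back for a real run. With --init, Docker starts a minimal init process as PID 1, and that process reaps orphaned children instead of leaving them as zombies.

```shell
# Hypothetical wrapper: run a TETools command with an init process as PID 1.
# Usage: tetools_run <host-workdir> <command> [args...]
# Set DRY_RUN=1 to print the docker command instead of executing it.
tetools_run() {
    workdir=$1
    shift
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty.
    ${DRY_RUN:+echo} docker run --init --rm \
        --mount type=bind,source="$workdir",target=/work \
        --user "$(id -u):$(id -g)" \
        --workdir /work --env HOME=/work \
        rwijfjes/tetools:1.1 "$@"
}
```

For example, tetools_run /mnt/local_scratch/wijfj001/repeatmodeler_dir LTRPipeline -pa 6 scaffold_sequences_short_ids.fa would run the LTR pipeline on its own under an init process.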
The --init option did the trick! I ran a docker run ... LTRPipeline ... command with this flag and it finished without starting a zombie apocalypse. I expect that the full RepeatModeler pipeline will now run properly as well.
Thanks for getting to the bottom of this, Jeb!
I added --init to the docker command in dfam-tetools.sh, so this should not negatively impact other users in the future.
I have also opened Dfam-consortium/RepeatModeler#65 to track the issue where one cannot resume the run if only the LTRPipeline step fails.
Thanks for reporting both issues!