Comments (17)
from pggb.
Hi, very cool tool indeed :)
I am not sure whether to comment on this issue or create a new one, but I ran into problems in the exact same step.
I managed to run pggb for both public Xanthomonas and yeast data, but when running it for some public cucumber data (the newest versions of these genomes: ftp://cucurbitgenomics.org/pub/cucurbit/genome/cucumber), I found the following:
[smoothxg::smoothable_blocks] computing blocks
[smoothxg::smoothable_blocks] computing blocks 100.00%%
... applying spoa to block 62339/189753 32.853%
Command terminated by signal 11
Command being timed: "smoothxg -t 45 -g output/all.fa.pggb-s50000-p75-n5-a70-K16-k8-w10000-j5000-W0-e100.seqwish.gfa -w 10000 -j 5000 -k 0 -e 100"
User time (seconds): 17007.23
System time (seconds): 909.29
Percent of CPU this job got: 342%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:27:09
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 49033128
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 6288
Minor (reclaiming a frame) page faults: 229546238
Voluntary context switches: 15770329
Involuntary context switches: 75027
Swaps: 0
File system inputs: 135369878
File system outputs: 185148472
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
I also tried the same settings but with -s 100000, in which case I also got signal 11 and no smooth.gfa.
from pggb.
The maximum resident set size of 49033128 kbytes is roughly 49G. Is that more than the memory you have on this system?
from pggb.
No, the maximum memory on this system is 128GB.
I also watched htop during the run, and total memory use (everything running on the system) never went above half of the available memory.
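For comparing the two figures, here is a minimal Python sketch of the unit conversion. It assumes that GNU time reports "Maximum resident set size" in units of 1024 bytes and that the 128GB figure can be read as GiB; the helper name is made up for illustration.

```python
KIB_PER_GIB = 1024 ** 2  # 1 GiB = 1024 * 1024 KiB


def kib_to_gib(kib: int) -> float:
    """Convert a size in KiB (as printed by GNU time) to GiB."""
    return kib / KIB_PER_GIB


max_rss_kib = 49033128   # "Maximum resident set size (kbytes)" from the report above
system_mem_gib = 128     # total memory stated for this machine (assumed to be GiB)

peak_gib = kib_to_gib(max_rss_kib)
print(f"peak RSS: {peak_gib:.1f} GiB of {system_mem_gib} GiB "
      f"({100 * peak_gib / system_mem_gib:.0f}%)")
```

Under these assumptions the peak is roughly 47 GiB, which is consistent with the observation that usage stayed below half of the machine's memory.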
from pggb.
This can happen when the program tries to allocate a very large amount of memory in one go: the request fails, so the actual resident size never reaches the level that was requested. The allocations in spoa don't seem to be guarded, or I'm not handling their errors correctly.
Working on a fix for this now. Hope to push in the next hour or two.
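As an illustration of the failure mode described above, here is a small Python analogy. It is not the code from spoa or smoothxg (both are C++), and the function name is hypothetical. It only shows that a single oversized request can be refused outright, and that guarding the call turns a crash into a value the caller can check.

```python
from typing import Optional


def guarded_alloc(n_bytes: int) -> Optional[bytearray]:
    """Try to allocate n_bytes in one go; return None instead of crashing."""
    try:
        return bytearray(n_bytes)
    except (MemoryError, OverflowError):
        # The request was refused up front, so resident memory never grew
        # anywhere near n_bytes.
        return None


small = guarded_alloc(1024)      # an ordinary request succeeds
huge = guarded_alloc(2 ** 62)    # an absurdly large single request is refused
print(small is not None, huge is None)
```

In an unguarded C++ program the same situation typically surfaces as an uncaught exception or an invalid pointer, which is one way to end up with a signal while htop still shows plenty of free memory.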
from pggb.
Please try with the current smoothxg HEAD. This should be resolved by pangenome/smoothxg#8.
I've tested it on all the cases I had that were failing in a similar way.
from pggb.
Thank you! There are no errors anymore and it runs smoothly (pun intended)!
from pggb.
I think I still get an error, this time signal 4:
I've updated smoothxg in the Docker container.
[smoothxg::main] building xg index
[smoothxg::smoothable_blocks] computing blocks
[smoothxg::smoothable_blocks] computing blocks for 206004 handles: 100.00% @ 1.65e+05/s elapsed: 00:00:00:01 remain: 00:00:00:00
[smoothxg::break_blocks] splitting short sequences out of 1625 blocks: 100.00% @ 6.49e+03/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::break_blocks] split 117 blocks
[smoothxg::break_blocks] cutting blocks that contain sequences longer than max-poa-length (10000)
[smoothxg::break_blocks] cutting 1742 blocks: 100.00% @ 6.96e+03/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::break_blocks] cut 446 blocks of which 5 had repeats
Command terminated by signal 4
smoothxg -t 8 -g /data/akkermansia.fasta.gz.pggb-s100000-p90-n10-a90-K11-k8-w10000-j5000-e5000.seqwish.gfa -w 10000 -j 5000 -e 5000 -l 10000 -m /data/akkermansia.fasta.gz.pggb-s100000-p90-n10-a90-K11-k8-w10000-j5000-e5000.smooth.maf -s /data/akkermansia.fasta.gz.pggb-s100000-p90-n10-a90-K11-k8-w10000-j5000-e5000.consensus -a -C 10,100,1000,10000
35.37s user 1.68s system 193% cpu 19.12s total 155316Kb max memory
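The two failures in this thread report different signals. Here is a small Python sketch for looking up what the numbers mean on the local platform. On Linux, 11 is SIGSEGV (an invalid memory access) and 4 is SIGILL (an illegal instruction). The sketch only names the signals and does not diagnose the cause; the helper name is made up for illustration.

```python
import signal


def signal_name(number: int) -> str:
    """Return the symbolic name for a signal number on this platform."""
    return signal.Signals(number).name


for number in (4, 11):
    print(number, signal_name(number))
```

SIGILL is often seen when a binary was compiled for CPU instructions that the host does not support, though nothing in the log above confirms that this is what happened here.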
from pggb.
Any word on this? If the data is public, I can try to reproduce it.
from pggb.
Sorry, I didn't manage to debug smoothxg, as the intermediate files are removed, and using gdb with pggb directly didn't work out.
Here is my data: 5 (fragmented) bacterial genomes for which I wanted to create a pangenome graph. I concatenated the FASTA files and used the result as input for pggb. The genomes have 98% average nucleotide identity, so I used high mapping and alignment identity thresholds in pggb.
pggb -i combined_genomes.fasta.gz --segment-length=100000 -K 11 --map-pct-id=90 --align-pct-id=90 -n 10 -t 2 -v -l
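The concatenation step mentioned above could look roughly like the following Python sketch. The `label#name` header scheme and the function names are assumptions made for illustration, not something stated in this thread. The point is only that contigs from different genomes stay distinguishable after the files are merged.

```python
from typing import Dict, Iterable, Iterator


def prefix_headers(lines: Iterable[str], label: str) -> Iterator[str]:
    """Yield FASTA lines with '>name' rewritten to '>label#name'."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield f">{label}#{line[1:]}"
        elif line:
            yield line


def combine(genomes: Dict[str, Iterable[str]]) -> Iterator[str]:
    """Concatenate several FASTA inputs, labelling each record by genome."""
    for label, lines in genomes.items():
        yield from prefix_headers(lines, label)


# Tiny in-memory example; real input would be read from the FASTA files.
genomes = {
    "genomeA": [">contig1", "ACGT", ">contig2", "GGCC"],
    "genomeB": [">contig1", "ACGA"],
}
combined = list(combine(genomes))
print("\n".join(combined))
```

Without some such prefix, two genomes that both contain a record called contig1 would collide in the combined file.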
from pggb.
I have a question:
The input for pggb is a FASTA file with complete genomes, isn't it? But most bacterial genomes are only available as scaffolds or contigs. Should I simply fill the gaps with NNN, or should I filter the PAF file so that only between-genome alignments are allowed?
What do you think?
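To make the second option concrete, here is a minimal Python sketch of such a PAF filter. It assumes each sequence name starts with a genome label followed by `#` (a hypothetical naming scheme), and relies on the PAF layout in which column 1 is the query name and column 6 the target name. It shows what the filter would do, and is not a recommendation from the maintainers.

```python
from typing import Iterable, Iterator


def genome_of(seq_name: str) -> str:
    """Genome label = text before the first '#' (hypothetical naming scheme)."""
    return seq_name.split("#", 1)[0]


def between_genome_only(paf_lines: Iterable[str]) -> Iterator[str]:
    """Keep PAF records whose query and target come from different genomes."""
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        query_name, target_name = fields[0], fields[5]
        if genome_of(query_name) != genome_of(target_name):
            yield line


# Two made-up records: one between genomes A and B, one within genome A.
paf = [
    "A#c1\t1000\t0\t900\t+\tB#c1\t1200\t100\t1000\t850\t900\t60",
    "A#c1\t1000\t0\t500\t+\tA#c2\t800\t0\t500\t480\t500\t60",
]
kept = list(between_genome_only(paf))
print(len(kept))
```

Only the first record survives, because its query and target carry different genome labels.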
from pggb.
Because there has been no recent activity here, the issue seems solved. Closing. If you feel otherwise, please reopen it.
from pggb.