Comments (17)

ekg commented on September 15, 2024

from pggb.

dirkjanvw commented on September 15, 2024

Hi, very cool tool indeed :)

I am not sure whether to comment on this issue or create a new one, but I ran into problems in the exact same step.
I managed to run pggb for both public Xanthomonas and yeast data, but when running it for some public cucumber data (the newest versions of these genomes: ftp://cucurbitgenomics.org/pub/cucurbit/genome/cucumber), I found the following:

[smoothxg::smoothable_blocks] computing blocks
[smoothxg::smoothable_blocks] computing blocks 100.00%%
Command terminated by signal 11lying spoa to block 62339/189753 32.853%
	Command being timed: "smoothxg -t 45 -g output/all.fa.pggb-s50000-p75-n5-a70-K16-k8-w10000-j5000-W0-e100.seqwish.gfa -w 10000 -j 5000 -k 0 -e 100"
	User time (seconds): 17007.23
	System time (seconds): 909.29
	Percent of CPU this job got: 342%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 1:27:09
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 49033128
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 6288
	Minor (reclaiming a frame) page faults: 229546238
	Voluntary context switches: 15770329
	Involuntary context switches: 75027
	Swaps: 0
	File system inputs: 135369878
	File system outputs: 185148472
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

I also tried the same settings but with -s 100000; that run also ended with signal 11 and produced no smooth.gfa.

ekg commented on September 15, 2024

49033128 kbytes is roughly 49 GB of peak resident memory. Is that more than the memory you have on this system?
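
For readers who want to redo that conversion, here is a minimal Python sketch that pulls the peak resident set size out of a `time -v` style report and converts it. It assumes that one "kbyte" in the report is 1024 bytes; the function name `max_rss_kbytes` is made up for this illustration.

```python
import re

def max_rss_kbytes(time_output: str) -> int:
    """Extract the 'Maximum resident set size (kbytes)' value from a time -v report."""
    match = re.search(r"Maximum resident set size \(kbytes\):\s*(\d+)", time_output)
    if match is None:
        raise ValueError("no maximum resident set size line found")
    return int(match.group(1))

# Two lines copied from the report posted in this thread.
report = """
	Elapsed (wall clock) time (h:mm:ss or m:ss): 1:27:09
	Maximum resident set size (kbytes): 49033128
"""

kbytes = max_rss_kbytes(report)
peak_bytes = kbytes * 1024      # assumption: 1 kbyte = 1024 bytes
peak_gib = peak_bytes / 2**30   # binary gibibytes
peak_gb = peak_bytes / 10**9    # decimal gigabytes
print(f"{peak_gib:.2f} GiB, {peak_gb:.2f} GB")
```

Under that assumption the peak comes to about 46.8 GiB (50.2 GB). Whichever unit convention is used, the figure stays in the 47 to 50 range quoted above.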

dirkjanvw commented on September 15, 2024

No, the maximum memory on this system is 128GB.
I also checked with htop while it was running: total memory use on the system (everything running, not just smoothxg) never went above half of the available memory.

ekg commented on September 15, 2024

This can happen when you try to allocate a lot more memory in one go. The actual resident size never goes to the level you requested. The allocations in spoa don't seem to be guarded, or I'm not interacting with their errors correctly.

Working on a fix for this now. Hope to push in the next hour or two.
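
To illustrate what "guarding" an allocation means in general, here is a small Python sketch. It is only an analogy: spoa and smoothxg are C++ projects, and this is neither their code nor the fix mentioned above. The helper `try_allocate` is hypothetical. The idea is that a single oversized request can fail outright even though resident memory never grows, and the caller has to check for that failure instead of using the buffer blindly.

```python
def try_allocate(n_bytes: int):
    """Return a zeroed buffer of n_bytes, or None if the request cannot be met."""
    try:
        return bytearray(n_bytes)
    except (MemoryError, OverflowError):
        # Guard: report the failure so later code never touches a missing buffer.
        return None

small = try_allocate(1024)
huge = try_allocate(10**18)  # about one exabyte, far beyond any realistic machine

print("small ok:", small is not None)
print("huge ok:", huge is not None)
```

In an unguarded version, the failed request would go unnoticed and the first access to the buffer would crash the program.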

ekg commented on September 15, 2024

Please try with the current smoothxg HEAD. This should be resolved by pangenome/smoothxg#8.

I've tested it on all the cases I had that were failing in a similar way.

dirkjanvw commented on September 15, 2024

Thank you! There are no errors anymore and it runs smoothly (pun intended)!

SilasK commented on September 15, 2024

I think I still get an error, this time signal 4.

I've updated smoothxg in the Docker container.


[smoothxg::main] building xg index
[smoothxg::smoothable_blocks] computing blocks
[smoothxg::smoothable_blocks] computing blocks for 206004 handles: 100.00% @ 1.65e+05/s elapsed: 00:00:00:01 remain: 00:00:00:00
[smoothxg::break_blocks] splitting short sequences out of 1625 blocks: 100.00% @ 6.49e+03/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::break_blocks] split 117 blocks
[smoothxg::break_blocks] cutting blocks that contain sequences longer than max-poa-length (10000)
[smoothxg::break_blocks] cutting 1742 blocks: 100.00% @ 6.96e+03/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::break_blocks] cut 446 blocks of which 5 had repeats
Command terminated by signal 4
smoothxg -t 8 -g /data/akkermansia.fasta.gz.pggb-s100000-p90-n10-a90-K11-k8-w10000-j5000-e5000.seqwish.gfa -w 10000 -j 5000 -e 5000 -l 10000 -m /data/akkermansia.fasta.gz.pggb-s100000-p90-n10-a90-K11-k8-w10000-j5000-e5000.smooth.maf -s /data/akkermansia.fasta.gz.pggb-s100000-p90-n10-a90-K11-k8-w10000-j5000-e5000.consensus -a -C 10,100,1000,10000
35.37s user 1.68s system 193% cpu 19.12s total 155316Kb max memory
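
The two failures reported in this thread end with different signal numbers (11 earlier, 4 here). A short Python sketch for turning those numbers into names is below; `describe_signal` is a helper made up for this example. On Linux, 11 is SIGSEGV (invalid memory access) and 4 is SIGILL (illegal instruction), so this report may be a different problem from the first one. The thread does not record the actual cause.

```python
import signal

def describe_signal(number: int) -> str:
    """Map a numeric signal to its symbolic name, e.g. 11 -> 'SIGSEGV'."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return f"unknown signal {number}"

for number in (11, 4):
    print(number, describe_signal(number))
```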

ekg commented on September 15, 2024

SilasK commented on September 15, 2024

ekg commented on September 15, 2024

ekg commented on September 15, 2024

Any word on this? If the data is public, I can try to reproduce it.

SilasK commented on September 15, 2024

Sorry, I didn't manage to debug smoothxg because the intermediate files are removed, and using gdb with pggb directly didn't work out.

Here is my data: 5 (fragmented) bacterial genomes for which I wanted to create a pangenome graph. I concatenated the fasta files and used the result as input for pggb. The genomes have 98% average nucleotide identity, so I used high mapping and alignment identity thresholds in pggb.

pggb -i combined_genomes.fasta.gz --segment-length=100000 -K 11 --map-pct-id=90 --align-pct-id=90 -n 10 -t 2 -v -l
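
When fragmented assemblies are concatenated into one FASTA, it can be useful to record which genome each contig came from. The Python sketch below shows one hypothetical way to do that by prefixing every header with a genome label. The `genome#contig` naming, the `#` separator, and the helper `prefix_headers` are assumptions for illustration; the thread does not say how the files were actually combined.

```python
def prefix_headers(genome: str, fasta_lines, sep: str = "#"):
    """Yield FASTA lines with each header renamed to '<genome><sep><original id>'."""
    for line in fasta_lines:
        if line.startswith(">"):
            yield f">{genome}{sep}{line[1:]}"
        else:
            yield line

# Tiny made-up genomes, used only to show the renaming.
genomes = {
    "gA": [">contig1\n", "ACGT\n"],
    "gB": [">contig1\n", "ACGA\n"],
}
combined = []
for genome, lines in genomes.items():
    combined.extend(prefix_headers(genome, lines))
print("".join(combined), end="")
```

With labels like these, two contigs that share a name in different assemblies stay distinguishable after concatenation.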

ekg commented on September 15, 2024

SilasK commented on September 15, 2024

I have a question:

The input for pggb is a fasta file with complete genomes, isn't it? But most bacterial genomes are only available as scaffolds or contigs. Should I simply fill the gaps with NNN, or should I filter the paf file so that only between-genome alignments are allowed?
What do you think?
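
As a rough idea of what the second option could look like, here is a Python sketch that keeps only PAF records whose query and target belong to different genomes. It relies on the PAF layout (query name in column 1, target name in column 6) and on an assumed naming scheme of `genome#contig`; the `#` separator and both helper functions are hypothetical. The thread does not record which approach was recommended.

```python
def genome_of(seq_name: str, sep: str = "#") -> str:
    """Assumed naming scheme '<genome><sep><contig>'; returns the genome part."""
    return seq_name.split(sep, 1)[0]

def between_genome_only(paf_lines, sep: str = "#"):
    """Yield PAF records whose query and target come from different genomes."""
    for line in paf_lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        query_name, target_name = fields[0], fields[5]  # PAF columns 1 and 6
        if genome_of(query_name, sep) != genome_of(target_name, sep):
            yield line

# Made-up records for illustration only.
example = [
    "gA#c1\t1000\t0\t900\t+\tgA#c2\t1200\t100\t1000\t850\t900\t60\n",
    "gA#c1\t1000\t0\t900\t+\tgB#c7\t1500\t200\t1100\t870\t900\t60\n",
]
kept = list(between_genome_only(example))
print(len(kept))
```

Here the first record aligns two contigs of the same genome and is dropped, while the second crosses genomes and is kept.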

ekg commented on September 15, 2024

subwaystation commented on September 15, 2024

Because there has been no recent activity here, the issue seems solved. Closing. If you feel otherwise, please reopen it.