seqwish uses disk-backed data structures; maybe you ran out of disk space?
There is 15T of disk space under the working path, of which seqwish's intermediate files use about 1T. Maybe I didn't specify the temp directory, so the alignment results were written to the working path while the index files went somewhere else? I set -D to the current working path to test this. Thanks for your help.
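One way to check the disk-space question directly is to compare the free space on the scratch filesystem with what seqwish has already written there. A minimal Python sketch, assuming the scratch files keep the .sqi/.sqq/.sqa suffixes seen in the file listing later in this thread; `scratch_report` is a hypothetical helper, not part of pggb or seqwish:

```python
import os
import shutil

# Suffixes of the seqwish scratch files seen in this thread (assumed, not exhaustive).
SEQWISH_SUFFIXES = (".sqi", ".sqq", ".sqa")


def scratch_report(path: str) -> dict:
    """Hypothetical helper: free space on the filesystem holding `path`, plus the
    apparent size of seqwish-looking scratch files directly inside `path`."""
    usage = shutil.disk_usage(path)
    scratch_bytes = 0
    for entry in os.scandir(path):
        if entry.is_file() and entry.name.endswith(SEQWISH_SUFFIXES):
            scratch_bytes += entry.stat().st_size  # apparent size, not allocated blocks
    return {"free": usage.free, "total": usage.total, "seqwish_scratch": scratch_bytes}


if __name__ == "__main__":
    tib = 1024 ** 4
    report = scratch_report(".")
    print(f"free: {report['free'] / tib:.2f} TiB, "
          f"seqwish scratch: {report['seqwish_scratch'] / tib:.2f} TiB")
```

Running it periodically against the directory passed to -D shows whether free space is shrinking toward zero while seqwish runs.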
For seqwish to use it efficiently, the disk needs to be local and ideally an SSD. It should support efficient random access.
If you do not have such a disk, one option is to create a ramdisk and use that as the scratch directory.
Take care not to run seqwish on networked storage with high latencies. This will cause the kind of problem you're seeing.
Exactly where did the seqwish job slow down? Would you share some of the log?
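To check whether a scratch disk handles random access well, one rough probe is to time small reads at random offsets in a large file stored on it. The sketch below is a hypothetical helper (`random_read_latency`), not a pggb or seqwish tool. It assumes a test file much larger than RAM, or freshly dropped caches, because reads served from the page cache will look fast regardless of the disk. A local SSD should report far lower mean latency than a spinning HDD or high-latency network storage.

```python
import os
import random
import sys
import time


def random_read_latency(path: str, block_size: int = 4096,
                        samples: int = 200, seed: int = 0) -> float:
    """Mean seconds per random read of `block_size` bytes from the file at `path`.

    Hypothetical helper: a rough probe, not a rigorous benchmark.
    """
    size = os.path.getsize(path)
    if size < block_size:
        raise ValueError("file is smaller than one block")
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for _ in range(samples):
            # Random offsets, so the disk cannot benefit from sequential read-ahead.
            offset = rng.randrange(0, size - block_size + 1)
            os.pread(fd, block_size, offset)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed / samples


if __name__ == "__main__" and len(sys.argv) > 1:
    mean = random_read_latency(sys.argv[1])
    print(f"mean random 4 KiB read: {mean * 1e3:.3f} ms")
```

Usage: `python probe.py /path/to/scratch/large_file`, run once on the scratch disk and once on a known local SSD for comparison.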
Thanks for your help. We reran the pipeline and have just reached the part where it slows down.
We are running on a local HDD.
seqwish slows down during the indexing step; the log is as follows:
[seqwish::seqidx] 0.000 indexing sequences
[seqwish::seqidx] 40.197 index built
[seqwish::alignments] 40.197 processing alignments
[seqwish::alignments] 20878.751 indexing
Output of top:
NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
0 0.836t 0.831t 0.830t D 16.8 42.9 45082:29 seqwish
The intermediate files look like this:
92K 4I3HBy.sqi
6.3G RmtjUs.sqq
845G vXXiRd.sqa
Could this be because I aligned soft-masked genomes?
seqwish --version reports v0.7.9-0-gd9e7ab5.
Hello! Sorry to bother you again. This time we let the run finish. We ran:
pggb -i chr1_pan_filter_shorter_than_50k.fa -o chr1.pan -n 1589 -t 128 -p 90 -s 5000 -S -m -V ref:#
but got the following error:
[wfmash::skch::Map::mapQuery] mapped 100.00% @ 2.59e+06 bp/s elapsed: 00:00:43:05 remain: 00:00:00:00
[wfmash::skch::Map::mapQuery] count of mapped reads = 2225, reads qualified for mapping = 2228, total input reads = 2228, total input bp = 6704481258
[wfmash::map] time spent mapping the query: 2.59e+03 sec
[wfmash::map] mapping results saved in: /dev/stdout
wfmash -s 5000 -l 25000 -p 90 -n 1588 -k 19 -H 0.001 -X -t 128 --tmp-base chr1.pan chr1_pan_50k.fa --approx-map
322888.33s user 1654.82s system 12388% cpu 2619.72s total 19614516Kb max memory
[wfmash::align] Reference = [chr1_pan_50k.fa]
[wfmash::align] Query = [chr1_pan_50k.fa]
[wfmash::align] Mapping file = chr1.pan/wfmash-1331Yx
[wfmash::align] Alignment identity cutoff = 0.72%
[wfmash::align] Alignment output file = /dev/stdout
[wfmash::align] time spent loading the reference index: 0.164741 sec
[wfmash::align::computeAlignments] aligned 100.00% @ 4.48e+07 bp/s elapsed: 01:07:41:59 remain: 00:00:00:00
[wfmash::align::computeAlignments] count of mapped reads = 2228, total aligned bp = 5109032990714
[wfmash::align] time spent computing the alignment: 1.14e+05 sec
[wfmash::align] alignment results saved in: /dev/stdout
wfmash -s 5000 -l 25000 -p 90 -n 1588 -k 19 -H 0.001 -X -t 128 --tmp-base chr1.pan chr1_pan_50k.fa -i chr1.pan/chr1_pan_50k.fa.ee441bc.mappings.wfmash.paf --invert-filtering
14407243.04s user 128933.60s system 12733% cpu 114153.25s total 14749796Kb max memory
[seqwish::seqidx] 0.000 indexing sequences
[seqwish::seqidx] 38.848 index built
[seqwish::alignments] 38.848 processing alignments
[seqwish::alignments] 23203.418 indexing
[seqwish::alignments] 1714820.808 index built
[seqwish::transclosure] 1714820.852 computing transitive closures
[seqwish::transclosure] 1714823.517 0.00% 0-10000000 overlap_collect
Command terminated by signal 9
seqwish -s chr1_pan_50k.fa -p chr1.pan/chr1_pan_50k.fa.ee441bc.alignments.wfmash.paf -k 19 -f 0 -g chr1.pan/chr1_pan_50k.fa.ee441bc.417fcdf.seqwish.gfa -B 10000000 -t 128 --temp-dir chr1.pan -P
4914907.12s user 80460.79s system 285% cpu 1750681.50s total 1269699096Kb max memory
Does this seem to be caused by insufficient memory?
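Signal 9 is SIGKILL, which on Linux is commonly sent by the kernel's out-of-memory killer, although a job scheduler enforcing a memory limit can send it too. The timing line pggb prints after each step already records peak memory. A minimal Python sketch that converts it; `summarize` is a hypothetical helper, and it assumes "Kb" means KiB, as in GNU time's maximum-resident-set-size field:

```python
import re

# Matches timing lines like the ones in the log above, e.g.
# "4914907.12s user 80460.79s system 285% cpu 1750681.50s total 1269699096Kb max memory".
TIMING_RE = re.compile(
    r"(?P<user>[\d.]+)s user (?P<system>[\d.]+)s system (?P<cpu>\d+)% cpu "
    r"(?P<wall>[\d.]+)s total (?P<maxkb>\d+)Kb max memory"
)


def summarize(line: str) -> dict:
    """Hypothetical helper: convert one timing line into GiB and days."""
    m = TIMING_RE.search(line)
    if m is None:
        raise ValueError("not a timing line")
    max_kib = int(m.group("maxkb"))  # assumed KiB (1024 bytes)
    wall_seconds = float(m.group("wall"))
    return {
        "max_rss_gib": max_kib / 1024 ** 2,
        "wall_days": wall_seconds / 86400,
        "cpu_percent": int(m.group("cpu")),
    }


if __name__ == "__main__":
    seqwish_line = ("4914907.12s user 80460.79s system 285% cpu "
                    "1750681.50s total 1269699096Kb max memory")
    s = summarize(seqwish_line)
    print(f"peak RSS: {s['max_rss_gib']:.0f} GiB, wall time: {s['wall_days']:.1f} days")
```

For the seqwish line this gives about 1211 GiB of peak resident memory over roughly 20 days. Because seqwish memory-maps its disk-backed structures, part of that figure may be file-backed pages rather than anonymous memory, so treat a comparison with the node's RAM as a rough guide only.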
We were confused because we had already partitioned the input and had only about 5 Mbp of sequence per sample. Perhaps it is currently difficult to build graphs from samples numbering in the thousands?
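On the scale question, a back-of-the-envelope calculation using only numbers from the wfmash log above may help. With -n 1589, each segment can be mapped to up to 1588 other haplotypes, so both the number of haplotype pairs and the total alignment volume seqwish has to process grow roughly with the square of the number of haplotypes, even when each sample is small. A minimal sketch of that arithmetic:

```python
# Figures copied from the wfmash log above.
TOTAL_INPUT_BP = 6_704_481_258        # "total input bp" (mapping step)
TOTAL_ALIGNED_BP = 5_109_032_990_714  # "total aligned bp" (alignment step)
N_HAPLOTYPES = 1589                   # pggb -n 1589

# Unordered haplotype pairs an all-vs-all comparison may have to consider.
pairs = N_HAPLOTYPES * (N_HAPLOTYPES - 1) // 2

# Average number of aligned bases per input base that seqwish has to process.
depth = TOTAL_ALIGNED_BP / TOTAL_INPUT_BP

print(f"haplotype pairs: {pairs:,}")            # haplotype pairs: 1,261,666
print(f"aligned bp per input bp: {depth:.0f}")  # aligned bp per input bp: 762
```

Here the aligned sequence is roughly 762 times the input, which suggests the alignment volume, rather than the per-sample sequence length, is what drives seqwish's disk and memory use.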
Thanks for your help. We used a reference-guided approach like https://github.com/pangenome/HPRCyear1v2genbank, but we'll try smaller partitions.