Comments (11)
You can't use the `wfmash-xxxx` file as `-a/--input-paf` because it is a temporary wfmash file that contains only the mappings, that is, the regions to align, so it has no CIGAR strings. seqwish warns you about this (`[seqwish] WARNING: input alignment file wfmash-3TaQ4Q does not have CIGAR strings`). Moreover, the file appears to contain invalid information, which is what triggers the error. Try running pggb with the output of wfmash instead (in your case, it should be called `output/wfmash-3TaQ4Q.paf`).
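A quick way to tell a mapping-only file from a real alignment file is to look for CIGAR tags. The sketch below is not part of pggb or seqwish; it assumes that CIGAR strings are stored in the optional `cg:Z:` tag after the 12 mandatory PAF columns, and the function name `paf_cigar_counts` is made up for illustration.

```python
def paf_cigar_counts(lines):
    """Return (records, records_with_cigar) for an iterable of PAF lines.

    Assumption: a CIGAR string, when present, is carried in an optional
    'cg:Z:' tag that follows the 12 mandatory PAF columns.
    """
    records = 0
    with_cigar = 0
    for line in lines:
        fields = line.split()  # tolerate tabs or spaces between columns
        if len(fields) < 12:
            continue  # skip blank or truncated lines
        records += 1
        if any(tag.startswith("cg:Z:") for tag in fields[12:]):
            with_cigar += 1
    return records, with_cigar


# Hypothetical records: the first mimics a mapping-only line,
# the second carries a CIGAR tag.
sample = [
    "q 100 0 50 + t 200 10 60 45 50 60 id:f:90.0",
    "q 100 0 50 + t 200 10 60 45 50 60 gi:f:0.9 cg:Z:45=5X",
]
print(paf_cigar_counts(sample))
```

To check a file on disk, pass an open file handle, for example `paf_cigar_counts(open("wfmash-3TaQ4Q"))`. If the second number is 0, the file holds mappings only and is probably not suitable as `--input-paf`.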
from pggb.
@AndreaGuarracino
Thank you for your quick reply! But when I use `pggb -i 19-genomes.merge.fa -n 19 -o output -p 90 -s 100000 -t 5 -T 5 -M -Z` to create the pan-genome graph, it does not generate the PAF file. There is only a wfmash-3TaQ4Q temp file.
Weird, or maybe you haven't waited long enough. What does the log say about the estimated mapping and alignment time? I suggest reducing the segment length to `-s 50000` and waiting a bit longer. If the problem persists, please share the `output/...log` file.
The log ends with the following:
[E::fai_load3_core] Failed to open FASTA file 19-genomes.merge.fa
wfmash -X -s 100000 -p 90 -n 18 -t 16 19-genomes.merge.fa 19-genomes.merge.fa
15440.41s user 792.33s system 1172% cpu 1384.73s total 7245936Kb max memory
[E::fai_load3_core] Failed to open FASTA file 19-genomes.merge.fa
It is not able to see the input FASTA file, which is very strange. Can I see your `19-genomes.merge.fa.fai` file too? And also the output of `head /home/cuixb/data/analysis_data/graph-pan-genome/pggb-result/wfmash-3TaQ4Q`?
19-genomes.merge.fa.fai file:
19-genomes.merge.fa.zip
head of wfmash-3TaQ4Q file:
Darmor_v10#1#A01 32958928 27800000 28300000 + Darmor_v10#1#C01 48239358 47247687 47879060 5741 631373 10 id:f:90.9308
Darmor_v10#1#A01 32958928 0 3800000 + Darmor_v5#1#chrC01 38829317 850 4733913 44055 4733063 12 id:f:93.0793
Darmor_v10#1#A01 32958928 27000000 29700000 + Darmor_v5#1#chrC01 38829317 35738139 38333401 25321 2700000 12 id:f:93.7809
Darmor_v10#1#A01 32958928 30500000 31200000 + Darmor_v5#1#chrC01 38829317 38267435 38823342 6814 700000 16 id:f:97.3442
Darmor_v10#1#A01 32958928 29900000 30500000 - Darmor_v5#1#chrAnn_random 48658326 1918964 2515790 5847 600000 16 id:f:97.4553
Darmor_v10#1#A01 32958928 15700000 16300000 - Darmor_v5#1#chrAnn_random 48658326 3155785 3717399 5876 600000 17 id:f:97.9259
Darmor_v10#1#A01 32958928 27800000 28300000 + Express617#1#chrC01 44118044 38888171 39510831 5664 622660 10 id:f:90.972
Darmor_v10#1#A01 32958928 28700000 29900000 + Express617#1#chrC01 44118044 40944888 42168781 11515 1223893 12 id:f:94.0823
Darmor_v10#1#A01 32958928 27800000 28300000 + FAFU_ZS11#1#chrC01 54641295 49487432 50101595 5581 614163 10 id:f:90.8653
Darmor_v10#1#A01 32958928 31200000 31900000 + FAFU_ZS11#1#chrC01 54641295 50286548 50945152 6412 700000 11 id:f:91.6069
the whole wfmash-3TaQ4Q file:
wfmash-3TaQ4Q.zip
The FASTA index seems healthy. The input contains a lot of sequences, but I don't think (hope) that's the problem. Can you try it with other, smaller FASTA files? Test both FASTA files in the same folder as your current input and FASTA files in other folders. I am wondering if there is an issue that is specific to your system. In each test, please also delete and regenerate the FASTA index, to be safe.
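For the repeated tests, a small helper can confirm that each FASTA file is readable and that its `.fai` index lists the same number of sequences. This is only a sketch under assumptions: the FASTA is uncompressed, the index sits next to it as `<fasta>.fai` with one line per sequence, and the function names are invented for illustration.

```python
import os


def count_fasta_records(fasta_path):
    """Count header lines (starting with '>') in an uncompressed FASTA file."""
    with open(fasta_path, "rb") as handle:
        return sum(1 for line in handle if line.startswith(b">"))


def count_fai_records(fai_path):
    """Count non-empty lines in a .fai index (one line per sequence)."""
    with open(fai_path, "r") as handle:
        return sum(1 for line in handle if line.strip())


def check_fasta(fasta_path):
    """Report readability, index presence, and whether the counts agree."""
    fai_path = fasta_path + ".fai"  # assumed index location
    report = {
        "fasta_readable": os.path.isfile(fasta_path)
        and os.access(fasta_path, os.R_OK),
        "fai_present": os.path.isfile(fai_path),
        "counts_match": None,  # stays None when a file is missing
    }
    if report["fasta_readable"] and report["fai_present"]:
        report["counts_match"] = (
            count_fasta_records(fasta_path) == count_fai_records(fai_path)
        )
    return report
```

Calling `check_fasta("19-genomes.merge.fa")` from the directory where pggb is launched shows whether the path resolves there. A `fasta_readable` value of `False` would point to a path or permission problem rather than to the file contents.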
@ekg As you suggested, I checked the number of sequences in the reference genome, and both files return the same value.
After I reinstalled the whole pggb environment using conda, it ran successfully without errors.