Comments (5)
Conda dependencies are always hell. @piosierra (I am tagging you here in case you know how to fix the problem) had dependency problems and solved them by using:
conda install -c bioconda -c conda-forge pggb
Could you please try this way?
from pggb.
Thanks a lot, it worked perfectly using -c conda-forge. In the meantime I looked at both the GitHub page and the documentation for pggb, and I have a few questions:
- I will assemble a diploid pangenome for five human individuals. To merge the .fasta files, can I simply use cat?
- I had a look at the "suggested settings for different organisms" section and at the "Organism Example Parameters" section. Both use a -G flag that, even after reading the tool's help text, I couldn't relate to anything I know. It seems to be some sort of threshold for smoothing over particular features of the graph based on their size? Am I correct? If not, what is it and how should I use it?
- After the merging is done, I will index my input.fa with samtools faidx. However, with only 10 haplotypes in total, is it worth adopting the PanSN prefix naming pattern? I'm more than happy to do so if this will become a standard and if it can in any way make the VGS more organised. I was just wondering whether it would require additional pre-processing.
Thanks again, for now I think these are my main doubts. Sorry for the long message, but I also have limited CPU hours on the cluster I'm using, so I want to be sure to maximize the results at each step.
from pggb.
Yes, you can simply use cat to merge the files. This should be done at the same time as you assign unique names to all the contigs. PanSN is a consistent way to do this that plays well with tools that require sample and haplotype grouping information. IMO, this pattern (one FASTA input for the whole pangenome) isn't ideal but it is organized and avoids later confusion. It also lets you do things like map reads or contigs against the entire pangenome with wfmash. I'd suggest using bgzip and samtools faidx to index the concatenated FASTA file. It does make sense to be organized even with 5 genomes.
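For future readers, the steps above can be sketched roughly as follows. This is only a minimal illustration, not the method used in the thread: it uses sed to prefix headers in the PanSN style (sample#haplotype#contig), the file names and the sample name HG001 are hypothetical toy values, and it assumes sample names contain no "/", "&", or backslash characters. The bgzip and samtools faidx step runs only if both tools are installed.

```shell
#!/bin/sh
# Prefix every FASTA header with "<sample>#<haplotype>#" (PanSN-style naming).
# Usage: pansn_rename SAMPLE HAPLOTYPE < in.fa > out.fa
# Assumption: SAMPLE and HAPLOTYPE contain no "/", "&", or backslash.
pansn_rename() {
  sed "s/^>/>$1#$2#/"
}

# Toy data standing in for per-haplotype assemblies (hypothetical file names).
workdir=$(mktemp -d)
printf '>chr1\nACGT\n' > "$workdir/HG001.hap1.fa"
printf '>chr1\nACGA\n' > "$workdir/HG001.hap2.fa"

# Rename each haplotype, then concatenate everything into one FASTA.
pansn_rename HG001 1 < "$workdir/HG001.hap1.fa" >  "$workdir/pangenome.fa"
pansn_rename HG001 2 < "$workdir/HG001.hap2.fa" >> "$workdir/pangenome.fa"

# Check that all sequence names are now unique and carry the prefix.
headers=$(grep '^>' "$workdir/pangenome.fa")
printf '%s\n' "$headers"

# Compress and index the merged file, only if the tools are available.
if command -v bgzip >/dev/null 2>&1 && command -v samtools >/dev/null 2>&1; then
  bgzip "$workdir/pangenome.fa"
  samtools faidx "$workdir/pangenome.fa.gz"
fi
```

With real data, the two printf lines would be replaced by your own assemblies, and the rename step repeated once per haplotype (10 times for five diploid individuals).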
from pggb.
@ekg thanks a lot. I'm not familiar with awk as used in the PanSN-spec, so I was wondering whether there is a way to consistently name each individual's .fasta with fastix? I had a look at the Git page but couldn't find any indication of how to use it... Is it just a little script that does the job for you? Let me know, thanks.
FYI, this is the tree -h output of the folder where I'm working:
Maybe this helps you suggest how I should proceed. Thanks again.
from pggb.
We solved the problem privately, but for possible future readers, here is a link to an example of how to rename the sequences following the PanSN-spec convention using fastix.
from pggb.