Comments (5)

AndreaGuarracino commented on September 15, 2024

Conda dependencies are always a pain. @piosierra (I am tagging you here in case you know how to fix the problem) had dependency problems, and he solved them by using:

conda install -c bioconda -c conda-forge pggb

Could you please try this way?

from pggb.

Overcraft90 commented on September 15, 2024

Hi @AndreaGuarracino,

Thanks a lot, it worked perfectly with -c conda-forge. In the meantime, I looked at both the GitHub page and the pggb documentation, and I have a few questions:

  1. I will assemble a diploid pangenome for five human individuals. To merge the .fasta files, can I simply use cat?
  2. I had a look at the "suggested settings for different organisms" section and at the "Organism Example Parameters" section. Both use a -G flag that, even after reading the tool's help, I couldn't relate to anything I know. It seems to be some sort of threshold for smoothing over particular features of the graph based on their size. Am I correct? If not, what is it and how should I use it?
  3. After merging, I will index my input.fa with samtools faidx. Since I only have 10 haplotypes in total, is it worth adopting the PanSN prefix naming pattern? I'm more than happy to do so if it is becoming a standard and if it makes the VGs more organized in any way; I was just wondering whether it would require additional pre-processing.

Thanks again; for now, these are my main doubts. Sorry for the long message, but I have limited CPU hours on the cluster I'm using, so I want to make sure I get the most out of each step.

ekg commented on September 15, 2024

Yes, you can simply use cat to merge the files. This should be done at the same time as you assign unique names to all the contigs. PanSN is a consistent way to do this that plays well with tools that require sample and haplotype grouping information. IMO, this pattern (one FASTA input for the whole pangenome) isn't ideal, but it is organized and avoids later confusion. It also lets you do things like map reads or contigs against the entire pangenome with wfmash. I'd suggest compressing the concatenated FASTA file with bgzip and indexing it with samtools faidx. It does make sense to be organized even with 5 genomes.
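A minimal Python sketch of the "cat while renaming" step described above, offered as an illustration rather than pggb or fastix code: every contig name gets a PanSN `sample#haplotype#` prefix and all inputs are concatenated into one FASTA. The `#` delimiter, the function names, and the file/sample names in the example are assumptions chosen for illustration.

```python
from typing import Iterable, Iterator, List, Set, TextIO, Tuple


def rename_fasta(lines: Iterable[str], sample: str, haplotype: int,
                 delim: str = "#") -> Iterator[str]:
    """Yield FASTA lines with each header renamed to sample#haplotype#contig."""
    prefix = f"{sample}{delim}{haplotype}{delim}"
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if not tokens:
                raise ValueError("FASTA header without a sequence name")
            # Keep only the first whitespace-delimited token as the contig name.
            yield f">{prefix}{tokens[0]}\n"
        else:
            # Guarantee a trailing newline so the next file's header starts on
            # its own line (a plain cat would glue them together otherwise).
            yield line if line.endswith("\n") else line + "\n"


def merge_fastas(inputs: List[Tuple[str, str, int]], out: TextIO,
                 delim: str = "#") -> None:
    """Concatenate (path, sample, haplotype) inputs into one PanSN-named FASTA."""
    seen: Set[str] = set()
    for path, sample, haplotype in inputs:
        with open(path) as fh:
            for line in rename_fasta(fh, sample, haplotype, delim):
                if line.startswith(">"):
                    name = line[1:].rstrip("\n")
                    if name in seen:  # every sequence name must be unique
                        raise ValueError(f"duplicate sequence name: {name}")
                    seen.add(name)
                out.write(line)


if __name__ == "__main__":
    # Hypothetical layout: one FASTA per haplotype, two haplotypes per individual.
    inputs = [("HG002.hap1.fa", "HG002", 1), ("HG002.hap2.fa", "HG002", 2)]
    with open("input.fa", "w") as out:
        merge_fastas(inputs, out)
```

The resulting input.fa would then be compressed with bgzip and indexed with samtools faidx, as suggested above. Raising on duplicate names is a deliberate choice: a clash is exactly the confusion that unique PanSN names are meant to prevent.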

Overcraft90 commented on September 15, 2024

@ekg thanks a lot. I'm not familiar with awk, which is what the PanSN-spec uses, so I was wondering: is there a way to consistently name the individuals' .fasta files with fastix? I had a look at its GitHub page but couldn't find any instructions on how to use it. Is it just a small script that does the job for you? Let me know, thanks.

FYI, here is the tree -h output of the folder I'm working in:
[Screenshot: tree -h output of the working folder, 2022-05-18]
Maybe it helps you suggest how I should proceed. Thanks again.

AndreaGuarracino commented on September 15, 2024

We solved the problem privately, but for possible future readers, here is a link to an example of how to rename the sequences following the PanSN-spec convention using fastix.
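Whichever tool does the renaming, a quick check of the headers can confirm the result before spending cluster hours. This small Python sketch (not part of fastix or pggb) parses each name as `sample#haplotype#contig`, assuming `#` as the delimiter, and lists the haplotypes found per sample, so five diploid individuals should show up as five samples with two haplotypes each. `PanSNName`, `parse_pansn`, and `haplotypes_in_fasta` are hypothetical names.

```python
from typing import Dict, Iterable, NamedTuple, Set


class PanSNName(NamedTuple):
    sample: str
    haplotype: str
    contig: str


def parse_pansn(name: str, delim: str = "#") -> PanSNName:
    """Split sample#haplotype#contig; this sketch keeps extra delimiters in the contig."""
    parts = name.split(delim, 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a PanSN name: {name!r}")
    return PanSNName(*parts)


def haplotypes_in_fasta(lines: Iterable[str], delim: str = "#") -> Dict[str, Set[str]]:
    """Map each sample to the set of haplotype IDs found in the FASTA headers."""
    found: Dict[str, Set[str]] = {}
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            # An empty header yields "" and is rejected by parse_pansn.
            name = tokens[0] if tokens else ""
            p = parse_pansn(name, delim)
            found.setdefault(p.sample, set()).add(p.haplotype)
    return found


if __name__ == "__main__":
    with open("input.fa") as fh:  # hypothetical merged, renamed FASTA
        for sample, haps in sorted(haplotypes_in_fasta(fh).items()):
            print(sample, sorted(haps))
```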