Comments (6)
Yes, that's right: at the moment the tool is limited to 65534 path groups (that is, "samples" or "taxa"). I did not consider it likely that data sets with more distinct samples/taxa exist right now. How many samples does your data set have?
Typically, you want to group your paths into samples or haplotypes, but this requires that path names adhere to the PanSN naming scheme. Then, you can simply group by sample (-S) or haplotype (-H).
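To illustrate what grouping by sample or haplotype relies on, here is a minimal Python sketch that splits a PanSN-style path name of the form sample#haplotype#contig. The function name `pansn_groups` is hypothetical and this is not panacus code; how panacus itself parses path names may differ in detail.

```python
def pansn_groups(path_name, sep="#"):
    """Return (sample, haplotype) for a PanSN-style path name, else None.

    Assumes the PanSN layout sample<sep>haplotype<sep>contig.
    """
    # Split at most twice so that a contig name containing the
    # delimiter stays in one piece.
    parts = path_name.split(sep, 2)
    if len(parts) < 3 or not parts[0] or not parts[1]:
        return None  # not PanSN compatible
    sample = parts[0]
    haplotype = sep.join(parts[:2])  # e.g. "HG00438#1"
    return sample, haplotype


print(pansn_groups("HG00438#1#chr1"))
print(pansn_groups("chr1"))
```

Path names for which this returns None would need the manual path-to-group mapping instead.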
from panacus.
Oh, and if your paths are not PanSN compatible, you can still do the grouping by hand, by specifying a path-to-group mapping with -g
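A mapping like this can be generated rather than typed by hand. The sketch below assumes that the file passed to -g is a two-column, tab-separated table (path name, group name); please check `panacus --help` for the exact format before relying on it. The helper names and the rule "group is the text before the first underscore" are purely illustrative.

```python
import io


def path_names(gfa_lines):
    """Yield the name of every path (P line) in a GFA 1 file."""
    for line in gfa_lines:
        if line.startswith("P\t"):
            yield line.split("\t")[1]


def write_group_mapping(names, group_of, out):
    """Write one 'path<TAB>group' row per path name to `out`."""
    for name in names:
        out.write(f"{name}\t{group_of(name)}\n")


# Toy GFA content; in practice, iterate over an open file instead.
gfa = [
    "H\tVN:Z:1.0\n",
    "S\t1\tACGT\n",
    "P\tsampleA_chr1\t1+\t*\n",
    "P\tsampleB_chr1\t1+\t*\n",
]
buf = io.StringIO()
# Hypothetical grouping rule: everything before the first underscore.
write_group_mapping(path_names(gfa), lambda n: n.split("_")[0], buf)
print(buf.getvalue(), end="")
```

Paths stored as W lines (as in GFA 1.1) are not covered by this sketch.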
Thank you for getting back to me so quickly. In fact, we only have 27 samples, and the genome size of each sample is 2.5 Gbp, so, compared with the human pan-genome, visualizing our data should not be a problem.
We used minigraph-cactus for pan-genome construction and then used vg to convert the graph to GFA 1.1 format for visual analysis. I would like to ask how we should conduct quality control or other operations to complete the visualization.
Best regards.
Ok, then this means that you need to group the paths by samples (-S) or haplotypes (-H). Regarding quality control, I think panacus is a good starting point; here is my suggestion:
- Generate an HTML page that contains coverage histograms+growth curves for all count types:
RUST_LOG=info panacus histgrowth -t4 -l 1,2,1,1,1 -q 0,0,1,0.5,0.1 -H -c all -a -o html test.giffa2.1.0.gfa > test.giffa2.histgrowth.all.html
- I find the coverage plots very insightful for quality control. Typically, you expect that the two highest bars correspond to coverage by a single sample/haplotype and by all samples/haplotypes, respectively. Anything else indicates that you might want to reconsider your alignment parameters.
- I find the node-resolved coverage table extremely helpful for checking some basic properties of pangenome graphs, especially in combination with node length information (see script gfa2nodelen.py.zip). The table can be generated with
RUST_LOG=info panacus table -t4 -H -c node test.giffa2.1.0.gfa > test.giffa2.coverage.node.tsv
- I am a bit surprised that you have 170 million nodes in your graph, given a genome size of 2.5 Gbp per sample. For comparison, the HPRC+Chinese human pangenome graph (also generated with minigraph-cactus) contains 211 haplotypes, each ~2.7 Gbp long, yet has only about 119 million nodes. Now, this does not necessarily mean that your graph has poor quality; the number of nodes also depends very much on the diversity of the genomes. The large number of nodes might make the analysis that I propose (the node-resolved coverage table above) a bit more resource-demanding, but typical HPCs nowadays should be able to deal with these large tables.
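Two of the checks above can be sketched in a few lines of Python. The first function is a hypothetical stand-in for the node length step (it is not the attached gfa2nodelen.py, which is not reproduced here) and reads lengths from GFA segment (S) lines. The second encodes the "two highest bars" expectation for a coverage histogram given as a plain dict from coverage to count; that dict is an assumed input, not the panacus output format.

```python
def node_lengths(gfa_lines):
    """Map node ID -> sequence length from GFA segment (S) lines."""
    lengths = {}
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        node_id, seq = fields[1], fields[2]
        if seq != "*":
            lengths[node_id] = len(seq)
            continue
        # Sequence omitted: fall back to the optional LN:i:<length> tag.
        for tag in fields[3:]:
            if tag.startswith("LN:i:"):
                lengths[node_id] = int(tag[5:])
                break
    return lengths


def two_highest_bars(hist):
    """Return the two coverage values with the largest counts, sorted."""
    return sorted(sorted(hist, key=hist.get, reverse=True)[:2])


def looks_as_expected(hist, n_groups):
    """True if the two highest bars sit at coverage 1 and n_groups."""
    return two_highest_bars(hist) == [1, n_groups]


gfa = ["S\t1\tACGT\n", "S\t2\t*\tLN:i:10\n", "L\t1\t+\t2\t+\t0M\n"]
print(node_lengths(gfa))

# Toy histogram for 4 haplotypes: coverage -> number of nodes.
hist = {1: 500, 2: 40, 3: 30, 4: 450}
print(looks_as_expected(hist, 4))
```

Joining the node lengths with the node-resolved coverage table on the node ID then allows checks weighted by sequence length rather than by node count.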
If you have further questions on QC of your pangenome graph, please email me at [email protected]
OK! I will send the detailed information to your email for consultation.
With best wishes