Comments (8)
Hi Benjamin,
I think the option you're looking for is the ordered growth plot, which is implemented in the subcommand ordered-histgrowth. The subcommands growth/histgrowth compute the average over all possible permutations of adding 1..n samples to the pangenome, so a particular position on the x-axis cannot be associated with a single sample / haplotype / path.
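To make the difference concrete, here is a toy sketch in Python. It is not how panacus computes growth internally; the path names A, B, C and their node sets are made up for illustration. It only shows why a permutation-averaged curve has no single sample behind each x position, while an ordered curve does.

```python
from itertools import permutations

def ordered_growth(paths, order):
    """Cumulative number of distinct nodes when adding paths in a fixed order."""
    seen = set()
    growth = []
    for name in order:
        seen |= paths[name]
        growth.append(len(seen))
    return growth

def average_growth(paths):
    """Mean cumulative size at each position over all orderings (brute force, toy sizes only)."""
    names = list(paths)
    totals = [0] * len(names)
    count = 0
    for perm in permutations(names):
        for i, size in enumerate(ordered_growth(paths, perm)):
            totals[i] += size
        count += 1
    return [t / count for t in totals]

# Hypothetical toy pangenome: path name -> set of node ids.
toy_paths = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 5}}

ordered = ordered_growth(toy_paths, ["A", "B", "C"])  # position i belongs to one named path
averaged = average_growth(toy_paths)                  # position i is a mean over all orderings
```

In the ordered curve the first value belongs to path A, the second to A plus B, and so on. In the averaged curve the first value is the mean size of a single path, whichever one it is.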
from panacus.
Thank you for your fast reply.
I also tried with ordering, but I think I am missing the documentation needed to understand how to produce outputs that reuse the names from the P-lines or W-lines.
For instance, below is my haplotype list:
# content of myexp.paths.haplotypes.txt
CH_320_5#0#Chr1__CH_320_5:0-49187872
KR_091_H1#0#Chr1__KR_091_H1:0-47283768
KR_091_H2#0#Chr1__KR_091_H2:0-43236572
KZ_150_8_H1#0#Chr1__KZ_150_8_H1:0-47660619
KZ_150_8_H2#0#Chr1__KZ_150_8_H2:0-51744517
Marouch_v3p1#0#Chr1_Marouch_v3p1:0-44417728
RRxCH240_1_plA_H2#0#Chr1_RRxCH240_1_plA_H2:0-45594464
RRxCH240_1_plB_H2#0#Chr1_RRxCH240_1_plB_H2:0-46275222
Rojo_HORA#0#Chr1_Rojo_HORA:0-45121954
RougeRoussillon_H1#0#Chr1_RougeRoussillon_H1:0-49875448
RougeRoussillon_H2#0#Chr1_RougeRoussillon_H2:0-48193998
Stella_v1p1#0#Chr1__Stella_v1p1:0-43521413
Sungold#0#Chr1__Sungold:0-44445119
The same labels are present in the GFA file.
# grep -e "^P" mytest.gfa | cut -f1,2
P Sungold#0#Chr1__Sungold#0
P Stella_v1p1#0#Chr1__Stella_v1p1#0
P RougeRoussillon_H2#0#Chr1_RougeRoussillon_H2#0
P RougeRoussillon_H1#0#Chr1_RougeRoussillon_H1#0
P Rojo_HORA#0#Chr1_Rojo_HORA#0
P Rojo_HCUR#0#Chr1_Rojo_HCUR#0
P RRxCH240_1_plB_H2#0#Chr1_RRxCH240_1_plB_H2#0
P RRxCH240_1_plA_H2#0#Chr1_RRxCH240_1_plA_H2#0
P Marouch_v3p1#0#Chr1_Marouch_v3p1#0
P KZ_150_8_H2#0#Chr1__KZ_150_8_H2#0
P KZ_150_8_H1#0#Chr1__KZ_150_8_H1#0
P KR_091_H2#0#Chr1__KR_091_H2#0
P KR_091_H1#0#Chr1__KR_091_H1#0
P CH_320_5#0#Chr1__CH_320_5#0
However, whatever I try, the output CSV or plots only show 1, 2, 3, ... as labels for the haplotypes, never the haplotype names.
So I'm not sure which haplotype corresponds to 4, which one to 7, etc.
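As a stopgap, the numeric positions could be mapped back to names by hand. The sketch below assumes that position i on the x-axis corresponds to the i-th entry of the order file given to ordered-histgrowth. That is an assumption about the output, not something confirmed in this thread, and the helper name index_to_label is made up.

```python
def index_to_label(order_lines):
    """Map 1-based positions to the labels listed in an order file (one label per line)."""
    labels = [
        line.strip()
        for line in order_lines
        if line.strip() and not line.startswith("#")  # skip blank and comment lines
    ]
    return {i: label for i, label in enumerate(labels, start=1)}

# First seven entries of the haplotype list shown above.
order_lines = [
    "# content of myexp.paths.haplotypes.txt",
    "CH_320_5#0#Chr1__CH_320_5:0-49187872",
    "KR_091_H1#0#Chr1__KR_091_H1:0-47283768",
    "KR_091_H2#0#Chr1__KR_091_H2:0-43236572",
    "KZ_150_8_H1#0#Chr1__KZ_150_8_H1:0-47660619",
    "KZ_150_8_H2#0#Chr1__KZ_150_8_H2:0-51744517",
    "Marouch_v3p1#0#Chr1_Marouch_v3p1:0-44417728",
    "RRxCH240_1_plA_H2#0#Chr1_RRxCH240_1_plA_H2:0-45594464",
]

positions = index_to_label(order_lines)
print(positions[4])
print(positions[7])
```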
You're touching a sensitive issue here... yes, the documentation is rather weak.
I haven't had a chance to look at my code yet, but it could be that panacus gets confused by the path names because they contain 3 # characters, whereas panacus expects at most 2 in order to make complete sense of the name (see https://github.com/pangenome/PanSN-spec for more details).
Also, what does the log output say if you run the tool with RUST_LOG=info panacus ...?
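A quick way to check which path names carry more than two separators is sketched below. This is only a string check written for this thread, not the parser panacus uses; the function name and the second example name are hypothetical.

```python
def has_extra_separators(name, sep="#", max_seps=2):
    """Return True if a path name contains more separators than sample#haplotype#contig allows."""
    return name.count(sep) > max_seps

def split_fields(name, sep="#"):
    """Split a path name on the separator to inspect its fields."""
    return name.split(sep)

gfa_name = "Sungold#0#Chr1__Sungold#0"   # P-line name from the grep output above
plain_name = "HG00438#1#chr1"            # hypothetical name with exactly two separators

print(has_extra_separators(gfa_name), split_fields(gfa_name))
print(has_extra_separators(plain_name), split_fields(plain_name))
```

The first name splits into four fields, which is one more than the three-field sample#haplotype#contig pattern described in the PanSN spec.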
Hi, Benjamin,
I have the impression the meaning of the histogram might not be what you think.
Let me see if I am getting your point right.
The histogram does not show how many bps appear in haplotype 1, how many in haplotype 2, and so on.
Each bar in the histogram represents the following:
- first bar: how many bps appear in only a single haplotype (this can be haplotype 1, haplotype 2, ..., or haplotype n, as long as they appear only once),
- second bar: how many bps appear in exactly two haplotypes (for example, appearing in haplotype 1 and haplotype 4 but in no other haplotype),
- ...
- nth bar: how many bps appear in all haplotypes (also referred to as core).
What you would like, if I am understanding correctly, is to know "how many bps appear in haplotype 1 (e.g., Sungold#0#Chr1__Sungold#0), how many bps appear in haplotype 2 (e.g., Stella_v1p1#0#Chr1__Stella_v1p1#0)", and so on. Is that correct?
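The two readings can be contrasted on a toy example. The sketch below is illustrative Python, not panacus code; the paths A, B, C, their node sets, and the node lengths are invented.

```python
from collections import Counter

def coverage_histogram(paths, node_len):
    """bar k (1-based) = total bp of nodes that occur in exactly k paths."""
    coverage = Counter()
    for nodes in paths.values():
        for node in set(nodes):      # count each node once per path
            coverage[node] += 1
    hist = [0] * (len(paths) + 1)
    for node, c in coverage.items():
        hist[c] += node_len[node]
    return hist[1:]                  # drop the unused coverage-0 slot

def bp_per_path(paths, node_len):
    """Total bp covered by each individual path (the per-haplotype reading)."""
    return {name: sum(node_len[n] for n in set(nodes)) for name, nodes in paths.items()}

# Hypothetical toy data: path -> node ids, node id -> length in bp.
toy_paths = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 5}}
toy_len = {1: 10, 2: 20, 3: 5, 4: 7, 5: 1}

hist = coverage_histogram(toy_paths, toy_len)   # indexed by how many paths share a node
per_path = bp_per_path(toy_paths, toy_len)      # indexed by path name
```

Here the histogram has one bar per coverage level (private, shared by two, core), whereas the per-path totals have one value per named haplotype.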
Thank you for your explanation.
But then how did you obtain the graph at the bottom of the README of this GitHub repository?
I may be confused, but as I understand it this is the graph growth, haplotype after haplotype, because haplotype labels are on the x-axis.
I wanted to reproduce this graph.
But whatever command I try (ordered-histgrowth, histgrowth, ...), I never get haplotype labels on the figure.
I initially followed the README tutorial, and I did not manage to get these labels.
In the README, you have this line:
echo 'HG03492 HG00438 HG00621 HG00673 HG02080 HG00733 HG00735 HG00741 HG01071 HG01106 HG01109 HG01123 HG01175 HG01243 HG01258 HG01358 HG01361 HG01928
HG01952 HG01978 HG02148 HG01891 HG02055 HG02109 HG02145 HG02257 HG02486 HG02559 HG02572 HG02622 HG02630 HG02717 HG02723 HG02818 HG02886 HG03098
HG03453 HG03486 HG03516 HG03540 HG03579 NA18906 NA20129 NA21309' | tr ' ' '\n' > hprc-v1.0-mc-grch38.order.samples.txt
This generates hprc-v1.0-mc-grch38.order.samples.txt. I expected that combining the command ordered_histogram with this file would generate this graph.
However, this file is not used in any of the commands in the README.
I assumed this was a mistake and used this file for the ordering, and my issue started there.
@blinard-BIOINFO thanks for pointing this out, this is indeed a bug in the documentation. The correct command should be
RUST_LOG=info panacus ordered-histgrowth -c bp -O hprc-v1.0-mc-grch38.order.samples.txt -t4 -l 1,2,3,42 -S -e hprc-v1.0-mc-grch38.paths.grch38.txt hprc-v1.0-mc-grch38.gfa > hprc-v1.0-mc-grch38.ordered-histgrowth.bp.tsv
I'm running the example now again to see whether it produces the intended output.
@blinard-BIOINFO you are right, the labels do not come through; this seems to be a bug. Other than that, the plot is identical to the one shown in the README.
@blinard-BIOINFO: @heringerp fixed the issue.