Hello, great tool. Is it possible to display the haplotype labels from GFA P-


Haplotype labels in TSV, visualisations? (panacus, closed)

marschall-lab commented on September 23, 2024

Haplotype labels in TSV, visualisations?


Comments (8)

danydoerr commented on September 23, 2024

Hi Benjamin,

I think the option you're looking for is the ordered growth plot, which is implemented in the subcommand ordered-histgrowth. The growth/histgrowth subcommands compute the average over all possible orders in which 1..n samples can be added to the pangenome; hence a particular position on the x-axis cannot be associated with a single sample / haplotype / path.
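To make the distinction concrete, here is a minimal Python sketch on a hypothetical four-node toy graph (node ids, lengths, and haplotypes A/B/C are made up). It only illustrates the two definitions and is not panacus's implementation; enumerating every permutation is only feasible for a handful of haplotypes.

```python
from itertools import permutations

# Hypothetical toy graph: node id -> (length in bp, set of haplotypes traversing it).
NODES = {
    "n1": (100, {"A", "B", "C"}),
    "n2": (50, {"A"}),
    "n3": (30, {"B", "C"}),
    "n4": (20, {"C"}),
}


def ordered_growth(order, nodes):
    """Cumulative bp covered after adding haplotypes one by one in the given order."""
    covered = set()
    curve = []
    for hap in order:
        # Add every node traversed by the newly added haplotype.
        covered |= {nid for nid, (_, haps) in nodes.items() if hap in haps}
        curve.append(sum(nodes[nid][0] for nid in covered))
    return curve


def average_growth(haplotypes, nodes):
    """Mean of ordered_growth over all permutations of the haplotypes."""
    curves = [ordered_growth(p, nodes) for p in permutations(haplotypes)]
    n = len(haplotypes)
    return [sum(c[k] for c in curves) / len(curves) for k in range(n)]


if __name__ == "__main__":
    order = ["A", "B", "C"]
    # Position k of the ordered curve belongs to order[k], so it can be labelled.
    for hap, bp in zip(order, ordered_growth(order, NODES)):
        print(hap, bp)
    # The averaged curve has no single haplotype behind each position.
    print(average_growth(order, NODES))
```

In the ordered curve (150, 180, 200 for the order A, B, C) each position belongs to one haplotype, so a label can be attached to it; in the averaged curve (about 143.3, 176.7, 200.0) every position mixes all haplotypes.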


blinard-BIOINFO commented on September 23, 2024

Thank you for your fast reply.
I also tried ordering; however, I think I am missing the documentation needed to understand how to produce outputs that reuse the P-lines or W-lines.

For instance, below is my haplotype list:

# content of myexp.paths.haplotypes.txt

CH_320_5#0#Chr1__CH_320_5:0-49187872
KR_091_H1#0#Chr1__KR_091_H1:0-47283768
KR_091_H2#0#Chr1__KR_091_H2:0-43236572
KZ_150_8_H1#0#Chr1__KZ_150_8_H1:0-47660619
KZ_150_8_H2#0#Chr1__KZ_150_8_H2:0-51744517
Marouch_v3p1#0#Chr1_Marouch_v3p1:0-44417728
RRxCH240_1_plA_H2#0#Chr1_RRxCH240_1_plA_H2:0-45594464
RRxCH240_1_plB_H2#0#Chr1_RRxCH240_1_plB_H2:0-46275222
Rojo_HORA#0#Chr1_Rojo_HORA:0-45121954
RougeRoussillon_H1#0#Chr1_RougeRoussillon_H1:0-49875448
RougeRoussillon_H2#0#Chr1_RougeRoussillon_H2:0-48193998
Stella_v1p1#0#Chr1__Stella_v1p1:0-43521413
Sungold#0#Chr1__Sungold:0-44445119

The same labels are present in the GFA file.

# grep -e "^P" mytest.gfa | cut -f1,2
P	Sungold#0#Chr1__Sungold#0
P	Stella_v1p1#0#Chr1__Stella_v1p1#0
P	RougeRoussillon_H2#0#Chr1_RougeRoussillon_H2#0
P	RougeRoussillon_H1#0#Chr1_RougeRoussillon_H1#0
P	Rojo_HORA#0#Chr1_Rojo_HORA#0
P	Rojo_HCUR#0#Chr1_Rojo_HCUR#0
P	RRxCH240_1_plB_H2#0#Chr1_RRxCH240_1_plB_H2#0
P	RRxCH240_1_plA_H2#0#Chr1_RRxCH240_1_plA_H2#0
P	Marouch_v3p1#0#Chr1_Marouch_v3p1#0
P	KZ_150_8_H2#0#Chr1__KZ_150_8_H2#0
P	KZ_150_8_H1#0#Chr1__KZ_150_8_H1#0
P	KR_091_H2#0#Chr1__KR_091_H2#0
P	KR_091_H1#0#Chr1__KR_091_H1#0
P	CH_320_5#0#Chr1__CH_320_5#0

However, whatever I try, the output TSV and plots only show the labels 1, 2, 3, ... for the haplotypes, never the actual haplotype names.
So I cannot tell which haplotype corresponds to 4, which one to 7, etc.


danydoerr commented on September 23, 2024

You're touching a sensitive issue here... yes, the documentation is rather weak.

I haven't had a chance to look at my code yet, but it could be that panacus gets confused by the path names because they contain three # characters, whereas panacus expects at most two in order to make complete sense of the name (see https://github.com/pangenome/PanSN-spec for more details).
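If it helps to check which names are affected, here is a small Python sketch that pulls path names out of GFA P-lines and flags those with more than the two # separators of the PanSN sample#haplotype#contig layout. The function names are hypothetical, W-lines are ignored, and splitting at most twice (so any extra # stays in the contig field) is only one possible interpretation, not necessarily what panacus does.

```python
def path_names(gfa_lines):
    """Yield the name (second tab-separated column) of every GFA P-line."""
    for line in gfa_lines:
        if line.startswith("P\t"):
            yield line.rstrip("\n").split("\t")[1]


def pansn_fields(name, delim="#"):
    """Split sample#haplotype#contig; anything after the 2nd delimiter stays in contig."""
    parts = name.split(delim, 2)
    if len(parts) != 3:
        raise ValueError(f"expected sample{delim}haplotype{delim}contig: {name!r}")
    return tuple(parts)


def extra_delimiters(names, delim="#", expected=2):
    """Return the names that contain more delimiters than sample#haplotype#contig."""
    return [n for n in names if n.count(delim) > expected]


if __name__ == "__main__":
    gfa = [
        "S\t1\tACGT\n",
        "P\tSungold#0#Chr1__Sungold#0\t1+\t*\n",
    ]
    names = list(path_names(gfa))
    print(extra_delimiters(names))  # names with 3 or more '#'
    print(pansn_fields(names[0]))   # how a max-2 split would read the name
```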

Also, what does the log output say if you run the tool with RUST_LOG=info panacus ...?


lucaparmigiani commented on September 23, 2024

Hi, Benjamin,

I have the impression the meaning of the histogram might not be what you think.
Let me see if I am getting your point right.

The histogram does not show how many bps appear in haplotype 1, how many in haplotype 2, and so on.

Each bar in the histogram represents the following:

first bar: how many bps appear in only a single haplotype (this can be haplotype 1, or haplotype 2, or ... haplotype n, as long as they appear in only one),
second bar: how many bps appear in exactly two haplotypes (for example, in haplotype 1 and haplotype 4 but in no other haplotype),
...
nth bar: how many bps appear in all haplotypes (also referred to as the core).
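A minimal Python sketch of how such a histogram could be computed, assuming a hypothetical toy graph in which each node has a length in bp and a set of haplotypes traversing it. Repeated visits by the same haplotype are counted once here, which is a simplifying assumption and may differ from panacus's counting options.

```python
# Hypothetical toy graph: node id -> (length in bp, set of haplotypes traversing it).
NODES = {
    "n1": (100, {"A", "B", "C"}),
    "n2": (50, {"A"}),
    "n3": (30, {"B", "C"}),
    "n4": (20, {"C"}),
}


def coverage_histogram(nodes, n_haplotypes):
    """hist[k] = bp in nodes traversed by exactly k distinct haplotypes."""
    # Index 0 would collect nodes that no haplotype traverses.
    hist = [0] * (n_haplotypes + 1)
    for length, haps in nodes.values():
        hist[len(haps)] += length
    return hist


if __name__ == "__main__":
    hist = coverage_histogram(NODES, 3)
    for k in range(1, len(hist)):
        print(f"bar {k}: {hist[k]} bp")
```

For this toy input the result is [0, 70, 30, 100]: the 70 bp in bar 1 come from two different haplotypes (50 bp only in A, 20 bp only in C), which is why the bars carry no haplotype labels.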

What you would like, if I understand correctly, is to know how many bps appear in haplotype 1 (e.g., Sungold#0#Chr1__Sungold#0), how many bps appear in haplotype 2 (e.g., Stella_v1p1#0#Chr1__Stella_v1p1#0), and so on. Is that correct?
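For comparison, that per-haplotype quantity is a different computation from the histogram. A minimal Python sketch on a hypothetical toy graph (made-up nodes and haplotypes A/B/C, each node counted at most once per haplotype, which is an assumption):

```python
# Hypothetical toy graph: node id -> (length in bp, set of haplotypes traversing it).
NODES = {
    "n1": (100, {"A", "B", "C"}),
    "n2": (50, {"A"}),
    "n3": (30, {"B", "C"}),
    "n4": (20, {"C"}),
}


def bp_per_haplotype(nodes):
    """Total bp of the distinct nodes each haplotype traverses."""
    totals = {}
    for length, haps in nodes.values():
        for hap in haps:
            totals[hap] = totals.get(hap, 0) + length
    return totals


if __name__ == "__main__":
    for hap, bp in sorted(bp_per_haplotype(NODES).items()):
        print(hap, bp)
```

On this toy graph it gives A: 150, B: 130, C: 150 bp; each number is tied to one named haplotype, unlike the histogram bars.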


blinard-BIOINFO commented on September 23, 2024

Thank you for your explanation.

But then, how did you obtain the graph at the bottom of this GitHub repository's README?

I may be confused, but as I understand it, this is the growth graph, haplotype after haplotype, because the haplotype labels are on the x-axis.

I wanted to reproduce this graph.
But whatever command I try (ordered-histgrowth, histgrowth...), I never get haplotype labels on the figure.
I initially followed the README tutorial, and I did not manage to get these labels.

In the README, you have this line:

echo 'HG03492 HG00438 HG00621 HG00673 HG02080 HG00733 HG00735 HG00741 HG01071 HG01106 HG01109 HG01123 HG01175 HG01243 HG01258 HG01358 HG01361 HG01928
HG01952 HG01978 HG02148 HG01891 HG02055 HG02109 HG02145 HG02257 HG02486 HG02559 HG02572 HG02622 HG02630 HG02717 HG02723 HG02818 HG02886 HG03098
HG03453 HG03486 HG03516 HG03540 HG03579 NA18906 NA20129 NA21309' | tr ' ' '\n' > hprc-v1.0-mc-grch38.order.samples.txt

This generates hprc-v1.0-mc-grch38.order.samples.txt. I expected that combining the ordered-histgrowth command with this file would generate this graph.

However, this file is not used in any of the commands in the README.
I assumed this was a mistake and used the file for the ordering; that is where my issue started.
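For reference, a small Python sketch that builds an order file from the P-lines of a GFA, assuming (as in the README's file) one sample name per line and that the sample is the first #-separated field of each path name. The function name is hypothetical; whether panacus expects sample names or full path names in the -O file may depend on other options such as -S, so please check the tool's help.

```python
def sample_order(gfa_lines, delim="#"):
    """Sample names (first PanSN field of each P-line name), first occurrence first."""
    order = []
    for line in gfa_lines:
        if not line.startswith("P\t"):
            continue
        sample = line.split("\t")[1].split(delim, 1)[0]
        if sample not in order:
            order.append(sample)
    return order


if __name__ == "__main__":
    demo = [
        "P\tSungold#0#Chr1__Sungold#0\t1+\t*",
        "P\tStella_v1p1#0#Chr1__Stella_v1p1#0\t1+\t*",
    ]
    # One name per line, the same layout the README's echo ... | tr ' ' '\n' produces.
    print("\n".join(sample_order(demo)))
```

On a real file, something like `with open("mytest.gfa") as fh: print("\n".join(sample_order(fh)))` would do; the samples come out in GFA order, so rearrange them as needed for the plot.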


danydoerr commented on September 23, 2024

@blinard-BIOINFO thanks for pointing this out; this is indeed a bug in the documentation. The correct command should be:

RUST_LOG=info panacus ordered-histgrowth -c bp -O hprc-v1.0-mc-grch38.order.samples.txt -t4 -l 1,2,3,42 -S -e hprc-v1.0-mc-grch38.paths.grch38.txt hprc-v1.0-mc-grch38.gfa > hprc-v1.0-mc-grch38.ordered-histgrowth.bp.tsv

I'm running the example now again to see whether it produces the intended output.


danydoerr commented on September 23, 2024

@blinard-BIOINFO you are right, the labels do not come through; this seems to be a bug. Other than that, the plot is identical to the one shown in the README.


danydoerr commented on September 23, 2024

@blinard-BIOINFO: @heringerp fixed the issue.
