Comments (8)

danydoerr avatar danydoerr commented on June 26, 2024

Hi Benjamin,

I think the option you're looking for is the ordered growth plot, which is implemented in the subcommand ordered-histgrowth. The subcommands growth and histgrowth compute the average over all possible permutations of adding 1..n samples to the pangenome, so a particular position on the x-axis cannot be associated with a single sample / haplotype / path.
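To make the difference concrete, here is a minimal sketch on toy data. It is a brute-force illustration only, not how panacus computes its curves: the `paths` dictionary, the node IDs, and both function names are hypothetical, and it counts nodes rather than bp.

```python
from itertools import permutations

# Hypothetical toy data: each path is the set of node IDs it visits.
paths = {
    "A": {1, 2, 3},
    "B": {2, 3, 4},
    "C": {3, 5},
}

def ordered_growth(order):
    """Cumulative number of distinct nodes after adding paths in the given order."""
    seen = set()
    growth = []
    for name in order:
        seen |= paths[name]
        growth.append(len(seen))
    return growth

def average_growth():
    """Average the cumulative curve over every permutation of the paths."""
    curves = [ordered_growth(p) for p in permutations(paths)]
    n = len(paths)
    return [sum(curve[i] for curve in curves) / len(curves) for i in range(n)]

# One fixed order: position i on the x-axis belongs to the i-th path.
print(ordered_growth(["A", "B", "C"]))
# Averaged over all orders: position i only means "i paths added".
print(average_growth())
```

In the ordered curve, each x-axis position can carry a path name; in the averaged curve it cannot, because every path contributes to every position.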

from panacus.

blinard-BIOINFO avatar blinard-BIOINFO commented on June 26, 2024

Thank you for your fast reply.
I also tried with ordering, but I think the documentation is missing what I need to understand how to produce any output that reuses the P-lines or W-lines.

For instance, below is my haplotype list:

# content of myexp.paths.haplotypes.txt

CH_320_5#0#Chr1__CH_320_5:0-49187872
KR_091_H1#0#Chr1__KR_091_H1:0-47283768
KR_091_H2#0#Chr1__KR_091_H2:0-43236572
KZ_150_8_H1#0#Chr1__KZ_150_8_H1:0-47660619
KZ_150_8_H2#0#Chr1__KZ_150_8_H2:0-51744517
Marouch_v3p1#0#Chr1_Marouch_v3p1:0-44417728
RRxCH240_1_plA_H2#0#Chr1_RRxCH240_1_plA_H2:0-45594464
RRxCH240_1_plB_H2#0#Chr1_RRxCH240_1_plB_H2:0-46275222
Rojo_HORA#0#Chr1_Rojo_HORA:0-45121954
RougeRoussillon_H1#0#Chr1_RougeRoussillon_H1:0-49875448
RougeRoussillon_H2#0#Chr1_RougeRoussillon_H2:0-48193998
Stella_v1p1#0#Chr1__Stella_v1p1:0-43521413
Sungold#0#Chr1__Sungold:0-44445119

The same labels are present in the GFA file.

# grep -e "^P" mytest.gfa | cut -f1,2
P	Sungold#0#Chr1__Sungold#0
P	Stella_v1p1#0#Chr1__Stella_v1p1#0
P	RougeRoussillon_H2#0#Chr1_RougeRoussillon_H2#0
P	RougeRoussillon_H1#0#Chr1_RougeRoussillon_H1#0
P	Rojo_HORA#0#Chr1_Rojo_HORA#0
P	Rojo_HCUR#0#Chr1_Rojo_HCUR#0
P	RRxCH240_1_plB_H2#0#Chr1_RRxCH240_1_plB_H2#0
P	RRxCH240_1_plA_H2#0#Chr1_RRxCH240_1_plA_H2#0
P	Marouch_v3p1#0#Chr1_Marouch_v3p1#0
P	KZ_150_8_H2#0#Chr1__KZ_150_8_H2#0
P	KZ_150_8_H1#0#Chr1__KZ_150_8_H1#0
P	KR_091_H2#0#Chr1__KR_091_H2#0
P	KR_091_H1#0#Chr1__KR_091_H1#0
P	CH_320_5#0#Chr1__CH_320_5#0

However, whatever I try, the output CSV or plots only show 1, 2, 3, ... as labels for the haplotypes. They never show the haplotype names.
So I'm not sure which haplotype corresponds to 4, which one to 7, etc.
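Comparing the two listings above by eye is error-prone, so a minimal sketch for checking whether the names in a list file literally match the P-line names may help. It mirrors the grep/cut command in Python and makes no claim about how panacus itself matches names. The function name and the two-path GFA excerpt are hypothetical; the path names are taken from the listings above.

```python
def p_line_names(gfa_lines):
    """Return the second column of every P-line (like grep '^P' | cut -f2)."""
    names = []
    for line in gfa_lines:
        if line.startswith("P\t"):
            names.append(line.rstrip("\n").split("\t")[1])
    return names

# Hypothetical GFA excerpt; only the path names come from the thread.
gfa_lines = [
    "H\tVN:Z:1.0",
    "P\tSungold#0#Chr1__Sungold#0\t1+,2+\t*",
    "P\tRojo_HCUR#0#Chr1_Rojo_HCUR#0\t1+\t*",
]

# One entry from the haplotype list, copied as posted.
listed = ["Sungold#0#Chr1__Sungold:0-44445119"]

in_gfa = set(p_line_names(gfa_lines))
for name in listed:
    status = "exact match" if name in in_gfa else "no exact match"
    print(f"{name}\t{status}")

# Paths present in the GFA but absent from the list.
unlisted = sorted(in_gfa - set(listed))
print(unlisted)
```

With these inputs the listed name does not match any P-line exactly, because the list entry ends in a coordinate range while the P-line name ends in `#0`.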

[Screenshot from 2024-04-04 12-00-45]

danydoerr avatar danydoerr commented on June 26, 2024

You're touching on a sensitive issue here. Yes, the documentation is rather weak.

I haven't had a chance to look at my code yet, but it could be that panacus gets confused by the path names because they contain 3 # characters, whereas panacus expects at most 2 in order to make complete sense of the name (see https://github.com/pangenome/PanSN-spec for more details).
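A quick way to check path names against that expectation is to count the separators. This is a minimal sketch, assuming `#` is the delimiter as in the names above; the function name and the second example name are hypothetical, and the check says nothing about how panacus parses names internally.

```python
def count_pansn_separators(path_name, sep="#"):
    """Count how many separator characters a path name contains."""
    return path_name.count(sep)

names = [
    "Sungold#0#Chr1__Sungold#0",  # from the P-lines in this thread
    "HG00438#1#chr1",             # hypothetical sample#haplotype#contig name
]

for name in names:
    n = count_pansn_separators(name)
    status = "ok" if n <= 2 else "more than 2 separators"
    print(f"{name}\t{n}\t{status}")
```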

Also, what does the log output say if you run the tool with RUST_LOG=info panacus ...?

lucaparmigiani avatar lucaparmigiani commented on June 26, 2024

Hi, Benjamin,

I have the impression the meaning of the histogram might not be what you think.
Let me see if I am getting your point right.

The histogram does not show how many bps appear in haplotype 1, how many in haplotype 2, and so on.

Each bar in the histogram represents the following:

  • first bar: how many bps appear in exactly one haplotype (this can be haplotype 1, haplotype 2, ..., or haplotype n, as long as they appear in only one),
  • second bar: how many bps appear in exactly two haplotypes (for example, in haplotype 1 and haplotype 4 but in no other haplotype),
  • ...
  • nth bar: how many bps appear in all haplotypes (also referred to as core).
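The bars described above can be sketched on toy data as follows. This is an illustration under stated assumptions, not the panacus implementation: the node lengths, the path contents, and the function name are all hypothetical.

```python
from collections import Counter

# Hypothetical toy graph: node ID -> length in bp.
node_len = {1: 10, 2: 5, 3: 20, 4: 7, 5: 3}

# Hypothetical haplotypes: each is the set of nodes it traverses.
paths = {
    "hap1": {1, 2, 3},
    "hap2": {2, 3, 4},
    "hap3": {3, 5},
}

def coverage_histogram(paths, node_len):
    """Return a list whose k-th entry (1-based) is the bp found in exactly k paths."""
    coverage = Counter()
    for nodes in paths.values():
        coverage.update(nodes)  # each path counts a node at most once
    hist = [0] * (len(paths) + 1)
    for node, cov in coverage.items():
        hist[cov] += node_len[node]
    return hist[1:]  # drop the unused slot for coverage 0

print(coverage_histogram(paths, node_len))
```

Note that no bar belongs to a particular haplotype: the first entry sums bp private to any one path, and the last entry sums bp shared by all of them.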

What you would like, if I am understanding correctly, is to know how many bps appear in haplotype 1 (e.g., Sungold#0#Chr1__Sungold#0), how many bps appear in haplotype 2 (e.g., Stella_v1p1#0#Chr1__Stella_v1p1#0), and so on. Is that correct?

blinard-BIOINFO avatar blinard-BIOINFO commented on June 26, 2024

Thank you for your explanation.

But then how did you obtain the graph at the bottom of the README of this GitHub repository?

[Figure: hprc-v1.0-mc-grch38 ordered-histgrowth bp]

I may be confused, but as I understand it this shows the growth of the graph, haplotype after haplotype, because the haplotype labels are on the x-axis.

I wanted to reproduce this graph.
But whatever command I try (ordered-histgrowth, histgrowth...), I never get haplotype labels on the figure.
I initially followed the README tutorial, and I did not manage to get these labels.

In the README, you have this line:

echo 'HG03492 HG00438 HG00621 HG00673 HG02080 HG00733 HG00735 HG00741 HG01071 HG01106 HG01109 HG01123 HG01175 HG01243 HG01258 HG01358 HG01361 HG01928
HG01952 HG01978 HG02148 HG01891 HG02055 HG02109 HG02145 HG02257 HG02486 HG02559 HG02572 HG02622 HG02630 HG02717 HG02723 HG02818 HG02886 HG03098
HG03453 HG03486 HG03516 HG03540 HG03579 NA18906 NA20129 NA21309' | tr ' ' '\n' > hprc-v1.0-mc-grch38.order.samples.txt

This generates hprc-v1.0-mc-grch38.order.samples.txt. I expected that combining the ordered-histgrowth command with this file would produce this graph.

However, this file is not used in any of the commands in the README.
I assumed this was a mistake and used this file for the ordering, and that is where my issue started.

danydoerr avatar danydoerr commented on June 26, 2024

@blinard-BIOINFO thanks for pointing this out, this is indeed a bug in the documentation. The correct command should be

RUST_LOG=info panacus ordered-histgrowth -c bp -O hprc-v1.0-mc-grch38.order.samples.txt -t4 -l 1,2,3,42 -S -e hprc-v1.0-mc-grch38.paths.grch38.txt hprc-v1.0-mc-grch38.gfa > hprc-v1.0-mc-grch38.ordered-histgrowth.bp.tsv

I'm running the example now again to see whether it produces the intended output.

danydoerr avatar danydoerr commented on June 26, 2024

@blinard-BIOINFO you are right, the labels do not come through; this seems to be a bug. Other than that, the plot is identical to the one shown in the README.
[Figure: hprc-v1.0-mc-grch38 ordered-histgrowth bp]

danydoerr avatar danydoerr commented on June 26, 2024

@blinard-BIOINFO: @heringerp fixed the issue.
