Comments (21)
Thank you for your positive reply. We will get back to you after the test.
Best
from panacus.
Besides, we found the example file "hprc-v1.0-mc-grch38.gfa" is also the version gfa1.0. could you give me a example to visualize the MC result with version gfa 1.1?
from panacus.
l found the first line of "hprc-v1.0-mc-grch38.gfa" is H VN:Z:1.0, but our MC result is H VN:Z:1.1 RS:Z:Reference1.
I don't know if that's the reason for the problem
from panacus.
The difference between v1.0 and v1.1 is that the latter also supports W-lines. However, panacus does not alter its behavior depending on different GFA versions. So, no, the GFA header line does not matter. In fact, panacus will still process GFA files even if they do not contain that header line.
Our problem is that the path file obtained in the first step is empty, but we can extract the path file after using vg to convert to version 1.0, I would like to ask why?
If the original GFA file is v1.1, I'd guess paths are encoded in W-lines:
grep -e '^W' hprc-v1.0-mc-grch38.gfa | cut -f2-6 | awk '{ print $1 "#" $2 "#" $3 ":" $4 "-" $5 }' >> hprc-v1.0-mc-grch38.paths.txt
Now we have found a problem, according to the example you provided, we can not extract the results of MC for visual analysis, including #4.
The description in #4 for hprc-v1.0-mc-grch38.gfa is obsolete, because panacus now provides shortcuts for grouping (-H
, '-S'), so you do not need to do that by hand. The following code replaces all of #4:
grep -ive 'grch38\|chm13' hprc-v1.0-mc-grch38.paths.txt > hprc-v1.0-mc-grch38.paths.haplotypes.txt
RUST_LOG=info panacus histgrowth -t16 -q 0,1,0.5,0.1 -S -s hprc-v1.0-mc-grch38.paths.haplotypes.txt hprc-v1.0-mc-grch38.gfa > hprc-v1.0-mc-grch38.histgrowth.node.txt
panacus-visualize hprc-v1.0-mc-grch38.histgrowth.node.txt > hprc-v1.0-mc-grch38.histgrowth.node.pdf
... and to produce a plot quantifying the non-reference sequence of the graph:
grep -ie 'grch38' hprc-v1.0-mc-grch38.paths.txt > hprc-v1.0-mc-grch38.paths.grch38.txt
RUST_LOG=info panacus histgrowth -t16 -c bp -q 0,1,0.5,0.1 -e hprc-v1.0-mc-grch38.paths.grch38.txt -S -s hprc-v1.0-mc-grch38.paths.haplotypes.txt hprc-v1.0-mc-grch38.gfa > hprc-v1.0-mc-grch38.histgrowth.nogrch38.bp.txt
panacus-visualize hprc-v1.0-mc-grch38.histgrowth.nogrch38.bp.txt > hprc-v1.0-mc-grch38.histgrowth.nogrch38.bp.pdf
from panacus.
Dear developer:
The code "RUST_LOG=info panacus histgrowth -t16 -c bp -q 0,1,0.5,0.1 -e hprc-v1.0-mc-grch38.paths.grch38.txt -S -s hprc-v1.0-mc-grch38.paths.haplotypes.txt hprc-v1.0-mc-grch38.gfa > hprc-v1.0-mc-grch38.histgrowth.nogrch38.bp.txt" this step takes a long time, but nothing is generated in the code, what is the reason for this?
log file:
[2023-12-04T00:17:57Z INFO panacus::cli] running panacus on 16 threads
[2023-12-04T00:17:57Z INFO panacus::cli] constructing indexes for node/edge IDs, node lengths, and P/W lines..
[2023-12-04T00:20:55Z INFO panacus::cli] ..done; found 12817 paths/walks and 188902154 nodes
[2023-12-04T00:20:55Z INFO panacus::cli] loading data from group / subset / exclude files
[2023-12-04T00:20:55Z INFO panacus::abacus] loading coordinates from pig.full.paths.haplotypes.txt
[2023-12-04T00:20:56Z INFO panacus::abacus] loading coordinates from pig.full.paths.Sscrofa.txt
[2023-12-04T00:20:56Z INFO panacus::cli] loading graph from pig.full.gfa
[2023-12-04T00:20:56Z INFO panacus::abacus] parsing path + walk sequences
[2023-12-04T04:36:40Z INFO panacus::abacus] counting abacus entries..
[2023-12-04T04:36:52Z INFO panacus::cli] abacus has 26 path groups and 188902155 countables
[2023-12-04T04:36:52Z INFO panacus::cli] constructing histogram..
[2023-12-04T04:36:53Z INFO panacus::hist] calculating growth for coverage >= 1A and quorum >= 0R
[2023-12-04T04:36:53Z INFO panacus::hist] calculating growth for coverage >= 1A and quorum >= 1R
[2023-12-04T04:36:53Z INFO panacus::hist] calculating growth for coverage >= 1A and quorum >= 0.5R
[2023-12-04T04:36:53Z INFO panacus::hist] calculating growth for coverage >= 1A and quorum >= 0.1R
[2023-12-04T04:36:53Z INFO panacus::cli] reporting (hist-)growth table
[2023-12-04T04:37:43Z INFO panacus] done; time elapsed: 15586.143364131s
from panacus.
Hi, do you mind sharing the output file hprc-v1.0-mc-grch38.histgrowth.nogrch38.bp.txt
in this thread?
from panacus.
The code is similar to your suggestion
grep -e '^W' pig.full.gfa | cut -f2-6 | awk '{ print $1 "#" $2 "#" $3 ":" $4 "-" $5 }' >> pig.full.paths.txt
grep -ive 'Sscrofa' pig.full.paths.txt > pig.full.paths.haplotypes.txt
RUST_LOG=info panacus histgrowth -t 16 -q 0,1,0.5,0.1 -S -s pig.full.paths.haplotypes.txt pig.full.gfa > pig.full.histgrowth.node.txt
panacus-visualize pig.full.histgrowth.node.txt > pig.full.histgrowth.node.pdf
grep -ie 'Sscrofa' pig.full.paths.txt > pig.full.paths.Sscrofa.txt
RUST_LOG=info panacus histgrowth -t 16 -c bp -q 0,1,0.5,0.1 -e pig.full.paths.Sscrofa.txt -S -s pig.full.paths.haplotypes.txt pig.full.gfa > pig.full.histgrowth.noref.bp.txt
panacus-visualize pig.full.histgrowth.noref.bp.txt > pig.full.histgrowth.noref.bp.pdf
from panacus.
Thanks, I'll have a look at it now.
from panacus.
Regarding the analysis on the hprc-v1.0-mc-grch38.gfa
graph:
The plot looks fine, except that it seems the GRCh38 paths are still included. If you followed the above code, I suppose that's my mistake, because in this graph, the GRCh38 paths are encoded in P-lines, which I did not extract into the paths file. My guess is that your current file hprc-v1.0-mc-grch38.paths.txt
is empty.
Here is the code to correct for that. If you re-run panacus with the updated file, your graph should look more what you expected:
grep '^P' hprc-v1.0-mc-grch38.gfa | cut -f2 > hprc-v1.0-mc-grch38.paths.txt
grep -e '^W' hprc-v1.0-mc-grch38.gfa | cut -f2-6 | awk '{ print $1 "#" $2 "#" $3 ":" $4 "-" $5 }' >> hprc-v1.0-mc-grch38.paths.txt
grep -ie 'grch38' hprc-v1.0-mc-grch38.paths.txt > hprc-v1.0-mc-grch38.paths.grch38.txt
Now, regarding the analysis on your pig data set:
Again, the plots and tables look just fine, except for pig.full.histgrowth.noref.bp.pdf
where I expected less amount of sequence, since it shouldn't include the reference. I suspect that here the issue is the same as with the HPRC data set, i.e., the file with the reference paths is empty. Is this also your concern, or is there something else that you do not find plausible with the results?
from panacus.
Btw, I'd recommend adding -a
to the panacus histrowth
calls, so that you also get the coverage histograms.
from panacus.
Again, the plots and tables look just fine, except for
pig.full.histgrowth.noref.bp.pdf
where I expected less amount of sequence, since it shouldn't include the reference. I suspect that here the issue is the same as with the HPRC data set, i.e., the file with the reference paths is empty. Is this also your concern, or is there something else that you do not find plausible with the results?
Ah scratch that. I also can't find any fault in pig.full.histgrowth.noref.bp.pdf
, the amount of sequence shown seems to be much smaller than the total genome length, so it looks good to me. If you want to discuss the plot in more detail, let's do this via mail ([email protected])
from panacus.
Dear developer:
The problem is "produce a plot quantifying the non-reference sequence of the graph",
The problem we have now is that we are having some problems drawing non-reference sequences, and the results don't look very good, how do we solve this?
from panacus.
What makes you believe that "the results don't look very good"? I'm looking at the plot pig.full.histgrowth.noref.bp.pdf
and I see that the sequence of the first genome is at about 100Mbp. If you expect the genome to have an average length of 2.7Gb, then this indicates to me that a lot of sequence is excluded, so (for all I can see, given the data that you provided) the filter for non-reference sequence has worked.
from panacus.
However, I found that the non-reference sequence would reach 1.5Gb, which I think is unlikely, logically at 300Mb
from panacus.
Ok, that's a good starting point. Can you quantify the total length (in bp) of your graph?
from panacus.
I uploaded our pig data, could you please help us see what is wrong with it.
Yes, that's what I am trying to do, but the data that you provide is not sufficient to see where the issue is. Can you please also provide the file paths file to me?
grep '^P' pig.full.gfa | cut -f2 > pig.full.paths.txt
grep -e '^W' pig.full.gfa | cut -f2-6 | awk '{ print $1 "#" $2 "#" $3 ":" $4 "-" $5 }' >> pig.full.paths.txt
code: grep -ie 'Sscrofa' pig.full.paths.txt > pig.full.paths.Sscrofa.txt RUST_LOG=info panacus histgrowth -t 16 -c bp -q 0,1,0.5,0.1 -e pig.full.paths.Sscrofa.txt -S -s pig.full.paths.haplotypes.txt pig.full.gfa > pig.full.histgrowth.noref.bp.txt panacus-visualize pig.full.histgrowth.noref.bp.txt > pig.full.histgrowth.noref.bp.pdf
pig.full.histgrowth.ref.bp.pdf pig.full.histgrowth.ref.bp.txt
The file names of your upload do not agree with the command that you typed.
from panacus.
Thanks. The paths names are all fine, panacus
recognizes that they are given in PanSN format. Again, there is no evidence that your analysis with panacus has failed. If you quantify the total length (bp) of your graph, and then subtract the length of your reference sequence (~2.5Gb?) you should be able to validate this independently.
from panacus.
We were delighted to hear that the data was fine, and actually what we wanted to do was to make a graph of generic sequence growth, so that it would be a better illustration of our pan-genome being more representative. Now we are using the GFA result of Minigraph-cactus as input, how can we do that? As you said above, the node count is correct. The question now is how to get a visual result right.
from panacus.
The question now is how to get a visual result right.
What's wrong with the visualization?
from panacus.
Dear developer:
I think I understand what you mean, now we are on the right track of visualization may not be a problem, thank you for your reply, I re-think our results, because our results involve content publishing issues, so I delete part of the results hope you can understand.
Thank you very much for your help.
from panacus.
I perfectly understand. Again, we can discuss things in private via email, if that helps.
from panacus.
Related Issues (20)
- Option to output plots in separate PNG files HOT 11
- Could you please make an new release? HOT 3
- command is not supported for more than 65534 HOT 6
- Request software updates HOT 6
- Feature request: Alternative plot with #nodes/#edges vs AC
- A problem while running panacus-visualize HOT 1
- Update Readme to reflect installation of python dependencies in the installation section. HOT 1
- compiler error in rustc-serialize HOT 3
- Haplotype labels in TSV, visualisations ? HOT 8
- panacus-visualize.py is overwhelmed by 1000 haplotypes HOT 10
- how is panacus treating Ns HOT 4
- Discrepancy between graph length, reference length, and novel base pairs HOT 3
- macOS binary missing HOT 1
- Calling `panacus hist` with `-s` but then not supplying the subset file cases no error
- coverage/quorum HOT 1
- what dose coverage and quorum mean๏ผ HOT 3
- Validity of results HOT 1
- Ordered growth for PGGB gfa file HOT 18
- After `panacus hist` and `panacus growth`, the final visualization will show `#nodes` instead of `bps`. I use `-c bp` for hist HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. ๐๐๐
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google โค๏ธ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from panacus.