Comments (5)
Should contain:
p$phylo # or p$tree ? A phylo (ape) object
p$gg_tree() # A ggtree object
from pagoo.
I am really enjoying exploring this R package and like the concept of your suggested enhancement. Would it provide functionality to, say, order the rows of $gg_binmap based on a tree? Thanks for your contributions to this space - so helpful!
Hi @shigdon, I'm glad you found pagoo useful! Yes, that's exactly one of the applications I see for incorporating the phylo object into the pagoo classes. Adding this field should be straightforward; I haven't done it yet because we have focused on getting pagoo published (we are finishing revisions now, so we hope to announce the publication soon). The open question is whether I should add the field on its own, or also make other methods "aware" of it, as I said in the issue description. The latter is much harder if I want to provide a stable and robust method, and I'm not sure I will implement it, because there's no obvious or natural way to do it and the result would probably look very arbitrary. Still thinking, tho.
For now you can use the gheatmap function (ggtree package) to obtain a tree attached to a heatmap. Here I copy-paste an example I had, with some notes in case you want to try and adapt the recipe:
# Extract only the shell clusters from the panmatrix
pm <- p$pan_matrix[, as.character(p$shell_clusters$cluster) ]
pm[which(pm >= 1, arr.ind = TRUE)] <- 1L # If you just want to show presence/absence, not abundance
# The following is to order the columns to make the heatmap "pretty"
csm <- colSums(pm)
spl <- split(names(csm), csm)
tpm <- t(pm)
norder <- lapply(spl, function(x) {
  if (length(x) < 2) {
    x
  } else {
    d <- vegan::vegdist(tpm[x, , drop = FALSE], method = "jaccard", binary = TRUE, na.rm = TRUE)
    hc <- hclust(d, "single")
    x[hc$order]
  }
})
norder <- norder[order(as.integer(names(norder)), decreasing = TRUE)]
forder <- unlist(norder)
pm <- pm[, forder, drop = FALSE]
# Now transform matrix from integer to character
pm[which(pm == 1 , arr.ind = TRUE)] <- "Present"
pm[which(pm == 0, arr.ind = TRUE)] <- "Absent"
library(phangorn)
library(ggtree)
library(magrittr)
# Suppose you already have a `phylo` object:
tree <- midpoint(phylo) %>%     # midpoint root
  ggtree() %<+%                 # Create ggtree
  as.data.frame(p$organisms)    # Attach organism metadata
# The parentheses are needed because %>% binds tighter than +
(tree + geom_tippoint(aes(color = Host))) %>%
  gheatmap(pm, colnames = FALSE, color = NA) +
  scale_fill_manual(breaks = c("Present", "Absent"),
                    values = c("darkblue", "white"),
                    name = "genotype")
This method works if the selected clusters (columns) are not too many. I usually avoid also showing the core clusters, as they would appear as one big square, but under the hood there are potentially hundreds or thousands of small tiles, which slows down the rendering.
Let me know if this helps you.
Bests!
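On the original question of ordering rows by a tree: the sketch below is one possible way to do it by hand, not a pagoo feature. It assumes the matrix row names match the tree's tip labels, and it takes the tip order from the edge matrix of an ape `phylo` object, which may differ from what ggtree draws if the tree is ladderized or rotated. The toy tree, the matrix and the names `tip_order` and `pm_ordered` are made up for illustration.

```r
library(ape)

# Toy tree and toy presence/absence matrix (both hypothetical)
phylo <- read.tree(text = "((orgA:1,orgB:1):1,(orgC:1,orgD:1):1);")
pm <- matrix(
  c(1, 0, 1,
    1, 1, 0,
    0, 1, 1,
    1, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("orgD", "orgB", "orgA", "orgC"),
                  c("cl1", "cl2", "cl3"))
)

# Tips in the order they appear in the edge matrix
is_tip <- phylo$edge[, 2] <= Ntip(phylo)
tip_order <- phylo$tip.label[phylo$edge[is_tip, 2]]

# Fail early if tree tips and matrix rows do not match
stopifnot(setequal(tip_order, rownames(pm)))

# Reorder the rows of the matrix to follow the tree
pm_ordered <- pm[tip_order, , drop = FALSE]
```

The reordered matrix can then be passed to whatever plotting code expects rows in tree order.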
Thank you very much for the great response @iferres! The approach worked very well. The only change I had to make was changing pm[which(pm == 1, arr.ind = TRUE)] <- "Present" to pm[which(pm >= 1, arr.ind = TRUE)] <- "Present" because I read in my p/a csv file from roary and some of the values were integers > 1. The pangenome I was working with had ~10,000 gene clusters in total, of which ~1,000 made up the core. Just a detail in response to your comment that the method works if the selected clusters are not too many. This recipe worked great for me and hopefully other people will navigate here if they want to do something similar. Cheers!
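A minimal sketch of the same labelling step done in one pass, so that counts greater than 1 are handled without a separate binarization. The toy matrix and the name `pm_lab` are illustrative assumptions, not part of the recipe above.

```r
# Toy count matrix standing in for a table with some values > 1
pm <- matrix(c(0L, 2L, 1L, 0L, 3L, 1L), nrow = 2,
             dimnames = list(c("org1", "org2"), c("cl1", "cl2", "cl3")))

# Assigning into pm_lab[] keeps dim and dimnames while the
# storage type changes from integer to character
pm_lab <- pm
pm_lab[] <- ifelse(pm >= 1, "Present", "Absent")
```

Because the comparison is `>= 1`, this gives the same labels whether the input holds 0/1 values or gene counts.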
Great! Yes, the ggtree::gheatmap function uses ggplot2's geom_tile() under the hood. It would probably be more efficient to use geom_raster() instead, which is a high-performance version of geom_tile(), and would better suit these cases. In their defence, it wasn't implemented to render the kind of huge arrays we are asking for :P
Bests!
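To illustrate the geom_raster() idea, here is a rough standalone sketch that draws a presence/absence matrix with plain ggplot2. It is not attached to a tree and is not a drop-in replacement for gheatmap; the simulated matrix and the column names `org`, `cluster`, `value` and `state` are assumptions made for the example.

```r
library(ggplot2)

set.seed(1)
# Simulated presence/absence matrix: 50 organisms x 200 clusters
pm <- matrix(rbinom(50 * 200, 1, 0.3), nrow = 50,
             dimnames = list(paste0("org", 1:50), paste0("cl", 1:200)))

# Long format: one row per (organism, cluster) cell; factor levels
# keep the original row and column order of the matrix
long <- as.data.frame(as.table(pm))
names(long) <- c("org", "cluster", "value")
long$state <- ifelse(long$value >= 1, "Present", "Absent")

g <- ggplot(long, aes(x = cluster, y = org, fill = state)) +
  geom_raster() +
  scale_fill_manual(values = c(Present = "darkblue", Absent = "white"),
                    name = "genotype") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
```

Printing `g` renders the heatmap; geom_raster() requires all cells to be the same size, which holds for a regular matrix like this one.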