Extending R's Dendrogram Functionality
Hi,
Is it possible to use different types of plot.phylo (e.g., "cladogram", "fan", "unrooted", "radial") using dendextend?
Example:
plot(as.phylo(hclust(dist(mtcars))),type="fan")
Hi Tal,
Right now, tanglegram supports assigning any desired colors to the connectors of two dendrograms by simply making a sufficient-length vector. This allows for some super interesting graphics to highlight differences in the tree like this
I wonder if it's possible to do the same with the connector edge weights as well, to even further highlight interesting connections (lwd parameter)?
After poking around in the tanglegram.dendrogram function, it seems like for colors, you check to see if the argument supplied is a single color, in which case you repeat it as many times as needed, or you treat the list of colors as a vector and index through it as you draw your arrows.
https://github.com/talgalili/dendextend/blob/master/R/tanglegram.R#L897
https://github.com/talgalili/dendextend/blob/master/R/tanglegram.R#L906
I'm sure something similar could be done with lwd, right?
Perhaps if I have some time I can look through and try to implement this.
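The recycling idea described above can be sketched in plain R. This is not the code from tanglegram.R; recycle_to() is a hypothetical helper, and it only illustrates how a single value or a shorter vector could be expanded to one entry per connecting line, for lwd in the same way as for colors.

```r
# Hypothetical helper (not part of dendextend): expand `x` to exactly `n`
# values, repeating a single value or cycling through a shorter vector.
recycle_to <- function(x, n) {
  rep(x, length.out = n)
}

n_lines <- 5
line_cols <- recycle_to("darkgrey", n_lines)  # one color reused for every line
line_lwds <- recycle_to(c(1, 3), n_lines)     # widths cycle through 1 and 3
```

Each connector i would then be drawn with line_cols[i] and line_lwds[i].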
Hi Tal, First of all, greetings for 2016!
Now a quick question, which I think may imply a bug in the docs or in the cutree function. Something I often want to do is return the cluster ids for all individuals. I often need to return:
hc <- hclust(dist(USArrests), "ave")
cutree(hc, k=4)
plot(hc)
What I want can be achieved like this:
cutree(hc, k=4, order_clusters_as_data = F)[hc$labels]
But I can't seem to achieve it with any permutation of cutree arguments
# individuals in data order, cluster ids like stats::cutree
cutree(hc, k=4, order_clusters_as_data = T, sort_cluster_numbers = T)
# individuals in data order, cluster ids still identical to stats::cutree
cutree(hc, k=4, order_clusters_as_data = T, sort_cluster_numbers = F)
# individuals in dendrogram order, cluster ids in dendrogram order
cutree(hc, k=4, order_clusters_as_data = F, sort_cluster_numbers = T)
# individuals in dendrogram order, cluster ids like stats::cutree
cutree(hc, k=4, sort_cluster_numbers = F)
The help for sort_cluster_numbers says:
sort_cluster_numbers: logical (TRUE). Should the resulting cluster id numbers be sorted? (default is TRUE in order to make the function compatible with cutree from stats, but it allows for sensible color order when using color_branches).
But I don't really understand that, since sort_cluster_numbers has no effect when order_clusters_as_data = T, and sort_cluster_numbers = F is needed when order_clusters_as_data = F to give the same cluster ids as stats::cutree (the opposite of what I would expect from the help).
Thanks for any insight! Greg Jefferis.
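Assuming the goal is individuals in data order with cluster ids numbered left to right along the dendrogram, a sketch that relies only on stats::cutree and hc$order (and none of the dendextend arguments discussed above) might look like this. The names left_to_right and relabelled are introduced here for illustration.

```r
hc <- hclust(dist(USArrests), "ave")

# Cluster ids in data order, numbered as stats::cutree numbers them.
ids <- stats::cutree(hc, k = 4)

# Order in which the clusters are first met when walking the leaves
# from left to right in the plotted tree.
left_to_right <- unique(ids[hc$order])

# Renumber so that the leftmost cluster is 1, the next one 2, and so on,
# while keeping the individuals in data order.
relabelled <- setNames(match(ids, left_to_right), names(ids))

head(relabelled)
```

Indexing relabelled[hc$order] gives the same ids in dendrogram order.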
Hi Tal,
I am looking at the vignette and think the last part of the following example code, set("nodes_col", c("orange", "black", "plum", "NA")), should rather be set("nodes_col", c("orange", "black", "plum", NA)), or else you will have a color with value "NA" instead of NA.
Zuguang
dend <- iris[1:30,-5] %>% dist %>% hclust %>% as.dendrogram %>%
set("branches_k_color", k=3) %>% set("branches_lwd", c(1.5,1,1.5)) %>%
set("branches_lty", c(1,1,3,1,1,2)) %>%
set("labels_colors") %>% set("labels_cex", c(.9,1.2)) %>%
set("nodes_pch", 19) %>% set("nodes_col", c("orange", "black", "plum", "NA"))
> attr(dend[[1]][[1]][[1]], "nodePar")
$lab.col
[1] "#CC476B"
$pch
[1] 19
$lab.cex
[1] 0.9
$col
[1] "NA"
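The difference between the string "NA" and a missing value can be checked in base R, independently of dendextend. A minimal sketch (the variable names are illustrative):

```r
cols_string  <- c("orange", "black", "plum", "NA")  # last entry is a 2-letter string
cols_missing <- c("orange", "black", "plum", NA)    # last entry is a missing value

is.na(cols_string)   # no element is missing
is.na(cols_missing)  # only the last element is missing
```

Code that decides whether to draw a node by testing is.na() would therefore skip the node only in the second case.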
Hello,
Thanks for that package, and for the very useful as.ggdend function!
I think there is a little bug in it, though:
ggd1 <- 1:50 %>%
dist() %>%
hclust %>%
as.dendrogram %>%
as.ggdend
returns Error in FUN(X[[i]], ...) : subscript out of bounds.
It works fine when adding set("branches_lty", 1), as in the vignette.
ggd1 <- 1:50 %>%
dist() %>%
hclust %>%
as.dendrogram %>%
set("branches_lty", 1) %>%
as.ggdend
ggplot(ggd1)
As the other features of your vignette (branches_k_color, branches_lwd, labels_colors, labels_cex) seem to be optional, I wonder why this is not the case for branches_lty?
I have attached the image of the dendrogram I have created. Is there code in dendextend that can be used to mirror this image? That is, I would have the questions on the left and the tree extending to the left, as an inverse of what is seen here.
I've been looking through the dendextend manual for code to this effect, but have struggled to find any.
I noticed some layout glitches when trying to plot the iris example with the 'hanged' option in ggplot. Here the reproducible example:
dend <- iris[1:30,-5] %>% dist %>% hclust %>% as.dendrogram %>% hang.dendrogram(hang = 0.1) %>%
set("branches_k_color", k=3) %>% set("branches_lwd", c(1.5,1,1.5)) %>%
set("branches_lty", c(1,1,3,1,1,2)) %>%
set("labels_colors") %>% set("labels_cex", c(.9,1.2)) %>%
set("nodes_pch", 19) %>% set("nodes_col", c("orange", "black", "plum", NA))
# plot the dend in usual "base" plotting engine:
plot(dend, horiz=TRUE)
# Now let's do it in ggplot2 :)
ggd1 <- as.ggdend(dend)
library(ggplot2)
# the nodes are not implemented yet.
ggplot(ggd1) # reproducing the above plot in ggplot2 :)
and here's the output...it doesn't look very good :)
tnx for this wonderful package and have a nice day!
gabriele
Hello Tal and All,
I am looking to produce a tanglegram of two phylogenies of the same taxa with different datasets. I have already converted the phylogenies to dendrograms using DECIPHER.
I create an ordered character vector of colours based on tip label order on the left side phylogeny (I also reverse this as color_lines works from bottom to top). My problem is that the colours are not mapping accurately. I do think this is either due to the topology of one phylogeny/dendrogram or using the original phylogeny to generate the ordered vector (although I suspect not).
(note I have also tried ape cophyloplot and experience the same issue)
command : tanglegram(dendA, dendC, color_lines = x1)
Alternatively, is there a method to colour association lines similar to common_subtrees_color_lines, but one that involves specifying the subtrees/clade in the left phylogeny (via the scale bar)?
Hope this makes sense.
Hi,
I have only recently started using the various dendextend features, and haven't yet come across something that I think would be useful for some users' applications:
When adjusting col, lwd or lty in branches_attr_by_labels; could you possibly add more flexibility as to how close to the root these features shall reach? I.e., allowing for the 'type' options in branches_attr_by_labels between 'all' and 'any' to also take an integer input that indicates the number of nodes away from the root until which the branch adjustment shall act?
This may enable highlighting of distances between leaves/clusters in a given dendrogram that don't connect through the root.
Best regards,
Max
The identify method doesn't appear to be working when horiz=T. This is my first bug report, sorry if this isn't the right venue! Below is an example taken from the dendextend tutorial. When clicking on the graphics device, identify(*,horiz=T) does not behave as expected, and it does not return a correct result.
require(dendextend)
require(colorspace)
data(iris)
d_iris <- dist(iris[,-5])
hc_iris <- hclust(d_iris)
dend_iris <- as.dendrogram(hc_iris)
iris_species <- rev(levels(iris[,5]))
dend_iris <- color_branches(dend_iris,k=3, groupLabels=iris_species)
labels_colors(dend_iris) <-rainbow_hcl(3)[sort_levels_values(as.numeric(iris[,5])[order.dendrogram(dend_iris)])]
labels(dend_iris) <- paste(as.character(iris[,5])[order.dendrogram(dend_iris)],"(",labels(dend_iris),")",sep = "")
dend_iris <- hang.dendrogram(dend_iris,hang_height=0.1)
dend_iris <- assign_values_to_leaves_nodePar(dend_iris, 0.5, "lab.cex")
par(mar = c(3,3,3,7))
plot(dend_iris,
main = "Clustered Iris dataset
(the labels give the true flower species)",
horiz = TRUE, nodePar = list(cex = .007))
legend("topleft", legend = iris_species, fill = rainbow_hcl(3))
identify(dend_iris,horiz=T)
## Not run:
library(dendextend)
set.seed(23235)
ss <- sample(1:150, 10 )
# Getting the dend object
dend <- iris[ss,-5] %>% dist %>% hclust %>% as.dendrogram
dend %>% plot
dend %>%
branches_attr_by_labels(c("123", "126", "23", "29")) %>%
plot
dend %>%
branches_attr_by_labels(c("123", "126", "23", "29"), "all") %>%
plot # the same as above
dend %>%
branches_attr_by_labels(c("123", "126", "23", "29"), "any") %>%
plot
dend %>%
branches_attr_by_labels(c("123", "126", "23", "29"),
"any", "col", c("blue", "red")) %>% plot
dend %>%
branches_attr_by_labels(c("123", "126", "23", "29"),
"any", "lwd", c(4,1)) %>% plot
dend %>%
branches_attr_by_labels(c("123", "126", "23", "29"),
"any", "lty", c(2,1)) %>% plot
Hi,
I am not sure if it is the intended behavior. When you call cor_cophenetic on objects of type hclust, it does not take into account the labels of the two objects. This causes a different result in the two cases below.
#example
h15 <- c(1:5) %>% dist %>% hclust(method = "average")
h15_1 <- h15
h15_2 <- h15
h15_1$labels <- c(2, 5,3, 4, 1)
h15_2$labels <- c(4, 3, 5, 1, 2)
cor_cophenetic(h15_1,h15_2)
#always 1
cor_cophenetic(as.phylo(h15_1), as.phylo(h15_2))
#0.3125
cor_cophenetic(as.dendrogram(h15_1), as.dendrogram(h15_2))
#0.3125
tanglegram(h15_1 ,h15_2 )
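For comparison, a label-aware cophenetic correlation can be computed with base R alone by aligning the two cophenetic matrices on their labels before correlating them. This is only a sketch of the idea, not how cor_cophenetic is implemented; the labels are stored as character here, and label_aware_cor is a name introduced for illustration.

```r
h15 <- hclust(dist(1:5), method = "average")
h15_1 <- h15
h15_2 <- h15
h15_1$labels <- as.character(c(2, 5, 3, 4, 1))
h15_2$labels <- as.character(c(4, 3, 5, 1, 2))

# cophenetic() keeps the labels, so the matrices can be matched by name.
m1 <- as.matrix(cophenetic(h15_1))
m2 <- as.matrix(cophenetic(h15_2))
m2 <- m2[rownames(m1), colnames(m1)]  # put m2 in the same label order as m1

# Correlate the pairwise cophenetic distances of matching label pairs.
label_aware_cor <- cor(m1[lower.tri(m1)], m2[lower.tri(m2)])
label_aware_cor
```

Because the two trees carry different label permutations, this value is expected to fall below 1, in line with the phylo and dendrogram results quoted above.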
When using rect.dendrogram to highlight a branch, the upper boundary lies overtop of the horizontal arm of the branch. This looks a bit awkward. It would be nice to be able to push the boundary up a little bit, so you could clearly see the horizontal branches inside the coloured box.
You already provide lower_rect to allow us to tidy up the bottom of the box. Could you add upper_rect for the other end?
Thanks for a great package!
When trying to both add color to the subtrees as well as change branch width the tanglegram function gives an error. Find below both a function that reproduces the error as well as the error message. When using one or the other option separately the function works fine.
tanglegram(Phylo1,Phylo2, common_subtrees_color_branches=T, edge.lwd=4)
Error in value[is.na(values)] <- "black" : object 'value' not found
In addition: Warning message:
In branches_attr_by_clusters(dend2, dend2_clusters, values = dend1_leaves_colors[ss], :
There are NA's in the colors used by branches_attr_by_clusters. This probably means a bug somewhere. The color was replaced by 'black', but make sure your code does what you wanted it to...
After running dend_diff, my par is set to mfrow=c(2,1) when I run other plots afterwards. Is there a way to restrict the scope of this, so that it either resets par or calls par(mfrow=c(1,1)) once it has completed?
I suspect this may make it difficult to use in a script. The documentation is ambiguous as to what "side by side" means. Perhaps it would be beneficial to clarify that these are separate plots, plotted horizontally.
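A common base R pattern for keeping such a change local is to save the old graphics parameters and restore them with on.exit(). The wrapper below is a hypothetical illustration of that pattern, not the code of dend_diff.

```r
# Hypothetical wrapper: draw two plots next to each other, then put the
# graphics parameters back the way they were.
plot_pair <- function(x, y) {
  old_par <- par(mfrow = c(1, 2))    # par() returns the previous values
  on.exit(par(old_par), add = TRUE)  # restored even if a plot call fails
  plot(x)
  plot(y)
  invisible(NULL)
}

# Usage: plot_pair(1:10, 10:1); later plots use the original layout again.
```

With this pattern the caller does not need to call par(mfrow=c(1,1)) by hand.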
Hi,
I recently came across the need to plot a dendrogram representing distance between certain strings and needed to plot the strings near each node.
I found the get_nodes_xy useful but it made me plot the characters one above the other and not as an intuitive string.
By any chance, could you add an implementation of get_nodes_xy for a horizontal dendrogram?
Thanks,
Barak
Hi,
I do not know what the problem is, but after installing using the suggested script:
require2(devtools)
install_github('dendextend', 'talgalili')
require2(Rcpp)
install_github('dendextendRcpp', 'talgalili')
I can not run 3.7. Coloring branches example, the errors I get are:
Error: could not find function "color_branches"
Error: could not find function "rainbow_hcl"
Error: could not find function "hang.dendrogram"
Error: could not find function "assign_values_to_leaves_nodePar"
Error: unexpected symbol in "value recycled"
Error in legend("topleft", legend = iris_species, fill = rainbow_hcl(3)) :
could not find function "rainbow_hcl"
Any idea how to solve it?
Thanks.
So that cutree.hclust would be in R core, and I could use cutree.dendrogram.
library(dendextend)
by_labels_branches_col
dend <- mtcars %>% dist %>% hclust %>% as.dendrogram
dend <- dend %>% color_labels(k=4)
labels_colors(dend)
dend <- dend %>% set("leaves_col", labels_colors(dend))
plot(dend)
dend <- assign_values_to_leaves_edgePar(dend=dend, value = labels_colors(dend), edgePar = "col")
dend %>% plot
x <- c(1,1,1,2,2)
names(x) <- letters[1:5]
hc <- hclust(dist(x))
cutree(hc, 4) # how did it choose which leaf should be clustered?
plot(hc)
rect.hclust(hc, k = 4)
dendextend::cutree(as.dendrogram(hc), 4, try_cutree_hclust = FALSE)
checking re-building of vignette outputs ... NOTE
Error in re-building vignettes:
...
vignette('dendextend') for the package vignette.
You can execute a demo of the package via: demo(dendextend)
More information is available on the dendextend project web-site:
https://github.com/talgalili/dendextend/
Contact: <[email protected]>
Suggestions and bug-reports can be submitted at: https://github.com/talgalili/dendextend/issues
To suppress the this message use:
suppressPackageStartupMessages(library(dendextend))
Attaching package: 'dendextend'
The following object is masked from 'package:stats':
cutree
Warning in min(-diff(our_dend_heights)) :
no non-missing arguments to min; returning Inf
Quitting from lines 926-932 (introduction.Rmd)
Error: processing vignette 'introduction.Rmd' failed with diagnostics:
there is no package called 'DendSer'
Execution halted
Hi,
I tried to use the function as.ggdend(), but an error is generated when type = "triangle".
Please, find below a reproducible R code:
library(dendextend)
USArrests %>% scale %>% dist %>%
hclust %>% as.dendrogram %>%
as.ggdend(type = "triangle")
I have got the following error:
Error in $<-.data.frame(*tmp*, "pch", value = c(NA, NA, NA, NA, NA, :
replacement has 99 rows, data has 50
Session info ----------------------------------------
setting  value
version  R version 3.2.4 (2016-03-10)
system   x86_64, darwin13.4.0
ui       RStudio (0.99.491)
language (EN)
collate  fr_FR.UTF-8
tz       Europe/Paris
date     2016-11-08
Packages --------------------------------------------
package      * version    date       source
assertthat     0.1        2013-12-06 CRAN (R 3.2.0)
class          7.3-14     2015-08-30 CRAN (R 3.2.4)
cluster        2.0.3      2015-07-21 CRAN (R 3.2.4)
colorspace     1.2-7      2016-10-11 CRAN (R 3.2.5)
dendextend   * 1.3.0      2016-08-27 CRAN (R 3.2.5)
DEoptimR       1.0-6      2016-07-06 CRAN (R 3.2.5)
dichromat      2.0-0      2013-01-24 CRAN (R 3.2.0)
digest         0.6.10     2016-08-02 CRAN (R 3.2.5)
diptest        0.75-7     2015-06-08 CRAN (R 3.2.0)
flexmix        2.3-13     2015-01-17 CRAN (R 3.2.0)
fpc            2.1-10     2015-08-14 CRAN (R 3.2.0)
ggplot2      * 2.1.0.9001 2016-10-18 Github (hadley/ggplot2@1709196)
gtable         0.2.0      2016-02-26 CRAN (R 3.2.3)
kernlab        0.9-25     2016-10-03 CRAN (R 3.2.5)
labeling       0.3        2014-08-23 CRAN (R 3.2.0)
lattice        0.20-33    2015-07-14 CRAN (R 3.2.4)
lazyeval       0.2.0      2016-06-12 CRAN (R 3.2.5)
magrittr       1.5        2014-11-22 CRAN (R 3.2.0)
MASS           7.3-45     2015-11-10 CRAN (R 3.2.4)
mclust         5.2        2016-03-31 CRAN (R 3.2.4)
modeltools     0.2-21     2013-09-02 CRAN (R 3.2.0)
munsell        0.4.3      2016-02-13 CRAN (R 3.2.3)
mvtnorm        1.0-5      2016-02-02 CRAN (R 3.2.3)
nnet           7.3-12     2016-02-02 CRAN (R 3.2.4)
plyr           1.8.4      2016-06-08 CRAN (R 3.2.5)
prabclus       2.2-6      2015-01-14 CRAN (R 3.2.0)
RColorBrewer   1.1-2      2014-12-07 CRAN (R 3.2.0)
Rcpp           0.12.7     2016-09-05 CRAN (R 3.2.5)
reshape2       1.4.2      2016-10-22 CRAN (R 3.2.5)
robustbase     0.92-6     2016-05-31 CRAN (R 3.2.5)
scales         0.4.0.9003 2016-10-18 Github (hadley/scales@d58d83a)
stringi        1.1.2      2016-10-01 CRAN (R 3.2.5)
stringr        1.1.0      2016-08-19 CRAN (R 3.2.5)
tibble         1.2        2016-08-26 CRAN (R 3.2.5)
trimcluster    0.1-2      2012-10-29 CRAN (R 3.2.0)
whisker        0.3-2      2013-04-28 CRAN (R 3.2.0)
Best regards,
A.
My RStudio crashes whenever I call tanglegram as follows
tanglegram(F,G)
where F and G are my phylo trees of 50 tips each.
Note: I made the trees ultrametric using chronos before this command, otherwise it complained that they weren't ultrametric.
I decided to use the native R in Mac and while loading the dendextend library, I get the error below:
library(dendextend)
*** caught segfault ***
address 0x18, cause 'memory not mapped'
Traceback:
1: dyn.load(file, DLLpath = DLLpath, ...)
2: library.dynam(lib, package, package.lib)
3: loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]])
4: asNamespace(ns)
5: namespaceImportFrom(ns, loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]), i[[2L]], from = package)
6: loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]])
7: namespaceImport(ns, loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]), from = package)
8: loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]])
9: namespaceImport(ns, loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]), from = package)
10: loadNamespace(package, lib.loc)
11: doTryCatch(return(expr), name, parentenv, handler)
12: tryCatchOne(expr, names, parentenv, handlers[[1L]])
13: tryCatchList(expr, classes, parentenv, handlers)
14: tryCatch(expr, error = function(e) { call <- conditionCall(e) if (!is.null(call)) { if (identical(call[[1L]], quote(doTryCatch))) call <- sys.call(-4L) dcall <- deparse(call)[1L] prefix <- paste("Error in", dcall, ": ") LONG <- 75L msg <- conditionMessage(e) sm <- strsplit(msg, "\n")[[1L]] w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w") if (is.na(w)) w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L], type = "b") if (w > LONG) prefix <- paste0(prefix, "\n ") } else prefix <- "Error : " msg <- paste0(prefix, conditionMessage(e), "\n") .Internal(seterrmessage(msg[1L])) if (!silent && identical(getOption("show.error.messages"), TRUE)) { cat(msg, file = stderr()) .Internal(printDeferredWarnings()) } invisible(structure(msg, class = "try-error", condition = e))})
15: try({ attr(package, "LibPath") <- which.lib.loc ns <- loadNamespace(package, lib.loc) env <- attachNamespace(ns, pos = pos, deps)})
16: library(dendextend)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspac
Any help?
Hello,
First, I have to thank you for this great package! I have one suggestion about 'by_labels_branches_col': can you add an option to look up branches using a regexp? Some of my branches have a category name in their labels, and this would really help me.
Thank you again, great work!
### Getting the hc object
iris_dist <- iris[,-5] %>% dist
hc <- iris_dist %>% hclust
# This is how it looks without any colors:
dend <- as.dendrogram(hc)
plot(dend)
clusters <- cutree(dend, 4)[order.dendrogram(dend)]
dend %>% branches_attr_by_clusters(clusters) %>% plot
Is it possible to have no tip labels when plotting a tanglegram?
Hello,
I am trying to install this package in an R server (red hat) that doesn't have internet connection.
R version 3.2.1
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.6
I receive the following message:
Installing dendextend
'/usr/lib64/R/bin/R' --no-site-file --no-environ --no-save --no-restore
--quiet CMD INSTALL
'/tmp/Rtmpp7T5VK/devtools222533ea0528/dendextend-master'
--library='/usr/lib64/R/library' --install-tests
I am worried about the messages:
I am having a problem using the library d3heatmap, which needs this one, and I don't know whether these errors are causing a bad installation of d3heatmap or not.
If you have any advice please let me know
Thanks
This can be implemented using the sample.dendrogram function...
Good example:
library(cluster)
set.seed(999)
iris2 <- iris[sample(x = 1:150,size = 50,replace = F),]
clust <- diana(iris2)
dend <- as.dendrogram(clust)
temp_col <- c("red", "blue", "green")[as.numeric(iris2$Species)]
temp_col <- temp_col[order.dendrogram(dend)]
temp_col <- factor(temp_col, unique(temp_col))
library(dendextend)
dend %>% color_branches(clusters = as.numeric(temp_col), col = levels(temp_col)) %>%
set("labels_colors", as.character(temp_col)) %>%
plot
I would like to:
Hello, Tal,
I'm trying to colour labels in the two dendrograms that contain terminal branches with zero height.
One of the dendrograms is coloured successfully, the other one fails with an error message from cutree_1k.dendrogram.
Does it happen because the first dendrogram contains the branches with zero height only with 2 leaves, while the other one has up to 4 leaves in such branches?
Could you suggest any solution?
With best regards,
Marina
Here is the link to the Rdata and the code
easy_mod <- color_labels(easy, col = easy_cols) # is coloured successfully
par(mar = c(4, 2, 2, 10))
plot(easy_mod, horiz = T)
hard_mod <- color_labels(hard, col = hard_cols) # fails to be coloured
# Error in if (length(col) < k) { : missing value where TRUE/FALSE needed
# In addition: Warning messages:
#1: In cutree_1k.dendrogram(k = x, ...) :
# Couldn't cut the tree - returning NA.
#2: In cutree_1k.dendrogram(k = x, ...) :
# No cut exists for creating 92 clusters. The possible range for clusters is: [0-76]
To be able to address also the inner nodes, it would help to get the numbers of all nodes (i.e. the numbers you show in the vignette where you visualize the depth-first search), if possible together with the members at each node. (Is there a way to get the members of nodes, other than with cutree?) By those numbers (and based on the members they represent) then the nodes could be chosen for coloring etc., something like nodes_attr_by_nodenumbers...
I hope it makes sense.
tanglegram(dend2, dend1)
Loading required namespace: colorspace
Failed with error: ‘there is no package called ‘colorspace’’
It seems that "colorspace" is not imported in the package namespace, but it is required to load for the function to complete.
I am trying to plot a simple two columned dataframe and receive the following error:
Error: all(vapply(s, is.integer, NA)) is not TRUE
The dataframe I'm working with has two columns each with class = integer so I'm not seeing where this would come from. I have attached a subset of the df that still creates the issue.
load("example_df.Rda")
heatmaply(example_df)
Any help would be greatly appreciated. Below is version and session info.
R version:
platform x86_64-redhat-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 3
minor 2.3
year 2015
month 12
day 10
svn rev 69752
language R
version.string R version 3.2.3 (2015-12-10)
nickname Wooden Christmas-Tree
SessionInfo
R version 3.2.3 (2015-12-10)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] stringi_1.1.2 heatmaply_0.6.0 viridis_0.3.4 plotly_4.5.6
[5] ggplot2_2.2.1
loaded via a namespace (and not attached):
[1] gtools_3.5.0 modeltools_0.2-21 purrr_0.2.2 kernlab_0.9-25
[5] reshape2_1.4.1 lattice_0.20-34 colorspace_1.2-7 htmltools_0.3.5
[9] stats4_3.2.3 viridisLite_0.1.3 yaml_2.1.13 base64enc_0.1-3
[13] DBI_0.5-1 prabclus_2.2-6 RColorBrewer_1.1-2 registry_0.3
[17] fpc_2.1-10 foreach_1.4.3 plyr_1.8.4 robustbase_0.92-6
[21] stringr_1.1.0 munsell_0.4.3 gtable_0.2.0 htmlwidgets_0.7
[25] caTools_1.17.1 mvtnorm_1.0-5 codetools_0.2-15 labeling_0.3
[29] seriation_1.2-1 flexmix_2.3-13 class_7.3-14 DEoptimR_1.0-6
[33] trimcluster_0.1-2 Rcpp_0.12.7 KernSmooth_2.23-15 scales_0.4.1
[37] diptest_0.75-7 gdata_2.17.0 jsonlite_1.1 gplots_3.0.1
[41] gridExtra_2.2.1 digest_0.6.10 gclus_1.3.1 dplyr_0.5.0
[45] grid_3.2.3 tools_3.2.3 bitops_1.0-6 magrittr_1.5
[49] lazyeval_0.2.0 tibble_1.2 cluster_2.0.5 whisker_0.3-2
[53] tidyr_0.6.0 dendextend_1.3.0 MASS_7.3-45 assertthat_0.1
[57] httr_1.2.1 iterators_1.0.8 R6_2.2.0 TSP_1.1-4
[61] mclust_5.2 nnet_7.3-12
library(dendextend)
hc <- hclust(dist(USArrests[1:5,]),"ave")
This gives out an error:
Error in as.hclust.dendrogram(x, ...) :
dendrogram entries must be 1,2,..,3 (in any order), to be coercible to "hclust"
While pruning leaves from a dendrogram of course works, coercing the resulting dendrogram to hclust doesn't work either:
dendro <- as.dendrogram(hc)
dendro.pruned<-prune(dendro, c("Alaska","California"))
as.hclust(dendro.pruned)
Error in as.hclust.dendrogram(dendro.pruned) :
dendrogram entries must be 1,2,..,3 (in any order), to be coercible to "hclust"
Looking at the code for stats::as.hclust, it seems the issue is that the leaves' IDs were not updated after pruning, like this:
> str(unclass(dendro.pruned))
List of 2
$ : atomic [1:1] 4
..- attr(*, "members")= int 1
..- attr(*, "height")= num 0
..- attr(*, "label")= chr "Arkansas"
..- attr(*, "leaf")= logi TRUE
$ :List of 2
..$ : atomic [1:1] 3
.. ..- attr(*, "label")= chr "Arizona"
.. ..- attr(*, "members")= int 1
.. ..- attr(*, "height")= num 0
.. ..- attr(*, "leaf")= logi TRUE
..$ : atomic [1:1] 1
.. ..- attr(*, "label")= chr "Alabama"
.. ..- attr(*, "members")= int 1
.. ..- attr(*, "height")= num 0
.. ..- attr(*, "leaf")= logi TRUE
..- attr(*, "members")= num 2
..- attr(*, "midpoint")= num 0.5
..- attr(*, "height")= num 52.6
- attr(*, "members")= num 3
- attr(*, "midpoint")= num 0.75
- attr(*, "height")= num 82.6
This makes some items in unlist(dendro.pruned) greater than nleaves(dendro.pruned), which in turn fails the check in the first few lines of as.hclust.dendrogram:
as.hclust.dendrogram <- function(x, ...)
{
stopifnot(is.list(x), length(x) == 2)
n <- length(ord <- as.integer(unlist(x)))
iOrd <- sort.list(ord)
if(!identical(ord[iOrd], seq_len(n)))
stop(gettextf(
"dendrogram entries must be 1,2,..,%d (in any order), to be coercible to \"hclust\"",
n), domain=NA)
To fix this, I think the leaf indices would need to be remapped so that they fall within [1, nleaves(dendro.pruned)].
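The remapping suggested above can be sketched with base R only. reindex_leaves() is a hypothetical helper, not a dendextend function, and the example stands in for a pruned tree by taking a sub-branch of a small dendrogram, whose leaf values are likewise not 1..n.

```r
# Hypothetical helper: renumber leaf values to 1..n (keeping their relative
# order) so that stats::as.hclust() accepts the dendrogram.
reindex_leaves <- function(dend) {
  old_ids <- sort(as.integer(unlist(dend)))
  dendrapply(dend, function(node) {
    if (is.leaf(node)) {
      attrs <- attributes(node)
      node <- match(as.integer(node), old_ids)  # new id in 1..n
      attributes(node) <- attrs                 # keep label, height, class, ...
    }
    node
  })
}

x <- c(a = 1, b = 10, c = 11, d = 12.5)
dend <- as.dendrogram(hclust(dist(x)))

# Take the larger child of the root: its leaves are b, c and d, whose
# stored values come from the original four-leaf tree.
members <- c(attr(dend[[1]], "members"), attr(dend[[2]], "members"))
sub <- dend[[which.max(members)]]

before <- tryCatch(as.hclust(sub), error = function(e) conditionMessage(e))
after <- as.hclust(reindex_leaves(sub))
after$labels
```

Here before holds the "dendrogram entries must be ..." message, while the conversion succeeds once the leaves are renumbered.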
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dendextend_1.4.0
loaded via a namespace (and not attached):
[1] flexmix_2.3-13 Rcpp_0.12.9 cluster_2.0.5 whisker_0.3-2 magrittr_1.5 fpc_2.1-10
[7] MASS_7.3-45 munsell_0.4.3 mclust_5.2.2 colorspace_1.3-2 lattice_0.20-34 plyr_1.8.4
[13] prabclus_2.2-6 tools_3.3.2 nnet_7.3-12 grid_3.3.2 gtable_0.2.0 modeltools_0.2-21
[19] class_7.3-14 lazyeval_0.2.0 assertthat_0.1 tibble_1.2 gridExtra_2.2.1 kernlab_0.9-25
[25] trimcluster_0.1-2 ggplot2_2.2.1 viridis_0.3.4 robustbase_0.92-7 DEoptimR_1.0-8 scales_0.4.1
[31] diptest_0.75-7 stats4_3.3.2 mvtnorm_1.0-5
I ran into problems with cutree when I had a tree with zero-distance leaves.
Here is a minimal example that reproduces the problem:
dend<-as.dendrogram(hclust(dist(c(1,1,1,2,2))))
cutree(dend,k=5)
[1] 0 0 0 0 0
I expected the result to be:
cutree(hclust(dist(c(1,1,1,2,2))),k=5)
[1] 1 2 3 4 5
I figured out that it has to do with the function heights_per_k.dendrogram.
It results in negative values for h:
heights_per_k.dendrogram(dend)
1 2 0
1.5 0.5 -0.5
Negative height reproduced the results, that I got before:
cutree(dend,h=-0.5)
[1] 0 0 0 0 0
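The tied heights behind this behaviour can be inspected with base R. In the sketch below (variable names are illustrative), three of the four merges happen at height 0, so any cut at a non-negative height gives at most two clusters, whereas stats::cutree with k = 5 still returns one id per leaf, as shown above.

```r
hc <- hclust(dist(c(1, 1, 1, 2, 2)))

merge_heights <- hc$height               # one height per merge
n_zero_merges <- sum(merge_heights == 0) # merges between identical points

base_ids <- stats::cutree(hc, k = 5)     # cut by number of clusters, not height

merge_heights
base_ids
```

A height-based lookup therefore has no non-negative height that corresponds to k = 3, 4 or 5.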
There is a call to library(ape) in the collapse_branch.Rd examples.
As ape is a suggested dependency, it should not be required to run examples, according to R-exts. This can be conditionally escaped with requireNamespace.
It is possible there are other cases of this issue in other examples; this one was the first to fail due to not having the ape package.
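A conditional example of the kind described above might look like the following. This is a generic sketch of the requireNamespace pattern rather than the actual content of collapse_branch.Rd; the has_ape variable is introduced here for illustration.

```r
# Run the ape-dependent part only when the suggested package is installed.
has_ape <- requireNamespace("ape", quietly = TRUE)

if (has_ape) {
  hc <- hclust(dist(USArrests[1:5, ]))
  tree <- ape::as.phylo(hc)  # called with ape:: so no library(ape) is needed
  print(class(tree))
} else {
  message("Package 'ape' is not installed; skipping this example.")
}
```

The example then runs cleanly on machines where only the hard dependencies are installed.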
Hello,
Thanks for the quick fix of issue #12!
I have another issue related to as.ggdend: is there any way to use the rect.dendrogram() function within the ggplot paradigm? As rect.dendrogram returns a list of k elements and not a dendrogram, it cannot be passed as a parameter to as.ggdend.
Regards
Hi,
I'm using the function chclust from the package rioja to do constrained HC. The order of individuals is forced to a quantitative variable. When plotting, the option xvar sets the "x-coordinates for the leaves of the dendrogram". Here is an example:
library(tidyverse)
library(rioja)
data <- mtcars %>% arrange(mpg) %>% distinct(mpg, .keep_all = TRUE)
x_coord <- data$mpg
data <- data %>% select(-mpg)
clustering <- data %>% sqrt() %>% dist() %>% chclust()
plot(clustering, hang = -1, xvar = x_coord)
Is it possible to achieve similar scaling with dendextend? Thanks for your help!
From Kurt:
Unfortunately, the above is incompatible with the CRAN Policy which has
The best I can think of is for dendextendRcpp to provide its own labels
generic, with default method calling base::labels, and its own method
for dendrogram. (Not sure if this will work, though.)
I want to color the leaves of the dendrogram by a vector with the same length as the data, e.g. clustering labels from hierarchical clustering via cutree():
library(MASS)
library(dendextend)
library(dplyr)
library(ggplot2)
data(Cars93)
cars_cont <- select(Cars93, Price, MPG.city, MPG.highway, EngineSize,
Horsepower, RPM, Fuel.tank.capacity, Passengers,
Length, Wheelbase, Width, Turn.circle, Weight)
hc <- cars_cont %>% scale %>% dist %>% hclust
dend <- hc %>% as.dendrogram
dend %>% set("labels_colors", cutree(hc, k = 4)) %>% ggplot
But the behavior here is not as expected. That is, the colors of the leaves under a given branch of the dendrogram should all be the same color (corresponding to the four-cluster solution), but they appear to be displayed randomly.
Additionally, setting the leaf labels with a length-n vector does not appear to work properly either. For example, dend %>% ggplot shows that the first three default leaf labels, moving from left to right, are 31, 80, and 42 -- presumably corresponding to rows 31, 80, and 42 in the dataset.
The types of cars for these indices are:
> Cars93$Type[c(31, 80, 42)]
[1] Small Small Small
But when setting the labels with the Type vector, the appropriate labels do not appear in these positions in the dendrogram. For example, the labels in the corresponding positions after setting the labels according to the car Type are Small Midsize Compact using the following code:
dend %>% set("labels", Cars93$Type) %>% ggplot
Both of these issues appear to be because the set() function does not take into account the $order of the hclust object. When I use the following code, everything works:
dend %>% set("labels_colors", cutree(hc, k = 4)[hc$order]) %>% ggplot
dend %>% set("labels", Cars93$Type[hc$order]) %>% ggplot
Is it possible to make it so set() takes into account the $order of the hclust object, in order to prevent this counterintuitive labeling style?
Also, after reading through the documentation, it's not immediately obvious whether the order.hclust() or order.dendrogram() functions attempt to fix these issues already. Could you update the documentation here with some examples of the interplay of these functions and set()? (Or did I miss an example?)
Thanks!
-SV
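The relationship between data order and plotting order can be checked with base R alone. The sketch below uses USArrests instead of Cars93 to stay self-contained, and the name clusters_plot_order is introduced for illustration; it shows that indexing a data-order vector by order.dendrogram() (equivalently hc$order) yields values in the left-to-right order of the leaves.

```r
hc <- hclust(dist(scale(USArrests)))
dend <- as.dendrogram(hc)

# Cluster ids as returned by stats::cutree: one per row, in data order.
clusters <- stats::cutree(hc, k = 4)

# The same ids rearranged into the order in which the leaves are drawn.
clusters_plot_order <- clusters[order.dendrogram(dend)]

# The names now line up with the leaf labels, left to right.
head(names(clusters_plot_order))
head(labels(dend))
```

A vector rearranged this way is what the working set() calls above pass in.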
The text labels set by the text = argument in rect.dendrogram are positioned along the horizontal axis rather than the vertical axis when horiz = TRUE.
dend15 <- c(1:5) %>% dist %>% hclust(method = "average") %>% as.dendrogram
par(mar=c(5, 5, 5, 5))
dend15 %>% plot(main="dend15")
dend15 %>% rect.dendrogram(k=3,
border = 8, lty = 5, lwd = 2, text = as.roman(1:3))
dend15 %>% plot(main="dend15", horiz = TRUE)
dend15 %>% rect.dendrogram(k=3, horiz = TRUE,
border = 8, lty = 5, lwd = 2, text = as.roman(1:3))
I'm not sure if this bug is related to dendextend rather than arules itself. It looks like there is a masking glitch here or there...
Anyway, here's how to reproduce the defect:
require(arules)
a_matrix <- matrix(c(
1,1,1,0,0,
1,1,0,0,0,
1,1,0,1,0,
0,0,1,0,1,
1,1,0,1,1
), ncol = 5)
## set dim names
dimnames(a_matrix) <- list(c("a","b","c","d","e"),
paste("Tr",c(1:5), sep = ""))
a_matrix
## this works
trans1 <- as(a_matrix, "transactions")
## this messes up the whole thing
require(dendextend)
trans2 <- as(a_matrix, "transactions")
Hope this helps!
regards,
gabriele
Once a dendrogram has a branch with both a line type AND a color (given as a character color), the plot.dendrogram function will fail to plot and return an error.
This is because I should have edgePar hold a list.
This e-mail includes an example, and what I think a solution might be.
install.packages('dendextend')
library('dendextend')
dend <- 1:2 %>% dist %>% hclust %>% as.dendrogram
plot(dend) # works fine
dend %>% set("branches_lty", 1:2) %>% plot # works fine
dend %>% set("branches_col", 1:2) %>% plot # works fine
dend %>% set("branches_col", as.character(1:2)) %>% plot # works fine
dend %>% set("branches_lty", 1:2) %>% set("branches_col", as.character(1:2)) %>% plot
dend %>% set("branches_lty", 1:2) %>% set("branches_col", as.character(1:2)) %>%
unclass %>% str
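The remark that edgePar should hold a list can be illustrated with general R behaviour, independent of dendextend's internals: combining a numeric line type and a character color with c() coerces both to character, while list() keeps each type. Whether this is the exact cause of the error above is an assumption.

```r
# An atomic vector can hold only one type, so the number becomes a string.
as_vector <- c(lty = 2, col = "red")
str(as_vector)

# A list keeps the numeric line type and the character color as they are.
as_list <- list(lty = 2, col = "red")
str(as_list)
```

A plotting function that expects a numeric lty would receive the string "2" in the first case.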
Similar to the tanglegram issue, I was wondering if it's possible to add support for variable line widths for individual branches. An example of what I mean by this is shown on this page from this library in Python's scikit-learn module, in the "condensed tree" plot. The line widths steadily get smaller as they go down the branch, but the top horizontal 'split' lines remain the same line width.
Implement this:
https://stackoverflow.com/questions/38034663/rotate-labels-for-ggplot-dendrogram/38038719
The code to change is here:
https://github.com/talgalili/dendextend/blob/master/R/ggdend.R
These could be based on NbClust:
library(NbClust)
dd <- dist(iris[,-5])
hc <- hclust(dd, "ave")
plot(hc)
dd <- cophenetic(hc)
indexes <- c("frey", "mcclain", "cindex", "silhouette", "dunn")
for(i in indexes) {
print(i)
a = print(
NbClust(diss = dd , distance = NULL, method = "complete", index = i)
)
}
a = NbClust(diss = dd , distance = NULL, method = "complete", index = "cindex")
plot(-a$All.index~ as.numeric(names(a$All.index)), type = "b")
The median of all methods could be taken as a decent estimator.