TreeDist

Project Status: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

'TreeDist' is an R package that implements a suite of metrics that quantify the topological distance between pairs of unweighted phylogenetic trees. It also includes a simple 'Shiny' application to allow the visualization of distance-based tree spaces, and functions to calculate the information content of trees and splits.

'TreeDist' primarily employs metrics in the category of 'generalized Robinson–Foulds distances': they are based on comparing splits (bipartitions) between trees, and thus reflect the relationship data within trees, with no reference to branch lengths.

Generalized RF distances

The Robinson-Foulds distance simply tallies the number of non-trivial splits (sometimes inaccurately termed clades, nodes or edges) that occur in only one of the two trees: any split without a perfectly identical counterpart in the other tree contributes one point to the distance score, however similar or different its closest counterpart may be. By overlooking potential similarities between almost-identical splits, this conservative approach has undesirable properties.
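
As a minimal sketch of this tally, assume each tree is represented simply as a list of its non-trivial splits. The helpers SplitKey() and RFTally() are illustrative names, not functions exported by 'TreeDist':

```r
# Illustrative only: SplitKey() and RFTally() are hypothetical helpers,
# not functions exported by 'TreeDist'.

# Describe a split by the leaves on the side that excludes a reference leaf,
# so that the same bipartition always yields the same key.
SplitKey <- function(split, leaves, reference = leaves[1]) {
  side <- if (reference %in% split) setdiff(leaves, split) else split
  paste(sort(side), collapse = " ")
}

# Count the splits that occur in exactly one of the two trees.
RFTally <- function(splits1, splits2, leaves) {
  keys1 <- vapply(splits1, SplitKey, character(1), leaves = leaves)
  keys2 <- vapply(splits2, SplitKey, character(1), leaves = leaves)
  length(setdiff(keys1, keys2)) + length(setdiff(keys2, keys1))
}

leaves <- letters[1:6]
# Non-trivial splits of ((a,b),(c,d),(e,f)) and ((a,b),(c,e),(d,f))
tree1 <- list(c("a", "b"), c("c", "d"), c("e", "f"))
tree2 <- list(c("a", "b"), c("c", "e"), c("d", "f"))

rf <- RFTally(tree1, tree2, leaves)
print(rf)  # 4: only the split a b | c d e f is shared
```

Each unmatched split scores a full point here, regardless of how closely it resembles a split in the other tree.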

'Generalized' RF metrics generate matchings that pair splits in one tree with similar splits in the other. Each pair of splits is assigned a similarity score; the sum of these scores in the optimal matching then quantifies the similarity between two trees.

Different ways of calculating the similarity between a pair of splits lead to different tree distance metrics, implemented in the functions below:

  • MutualClusteringInfo(), SharedPhylogeneticInfo()

    Smith (2020) scores matchings based on the amount of information that one partition contains about the other. The Mutual Phylogenetic Information assigns zero similarity to split pairs that cannot both exist on a single tree; the Mutual Clustering Information metric is more forgiving, and exhibits more desirable behaviour; it is the recommended metric for tree comparison. (Its complement, ClusteringInfoDistance(), returns a tree distance.)

    Introduction to the Clustering Information Distance

  • NyeSimilarity()

    Nye et al. (2006) score matchings according to the size of the largest split that is consistent with both of them, normalized against the Jaccard index. This approach is extended by Böcker et al. (2013) with the Jaccard-Robinson-Foulds metric (function JaccardRobinsonFoulds()).

  • MatchingSplitDistance()

    Bogdanowicz and Giaro (2012) and Lin et al. (2012) independently proposed counting the number of 'mismatched' leaves in a pair of splits. MatchingSplitInfoDistance() provides an information-based equivalent (Smith 2020).
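
The functions listed above share a common calling pattern. The following is a brief sketch rather than a recommended workflow; it assumes that the 'TreeTools' package is installed alongside 'TreeDist', and the two example trees are arbitrary:

```r
library("TreeTools")  # provides BalancedTree() and PectinateTree()
library("TreeDist")

# Two arbitrary eight-leaf trees, chosen only for illustration
tree1 <- BalancedTree(8)
tree2 <- PectinateTree(8)

# Information-based measures (Smith 2020)
mci <- MutualClusteringInfo(tree1, tree2)    # similarity
spi <- SharedPhylogeneticInfo(tree1, tree2)  # similarity
cid <- ClusteringInfoDistance(tree1, tree2, normalize = TRUE)  # distance

# Other generalized Robinson-Foulds measures
nye <- NyeSimilarity(tree1, tree2)
jrf <- JaccardRobinsonFoulds(tree1, tree2)
msd <- MatchingSplitDistance(tree1, tree2)
msid <- MatchingSplitInfoDistance(tree1, tree2)

print(c(mci = mci, spi = spi, cid = cid, nye = nye,
        jrf = jrf, msd = msd, msid = msid))
```

Note that some of these functions return similarities and others distances, so their values are not directly interchangeable.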

The package also implements the variation of the path distance proposed by Kendall and Colijn (2016) (function KendallColijn()), approximations of the Nearest-Neighbour Interchange (NNI) distance (function NNIDist(); following Li et al. (1996)), and calculates the size (function MASTSize()) and information content (function MASTInfo()) of the Maximum Agreement Subtree.
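
A short sketch of how these functions might be called, assuming 'TreeTools' is installed to supply the example trees; the trees themselves are arbitrary, and the `rooted = TRUE` setting is a choice made for this example:

```r
library("TreeTools")  # provides BalancedTree() and PectinateTree()
library("TreeDist")

tree1 <- BalancedTree(8)
tree2 <- PectinateTree(8)

kc <- KendallColijn(tree1, tree2)  # Kendall-Colijn path-based distance
nni <- NNIDist(tree1, tree2)       # approximations (bounds) of the NNI distance
mastSize <- MASTSize(tree1, tree2, rooted = TRUE)  # leaves in the agreement subtree
mastInfo <- MASTInfo(tree1, tree2, rooted = TRUE)  # its information content

print(kc)
print(nni)
print(c(size = mastSize, info = mastInfo))
```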

For an implementation of the Tree Bisection and Reconnection (TBR) distance, see the package 'TBRDist'.

Installation

Install and load the library from CRAN as follows:

install.packages('TreeDist')
library('TreeDist')

You can install the development version of the package with:

if(!require("curl")) install.packages("curl")
if(!require("remotes")) install.packages("remotes")
remotes::install_github("ms609/TreeDist")

Tree space analysis

Construct tree spaces and readily visualize projected landscapes, avoiding common analytical pitfalls (Smith, 2022), using the inbuilt graphical user interface (Shiny GUI):

TreeDist::MapTrees()

Serious analysts should consult the vignette for a command-line interface.
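
For orientation only, a bare-bones command-line mapping might look like the sketch below. It assumes 'TreeTools' is installed, uses an arbitrary set of twenty trees, and omits the checks of mapping quality that the vignette describes, so it is not a substitute for that workflow:

```r
library("TreeTools")  # as.phylo() for integers generates trees by number
library("TreeDist")

# Twenty arbitrary eight-leaf trees, for illustration only
trees <- as.phylo(0:19, nTip = 8)

# Pairwise distances between all trees, returned as a 'dist' object
distances <- ClusteringInfoDistance(trees)

# Classical multidimensional scaling into two dimensions (base R)
mapping <- cmdscale(distances, k = 2)

plot(mapping, asp = 1, pch = 16, axes = FALSE, xlab = "", ylab = "")
```

A two-dimensional mapping can distort the original distances considerably, which is one of the pitfalls that motivates the fuller treatment in the vignette.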

Documentation

See also

Other R packages implementing tree distance functions include:

  • 'ape':
    • cophenetic.phylo(): Cophenetic distance
    • dist.topo(): Path (topological) distance, Robinson-Foulds distance.
  • 'phangorn'
    • treedist(): Path, Robinson-Foulds and approximate SPR distances.
  • 'Quartet': Triplet and Quartet distances, using the tqDist algorithm.
  • 'TBRDist': TBR and SPR distances on unrooted trees, using the 'uspr' C library.
  • 'treespace': Kendall-Colijn distance and tree space visualizations.
  • 'distory' (unmaintained): Geodesic distance

References

Please note that the 'TreeDist' project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

treedist's Issues

Comparing trees with non-identical tips

Thanks so much for the amazing package, and particularly the incredible documentation (could be a book??).

The docs suggest that we drop an issue if we have a use-case for comparing trees with non-identical tips, so here I am.

Use case

In phylogenomics we often sample 1000's of genes from our taxa of interest, and typically we are missing 1 or more taxa from most genes. For reference, here's a real-world example of the number of taxa in each gene tree from a published dataset of 8295 genes:

[Image: number of taxa in each gene tree]

Since taxa are missing ~randomly, most of the cases with <100% of taxa will have non-overlapping taxon sets. This dataset is fairly representative. 16% of genes are sampled in all taxa, the rest are not. A good first approximation is that there are likely to be ~80%, or roughly 6500 different taxon sets.

I'd say that this is now very common (near universal) in modern phylogenomic studies. And most empiricists would love to be able to explore these tree sets in detail.

Useful things

The most general would be to get a matrix of normalised pairwise distances. E.g. using any suitably normalised distance metric, this should produce meaningful comparisons across all trees. This would also (I assume, maybe wrong?) allow for the visualisation of such tree sets. This seems to fit well within the remit of the package, while the next two perhaps don't.

Another useful thing would be the number of unique trees, perhaps with options for what is meant by unique, e.g.: (i) strictly unique such that different taxon sets means unique; (ii) unique in the sense of non-conflicting (e.g. RF == 0 after reducing both trees to the common taxon set). Combined with this, grouping the trees into their unique sets would be useful.
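
One way to sketch the "reduce to the common taxon set" idea is shown below. It assumes the 'ape' package for pruning; CommonTipDistance() is a hypothetical helper written for this example, not a 'TreeDist' function, and the three toy gene trees are invented:

```r
library("ape")       # read.tree(), keep.tip()
library("TreeDist")  # RobinsonFoulds()

# Hypothetical helper, not part of 'TreeDist':
# prune both trees to their shared leaves, then apply a distance function.
CommonTipDistance <- function(tree1, tree2, Func = RobinsonFoulds, ...) {
  common <- intersect(tree1$tip.label, tree2$tip.label)
  if (length(common) < 4) {
    return(NA_real_)  # fewer than four shared leaves: no non-trivial splits
  }
  Func(keep.tip(tree1, common), keep.tip(tree2, common), ...)
}

# Invented gene trees with partially overlapping taxon sets
trees <- list(
  gene1 = read.tree(text = "((a,b),(c,d),(e,f));"),
  gene2 = read.tree(text = "((a,b),(c,d),g);"),
  gene3 = read.tree(text = "((a,c),(b,d),(e,g));")
)

n <- length(trees)
distances <- matrix(0, n, n, dimnames = list(names(trees), names(trees)))
for (i in seq_len(n - 1)) {
  for (j in seq(i + 1, n)) {
    distances[i, j] <- distances[j, i] <-
      CommonTipDistance(trees[[i]], trees[[j]])
  }
}

# Sense (ii) of "unique": pairs that do not conflict on their shared leaves
nonConflicting <- distances == 0
print(distances)
```

Raw Robinson-Foulds scores from differently sized taxon sets are not directly comparable, so a normalized metric could be supplied instead, e.g. `Func = ClusteringInfoDistance, normalize = TRUE`.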

Another thing (again I think beyond the purview of TreeDist, but I mention it in case this is something that may exist as an internal data structure of e.g. an RF calculation) is information on the observed splits in the data. I don't really know how one handles ambiguous splits in this case (e.g. a split on a tree with 42 taxa may be congruent with a large number of possible splits on the full tree of 52 taxa). One option would be to simply distribute the weight of these splits (i.e. a total weight of 1) over all possible splits with which they are congruent. Though perhaps this is too silly. The general point here is that users likely want to know which splits are common in their gene trees, and whether the common splits are all represented on their tree of interest (e.g. a species tree). Related work is on gene concordance factors, which are a summary statistic for this, but can still miss a lot of useful information about gene trees that are discordant with the species tree.

Hypervolume comparison in app

Either compare cluster hypervolumes using "hypervolume", or (better still?) discover/invent a measure of overlap based on distances alone.

`Plot3` documentation

Check that this function is up to scratch before 2.1.0 release.

Include test coverage

Arboreal matchings not tested

Update documentation: Arboreal matchings are permitted, for reasons of computational efficiency, but non-coherent matchings may be prohibited.

LAPJV with non-square matrices

Code is ready in cpp's lapjv, but call is prevented in R's LAPJV.

Can we send non-square matrices without triggering a seg fault?
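
Until that is resolved, one possible workaround is to pad the cost matrix to a square before solving. The sketch below uses base R only; PadToSquare() is a hypothetical helper, and padding with a constant is an assumption of this example rather than a description of what the C++ code does:

```r
# Hypothetical helper: pad a rectangular cost matrix with dummy rows or
# columns so that a solver expecting a square matrix can be used.
PadToSquare <- function(x, fill = max(x)) {
  n <- max(dim(x))
  padded <- matrix(fill, n, n)
  padded[seq_len(nrow(x)), seq_len(ncol(x))] <- x
  padded
}

costs <- matrix(c(4, 1, 3,
                  2, 0, 5), nrow = 2, byrow = TRUE)
square <- PadToSquare(costs)
print(square)
```

Because every dummy row (or column) has the same cost in each position, the dummy assignments add a constant to the total score, so the assignment of the real rows is unaffected. The padded matrix could then be passed to LAPJV(), discarding any matches that involve a dummy row or column.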

MapTrees() with multiple batches

sld/mk'/mk3 trees don't plot when MST is visible; batches can't be added. CID seems to be a particular problem - plotting happens ok with PID.

Trees with different leaves:

Hi,

Is it possible to analyze trees with different leaf labels? I am interested in the general architecture of the tree rather than the identity of the individuals within...

Thanks,
Christina

Warn when tips don't match?

It's potentially confusing when distances of zero are computed with no message, e.g. where tree 1 contains underscores and tree 2 spaces.
Perhaps throw warning when comparing trees with different leaves.
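
A minimal sketch of such a check, in base R. WarnIfTipsDiffer() is a hypothetical helper operating on plain label vectors, not an existing 'TreeDist' function:

```r
# Hypothetical helper: warn when two sets of leaf labels do not match.
WarnIfTipsDiffer <- function(labels1, labels2) {
  only1 <- setdiff(labels1, labels2)
  only2 <- setdiff(labels2, labels1)
  same <- length(only1) == 0 && length(only2) == 0
  if (!same) {
    warning("Leaf labels differ between trees. Only in tree 1: ",
            paste(only1, collapse = ", "), "; only in tree 2: ",
            paste(only2, collapse = ", "), call. = FALSE)
  }
  invisible(same)
}

labels1 <- c("Homo_sapiens", "Pan_troglodytes", "Mus_musculus", "Gallus_gallus")
labels2 <- gsub("_", " ", labels1, fixed = TRUE)  # spaces instead of underscores

# Capture the warning text so that it can be inspected
caught <- tryCatch(WarnIfTipsDiffer(labels1, labels2),
                   warning = function(w) conditionMessage(w))
print(caught)
```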

Internalize multi-tree comparisons in C++

When comparing all pairs of trees, we could attain faster results by:

  • Loading all trees into C++ and converting to split lists once (rather than for each pair)
  • Storing a sorted list of splits alongside a list of their properties
    • Use a k-way merge to produce a single index of all unique splits
    • Each tree will then be represented as a series of links to splits
    • Each unique split can have its properties (in_split) calculated and stored once
    • Also possible to compare all pairs of splits once -- if this doesn't consume too much memory.
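
The data layout proposed above can be prototyped in R before committing to C++. In this sketch each split is a plain character key, `sort(unique())` stands in for the k-way merge, and the smaller-side size stands in for whatever per-split properties would really be stored; all names are illustrative:

```r
# Each tree is a character vector of canonical split keys
# (a stand-in for real split objects).
treeSplits <- list(
  tree1 = c("ab|cdef", "cd|abef", "ef|abcd"),
  tree2 = c("ab|cdef", "ce|abdf", "df|abce"),
  tree3 = c("ab|cdef", "cd|abef", "abe|cdf")
)

# A single index of all unique splits (stand-in for the k-way merge)
uniqueSplits <- sort(unique(unlist(treeSplits, use.names = FALSE)))

# Each tree becomes a series of links into that index
treeIndex <- lapply(treeSplits, match, table = uniqueSplits)

# A property of each unique split, calculated and stored once:
# here, the number of leaves on its smaller side
splitSize <- vapply(strsplit(uniqueSplits, "|", fixed = TRUE),
                    function(sides) min(nchar(sides)), integer(1))

# Pairwise comparisons then operate on integer indices only
RFFromIndex <- function(i, j) length(setdiff(i, j)) + length(setdiff(j, i))
rf12 <- RFFromIndex(treeIndex$tree1, treeIndex$tree2)
print(rf12)
```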

VisualizeMatching

Hello,

I'm trying to compare two trees with the following command

VisualizeMatching(JaccardRobinsonFoulds, S16, Core_2) results in:
Error in edge.width[se] <- 1 + (10 * ns) :
NAs are not allowed in subscripted assignments

Any help in fixing this error would be helpful.

Below are the phylo trees.

S16: ((((GCF_002862005_1_ASM286200v1_genomic:0.0,GCF_002861945_1_ASM286194v1_genomic:0.0,GCF_000213955_1_ASM21395v1_genomic:0.0,GCF_013315085_1_ASM1331508v1_genomic:0.0,GCF_002861975_1_ASM286197v1_genomic:0.0,GCF_013315025_1_ASM1331502v1_genomic:0.0,GCF_002861965_1_ASM286196v1_genomic:0.0,GCF_002862015_1_ASM286201v1_genomic:0.0,GCF_013315045_1_ASM1331504v1_genomic:0.0):0.000000006,((GCF_000414525_1_ASM41452v1_genomic:0.002230831,(((GCF_001546445_1_ASM154644v1_genomic:0.0,GCF_013315115_1_ASM1331511v1_genomic:0.0):0.000000005,((GCF_002861905_1_ASM286190v1_genomic:0.0,GCF_000414605_1_ASM41460v1_genomic:0.0,GCF_000414665_1_ASM41466v1_genomic:0.0,GCF_000414585_1_ASM41458v1_genomic:0.0):0.000000005,GCF_002861885_1_ASM286188v1_genomic:0.002230870)0.966:0.006820446)0.969:0.009438344,((GCF_003426565_1_ASM342656v1_genomic:0.0,piotii_GCF_003397585_1_ASM339758v1_genomic:0.0):0.000000005,(((GCF_000414545_1_ASM41454v1_genomic:0.003992352,(GCF_003408835_1_ASM340883v1_genomic:0.0,GCF_000414505_1_ASM41450v1_genomic:0.0):0.000000005)0.909:0.008040404,(GCF_003397615_1_ASM339761v1_genomic:0.000000005,(GCF_000414565_1_ASM41456v1_genomic:0.003708023,(GCF_000414625_1_ASM41462v1_genomic:0.016167672,(GCF_000414485_1_ASM41448v1_genomic:0.0,GCF_000414425_1_ASM41442v1_genomic:0.0):0.000000005)0.000:0.000000005)0.000:0.000000006)0.928:0.000000005)0.948:0.018701881,(GCF_001546455_1_ASM154645v1_genomic:0.048525006,(GCF_001563665_1_ASM156366v1_genomic:0.116224039,((GCF_003408775_1_ASM340877v1_genomic:0.000000005,GCF_002884775_1_ASM288477v1_genomic:0.015930854)0.913:0.012725504,(((GCF_001953155_1_ASM195315v1_genomic:0.001988946,(GCF_013315145_1_ASM1331514v1_genomic:0.000000005,((GCF_003397745_1_ASM339774v1_genomic:0.0,GCF_003408815_1_ASM340881v1_genomic:0.0,swidsinskii_GCF_003397705_1_ASM339770v1_genomic:0.0):0.000000005,(GCF_000025205_1_ASM2520v1_genomic:0.000000005,(GCF_002884855_1_ASM288485v1_genomic:0.0,GCF_002884875_1_ASM288487v1_genomic:0.0):0.001978715)0.931:0.003992623)0.000:0.000000005)0.4
69:0.000000006)0.885:0.005473514,(GCF_013315195_1_ASM1331519v1_genomic:0.002012914,((GCF_002861125_1_ASM286112v1_genomic:0.0,GCF_013315125_1_ASM1331512v1_genomic:0.0,GCF_013315255_1_ASM1331525v1_genomic:0.0,GCF_003397635_1_ASM339763v1_genomic:0.0,GCF_002861145_1_ASM286114v1_genomic:0.0):0.006168310,leopoldii_GCF_003293675_1_ASM329367v1_genomic:0.001998855)0.781:0.002009527)0.871:0.004589922)0.934:0.014500618,(GCF_003408845_1_ASM340884v1_genomic:0.033674519,((GCF_000414465_1_ASM41446v1_genomic:0.0,GCF_000414445_1_ASM41444v1_genomic:0.0):0.001943013,GCF_001546485_1_ASM154648v1_genomic:0.000000005)1.000:0.047170589)0.714:0.015802120)0.924:0.021101316)0.654:0.021430364)0.995:0.072887040)0.435:0.003580847)0.278:0.005651482)0.892:0.005483145)0.884:0.005371297)0.793:0.000000005,GCF_003408785_1_ASM340878v1_genomic:0.004504993)0.849:0.002683606)0.000:0.000000005,(GCF_001660735_1_ASM166073v1_genomic:0.0,GCF_013315075_1_ASM1331507v1_genomic:0.0):0.000000005)0.932:0.005409354,(GCF_002861165_1_ASM286116v1_genomic:0.0,GCF_001660755_1_ASM166075v1_genomic:0.0):0.001866731,((GCF_000414645_1_ASM41464v1_genomic:0.000000005,(GCF_900637625_1_52295_C01_genomic:0.0,GCF_000414685_1_ASM41468v1_genomic:0.0,GCF_001042655_1_ASM104265v1_genomic:0.0,GCF_003397665_1_ASM339766v1_genomic:0.0,GCF_003408745_1_ASM340874v1_genomic:0.0,GCF_000159155_2_ASM15915v2_genomic:0.0,GCF_000178355_1_ASM17835v1_genomic:0.0,GCF_013315005_1_ASM1331500v1_genomic:0.0):0.000000005)0.489:0.000000005,((GCF_003585655_1_ASM358565v1_genomic:0.0,GCF_000414705_1_ASM41470v1_genomic:0.0,GCF_003812765_1_ASM381276v1_genomic:0.0,GCF_003585755_1_ASM358575v1_genomic:0.0):0.000000005,GCF_003397605_1_ASM339760v1_genomic:0.004518743)0.000:0.000000005)0.748:0.000000005);

Core_2:
(GCF_013315005_1_ASM1331500v1_genomic:0.029801777,((GCF_002861945_1_ASM286194v1_genomic:0.000000005,GCF_013315085_1_ASM1331508v1_genomic:0.000031313)1.000:0.021319502,(GCF_000159155_2_ASM15915v2_genomic:0.000031236,(GCF_000178355_1_ASM17835v1_genomic:0.000438654,(GCF_001042655_1_ASM104265v1_genomic:0.000062623,GCF_900637625_1_52295_C01_genomic:0.000031311)0.387:0.000000005)0.928:0.000094032)1.000:0.021525169)1.000:0.008877453,((((GCF_003585655_1_ASM358565v1_genomic:0.037400199,(((GCF_001563665_1_ASM156366v1_genomic:0.652791246,(((GCF_002884775_1_ASM288477v1_genomic:0.079035837,GCF_003408775_1_ASM340877v1_genomic:0.090527862)1.000:0.099122682,((leopoldii_GCF_003293675_1_ASM329367v1_genomic:0.010984029,(GCF_003397635_1_ASM339763v1_genomic:0.010125736,((GCF_002861125_1_ASM286112v1_genomic:0.000000005,GCF_002861145_1_ASM286114v1_genomic:0.000031301)1.000:0.009693544,((GCF_013315125_1_ASM1331512v1_genomic:0.0,GCF_013315255_1_ASM1331525v1_genomic:0.0):0.012966743,GCF_013315195_1_ASM1331519v1_genomic:0.013648178)1.000:0.005988651)1.000:0.004698554)0.990:0.005500145)1.000:0.069052794,((GCF_002884855_1_ASM288485v1_genomic:0.000062668,GCF_002884875_1_ASM288487v1_genomic:0.000000005)1.000:0.025579950,(((GCF_013315145_1_ASM1331514v1_genomic:0.019646990,swidsinskii_GCF_003397705_1_ASM339770v1_genomic:0.029512978)1.000:0.013756732,GCF_003408815_1_ASM340881v1_genomic:0.023520587)0.995:0.005875475,(GCF_003397745_1_ASM339774v1_genomic:0.030659080,(GCF_000025205_1_ASM2520v1_genomic:0.025213714,GCF_001953155_1_ASM195315v1_genomic:0.028312469)1.000:0.011831021)0.833:0.005198931)1.000:0.031300458)1.000:0.042156286)1.000:0.115485782)1.000:0.022843945,((GCF_001546485_1_ASM154648v1_genomic:0.017705376,(GCF_000414445_1_ASM41444v1_genomic:0.000119495,GCF_000414465_1_ASM41446v1_genomic:0.000288031)1.000:0.018254326)1.000:0.187323158,GCF_003408845_1_ASM340884v1_genomic:0.273480559)1.000:0.031516103)1.000:0.105287618)1.000:0.184239398,(GCF_001546455_1_ASM154645v1_genomic:0.158834848,((((GCF_000
414665_1_ASM41466v1_genomic:0.024400129,((GCF_002861905_1_ASM286190v1_genomic:0.019593979,GCF_013315115_1_ASM1331511v1_genomic:0.028631331)1.000:0.012662116,(GCF_000414585_1_ASM41458v1_genomic:0.000000005,GCF_000414605_1_ASM41460v1_genomic:0.000062682)1.000:0.024365016)0.997:0.009709274)1.000:0.017158240,(GCF_001546445_1_ASM154644v1_genomic:0.045083787,GCF_002861885_1_ASM286188v1_genomic:0.041601544)0.989:0.009811219)1.000:0.008444557,(GCF_000414625_1_ASM41462v1_genomic:0.042525597,GCF_003408835_1_ASM340883v1_genomic:0.049892270)0.999:0.010953734)1.000:0.036963531,((GCF_003426565_1_ASM342656v1_genomic:0.000997096,piotii_GCF_003397585_1_ASM339758v1_genomic:0.000227405)1.000:0.051433551,(GCF_000414545_1_ASM41454v1_genomic:0.049885899,((GCF_000414425_1_ASM41442v1_genomic:0.040267066,(GCF_000414485_1_ASM41448v1_genomic:0.041665521,GCF_000414505_1_ASM41450v1_genomic:0.029538413)1.000:0.011386125)0.275:0.004900178,(GCF_000414565_1_ASM41456v1_genomic:0.040629573,GCF_003397615_1_ASM339761v1_genomic:0.041114766)1.000:0.009480673)1.000:0.017839738)0.891:0.011092271)1.000:0.017665095)1.000:0.041690935)1.000:0.103183064)1.000:0.070463199,(GCF_000414705_1_ASM41470v1_genomic:0.049521408,(GCF_000414525_1_ASM41452v1_genomic:0.080724487,GCF_003408785_1_ASM340878v1_genomic:0.061182628)1.000:0.024820373)1.000:0.021054869)1.000:0.033807240)1.000:0.018980712,(GCF_000414645_1_ASM41464v1_genomic:0.027907653,GCF_013315075_1_ASM1331507v1_genomic:0.026610409)0.984:0.006296718)1.000:0.007287308,((((GCF_002862005_1_ASM286200v1_genomic:0.0,GCF_002862015_1_ASM286201v1_genomic:0.0):0.031702636,(GCF_003397605_1_ASM339760v1_genomic:0.018691990,GCF_013315045_1_ASM1331504v1_genomic:0.028742078)1.000:0.008607032)1.000:0.004053679,(GCF_003397665_1_ASM339766v1_genomic:0.023462907,(GCF_001660735_1_ASM166073v1_genomic:0.013671697,(GCF_003585755_1_ASM358575v1_genomic:0.022805192,(GCF_001660755_1_ASM166075v1_genomic:0.000031311,GCF_002861165_1_ASM286116v1_genomic:0.000000005)1.000:0.012753268)1.000:0.012810
342)1.000:0.007322703)1.000:0.004506398)1.000:0.007041812,((GCF_002861965_1_ASM286196v1_genomic:0.0,GCF_002861975_1_ASM286197v1_genomic:0.0):0.027555977,GCF_003812765_1_ASM381276v1_genomic:0.015476481)1.000:0.005316359)0.983:0.002876082)0.313:0.002898759,((GCF_000414685_1_ASM41468v1_genomic:0.019432538,GCF_003408745_1_ASM340874v1_genomic:0.025614830)1.000:0.007383498,(GCF_000213955_1_ASM21395v1_genomic:0.021663753,GCF_013315025_1_ASM1331502v1_genomic:0.021564210)0.690:0.005300605)0.993:0.003197446)0.996:0.003120071);

The Treedist distance matrix output of Generalized RF and Nye et al. methods are zero

Dear @ms609,

Thank you again for the detailed manual and explanation of the methods! I do have a question on ClusteringInfoDistance() function and NyeSimilarity() functions.

I am running the following functions-

for distance matrix-

tree1<-read.tree(file="hosttree-d__Bacteria_p__Desulfobacterota_COG0215_tips_1.nwk")
tree2<-read.tree(file="symbionttree-d__Bacteria_p__Desulfobacterota_COG0215_tips_1.nwk")
tree1<-unroot(tree1)

#GRF
dist_rf <- ClusteringInfoDistance(tree1, tree2, normalize = TRUE)

#Nye
dist_ny <- NyeSimilarity(tree1, tree2, normalize = TRUE ,similarity = FALSE)

for p-values-

#GRF
nRep <- 100000 # Use more replicates for more accurate estimate of expected value
randomTrees <- lapply(logical(nRep), function (x) RandomTree(tree1$tip.label))
randomDists <- ClusteringInfoDistance(tree1, randomTrees, normalize = TRUE)
expectedCID <- mean(randomDists)

dist12 <- ClusteringInfoDistance(tree1, tree2, normalize = TRUE)
# Now count the number of random trees that are this similar to tree1
nThisSimilar <- sum(randomDists < dist12)
pValue <- nThisSimilar / nRep

#Nye-
nRep <- 100000 # Use more replicates for more accurate estimate of expected value
randomTrees <- lapply(logical(nRep), function (x) RandomTree(tree1$tip.label))
randomDists <- NyeSimilarity(tree1, randomTrees, normalize = TRUE,similarity = FALSE)
expectedCID <- mean(randomDists)


dist12 <- NyeSimilarity(tree1, tree2, normalize = TRUE,similarity = FALSE)
# Now count the number of random trees that are this similar to tree1
nThisSimilar <- sum(randomDists < dist12)
pValue2 <- nThisSimilar / nRep

I am getting a zero distance matrix and p-value outputs for the trees attached.
Tree1-https://github.com/Jigyasa3/errors/blob/master/hosttree-d__Bacteria_p__Desulfobacterota_COG0215_tips_1.nwk and Tree2- https://github.com/Jigyasa3/errors/blob/master/symbionttree-d__Bacteria_p__Desulfobacterota_COG0215_tips_1.nwk.
The two trees are not completely identical to each other, yet the value of the distance matrix is 0. Why do you think that's happening?

Looking forward to your reply!

When no. of tips in species tree and gene tree don't match. Adding new tips to the species tree

Hi @ms609 ,

Thanks again for a great package! I am trying to run TreeDist on species tree-gene tree pair where there are multiple no. of tips in the gene tree per species.

I found the add_host_tips.R script to help add new tips of zero branch length to the species tree.
But it only seems to work for a specific version of the species tree (generated from BEAST2 output) but not from IQTREE or FASTTREE. For the IQTREE and FASTTREE versions of the species tree, the script generates a tree with all branch lengths equal to zero.

I was wondering if there is a workaround for this problem in TreeDist? Will it work if the no. of tips in two trees are unequal?

Regards,
Jigyasa

Error with `VisualizeMatching`

Hey Martin, thanks for building this amazing tool. I have been getting the following error when running VisualizeMatching. I tried running VisualizeMatching with the test data you use in the manual and it's running well so I know that the code is working. However, I'm unable to identify why my data is causing this error. Other functions (e.g. TreeDistance) seem to be working. Any idea what I could be missing?

Error in if (any(DF[A, ] != DF[B, ])) { :
missing value where TRUE/FALSE needed

Thanks

TreeDistData

All references to TreeDistData have been removed for initial CRAN submission, to avoid mutual dependency. Once package available on CRAN, restore missing vignettes.

NNI dist performance

cpp_robinson_foulds_matching seems to be written with a view to insertion into cpp_nni_distance, where we could also replace cpp_edge_to_splits with a more streamlined bespoke function giving us the minimum required to calculate the matching.

MapTrees() updates

  • SpectralClustering() → SpectralEigens()

Mappings:

  • Add t-SNE mapping
  • Add CCA mapping?

Clustering

  • Option for clustering cutoff
    • Default: Only show 'reasonable' clusters?
    • ? Include 'reasonable' clustering in default tree set?

Internalize SPR.dist

Currently calls phangorn's SPR.dist. We can improve performance and stability, and drop import of phangorn, by moving more of this from R to C.

thinnedTrees() with multiple tree batches

in app.R, thinnedTrees() is naively defined as as.integer(seq(keptRange()[1], keptRange()[2], by = 2 ^ input$thinTrees))

This assumes that trees have been loaded from a single file. Otherwise the numbers are garbage.

It'd be nice if this didn't have to be the case. Failing that, we should disable the option if it's not relevant.
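
One possible approach is to thin each batch separately and then shift the results by the batch's offset in the combined tree list. ThinBatches() and its arguments are hypothetical; the sketch assumes the size and kept range of every batch are known:

```r
# Hypothetical helper: thin each batch independently, returning indices
# into the concatenated list of trees.
#   batchSizes: number of trees loaded in each batch
#   keptRanges: list of c(first, last) per batch, counted within that batch
#   thin:       keep every (2 ^ thin)-th tree, as in the existing slider
ThinBatches <- function(batchSizes, keptRanges, thin) {
  offsets <- cumsum(c(0, head(batchSizes, -1)))
  unlist(lapply(seq_along(batchSizes), function(i) {
    within <- seq(keptRanges[[i]][1], keptRanges[[i]][2], by = 2 ^ thin)
    as.integer(offsets[i] + within)
  }))
}

kept <- ThinBatches(batchSizes = c(10, 6),
                    keptRanges = list(c(1, 10), c(3, 6)),
                    thin = 1)
print(kept)  # 1 3 5 7 9 13 15
```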

Dimension goodness plotter fails with batches

Replicate in MapTrees() GUI by:

  • Selecting a file, and subsampling two batches of trees (I used best.tr from ms609/lobo)
  • Switching to display tab

I see

ncol(distReference) == ncol(distLowDin) is not TRUE

This seems to have disappeared on repeat – perhaps because other packages (TreeSearch?) were subsequently installed?

Match identical splits first

Reduce scale of LAPJV problem by matching identical splits (i.e. run RF first), and only using LAPJV to match non-identicals.
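
A rough illustration of the potential saving, using character keys as stand-ins for splits. Whether fixing identical splits in advance preserves the optimal score for every metric is an assumption that would need checking; the sketch only shows how much smaller the remaining problem becomes:

```r
# Canonical split keys for two eight-leaf trees (illustrative data)
keys1 <- c("ab|cdefgh", "cd|abefgh", "abcd|efgh", "ef|abcdgh", "gh|abcdef")
keys2 <- c("ab|cdefgh", "abc|defgh", "abcd|efgh", "eg|abcdfh", "fh|abcdeg")

# Step 1: pair off splits that are identical in both trees (the RF step)
exact <- intersect(keys1, keys2)

# Step 2: only the leftovers need to go to the assignment solver
rest1 <- setdiff(keys1, exact)
rest2 <- setdiff(keys2, exact)

problemSize <- c(full = length(keys1) * length(keys2),
                 reduced = length(rest1) * length(rest2))
print(exact)
print(problemSize)  # 25 cells in the full cost matrix, 9 in the reduced one
```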

Visualize MST stress

It would be great if it were possible to colour each edge of the plotted MST according to its stress,
i.e. log(mapped length / original length). Would have to use a diverging palette with the zero point set to the average.
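
A base-R sketch of the calculation follows. Random points stand in for trees, PrimMST() is a small hypothetical helper written for this example, and the "Blue-Red" palette and eleven colour bins are arbitrary choices:

```r
# Hypothetical helper: edges of a minimum spanning tree (Prim's algorithm)
PrimMST <- function(d) {
  n <- nrow(d)
  inTree <- c(TRUE, rep(FALSE, n - 1))
  edges <- matrix(0L, n - 1, 2)
  for (i in seq_len(n - 1)) {
    sub <- d[inTree, !inTree, drop = FALSE]
    best <- which(sub == min(sub), arr.ind = TRUE)[1, ]
    to <- which(!inTree)[best[2]]
    edges[i, ] <- c(which(inTree)[best[1]], to)
    inTree[to] <- TRUE
  }
  edges
}

set.seed(1)
points <- matrix(rnorm(40), ncol = 4)  # ten points standing in for trees
original <- as.matrix(dist(points))    # original distances
mapped <- as.matrix(dist(cmdscale(original, k = 2)))  # distances once mapped

edges <- PrimMST(original)
stress <- log(mapped[edges] / original[edges])

# Diverging palette whose midpoint corresponds to the average stress
centred <- stress - mean(stress)
limit <- max(abs(centred))
nCol <- 11
breaks <- seq(-limit, limit, length.out = nCol + 1)
bin <- findInterval(centred, breaks, all.inside = TRUE)
edgeCol <- hcl.colors(nCol, "Blue-Red")[bin]

plot(cmdscale(original, k = 2), asp = 1, pch = 16, xlab = "", ylab = "")
coords <- cmdscale(original, k = 2)
segments(coords[edges[, 1], 1], coords[edges[, 1], 2],
         coords[edges[, 2], 1], coords[edges[, 2], 2], col = edgeCol, lwd = 2)
```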
