treetools's Introduction

TreeTools

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

'TreeTools' is an R package that provides efficient implementations of functions for the creation, modification and analysis of phylogenetic trees.

Applications include: generation of trees with specified shapes; analysis of tree shape; rooting of trees and extraction of subtrees; calculation and depiction of node support; calculation of ancestor-descendant relationships; import and export of trees from Newick, Nexus and TNT formats; and analysis of partitions and cladistic information.

It complements packages such as 'ape', 'phangorn' and 'phytools', aiming for efficient and robust implementations of functions, typically applied to unweighted trees (i.e. those without edge lengths).

Installation

Install and load the library from CRAN as follows:

install.packages("TreeTools")
library("TreeTools")

Install the very latest version, which may be under development, with:

if (!require("devtools")) install.packages("devtools")
devtools::install_github("ms609/TreeTools")

Please note that the 'TreeTools' project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

treetools's Issues

Error: This many leaves cannot be supported

Hi,
I am trying to measure the distance between two trees, and I get this error message:
> TreeDistance(t1, t2)
Error: This many leaves cannot be supported. Please contact the TreeTools maintainer if you need to use more!
I tried decreasing the number of tips to 4096 (as mentioned in part of the TreeDist manual), but I still get this error. Is there a workaround, and how many tips are allowed by default? I cannot find this in the documentation.
Thank you!

Implement sort.multiPhylo

Sorting trees into a consistent (and logical?) order will make it easier to view differences in lists of trees.

Using WriteTntCharacters() with a continuous matrix

I tried to export a TNT version of a continuous phylogenetic matrix using this function, but the resulting characters aren't separated by anything, so TNT doesn't interpret them correctly.

Here is an example of the output (68 continuous characters for this taxon):

taxa_a 1.7380.9960.1270.1570.5240.3030.880.1860.0890.8420.1030.0510.8820.1230.0570.7780.0020.0460.7380.0110.1230.8350.2220.0320.3380.5290.5730.5750.030.4470.3160.6020.5110.231.8180.9950.0970.130.4340.1590.94100.0970.8270.1760.2390.8270.2110.330.7890.2960.3270.7510.4250.2780.8810.1880.2670.3660.4450.5980.36800.4850.3050.6340.5080.271

I initially tried reading the dataset in using ReadCharacters(), and also tried using as.matrix() before calling WriteTntCharacters(), to no avail. Both methods read the characters in correctly. I hope I didn't miss a simple fix.

J
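
For illustration, the sketch below shows one way to lay out continuous values with whitespace between cells, using base R only. It does not call or modify WriteTntCharacters(); the matrix `continuous`, the taxon names and the use of "?" for missing values are assumptions made for the example, not part of the report above.

```r
# Hypothetical continuous matrix: one row per taxon, one column per character
continuous <- matrix(
  c(1.738, 0.996, 0.127,
    0.88,  NA,    0.303),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("taxa_a", "taxa_b"), NULL)
)

# Convert each cell to text while keeping the matrix shape
cells <- continuous
storage.mode(cells) <- "character"
cells[is.na(cells)] <- "?"  # assumed token for missing data

# One line per taxon: the name, then space-separated values
rows <- paste(rownames(cells), apply(cells, 1, paste, collapse = " "))
writeLines(rows)
```

Separating cells explicitly keeps multi-digit values such as 1.738 and 0.996 from running together.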

Merge `AllDescendantEdges()` with `DescendantEdges()`

  • Move edge parameter after parent & child in DescendantEdges call [breaking change?]
  • If edge = NULL, in DescendantEdges(), call AllDescendantEdges()
  • Make AllDescendantEdges() internal
  • Stop exporting AllDescendantEdges() (and move .AllDescendantEdges() into DescendantEdges())

Unsupported TNT file

Dear Martin,

I'm trying to read the TNT matrix from Mirande 2008 (Appendix S5 file characidae.tnt) using ReadTntCharacters() and I'm getting the following error:

Error in toupper (lines): invalid multibyte string 3842
In addition: Warning messages:
1: In grep ("'", lines, fixed = TRUE):
   input string 3857 is invalid at that locale
2: In grep (";", lines, fixed = TRUE):
   input string 3842 is invalid at that locale

Do you have any idea of what is happening? Can it be a problem of encoding? Is there a way to control it?

Best,

Sara
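
If encoding is the cause, one way to check is to test whether the file's lines are valid UTF-8 and, if not, re-encode them before parsing. The sketch below uses base R only; the sample file content and the guess that the source encoding is Latin-1 are assumptions made for illustration, and the real file may use a different encoding.

```r
# Build a small file containing a Latin-1 byte (0xE9) that is not valid UTF-8
path <- tempfile(fileext = ".tnt")
writeBin(
  c(charToRaw("'Caf"), as.raw(0xE9), charToRaw(" dataset'\n")),
  path
)

raw_lines <- readLines(path, warn = FALSE)
validUTF8(raw_lines)   # flags lines that would upset toupper() / grep()

# Re-encode, assuming (hypothetically) that the file was saved as Latin-1
utf8_lines <- iconv(raw_lines, from = "latin1", to = "UTF-8")
validUTF8(utf8_lines)

# The converted lines could then be written to a new file and read from there
fixed_path <- tempfile(fileext = ".tnt")
writeLines(utf8_lines, fixed_path, useBytes = TRUE)
```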

`ClusterTable` memory requirements

Running consensus_info() with 36000 trees requires a vector of 36000 ClusterTables, which requires more memory than is available.

Can we reduce the memory requirement of a ClusterTable?
(Perhaps we need to operate on the heap rather than the stack?)

DropTip will not remove tip on tree

I cannot get DropTip to remove a tip. What am I doing wrong? Tried with several trees. Here is an example:

library(phytools)
library(ggtree)
library(TreeTools)

tree2<-pbtree(n=5)
plotTree(tree2)
DropTip(tree2,'t4',preorder = TRUE,check=TRUE)
plotTree(tree2)

The tree does not change.

Thanks.
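
A point worth checking in the example above: R functions generally return a modified copy rather than changing their argument in place, so the result of DropTip() has to be assigned to a variable before it is plotted. A minimal sketch, assuming TreeTools is installed and using BalancedTree() in place of phytools::pbtree() so that the tip labels are predictable:

```r
library("TreeTools")

tree <- BalancedTree(5)         # five tips, labelled t1 to t5
pruned <- DropTip(tree, "t4")   # the pruned tree is the return value

NTip(tree)                      # the original tree is unchanged
NTip(pruned)                    # the returned tree has one tip fewer
"t4" %in% TipLabels(pruned)
```

Plotting `pruned` instead of `tree` should then show the tree without t4.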

Quality of a dataset

Haag et al. measure the ruggedness of a tree landscape by training a regression model (trained on molecular datasets, implemented in C) based on:

  • Unique topologies after 100 parsimony searches: 42.9 %
  • RF-Distance between parsimony trees: 33.2 %
  • Entropy (Average Shannon entropy per column): 17.0 %
  • Patterns (unique columns)-over-taxa: 13.6 %
  • % Gaps: 2.5 %
  • Bollback: 2.3 %
  • Sites (n columns)-over-taxa: 1.5 %
  • % Invariant columns: 0.6 %
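
As a rough illustration of one of these features, the average Shannon entropy per column can be sketched in base R as below. The toy alignment, the function name `ColumnEntropy`, the use of log base 2 and the treatment of every symbol as an ordinary state are all assumptions; Haag et al. may handle gaps and ambiguity codes differently.

```r
# Shannon entropy (in bits) of the state frequencies in one column
ColumnEntropy <- function(column) {
  p <- table(column) / length(column)
  -sum(p * log2(p))
}

# Toy alignment: four taxa (rows) by three sites (columns)
alignment <- matrix(
  c("A", "A", "A", "A",   # invariant site
    "A", "C", "A", "C",   # two equally frequent states
    "A", "C", "G", "T"),  # four equally frequent states
  nrow = 4
)

entropies <- apply(alignment, 2, ColumnEntropy)
meanEntropy <- mean(entropies)
```

Here the three columns have entropies of 0, 1 and 2 bits, so the average is 1 bit.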

Custom directory for caching

I want to use TreeTools inside a Docker (in fact, Singularity) container, and it seems that TreeTools uses /home for caching results, which is a bottleneck in my application. It would be better if I could either (1) tell TreeTools not to use a cache directory at all, or (2) set the cache directory manually to an arbitrary directory. Is this possible with TreeTools at the moment? If not, would it be relatively simple to implement? Many thanks.

Error parsing Nexus file

Hi there Martin,

I'm trying to use ReadCharacters() to read in a continuous dataset that I am analysing in various ways.

Unfortunately, the decimal values are parsed symbol by symbol as separate characters; e.g. 0.15 is read in as '0' '.' '1' '5'.

I had a look at the GitHub files and couldn't see any mention of continuous characters, though I may have missed that part. Is there a way to read them in at all?

I have attached the dataset in case it is of any use, but had to convert it to .txt instead of .nex:
koch_raw_MASTER.txt

Thanks for your time
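
To make the desired behaviour concrete, the sketch below splits a data row on whitespace rather than on individual symbols, using base R only. It is not how ReadCharacters() works; the sample row and the use of "?" for a missing value are assumptions made for illustration.

```r
# Hypothetical row: taxon name followed by whitespace-separated values
dataRow <- "taxa_a 0.15 1.738 ?"

tokens <- strsplit(trimws(dataRow), "[[:space:]]+")[[1]]
taxon <- tokens[1]

# Non-numeric tokens such as "?" become NA
values <- suppressWarnings(as.numeric(tokens[-1]))
```

With this approach 0.15 is kept as a single numeric value instead of four one-symbol tokens.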

Will not install on Rstudio

*** arch - i386
/mingw32/bin/g++ -std=gnu++17 -I"C:/Users/THEODO1.ALL/R-411.1/include" -DNDEBUG -I'C:/Users/Theodore.Allnutt/Rlibs/Rcpp/include' -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c ClusterTable.cpp -o ClusterTable.o
In file included from ClusterTable.cpp:1:
../inst/include/TreeTools/ClusterTable.h:6:10: fatal error: Rcpp/Lightest: No such file or directory
#include <Rcpp/Lightest>
^~~~~~~~~~~~~~~
compilation terminated.
make: *** [C:/Users/THEODO1.ALL/R-411.1/etc/i386/Makeconf:245: ClusterTable.o] Error 1
ERROR: compilation failed for package 'TreeTools'

* removing 'C:/Users/Theodore.Allnutt/Rlibs/TreeTools'
Warning in install.packages :
  installation of package ‘TreeTools’ had non-zero exit status

NexusTokens() shiny interaction

NexusTokens() calls shiny::updateNumericInput("character_num"). This does not strike me as the most appropriate approach: shouldn't the caller be able to specify this? Attempt to remove the call, which would allow shiny to be removed from the Suggests: field of DESCRIPTION.

"Preorder" classification

With #92, DropTip() now returns edges numbered in preorder, but not conforming to the additional requirements of Preorder(). Is it correct to consider this "cladewise"?

We probably need a function that tests whether a tree is in strict TreeTools preorder, or whether it is just in preorder; some functions may only require the latter, saving the time spent on unnecessary renumbering. We should audit the code so that we only request what we require. Might this necessitate a flag, as with postorder, to indicate whether TreeTools conventions are followed?

@DSRovinsky wishlist

Support stats like:
[ ] Bremer/branch supports
[ ] CI/RI
[ ] bootstrapping

[ ] Ability to 'force' a tip into a clade & run a Templeton test for alternative hypotheses

Deprecate `in.Splits()`

Included at present as alias for %in%.Splits().

  • Replace in 'Quartet' on CRAN
    • Update code
    • Release on CRAN
  • Deprecate
    • Add warning (Done in 1.7.3, Jul 2022)
    • Remove from exports (Done in 1.10.0, Aug 2023)
    • Delete function

demo()

Consider which functions / function suites could be documented using the demo() functionality.

(Other packages may benefit too.)

Random Trees don't match balance

Both these tests fail, in opposite directions:

expect_equal(
  mean(replicate(100, TotalCopheneticIndex(RandomTree(10, root = TRUE)))), # ~90
  TCIContext(10)$uniform.expected, # 76
  tolerance = 0.1
)
expect_equal(
  mean(replicate(100, TotalCopheneticIndex(ape::rtree(10, root = TRUE, equiprob = TRUE, br = NULL)))), # ~50
  TCIContext(10)$uniform.expected, # 76
  tolerance = 0.1
)

Unsupported NEXUS file

Please attach the problematic NEXUS file and describe the issue

I am new to R and am not able to read the NEXUS file.

TNT multiline support

TNT can parse files with arbitrary line break positions; see dromaeodat.tnt in SI of
https://doi.org/10.1016/j.cub.2020.06.105

Work started on branch parse-tnt.

Unsolved problem: How to identify taxon names given that

  • Taxon names contain numbers in any position, and may contain exclusively digits; and
  • Character data may be interleaved

`%in%.Splits()` drops names

library("ape") # read.tree()
library("TreeTools") # as.Splits()
tree1 <- read.tree(text = "(a,b,(c,(d,e)));")
tree2 <- read.tree(text = "(a,b,c,(d,e));")
splits1 <- as.Splits(tree1)
splits2 <- as.Splits(tree2)
splits1in2 <- splits1 %in% splits2
names(splits1in2) # should equal names(splits1); instead NULL

Support edge lengths in `UnrootTree()`

I was wondering what exactly happens to the tree in the background when it is unrooted.
I need to run some phylogenetic alpha diversity analyses on phyloseq objects with rooted trees. I need to unroot the trees first, before running the code below, as some posts advise this in order to avoid random rooting of the tree.
Does unrooting change the tree in any other way?

### ROOTING the tree more appropriately ####
pick_new_outgroup <- function(tree.unrooted){
  require("magrittr")
  require("data.table")
  require("ape") # ape::Ntip
  # tablify parts of tree that we need.
  treeDT <- 
    cbind(
      data.table(tree.unrooted$edge),
      data.table(length = tree.unrooted$edge.length)
    )[1:Ntip(tree.unrooted)] %>% 
    cbind(data.table(id = tree.unrooted$tip.label))
  # Take the longest terminal branch as outgroup
  new.outgroup <- treeDT[which.max(length)]$id
  return(new.outgroup)
}

new.outgroup <- pick_new_outgroup(tree.unrooted)
# > new.outgroup
# [1] "ASV679"

rootedTree <- ape::root(phy_tree(ps.3), outgroup = new.outgroup, resolve.root = TRUE)

Why am I asking? Because I have been working with the code above without any errors or warnings,
but when I unroot the tree using UnrootTree() I get this message:

> new.outgroup <- pick_new_outgroup(unrooted.tree)
Warning message:
In as.data.table.list(x, keep.rownames = keep.rownames, check.names = check.names,  :
  Item 1 has 3839 rows but longest item has 3840; recycled with remainder.

Thanks
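
The helper above selects the first Ntip rows of the edge table, which relies on the terminal edges being listed first. As a sketch only (it is not a diagnosis of the warning, nor the maintainer's answer), the longest terminal branch can instead be found by testing which edges end at a tip. The example tree and the variable names `isTerminal` and `terminalLength` are assumptions made for illustration.

```r
library("ape")

# Hypothetical unrooted tree with edge lengths
tree.unrooted <- read.tree(text = "(a:1,b:5,(c:2,d:3):1);")

# In a phylo object, tips are numbered 1..Ntip, so an edge is terminal
# when its child node number does not exceed Ntip
isTerminal <- tree.unrooted$edge[, 2] <= Ntip(tree.unrooted)

# Ignore internal edges when looking for the longest branch
terminalLength <- ifelse(isTerminal, tree.unrooted$edge.length, -Inf)
longest <- which.max(terminalLength)

new.outgroup <- tree.unrooted$tip.label[tree.unrooted$edge[longest, 2]]
```

This version does not depend on the order in which edges are stored, but it still requires the tree to carry edge lengths.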

If tree has defined node.labels, AddTip() will not change them

Hi,

I'm not sure whether I'm using AddTip() incorrectly or whether there is an issue with the function. Reprex below:

# create a random tree
set.seed(0)
tree <- ape::rtree(10)
# define node labels and plot
tree$node.label <- paste0("Node_", 1:tree$Nnode)
plot(tree, show.node.label = TRUE)

# add a tip
tree2 <- TreeTools::AddTip(tree, where = "t8", label = "new", edgeLength = 0)
# plot, notice that: 1. internal node labels change, 2. "Node_1" is now recycled!
plot(tree2, show.node.label = TRUE)

# Also node.label does not change
tree2$node.label
#> [1] "Node_1" "Node_2" "Node_3" "Node_4" "Node_5" "Node_6" "Node_7" "Node_8"
#> [9] "Node_9"

Created on 2024-03-22 with reprex v2.1.0

This is a problem because tools that make use of the node.label element will produce all sorts of issues downstream.

Not sure this is a proper fix but adding something like this may work:

if ("node.label" %in% names(tree)) {
  tree$node.label <- paste0("Node_", 1:tree$Nnode)
}

The names of the internal nodes will still change but the redundancy will be eliminated.

Deprecate `NonDuplicateRoot()`

Behaviour is poorly defined, and necessity of function is questionable: if edge 1 must be a root edge, why not just ignore that edge, rather than its duplicate?

Only required by TreeSearch::SPR(), which will soon be ported into C. Once this is done, the function can be deprecated and removed.

Replace root_binary with root_on_node

In root_tree.h, check whether inline IntegerMatrix root_binary(const IntegerMatrix edge, const int outgroup) really outperforms root_on_node. If not, delete it.

ape memory allocation issues

Warning: Error in : cannot allocate vector of size 8.0 Gb in consensus() / dist.topo from call to ConsensusWithout() in app.R with lobo/best.tr trees

as.TreeNumber not identifying unique topologies

After going through the documentation, I was under the impression that as.TreeNumber would generate numbers for all the different unique topologies in a data set. Currently I am working with 299 trees consisting of seven tips (all trees have all tips). I can generate the output and save the scores for all 299 trees. When looking more closely at the trees with the same score, the topologies are in fact not the same. Am I misinterpreting the usage of this function or is there an error somewhere that is causing different topologies to be scored the same?

Here is the code that is being used:
trees <- read.tree(file="299_trees.tre")
trees

tips <- TipLabels(trees)
tips

possible <- as.TreeNumber(trees, nTip=7,tipLabels = tips)

possible
class(possible)

sink("output.txt",type=c("output"))

print(possible)
sink()

Thanks.
