Giter Club home page Giter Club logo

tree-qmc's Introduction

TREE-QMC

install with bioconda

TREE-QMC is a quartet-based method for estimating species trees directly from gene trees or characters, like the popular methods ASTRAL and Weighted ASTRAL/ASTER.

However, TREE-QMC uses a different algorithmic approach than ASTRAL/ASTER. To learn more about the TREE-QMC algorithm, check out Han & Molloy, Genome Res, 2023. It also has some convenient features.

Tutorials

Check out the tutorial for gene trees and the tutorial for character matrices.

Acknowledgements

TREE-QMC is based on the Quartet Max Cut (QMC) framework introduced by Sagi Snir and Satish Rao; see Snir & Rao, IEEE/ACM TCBB, 2010 and Avni, Cohen & Snir, Syst Biol, 2015. TREE-QMC contributes fast algorithms for constructing the quartet graph directly from the input trees, rather than explicitly enumerating all quartets or sampling quartets.

TREE-QMC now implements efficient (and brute force) algorithms for the quartet weighting schemes introduced by Chao Zhang and Siavash Mirarab; see Zhang & Mirarab, Mol Biol Evol, 2022. It also implements quartet support with the approach of Sayyari & Mirarab, Mol Biol Evol, 2016.

TREE-QMC uses MQLib for its max cut heuristic; see Dunning, Gupta, & Silberholz, INFORMS Journal on Computing, 2018.

TREE-QMC uses toms743 for its Lambert's W approximation; see Fritsch, Shafer & Crowley, Communications of the ACM, 1973 and Barry, Barry & Culligan-Hensley, ACM Transactions on Mathematical Software, 1995.

Build

To build TREE-QMC, use commands:

git clone https://github.com/molloy-lab/TREE-QMC.git
cd TREE-QMC/external/MQLib
make
cd ../..
g++ -std=c++11 -O2 \
    -I external/MQLib/include -I external/toms743 \
    -o tree-qmc \
    src/*.cpp external/toms743/toms743.cpp \
    external/MQLib/bin/MQLib.a -lm \
    -DVERSION=\"$(cat version.txt)\"

Usage

To run TREE-QMC, use command:

./tree-qmc -i <input file>

Tips

#1. For convenience, add the directory containing the tree-qmc binary to your PATH variable so that you can type tree-qmc instead of <path to tree-qmc binary>tree-qmc. For bash, this can be done by adding

export PATH=$PATH:"path to tree-qmc binary"

to ~/.bash_profile file (or ~/.bashrc file).

#2. If you are having strange problems, try removing the \r characters from the input files and trying again:

cat file.txt | tr -d '\r' > newfile.txt

Options

To see the TREE-QMC usage options, use command:

./tree-qmc -h

The output help message should be

=================================== TREE-QMC ===================================
BASIC USAGE:
tree-qmc (-i|--input) <input file>

**If the directory containing the tree-qmc binary is not part of $PATH, replace
eplace tree-qmc with <path to tree-qmc binary>/tree-qmc in the command above**

Help Options:
[-h|--help]
        Prints this help message.

Input Options:
(-i|--input) <input file>
        File with gene trees in newick format (required)
[(--chars)]
        Input are characters rather than trees
        Missing states are N, -, and ?
[(--bp)]
        Input are binary characters i.e. bipartitions
        Missing states are N, -, and ?
[(-a|-mapping) <mapping file>]
        File with individual/leaf names (1st col) mapped to species (2nd col)
[(--root) <list of species separated by commas>]
        Root species tree at given species if possible
[(--rootonly) <species tree file>]
        Root species tree in file and then exit
[(--supportonly) <species tree file>]
        Compute quartet support for species tree in file and then exit

[(--pcsonly) <species tree file>]
        Compute partitioned coalescent support (PCS) for specified branch in
        species tree in file (anotate branch with PCS) and then exit

Output Options:
[(-o|--output) <output file>]
        File for writing output species tree (default: stdout)
[(--support)]
        Compute quartet support for output species tree
[(--writetable) <table file>]
        Write branch and quartet support information to CSV
[(--char2tree)]
        Write character matrix as trees (newick strings) to output and exit

Algorithm Options:
[(--hybrid)]
        Use hybrid weighting scheme (-w h)
[(--fast)]
        Use fast algorithm that does not support weights or polytomies (-w f)
[(-B|--bayes)]
       Use presets for bayesian support (-n 0.333 -x 1.0 -d 0.333)
[(-L|--lrt)]
       Use presets for likelihood support (-n 0.0 -x 1.0 -d 0.0)
[(-S|--bootstrap)]
       Use presets for boostrap support (-n 0 -x 100 -d 0)
[(-c|--contract) <float>]
       Contract internal branches with support less than specified threshold
       after mapping suport to the interval 0 to 1

Advanced Options:
[(-w|--weight) <character>]
        Weighting scheme for quartets; see paper for details
        -w n: none (default)
        -w h: hybrid of support and length (recommended)
        -w s: support only
        -w l: length only
        -w f: none fast
              Refines polytomies arbitrarily so faster algorithm can be used
[(-n|--min) <float>]
        Minimum value of input branch support
[(-x|--max) <float>]
        Maximum value of input branch support
[(-d|--default) <float>]
        Default branch support to use if branch input tree is missing support
        Default branch length is 0.0
[(--norm_atax) <integer>]
        Normalization scheme for artificial taxa; see paper for details
        --norm_atax 0: none
        --norm_atax 1: uniform
        --norm_atax 2: non-uniform (default)
[(-e|--execution) <execution mode>]
        -e 0: run efficient algorithm (default)
        -e 1: run brute force algorithm for testing
        -e 2: compute weighted quartets, then exit
        -e 3: compute good and bad edges, then exit
[(-v|--verbose) <verbose mode>]
        -v 0: write no subproblem information (default)
        -v 1: write CSV with subproblem information (subproblem ID, parent
              problem ID, depth of recursion, total # of taxa, # of artifical
              taxa, species names)
        -v 2: write CSV with subproblem information (info from v1 plus # of
              of elements in f, # of pruned elements in f, # of zeroes in f)
[(--polyseed) <integer>]
        Seeds random number generator with prior to arbitrarily resolving
        polytomies. If seed is set to -1, system time is used;
        otherwise, seed should be positive (default: 12345).
[(--maxcutseed) <integer>]
        Seeds random number generator prior to calling the max cut heuristic
        but after the preprocessing phase. If seed is set to -1, system time
        is used; otherwise, seed should be positive (default: 1).
[--shared <use shared taxon data structure to normalize quartet weights>]
        Do NOT use unless there are NO missing data!!!

Contact: Post issue to Github (https://github.com/molloy-lab/TREE-QMC/)
        or email Yunheng Han ([email protected]) & Erin Molloy ([email protected])

If you use TREE-QMC in your work, please cite:
  Han and Molloy, 2024, https://github.com/molloy-lab/TREE-QMC.

  and

  Han and Molloy, 2023, Improving quartet graph construction for scalable
  and accurate species tree estimation from gene trees, Genome Research,
  http:doi.org/10.1101/gr.277629.122.
================================================================================

tree-qmc's People

Contributors

ekmolloy avatar yhhan19 avatar

Watchers

 avatar

tree-qmc's Issues

scoring an existing tree

Hello,

Firstly, thank you very much for developing this powerful software, and congratulations to the TREE-QMC team!

I may be missing something about the current functions, but I was curious if a function could be implemented to score an existing tree (analogous to the -q option in ASTRAL).

Thanks again for developing TREE-QMC!

best,

Jacob

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.