q2-phylogeny's Issues

fasttree method can result in mismatched identifiers between input and output

If sequence identifiers in the input FASTA file contain underscores, FastTree (the program) doesn't quote those identifiers in its output by default. In Newick format, underscores in unquoted labels are interpreted as spaces, so when the Phylogeny artifact is viewed as an skbio.TreeNode, the tip identifiers contain spaces instead of underscores.

FastTree has a -quote parameter which may fix this issue. If not, the fasttree method should handle this.

This came up on the forum here.
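For reference, the Newick labelling rule behind this can be sketched in a few lines. This is a minimal illustration of the usual Newick convention (as followed by readers such as scikit-bio), not FastTree or q2-phylogeny code; `decode_newick_label` and `quote_newick_label` are hypothetical helpers.

```python
def decode_newick_label(raw: str) -> str:
    """Decode one Newick label under the usual convention.

    Unquoted labels have underscores turned into spaces; quoted labels
    are kept verbatim (a doubled '' inside quotes stands for one quote).
    """
    if len(raw) >= 2 and raw[0] == raw[-1] == "'":
        return raw[1:-1].replace("''", "'")
    return raw.replace("_", " ")


def quote_newick_label(name: str) -> str:
    """Quote a label so a Newick reader returns it unchanged."""
    return "'" + name.replace("'", "''") + "'"


# An ID written unquoted (FastTree's default) loses its underscores on read:
print(decode_newick_label("seq_1"))                      # seq 1
# Quoting the label on write preserves the original identifier:
print(decode_newick_label(quote_newick_label("seq_1")))  # seq_1
```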

Questionable values are being added to `name` attribute

Bug Description
Not sure if this is an skbio issue, but since I first noticed this problem after running the Moving Pictures qiime2 tutorial, I'll raise the issue here.

When investigating the actual TreeNode data structure, it looks like the name attribute is being populated with some float values.

>>> import qiime2
>>> from skbio import TreeNode
>>> art = qiime2.Artifact.load('rooted-tree.qza')
>>> tree = art.view(TreeNode)
>>> tree.children[1].children
[<TreeNode, name: 1830c14ead81ad012f1db0e12f8ab6a4, internal node count: 0, tips count: 0>,
 <TreeNode, name: 0.739, internal node count: 756, tips count: 759>]
>>> tree.children[1].children[1].length
0.257113916

It looks like these aren't branch lengths, so I'm guessing that they are confidence values.

Questions
Wouldn't it be more appropriate to assign these values to a confidence attribute instead of name? Shouldn't name be reserved for unique identifiers (e.g., taxonomy names)?
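The relabelling suggested here could look roughly like the sketch below. It assumes that a numeric label on an internal node is a support value, and it uses a hypothetical minimal `Node` class with a `support` field rather than skbio.TreeNode, so the class and field names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical minimal stand-in for a tree node (not skbio.TreeNode).
    name: Optional[str] = None
    length: Optional[float] = None
    support: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def move_support_values(node: Node) -> None:
    """Recursively move numeric internal-node labels into ``support``."""
    if node.children and node.name is not None:
        try:
            node.support = float(node.name)
        except ValueError:
            pass  # a genuine (non-numeric) name: leave it alone
        else:
            node.name = None
    for child in node.children:
        move_support_values(child)


tip = Node(name="1830c14ead81ad012f1db0e12f8ab6a4")
clade = Node(name="0.739", length=0.257113916,
             children=[Node(name="a"), Node(name="b")])
root = Node(children=[tip, clade])
move_support_values(root)
print(clade.name, clade.support, clade.length)  # None 0.739 0.257113916
```

Tips are never touched, so feature IDs that happen to look numeric keep their names.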

align_to_tree_mafft_fasttree with 'auto' for n_threads only uses 1 thread with FastTreeMP

Bug Description
If you specify 'auto' for the n_threads argument in the align_to_tree_mafft_fasttree function, q2-phylogeny sets the OMP_NUM_THREADS environment variable to 0. FastTreeMP issues a warning and then uses OMP_NUM_THREADS=1.

Steps to reproduce the behavior

  1. Use n_threads='auto' in align_to_tree_mafft_fasttree on a machine with multiple processors.
  2. Check the stderr/stdout messages from the pipeline:
Command: FastTreeMP -quote -nt /tmp/qiime2-archive-uwck28s2/c827dc99-ef57-4c83-abe4-42463c6c441b/data/aligned-dna-sequences.fasta

OMP: Warning #80: OMP_NUM_THREADS="0": value too small.
OMP: Info #104: OMP_NUM_THREADS value "1" will be used.
FastTree Version 2.1.10 Double precision (No SSE3), OpenMP (1 threads)

Expected behavior
The description of the 'auto' option in the docs says all available cores should be used, not just 1.

Computation Environment

  • OS: Tested on macOS 10.15 with a 1.4 GHz quad-core Intel Core i5 processor using QIIME 2 2021.4
  • Also tested on Linux (CentOS 7) with 16 processors using QIIME 2 2020.11

Comments

  1. The FastTreeMP docs say that you can unset OMP_NUM_THREADS, and then it will use all available cores.
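Following the FastTreeMP docs, one possible fix is to leave OMP_NUM_THREADS unset when 'auto' is requested instead of writing 0. A minimal sketch, assuming a hypothetical helper `fasttree_env` that builds the child process's environment (this is not the plugin's current code):

```python
import os
from typing import Mapping, Optional, Union


def fasttree_env(n_threads: Union[int, str],
                 base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Build the environment for a FastTreeMP subprocess (hypothetical helper).

    'auto' removes OMP_NUM_THREADS so OpenMP chooses its own default
    (per the FastTreeMP docs, all available cores) instead of the invalid 0.
    """
    env = dict(os.environ if base_env is None else base_env)
    if n_threads == "auto":
        env.pop("OMP_NUM_THREADS", None)
    else:
        n = int(n_threads)
        if n < 1:
            raise ValueError("n_threads must be a positive integer or 'auto'")
        env["OMP_NUM_THREADS"] = str(n)
    return env
```

The returned mapping would then be passed as `env=` to `subprocess.run`. Mapping 'auto' to `os.cpu_count()` would be an alternative, but unsetting the variable defers the choice to OpenMP, which is what the FastTreeMP docs describe.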

IMP: allow user to supply backbone tree for RAxML

Improvement Description
To reduce the time it takes to construct a phylogeny, it would help to let the user provide a constraint (backbone) tree via the -r flag, as outlined in this RAxML Tutorial. This enforces a topological constraint on the output tree: the search skips regions of tree space that violate the constraint topology. IQ-TREE also supports this.
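A rough sketch of how an optional constraint tree could be threaded into the command line, assuming RAxML v8's `raxmlHPC` binary, where `-r` takes a binary (backbone) constraint tree and `-g` a multifurcating one. `build_raxml_cmd`, the default seed, and the file names are hypothetical.

```python
from typing import List, Optional


def build_raxml_cmd(alignment: str, run_name: str, seed: int = 1723,
                    model: str = "GTRGAMMA",
                    constraint_tree: Optional[str] = None) -> List[str]:
    """Assemble a raxmlHPC argument list (hypothetical helper).

    When constraint_tree is given it is passed with -r, RAxML's
    binary (backbone) constraint-tree option.
    """
    cmd = ["raxmlHPC", "-s", alignment, "-n", run_name,
           "-m", model, "-p", str(seed)]
    if constraint_tree is not None:
        cmd += ["-r", constraint_tree]
    return cmd


print(" ".join(build_raxml_cmd("aligned.phy", "run1",
                               constraint_tree="backbone.tre")))
# raxmlHPC -s aligned.phy -n run1 -m GTRGAMMA -p 1723 -r backbone.tre
```

Keeping the constraint optional means existing calls produce the same command as before.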

References

  1. RAxML Tutorial
  2. The initial feature request is here.

add filter-table method

This method would remove features from a table whose IDs are not tips in a provided tree. It will be ported from q2-feature-table, where a more general filter method is being added.
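The core filtering step is small. A minimal sketch, assuming a hypothetical dict-of-dicts table (feature ID to per-sample counts) in place of a biom.Table, with tip IDs taken from the tree's tip names (for example from scikit-bio's `TreeNode.tips()`):

```python
from typing import Dict, Iterable

# Hypothetical toy representation: feature ID -> {sample ID: count}.
Table = Dict[str, Dict[str, int]]


def filter_table_to_tips(table: Table, tip_ids: Iterable[str]) -> Table:
    """Return a new table keeping only features that are tips in the tree.

    The input table is not modified.
    """
    tips = set(tip_ids)
    return {fid: dict(counts) for fid, counts in table.items() if fid in tips}


table = {"f1": {"s1": 5}, "f2": {"s1": 1, "s2": 3}, "f3": {"s2": 7}}
print(filter_table_to_tips(table, ["f1", "f3", "f9"]))
# {'f1': {'s1': 5}, 'f3': {'s2': 7}}
```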

new ci test failures

@mikerobeson, I am putting out the bat signal...

I recently migrated this repo's CI testing from Travis CI to GitHub Actions workflows. As part of that migration, we now build and test the conda package for this plugin rather than the source itself. The unit tests are currently failing:

https://github.com/qiime2/q2-phylogeny/runs/1922156894

I haven't looked too closely at this, but I suspect we are writing test data somewhere we shouldn't (and the switch from testing the source to testing the conda package is what is revealing the issue).

If you get a chance to peek at this please let me know!
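If that suspicion is right, the usual remedy is to have tests write into a temporary directory rather than next to the test data, since an installed package's directory may not be writable. A minimal unittest sketch of the pattern (the test class and file name are hypothetical; run it with `python -m unittest`):

```python
import os
import tempfile
import unittest


class WritesOutputTests(unittest.TestCase):
    def setUp(self):
        # Write test outputs into a throwaway directory instead of the
        # package's data folder, which may be read-only once installed.
        self._tmp = tempfile.TemporaryDirectory()
        self.out_dir = self._tmp.name

    def tearDown(self):
        self._tmp.cleanup()

    def test_writes_tree(self):
        path = os.path.join(self.out_dir, "tree.nwk")
        with open(path, "w") as fh:
            fh.write("(a:1,b:2);\n")
        with open(path) as fh:
            self.assertEqual(fh.read(), "(a:1,b:2);\n")
```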

Doc update

The help string for phylogeny.methods.filter_table indicates that the method modifies the table "in-place":

Call signature: phylogeny.methods.filter_table(table, tree)
Call docstring:
Filter feature table in-place against a phylogenic tree

This led to some confusion, because running

phylogeny.methods.filter_table(table.rarified_table, tree.rooted_tree)

doesn't produce a table.rarified_table.filtered_table (not that I'd expect methods to chain together like that).

What does work is

new_table = phylogeny.methods.filter_table(table.rarified_table, tree.rooted_tree)
new_table.filtered_table

Can the Call docstring be updated to make clear that this method does not filter "in place" (in the sense that pandas can filter a DataFrame in place)?
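The behaviour being described is "return a new object" rather than "mutate the input". A sketch of that contract, using a namedtuple as a stand-in for the QIIME 2 Results object (the real Results also supports access by output name and tuple unpacking); `filter_table` here is a hypothetical toy, not the plugin's implementation:

```python
from collections import namedtuple

# Stand-in for a QIIME 2 Results object: a tuple whose fields are named
# after the action's outputs (here just ``filtered_table``).
Results = namedtuple("Results", ["filtered_table"])


def filter_table(table, tree_tips):
    """Filter a feature table against a phylogenetic tree.

    Returns a new table wrapped in Results; ``table`` is left unchanged.
    """
    kept = {fid: counts for fid, counts in table.items() if fid in tree_tips}
    return Results(filtered_table=kept)


table = {"f1": {"s1": 1}, "f2": {"s1": 2}}
new_table = filter_table(table, {"f1"})
print(new_table.filtered_table)  # {'f1': {'s1': 1}}
print(sorted(table))             # ['f1', 'f2']  (input untouched)
```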

Add pplacer

Addition Description
Add a method that adds sequences to an existing tree using pplacer.

Swap FastTree for FastTreeMP (included in bioconda installation)

Currently q2_phylogeny.fasttree uses the FastTree binary, which is single-threaded. FastTree also offers a parallel version, FastTreeMP, which the bioconda recipe installs alongside it (both are in the .../envs/.../bin folder after installing the QIIME 2 core distribution).

By default FastTreeMP uses all available cores (at least in the two runs where I've used it), but the thread count can be set through the OMP_NUM_THREADS environment variable (export OMP_NUM_THREADS=#). Child processes started with Python's subprocess.run inherit the parent's environment, and run accepts the same env argument as Popen, so shell=True should not be needed.

This could default to 1, with a threads parameter (or some variant) that sets the environment variable before calling the command.
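Passing the variable per call avoids touching the parent process's environment. A minimal sketch, assuming a hypothetical `run_with_threads` wrapper; it is demonstrated with a Python child process that echoes the variable back, since FastTreeMP may not be installed:

```python
import os
import subprocess
import sys


def run_with_threads(cmd, n_threads=1):
    """Run ``cmd`` with OMP_NUM_THREADS set for that child process only.

    subprocess.run forwards ``env`` to Popen, so shell=True is not needed.
    """
    env = dict(os.environ, OMP_NUM_THREADS=str(n_threads))
    return subprocess.run(cmd, env=env, check=True,
                          capture_output=True, text=True)


# A child that prints the variable it received.
probe = [sys.executable, "-c",
         "import os; print(os.environ['OMP_NUM_THREADS'])"]
print(run_with_threads(probe, n_threads=4).stdout.strip())  # 4
```

With FastTreeMP installed, the call would look like `run_with_threads(["FastTreeMP", "-nt", "aligned.fasta"], n_threads=4)` (file name hypothetical). The default of 1 matches the suggestion above.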

add method to midpoint root trees

The trees generated in QIIME 1 by default (i.e., the trees generated by FastTree) are unrooted, but the new phylogenetic diversity metric implementations require rooted trees. Add a midpoint-rooting method to support importing QIIME 1 trees.
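Midpoint rooting places the root halfway along the longest tip-to-tip path. The sketch below only locates that point on an unrooted tree given as (node, node, branch length) edges; it is an illustrative assumption, not q2-phylogeny code, and for real trees scikit-bio's `TreeNode.root_at_midpoint` already implements the full re-rooting.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected weighted adjacency map from (u, v, length) edges."""
    graph = defaultdict(dict)
    for u, v, length in edges:
        graph[u][v] = length
        graph[v][u] = length
    return graph


def farthest_from(graph, start):
    """Return (node, distance, parent) for the node farthest from start.

    In a tree each node has a unique path from start, so a plain
    traversal gives exact path lengths.
    """
    dist = {start: 0.0}
    parent = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr, length in graph[node].items():
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    far = max(dist, key=dist.get)
    return far, dist[far], parent


def midpoint_position(edges):
    """Return (u, v, offset): the root lies on edge u-v, offset from u."""
    graph = build_graph(edges)
    tip = next(n for n in graph if len(graph[n]) == 1)
    # Two sweeps find the endpoints a, b of the longest tip-to-tip path.
    a, _, _ = farthest_from(graph, tip)
    b, diameter, parent = farthest_from(graph, a)
    # Walk from b back toward a until the halfway point is reached.
    half = diameter / 2
    travelled = 0.0
    node = b
    while parent[node] is not None:
        nxt = parent[node]
        length = graph[node][nxt]
        if travelled + length >= half:
            return node, nxt, half - travelled
        travelled += length
        node = nxt
    raise ValueError("tree has no path between two distinct tips")


edges = [("A", "X", 1), ("B", "X", 2), ("X", "Y", 1),
         ("Y", "C", 2), ("Y", "D", 6)]
print(midpoint_position(edges))  # ('Y', 'D', 1.5)
```

Here the longest path is B to D (length 9), so the root sits 4.5 from each end, 1.5 along the Y-D branch.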
