
Comments (9)

cswarth commented on August 15, 2024

[edited to add internal node names]

Illustrating the rooting process using strictly ETE-based visualization.
The ETE documents suggest using TreeNode.set_outgroup() when rooting a tree. We are currently using a different method that disconnects the naïve node from the tree and attaches the entire tree as a single child of the naïve node.

Illustrated below is (1) the original tree as produced by dnaml, and then (2) the same tree rooted by our current method. The result of (2) is not a binary tree, as the root has only one child.
(3) is the same tree rooted using TreeNode.set_outgroup() on the naïve node.
(4) is the same tree rooted by adding the two subtrees directly as children of the naive sequence.

I'm not sure which one is more correct or whether this matters at all.

(1) Before rooting:

      /-seed_QA255
   /1|
  |   \-265064-1
-2|
  |--115652-1
  |
   \-naive2
(2) After rooting using current method:

            /-seed_QA255
         /1|
-naive22|   \-265064-1
        |
         \-115652-1
(3) After rooting with tree.set_outgroup(naive_node)
   /-naive2
  |
-2|      /-seed_QA255
  |   /1|
   \-|   \-265064-1
     |
      \-115652-1
(4) Adding children of root directly as children of naive node
         /-seed_QA255
      /1|
-naive2  \-265064-1
     |
      \-115652-1
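
For anyone who wants to poke at the difference between (2) and (4) without ETE installed, here is a rough sketch. The `Node` class is a hypothetical stand-in, not ETE's `TreeNode`, and `root_by_swinging` / `root_by_adopting` are made-up names for the two approaches, not functions from our code.

```python
class Node:
    """Minimal stand-in for a tree node; NOT ETE's TreeNode."""

    def __init__(self, name="", children=None):
        self.name = name
        self.up = None
        self.children = []
        for child in children or []:
            self.add_child(child)

    def add_child(self, child):
        child.up = self
        self.children.append(child)

    def detach(self):
        """Remove this node from its parent and return it."""
        if self.up is not None:
            self.up.children.remove(self)
            self.up = None
        return self

    def newick(self):
        """Newick-like string with internal node names, no branch lengths."""
        if not self.children:
            return self.name
        return "(" + ",".join(c.newick() for c in self.children) + ")" + self.name


def build_example():
    """Tree (1): root '2' with children '1', '115652-1' and 'naive2'."""
    node1 = Node("1", [Node("seed_QA255"), Node("265064-1")])
    naive = Node("naive2")
    root = Node("2", [node1, Node("115652-1"), naive])
    return root, naive


def root_by_swinging(root, naive):
    """Method (2): detach naive, then hang the rest of the tree under it."""
    naive.detach()
    naive.add_child(root)
    return naive


def root_by_adopting(root, naive):
    """Method (4): detach naive, then move the old root's children under it."""
    naive.detach()
    for child in list(root.children):
        naive.add_child(child.detach())
    return naive


root, naive = build_example()
print(root_by_swinging(root, naive).newick())
# (((seed_QA255,265064-1)1,115652-1)2)naive2

root, naive = build_example()
print(root_by_adopting(root, naive).newick())
# ((seed_QA255,265064-1)1,115652-1)naive2
```

In the first result node `2` survives as the only child of `naive2`; in the second it is gone entirely.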

from cft.

WSDeWitt commented on August 15, 2024

Yeah, I think the unifurcation we see in (2) is expected, since we first set naive as the outgroup (as we do in dnaml) and then reroot on it (i.e. swing it above the initial root) in ETE. The method TreeNode.set_outgroup() is not needed for rerooting the dnaml tree, because naive was already set as the outgroup by dnaml. However, the fasttree step does not have a defined outgroup, so our code for rerooting that tree (for pruning) does use that ETE method to set naive as the outgroup, then calls our swing-outgroup-up rerooting function.

Method (4) seems reasonable too, but it removes a node (the original root), which will affect tree optimality (e.g. parsimony score will probably increase if that node had contained mutations shared by its children).

But I think I am confused about why (1) and (3) should look different. It looks like (3) has one more internal node.

cswarth commented on August 15, 2024

Edited the post to include internal node names when viewed on github; that might make it a little easier to see what's moved around.

WSDeWitt commented on August 15, 2024

Ah ok, in (3) it looks like there is another unnamed node that is the common ancestor of 1 and 115652-1, but is that just a drawing artifact so it's the same as (1)?

cswarth commented on August 15, 2024

Not an artifact, that's a real un-named node. The comment in set_outgroup notes the conditions under which a new node is created.

        # If outgroup is a child from root, but with more than one
        # sister nodes, creates a new node to group them
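
As a rough illustration of what that comment describes, here is a sketch of just that one case (the outgroup is already a child of the root). It uses a hypothetical stand-in `Node` class and a made-up function name, `set_outgroup_sketch`; it is not ETE's implementation and ignores branch lengths and every other case `set_outgroup` handles.

```python
class Node:
    """Minimal stand-in for a tree node; NOT ETE's TreeNode."""

    def __init__(self, name="", children=()):
        self.name = name
        self.up = None
        self.children = []
        for child in children:
            self.add_child(child)

    def add_child(self, child):
        child.up = self
        self.children.append(child)

    def newick(self):
        if not self.children:
            return self.name
        return "(" + ",".join(c.newick() for c in self.children) + ")" + self.name


def set_outgroup_sketch(root, outgroup):
    """Handle only the quoted case: outgroup is a direct child of the root."""
    if outgroup.up is not root:
        raise ValueError("this sketch only handles a direct child of the root")
    sisters = [c for c in root.children if c is not outgroup]
    if len(sisters) > 1:
        # More than one sister: group them under a new, unnamed node.
        sisters = [Node("", sisters)]
    root.children = []
    for child in [outgroup] + sisters:
        root.add_child(child)
    return root


# Tree (1) from the first post.
naive = Node("naive2")
clade = Node("1", [Node("seed_QA255"), Node("265064-1")])
root = Node("2", [clade, Node("115652-1"), naive])

print(set_outgroup_sketch(root, naive).newick())
# (naive2,((seed_QA255,265064-1)1,115652-1))2
```

The unnamed node in the output is the extra internal node that shows up in (3).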

WSDeWitt commented on August 15, 2024

Oof, ok. I think that currently comes into play only when rerooting the fasttree tree. After we run set_outgroup and our reroot_tree, this would give two consecutive unifurcations from the root. This shouldn't impact pruning though, so maybe we're ok?
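
One way to check for this kind of chain is to walk down from the root while each node has exactly one child. A minimal sketch, assuming only that nodes expose a `children` list; the `Node` class, the leaf names `a`, `b`, `c`, and the function name `leading_unifurcations` are all hypothetical.

```python
class Node:
    """Minimal stand-in for a tree node; NOT ETE's TreeNode."""

    def __init__(self, name="", children=()):
        self.name = name
        self.children = list(children)


def leading_unifurcations(root):
    """Return names of consecutive single-child nodes, starting at the root."""
    chain = []
    node = root
    while len(node.children) == 1:
        chain.append(node.name)
        node = node.children[0]
    return chain


# Shape described above: naive on top, then the old root, then the
# grouping node that holds the remaining leaves.
group = Node("group", [Node("a"), Node("b"), Node("c")])
old_root = Node("old_root", [group])
naive = Node("naive", [old_root])

print(leading_unifurcations(naive))
# ['naive', 'old_root']
```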

cswarth commented on August 15, 2024

Not sure how we are ok. The current rerooting does not result in a binary tree. It's unfortunate I chose a tree whose naive sequence is named naive2 but there really are two nodes in a line to the root.

               /-seed_QA255
            /1|
-naive2 - 2|   \-265064-1
           |
            \-115652-1

This matters because I cannot line up sequences if there are two nodes in a row like that. Also dendropy will simply barf when iterating this tree.

Why do I care about dendropy? Because it has an in-order traversal routine that iterates over nodes in the right order, top to bottom, to implement Erick's suggestion.

ETE also has node iteration routines - "preorder", "postorder", and "levelorder" - but none of them is the right order for what we want.

cswarth commented on August 15, 2024

NVM about dendropy; we can just cons up our own in-order traversal routine for ETE:

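def inorder(t):
    """Yield nodes in order: left subtree, the node itself, right subtree."""
    c = t.children
    if c:
        # Internal node. Assumes a strictly binary tree: a single-child
        # node raises IndexError and children past the second are skipped.
        for x in inorder(c[0]):
            yield x
        yield t
        for x in inorder(c[1]):
            yield x
    else:
        # Leaf node.
        yield t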
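
If the tree might still contain a unifurcation or a multifurcation, a variant along these lines would tolerate it. This is only a sketch: the "first child, then the node, then the remaining children" ordering is an assumed convention, and `Node`, `clade` and `inorder_any_arity` are hypothetical names; any object with a `children` list should work.

```python
class Node:
    """Minimal stand-in for a tree node; NOT ETE's TreeNode."""

    def __init__(self, name="", children=()):
        self.name = name
        self.children = list(children)


def inorder_any_arity(node):
    """In-order traversal that accepts any number of children per node.

    Assumed convention: first child's subtree, then the node itself,
    then the subtrees of all remaining children.
    """
    children = node.children
    if children:
        yield from inorder_any_arity(children[0])
    yield node
    for child in children[1:]:
        yield from inorder_any_arity(child)


def clade():
    return Node("1", [Node("seed_QA255"), Node("265064-1")])


# Tree (3): binary, rooted with naive2 as the outgroup.
tree3 = Node("2", [Node("naive2"), Node("", [clade(), Node("115652-1")])])
print([n.name for n in inorder_any_arity(tree3)])
# ['naive2', '2', 'seed_QA255', '1', '265064-1', '', '115652-1']

# Tree (2): the root naive2 has a single child.
tree2 = Node("naive2", [Node("2", [clade(), Node("115652-1")])])
print([n.name for n in inorder_any_arity(tree2)])
# ['seed_QA255', '1', '265064-1', '2', '115652-1', 'naive2']
```

On a strictly binary tree this visits nodes in the same order as the two-child version.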

cswarth commented on August 15, 2024

Resolved by calling set_outgroup(naive_node) in cftweb/cftweb/cluster.py
