Comments (9)
[edited to add internal node names]
Illustrating the rooting process using strictly ETE-based visualization.
The ETE documentation suggests using TreeNode.set_outgroup() when rooting a tree. We are currently using a different method that disconnects the naïve node from the tree and attaches the entire tree as a single child of the naïve node.
Illustrated below are (1) the original tree as produced by dnaml, and (2) the same tree rooted by our current method. The result of (2) is not a binary tree, as the root has only one child.
(3) is the same tree rooted using TreeNode.set_outgroup() on the naïve node.
(4) is the same tree rooted by adding the two subtrees directly as children of the naïve node.
I'm not sure which one is more correct, or whether this matters at all.
(1) Before rooting:
      /-seed_QA255
   /1|
  |   \-265064-1
-2|
  |--115652-1
  |
   \-naive2
(2) After rooting using current method:
            /-seed_QA255
         /1|
-naive22|   \-265064-1
        |
         \-115652-1
(3) After rooting with tree.set_outgroup(naive_node)
   /-naive2
  |
-2|      /-seed_QA255
  |   /1|
   \-|   \-265064-1
     |
      \-115652-1
(4) Adding children of root directly as children of naive node
         /-seed_QA255
      /1|
-naive2  \-265064-1
     |
      \-115652-1
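The difference between (2) and (4) can be sketched without ETE at all. The snippet below is only an illustration under stated assumptions: the Node class and the two reroot_* helpers are hypothetical stand-ins, not ETE's TreeNode and not the actual cft rerooting code. It rebuilds tree (1), applies each manual rerooting, and prints the result as Newick so the extra single-child level in (2) is visible.

```python
class Node(object):
    """Hypothetical minimal tree node; NOT ETE's TreeNode."""
    def __init__(self, name="", children=None):
        self.name = name
        self.children = list(children or [])


def build_unrooted():
    # Tree (1): root "2" with three children, as in the dnaml output above.
    n1 = Node("1", [Node("seed_QA255"), Node("265064-1")])
    return Node("2", [n1, Node("115652-1"), Node("naive2")])


def reroot_single_child(root, naive_name):
    # Method (2): detach naive, then hang the whole remaining tree under it.
    naive = next(c for c in root.children if c.name == naive_name)
    root.children.remove(naive)
    naive.children = [root]
    return naive


def reroot_merge_children(root, naive_name):
    # Method (4): detach naive and adopt the old root's other children,
    # which drops the old root node ("2") from the tree.
    naive = next(c for c in root.children if c.name == naive_name)
    naive.children = [c for c in root.children if c is not naive]
    return naive


def newick(node):
    # Serialize with internal node names, without the trailing ";".
    if not node.children:
        return node.name
    return "(" + ",".join(newick(c) for c in node.children) + ")" + node.name


tree2 = reroot_single_child(build_unrooted(), "naive2")
tree4 = reroot_merge_children(build_unrooted(), "naive2")
print(newick(tree2) + ";")  # (((seed_QA255,265064-1)1,115652-1)2)naive2;
print(newick(tree4) + ";")  # ((seed_QA255,265064-1)1,115652-1)naive2;
```

In this sketch the root of (2) has one child and the root of (4) has two, which matches the drawings above.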
from cft.
Yeah, I think the unifurcation we see in (2) is expected, since we first set naive as the outgroup (as we do in dnaml) and then reroot on it (i.e. swing it above the initial root) in ETE. TreeNode.set_outgroup() is not needed for rerooting the dnaml tree, because naive was already set as the outgroup by dnaml. However, the fasttree step does not have a defined outgroup, so our code for rerooting that tree (for pruning) does use that ETE method to set naive as the outgroup, and then calls our swing-outgroup-up rerooting function.
Method (4) seems reasonable too, but it removes a node (the original root), which will affect tree optimality (e.g. the parsimony score will probably increase if that node contained mutations shared by its children).
But I think I am confused about why (1) and (3) should look different. It looks like (3) has one more internal node.
from cft.
Edited the post to include internal node names when viewed on github; that might make it a little easier to see what's moved around.
from cft.
Ah ok, in (3) it looks like there is another unnamed node that is the common ancestor of 1 and 115652-1, but is that just a drawing artifact, so that it's really the same as (1)?
from cft.
Not an artifact, that's a real unnamed node. The comment in set_outgroup notes the conditions under which a new node is created:
# If outgroup is a child from root, but with more than one
# sister nodes, creates a new node to group them
from cft.
Oof, ok. I think that currently comes into play only when rerooting the fasttree tree. Running set_outgroup and then our reroot_tree would give two consecutive unifurcations from the root. This shouldn't impact pruning though, so maybe we're ok?
from cft.
Not sure how we are ok. The current rerooting does not result in a binary tree. It's unfortunate I chose a tree whose naive sequence is named naive2, but there really are two nodes in a line at the root (naive2, then 2):
               /-seed_QA255
            /1|
-naive2 - 2|   \-265064-1
           |
            \-115652-1
This matters because I cannot line up sequences if there are two nodes in a row like that. Also, dendropy will simply barf when iterating over this tree.
Why do I care about dendropy? Because it has an in-order traversal routine that iterates over nodes in the right order, top to bottom, to implement Erick's suggestion.
ETE also has node iteration strategies ("preorder", "postorder", and "levelorder"), but none of them gives the order we want.
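One way to spot the "two nodes in a row" problem before it bites is to list every node that has exactly one child. The sketch below is an assumption-laden illustration: Node is a hypothetical stand-in that only mimics the .name and .children attributes of an ETE node, and walk and unifurcations are made-up helper names, not part of ETE, dendropy, or cft.

```python
class Node(object):
    """Hypothetical stand-in exposing .name and .children like an ETE node."""
    def __init__(self, name="", children=None):
        self.name = name
        self.children = list(children or [])


def walk(node):
    # Pre-order walk over every node in the subtree rooted at `node`.
    yield node
    for child in node.children:
        for n in walk(child):
            yield n


def unifurcations(root):
    # Names of nodes with exactly one child; a strictly binary tree has none.
    return [n.name for n in walk(root) if len(n.children) == 1]


# Tree (2): naive2 -> 2 -> (1, 115652-1)
n1 = Node("1", [Node("seed_QA255"), Node("265064-1")])
tree2 = Node("naive2", [Node("2", [n1, Node("115652-1")])])

# Tree (3): 2 -> (naive2, unnamed -> (1, 115652-1))
m1 = Node("1", [Node("seed_QA255"), Node("265064-1")])
tree3 = Node("2", [Node("naive2"), Node("", [m1, Node("115652-1")])])

print(unifurcations(tree2))  # ['naive2']
print(unifurcations(tree3))  # []
```

With a real ETE tree the same check could presumably be written over its own node iterator, since the helper only relies on each node having a children list.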
from cft.
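NVM about dendropy, we can just cons up our own in-order traversal routine for ETE:
def inorder(t):
    """Yield the nodes of t in-order: first child's subtree, t, then the rest."""
    c = t.children
    if c:
        for x in inorder(c[0]):
            yield x
        yield t
        # c[1:] rather than c[1], so a single-child node does not raise IndexError
        for child in c[1:]:
            for x in inorder(child):
                yield x
    else:
        yield t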
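A quick check of what this traversal yields on a tree shaped like (3). This is a sketch, not the cft code: Node is a hypothetical stand-in with the same .children attribute as an ETE node, the "(unnamed)" label is only there to make the output readable, and the traversal is the variant that loops over c[1:] so nodes with one or more than two children do not raise.

```python
class Node(object):
    """Hypothetical stand-in exposing .name and .children like an ETE node."""
    def __init__(self, name="", children=None):
        self.name = name
        self.children = list(children or [])


def inorder(t):
    # First child's subtree, then the node itself, then the remaining subtrees.
    c = t.children
    if c:
        for x in inorder(c[0]):
            yield x
        yield t
        for child in c[1:]:
            for x in inorder(child):
                yield x
    else:
        yield t


# Tree (3): 2 -> (naive2, unnamed -> (1 -> (seed_QA255, 265064-1), 115652-1))
n1 = Node("1", [Node("seed_QA255"), Node("265064-1")])
tree3 = Node("2", [Node("naive2"), Node("(unnamed)", [n1, Node("115652-1")])])

order = [n.name for n in inorder(tree3)]
print(order)
# ['naive2', '2', 'seed_QA255', '1', '265064-1', '(unnamed)', '115652-1']
```

The names come out in the same top-to-bottom order as the rows of drawing (3), which is the property wanted for lining up sequences.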
from cft.
Resolved by calling set_outgroup(naive_node) in cftweb/cftweb/cluster.py.
from cft.