Comments (8)
Update
The issue is that the root node may have an edge length. Node.distance_from_root() includes the root's edge length, but Tree.calc_node_root_distance() ignores it.
Output
distance_from_root: 2.4693498882842526
[2.4693498882842526, 2.4693498882842526, 2.4693498882842526]
calc_node_root_distances: 0.5186108276003377
[0.5186108276003377, 0.5186108276003377, 0.5186108276003377]
root edge length: 1.950739060683915
difference between 2 calculation methods: 1.9507390606839148
Code
import dendropy as dp
def distance_from_root(node):
dist = node.edge.length
parent = node.parent_node
while parent:
dist += parent.edge.length
parent = parent.parent_node
return dist
def distance_from_root2(node):
dist = 0
while node.parent_node:
dist += node.edge.length
node = node.parent_node
return dist
species = dp.TaxonNamespace(["A", "B", "C"])
t = dp.simulate.birth_death_tree(birth_rate=1.0, death_rate=0, taxon_namespace=species)
# First style
print()
one = max(map(distance_from_root, t.leaf_node_iter()))
print("distance_from_root: {}".format(one))
print([x.distance_from_root() for x in t.leaf_node_iter()])
print()
# Second style
two = max(map(distance_from_root2, t.leaf_node_iter()))
print("calc_node_root_distances: {}".format(two))
print(t.calc_node_root_distances())
print()
print("root edge length: {}".format(t.seed_node.edge.length))
print("difference between 2 calculation methods: {}".format(one-two))
from dendropy.
Nice catch. The question is now, which of the behaviors is correct? Or,
put another way, which is less surprising?
On the one hand, if the declared intention is "distance from root node",
then it seems that any edge lengths before the root node should be
disregarded.
On the other, one might expect the distance from root node for tips to
be the entire tree height?
I am leaning toward the former, i.e., to consider both tree height and
distance from root node to start from the root node itself, ignoring the
edge length of the root node if any.
On 7/6/16 1:13 AM, Niko Yasui wrote:
Update
The issue is that the root node may have an edge length.
Node.distance_from_root() includes the root's edge length, but
Tree.calc_node_root_distance() ignores it.Output
distance_from_root: 2.4693498882842526
[2.4693498882842526, 2.4693498882842526, 2.4693498882842526]calc_node_root_distances: 0.5186108276003377
[0.5186108276003377, 0.5186108276003377, 0.5186108276003377]root edge length: 1.950739060683915
difference between 2 calculation methods: 1.9507390606839148Code
import dendropy as dp
def distance_from_root(node):
dist = node.edge.length
parent = node.parent_node
while parent:
dist += parent.edge.length
parent = parent.parent_node
return distdef distance_from_root2(node):
dist = 0
while node.parent_node:
dist += node.edge.length
node = node.parent_node
return distspecies = dp.TaxonNamespace(["A", "B", "C"])
t = dp.simulate.birth_death_tree(birth_rate=1.0, death_rate=0, taxon_namespace=species)First style
print()
one = max(map(distance_from_root, t.leaf_node_iter()))
print("distance_from_root: {}".format(one))
print([x.distance_from_root() for x in t.leaf_node_iter()])
print()Second style
two = max(map(distance_from_root2, t.leaf_node_iter()))
print("calc_node_root_distances: {}".format(two))
print(t.calc_node_root_distances())
print()print("root edge length: {}".format(t.seed_node.edge.length))
print("difference between 2 calculation methods: {}".format(one-two))—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#60 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AABmR0d2lUxc20Au9YWiawZTFQW4GqIhks5qSzmOgaJpZM4JE3Py.
Jeet Sukumaran
[email protected]
Blog/Personal Pages:
http://jeetworks.org/
GitHub Repositories:
http://github.com/jeetsukumaran
Photographs (as stream):
http://www.flickr.com/photos/jeetsukumaran/
Photographs (by galleries):
http://www.flickr.com/photos/jeetsukumaran/sets/
from dendropy.
Since I'm using dendropy to reproduce Mesquite-like behavior, and Mesquite assumes that the root node bifurcates immediately, I have so far manually set the root node's edge length to 0 in my birth/death trees.
I agree with your assessment of the situation. Based on my reading of the function names and my theoretical understanding of trees, the root node should have an edge with length 0 (or even better, not have an edge).
from dendropy.
I've also noticed that contained coalescent trees tend to have extremely short edges near the root, especially when species tree edges are short relative to the population size.
Could this be because on line 516 in coalescent.py, coalesce_nodes is forcing all remaining taxa to rapidly coalesce, instead of using the total length of the root node's edge length to coalesce?
from dendropy.
If period=None
is specified, then no restriction is put on the total
time available for coalescence, and coalescence proceeds until all nodes
have coalesced into one.
BUT the rate of coalescence should continue to follow the Kingman
distribution.
Ln 272 shows that if time_remaining
(which is assigned the value of
period
) is None
, the loop repeats without restriction.
-- jeet
On 7/7/16 9:03 PM, Niko Yasui wrote:
@jeetsukumaran https://github.com/jeetsukumaran,
I've also noticed that contained coalescent trees tend to have extremely
short edges near the root, especially when species tree edges are short
relative to the population size.Could this be because on line 516 in coalescent.py, coalesce_nodes is
forcing all remaining taxa to rapidly coalesce, instead of using the
total length of the root node's edge length to coalesce?—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#60 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AABmR58f7b9P_QauurfYRo1CqIS_fPgZks5qTaH3gaJpZM4JE3Py.
Jeet Sukumaran
[email protected]
Blog/Personal Pages:
http://jeetworks.org/
GitHub Repositories:
http://github.com/jeetsukumaran
Photographs (as stream):
http://www.flickr.com/photos/jeetsukumaran/
Photographs (by galleries):
http://www.flickr.com/photos/jeetsukumaran/sets/
from dendropy.
That's true, I'll keep digging. There is something about the comments that I'm confused about, though.
Line 101
The waiting time for coalescence of *any* two gene lineages in a sample of
$n$ gene lineages evolving in a population with haploid size $N$ is an
exponentially-distributed random variable with rate of $\\choose{N, 2}$ and
an expectation of $\\frac{1}{\choose{N, 2}}$.
This seems incorrect, because the code is using the below formula, but also because the number of gene lineages is completely ignored.
$N$ * Exponential($\\choose{n, 2}$)
Why is the exponential multiplied by the population size?
Also, line 80 should describe the continuous time version, rather than the descrete time version:
A random draw from the "Kingman distribution" (discrete time version):
from dendropy.
The documentation is indeed incorrect (notation slippage, with the
number of genes
formula we actually use is the correct one:
133: rate = combinatorics.choose(n_genes, n_to_coalesce)
134: tmrca = rng.expovariate(rate)
The number of gene lineages is n_genes
. n_to_coalesce
defaults to 2.
That is, the waiting time to coalescence for any two genes in a sample
of
documentation). This formula gives the rate in units of
time in, e.g. calendar time, you multiply the result by
Thank you for pointing this documentation error out.
-- jeet
On 7/7/16 10:56 PM, Niko Yasui wrote:
That's true, I'll keep digging. There is something about the comments
that I'm confused about, though.Line 101
The waiting time for coalescence of *any* two gene lineages in a sample of $n$ gene lineages evolving in a population with haploid size $N$ is an exponentially-distributed random variable with rate of $\\choose{N, 2}$ and an expectation of $\\frac{1}{\choose{N, 2}}$.
This seems incorrect, because the code is using the below formula, but
also because the number of gene lineages is completely ignored.
$N$ * Exponential($\choose{n, 2}$ )Based on the wikipedia page, I suspect the correct formula should be
something like this:Exponential(
$N$ ) /$\choose{n, 2}$ ,where the time for a given pair of genes to coalesce is divided by the
number of pairs of genes. What do you think?Also, line 80 should describe the continuous time version, rather than
the descrete time version:A random draw from the "Kingman distribution" (discrete time version):
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#60 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AABmR5oVfOE9CYaCkK1lU4OjNCoZteGFks5qTbxygaJpZM4JE3Py.
Jeet Sukumaran
[email protected]
Blog/Personal Pages:
http://jeetworks.org/
GitHub Repositories:
http://github.com/jeetsukumaran
Photographs (as stream):
http://www.flickr.com/photos/jeetsukumaran/
Photographs (by galleries):
http://www.flickr.com/photos/jeetsukumaran/sets/
from dendropy.
Thanks for the clarification!
from dendropy.
Related Issues (20)
- `Tree.mrca` has side effect of changing rooting state HOT 6
- `shuffle_taxa` raises `DeprecationWarning`
- Keep supporting Python2.7? HOT 3
- Some examples in `/docs/source/examples` are broken
- Add developer documentation
- Track PyPi URL change for project homepage
- Chore: Fix `ResourceWarning`s from tests
- Chore: fix `distutils` deprecation warning HOT 1
- `paup_estimate_tree_nj` example experiences intermittent failure HOT 1
- Provide public interface for `_convert_node_to_root_polytomy` HOT 1
- quote_underscores argument unrecognized HOT 1
- Documentation Deployment
- Create smoke tests
- Multithreaded option fails with no error message - leaf labels issue? HOT 2
- Error runnig picrust due to cannot import name '_convert_node_to_root_polytomy' HOT 2
- Website dendropy.org is down HOT 5
- treesim.birth_death_tree() raises division by zero error when used to extend an existing tree HOT 4
- Final Steps for Version 5 Release
- JOSS Paper?
- Support values written as quoted node labels since >4.6.1 HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from dendropy.