phylotrackpy's Introduction

phylotrackpy: a python phylogeny tracker

In in silico evolution experiments, we have the luxury of being able to perfectly track the phylogenies of our populations, rather than having to just infer them after the fact. Phylotrackpy is a Python package designed to help you do so as efficiently as possible.

At face value, measuring a phylogeny in in silico evolution may seem very straightforward: you just need to keep track of what gives birth to what. However, multiple aspects turn out to be non-trivial. The goal of Phylotrackpy is to implement these things the right way once so that we can all stop re-implementing them over and over. Phylotrackpy is a Python library designed to flexibly handle all aspects of recording phylogenies in in silico evolution.

Phylogeny Trackers in Other Languages

C++

Phylotrackpy is essentially a wrapper around Phylotracklib, which is implemented in C++. If you need a C++ phylogeny tracker, you can use that one directly (it is part of the larger Empirical library, which is header-only so you can just include the parts you want).

Julia

A phylogeny tracker written in Julia is also available.

Features

  • Pruning: Ability to prune out taxa that are extinct and have no extant descendants (to keep memory use under control)
  • Flexible taxon definitions: Flexible control of how taxa are defined (e.g. by genotype, by phenotype, by trait, or by something more complex)
  • Efficiency: Highly efficient (implemented in C++ under the hood)
  • Phylostatistics: Includes various phylogenetic topology metrics
  • Flexible output: Easily add columns to output files.
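
To make the pruning rule in the list above concrete, here is a minimal pure-Python sketch. It is not Phylotrackpy's implementation: the function name `prune_extinct` and the child-to-parent dictionary are assumptions made only for illustration. The idea is that a taxon is worth keeping only if it is extant or lies on the line of descent of an extant taxon.

```python
def prune_extinct(parent_of, extant):
    """Return the taxa to keep: every extant taxon plus all of its ancestors.

    Anything not returned is extinct and has no extant descendants.
    """
    keep = set()
    for taxon in extant:
        # Walk up the lineage until we hit the root or a node we already kept.
        while taxon is not None and taxon not in keep:
            keep.add(taxon)
            taxon = parent_of.get(taxon)
    return keep


# Hypothetical toy phylogeny, stored as child -> parent (None marks the root).
parent_of = {"a": None, "b": "a", "c": "a", "d": "b", "e": "c"}
extant = {"d"}

kept = prune_extinct(parent_of, extant)
print(sorted(kept))  # ['a', 'b', 'd']
```

Here taxa `c` and `e` can be dropped because neither is extant and neither has an extant descendant.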

Running a parallel/distributed simulation? Check out hstrat, which provides an alternate parallel/distributed-friendly methodology for decentralized phylogenetic tracking.

High level usage

There are three main steps to tracking a phylogeny using phylotrackpy:

You may also want to:

For more detailed instructions, see the documentation.

Installation

Phylotrackpy is available through pip:

pip install phylotrackpy

To install the latest development version:

pip install git+https://github.com/emilydolson/phylotrackpy

To install from a local source copy:

pip install . --upgrade

Note that development and local installs will require local compilation of C++ bindings. Pre-built wheels are available with the PyPI distribution. See our documentation for more complete information on local builds.

Useful background information

There are certain quirks associated with real-time phylogenies (especially digital ones) that you might not be used to thinking about if you're used to dealing with reconstructed phylogenies. Many of these discrepancies are the result of the very different temporal resolutions on which these types of phylogenies are measured, and the fact that the taxonomic units we work with are often at a finer resolution than species. We document some here so that they don't catch you off guard:

  • Multifurcations are real: In phylogenetic reconstructions, there is usually an assumption that any multifurcation/polytomy (i.e. a node that has more than two child nodes) is an artifact of having insufficient data. In real-time phylogenies, however, we often observe multifurcations that we know for sure actually happened.
  • Not all extant taxa are leaf nodes: In phylogenetic reconstructions, there is usually an assumption that all extant (i.e. still living) taxa are leaf nodes in the phylogeny (i.e. none of them are parents/offspring of each other; similar taxa are descended from a shared common ancestor). In real-time phylogenies it is entirely possible that one taxon gives birth to something that we have defined as a different taxon and then continues to coexist with that child taxon.
  • Not all nodes are branch points: In phylogenetic reconstructions, we only attempt to infer where branch points (i.e. common ancestors of multiple taxa) occurred. We do not try to infer how many taxa existed on a line of descent between a branch point and an extant taxon. In real-time phylogenies we observe exactly how many taxa exist on this line of descent and we keep a record of them. In practice there are often a lot of them, depending on how you define your taxa. It is unclear whether we should include these non-branching nodes when calculating phylogenetic statistics (which is why Phylotrackpy lets you choose whether you want to).
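
The three quirks above can be spotted mechanically in any recorded tree. The following is a small sketch in plain Python, assuming the phylogeny is available as a child-to-parent dictionary plus a set of extant taxa; the function name `describe_nodes` and the toy data are hypothetical and not part of Phylotrackpy.

```python
from collections import defaultdict


def describe_nodes(parent_of, extant):
    """Find multifurcations, extant non-leaf taxa, and non-branching nodes."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        if parent is not None:
            children[parent].append(child)

    # More than two children: a real multifurcation.
    multifurcations = sorted(t for t in parent_of if len(children[t]) > 2)
    # Extant taxa that have children are extant but not leaves.
    extant_internal = sorted(t for t in extant if len(children[t]) > 0)
    # Exactly one child: a node that is not a branch point.
    non_branching = sorted(t for t in parent_of if len(children[t]) == 1)
    return multifurcations, extant_internal, non_branching


# Hypothetical toy phylogeny (child -> parent).
parent_of = {"r": None, "a": "r", "b": "r", "c": "r", "d": "a", "e": "d"}
extant = {"a", "b", "c", "e"}

result = describe_nodes(parent_of, extant)
print(result)  # (['r'], ['a'], ['a', 'd'])
```

In this toy tree, `r` is a multifurcation, `a` is extant while also being a parent, and `a` and `d` are nodes on a line of descent that never branch.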

An example of a full digital evolution phylogeny

The above image represents an actual phylogeny measured from digital evolution. Each rectangle represents a different taxon. Its position along the x axis represents the span of time it existed for. Note that there are often sections along a single branch where multiple taxa coexisted for a period of time. Circles represent extant taxa at the end of this run.

Dependencies

  • pybind11 (for wrapping C++ code into Python)
  • Empirical (where the C++ version of this code lives)

Testing dependencies

  • pytest

Documentation dependencies

  • myst_parser (for writing documentation in markdown)
  • sphinx_rtd_theme (theme for readthedocs)

Contributing

Contributions are welcome! See CONTRIBUTING.md.

Citing

If Phylotrack contributes to a scientific publication, please cite it as

Dolson, E., Rodriguez-Papa, S., & Moreno, M. A. (2024). Phylotrack: C++ and Python libraries for in silico phylogenetic tracking. arXiv preprint arXiv:2405.09389. https://doi.org/10.48550/arXiv.2405.09389

@misc{dolson2024phylotrack,
      doi={10.48550/arXiv.2405.09389},
      url={https://arxiv.org/abs/2405.09389},
      title={Phylotrack: C++ and Python libraries for in silico phylogenetic tracking},
      author={Emily Dolson and Santiago Rodriguez-Papa and Matthew Andres Moreno},
      year={2024},
      eprint={2405.09389},
      archivePrefix={arXiv},
      primaryClass={q-bio.PE}
}

Consider also citing pybind11 if you are using Phylotrackpy. And don't forget to leave a star on GitHub!

Developers

phylotrackpy's People

Contributors

dependabot[bot], emilydolson, mmore500, rodsan0

phylotrackpy's Issues

`calc_diversity` segfault

Describe the bug
calc_diversity segfault

To Reproduce
Steps to reproduce the behavior:

Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from phylotrackpy import systematics
>>> sys = systematics.Systematics(lambda x: x)
>>> sys.load_from_file("consolidated.csv", "id", True)
>>> sys
<phylotrackpy.systematics.Systematics object at 0x7f122983d770>
>>> sys.calc_diversity()
zsh: segmentation fault (core dumped)  python3

consolidated.csv

Expected behavior
no segfault 😅

Computational environment (please complete the following information):

  • OS: popOS
  • Python version 3.10

Additional context
phylotrackpy version 0.1.12

JOSS Review: Documentation hosting and building

JOSS Review issue

I'd suggest publishing the documentation, e.g. on readthedocs.org or as GitHub Pages. Lacking that, I built the documentation locally in the cloned repo and came across several (potential) issues:

  • Users/contributors could benefit from instructions on how to install dependencies (code and docs)
  • Users/contributors could benefit from instructions on how to build the documentation
  • On a Linux system, the env. variable LC_ALL must be set, otherwise make html in docs/ errors out.
  • Packages sphinx_tippy and sphinxcontrib_bibtex should be added to doc dependencies

Avoid storing __eq__ operator for every taxon_info object

Currently, to support NumPy's unconventional == operator (which returns an array of bools, not a single bool), the constructor for taxon_info reaches into Python, grabs the __eq__ operator of the object's class, and then uses a try-catch statement to swap in numpy.array_equal.

For a given systematics manager, however, == should be the same for all taxon_info objects. Thus, there is almost certainly a way to store a single copy per systematics manager and avoid having to reach into Python every time the constructor is called.

Possible solutions:

  1. The equals operator could live in the wrapped systematics object (need to figure out how to give taxon_info objects access to it).
  2. The taxon_info object could have the equals operator as a static member (tricky, because it can seg-fault on destruction)

Add tests

The underlying C++ implementation is very well tested, but the Python wrapper needs some tests to see if we've got weird memory issues

`load_from_file` segfaults if root is extant

a.csv

id,ancestor_list,origin_time,destruction_time,num_orgs,tot_orgs,num_offspring,total_offspring,depth
1,[NONE],0,10,1,1,2,3,0

b.csv, note root destruction_time is inf

id,ancestor_list,origin_time,destruction_time,num_orgs,tot_orgs,num_offspring,total_offspring,depth
1,[NONE],0,inf,1,1,2,3,0

from phylotrackpy.systematics import Systematics
sys = Systematics(lambda x: x, True, True, False, False)
sys.load_from_file("a.csv", "id")  # ok
sys.load_from_file("b.csv", "id")  # segfaults

`colless_like_index` crashes for empty tree

Describe the bug
Calculation of colless-like index crashes on empty trees

To Reproduce
Steps to reproduce the behavior:

>>> from phylotrackpy import systematics
>>> syst = systematics.Systematics(lambda x: x)
>>> syst.colless_like_index()

Expected behavior
Shouldn't crash.
Should give numerical result (perhaps NaN).

Computational environment (please complete the following information):

  • OS: Fedora
  • Python version 3.12.2

Additional context
v0.2.0

Creating a taxon manually can cause invalid snapshot files

    tax1, tax2 = "hello", "hello2"
    sys = systematics.Systematics(taxon_info_fun, True, True, False, False)
    org = ExampleOrg(tax1)
    org2 = ExampleOrg(tax2)
    org_tax = systematics.Taxon(0, tax1)
    org2_tax = sys.add_org(org2, org_tax)
    org3_tax = sys.add_org(org2)
    org4_tax = sys.add_org(org, org2_tax)
    org5_tax = sys.add_org(org, org4_tax)

produces

id,ancestor_list,origin_time,destruction_time,num_orgs,tot_orgs,num_offspring,total_offspring,depth
3,[1],0,inf,2,2,0,0,2
2,[NONE],0,inf,1,1,0,0,0
1,[0],0,inf,1,1,1,1,1

Because org_tax is not registered through add_org it won't be included in the Snapshot file, leaving dangling ancestor references that segfault when you try to load the Snapshot file back up.
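
Until this is fixed, a snapshot can be checked for dangling ancestor references before it is loaded. The sketch below uses only the standard library and the column names from the snapshot above; the helper name `find_dangling_ancestors` is hypothetical, and the code assumes each row lists a single ancestor (or NONE).

```python
import csv
import io


def find_dangling_ancestors(csv_text):
    """Map each taxon id to the ancestor ids it references that have no row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    known_ids = {row["id"] for row in rows}
    dangling = {}
    for row in rows:
        # ancestor_list looks like "[NONE]" or "[1]".
        ancestor = row["ancestor_list"].strip("[]").strip()
        if ancestor.upper() != "NONE" and ancestor not in known_ids:
            dangling[row["id"]] = [ancestor]
    return dangling


# The snapshot produced by the example above.
snapshot = "\n".join([
    "id,ancestor_list,origin_time,destruction_time,num_orgs,tot_orgs,"
    "num_offspring,total_offspring,depth",
    "3,[1],0,inf,2,2,0,0,2",
    "2,[NONE],0,inf,1,1,0,0,0",
    "1,[0],0,inf,1,1,1,1,1",
])

problems = find_dangling_ancestors(snapshot)
print(problems)  # {'1': ['0']}
```

Taxon 1 points at ancestor 0, which never appears as a row, so the file should be rejected or repaired before being passed to load_from_file.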

`get_ave_depth` NaN

Describe the bug
get_ave_depth returns NaN

To Reproduce
Steps to reproduce the behavior:

Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from phylotrackpy import systematics
>>> sys = systematics.Systematics(lambda x: x)
>>> sys.load_from_file("consolidated.csv", "id", True)
>>> sys
<phylotrackpy.systematics.Systematics object at 0x7f122983d770>
>>> sys.get_ave_depth()
nan

consolidated.csv

Expected behavior
I think get_ave_depth() should return a defined value for this tree?

Computational environment (please complete the following information):

  • OS: popOS
  • Python version 3.10

Additional context
phylotrackpy version 0.1.12

Track count of coalescence events

Track the count of coalescence events. Requested by @amlalejini.

Should be easy to do by adding a counter that we increment whenever we set mrca to null. Might need to add a couple extra safety checks to make sure we're only incrementing it when mrca has definitely changed, not just when it might have changed. Also need to make sure that lazy mrca evaluation won't introduce inaccuracies (if so, might need to allow turning off lazy mrca evaluation)
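
As a rough illustration of the counting logic, here is a pure-Python sketch that recomputes the MRCA eagerly and increments a counter only when it has definitely changed. It does not reflect how Empirical or Phylotrackpy store the MRCA; `find_mrca`, `CoalescenceCounter`, and the toy tree are all hypothetical names introduced for this example.

```python
def find_mrca(parent_of, extant):
    """Return the most recent common ancestor of the extant taxa (or None)."""
    lineages = []
    for taxon in extant:
        path = []
        while taxon is not None:
            path.append(taxon)
            taxon = parent_of[taxon]
        lineages.append(path[::-1])  # root first
    shared = None
    # Walk down from the root while every lineage agrees on the node.
    for nodes in zip(*lineages):
        if len(set(nodes)) != 1:
            break
        shared = nodes[0]
    return shared


class CoalescenceCounter:
    def __init__(self):
        self.mrca = None
        self.count = 0

    def update(self, parent_of, extant):
        new_mrca = find_mrca(parent_of, extant)
        # Count only a definite change, and skip the first observation.
        if self.mrca is not None and new_mrca != self.mrca:
            self.count += 1
        self.mrca = new_mrca


parent_of = {"r": None, "a": "r", "b": "r", "c": "a", "d": "a"}
counter = CoalescenceCounter()
counter.update(parent_of, {"b", "c"})  # MRCA is r
counter.update(parent_of, {"c", "d"})  # b died, so the MRCA moves to a
counter.update(parent_of, {"c", "d"})  # nothing changed
print(counter.mrca, counter.count)  # a 1
```

Because this version compares the old and new MRCA directly, it avoids overcounting when the MRCA is merely invalidated; a lazily evaluated MRCA would need the extra safety checks described above.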

`get_mean_pairwise_distance` hangs (>10 min) on deserialized tree

Describe the bug
get_mean_pairwise_distance hangs (>10 min) and cannot be interrupted for valid alife standard file.

To Reproduce

from phylotrackpy import systematics
syst = systematics.Systematics(lambda x: x)
syst.load_from_file("test.csv", "id", True)
syst.get_mean_pairwise_distance(True)

test.csv

Expected behavior
Should not hang.

Screenshots
tree (2)

Computational environment (please complete the following information):

  • OS: fedora
  • Python version: 3.10.13
  • phylotrackpy v0.2.0

Document the funky stuff I just did with deep copies of taxon objects

Basically, there is no time you should ever make a copy of a taxon object (counterexamples welcome, but I'm pretty sure there's not). However, you might want to have one be part of your organism class and make a deep copy of an organism. To make this possible, I have added a fake deepcopy operator to the taxon object. Obviously, that has the potential to be super confusing (and I'm open to alternative approaches to this one), so we should document it really clearly.

Bioinformatics compatibility tools

When tracking phylogenies in real time, we often end up with extant (i.e. alive) taxa that are not leaf nodes. Some (possibly most or all?) bioinformatics and paleontology tools represent this scenario as "asymmetric speciation". In other words, if species A gives birth to species B but doesn't go extinct, we would represent that as

A
|
B

But they would represent it as:

    A
  __|__
 |     |
 A     B

We should add a function that does this conversion to phylotrackpy trees, for compatibility with bioinformatics tools.
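
One possible shape for such a conversion, sketched in plain Python on a child-to-parent dictionary rather than on real phylotrackpy objects: every extant taxon that has children gets an extra leaf representing its still-living population. The function name `to_asymmetric_speciation` and the `_extant` suffix are assumptions made for this example.

```python
def to_asymmetric_speciation(parent_of, extant, suffix="_extant"):
    """Give every extant internal taxon a leaf child so all extant taxa are tips."""
    has_children = {p for p in parent_of.values() if p is not None}
    new_parent_of = dict(parent_of)
    new_extant = set()
    for taxon in extant:
        if taxon in has_children:
            # Extant and a parent: add a leaf copy hanging off the original node.
            leaf = f"{taxon}{suffix}"
            new_parent_of[leaf] = taxon
            new_extant.add(leaf)
        else:
            new_extant.add(taxon)
    return new_parent_of, new_extant


# Species A gives birth to species B but does not go extinct.
parent_of = {"A": None, "B": "A"}
extant = {"A", "B"}

converted, converted_extant = to_asymmetric_speciation(parent_of, extant)
print(converted)  # {'A': None, 'B': 'A', 'A_extant': 'A'}
```

After the conversion, the internal node A has two children (the leaf `A_extant` and B), matching the second diagram above.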

Installation failure on macOS

Describe the bug
Installation failure (using pip) for macOS

To Reproduce
Steps to reproduce the behavior:

python3 -m venv pyenv
source pyenv/bin/activate
pip3 install phylotrackpy

Output

Collecting phylotrackpy
  Using cached phylotrackpy-0.2.0.tar.gz (18 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: phylotrackpy
  Building wheel for phylotrackpy (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building wheel for phylotrackpy (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [16 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.macosx-12-arm64-cpython-311
      creating build/lib.macosx-12-arm64-cpython-311/phylotrackpy
      copying phylotrackpy/__init__.py -> build/lib.macosx-12-arm64-cpython-311/phylotrackpy
      running build_ext
      building 'phylotrackpy.systematics' extension
      creating build/temp.macosx-12-arm64-cpython-311
      clang -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX12.sdk -DVERSION_INFO=0.2.0 -DEMP_OPTIONAL_THROW_ON=1 -I/private/var/folders/lp/1hl9w82571572d7shskck5r0fr62d7/T/pip-build-env-_fkye0uv/overlay/lib/python3.11/site-packages/pybind11/include -I/Users/lalejina/devo_ws/alife-2024-phylo-tutorial/pyenv/include -I/opt/homebrew/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/include/python3.11 -c systematics_bindings.cpp -o build/temp.macosx-12-arm64-cpython-311/systematics_bindings.o -fvisibility=hidden -g0 -std=c++20 -mmacosx-version-min=10.14
      systematics_bindings.cpp:12:10: fatal error: 'Empirical/include/emp/Evolve/Systematics.hpp' file not found
      #include "Empirical/include/emp/Evolve/Systematics.hpp"
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      1 error generated.
      error: command '/usr/bin/clang' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for phylotrackpy
Failed to build phylotrackpy
ERROR: Could not build wheels for phylotrackpy, which is required to install pyproject.toml-based projects

Computational environment (please complete the following information):

  • OS: macOS Sonoma v14.5 (23F79)
  • Python version: 3.11.3

Requested feature: `GetNumTips()`

Is your feature request related to a problem? Please describe.
Given that interior phylogeny nodes may be marked active, a direct way to count tip nodes besides GetNumActive would be useful.
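
To show the distinction being asked for, here is a minimal sketch in plain Python. It is not an existing phylotrackpy call: `count_tips` and the child-to-parent dictionary are hypothetical stand-ins for whatever the real interface would look like.

```python
def count_tips(parent_of):
    """Count taxa that never appear as anyone's parent (i.e. leaf nodes)."""
    parents = {p for p in parent_of.values() if p is not None}
    return sum(1 for taxon in parent_of if taxon not in parents)


# Hypothetical toy phylogeny (child -> parent); A is active but interior.
parent_of = {"A": None, "B": "A", "C": "A", "D": "B"}
active = {"A", "C", "D"}

print(len(active), count_tips(parent_of))  # 3 2
```

The active count (3) and the tip count (2) differ because A is still alive while also being an interior node.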

JOSS Review: Functionality, documentation

JOSS Review issue

General documentation

I very much appreciate the detailed introduction and explanations regarding the subtleties in running in-silico vs. in-vitro or in-vivo phylogenetic tracking.

Usage example

That being said, I'm missing similar in-depth usage instructions at a more concrete level than what is provided in the docs. The usage example does not go beyond showing how to instantiate the Systematics class. What can I do with the class from there on? I tried adding an organism, but needed to mock up an Organism class first. Unfortunately, I'm left entirely on my own with that task. It would be helpful to have some minimum requirements for such an object or, even better, a template (abstract base class).

Writing an Organism class

Here's my attempt:

from phylotrackpy import systematics

syst = systematics.Systematics(lambda org: org.genotype)

class Organism():
    genotype = "ACTG"


org0 = Organism()

org0.genotype

syst.add_org(org0)

This code lived in a Jupyter notebook; executing the last line of code crashed the kernel.

Coding style

I would recommend not using sys as a local variable name, as it shadows the Python standard library module of the same name (which many Python programs import).

Usage example recommendation

For a proper usage example I would expect to be instructed how to build a phylogeny from scratch and how to analyse it. You could also provide the data for the figure below (from the README) and instruct the user how to reproduce it.

Copied from README.md

API documentation

Please complete the API documentation and, if done, remove the note "Partially under construction". Probably related to #13 and #22

`load_from_file` takes >10 min for large files

Describe the bug
The load_from_file function takes suspiciously long to load big data files (~400k rows).

To Reproduce
Steps to reproduce the behavior:

Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from phylotrackpy import systematics
>>> sys = systematics.Systematics(lambda x: x)
>>> sys.load_from_file("full.csv", "id", True)

full.csv

Expected behavior
go fast(er)! 😅

Computational environment (please complete the following information):

  • OS: popOS
  • Python version 3.10

Additional context
phylotrackpy version 0.1.12

Clearer error on comparison between unsupported taxon_info types

Currently, trying to mix incompatible taxon_info types (i.e. those that don't support == comparison with each other) in the same systematics manager throws a deprecation warning and nothing more. However, it can result in incorrect phylogenies. We should throw a real error in this case.

Loading from bad path crashes Python interpreter

Describe the bug
Loading from a bad path should give an exception but should not crash the Python interpreter.

To Reproduce

from phylotrackpy import systematics

syst = systematics.Systematics(lambda x: x)
syst.load_from_file("", "id")
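
Until the binding raises a proper exception, callers can guard the call themselves. The helper below is a hypothetical workaround in plain Python (the name `checked_path` is not part of phylotrackpy); it only validates the path and leaves the actual loading to load_from_file.

```python
from pathlib import Path


def checked_path(path):
    """Return the path as a string, or raise if it does not name a file."""
    if not path or not Path(path).is_file():
        raise FileNotFoundError(f"No phylogeny file at {path!r}")
    return str(path)


# Intended use (assumes syst is a Systematics object as in the report above):
#     syst.load_from_file(checked_path("tree.csv"), "id")

try:
    checked_path("")
    raised = False
except FileNotFoundError as err:
    raised = True
    print(err)  # No phylogeny file at ''
```

This turns the bad-path case into an ordinary Python exception before control ever reaches the C++ code.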

JOSS review - exporting data

JOSS Review issue

Is your feature request related to a problem? Please describe.
Often we'll want to do a downstream analysis on evolutionary trees from a simulation. For instance, beyond the output available within Phylotrack, I may want to interrogate originations and extinctions of my taxa in another language, and conduct some comparative phylogenetic methods on that in concert with my genotypes.

Describe the solution you'd like
From interrogating the docs and the paper, I see that the data can be output in the Artificial Life Phylogeny Data Standard format, and the paper suggests that this can be converted to e.g. Newick. When thinking about adding usage examples to your documentation (issue #34), would it be possible to add a workflow that demonstrates this for the uninitiated? I think that emphasising the interoperability of this tool will highlight its strengths and potential uses!
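
As a starting point for such a workflow, here is a small standard-library sketch that turns a standard-format table into a Newick string. It is not the converter the paper refers to: the function name `alife_to_newick` is hypothetical, and the code assumes asexual data (one ancestor per row), that every referenced ancestor has its own row, and that only the id and ancestor_list columns matter.

```python
import csv
import io


def alife_to_newick(csv_text):
    """Convert id/ancestor_list rows into Newick, labelling internal nodes."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    children = {row["id"]: [] for row in rows}
    roots = []
    for row in rows:
        ancestor = row["ancestor_list"].strip("[]").strip()
        if ancestor.upper() == "NONE":
            roots.append(row["id"])
        else:
            children[ancestor].append(row["id"])

    def render(node):
        kids = children[node]
        if not kids:
            return node
        # Newick writes the children in parentheses, followed by the node label.
        return "(" + ",".join(render(k) for k in kids) + ")" + node

    # One Newick tree per root, one per line.
    return "\n".join(render(root) + ";" for root in roots)


table = "\n".join([
    "id,ancestor_list",
    "1,[NONE]",
    "2,[1]",
    "3,[1]",
    "4,[2]",
])

newick = alife_to_newick(table)
print(newick)  # ((4)2,3)1;
```

Note that taxon 2 appears as a labelled internal node with a single child, which some downstream tools may not accept. The recursive `render` is also only suitable for shallow trees; a deep phylogeny would need an iterative traversal.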

JOSS Review: Input data format conversion

JOSS Review issue

This is not a critical review issue but I came across it while reviewing:

Is your feature request related to a problem? Please describe.
Reading phylogenies stored in foreign file formats seems not to be supported.

Describe the solution you'd like
I'm running stochastic simulations and storing the resulting trajectories in a Newick tree file or in a JSON document. Now I want to analyse my data with phylotrackpy. How can I get the data into phylotrackpy?

JOSS Review: Installation

JOSS Review issue

Installation proceeds as claimed in the README: pip install phylotrackpy. However, I would suggest adding instructions on how to install the latest development snapshot (e.g. via pip install git+https://github.com/emilydolson/phylotrackpy) and how to install from source. The latter is important for developers and potential contributors. You could also link to the corresponding section in phylotrackpy/CONTRIBUTING.md.

Allow real-time collapsing of unifurcations to save space

This should actually happen in Empirical, but I'm putting this issue here so we don't lose it.

@mmore500 had a good point today about how we can collapse the "stem" of the phylogeny after coalescence events to save space. However, that won't help in the (relatively common) case where we have a couple of very deep branches maintained forever by ecological dynamics. We could handle both cases by allowing all unifurcations to be collapsed in real time. This could potentially save a decent amount of space.
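
A batch (not real-time) version of the idea can be sketched in plain Python on a child-to-parent dictionary. This is only an illustration of what collapsing unifurcations means, not a proposal for the Empirical data structures; `collapse_unifurcations` and its `keep` parameter are hypothetical, and the sketch deliberately leaves the root in place.

```python
def collapse_unifurcations(parent_of, keep=()):
    """Drop non-root nodes with exactly one child, reattaching that child upward.

    Taxa listed in keep (e.g. extant taxa) are never removed.
    """
    child_count = {}
    for parent in parent_of.values():
        if parent is not None:
            child_count[parent] = child_count.get(parent, 0) + 1

    def removable(taxon):
        return (
            child_count.get(taxon, 0) == 1
            and taxon not in keep
            and parent_of[taxon] is not None
        )

    collapsed = {}
    for taxon, parent in parent_of.items():
        if removable(taxon):
            continue
        # Skip over any chain of removable ancestors.
        while parent is not None and removable(parent):
            parent = parent_of[parent]
        collapsed[taxon] = parent
    return collapsed


# Hypothetical toy phylogeny: a sits on the stem between r and b.
parent_of = {"r": None, "a": "r", "b": "a", "c": "b", "d": "b", "e": "r"}

slim = collapse_unifurcations(parent_of)
print(slim)  # {'r': None, 'b': 'r', 'c': 'b', 'd': 'b', 'e': 'r'}
```

Collapsing never changes how many children the surviving nodes have, so the branching structure used by topology metrics is preserved while the non-branching node `a` no longer takes up space.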

Packaging: wheel not available, build from source fails

Describe the bug
Wheel is not available on HPCC. Fallback to build from source fails

To Reproduce

module purge || :
module load GCCcore/12.2.0 Python/3.10.8 || :
python3 -m venv env
source env/bin/activate
python3 -m pip install "phylotrackpy==0.2.0"
Collecting phylotrackpy==0.2.0
  Using cached phylotrackpy-0.2.0.tar.gz (18 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: phylotrackpy
  Building wheel for phylotrackpy (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building wheel for phylotrackpy (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [16 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-310
      creating build/lib.linux-x86_64-cpython-310/phylotrackpy
      copying phylotrackpy/__init__.py -> build/lib.linux-x86_64-cpython-310/phylotrackpy
      running build_ext
      building 'phylotrackpy.systematics' extension
      creating build/temp.linux-x86_64-cpython-310
      gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -O2 -ftree-vectorize -march=x86-64 -mtune=generic -fno-math-errno -fPIC -O2 -ftree-vectorize -march=x86-64 -mtune=generic -fno-math-errno -fPIC -fPIC -DVERSION_INFO=0.2.0 -DEMP_OPTIONAL_THROW_ON=1 -I/tmp/pip-build-env-s4a73k8e/overlay/lib/python3.10/site-packages/pybind11/include -I/mnt/ufs18/home-089/mmore500/env310-/include -I/opt/software/Python/3.10.8-GCCcore-12.2.0/include/python3.10 -c systematics_bindings.cpp -o build/temp.linux-x86_64-cpython-310/systematics_bindings.o -fvisibility=hidden -g0 -std=c++20
      systematics_bindings.cpp:12:10: fatal error: Empirical/include/emp/Evolve/Systematics.hpp: No such file or directory
         12 | #include "Empirical/include/emp/Evolve/Systematics.hpp"
            |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      compilation terminated.
      error: command '/opt/software/GCCcore/12.2.0/bin/gcc' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for phylotrackpy
Failed to build phylotrackpy
ERROR: Could not build wheels for phylotrackpy, which is required to install pyproject.toml-based projects

[notice] A new release of pip available: 22.2.2 -> 24.0
[notice] To update, run: pip install --upgrade pip

Expected behavior
The necessary wheel to work with the HPCC should be deployed to PyPI.
Additionally, the Empirical source should be packaged into the source distribution on PyPI so that fallback source builds do not fail.

Computational environment (please complete the following information):

  • OS: dev-amd20 CentOS
  • Python version: 3.10

Additional context
This workaround to install directly from GitHub works

python3 -m pip install git+https://github.com/emilydolson/[email protected]

Tests fail on wheels for windows

I discovered this bug while adjusting the wheel tests performed by cibuildwheel (tests previously weren't run on built wheels).

Describe the bug
Wheels built by cibuildwheel are failing unit tests on Windows. For now, I've disabled wheel tests on Windows.

To Reproduce
Re-enable Windows wheel tests by deleting this line.

CIBW_TEST_SKIP: "*win32"

Expected behavior
Wheel tests should pass on Windows.

Computational environment (please complete the following information):

  • OS: windows
  • Python version: CPython 3.6

Additional context
Here is the full error from log https://github.com/emilydolson/phylotrackpy/actions/runs/6986563896/job/19012150190

  ================================== FAILURES ===================================
  ___________________________ test_systematics[taxa0] ___________________________
  
  taxa = ['hello', 'hello 2']
  
      @mark.parametrize(
          "taxa",
          (
              ["hello", "hello 2"],
              [1, 2],
              [1.0, 2.0],
              [[1], [1, 2]],
              [np.array([1]), np.array([1, 2])],
          ),
      )
      def test_systematics(taxa):
          tax1, tax2 = taxa
          sys = systematics.Systematics(taxon_info_fun, True, True, False, False)
          org = ExampleOrg(tax1)
          org2 = ExampleOrg(tax2)
          org_tax = systematics.Taxon(0, tax1)
  >       org2_tax = sys.add_org(org2, org_tax)
  E       RuntimeError: Internal Error (in D:\a\phylotrackpy\phylotrackpy\Empirical/include/emp/Evolve/Systematics.hpp line 1458): !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.",
  E       !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.": [1]
  
  D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py:65: RuntimeError
  ___________________________ test_systematics[taxa1] ___________________________
  
  taxa = [1, 2]
  
      @mark.parametrize(
          "taxa",
          (
              ["hello", "hello 2"],
              [1, 2],
              [1.0, 2.0],
              [[1], [1, 2]],
              [np.array([1]), np.array([1, 2])],
          ),
      )
      def test_systematics(taxa):
          tax1, tax2 = taxa
          sys = systematics.Systematics(taxon_info_fun, True, True, False, False)
          org = ExampleOrg(tax1)
          org2 = ExampleOrg(tax2)
          org_tax = systematics.Taxon(0, tax1)
  >       org2_tax = sys.add_org(org2, org_tax)
  E       RuntimeError: Internal Error (in D:\a\phylotrackpy\phylotrackpy\Empirical/include/emp/Evolve/Systematics.hpp line 1458): !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.",
  E       !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.": [1]
  
  D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py:65: RuntimeError
  ___________________________ test_systematics[taxa2] ___________________________
  
  taxa = [1.0, 2.0]
  
      @mark.parametrize(
          "taxa",
          (
              ["hello", "hello 2"],
              [1, 2],
              [1.0, 2.0],
              [[1], [1, 2]],
              [np.array([1]), np.array([1, 2])],
          ),
      )
      def test_systematics(taxa):
          tax1, tax2 = taxa
          sys = systematics.Systematics(taxon_info_fun, True, True, False, False)
          org = ExampleOrg(tax1)
          org2 = ExampleOrg(tax2)
          org_tax = systematics.Taxon(0, tax1)
  >       org2_tax = sys.add_org(org2, org_tax)
  E       RuntimeError: Internal Error (in D:\a\phylotrackpy\phylotrackpy\Empirical/include/emp/Evolve/Systematics.hpp line 1458): !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.",
  E       !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.": [1]
  
  D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py:65: RuntimeError
  ___________________________ test_systematics[taxa3] ___________________________
  
  taxa = [[1], [1, 2]]
  
      @mark.parametrize(
          "taxa",
          (
              ["hello", "hello 2"],
              [1, 2],
              [1.0, 2.0],
              [[1], [1, 2]],
              [np.array([1]), np.array([1, 2])],
          ),
      )
      def test_systematics(taxa):
          tax1, tax2 = taxa
          sys = systematics.Systematics(taxon_info_fun, True, True, False, False)
          org = ExampleOrg(tax1)
          org2 = ExampleOrg(tax2)
          org_tax = systematics.Taxon(0, tax1)
  >       org2_tax = sys.add_org(org2, org_tax)
  E       RuntimeError: Internal Error (in D:\a\phylotrackpy\phylotrackpy\Empirical/include/emp/Evolve/Systematics.hpp line 1458): !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.",
  E       !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.": [1]
  
  D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py:65: RuntimeError
  ___________________________ test_systematics[taxa4] ___________________________
  
  taxa = [array([1]), array([1, 2])]
  
      @mark.parametrize(
          "taxa",
          (
              ["hello", "hello 2"],
              [1, 2],
              [1.0, 2.0],
              [[1], [1, 2]],
              [np.array([1]), np.array([1, 2])],
          ),
      )
      def test_systematics(taxa):
          tax1, tax2 = taxa
          sys = systematics.Systematics(taxon_info_fun, True, True, False, False)
          org = ExampleOrg(tax1)
          org2 = ExampleOrg(tax2)
          org_tax = systematics.Taxon(0, tax1)
  >       org2_tax = sys.add_org(org2, org_tax)
  E       RuntimeError: Internal Error (in D:\a\phylotrackpy\phylotrackpy\Empirical/include/emp/Evolve/Systematics.hpp line 1458): !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.",
  E       !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.": [1]
  
  D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py:65: RuntimeError
  _______________________ test_taxa_serialization[taxa0] ________________________
  
  taxa = ['hello', 'hello 2']
  
      @mark.parametrize(
          "taxa",
          (
              ["hello", "hello 2"],
              [1, 2],
              [1.0, 2.0],
              [[1], [1, 2]],
              [np.array([1]), np.array([1, 2])],
          ),
      )
      def test_taxa_serialization(taxa):
          tax1, tax2 = taxa
          sys = systematics.Systematics(taxon_info_fun, True, True, False, False)
          org = ExampleOrg(tax1)
          org2 = ExampleOrg(tax2)
  >       org_tax = sys.add_org(org)
  E       RuntimeError: Internal Error (in D:\a\phylotrackpy\phylotrackpy\Empirical/include/emp/Evolve/Systematics.hpp line 1458): !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.",
  E       !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.": [1]
  
  D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py:99: RuntimeError
  _______________________ test_taxa_serialization[taxa1] ________________________
  
  taxa = [1, 2]
  
      @mark.parametrize(
          "taxa",
          (
              ["hello", "hello 2"],
              [1, 2],
              [1.0, 2.0],
              [[1], [1, 2]],
              [np.array([1]), np.array([1, 2])],
          ),
      )
      def test_taxa_serialization(taxa):
          tax1, tax2 = taxa
          sys = systematics.Systematics(taxon_info_fun, True, True, False, False)
          org = ExampleOrg(tax1)
          org2 = ExampleOrg(tax2)
  >       org_tax = sys.add_org(org)
  E       RuntimeError: Internal Error (in D:\a\phylotrackpy\phylotrackpy\Empirical/include/emp/Evolve/Systematics.hpp line 1458): !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.",
  E       !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.": [1]
  
  D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py:99: RuntimeError
  _______________________ test_taxa_serialization[taxa2] ________________________
  
  taxa = [1.0, 2.0]
  
      @mark.parametrize(
          "taxa",
          (
              ["hello", "hello 2"],
              [1, 2],
              [1.0, 2.0],
              [[1], [1, 2]],
              [np.array([1]), np.array([1, 2])],
          ),
      )
      def test_taxa_serialization(taxa):
          tax1, tax2 = taxa
          sys = systematics.Systematics(taxon_info_fun, True, True, False, False)
          org = ExampleOrg(tax1)
          org2 = ExampleOrg(tax2)
  >       org_tax = sys.add_org(org)
  E       RuntimeError: Internal Error (in D:\a\phylotrackpy\phylotrackpy\Empirical/include/emp/Evolve/Systematics.hpp line 1458): !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.",
  E       !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.": [1]
  
  D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py:99: RuntimeError
  _______________________ test_taxa_serialization[taxa3] ________________________
  
  taxa = [[1], [1, 2]]
  
      @mark.parametrize(
          "taxa",
          (
              ["hello", "hello 2"],
              [1, 2],
              [1.0, 2.0],
              [[1], [1, 2]],
              [np.array([1]), np.array([1, 2])],
          ),
      )
      def test_taxa_serialization(taxa):
          tax1, tax2 = taxa
          sys = systematics.Systematics(taxon_info_fun, True, True, False, False)
          org = ExampleOrg(tax1)
          org2 = ExampleOrg(tax2)
  >       org_tax = sys.add_org(org)
  E       RuntimeError: Internal Error (in D:\a\phylotrackpy\phylotrackpy\Empirical/include/emp/Evolve/Systematics.hpp line 1458): !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.",
  E       !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.": [1]
  
  D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py:99: RuntimeError
  _______________________ test_taxa_serialization[taxa4] ________________________
  
  taxa = [array([1]), array([1, 2])]
  
      @mark.parametrize(
          "taxa",
          (
              ["hello", "hello 2"],
              [1, 2],
              [1.0, 2.0],
              [[1], [1, 2]],
              [np.array([1]), np.array([1, 2])],
          ),
      )
      def test_taxa_serialization(taxa):
          tax1, tax2 = taxa
          sys = systematics.Systematics(taxon_info_fun, True, True, False, False)
          org = ExampleOrg(tax1)
          org2 = ExampleOrg(tax2)
  >       org_tax = sys.add_org(org)
  E       RuntimeError: Internal Error (in D:\a\phylotrackpy\phylotrackpy\Empirical/include/emp/Evolve/Systematics.hpp line 1458): !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.",
  E       !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.": [1]
  
  D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py:99: RuntimeError
  ____________________________ test_shared_ancestor _____________________________
  
      def test_shared_ancestor():
          sys = systematics.Systematics(taxon_info_fun, True, True, False, False)
          org1 = ExampleOrg("hello")
          org2 = ExampleOrg("hello 2")
          org3 = ExampleOrg("hello 3")
          org4 = ExampleOrg("hello 4")
      
  >       org1_tax = sys.add_org(org1)
  E       RuntimeError: Internal Error (in D:\a\phylotrackpy\phylotrackpy\Empirical/include/emp/Evolve/Systematics.hpp line 1458): !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.",
  E       !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.": [1]
  
  D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py:123: RuntimeError
  ____________________________ test_phylostatistics _____________________________
  
      def test_phylostatistics():
          sys = systematics.Systematics(str, True, True, False, False)
      
          sys.set_update(0)
  >       id1 = sys.add_org(25)
  E       RuntimeError: Internal Error (in D:\a\phylotrackpy\phylotrackpy\Empirical/include/emp/Evolve/Systematics.hpp line 1458): !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.",
  E       !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.": [1]
  
  D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py:157: RuntimeError
  ________________________________ test_deepcopy ________________________________
  
      def test_deepcopy():
          sys = systematics.Systematics(lambda x: x, True, True, False, False)
  >       tax = sys.add_org("hello")
  E       RuntimeError: Internal Error (in D:\a\phylotrackpy\phylotrackpy\Empirical/include/emp/Evolve/Systematics.hpp line 1458): !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.",
  E       !store_position && "Trying to add org to position-tracking systematics manager without position. Either specify a valid position or turn of position tracking for systematic manager.": [1]
  
  D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py:209: RuntimeError
  =========================== short test summary info ===========================
  FAILED D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py::test_systematics[taxa0]
  FAILED D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py::test_systematics[taxa1]
  FAILED D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py::test_systematics[taxa2]
  FAILED D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py::test_systematics[taxa3]
  FAILED D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py::test_systematics[taxa4]
  FAILED D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py::test_taxa_serialization[taxa0]
  FAILED D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py::test_taxa_serialization[taxa1]
  FAILED D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py::test_taxa_serialization[taxa2]
  FAILED D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py::test_taxa_serialization[taxa3]
  FAILED D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py::test_taxa_serialization[taxa4]
  FAILED D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py::test_shared_ancestor
  FAILED D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py::test_phylostatistics
  FAILED D:\a\phylotrackpy\phylotrackpy\test\test_systematics.py::test_deepcopy
  ======================== 13 failed, 5 passed in 6.47s =========================
Error: Command python -m pytest --import-mode=importlib D:\a\phylotrackpy\phylotrackpy/test failed with code 1. None

Outstanding API documentation changes

  • Write/find section comparing positional to non-positional tracking
    • Link it in the set_store_position docstring
  • Test nullptr -> None conversion in parent() and beyond
  • Add support for Sphinx Tippy or some other tool tip extension
  • Add Sphinx bibliographies (can be based on Empirical implementation)
  • Change all "systematics manager" to "phylotrackpy"

Mysterious interaction with Jupyter/IPython causes deserialization to hang

Describe the bug

This bug is deeply, deeply cursed. 🧙

Somehow, the Jupyter environment interacts with phylotrackpy in a way that causes deserialization to hang.
However, after interrupting the hung cell and running it again, it works fine. The hang does not occur when running interactively in a shell.

To work around the hang, run phylotrackpy operations in a forked process using multiprocessing.
Here's some example code for the next time this issue is encountered.

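import multiprocessing

import alifedata_phyloinformatics_convert as apc
import more_itertools as mit
import pandas as pd
from tqdm import tqdm

# `df` and `alifestd_sum_origin_time_deltas_asexual` are assumed to be
# defined earlier in the notebook.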

records = []
for replicate, tree_df in tqdm(df.groupby("replicate")):
    tree_df = tree_df.reset_index(drop=True)
    attrs = {
        col: mit.one(tree_df[col].unique())
        for col in tree_df.columns
        if len(tree_df[col].unique()) == 1 and col not in ["dataSource"]
    }

    def calc_mean_evolutionary_distinctiveness(tree_df):
        tree = apc.RosettaTree(tree_df).as_phylotrack
        return tree.get_mean_evolutionary_distinctiveness(
            tree_df["origin_time"].max()
        )

    def get_mean_evolutionary_distinctiveness(tree_df):
        with multiprocessing.Pool(1) as pool:
            result = pool.map(calc_mean_evolutionary_distinctiveness, [tree_df])
            return result[0]

    records.append(
        {
            **attrs,
            "replicate": replicate,
            "metric": "sum branch lengths",
            "value": alifestd_sum_origin_time_deltas_asexual(tree_df),
        },
    )
    records.append(
        {
            **attrs,
            "replicate": replicate,
            "metric": "mean evolutionary distinctiveness",
            "value": get_mean_evolutionary_distinctiveness(tree_df),
        },
    )

dfmetrics = pd.DataFrame.from_records(records)
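
The workaround above boils down to one pattern: hand a call to a single-worker pool so that it executes in a forked child process. A minimal, generic sketch of that pattern follows. `run_in_forked_process` is a hypothetical helper name, not part of phylotrackpy, and requesting the "fork" start method explicitly is an assumption that limits the sketch to platforms that support fork (so not Windows).

```python
import multiprocessing


def run_in_forked_process(func, arg):
    """Call func(arg) in a one-worker forked pool and return the result.

    Hypothetical helper for illustration; func and arg must be picklable.
    """
    # Assumption: the "fork" start method is available on this platform.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(1) as pool:
        # apply() blocks until the child process returns the result.
        return pool.apply(func, (arg,))


if __name__ == "__main__":
    # Stand-in for a phylotrackpy computation: sum a list in the child.
    print(run_in_forked_process(sum, [1, 2, 3]))
```

Because the pool is created and torn down on every call, each computation runs in a fresh child process. That costs some startup time per call but keeps the parent notebook kernel isolated from whatever state triggers the hang.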

To Reproduce
Steps to reproduce the behavior:

mv reproduce.ipynb{.json,}
jupyter nbconvert --execute --inplace reproduce.ipynb

Also occurs in interactive notebook sessions.

out.csv
reproduce.ipynb.json

  • the notebook was renamed to .json for compatibility with GitHub issue uploads

Expected behavior
Notebooks should not hang when using phylotrackpy.

Computational environment:

  • OS: Fedora
  • Python version 3.10.4
Last updated: 2024-04-13T15:52:06.351382-04:00

Python implementation: CPython
Python version       : 3.10.14
IPython version      : 8.22.1

Compiler    : GCC 13.2.1 20240316 (Red Hat 13.2.1-7)
OS          : Linux
Release     : 6.8.4-200.fc39.x86_64
Machine     : x86_64
Processor   : 
CPU cores   : 8
Architecture: 64bit

Git hash: 008cdf2d71f41e37c3f6b4539121ff39fb53310e

Git branch: debug2

phylotrackpy                      : 0.2.0
seaborn                           : 0.13.2
numpy                             : 1.23.5
alifedata_phyloinformatics_convert: 0.16.2
more_itertools                    : 9.1.0
pandas                            : 1.5.2
joblib                            : 1.3.2
teeplot                           : 1.0.1
keyname                           : 0.5.2

Watermark: 2.4.3

Additional context
Mostly adding this to the issue tracker so we can +1 it if encountered again.

Column key `"info"` defaulted in `load_from_file` isn't included by default when calling `snapshot`

As a result, files written by `snapshot` with default parameters cannot be loaded by `load_from_file` with default parameters, which is likely to confuse end users.

Option 1: change the default taxon column key to "id", which is guaranteed to be present in alife standard data
Option 2: serialize taxa to an "info" column by default, like

sys.add_snapshot_fun(systematics.encode_taxon, "info")

Option 3: leave as-is and document
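
Until one of these options is adopted, a pre-flight check on the file header can surface the mismatch before loading. The sketch below uses only the standard library and does not call phylotrackpy. `missing_columns` is a hypothetical helper, and the sample headers are illustrative rather than the exact set of columns `snapshot` writes.

```python
import csv
import io


def missing_columns(csv_text, required=("info",)):
    """Return the required column names that are absent from the CSV header."""
    # Read only the first row; an empty file yields an empty header.
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [name for name in required if name not in header]


# Illustrative headers (assumed, not taken from real snapshot output).
without_info = "id,ancestor_list,origin_time\n0,[none],0\n"
with_info = "id,ancestor_list,origin_time,info\n0,[none],0,hello\n"

print(missing_columns(without_info))
print(missing_columns(with_info))
```

A non-empty result for the default `required` tuple means the file lacks the "info" column that `load_from_file` looks for by default, so a different taxon column key would need to be passed.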

JOSS Review: Paper

JOSS Review issue

The paper reads well and leaves only a few minor details to be desired:

  • line 164: Ref. "Moreno et al (under review)" seems to be unpublished. Is it possible to link to a preprint instead?
  • There is no URL link for the reference to pybind (Jakob, 2017).
