csvoss / syndra

Convert high-level facts about biology into an executable model, using logical deduction (the name is a pun on synthesis + INDRA)

Python 31.18% Shell 0.56% C 65.12% C++ 3.14%

syndra's Issues

More more tests!

Finish coverage of Predicate and AtomicPredicate, and add tests for whatever else seems useful as well.

Write abstract for Masterworks by March 29

Form is due March 31.

https://www.eecs.mit.edu/academics-admissions/masterworks-2016/registration-form-students

More tests!

In particular, to help me debug check_sat.

First stab at PySB->Kappa translation

Make this a pip-installable package

Test deduction rules: confirm A phosphorylates B + phosphorylated B is active => A activates B

This should take the form of a test case that asserts the first two predicates, gets a model, then checks the satisfiability of the third predicate on the resulting model.
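
A minimal sketch of the shape such a test could take. It uses a toy propositional stand-in rather than Syndra's real predicates or z3, so `ATOMS`, `find_model`, and `deduction_rule` are all hypothetical names introduced only for illustration:

```python
from itertools import product

# Hypothetical propositional stand-ins for the three statements.
ATOMS = ("a_phosphorylates_b", "phospho_b_is_active", "a_activates_b")


def find_model(constraints):
    """Return the first truth assignment satisfying every constraint, or None."""
    for values in product([False, True], repeat=len(ATOMS)):
        model = dict(zip(ATOMS, values))
        if all(constraint(model) for constraint in constraints):
            return model
    return None


def deduction_rule(model):
    """The rule under test: both premises together imply the conclusion."""
    premises = model["a_phosphorylates_b"] and model["phospho_b_is_active"]
    return (not premises) or model["a_activates_b"]


def test_phosphorylation_implies_activation():
    # Assert the first two predicates (plus the rule), then get a model.
    model = find_model([
        lambda m: m["a_phosphorylates_b"],
        lambda m: m["phospho_b_is_active"],
        deduction_rule,
    ])
    assert model is not None
    # Check the third predicate on the resulting model.
    assert model["a_activates_b"]
```

In the real test the constraints would be Syndra predicates and the model would come from the solver; only the assert, get model, check structure is meant to carry over.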

Make solver.py behave like z3's solver, but for Syndra predicates instead of z3 predicates

I think this would be a better abstraction.

Then I could add Syndra predicates, push, and pop, and solver.py would maintain the state of the z3 solver it's using, and I would be able to do stuff like

solver.add(Syndra predicate)
solver.get_model()
solver.push()
solver.add(Syndra predicate)
solver.get_model()
solver.pop()

and have that all do the right thing.

Maybe this is what my solver was supposed to do all along, and now I'm just realizing it!
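
A rough sketch of the bookkeeping this would need, assuming predicates can be treated as opaque objects. The class name `PredicateSolver` and the `assertions` method are hypothetical, and `get_model` is left out because it depends on how predicates are translated to z3:

```python
class PredicateSolver:
    """Keeps a stack of assertion frames, mirroring add/push/pop on a z3 solver."""

    def __init__(self):
        # The bottom frame holds assertions made before any push().
        self._frames = [[]]

    def add(self, predicate):
        """Record a predicate in the current (innermost) frame."""
        self._frames[-1].append(predicate)

    def push(self):
        """Open a new frame; later add() calls can be undone with pop()."""
        self._frames.append([])

    def pop(self):
        """Discard every predicate added since the matching push()."""
        if len(self._frames) == 1:
            raise IndexError("pop() without a matching push()")
        self._frames.pop()

    def assertions(self):
        """All predicates currently in force, oldest first."""
        return [p for frame in self._frames for p in frame]
```

Since z3's own `Solver` already provides `add`, `push`, and `pop`, a real version could probably forward each call to the underlying z3 solver after translating the predicate, instead of keeping its own stack.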

More features in structure.py and predicate.py

(from a conversation with Hector)

For predicate.py:
- ExistsNode
- Something that will be powerful enough to allow us to encode that if a site is labeled ThreoninePhosphorylated, then it must be labeled Phosphorylated and also be labeled HasThreonine

For structure.py:
- with_parent
- "NOT" versions of all of the structure.py methods

Replace higher-order functions with functions over arrays

Get private work wiki up and running

Fix tests to work with slightly-changed API

Implement a system to manage labels intelligently

Allow multiple labels and choosing from the powerset of possible labels.
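
To make "the powerset of possible labels" concrete, here is a small sketch in plain Python. The function name `label_powerset` is hypothetical, and labels are assumed to be plain strings:

```python
from itertools import chain, combinations


def label_powerset(labels):
    """Return every subset of `labels` as a frozenset, smallest subsets first."""
    items = sorted(labels)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return [frozenset(subset) for subset in subsets]
```

For n labels this yields 2^n subsets, so an explicit enumeration like this only makes sense for small label sets.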

Bare-bones draft

Introduction + proposed work.

Solve Occam's Razor problem

(description copy-pasted from Dropbox Paper)

Currently, from the following Syndra predicate…

ModelHasRule(lambda r: And(
    PregraphHas(r, kinase.labeled(active)),
    PregraphHas(r, substrate),
    PostgraphHas(r, kinase.labeled(active)),
    PostgraphHas(r, substrate.labeled(phosphate)),
    Not(PregraphHas(r, substrate.labeled(phosphate))),
))

…we get the following model:

[Rule({substrate with links to kinase; kinase-(active, phosphate) with links to substrate} -> {substrate-(active, phosphate) with links to kinase; kinase-(active, phosphate) with links to substrate})]

This is a list consisting of one rule, which says that substrate bound to active-phosphorylated kinase becomes active-phosphorylated substrate bound to active-phosphorylated kinase, technically satisfying the Syndra predicate even though there are unnecessary labels and bindings.

It’s even worse with the model we get from Walter’s example:

[Rule({RAF-(GTP, phosphate) with links to ERK1, HRAS, MEK1, SAF1; ERK1 with links to RAF, HRAS, MEK1, SAF1; HRAS with links to RAF, ERK1, MEK1, SAF1; MEK1 with links to RAF, ERK1, HRAS, SAF1; SAF1 with links to RAF, ERK1, HRAS, MEK1} -> {RAF-(GTP, phosphate) with links to ERK1, ERK1, HRAS, HRAS, MEK1, MEK1, SAF1, SAF1; ERK1-(GTP, phosphate) with links to RAF, RAF, HRAS, HRAS, MEK1, MEK1, SAF1, SAF1; HRAS-(GTP, phosphate) with links to RAF, RAF, ERK1, ERK1, MEK1, MEK1, SAF1, SAF1; MEK1-(GTP, phosphate) with links to RAF, RAF, ERK1, ERK1, HRAS, HRAS, SAF1, SAF1; SAF1-(GTP, phosphate) with links to RAF, RAF, ERK1, ERK1, HRAS, HRAS, MEK1, MEK1}), Rule({RAF with links to ERK1, HRAS, MEK1, SAF1; ERK1-(GTP, phosphate) with links to RAF, HRAS, MEK1, SAF1; HRAS with links to RAF, ERK1, MEK1, SAF1; MEK1 with links to RAF, ERK1, HRAS, SAF1; SAF1 with links to RAF, ERK1, HRAS, MEK1} -> {RAF with links to ERK1, HRAS, MEK1, SAF1; ERK1 with links to RAF, HRAS, MEK1, SAF1; HRAS with links to RAF, ERK1, MEK1, SAF1; MEK1 with links to RAF, ERK1, HRAS, SAF1; SAF1 with links to RAF, ERK1, HRAS, MEK1}), Rule({RAF-(GTP, phosphate) with links to ERK1, HRAS, MEK1, SAF1; ERK1-(GTP, phosphate) with links to RAF, HRAS, MEK1, SAF1; HRAS-(GTP, phosphate) with links to RAF, ERK1, MEK1, SAF1; MEK1-(GTP, phosphate) with links to RAF, ERK1, HRAS, SAF1; SAF1-(GTP, phosphate) with links to RAF, ERK1, HRAS, MEK1} -> {RAF-(GTP, phosphate) with links to ERK1, HRAS, MEK1, SAF1; ERK1-(GTP, phosphate) with links to RAF, HRAS, MEK1, SAF1; HRAS-(GTP, phosphate) with links to RAF, ERK1, MEK1, SAF1; MEK1-(GTP, phosphate) with links to RAF, ERK1, HRAS, SAF1; SAF1-(GTP, phosphate) with links to RAF, ERK1, HRAS, MEK1})]

There are two ways to fix these unnecessary labels:

1. Implement a minimizer: minimize the number of rules, links, and labels.
2. Implement some Syndra predicates to allow the user to require that links or labels not exist.

We need to be able to implement the statement “active Enzyme phosphorylates Substrate at site S222” correctly. In this statement the enzyme is active, so it is probably bound to some agent, not yet known to us, that activates it. This is an argument for implementing a minimizer over implementing more Syndra predicates: an implementation of “active enzyme phosphorylates substrate @ S222” must allow the rule to be extended when later statements clarify what “active” means, while staying minimal when no such statements are added. Extra Syndra predicates cannot do this: if we keep the rule minimal by requiring that the enzyme be bound to nothing else, then new statements can no longer extend it.

However, if we’re trying to minimize the number of rules, maybe we’ll end up with two statements “A+B→C” and “D→E” being combined by Syndra into the single rule “A+B+D→C+E”, so we still want some way of saying that the rule “A+B→C” does not involve any agents we haven’t mentioned yet. This is an argument for implementing more Syndra predicates over implementing a minimizer.

This is a puzzle.

Ideas:

- Implement some combination of both? This is a lot of work.
- Make the minimizer ignore the number of rules? Then it still might spuriously merge rules together.
- Make the minimizer maximize the number of rules? ← probably a bad plan
- Make PregraphHas take in all of the agents/structures, and make it require that no other structures besides those be present unless they are linked; then make linkage way more costly than splitting into separate rules? This works, but it’s weird; why privilege linkage? Surely something must break this.
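
The minimizer option can be sketched as a weighted cost over candidate models. This is only an illustration under simplifying assumptions: `RuleSize`, `cost`, and `minimize` are hypothetical names, each rule is reduced to a count of labels and links, and the candidates are assumed to be enumerated already, which the real solver does not do:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleSize:
    """Hypothetical summary of one rule: how many labels and links it mentions."""
    labels: int
    links: int


def cost(model, rule_weight=1, link_weight=1, label_weight=1):
    """Weighted count of rules, links, and labels in a model (a list of rules)."""
    return (
        rule_weight * len(model)
        + link_weight * sum(rule.links for rule in model)
        + label_weight * sum(rule.labels for rule in model)
    )


def minimize(candidates, **weights):
    """Pick the cheapest candidate model under the given weights."""
    return min(candidates, key=lambda model: cost(model, **weights))
```

With equal weights, one merged rule scores lower than two separate rules carrying the same labels, which is the spurious-merge concern above. Setting `rule_weight=0` corresponds to ignoring the number of rules and leaves those two candidates tied, so it does not settle the puzzle on its own.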

First stab at a macro (policy): macro for "A binds B"

Create boilerplate for library.py

to contain macros and other nuggets of biological information

Fix `pip install syndra`

I added Syndra to PyPI, and `pip search syndra` works, but `pip install syndra` doesn't work yet. Not sure why.

Fix INDRA integration to work with new engine

This should be a port of statements_to_predicates.py. It should actually work as-is: the INDRA pattern matching hasn't changed, and statements_to_predicates.py interfaces with Syndra via macros.py, whose interface hasn't changed.

First stab at PySB grammar

Stable API for Syndra, with documentation

Implement converting z3 models into nice manageable Python set-like objects

Possibly even as a library all on its own.

Guide for integrating Syndra with new KRs

Tidy up INDRA integration into its own directory, as good example for others to consult
Make guide for making a new such directory

Extra functionality for refinements over labels

For example, we should allow the user to make labels like SerinePhosphorylated from which we can infer that a site labeled SerinePhosphorylated would also count as being labeled Phosphorylated as well as Serine.
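
One way to picture the inference is as a closure over a table of refinements. The sketch below is plain Python, not Syndra's label machinery; the `REFINEMENTS` table and `implied_labels` function are hypothetical:

```python
# Hypothetical refinement table: each label maps to the labels it directly implies.
REFINEMENTS = {
    "SerinePhosphorylated": {"Phosphorylated", "Serine"},
}


def implied_labels(label, refinements):
    """Return `label` together with every label it implies, directly or transitively."""
    seen = set()
    stack = [label]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # Labels with no entry in the table imply nothing further.
        stack.extend(refinements.get(current, ()))
    return seen
```

Under this sketch, a site labeled SerinePhosphorylated would count as carrying all three labels: SerinePhosphorylated, Phosphorylated, and Serine.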

Feature idea: print models with Kappaesque syntax

This idea came from a conversation with Hector. A syntax more like Kappa would be better than the conventions I came up with for printing models.

Debug system which should produce phosphorylation rule

Now that I can pretty-print my z3 models, I see that the following Syndra predicate:

ModelHasRule(lambda r: And(
        PregraphHas(r, kinase.labeled(active)),
        PregraphHas(r, substrate),
        PostgraphHas(r, kinase.labeled(active)),
        PostgraphHas(r, substrate.labeled(phosphate)),
)).get_python_model()

produces the following model:

[Rule({kinase-(active, phosphate) with links to kinase} -> {kinase-(active, phosphate) with links to kinase})]

This is in error; the substrate does not become phosphorylated. (Maybe it's incorrectly merging substrate and kinase into the same agent?) Fix this.

Change Node to be an enum

This should help with the Occam's Razor bug.

Subtasks (woah, these checkboxes show up on the issue page!):

Modify predicate.py: simplify API, move extra stuff to solver.py
Move pythonize.py to solver.py
Move datatypes.py to solver.py
Make it so that solver.py keeps track of the nodes that should be in the enum, and calls predicate's get_predicate method with that enum provided
Modify structure.py to use the enum
Modify predicate.py to pass the enum down to structure.py
Modify pythonize.py (now in solver) to be compatible with the enum
Try adding in the edge assertions again, now that everything's been cleaned up

Additional macros (policies)

A phosphorylates B
phosphorylated B is active
A activates B

Causality demo with Walter's example

Given the context from Walter's example (HRAS-MEK-ERK pathway + HRAS is not a kinase), create a demo that is able to infer both (a) that a causal gap with an unboxed statement exists, and (b) that there are a few ways to close the causal gap:

MEK1P phosphorylates SAF1, or
ERK1P phosphorylates SAF1, or
RAF, when bound to HRAS-GTP, phosphorylates SAF1

The goal here is that we can take in a whole pile of statements, have Syndra automatically unbox some of them using the info we get from other statements, and then at the end tell the user if there are any loose ends – unboxed statements still unboxed – remaining.

Example workflow for using Syndra

Prove theorem about only needing a bounded horizon to find nonlocal causal relationships

e.g. if A activates B via a series of steps, prove that we only have to look at finitely many steps (since there are only finitely many rules) in order to show the deduction.
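
The intuition can be illustrated on a heavily simplified model in which each rule is a single "X activates Y" edge. This is an assumption made for the sketch, not how Syndra represents rules, and `activation_distance` is a hypothetical name:

```python
from collections import deque


def activation_distance(rules, source, target):
    """Length of the shortest activation chain from source to target, or None.

    `rules` is an iterable of (activator, activated) pairs.
    """
    graph = {}
    for activator, activated in rules:
        graph.setdefault(activator, set()).add(activated)

    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distance[node]
        for successor in graph.get(node, ()):
            if successor not in distance:
                distance[successor] = distance[node] + 1
                queue.append(successor)
    return None
```

In this simplified setting a shortest chain never revisits an agent, so it uses each rule at most once and its length is at most the number of rules. That is the kind of bound the theorem would need to establish for real Syndra rules.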

Re-check atomic_predicate.py

Do the atomic predicates still work now that sets of nodes and sets of edges are implemented as arrays, not functions?

Fix importing of INDRA dependencies

This has actually been a problem for a while, but it's more annoying now. I need to figure out the intended way to import INDRA. Fixing this may involve making a pull request to INDRA.

I cannot do something like this:

from indra.indra import trips

because the first indra is not a module (it lacks an __init__.py). Even supposing that I fix that, I run into issues where stuff in INDRA assumes that we'll be doing imports a la import indra.whatever, which breaks things when I'm instead doing indra.indra.whatever.

It would be nice if I could simply fix this by using the inner indra subfolder of the outer indra folder as my module, but I can't do that because the inner indra has dependencies on the folder data, which is in the outer indra folder. This feels like an abstraction violation, so I'll see if I can fix it.

Also to consider: looking into whether indra is pip-installable; then we could get rid of the submodule altogether.
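
Until that is sorted out, one possible stopgap is to put the outer folder on `sys.path`, so that `import indra.whatever` resolves to the inner package while the files stay where they are on disk. This is a sketch, not a tested fix for INDRA; the helper name `add_checkout_to_path` and the example path are hypothetical:

```python
import os
import sys


def add_checkout_to_path(outer_dir):
    """Put `outer_dir` at the front of sys.path so packages inside it import directly."""
    outer_dir = os.path.abspath(outer_dir)
    if outer_dir not in sys.path:
        sys.path.insert(0, outer_dir)
    return outer_dir


# Hypothetical usage, assuming the submodule is checked out at ./indra:
# add_checkout_to_path("indra")
# from indra import trips
```

Because nothing is moved, anything the inner package locates relative to its own files, such as the data folder, should still be found, though that is an assumption about how INDRA locates it.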