Giter Club home page Giter Club logo

arrows.jl's People

Contributors

jburroni avatar minasyan avatar zenna avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

Forkers

amit2014 grseb9s

arrows.jl's Issues

Cleanup naming mess

Name are important but are a bit of a mess

  • Define clear rules of what names are valid within a given context
  • Make sure all Ports have unique names
  • Differentiate between the name of a SubArrow and of its Arrow
  • Then simplify rename!
  • Decide on more refined naming interface, .e.g port_name, sub_port_names

Implement Constant Propagation

Could either use simply true, false to represent whether a Value is a constant, or construct a new type, e.g:

abstract type Const end
struct IsConst <: Const end
struct NotConst <: Const end

Invariants:

  • The output of any SourceArrow is a constant
  • All outputs of any PrimArrows are all constant if and only if all inputs are constant.
  • This should be applied recursively to inner CompArrows.

Q. Suppose we can infer that a constant input to mul is 0 then the output is also 0, should this be a constant?

Inner constructor warning from propagate

using Arrows

WARNING: deprecated syntax "inner constructor Propagation(...) around /home/zenna/repos/Arrows.jl/src/propagate/propagate.jl:7".
Use "Propagation{#s170}(...) where #s170" instead.

Revamp Properties

Port properties as they stand are restrictive. Problems:

  • Relational ports don't have an direction
  • There are multiple kinds of error ports
  • We have lots of little functions like is_error: not extensible

Syntax for filtering

Very often we need to select core items according to filters

  • SubPort

  • Port

  • SubArrow

  • Value

  • Arrow

  • 1

  • Many

  • a range

We want to filter them by

  • is_error_port
  • is_param_port
  • is_in_port
  • Combinations thereof

Examples

  • All the in sub ports
  • All the out ports
  • The first parametric port
  • The first parametric sub_port
  • All arrows with more than one parametric port

Let's have a syntax for this. It needs to specify what type of thing we want

  • What kind of thing we want

  • Do we want 1, Many, a range

  • Predicates that filter it

  • All the in sub ports of arr
    ▹ [;] arr

  • All the out ports of arr
    ◂ [:] arr

  • The first parametric port
    ⬧ 1 θ

  • The last parametric sub_port
    ⬨ end δ

This is not julia syntax, what can we do instead?

  • @Fil ▹ [;] arr

  • @▹ [;] arr

  • ▹([:], arr, δ)

  • ▹([1:4], arr, δ)

  • arr[1,2,3], arr, δ)

  • ▹[1:5](arr, δ)

  • arr[▹, 1:5, δ]

I think the correct order is something like this: ▹ [;] arr δ because it reads in this way.
"I want all sub_in_ports of arr which have this property`

▹([;], arr, δ)

But to be consistent with getfield

(arr, δ)[1:3]
(arr)
(arr, θ)[1]

Disable Global Context fails

Currently if you call disable_global_ctx!() as the first thing you will do it will work fine.

However, if you say call Var(Integer) then disable then call Var(Integer) again it won't catch that it has been disabled, presumably because the global global_ctx_is_disabled has been compiled in.

If you remove the const it works as expected, but this probably has terrible performance.

Don't link all ports to parent

Currently, when we parametrically invert a function we attacH its parametric inputs to its parent (using link_to_parent). Similarly when we compute domain_errors we attach error outputs to the parent.

This means we have a lot of links projecting from each nested composite arrow all the way out to its outermost parent. On the one hand this may seem necessary because we need all the parametric inputs to execute the function for example. On the other hand it is bad for modularity because

  • Suppose I discover redundant parameters, if I "fix" an inner composite arrow, I need to update every other composite arrow that it is a parent of
  • If I decide I want to fix a parametric input to be constant I need to update all the parent compositions

An alternative approach is to link to not link any port to its parent; the ports of an arrow are specified implicitly as the ports on some of the SubArrows. I believe I initially tried this idea and realised it's problematic because:

  • You can't reuse inports without a dupl
  • You can't have shortcut connections directly from in_port to out_port without plugging an identity function in htere
  • You can't specify the types of an arrow without its contents
  • It forces you to conflate implementation with specification

So I think it's a bad idea.

But perhaps there is a middle ground

@arr macro

Be able to write

@arr function f(x::Int, y::Int)
  2x + y
end

@arr should:

  • construct both the function and the associated composite arrow
  • extract the names and types

Tricky points:

  • How to get return type names
  • How to treat some arguments as not inputs to the arrow, e.g.
@arr function f(x::Float64, y::Float64, z::Int)
  for i = 1:z # is not an input to the arrow
    x = x + y
   end
   x
end
  • Should we recurse into function calls? Should the following code call g as a function and expand it out within f or make a reference to g with f_arr (how would we do this).
@arr function f(x::Float64, y::Float64)
  a = x + y
  g(a)
end

0 vs 1 indexing

julia is 1, most targets (e.g. stan, theano) will have 0 based indexing.

Dirty CompArrow after updating inner CompArrow

If you add a port to a CompArrow and it is referenced by a SubArrow within one or more other CompArrows, (even itself) they may be dirty in that

  • If you remove ports, all those edges will be invalid
  • If you change the types of ports; something that is type consistent may become type inconsistent
  • If there are new ports, then there will not be corresponding vertices in its parents edges (is this still true if it is a self, parent?)

This makes the currently way to do invert for example, problematic

Enforce port names to be unique

Not globally unique, just unique with respect to the arrow. Non-uniqueness of names means we cannot distinguish a port by name

Homoiconicity

In a homoiconic language, the data structure representing the program can be manipulated by the program itself. This enables code generation either dynamically (e.g. LISP, Julia) or statically (e.g. Scala).

Parametric Inverse of Cond

Implement parametric inverse of CondArrow

CondArrow is a special arrow because it has short-circuiting behaviour. We must ensure that its parametric inverse also short-circuits in the analogous way.

Do we need functypes

Currenly we create functions them lift them to arrows, as is done in haskell.
In Haskell this is useful (necessary even) because we can lift any function to an arrow.
In Julia however, it currently means we have another layer of complexity which adds nothing.

  • It adds nothing because we can't find arbitrary functions to arrows; we can't even get the return type.
  • It means code is more clunky with all this lifting going on

Will either of these change in the future; should we just start with primitive arrows.

We could then write

+ >>> sin

And not worry about it

Nonuniqueness of Arrow names breaks compilation

We do not check or enforce that Arrows have different names, but compilation uses the name to construct an associated Julia function. Only one of several arrows with the same name will be compiled. Possible solutions

  • Enforce unique names
  • Generate unique names when compiling, i.e. julia name is not same as Arrow name

The great restructuring may make this redundant

Change CompArrow to Circuit

There is a conflation of concepts, I have:

Arrows which are like Functions
PrimArrows which are like primitive, built-in functions, or IntrinsicFunctions
CompArrows which are like Functions

I think instead, a what is currently a CompArrow is really about the program graph, a computational graph. Instead I should have

"A computational graph"
mutable struct Circuit
  edges::LG.DiGraph
  ....
end

Fix id_loss

Id_loss breaks (I believe) because the inverse arrow has domain_error outports so the correspondance between the out_ports of inv and the in_ports of fwd are broken.
I do that correspondence in a pretty naive way (assume they are aligned in number). I think there are three levels of fixes

  1. ignore the doman_loss outputs when doing the correspondence, and make sure to link them to parent
  2. Do the correspondence by name instead of number. We'd need to ensure that invert preserves names, and that #35
  3. Figure out more semantic/explicit way to represent correspondences, this is in part what #17 attempts to do.

We should do 1. now and revisit the rest later

Implement Parameter Generating Function Transformation

r: X -> Y is a parameter generating function w.r.t a parametric inverse f⁻¹, if ∀ x, let y = f(x), then:

f⁻¹(y; r(x)) = x

A parameter generating function can be implemented in at least two different ways:

  1. As a graph transformation of f
  2. By parametrically inverting f twice

Focusing on (1), there are two sticky issues

  • How to maintain correspondence between the parameter values in r and f⁻¹.
  • How to handle parametric inverses with a variable number of parameters e.g. fib

Make Arrows subtype Relational

Arrows are kind of like functions which are a kind of relation. By explicitly representing relations (e.g. AddMul) and the relationship between a particular function there are several benefits

  • Less redundant type information
    The type of Sqrt is the reverse of the type of Sqr because both are different sides of same relation. We should only need to store this once.

  • Simplify inversion code

  • Make connecton to parameter generating function explicit

This chage has started in the branch relational

Make CompArrows Symbolic

Composite Arrows currently contain other Composite arrows, for example we can do

carr = CompArrow(:carr)
add_sub_arr!(carr, carr)

This allows us to write recursive functions but more generally not have to effectively unroll everything into one flat composition. Another plus is that everything is self contained; there is no need for an external register or directory what symbols correspond to.

However there is a big cost to generality. Compare with an Expr object in Julia, and abstract syntax trees in general. For example:

f(x) = x + g(x)

g here is purely a symbol; it does not contain any other function. Looking at the lowered code, g eventually resolves to Main.g which is a GlobalRef, which is still just a name resolved to a particular package.

When we resolve the function with particular types we get then get GlobalRefs which correspond to intrinsics, or we get invocations.

Of course we don't have to follow the Julia model but the power of interface based design starts with a symbol representation of the program that doesn't prematurely encode the implementation of a particular method.

Make domain_loss optimze

Assuming problem is not bug and #32 is not the issue (likely as we saw this problem in reverseflow), we need another approach. Possible causes

  • Node loss simply defines a difficult to optimize landscape
  • Discontinuities of either totalization or domain loss functions lead to above
  • Contribution of each node loss to loss term that is minimized is arbitrary and misleads optimization
  • Some kind of dead-end, i.e. some node loss term is dominating and solving that leads us to a poor area of the search space

Possible solutions

  • Gradient based optimization may help #29
  • Different domain loss functions
  • Different totalization functions

The effects of all of these will be better analyzed once I have analysis code working (depends on #12 #21).

If all else fails, we will need to deal with assertions in a different way.

Order Ports by order in which computed

Given a parametric inverse f, find an ordering of the ports, order its Ports by the order in which they are computed.

"""
# Arguments:
- `prts`: Ports from the same arrow
# Returns:
- `ordering`: `prts[ordering[t]]` is the port in ordering computed. 
function order_ports(prts::vector{Port})::Vector{Int}

This function is currently needed for analysis. We want to be able to see how the node errors of primitives vary across the arrow as a function of the position in which it is computed (e.g. to determine if solving earlier constraints first is easier), and how this varies as a function of the optimization process.

Make boundary ports implicit

Proposal is to make the boundary ports of a CompArrow implicitly defined by labeling SubPorts of SubArrows as on boundary.

Pros:

  • Significantly improve modularity
  • Boundary ports could inherit port properties without copying
  • Might lead to code simplification

Cons:

  • I think I tried this before and then decided I favoured explicit ports. But I don't remember the reason and if this proposal is significantly different
  • Sometimes a property could not be inhereted directly, e.g. may not be able to inherit name because it would cause duplicate names.
  • We may not want to inherit exactly, e.g. if type of boundary port is more precise that type of inner port.
  • Ah ha! I think I remember why I didn't use it. It becomes impossible for an input to be reused without the use of a dupl
  • May in other places make code more complex
  • Can't have "pass through" connection, i.e. inport goes directly to output
  • Could do inheritance without copying anyway; this idea is orthogonal to implicitness

Overloaded SubPort operations will have unexpected behaviour

Currently if you do x + y where x and y are subports, a new AddArrow will be created and added to the parent composition of x and y, and links will be made from x and y to the in_ports of the subarrow.

However, there are several cases we haven't accounted for which will simply break or give undesired or unexpected behaviour:

  • x and y have different parents, should this be an undefined operation or should we attempt a kind of merge
  • x or y are dst SubPorts rather than src. Should this be undefined or should we go back to the source. Should we delete the current edge

Fix `aprx_error!`

aprx_error! is broken because after adding ports to a CompArrow, it's parent needs to be updated.

There are more fundamental changes to all the core data-structures which may make this problem go away, but for now we need a quick fix.

Approaches

  • Make aprx_totalize use walk! instead of lightwalk.
  • After adding a node, somehow traverse to parent and update it at that tiem
    Have a genericclean` function which fixes these issues in the parent/

I think the first of these is probably the easiest.

Clean up analyze api

Should be easy to invert and optimize with respect to anything and plot results with respect to anything

Differentiate between id_error and domain_error

Currently we use port labels which are simply Symbols to given labels to ports.

But there are for example

  • Different kinds of error ports
  • These different kinds are still all error ports

One option is to use the julia type hierarchy to express this

abstract type Property end
abstract type Error <: Property end
abstract type IdError <: Error end

Fix type of maprecur

And update policies accordingly

# FIXME, fix maprecur so can add return typep to this of ::Vector{<:Policy}
"Recursively get all the policies of `carr`"
policies(carr::CompArrow) = maprecur(DetPolicy, carr)

CoolType not defined

CoolType needs to be renamed for one.

Generally the following problem results from the fact that no concrete type results from the last input of over

julia> simple_cnet2 = over(lift(conv2dfunc))
CompositeArrow{3,2} - 1 subarrows

julia> simple_cnet2^C

julia> simple_cnet2(D, weights, b)
ERROR: UndefVarError: CoolType not defined
 in inppintype at /home/zenna/.julia/v0.4/Arrows/src/arrowtypes/compositearrow.jl:36
 in function_sequence at /home/zenna/.julia/v0.4/Arrows/src/compile.jl:121
 in compile at /home/zenna/.julia/v0.4/Arrows/src/compile.jl:164
 in call at /home/zenna/.julia/v0.4/Arrows/src/compile.jl:182

Variable number of dimension array types

Many functions have natural extension to variable dimension.
In theano for example, addition (+) applies to scalars, vectors, matrices and tuples of arbitrary dimension.

It's cumbersome to need to specify which kind of + you want, and limits modularity.

Variable dimension types are difficult to reason about using a solver because

  1. You have a variable number of variables
  2. The number of variables could be unbounded
  3. The number of variables depends on the values of other type parameters

Approach 1: Two pronged approach

In this approach a value N (integer variable or constant) for the number of dimensions.
If N is a concrete integer then the dimension sizes are as we have already established.
If N is a variable then the type also contains a procedure to construct a type from a concrete dimension size.

For example + would be something like

+ :: N, N >>> N  # maps two tensors of dimension N to another of dimension N
+ :: N -> [xi for i = 1:n], [xi for i = 1:n] >>> [xi for i = 1:n]  # The dimensions should have the same size

Then we solve by

  1. Solving for dimension numbers Ns
  2. Generating typing for concrete dimension numbers, and solving for this
  3. If unsat, add clause and return to 1

This is like DPLL in the type system.

My concerns are that there may be an unbounded (or very large) number of solutions to the #dimension problems but at the size level it may be unsat. This is the standard problem in DPLL, but we are not propagating much "theory" level information back.

Pros

  • Can encode complex types

Cons

  • Would likely need a way to propagate more information between layers for it to be useful

Approach 2 : cases

In this approach, we encode all the information into a single formula, uses case

We create type groups, which are simply a finite number of cases. e.g. for +

+ :: [M], [M] >>> [M]
+ :: [M,N], [M,N] >>> [M,N]
....

In SMT we use an enumeration value to encode the values in the different type.

Pros

  • Leaves decision making up to solver
  • can add arbitary types to a symbol

Cons

  • More varables in solving process
  • All functions are bounded length

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.