categoricaldata / hydra
Transformations transformed
License: Apache License 2.0
Currently in Hydra, the type of every element must be given along with the data (term) of the element. This is a holdover from Hydra's initial type system, which was based on STLC rather than Hindley-Milner (which permits type inference). While it is possible now to infer the type of many elements rather than relying on type annotations (which are labor intensive and error-prone for the developer), this must be done by ordering the elements in a graph to take into account the dependencies introduced by element references. Only in those cases where the reference structure forms a cycle will type annotations be required.
Currently, types are inferred on a term-by-term or element-by-element basis, but the change described above will require types to be inferred for an entire graph at once.
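The ordering step described above can be sketched as a topological sort over element references. The following is a minimal illustration (the element names and the dependency-map representation are hypothetical, not Hydra's actual graph API): Kahn's algorithm yields an order in which types can be inferred, and any elements left unresolved lie on a reference cycle and would require explicit type annotations.

```java
import java.util.*;

public class InferenceOrder {
    /**
     * Given a map from each element name to the set of elements it references,
     * return an order in which element types can be inferred. Elements left on
     * a reference cycle are added to 'cyclic' instead; in Hydra these would
     * need explicit type annotations.
     */
    public static List<String> order(Map<String, Set<String>> deps, Set<String> cyclic) {
        Map<String, Integer> remaining = new HashMap<>();   // unresolved deps per element
        Map<String, Set<String>> dependents = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : deps.entrySet()) {
            remaining.put(e.getKey(), e.getValue().size());
            for (String d : e.getValue())
                dependents.computeIfAbsent(d, k -> new TreeSet<>()).add(e.getKey());
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : remaining.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());
        List<String> ordered = new ArrayList<>();
        while (!ready.isEmpty()) {
            String el = ready.removeFirst();
            ordered.add(el);
            // Each element depending on 'el' has one fewer unresolved dependency.
            for (String dep : dependents.getOrDefault(el, Collections.emptySet()))
                if (remaining.merge(dep, -1, Integer::sum) == 0) ready.add(dep);
        }
        for (Map.Entry<String, Integer> e : remaining.entrySet())
            if (e.getValue() > 0) cyclic.add(e.getKey());
        return ordered;
    }
}
```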
Hydra currently makes some assumptions about case conventions for element and field names, e.g.
While these may be considered as best practices for models defined natively in Hydra, the only hard and fast rule should be that names contain only alphanumeric ASCII characters and underscores in their local part, while slashes '/' are also allowed in the namespace part. Certain lexical operations are currently broken by models which do not follow the best practices described above, and this should be fixed.
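As a sketch of the rule stated above (the regular expressions and the assumption that the last dot separates the namespace from the local part are illustrative, not Hydra's actual implementation):

```java
import java.util.regex.Pattern;

public class NameValidation {
    // Local part: alphanumeric ASCII characters and underscores only.
    private static final Pattern LOCAL = Pattern.compile("[A-Za-z0-9_]+");
    // Namespace part: same character set, but slashes '/' are also allowed.
    private static final Pattern NAMESPACE = Pattern.compile("[A-Za-z0-9_/]+");

    /**
     * Validate a name, treating everything before the last dot (if any) as
     * the namespace part and the remainder as the local part.
     */
    public static boolean isValidName(String qualified) {
        int i = qualified.lastIndexOf('.');
        if (i < 0) return LOCAL.matcher(qualified).matches(); // unqualified name
        String ns = qualified.substring(0, i);
        String local = qualified.substring(i + 1);
        return NAMESPACE.matcher(ns).matches() && LOCAL.matcher(local).matches();
    }
}
```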
While Hydra currently does not support typedefs (in contrast to newtypes / wrappers), the new data model specification which Hydra has been migrating to does have constructs which could be considered typedefs. E.g. in Description := string, the name "Description" would currently be bound to TypeLiteral LiteralString in Hydra, while for User := Person, the name "User" would be bound to TypeWrap $ Nominal (Name "Person"). This asymmetry exists because names and variables were not unified until recently, so a type expression like TypeVariable (Name "Person") was not possible. Wrappers were essentially a workaround for this limitation.

Wrapper types and terms have not yet been fully migrated to the new data model specification. Once they are, we will have two options for binding "User" to "Person". One is more like a typedef: TypeVariable (Name "Person"), and the other is more like a newtype: TypeWrap $ TypeVariable (Name "Person"). The Haskell DSL will need to distinguish between the two (most likely using the wrap helper function for the latter case), and the Haskell coder will be able to generate the former as an actual typedef in Haskell, and the latter as a newtype. Since Java does not support typedefs, these will either need to be turned into newtypes by the adapter framework, or manifested using subclasses (which would give you a kind of one-way typedef). Note that it will also be possible to manifest typedefs in Haskell which bind type parameters to concrete types, e.g. GeoPolygon := Polygon GeoPoint would be realized using the application type constructor.
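The two Java-side options can be illustrated as follows (the class names are hypothetical, for illustration only): a newtype becomes a wrapper class that is unrelated to the underlying type, while a subclass gives a kind of one-way typedef, where every User is usable as a Person but not vice versa.

```java
public class TypedefOptions {
    // The underlying type.
    public static class Person {
        public final String name;
        public Person(String name) { this.name = name; }
    }

    // Option 1: a newtype-style wrapper. UserWrapper and Person are unrelated
    // types, so converting between them is always explicit.
    public static class UserWrapper {
        public final Person value;
        public UserWrapper(Person value) { this.value = value; }
    }

    // Option 2: a subclass, giving a one-way typedef. A UserSubclass can be
    // used wherever a Person is expected, but not the other way around.
    public static class UserSubclass extends Person {
        public UserSubclass(String name) { super(name); }
    }

    public static String greet(Person p) { return "hello, " + p.name; }
}
```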
...even when mapping into highly constrained languages like JSON and YAML. For example, it should be possible to serialize all terms to JSON according to their types. Currently, recursive types cause the solver to enter an infinite loop.
https://github.com/CategoricalData/hydra/blob/main/hydra-haskell/src/main/haskell/Hydra/Monads.hs
This may have to be dealt with differently in Java than it is in Haskell.
The behavior of Hydra's primitive functions is currently tested individually in Haskell and Java, with no guarantee of parity. We should create shared test cases for each primitive function (starting with hand-picked inputs and expected outputs). These can be defined using the DSLs and propagated into language variants. We will create new src/gen-test directories to receive the generated test cases.
Using the Map module as an example:

public class Map<A, B> extends PrimitiveFunction<A>

The above is incorrect, since the PrimitiveFunction type parameter is simply an annotation type and is not related to the types of the actual function being implemented. This (and others in Sets) should be fixed to look like:

public class Map<A> extends PrimitiveFunction<A>

where none of the class methods also use A as a type parameter.
See https://github.com/CategoricalData/hydra/blob/main/hydra-haskell/src/main/haskell/Hydra/Inference.hs and its dependencies
If a Python implementation of Hydra existed, we would be using it right now. While not as urgent as Hydra-Go (#65), Hydra-Python would put Hydra-based graph and data integration tooling in contact with AI and data science tooling from Microsoft, LinkedIn, and of course the open-source community. The biggest unknown is how Hydra's strongly typed environment will interact with Python's dynamically typed one. Python does have most of the required building blocks w.r.t. types and terms, while the fact that type annotations are optional in Python means that code generation into Python is likely to be far simpler than code generation into Java.
A large portion of Hydra's code is monadic -- specifically, it uses Flow, which is a variation on the State monad. However, this portion of the code has not yet been ported into the DSL or propagated into Java; right now, we are porting monadic code to Java by hand. We are not blocked on this, since it is relatively straightforward to port the code by hand, but it is an obstacle to fully closing the loop, and represents a future maintenance burden. Introduce Flow-specific syntax to the Haskell DSL, and also supply any missing primitives. Verify support for monadic code in generated Haskell and Java.
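A minimal sketch of what a State-monad-style Flow could look like on the Java side (an illustration only, not Hydra's actual Flow, which also carries failure and trace information):

```java
import java.util.function.Function;

public class MiniFlow {
    /** A computation which threads a state S and yields a value A. */
    public interface Flow<S, A> {
        Result<S, A> run(S state);
    }

    /** The resulting state together with the yielded value. */
    public static final class Result<S, A> {
        public final S state;
        public final A value;
        public Result(S state, A value) { this.state = state; this.value = value; }
    }

    /** Yield a value without touching the state. */
    public static <S, A> Flow<S, A> pure(A a) {
        return s -> new Result<>(s, a);
    }

    /** Sequence two computations, threading the state through both. */
    public static <S, A, B> Flow<S, B> bind(Flow<S, A> f, Function<A, Flow<S, B>> g) {
        return s -> {
            Result<S, A> r = f.run(s);
            return g.apply(r.value).run(r.state);
        };
    }

    /** Read the current state. */
    public static <S> Flow<S, S> get() { return s -> new Result<>(s, s); }

    /** Replace the state. */
    public static <S> Flow<S, Void> put(S s2) { return s -> new Result<>(s2, null); }
}
```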
UDFs will behave identically to primitive functions; the only difference is that primitive functions are expected to be built-in, and present in every Hydra implementation, whereas UDFs are domain-specific. Building on #62, UDFs would simply be bound to their implementations in a distinct namespace, and the order of preference for variable name resolution would be: 1) lambda-bound variables, 2) UDFs, 3) built-in primitive functions, 4) elements (let-bound variables).
The link given in this repo presumes prior membership
This will allow data, schemas, and programs to be communicated between Haskell and Java
Logically, a case statement should have one term for each field of the corresponding union type, but in practice, we often care about only a subset of the fields, and we provide a default value for the rest. Add this functionality into Hydra's union elimination terms.
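The desired behavior can be sketched as follows (the representations here are hypothetical simplifications; Hydra's union elimination terms are more general): match a union value's field name against a map of per-field handlers, falling back to a default result for any field not covered.

```java
import java.util.Map;
import java.util.function.Function;

public class CaseWithDefault {
    /** A value of a union type: a field name tagged with a payload. */
    public static final class Union {
        public final String field;
        public final Object payload;
        public Union(String field, Object payload) { this.field = field; this.payload = payload; }
    }

    /**
     * Eliminate a union value: apply the handler for its field if one was
     * provided, otherwise return the default result.
     */
    public static <R> R match(Union u, Map<String, Function<Object, R>> cases, R dflt) {
        Function<Object, R> f = cases.get(u.field);
        return f == null ? dflt : f.apply(u.payload);
    }
}
```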
A partial or even full Go implementation of Hydra is likely to be required for some of our work at LinkedIn. Start by creating a Go model and Go language constraints, then proceed to a type-level Go coder. Extend this to a bicoder if practical, and implement as much of the Hydra kernel as is required for that particular use case. The recent work on migrating the kernel to Java will likely make a Go migration simpler.
It has been suggested that annotations could be used for specifying a mapping from record types (e.g. originating in Avro) to property graph edge types. The format of these annotations is still being discussed, but a few basic requirements come to mind:
Support for producing multiple elements (edges and/or vertices, possibly with the same label) for the same record should be explored.
The type parameter for annotations is usually given the letter "m" ("M" in Java) for historical reasons. An "a" would make more sense. However, there are thousands of occurrences to replace, and grep is not (always) our friend.
Similar to #68 (Flow support), we need support for let-terms (inner definitions) in the Haskell DSL, with corresponding support in generated Haskell and Java. Only the simplest functions can make do without inner definitions ("where" / local variables).
Often, it is necessary to take an instance of a generated type like LatLon and encode it as a term, or conversely to take a term representing a LatLon and decode it to an instance of the generated type. This encoding is logically defined, but is not materialized as actual Haskell or Java code which can do the encoding/decoding for you.

In many cases, it would be helpful if Hydra were to generate these coders along with the type definition. In other cases, however, this would be overkill, and would result in a large amount of code which is never used. The open problem is not so much how to generate the coders, which should be straightforward, but when to generate them.
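A generated coder might follow a shape like the following (a sketch only; LatLon's fields and the record-term representation are simplified stand-ins for Hydra's actual types):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LatLonCoder {
    /** A stand-in for a generated type. */
    public static final class LatLon {
        public final double lat, lon;
        public LatLon(double lat, double lon) { this.lat = lat; this.lon = lon; }
    }

    /** Encode an instance as a record term (here: field name -> literal). */
    public static Map<String, Double> encode(LatLon value) {
        Map<String, Double> term = new LinkedHashMap<>();
        term.put("lat", value.lat);
        term.put("lon", value.lon);
        return term;
    }

    /** Decode a record term back to an instance, failing on missing fields. */
    public static LatLon decode(Map<String, Double> term) {
        Double lat = term.get("lat"), lon = term.get("lon");
        if (lat == null || lon == null)
            throw new IllegalArgumentException("missing lat/lon field");
        return new LatLon(lat, lon);
    }
}
```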
In Java, it will be necessary in some cases to realize certain Hydra kernel elements as primitives instead of, or in addition to, generated code. For example, a few kernel functions like those in the Rewriting module will not map to Java in a straightforward way. In other cases, the declarative and constructive nature of programs expressed in Hydra Core will be a performance bottleneck in Java, and it will be expedient to substitute special-purpose imperative code, hand-written for Java, for the generated code.

This will require a slight change to the data model: instead of a special primitive constructor in the term grammar, we will simply use the variable constructor, but variables will resolve to primitives before they resolve to graph elements. The order of preference will be: 1) lambda-bound variables, 2) primitive functions, 3) elements. That way, a primitive function can be placed "in front of" a corresponding element, taking over its role in evaluation. This behavior will be present in both Haskell and Java, though it is particularly important for Java at this time.
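The resolution order might be sketched as follows (the lookup tables are hypothetical stand-ins for Hydra's actual environments):

```java
import java.util.Set;

public class VariableResolution {
    public enum Kind { LAMBDA_BOUND, PRIMITIVE, ELEMENT, UNBOUND }

    /**
     * Resolve a variable name, preferring: 1) lambda-bound variables,
     * 2) primitive functions, 3) graph elements. A primitive placed "in front
     * of" an element with the same name thereby takes over its role in
     * evaluation.
     */
    public static Kind resolve(String name, Set<String> lambdaBound,
                               Set<String> primitives, Set<String> elements) {
        if (lambdaBound.contains(name)) return Kind.LAMBDA_BOUND;
        if (primitives.contains(name)) return Kind.PRIMITIVE;
        if (elements.contains(name)) return Kind.ELEMENT;
        return Kind.UNBOUND;
    }
}
```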