

hydra's Issues

Use topological sort in type inference

Currently in Hydra, the type of every element must be given along with the data (term) of the element. This is a holdover from Hydra's initial type system, which was based on STLC rather than Hindley-Milner (which permits type inference). While it is now possible to infer the types of many elements rather than relying on type annotations (which are labor-intensive and error-prone for the developer), this must be done by ordering the elements of a graph so as to take into account the dependencies introduced by element references. Only in those cases where the reference structure forms a cycle will type annotations be required.

Currently, types are inferred on a term-by-term or element-by-element basis, but the change described above will require types to be inferred for an entire graph at once.
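
As a rough sketch of the ordering step (not the actual implementation), elements could be grouped with a strongly-connected-components pass before inference; Element, elementName, and elementRefs here are hypothetical stand-ins for the real graph model:

  import Data.Graph (SCC (..), stronglyConnComp)

  -- Hypothetical element representation: a name, plus the names of the elements it references.
  data Element = Element { elementName :: String, elementRefs :: [String] }

  -- Order elements so that dependencies come before their dependents.
  inferenceOrder :: [Element] -> [SCC Element]
  inferenceOrder els = stronglyConnComp [(e, elementName e, elementRefs e) | e <- els]

  -- Elements in an AcyclicSCC can be inferred one at a time in this order; elements in a
  -- CyclicSCC reference each other, and must be inferred together or carry annotations.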

Allow arbitrary case for element names

Hydra currently makes some assumptions about case conventions for element and field names, e.g.

  • The local part of type names is in PascalCase
  • The local part of non-type element names is in camelCase
  • Field names are in camelCase

While these may be considered best practices for models defined natively in Hydra, the only hard-and-fast rule should be that names contain only alphanumeric ASCII characters and underscores in their local part, while slashes '/' are also allowed in the namespace part. Certain lexical operations are currently broken by models which do not follow the best practices described above, and this should be fixed.
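
A minimal sketch of the proposed rule as predicates (the helper names are hypothetical, and the split into namespace and local parts is assumed to happen elsewhere):

  import Data.Char (isAlphaNum, isAscii)

  -- The local part may contain only alphanumeric ASCII characters and underscores.
  isValidLocalPart :: String -> Bool
  isValidLocalPart = all (\c -> (isAscii c && isAlphaNum c) || c == '_')

  -- The namespace part additionally allows slashes.
  isValidNamespace :: String -> Bool
  isValidNamespace = all (\c -> (isAscii c && isAlphaNum c) || c == '_' || c == '/')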

Transform variable types to Haskell typedefs

While Hydra currently does not support typedefs (in contrast to newtypes / wrappers), the new data model specification to which Hydra has been migrating does have constructs which could be considered typedefs. E.g. given Description := string, the name "Description" would currently be bound to TypeLiteral LiteralString in Hydra, while for User := Person the name "User" would be bound to TypeWrap $ Nominal (Name "Person"). This asymmetry exists because names and variables were not unified until recently, so a type expression like TypeVariable (Name "Person") was not possible. Wrappers were essentially a workaround for this limitation.

Wrapper types and terms have not yet been fully migrated to the new data model specification. Once they are, we will have two options for binding "User" to "Person". One is more like a typedef: TypeVariable (Name "Person"); the other is more like a newtype: TypeWrap $ TypeVariable (Name "Person"). The Haskell DSL will need to distinguish between the two (most likely using the wrap helper function for the latter case), and the Haskell coder will be able to generate the former as an actual typedef and the latter as a newtype. Since Java does not support typedefs, these will either need to be turned into newtypes by the adapter framework, or we can manifest them using subclasses (which would give us a kind of one-way typedef). Note that it will also be possible to manifest typedefs in Haskell which bind type parameters to concrete types; e.g. GeoPolygon := Polygon GeoPoint would be realized using the application type constructor.
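
For illustration, the two bindings might surface in generated Haskell roughly as follows (a sketch only; the names UserAlias and UserWrapped are used here just to show both forms side by side, and the placeholder types stand in for previously generated definitions):

  -- Placeholder types standing in for previously generated definitions.
  data Person = Person
  data GeoPoint = GeoPoint
  data Polygon a = Polygon [a]

  -- From TypeVariable (Name "Person"): a transparent typedef.
  type UserAlias = Person

  -- From TypeWrap $ TypeVariable (Name "Person"): a distinct newtype.
  newtype UserWrapped = UserWrapped Person

  -- A typedef which binds a type parameter to a concrete type, via the application type constructor.
  type GeoPolygon = Polygon GeoPoint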

Make adapter framework tolerant of recursive types

...even when mapping into highly constrained languages like JSON and YAML. For example, it should be possible to serialize all terms to JSON according to their types. Currently, recursive types cause the solver to enter an infinite loop.
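
As a sketch of the usual remedy (hypothetical names, not the adapter framework's API), the traversal can carry the set of type names it is already visiting and stop recursing when a name is seen again:

  import qualified Data.Set as Set

  -- An example of a recursive type: the definition refers back to IntTree itself.
  data IntTree = Leaf Int | Node IntTree IntTree

  -- Visit a named type and its dependencies, cutting off when a name is already in progress
  -- (at which point a reference would be emitted rather than expanding the type again).
  visitTypeNames :: (String -> [String]) -> Set.Set String -> String -> [String]
  visitTypeNames dependenciesOf visiting name
    | name `Set.member` visiting = []
    | otherwise = name : concatMap
        (visitTypeNames dependenciesOf (Set.insert name visiting))
        (dependenciesOf name)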

Language-agnostic test suite for primitive functions

The behavior of Hydra's primitive functions is currently tested individually in Haskell and Java, with no guarantee of parity. We should create shared test cases for each primitive function (starting with hand-picked inputs and expected outputs). These can be defined using the DSLs and propagated into language variants. We will create new src/gen-test directories to receive the generated test cases.
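
A minimal sketch of what a shared test case might look like (the TestCase type and the primitive name shown are illustrative, not the eventual DSL):

  -- A shared test case: a primitive function name, its inputs, and the expected output,
  -- all expressed in a form that can be evaluated by any Hydra implementation.
  data TestCase t = TestCase
    { testPrimitive :: String
    , testInputs    :: [t]
    , testExpected  :: t }

  stringLengthCase :: TestCase String
  stringLengthCase = TestCase "hydra/lib/strings.length" ["hello"] "5"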

Fix type annotations for Sets Java port

Using the Map module as an example:

public class Map<A, B> extends PrimitiveFunction<A> 

The above is incorrect, since the type parameter of PrimitiveFunction is simply the annotation type and is not related to the types of the actual function being implemented. This (and others in Sets) should be fixed to look like:

public class Map<A> extends PrimitiveFunction<A> 

where A serves only as the annotation type, and none of the class methods reuse A for the types of the actual function.

Hydra-Python

If a Python implementation of Hydra existed, we would be using it right now. While not as urgent as Hydra-Go (#65), Hydra-Python would put Hydra-based graph and data integration tooling in contact with AI and data science tooling from Microsoft, LinkedIn, and of course the open-source community. The biggest unknown is how Hydra's strongly typed environment will interact with Python's dynamically typed one. Python does have most of the required building blocks w.r.t. types and terms, and the fact that type annotations are optional in Python means that code generation into Python is likely to be far simpler than code generation into Java.

Flow support in Haskell DSL

A large portion of Hydra's code is monadic -- specifically, it uses Flow, which is a variation on the State monad. However, this portion of the code has not yet been ported into the DSL or propagated into Java; right now, we are porting monadic code to Java by hand. We are not blocked on this, since it is relatively straightforward to port the code by hand, but it is an obstacle to fully closing the loop, and represents a future maintenance burden. Introduce Flow-specific syntax to the Haskell DSL, and also supply any missing primitives. Verify support for monadic code in generated Haskell and Java.
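
For orientation, a minimal sketch of a Flow-like monad, simplified here to a state transformer with failure (the actual Flow in the Hydra kernel is richer than this):

  -- A simplified Flow: threads a state s, and may fail (Nothing) at any step.
  newtype Flow s a = Flow { unFlow :: s -> (Maybe a, s) }

  instance Functor (Flow s) where
    fmap f (Flow g) = Flow $ \s -> let (ma, s') = g s in (fmap f ma, s')

  instance Applicative (Flow s) where
    pure a = Flow $ \s -> (Just a, s)
    Flow ff <*> Flow fa = Flow $ \s -> case ff s of
      (Nothing, s') -> (Nothing, s')
      (Just f, s')  -> let (ma, s'') = fa s' in (fmap f ma, s'')

  instance Monad (Flow s) where
    Flow fa >>= k = Flow $ \s -> case fa s of
      (Nothing, s') -> (Nothing, s')
      (Just a, s')  -> unFlow (k a) s'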

Support user-defined functions (UDFs) in Haskell and Java

UDFs will behave identically to primitive functions; the only difference is that primitive functions are expected to be built-in, and present in every Hydra implementation, whereas UDFs are domain-specific. Building on #62, UDFs would simply be bound to their implementations in a distinct namespace, and the order of preference for variable name resolution would be: 1) lambda-bound variables, 2) UDFs, 3) built-in primitive functions, 4) elements (let-bound variables).

Add a default branch for case statements

Logically, a case statement should have one term for each field of the corresponding union type, but in practice we often care about only a subset of the fields and provide a default value for the rest. Add this functionality to Hydra's union elimination terms.
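
A sketch of how the elimination term might accommodate a default, using simplified stand-ins for Hydra's term and field types (the real constructors differ):

  data Term = Term String deriving Show
  data Field = Field { fieldName :: String, fieldTerm :: Term } deriving Show

  -- A case statement with an optional default branch: fields not listed in
  -- caseBranches fall through to the default, if one is present.
  data CaseStatement = CaseStatement
    { caseTypeName :: String
    , caseDefault  :: Maybe Term
    , caseBranches :: [Field] } deriving Show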

Hydra-Go

A partial or even full Go implementation of Hydra is likely to be required for some of our work at LinkedIn. Start by creating a Go model and Go language constraints, then proceed to a type-level Go coder. Extend this to a bicoder if practical, and implement as much of the Hydra kernel as is required for that particular use case. The recent work on migrating the kernel to Java will likely make a Go migration simpler.

Add a records-to-TinkerPop-elements coder

It has been suggested that annotations could be used for specifying a mapping from record types (e.g. originating in Avro) to property graph edge types. The format of these annotations is still being discussed, but a few basic requirements come to mind:

  • Annotations must specify edge labels, out-vertex and in-vertex identifiers -- i.e. strategies for constructing the label, out-vertex id, and in-vertex id on the basis of fields in the record type
  • Optionally, annotations may specify a strategy for constructing the id of an edge based on record data. If no strategy is provided, a default strategy will be used
  • Explicitly supported properties may be indicated with annotations
  • Annotations may specify a strategy for turning un-annotated fields into properties, defaulting to a simple strategy if none is provided (e.g. no extra properties, or all extra fields map to properties)
  • Both edges and vertices should be supported as targets for a given record type
  • Elements and element types should be parametric in terms of vertex ids, edge ids, and property values, in order to support a variety of target systems. Specific applications will fill the parameters as appropriate (e.g. edge and vertex ids are always strings or always integers, property values are JSON, or Hydra terms, etc.)

Support for producing multiple elements (edges and/or vertices, possibly with the same label) for the same record should be explored.
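
A sketch of what the parametric mapping might look like (hypothetical names; v, e, and p are the vertex id, edge id, and property value parameters mentioned above):

  type FieldName = String

  -- How to build one edge from a record: a label, id strategies, and explicitly supported properties.
  data EdgeSpec v e p = EdgeSpec
    { edgeLabel      :: String
    , outVertexId    :: [(FieldName, p)] -> v         -- construct the out-vertex id from record fields
    , inVertexId     :: [(FieldName, p)] -> v         -- construct the in-vertex id from record fields
    , edgeId         :: Maybe ([(FieldName, p)] -> e) -- optional; a default strategy applies if absent
    , propertyFields :: [FieldName] }                 -- fields which map directly to properties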

Rename the "m" (annotation) type parameter to "a"

The type parameter for annotations is usually given the letter "m" ("M" in Java) for historical reasons. An "a" would make more sense. However, there are thousands of occurrences to replace, and grep is not (always) our friend.

"Let" support in Haskell DSL

Similar to #68 (Flow support), we need support for let-terms (inner definitions) in the Haskell DSL, with corresponding support in generated Haskell and Java. Only the simplest functions can make do without inner definitions ("where" / local variables).
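
For illustration only (not the eventual DSL syntax), support might take the shape of a combinator which binds inner definitions before a body term, sketched here over a simplified term type:

  -- Simplified stand-in for Hydra Core terms, with a let constructor.
  data Term = TermVar String | TermInt Int | TermLet [(String, Term)] Term deriving Show

  -- Hypothetical DSL helper: bind local definitions, then a body.
  lets :: [(String, Term)] -> Term -> Term
  lets = TermLet

  example :: Term
  example = lets [("x", TermInt 42)] (TermVar "x")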

Generate per-type term coders in Haskell

Often, it is necessary to take an instance of a generated type like LatLon and encode it as a term, or conversely to take a term representing a LatLon and decode it to an instance of the generated type. This encoding is logically defined, but is not materialized as actual Haskell or Java code which can do the encoding/decoding for you.

In many cases, it would be helpful if Hydra were to generate these coders along with the type definition. In other cases, however, this would be overkill, resulting in a large amount of code which is never used. The open problem is not so much how to generate the coders, which should be straightforward, but when to generate them.
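
A sketch of the kind of coder pair that might be generated for LatLon, assuming a simplified term type and a LatLon record with float fields (names are illustrative):

  data Term = TermFloat Float | TermRecord [(String, Term)] deriving Show
  data LatLon = LatLon { lat :: Float, lon :: Float } deriving Show

  -- Encode an instance of the generated type as a term.
  latLonToTerm :: LatLon -> Term
  latLonToTerm (LatLon la lo) = TermRecord [("lat", TermFloat la), ("lon", TermFloat lo)]

  -- Decode a term back to an instance (simplified: assumes the fields appear in order).
  latLonFromTerm :: Term -> Either String LatLon
  latLonFromTerm (TermRecord [("lat", TermFloat la), ("lon", TermFloat lo)]) = Right (LatLon la lo)
  latLonFromTerm _ = Left "not a LatLon record"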

Allow DSL-based elements to be overridden with primitives

In Java, it will be necessary in some cases to realize certain Hydra kernel elements as primitives instead of, or in addition to, generated code. For example, a few kernel functions like those in the Rewriting module will not map to Java in a straightforward way. In other cases, the declarative and constructive nature of programs expressed in Hydra Core will be a performance bottleneck in Java, and it will be expedient to substitute special-purpose, hand-written imperative code for the generated code.

This will require a slight change to the data model: instead of a special primitive constructor in the term grammar, we will simply use the variable constructor, but variables will resolve to primitives before they resolve to graph elements. The order of preference will be: 1) lambda-bound variables, 2) primitive functions, 3) elements. That way, a primitive function can be placed "in front of" a corresponding element, taking over its role in terms of evaluation. This behavior will be present in both Haskell and Java, though it is particularly important for Java at this time.
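
A sketch of the intended resolution order, with hypothetical lookup environments for the three scopes:

  import Control.Applicative ((<|>))
  import qualified Data.Map as M

  data Scopes t = Scopes
    { lambdaVars :: M.Map String t   -- 1) lambda-bound variables
    , primitives :: M.Map String t   -- 2) primitive functions (may shadow an element of the same name)
    , elements   :: M.Map String t } -- 3) graph elements (let-bound variables)

  -- Resolve a variable name against the scopes in order of preference.
  resolve :: String -> Scopes t -> Maybe t
  resolve name (Scopes ls ps es) = M.lookup name ls <|> M.lookup name ps <|> M.lookup name es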
