categoricaldata / hydra
Transformations transformed
License: Apache License 2.0
Currently in Hydra, the type of every element must be given along with the data (term) of the element. This is a holdover from Hydra's initial type system, which was based on STLC rather than Hindley-Milner (which permits type inference). While it is possible now to infer the type of many elements rather than relying on type annotations (which are labor intensive and error-prone for the developer), this must be done by ordering the elements in a graph to take into account the dependencies introduced by element references. Only in those cases where the reference structure forms a cycle will type annotations be required.
Currently, types are inferred on a term-by-term or element-by-element basis, but the change described above will require types to be inferred for an entire graph at once.
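The ordering step described above can be sketched as a topological sort over element references. The following is a minimal illustration (the element names and the dependency-map representation are hypothetical, not Hydra's actual graph API): Kahn's algorithm yields an order in which types can be inferred, and any elements left unresolved lie on a reference cycle and would require explicit type annotations.

```java
import java.util.*;

public class InferenceOrder {
    /**
     * Given a map from each element name to the set of elements it references,
     * return an order in which element types can be inferred. Elements left on
     * a reference cycle are added to 'cyclic' instead; in Hydra these would
     * need explicit type annotations.
     */
    public static List<String> order(Map<String, Set<String>> deps, Set<String> cyclic) {
        Map<String, Integer> remaining = new HashMap<>();   // unresolved deps per element
        Map<String, Set<String>> dependents = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : deps.entrySet()) {
            remaining.put(e.getKey(), e.getValue().size());
            for (String d : e.getValue())
                dependents.computeIfAbsent(d, k -> new TreeSet<>()).add(e.getKey());
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : remaining.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());
        List<String> ordered = new ArrayList<>();
        while (!ready.isEmpty()) {
            String el = ready.removeFirst();
            ordered.add(el);
            // Each element depending on 'el' has one fewer unresolved dependency.
            for (String dep : dependents.getOrDefault(el, Collections.emptySet()))
                if (remaining.merge(dep, -1, Integer::sum) == 0) ready.add(dep);
        }
        for (Map.Entry<String, Integer> e : remaining.entrySet())
            if (e.getValue() > 0) cyclic.add(e.getKey());
        return ordered;
    }
}
```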
Hydra currently makes some assumptions about case conventions for element and field names, e.g.
While these may be considered as best practices for models defined natively in Hydra, the only hard and fast rule should be that names contain only alphanumeric ASCII characters and underscores in their local part, while slashes '/' are also allowed in the namespace part. Certain lexical operations are currently broken by models which do not follow the best practices described above, and this should be fixed.
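As a sketch of the rule stated above (the regular expressions and the assumption that the last dot separates the namespace from the local part are illustrative, not Hydra's actual implementation):

```java
import java.util.regex.Pattern;

public class NameValidation {
    // Local part: alphanumeric ASCII characters and underscores only.
    private static final Pattern LOCAL = Pattern.compile("[A-Za-z0-9_]+");
    // Namespace part: same character set, but slashes '/' are also allowed.
    private static final Pattern NAMESPACE = Pattern.compile("[A-Za-z0-9_/]+");

    /**
     * Validate a name, treating everything before the last dot (if any) as
     * the namespace part and the remainder as the local part.
     */
    public static boolean isValidName(String qualified) {
        int i = qualified.lastIndexOf('.');
        if (i < 0) return LOCAL.matcher(qualified).matches(); // unqualified name
        String ns = qualified.substring(0, i);
        String local = qualified.substring(i + 1);
        return NAMESPACE.matcher(ns).matches() && LOCAL.matcher(local).matches();
    }
}
```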
While Hydra currently does not support typedefs (in contrast to newtypes / wrappers), the new data model specification which Hydra has been migrating to does have constructs which could be considered typedefs. E.g. in Description := string, the name "Description" would currently be bound to TypeLiteral LiteralString in Hydra, while for User := Person, the name "User" would be bound to TypeWrap $ Nominal (Name "Person"). This asymmetry exists because names and variables were not unified until recently, so a type expression like TypeVariable (Name "Person") was not possible. Wrappers were essentially a workaround for this limitation.

Wrapper types and terms have not yet been fully migrated to the new data model specification. Once they are, we will have two options for binding "User" to "Person". One is more like a typedef: TypeVariable (Name "Person"), and the other is more like a newtype: TypeWrap $ TypeVariable (Name "Person"). The Haskell DSL will need to distinguish between the two (most likely using the wrap helper function for the latter case), and the Haskell coder will be able to generate the former as an actual typedef in Haskell, and the latter as a newtype. Since Java does not support typedefs, these will either need to be turned into newtypes by the adapter framework, or manifested using subclasses (which would give you a kind of one-way typedef). Note that it will also be possible to manifest typedefs in Haskell which bind type parameters to concrete types, e.g. GeoPolygon := Polygon GeoPoint would be realized using the application type constructor.
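The two Java-side options can be illustrated as follows (the class names are hypothetical, for illustration only): a newtype becomes a wrapper class that is unrelated to the underlying type, while a subclass gives a kind of one-way typedef, where every User is usable as a Person but not vice versa.

```java
public class TypedefOptions {
    // The underlying type.
    public static class Person {
        public final String name;
        public Person(String name) { this.name = name; }
    }

    // Option 1: a newtype-style wrapper. UserWrapper and Person are unrelated
    // types, so converting between them is always explicit.
    public static class UserWrapper {
        public final Person value;
        public UserWrapper(Person value) { this.value = value; }
    }

    // Option 2: a subclass, giving a one-way typedef. A UserSubclass can be
    // used wherever a Person is expected, but not the other way around.
    public static class UserSubclass extends Person {
        public UserSubclass(String name) { super(name); }
    }

    public static String greet(Person p) { return "hello, " + p.name; }
}
```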
...even when mapping into highly constrained languages like JSON and YAML. For example, it should be possible to serialize all terms to JSON according to their types. Currently, recursive types cause the solver to enter an infinite loop.
https://github.com/CategoricalData/hydra/blob/main/hydra-haskell/src/main/haskell/Hydra/Monads.hs
This may have to be dealt with differently in Java than it is in Haskell.
The behavior of Hydra's primitive functions is currently tested individually in Haskell and Java, with no guarantee of parity. We should create shared test cases for each primitive function (starting with hand-picked inputs and expected outputs). These can be defined using the DSLs and propagated into language variants. We will create new src/gen-test directories to receive the generated test cases.
Using the Map module as an example:

public class Map<A, B> extends PrimitiveFunction<A>

The above is incorrect, since the PrimitiveFunction type parameter is simply an annotation type and is not related to the types of the actual function being implemented. This (and others in Sets) should be fixed to look like:

public class Map<A> extends PrimitiveFunction<A>

where none of the class methods also use A as a type parameter.
See https://github.com/CategoricalData/hydra/blob/main/hydra-haskell/src/main/haskell/Hydra/Inference.hs and its dependencies
If a Python implementation of Hydra existed, we would be using it right now. While not as urgent as Hydra-Go (#65), Hydra-Python would put Hydra-based graph and data integration tooling in contact with AI and data science tooling from Microsoft, LinkedIn, and of course the open-source community. The biggest unknown is how Hydra's strongly typed environment will interact with Python's dynamically typed one. Python does have most of the required building blocks w.r.t. types and terms, while the fact that type annotations are optional in Python means that code generation into Python is likely to be far simpler than code generation into Java.
A large portion of Hydra's code is monadic -- specifically, it uses Flow, which is a variation on the State monad. However, this portion of the code has not yet been ported into the DSL or propagated into Java; right now, we are porting monadic code to Java by hand. We are not blocked on this, since it is relatively straightforward to port the code by hand, but it is an obstacle to fully closing the loop, and represents a future maintenance burden. Introduce Flow-specific syntax to the Haskell DSL, and also supply any missing primitives. Verify support for monadic code in generated Haskell and Java.
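A minimal sketch of what a State-monad-style Flow could look like on the Java side (an illustration only, not Hydra's actual Flow, which also carries failure and trace information):

```java
import java.util.function.Function;

public class MiniFlow {
    /** A computation which threads a state S and yields a value A. */
    public interface Flow<S, A> {
        Result<S, A> run(S state);
    }

    /** The resulting state together with the yielded value. */
    public static final class Result<S, A> {
        public final S state;
        public final A value;
        public Result(S state, A value) { this.state = state; this.value = value; }
    }

    /** Yield a value without touching the state. */
    public static <S, A> Flow<S, A> pure(A a) {
        return s -> new Result<>(s, a);
    }

    /** Sequence two computations, threading the state through both. */
    public static <S, A, B> Flow<S, B> bind(Flow<S, A> f, Function<A, Flow<S, B>> g) {
        return s -> {
            Result<S, A> r = f.run(s);
            return g.apply(r.value).run(r.state);
        };
    }

    /** Read the current state. */
    public static <S> Flow<S, S> get() { return s -> new Result<>(s, s); }

    /** Replace the state. */
    public static <S> Flow<S, Void> put(S s2) { return s -> new Result<>(s2, null); }
}
```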
UDFs will behave identically to primitive functions; the only difference is that primitive functions are expected to be built-in, and present in every Hydra implementation, whereas UDFs are domain-specific. Building on #62, UDFs would simply be bound to their implementations in a distinct namespace, and the order of preference for variable name resolution would be: 1) lambda-bound variables, 2) UDFs, 3) built-in primitive functions, 4) elements (let-bound variables).
The link given in this repo presumes prior membership
This will allow data, schemas, and programs to be communicated between Haskell and Java
Logically, a case statement should have one term for each field of the corresponding union type, but in practice, we often care about only a subset of the fields, and we provide a default value for the rest. Add this functionality into Hydra's union elimination terms.
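The desired behavior can be sketched as follows (the representations here are hypothetical simplifications; Hydra's union elimination terms are more general): match a union value's field name against a map of per-field handlers, falling back to a default result for any field not covered.

```java
import java.util.Map;
import java.util.function.Function;

public class CaseWithDefault {
    /** A value of a union type: a field name tagged with a payload. */
    public static final class Union {
        public final String field;
        public final Object payload;
        public Union(String field, Object payload) { this.field = field; this.payload = payload; }
    }

    /**
     * Eliminate a union value: apply the handler for its field if one was
     * provided, otherwise return the default result.
     */
    public static <R> R match(Union u, Map<String, Function<Object, R>> cases, R dflt) {
        Function<Object, R> f = cases.get(u.field);
        return f == null ? dflt : f.apply(u.payload);
    }
}
```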
A partial or even full Go implementation of Hydra is likely to be required for some of our work at LinkedIn. Start by creating a Go model and Go language constraints, then proceed to a type-level Go coder. Extend this to a bicoder if practical, and implement as much of the Hydra kernel as is required for that particular use case. The recent work on migrating the kernel to Java will likely make a Go migration simpler.
It has been suggested that annotations could be used for specifying a mapping from record types (e.g. originating in Avro) to property graph edge types. The format of these annotations is still being discussed, but a few basic requirements come to mind:
Support for producing multiple elements (edges and/or vertices, possibly with the same label) for the same record should be explored.
The type parameter for annotations is usually given the letter "m" ("M" in Java) for historical reasons. An "a" would make more sense. However, there are thousands of occurrences to replace, and grep is not (always) our friend.
Similar to #68 (Flow support), we need support for let-terms (inner definitions) in the Haskell DSL, with corresponding support in generated Haskell and Java. Only the simplest functions can make do without inner definitions ("where" / local variables).
Often, it is necessary to take an instance of a generated type like LatLon and encode it as a term, or conversely to take a term representing a LatLon and decode it to an instance of the generated type. This encoding is logically defined, but is not materialized as actual Haskell or Java code which can do the encoding/decoding for you.

In many cases, it would be helpful if Hydra were to generate these coders along with the type definition. In other cases, however, this would be overkill, and would result in a large amount of code which is never used. The open problem is not so much how to generate the coders, which should be straightforward, but when to generate them.
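A generated coder might follow a shape like the following (a sketch only; LatLon's fields and the record-term representation are simplified stand-ins for Hydra's actual types):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LatLonCoder {
    /** A stand-in for a generated type. */
    public static final class LatLon {
        public final double lat, lon;
        public LatLon(double lat, double lon) { this.lat = lat; this.lon = lon; }
    }

    /** Encode an instance as a record term (here: field name -> literal). */
    public static Map<String, Double> encode(LatLon value) {
        Map<String, Double> term = new LinkedHashMap<>();
        term.put("lat", value.lat);
        term.put("lon", value.lon);
        return term;
    }

    /** Decode a record term back to an instance, failing on missing fields. */
    public static LatLon decode(Map<String, Double> term) {
        Double lat = term.get("lat"), lon = term.get("lon");
        if (lat == null || lon == null)
            throw new IllegalArgumentException("missing lat/lon field");
        return new LatLon(lat, lon);
    }
}
```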
In Java, it will be necessary in some cases to realize certain Hydra kernel elements as primitives instead of, or in addition to, generated code. For example, a few kernel functions like those in the Rewriting module will not map to Java in a straightforward way. In other cases, the declarative and constructive nature of programs expressed in Hydra Core will be a performance bottleneck in Java, and it will be expedient to substitute special-purpose imperative code, hand-written for Java, for the generated code.

This will require a slight change to the data model: instead of a special primitive constructor in the term grammar, we will simply use the variable constructor, but variables will resolve to primitives before they resolve to graph elements. The order of preference will be: 1) lambda-bound variables, 2) primitive functions, 3) elements. That way, a primitive function can be placed "in front of" a corresponding element, taking over its role in evaluation. This behavior will be present in both Haskell and Java, though it is particularly important for Java at this time.
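The resolution order might be sketched as follows (the lookup tables are hypothetical stand-ins for Hydra's actual environments):

```java
import java.util.Set;

public class VariableResolution {
    public enum Kind { LAMBDA_BOUND, PRIMITIVE, ELEMENT, UNBOUND }

    /**
     * Resolve a variable name, preferring: 1) lambda-bound variables,
     * 2) primitive functions, 3) graph elements. A primitive placed "in front
     * of" an element with the same name thereby takes over its role in
     * evaluation.
     */
    public static Kind resolve(String name, Set<String> lambdaBound,
                               Set<String> primitives, Set<String> elements) {
        if (lambdaBound.contains(name)) return Kind.LAMBDA_BOUND;
        if (primitives.contains(name)) return Kind.PRIMITIVE;
        if (elements.contains(name)) return Kind.ELEMENT;
        return Kind.UNBOUND;
    }
}
```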