sparrow's People

Contributors

gpoirier, jonas, suhailshergill, yawaramin

sparrow's Issues

Support for converting case class to Row

Motivation

The library currently offers the ability to read a Spark Row and convert it to a case class (or a custom structure, although it is optimized for case classes). What's missing is the other direction, from case class to Row, so that case classes can be serialized and loaded back. A likely by-product of this work is the ability to produce a Spark StructType for a case class.

Once we have bidirectional conversion Row <==> case class, we can use Parquet to take care of the Row <==> Array[Byte] part, giving full serialization of case classes along with Spark's SQL support on that data.

Input

  • a failure case, as captured in #15, which may point to possible failures/bugs in the parquet <-> binary serialization layer
    and which would have a direct impact on the motivation for this issue.

Output

  • an assessment of what the above failure examples mean.
    • is parquet serialization lossy? if so, can it be fixed? is it recoverable?
    • if parquet serialization to binary isn't lossy, then working code for converting case classes to Row, with some of the following cases covered
      • custom field types with a private/non-public constructor
      • nested field types
      • case classes with non-case class fields in them. a good example is Joda DateTime
        • the above may not be doable or feasible; in that case, documentation on how to tackle such scenarios is to be added (e.g. is the library user expected to define their own serializers/deserializers to the subset supported by us, and if so, what subset of cases would our library cover?)
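The nested-field case above can be sketched with plain Spark APIs. This is only an illustration, not sparrow's implementation: `Person`, `Address`, `CaseClassToRow` and the hand-written schema are hypothetical names, and the import paths assume a Spark version where the type classes live in `org.apache.spark.sql.types`. Values that are not case classes (for example a Joda DateTime) pass through unchanged here, so they would still need an explicit mapping to a type Spark can store.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Hypothetical domain types, used only for illustration.
case class Address(city: String, zip: String)
case class Person(name: String, age: Long, address: Address)

object CaseClassToRow {
  // Convert a single field value. Option and collections are matched before
  // Product because Some, None and list cells are themselves Products.
  private def convert(value: Any): Any = value match {
    case Some(v)         => convert(v)
    case None            => null
    case xs: Iterable[_] => xs.map(convert).toSeq
    case p: Product      => toRow(p) // nested case class becomes a nested Row
    case other           => other    // primitives, strings, unsupported types
  }

  // Build a Row from the fields of a case class, in declaration order.
  def toRow(p: Product): Row =
    Row.fromSeq(p.productIterator.map(convert).toSeq)

  // Hand-written StructType matching Person; deriving this automatically
  // is the by-product mentioned in the motivation.
  val personSchema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = false),
    StructField("age", LongType, nullable = false),
    StructField("address", StructType(Seq(
      StructField("city", StringType, nullable = false),
      StructField("zip", StringType, nullable = false)
    )), nullable = false)
  ))
}
```

For example, `CaseClassToRow.toRow(Person("Ada", 36L, Address("London", "NW1")))` yields a Row whose third column is itself a Row holding the address fields.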

Test

  • CI (with unit tests covering the above) passes

Add example project to showcase sparrow

Motivation

Provide an example that covers a simple case but still gives a good introduction to using sparrow. The example project can also serve as an integration test.

Input

Output

  • Example code using @schema macro with @embedded case classes
  • Integration tests for example code

Test

  • CI passes

Update the macro to support overloaded apply

Because of the way createSchema generates the DSL code, passing apply as a partially applied function (apply _), it is not possible to have overloaded apply methods on the companion object of a case class where the macro is used. There's a FIXME in the code related to this:
https://github.com/ypg-data/sparrow/blob/master/core/src/test/scala/com.mediative.sparrow/SchemaSpec.scala#L67

The solution is to replace the following style in the generated code:

implicit val schema = (
  field[String]("name") and
  field[Long]("count")
)(apply _)

With something of this style:

implicit val schema = (
  field[String]("name") and
  field[Long]("count")
)((name: String, count: Long) => apply(name, count))
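The difference between the two styles can be reproduced without sparrow. In the sketch below, `Item`, `build` and `OverloadedApplyDemo` are hypothetical stand-ins: `build` only mimics the final application step of the DSL, which needs a function from the field types to the result type.

```scala
// Hypothetical case class with an extra, overloaded apply on its companion.
case class Item(name: String, count: Long)

object Item {
  // Overload that sits next to the compiler-generated apply(name, count).
  def apply(name: String): Item = Item(name, 0L)
}

object OverloadedApplyDemo {
  // Stand-in for the DSL's last step: apply a constructor function to fields.
  def build[A, B, T](a: A, b: B)(f: (A, B) => T): T = f(a, b)

  // `val f = Item.apply _` is typically rejected by Scala 2 as an ambiguous
  // reference, because nothing tells the compiler which overload to expand.

  // Spelling out the parameter types selects the two-argument overload.
  val make: (String, Long) => Item =
    (name: String, count: Long) => Item(name, count)

  val item: Item = build("widget", 3L)(make)
}
```

Since the macro already knows the field names and types, it has everything it needs to generate the explicit lambda.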

Publish build artifacts to bintray

Motivation

Currently, build artifacts are published to http://ypg-data.github.io/repo, which is a mostly manual process. Configure the project to publish artifacts to bintray to make it easier to release new versions and use them in internal Mediative projects.

Input

Output

  • Ability to publish release artifacts to bintray
  • Stretch goal: publish snapshot/release artifacts via Travis
  • Example SBT project that depends on the published artifacts
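A possible shape for the sbt side of this, assuming the bintray-sbt plugin is used. The plugin version, the organization and repository names, and the artifact coordinates are all placeholders, not values taken from the project.

```scala
// project/plugins.sbt (publishing side); the plugin version is an assumption
addSbtPlugin("me.lessis" % "bintray-sbt" % "0.3.0")

// build.sbt (publishing side); organization and repository are hypothetical
bintrayOrganization := Some("example-org")
bintrayRepository := "maven"

// build.sbt of the example project (consuming side)
resolvers += Resolver.bintrayRepo("example-org", "maven")

// Coordinates and version are placeholders for the published artifact
libraryDependencies += "com.mediative" %% "sparrow" % "x.y.z"
```

With settings like these, `sbt publish` on the publishing side would upload to the configured repository, and the example project would resolve the artifact through the extra resolver.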

Test

  • CI passes; in particular, the tests in the example project pass
