Quasar

Quasar is a purely functional compiler and optimizing planner for queries expressed in terms of the Multidimensional Relational Algebra (MRA). Quasar supports arbitrary backends, both heavyweight (full evaluation engines) and lightweight (simple reads with optional pushdown of structural operations and columnar predicates), and provides full classpath isolation for lightweight backends.

It's important to note that Quasar is not, in and of itself, a runnable application. It is a library used by the broader SlamData product, much of which is closed-source. Contributions are very much welcome, as are feedback, questions, and general conversation. Join the Discord!

Building and Testing

Quasar builds with SBT:

$ ./sbt
> test:compile
> test

If running on Windows, you may use the SBT batch file instead of the shell script.

Code Organization

Probably the most interesting part of the codebase is the optimizing query planner, which is implemented in the qsu submodule and based on data structures defined in qscript. I recommend starting with the LPtoQS class, which defines a Kleisli composition that clearly lays out all of the phases of the compiler. The core data structure used by the compiler is QSUGraph, a purely functional representation of a directed acyclic graph that in turn represents data flow in a query.
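
For readers unfamiliar with the term, a Kleisli composition chains functions of the shape A => F[B] so that each phase feeds the next and a failure in any phase short-circuits the rest. The following is a minimal sketch of that idea in plain Scala, using Either as the effect. The Phase type and the parse, normalize, and plan phases are hypothetical names invented for illustration; they are not the phases defined in LPtoQS, and the real code uses library-provided Kleisli arrows rather than this hand-rolled wrapper.

```scala
object PhaseSketch {
  // Assumption: errors are plain strings; the real compiler uses richer error types.
  type Result[A] = Either[String, A]

  // A Kleisli arrow is a function A => F[B]; here F is Result.
  final case class Phase[A, B](run: A => Result[B]) {
    // Left-to-right composition: run this phase, then feed its output to `next`.
    // A Left from either phase stops the pipeline.
    def andThen[C](next: Phase[B, C]): Phase[A, C] =
      Phase[A, C](a => run(a).flatMap(next.run))
  }

  // Hypothetical phase 1: split the input into tokens, rejecting empty input.
  val parse = Phase[String, List[String]] { s =>
    if (s.trim.isEmpty) Left("empty query")
    else Right(s.trim.split("\\s+").toList)
  }

  // Hypothetical phase 2: lower-case every token.
  val normalize = Phase[List[String], List[String]](ts => Right(ts.map(_.toLowerCase)))

  // Hypothetical phase 3: render a stand-in "plan".
  val plan = Phase[List[String], String](ts => Right(ts.mkString("Plan(", ", ", ")")))

  // The whole pipeline reads as a list of phases, in order.
  val compile: Phase[String, String] = parse.andThen(normalize).andThen(plan)
}
```

Calling PhaseSketch.compile.run("SELECT Foo") yields Right("Plan(select, foo)"), while a blank input yields Left("empty query") without running the later phases.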

The formulation of the query plan itself is a fixed-point data structure dictated by several pattern functors composed via coproducts. The primary such pattern functor is QScriptCore. You would generally deconstruct and interpret this query plan using general folds provided by matryoshka.
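
As a rough illustration of what "fixed-point data structure over a pattern functor" means, here is a self-contained sketch in plain Scala. It uses a toy arithmetic functor named ExprF, which is an assumption made for this example and has nothing to do with QScriptCore, and it uses a single functor rather than a coproduct of several. The hand-written cata stands in for the general folds that matryoshka provides.

```scala
object FixSketch {
  // A pattern functor: recursive positions are abstracted into the type parameter A.
  sealed trait ExprF[A]
  final case class Lit[A](value: Int) extends ExprF[A]
  final case class Add[A](left: A, right: A) extends ExprF[A]
  final case class Mul[A](left: A, right: A) extends ExprF[A]

  // The fixed point ties the recursive knot: an Expr is an ExprF whose children are Exprs.
  final case class Fix[F[_]](unfix: F[Fix[F]])
  type Expr = Fix[ExprF]

  // Smart constructors keep tree-building readable.
  def lit(v: Int): Expr = Fix[ExprF](Lit[Expr](v))
  def add(l: Expr, r: Expr): Expr = Fix[ExprF](Add[Expr](l, r))
  def mul(l: Expr, r: Expr): Expr = Fix[ExprF](Mul[Expr](l, r))

  // Functor map for ExprF, written by hand to avoid a library dependency.
  def map[A, B](fa: ExprF[A])(f: A => B): ExprF[B] = fa match {
    case Lit(v)    => Lit[B](v)
    case Add(l, r) => Add[B](f(l), f(r))
    case Mul(l, r) => Mul[B](f(l), f(r))
  }

  // A catamorphism: fold the children first, then apply the algebra to one layer.
  def cata[A](term: Expr)(algebra: ExprF[A] => A): A =
    algebra(map(term.unfix)(child => cata(child)(algebra)))

  // Two interpretations of the same tree, each defined one layer at a time.
  val eval: ExprF[Int] => Int = {
    case Lit(v)    => v
    case Add(l, r) => l + r
    case Mul(l, r) => l * r
  }

  val show: ExprF[String] => String = {
    case Lit(v)    => v.toString
    case Add(l, r) => s"($l + $r)"
    case Mul(l, r) => s"($l * $r)"
  }

  val example: Expr = add(lit(2), mul(lit(3), lit(4)))
}
```

Here cata(example)(eval) evaluates to 14 and cata(example)(show) renders "(2 + (3 * 4))". The point of the design is that the recursion scheme is written once, and each interpretation only has to describe what happens at a single node.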

Query operations that are pushed down to the underlying data source are represented by ScalarStage and carried via InterpretedRead. Data sources are always free to implement only a subset of the pushdown functionality.

The codebase makes extremely heavy use of Scalaz and Cats throughout (using shims to resolve the impedance mismatch between them), and many high-level operations (such as datasets) are represented as fs2 streams.

Local Datasource

A Datasource implementation providing access to the filesystems local to the JVM.

Configuration for the local datasource has the following JSON format:

{
  "rootDir": String,
  "format": {
    "type": "json",
    "variant": "array-wrapped" | "line-delimited",
    ["precise": Boolean]
  },
  ["readChunkSizeBytes": Number,]
  ["compressionScheme": "gzip"]
}
  • rootDir: an absolute path to a local directory at which to root the datasource. All paths handled by the datasource are interpreted relative to this physical directory.
  • format: the format of all resources in the datasource. Currently JSON is supported, in both array-wrapped and line-delimited variants.
  • readChunkSizeBytes (optional): an integer indicating the chunk size to use when reading local files. The default is 1048576 (1 MiB). Different values may yield higher throughput depending on the filesystem.
  • compressionScheme (optional): whether to expect resources to be compressed. Currently gzip is the only supported compression scheme. Omitting this option indicates uncompressed resources.
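
As an illustration of the format above, a configuration for gzip-compressed, line-delimited JSON files might look like the following. The rootDir path is a made-up example, and readChunkSizeBytes is set to the documented default purely to show where the field goes; both optional fields could be omitted.

```json
{
  "rootDir": "/var/data/quasar",
  "format": {
    "type": "json",
    "variant": "line-delimited"
  },
  "readChunkSizeBytes": 1048576,
  "compressionScheme": "gzip"
}
```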

Legal

Copyright © 2014 - 2019 SlamData Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
