
dvryaboy's Projects

awesome-bigdata

A curated list of awesome big data frameworks, resources and other awesomeness.

bud

Prototype Bud runtime (Bloom Under Development)

cascading

Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing workflows on a Hadoop cluster.

elephant-bird

Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, and HBase code.

elephant-twin

Elephant Twin is a framework for creating indexes in Hadoop.

flume

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic applications.

gitbook

The GitBook documentation for Aqueduct

hadoop-lzo

Patched, refactored version of code.google.com/hadoop-gpl-compression for Hadoop 0.20

idl_storage_guidelines

This document attempts to capture useful patterns and warn about subtle gotchas when it comes to designing and evolving schemas for long-term serialized data. It is not intended as a guide for how to best represent a particular dataset or process.

impatient

Source examples to support the "Cascading for the Impatient" blog post series.

lakefs

lakeFS - Data version control for your data lake | Git for data

massquerylanguage

The Mass Spec Query Language (MassQL) is a domain-specific language meant to be a succinct way to express a query in a mass-spectrometry-centric fashion.

parquet-format-1

As we are moving to Apache, please open your pull requests on: https://github.com/apache/incubator-parquet-format

pig

Mirror of Apache Pig

scribe

Scribe is a server for aggregating log data streamed in real time from a large number of servers. It is designed to be scalable, extensible without client-side modification, and robust to failure of the network or any specific machine.
