Awesome Streaming

A curated list of awesome streaming (stream processing) frameworks, applications, readings and other resources. Inspired by other awesome projects.

Streaming Engine

  • Apache Apex [Java] - unified platform for big data stream and batch processing.
  • flink-streaming [Java] - system for high-throughput, low-latency data stream processing that supports stateful computation, data-driven windowing semantics and iterative stream processing.
  • gearpump [Scala] - lightweight real-time distributed streaming engine built on Akka.
  • heron - Twitter's real-time analytics platform that is fully API-compatible with Storm. Storm has been replaced by Heron at Twitter.
  • mantis - Netflix's event stream processing system.
  • millwheel - framework for building low-latency data-processing applications that is widely used at Google.
  • mupd8 (muppet) [Scala/Java] - MapReduce-style framework for processing fast/streaming data.
  • pulsar [Java] - an open-source, real-time analytics platform and stream processing framework.
  • s4 [Java] - general-purpose, distributed, scalable, fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data.
  • Apache Samza [Scala/Java] - distributed stream processing framework built on Kafka (messaging, storage) and YARN (fault tolerance, processor isolation, security and resource management).
  • spark-streaming [Scala] - makes it easy to build scalable fault-tolerant streaming applications.
  • SPQR [Java] - dynamic framework for processing high-volume data streams through pipelines.
  • Apache Storm [Clojure/Java] - distributed real-time computation system. Storm is to stream processing what Hadoop is to batch processing.
  • tigon [C++/Java] - high-throughput real-time stream processing framework built on Hadoop and HBase.
  • hailstorm [Haskell] - distributed stream processing with exactly-once semantics based on Storm.

IoT

  • sensorbee [Go] - lightweight stream processing engine for IoT.
  • quarks [Java] - a programming model and runtime that enables continuous streaming analytics on gateways and edge devices, which can work with centralized systems to provide efficient and timely analytics across the whole IoT ecosystem, from the center to the edge. Open sourced by IBM.

Reactive Streams

  • akka-streams [Scala] - an implementation of Reactive Streams in Akka.
  • monifu [Scala] - high-performance Scala / Scala.js library for composing asynchronous and event-based programs.

DSL

  • summingbird [Scala] - library that lets you write MapReduce programs that look like native Scala or Java collection transformations and execute them on a number of well-known distributed MapReduce platforms, including Storm and Scalding.
  • coast [Scala] - a DSL that builds DAGs on top of Samza and provides exactly-once semantics.
  • Apache Beam [Java] - unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Domain Specific Languages (DSLs), open sourced by Google.

Data Pipeline

  • Apache Kafka [Scala/Java] - distributed, partitioned, replicated commit log service, which provides the functionality of a messaging system, but with a unique design.
  • metaq [Java] - Taobao's highly available, high-performance distributed messaging system.
  • nsq [Go] - realtime distributed messaging platform designed to operate at scale, handling billions of messages per day.
  • camus [Java] - LinkedIn's Kafka -> HDFS pipeline.
  • databus [Java] - LinkedIn's source-agnostic distributed change data capture system.
  • flume [Java] - distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
  • suro [Java] - data pipeline service for collecting, aggregating, and dispatching large volumes of application events, including log data.

Online Machine Learning

  • streamDM [Scala] - mining Big Data streams using Spark Streaming, from Huawei.
  • jubatus [C++] - distributed processing framework and streaming machine learning library.
  • Apache Samoa [Java] - distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms.
  • trident-ml [Java] - realtime online machine learning library based on Trident.
  • StormCV [Java] - enables the use of Apache Storm for video processing by adding computer vision (CV) specific operations and data model.

Stream SQL

  • pipelinedb [C] - An open-source relational database that runs SQL queries continuously on streams, incrementally storing results in tables.
  • squall [Java] - Squall executes SQL queries on top of Storm for doing online processing.
  • StreamCQL [Java] - Continuous Query Language on RealTime Computation System.

Benchmark

  • storm-benchmark [Java] - a set of benchmarks to test Storm performance.
  • storm-perf-test [Java] - a simple storm performance/stress test.
  • streaming-benchmarks [Java] - Benchmarks for Low Latency (Streaming) solutions including Apache Storm, Apache Spark, Apache Flink, etc.
  • flotilla [Go] - Automated message queue orchestration for scaled-up benchmarking.

Toolkit

  • akka [Scala] - toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM.
  • pulsar [Python] - Actor based event driven concurrent framework for Python.
  • aeron [Java/C++] - efficient reliable unicast and multicast message transport.
  • StreamFlow [Java] - stream processing tool designed to help build and monitor processing workflows.
  • samza-luwak [Java] - uses Luwak, a stored-query engine built on Lucene, to implement full-text search on streams.
  • Turbine [Java] - tool for aggregating streams of Server-Sent Event (SSE) JSON data into a single stream.

Readings

Blogs

Articles

  1. In-Stream Big Data Processing
  2. The world beyond batch: Streaming 101 by Tyler Akidau.

Streaming Algorithms and their applications

from Real Time Analytics: Algorithms and Systems (VLDB 2015)

| Problem | Description | Application |
| --- | --- | --- |
| Sampling | Obtain a representative set of the stream | A/B Testing |
| Filtering | Extract elements which meet a certain criterion | Set membership |
| Correlation | Find data subsets (subgraphs) in (graph) data stream which are highly correlated to a given data set | Fraud detection |
| Estimating Cardinality | Estimate the number of distinct elements | Site audience analysis |
| Estimating Quantiles | Estimate quantiles of a data stream with small amount of memory | Network analysis |
| Estimating Moments | Estimating distribution of frequencies of different elements | Databases |
| Finding Frequent Elements | Identify items in a multiset with frequency more than a threshold θ | Trending Hashtags |
| Counting Inversions | Estimate number of inversions | Measure sortedness |
| Finding Subsequences | Find Longest Increasing Subsequences (LIS), Longest Common Subsequence (LCS), subsequences similar to a given query sequence | Traffic analysis |
| Path Analysis | Determine whether there exists a path of length ≤ ℓ between two nodes in a dynamic graph | Web graph analysis |
| Anomaly Detection | Detect anomalies in a data stream | Sensor networks |
| Temporal Pattern Analysis | Detect patterns in a data stream | Traffic analysis |
| Data Prediction | Predict missing values in a data stream | Sensor data analysis |
| Clustering | Cluster a data stream | Medical imaging |
| Graph analysis | Extract unweighted and weighted matching, vertex cover, independent sets, spanners, subgraphs (sparsification) and random walks, computing min-cut | Web graph analysis |
| Basic Counting | Estimate m' of the number m of 1-bits in the sliding window (of size n) such that m' − m … | |
| Significant One Counting | Estimate m' of the number m of 1-bits in the sliding window (of size n) such that if m ≥ θn, then m' − m … | |
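
To make two of the problems in the table above more concrete, here is a minimal Python sketch. It is not taken from the cited tutorial: it assumes reservoir sampling (Algorithm R) as one possible approach to the Sampling problem and the Misra-Gries summary as one possible approach to Finding Frequent Elements. The function names `reservoir_sample` and `misra_gries` are illustrative, and the parameter `k` in `misra_gries` stands in for the threshold θ by treating θ as n/k for a stream of n items.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length.

    Uses O(k) memory and a single pass (Algorithm R).
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def misra_gries(stream, k):
    """Return candidate frequent items using at most k - 1 counters.

    Every item occurring more than n / k times in a stream of n items
    is guaranteed to be among the returned keys. The counts are
    underestimates, and some returned keys may be false positives.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    print(reservoir_sample(range(1000), 10))
    hashtags = ["a"] * 60 + ["b"] * 30 + [f"c{i}" for i in range(10)]
    print(misra_gries(hashtags, 3))
```

Both functions make a single pass and use memory independent of the stream length, which is the constraint that the problems in the table share. A second pass over the data (or an exact counter restricted to the candidates) would be needed to confirm the true frequencies of the Misra-Gries candidates.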

License

Licensed under a Creative Commons Attribution-ShareAlike 4.0 International License
