dbiir / paraflow
A real-time analytical system for ID-associated data
Home Page: https://dbiir.github.io/paraflow/
License: Apache License 2.0
Write detailed documentation covering the project's aims, installation, usage guides, and configuration.
Segments are cached in memory before being flushed to HDFS.
Distributed locks are required to safely flush in-memory segments without interrupting running queries.
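The locking requirement can be pictured with a read-write lock: queries hold the shared lock while they scan in-memory segments, and the flusher takes the exclusive lock only while it detaches the buffer. The sketch below is an in-process stand-in built on java.util.concurrent. The names SegmentCache, append, scan, and flush are hypothetical, and a real cluster would need a distributed lock service in place of ReentrantReadWriteLock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical in-memory segment cache; not Paraflow's actual class.
class SegmentCache {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private List<String> segments = new ArrayList<>();

    // Writers add a segment under the exclusive lock.
    void append(String segment) {
        lock.writeLock().lock();
        try {
            segments.add(segment);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Queries hold the shared lock, so a flush cannot detach segments mid-scan.
    List<String> scan() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(segments);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The flusher waits for running scans, swaps the buffer out, and returns
    // the detached segments; writing them to HDFS would happen after release.
    List<String> flush() {
        lock.writeLock().lock();
        try {
            List<String> detached = segments;
            segments = new ArrayList<>();
            return detached;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

In this sketch a flush blocks until in-flight scans release the shared lock, so running queries are never cut off; new queries wait only for the short buffer swap.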
Pixels is a new 'smart' columnar format developed by us, and it will be open-sourced soon.
We need to add support for Pixels and integrate our columnar storage optimizations into this project.
The paraflow-presto module is for executing SQL queries using Presto.
This is a list of all issues related to this module; it can be seen as a mini task board.
The paraflow-loader-producer module is for loading data into a Kafka cluster.
This is a list of all issues related to this module; it can be seen as a mini task board.
Kafka AdminClient does not seem to be thread-safe. When several clients try to create a topic at the same time, exceptions are thrown.
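Whatever the root cause turns out to be, one possible workaround is to make sure each topic is created only once per process. The sketch below assumes a hypothetical TopicCreationGuard class; the creator callback stands in for whatever call actually creates the topic, and the guard only covers callers inside a single JVM, not separate client processes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical guard around topic creation; `creator` stands in for the call
// that really creates the topic (for example through Kafka's AdminClient).
class TopicCreationGuard {
    private final ConcurrentHashMap<String, Boolean> created = new ConcurrentHashMap<>();
    private final Consumer<String> creator;

    TopicCreationGuard(Consumer<String> creator) {
        this.creator = creator;
    }

    // computeIfAbsent runs the mapping function at most once per key, so
    // concurrent callers in this JVM trigger a single creation request.
    // If the creator throws, nothing is recorded and a later call retries.
    void ensureTopic(String topic) {
        created.computeIfAbsent(topic, name -> {
            creator.accept(name);
            return Boolean.TRUE;
        });
    }
}
```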
Define the LoaderProducer API.
Add validation support for configuration.
Check whether the lib dir exists under ParaFlow/dist/ParaFlow-xxx/.
If not, create the lib dir.
Copy RealTimeAnalysis/dist/bin/ to lib.
Copy Presto/presto-server/target/lib to lib.
Pack the ParaFlow-xxx/ dir into a tar file named ParaFlow-xxx.tar.
scp the tar file to each server specified in the servers file.
Extract the tar file to the specified dir on each server.
paraflow-tools provides convenience tools to compile, test, and deploy the paraflow system.
This is a list of all issues related to this module; it can be seen as a mini task board.
Provide an open website using GitHub Pages. The website contains:
The paraflow-commons module is for common classes shared by several modules.
This is a list of all issues related to this module; it can be seen as a mini task board.
Support message filters and transformations before loading into Kafka.
TO BE PROCESSED LATER
A producer client for loading data into Kafka.
The client provides the following APIs:
send()
createTopic()
createUser()
createDatabase()
createFiberTable()
createRegularTable()
createFiberFunc()
registerFilter()
registerTransformer()
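To illustrate how registerFilter() and registerTransformer() might fit together before send(), here is a minimal sketch. The method names come from the list above, but the signatures, the String message type, and the MessagePipeline class are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical pre-send pipeline; signatures are assumed, not Paraflow's API.
class MessagePipeline {
    private final List<Predicate<String>> filters = new ArrayList<>();
    private final List<Function<String, String>> transformers = new ArrayList<>();

    void registerFilter(Predicate<String> filter) {
        filters.add(filter);
    }

    void registerTransformer(Function<String, String> transformer) {
        transformers.add(transformer);
    }

    // Returns the message to hand to send(), or empty if any filter rejects it.
    // Filters run first, then transformers in registration order.
    Optional<String> prepare(String message) {
        for (Predicate<String> filter : filters) {
            if (!filter.test(message)) {
                return Optional.empty();
            }
        }
        String out = message;
        for (Function<String, String> transformer : transformers) {
            out = transformer.apply(out);
        }
        return Optional.of(out);
    }
}
```

Running filters before transformers avoids spending work on messages that will be dropped; the opposite order would also be defensible if filters need to see the transformed form.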
We shall design a configuration server, which is a centralized service for all collectors and loaders in the cluster. Each collector or loader listens on this service to get latest configuration parameters.
In this way, we can avoid copying the same configuration files all over the cluster, and tune the cluster on the fly without halting.
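The listen-for-updates idea can be sketched with a small in-process stand-in. ConfigService, subscribe, put, and get are hypothetical names; a real configuration server would push changes over the network rather than through direct method calls.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// Hypothetical in-process stand-in for the centralized configuration service.
class ConfigService {
    private final Map<String, String> params = new ConcurrentHashMap<>();
    private final CopyOnWriteArrayList<BiConsumer<String, String>> listeners =
            new CopyOnWriteArrayList<>();

    // A collector or loader registers a callback to receive parameter changes.
    void subscribe(BiConsumer<String, String> listener) {
        listeners.add(listener);
    }

    // Updating a parameter stores it and notifies every subscriber, so the
    // cluster can be tuned without restarting its members.
    void put(String key, String value) {
        params.put(key, value);
        for (BiConsumer<String, String> listener : listeners) {
            listener.accept(key, value);
        }
    }

    String get(String key) {
        return params.get(key);
    }
}
```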
Set up etcd and learn a little about it. We intend to use it for leader election, metadata caching, etc.
Refactor MetaServer:
connection instance.
TransactionController.
Add models in paraflow-metaserver.
List all public interfaces of MetaServer.
Sub task of #22
API:
List<String> listDatabases()
List<String> listTables(String database)
Database getDatabase(String database)
Table getTable(String database, String table)
Column getColumn(String database, String table, String column)
Status createDatabase(Database database)
Status createTable(Table table)
Status deleteDatabase(String database)
Status deleteTable(String database, String table)
Status renameDatabase(String oldName, String newName)
Status renameTable(String database, String oldName, String newName)
Status renameColumn(String database, String table, String oldName, String newName)
Status createFiber(String database, String table, long value)
List<Long> listFiberValues(String database, String table, long value)
Status addBlockIndex(String database, String table, long fiberV, String beginTime, String endTime, String path)
List<String> filterBlockPaths(String database, String table, String timeLow, String timeHigh)
List<String> filterBlockPaths(String database, String table, long fiberV, String timeLow, String timeHigh)
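As a rough illustration of how addBlockIndex and the two filterBlockPaths variants relate, the sketch below keeps the index in memory for a single table, so the database and table parameters are dropped. It assumes that time strings sort lexicographically (for example ISO-8601 timestamps) and that a block matches when its time range overlaps the query range; neither assumption is confirmed by the interface list, and the BlockIndex class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory block index for one table; not the MetaServer's
// actual storage.
class BlockIndex {
    private static final class Entry {
        final long fiberV;
        final String beginTime;
        final String endTime;
        final String path;

        Entry(long fiberV, String beginTime, String endTime, String path) {
            this.fiberV = fiberV;
            this.beginTime = beginTime;
            this.endTime = endTime;
            this.path = path;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void addBlockIndex(long fiberV, String beginTime, String endTime, String path) {
        entries.add(new Entry(fiberV, beginTime, endTime, path));
    }

    // Paths of blocks overlapping [timeLow, timeHigh], for any fiber value.
    List<String> filterBlockPaths(String timeLow, String timeHigh) {
        List<String> paths = new ArrayList<>();
        for (Entry e : entries) {
            if (overlaps(e, timeLow, timeHigh)) {
                paths.add(e.path);
            }
        }
        return paths;
    }

    // Same as above, restricted to a single fiber value.
    List<String> filterBlockPaths(long fiberV, String timeLow, String timeHigh) {
        List<String> paths = new ArrayList<>();
        for (Entry e : entries) {
            if (e.fiberV == fiberV && overlaps(e, timeLow, timeHigh)) {
                paths.add(e.path);
            }
        }
        return paths;
    }

    // Assumes timestamps compare correctly as strings.
    private static boolean overlaps(Entry e, String low, String high) {
        return e.beginTime.compareTo(high) <= 0 && e.endTime.compareTo(low) >= 0;
    }
}
```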
Models:
Database:
String name;
String locationUri;
User user;
Table:
Database database;
long creationTime;
long lastAccessTime;
User owner;
String tableName;
String tableLocationUri;
List<Column> columns;
User:
String userName;
String userPass;
String roleName;
long creationTime;
long lastVisitTime;
Column:
Table table;
String colName;
String dataType;
int colIndex;
Status (Enum type):
OK
DATABASE_ALREADY_EXISTS
DATABASE_NOT_FOUND
TABLE_ALREADY_EXISTS
TABLE_NOT_FOUND
COLUMN_ALREADY_EXISTS
COLUMN_NOT_FOUND
FIBER_ALREADY_EXISTS
BLOCK_INDEX_ERROR
Implement a Web-based user interface, including the front-end and the back-end.
The Web UI will interact with existing Paraflow modules to provide a user-friendly layout optimization service.
Follow this PR in Presto to replace existing Parquet writer with a native one.
Currently, we only support Parquet.
ORC is as popular as Parquet, and it is well integrated with Presto.
We need to add support for ORC.
Main loop for meta server start and stop
MetaServer uses gRPC as the service definition tool.
There are many unused CSS files in the paraflow-http-server module. Remove the unused ones and compress the rest as much as possible.
A cache for metadata.
A basic Kafka consumer client to pull messages from the specified topic and partitions.
Implement the paraflow-commons logging system.
The paraflow-loader-consumer module is for loading data from Kafka into the file system.
This is a list of all issues related to this module; it can be seen as a mini task board.
In ct.sh, the testRP function should cd to the corresponding dir and then execute the mvn command.
I find some segments are staged on local tmpfs forever, and it seems no one is responsible for flushing them to HDFS any more.