datafibers / df
Big Data Swiss Knifes
Home Page: http://www.datafibers.com
License: Apache License 2.0
Currently, streamed data is first saved to a local file on the DF server and then uploaded to HDFS. If the file is too big, it never shows up in HDFS. A better approach is to start writing to HDFS as soon as the block size is reached. The resulting files can be merged later.
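The roll-over idea can be sketched as follows. This is only an illustration under stated assumptions: local files in a temporary directory stand in for HDFS, and `BLOCK_SIZE`, `write_in_parts`, and `merge_parts` are hypothetical names that are not part of DF.

```python
import io
import os
import shutil
import tempfile

# Tiny value for the demo; assumed stand-in for the real HDFS block size.
BLOCK_SIZE = 8


def write_in_parts(stream, out_dir, block_size=BLOCK_SIZE):
    """Write `stream` as numbered part files, starting a new part every block_size bytes."""
    parts = []
    while True:
        chunk = stream.read(block_size)
        if not chunk:
            break
        path = os.path.join(out_dir, f"part-{len(parts):05d}")
        with open(path, "wb") as part:
            part.write(chunk)
        parts.append(path)
    return parts


def merge_parts(parts, target):
    """Concatenate the part files, in order, into a single target file."""
    with open(target, "wb") as out:
        for path in parts:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, out)


# Demo: a local temp directory plays the role of the HDFS target (assumption).
out_dir = tempfile.mkdtemp()
payload = b"AAPL,101.5\nGOOG,99.2\n"
parts = write_in_parts(io.BytesIO(payload), out_dir)
merged = os.path.join(out_dir, "merged.csv")
merge_parts(parts, merged)
```

Because each part is closed as soon as it is full, it becomes visible before the whole stream has ended, which is the behavior the issue asks for.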
Make sure the data is archived into Hadoop and/or a file storage web service before it expires from Kafka.
This is a far-off feature. It is listed here as a placeholder.
This is a new feature to batch load files into HDFS.
In this case, the file needs to arrive on the DF server first. DF then just moves or copies the file to HDFS.
The DF server needs to be refactored as follows
The DF Agent is currently non-blocking with Vert.x.
When the first thread has not finished streaming and a second thread starts, the data from the two threads can get mixed up. As a result, Kafka receives bad data.
The resolution is to make streaming blocking, so that files are streamed one at a time.
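A minimal sketch of streaming files one at a time, assuming a single shared lock is an acceptable way to serialize the work. The list `sent` stands in for the Kafka topic, and `kafka_lock` and `stream_file` are hypothetical names, not DF or Vert.x APIs.

```python
import threading

kafka_lock = threading.Lock()  # hypothetical: guards the shared output
sent = []  # stands in for the Kafka topic (assumption)


def stream_file(name, lines):
    """Send every line of one file while holding the lock, so files never interleave."""
    with kafka_lock:
        for line in lines:
            sent.append((name, line))


files = {"a.csv": ["a1", "a2", "a3"], "b.csv": ["b1", "b2", "b3"]}
threads = [threading.Thread(target=stream_file, args=item) for item in files.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each file's records now form one contiguous run in `sent`.
names = [name for name, _ in sent]
```

The order in which the two files are sent is not fixed, but the records of one file are never mixed with those of the other.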
We need a better logging system, such as log4j.
We also need to log each job separately.
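Per-job logging can be sketched as below. Note that this uses Python's standard `logging` module rather than log4j, purely to illustrate the idea of one log file per job. The logger name prefix `df.job`, the helper `job_logger`, and the job IDs are hypothetical.

```python
import logging
import os
import tempfile

log_dir = tempfile.mkdtemp()  # assumption: any writable directory works


def job_logger(job_id):
    """Return a logger that writes only this job's records to its own file."""
    logger = logging.getLogger(f"df.job.{job_id}")
    if not logger.handlers:  # attach the file handler only once per job
        handler = logging.FileHandler(os.path.join(log_dir, f"{job_id}.log"))
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False  # keep job records out of the shared root log
    return logger


job_logger("job-1").info("streaming started")
job_logger("job-2").info("upload finished")
```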
Create a demo for streaming
The DF demo df-data-collector can only get updated data while the US stock market is open. For demo purposes, we also need data to be available when the market is closed. We will consider using spoofed data, and adding the China market as another option.
Test cases are needed for both client and server
The introduction needs polishing before it is published on the main site
The stream file function requires the following improvements
We should add an option to move processed files to an archive folder, so that we know which files have been processed.
We also need to support filtering the files to be processed, for example by a leading _ in the file name.
With this in place, we can coordinate with the stream generator to consume files smoothly
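The two improvements above can be sketched together. This assumes that a leading `_` marks a file that should be skipped (for example, one the stream generator is still writing). The source does not say which way the filter goes, so treat that as a guess. `pending_files` and `process_and_archive` are hypothetical helper names.

```python
import os
import shutil
import tempfile


def pending_files(in_dir, skip_prefix="_"):
    """Return the regular files to process, skipping names that start with skip_prefix."""
    return sorted(
        name for name in os.listdir(in_dir)
        if os.path.isfile(os.path.join(in_dir, name))
        and not name.startswith(skip_prefix)
    )


def process_and_archive(in_dir, archive_dir, handle):
    """Run `handle` on each pending file, then move the file to archive_dir."""
    os.makedirs(archive_dir, exist_ok=True)
    done = []
    for name in pending_files(in_dir):
        src = os.path.join(in_dir, name)
        handle(src)
        shutil.move(src, os.path.join(archive_dir, name))
        done.append(name)
    return done


# Demo: one finished file and one file still marked with a leading underscore.
in_dir = tempfile.mkdtemp()
archive_dir = os.path.join(in_dir, "archive")
for name in ("ready.csv", "_writing.csv"):
    with open(os.path.join(in_dir, name), "w") as f:
        f.write("x\n")
seen = []
done = process_and_archive(in_dir, archive_dir, seen.append)
```

After the run, `ready.csv` sits in the archive folder and `_writing.csv` is left in place for a later pass.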
The resolution is as follows
This is to ingest CSV files into HDFS