honey-faucet-rs

DESIGN

Event-Driven Data Warehouse Creation for NLP and on-chain analytics, with a focus on NFT data.

A 'flow' is an ETL pipeline sequence that generates an output, in the form of either transformed data or analysis.
A 'flow_step' signifies an operation on a set of data (copy, move, transform, ...).
The modular design provides a framework for building scalable, custom ETL pipelines.
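The flow / flow_step relationship can be pictured as a trait plus an ordered list of steps. This is a minimal sketch, assuming a flow simply runs its steps in order and stops at the first failure; `Dataset`, `FlowStep`, `Flow`, and `Lowercase` are hypothetical names, not types from the honey-faucet-rs source.

```rust
// Hypothetical types for illustration only; not the project's actual API.

/// Stand-in for the data a step operates on (assumed: a list of tweet texts).
type Dataset = Vec<String>;

/// A flow_step: one operation (copy, move, transform, ...) on a set of data.
trait FlowStep {
    fn name(&self) -> &str;
    fn run(&self, input: Dataset) -> Result<Dataset, String>;
}

/// A flow: an ordered sequence of steps whose final output is the flow's result.
struct Flow {
    name: String,
    steps: Vec<Box<dyn FlowStep>>,
}

impl Flow {
    fn execute(&self, input: Dataset) -> Result<Dataset, String> {
        let mut data = input;
        for step in &self.steps {
            // Hand each step's output to the next; stop at the first failure.
            data = step
                .run(data)
                .map_err(|e| format!("{}/{} failed: {}", self.name, step.name(), e))?;
        }
        Ok(data)
    }
}

/// Example transform step: lowercase every record.
struct Lowercase;

impl FlowStep for Lowercase {
    fn name(&self) -> &str {
        "lowercase"
    }
    fn run(&self, input: Dataset) -> Result<Dataset, String> {
        Ok(input.into_iter().map(|s| s.to_lowercase()).collect())
    }
}

fn main() {
    let flow = Flow {
        name: "demo".to_string(),
        steps: vec![Box::new(Lowercase)],
    };
    let out = flow.execute(vec!["GM NFT".to_string()]).unwrap();
    println!("{:?}", out);
}
```

Because every step shares one signature, new copy/move/transform steps can be added to a flow without changing the runner.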

CLI

The CLI tool is designed to query specific subjects of interest prior to setting up a pipeline.
The currently supported actions call Twitter API v2 endpoints.

Suggested Use (Topic Search):

  • The [Counts] command should be used to gauge how much data there is for a particular topic.
  • The [Recent] command should be used to view the first 100 results of raw tweet data for a particular topic.
  • At this point, if enough data is available for a particular topic, a flow can be set up for it.
  • Otherwise, one can do a deep dive using the [Tweet Lookup], [User Timeline], [Mentions Timeline], or [Users Lookup] commands.
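For orientation, the [Counts] and [Recent] commands correspond to Twitter API v2's recent tweet counts and recent search routes. The sketch below only builds the request URLs; it assumes the default `https://api.twitter.com/2` base and does not show the tool's actual implementation. The helper names (`encode`, `counts_url`, `recent_url`) are hypothetical, and a real request also needs a bearer token header.

```rust
// Sketch: request URLs for the [Counts] and [Recent] commands.
// Helper names are hypothetical; only the v2 route paths are Twitter's.

const API_BASE: &str = "https://api.twitter.com/2";

/// Percent-encode a query value, keeping only RFC 3986 unreserved characters as-is.
fn encode(value: &str) -> String {
    let mut out = String::new();
    for b in value.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// [Counts]: tweet volume for a topic over the recent-search window (last 7 days).
/// `granularity` is one of "minute", "hour", or "day".
fn counts_url(topic: &str, granularity: &str) -> String {
    format!(
        "{}/tweets/counts/recent?query={}&granularity={}",
        API_BASE,
        encode(topic),
        granularity
    )
}

/// [Recent]: up to 100 raw tweets for a topic (100 is the endpoint's max page size).
fn recent_url(topic: &str) -> String {
    format!(
        "{}/tweets/search/recent?query={}&max_results=100",
        API_BASE,
        encode(topic)
    )
}

fn main() {
    // Both requests need an `Authorization: Bearer <token>` header (not shown).
    println!("{}", counts_url("#NFT -is:retweet", "day"));
    println!("{}", recent_url("#NFT -is:retweet"));
}
```

Checking the counts first keeps the 100-tweet sample from being mistaken for the topic's full volume.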

Suggested Use (Account R&D - Persons/Projects of Interest):

  • If the unique username is known, the user_id can be found using the [Users Lookup] command.
  • Next, the [User Timeline] or [Mentions Timeline] commands can be used to view a portion of the timeline data for a particular user.
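The account workflow above is a two-hop lookup: username to user id, then user id to timeline. A minimal sketch of the corresponding Twitter API v2 routes follows; the helper names are hypothetical, and the user id is assumed to be read from the `data.id` field of the Users Lookup response, which this sketch does not parse.

```rust
// Sketch: Twitter API v2 routes behind the account R&D commands.
// Helper names are hypothetical, not honey-faucet-rs functions.

const API_BASE: &str = "https://api.twitter.com/2";

/// [Users Lookup]: resolve a username (with or without a leading '@').
fn user_by_username_url(username: &str) -> String {
    format!("{}/users/by/username/{}", API_BASE, username.trim_start_matches('@'))
}

/// [User Timeline]: tweets authored by `user_id`; the endpoint accepts 5..=100 per page.
fn user_timeline_url(user_id: &str, max_results: u8) -> String {
    format!(
        "{}/users/{}/tweets?max_results={}",
        API_BASE,
        user_id,
        max_results.clamp(5, 100)
    )
}

/// [Mentions Timeline]: tweets mentioning `user_id`; same page-size limits.
fn mentions_timeline_url(user_id: &str, max_results: u8) -> String {
    format!(
        "{}/users/{}/mentions?max_results={}",
        API_BASE,
        user_id,
        max_results.clamp(5, 100)
    )
}

fn main() {
    println!("{}", user_by_username_url("@example_user"));
    // "12345" is a placeholder id standing in for the Users Lookup result.
    println!("{}", user_timeline_url("12345", 100));
    println!("{}", mentions_timeline_url("12345", 100));
}
```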

Recent Command

Counts Command

Tweet Lookup Command

User Timeline Command

Users Lookup Command

AUTOMATED PIPELINE EXECUTION

flow-controller - This mechanism will facilitate the booting and stopping of jobs for the day based on configured cron schedules.
job-controller - This mechanism will facilitate the scheduling and execution of job steps.
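The split between the two controllers can be sketched as follows, under a loudly simplifying assumption: each flow's cron schedule is reduced to a daily `[start_hour, stop_hour)` window instead of a real cron expression. `ScheduledFlow`, `is_active`, and `run_jobs` are hypothetical names, and the `exec` callback stands in for whatever actually performs a step.

```rust
// Sketch only: cron schedules are simplified to a daily hour window.
// All names are hypothetical, not the project's actual API.

struct ScheduledFlow {
    name: &'static str,
    start_hour: u8, // 0-23: the job is booted at this hour
    stop_hour: u8,  // 0-23: the job is stopped at this hour
    steps: Vec<&'static str>,
}

/// flow-controller: should this flow's job be running at `hour`?
fn is_active(flow: &ScheduledFlow, hour: u8) -> bool {
    if flow.start_hour <= flow.stop_hour {
        hour >= flow.start_hour && hour < flow.stop_hour
    } else {
        // The window wraps past midnight, e.g. 22 -> 2.
        hour >= flow.start_hour || hour < flow.stop_hour
    }
}

/// job-controller: run each active flow's steps in order, stopping a flow at its first failure.
fn run_jobs<F>(flows: &[ScheduledFlow], hour: u8, mut exec: F) -> Vec<String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut log = Vec::new();
    for flow in flows.iter().filter(|f| is_active(f, hour)) {
        for step in &flow.steps {
            match exec(step) {
                Ok(()) => log.push(format!("{}: {} ok", flow.name, step)),
                Err(e) => {
                    log.push(format!("{}: {} failed ({})", flow.name, step, e));
                    break;
                }
            }
        }
    }
    log
}

fn main() {
    let flows = vec![ScheduledFlow {
        name: "nft-sentiment",
        start_hour: 1,
        stop_hour: 5,
        steps: vec!["nlp-topic-land", "rm_dups"],
    }];
    for line in run_jobs(&flows, 2, |_step| Ok(())) {
        println!("{}", line);
    }
}
```

Keeping the "when" (flow-controller) separate from the "what, in which order" (job-controller) means a schedule change never touches step execution.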

SUPPORTED FEATURES

nlp-recent-topic-land - This flow step will pull and land recent data for a topic.
nlp-user-timeline-land - This flow step will pull and land standard timeline data for a particular user.
nlp-topic-land - This flow step will pull and land data specified by date for a topic. (WIP - R&D for v1.1 endpoint for archive search)

NOTES

*Design is subject to change as implementation progresses.
The methodology is agile, and refactoring takes place after each feature is finished.

Current NFT Sentiment Analysis Design:
- Collect data for an initial Training and Test Set.
- Perform standard sentiment analysis on tweet text for a particular topic.
- R&D to include tweet engagement (likes + retweets + comments) in the analysis.
- Aggregate Persons of Interest as separate topics and include them in the analysis. (EX: Do Kwon for topic = "LUNA")
---- Avoid shitposters, anime pfps, and trolls
- Aggregate Projects of Interest as separate topics and include them in the analysis. (EX: Cyberkongz, Nansen.ai, LooksRare for topic = "NFT")
- R&D on analysis tuning
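One way the engagement R&D item could feed into a topic score is an engagement-weighted mean. This is a sketch under stated assumptions, not the project's method: each tweet is assumed to already carry a sentiment score in [-1.0, 1.0] from a classifier, and the weight `1 + likes + retweets + replies` is a hypothetical choice so zero-engagement tweets still count. The field names mirror Twitter API v2 `public_metrics`.

```rust
// Sketch: engagement-weighted topic sentiment. The weighting scheme is an
// assumption; `ScoredTweet` and both functions are hypothetical names.

struct ScoredTweet {
    sentiment: f64, // assumed classifier output in [-1.0, 1.0]
    like_count: u64,
    retweet_count: u64,
    reply_count: u64,
}

/// likes + retweets + comments, as described in the design notes.
fn engagement(t: &ScoredTweet) -> u64 {
    t.like_count + t.retweet_count + t.reply_count
}

/// Engagement-weighted mean sentiment; None when there are no tweets.
fn weighted_sentiment(tweets: &[ScoredTweet]) -> Option<f64> {
    if tweets.is_empty() {
        return None;
    }
    let (mut num, mut den) = (0.0, 0.0);
    for t in tweets {
        // Weight of 1 + engagement keeps unengaged tweets in the average.
        let w = 1.0 + engagement(t) as f64;
        num += w * t.sentiment;
        den += w;
    }
    Some(num / den)
}

fn main() {
    let tweets = vec![
        ScoredTweet { sentiment: 1.0, like_count: 9, retweet_count: 0, reply_count: 0 },
        ScoredTweet { sentiment: -1.0, like_count: 0, retweet_count: 0, reply_count: 0 },
    ];
    println!("{:?}", weighted_sentiment(&tweets));
}
```

A linear weight lets one viral tweet dominate; a log-scaled weight is a common alternative worth testing during analysis tuning.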

The current design will consist of a 4-step flow.
- The first step will be called nlp-topic-land.
---- This step will pull data down from Twitter and save it in Parquet format.
- The second step will remove duplicate tweets (rm_dups).
- The third step will load the dataset into a Spark ML pipeline to prepare the data.
- The fourth step will classify the dataset using various algorithms (Naive Bayes, SVM, Logistic Regression, LSTM, etc.).
- The goal is to provide sentiment classification (positive / negative) for a particular topic over a particular interval (Past 24HR, Past 7D, Past 30D).
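Two pieces of this flow are simple enough to sketch outside Spark: the rm_dups step and the per-interval summary. This is a minimal sketch with hypothetical types; it assumes duplicates share a tweet id, that timestamps are Unix seconds, and that the interval summary is the share of tweets labeled positive inside a trailing window. The Spark preparation and the classifiers themselves are not shown.

```rust
use std::collections::HashSet;

// Hypothetical types; not the project's actual schema.
#[derive(Clone, Debug, PartialEq)]
struct Tweet {
    id: String,
    created_at: i64, // Unix seconds (assumption)
    text: String,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Label {
    Positive,
    Negative,
}

/// rm_dups: keep the first occurrence of each tweet id, preserving order.
fn rm_dups(tweets: Vec<Tweet>) -> Vec<Tweet> {
    let mut seen = HashSet::new();
    tweets.into_iter().filter(|t| seen.insert(t.id.clone())).collect()
}

const HOUR: i64 = 3_600;
const PAST_24H: i64 = 24 * HOUR;
const PAST_7D: i64 = 7 * 24 * HOUR;
const PAST_30D: i64 = 30 * 24 * HOUR;

/// Share of Positive labels among tweets with created_at in (now - window, now].
fn positive_share(classified: &[(Tweet, Label)], now: i64, window: i64) -> Option<f64> {
    let in_window: Vec<Label> = classified
        .iter()
        .filter(|(t, _)| t.created_at > now - window && t.created_at <= now)
        .map(|(_, l)| *l)
        .collect();
    if in_window.is_empty() {
        return None;
    }
    let pos = in_window.iter().filter(|l| **l == Label::Positive).count();
    Some(pos as f64 / in_window.len() as f64)
}

fn main() {
    let now = 1_700_000_000;
    let tweets = vec![
        Tweet { id: "1".into(), created_at: now - HOUR, text: "gm".into() },
        Tweet { id: "1".into(), created_at: now - HOUR, text: "gm".into() },
        Tweet { id: "2".into(), created_at: now - 10 * 24 * HOUR, text: "ngmi".into() },
    ];
    let unique = rm_dups(tweets);
    // Placeholder labels standing in for the fourth step's classifier output.
    let classified: Vec<(Tweet, Label)> =
        unique.into_iter().zip([Label::Positive, Label::Negative]).collect();
    for (name, w) in [("24h", PAST_24H), ("7d", PAST_7D), ("30d", PAST_30D)] {
        println!("{}: {:?}", name, positive_share(&classified, now, w));
    }
}
```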

Contributors

0xfiending