featurebasedb / featurebase-examples
Examples for FeatureBase Community
License: MIT License
Now that we have the capability to do bulk inserts with both the product and the cloud offering, we need an example that shows how to use Python to pull from a source like Kafka and insert into FeatureBase (FB) using the new endpoints.
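A minimal sketch of the Kafka side, assuming the kafka-python client (`KafkaConsumer`) and JSON-encoded messages. The broker address and batch size are illustrative, and the call to FeatureBase's new bulk-insert endpoints is left out because their details aren't given here.

```python
import json
from itertools import islice
from typing import Iterable, Iterator, List


def batches(messages: Iterable[bytes], size: int = 1000) -> Iterator[List[dict]]:
    """Decode JSON messages and group them into lists of at most `size` records."""
    it = iter(messages)
    while True:
        chunk = [json.loads(m) for m in islice(it, size)]
        if not chunk:
            return
        yield chunk


def consume_batches(topic: str, servers: str = "localhost:9092",
                    size: int = 1000) -> Iterator[List[dict]]:
    """Yield batches from a Kafka topic (assumes kafka-python is installed)."""
    from kafka import KafkaConsumer  # imported lazily so batches() works without Kafka

    consumer = KafkaConsumer(topic, bootstrap_servers=servers)
    # Each ConsumerRecord exposes the raw message bytes as .value
    yield from batches((record.value for record in consumer), size)
```

Each yielded batch would then be handed to the bulk-insert call. A real pipeline would probably also flush a partial batch on a timeout, since iterating a consumer blocks while the topic is idle.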
Meroxa is a code-first data application platform that enables developers to build and deploy data products quickly and easily. Meroxa's platform is designed to maximize the time a developer spends building data products and minimize the time spent maintaining fragile data systems.
FeatureBase is a real-time analytical database built on bitmaps. It is open source, in-memory, and provides SQL support, real-time updates, and analytical processing for your growing data.
With a Meroxa example, we can show FeatureBase indexing a wide variety of information for analytical processing.
This example should illustrate how to sign up for both Meroxa's and FeatureBase's cloud offerings and deploy the provided code, using both systems to ingest a moderate amount of data, likely stored in Kafka and S3.
Written in Python. Shows some tables and graphs, using the Snow example's dashboard.
A Docker container is required for running the FeatureBase binary. This container should also run Kafka, Zookeeper, and the Kafka consumer, as seen here: https://github.com/FeatureBaseDB/featurebase-examples/tree/main/kafka-starter
To enable a configurable schema file, create a simple Flask endpoint that accepts a JSON file via POST, saves it to the container, and then restarts the consumer. The consumer should start with a sample schema JSON (as seen in the kafka-starter example), and the endpoint should display the current schema. Here are the endpoints and sample outputs:
/schema [GET]
[
  {
    "name": "user_id",
    "path": ["user_id"],
    "type": "string"
  },
  {
    "name": "name",
    "path": ["name"],
    "type": "string"
  },
  {
    "name": "age",
    "path": ["age"],
    "type": "id"
  }
]
/schema [POST] (validates JSON)
Valid JSON:
{"response": "OK"}
Invalid JSON:
{"response": "FAILED"}
The Flask endpoint should be exposed on port 20202.
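A minimal sketch of the schema endpoint, assuming Flask 2.x. What counts as "valid" is an assumption here: a JSON list of objects, each with a string `name`, a list-of-strings `path`, and a string `type`, as in the sample above. The file location `schema.json`, the `on_update` restart hook, and the 400 status on failure are all hypothetical choices, not part of the spec.

```python
import json
from pathlib import Path
from typing import Any, Callable, List, Optional

SCHEMA_PATH = Path("schema.json")  # hypothetical location inside the container


def validate_schema(raw: bytes) -> Optional[List[Any]]:
    """Return the parsed schema, or None if it isn't a list of name/path/type fields."""
    try:
        schema = json.loads(raw)
    except ValueError:  # covers JSONDecodeError and invalid UTF-8
        return None
    if not isinstance(schema, list):
        return None
    for field in schema:
        if not isinstance(field, dict):
            return None
        path = field.get("path")
        if not (isinstance(field.get("name"), str)
                and isinstance(field.get("type"), str)
                and isinstance(path, list)
                and all(isinstance(p, str) for p in path)):
            return None
    return schema


def handle_post(raw: bytes, path: Path = SCHEMA_PATH,
                on_update: Optional[Callable[[], None]] = None) -> dict:
    """Validate, save, and trigger the (hypothetical) consumer restart hook."""
    schema = validate_schema(raw)
    if schema is None:
        return {"response": "FAILED"}
    path.write_text(json.dumps(schema, indent=2))
    if on_update is not None:
        on_update()  # e.g. restart the Kafka consumer; left to the caller
    return {"response": "OK"}


def create_app(path: Path = SCHEMA_PATH,
               on_update: Optional[Callable[[], None]] = None):
    from flask import Flask, jsonify, request  # assumes Flask 2.x is installed

    app = Flask(__name__)

    @app.get("/schema")
    def get_schema():
        return jsonify(json.loads(path.read_text()))

    @app.post("/schema")
    def post_schema():
        result = handle_post(request.get_data(), path, on_update)
        # Status codes are a choice made here; the spec only fixes the JSON body.
        return jsonify(result), (200 if result["response"] == "OK" else 400)

    return app
```

Inside the container, something like `create_app().run(host="0.0.0.0", port=20202)` would serve it; binding to `0.0.0.0` keeps the port reachable from outside the container. Keeping the validation in plain functions means it can be tested without a running server.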
A sample Python file that sends its schema to the port and then submits data via the Kafka libs is also needed. A README.md is to be written and posted to the /docker-example directory in this repo.
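A minimal sketch of that sample client, assuming the schema endpoint runs on `localhost:20202` and that kafka-python's `KafkaProducer` is used for the data. The topic name `example-topic` and the broker address are hypothetical placeholders.

```python
import json
import urllib.request

# Flask schema endpoint on the port given above; the host is an assumption.
SCHEMA_URL = "http://localhost:20202/schema"


def build_schema_request(schema: list, url: str = SCHEMA_URL) -> urllib.request.Request:
    """Build a POST request carrying the schema as a JSON body."""
    body = json.dumps(schema).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def post_schema(schema: list, url: str = SCHEMA_URL) -> dict:
    """Send the schema; urlopen raises HTTPError on a non-2xx response."""
    with urllib.request.urlopen(build_schema_request(schema, url)) as resp:
        return json.loads(resp.read())


def send_records(records, topic: str = "example-topic",
                 servers: str = "localhost:9092") -> None:
    """Publish records as JSON via kafka-python (topic name is hypothetical)."""
    from kafka import KafkaProducer  # assumes kafka-python is installed

    producer = KafkaProducer(
        bootstrap_servers=servers,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    for record in records:
        producer.send(topic, record)
    producer.flush()  # block until buffered records are delivered
    producer.close()
```

Calling `post_schema(schema)` and then `send_records(rows)` follows the order described: schema first, then data.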
The docker examples, docker-simple and docker-cluster, both need a way to spin up a container, copy a CSV into it, and then run the consumer, per the instructions:
idk/molecula-consumer-csv \
--auto-generate \
--index=allyourbase \
--files=sample.csv
Once the container starts within that Docker Compose context, it should have access to Compose's service naming scheme and network.
Implement a way to do this for both examples and ensure each can consume the sample.csv file, which should be included in the example. All code should go under the respective example directories.
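One possible sketch of the copy-then-consume step as a Python helper, reusing the consumer invocation from the instructions above. The container name is hypothetical, the `/sample.csv` destination is an arbitrary choice, and the consumer binary is assumed to resolve relative to the container's working directory.

```python
import subprocess
from typing import List


def consumer_commands(container: str, csv_path: str = "sample.csv",
                      index: str = "allyourbase",
                      dest: str = "/sample.csv") -> List[List[str]]:
    """Commands to copy a CSV into a running container and run the CSV consumer."""
    return [
        # docker cp treats container paths as relative to the container's root
        ["docker", "cp", csv_path, f"{container}:{dest}"],
        # Consumer invocation from the instructions; the binary path is assumed
        # to resolve from the container's working directory.
        ["docker", "exec", container, "idk/molecula-consumer-csv",
         "--auto-generate", f"--index={index}", f"--files={dest}"],
    ]


def run_consumer(container: str, **kwargs) -> None:
    for cmd in consumer_commands(container, **kwargs):
        subprocess.run(cmd, check=True)  # stop at the first failing step
```

Within a Compose project, `docker compose exec <service>` could replace the plain `docker exec` call so the command runs against the service by name.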
This example would show how to integrate FeatureBase into a Weaviate project.
Implement a prototype that logs data out of a game server.
Matchmaking in games is primarily a matter of segmenting a large population down to a small, well-fitting set, sometimes as small as a single player for 1v1 competitive games.
Since potentially billions of players may need to be narrowed down to a set of fewer than 100, this should be a solid example for FeatureBase.
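The narrowing step can be pictured as bitmap intersection. This toy sketch uses Python integers as stand-ins for FeatureBase's bitmaps, one per (attribute, value) pair, with player i as bit i; the attributes `region` and `rank` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Bitmaps = Dict[Tuple[str, str], int]


def build_bitmaps(players: List[Dict[str, str]]) -> Bitmaps:
    """One bitmap per (attribute, value) pair; player i is bit i."""
    bitmaps: Bitmaps = defaultdict(int)
    for i, attrs in enumerate(players):
        for key, value in attrs.items():
            bitmaps[(key, value)] |= 1 << i
    return bitmaps


def candidates(bitmaps: Bitmaps, **criteria: str) -> List[int]:
    """AND together the bitmaps for each criterion and list the surviving players."""
    if not criteria:
        return []
    result = -1  # all bits set: the identity for AND
    for key, value in criteria.items():
        result &= bitmaps.get((key, value), 0)
    return [i for i in range(result.bit_length()) if result >> i & 1]
```

Each added criterion shrinks the candidate bitmap, which is the same access pattern FeatureBase is built to run over very large populations.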
Implement an example that shows off using a model ON the analytics data that is pulled out of FeatureBase.
FeatureBase is an analytics engine, but we have yet to store our own metrics in our software. The intent of this issue is to create a project that provides generalized tracking of metrics from a variety of sources. These include, but are not limited to:
We'll need graphs over time, so a few requirements include:
Please add comments or requirements to this ticket as needed.
Initially this work will be done in the featurebase-examples repo, but will be spun out once the bulk of the work is complete. At that time, the new repo can carry the requirements and issues for continued work on the project.
This project aims to implement basic schema synthesis for FeatureBase from sample data.
Use of the example would provide a run-once activity that:
Some notes about schema updates or changes would be needed.
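A minimal sketch of schema synthesis that emits the name/path/type format shown in the kafka-starter sample earlier. The typing rule is an assumption drawn from that sample (non-negative integers become `id`, everything else `string`); nested objects become multi-element paths, and a field seen with conflicting kinds falls back to `string`.

```python
from typing import Any, Dict, List, Tuple


def infer_schema(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Infer a kafka-starter style schema from sample JSON records."""
    types: Dict[Tuple[str, ...], str] = {}  # path -> inferred type, first-seen order

    def walk(obj: Dict[str, Any], prefix: Tuple[str, ...]) -> None:
        for key, value in obj.items():
            path = prefix + (key,)
            if isinstance(value, dict):
                walk(value, path)  # nested objects become multi-element paths
                continue
            is_id = isinstance(value, int) and not isinstance(value, bool) and value >= 0
            kind = "id" if is_id else "string"
            # A field that shows up with conflicting kinds falls back to "string"
            if types.get(path, kind) != kind:
                kind = "string"
            types[path] = kind

    for record in records:
        walk(record, ())
    return [{"name": "_".join(p), "path": list(p), "type": t} for p, t in types.items()]
```

Run once over a sample, the output could be POSTed as the consumer's starting schema; a fuller version would need more types than the two seen in the sample.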
This example would illustrate querying FeatureBase for data and then displaying that data in a wide variety of graphs, using chart.js or other JavaScript graphing libraries.
Uses GPT-3 to synthesize the SQL.
Uses a simple Python Flask file to display the dashboard and handle calls to FB and GPT-3.
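On the Flask side, the glue between a query result and a chart can be as small as reshaping rows into a Chart.js config (`type`, `data.labels`, `data.datasets`). This sketch assumes rows arrive as dicts keyed by column name; the column names in the usage are hypothetical.

```python
from typing import Dict, List, Sequence


def rows_to_chartjs(rows: Sequence[Dict], label_col: str, value_cols: List[str],
                    chart_type: str = "bar") -> Dict:
    """Turn SQL result rows (as dicts) into a Chart.js config object."""
    return {
        "type": chart_type,
        "data": {
            "labels": [row[label_col] for row in rows],
            # One dataset per numeric column requested
            "datasets": [
                {"label": col, "data": [row[col] for row in rows]}
                for col in value_cols
            ],
        },
    }
```

A Flask route could return this with `jsonify`, and the page would pass it straight to `new Chart(ctx, config)`.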
Evaluate existing technologies and choose a Python pipeline for triggering queries against FB and then running the resulting data through a model.
Additionally, take inference results from a model and feed them into a pipeline.
This prototype (and others like it that are available) would be built from existing demos authored by engineering.
Various large public data sets exist on Kaggle, etc., which may be used with ML pipelines.
This project would illustrate utilizing FeatureBase as a store of logged inferences with a given vision model, such as: https://huggingface.co/microsoft/resnet-50
Historical analysis would be done, with an intent to identify outliers in the logged data.
A visualization component would also be interesting.
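As one possible starting point for the outlier analysis, this sketch flags logged values (for example, top-1 confidence scores from the vision model) whose z-score exceeds a threshold. The z-score test is a swapped-in technique chosen for illustration, not something the issue prescribes.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore_outliers(values: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Return indices whose z-score magnitude exceeds `threshold`."""
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values, mu)  # population standard deviation around mu
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]
```

With small samples a single outlier cannot reach a z-score of 3, so the threshold would need tuning against the actual logged data.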
Add a next-step link to the Docker Compose output so users know what to do next, for example a link to view the UI locally.
Write an example that takes Prometheus metrics and feeds them into FeatureBase. Present a small graph example that can serve as a starting point for more examples.
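A simplified sketch of the first step: parsing the Prometheus text exposition format into records that could later become FeatureBase rows. It skips `# HELP`/`# TYPE` comments and does not handle escaped quotes inside label values; the mapping into a FeatureBase table is not shown.

```python
import re
from typing import Dict, List

# name{labels} value [timestamp]: a simplified exposition-format sample line
_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+(-?\d+))?$')
# label="value" pairs; escaped quotes inside values are not supported
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text: str) -> List[Dict]:
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = _LINE.match(line)
        if m is None:
            continue  # ignore lines this simplified parser does not understand
        name, labels, value, ts = m.groups()
        records.append({
            "name": name,
            "labels": dict(_LABEL.findall(labels or "")),
            "value": float(value),  # float() also accepts NaN and +Inf
            "timestamp": int(ts) if ts else None,
        })
    return records
```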
Move the Kafka-based example to a new directory. Update the link in README.md.
Create a new standalone Dockerfile that uses just FeatureBase, plus a Python example that inserts data via SQL3.
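A minimal sketch of the Python side: building a multi-row SQL3 `INSERT` and a request to send it. Two assumptions to check against the SQL3 docs: that tables use an `_id` key column, and that raw SQL can be POSTed to a `/sql` endpoint on port 10101. The `users` table is hypothetical.

```python
import urllib.request
from typing import Iterable, Sequence, Union

Value = Union[int, float, str]


def sql_literal(value: Value) -> str:
    """Render a Python value as a SQL literal (strings single-quoted, quotes doubled)."""
    if isinstance(value, bool) or not isinstance(value, (int, float, str)):
        raise TypeError(f"unsupported value: {value!r}")
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return repr(value)


def build_insert(table: str, columns: Sequence[str],
                 rows: Iterable[Sequence[Value]]) -> str:
    """One INSERT statement carrying every row as a VALUES tuple."""
    values = ", ".join("(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values};"


def sql_request(sql: str, url: str = "http://localhost:10101/sql") -> urllib.request.Request:
    """POST the raw SQL text; the /sql endpoint on port 10101 is an assumption."""
    return urllib.request.Request(url, data=sql.encode("utf-8"), method="POST")
```

Sending it would be `urllib.request.urlopen(sql_request(sql))` once FeatureBase is up inside the container.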
Build a framework that takes a conversational query, sends it to GPT-3's API, and runs the resulting query against an existing FB data set.
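A sketch of the plumbing around the model call, with the GPT-3 request itself omitted: building a text-to-SQL prompt from a table description, pulling the SQL out of the completion, and only letting `SELECT` statements through to FB. The prompt wording and all function names are hypothetical.

```python
import re
from typing import Sequence


def build_prompt(table: str, columns: Sequence[str], question: str) -> str:
    """Assemble a text-to-SQL prompt describing one FeatureBase table."""
    return (
        f"Table {table} has columns: {', '.join(columns)}.\n"
        f"Write a single SQL SELECT statement that answers: {question}\n"
        "SQL:"
    )


def extract_sql(completion: str) -> str:
    """Strip markdown fences and keep the first statement (naive: splits on ';')."""
    text = re.sub(r"`{3}(?:sql)?", "", completion).strip()
    return text.split(";")[0].strip() + ";"


def is_read_only(sql: str) -> bool:
    """Only let SELECT statements through to FeatureBase."""
    return sql.lstrip().upper().startswith("SELECT")
```

The read-only check matters because the generated SQL is untrusted text; anything that fails it should be rejected rather than executed.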
This issue is a prerequisite for #2.
Using Snow and query synthesis from GPT-3, assemble a basic framework for implementing query->graph functions.
The guide should contain:
We need more comprehensive smoke tests for the releases. Internal tests are continuing to improve, but with the complications brought by Docker deployments, we don't want to miss anything for the users.
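A common first check in a Docker smoke test is waiting for the service port to accept connections before running anything else. This is a minimal stdlib sketch; the FeatureBase port 10101 in the usage line is an assumed default.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:  # refused, unreachable, or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.2)
```

For example, after `docker compose up -d`: `assert wait_for_port("localhost", 10101), "FeatureBase did not come up"`. Later checks (a query round-trip, an ingest) would build on this.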
This issue replaces #14, #10, #4 and #8.
A large amount of data is desired for ingestion. Evaluate datasets which may lead to:
Query exploration is desired so users can build reports in a simple UI. This feature requires the graphing capability from #2.
This example inserts 5 billion initial draws from games of Set, the card game.
A post has been started discussing how to model the data for insertion and querying.
FeatureBase is a database built on Roaring Bitmaps, which makes it suitable for running analytics on massive data sets in real time. If you've never used FeatureBase before, you can get it running locally in about 5 minutes.
Today, we're going to look at using FeatureBase to store and analyze a very large number of simulated Set games in real time.
Set is a card game designed by Marsha Falco in 1974 and published by Set Enterprises in 1991. The deck consists of 81 unique cards that vary in four features across three possibilities for each kind of feature: number of shapes (one, two, or three), shape (diamond, squiggle, oval), shading (solid, striped, or open), and color (red, green, or purple).
In a game of Set, the cards are shuffled, and then 12 cards are drawn from the top of the deck and placed on the table. Play then begins with players identifying sets in the initial deal.
In this example, we are going to focus on the initial draw only. In other words, we won't be dealing new cards from the remainder of the deck. We'll simulate one billion draws of twelve cards (from a full deck), then one billion draws of fifteen cards, adding three cards each time, and so on until we have a total of five billion game draws.
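A minimal sketch of the simulation loop, numbering cards 0 through 80 as the post does further on. Five rounds of one billion draws, starting at twelve cards and adding three each time, gives draw sizes 12, 15, 18, 21, and 24; the seed and the `per_size` parameter are illustrative.

```python
import random
from typing import Iterator, List

DRAW_SIZES = (12, 15, 18, 21, 24)  # twelve cards, then three more each round


def simulate_draws(per_size: int, seed: int = 0) -> Iterator[List[int]]:
    """Yield `per_size` initial draws for each size, cards numbered 0-80."""
    rng = random.Random(seed)
    for size in DRAW_SIZES:
        for _ in range(per_size):
            # sample() draws without replacement from a full 81-card deck
            yield sorted(rng.sample(range(81), size))
```

With `per_size=1_000_000_000` this yields the five billion draws described; at that scale the work would likely be split across processes and streamed to FeatureBase in batches.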
There are 1,080 unique sets possible in the game. Let's think about this for a minute by creating a couple of large binary numbers to visualize it. The first number will be 81 digits long and represent a single set out of the 1,080 possible sets. We'll also put the attribute headers at the top of this number to help us figure out which card a given binary place represents.
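The 81-digit numbers below can be generated with a small helper: position i of the string stands for the i-th column under the attribute headers, and a `1` marks a card that is present. This is an illustrative sketch, not code from the post.

```python
from typing import Iterable


def to_bitstring(cards: Iterable[int], width: int = 81) -> str:
    """81-character string with '1' at each card's position (card 0 leftmost)."""
    bits = ["0"] * width
    for card in cards:
        bits[card] = "1"
    return "".join(bits)
```

For the first example set, `to_bitstring([0, 9, 18])` puts a `1` in the first column of each color block under "solid".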
We'll use green, purpl (purple, shortened to fit the column), and red for colors. S represents squiggles, ♦️ diamonds, and O ovals.
set_0
<          solid          ><          open           ><         shaded          >
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
< green >< purpl ><  red  >< green >< purpl ><  red  >< green >< purpl ><  red  >
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<S><♦️><O><S><♦️><O><S><♦️><O><S><♦️><O><S><♦️><O><S><♦️><O><S><♦️><O><S><♦️><O><S><♦️><O>
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
123123123123123123123123123123123123123123123123123123123123123123123123123123123
---------------------------------------------------------------------------------
100000000100000000100000000000000000000000000000000000000000000000000000000000000
In this representation, we are saying we have a solid green squiggle of count one, a solid purple squiggle of count one, and a solid red squiggle of count one. This is a set because each attribute is either all different (color, in this example) or all the same (shading, count, and shape).
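The rule can be checked in code by decoding a column position back into its four attributes, following the header layout above (shading blocks of 27, then color blocks of 9, then shape blocks of 3, then count). The helper names are illustrative.

```python
from typing import Tuple

SHADINGS = ("solid", "open", "shaded")
COLORS = ("green", "purple", "red")
SHAPES = ("squiggle", "diamond", "oval")


def features(card: int) -> Tuple[int, int, int, int]:
    """Decode a column position into (shading, color, shape, count) indices."""
    shading, rest = divmod(card, 27)
    color, rest = divmod(rest, 9)
    shape, count = divmod(rest, 3)
    return shading, color, shape, count


def is_set(a: int, b: int, c: int) -> bool:
    """Each feature must be all the same (1 distinct value) or all different (3)."""
    return all(len({x, y, z}) in (1, 3)
               for x, y, z in zip(features(a), features(b), features(c)))


def describe(card: int) -> str:
    shading, color, shape, count = features(card)
    return f"{count + 1} {SHADINGS[shading]} {COLORS[color]} {SHAPES[shape]}"
```

`is_set(0, 9, 18)` is true for the example above, while swapping in card 19 (a count-two red squiggle) breaks the count rule.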
Now let's do one where the three cards have different shading, color, count and shape:
set_1
<          solid          ><          open           ><         shaded          >
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
< green >< purpl ><  red  >< green >< purpl ><  red  >< green >< purpl ><  red  >
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<S><♦️><O><S><♦️><O><S><♦️><O><S><♦️><O><S><♦️><O><S><♦️><O><S><♦️><O><S><♦️><O><S><♦️><O>
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
123123123123123123123123123123123123123123123123123123123123123123123123123123123
---------------------------------------------------------------------------------
001000000000000000000000000000000000000000000000000010000000000000100000000000000
Now we'll do a sample draw of 15 cards.
draw_0
<          solid          ><          open           ><         shaded          >
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
< green >< purpl ><  red  >< green >< purpl ><  red  >< green >< purpl ><  red  >
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
<S><♦️><O><S><♦️><O><S><♦️><O><S><♦️><O><S><♦️><O><S><♦️><O><S><♦️><O><S><♦️><O><S><♦️><O>
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
123123123123123123123123123123123123123123123123123123123123123123123123123123123
---------------------------------------------------------------------------------
100011000100000000100000010100000000000010000010000000101100000100000010010000000
---------------------------------------------------------------------------------
Here's what that draw looks like in a real game:
Now we AND the two numbers:
---------------------------------------------------------------------------------
100000000100000000100000000000000000000000000000000000000000000000000000000000000
100011000100000000100000010100000000000010000010000000101100000100000010010000000
---------------------------------------------------------------------------------
100000000100000000100000000000000000000000000000000000000000000000000000000000000
Since the result is equal to the set mentioned above, we have a match. There may be other sets present on the board, but from here on we're going to switch to using decimal numbers to represent the different cards.
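The AND check above can be reproduced with Python integers as bitmaps. Here bit i stands for card i, so card 0 is the least significant bit rather than the leftmost digit, which doesn't change the result of the AND. The helper names are illustrative.

```python
from typing import Iterable


def mask(cards: Iterable[int]) -> int:
    """Pack card positions into an integer bitmap (bit i = card i)."""
    m = 0
    for card in cards:
        m |= 1 << card
    return m


def draw_contains(draw: Iterable[int], candidate: Iterable[int]) -> bool:
    """A draw contains a set when ANDing the two bitmaps gives back the set."""
    s = mask(candidate)
    return (mask(draw) & s) == s
```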
As there are 81 total cards, we're going to use 0 through 80 to represent those cards. So, a sample draw of those same 15 cards above now becomes:
[0,4,5,9,18,25,27,40,46,54,56,57,63,70,73]
As for our sample set we chose, that becomes:
[0,9,18]
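Converting from the 81-digit form to the decimal list is just collecting the positions of the `1` digits; this small helper is illustrative.

```python
from typing import List


def from_bitstring(bits: str) -> List[int]:
    """Convert an 81-digit draw string into the list of card numbers 0-80."""
    return [i for i, bit in enumerate(bits) if bit == "1"]
```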
We need a list of valid sets.
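One way to build that list, sketched below: for any two distinct cards, the third card of their set is forced, because for each attribute (values 0, 1, 2) the three values form a valid triple exactly when they sum to 0 mod 3, which covers both "all same" and "all different". Keeping only triples whose forced third card is the largest counts each set once, giving 3,240 pairs / 3 = 1,080 sets. The helper names are illustrative.

```python
from itertools import combinations
from typing import Iterable, List, Tuple


def _features(card: int) -> Tuple[int, int, int, int]:
    # (shading, color, shape, count) from a card number, same layout as the diagrams
    shading, rest = divmod(card, 27)
    color, rest = divmod(rest, 9)
    shape, count = divmod(rest, 3)
    return shading, color, shape, count


def _card(feats: Tuple[int, ...]) -> int:
    shading, color, shape, count = feats
    return shading * 27 + color * 9 + shape * 3 + count


def all_sets() -> List[Tuple[int, int, int]]:
    """Every valid set as a sorted triple of card numbers."""
    sets = []
    for a, b in combinations(range(81), 2):
        # Per attribute, the third value z satisfies x + y + z == 0 (mod 3)
        c = _card(tuple((-(x + y)) % 3 for x, y in zip(_features(a), _features(b))))
        if c > b:  # count each set once, from its two smallest cards
            sets.append((a, b, c))
    return sets


def sets_in_draw(draw: Iterable[int],
                 valid_sets: List[Tuple[int, int, int]]) -> List[Tuple[int, int, int]]:
    cards = set(draw)
    return [s for s in valid_sets if cards.issuperset(s)]
```

Each draw then becomes a check against 1,080 triples; in FeatureBase terms, that corresponds to ANDing each set's bitmap with the draw, as in the example above.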