BLOCKSET

This repository contains the source code and experimental workflow accompanying our KDD 2021 manuscript (https://dl.acm.org/doi/10.1145/3447548.3467368). For the corresponding Python package, visit https://github.com/megh1241/python-blockset. Detailed instructions are given below.

Prerequisites

This source code has only been tested on Ubuntu 18.04.3 LTS, though we expect it to work on other operating systems with minor changes. In the near future, we plan to construct Python bindings and release a pip-installable Python package.

  1. You will need to install build tools and OpenMP, and make sure your compiler supports C++14.
  2. You will also need to install scikit-learn (sklearn), numpy, argparse, joblib, matplotlib and joypy (for plotting).
  3. You will need to install Redis for the BLOCKSET-as-a-service experiments.
  4. BLOCKSET uses a third-party library (rapidjson), which must be downloaded and installed.

sudo apt-get update
sudo apt-get install build-essential cmake
pip install numpy scikit-learn matplotlib redis argparse joblib
pip install joypy
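
Before moving on, it can help to confirm that the Python prerequisites are actually importable. The following is a minimal sketch, not part of the repository: the helper name missing_modules and the module list are assumptions based on the packages named above (scikit-learn is imported as sklearn).

```python
import importlib.util

# Import names for the Python packages listed above (an assumption drawn from
# the prerequisites; scikit-learn is imported under the name "sklearn").
REQUIRED_MODULES = [
    "numpy", "sklearn", "matplotlib", "redis", "argparse", "joblib", "joypy",
]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("All Python prerequisites are importable.")
```

This only checks that the modules can be located; it does not verify versions or the C++ toolchain.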

Installing

git clone https://github.com/megh1241/blockset.git
cd blockset 
git clone https://github.com/Tencent/rapidjson.git
mkdir build
cd build
cmake ..
make # This will generate the exe executable

Running

For embedded BLOCKSET and larger-than-RAM BLOCKSET, the workflow is as follows.

  1. Run train_sklearn.sh to train and save a random forest/gradient boosted tree model in scikit-learn (or xgboost). Refer to train_sklearn.sh for specific instructions on how to run it and for parameter explanations. In particular, look for lines marked #TODO, which tell you which parameters to modify.
  2. Run pack.sh to pack the model using BLOCKSET's packing layout. This script will read the sklearn trained model, pack the nodes and finally save the model in a custom binary format.
  3. Run cold_start_inference.sh to perform RF/GBT inference on the test data and run the latency benchmarks. To ensure cold-start latency in our benchmarks, we create identical copies of the model and then perform inference on a different copy of the model for each data point, cycling through the copies. That way, caching effects are eliminated.
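
The cyclical use of model copies in step 3 can be sketched as follows. This is an illustration only, not the logic of cold_start_inference.sh: the function names, the file names and the number of copies are all hypothetical, and the placeholder bytes stand in for a real packed model.

```python
import shutil
import tempfile
from pathlib import Path


def make_model_copies(model_path, num_copies, out_dir):
    """Create `num_copies` byte-identical copies of the packed model file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    copies = []
    for i in range(num_copies):
        dst = out_dir / f"model_copy_{i}.bin"  # hypothetical naming scheme
        shutil.copyfile(model_path, dst)
        copies.append(dst)
    return copies


def assign_copies(num_points, copies):
    """Map data point i to copy i mod len(copies), cycling through the copies."""
    return [copies[i % len(copies)] for i in range(num_points)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model = Path(tmp) / "packed_model.bin"  # hypothetical file name
        model.write_bytes(b"\x00" * 64)  # placeholder bytes, not a real model
        copies = make_model_copies(model, num_copies=3, out_dir=Path(tmp) / "copies")
        for i, path in enumerate(assign_copies(5, copies)):
            print(f"data point {i} -> {path.name}")
```

Each data point is paired with the next copy in turn, so consecutive inferences never read the same file.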

For BLOCKSET-as-a-service, the workflow is as follows.

  1. Run train_sklearn.sh to train and save a random forest/gradient boosted tree model in scikit-learn (or xgboost). Refer to train_sklearn.sh for specific instructions on how to run it and for parameter explanations. In particular, look for lines marked #TODO, which tell you which parameters to modify.
  2. Run pack.sh to pack the model using BLOCKSET's packing layout. This script will read the sklearn trained model, pack the nodes and save the model. Make sure to change format to "text" in the script to ensure that the model is saved in a text format.
  3. Run lambda_exp/write_redis.py. This script will write the model to a Redis KV store. Before running it, open the Python script and change the file paths and parameters as explained in the inline #TODO comments.
  4. Unzip lambda_exp/my_function. Open runscript.sh and follow the instructions marked by #TODO. Then perform inference by running the script: ./runscript.sh
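
As a rough illustration of step 3, the sketch below stores a text-format model in a key-value store in fixed-size chunks of lines. It is not the code in lambda_exp/write_redis.py: the key layout, the chunking by line count, and the names write_model and InMemoryStore are assumptions made for this example. The in-memory class stands in for a Redis client so the sketch runs without a server; it mirrors only the set and get calls.

```python
class InMemoryStore:
    """Dict-backed stand-in exposing the set/get calls used below."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)


def write_model(client, lines, lines_per_key, prefix="model"):
    """Store `lines` in chunks of `lines_per_key`; return the number of chunks.

    The key layout "<prefix>:<index>" is hypothetical, chosen for illustration.
    """
    if lines_per_key <= 0:
        raise ValueError("lines_per_key must be positive")
    num_keys = 0
    for start in range(0, len(lines), lines_per_key):
        chunk = "\n".join(lines[start:start + lines_per_key])
        client.set(f"{prefix}:{num_keys}", chunk)
        num_keys += 1
    # Record how many chunks exist so a reader knows which keys to fetch.
    client.set(f"{prefix}:num_keys", str(num_keys))
    return num_keys


if __name__ == "__main__":
    store = InMemoryStore()
    count = write_model(store, ["node 0", "node 1", "node 2"], lines_per_key=2)
    print(count, store.get("model:0"))
```

With the redis package installed and a server running, a client created with redis.Redis(host="localhost", port=6379) could be passed in place of InMemoryStore; note that its get returns bytes unless decode_responses=True is set.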

For the baseline experiments (comparison with xgboost and sklearn), the workflow is as follows.

  1. XGBoost baseline: Check the #TODOs in scripts/xgb_train_baseline.py and run it to train an xgboost model. Check the #TODOs in scripts/xgb_predict_baseline.py and run it to perform inference on the xgboost model.
  2. SKlearn baseline: Run ./train_sklearn.sh with --saveformat joblib to train and save an sklearn model. Run scripts/skl_predict_baseline.py to perform inference on the sklearn model. Note: Run python3 scripts/skl_predict_baseline.py --help for a list and description of the required command-line arguments.

Generating the Paper Plots

Figure 6

python aggregate_bargraph.py

Figure 7

cd scripts/kdd_plot_scripts
python boxplot_layout_rf_classification.py latency joy bin
python boxplot_layout_rf_regression.py latency joy bin
python boxplot_layout_gbt_classification.py latency joy bin
python boxplot_layout_gbt_regression.py latency joy bin

Figure 8

cd scripts/kdd_plot_scripts
python boxplot_layout_rf_classification.py Blocks joy bin
python boxplot_layout_rf_regression.py Blocks joy bin
python boxplot_layout_gbt_classification.py Blocks joy bin
python boxplot_layout_gbt_regression.py Blocks joy bin

Figure 9

cd scripts/kdd_plot_scripts
python boxplot_depth_cifar.py latency violin

Figure 10

cd scripts/kdd_plot_scripts/lambda_plots_paper
python plot_layout.py

Figure 11

cd scripts/kdd_plot_scripts
python boxplot_layout_embedded_rf.py latency joy bin

Figure 12

cd scripts/kdd_plot_scripts
python boxplot_layout_embedded_gbt.py latency joy bin

Python Bindings

The scripts can also be run from Python via the Python bindings.

Installation of Python Bindings

cd blockset/Python
pip3 install -e .

Running the scripts in Python

cd blockset/Python
python3 example.py <cmdlineargs>

The command-line arguments for the Python version are identical to those for the C++ version. See example.py for a detailed description of the command-line arguments, and pack.sh and inference.sh for example scripts that run example.py.
