Giter Club home page Giter Club logo

yizenov / compass_query_optimizer Goto Github PK

View Code? Open in Web Editor NEW
7.0 0.0 2.0 7.19 MB

The COMPASS Query Optimizer

Home Page: https://dl.acm.org/doi/10.1145/3448016.3452840

License: BSD 3-Clause "New" or "Revised" License

Python 0.28% C++ 81.26% CMake 3.03% Cuda 3.37% Go 0.30% Yacc 0.83% Lex 0.11% C 0.85% LLVM 0.05% Java 9.02% C# 0.07% Shell 0.50% Thrift 0.28% Dockerfile 0.04%
database database-design databases query-engine query-optimization sql

compass_query_optimizer's Introduction

COMPASS: Online Sketch-based Query Optimization for In-Memory Databases

UPDATE: Reproducing results and figures

Follow the detailed instructions from instructions.pdf.

Table of Contents

  1. Experimental Setup
  2. Install MapD Database System
  3. Create Database and Load Data
  4. Download IMDB Dataset
  5. Download Join Order Benchmark
  6. Pre-build Sketch Templates
  7. Run a Query
  8. Questions
  9. Acknowledgments
  10. References
  11. Citation

1. Experimental Setup

COMPASS was developed in the following environment:

  • Machine: 2x Intel(R) Xeon(R) CPU E5-2660 v4 and NVIDIA Tesla K80
  • OS: Ubuntu 16.04 LTS
  • CUDA v10, Clang/LLVM

2. Install MapD Database System

COMPASS is built as an extension on top of a clone of MapD System, version 3.6.1 and later it was rebranded to OmniSciDB. Find location where the project is cloned and follow the commands below:

unzip compass_query_optimizer-master.zip
cd compass_query_optimizer-master/mapd-core/
source scripts/mapd-deps-ubuntu.sh
cd ThirdParty/librdkafka/
source /usr/local/mapd-deps/mapd-deps.sh && cd ../../
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=debug ..
make clean & make -j 56

The number of threads is 56. COMPASS has hyper parameters such as sketch size in mapd-core/Catalog/COMPASS_init_variables.txt. See variables_details.txt for more details about COMPASS hyper parameters.

3. Create Database and Load Data

All databases created by the user are at .mapd/data/. The default user is mapd.

  • Run server: ./bin/mapd_server --data ~/.mapd/data/

  • Run client and create database called imdb:

    ./bin/mapdql mapd -u mapd -p HyperInteractive
    CREATE DATABASE imdb (owner = 'mapd');
    \q

  • Run client and create schemas: cat schematext.sql | ./bin/mapdql imdb -u mapd -p HyperInteractive

  • Stop server ctrl + c on keyboard

See dataset/README.md for more details about IMDB schemas.

4. Download IMDB Dataset

The dataset that was used is Internet Movie Data Base (IMDB). The data is publicly availabe in text files and open-source imdbpy package was used to transform txt files to CSV files in [1]. See for more details in here. The 3.6 DB snapshot is from May 2013 and it can be downloaded here. The dataset includes 21 CSV files i.e. 21 relations in total. It also includes queries to create the necessary relations written in schema.sql or schematext.sql files.

5. Download Join Order Benchmark

The workload that COMPASS's performance was evaluated based on Join Order Benchmark(JOB). JOB consists of 113 queries in total including 33 query families that each family queries differ only in selection predicates. See queries/README.md for more details about workload.

6. Pre-build Sketch Templates

In the case of relations with no selection predicates, building the sketches can be avoided by using pre-built sketch templates. This is an additional optimization in order to keep the overhead low. See sketch-templates/README.md for more details about building templates. We already uploaded the prebuilt sketches for sketch sizes 73 x 1021 on imdb workload. However one needs to re-create the templates in case of changing sketch sizes by running the queries in sketch-templates.

7. Run Query

Current implementation is only on GPU part of MapD system. However one can implement COMPASS on CPU mode. In order to run a query follow the instructions below:

  • Run server: ./bin/mapd_server --data ~/.mapd/data/
  • Run client: ./bin/mapdql imdb -u mapd -p HyperInteractive
  • Choose GPU mode: \gpu
  • Turn on COMPASS: \fpd
  • Paste query from a query file.
  • Stop client: \q
  • Stop server: ctrl + c on keyboard

8. Questions

If you have questions, please contact:

9. Acknowledgments

This work is supported by NSF award (number 2008815) and by the U.S. Department of Energy Early Career Award (DOE Career).

10. References

[1] Query optimization through the looking glass, and what we found running the Join Order Benchmark
[2] Filter Push-Down extension for MapD

11. Citation

@misc{compass-github,
  author = {Yesdaulet Izenov},
  title = "{The COMPASS Query Optimizer}",
  howpublished = "\url{https://github.com/yizenov/compass_query_optimizer}"
}

compass_query_optimizer's People

Contributors

yizenov avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.