Docker Image for Apache Drill

  • This image creates a Docker container running Apache Drill for analyzing file-based data (e.g., Parquet files).
  • When the container is created, a /data folder is also set up for your data files.
  • The image also extracts the Apache Drill JDBC driver JAR so that SQL user interfaces (e.g., DataGrip) can connect to the Drill container.

This image allows you to generate an out-of-the-box analytics environment without cluttering your machine. If this is not your cup of tea, head over to the instructions for a standard installation of Apache Drill.

Docker Environment

Install Docker by following the official Docker installation instructions. If you do not have a Docker account, you may be asked to create one.

Verify that Docker is running correctly with the following command:

docker version

You should see a result similar to the following:

Client: Docker Engine - Community
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        6247962
 Built:             Sun Feb 10 04:12:39 2019
 OS/Arch:           darwin/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 04:13:06 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Tip: Use the Docker app to adjust the memory, swap, and CPUs allocated to Docker containers.
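
Before moving on, it can help to confirm that the command-line tools used in this guide are on your PATH. The following is a minimal sketch: the function name require_tools is made up for this example, and it only checks that each command exists, not that the Docker daemon is actually running.

```shell
# Check that every command named in the arguments is available on the PATH.
# Prints each missing command to stderr and returns 1 if any are missing.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "Missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, calling require_tools docker docker-compose git checks the three tools this guide relies on.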

The Docker image

The repository contains the following files:

  • The docker-compose.yml file describes how to build and configure the container that runs Apache Drill.
  • The .env file contains the parameter DRILL_VERSION, which determines the version of Apache Drill that is built and run. Currently, DRILL_VERSION is set to 1.16.0. To change it, edit the .env file.
  • The /build/Dockerfile contains the build instructions for the container.
  • The run_drill.sh file contains the startup script for Apache Drill.
  • The .gitignore file prevents any file in the data folder, and any Parquet or CSV file, from being added to the repository.
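
To see which Drill version a checkout is configured for without opening the file, you can read the value from .env. This sketch assumes the file holds a plain DRILL_VERSION=<version> line, which is the usual .env format; the function name drill_version is hypothetical.

```shell
# Print the DRILL_VERSION value from a .env file (default: ./.env).
# Assumes a plain KEY=value line without quotes or spaces around "=".
drill_version() {
  env_file="${1:-.env}"
  # Print the text after "DRILL_VERSION="; if the key appears twice, keep the last one.
  sed -n 's/^DRILL_VERSION=//p' "$env_file" | tail -n 1
}
```

With the default configuration described above, drill_version should print 1.16.0. Since the version is passed to the build, the image most likely has to be rebuilt after the value is changed.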

Get the Docker image

Use GitHub Desktop or run the following commands (assuming that you have git installed):

  1. Check for git:
git version

You should see a result similar to the following:

git version 2.20.1 (Apple Git-117)
  2. Clone the repo into a directory of your choice:
git clone https://github.com/mschermann/docker_apache_drill_datagrip.git

Build and start the container

  1. Check the configuration from inside the cloned repository directory:
docker-compose config

You should see a result similar to the following:

services:
  drill:
    build:
      args:
        DRILL_VERSION: 1.16.0
      context: /<YOUR PATH>/build
      dockerfile: Dockerfile
    command: ./run_drill.sh
    container_name: drill
    environment:
      DRILL_VERSION: 1.16.0
    hostname: drill
    ports:
    - 8047:8047/tcp
    - 31010:31010/tcp
    restart: on-failure
    tty: true
    volumes:
    - /<YOUR PATH>/data:/data:rw
  2. Build the container:
docker-compose build

Docker starts building the image. This can take a while, depending on your internet speed and machine configuration.

Building drill
Step 1/13 : FROM centos:latest
...
  3. Start and stop the container:
docker-compose up

Docker starts the container. When you see the Drill message of the day, Drill is up and running:

Attaching to e382a3c16e10_drill
e382a3c16e10_drill | Apache Drill 1.16.0
e382a3c16e10_drill | "There are two types of analysts in the world: those who use Drill and those who don't."

This step also extracts the JDBC driver for Apache Drill (e.g., drill-jdbc-all-1.16.0.jar) into the /build folder.

You can stop the container with Control+C, which should result in the following output:

Gracefully stopping... (press Ctrl+C again to force)
Stopping e382a3c16e10_drill ... done
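
If you start the container in the background instead (docker-compose up -d), the message of the day is not printed to your terminal. One way to find out when Drill is ready is to poll port 8047 from the configuration above. This is a minimal sketch, assuming curl is installed and that a response on that port means Drill is up; the function name wait_for_drill is made up for this example.

```shell
# Poll a URL until it responds or the attempts run out.
# Arguments: URL (default http://localhost:8047), attempts (default 30),
# delay between attempts in seconds (default 2).
wait_for_drill() {
  url="${1:-http://localhost:8047}"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -f: treat HTTP errors as failures, -o /dev/null: discard the body
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      echo "Drill responded at $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "No response from $url after $attempts attempts" >&2
  return 1
}
```

The function returns a non-zero status on timeout, so it can be chained, for example as docker-compose up -d && wait_for_drill.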

Explore Drill

Drill serves a web UI at http://localhost:8047.

Drill Overview

If you click on Query, you can run SQL queries directly from the browser. Do not use this for heavy-load querying.

Drill Overview Query

Make sure that everything works by entering the example query SELECT * FROM cp.`employee.json` LIMIT 20.

Drill Example Query

It will show you a waiting screen and, if everything works fine, the results.

Drill Example Query Results

Now, head over to the Drill Documentation and start learning how to use Drill.
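
Drill also exposes a REST API on the same port as the web UI, so the sample query can be run from a terminal as well. The following is a minimal sketch, assuming curl is installed and Drill is reachable at http://localhost:8047. The function names and the DRILL_URL variable are made up for this example. The payload helper only escapes backslashes and double quotes, so it is meant for single-line queries.

```shell
# Build the JSON body for Drill's REST query endpoint.
drill_payload() {
  # Escape backslashes and double quotes so the query is a valid JSON string.
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"queryType":"SQL","query":"%s"}' "$escaped"
}

# Send a SQL query to Drill and print the raw JSON response.
# DRILL_URL is a hypothetical override for the base URL.
drill_query() {
  base_url="${DRILL_URL:-http://localhost:8047}"
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "$(drill_payload "$1")" "$base_url/query.json"
}
```

Pass the query in single quotes so that the shell leaves the backticks alone: drill_query 'SELECT * FROM cp.`employee.json` LIMIT 20'.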

Connect to the Drill container with a SQL tool

Let's connect DataGrip to the Drill container.

  1. Create a new driver in DataGrip by pointing to the JDBC driver for Apache Drill in the /build folder.

Data Grip Driver

  2. Create a data source using the Drill driver. Test the connection and make sure you get the green checkmark.

Data Grip Datasource

  3. Run a sample query:

Using the same query as above (SELECT * FROM cp.`employee.json` LIMIT 20), you should see the following output.

Data Grip Sample Query
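
The driver and data source dialogs ask for a driver file and a connection URL. The sketch below assumes Drill's direct Drillbit URL form (jdbc:drill:drillbit=host:port) and port 31010 as published in the docker-compose configuration. Both function names are hypothetical.

```shell
# Print the JDBC URL for a direct connection to a Drillbit.
# Arguments: host (default localhost), port (default 31010).
drill_jdbc_url() {
  host="${1:-localhost}"
  port="${2:-31010}"
  printf 'jdbc:drill:drillbit=%s:%s\n' "$host" "$port"
}

# Print the path of the first extracted driver JAR in a folder (default: build).
# Returns 1 if no matching file exists.
find_drill_driver() {
  dir="${1:-build}"
  for jar in "$dir"/drill-jdbc-all-*.jar; do
    # The pattern stays unexpanded when nothing matches, so test for a real file.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  return 1
}
```

Run from the repository root, find_drill_driver prints the JAR to select in the driver dialog, and drill_jdbc_url prints the URL for the data source. If DataGrip does not pick up the driver class on its own, org.apache.drill.jdbc.Driver is the class to select.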

At this point, you are all set. Add your data files to the /data folder, and you should be able to query them.

If you use Parquet data files, the following query returns the first five rows of the data:

SELECT * FROM dfs.`/data` LIMIT 5;

Head over to the Drill documentation for a more in-depth explanation and help.
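
The query above reads the /data folder as a whole. To look at one file at a time, you can point the query at a single file. This sketch prints one such query per Parquet file. It assumes the files sit in a local data folder that is mounted at /data in the container, as in the docker-compose configuration; the function name is made up for this example.

```shell
# Print one sample query per Parquet file found in a local folder (default: data).
print_sample_queries() {
  dir="${1:-data}"
  for f in "$dir"/*.parquet; do
    # Skip the unexpanded pattern when the folder has no Parquet files.
    [ -f "$f" ] || continue
    # Files in the local data folder appear under /data inside the container.
    printf 'SELECT * FROM dfs.`/data/%s` LIMIT 5;\n' "$(basename "$f")"
  done
}
```

The printed statements can be pasted into the web UI or DataGrip.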

Access the Drill container

You may want to access the Drill container at some point. The following steps show how to connect to the container.

  1. Find the name of your container.
docker ps

This should result in an output like this:

CONTAINER ID  IMAGE                                COMMAND           CREATED            STATUS         PORTS                                             NAMES
e382a3c16e10  docker_drill_parquet_datagrip_drill  "./run_drill.sh"  About an hour ago  Up 47 seconds  0.0.0.0:8047->8047/tcp, 0.0.0.0:31010->31010/tcp  e382a3c16e10_drill
  2. Access the container:

From the NAMES column, you can see that this container is called e382a3c16e10_drill. You can connect to this container using the following command:
docker exec -it e382a3c16e10_drill bash

This will result in a prompt inside the container:

[root@drill drill]#
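
The two steps can be combined so that the container name does not have to be copied by hand. This is a minimal sketch, assuming that exactly one running container has "drill" in its name; both function names are made up for this example.

```shell
# Read container names on stdin and print the first one that contains "drill".
pick_drill_name() {
  grep 'drill' | head -n 1
}

# Open a bash prompt in the running Drill container.
drill_shell() {
  # --format '{{.Names}}' limits the docker ps output to the NAMES column.
  name=$(docker ps --format '{{.Names}}' | pick_drill_name)
  if [ -z "$name" ]; then
    echo "No running Drill container found" >&2
    return 1
  fi
  docker exec -it "$name" bash
}
```

Calling drill_shell should then land at the same prompt as the docker exec command above.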
