
moj-analytical-services.docker-lookup-tables's Introduction

Docker image for creating lookup tables

Docker image that parses data files stored in a GitHub repo and adds them to a data store as lookup tables.

Usage

Create a repo whose name is prefixed with lookup_ (make sure the repo name doesn't contain any dashes).

Add a deploy.json file in the top level of your repo containing:

.. code:: JSON

{
  "type": "lookup"
}
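The two repo-level rules above (naming and deploy.json) are easy to check before you push. Below is a minimal sketch in Python; is_valid_lookup_repo_name and is_lookup_deploy_config are hypothetical helpers that only encode the rules stated in this README, not code taken from this image.

```python
import json


def is_valid_lookup_repo_name(name: str) -> bool:
    """Hypothetical check: name must start with 'lookup_' and contain no dashes."""
    return name.startswith("lookup_") and "-" not in name


def is_lookup_deploy_config(text: str) -> bool:
    """Hypothetical check: deploy.json content must declare {"type": "lookup"}."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(config, dict) and config.get("type") == "lookup"
```

For example, lookup_court_codes passes the name check, while lookup-court-codes fails because of the dashes.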

Create a ./data directory in the top level of the repo. This is where you will store your lookup tables in the following structure:

    └── data/
        ├── database_overwrite.json (optional, see below)
        ├── lookup_table1/
        │   ├── data.csv
        │   ├── meta.json
        │   └── README.md
        └── lookup_table2/
            ├── lookup_table2.csv
            └── lookup_table2.json

Each folder in data/ should be named after the lookup table that you want to deploy. You can put anything else you like inside a lookup table folder (e.g. a README.md), but it must contain the following:

  1. A csv file (your lookup table) named either data.csv or after the directory it is in (i.e. the name of the lookup table). This csv should have a header row.

  2. A json file (your lookup table's metadata schema) named either meta.json or after the directory it is in (i.e. the name of the lookup table). For information on the table schema file see: https://github.com/moj-analytical-services/etl_manager
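The file-naming rules above amount to a small lookup: for each folder in data/, accept either the default name (data.csv, meta.json) or a file named after the folder. The sketch below is a hypothetical helper for checking a repo locally; check_data_dir and find_table_file are illustrative names, and the image's own tests may check more, or check differently.

```python
from pathlib import Path
from typing import Dict, List, Optional


def find_table_file(table_dir: Path, default_stem: str, suffix: str) -> Optional[Path]:
    """Return <default_stem><suffix> or <folder name><suffix> if either exists."""
    for stem in (default_stem, table_dir.name):
        candidate = table_dir / f"{stem}{suffix}"
        if candidate.is_file():
            return candidate
    return None


def check_data_dir(data_dir: Path) -> Dict[str, List[str]]:
    """Map each lookup table folder in data_dir to a list of problems (empty if OK)."""
    problems: Dict[str, List[str]] = {}
    # Only subdirectories are lookup tables; loose files are skipped.
    for table_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        issues = []
        if find_table_file(table_dir, "data", ".csv") is None:
            issues.append("missing data.csv or <table name>.csv")
        if find_table_file(table_dir, "meta", ".json") is None:
            issues.append("missing meta.json or <table name>.json")
        problems[table_dir.name] = issues
    return problems
```

Loose files in data/, such as database_overwrite.json, are ignored by this sketch because only subdirectories are treated as lookup tables.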

You do not need to provide a database json; it is inferred when the lookup database is deployed and takes the following form:

{
    "description": "A lookup table deployed from {your lookup repo name}",
    "name": "{your lookup repo name}",
    "bucket": "moj-analytics-lookup-tables",
    "base_folder": "{your lookup repo name}/database"
}
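To make the placeholders concrete, the inferred values can be reproduced with the hypothetical helper below. It simply substitutes the repo name into the template above; the image's actual code may build this config differently.

```python
def default_database_config(repo_name: str) -> dict:
    """Fill the database json template above for a given lookup repo name."""
    return {
        "description": f"A lookup table deployed from {repo_name}",
        "name": repo_name,
        "bucket": "moj-analytics-lookup-tables",
        "base_folder": f"{repo_name}/database",
    }
```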

You can override some of these values by adding a database_overwrite.json to your data/ folder. The only values you can override are bucket and description. You may want to change the bucket to one that you control access to if you do not want everyone in the organisation to be able to access the lookup table (which is the default). Note that your S3 bucket must be prefixed with alpha-lookup-.
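The override rules can be sketched as a merge of database_overwrite.json over the inferred config. In this hypothetical apply_database_overwrite helper, keys other than bucket and description are rejected and a bucket without the alpha-lookup- prefix raises an error; the real deployment may handle invalid overrides differently (for example by ignoring them).

```python
ALLOWED_OVERRIDES = {"bucket", "description"}


def apply_database_overwrite(default: dict, overwrite: dict) -> dict:
    """Merge database_overwrite.json values into the inferred database config."""
    unknown = set(overwrite) - ALLOWED_OVERRIDES
    if unknown:
        raise ValueError(f"cannot override: {sorted(unknown)}")
    bucket = overwrite.get("bucket")
    if bucket is not None and not bucket.startswith("alpha-lookup-"):
        raise ValueError("override bucket must be prefixed with 'alpha-lookup-'")
    # Overridden keys replace the defaults; everything else is kept.
    return {**default, **overwrite}
```

For example, a database_overwrite.json containing {"bucket": "alpha-lookup-my-team"} (an illustrative bucket name) would keep the inferred name and base_folder and change only the bucket.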

Create a GitHub release. Concourse should then add a job that creates a new database and loads the data from each .csv file into a table.

When concourse deploys your new lookup table you should see the following outputs:

  • A new table partition in your lookup tables database, where the partition is release={github release}
  • Your data and metadata folders (i.e. in the same structure as your lookup repository) at the S3 path s3://moj-analytics-lookup-tables/{your lookup repo name}/{release}/

Note that the bucket will not be moj-analytics-lookup-tables if you specified a different bucket in your database_overwrite.json.

These raw files can be read directly from S3 if you do not want to use the database versions of your lookup tables.
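The S3 location of each release follows directly from the bucket, the repo name and the release tag. The sketch below uses hypothetical helper names; per the description above, the data and metadata folders sit under this prefix in the same layout as your repo.

```python
from typing import Tuple


def release_prefix(bucket: str, repo_name: str, release: str) -> str:
    """S3 prefix where a release's raw data and metadata folders are written."""
    return f"s3://{bucket}/{repo_name}/{release}/"


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key)."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key
```

The (bucket, key) pair can then be passed to an S3 client, for example boto3's list_objects_v2(Bucket=bucket, Prefix=key), to list the raw files for a release.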

Running Locally

To build locally:

docker build -t docker-lookup-tables:test . 

Testing

You can test the structure of your lookup repo with the following docker command:

# you can point to your local build or directly to the image on the ECR instance
docker run \
    --entrypoint "" \
    docker-lookup-tables:test \
    pytest /tests/test_data.py
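The checks in tests/test_data.py are not reproduced in this README. As a hypothetical example of one structural check you might add yourself, the sketch below compares a table's CSV header with the column names in its metadata file. It assumes the etl_manager convention of a top-level "columns" list whose entries have a "name" key, and it requires the columns to appear in the same order, which is an assumption rather than a documented rule.

```python
import csv
import json
from pathlib import Path
from typing import List


def csv_header(csv_path: Path) -> List[str]:
    """Return the first row of a CSV file (its header), or [] if the file is empty."""
    with csv_path.open(newline="") as f:
        return next(csv.reader(f), [])


def meta_column_names(meta_path: Path) -> List[str]:
    """Return column names from a metadata file with a top-level "columns" list."""
    meta = json.loads(meta_path.read_text())
    return [col["name"] for col in meta.get("columns", [])]


def header_matches_meta(csv_path: Path, meta_path: Path) -> bool:
    """True if the CSV header lists the same columns, in order, as the metadata."""
    return csv_header(csv_path) == meta_column_names(meta_path)
```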

To run the tests with your own Python installation instead:

python -m venv env
source env/bin/activate
pip install pytest  # plus any other packages the tests import
pytest tests/

Running manually

You will need AWS admin privileges in your environment and Docker installed.

To make sure you are set up correctly, please refer to this guidance on setting up Docker and ECR.

Clone the user's lookup repo and check out the release they want to deploy. From the root directory of that repo, with the correct release checked out, run:

# Replace <latest-release> with the tag of the latest docker-lookup-tables release before running
docker run \
  -e RELEASE_TAG=$(git describe --tags) \
  -e GITHUB_REPO=$(basename "$(git rev-parse --show-toplevel)") \
  -e AWS_REGION=$AWS_REGION \
  -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
  -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
  -e AWS_SESSION_TOKEN=$AWS_SESSION_TOKEN \
  -v "${PWD}/data:/etl/data" \
  593291632749.dkr.ecr.eu-west-1.amazonaws.com/docker-lookup-tables:<latest-release>
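If you script this step, the same command can be assembled in Python and passed to subprocess.run. This is a minimal sketch, assuming you supply the release tag (e.g. the output of git describe --tags) and the image reference yourself; docker_run_args is a hypothetical helper that mirrors the flags shown above.

```python
import os
from typing import List, Mapping

# AWS variables forwarded from your shell in the command above.
AWS_VARS = ("AWS_REGION", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")


def docker_run_args(release_tag: str, repo_root: str, image: str,
                    env: Mapping[str, str]) -> List[str]:
    """Build the argv for the manual docker run command above."""
    repo_root = os.path.abspath(repo_root)
    args = [
        "docker", "run",
        "-e", f"RELEASE_TAG={release_tag}",
        # Same as basename of `git rev-parse --show-toplevel`.
        "-e", f"GITHUB_REPO={os.path.basename(repo_root)}",
    ]
    for name in AWS_VARS:
        # An unset variable becomes an empty value, as with $VAR in the shell.
        args += ["-e", f"{name}={env.get(name, '')}"]
    args += ["-v", f"{os.path.join(repo_root, 'data')}:/etl/data", image]
    return args
```

Usage would look like subprocess.run(docker_run_args(tag, ".", image, os.environ), check=True). Docker also accepts -e NAME without a value, which forwards the variable from the calling environment and keeps secret values off the command line.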

moj-analytical-services.docker-lookup-tables's People

Contributors

isichei, s-block
