

Census sample loader

This project contains a simplified sample loading script for Response Management case setup (currently in use for performance test setup on Kubernetes). It takes as arguments a sample CSV file, a Collection Exercise UUID and an Action Plan UUID.

The Sample Loader will generate UUIDs for Sample Units and place messages (see example.xml for the format) directly onto the Case.CaseDelivery queue. The Case service will then create the corresponding Cases. All of the attributes are added to the queue as part of the message.
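The flow described above can be sketched roughly as follows. This is only an illustration, not the loader's actual code: the dictionary stands in for the real message format (see example.xml), the key names and the two-column sample are hypothetical, and publishing to RabbitMQ is left out entirely.

```python
import csv
import io
import uuid


def build_sample_unit_messages(sample_file, collection_exercise_id, action_plan_id):
    """Yield one message per sample row, each with a freshly generated UUID."""
    for row in csv.DictReader(sample_file):
        yield {
            "id": str(uuid.uuid4()),  # generated Sample Unit UUID (key name is hypothetical)
            "collectionExerciseId": collection_exercise_id,
            "actionPlanId": action_plan_id,
            "attributes": dict(row),  # every CSV column travels with the message
        }


# Hypothetical two-row sample; column names are assumptions for illustration.
sample = io.StringIO("UPRN,POSTCODE\n10000000001,AB1 2CD\n10000000002,EF3 4GH\n")
messages = list(build_sample_unit_messages(
    sample,
    "2fc107ee-96f5-465b-923e-38914ce63e3e",
    "2c64c460-2543-4abe-8728-01bbb0449807",
))
for message in messages:
    print(message)
```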

Setting up the Python environment

This project uses pyenv and pipenv for Python version and dependency management. Install them with

brew install pyenv pipenv

Install dependencies with

make build

Enter the environment shell with

pipenv shell

Building and pushing the docker container

docker build -t eu.gcr.io/census-rm-ci/census-rm-sample-loader:<TAG> .
docker push eu.gcr.io/census-rm-ci/census-rm-sample-loader:<TAG>

Testing Locally with Docker

To test the script locally you must run a RabbitMQ container. A docker-compose.yml file exists for this purpose.

docker-compose up -d

Once RabbitMQ is running you can run the sample loader.

Sample Loader

Usage

pipenv run python load_sample.py sample.csv <COLLECTION_EXERCISE_UUID> <ACTIONPLAN_UUID>

e.g.

pipenv run python load_sample.py sample_100000.csv 2fc107ee-96f5-465b-923e-38914ce63e3e 2c64c460-2543-4abe-8728-01bbb0449807
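As a rough illustration of how the three positional arguments could be parsed and checked, here is a minimal sketch using argparse. The argument names are assumptions and load_sample.py may handle its arguments differently; the point is only that passing `uuid.UUID` as the type rejects malformed UUIDs before any loading starts.

```python
import argparse
import uuid


def parse_args(argv):
    """Parse a sample file path and two UUIDs (argument names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Illustrative sample loader arguments")
    parser.add_argument("sample_file_path", help="path to the sample CSV file")
    parser.add_argument("collection_exercise_id", type=uuid.UUID, help="Collection Exercise UUID")
    parser.add_argument("action_plan_id", type=uuid.UUID, help="Action Plan UUID")
    return parser.parse_args(argv)


args = parse_args([
    "sample_100000.csv",
    "2fc107ee-96f5-465b-923e-38914ce63e3e",
    "2c64c460-2543-4abe-8728-01bbb0449807",
])
print(args.sample_file_path, args.collection_exercise_id, args.action_plan_id)
```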

Logging

You can set the global log level with the LOG_LEVEL environment variable. When the sample loader runs as a script, it defaults to INFO logging from the script itself and ERROR for other log sources (e.g. pika).
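The behaviour described above could be achieved with the standard logging module along these lines. This is a sketch under assumptions, not the project's actual configuration: the logger name `load_sample` and the function name are hypothetical.

```python
import logging
import os


def configure_logging(script_logger_name="load_sample"):
    """Apply LOG_LEVEL everywhere if set; otherwise INFO for the script, ERROR elsewhere."""
    global_level = os.environ.get("LOG_LEVEL")
    root_level = global_level or "ERROR"    # default for other sources such as pika
    script_level = global_level or "INFO"   # default for the script's own logger

    logging.basicConfig()  # attach a default handler (no-op if one already exists)
    logging.getLogger().setLevel(root_level)

    script_logger = logging.getLogger(script_logger_name)
    script_logger.setLevel(script_level)
    return script_logger


logger = configure_logging()
logger.info("Shown at the default level")
logging.getLogger("pika").info("Hidden unless LOG_LEVEL is INFO or lower")
```

Running with, for example, `LOG_LEVEL=DEBUG` in the environment would then make both loggers emit DEBUG output.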

Viewing messages in the Rabbit queue

The Rabbit docker image included in docker-compose.yml has the management plugin enabled. While the container is running, this can be accessed at http://localhost:15672 using guest:guest as the credentials.

Running in Kubernetes

To run the load_sample app in Kubernetes

./run_in_kubernetes.sh

You can also run it with a specific image rather than the default with

IMAGE=fullimagelocation ./run_in_kubernetes.sh

This will deploy a sample loader pod in the context your kubectl is currently set to and attach to the shell, allowing you to run the sample loader within the cluster. The pod is deleted when the shell is exited.

Copying across a sample file

To get a sample file into a pod in Kubernetes you can use the kubectl cp command.

While the sample loader pod is running, from another shell run

kubectl cp <path_to_sample_file> <namespace>/<sample_load_pod_name>:<destination_path_on_pod>

Downloading a sample file from bucket

Given a file is in the sample bucket, shell into the sample pod, then run the command

SAMPLE_BUCKET=<env_name>-sample python download_file_from_bucket.py --sample_file <name_of_file_in_bucket.csv>

This will download the file from the bucket onto the persistent volume which is mounted to the directory: /home/sampleloader/sample_files

Sample File Validator

There is a validation script provided which performs a basic sanity check of the sample file.

To run the sample validation locally run

pipenv run python validate_sample.py <sample_file>

See the SAMPLE_ROW_SCHEMA in validate_sample.py for the schema spec.
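To give a feel for what a schema-driven row check can look like, here is a minimal sketch. It does not reproduce SAMPLE_ROW_SCHEMA: the column names, the rules, and all function names below are hypothetical and chosen only for illustration.

```python
import csv
import io


def mandatory(value):
    if not value.strip():
        raise ValueError("value is empty")


def numeric(value):
    if not value.isdigit():
        raise ValueError(f"{value!r} is not numeric")


def max_length(limit):
    def check(value):
        if len(value) > limit:
            raise ValueError(f"{value!r} is longer than {limit} characters")
    return check


# Hypothetical schema: each column maps to the checks its values must pass.
EXAMPLE_ROW_SCHEMA = {
    "UPRN": [mandatory, numeric, max_length(12)],
    "POSTCODE": [mandatory, max_length(8)],
}


def validate_rows(sample_file, schema):
    """Return (line_number, column, message) for every failed check."""
    failures = []
    # Data rows start on line 2 because line 1 holds the CSV header.
    for line_number, row in enumerate(csv.DictReader(sample_file), start=2):
        for column, checks in schema.items():
            for check in checks:
                try:
                    check(row.get(column) or "")
                except ValueError as err:
                    failures.append((line_number, column, str(err)))
    return failures


sample = io.StringIO("UPRN,POSTCODE\n10000000001,AB1 2CD\nnot-a-number,\n")
for failure in validate_rows(sample, EXAMPLE_ROW_SCHEMA):
    print(failure)
```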

Dummy Sample File Generator

The generate_sample_file.py script generates a dummy sample file of random data designed to have a realistic shape.

Run it with the provided treatment code quantities config:

pipenv run python generate_sample_file.py

And it will write out the generated file to sample_file.csv

To run it with customised config and output path:

pipenv run python generate_sample_file.py -o /custom/path.csv -t /path/to/treatment_code_quantities.csv

The treatment code quantities file is a CSV with the headers "Treatment Code" and "Quantity", specifying the quantity of each treatment code to include in the generated sample.
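A file in that shape can be read with the standard csv module, as in the sketch below. The two headers come from the description above; the treatment codes and quantities are made-up placeholders, and the function name is hypothetical rather than taken from generate_sample_file.py.

```python
import csv
import io


def read_treatment_code_quantities(config_file):
    """Map each treatment code to the number of sample rows to generate for it."""
    reader = csv.DictReader(config_file)
    return {row["Treatment Code"]: int(row["Quantity"]) for row in reader}


# Placeholder treatment codes and quantities, for illustration only.
config = io.StringIO("Treatment Code,Quantity\nEXAMPLE_CODE_A,3\nEXAMPLE_CODE_B,2\n")
quantities = read_treatment_code_quantities(config)
print(quantities)
print(sum(quantities.values()), "rows would be generated in total")
```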

An optional flag, -s or --sequential_uprn, can be used to generate unique UPRNs sequentially instead of randomly, which makes generating a very large file faster.
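One plausible reason sequential generation is faster is that random values must be checked against everything already issued, while a counter is unique by construction. The sketch below contrasts the two approaches; the 11-digit range and both function names are assumptions for illustration, and the script itself may work differently.

```python
import itertools
import random


def random_unique_uprns(count, rng):
    """Draw random values, discarding collisions and remembering every value issued."""
    seen = set()
    while len(seen) < count:
        uprn = rng.randint(10_000_000_000, 99_999_999_999)
        if uprn not in seen:
            seen.add(uprn)
            yield uprn


def sequential_uprns(count, start=10_000_000_000):
    """Count upwards from a starting value; no uniqueness bookkeeping is needed."""
    return itertools.islice(itertools.count(start), count)


print(list(random_unique_uprns(3, random.Random())))
print(list(sequential_uprns(3)))
```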

Sample file redactor

The redact_sample.py script takes in a sample file and outputs the same sample with random data in place of the redacted fields.

To redact all sensitive fields, run the script:

pipenv run python redact_sample.py sample_files/my_sample.csv

And it will write out the redacted file to my_sample_redacted.csv

To redact just the HTC fields, run the script with the flag --redact-htc-only:

pipenv run python redact_sample.py sample_files/my_sample.csv --redact-htc-only

And it will write out the redacted file to my_sample_htc_redacted_only.csv
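The general idea of replacing selected fields with random data can be sketched as below. This is not the logic of redact_sample.py: the column names are hypothetical, and substituting random uppercase text of the same length is just one possible redaction strategy.

```python
import csv
import io
import random
import string


def random_text(length, rng):
    """Return random uppercase letters of the requested length."""
    return "".join(rng.choice(string.ascii_uppercase) for _ in range(length))


def redact_rows(sample_file, output_file, redacted_columns, rng):
    """Copy the sample, overwriting values in redacted_columns with random text."""
    reader = csv.DictReader(sample_file)
    writer = csv.DictWriter(output_file, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for column in redacted_columns:
            row[column] = random_text(len(row[column]), rng)
        writer.writerow(row)


# Hypothetical columns; only ADDRESS_LINE1 is treated as sensitive here.
source = io.StringIO("UPRN,ADDRESS_LINE1\n10000000001,1 Example Street\n")
redacted = io.StringIO()
redact_rows(source, redacted, ["ADDRESS_LINE1"], random.Random())
print(redacted.getvalue())
```

Columns that are not listed pass through unchanged, so the redacted file keeps the same shape as the original.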
