MCJ-Booking-Log

Scrape the Marin County Jail booking log for inmate data.

The scraper retrieves data from the public booking log and writes it to a Postgres (9.5+) database. It can be run as a Python script or in the provided Docker container. The following database connection parameters must be set as environment variables (when running the script) or passed to the container (when using Docker).

  • POSTGRES_HOST
  • POSTGRES_PORT
  • POSTGRES_DB
  • POSTGRES_USER
  • POSTGRES_PASSWORD
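
As an illustration of how these five variables could be consumed, here is a minimal sketch that validates them and collects them into a settings dictionary. The function name load_db_settings and the constant REQUIRED_VARS are hypothetical and not taken from the bookinglog package; the dictionary keys are the standard libpq connection keywords, which a driver such as psycopg2 accepts as keyword arguments. Whether the scraper itself uses that driver is an assumption.

```python
import os

# The five variables named in this README.
REQUIRED_VARS = (
    "POSTGRES_HOST",
    "POSTGRES_PORT",
    "POSTGRES_DB",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
)


def load_db_settings(environ=None):
    """Return connection settings, or raise if any variable is unset or empty.

    Hypothetical helper for illustration; not part of the scraper's code.
    """
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    # Keys follow the libpq keyword names (host, port, dbname, user, password).
    return {
        "host": env["POSTGRES_HOST"],
        "port": int(env["POSTGRES_PORT"]),
        "dbname": env["POSTGRES_DB"],
        "user": env["POSTGRES_USER"],
        "password": env["POSTGRES_PASSWORD"],
    }
```

Failing early with the full list of missing names tends to be easier to debug than a connection error raised later by the database driver.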

To run the scraper via Python, first install the runtime dependencies

pip install -r requirements.txt

then make sure the necessary environment variables are set and run the script

python -m bookinglog.scrape

Alternatively, build the Docker image and run the scraper in a container

docker build . --tag=mcj-scraper
docker run \
    --rm \
    -e POSTGRES_DB=${POSTGRES_DB} \
    -e POSTGRES_HOST=${POSTGRES_HOST} \
    -e POSTGRES_PORT=${POSTGRES_PORT} \
    -e POSTGRES_USER=${POSTGRES_USER} \
    -e POSTGRES_PASSWORD=${POSTGRES_PASSWORD} \
    --name=mcj-booking-log \
    mcj-scraper

Development

First, create a new Python 3.5 environment, activate it, and install the dependencies into it

pip install -r requirements.txt -r requirements-dev.txt

Choose values for POSTGRES_PORT, POSTGRES_DB, POSTGRES_USER and POSTGRES_PASSWORD for the development database and set them as environment variables, along with POSTGRES_HOST=localhost. Then set up the dev database

# Start database container.
docker run \
    -d \
    -p ${POSTGRES_PORT}:5432 \
    -e POSTGRES_USER=${POSTGRES_USER} \
    -e POSTGRES_DB=${POSTGRES_DB} \
    -e POSTGRES_PASSWORD=${POSTGRES_PASSWORD} \
    --name=mcj-db-dev \
    postgres:9.6-alpine

# Create tables, etc.
# psql prompts for the password unless PGPASSWORD or ~/.pgpass supplies it.
psql \
    --host=${POSTGRES_HOST} \
    --port=${POSTGRES_PORT} \
    --username=${POSTGRES_USER} \
    --dbname=${POSTGRES_DB} \
    --file=db/schema.sql

The linter and test suite can be run from within the Python environment

flake8 --config=.flake8 bookinglog/ tests/
pytest tests/

If you don't have a dev database set up, the tests that run against it can be disabled by passing -m "not integration" when running pytest. Tests that interact with the website being scraped can likewise be disabled with -m "not external".
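
For readers unfamiliar with pytest markers, the sketch below shows how tests could be tagged so that the -m filters apply. Only the marker names integration and external come from this README; the test names and bodies are hypothetical and do not reflect the contents of tests/.

```python
import pytest


@pytest.mark.integration
def test_booking_row_round_trip():
    """Hypothetical test that would need the dev database.

    Deselected by: pytest tests/ -m "not integration"
    """


@pytest.mark.external
def test_booking_log_page_is_reachable():
    """Hypothetical test that would contact the booking log website.

    Deselected by: pytest tests/ -m "not external"
    """


def test_name_is_stripped():
    """Unmarked test: runs regardless of the -m filter."""
    assert " Doe, Jane ".strip() == "Doe, Jane"
```

Marker expressions can be combined, so pytest tests/ -m "not integration and not external" would run only the tests that need neither the database nor network access.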
