Canadian Legislative Scrapers

This Django project runs the Canadian legislative scrapers, displays the status of each scraper, and returns the scraped data as JSON.

Development

Follow the instructions in the Python Quick Start Guide to install Homebrew, Git, PostgreSQL, Python and virtualenv.

mkvirtualenv scrapers_ca_app
git clone git@github.com:opennorth/scrapers_ca_app.git
cd scrapers_ca_app

Set up the submodule and switch it to master:

git submodule init
git submodule update
cd scrapers
git checkout master
cd ..

Install the requirements:

pip install -r requirements.txt

Create a database (run dropdb pupa first only if the database already exists):

dropdb pupa
createdb pupa
python manage.py migrate --noinput
pupa dbinit ca

Run all the scrapers:

python manage.py update

Or run specific scrapers:

python manage.py update ca_ab_edmonton ca_ab_grande_prairie_county_no_1
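If you want to trigger the same command from a script, the following is a minimal sketch that assumes it runs from the project root, where manage.py lives. The helper names build_update_command and run_update are hypothetical and not part of this project; they only wrap the command-line form shown above.

```python
import subprocess
import sys


def build_update_command(modules=()):
    """Return the argv list for `python manage.py update [MODULE ...]`."""
    # With no module names this is the "run all the scrapers" form.
    return [sys.executable, "manage.py", "update", *modules]


def run_update(modules=()):
    """Run the update command and return its exit code.

    Hypothetical helper: assumes the current directory is the project root.
    """
    return subprocess.run(build_update_command(modules)).returncode
```

For example, run_update(["ca_ab_edmonton"]) would run only that scraper, and run_update() would run them all.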

Install the foreman gem:

gem install foreman

Start the web app:

foreman start

Deployment

heroku apps:create

Add configuration variables (replace REPLACE):

heroku config:set PRODUCTION=1
heroku config:set SSL_VERIFY=1
heroku config:set AWS_ACCESS_KEY_ID=REPLACE
heroku config:set AWS_SECRET_ACCESS_KEY=REPLACE
heroku config:set DJANGO_SECRET_KEY=REPLACE
heroku config:set DATABASE_URL=`heroku config:get REPLACE`
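As a quick sanity check, the sketch below lists which of the variables above are unset or still hold the REPLACE placeholder. It is not part of the project: missing_config and REQUIRED_VARS are hypothetical names, and the list of variables is simply copied from the commands above.

```python
import os

# Variable names taken from the `heroku config:set` commands above.
REQUIRED_VARS = (
    "PRODUCTION",
    "SSL_VERIFY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "DJANGO_SECRET_KEY",
    "DATABASE_URL",
)


def missing_config(env=None):
    """Return the required variables that are unset, empty, or still 'REPLACE'."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if env.get(name, "") in ("", "REPLACE")]
```

Calling missing_config() with no arguments inspects the current process environment; passing a dict makes it easy to check values before setting them.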

You can generate a secret key in Python:

from django.utils.crypto import get_random_string
get_random_string(50, 'abcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*(-_=+)')
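If Django is not installed where you generate the key, a standard-library alternative is sketched below. It swaps Django's get_random_string for the secrets module, which is an assumption on our part rather than the project's documented method; generate_secret_key is a hypothetical helper name, and the character set is the one used above.

```python
import secrets

# Same character set as in the Django snippet above.
ALLOWED_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*(-_=+)"


def generate_secret_key(length=50, allowed_chars=ALLOWED_CHARS):
    """Return a random string drawn from allowed_chars using a CSPRNG."""
    return "".join(secrets.choice(allowed_chars) for _ in range(length))
```

Printing generate_secret_key() gives a 50-character value that can be used for DJANGO_SECRET_KEY.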

You'll need a production-tier PostgreSQL database to use PostGIS (replace DATABASE):

heroku addons:add heroku-postgresql:standard-0
heroku pg:wait
heroku pg:promote DATABASE
heroku addons:remove heroku-postgresql:dev
heroku pg:psql

In the PostgreSQL shell, run:

CREATE EXTENSION postgis;

You'll need the geo buildpack for GeoDjango:

heroku buildpacks:set https://github.com/cyberdelia/heroku-geo-buildpack.git
heroku buildpacks:add heroku/python

Set up the database (replace DATABASE):

heroku pg:reset DATABASE
heroku run pupa dbinit ca
heroku run python manage.py migrate --noinput

Add to the Heroku Scheduler:

python manage.py update
python manage.py upload

Checking consistency

python manage.py check

Eliminating duplicates

If a scraper creates duplicates, you may need to flush that scraper's data (replace MODULE_NAME with the scraper's module name):

python manage.py flush MODULE_NAME

Troubleshooting

Make sure PostgreSQL is running. If you use Homebrew, you can find instructions with:

brew info postgres
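To check from Python whether anything is accepting connections, the sketch below tries to open a TCP connection. It assumes PostgreSQL listens on localhost at its default port 5432, which may not match your setup, and postgres_is_listening is a hypothetical helper name. A successful connection only shows that some server is listening on that port, not that the pupa database exists.

```python
import socket


def postgres_is_listening(host="localhost", port=5432, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # 5432 is PostgreSQL's default port; adjust if yours differs.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False
```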

Bugs? Questions?

This repository is on GitHub: https://github.com/opennorth/scrapers_ca_app, where your contributions, forks, bug reports, feature requests, and feedback are welcome.

Copyright (c) 2013 Open North Inc., released under the MIT license
