water-crisis-scraper's Introduction

Water Crisis

Scrape and explore data related to Cape Town's water crisis.

📥 Data sources

💧 Dam Levels

The dam_levels module handles government data on historical and current dam levels in the Western Cape.

A CSV of dam levels from 2012 to 2018 is obtainable at no cost from the Dam levels data set page, hosted on the City of Cape Town Open Data Portal.

Disclaimer:

This site provides products or services using data that has been modified for use from its original source, www.capetown.gov.za, the official website of the City of Cape Town. The City of Cape Town makes no claims as to the content, accuracy, timeliness, or completeness of any of the data provided at this site. The data provided at this site is subject to change at any time. It is understood that the data provided at this site is being used at one's own risk.

🏙 Properties

The properties module processes average property price data for South Africa, sourced from the Property24 website.

The starting point on the website for these values is the page on property values in South Africa. From there you can drill down to province or suburb level and get the average value and count of properties for that area.

For example:

Property values in Western Cape

Currently the average price of properties in Western Cape is R 3 425 228. There are currently 65890 properties on the market in Western Cape.

The values are visible in the browser and accessible when parsing the HTML. They are assumed to be current, but there is no indication as to how frequently they are updated.
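As a rough illustration of pulling those two values out of the page text, here is a minimal sketch using regular expressions. It assumes the summary sentence keeps the wording of the example above and that the HTML tags have already been stripped. The names `parse_summary`, `PRICE_RE` and `COUNT_RE` are hypothetical and are not taken from the project.

```python
import re

# Sample text copied from the example above. A real page would first need
# its HTML tags stripped, or be parsed with an HTML library.
SAMPLE = (
    "Currently the average price of properties in Western Cape is R 3 425 228. "
    "There are currently 65890 properties on the market in Western Cape."
)

# Assumed sentence patterns, based only on the wording of the example.
PRICE_RE = re.compile(
    r"average price of properties in (?P<area>.+?) is R\s*(?P<price>\d[\d\s]*\d)"
)
COUNT_RE = re.compile(r"currently (?P<count>\d+) properties on the market")


def parse_summary(text):
    """Return (area, average_price, property_count), or None if not found."""
    price_match = PRICE_RE.search(text)
    count_match = COUNT_RE.search(text)
    if not price_match or not count_match:
        return None
    # Prices use spaces as thousands separators, e.g. "3 425 228".
    price = int(re.sub(r"\s+", "", price_match.group("price")))
    return price_match.group("area"), price, int(count_match.group("count"))
```

Calling `parse_summary(SAMPLE)` gives the area name, the price as an integer and the listing count. If the site changes its wording, the function returns `None` rather than guessing.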

This section of the project deals with regularly saving the raw HTML files to the unprocessed_html directory and then later extracting values from the local files when required. Keeping these files is expensive: about 700 files covering the whole country take up about 50MB. An alternative process could be to fetch, process and discard the HTML data, or to get suburb data for one province of interest but only top-level data for other provinces. This means appending to a CSV rather than overwriting it.

Note that the data included here only reflects listings on the Property24 website, but it should still be useful for analysis.
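The append-instead-of-overwrite idea could look something like the sketch below. The column names in `FIELDS` and the function `append_row` are assumptions made for illustration. The project's actual CSV layout may differ.

```python
import csv
import datetime
from pathlib import Path

# Hypothetical column layout, not taken from the project.
FIELDS = ["date", "area", "average_price", "property_count"]


def append_row(csv_path, area, average_price, property_count, date=None):
    """Append one observation, writing the header only for a new or empty file."""
    csv_path = Path(csv_path)
    is_new = not csv_path.exists() or csv_path.stat().st_size == 0
    date = date or datetime.date.today().isoformat()
    # Mode "a" keeps earlier rows, so each run adds to the history.
    with csv_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(
            {
                "date": date,
                "area": area,
                "average_price": average_price,
                "property_count": property_count,
            }
        )
```

Storing the date with each row is what makes the appended file useful: the source pages only show current values, so the history has to be built up locally.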

The Property24 website also offers more granular data, including visualisations and history, such as Cape Town City Centre property trends. This is not handled in this project though.

See the usage instructions for the properties module.

News

See the news module.

The script there parses the output from a curl script in tools.

water-crisis-scraper's People

Contributors

dependabot[bot], michaelcurrin, snyk-bot

water-crisis-scraper's Issues

Share data

Make processed data available online for public use.

I can scrape and parse data to get a CSV, but need to share it with others.

This could be as an API, a DB, or just a CSV on a website.

Ease running of commands

There are property scripts which can be run in sequence. These are covered in the manual.

This can be made easier using ideas like the following:

  • Makefile or python/bash script to run multiple commands.
  • Create a new entry point script to run the other scripts, with friendly help and arguments.
  • Turn scripts into commands with use of a class and command lines. Run against the directory.
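One way the entry point idea might be sketched is with `argparse` subcommands. The subcommand names (`scrape`, `extract`), the `--province` and `--html-dir` options, and the placeholder handler functions are all hypothetical. In practice the handlers would call the existing property scripts.

```python
import argparse


def scrape(args):
    # Placeholder: a real handler would call the existing scraping script.
    return f"scrape {args.province}"


def extract(args):
    # Placeholder: a real handler would call the existing extraction script.
    return f"extract {args.html_dir}"


def build_parser():
    parser = argparse.ArgumentParser(
        prog="properties",
        description="Run the property scripts from a single entry point.",
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    scrape_parser = subparsers.add_parser("scrape", help="Download HTML pages.")
    scrape_parser.add_argument("--province", default="all", help="Province to fetch.")
    scrape_parser.set_defaults(func=scrape)

    extract_parser = subparsers.add_parser("extract", help="Parse saved HTML to CSV.")
    extract_parser.add_argument(
        "--html-dir", default="unprocessed_html", help="Directory of saved HTML files."
    )
    extract_parser.set_defaults(func=extract)
    return parser


def main(argv=None):
    """Parse arguments and dispatch to the chosen subcommand handler."""
    args = build_parser().parse_args(argv)
    return args.func(args)
```

For example, `main(["scrape", "--province", "western-cape"])` dispatches to `scrape`, and `--help` on the parser or on any subcommand prints the friendly help text for free.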

Automate processing of property data

For now there is a process to scrape HTML files and then later run a script which extracts the necessary data from all HTML files and puts the data in a single CSV.

This can be streamlined to go straight to a CSV or a DB. The HTML files can still be downloaded as an intermediate step - they could be deleted after they are parsed or after 7 days. Or just archived.
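The "delete after 7 days" option could be sketched as below. This assumes the files sit directly in one directory with a `.html` extension and that file modification time is a good enough proxy for download time. The function name `prune_old_html` is hypothetical.

```python
import time
from pathlib import Path

MAX_AGE_DAYS = 7


def prune_old_html(html_dir, max_age_days=MAX_AGE_DAYS, now=None):
    """Delete .html files last modified before the cutoff; return deleted paths."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    deleted = []
    for path in sorted(Path(html_dir).glob("*.html")):
        # st_mtime is the last modification time in seconds since the epoch.
        if path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path)
    return deleted
```

Running this only after the extraction step has succeeded would avoid deleting files that have not been parsed yet.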

Visualize data

Build a website or use Datastudio to view the CSV/DB data, which must be hosted somewhere so it is available online.
