

Project Atlas - São Paulo 🌎

Project Atlas - São Paulo is a Data Science and Engineering initiative that aims to develop relevant, curated geospatial features about the city of São Paulo, Brazil. Its ultimate uses are varied, but it is mainly focused on Machine Learning tasks, such as real estate price prediction.

It aggregates attributes from many public data sources at different levels of interest. These attributes can then be matched to any geospatially referenced data (latitude/longitude pairs, for example).
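As an illustration of what matching a coordinate pair to a level of interest involves, here is a minimal, dependency-free sketch of a point-in-polygon lookup using ray casting. The region names, coordinates, and the `match_point` helper are hypothetical and are not taken from the project; in practice a spatial join in geopandas or Apache Sedona would typically do this work.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: count the polygon edges crossed by a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical regions with made-up square boundaries (not real São Paulo geometries).
REGIONS: Dict[str, Sequence[Point]] = {
    "region_a": [(-46.70, -23.60), (-46.60, -23.60), (-46.60, -23.50), (-46.70, -23.50)],
    "region_b": [(-46.60, -23.60), (-46.50, -23.60), (-46.50, -23.50), (-46.60, -23.50)],
}


def match_point(point: Point, regions: Dict[str, Sequence[Point]]) -> Optional[str]:
    """Return the name of the first region containing the point, or None if there is none."""
    for name, polygon in regions.items():
        if point_in_polygon(point, polygon):
            return name
    return None
```

Once a point is matched to a region, the attributes stored for that region can be attached to the record that carried the coordinates.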

A breakdown of the data sources currently used and their original references can be found below, but the official documentation of the project contains the full list of data sources.

Data sources

  • GeoSampa: geospatial data exploration tool provided by São Paulo's Department of Urban Development;
  • IBGE: raw datasets from the 2010 Census conducted by the Brazilian Institute of Geography and Statistics;
  • Infocidade: multiple sources from the city of São Paulo's local government entities;
  • São Paulo Open Data Portal: multiple curated data sources from the city of São Paulo.

Technologies used

The main technologies used in this project were:

  1. Data processing: Apache Spark, pyspark;
  2. Geospatial data wrangling: Apache Sedona, geopandas, fiona;
  3. Data versioning: dvc.

Project Architecture

The project is organized by levels of granularity of the geospatial references. These are:

  1. Census Sectors: the lowest level of census information, which can be roughly approximated to a street block;
  2. Zip code: unlike a census sector, a zip code can be roughly approximated to a street;
  3. Area of Ponderation: areas of ponderation are aggregations of census sectors and vary in size. This level of interest is important because some of the data sources in the project are only described at this level;
  4. Neighborhoods: neighborhoods are not formally defined and fall between districts and areas of ponderation in size;
  5. Districts: districts are administrative regions of the city of São Paulo that are formally defined by law, so their boundaries change little over time.
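Because areas of ponderation are aggregations of census sectors, a sector-level attribute can be rolled up to the coarser level by grouping on the parent area. The sketch below shows the idea with plain Python. The sector identifiers, the `area_of_ponderation` key, the population values, and the `aggregate_by_area` helper are all hypothetical; they are not the project's schema or data.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical sector records: each census sector names the area of ponderation containing it.
SECTORS: List[dict] = [
    {"sector_id": "S1", "area_of_ponderation": "AP1", "population": 120},
    {"sector_id": "S2", "area_of_ponderation": "AP1", "population": 80},
    {"sector_id": "S3", "area_of_ponderation": "AP2", "population": 200},
]


def aggregate_by_area(sectors: List[dict], value_key: str) -> Dict[str, int]:
    """Sum a sector-level attribute over every sector in each area of ponderation."""
    totals: Dict[str, int] = defaultdict(int)
    for sector in sectors:
        totals[sector["area_of_ponderation"]] += sector[value_key]
    return dict(totals)


population_by_area = aggregate_by_area(SECTORS, "population")
```

The same grouping pattern applies one level up, for example when summarizing areas of ponderation per district, provided each finer unit records which coarser unit contains it.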

The map below illustrates this relationship:

Example map for Levels of Interest

Note: The outer red line indicates the districts. The green line represents the neighborhoods. The blue lines indicate zip codes, and so on.

Contributors

  • mateuspicanco
