geobr's Introduction

geobr: Download Official Spatial Data Sets of Brazil

geobr is a computational package to download official spatial data sets of Brazil. The package includes a wide range of geospatial data in geopackage format (like shapefiles but better), available at various geographic scales and for various years with harmonized attributes, projection and topology (see detailed list of available data sets below).

The package is currently available in R and Python.

Lifecycle: stable (R), maturing (Python).

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Installation R

# From CRAN
install.packages("geobr")
library(geobr)

# or use the development version with latest features
utils::remove.packages('geobr')
devtools::install_github("ipeaGIT/geobr", subdir = "r-package")
library(geobr)

Note: on Linux, you need to install a couple of dependencies before installing the sf and geobr libraries. More info here.

Installation Python

pip install geobr

Windows users:

conda create -n geo_env
conda activate geo_env  
conda config --env --add channels conda-forge  
conda config --env --set channel_priority strict  
conda install python=3 geopandas  
pip install geobr

Basic Usage

All geobr functions follow the same syntax logic, so it becomes intuitive to download any data set using a single line of code. Like this:

R, reading the data as an sf object

library(geobr)

# Read specific municipality at a given year
mun <- read_municipality(code_muni=1200179, year=2017)

# Read all municipalities of given state at a given year
mun <- read_municipality(code_muni=33, year=2010) # or
mun <- read_municipality(code_muni="RJ", year=2010)

# Read all municipalities in the country at a given year
mun <- read_municipality(code_muni="all", year=2018)

More examples in the intro Vignette

Python, reading the data as a geopandas object

from geobr import read_municipality

# Read specific municipality at a given year
mun = read_municipality(code_muni=1200179, year=2017)

# Read all municipalities of given state at a given year
mun = read_municipality(code_muni=33, year=2010) # or
mun = read_municipality(code_muni="RJ", year=2010)

# Read all municipalities in the country at a given year
mun = read_municipality(code_muni="all", year=2018)

More examples here

Available datasets:

👉 All datasets use geodetic reference system "SIRGAS2000", CRS(4674).

| Function | Geographies available | Years available | Source |
|---|---|---|---|
| read_country | Country | 1872, 1900, 1911, 1920, 1933, 1940, 1950, 1960, 1970, 1980, 1991, 2000, 2001, 2010, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 | IBGE |
| read_region | Region | 2000, 2001, 2010, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 | IBGE |
| read_state | States | 1872, 1900, 1911, 1920, 1933, 1940, 1950, 1960, 1970, 1980, 1991, 2000, 2001, 2010, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 | IBGE |
| read_meso_region | Meso region | 2000, 2001, 2010, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 | IBGE |
| read_micro_region | Micro region | 2000, 2001, 2010, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 | IBGE |
| read_intermediate_region | Intermediate region | 2017, 2019, 2020 | IBGE |
| read_immediate_region | Immediate region | 2017, 2019, 2020 | IBGE |
| read_municipality | Municipality | 1872, 1900, 1911, 1920, 1933, 1940, 1950, 1960, 1970, 1980, 1991, 2000, 2001, 2005, 2007, 2010, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022 | IBGE |
| read_municipal_seat | Municipality seats (sedes municipais) | 1872, 1900, 1911, 1920, 1933, 1940, 1950, 1960, 1970, 1980, 1991, 2010 | IBGE |
| read_weighting_area | Census weighting area (área de ponderação) | 2010 | IBGE |
| read_census_tract | Census tract (setor censitário) | 2000, 2010, 2017, 2019, 2020 | IBGE |
| read_statistical_grid | Statistical Grid of 200 x 200 meters | 2010 | IBGE |
| read_metro_area | Metropolitan areas | 1970, 2001, 2002, 2003, 2005, 2010, 2013, 2014, 2015, 2016, 2017, 2018 | IBGE |
| read_urban_area | Urban footprints | 2005, 2015 | IBGE |
| read_amazon | Brazil's Legal Amazon | 2012 | MMA |
| read_biomes | Biomes | 2004, 2019 | IBGE |
| read_conservation_units | Environmental Conservation Units | 201909 | MMA |
| read_disaster_risk_area | Disaster risk areas | 2010 | CEMADEN and IBGE |
| read_indigenous_land | Indigenous lands | 201907, 202103 | FUNAI |
| read_semiarid | Semi Arid region | 2005, 2017 | IBGE |
| read_health_facilities | Health facilities | 201505, 202303 | CNES, DataSUS |
| read_health_region | Health regions and macro regions | 1991, 1994, 1997, 2001, 2005, 2013 | DataSUS |
| read_neighborhood | Neighborhood limits | 2010 | IBGE |
| read_schools | Schools | 2020, 2023 | INEP |
| read_comparable_areas | Historically comparable municipalities, aka Áreas mínimas comparáveis (AMCs) | 1872, 1900, 1911, 1920, 1933, 1940, 1950, 1960, 1970, 1980, 1991, 2000, 2010 | IBGE |
| read_urban_concentrations | Urban concentration areas (concentrações urbanas) | 2015 | IBGE |
| read_pop_arrangements | Population arrangements (arranjos populacionais) | 2015 | IBGE |

Other functions:

| Function | Action |
|---|---|
| list_geobr | List all datasets available in the geobr package |
| lookup_muni | Look up municipality codes by their name, or the other way around |
| grid_state_correspondence_table | Loads a correspondence table indicating what quadrants of IBGE's statistical grid intersect with each state |
| cep_to_state | Determine the state of a given CEP postal code |
| ... | ... |

Note 1. Data sets and Functions marked with "dev" are only available in the development version of geobr.

Note 2. Most data sets are available at scale 1:250,000 (see documentation for details).

Coming soon:

| Geography | Years available | Source |
|---|---|---|
| read_census_tract | 2007 | IBGE |
| Longitudinal Database* of micro regions | various years | IBGE |
| Longitudinal Database* of Census tracts | various years | IBGE |
| ... | ... | ... |

'*' Longitudinal Database refers to áreas mínimas comparáveis (AMCs)

Contributing to geobr

If you would like to contribute to geobr and add new functions or data sets, please check this guide to propose your contribution.


Related projects

As of today, there is another R package with similar functionalities: simplefeaturesbr. The geobr package has a few advantages when compared to simplefeaturesbr, including for example:

  • The same syntax structure across all functions, making the package very easy and intuitive to use
  • Access to a wider range of official spatial data sets, such as states and municipalities, but also macro-, meso- and micro-regions, weighting areas, census tracts, urbanized areas, etc
  • Access to shapefiles with updated geometries for various years
  • Harmonized attributes and geographic projections across geographies and years
  • Option to download geometries with simplified borders for fast rendering
  • Stable version published on CRAN for R users, and on PyPI for Python users

Similar packages for other countries/continents


Credits

Original shapefiles are created by official government institutions. The geobr package is developed by a team at the Institute for Applied Economic Research (Ipea), Brazil. If you want to cite this package, you can cite it as:

  • Pereira, R.H.M.; Gonçalves, C.N.; et al. (2019) geobr: Loads Shapefiles of Official Spatial Data Sets of Brazil. GitHub repository - https://github.com/ipeaGIT/geobr.

geobr's People

Contributors

alandasilva, aspeddro, augusto-herrmann, babisan08, bafurtado, caiong, cavedo95, dependabot[bot], felipegermanos, guiducar, igorfnasc, jimhester, joaocarabetta, jtrecenti, jvfe, kauebraga, mralbu, olivroy, pauloaraujoo, pedro-andrade-inpe, pedrojorge7, rafapereirabr, rodrigoarruda14, samuel-rosa, vss-2

geobr's Issues

Include IBGE's `faces de quadra` data

data: ftp://geoftp.ibge.gov.br/recortes_para_fins_estatisticos/malha_de_setores_censitarios/censo_2010/base_de_faces_de_logradouros/

info:
ftp://geoftp.ibge.gov.br/recortes_para_fins_estatisticos/malha_de_setores_censitarios/censo_2010/base_de_faces_de_logradouros/1_Leia_me/Base%20de%20Faces%20de%20Logradouros%20do%20CD%202010.pdf

add Terras Indígenas

for future consideration: add a read_terras_indigenas function that grabs polygons from FUNAI

Create vignette 2

Igor, I suggest moving vignette 2 (Georeferencing-gain) to another branch of the repository and bringing it into the main branch once the article is at a more advanced stage. That way it is not included in the version 1.0 submission of the package to CRAN. What do you think?

Suggestion: make code_muni="all" the default (and also for code_uf)

personal opinion here: I don't like to have to tell the function which subset of the data I want. The natural expectation is that you will get all the data. Making code_muni="all" the default would avoid this.

I think having this default at least for the small datasets (mun, uf, micro, macro) makes sense.

For large datasets such as setor censitario, faces, etc., you could then force the user to set *_muni="all".
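A minimal sketch of how these defaults could look, written in Python to match the package's Python examples. The function names `read_small_dataset` and `read_large_dataset` and the returned dictionaries are hypothetical stand-ins, not geobr functions.

```python
# Hypothetical sketch of the proposed defaults; none of these names exist in geobr.
_REQUIRED = object()  # sentinel meaning "the caller must pass a value"


def read_small_dataset(code_muni="all", year=2010):
    """Small datasets: return everything unless a subset is requested."""
    return {"code_muni": code_muni, "year": year}


def read_large_dataset(code_muni=_REQUIRED, year=2010):
    """Large datasets: the caller has to ask for "all" explicitly."""
    if code_muni is _REQUIRED:
        raise ValueError('code_muni is required; pass code_muni="all" for the full dataset')
    return {"code_muni": code_muni, "year": year}
```

With this split, a bare call returns the whole small dataset, while a bare call on a large dataset fails early instead of starting a very large download.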

Urbanized areas

  1. Create script prep_urban_areas to download and clean IBGE data on urbanized areas (years 2005 and 2015)

  2. Create geobr function read_urban_area() to download the data

obs. data available at ftp://geoftp.ibge.gov.br/organizacao_do_territorio/tipologias_do_territorio/areas_urbanizadas_do_brasil

Add function to read grade_estatistica 2010

read_grade <- function(cod_uf, cod_muni, year = 2010) {

  # read the municipality as an sf object
  temp_muni <- read_muni(cod_muni = cod_muni, year = year)

  # read the bounding boxes of Brazil's grid quadrants (STILL NEEDS TO BE CREATED)

  # overlay the municipality and grid bounding boxes

  # identify the grid ID

  # download the .zip file of the municipality's grid ID

  # unzip the municipality's grid ID file

  # read the municipality's grid ID file

  # crop the grid to the municipality sf object

}
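The overlay step in the outline above could be sketched as follows, in Python and with plain bounding boxes. The quadrant IDs, coordinates, and function names are made up for illustration; the real index of grid bounding boxes would have to be built first, as the outline notes.

```python
def boxes_overlap(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def quadrants_for(muni_bbox, quadrant_bboxes):
    """Return the IDs of the grid quadrants that overlap the municipality."""
    return sorted(
        quad_id
        for quad_id, bbox in quadrant_bboxes.items()
        if boxes_overlap(muni_bbox, bbox)
    )


# Made-up quadrant index and municipality bounding box, for illustration only.
quadrants = {"Q1": (0, 0, 10, 10), "Q2": (10, 0, 20, 10), "Q3": (0, 10, 10, 20)}
muni = (8, 2, 12, 6)
needed = quadrants_for(muni, quadrants)
```

Only the quadrants listed in `needed` would then have to be downloaded, unzipped, and cropped.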

add UF, or name UF to municipality dataset

Hey @rafapereirabr! It's looking great!!

Minor suggestion: add UF name to municipalities dataset.

Sometimes a user will only have UF names and municipality names (as in my case now with the Brasil Mais Produtivo data), and municipality names are unique only within a state, not across the country.
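A small Python sketch of why the state is needed for matching. The records, column names, and code values below are placeholders chosen for illustration, not real geobr output.

```python
# Illustrative records only; the column names and code values are placeholders.
municipalities = [
    {"name_muni": "Example Town", "abbrev_state": "AA", "code_muni": 1},
    {"name_muni": "Example Town", "abbrev_state": "BB", "code_muni": 2},
]

# Keying on the name alone silently drops one of the two municipalities.
by_name = {m["name_muni"]: m["code_muni"] for m in municipalities}

# Keying on (state, name) keeps both, so external data can be matched safely.
by_state_and_name = {
    (m["abbrev_state"], m["name_muni"]): m["code_muni"] for m in municipalities
}
```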

Fix column names in the weighting area dataset

Paulo, please change the column names to follow the package standard. This needs to be changed both in the function script and in the dataset.

current column names: cod_areapond, cod_mum, cod_uf

what they should be: code_weighting_area, code_muni, code_state
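The renaming could be expressed as a simple mapping. This is only a sketch in Python over a plain dictionary row; the helper name `rename_columns` is hypothetical, and the actual fix belongs in the package's data preparation script and dataset.

```python
# Mapping taken from the issue text: current column name -> package-standard name.
RENAME = {
    "cod_areapond": "code_weighting_area",
    "cod_mum": "code_muni",
    "cod_uf": "code_state",
}


def rename_columns(row):
    """Return a copy of `row` with the old column names replaced."""
    return {RENAME.get(key, key): value for key, value in row.items()}
```

Columns that are not in the mapping, such as the geometry, pass through unchanged.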

single progress bar for read functions?

Function calls such as:

states <- read_state(year=2010, code_state = "all")

create one progress bar each time a state is downloaded, adding up to 27 progress bars in this case. A single progress bar that grows as each state is downloaded would probably be more useful for the user.
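A minimal sketch of the single-bar idea, in Python and using only the standard library. The names `download_all` and `download_one` are hypothetical; `download_one` stands in for whatever fetches one state.

```python
import sys


def download_all(items, download_one, stream=sys.stderr):
    """Download every item while redrawing one shared progress line."""
    results = []
    total = len(items)
    for done, item in enumerate(items, start=1):
        results.append(download_one(item))
        # "\r" returns to the start of the line, so the bar is redrawn in place.
        filled = 20 * done // total
        stream.write(f"\r[{'#' * filled}{'.' * (20 - filled)}] {done}/{total}")
        stream.flush()
    stream.write("\n")
    return results
```

Called with the 27 states, this prints one line that fills up as each download finishes, rather than 27 separate bars.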

read_uf by state abbreviation

add an option to read a state by its abbreviation, for example m <- read_uf(cod_uf="SP", ano=2010)
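One way to accept both forms is to normalize the argument before any lookup. The sketch below is in Python; `normalize_state` and `STATE_CODES` are hypothetical names, and the table lists only the two code and abbreviation pairs that appear in this document.

```python
# Only the two pairs that appear in this document are listed; a real table
# would cover every state.
STATE_CODES = {"RJ": 33, "SP": 35}


def normalize_state(value):
    """Accept a numeric state code or a two-letter abbreviation; return the code."""
    if isinstance(value, int):
        return value
    text = str(value).strip().upper()
    if text.isdigit():
        return int(text)
    if text in STATE_CODES:
        return STATE_CODES[text]
    raise ValueError(f"unknown state: {value!r}")
```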

microregion codes are not retained fully by read_micro_region

When I choose a single UF, like cod_micro = 24, the read_micro_region function returns the right geometries, but it does not retain the full microregion codes. Instead they are all "24".

That will be a serious problem if a user later wants to merge microregion data (employment rates, % rural population, ...) from some other source onto this dataframe.
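The expected behavior can be sketched as a filter that selects rows by state without touching their codes. This Python sketch assumes, as the issue implies, that the state code is the leading part of the microregion code; the column name `code_micro` and the code values are illustrative.

```python
def filter_by_state(records, code_state):
    """Keep records whose code starts with the state code, leaving codes intact."""
    prefix = str(code_state)
    return [r for r in records if str(r["code_micro"]).startswith(prefix)]


# Illustrative microregion records; the code values are placeholders.
records = [
    {"code_micro": 24001, "value": "a"},
    {"code_micro": 24002, "value": "b"},
    {"code_micro": 35001, "value": "c"},
]
selected = filter_by_state(records, 24)
```

Because the full codes survive the filter, the result can still be joined to external microregion tables.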

Add function `read_country`

Add a read_country function. This involves:

# 1 load the state data
read_state(cod_state="all")
# Merge the polygons, using one of the functions below:
st_union

st_combine

Municipalities in the border of regions

It would be interesting if the function (or functions) implemented to solve #38, #39, and #45 allowed the user to choose what to do with municipalities that are not fully within the region. Some possibilities:

  1. Include all municipalities as long as they have some overlap with the region (maybe the default)
  2. Remove the municipalities in the border
  3. Cut the polygons of municipalities on the border so that the returned area is the same as the region's
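The three options can be illustrated with a small Python sketch. To stay dependency-free it uses axis-aligned boxes (xmin, ymin, xmax, ymax) as stand-ins for real polygons; the function names and the mode labels "overlap", "within", and "clip" are hypothetical.

```python
def intersection(a, b):
    """Return the overlapping box of a and b, or None if they do not overlap."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin >= xmax or ymin >= ymax:
        return None
    return (xmin, ymin, xmax, ymax)


def select(munis, region, mode="overlap"):
    """munis maps a name to a box; mode is 'overlap', 'within', or 'clip'."""
    if mode not in ("overlap", "within", "clip"):
        raise ValueError(f"unknown mode: {mode!r}")
    out = {}
    for name, box in munis.items():
        inter = intersection(box, region)
        if inter is None:
            continue  # no overlap with the region at all
        if mode == "overlap":
            out[name] = box  # option 1: keep anything that overlaps
        elif mode == "within" and inter == box:
            out[name] = box  # option 2: keep only fully contained boxes
        elif mode == "clip":
            out[name] = inter  # option 3: cut the box at the region border
    return out
```

With real geometries the same three branches would map to intersects, within, and intersection operations in a geometry library.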

Package takes a lot of space: 95mb

Do you really need the files at: /geobr/data/* ?

It seems this makes the package much larger than necessary (I assume because brazil_2010 is a geometry right?) in terms of size.

Include 2010 data in `./data`

Include a table with the metadata of all geometries in ./data. Something like:

| municipality_name | municipality_code | state_name | state_code | region_name | state_initials | region_code | geom |
|---|---|---|---|---|---|---|---|
| ... | ... | ... | ... | ... | ... | ... | ... |

  • In the geom column, provide the data for the municipalities

Create vignettes

Create one or two vignettes demonstrating the package functions

holes in read_country() with no arguments

read_country() with no arguments returns the geometry of Brazil in 2010 with several holes. The data for 2014 and 2015 also have some holes. Perhaps removing the holes manually after computing the union solves the problem as shown in the code below.

(I saw in another issue the discussion that st_union is very slow. unionSpatialPolygons from maptools is much faster)

require(geobr)
require(dplyr)
require(sp)
require(sf)
require(maptools)

sp_states <- read_state(year=2010, code_state = "all") %>% as("Spatial")

result <- unionSpatialPolygons(sp_states, rep(TRUE, 27))

outerRings = Filter(function(f){f@ringDir==1},result@polygons[[1]]@Polygons)
outerBounds = SpatialPolygons(list(Polygons(outerRings,ID=1)))
plot(outerBounds)

m <- st_as_sf(outerBounds)

write_sf(m, "brazil.shp")
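The hole-removal step in the R code above relies on ring direction. A dependency-free Python sketch of the same filter follows; it assumes the sp convention that ringDir == 1 marks clockwise (outer) rings, and the function names are hypothetical.

```python
def signed_area(ring):
    """Shoelace formula: positive for counter-clockwise rings, negative for clockwise."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def outer_rings(rings):
    """Keep clockwise rings only, mirroring the ringDir == 1 filter in the R code."""
    return [ring for ring in rings if signed_area(ring) < 0]


# A clockwise outer square with a counter-clockwise hole inside it.
outer = [(0, 0), (0, 4), (4, 4), (4, 0)]
hole = [(1, 1), (2, 1), (2, 2), (1, 2)]
kept = outer_rings([outer, hole])
```

Each ring is a list of (x, y) tuples; a closing point that repeats the first vertex does not change the result.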

Harmonizing data columns across years: Mesoregion

Give all mesoregion datasets the same file structure. Example:

          nome_meso cod_meso  Geometry
1 Leste Rondoniense     1102  POLYGON ((-62.22055 -8.5908...
2   Madeira-Guaporé     1101  POLYGON ((-63.32721 -7.9767...
...

Python Version

Is there any work/planning to build a python version?

If not, can I start one? There is no license on the project, so I am not sure whether you are OK with other users building on your .rds files.

read_weighting_area returning "Error in parse_url"

test <- read_weighting_area(code_weighting = 35, year=2010)
test <- read_weighting_area(code_weighting = "SP", year=2010)

The code above returns "Error in parse_url(url) : length(url) == 1 is not TRUE".

Am I passing something wrong, or is there a problem with the function?

Harmonizing data columns across years: Municipalities

Give all municipality datasets the same file structure. Example:

         nome_mun   cod_mun      geometry
1      Acrelândia   1200013      POLYGON ((-67.13424 -9.6762...
2    Assis Brasil   1200054      POLYGON ((-69.5814 -10.3806...
...

read_biomes() and default year

A call to read_biomes() without any argument does not work because it requires a year. It could work without arguments by having 2004 as the default value. I don't know whether the other read functions have a default year, but I think they could always return the latest data available when the year argument is not used.
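The "latest year by default" behavior could be sketched like this in Python. The helper name `resolve_year` is hypothetical; the years in the usage note are the ones listed for read_biomes in the datasets table.

```python
def resolve_year(year, available_years):
    """Return `year` if it is available; fall back to the latest year when it is None."""
    if year is None:
        return max(available_years)
    if year not in available_years:
        raise ValueError(
            f"year {year} not available; choose one of {sorted(available_years)}"
        )
    return year
```

For example, `resolve_year(None, [2004, 2019])` picks 2019, while an explicit `year=2004` is passed through unchanged.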
