aopdata: Data from the Access to Opportunities Project

aopdata is an R package to download data from the Access to Opportunities Project (AOP). The AOP is a research initiative led by the Institute for Applied Economic Research (Ipea) that studies transport access to opportunities in Brazilian cities.

The aopdata package brings annual estimates of access to employment, health, education and social protection services by transport mode at a fine spatial resolution for the 20 largest cities in Brazil. The package also brings data on the spatial distribution of population by sex, race, income and age, as well as the distribution of jobs, schools, health care facilities and social assistance reference centers.

Data for 2017, 2018 and 2019 are already available, and cover accessibility estimates by car and active transport modes (walking and cycling) for the 20 largest cities in the country, and by public transport for over 9 major cities. More information on the AOP website.

Installation

# From CRAN
install.packages("aopdata")
library(aopdata)

# or use the development version with latest features
utils::remove.packages('aopdata')
devtools::install_github("ipeaGIT/aopdata", subdir = "r-package")
library(aopdata)

Overview of the package

The aopdata package includes five core functions.

  • read_population() - Download population data
  • read_landuse() - Download land use data
  • read_access() - Download accessibility estimates
  • aopdata_dictionary() - Opens aopdata data dictionary on a web browser
  • read_grid() - Download the H3 hexagonal spatial grid

For detailed explanations of these functions, check the vignettes:

Basic Usage

Data dictionary

The dictionary of data columns is presented in the documentation of each function. However, you can also open the data dictionary on a web browser by running:

# for English
aopdata_dictionary(lang = 'en')

# for Portuguese
aopdata_dictionary(lang = 'pt')

Accessibility estimates

The read_access() function downloads accessibility estimates for a given city, mode and year. For convenience, this function also automatically downloads the population and land use data for the selected cities. Note that accessibility estimates are available for peak and off-peak periods for the public_transport and car modes.

# Download accessibility, population and land use data
cur <- read_access(
  city = 'Curitiba',
  mode = 'public_transport', 
  peak = TRUE,
  year = 2019
  )

You may also set the parameter geometry = TRUE so that functions return a spatial sf object with the geometries of the H3 spatial grid.

# Download accessibility, population and land use data
cur <- read_access(
  city = 'Curitiba', 
  mode = 'public_transport', 
  peak = TRUE,
  year = 2019,
  geometry = TRUE
  )

Population and land use data

If you are only interested in the population and land use data generated by the Access to Opportunities Project, you can download these data sets separately. Please note that the population data come from the 2010 Brazilian census, while land use data can be downloaded for 2017, 2018 or 2019.

# Land use data
lu_for <- read_landuse(
  city = 'Fortaleza', 
  year = 2019,
  geometry = TRUE
  )

# Population data
pop_for <- read_population(
  city = 'Fortaleza', 
  year = 2010,
  geometry = TRUE
  )

Read only spatial grid data

In case you would like to download only the H3 spatial grid of cities in the AOP project, you can use the read_grid() function.

h3_for <- read_grid(city = 'Fortaleza')

Note

In all of the functions above, note that:

  • The city parameter can also be a 3-letter abbreviation of the city.

    df <- read_access(city = 'cur', mode = 'public_transport', year = 2019)
    df <- read_grid(city = 'for')

  • You may also download the data for all cities of the project at once using city = 'all':

    lu_all <- read_landuse(city = 'all', year = 2019)

Acknowledgement

The R package aopdata is developed by a team at the Institute for Applied Economic Research (Ipea), Brazil.

Citation

If you use this package in your own work, please cite it using one of the publications below:

Population and land use data

  • Pereira, Rafael H. M. et al. (2022) Distribuição espacial de características sociodemográficas e localização de empregos e serviços públicos das vinte maiores cidades do Brasil. Texto para Discussão 2772. Ipea - Instituto de Pesquisa Econômica Aplicada. Available at http://repositorio.ipea.gov.br/handle/11058/11225

Accessibility data

  • Pereira, Rafael H. M. et al. (2022) Estimativas de acessibilidade a empregos e serviços públicos via transporte ativo, público e privado nas 20 maiores cidades do Brasil em 2017, 2018, 2019. Texto para Discussão. Ipea - Instituto de Pesquisa Econômica Aplicada.

aopdata's People

Contributors

dhersz, diegobt86, joaobazzo, kauebraga, mvpsaraiva, rafapereirabr

aopdata's Issues

separate function for "land use" and population data.

The reason for this is that they come from different years.

  • Land use data is annual data from 2017 onwards.
  • Current population data is from the 2010 census. We will eventually add population data from the next census (2021? 2022?)

Landuse database is duplicated when year is 2017 or 2018

The land use data returned by read_landuse() is duplicated when the chosen year is 2017 or 2018:

> landuse_2017 <- aopdata::read_landuse(city = c("for"), 
                                       year = 2017,
                                       geometry = FALSE)

> table(landuse_2017$year)

2017 2019 
2562 2562 

Also, all the population observations from 2017 or 2018 are NA for that year:

> a <- landuse_2017[year == 2017]
> summary(a$P001)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
     NA      NA      NA     NaN      NA      NA    2562 

The issue may be in the merge below, where the year column of the aop_population dataset is always equal to 2019. When the two datasets are merged with all = TRUE and the land use year is 2017 or 2018, no rows match on year, so the two datasets are effectively row-bound instead of joined.

aop <- data.table::merge.data.table(aop_population, aop_landuse, by = c('year', 'abbrev_muni', 'name_muni', 'code_muni', 'id_hex'), all = TRUE)
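
The effect described above can be reproduced offline. The sketch below uses two small made-up tables (the values and the jobs column are illustrative, not AOP data) and base R merge() as a stand-in for data.table::merge.data.table, which behaves the same way in this case: with year among the join keys and different years on each side, all = TRUE keeps every row from both tables and fills the gaps with NA.

```r
# Toy stand-ins for the two datasets; values are made up for illustration.
pop <- data.frame(
  year   = 2019L,                  # population table always carries 2019
  id_hex = c("a", "b", "c"),
  P001   = c(10, 20, 30),
  stringsAsFactors = FALSE
)
landuse <- data.frame(
  year   = 2017L,                  # land use requested for 2017
  id_hex = c("a", "b", "c"),
  jobs   = c(5, 0, 7),             # hypothetical column name
  stringsAsFactors = FALSE
)

# Joining on 'year' as well: no key matches, so all = TRUE keeps every row
# from both sides and fills the missing columns with NA.
stacked <- merge(pop, landuse, by = c("year", "id_hex"), all = TRUE)
table(stacked$year)

# Joining on the hexagon id only keeps one row per hexagon.
joined <- merge(pop[, c("id_hex", "P001")], landuse, by = "id_hex", all = TRUE)
nrow(joined)
```

Dropping year from the join keys, as in the second merge, is one possible direction for a fix; whether it suits the package is for the maintainers to decide.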

alert for internet connection problem

I just got this from the CRAN team:

Dear maintainer,

Please see the problems shown on
https://cran.r-project.org/web/checks/check_results_aopdata.html.

Please correct before 2021-04-12 to safely retain your package on CRAN.

It seems we need to remind you of the CRAN policy:

'Packages which use Internet resources should fail gracefully with an informative message
if the resource is not available or has changed (and not give a check warning nor error).'

This needs correction whether or not the resource recovers.

The CRAN Team

and a second message from them:

I have seen several different failures in the last few days:

--- re-building ‘access_maps.Rmd’ using rmarkdown
Linking to GEOS 3.8.1, GDAL 3.0.4, PROJ 6.3.2
Quitting from lines 36-43 (access_maps.Rmd)
Error: processing vignette 'access_maps.Rmd' failed with diagnostics:
object 'type' not found
--- failed re-building ‘access_maps.Rmd’

--- re-building ‘landuse_maps.Rmd’ using rmarkdown
Linking to GEOS 3.8.1, GDAL 3.0.4, PROJ 6.3.2
Quitting from lines 35-42 (landuse_maps.Rmd)
Error: processing vignette 'landuse_maps.Rmd' failed with diagnostics:
Cannot open "/tmp/RtmprkkzTh/working_dir/RtmpKy82ey/hex_grid_for.gpkg";
The source could be corrupt or not supported. See st_drivers() for a
list of supported formats.
--- failed re-building ‘landuse_maps.Rmd’

--- re-building ‘access_maps.Rmd’ using rmarkdown
Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 6.3.2
Using year 2019
Using mode public_transport
Quitting from lines 36-43 (access_maps.Rmd)
Error: processing vignette 'access_maps.Rmd' failed with diagnostics:
Timeout was reached: [www.ipea.gov.br] Connection timed out after 10013
milliseconds
--- failed re-building ‘access_maps.Rmd’

And that website currently also has

Check: whether package can be installed
Result: WARN
Found the following significant warnings:
Warning: unable to re-encode 'utils.R' line 334

(checking in Latin-1) and

 --- re-building ‘landuse_maps.Rmd’ using rmarkdown
 Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
 Quitting from lines 35-42 (landuse_maps.Rmd)
 Error: processing vignette 'landuse_maps.Rmd' failed with diagnostics:
 some columns are not in the data.table:

abbrev_muni,name_muni,code_muni,id_hex
--- failed re-building ‘landuse_maps.Rmd’

--- re-building ‘landuse_maps.Rmd’ using rmarkdown
Warning in engine$weave(file, quiet = quiet, encoding = enc) :
Pandoc (>= 1.12.3) and/or pandoc-citeproc not available. Falling
back to R Markdown v1.
Linking to GEOS 3.6.4, GDAL 2.2.4, PROJ 5.2.0
Quitting from lines 35-42 (landuse_maps.Rmd)
Error: processing vignette 'landuse_maps.Rmd' failed with diagnostics:
Timeout was reached: [www.ipea.gov.br] Operation timed out after
10140 milliseconds with 0 out of 0 bytes received
--- failed re-building ‘landuse_maps.Rmd’

--
Brian D. Ripley,

CMD Check fails on Linux

Vignettes are throwing this error on both macOS-latest (oldrel) and ubuntu-20.04 (release).

  • creating vignettes ... ERROR
    Error: --- re-building ‘access_inequality.Rmd’ using rmarkdown
    Quitting from lines 26-31 (access_inequality.Rmd)
    Error: Error: processing vignette 'access_inequality.Rmd' failed with diagnostics:
    package or namespace load failed for 'sf' in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
    there is no package called 'units'
    --- failed re-building ‘access_inequality.Rmd’

--- re-building ‘access_maps.Rmd’ using rmarkdown
Quitting from lines 27-31 (access_maps.Rmd)
Error: Error: processing vignette 'access_maps.Rmd' failed with diagnostics:
package or namespace load failed for 'sf' in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
there is no package called 'units'
--- failed re-building ‘access_maps.Rmd’

Filter only available cities when reading access for public transport

Right now, the behavior when trying to download accessibility by public transport for all cities returns an error:

# download data
aop_data <- aopdata::read_access(city = "all", mode = "public_transport", year = 2019, geometry = TRUE)

Using mode public_transport
Downloading accessibility data for the year 2019
  |======================================================================================================| 100%
Error in aopdata::read_access(city = "all", mode = "public_transport",  : 
  One of the selected cities does not have public transport data for that year.

I think it would be nice to pre-filter only cities with public transport for that year and return the accessibility estimates for them, instead of returning an error.
A message indicating the cities would be nice as well.
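
A minimal sketch of the pre-filtering idea, not the package's implementation. The helper name filter_pt_cities and the pt_available table are hypothetical; the 2019 city codes are the ones suggested in a related issue on this page, and 'nat' is used only as an example of a city outside that list.

```r
# Hypothetical availability table; in the package this would come from its
# own metadata. City codes for 2019 are taken from a related issue.
pt_available <- data.frame(
  city = c("for", "rec", "bho", "rio", "spo", "cur", "poa"),
  year = 2019L,
  stringsAsFactors = FALSE
)

# Keep only the requested cities that have public transport data for `year`,
# and tell the user which ones were kept.
filter_pt_cities <- function(cities, year, available = pt_available) {
  ok <- available$city[available$year == year]
  kept <- intersect(cities, ok)
  if (length(kept) == 0L) {
    stop("No selected city has public transport data for ", year, ".",
         call. = FALSE)
  }
  if (length(setdiff(cities, kept)) > 0L) {
    message("Returning public transport data only for: ",
            paste(kept, collapse = ", "))
  }
  kept
}

kept <- filter_pt_cities(c("for", "nat", "cur"), year = 2019L)
```

The filtered vector could then be passed on to the download step, so that a request for all cities returns the available subset instead of stopping with an error.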

CRAN issues, yet again.

Guys, I need some help here. This is the message I got today:

Dear maintainer,

package aopdata_0.2.2.tar.gz has been auto-processed.
The auto-check found additional issues for the last version released on CRAN:
M1mac https://www.stats.ox.ac.uk/pub/bdr/M1mac/aopdata.out
CRAN incoming checks do not test for these additional issues and you will need an appropriately instrumented build of R to reproduce these.
Hence please reply-all and explain: Have these been fixed?

Log dir: https://win-builder.r-project.org/incoming_pretest/aopdata_0.2.2_20210423_192923/
The files will be removed after roughly 7 days.
Installation time in seconds: 5
Check time in seconds: 98
R version 4.1.0 alpha (2021-04-22 r80209)

Pretests results:
Windows: https://win-builder.r-project.org/incoming_pretest/aopdata_0.2.2_20210423_192923/Windows/00check.log
Status: OK
Debian: https://win-builder.r-project.org/incoming_pretest/aopdata_0.2.2_20210423_192923/Debian/00check.log
Status: OK

Last released version's CRAN status: ERROR: 3, WARN: 2, OK: 8
See: https://CRAN.R-project.org/web/checks/check_results_aopdata.html

Last released version's additional issues:
M1mac https://www.stats.ox.ac.uk/pub/bdr/M1mac/aopdata.out

CRAN Web: https://cran.r-project.org/package=aopdata

No strong reverse dependencies to be checked.

Best regards,
CRAN teams' auto-check service
Flavor: r-devel-linux-x86_64-debian-gcc, r-devel-windows-ix86+x86_64
Check: CRAN incoming feasibility, Result: Note_to_CRAN_maintainers
Maintainer: 'Rafael H. M. Pereira [email protected]'

Allow users to download data for a few selected cities


a <- read_access(city = c('Fortaleza', 'Recife'), mode = 'public_transport',  year = 2019)
table(a$abbrev_muni)
table(a$mode)


b <- read_access(city = c('for', 'rec'), mode = 'public_transport',  year = 2019)
table(b$abbrev_muni)
table(b$mode)

Improve error message when `mode = car`

Error message could be more specific about the availability of access estimates by car

df_car <- read_access(
  city = 'Curitiba',
  mode = 'car',
  year = 2018,
  peak = FALSE,
  geometry = TRUE,
  showProgress = FALSE
)

Error in select_mode_input(temp_meta, mode = m) :
Error: This 'mode' is not available for this 'city' & 'year.' It must be one of the following: bicycle public_transport walk
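
One way the message could name the requested mode, city and year is sketched below. The helper check_mode is hypothetical and takes the available modes as an argument; it does not reflect how select_mode_input() works internally.

```r
# Hypothetical helper: `available` is the set of modes that exist for the
# requested city and year (passed in by hand here for illustration).
check_mode <- function(mode, city, year, available) {
  if (!mode %in% available) {
    stop(sprintf(
      "Mode '%s' is not available for city '%s' in %d. Available modes: %s.",
      mode, city, year, paste(available, collapse = ", ")
    ), call. = FALSE)
  }
  invisible(TRUE)
}

# Capture the message instead of stopping, so the example runs to the end.
msg <- tryCatch(
  check_mode("car", "Curitiba", 2018L, c("bicycle", "public_transport", "walk")),
  error = function(e) conditionMessage(e)
)
msg
```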

non-ASCII characters

The function rm_accent() currently throws this WARNING with CMD Check:

checking R files for non-ASCII characters ... WARNING
Found the following file with non-ASCII characters:
utils.R
Portable packages must use only ASCII characters in their R code,
except perhaps in comments.
Use \uxxxx escapes for other characters.

CRAN fixes

Two issues raised by CRAN:

  1. Also, 'Writing R Extensions' asks you not to use progress bars in non-interactive use such as R CMD check.

  2. 'Packages which use Internet resources should fail gracefully with an informative message if the resource is not available or has changed (and not give a check warning nor error).'
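
For the first point, a common pattern is to tie the progress bar to interactive(). The sketch below is an assumption about how that could look, using a hypothetical helper with_progress; a showProgress-style argument could take the same default.

```r
# Hypothetical helper: run `f` over `items`, drawing a text progress bar only
# when `show` is TRUE. The default follows interactive(), so the bar is
# skipped under R CMD check or Rscript.
with_progress <- function(items, f, show = interactive()) {
  show <- isTRUE(show) && length(items) > 0L
  pb <- NULL
  if (show) {
    pb <- utils::txtProgressBar(min = 0, max = length(items), style = 3)
    on.exit(close(pb), add = TRUE)
  }
  out <- vector("list", length(items))
  for (i in seq_along(items)) {
    out[[i]] <- f(items[[i]])
    if (show) utils::setTxtProgressBar(pb, i)
  }
  out
}

# No bar is drawn here because show = FALSE.
res <- with_progress(c("for", "cur"), toupper, show = FALSE)
```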

Internet resources should fail gracefully

This again:

'Packages which use Internet resources should fail gracefully with an informative message
if the resource is not available or has changed (and not give a check warning nor error).'
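
A minimal sketch of what failing gracefully could mean in practice, assuming a hypothetical wrapper fetch_gracefully around whatever function performs the download. On failure it prints an informative message and returns NULL instead of raising an error, so callers (and vignettes) need to check for NULL before using the result.

```r
# Hypothetical wrapper: `fetch` is any function that may fail because a remote
# resource is unavailable. On failure, emit a message and return NULL.
fetch_gracefully <- function(fetch, ...) {
  tryCatch(
    fetch(...),
    error = function(e) {
      message("Problem connecting to the data server. ",
              "Please try again in a few minutes.")
      invisible(NULL)
    }
  )
}

# Simulated failure, so the example runs offline.
broken <- function(url) stop("Timeout was reached")
result <- fetch_gracefully(broken, "https://example.org/file.csv")
is.null(result)
```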

Encoding issue

We got the following message from CRAN:

Version: 0.2.0
Check: whether package can be installed
Result: WARN
Found the following significant warnings:
Warning: unable to re-encode 'utils.R' line 382
Flavor: r-devel-linux-x86_64-debian-clang

I'm replacing the rm_accent() function with a simple base::iconv(city, to="ASCII//TRANSLIT")
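
A short sketch of that replacement on a few accented city names. The names are written with \uxxxx escapes so the source itself stays ASCII-only. The exact output of //TRANSLIT depends on the platform's iconv implementation, so no expected output is shown.

```r
# Non-ASCII characters written as \uxxxx escapes keep the source file ASCII-only.
cities <- c("S\u00e3o Paulo", "Bel\u00e9m", "Goi\u00e2nia")

# Transliterate to ASCII; results can differ between iconv implementations,
# and elements that cannot be converted come back as NA.
ascii <- base::iconv(cities, from = "UTF-8", to = "ASCII//TRANSLIT")

# Check that every converted name contains only ASCII characters.
only_ascii <- !grepl("[^\\x01-\\x7F]", ascii, perl = TRUE)
all(only_ascii)
```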

Installation via GitHub

I'm trying to install the package from GitHub, but the following message appears:

> devtools::install_github("ipeaGIT/aopdata")
Using github PAT from envvar GITHUB_PAT
Downloading GitHub repo ipeaGIT/aopdata@HEAD
Error: Failed to install 'aopdata' from GitHub:
  Does not appear to be an R package (no DESCRIPTION)
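
The error is consistent with the DESCRIPTION file not sitting at the repository root. The installation instructions in the README point install_github() at a subdirectory, so a likely workaround, assuming the package sources still live under r-package, is:

```r
# Assumes the package sources sit in the "r-package" subdirectory of the
# repository, as in the README's installation instructions.
devtools::install_github("ipeaGIT/aopdata", subdir = "r-package")
```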

read_access suggestions

I've been downloading the data through the read_access function and I saw three enhancements that could be made.

  1. The first is the need to mention the city codes in the read_access, read_population, read_grid, and read_landuse documentation. I could only find the codes when I downloaded using city = 'all'.

  2. Also, the function could be implemented to accept a vector of cities' names, instead of a single city. By doing this the user doesn't need to use any lapply() command to extract the desired files.

  3. When we use mode = 'public_transport' with city = 'all', we get this error:

> a <- aopdata::read_access(city = 'all', mode = 'public_transport', 
peak = TRUE, year = 2019, geometry = TRUE)
Using mode public_transport
Downloading accessibility data from year 2019
  |=====================================================| 100%
Error in aopdata::read_access(city = "all", mode = "public_transport",  : 
  One of the selected cities does not have public transport data for that year.

There could be a condition that uses c("for", "rec", "bho","rio", "spo", "cur", "poa") when the commands (mode = 'public_transport' with city = 'all') are used together.

If you guys agree with these suggestions, I can go forward and open a PR.
