cmhc's Introduction

cmhc

API wrapper for extracting CMHC data out of the CMHC Housing Market Information Portal.

Reference

Documentation is available on the GitHub pages.

The example vignettes contain some common use cases.

Installation

The stable version of cmhc can be easily installed from CRAN.

install.packages("cmhc")

Alternatively, the latest development version can be installed from GitHub.

remotes::install_github("mountainmath/cmhc")

Usage

Consult the example vignette for more information. As an example, this is how to extract time series information for vacancy rate data by bedroom type for the Vancouver Census Metropolitan Area ("59933").

library(cmhc)
vacancy_data <- get_cmhc(survey="Rms",series="Vacancy Rate",dimension="Bedroom Type",
                         breakdown="Historical Time Periods",  geo_uid="59933")

Starting with version v0.3.2, the package includes an interactive query builder helper function, select_cmhc_table(), that walks through the available data and builds the parameters for get_cmhc(), as in the example above. This makes it easy to discover data and build function calls to CMHC tables.
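As a rough sketch of how the get_cmhc() parameters fit together, the same query can be stored as a list and reused across several regions. The argument names and the two GeoUIDs are taken from examples in this document; the helper variables (base_query, geo_uids, queries) are made up for illustration, and the actual download is guarded so that it only runs in an interactive session with the package installed.

```r
# Shared query parameters, using the argument names from the example above.
base_query <- list(
  survey    = "Rms",
  series    = "Vacancy Rate",
  dimension = "Bedroom Type",
  breakdown = "Historical Time Periods"
)

# GeoUIDs that appear elsewhere in this document (Vancouver and Montreal CMAs).
geo_uids <- c("59933", "24462")

# One complete argument list per region.
queries <- lapply(geo_uids, function(uid) c(base_query, list(geo_uid = uid)))
names(queries) <- geo_uids

# Only hit the CMHC portal when run interactively with cmhc installed.
if (interactive() && requireNamespace("cmhc", quietly = TRUE)) {
  results <- lapply(queries, function(q) do.call(cmhc::get_cmhc, q))
}
```

Keeping the parameters in a list makes it easy to vary one argument at a time while leaving the rest of the query unchanged.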

Contributing

  • We encourage contributions to improve this project. The best way is through issues and pull requests.
  • If you want to get in touch, we are pretty good at responding via email or via Twitter at @vb_jens.

Cite cmhc

If you wish to cite cmhc:

von Bergmann, J. (2024) cmhc: R package to access, retrieve, and work with CMHC data. v0.2.8. DOI: 10.32614/CRAN.package.cmhc

A BibTeX entry for LaTeX users is

  @Manual{cmhc,
    author = {Jens {von Bergmann}},
    title = {cmhc: R package to access, retrieve, and work with CMHC data},
    year = {2024},
    doi = {10.32614/CRAN.package.cmhc},
    note = {R package version 0.2.8},
    url = {https://mountainmath.github.io/cmhc/},
  }

Related packages

The cmhc package is designed to work well with the cancensus package for Canadian Census data and the cansim package for regular StatCan tables. It matches the census geographies via a GeoUID column that is shared across these packages. The tongfen package helps make geographies from the different census years that CMHC reports on comparable over time.

CMHC Attribution

Subject to the CMHC Data License Agreement, licensed products using CMHC data should employ the following acknowledgement of source:

Acknowledgment of Source

a. You shall include the following notice on all reproductions of the Information:

Source: Canada Mortgage and Housing Corporation (CMHC), name of product or information, reference date. This information is reproduced and distributed on an “as is” basis with the permission of CMHC.

b. Where any Information is contained within a Value-added Product, you shall include on such Value-added Product the following notice:

Adapted from Canada Mortgage and Housing Corporation, name of product or information, reference date. This does not constitute an endorsement by Canada Mortgage and Housing Corporation of this product.

or any other notice approved in advance in writing by CMHC.

cmhc's People

Contributors

bdbmax, daniel-simeone, dshkol, mountainmath

cmhc's Issues

Halifax data missing?

It seems that the Halifax data is now missing. Perhaps the underlying structure has been changed?

cmhc::get_cmhc(survey = "Rms", series = "Vacancy Rate", dimension = "Bedroom Type", breakdown = "Historical Time Periods", geo_uid = 205, year = 2020)
and
cmhc::get_cmhc(survey = "Rms", series = "Vacancy Rate", dimension = "Bedroom Type", breakdown = "Survey Zones", geo_uid = 205)

Both give "No data available."

The given survey/series/dimension/breakdown all appear in list_cmhc_tables() and the SGC Code (205) is as given by get_cmhc_geography(level = "MET")

This is the case for all calls to geo_uid 205 that I looked at (across all 31 tables that have both Historical Time Periods and Survey Zones as available breakdowns).

I'll try to look into the get_cmhc function to see what is happening.

Survey Zones naming over time

Hello!

When getting data through get_cmhc for different years, the name of what seems to be the same survey zone can differ over time. Here's an example.

plateau <- lapply(2015:2016, \(yr) {
  out <- cmhc::get_cmhc(survey = "Rms",
                        series = "Vacancy Rate",
                        dimension = "Rent Ranges",
                        breakdown = "Survey Zones",
                        geo_uid = 24462,
                        year = yr)
  out$`Survey Zones`[grepl("^Plateau", out$`Survey Zones`)]
})

print(unique(do.call(c, plateau)))

Output: [1] "Plateau Mont-Royal" "Plateau-Mont-Royal"

The naming for Le Plateau in Montreal changes over time. Up to and including 2015 there was no hyphen; after 2015 the hyphen appears. I believe this is the same zone, but there's no way to be sure. The description of the get_cmhc_geography function states that the geographic data corresponds to an extract from 2017 and won't necessarily match regions from other years.
Could a year argument be added to the get_cmhc_geography function, letting us match names to spatial polygons for each individual year? Then, year over year, we could match the actual zones rather than names that might differ by a single character (in the hypothetical case that this is indeed the same survey zone).

Here is another example of names differing in the data, and a zone disappearing in some years:

st_lin <- lapply(2016:2021, \(yr) {
  out <- cmhc::get_cmhc(survey = "Rms",
                        series = "Vacancy Rate",
                        dimension = "Rent Ranges",
                        breakdown = "Survey Zones",
                        geo_uid = 24462,
                        year = yr)
  out$`Survey Zones`[grepl("^Saint-Lin", out$`Survey Zones`)]
})

print(st_lin)

Output: 
[[1]]
character(0)

[[2]]
[1] "Saint-Lin\u0096Laurentides V" "Saint-Lin\u0096Laurentides V"
[3] "Saint-Lin\u0096Laurentides V" "Saint-Lin\u0096Laurentides V"
[5] "Saint-Lin\u0096Laurentides V" "Saint-Lin\u0096Laurentides V"
[7] "Saint-Lin\u0096Laurentides V"

[[3]]
character(0)

[[4]]
character(0)

[[5]]
character(0)

[[6]]
[1] "Saint-Lin-Laurentides V" "Saint-Lin-Laurentides V" "Saint-Lin-Laurentides V"
[4] "Saint-Lin-Laurentides V" "Saint-Lin-Laurentides V" "Saint-Lin-Laurentides V"
[7] "Saint-Lin-Laurentides V"

Maybe the zone just has a different naming in some years?

I think getting the survey zone geographies for every year, if at all possible, would be the best way to fix these non-matching names. These zones also have a METZONE_UID in the output of get_cmhc_geography, which would help match a zone in the data to its spatial zone if that code were also in the output of get_cmhc. But having seen the content of the httr::POST call, I understand there's only a name in that table to identify the zone, and, as stated, this name isn't constant over the years.

I understand CMHC data isn't super easy to work with! From your experience working with it, do you see a possibility to solve this problem? The only options I can think of are either getting spatial polygons of the zones for every year (which would be very reliable), or merging years of data by name using the closest string match (less reliable).

Thanks !
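A minimal sketch of the name-based fallback described above, using only base R. It assumes the differences are limited to the separators seen in the two examples (space, hyphen, and the stray "\u0096" character); normalize_zone_name is a hypothetical helper, not part of the cmhc package. Matching normalised names does not prove that the zone boundaries are the same across years.

```r
# Hypothetical helper: collapse hyphens, en dashes and the stray "\u0096"
# character into single spaces so spelling variants compare equal.
normalize_zone_name <- function(x) {
  x <- gsub("[\u0096\u2013-]", " ", x)
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

# Zone names copied from the outputs shown in this issue.
variants <- c(
  "Plateau Mont-Royal",
  "Plateau-Mont-Royal",
  "Saint-Lin\u0096Laurentides V",
  "Saint-Lin-Laurentides V"
)

# Group the raw spellings by their normalised key.
split(variants, normalize_zone_name(variants))
```

The normalised key could then serve as a join column when stacking several years of get_cmhc output, while the original name is kept alongside it for checking.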

More modular parameters

Eventually there should be a clear interface for selecting different data in different ways. It will take some experimenting and playing with the CMHC calls to figure out the best way of structuring the interface.

Inaccessible geographies

Thanks a lot for this package, I look forward to using it to have much easier access to CMHC data!

While I can access data using cmhc::get_cmhc, I cannot access geographies using cmhc::get_cmhc_geography. It looks like the endpoint to the AWS bucket isn't right:

> cmhc::set_cmhc_cache_path("~/cmhc_cache", install = TRUE, overwrite = TRUE)
Your original .Renviron will be backed up and stored in your R HOME directory if needed.
Your cache path has been stored in your .Renviron and can be accessed by Sys.getenv("CMHC_CACHE_PATH").
[1] "~/cmhc_cache"
> cmhc::get_cmhc_geography(level = "ZONE")
Downloading geographies, this may take a minute...
List of 6
 $ Code     : chr "PermanentRedirect"
 $ Message  : chr "The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future "| __truncated__
 $ Endpoint : chr "mountainmath.s3.amazonaws.com"
 $ Bucket   : chr "mountainmath"
 $ RequestId: chr "DEVEV28Z0WV4E5FB"
 $ HostId   : chr "Wpn6zIAqge9ld7WR4zrY1cNmhGwV9tTfvVVw16Kxk6FVrY0AyYmfDrl+7xbYbTnbbeyZC7KTAG0="
 - attr(*, "headers")=List of 7
  ..$ x-amz-bucket-region: chr "ca-central-1"
  ..$ x-amz-request-id   : chr "DEVEV28Z0WV4E5FB"
  ..$ x-amz-id-2         : chr "Wpn6zIAqge9ld7WR4zrY1cNmhGwV9tTfvVVw16Kxk6FVrY0AyYmfDrl+7xbYbTnbbeyZC7KTAG0="
  ..$ content-type       : chr "application/xml"
  ..$ transfer-encoding  : chr "chunked"
  ..$ date               : chr "Thu, 06 Oct 2022 17:26:01 GMT"
  ..$ server             : chr "AmazonS3"
  ..- attr(*, "class")= chr [1:2] "insensitive" "list"
 - attr(*, "class")= chr "aws_error"
NULL
Error in parse_aws_s3_response(r, Sig, verbose = verbose) : 
  Moved Permanently (HTTP 301).
In addition: Warning message:
In dir.create(file.path(base_directory)) :
  'C:\Users\maxim\OneDrive - McGill University\Documents\cmhc_cache' already exists

Let me know if there's any more information from my end you'd need me to share.

Thank you!

MetId is now required

There seems to be a change in the internal handling of the data on the data portal: to access up-to-date data, one now needs to specify the MetId in the POST parameters. In the current development version v0.2.6 this is fudged in for regions within CMAs, but there seems to be an internal MetId for regions outside of CMAs that I will have to get from CMHC.

Rent ranges dimension

Hello again!

When getting the vacancy rates in the rent ranges dimension, the Rent Ranges column gets tweaked into (I believe) unwanted duplicated rows, and unreadable values.

cmhc::get_cmhc(survey = "Rms",
               series = "Vacancy Rate",
               dimension = "Rent Ranges",
               breakdown = "Survey Zones",
               geo_uid = "24462")
New names:                                                                                                                      
• `"$1` -> `"$1...6`
• `"$1` -> `"$1...10`
• `"$1` -> `"$1...14`
# A tibble: 390 × 7
   `Survey Zones`                    `Rent Ranges`    Value Quality     Censu…¹ Survey Series
   <chr>                             <fct>            <dbl> <fct>       <chr>   <chr>  <chr> 
 1 Downtown Montréal/Îles-des-Soeurs "Less Than $750"  NA   NA          2016    Rms    Vacan…
 2 Downtown Montréal/Îles-des-Soeurs "$750 - $999"      5.1 Fair (Use … 2016    Rms    Vacan…
 3 Downtown Montréal/Îles-des-Soeurs "\"$1...6"        NA   NA          2016    Rms    Vacan…
 4 Downtown Montréal/Îles-des-Soeurs "000 - $1"        NA   NA          2016    Rms    Vacan…
 5 Downtown Montréal/Îles-des-Soeurs "249\""            4.6 Good        2016    Rms    Vacan…
 6 Downtown Montréal/Îles-des-Soeurs "\"$1...10"        7.9 NA          2016    Rms    Vacan…
 7 Downtown Montréal/Îles-des-Soeurs "250 - $1"        NA   NA          2016    Rms    Vacan…
 8 Downtown Montréal/Îles-des-Soeurs "499\""           NA   NA          2016    Rms    Vacan…
 9 Downtown Montréal/Îles-des-Soeurs "\"$1...14"        6.3 NA          2016    Rms    Vacan…
10 Downtown Montréal/Îles-des-Soeurs "500 +\""         NA   NA          2016    Rms    Vacan…
# … with 380 more rows, and abbreviated variable name ¹​`Census geography`
# ℹ Use `print(n = ...)` to see more rows
Warning message:
Problem while computing `Value = parse_numeric(.data$Value)`.
ℹ NAs introduced by coercion 

Let me know if any more information is needed,
Thanks again!
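The fragments in the output above ("$1, 000 - $1, 249") look consistent with quoted labels such as "$1,000 - $1,249" being split on the comma inside the quotes. That cause is an assumption; the package's actual parsing code is not shown here. The sketch below uses a made-up header line to contrast a naive comma split with a quote-aware parse in base R.

```r
# Made-up header line with quoted labels that contain commas.
header_line <- 'Less Than $750,$750 - $999,"$1,000 - $1,249","$1,250 - $1,499","$1,500 +"'

# Naive split: breaks the quoted labels apart at their inner commas.
naive_fields <- strsplit(header_line, ",", fixed = TRUE)[[1]]

# Quote-aware parse: scan() keeps each quoted label in one piece.
aware_fields <- scan(text = header_line, what = character(),
                     sep = ",", quote = "\"", quiet = TRUE)

length(naive_fields)
length(aware_fields)
```

The naive split yields pieces like `"$1`, `000 - $1` and `249"`, which resemble the duplicated column names in the issue, while the quote-aware parse keeps the five rent ranges intact.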

Region codes

Should have a lookup table for region codes to make it easier to pull data for different regions. Right now the region codes are hard-coded into the parameters.
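One possible shape for such a lookup, sketched with a named character vector in base R. The three census GeoUIDs are the ones that appear elsewhere in this document; region_codes and lookup_geo_uid are hypothetical names, not part of the package.

```r
# Census GeoUIDs mentioned in this document, keyed by region name.
region_codes <- c(
  Vancouver = "59933",
  Montreal  = "24462",
  Nanaimo   = "59938"
)

# Hypothetical helper: translate region names to GeoUIDs, failing loudly
# on names that are not in the table.
lookup_geo_uid <- function(name) {
  code <- unname(region_codes[name])
  if (anyNA(code)) {
    stop("Unknown region: ", paste(name[is.na(code)], collapse = ", "))
  }
  code
}

lookup_geo_uid(c("Vancouver", "Nanaimo"))
```

The returned codes could then be passed as the geo_uid argument instead of hard-coding them in each call.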

Column name variability - MET_CODE versus METCODE

The function get_cmhc_geography appears to use different column names for the METCODE depending on the level of geography selected.
The ZONE level is called MET_CODE while the MET level is called METCODE.
The latter looks like it might be hardcoded in the internal function census_to_cmhc_geocode, while the former may come from the gdb files.

library(tidyverse)
library(cmhc)
get_cmhc_geography("ZONE") %>%  select(starts_with("MET") )
get_cmhc_geography("MET") %>%    select(starts_with("MET") )
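Until the column names are aligned, a user-side workaround could rename the column before joining. The sketch below uses toy data frames in place of the real return values (which are spatial data frames); the ZONE_NAME and NAME columns and all values are made up, and harmonise_met_column is a hypothetical helper.

```r
# Toy stand-ins for get_cmhc_geography("ZONE") and get_cmhc_geography("MET").
zone_geo <- data.frame(MET_CODE = c("1100", "1100"),
                       ZONE_NAME = c("Zone A", "Zone B"),
                       stringsAsFactors = FALSE)
met_geo <- data.frame(METCODE = "1100",
                      NAME = "Example CMA",
                      stringsAsFactors = FALSE)

# Hypothetical helper: rename whichever alias is present to one target name.
harmonise_met_column <- function(df, target = "METCODE",
                                 aliases = c("MET_CODE", "METCODE")) {
  hit <- names(df) %in% aliases
  if (sum(hit) > 1) stop("More than one MET code column found")
  names(df)[hit] <- target
  df
}

# After harmonising, both tables can be joined on the same column.
merged <- merge(harmonise_met_column(zone_geo),
                harmonise_met_column(met_geo),
                by = "METCODE")
```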

Nanaimo CMA_UID

The CMA_UID lookup for Nanaimo CMA (Census GeoUID 59938) is not working. The tables I got from CMHC code this to METCODE 4460, but inspecting the HMIP it now looks like this changed to 1100.

I will follow up with CMHC to see if they have updated geographic identifiers, and if so, get an updated code list.

For now I am hacking Nanaimo into the current active development branch for v0.2.8. The branch can be installed via

remotes::install_github("mountainmath/[email protected]")

Double CT identifiers

There is an issue with some StatCan CT identifiers appearing twice with different internal CMHC identifiers. For example

get_cmhc(survey = "Rms", series = "Vacancy Rate", dimension = "Bedroom Type", breakdown = "Historical Time Periods", geo_uid = "3050016.02") 

gives an error because there are two internal CMHC CTs for this StatCan census tract as can be seen when calling

cmhc::cmhc_ct_translation_data |> dplyr::filter(CTUID == "3050016.02")

Not sure what's going on here, will have to check into this in more detail.
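To find every affected tract rather than checking one at a time, duplicated identifiers can be listed with base R. The sketch below runs on a toy table: only the CTUID column name and the value "3050016.02" come from this issue, while the cmhc_ct_id column, its values, and the second CTUID are made up for illustration.

```r
# Toy translation table; the real one is cmhc::cmhc_ct_translation_data.
ct_translation <- data.frame(
  CTUID      = c("3050016.02", "3050016.02", "3050017.00"),
  cmhc_ct_id = c("A1", "B7", "C3"),  # hypothetical internal CMHC identifiers
  stringsAsFactors = FALSE
)

# StatCan CT identifiers that occur more than once.
dup_ids <- unique(ct_translation$CTUID[duplicated(ct_translation$CTUID)])

# All rows involved in an ambiguous mapping.
ambiguous <- ct_translation[ct_translation$CTUID %in% dup_ids, ]
ambiguous
```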
