walkerke / census-with-r-book

Source for Analyzing US Census Data: Methods, Maps, and Models in R by Kyle Walker, published with CRC Press

Home Page: https://walker-data.com/census-r

License: Other

R 0.31% Shell 0.01% TeX 0.31% CSS 1.25% HTML 81.29% JavaScript 16.83% Python 0.01%

census-with-r-book's Introduction

Welcome! I'm Kyle Walker, and here is some information about my current work:

I'm the author of Analyzing US Census Data: Methods, Maps, and Models in R, available to read online for free and forthcoming in print with CRC Press in 2022.
I'm an R developer actively working on the following packages:
- tidycensus, which helps R users get demographic & spatial data from the US Census Bureau ready-to-go for use in their analyses;
- tigris, which downloads US Census Bureau spatial data and loads it directly into R as simple features objects;
- mapboxapi, an R interface to Mapbox web services. Use the package to optimize routes, draw isochrones, read and write vector tiles, use custom Mapbox maps in Leaflet projects, and more;
- crsuggest, which gives R users projected coordinate system suggestions for their spatial datasets.
I'm an academic researching data science and visualization tools for spatial demography. I'm currently teaching courses in exploratory data analysis with Python and introductory Urban Studies.
I am Director of Research at a boutique data science and strategy firm, where we use cutting-edge geospatial & machine learning methods to improve companies' business outcomes.
I also consult through my personal firm, Walker Data, where I work with individual clients and organizations to integrate tools like tidycensus into their workflows and to learn R and spatial data analysis.

If you are interested in working with me, send me a note at [email protected] and let's discuss your idea!

census-with-r-book's Issues

Typo and update tracker

Tracking typos here as I find them. Readers: feel free to let me know here!

No chapter reference link to Ch9; incorrect link to Ch10 https://walker-data.com/census-r/analyzing-census-microdata.html#pums-data-and-the-tidyverse
Should say the 2010 Brazilian Census, not 2020 https://walker-data.com/census-r/analyzing-census-microdata.html#pums-data-and-the-tidyverse
Broken reference due to misspelling https://walker-data.com/census-r/analyzing-census-microdata.html#pums-data-and-the-tidyverse

mb_matrix function not working

Hello

I am working with Ch. 7 of your book and wanted to run the distance and proximity analysis from section 7.4, specifically the travel-time analysis.

I am bringing in a CSV file (destinations) and US Census block groups (centroids). When I run the mb_matrix function I get the following error:

Attached is the csv file for destinations.

I wonder if the problem is in the geocoding process.

FamilyChildCareAndCentersAndPreschools.csv

I used tidygeocoder to geocode the destinations

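library(tidygeocoder)  # geocode()
library(sf)            # st_as_sf(), st_transform(), st_crs()
library(tigris)        # block_groups()

childcare <- read.csv("D:/Googledrive/COE/FamilyChildCareAndCentersAndPreschools.csv")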

childcare$fulladdr <- paste(as.character(childcare$Street),
                            as.character(childcare$City),
                            as.character(childcare$State),
                            as.character(childcare$Zipcode))

geoCodechildcare <- geocode(childcare, address = 'fulladdr',
                            lat = latitude, long = longitude, method = "arcgis")

childcareSF <- st_as_sf(geoCodechildcare,
                        coords = c("longitude", "latitude"), na.fail = TRUE,
                        crs = 4326)

childcareSF <- st_transform(childcareSF, crs = 6501)


stearns_distances <- block_groups("MN", "Stearns", cb = TRUE)

st_crs(stearns_distances)

CRS.new <- st_crs("EPSG:6501")

stearns_6501 <- st_transform(stearns_distances, CRS.new)

library(mapboxapi)

# mb_access_token("pk............", install = TRUE)

# readRenviron("~/.Renviron")

times <- mb_matrix(stearns_6501, childcareSF)

Column names in ch7 dfw_lisa

When I run the code from the text, localmoran_perm() only generates a five-column tibble, and the next step in the pipeline generates an error. I have to manually reduce the column names in 7.7.3, as there are no pi_sim_folded, skewness, or kurtosis columns. Is there a hidden parameter?

simpler (?) code for computing many segregation indices

Hi Kyle,
here's a small suggestion for 8.1.

Instead of

ca_urban_data %>%
  split(~urban_name) %>%
  imap_dfr(~{
    .x %>%
      filter(variable %in% c("white", "hispanic")) %>%
      dissimilarity(
        group = "variable",
        unit = "GEOID",
        weight = "estimate"
      )
  }, .id = "urban_name") %>%
  arrange(desc(est))

I'd propose

ca_urban_data %>%
  filter(variable %in% c("white", "hispanic")) %>%
  group_by(urban_name) %>%
  group_modify(~
      dissimilarity(.x,
        group = "variable",
        unit = "GEOID",
        weight = "estimate"
      )) %>% 
  arrange(desc(est))

This doesn't require mixing Base R (split) and tidyverse, and I find it more natural that the operation is done on the grouped data frame. It's really only stylistic though, so feel free to close.
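To illustrate that the two patterns give the same result, here is a minimal sketch on made-up data. The `toy` tibble and the `toy_dissimilarity()` helper are hypothetical stand-ins (not the book's `ca_urban_data` or the segregation package's `dissimilarity()`); the helper just returns a one-row data frame per group so that the two pipelines can be compared.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data: two urban areas, two units each, two groups per unit
toy <- tibble(
  urban_name = rep(c("A", "B"), each = 4),
  GEOID      = rep(c("u1", "u1", "u2", "u2"), times = 2),
  variable   = rep(c("white", "hispanic"), times = 4),
  estimate   = c(10, 30, 30, 10, 20, 20, 20, 20)
)

# Hypothetical stand-in for a two-group dissimilarity index:
# 0.5 * sum over units of |share of group 1 - share of group 2|
toy_dissimilarity <- function(df) {
  shares <- df %>%
    group_by(variable) %>%
    mutate(share = estimate / sum(estimate)) %>%
    ungroup() %>%
    group_by(GEOID) %>%
    summarize(diff = abs(diff(share)), .groups = "drop")
  tibble(stat = "D", est = 0.5 * sum(shares$diff))
}

# Pattern 1: base R split() followed by purrr::imap_dfr()
via_split <- toy %>%
  split(~urban_name) %>%
  imap_dfr(~ toy_dissimilarity(.x), .id = "urban_name") %>%
  arrange(desc(est))

# Pattern 2: group_by() followed by group_modify()
via_group <- toy %>%
  group_by(urban_name) %>%
  group_modify(~ toy_dissimilarity(.x)) %>%
  ungroup() %>%
  arrange(desc(est))
```

Both objects should hold one row per urban area with the same `est` values; the `ungroup()` call is only there so the two results are directly comparable.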

Possible error in code block in section 4.5.2?

4.5.2 Designing and styling the population pyramid

When I run the last code block in 4.5.2 I get:

Error: Breaks and labels are different lengths
Run rlang::last_error() to see where the error occurred.
In addition: Warning message:
Removed 30 rows containing missing values (position_stack).

FYI, the errors go away when I #comment out the following lines:

labels = ~ number_format(scale = .001, suffix = "k")(abs(.x)),

scale_y_discrete(labels = ~ str_remove_all(.x, "Age\\s|\\syears")) +

Speedier database import

This looks great, so happy to see it all put together!

At lunch with @dtburk, we talked about how slow it sounded to import the 1910 full count data in section 11.2.2.

I was curious and so figured I'd report back, but please don't feel like you need to add another caveat about workflow. Also, I still remain mystified by database administration, so there may be drawbacks I'm unaware of. Plus dBeaver is a nice thing to show to help people get comfortable with databases.

Those caveats aside, in my experimentation, I found chunked reading to be a lot faster than dBeaver's import. On my machine it was probably closer to 2.5 hours through dBeaver, but this takes ~15 minutes, loading only 10k rows at a time (it also works with both csv and fixed width extracts).

read_ipums_micro_chunked(
  "usa_00004.xml", 
  IpumsSideEffectCallback$new(function(x, pos) {
    dbWriteTable(
      conn, Id(schema = "ipums", table = "census1910"), x, append = TRUE
    )
  })
)
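The same chunk-and-append pattern can be tried out on a small scale without an IPUMS extract. The sketch below is an assumption-laden stand-in: it uses readr's `read_csv_chunked()` with `SideEffectChunkCallback` in place of the ipumsr reader, an in-memory SQLite database in place of the database connection used above, and a hypothetical 25-row CSV written to a temporary file.

```r
library(DBI)
library(readr)

# Assumption: in-memory SQLite stands in for the real database connection
conn <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical small CSV standing in for a large extract
csv_path <- tempfile(fileext = ".csv")
write_csv(data.frame(id = 1:25, value = (1:25) * 2), csv_path)

# Read 10 rows at a time; each chunk is appended to the "demo" table
read_csv_chunked(
  csv_path,
  callback = SideEffectChunkCallback$new(function(x, pos) {
    dbWriteTable(conn, "demo", x, append = TRUE)
  }),
  chunk_size = 10,
  col_types = "id"  # id as integer, value as double
)

# Count the rows that ended up in the table
n_rows <- dbGetQuery(conn, "SELECT COUNT(*) AS n FROM demo")$n
dbDisconnect(conn)
```

Because each chunk is written and then discarded, only `chunk_size` rows need to be held in memory at any one time.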

Updates needed

In Chapter 7, Mapbox is no longer allowing depart_at for the driving profile; I'll need to change to driving-traffic.
In Chapter 5, I should update the text to show that 2021 shapefiles are available.

Rebuild the book and add more as needed
