census-with-r-book's Introduction

Welcome! I'm Kyle Walker, and here is some information about my current work:

If you are interested in working with me, send me a note at [email protected] and let's discuss your idea!

census-with-r-book's People

Contributors

dkahle, elbersb, pursuitofdatascience, walkerke

census-with-r-book's Issues

Typo and update tracker

Tracking typos here as I find them. Readers: feel free to let me know here!

mb_matrix function not working

Hello

I am working through Chapter 7 of your book and want to run the distance and proximity analysis from Section 7.4, specifically the travel-time analysis.

I am bringing in a CSV file (destinations) and US Census block groups (centroids). When I run the mb_matrix function, I get the following error:

[image: screenshot of the error]

Attached is the CSV file for the destinations.

I wonder if the problem is in the geocoding process.

FamilyChildCareAndCentersAndPreschools.csv

I used tidygeocoder to geocode the destinations

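library(tidygeocoder) # geocode()
library(sf)           # st_as_sf(), st_transform(), st_crs()
library(tigris)       # block_groups()

childcare <- read.csv("D:/Googledrive/COE/FamilyChildCareAndCentersAndPreschools.csv")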

childcare$fulladdr <- paste(as.character(childcare$Street),
                            as.character(childcare$City),
                            as.character(childcare$State),
                            as.character(childcare$Zipcode))

geoCodechildcare <- geocode(childcare, address = 'fulladdr',
                            lat = latitude, long = longitude, method = "arcgis")

childcareSF <- st_as_sf(geoCodechildcare,
                        coords = c("longitude", "latitude"), na.fail = TRUE,
                        crs = 4326)

childcareSF <- st_transform(childcareSF, crs = 6501)


stearns_distances <- block_groups("MN", "Stearns", cb = TRUE)

st_crs(stearns_distances)

CRS.new <- st_crs("EPSG:6501")

stearns_6501 <- st_transform(stearns_distances, CRS.new)

library(mapboxapi)

# mb_access_token("pk............", install = TRUE)

# readRenviron("~/.Renviron")

times <- mb_matrix(stearns_6501, childcareSF)

Column names in ch7 dfw_lisa

When I run the code from the text, localmoran_perm generates only a 5-column tibble, and the next step in the pipeline raises an error. I have to manually reduce the column names in 7.7.3, as there are no pi_sim_folded, skewness, or kurtosis columns. Is there a hidden parameter?

simpler (?) code for computing many segregation indices

Hi Kyle,
here's a small suggestion for 8.1.

Instead of

ca_urban_data %>%
  split(~urban_name) %>%
  imap_dfr(~{
    .x %>%
      filter(variable %in% c("white", "hispanic")) %>%
      dissimilarity(
        group = "variable",
        unit = "GEOID",
        weight = "estimate"
      )
  }, .id = "urban_name") %>%
  arrange(desc(est))

I'd propose

ca_urban_data %>%
  filter(variable %in% c("white", "hispanic")) %>%
  group_by(urban_name) %>%
  group_modify(~
      dissimilarity(.x,
        group = "variable",
        unit = "GEOID",
        weight = "estimate"
      )) %>% 
  arrange(desc(est))

This doesn't require mixing base R (split) and the tidyverse, and I find it more natural for the operation to be done on the grouped data frame. It's really only stylistic though, so feel free to close.

Possible error in code block in section 4.5.2?

4.5.2 Designing and styling the population pyramid

When I run the last code block in 4.5.2 I get:

Error: Breaks and labels are different lengths
Run rlang::last_error() to see where the error occurred.
In addition: Warning message:
Removed 30 rows containing missing values (position_stack).

FYI, the errors go away when I comment out the following lines:

labels = ~ number_format(scale = .001, suffix = "k")(abs(.x)),

scale_y_discrete(labels = ~ str_remove_all(.x, "Age\\s|\\syears")) + 

Speedier database import

This looks great, so happy to see it all put together!

At lunch with @dtburk, we talked about how slow it sounded to import the 1910 full count data in section 11.2.2.

I was curious, so I figured I'd report back, but please don't feel like you need to add another caveat about workflow. Also, I remain mystified by database administration, so there may be drawbacks I'm unaware of. Plus, DBeaver is a nice thing to show to help people get comfortable with databases.

Those caveats aside, in my experimentation I found chunked reading to be a lot faster than DBeaver's import. On my machine, the import probably took closer to 2.5 hours through DBeaver, while the chunked approach takes ~15 minutes, loading only 10k rows at a time (it also works with both CSV and fixed-width extracts).

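library(ipumsr) # read_ipums_micro_chunked(), IpumsSideEffectCallback
library(DBI)    # dbWriteTable(), Id()

# conn is assumed to be an open DBI connection to the target database
read_ipums_micro_chunked(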
  "usa_00004.xml", 
  IpumsSideEffectCallback$new(function(x, pos) {
    dbWriteTable(
      conn, Id(schema = "ipums", table = "census1910"), x, append = TRUE
    )
  })
)

Updates needed

  • In Chapter 7, Mapbox is no longer allowing depart_at for the driving profile; I'll need to change to driving-traffic.
  • In Chapter 5, I should modify to show that 2021 shapefiles are available

Rebuild the book and add more as needed
