safegraphinc / SafeGraphR
R code for common, repeatable data wrangling and analysis of SafeGraph data
Home Page: https://safegraphinc.github.io/SafeGraphR/
License: Apache License 2.0
Hi there @felixsafegraph! Not sure whether it's better to reach out here or on Slack, but the SafeGraphR package is ready for beta, and there's a docs site in here now. Could you turn on GitHub Pages for the SafeGraphR repository (set to docs/)? I don't have access. Thank you!
Hello -
I am running the following code on data downloaded directly from the SafeGraph Shop:
pitt.data <- read_shop(
  filename = "safe-graph-data.zip",
  keeplist = c("patterns", "home_panel_summary.csv"),
  by = "placekey",
  expand_int = "visitors_by_day",
  name = "visits",
  start_date = lubridate::ymd("2018-01-01")
)
However, doing so results in the following error:
Error in read_many_patterns(filelist = patfiles, dir = exdir, recursive = FALSE, :
Number of files (0) does not match number of start_dates (1) to go along with them.
My desired result is a count of visitors for each placekey by day (2018-01-01, 2018-01-02, and so on).
Looking forward to a response.
I am trying to use the read_many_patterns() function, but I keep getting this error:
"Attempted to find start_date from filename but failed." The zipped files came directly from SafeGraph and are named like 2019-06-core_poi_patterns_part1. There are 10 zip files for June alone.
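When the filename parse fails, one possible workaround (a sketch only, not tested against the real files; the filename pattern, the `filelist` argument, and the one-start-date-per-file pairing are assumptions taken from the error messages quoted in these issues) is to build the file list yourself and pass the start date explicitly:

```r
# Sketch: supply start_date explicitly instead of relying on filename parsing.
# The pattern below and the argument names are assumptions based on the
# error messages in this thread.
patfiles <- list.files(pattern = "2019-06-core_poi_patterns", full.names = TRUE)

# One start date per file, since the error complains when the counts differ
start_dates <- rep(as.Date("2019-06-01"), length(patfiles))

if (length(patfiles) > 0) {
  patterns <- SafeGraphR::read_many_patterns(filelist = patfiles,
                                             start_date = start_dates)
}
```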
read_many_patterns appears to be having a problem handling missing values for distance_from_home when aggregating by county FIPS code.
For example, when I make a call to read_many_patterns with the below code to read weekly patterns for a single state, every other variable reads without issue, but the entire column of distance_from_home is filled with NA values.
Not every POI is missing data for distance_from_home, so this is not the expected behavior for this function. Is there any way around this?
patterns <- read_many_patterns(
  "patterns_dir",
  recursive = TRUE,
  naics_link = poi_link,
  by = c('state_fips', 'county_fips'),
  filter = 'state_fips == 34'
)
Hi,
I re-ran some old code today, and I'm wondering if the cbg_pop file has changed: the poi_cbg codes are appearing as ultra-small values (e.g., 4.960370e-314) rather than 12-digit numbers.
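Values like 4.96e-314 are the classic symptom of bytes being reinterpreted as doubles, and in any case 12-digit CBG codes are identifiers rather than quantities. A possible guard, a minimal self-contained sketch assuming you load the file yourself with data.table::fread (the column name poi_cbg is taken from the post; the tiny CSV stands in for the real file), is to force the column to character:

```r
library(data.table)

# Minimal sketch: read CBG identifiers as character so leading zeros and
# precision are preserved. The two-line CSV stands in for the real cbg_pop file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("poi_cbg,pop",
             "010010201001,1500"), tmp)

cbg_pop <- fread(tmp, colClasses = c(poi_cbg = "character"))
```

Reading as character also keeps the leading zero that a numeric read would drop.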
Any help greatly appreciated.
I am attempting to read a collection of v2 social distancing files for 2020 in the directory layout expected by read_distancing(); however, the function appears to be broken.
I have all social distancing patterns in the layout expected by this function within my current working directory, yet the function is unable to detect them and fails, as shown from this reprex:
library(SafeGraphR)
library(tidyverse)
# Start with all social distancing for 2020
setwd("Y:/Gavin/social-distancing/social-distancing/v2")
distancing <- read_distancing(
  start = lubridate::ymd('2020-01-01'),
  end = lubridate::ymd('2020-03-10')
)
#> Running read_distancing with default select and by - this will select only the device count variables, and aggregate to the county level. Change the select and by options if you don't want this. This message will be displayed only once per session.
#> [1] ".2020/01/01/"
#> Error in data.table::fread(file = target, select = select, ...): File '.2020/01/01/' does not exist or is non-readable. getwd()=='Y:/Gavin/social-distancing/social-distancing/v2'
Created on 2021-04-20 by the reprex package (v2.0.0)
Hi,
Not sure if this is a bug or intended behavior. The JSON in the second row of the input datatable is empty. If I expand the JSON with by = F and na.rm = T, the initial_rowno variable for rows 3 and 4 of the output is 2, when it should be 3. If I set na.rm = F, it becomes 3.
Obviously this issue can be avoided entirely by setting na.rm = F. Maybe it should be obvious to me why this behavior occurs, but it confused me so I thought I'd bring it up.
Thanks for all your work on this package, by the way!
patterns <- data.table::data.table(
  state_fips = c(1, 2, 3),
  cat_origin = c('{"a": "2", "b": "3"}',
                 '{}',
                 '{"a": "4", "b": "5"}')
)
> patterns
state_fips cat_origin
1: 1 {"a": "2", "b": "3"}
2: 2 {}
3: 3 {"a": "4", "b": "5"}
>
expand_cat_json(
  patterns,
  'cat_origin',
  'index',
  by = F,
  na.rm = T
)
initial_rowno cat_origin index
1: 1 2 a
2: 1 3 b
3: 2 4 a
4: 2 5 b
expand_cat_json(
  patterns,
  'cat_origin',
  'index',
  by = F,
  na.rm = F
)
initial_rowno cat_origin index
1: 1 2 a
2: 1 3 b
3: 3 4 a
4: 3 5 b
Per our conversation on Slack, it would be great if this package could process the open_hours
field from SafeGraph (see here for spec). I had hoped to write up a PR but, having compared my amateurish attempt to the existing codebase, maybe it's better if I just supply the code I put together here and you decide how to proceed.
library(data.table)
library(SafeGraphR)
library(fst)
library(magrittr)
# Load Core POI data ----
core_poi <- read_many_csvs(dir = "/data1/safegraph/core_poi/2020/11/06/11/")
# Limit to POI that give open hours
open_hours_only <- core_poi[open_hours != ""]
convert_hour_str <- function(time_str, midnight_is_zero = TRUE) {
  # Convert an %H:%M time string to numeric, e.g., "08:15" -> 8.25
  time_POSIX <- as.POSIXlt(time_str, format = "%H:%M")
  result <- hour(time_POSIX) + minute(time_POSIX) / 60
  if (!midnight_is_zero) {
    result[result == 0] <- 24
  }
  return(result)
}

convert_JSON_hours <- function(hours_clean) {
  # Convert a JSON string listing hours open and closed into a data.table
  # hours_clean <- unique_hours$open_hours_clean[96] # DEBUG
  hour_list <- jsonlite::fromJSON(hours_clean) # This takes a long, long time.
  # Keep only non-empty days
  hour_list <- hour_list[sapply(hour_list, length) > 0]
  hour_dt <- rbindlist(lapply(hour_list, as.data.table), idcol = "dow")
  setnames(hour_dt, c("V1", "V2"), c("open", "close"))
  hour_dt[, `:=`(open = convert_hour_str(open),
                 close = convert_hour_str(close, midnight_is_zero = F))]
  hour_dt
}

expand_hours <- function(dt) {
  # dt <- open_hours_only[1:10000] # DEBUG
  # To save on parsing time, get unique values of open_hours
  unique_hours <- dt[, .N, by = open_hours] %>% .[, N := NULL]
  # Remove extra escaped quotes
  unique_hours[, open_hours_clean := stringr::str_replace_all(open_hours, '\\"\\"', '\\"')]
  # Get a data.table where each obs is a row-by-dow open/close interval
  unique_hours_dt <- unique_hours[, convert_JSON_hours(open_hours_clean), by = open_hours]
  # Merge (M:M) back to the original dataset
  dt_final <- merge(dt[, .(placekey, open_hours)],
                    unique_hours_dt,
                    by = "open_hours",
                    allow.cartesian = T)
  dt_final <- dt_final[, .(placekey, dow, open, close)]
  dt_final
}
expanded_hours <- expand_hours(open_hours_only[sample(.N, 100)])
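For anyone skimming, here's a quick self-contained check of the convert_hour_str() helper above. It is copied from the snippet except that hour and minute are read straight off the POSIXlt fields, so the check runs in base R without depending on which package's hour() is attached:

```r
# Self-contained check of convert_hour_str() from the snippet above,
# using POSIXlt fields so it runs without data.table attached.
convert_hour_str <- function(time_str, midnight_is_zero = TRUE) {
  time_POSIX <- as.POSIXlt(time_str, format = "%H:%M")
  result <- time_POSIX$hour + time_POSIX$min / 60
  if (!midnight_is_zero) {
    result[result == 0] <- 24
  }
  result
}

convert_hour_str("08:15")                           # 8.25
convert_hour_str("00:00")                           # 0: midnight as an opening time
convert_hour_str("00:00", midnight_is_zero = FALSE) # 24: midnight as a closing time
```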
I've tried all the package versions back to 3.6, and none work with SafeGraphR.