SafeGraphR's People

Contributors

felixsafegraph, nickch-k

SafeGraphR's Issues

Turn on GitHub Pages

Hi there @felixsafegraph! Not sure whether it's better to reach out here or on Slack, but the SafeGraphR package is ready for beta, and there's a docsite in here now. Could you turn on GitHub Pages for the SafeGraphR repository (set to docs/)? I don't have access. Thank you!

Number of files (0) does not match number of start_dates (1) to go along with them

Hello -

I am running the following code on data downloaded directly from the SafeGraph Shop:

pitt.data <- read_shop(
  filename = "safe-graph-data.zip",
  keeplist = c("patterns", "home_panel_summary.csv"),
  by = "placekey",
  expand_int = "visitors_by_day",
  name = "visits",
  start_date = lubridate::ymd("2018-01-01"))

However, doing so results in the following error:

Error in read_many_patterns(filelist = patfiles, dir = exdir, recursive = FALSE,  : 
  Number of files (0) does not match number of start_dates (1) to go along with them.

My desired result is to have visitors by day expanded for each placekey and date (2018-01-01, 2018-01-02, and so on).

Looking forward to a response.
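
A minimal diagnostic sketch (not part of the original report): the error means the internal read_many_patterns call found zero patterns files after extraction, so a first step is to list the archive's contents and check where the patterns CSVs actually sit relative to what keeplist = "patterns" expects. The filename below is the one from the report.

# List the Shop zip's contents without extracting it, then look at the
# paths of anything patterns-related to see how the files are nested.
contents <- utils::unzip("safe-graph-data.zip", list = TRUE)
contents[grepl("patterns", contents$Name), ]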

Read_many_patterns ERROR

So I am trying to use the read_many_patterns() function; however, I keep getting this error:
Attempted to find start_date from filename but failed. The zipped files I have came directly from SafeGraph and are named like 2019-06-core_poi_patterns_part1. There are 10 zip files for June alone.
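
A possible workaround sketch (not from the original thread): if the filenames do not contain a parseable date, the start dates can be supplied by hand. The filelist, dir, and start_date argument names here are taken from the error messages and calls quoted elsewhere on this page, so double-check them against the installed version.

# Hypothetical sketch: point at the extracted June 2019 patterns files and
# give every part file the same start date instead of parsing filenames.
files <- list.files("2019-06", pattern = "core_poi_patterns")

patterns <- read_many_patterns(
  filelist = files,
  dir = "2019-06",
  start_date = rep(lubridate::ymd("2019-06-01"), length(files))
)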

read_many_patterns breaks aggregation when missing values for distance_from_home

read_many_patterns appears to be having a problem handling missing values for distance_from_home when aggregating by county FIPS code.

For example, when I call read_many_patterns with the code below to read weekly patterns for a single state, every other variable reads without issue, but the entire distance_from_home column is filled with NA values.

Not every POI is missing data for distance_from_home, so this is not the expected behavior for this function. Is there any way around this?

patterns <- read_many_patterns("patterns_dir",
                               recursive = TRUE,
                               naics_link = poi_link,
                               by = c('state_fips', 'county_fips'),
                               filter = 'state_fips == 34')
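
An illustration of the likely mechanism (not from the original report): if the county-level aggregation sums or averages without na.rm = TRUE, a single POI with a missing distance_from_home turns the whole county NA. A minimal data.table sketch of the intended behavior, using made-up POI-level values:

# Toy POI-level table: one county has a missing distance_from_home.
library(data.table)

poi_level <- data.table(
  state_fips  = 34,
  county_fips = c(1, 1, 3),
  distance_from_home = c(5000, NA, 12000)
)

# mean() without na.rm makes county 1 entirely NA...
poi_level[, .(distance_from_home = mean(distance_from_home)),
          by = .(state_fips, county_fips)]

# ...while na.rm = TRUE drops only the POIs that are missing the value.
poi_level[, .(distance_from_home = mean(distance_from_home, na.rm = TRUE)),
          by = .(state_fips, county_fips)]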

cbg_pop issue with poi_cbg codes?

Hi,
I re-ran some old code today, and I'm wondering if the cbg_pop file has changed: the poi_cbg codes are appearing as extremely small values (e.g., 4.960370e-314) rather than 12-digit codes.

Any help greatly appreciated.
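
A diagnostic sketch (not from the original report): values like 4.960370e-314 are what a 64-bit integer column looks like when printed as a double, which typically happens when the column is integer64 and the bit64 package is not attached. Assuming cbg_pop is the data set shipped with the package and poi_cbg is the column from the report:

# If this prints "integer64", the underlying values are fine; they are
# just being displayed as reinterpreted doubles without bit64 attached.
class(cbg_pop$poi_cbg)

# Attaching bit64 fixes the printing; converting to a zero-padded
# character GEOID is usually safest for census block group codes anyway.
library(bit64)
cbg_pop$poi_cbg <- stringr::str_pad(as.character(cbg_pop$poi_cbg), 12, pad = "0")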

Error: read_distancing() fails to detect files

I am attempting to read a collection of v2 social distancing files for 2020 in the directory layout expected by read_distancing(); however, the function appears to be broken.

I have all of the social distancing patterns in the layout this function expects within my current working directory, yet the function is unable to detect them and fails, as shown in this reprex:

library(SafeGraphR)
library(tidyverse)

# Start with all social distancing for 2020
setwd("Y:/Gavin/social-distancing/social-distancing/v2")

distancing <- read_distancing(
  start = lubridate::ymd('2020-01-01'),
  end = lubridate::ymd('2020-03-10')
)
#> Running read_distancing with default select and by - this will select only the device count variables, and aggregate to the county level. Change the select and by options if you don't want this. This message will be displayed only once per session.
#> [1] ".2020/01/01/"
#> Error in data.table::fread(file = target, select = select, ...): File '.2020/01/01/' does not exist or is non-readable. getwd()=='Y:/Gavin/social-distancing/social-distancing/v2'

Created on 2021-04-20 by the reprex package (v2.0.0)
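
A possible workaround sketch (not from the original reprex): the '.2020/01/01/' in the error looks like the default directory '.' pasted onto the '2020/01/01/' subfolder without a separator. If the installed version exposes a dir argument, passing the directory with an explicit trailing slash, rather than relying on setwd(), may sidestep that:

# Hypothetical: pass the social-distancing root directly, with a trailing
# slash, instead of setting the working directory first.
distancing <- read_distancing(
  start = lubridate::ymd('2020-01-01'),
  end   = lubridate::ymd('2020-03-10'),
  dir   = "Y:/Gavin/social-distancing/social-distancing/v2/"
)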

initial_rowno in expand_cat_json() is wrong if by=F, JSON empty, and na.rm = T

Hi,

Not sure if this is a bug or intended behavior. The JSON in the second row of the input data.table is empty. If I expand the JSON with by = F and na.rm = T, the initial_rowno variable for rows 3 and 4 of the output is 2, when it should be 3. If I set na.rm = F, it becomes 3.

Obviously this issue can be avoided entirely by setting na.rm = F. Maybe it should be obvious to me why this behavior occurs, but it confused me so I thought I'd bring it up.

Thanks for all your work on this package, by the way!

patterns <- data.table::data.table(state_fips = c(1,2,3),
                                   cat_origin = c('{"a": "2", "b": "3"}',
                                                  '{}',
                                                  '{"a": "4", "b": "5"}'))
> patterns
   state_fips           cat_origin
1:          1 {"a": "2", "b": "3"}
2:          2                   {}
3:          3 {"a": "4", "b": "5"}
> 
expand_cat_json(
  patterns,
  'cat_origin',
  'index',
  by = F,
  na.rm = T
)
   initial_rowno cat_origin index
1:             1          2     a
2:             1          3     b
3:             2          4     a
4:             2          5     b
expand_cat_json(
  patterns,
  'cat_origin',
  'index',
  by = F,
  na.rm = F
)
   initial_rowno cat_origin index
1:             1          2     a
2:             1          3     b
3:             3          4     a
4:             3          5     b
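
A small follow-up sketch (not from the original post): since na.rm = F preserves the numbering, initial_rowno can be used to merge the expanded values back onto the original rows via an explicit row index.

# Expand with na.rm = FALSE so initial_rowno lines up with the input,
# then join the expanded values back to the original rows.
expanded <- expand_cat_json(patterns, 'cat_origin', 'index',
                            by = FALSE, na.rm = FALSE)
patterns_idx <- data.table::copy(patterns)[, initial_rowno := .I]
merged <- merge(patterns_idx[, .(initial_rowno, state_fips)], expanded,
                by = 'initial_rowno')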

Feature request: expand open hours

Per our conversation on Slack, it would be great if this package could process the open_hours field from SafeGraph (see here for spec). I had hoped to write up a PR but, having compared my amateurish attempt to the existing codebase, maybe it's better if I just supply the code I put together here and you decide how to proceed.

library(data.table)
library(SafeGraphR)
library(fst)
library(magrittr)

# Load Core POI data ----
core_poi <- read_many_csvs(dir = "/data1/safegraph/core_poi/2020/11/06/11/")

# Limit to POI that give open hours
open_hours_only <- core_poi[open_hours != ""]

convert_hour_str <- function(time_str, midnight_is_zero = TRUE) {
    # Convert an %H:%M time string to numeric, e.g., "08:15" -> 8.25
    time_POSIX <- as.POSIXlt(time_str, format = "%H:%M")
    result <- hour(time_POSIX) + minute(time_POSIX) / 60
    if (!midnight_is_zero) {
        result[result == 0] <- 24
    }
    return(result)
}

convert_JSON_hours <- function(hours_clean) {
    # Convert a JSON string listing hours open and closed into a data.table
    # hours_clean <- unique_hours$open_hours_clean[96] # DEBUG
    
    hour_list <- jsonlite::fromJSON(hours_clean) # This takes a long, long time.
    
    # Keep only non-empty
    hour_list <- hour_list[lapply(hour_list,length)>0]
    
    hour_dt <- rbindlist(lapply(hour_list, as.data.table), idcol = "dow")
    setnames(hour_dt, c("V1", "V2"), c("open", "close"))
    hour_dt[, `:=`(open = convert_hour_str(open),
                   close = convert_hour_str(close, midnight_is_zero = F))]
    hour_dt
}

expand_hours <- function(dt) {
    # dt <- open_hours_only[1:10000] # DEBUG

    # To save on parsing time, get unique values of open_hours
    unique_hours <- dt[, .N, by = open_hours] %>% .[, N := NULL]
    
    # Remove extra escaped quotes
    unique_hours[, open_hours_clean := stringr::str_replace_all(open_hours, '\\"\\"','\\"')]
    
    # Get a data.table where each obs is row-by-dow-open/close interval
    unique_hours_dt <- unique_hours[, convert_JSON_hours(open_hours_clean), by = open_hours]
    
    # Merge (M:M) back to original dataset
    dt_final <- merge(dt[, .(placekey, open_hours)], 
                      unique_hours_dt, 
                      by = "open_hours", 
                      allow.cartesian = T)
    
    dt_final <- dt_final[, .(placekey, dow, open, close)]
    dt_final
}

expanded_hours <- expand_hours(open_hours_only[sample(.N, 100)])
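
For reference, a standalone illustration of what convert_JSON_hours() above produces. The JSON literal is a hand-written simplification of the open_hours format (day-of-week keys mapping to lists of [open, close] pairs), not an actual Core POI row:

# Illustrative only: one weekday split into two intervals, one weekday
# with a single interval, and one weekday closed (empty list).
example_hours <- '{ "Mon": [["08:00", "12:00"], ["13:00", "17:30"]], "Tue": [["08:00", "17:30"]], "Sun": [] }'

convert_JSON_hours(example_hours)
# Expected: one row per open interval, with fractional hours:
# Mon 8.0 to 12.0, Mon 13.0 to 17.5, Tue 8.0 to 17.5 (Sun dropped as closed).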
