The googleway package provides some excellent and highly versatile methods for querying and analyzing data from the Google Maps APIs.
tidygoogleway builds on the functionality in googleway with a single purpose: to provide a tidy interface to the Google Places API.
The methods in this package assume that you are starting with a dataframe/tibble of location data that you wish to enrich with data from Google Places.
You can install tidygoogleway from GitHub using the following command:
# You must have devtools installed first
devtools::install_github("joshmuncke/tidygoogleway")
To use this package you’ll need a Google Places API key. You can register this key for your R session using googleway::set_key, and it will be picked up automatically by tidygoogleway.
googleway::set_key("<YOUR API KEY>")
The add_google_places function expects a dataframe with (at minimum) fields containing the name and address of the locations you wish to add Google Places data to. It returns that dataframe with the relevant Places data appended as new columns, so it can be used in a pipe.
Often a Google Places search will return multiple results. In this case, add_google_places performs a string similarity comparison between the location name and address you provide and the values returned by Google. If you also supply latitude and longitude fields, add_google_places factors geographic distance into this calculation too.
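The matching idea can be illustrated with a small base-R sketch. This is not tidygoogleway's internal code: the helper names (string_distance, haversine_km, rank_candidates), the normalised edit distance from utils::adist, the averaging of the name and address distances, and the haversine formula are all assumptions made for illustration. Combining text and geographic distance with a geometric mean mirrors the mean_distance column described later in this README.

```r
# Hypothetical sketch of candidate matching; NOT tidygoogleway's implementation.

# Normalised edit (Levenshtein) distance in [0, 1], case-insensitive.
# 0 means the two strings are identical.
string_distance <- function(a, b) {
  d <- utils::adist(tolower(a), tolower(b))[1, ]  # 1 x length(b) matrix -> vector
  d / pmax(nchar(a), nchar(b))
}

# Great-circle distance in kilometres (haversine formula, Earth radius 6371 km).
haversine_km <- function(lat1, lng1, lat2, lng2) {
  rad <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlng <- (lng2 - lng1) * rad
  h <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlng / 2)^2
  2 * 6371 * asin(sqrt(pmin(1, h)))
}

# Score the candidates returned for one input location and sort them best-first.
# `candidates` is a data.frame with `name` and `address` columns, plus
# `lat`/`lng` columns when coordinates are used.
rank_candidates <- function(name, address, candidates, lat = NULL, lng = NULL) {
  text_d <- (string_distance(name, candidates$name) +
             string_distance(address, candidates$address)) / 2
  if (!is.null(lat) && !is.null(lng)) {
    geo_d <- haversine_km(lat, lng, candidates$lat, candidates$lng)
    candidates$score <- sqrt(text_d * geo_d)  # geometric mean of the two distances
  } else {
    candidates$score <- text_d
  }
  candidates[order(candidates$score), , drop = FALSE]
}

# Made-up candidates for illustration only.
candidates <- data.frame(
  name    = c("McDonald's", "McDonald's", "Burger King"),
  address = c("2809 Lincoln Blvd, Santa Monica, CA 90405",
              "4680 Lincoln Blvd, Los Angeles, CA 90292",
              "2809 Lincoln Blvd, Santa Monica, CA 90405"),
  stringsAsFactors = FALSE
)

best <- rank_candidates("McDonalds", "2809 N Lincoln Blvd Santa Monica, CA 90405",
                        candidates)[1, ]
```

Because a geometric mean multiplies the two distances, rescaling either one (kilometres vs metres, say) leaves the ranking unchanged. The flip side is that a perfect text match (distance 0) scores 0 regardless of geography, so a real implementation might add a small offset to guard against that.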
library(dplyr)
library(magrittr)
library(furrr)
library(purrr)
library(tidygoogleway)
# The mcdonalds dataframe contains the name and address of 11 McDonalds locations in Los Angeles
mcdonalds %>% head(5)
#> # A tibble: 5 x 2
#> name address
#> <chr> <chr>
#> 1 McDonalds 2809 N Lincoln Blvd Santa Monica, CA 90405
#> 2 McDonalds 4680 Lincoln Blvd, Los Angeles, CA 90292
#> 3 McDonalds 2457 Lincoln Blvd, Venice, CA 90291
#> 4 McDonalds 1540 2nd Ave, Santa Monica, CA 90405
#> 5 McDonalds 2902 West Pico Blvd, Santa Monica, CA 90405
# Now add Google Places data to our dataframe
enriched <- mcdonalds %>% add_google_places(name, address)
enriched %>% select(name, address, google_place_id, google_rating)
#> # A tibble: 11 x 4
#> name address google_place_id google_rating
#> <chr> <chr> <chr> <dbl>
#> 1 McDonal… 2809 N Lincoln Blvd Santa … ChIJG1-i-tm6woARShGW… 3.6
#> 2 McDonal… 4680 Lincoln Blvd, Los Ang… ChIJsWoNelHBwoARe6bL… 3.6
#> 3 McDonal… 2457 Lincoln Blvd, Venice,… ChIJo_SkgY26woARR0FJ… 3.5
#> 4 McDonal… 1540 2nd Ave, Santa Monica… ChIJIzmJms-kwoARsrO3… 3.6
#> 5 McDonal… 2902 West Pico Blvd, Santa… ChIJlaqRZhe7woARwM2i… 3.5
#> 6 McDonal… 2712 Santa Monica Blvd, Sa… ChIJEc3LTka7woARUaiX… 3.5
#> 7 McDonal… 11300 National Blvd, Los A… ChIJM7ZVgq67woARCeiE… 3.7
#> 8 McDonal… 10623 Venice Blvd, Los Ang… ChIJZ295STC6woARhE1O… 3.5
#> 9 McDonal… 3571 Rosecrans Ave, Hawtho… ChIJL_usBci1woARxjKJ… 3.8
#> 10 McDonal… 15810 Crenshaw Blvd, Garde… ChIJgzc3DaG1woARezwC… 3.9
#> 11 McDonal… 101 W Manchester Ave, Los … ChIJMb-tCr_JwoARwjL-… 3.6
By default, only the best matching location is returned (so the number of rows in will be the same as the number of rows out). If you wish to override this behaviour and return multiple results, use .keep_all = TRUE.
Note that if you set .keep_all = TRUE, you may end up with more rows than you started with. These can be filtered using the mean_distance column (geometric mean of geo-distance and string distance) or the google_result_number column (the ordering of results from the Google Places API).
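Filtering a .keep_all = TRUE result back to one row per input location might look like the following dplyr sketch. The tibble is made-up data shaped like such a result (only the columns used here are included); the mean_distance and google_result_number column names come from the description above, while grouping on name and address assumes those two columns uniquely identify each input row.

```r
library(dplyr)

# Made-up stand-in for a `.keep_all = TRUE` result: two Google candidates
# for each of two hypothetical input locations.
keep_all_result <- tibble(
  name                 = "McDonalds",
  address              = c("1 Example St", "1 Example St", "2 Example Ave", "2 Example Ave"),
  google_result_number = c(1, 2, 1, 2),
  mean_distance        = c(0.30, 0.10, 0.05, 0.40)
)

# Option 1: keep the closest match (lowest mean_distance) per input location.
best_by_distance <- keep_all_result %>%
  group_by(name, address) %>%
  slice_min(mean_distance, n = 1, with_ties = FALSE) %>%
  ungroup()

# Option 2: trust Google's own ordering and keep its first result.
top_google <- keep_all_result %>%
  filter(google_result_number == 1)
```

The two options can disagree: for "1 Example St" the lowest mean_distance belongs to Google's second result, which is exactly the case where re-ranking on your own fields is useful.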
Often for these kinds of use cases you are iterating over a large number of locations. To speed this process up (and provide progress visibility), add_google_places utilizes the furrr library.
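N.B. In order to make use of the parallel processing capabilities you must set a parallel plan, e.g. plan(multisession), prior to running the add_google_places command. multisession works on Windows, macOS and Linux; plan(multiprocess), which older code often used here, is deprecated in recent versions of the future package.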
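The effect of setting a plan can be seen with a small self-contained furrr sketch. It does not call tidygoogleway or the Google API: fake_lookup and the example addresses are hypothetical stand-ins for a single Places lookup. With tidygoogleway you would set the plan the same way and then call add_google_places as usual.

```r
library(future)
library(furrr)

# Choose a parallel backend before calling anything that uses furrr.
# multisession runs background R sessions and works on Windows, macOS and Linux.
plan(multisession, workers = 2)

# Hypothetical stand-in for a single Places lookup (no network call is made).
fake_lookup <- function(address) paste("looked up:", address)

addresses <- c("1 Example St", "2 Example Ave", "3 Example Rd")

# future_map_chr() distributes the calls across the workers set by plan().
results <- future_map_chr(addresses, fake_lookup)

# Switch back to sequential processing when finished.
plan(sequential)
```

Without the plan() call, furrr falls back to the default sequential plan, so the code still runs but without any parallel speed-up.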