tidygeocoder's Introduction

Hi there 👋

  • 🔭 I'm a data scientist at Booz Allen where I work on NLP applications in the healthcare space.
  • 🌎 I maintain the R package tidygeocoder.
  • 📷 I also enjoy photography and use darktable for photo editing. You can find some of my pictures on flickr and instagram.

tidygeocoder's People

Contributors

cambonj, chris31415926535, dieghernan, dpprdan, jessecambon, ottothecow, twesleyb

tidygeocoder's Issues

Is this compatible with R on Mac OS X Mojave?

I just updated to Mojave, and I'm running RStudio Version 1.3.929. I've been able to install, load, and use other packages -- but both install.packages("tidygeocoder") and devtools::install_github("jessecambon/tidygeocoder") produce the following errors:

`> install.packages("tidygeocoder")
also installing the dependencies ‘lwgeom’, ‘rgeos’, ‘tmaptools’

Warning in install.packages :
  unable to access index for repository https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6:
  cannot open URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/PACKAGES'
Packages which are only available in source form, and may need compilation of
  C/C++/Fortran: ‘lwgeom’ ‘rgeos’
Do you want to attempt to install these from sources? (Yes/no/cancel) Yes
installing the source packages ‘lwgeom’, ‘rgeos’, ‘tmaptools’, ‘tidygeocoder’

trying URL 'https://cran.rstudio.com/src/contrib/lwgeom_0.2-1.tar.gz'
Content type 'application/x-gzip' length 498434 bytes (486 KB)
==================================================
downloaded 486 KB

trying URL 'https://cran.rstudio.com/src/contrib/rgeos_0.5-2.tar.gz'
Content type 'application/x-gzip' length 258710 bytes (252 KB)
==================================================
downloaded 252 KB

trying URL 'https://cran.rstudio.com/src/contrib/tmaptools_2.0-2.tar.gz'
Content type 'application/x-gzip' length 86373 bytes (84 KB)
==================================================
downloaded 84 KB

trying URL 'https://cran.rstudio.com/src/contrib/tidygeocoder_0.2.5.tar.gz'
Content type 'application/x-gzip' length 43627 bytes (42 KB)
==================================================
downloaded 42 KB

* installing *source* package ‘lwgeom’ ...
** package ‘lwgeom’ successfully unpacked and MD5 sums checked
** using staged installation
configure: CC: clang
configure: CXX: clang++
./configure: line 2073: pkg-config: command not found
checking for gcc... clang
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether clang accepts -g... yes
checking for clang option to accept ISO C89... none needed
checking how to run the C preprocessor... clang -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... rm: conftest.dSYM: is a directory
rm: conftest.dSYM: is a directory
yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking proj.h usability... no
checking proj.h presence... no
checking for proj.h... no
checking proj_api.h usability... no
checking proj_api.h presence... no
checking for proj_api.h... no
configure: error: neither proj.h nor proj_api.h were found.
ERROR: configuration failed for package ‘lwgeom’
* removing ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library/lwgeom’
Warning in install.packages :
  installation of package ‘lwgeom’ had non-zero exit status
* installing *source* package ‘rgeos’ ...
** package ‘rgeos’ successfully unpacked and MD5 sums checked
** using staged installation
configure: CC: clang
configure: CXX: clang++
configure: rgeos: 0.5-2
checking for /usr/bin/svnversion... yes
configure: svn revision: 621
checking for geos-config... no
no
configure: error: geos-config not found or not executable.
ERROR: configuration failed for package ‘rgeos’
* removing ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library/rgeos’
Warning in install.packages :
  installation of package ‘rgeos’ had non-zero exit status
ERROR: dependencies ‘lwgeom’, ‘rgeos’ are not available for package ‘tmaptools’
* removing ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library/tmaptools’
Warning in install.packages :
  installation of package ‘tmaptools’ had non-zero exit status
ERROR: dependency ‘tmaptools’ is not available for package ‘tidygeocoder’
* removing ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library/tidygeocoder’
Warning in install.packages :
  installation of package ‘tidygeocoder’ had non-zero exit status

The downloaded source packages are in
	‘/private/var/folders/bs/rh_s78rx4p5dg87qc83cdsc80000gn/T/RtmpdPdzzl/downloaded_packages’`

and

`> devtools::install_github("jessecambon/tidygeocoder")
Downloading GitHub repo jessecambon/tidygeocoder@master
These packages have more recent versions available.
It is recommended to update all of them.
Which would you like to update?

1: All                          
2: CRAN packages only           
3: None                         
4: dplyr (0.8.3 -> 0.8.5) [CRAN]
5: units (0.6-5 -> 0.6-6) [CRAN]
6: sf    (0.8-0 -> 0.9-0) [CRAN]

Enter one or more numbers, or an empty line to skip updates:
1
dplyr     (0.8.3 -> 0.8.5) [CRAN]
tmaptools (NA    -> 2.0-2) [CRAN]
units     (0.6-5 -> 0.6-6) [CRAN]
sf        (0.8-0 -> 0.9-0) [CRAN]
lwgeom    (NA    -> 0.2-1) [CRAN]
rgeos     (NA    -> 0.5-2) [CRAN]
Installing 6 packages: dplyr, tmaptools, units, sf, lwgeom, rgeos
Error: Failed to install 'tidygeocoder' from GitHub:
  (converted from warning) unable to access index for repository https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6:
  cannot open URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/PACKAGES'`

It looks like it is trying to reach the CRAN el-capitan binaries, which target an older version of Mac OS X, not Mojave. Is that my mistake? Do I need to change something? All other packages are working, and I've tried changing my repo manually to no avail. Thank you!

Add TomTom support

Following #62 , I have implemented TomTom:

See the checklist below:

Data

  • data-raw/api_reference.R updated
  • Modify R/data.R

Sandbox

  • Added sandbox/query_debugging/tomtom_test.R
  • Added sandbox/reverse_queries/tomtom_reverse.R
  • Sandbox for TomTom batch added
  • Sandbox for TomTom batch reverse added

Code

  • R/utils.R - Utils added for TomTom
  • R/query_factory.R updated
  • R/geo_methods.R updated
  • R/geo.R updated
  • R/reverse_geo.R updated

Batch

  • R/batch_geocoding.R updated
  • R/reverse_batch_geocoding.R updated

Testing

Misc

  • Update README
  • Fix on R/reverse_geo.R to avoid errors on no_query = TRUE

I am polishing the PR with a final, large batch geocoding test. Overall, my experience is that TomTom runs smoothly and fast, and I think it could be a great addition to the package.

geocode with method census returns NA if match_indicator = "Tie"

This is the current code that I am running.

          city = res_city_desc,
          state = state_cd,
          postalcode = zip_code,
          method = 'census', 
          full_results = TRUE, 
          return_type = 'geographies',
          unique_only= FALSE,
          flatten = FALSE)

The issue is that if `match_indicator = "Tie"` (so there seems to be a non-unique match), the returned result nevertheless contains only missing values (`NA`) for all geolocation information, just as if `match_indicator = "No_Match"`.

Error: All nested columns must have the same number of elements.

Thank you for this package. It looks super easy and very cool.

Data

My data set has the following address column:

df <- sdata %>% 
  mutate(address = paste0(city, ", ", province_state, ", ", country)) 

> df %>% select(address)
# A tibble: 23 x 1
   address                          
   <chr>                            
 1 Ahmedabad, Gujarat, India        
 2 Bengaluru, Karnataka, India      
 3 Mumbai, Maharashtra, India       
 4 Ashoknagar, Madhya pradesh, India
 5 Ahmedabad, Gujarat, India        
 6 Vijayawada, Andhra Pradesh, India
 7 New Delhi, Delhi, India          
 8 Hyderabad, Telangana, India      
 9 Khambhat, Gujarat, India         
10 Amritsar, punjab, India          
# … with 13 more rows

Code

I am using the following code to do the geocoding:
df <- df %>% geocode(address, method = "osm")

Error:

No results found for "Khajoori khas, New Delhi, India".Error: All nested columns must have the same number of elements.

  1. Is there a way to skip the result where a warning is generated?
  2. Regardless of the above warning, what does this error mean?
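
On the first question, one possible workaround is to geocode each address separately and catch failures, so a single bad address yields NA coordinates instead of stopping the run. This is only a sketch: `fake_geocoder` and `safe_geocode` are hypothetical names, and the stand-in geocoder does no network access.

```r
# Hypothetical stand-in for a single-address geocoder call; it fails for one
# address so the error handling below can be exercised without a network.
fake_geocoder <- function(address) {
  if (grepl("Khajoori", address)) stop("No results found")
  data.frame(address = address, lat = 1, long = 2)
}

# Geocode one address at a time and return NA coordinates when a call fails.
safe_geocode <- function(addresses, geocoder = fake_geocoder) {
  rows <- lapply(addresses, function(a) {
    tryCatch(
      geocoder(a),
      error = function(e) data.frame(address = a, lat = NA_real_, long = NA_real_)
    )
  })
  do.call(rbind, rows)
}

result <- safe_geocode(c("Mumbai, Maharashtra, India",
                         "Khajoori khas, New Delhi, India"))
```

The failed address stays in the output with NA coordinates, so the result keeps one row per input address.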

Installation failed due to units/udunits2 dependency

Hi Jesse,
I'm excited to try your package, but I'm having some trouble.
I don't think this is an issue with your package, but with its dependency udunits2.
My installation of tidygeocoder always fails:
install.packages('tidyrgeocoder') produces the error:

ERROR: configuration failed for package ‘udunits2’.

So, I need udunits2. The error message prompts:

If the udunits2 library is installed in a non-standard location
use --configure-args='--with-udunits2-lib=/usr/local/lib'

I made sure that I have udunits2 installed, which udunits2:

/usr/bin/udunits2

So, I tried installing again, specifying my udunits2 location:

install.packages("units",configure.args="--with-udunits2-lib=/usr/bin/udunits2")

Forgive me, I know this is not a problem with your package.

Any help is appreciated.

-Tyler

Installing package

I tried to install tidygeocoder in R:

install.packages("tidygeocoder",dependencies = TRUE)
but received:
ERROR: this R is version 3.4.3, package 'tidygeocoder' requires R >= 3.20

It appears to be requesting R version 3.20 ("3.twenty"), which does not exist. I have also tried install_github and a source install from GitHub directly, but I get the same error. Am I missing something obvious to resolve this?
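
For context, R compares version numbers component by component as integers, so "3.20" is read as minor version 20, not as 3.2.0. A small base R check illustrates this; that the intended minimum was 3.2.0 is an assumption here, not something the error message confirms.

```r
# R compares version strings component by component, as integers.
required  <- numeric_version("3.20")   # parsed as major 3, minor 20
installed <- numeric_version("3.4.3")  # major 3, minor 4, patch 3

meets_stated   <- installed >= required                  # FALSE: minor 4 < minor 20
meets_intended <- installed >= numeric_version("3.2.0")  # TRUE: minor 4 > minor 2
```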

r package version

NA results for census

geo(street = "1600 Pennsylvania Ave NW", city = "Washington", state = "DC", method = "census", verbose = T)
Number of Unique Addresses: 1
Querying API URL: https://geocoding.geo.census.gov/geocoder/locations/address
Passing the following parameters to the API:
street : "1600 Pennsylvania Ave NW"
city : "Washington"
state : "DC"
format : "json"
benchmark : "Public_AR_Current"
vintage : "Current_Current"

No results found
A tibble: 1 x 5
street city state lat long

1 1600 Pennsylvania Ave NW Washington DC NA NA

geo('1600 Pennsylvania Ave NW Washington, DC')

A tibble: 1 x 3
address lat long

1 1600 Pennsylvania Ave NW Washington, DC NA NA

Errors when geocoding

I'm getting the following output errors when trying to geocode

I'm running the following line:
tidygeocoder::geocode(addr, lat=latitude, long=longitude, method = "cascade")

Opening and ending tag mismatch: meta line 6 and head
Opening and ending tag mismatch: br line 32 and div
Opening and ending tag mismatch: div line 21 and header
Entity 'times' not defined
Opening and ending tag mismatch: input line 116 and div
Opening and ending tag mismatch: input line 120 and div
Opening and ending tag mismatch: input line 127 and div
Opening and ending tag mismatch: input line 137 and div
Opening and ending tag mismatch: input line 136 and div
Opening and ending tag mismatch: input line 144 and div
Opening and ending tag mismatch: div line 139 and form
Opening and ending tag mismatch: input line 135 and div
AttValue: " or ' expected
attributes construct error
Couldn't find end of Start Tag div line 155
EntityRef: expecting ';'
EntityRef: expecting ';'
Opening and ending tag mismatch: input line 134 and div
Entity 'copy' not defined
Opening and ending tag mismatch: br line 214 and p
Opening and ending tag mismatch: p line 213 and footer
Opening and ending tag mismatch: footer line 209 and body
Opening and ending tag mismatch: input line 133 and html
Premature end of data in tag input line 132
Premature end of data in tag div line 131
Premature end of data in tag div line 129
Premature end of data in tag div line 125
Premature end of data in tag form line 124
Premature end of data in tag div line 119
Premature end of data in tag div line 115
Premature end of data in tag div line 111
Premature end of data in tag header line 20
Premature end of data in tag body line 18
Premature end of data in tag head line 3
Premature end of data in tag html line 2
Error: 1: Opening and ending tag mismatch: meta line 6 and head
2: Opening and ending tag mismatch: br line 32 and div
3: Opening and ending tag mismatch: div line 21 and header
4: Entity 'times' not defined
5: Opening and ending tag mismatch: input line 116 and div
6: Opening and ending tag mismatch: input line 120 and div
7: Opening and ending tag mismatch: input line 127 and div
8: Opening and ending tag mismatch: input line 137 and div
9: Opening and ending tag mismatch: input line 136 and div
10: Opening and ending tag mismatch: input line 144 and div
11: Opening and ending tag mismatch: div line 139 and form
12: Opening and ending tag mismatch: input line 135 and div
13: AttValue: " or ' expected
14: attributes construct error
15: Couldn't find end of Start Tag div line 155
16: EntityRef: expecting ';'
17: EntityRef: expecting ';'
18: Opening and ending tag mismatch: input line 134 and div
19: Entity 'copy' not defined
20: Opening and ending tag mismatch: br line 214 and p
21: O

Issue with custom_query and geocodio

First, thank you for the excellent package.

I am running into an issue where custom_query won't pull the requested fields from the geocodio API.

Reproducible example requesting state legislative districts or Census tracts via the field keys stateleg and tract_code:


library(tidygeocoder)
library(tidyverse)

Sys.setenv(GEOCODIO_API_KEY = "INSERT YOUR KEY")

some_addresses <- tribble(
                       ~name,                  ~addr,
                       "White House",          "1600 Pennsylvania Ave, Washington, DC",
                       "Transamerica Pyramid", "600 Montgomery St, San Francisco, CA 94111",     
                       "Willis Tower",         "233 S Wacker Dr, Chicago, IL 60606"                                  
                       )

test <- some_addresses %>% geocode(
                                addr, method = "geocodio", 
                                full_results = TRUE,
                                custom_query = list(fields = 'stateleg')
                                )

test2 <- some_addresses %>% geocode(
                                 addr, method = "geocodio", 
                                 full_results = TRUE,
                                 custom_query = list(fields = 'tract_code')
                                 )

My expectation was that this would give me the information contained within the stateleg and tract_code keys from geocodio. This is the link to the relevant API documentation: https://www.geocod.io/docs/?shell#fields.

I modeled this on the example in the tidygeocoder documentation for getting different vintages of census data (i.e., list(vintage = 'Current_Census2010')).

unique_only causes error in vignette

Error in tidygeocoder vignette:

duplicate_addrs %>%
  geocode(singlelineaddress, unique_only = TRUE)
Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 7, 2
14.
stop(gettextf("arguments imply differing number of rows: %s", paste(unique(nrows), collapse = ", ")), domain = NA)
13.
data.frame(..., check.names = FALSE)
12.
cbind(deparse.level, ...)
11.
cbind(.tbl, results) at geocode.R#78
10.
tibble::as_tibble(cbind(.tbl, results)) at geocode.R#78
9.
geocode(., singlelineaddress, unique_only = TRUE)
8.
function_list[[k]](value)
7.
withVisible(function_list[[k]](value))
6.
freduce(value, `_function_list`)
5.
`_fseq`(`_lhs`)
4.
eval(quote(`_fseq`(`_lhs`)), env, env)
3.
eval(quote(`_fseq`(`_lhs`)), env, env)
2.
withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
1.
duplicate_addrs %>% geocode(singlelineaddress, unique_only = TRUE)

geocode() misaligns results when limit > 1

Since results and addresses are combined with cbind(), limit > 1 can cause addresses to be misaligned to results when using geocode().

In this example, address is from the input dataset while display_name is from the geocoder results:

library(tidygeocoder)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
tibble(address = c('Tokyo', 'Paris', 'Rome')) %>%
  geocode(address, method = 'osm', limit = 3, full_results = TRUE) %>%
  select(address, display_name)
#>   address                                                        display_name
#> 1   Tokyo                                                        東京都, 日本
#> 2   Paris 東京, 鍛冶橋通り, 丸の内2, 丸の内, 千代田区, 東京都, 100-0005, 日本
#> 3    Rome             東京, 丸の内1, 丸の内, 千代田区, 東京都, 100-0005, 日本
#> 4   Tokyo                 Paris, Île-de-France, France métropolitaine, France
#> 5   Paris                 Paris, Île-de-France, France métropolitaine, France
#> 6    Rome                    Paris, Lamar County, Texas, 75460, United States
#> 7   Tokyo                                  Roma, Roma Capitale, Lazio, Italia
#> 8   Paris   Rome, City of Rome, Oneida County, New York, 13440, United States
#> 9    Rome                          Rome, Floyd County, Georgia, United States

Created on 2021-03-18 by the reprex package (v1.0.0)
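
The misalignment can be reproduced without a geocoder, and one possible fix is to join on a key instead of binding columns. This is a sketch with mock data: the `query` column is a hypothetical key identifying which input address produced each result, not a column the package is known to return.

```r
# Input addresses and mock geocoder results (three matches per query).
# The `query` column is a hypothetical key identifying the originating address.
input <- data.frame(address = c("Tokyo", "Paris", "Rome"))
results <- data.frame(
  query = rep(c("Tokyo", "Paris", "Rome"), each = 3),
  display_name = paste0(rep(c("Tokyo", "Paris", "Rome"), each = 3), " match ", 1:3)
)

# cbind() recycles the 3 input rows across the 9 result rows, so rows misalign.
misaligned <- cbind(input, results)

# Joining on the key keeps every result attached to its own address.
aligned <- merge(input, results, by.x = "address", by.y = "query")
```

In `misaligned`, the second row pairs the address Paris with a Tokyo result, mirroring the output above; in `aligned`, every display_name belongs to its address.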

Broaden Testing

Incorporate more functions into testing such as package_addresses, unpackage_addresses, and get_api_query.

Installing tidygeocoder - correct R version?

Do you have any suggestions on which version of R to use to install tidygeocoder? I've attempted installing it with R/3.5.1 and R/3.2.0, but got error messages both times saying the package was unavailable. Thank you!

CRAN Issue - example fails

Need to fix an issue for CRAN: the example in R/geocode.R fails to run with the example() function because of a missing library(dplyr) statement. Sample example commands are below.

example("geocode",package="tidygeocoder")
example("geo_census",run.donttest=TRUE)
example("geo_osm",run.dontrun=TRUE)

Add support for Mapbox service

Following #62 , I have implemented Mapbox rebased on reverse-geocoding branch.

See the checklist below:

Data

  • data-raw/api_reference.R updated
  • Modify R/data.R

Sandbox

  • Added sandbox/query_debugging/mapbox_test.R
  • Added sandbox/reverse_queries/mapbox_reverse.R

Code

  • R/utils.R - Utils added for Mapbox
  • R/query_factory.R updated
  • R/geo_methods.R updated
  • R/geo.R updated
  • R/reverse_geo.R updated

Testing

Some Notes not related with the PR itself:

#> * checking CRAN incoming feasibility ... NOTE
#> Version contains large components (1.0.2.9000)

And this:

Found the following (possibly) invalid URLs:
  URL: https://osm.org/copyright (moved to https://www.openstreetmap.org/copyright)
    From: README.md
    Status: 200
    Message: OK

I checked and that url is generated by the response of the reverse_geocode on the README (see table on https://github.com/dieghernan/tidygeocoder/blob/master/README.md#usage), so not related with the package.

Misc

  • Update README

There are two additional topics on which I would like your feedback:

Integrate Github Action

GitHub Actions can run R CMD check on a package repo across different platforms and R versions. I use this on my packages; you can see more at https://github.com/r-lib/actions/blob/master/examples/README.md

I included my version while developing. This GitHub Action would run once a week and after every commit or PR on the master or main branches. Checks are performed on devel, release, and oldrel on Linux, macOS, and Windows (as on CRAN):
https://github.com/dieghernan/tidygeocoder/actions/runs/620783144

Before removing it I just wanted to check if you would like to integrate this on your repo.

About batch geocoding

This is possible on Mapbox, but only under the commercial endpoint mapbox.places-permanent

https://docs.mapbox.com/api/search/geocoding/#batch-geocoding

I do not have that kind of access, so I didn't include it. I asked the Mapbox team whether there is any way I can get a limited dev account for checking, and I will let you know.

If this is strictly required, what I can do is include the capability using a mock answer of the API as an example (see here), but I won't be able to check it unless I get a permanent API key (maybe someone with that kind of key could help, @maxachis?).

Regards

Parallelize geocoding

Hey @jessecambon,
Do you think it would be possible to parallelize the geocoding task?
Or I might just be impatient...

* Project '/mnt/d/projects/tidygeocoder' loaded. [renv 0.10.0]

Evaluating time needed to geocode 100 addresses...
Unit: seconds
                     expr      min       lq     mean   median       uq      max
 geocode(x100_rows, ADDR) 14.76898 14.96446 15.15467 15.15994 15.34751 15.53508
 neval
     3

Predicted time to encode 88,303 addresses: 3.717 hours.
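
The prediction looks like a linear extrapolation from the mean benchmark time (an inference from the numbers, not something stated above). The arithmetic can be checked as follows:

```r
mean_seconds_per_100 <- 15.15467  # mean from the benchmark above
n_addresses <- 88303

# Seconds per address, scaled to all addresses, converted to hours.
predicted_hours <- mean_seconds_per_100 / 100 * n_addresses / 3600
round(predicted_hours, 3)  # 3.717
```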

Deduplicate addresses passed to geocode()

Currently, every address given to the geocode() function is passed to the geocoder services (unless the address is deemed to be invalid or blank).

If duplicate addresses are passed, the geocode() function should only pass unique addresses to the geocoder services. The geocoded coordinates for these unique addresses can then be attached back to the original dataset.
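
A minimal base R sketch of this idea, assuming a hypothetical `mock_geocode` function in place of a real geocoder service: query only the unique addresses, then use match() to map the results back to every original row.

```r
# Hypothetical stand-in for a geocoder service call; counts how often it runs.
calls <- 0
mock_geocode <- function(address) {
  calls <<- calls + 1
  n <- as.numeric(nchar(address))
  c(lat = n, long = -n)  # fake coordinates derived from the string length
}

addresses <- c("Tokyo", "Paris", "Tokyo", "Rome", "Paris")

# Query each distinct address once.
unique_addresses <- unique(addresses)
coords <- t(vapply(unique_addresses, mock_geocode, c(lat = 0, long = 0)))

# Map the unique results back onto the original rows, preserving their order.
idx <- match(addresses, unique_addresses)
geocoded <- data.frame(address = addresses,
                       lat = unname(coords[idx, "lat"]),
                       long = unname(coords[idx, "long"]))
```

Here five input rows trigger only three service calls, and the output still has one row per input address.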

Extract errors from google geocoder

Extract error messages from the google geocoder so the user can see what the issue is (instead of just getting NA results). For instance, my billing was not enabled and the error can be seen in the results object:

addr <- 'Tokyo, Japan'
url_base  <- "https://maps.googleapis.com/maps/api/geocode/json"

library(httr)
library(jsonlite)

soup <- httr::GET(url = url_base, 
                  query = list(address = addr, 
                               key = tidygeocoder:::get_key("google")))

raw_results <- jsonlite::fromJSON(httr::content(soup, as = 'text', encoding = "UTF-8"))
$error_message
[1] "You must enable Billing on the Google Cloud Project at https://console.cloud.google.com/project/_/billing/enable Learn more at https://developers.google.com/maps/gmp-get-started"

$results
list()

$status
[1] "REQUEST_DENIED"
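
One way this could look, as a sketch only: check the status field of the parsed response and surface error_message as a warning. The function name `report_api_error` is hypothetical, and treating "OK" and "ZERO_RESULTS" as the non-error statuses is an assumption of this example.

```r
# Parsed response mirroring the structure shown above (message shortened).
raw_results <- list(
  error_message = "You must enable Billing on the Google Cloud Project",
  results = list(),
  status = "REQUEST_DENIED"
)

# Warn with the service's own message when the status signals a failure.
# Treating "OK" and "ZERO_RESULTS" as non-errors is an assumption of this sketch.
report_api_error <- function(response) {
  ok_statuses <- c("OK", "ZERO_RESULTS")
  if (!is.null(response$status) && !(response$status %in% ok_statuses)) {
    msg <- if (is.null(response$error_message)) "no message" else response$error_message
    warning("Geocoder returned status ", response$status, ": ", msg, call. = FALSE)
    return(FALSE)
  }
  TRUE
}

succeeded <- suppressWarnings(report_api_error(raw_results))
```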

Add HERE support

Following #62 , I have implemented HERE:

See the checklist below:

Data

  • data-raw/api_reference.R updated
  • Modify R/data.R

Sandbox

  • Added sandbox/query_debugging/here_test.R
  • Added sandbox/reverse_queries/here_reverse.R
  • Sandbox for HERE batch added
  • Sandbox for HERE batch reverse added

Code

  • R/utils.R - Utils added for HERE
  • R/query_factory.R updated
  • R/geo_methods.R updated
  • R/geo.R updated
  • R/reverse_geo.R updated

Batch

  • R/batch_geocoding.R updated
  • R/reverse_batch_geocoding.R updated

Testing

Misc

  • Update README

About HERE batch geocoding service

Some notes on batch geocoding on HERE:

  • The API is asynchronous, so the process is: send POST > wait until the status is completed (can be checked via GET) > retrieve via GET. This makes the batch geocoding code a bit longer.
  • When sending the POST, the result returns a RequestID, which is used for the subsequent GET requests. The good point is that you can retrieve previous runs using that RequestID and skip the POST. This has been implemented.
  • Jobs are deleted after 30 days (link).
  • Batch geocoding results are in a different format than single results, and the fields in batch results need to be selected via the POST request. I have selected what I consider the most relevant, but those can be modified by the user via a custom query.
  • Other parts of the code are discretionary, such as the seconds between GET status calls, some warnings, etc. Some feedback would be appreciated.
  • HERE batch geocoding is really slow. It can be useful for very large requests (limit of 1,000,000, plus the possibility of retrieving previous jobs as mentioned before), but I decided to use batch mode only if the user requests it. Note that this is not the default behaviour for the rest of the batch geocoders (batch by default if n>1 and mode is not single), but from a user perspective it can be frustrating to wait 1 minute to geocode two addresses.
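
The wait step of the asynchronous flow above can be sketched as a simple polling loop. Everything here is illustrative: `wait_for_job` and `check_status` are hypothetical names, the "completed" status string is a placeholder rather than the value the service necessarily returns, and a mock replaces the GET status call.

```r
# Poll a job until it reports completion or the attempts run out.
# `check_status` is a hypothetical function returning the job status string;
# "completed" is a placeholder value, not necessarily what the service returns.
wait_for_job <- function(check_status, interval = 1, max_tries = 10) {
  for (i in seq_len(max_tries)) {
    if (identical(check_status(), "completed")) return(TRUE)
    Sys.sleep(interval)
  }
  FALSE
}

# Mock status function that completes on the third call.
n <- 0
mock_status <- function() {
  n <<- n + 1
  if (n >= 3) "completed" else "running"
}

done <- wait_for_job(mock_status, interval = 0)
```

Capping the number of attempts keeps a stuck job from blocking the session indefinitely; the interval is the discretionary "seconds between GET status calls" mentioned above.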

I would be glad to hear your feedback, especially on the batch geocoding implementation.

Regards

Add sleep argument

Add a sleep argument for the OSM geocoder to avoid usage limit issues.
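
A rough sketch of how such a delay might work, assuming hypothetical names `query_with_delay`, `query_fun`, and `min_time` (none of these are the package's API): each query is padded with Sys.sleep() so that it takes at least a minimum amount of time.

```r
# Run `query_fun` on each address, waiting at least `min_time` seconds per query.
# `query_fun` and `min_time` are hypothetical names, not the package's API.
query_with_delay <- function(addresses, query_fun, min_time = 1) {
  lapply(addresses, function(a) {
    start <- Sys.time()
    out <- query_fun(a)
    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    # Sleep only for the remainder, so slow queries are not delayed further.
    if (elapsed < min_time) Sys.sleep(min_time - elapsed)
    out
  })
}

# toupper() stands in for a real geocoder query here.
res <- query_with_delay(c("Tokyo", "Paris"), toupper, min_time = 0.1)
```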

Geocoding results from Google may contain more than 1 result

I believe this is a problem with the Google geocoding API rather than with this package. The Google geocoding API sometimes, strangely, returns more than 1 result in the response.

library(tidygeocoder)

stores = data.frame(
  STORE_ID = c(1,2),
  address = c(
    "Part G/F & Basement, 45-53 Austin Road, Kowloon",
    "Shop A & B, GF Coble Court, 127 – 139 Ap Lei Chau Main Street, Aberdeen, HK")
)

tidygeocoder::geocode(stores[2,], address = "address", method = "google", verbose = TRUE)
#> Number of Unique Addresses: 1
#> Querying API URL: https://maps.googleapis.com/maps/api/geocode/json
#> Passing the following parameters to the API:
#> address : "Shop A & B, GF Coble Court, 127 – 139 Ap Lei Chau Main Street, Aberdeen, HK"
#> key : "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
#> 
#> Query completed in: 0.2 seconds
#> 
#> Warning in data.frame(..., check.names = FALSE): row names were found from a
#> short variable and have been discarded
#> # A tibble: 2 x 4
#>   STORE_ID address                                                     lat  long
#>      <dbl> <chr>                                                     <dbl> <dbl>
#> 1        2 Shop A & B, GF Coble Court, 127 – 139 Ap Lei Chau Main S…  22.2  114.
#> 2        2 Shop A & B, GF Coble Court, 127 – 139 Ap Lei Chau Main S…  22.2  114.

When I pass the whole data frame to the function, the number of rows in the results does not match the original data frame, so an error is thrown.

tidygeocoder::geocode(stores, address = "address", method = "google", verbose = TRUE)
#> Number of Unique Addresses: 2
#> Executing single address geocoding...
#> Number of Unique Addresses: 1
#> Querying API URL: https://maps.googleapis.com/maps/api/geocode/json
#> Passing the following parameters to the API:
#> address : "Part G/F & Basement, 45-53 Austin Road, Kowloon"
#> key : "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
#> 
#> Query completed in: 0.1 seconds
#> 
#> Number of Unique Addresses: 1
#> Querying API URL: https://maps.googleapis.com/maps/api/geocode/json
#> Passing the following parameters to the API:
#> address : "Shop A & B, GF Coble Court, 127 – 139 Ap Lei Chau Main Street, Aberdeen, HK"
#> key : "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
#> 
#> Query completed in: 0.1 seconds
#> 
#> Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 2, 3

Created on 2021-03-15 by the reprex package (v0.3.0)

Forward geocoding stops when the LocationIQ API responds with an HTTP error

Hello,

When an address is badly encoded, LocationIQ responds with a 404 response code.

Not Found (HTTP 404).Error in if (nrow(lat_lng) == 0 | ncol(lat_lng) != 2) return(NA_result) : argument is of length zero

The problem is that it stops the geocode() process and further entries are not geocoded.
When you have a table of 1000 entries, that's annoying.
It would be great if the function skipped the bad entry and continued the geocoding.
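
The "argument is of length zero" part is what R reports when nrow() is called on NULL inside an if() condition: nrow(NULL) is NULL, and comparing NULL to 0 gives a zero-length logical. A possible guard is sketched below; that `lat_lng` is NULL after a failed request is an assumption based on the error message, and `extract_coords` is a hypothetical name.

```r
# Hypothetical extractor: returns NA coordinates unless `lat_lng` is a
# non-empty two-column data frame. NULL is what a failed request might yield.
NA_result <- data.frame(lat = NA_real_, long = NA_real_)

extract_coords <- function(lat_lng) {
  # is.null() is checked first; || short-circuits, so nrow(NULL) is never compared.
  if (is.null(lat_lng) || nrow(lat_lng) == 0 || ncol(lat_lng) != 2) {
    return(NA_result)
  }
  lat_lng
}

failed <- extract_coords(NULL)
ok <- extract_coords(data.frame(lat = 22.2, long = 114.2))
```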

David

Error due to type conflict in new version of tibble

hi @jessecambon — thanks for this awesome package. It solves a huge problem in my workflow!

I've encountered a hiccup that appears to be due to tibble. The following code ran fine before I updated tibble, but broke afterward:

geo(
  street = "9500 Gilman Dr.,",
  city = "San Diego",
  state = "CA",
  postalcode = "92161",
  method = "cascade"
)

The error reads:

Error: Assigned data `retry_results` must be compatible with existing data.
ℹ Error occurred for column `lat`.
✖ Can't convert from <double> to <logical> due to loss of precision.
* Locations: 1.

The error seems to be coming from here:

combi[na_indices,] <- retry_results

Tibble is not happy that the variable types are being overwritten.

If instead you change line 70 to something more explicit like combi[,c("lat","long")] <- retry_results[,c("lat","long")] everything runs just fine.
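
The error text suggests the placeholder lat column was logical (a bare NA is logical in R), though that is an inference from the message. A sketch of the explicit assignment with typed placeholders follows; it uses base data frames, which do not enforce column types the way tibble does, so it shows the pattern and does not reproduce the error.

```r
# A plain NA is logical; NA_real_ is a double, matching numeric coordinates.
combi <- data.frame(address = c("a", "b"), lat = NA_real_, long = NA_real_)
retry_results <- data.frame(lat = 32.88, long = -117.23)

# Overwrite only the coordinate columns of the first row still missing a result.
na_indices <- which(is.na(combi$lat))[1]
combi[na_indices, c("lat", "long")] <- retry_results[, c("lat", "long")]
```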

Hope this helps, thanks again for the great package!

Pad fips with 0's for `geocode()`

Fantastic package! One thing for you to consider is to pad the fips from geocode(..., method = 'census', full_results = TRUE, return_type = "geographies") with 0's on the left to be consistent with the number of digits expected (outlined by the census: https://www.census.gov/programs-surveys/geography/guidance/geo-identifiers.html#par_textimage_8). Of course, you'll also have to force those columns to be character because of the leading zeros. For example, changing this:

library(dplyr)
library(tidygeocoder)

address_single <- tibble(singlelineaddress = c('11 Wall St, NY, NY', 
                                               '600 Peachtree Street NE, Atlanta, Georgia'))
address_components <- tribble(
  ~street                      , ~cty,               ~st,
  '11 Wall St',                  'NY',               'NY',
  '600 Peachtree Street NE',     'Atlanta',          'GA'
)

address_single %>% geocode(address = singlelineaddress, method = 'census',
                           full_results = TRUE, return_type = "geographies") %>%
  select(matches("(fips|tract|block)$"))
# A tibble: 2 x 4
  state_fips county_fips census_tract census_block
       <int>       <int>        <int>        <int>
1         36          61          700         1008
2         13         121         1900         2003

To this:

address_single %>% geocode(address = singlelineaddress, method = 'census',
                           full_results = TRUE, return_type = "geographies") %>%
  transmute(state_fips = stringr::str_pad(state_fips, 2, pad = "0"),
            county_fips = stringr::str_pad(county_fips, 3, pad = "0"),
            census_tract = stringr::str_pad(census_tract, 6, pad = "0"),
            census_block = stringr::str_pad(census_block, 4, pad = "0"))
# A tibble: 2 x 4
  state_fips county_fips census_tract census_block
  <chr>      <chr>       <chr>        <chr>       
1 36         061         000700       1008        
2 13         121         001900       2003 

If it would help I can submit a PR, I just need to dig into the code more to figure out if you can retain the 0's upstream or if you'll need to do it similar to how I did it (but avoiding the stringr import).
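
For what it's worth, base R's sprintf() can do the same padding without the stringr import. A small sketch using the integer values from the output above:

```r
state_fips   <- c(36L, 13L)
county_fips  <- c(61L, 121L)
census_tract <- c(700L, 1900L)

# "%0Nd" left-pads an integer with zeros to a width of N characters.
padded <- list(
  state_fips   = sprintf("%02d", state_fips),    # "36" "13"
  county_fips  = sprintf("%03d", county_fips),   # "061" "121"
  census_tract = sprintf("%06d", census_tract)   # "000700" "001900"
)
```

Note that the "%d" format expects integer values, so this would need adjusting if the columns arrive as character.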

Warn User If They Pass Improper/Unused Parameters

Warn the user if they pass an improper parameter (when possible). For example, the census geocoder does not have a limit or country argument and these parameters would have no effect on the query.
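
One possible shape for such a check is sketched below. The supported_params table and the check_params function are hypothetical, and the parameter lists are illustrative only, not an authoritative record of what each service accepts.

```r
# Hypothetical lookup of the parameters each service accepts. The entries are
# illustrative only and are not a complete or authoritative list.
supported_params <- list(
  census = c("address", "street", "city", "state", "postalcode"),
  osm    = c("address", "street", "city", "county", "state", "postalcode",
             "country", "limit")
)

# Warn about any supplied parameter the chosen method does not use and
# return the names of the ignored parameters invisibly.
check_params <- function(method, params) {
  unused <- setdiff(names(params), supported_params[[method]])
  if (length(unused) > 0) {
    warning("Parameters ignored by method '", method, "': ",
            paste(unused, collapse = ", "), call. = FALSE)
  }
  invisible(unused)
}

# Capture the warning text so it can be inspected
msg <- tryCatch(
  check_params("census", list(address = "11 Wall St, NY, NY",
                              limit = 1, country = "US")),
  warning = function(w) conditionMessage(w)
)
print(msg)
```

Swapping warning() for stop() would turn the same check into a hard error.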

Geocodio example

I looked but did not find an example of how to set up tidygeocoder to use geocodio. Is there a code sample that you could point me to?

Thanks

Support for Google Maps?

Are there plans in the works to add support for Google Maps? The ggmap package could provide a model for those wrappers.

Error when method = 'google' and flatten = FALSE

sample1 <- tibble(address = c('11 Wall St New York, NY', NA, '',
    '1600 Pennsylvania Ave NW Washington, DC', '11 Wall St New York, NY', 
    'Toronto, Canada'))

geocode_google1_notflat <- sample1 %>% 
  geocode(address = address, method = 'google', full_results = TRUE, verbose = TRUE, flatten = FALSE)

Generates error:

Error in [<-.data.frame(*tmp*, value, value = NA) : new columns would leave holes after existing columns

from:

merge(package$crosswalk, results, by = ".uid", all.x = TRUE, sort = FALSE) at address_handling.R#91

Check parameters passed to geocoder services

An error should be thrown if the user passes an unsupported parameter to a geocoder service. For example, Census doesn't support a country argument and Google doesn't support city.

Allow limit > 1

geo(address = 'Tokyo', method = 'osm', limit = 5)

Currently returns the error:

Error: Columns 3, 4, 5, 6, 7, and 3 more must be named.
Run rlang::last_error() to see where the error occurred.
Called from: signal_abort(cnd)
Browse[1]> rlang::last_error()
<error/tibble_error_column_names_cannot_be_empty>
Columns 3, 4, 5, 6, 7, and 3 more must be named.

Also update the documentation and vignettes to cover usage with limit > 1.

Local geocoding database?

(Ported from another issue):

While I'm at suggesting things :), there was once headway on the data-science toolkit (rdstk) on using a local master address database to geocode - much more sustainable for big calls when someone can, for instance, download all geocoded addresses in a state or county and work off that (the way SAS or ArcGIS can). That to me feels like a missing keystone to a complete geocode package, though would be a sizable lift. I'll be watching this package and if there are ways I can contribute (ahem, after COVID, since I'm an overworked epidemiologist at the moment) I'd love to. Again, thanks for the work.

I've at times written my own somewhat lazy/hacky string-match to a known census of addresses in a location, but being able to call geo() on a local geodatabase would make this package indispensable for workers with access to local spatial databases but few API creds (since some of the open source / free geocoders are of much lower capacity than google or geocodio).
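
To make the string-match idea concrete, here is a minimal base R sketch. Everything in it is an assumption for illustration: local_db stands in for a real address file, its coordinates are approximate placeholders, and geo_local and normalize_address are hypothetical names, not tidygeocoder functions.

```r
# Hypothetical local reference table. In practice this would be loaded from
# a county or state address file; the coordinates are rough placeholders.
local_db <- data.frame(
  address = c("11 WALL ST NEW YORK NY", "600 PEACHTREE ST NE ATLANTA GA"),
  lat     = c(40.707, 33.771),
  long    = c(-74.011, -84.386),
  stringsAsFactors = FALSE
)

# Crude normalization: upper-case, drop punctuation, collapse whitespace
normalize_address <- function(x) {
  x <- toupper(x)
  x <- gsub("[[:punct:]]", " ", x)
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

# Exact match on the normalized string; unmatched addresses get NA
geo_local <- function(addresses, db) {
  idx <- match(normalize_address(addresses), normalize_address(db$address))
  data.frame(address = addresses, lat = db$lat[idx], long = db$long[idx],
             stringsAsFactors = FALSE)
}

res <- geo_local(c("11 Wall St, New York, NY", "Unknown Rd"), local_db)
print(res)
```

An exact match like this misses abbreviations and typos ("Street" vs "St"), which is where a real local geocoder would need much more work.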

Again, great work. Just logging some ideas for the future!

Progress bar?

Hi there, fantastic package. Exciting standardization, as I've been writing my own API wrappers or using any of the other less-tidyverse-friendly geocoding packages.

Any thoughts on adding progress bars (like readr enables for long lists)? geo() could be wrapped to achieve this (using progress package or otherwise), but some intelligence to recognize larger geocode requests worth progress-bar-ing might be valuable.
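
A wrapper along these lines could look like the sketch below, using utils::txtProgressBar from base R instead of the progress package. The names geo_with_progress, geocode_one, and min_n are hypothetical, and fake_geocoder is a stand-in so the example runs without a network call.

```r
# Hypothetical wrapper: call a single-address geocoder in a loop and show a
# progress bar only when the request is large enough to be worth it.
geo_with_progress <- function(addresses, geocode_one, min_n = 10) {
  n <- length(addresses)
  show_bar <- n >= min_n
  if (show_bar) {
    pb <- utils::txtProgressBar(min = 0, max = n, style = 3)
    on.exit(close(pb))
  }
  results <- vector("list", n)
  for (i in seq_len(n)) {
    results[[i]] <- geocode_one(addresses[[i]])
    if (show_bar) utils::setTxtProgressBar(pb, i)
  }
  # Stack the one-row results into a single data frame
  do.call(rbind, results)
}

# Stand-in geocoder that returns NA coordinates (no network access)
fake_geocoder <- function(address) {
  data.frame(address = address, lat = NA_real_, long = NA_real_,
             stringsAsFactors = FALSE)
}

out <- geo_with_progress(paste("Address", 1:20), fake_geocoder)
```

Looping one address at a time gives up any batch geocoding a service offers, so a built-in version would probably need to report progress per batch instead.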

FWIW. Best wishes again on this great package.
