
banr's Introduction


An R client for the BAN API


The banR package is a lightweight R client for the BAN API. The Base Adresse Nationale (BAN) is an open database of French addresses, produced by OpenStreetMap, La Poste, the IGN and Etalab.

The current version of banR can be installed from GitHub:

# install.packages("devtools")
devtools::install_github("joelgombin/banR", build_vignettes = TRUE)

The CRAN version is out of date:

install.packages("banR")

banR can geocode large numbers of addresses in batch (the only hard limit is that, at the moment, the API only accepts CSV files up to 50 MB). Please be gentle with the server, though!
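For inputs above that limit, the table has to be split before uploading. The following is a minimal base R sketch, not part of banR: `split_for_ban()` is a hypothetical helper that estimates the number of chunks from the CSV size of the whole table, assuming rows are of roughly similar size.

```r
# Hypothetical helper (not part of banR): split a data frame into consecutive
# row chunks whose CSV files should each stay under `max_bytes`.
# Assumes rows have roughly similar serialized sizes.
split_for_ban <- function(df, max_bytes = 50 * 1024^2) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp))
  utils::write.csv(df, tmp, row.names = FALSE)
  # Number of chunks needed if each chunk held an equal share of the bytes
  n_chunks <- max(1, ceiling(file.size(tmp) / max_bytes))
  # Row i goes to chunk ceiling(i * n_chunks / n): near-equal sizes, order kept
  chunk_id <- ceiling(seq_len(nrow(df)) * n_chunks / nrow(df))
  split(df, chunk_id)
}
```

Each chunk could then be geocoded in turn, e.g. `lapply(chunks, function(x) geocode_tbl(x, adresse = adresse))`, and the results combined with `dplyr::bind_rows()`. Since every chunk also carries a header line, choosing `max_bytes` somewhat below 50 MB leaves a safety margin.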

banR is designed to be used in a data exploration workflow, with a syntax 'à la tidyverse':

library(dplyr)
library(banR)
data("paris2012")

paris2012 %>%
  slice(1:100) %>%
  mutate(adresse = paste(numero, voie, nom),
         code_insee = paste0("751", arrondissement)) %>% 
  geocode_tbl(adresse = adresse, code_insee = code_insee) %>%
  glimpse()
#> Writing tempfile to.../tmp/Rtmp9YlNra/file147282c595535.csv
#> If file is larger than 50 MB, it must be splitted
#> Size is : 3 Kb
#> SuccessOKSuccess: (200) OK
#> Rows: 100
#> Columns: 25
#> $ arrondissement     <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06…
#> $ bureau             <chr> "09", "09", "09", "09", "09", "09", "09", "09", "09…
#> $ numero             <int> 4, 5, 6, 7, 8, 11, 12, 13, 14, 16, 3, 4, 5, 6, 7, 8…
#> $ voie               <chr> "RUE DE L", "RUE DE L", "RUE DE L", "RUE DE L", "RU…
#> $ nom                <chr> "ABBAYE", "ABBAYE", "ABBAYE", "ABBAYE", "ABBAYE", "…
#> $ nb                 <int> 1, 1, 20, 2, 17, 2, 9, 15, 17, 8, 13, 6, 6, 3, 9, 1…
#> $ ID                 <chr> "0609", "0609", "0609", "0609", "0609", "0609", "06…
#> $ adresse            <chr> "4 RUE DE L ABBAYE", "5 RUE DE L ABBAYE", "6 RUE DE…
#> $ code_insee         <chr> "75106", "75106", "75106", "75106", "75106", "75106…
#> $ latitude           <dbl> 48.85405, 48.85407, 48.85414, 48.85410, 48.85425, 4…
#> $ longitude          <dbl> 2.335715, 2.335172, 2.335352, 2.335041, 2.334903, 2…
#> $ result_label       <chr> "4 Rue de l’Abbaye 75006 Paris", "5 Rue de l’Abbaye…
#> $ result_score       <dbl> 0.97, 0.97, 0.97, 0.97, 0.97, 0.97, 0.97, 0.97, 0.9…
#> $ result_type        <chr> "housenumber", "housenumber", "housenumber", "house…
#> $ result_id          <chr> "75106_0002_00004", "75106_0002_00005", "75106_0002…
#> $ result_housenumber <chr> "4", "5", "6", "7", "8", "11", "12", "13", "14", "1…
#> $ result_name        <chr> "Rue de l’Abbaye", "Rue de l’Abbaye", "Rue de l’Abb…
#> $ result_street      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ result_postcode    <chr> "75006", "75006", "75006", "75006", "75006", "75006…
#> $ result_city        <chr> "Paris", "Paris", "Paris", "Paris", "Paris", "Paris…
#> $ result_context     <chr> "75, Paris, Île-de-France", "75, Paris, Île-de-Fran…
#> $ result_citycode    <chr> "75106", "75106", "75106", "75106", "75106", "75106…
#> $ result_oldcitycode <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ result_oldcity     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ result_district    <chr> "Paris 6e Arrondissement", "Paris 6e Arrondissement…

To learn more about this package, read the vignette: vignette("geocode").

Please report issues and suggestions on the issue tracker.

See also

  • BAN-geocoder, a Python wrapper for adresse.data.gouv.fr
  • tidygeocoder, an R package similar to banR that uses other geocoding services such as the US Census geocoder, Nominatim (OSM), Geocodio and LocationIQ.

banr's People

Contributors

joelgombin, pachevalier, rolandrr, statnmap


banr's Issues

bug on Mac OS

I tried to run README.Rmd on macOS and got the following error:

Quitting from lines 32-42 (README.Rmd) 
Error: 'f_capture' is not an exported object from 'namespace:lazyeval'
Execution halted

I don't understand why, because the only call to f_capture() in the code is already namespaced as lazyeval::f_capture().
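One possible explanation (an assumption, not confirmed in this issue) is that the Mac had an older version of lazyeval installed that does not export f_capture(). A quick base R check of what an installed namespace actually exports; `is_exported()` is a hypothetical helper:

```r
# Hypothetical diagnostic: does the *installed* version of package `pkg`
# export `fun`? Returns FALSE if the package is not installed at all.
is_exported <- function(pkg, fun) {
  requireNamespace(pkg, quietly = TRUE) && fun %in% getNamespaceExports(pkg)
}

is_exported("utils", "head")                # TRUE
is_exported("utils", "format.object_size")  # FALSE: an unexported S3 method
```

On the affected machine, `is_exported("lazyeval", "f_capture")` together with `packageVersion("lazyeval")` would confirm or rule this out.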

bug when the addresses are duplicated

Because of the join on implicit keys, when the input data frame contains duplicated addresses, the output data frame has too many rows. Here is a reprex:

df <- data.frame(adresses = c("11 allée Sacoman", "11 allée Sacoman", "23 allée Sacoman"), code_insee = "13016", stringsAsFactors = FALSE)
result <- banR::ban_geocode(df, adresses, code_insee = "code_insee")
#> Geocoding...
nrow(df)
#> [1] 3
nrow(result)
#> [1] 5
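The extra rows come from duplicated keys on both sides of the join: the two identical addresses each match twice (2 × 2 = 4 rows), plus one row for the unique address, giving 5. A minimal base R sketch of the problem and of one possible fix, joining on an explicit row identifier instead of the address. `api_result` is a stand-in for the API response with placeholder coordinates, and `row_id` is a hypothetical column name.

```r
# Reprex input: the first two addresses are identical
df <- data.frame(
  adresses = c("11 allée Sacoman", "11 allée Sacoman", "23 allée Sacoman"),
  code_insee = "13016",
  stringsAsFactors = FALSE
)

# Stand-in for the API response: one row per input row, with placeholder
# coordinates (not real geocoding output)
api_result <- data.frame(
  adresses = df$adresses,
  latitude = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# Joining on the address alone: duplicated keys on both sides multiply rows
bad <- merge(df, api_result, by = "adresses")
nrow(bad)  # 5

# Possible fix: carry a row identifier through and join on it instead
df$row_id <- seq_len(nrow(df))
api_result$row_id <- df$row_id
good <- merge(df, api_result[, c("row_id", "latitude")], by = "row_id")
nrow(good)  # 3
```

Deduplicating the addresses before sending them, then joining the unique results back, would also avoid the extra rows and send fewer lines to the API.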

Release 0.2.2

The CRAN version 0.2.1 fails against dplyr 1.0.0 with these messages:

[master*] 126 MiB ❯ revdepcheck::revdep_details(, "banR")
══ Reverse dependency check ══════════════════════════════════════ banR 0.2.1 ══

Status: BROKEN

── Newly failing

✖ checking tests ...

── Before ──────────────────────────────────────────────────────────────────────
0 errors ✔ | 0 warnings ✔ | 0 notes ✔

── After ───────────────────────────────────────────────────────────────────────
❯ checking tests ...
  See below...

── Test failures ───────────────────────────────────────────────── testthat ────

> library("testthat")
> library("banR")
> test_check("banR")
── 1. Failure: Code INSEE and Code postal return the same result (@test_geocodet
geocode_tbl(tbl = table_check, adresse = num_voie, code_postal = cp) not equivalent to geocode_tbl(tbl = table_check, adresse = num_voie, code_insee = codecommune).
Component 1: 2 string mismatches
Component 2: 2 string mismatches
Component 4: 2 string mismatches

══ testthat results  ═══════════════════════════════════════════════════════════
[ OK: 7 | SKIPPED: 0 | WARNINGS: 0 | FAILED: 1 ]
1. Failure: Code INSEE and Code postal return the same result (@test_geocodetbl.R#77)

Error: testthat unit tests failed
Execution halted

1 error ✖ | 0 warnings ✔ | 0 notes ✔

but apparently it has already been fixed. Are you planning to release 0.2.2 before dplyr is released?

'code_postal' argument unused in 'geocode_tbl' function?

Hello Joel, thanks for this great work.
For some reason, geocode_tbl() seems to return more accurate results when used with a single argument (adresse as a concatenation of street, zip code and city) than when used with multiple arguments (adresse = street, code_postal = zip code).

Example:

library(tidyverse)
df <- tribble(
  ~num_voie, ~cp, ~ville,
  "1 Rue Gaspard Monge", "22307", "Lannion Cedex",
  "Square Edouard Herriot", "85400", "Lucon"
)

library(banR)
geocode_tbl(tbl = df, adresse = num_voie, code_postal = cp) %>%
  glimpse() %>%
  View()

geocode_tbl(tbl = df %>% mutate(adr = paste0(num_voie, ", ", cp, " ", ville)), adresse = adr) %>%
  glimpse() %>%
  View()

I suspect the code_postal argument is ignored, so any matching address is returned, even if it is not in the right city.
Or did I miss something?

Searching non-ASCII characters in geocode_tbl() results in Unicode characters in the result

I suspect this issue lies not in banR itself but in the underlying API.

If one of the searched addresses contains non-ASCII characters, the results contain escaped bytes instead of valid UTF-8 text (for example \xe2, the Latin-1 code for â, instead of â):

In the following example, using évron instead of evron changes the encoding of the result for my second search (Chatelaillon).

location_tbl <- tibble::tibble(city = c("évron", "Chatelaillon"))
banR::geocode_tbl(location_tbl, city) 
#> Writing tempfile to.../var/folders/dc/9dbfr9sx23jcx1tmfdlxqr3m0000gq/T//RtmpyiSyId/filef0946d58acb.csv
#> If file is larger than 8 MB, it must be splitted
#> Size is : 25 bytes
#> SuccessOKSuccess: (200) OK
#> # A tibble: 2 x 17
#>   city   latitude longitude result_label      result_score result_type result_id
#>   <chr>     <dbl>     <dbl> <chr>                    <dbl> <chr>       <chr>    
#> 1 évron      45.5      4.57 "Rue a C Victime…         0.2  street      42103_o6…
#> 2 Chate…     46.1     -1.09 "Ch\xe2telaillon…         0.62 municipali… 17094    
#> # … with 10 more variables: result_housenumber <chr>, result_name <chr>,
#> #   result_street <chr>, result_postcode <chr>, result_city <chr>,
#> #   result_context <chr>, result_citycode <chr>, result_oldcitycode <chr>,
#> #   result_oldcity <chr>, result_district <chr>

location_tbl <- tibble::tibble(city = c("evron", "Chatelaillon"))
banR::geocode_tbl(location_tbl, city)
#> Writing tempfile to.../var/folders/dc/9dbfr9sx23jcx1tmfdlxqr3m0000gq/T//RtmpyiSyId/filef096d8b39c1.csv
#> If file is larger than 8 MB, it must be splitted
#> Size is : 24 bytes
#> SuccessOKSuccess: (200) OK
#> # A tibble: 2 x 17
#>   city     latitude longitude result_label    result_score result_type result_id
#>   <chr>       <dbl>     <dbl> <chr>                  <dbl> <chr>       <chr>    
#> 1 evron        48.1    -0.425 Évron                   0.94 municipali… 53097    
#> 2 Chatela…     46.1    -1.09  Châtelaillon-P…         0.62 municipali… 17094    
#> # … with 10 more variables: result_housenumber <chr>, result_name <chr>,
#> #   result_street <chr>, result_postcode <chr>, result_city <chr>,
#> #   result_context <chr>, result_citycode <chr>, result_oldcitycode <chr>,
#> #   result_oldcity <chr>, result_district <chr>
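Since \xe2 is the Latin-1 encoding of â, one client-side workaround (assuming the returned bytes are indeed Latin-1) is to re-encode the affected strings to UTF-8 with base R's iconv(). A minimal sketch:

```r
# A result string as shown above: byte 0xE2 is "â" in Latin-1,
# but on its own it is not valid UTF-8
x <- "Ch\xe2telaillon"
validUTF8(x)  # FALSE

# Assuming the bytes are Latin-1, re-encode them as UTF-8
fixed <- iconv(x, from = "latin1", to = "UTF-8")
validUTF8(fixed)  # TRUE
fixed  # "Châtelaillon" when printed in a UTF-8 session
```

Under the same assumption, the encoding could instead be declared when the response is read, with `readr::locale(encoding = "latin1")`.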

fix `write_csv` warning

The `path` argument of `write_csv()` is deprecated as of readr 1.4.0.
Please use the `file` argument instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.

Naming of the functions?

I suggest renaming ban_search_() to ban_geocode() or addok_geocode(). That would make it easier to know what the function does.

geocode_tbl() interprets missing values as the string "NA"

If there are missing values in the adresse column, geocode_tbl() turns NA into the string "NA".
We then get back an address: "Nas 07270 Saint-Barthélemy-Grozon"

To reproduce:

table_test <- tibble::tibble(
  x = c("39 quai Andre Citroen", "64 Allee de Bercy", "20 avenue de Segur", NA), 
  z = rnorm(4)
)

banR::geocode_tbl(tbl = table_test, adresse = x)

This is due to the default value of the na argument of readr::write_csv() (na = "NA").
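A minimal sketch of the effect, using base R's utils::write.csv(), which has the same na argument with the same "NA" default; in banR the corresponding change would presumably be passing na = "" to readr::write_csv(). How the API then handles an empty address is not covered here.

```r
table_test <- data.frame(
  x = c("39 quai Andre Citroen", NA),
  stringsAsFactors = FALSE
)
tmp <- tempfile(fileext = ".csv")

# Default na = "NA": the missing value is written as the text NA,
# which the API then treats as an address to geocode
utils::write.csv(table_test, tmp, row.names = FALSE)
readLines(tmp)  # "\"x\""  "\"39 quai Andre Citroen\""  "NA"

# With na = "", the missing address becomes an empty field instead
utils::write.csv(table_test, tmp, row.names = FALSE, na = "")
readLines(tmp)  # "\"x\""  "\"39 quai Andre Citroen\""  ""
```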

Add query parameters to geocode()

Thanks for this very useful package!

I notice that the API (https://geo.api.gouv.fr/adresse) offers parameters that {banR} does not use:

  • limit
  • autocomplete
  • type

I would like to implement them in geocode().

For example, one of my use cases is that I search only for municipalities, not full addresses.
Currently, the workaround is:

banR::geocode("Lille") %>% 
  dplyr::filter(type %in% "municipality")

This works well if the municipality is spelled correctly and appears among the 5 results returned by the API (the default).

With a more approximate value, for example "Perros" instead of "Perros-Guirec", the following code returns nothing:

banR::geocode("Perros") %>% 
  dplyr::filter(type %in% "municipality")

whereas the following API request gives the expected result:
curl 'https://api-adresse.data.gouv.fr/search/?q=Perros&type=municipality'

Do you think this belongs in this package? I can open a PR if needed.
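A minimal sketch of how such parameters could be passed through: `build_search_url()` is a hypothetical helper, not an existing banR function. It only builds the request URL, using the endpoint from the curl command above and the parameter names listed in this issue; encoding autocomplete as 0/1 is an assumption.

```r
# Hypothetical helper (not part of banR): build a BAN search URL,
# dropping any parameter left as NULL.
build_search_url <- function(q, type = NULL, limit = NULL, autocomplete = NULL,
                             base_url = "https://api-adresse.data.gouv.fr/search/") {
  params <- list(q = q, type = type, limit = limit, autocomplete = autocomplete)
  params <- params[!vapply(params, is.null, logical(1))]
  values <- vapply(params, function(v) {
    if (is.logical(v)) v <- as.integer(v)  # assumed 0/1 flag
    utils::URLencode(as.character(v), reserved = TRUE)
  }, character(1))
  paste0(base_url, "?", paste(names(values), values, sep = "=", collapse = "&"))
}

build_search_url("Perros", type = "municipality")
# "https://api-adresse.data.gouv.fr/search/?q=Perros&type=municipality"
```

The resulting URL can be fetched with any HTTP client, for example httr::GET().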

Naming

When linting the code with the lintr package, I got several warnings:

  • the variable queryResults should be lowercase
  • the name of the package should also be lowercase!

Should we rename the package ?

By the way, I think we need to rethink how the functions are named.

I think it would be nice to have:

  • geocode: for geocoding a single address (i.e. using the one-line API: http://adresse.data.gouv.fr/api/)
  • reverse_geocode: for reverse geocoding a single point
  • geocode_df or geocode_csv: for geocoding a data frame (i.e. the current ban_geocode)
  • reverse_geocode_df: for reverse geocoding a data frame

I think the ban prefix is redundant, since the package is already named banR.

Parsing error

Thanks for this package.

On a test dataset I get:

Writing tempfile to...C:\Users\xxx\AppData\Local\Temp\RtmpSS0T23\file83cd61558e.csv
If file is larger than 8 MB, it must be splitted
Size is : 15.7 Kb
SuccessOKSuccess: (200) OK
Warning: 1 parsing failure.
# A tibble: 1 x 5
    row col                expected               actual file
  <int> <chr>              <chr>                  <chr>  <chr>
1   116 result_housenumber no trailing characters B      <raw vector>

Warning message:
In rbind(names(probs), probs_f) :
  number of columns of result is not a multiple of vector length (arg 2)

I haven't identified the offending rows in my data, but changing the column parsing options for the response in geocode_tbl() to result_housenumber = readr::col_character() and result_postcode = readr::col_character() makes it work.

Regards
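A base R sketch of why forcing these two columns to character helps: a housenumber with a letter suffix cannot be parsed as a number, and a postcode with a leading zero (departments 01 to 09) loses that zero when parsed as one. The toy CSV below is made up for illustration; with readr, the equivalent is passing `readr::cols(result_housenumber = readr::col_character(), result_postcode = readr::col_character())` as `col_types`.

```r
# Toy response (made-up values): a housenumber with a letter suffix and
# a postcode with a leading zero
csv <- "result_housenumber,result_postcode\n4,75006\n3 B,01000\n"

# Named colClasses are matched to column names; both columns stay character
res <- utils::read.csv(
  text = csv,
  colClasses = c(result_housenumber = "character", result_postcode = "character")
)
res$result_housenumber  # "4"     "3 B"
res$result_postcode     # "75006" "01000"
```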

Refactoring

I've just tried refactoring the code of the main function, with the sole aim of making it more readable.
My attempt is in the refactoring branch.

`sf` objects?

In addition to the coordinates, should we return an sf object (i.e. a tibble with a geometry column)?

Github Action

I see in the GitHub Action that we build the package for quite a few R versions on Ubuntu:

Couldn't we settle for just the latest version ({os: ubuntu-16.04, r: 'release', rspm: "https://packagemanager.rstudio.com/cran/__linux__/xenial/latest"})?

Also, wouldn't it be worth moving to Ubuntu 20.04?

[tidyeval] Getting the size of a file

In geocode_tbl() on the tidyeval branch, we currently get the file size with file.size(). To display the size in a 'human-readable' unit, we can use this command (from Stack Overflow):

utils:::format.object_size(file.size(tmp), "auto")

But when we check the package, we get:

Unexported object imported by a ':::' call: ‘utils:::format.object_size’

We need to find another way to display the file size in MB.
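One possible way around the `:::` call (a sketch, not tested against R CMD check on that branch): format() is an exported generic, so giving the size the "object_size" class lets S3 dispatch reach the utils method without referring to it directly. `human_size()` is a hypothetical helper name.

```r
# Hypothetical helper: format a size in bytes in a human-readable unit.
# Dispatches to utils' unexported format method via the exported generic.
human_size <- function(bytes) {
  format(structure(bytes, class = "object_size"), units = "auto")
}

tmp <- tempfile(fileext = ".csv")
writeLines(strrep("x", 3000), tmp)
human_size(file.size(tmp))  # a value in Kb
```

In geocode_tbl() this would become something like human_size(file.size(tmp)).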

Language of the documentation

That's a great first commit. I think it would be useful to write the documentation in English instead of French. What do you think?

Inconsistency in calculating geocoding scores

The score differs too much between addresses written with "bis" and with "b". The example below shows the inconsistent scoring:

library(banR)
library(dplyr)

# score : 0.708 
geocode(query = "3 bis rue La Bruyère, 75009 Paris") %>%
  glimpse()

Rows: 5
Columns: 18
$ label       <chr> "3 b Rue la Bruyère 75009 Paris", "Square la Bruyère 75009 Paris", "3 b Rue Bleue 75009 Paris", ...
$ score       <dbl> 0.7080726, 0.5166771, 0.5044495, 0.4707309, 0.4683309
$ housenumber <chr> "3 b", NA, "3 b", NA, NA
$ id          <chr> "75109_5211_00003_b", "75109_5212", "75109_1017_00003_b", "75109_1345", "75109_1434"
$ name        <chr> "3 b Rue la Bruyère", "Square la Bruyère", "3 b Rue Bleue", "Rue de Bruxelles", "Rue de Calais"
$ postcode    <chr> "75009", "75009", "75009", "75009", "75009"
$ citycode    <chr> "75109", "75109", "75109", "75109", "75109"
$ x           <dbl> 651344.9, 651065.6, 652137.1, 650901.9, 650924.1
$ y           <dbl> 6864521, 6864555, 6864170, 6865026, 6864926
$ city        <chr> "Paris", "Paris", "Paris", "Paris", "Paris"
$ district    <chr> "Paris 9e Arrondissement", "Paris 9e Arrondissement", "Paris 9e Arrondissement", "Paris 9e Arron...
$ context     <chr> "75, Paris, Île-de-France", "75, Paris, Île-de-France", "75, Paris, Île-de-France", "75, Paris, ...
$ type        <chr> "housenumber", "street", "housenumber", "street", "street"
$ importance  <dbl> 0.69789, 0.57534, 0.66323, 0.67804, 0.65164
$ street      <chr> "Rue la Bruyère", NA, "Rue Bleue", NA, NA
$ type_geo    <chr> "Point", "Point", "Point", "Point", "Point"
$ longitude   <dbl> 2.336596, 2.332783, 2.347436, 2.330497, 2.330811
$ latitude    <dbl> 48.87887, 48.87915, 48.87577, 48.88338, 48.88248

# score : 0.799
geocode(query = "3 b rue La Bruyère, 75009 Paris") %>%
  glimpse()

Rows: 5
Columns: 18
$ label       <chr> "3 b Rue la Bruyère 75009 Paris", "3 b Rue Bleue 75009 Paris", "Square la Bruyère 75009 Paris", ...
$ score       <dbl> 0.7998082, 0.5716573, 0.5432127, 0.4940360, 0.4896602
$ housenumber <chr> "3 b", "3 b", NA, "3 b", "3 b"
$ id          <chr> "75109_5211_00003_b", "75109_1017_00003_b", "75109_5212", "75109_1407_00003_b", "75109_1363_0000...
$ name        <chr> "3 b Rue la Bruyère", "3 b Rue Bleue", "Square la Bruyère", "3 b Rue Cadet", "3 b Rue de Budapest"
$ postcode    <chr> "75009", "75009", "75009", "75009", "75009"
$ citycode    <chr> "75109", "75109", "75109", "75109", "75109"
$ x           <dbl> 651344.9, 652137.1, 651065.6, 651777.3, 650678.1
$ y           <dbl> 6864521, 6864170, 6864555, 6864028, 6864230
$ city        <chr> "Paris", "Paris", "Paris", "Paris", "Paris"
$ district    <chr> "Paris 9e Arrondissement", "Paris 9e Arrondissement", "Paris 9e Arrondissement", "Paris 9e Arron...
$ context     <chr> "75, Paris, Île-de-France", "75, Paris, Île-de-France", "75, Paris, Île-de-France", "75, Paris, ...
$ type        <chr> "housenumber", "housenumber", "street", "housenumber", "housenumber"
$ importance  <dbl> 0.69789, 0.66323, 0.57534, 0.66969, 0.64942
$ street      <chr> "Rue la Bruyère", "Rue Bleue", NA, "Rue Cadet", "Rue de Budapest"
$ type_geo    <chr> "Point", "Point", "Point", "Point", "Point"
$ longitude   <dbl> 2.336596, 2.347436, 2.332783, 2.342547, 2.327538
$ latitude    <dbl> 48.87887, 48.87577, 48.87915, 48.87447, 48.87620
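Until the scoring is made consistent on the API side, a possible client-side workaround is to normalise "bis" to "b" before querying, since in the example above the "3 b" query scores higher for the same top result. A minimal sketch: `normalize_bis()` is hypothetical, only handles a leading house number, and does not cover other suffixes such as "ter".

```r
# Hypothetical pre-processing step (not part of banR)
normalize_bis <- function(x) {
  # A leading house number, optional whitespace, then "bis" as a whole word,
  # rewritten as the number followed by " b"
  sub("^(\\d+)\\s*bis\\b", "\\1 b", x, ignore.case = TRUE, perl = TRUE)
}

normalize_bis("3 bis rue La Bruyère, 75009 Paris")
# "3 b rue La Bruyère, 75009 Paris"
```

It would be used as geocode(query = normalize_bis(...)).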

Intercepting errors

@pachevalier I like how your functions print the HTTP status code of the request. But how can we intercept errors to handle them programmatically?
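One way to handle failures programmatically is to wrap the call in base R's tryCatch(). A minimal sketch: `with_retry()` is a hypothetical helper, not part of banR, and it assumes the wrapped function signals an R error when the request fails (for example via httr::stop_for_status()); a function that only prints the status would need to be changed to do so.

```r
# Hypothetical helper (not part of banR): call f(...) and, if it signals an
# R error, retry up to `max_tries` (>= 1) times with exponential backoff.
with_retry <- function(f, ..., max_tries = 3, wait = 1) {
  for (i in seq_len(max_tries)) {
    # Capture the error as a condition object instead of letting it propagate
    result <- tryCatch(f(...), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", i, " failed: ", conditionMessage(result))
    # Wait 1, 2, 4, ... times `wait` seconds before the next attempt
    if (i < max_tries) Sys.sleep(wait * 2^(i - 1))
  }
  # All attempts failed: re-signal the last error to the caller
  stop(result)
}
```

For example, with_retry(banR::geocode, query = "3 b rue La Bruyère, 75009 Paris"); an outer tryCatch() can then decide what to do once all attempts have failed.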

banR is no longer on CRAN


From what I can see, the package was rejected following 502 errors from the API during the unit tests.
I have seen this kind of error before; the API seems less stable on some days.

Would one solution be to check the API response in the tests and call skip_on_cran() on a 502 error? (This would not, however, fix the errors when building the .Rmd files and the examples.)
