Comments (3)
We cannot recognize street names or localities that are outside the province, so we cannot tell whether a province name in an address is actually being used as a province name. "blah blah Ontario" is just as likely to be an occupant or site name on Ontario Street as it is a street address in Ontario.
from ols-geocoder.
As a first step, we should focus on reliably identifying alien addresses and assigning an address.isAlien fault to them instead of matching to a false positive address somewhere in BC. Here's a current example of such a false positive using Geocoder 4.1:
122 Albert St, Port Melbourne, Victoria australia
matches to this:
122 Lambert St, Quesnel, BC.
At least with accurate alien detection, a script can filter aliens out of the batch geocoder results file and apply a global geocoder to them.
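As a rough sketch of that filtering step, the Python below splits a batch results file into local and alien rows. It assumes the results are CSV with a `faults` column and that the proposed `address.isAlien` fault appears in that column as plain text. The column names and the `split_aliens` helper are hypothetical, not part of the Geocoder API, and the sample rows are made up for illustration.

```python
import csv
import io

# Fault name proposed in this thread; it is not an existing Geocoder fault.
ALIEN_FAULT = "address.isAlien"


def split_aliens(results_csv, faults_column="faults"):
    """Split batch geocoder result rows into (local_rows, alien_rows).

    results_csv is any iterable of CSV lines, e.g. an open file.
    faults_column is an assumed column name holding the fault text.
    """
    local_rows, alien_rows = [], []
    for row in csv.DictReader(results_csv):
        faults = (row.get(faults_column) or "").lower()
        if ALIEN_FAULT.lower() in faults:
            alien_rows.append(row)
        else:
            local_rows.append(row)
    return local_rows, alien_rows


# Illustrative input only; a real results file has more columns.
sample = io.StringIO(
    "addressString,fullAddress,score,faults\n"
    '"122 Albert St, Port Melbourne, Victoria australia",,0,address.isAlien\n'
    '"1175 Douglas St, Victoria, BC","1175 Douglas St, Victoria, BC",99,\n'
)
local_rows, alien_rows = split_aliens(sample)
```

The rows in `alien_rows` could then be written to a separate file and passed to a global geocoder.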
@bstratto Feel free to add a comment describing potential alien detectors you've discovered in your rejected address analysis.
Below are a few patterns for identifying addresses in other countries. This is based on analysis of the 13 million HealthIdeas addresses and reflects the examples available in that dataset.
These patterns capture only a subset of the HealthIdeas addresses in these countries. There are many more addresses for which there is no “safe” pattern (i.e. a pattern that would not also risk eliminating addresses that include a BC location).
Pattern for addresses in Germany:
The HealthIdeas addresses show that people use the general formats:
• German zip code + “, GERMANY, BC”
• German zip code + German locality + “, GERMANY, BC”
To make this pattern safe, we would have to check that the German locality text is not in fact the name of a BC locality. For example, there could be an address “10319 HOPE, GERMANY, BC”.
The below pattern was tested with HealthIdeas and returns only German alien addresses:
• The first 5 characters are numbers
• The length of the address is <= 30 (longer addressStrings tend to include BC address text)
• The address ends with “, GERMANY, BC”
• The text preceding “, GERMANY, BC” is not the abbreviation for a street type
• The text preceding “, GERMANY, BC” is not a known BC locality
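The five checks above could be sketched in Python roughly as follows. This is an illustration, not the Geocoder's implementation: `BC_LOCALITIES` and `STREET_TYPE_ABBREVIATIONS` are tiny hypothetical stand-ins for the real reference data, and the street-type check is interpreted here as "the last word before the suffix is not a street type abbreviation".

```python
import re

# Hypothetical stand-ins; real lists would come from the geocoder's reference data.
BC_LOCALITIES = {"HOPE", "SURREY", "MISSION", "QUESNEL"}
STREET_TYPE_ABBREVIATIONS = {"ST", "RD", "AVE", "BLVD", "DR", "PL", "WAY"}

GERMANY_SUFFIX = ", GERMANY, BC"


def is_german_alien(address_string):
    """Return True if address_string matches the German alien pattern."""
    s = address_string.strip().upper()
    # Length limit and required ending.
    if len(s) > 30 or not s.endswith(GERMANY_SUFFIX):
        return False
    # The first 5 characters must all be digits (German postal code).
    if not re.fullmatch(r"[0-9]{5}", s[:5]):
        return False
    # Text between the postal code and the suffix, if any.
    preceding = s[5:-len(GERMANY_SUFFIX)].strip(" ,")
    if not preceding:
        return True
    if preceding in BC_LOCALITIES:
        return False
    # Assumed interpretation: test the last word against street types.
    return preceding.split()[-1] not in STREET_TYPE_ABBREVIATIONS
```

With these stand-in lists, “55131 MAINZ, GERMANY, BC” is flagged as alien, while “10319 HOPE, GERMANY, BC” is not, because HOPE is a BC locality.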
Below are some examples. This pattern identifies 65 addresses in the HealthIdeas dataset:
addressString | Standardized address | Score |
---|---|---|
55131 MAINZ, GERMANY, BC | German Rd, Flatrock, BC | 55 |
60486 FRANKFURT, GERMANY, BC | BC | 1 |
61350 BAD HOMBERG, GERMANY, BC | BC | 1 |
27612 LOXO ZECHT, GERMANY, BC | Zacht 5 near Kanaka Bar, BC | 52 |
28211 BREMEN, GERMANY, BC | 28211 Herman S. Braich Blvd, Mission, BC | 76 |
Pattern for addresses in England:
The HealthIdeas addresses show that people use the general formats:
• “, ENGLAND, BC”
• England locality + “ ENGLAND, BC” or England locality + “, ENGLAND, BC”
To make this pattern safe, we would have to check that the England locality text is not in fact the name of a BC locality. For example, there could be an address “HOPE, ENGLAND, BC”. This text, however, may also include localities that exist in both England and BC, such as “SURREY, ENGLAND, BC”. Geocoder would have to “make a call” regarding these.
The below pattern was tested with HealthIdeas and returns only England alien addresses:
• The first character is not a number
• The length of the address is <= 25 (longer addressStrings tend to include BC address text)
• The address ends with “ENGLAND, BC”
• The text preceding “ENGLAND, BC” does not include a known BC locality
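A possible Python sketch of the four checks above is shown here. `BC_LOCALITIES` is a small hypothetical stand-in for the real locality list, and "does not include a known BC locality" is interpreted as a whole-word search of the text before the suffix. Both are assumptions for illustration.

```python
import re

# Hypothetical stand-in; the real list would come from the geocoder's data.
BC_LOCALITIES = {"HOPE", "SURREY", "MISSION", "QUESNEL"}

ENGLAND_SUFFIX = "ENGLAND, BC"


def contains_bc_locality(text):
    """Return True if any known BC locality appears in text as a whole word."""
    return any(
        re.search(r"\b" + re.escape(name) + r"\b", text)
        for name in BC_LOCALITIES
    )


def is_england_alien(address_string):
    """Return True if address_string matches the England alien pattern."""
    s = address_string.strip().upper()
    # The first character must not be a digit.
    if not s or s[0] in "0123456789":
        return False
    # Length limit and required ending.
    if len(s) > 25 or not s.endswith(ENGLAND_SUFFIX):
        return False
    # The text before the suffix must not mention a BC locality.
    return not contains_bc_locality(s[:-len(ENGLAND_SUFFIX)])
```

Note that a shared name such as SURREY is treated as a BC locality in this sketch, so “SURREY, ENGLAND, BC” is not flagged. That is one way of making the call described above, not the only one.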
Below are some examples. This pattern identifies 173 addresses in the HealthIdeas dataset:
addressString | Standardized address | Score |
---|---|---|
VISITOR FROM, ENGLAND, BC | Vision Way, Langford, BC | 24 |
WELSHPOOL, ENGLAND, BC | BC | 1 |
VISITING, ENGLAND, BC | BC | 1 |
WEST SUSSEX, ENGLAND, BC | West Boulevard, Vancouver, BC | 69 |
, KENT ENGLAND, BC | England Rd, Courtenay, BC | 64 |
, ENGLAND, BC | England Ave, Courtenay, BC | 62 |
Pattern for addresses in the United States:
The HealthIdeas addresses show that people use the general formats:
• text + “, USA, BC”
• text + “, US, BC”
• 6 to 10 numeric digits + text + “ , USA, BC”
To make these patterns safe, we would have to check that the text is not in fact the name of a BC locality. For example, there could be an address “HOPE, USA, BC”. This text, however, may also include localities that have similar names or exist in both US and BC, such as “MT VERNON, USA, BC”. Geocoder would have to “make a call” regarding these.
The below patterns were tested with HealthIdeas and return only United States alien addresses:
• Pattern 1: Non-numeric (106 addresses found in HealthIdeas)
  ◦ The first 2 characters are not numbers
  ◦ The length of the address is <= 19 (longer addressStrings tend to include BC address text)
  ◦ The address ends with “ US, BC” or “ USA, BC”
  ◦ The text preceding “ US, BC” (or “ USA, BC”) is not a known BC locality or “UVIC” or “UBC”
• Pattern 2: Numeric (52 addresses found in HealthIdeas)
  ◦ The first 6 characters are numbers
  ◦ The length of the address is <= 19 (longer addressStrings tend to include BC address text)
  ◦ The address ends with “, USA, BC”
  ◦ Each character in positions 7-10 is one of these: space, comma, “U”, “S”
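The two US patterns could be sketched in Python along these lines. As with any such illustration, `BC_LOCALITIES` is a hypothetical stand-in list, and two readings are assumptions: the first check of Pattern 1 is taken to mean that the two leading characters are not both digits, and positions 7-10 in Pattern 2 are treated as 1-based.

```python
import re

# Hypothetical stand-in; the real list would come from the geocoder's data.
BC_LOCALITIES = {"HOPE", "SURREY", "MISSION", "QUESNEL"}
NOT_ALIEN_TEXT = BC_LOCALITIES | {"UVIC", "UBC"}


def is_us_alien_non_numeric(address_string):
    """Pattern 1: non-numeric US alien addresses."""
    s = address_string.strip().upper()
    # Length limit; reject strings whose first 2 characters are both digits.
    if len(s) > 19 or re.fullmatch(r"[0-9]{2}", s[:2]):
        return False
    for suffix in (" USA, BC", " US, BC"):
        if s.endswith(suffix):
            preceding = s[:-len(suffix)].strip(" ,")
            return preceding not in NOT_ALIEN_TEXT
    return False


def is_us_alien_numeric(address_string):
    """Pattern 2: numeric US alien addresses."""
    s = address_string.strip().upper()
    if len(s) > 19 or not s.endswith(", USA, BC"):
        return False
    # The first 6 characters must all be digits.
    if not re.fullmatch(r"[0-9]{6}", s[:6]):
        return False
    # Positions 7-10 (1-based) may only hold space, comma, "U" or "S".
    # Read literally, this rejects a digit in position 7.
    return all(ch in " ,US" for ch in s[6:10])
```

For example, `is_us_alien_non_numeric("MT VERNON, USA, BC")` returns True with these stand-in lists, while “HOPE, USA, BC” and “UBC, USA, BC” are not flagged.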
Below are some examples. Numeric addresses were redacted.
addressString | Standardized address | Score |
---|---|---|
MT VERNON, USA, BC | Mt Atkinson Pl, Vernon, BC | 69 |
ALTONA PA, USA, BC | Pa-aat 6 near Pitt Island, BC | 54 |
, US, BC | BC | 1 |
, USA, BC | BC | 1 |
ARIZONA, USA, BC | BC | 1 |
BBBBBB, USA, BC | BC | 1 |
9999999, USA, BC | BC | 1 |
9999999, USA, BC | BC | 1 |
9999999 01, USA, BC | BC | 1 |