
ols-geocoder's Introduction

ols-geocoder


OLS-Geocoder is an open source project that provides APIs for address cleaning, correction, completion, geocoding, and reverse geocoding. The Province of British Columbia is the prime committer and maintains a reference deployment called the BC Address Geocoder for use by government and the public at large. The BC Address Geocoder has processed over one billion addresses since its initial release in 2013. To see it in use by an application, visit Location Services in Action.

To find out how to integrate the BC Address Geocoder into your own application, please visit here.

To install OLS-Geocoder in your own environment, instructions are here.

Troubleshooting

Geocoder Playbook

Pipelines

See geocoder-pr-triggers.yaml Helm Chart

ols-geocoder's People

Contributors

3rdmike, bolyachevets, cmhodgson, dependabot[bot], dkelsey, dstainton, geocoder-bot, repo-mountie[bot]


ols-geocoder's Issues

Integrate Tomcat using Spring Boot

The current deployment requires an existing installation of Tomcat onto which the geocoder can be deployed. This made sense when the Tomcat Java application server was shared among several other applications. However, given the current approach of a dedicated Docker container with a dedicated Tomcat, integrating Tomcat into the geocoder provides a simpler deployment and installation process. Non-integrated deployment can still be supported as well. Spring Boot provides the ability to do this, and since we are already using the Spring framework, it should be straightforward.
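As a rough sketch of what the integrated entry point could look like (class and package names below are placeholders, not the geocoder's actual classes):

// Illustrative Spring Boot entry point with embedded Tomcat; names are placeholders.
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.builder.SpringApplicationBuilder;
import org.springframework.boot.web.servlet.support.SpringBootServletInitializer;

@SpringBootApplication
public class GeocoderApplication extends SpringBootServletInitializer {

    // Extending SpringBootServletInitializer keeps the WAR deployable to an
    // external Tomcat, so the non-integrated deployment remains possible.
    @Override
    protected SpringApplicationBuilder configure(SpringApplicationBuilder builder) {
        return builder.sources(GeocoderApplication.class);
    }

    public static void main(String[] args) {
        // Running main() starts the application on the embedded Tomcat.
        SpringApplication.run(GeocoderApplication.class, args);
    }
}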

Add support for glued and unglued words in site, street, and locality name matching

A compound word is a word composed of sub-words, each of which is also a word on its own. An example of a compound word is Mapleridge, which consists of the sub-words "maple" and "ridge". A streetName based on Mapleridge may appear in ITN in two forms:

 1. Mapleridge

 2. Maple Ridge

The geocoder should be able to match an input address to either form.

Currently, user input like Birds Eye Dr Duncan BC will not cleanly match the concatenated name known within ITN: Birdseye Dr. It is worth considering whether the geocoder can be changed to accommodate this type of match.

Perhaps there is a way to change the parser to accommodate many-to-one word mappings in the street name or to introduce logic that ignores spaces when looking up a street candidate.

This is needed by all clients to reduce the number of poor matches they have to deal with.
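One possible approach, as a rough sketch under the assumption of a simple known-word dictionary, is to generate both glued and unglued lookup keys for each street name at index-build time. The class, method, and dictionary names below are illustrative, not existing geocoder APIs.

// Sketch: generate glued/unglued variants of a street name so that
// "Birds Eye" and "Birdseye" index to the same set of keys.
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class NameVariants {

    // Produce lookup keys for a street name: the name as given, the fully
    // glued form, and any two-word split whose halves are both known words.
    public static List<String> variants(String name, Set<String> knownWords) {
        List<String> keys = new ArrayList<>();
        String glued = name.replace(" ", "").toLowerCase();
        keys.add(name.toLowerCase());
        keys.add(glued);
        for (int i = 1; i < glued.length(); i++) {
            String left = glued.substring(0, i);
            String right = glued.substring(i);
            if (knownWords.contains(left) && knownWords.contains(right)) {
                keys.add(left + " " + right);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        Set<String> words = Set.of("maple", "ridge", "birds", "eye");
        System.out.println(variants("Mapleridge", words)); // [mapleridge, mapleridge, maple ridge]
        System.out.println(variants("Birds Eye", words));  // [birds eye, birdseye, birds eye]
    }
}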

https server 500 error in response to sites/nearest.geojson request

@mraross commented on Thu May 23 2019

https://geocoder.api.gov.bc.ca/sites/nearest.geojson?locationDescriptor=accessPoint&maxDistance=5000&excludeUnits=true&onlyCivic=true&brief=true&point=-127.28111966701108,49.65348852775016


@cmhodgson commented on Fri May 24 2019

The error handler was not set up to handle geojson output (even though it is actually the same output as json). The geojson mime type had to be added to the ErrorMessageConverter's supported types. This is fixed locally and ready for a future delivery.
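For context, a sketch of the kind of media-type registration the comment above describes; the class and constant below are illustrative, not the actual ErrorMessageConverter code.

// Sketch only: declare geojson alongside json as supported error output types.
import java.util.List;
import org.springframework.http.MediaType;

public class ErrorMediaTypes {
    public static final MediaType GEO_JSON = MediaType.parseMediaType("application/vnd.geo+json");

    // The list an ErrorMessageConverter-style converter would report as supported.
    public static List<MediaType> supported() {
        return List.of(MediaType.APPLICATION_JSON, GEO_JSON);
    }
}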

Add support for an alias address in a geocoding request for autocompletion

An alias address is an address that includes the alias street and/or locality as entered. This is returned only if autoComplete=True.

In autocompletion mode, an application tries to autocomplete an address as a user is typing. If they start typing an alias locality as in:

4460 Happy Valley Rd, saan

the following fullAddress is among the top five choices:

4460 Happy Valley Rd, Metchosin, BC

but the following aliasAddress is not among them:

4460 Happy Valley Rd, Saanich, BC

This is confusing to a user because they can't select the full address they are trying to enter. To overcome this, add an aliasAddress property to the addresses response. aliasAddress should be set to null or omitted from the response if no alias was detected in the input. Otherwise, aliasAddress should be set to the complete fullAddress using the aliases entered. fullAddress should always contain the correct, official address.
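A sketch of the proposed response shape; the class and field layout below are illustrative, not the geocoder's actual response classes:

public class AddressMatch {
    // Official, corrected address; always populated.
    public String fullAddress;
    // Address rebuilt with the alias street and/or locality the user typed
    // (e.g., "4460 Happy Valley Rd, Saanich, BC"); null or omitted when no alias was detected.
    public String aliasAddress;
}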

This issue was first reported by Elections BC.

Investigate feasibility of business licenses as a source of secondary addresses

Most address authorities are only concerned with the primary civic address of a property, which is the address used in construction permitting and land title registration. Secondary civic addresses are usually store fronts or offices in buildings or malls and are of most interest to people trying to find the location of a given business. BC Emergency Health Services is sending us missing secondary addresses regularly. We do add them to an address exception file that gets loaded into the geocoder but it would be good to be proactive and find a source of secondary addresses before BC EHS and other government business areas struggle to cope without them.

For example, 620 View is the primary civic address of the Central Building on View St. The Central Building also has multiple storefronts, each with its own secondary civic address including 614, 616, 618, and 622 View St, and 1200 Broad St.

We propose to investigate the use of business licenses as a source of secondary civic addresses. Many cities offer business licenses as open data. Vancouver would probably be a good place to start.

Multiple inconsistent street type faults

Bad matches can get good scores. For example:

1175 king st nanaimo bc

matches to:

1175 St. Patrick Cres, Nanaimo, BC

with a score of 94 and the following faults:

[STREET_NAME.partialMatch:1, STREET_TYPE.notMatched:3, STREET_TYPE.spelledWrong:1, STREET_TYPE.notPrefix:0]

We might be able to further reduce the score by penalizing multiple faults about the same street element, in this case, STREET_TYPE; or penalizing a street name partialMatch that occurs with faults on other street name elements.
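One possible sketch of that adjustment, assuming faults are available as simple (element, fault, penalty) records; the extra cost per repeated fault is an assumption:

// Sketch: add an extra penalty when several faults hit the same street element.
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RepeatFaultPenalty {
    record Fault(String element, String name, int penalty) {}

    // Returns an additional penalty: each fault beyond the first on the
    // same element costs an extra point (the exact cost is an assumption).
    static int extraPenalty(List<Fault> faults) {
        Map<String, Integer> perElement = new HashMap<>();
        for (Fault f : faults) {
            perElement.merge(f.element(), 1, Integer::sum);
        }
        return perElement.values().stream()
                .mapToInt(count -> Math.max(0, count - 1))
                .sum();
    }

    public static void main(String[] args) {
        List<Fault> faults = List.of(
                new Fault("STREET_NAME", "partialMatch", 1),
                new Fault("STREET_TYPE", "notMatched", 3),
                new Fault("STREET_TYPE", "spelledWrong", 1),
                new Fault("STREET_TYPE", "notPrefix", 0));
        // Three STREET_TYPE faults -> 2 extra points off the score of 94.
        System.out.println(extraPenalty(faults)); // 2
    }
}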

Automate geocoder data integration process

What is Geocoder Data Integration?
Here's an overview of the geocoder data integration process:

Gather | Transform | Integrate | Verify | Deploy

In Gather, we download source road network and address data.

In Transform, we transform all source data into standard schemas and formats and do field-level validation.

In Integrate, we tie addresses to the road network and generate address ranges.

In Verify, we verify the quality of the integrated data by geocoding our Acceptance Test addresses.

In Deploy, if the validation was successful, we deploy the integrated data to a given geocoder environment.


Weaknesses of current implementation

The current geocoder data integration process has three main weaknesses:

  • It is too fussy. The current process requires many manual steps that require careful typing and lots of visual confirmation.
  • It takes too long. It takes approximately three business days to complete.
  • It ties up our delivery and test environments; conversely, delivery and testing of code fixes and enhancements often blocks the data integration process. We rarely keep to our stated monthly data update schedule.

Business Value
Currently, authoritative addresses are submitted to us monthly but we have not been able to prepare them for use in the geocoder on that schedule. This means our users (e.g., MCFD, Elections BC) are submitting addresses of new construction that the geocoder doesn't know about or addresses for which the authoritative location has been improved but the geocoder doesn't know about either. We are also receiving an updated road network (Integrated Transportation Network) on a monthly basis.

Automating the geocoder data integration process will allow Location Services to maintain a monthly update schedule and respond to emergency updates in a timely fashion (e.g., one or two days). It will shorten the data integration process from days to hours. Note that a change to the road network necessitates a geocoder data integration (e.g., due to potential road segment name changes).

More detail on the geocoder data integration process is available here.

Devise a way to identify multiple sites on the same JUROL that have the same civic number but different street names

@mraross commented on Wed Sep 14 2016

One way is to find accessPoints that are associated with more than one address. You need to exclude sub-units. For example, 1004, 1008, 1104, 1108, and 1100 Pavillion-Clinton Rd, Clinton all share the same accessPoint.


@gleeming commented on Wed Sep 14 2016

To even start, this will require us to resurrect a way to load the current data formats used in the now database-free geocoder environment into a database.


@mraross commented on Wed Sep 14 2016

Ok, will investigate ways of scripting this internally without a db.


@cmhodgson commented on Wed Sep 14 2016

Nitpicky but important: they do not share the same "accessPoint". Their accessPoints are at the same location. I guess it's fair to say they share the same "access point" (note the space and lack of camelCase).

I make these accessPoints in the rangeGen, we don't really need to find them after the fact. They end up at the same location because they are "off the end" of the Street's centerline linestring. Certainly, MOST accessPoints that are offset from the end-point of the Street's lineSegment probably do not belong there.


@gleeming commented on Wed Sep 14 2016

I suspect many of them are actually legitimate where a street ends in a cul-de-sac.

I have a semi-working database environment in PostGIS that can ultimately be used to load the new data formats; it still needs more tweaking. About 10% of all APs share the same coordinate with at least one other AP (the query also restricts to the same locality ID and street segment ID for a fast response). Note that subsites currently don't have their own APs because custodians don't supply them, so the counts relate to sites only. About 3,500 records fall into clusters where 50 or more APs share the same location.

SELECT access_point_id, site_id, ap.locality_id, ap.street_segment_id, civic_number, count, ap.geometry FROM
(SELECT * FROM
(SELECT ap.geometry, locality_id, street_segment_id, count(*) AS count FROM bgeo.bgeo_access_points ap
LEFT JOIN bgeo.bgeo_sites s ON ap.site_id = s.site_id
WHERE s.parent_site_id IS NULL
GROUP BY locality_id, street_segment_id, ap.geometry) AS foo
WHERE count > 1) AS foo2
LEFT JOIN bgeo.bgeo_access_points ap ON ap.geometry = foo2.geometry
AND ap.locality_id = foo2.locality_id AND ap.street_segment_id = foo2.street_segment_id
ORDER BY ap.geometry, ap.locality_id, ap.street_segment_id;


@mraross commented on Wed Sep 14 2016

Perhaps restricting to APs that share the same point with three or more APs would narrow it down a bit.


@gleeming commented on Wed Sep 14 2016

That cuts the number of cases in half, but there are still far too many to be useful for review. I could try cross-referencing against dead-end street cases and ignoring them, but this may still not be good enough.

The worst offender is 143 APs at the same location around 1822 Purcell Way, North Vancouver. They are mid-block because all the custodian records for this block complex have the same site location. It isn't a dead-end case; they just all extrapolate to the same mid-block location from their common starting geometry.


@gleeming commented on Tue Sep 27 2016

Here's a revised query that ignores cases on dead-end segments. It cuts the number of cases in half, but with over 60k results it is still too large for a broad review. As previously noted, looking for the largest clusters may provide the best starting point.

SELECT access_point_id, site_id, ap.locality_id, ap.street_segment_id, civic_number, count, ap.geometry FROM
(SELECT ap.geometry, locality_id, street_segment_id, count(*) AS count FROM bgeo_access_points ap
LEFT JOIN bgeo_sites s ON ap.site_id = s.site_id
WHERE s.parent_site_id IS NULL
GROUP BY locality_id, street_segment_id, ap.geometry) AS foo
LEFT JOIN bgeo_access_points ap ON ap.geometry = foo.geometry
AND ap.locality_id = foo.locality_id AND ap.street_segment_id = foo.street_segment_id
LEFT JOIN bgeo_street_segments ss ON ss.street_segment_id = ap.street_segment_id
LEFT JOIN bgeo_street_intersections si1 ON si1.street_intersection_id = ss.start_intersection_id
LEFT JOIN bgeo_street_intersections si2 ON si2.street_intersection_id = ss.end_intersection_id
WHERE count > 1 AND si1.degree > 1 AND si2.degree > 1
ORDER BY ap.geometry, ap.locality_id, ap.street_segment_id;

Geocode Results with the same score but fewer 0-penalty faults should come first

This issue is highlighted by the fact that these two addresses do not get de-duplicated in the address prep process:

  1. 7931 Hwy 97 N, Kelowna, BC
  2. CAMS RACE TRACK GAS -- 7931 Hwy 97 N, Kelowna, BC

The problem is that the 0-point penalty for STREET_DIRECTION.notMatchedInHighway means that there is no way to prefer an exact match of "Hwy 97 N" over the slightly incorrect "Hwy 97" with the 0-point fault. This duplicate address bug happens in the address prep phase, but this same problem is also evident at the user level, as the order of such matches is not consistent.

A possible way to fix this is to internally add a .01 point penalty for any 0-point fault. All external views would still see the integer value of the score, but the internal sorting mechanism would then be sure to put the better matches first.
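A rough sketch of that tie-breaking idea, assuming matches expose an integer score and a count of zero-penalty faults (class and field names are illustrative):

// Sketch: externally reported score stays an integer; internally, each
// zero-penalty fault subtracts 0.01 so exact matches sort first.
import java.util.Comparator;
import java.util.List;

public class MatchOrdering {
    record Match(String fullAddress, int score, int zeroPenaltyFaults) {}

    static double internalScore(Match m) {
        return m.score() - 0.01 * m.zeroPenaltyFaults();
    }

    public static void main(String[] args) {
        List<Match> matches = List.of(
                new Match("7931 Hwy 97 N, Kelowna, BC", 100, 0),
                new Match("7931 Hwy 97, Kelowna, BC", 100, 1));
        List<Match> ordered = matches.stream()
                .sorted(Comparator.comparingDouble(MatchOrdering::internalScore).reversed())
                .toList();
        System.out.println(ordered.get(0).fullAddress()); // exact "Hwy 97 N" first
    }
}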

Bug fix requested by DataBC to improve user experience.

Add support for hwy and freeway exits as intersections

@mraross commented on Mon May 27 2019

Highway and freeway exits should be recognized as a type of intersection by the geocoder and route planner since they are part of the road network.
BC Wildfire Service Fire Reporting Centre needs to be able to find a location by hwy exit number.

There are a couple of ways to do this:

  1. Extend ITN data prep to create the appropriate exit street aliases.
  2. Extend the definition of Intersection to support exits.

Option 1 seems simpler at this point.

  1. In ITN data prep, create street aliases for off-ramps in the following form:

    Hwy <routeNo> and Exit <exitNo>
    

Here's an example:

        Hwy 1 and Exit 305

This form doesn't distinguish between the two directions of travel (e.g., northbound/southbound). This would require an addition as in:

               Hwy <routeNo> and Exit <exitNo> <direction>

and here's another example:

               Hwy 1 and Exit 305 Eastbound

With this intersection format you can enter:

                Hwy 1 and exit 305

and the geocoder in autocompletion mode should return:

                Hwy 1 and Exit 305 Eastbound
                Hwy 1 and Exit 305 Westbound

It might be useful to use the keyword "at" instead of "and" in the standard form of an exit intersection. For example:

                Hwy 1 at Exit 305 Eastbound

In any case, allowing "at" as an alias for "and" would be helpful for both exit intersections and street intersections.
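A minimal sketch of the option 1 alias construction; the inputs are assumptions about what ITN prep has available:

// Sketch: build exit street aliases from route number, exit number, and direction.
public class ExitAlias {
    static String alias(String routeNo, String exitNo, String direction) {
        String base = "Hwy " + routeNo + " and Exit " + exitNo;
        return direction == null ? base : base + " " + direction;
    }

    public static void main(String[] args) {
        System.out.println(alias("1", "305", null));        // Hwy 1 and Exit 305
        System.out.println(alias("1", "305", "Eastbound")); // Hwy 1 and Exit 305 Eastbound
    }
}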

  2. Extend the intersection feature in the geocoder data model to support exit intersections. Minimally, this would require the following additional intersection properties:

         routeNumber - provincial route number; municipal freeways may not have a route number
         routeName - provincial route name (e.g., Trans Canada highway)
         localName - freeway name or local name of provincial highway (e.g., Ginger Goodwyn Way)
         exitNo - exit number of intersection of route and start point of off-ramp.
    

This would allow intersection searches by routeNumber, routeName, and localName.

The geocoder needs to be enhanced to support exit intersections so that a query such as:

Exit 116, Hwy 1

would return two exits:

Exit 116 EB, Hwy 1 to Chilliwack
Exit 116 WB, Hwy 1 to Chilliwack

  3. Extend the intersections resource to support exits. Here are some options:

    1. Don't add new Intersection properties. Just set intersectionName appropriately as in:
      "Exit 104 EB, Hwy 1 to Chilliwack"

    2. Add two new properties: isExit: Boolean and exitTo: Text as in:
      intersectionName="Exit 104 EB, Hwy 1"
      isExit=True
      exitTo: Chilliwack

    3. Go completely structured as in:
      exitNumber: Integer
      exitNumberSuffix: Character
      exitDirection: String
      exitTo: String

Add support for electoral areas at the matchPrecision level of locality

If you geocode:

3450 Cobble Hill Rd, Cobble Hill, BC

the correct electoral area will be returned:

 CVRD Electoral Area C (Cobble Hill)

If you geocode:

Cobble Hill, BC

you get no electoral area.

Electoral area is particularly important for non-civic occupants as they tend to be located in unincorporated areas.

This feature is needed by Ministry of Finance to determine the correct tax jurisdiction for such taxes as B&B tax and speculation tax.

Some addresses in sites_databc.csv aren't getting integrated into geocoder address load files

Not sure why. Each of the following rows shows the known location, the source address, and the fullAddress returned by the geocoder in delivery.

[50.65502,-120.37081] 1320 Trans-Canada Hwy W,Kamloops,BC => 1320 Holman Rd, Kamloops, BC

[49.51412,-115.76217] 201 14th Ave N,Cranbrook,BC => 201 14th Ave N, Cranbrook, BC {block}

[54.04971,-128.65019] 701 Mountainview Sq,Kitimat,BC => Mountainview Sq, Kitimat, BC

[49.25400,-123.23646] 5968 Webber Lane,Chilliwack,BC => Chilliwack, BC

[54.05365,-128.65186] 327 City Centre,Kitimat,BC => City Ctr, Kitimat, BC

[54.05051,-128.65025] 535 Mountainview Sq,Kitimat,BC => Mountainview Sq, Kitimat, BC

[54.05051,-128.65025] 570 Mountainview Sq,Kitimat,BC => Mountainview Sq, Kitimat, BC

Add support for ignoring a care-of (c/o) name element embedded in an address

Given the following address:
c/o Mary Smith, 1000 Main St, Rosebug, BC

the geocoder should return:

 1000 Main St, Rosebug, BC

and a fault of: CARE_OF.ignored:1

There are 9,907 MCFD addresses that contain c/o elements in their children-in-care list.
Advanced Ed also raised this issue, since student addresses sometimes include such data and it currently requires manual labour to edit it out.
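A minimal sketch of the kind of pre-parse step this could be, assuming a regex-based filter; the pattern and fault handling are assumptions:

// Sketch: strip a leading care-of element before parsing and report that it was ignored.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CareOfFilter {
    // Matches a leading "c/o <name>," (case-insensitive); the pattern is an assumption.
    private static final Pattern CARE_OF = Pattern.compile("^\\s*c/o\\s+[^,]+,\\s*", Pattern.CASE_INSENSITIVE);

    static String stripCareOf(String address) {
        // A CARE_OF.ignored:1 fault would be raised when the pattern matches.
        Matcher m = CARE_OF.matcher(address);
        return m.find() ? m.replaceFirst("") : address;
    }

    public static void main(String[] args) {
        System.out.println(stripCareOf("c/o Mary Smith, 1000 Main St, Rosebug, BC"));
        // -> 1000 Main St, Rosebug, BC
    }
}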

Add sites/within/count that returns the number of sites within a given bbox

This resource allows an application to know if a request for all sites within a given bbox will exceed the maximum results allowed.

Parameters include excludeUnits, which excludes subsites, and onlyCivic, which includes only civic addresses.

Needed by GeoBC to inform their GIS analysts when there are more results than they can see.

Add a resource that returns all occupants at a given address

@cmhodgson commented on Mon Jun 26 2017

Presently if you input the following addressString to /occupants/addresses:
unit 207 -- 5462 Trans-Canada Hwy, Duncan, BC

You get a UNIT level match returned for that exact address. However, if you input the following:
WOLF unit 207 -- 5462 Trans-Canada Hwy, Duncan, BC

You get OCCUPANT level matches for the following (among others):
Sun Wolf Earthworks Ltd., UNIT 207 -- 5462 Trans-Canada Hwy, Duncan, BC
Iron Wolf Bridge Bridge and Steel, UNIT 207 -- 5462 Trans-Canada Hwy, Duncan, BC

Given that you are using the /occupants/ API, perhaps it would be preferable to return the occupant(s) at the site/unit/civic number matched, rather than the site/unit/civic number matches, if there are such occupant(s).

This would allow us to answer the question "are there any known occupants at this address", which we cannot answer as it is now.

Currently there is no in-memory link between a site and any occupants of that site, the link goes the other way from occupant to site. But we could easily add this link at the cost of an extra pointer for every site.


@mraross commented on Wed Aug 30 2017

Is that a cost of one pointer per occupant at a given address?


@mraross commented on Wed Aug 30 2017

Perhaps occupants at a given address should be a separate resource as in:
occupants/atAddress?addressString="unit 207 - 5462 Trans-Canada Hwy, Duncan, BC"

OR maybe occupants at a given site as in:

occupants/atSite/{siteID}

to separate the process of geocoding an address from looking up a site's occupants.
So, first you geocode unit 207 -- 5462 Trans-Canada Hwy, Duncan, BC to get the best matching site then you get the occupants of that site.

This could be considered reverse occupant.


@cmhodgson commented on Fri Sep 01 2017

I was thinking that there would be at most one occupant per site, otherwise there would be some sub-site structure missing? So each site would have a pointer to a potential occupant. If a site can potentially have multiple occupants, then we would also need a list object for each site with any occupants, and a pointer to each of the occupants, which would likely double the additional memory requirements to support this.

Add sites/within/random which returns a number of sites chosen at random

@mraross commented on Mon Dec 18 2017

It has the same parameters as sites/within. This is currently implemented in Location Services in Action and has proved so useful in QA and outreach that it should be in the API for all to benefit.

Needed by DataBC to support QA, outreach, and geocoder evaluations by prospective clients.


@mraross commented on Tue Feb 13 2018

Instead of a maxDistance parameter, the resource could autocalibrate maxDistance as follows:

if there is only one civic address within bbox then return civic address endif

if there are no civic addresses then return error "No civic addresses within requested bbox" endif

Try 10 random point searches with maxDistance=randomRadius (metres)
if civic address found then return civic address endif

Try 10 random point searches with maxDistance=randomRadius * 10 (metres)
if civic address found then return civic address endif

Try 80 random point searches with maxDistance=randomRadius * 100 (metres)
if civic address found then return civic address endif

Return error "No civic address found"


@mraross commented on Tue Feb 13 2018

Instead of the resource parameter &xmdx, add support for a global config variable called randomRadius that has a default value of 30 (metres) and can be managed from the geocoder admin app. randomRadius should be restricted to a value between 10 and 90.

Add an occupantName.ignored fault with zero penalty when occupant name is included in an addresses request

Given the following addressString in a geocoder.api.gov.bc.ca/addresses request:

StitchNChat Club ** 1175 douglas st, victoria, bc

the geocoder correctly identifies and ignores the occupant name and returns a fullAddress of:
1175 douglas st, victoria, bc

It should also return the following fault with a penalty of 0:

OCCUPANT_NAME_IGNORED

Requested by DataBC to improve error reporting for all developers.

in geocoder, add serviceCentres/nearestByRoad resource

@mraross commented on Mon Mar 20 2017

Assessors at BC Assessment need to know the occupants nearby that would affect a property's value (e.g., railyard, school, firehall).

This will require the implementation of the roadVoronoi resource in the route planner.
Input parameters should include accessPoint, which should be where the driveway meets the road; it will be snapped to the nearest road.


@cmhodgson commented on Tue Mar 21 2017

I'm not too sure how you visualize this working. The geocoder has the occupants, the router has the roads. You could implement this right now by calling:

geocoder/occupants/within?center=<property location>&distance=<maximum distance of concern> 

and then call

router/distance/betweenPairs?fromPoints=<propertyLocation>&toPoints=<the points returned in the occupants query>

This is literally 5 to 10 lines of JavaScript on the client side. If we speed up betweenPairs by using the "marching outward" algorithm then this would work fantastically for even large numbers of nearby occupants. The client would even have the flexibility to use different distances for different occupant types (tags) by just adding tags to the first query and doing it a few times. Maybe it only matters if you are within 500m of a coffee shop, but 5km of a sewage treatment plant ;)


@mraross commented on Tue Mar 21 2017

The flaw in your plan is that there are road network layouts in which occupants/within will not return all the needed occupants for step 2 (think city with a river running through it). This is what we discovered in the ACDF and why we abandoned the two-step plan (aka the unholy mixing of metrics). I will be adding a roadVoronoi enhancement to the route planner soon which will explain how nearestByRoad can work with no failure cases.


@mraross commented on Thu Apr 27 2017

All nearestByRoad computations should be done offline during route/address data prep. This will work with our current monthly update cycle but certainly won't work if we have daily road, address, and occupant updates.


@cmhodgson commented on Fri Apr 28 2017

I assume you mean this to work as the inverse of the occupant service area boundary - this works as long as the service areas are purely distance-based. If the service areas are calculated using the population in some way, then we need multiple different kinds of "service areas" pre-computed and stored, per segment. If you have many occupant types this can get to be a lot of memory, and a lot of calculation (off-line, so I suppose not a real concern).


@mraross commented on Fri Apr 28 2017

They wouldn't be population based but you do raise an interesting option to support both fastest and shortest metrics. I guess we should ask clients if they had to pick one metric, what would it be.

Add support for a deepMatch parameter

Add support for a logical deepMatch parameter to control whether or not the parser uses heuristics to salvage a poorly matching address. If deepMatch is true, use heuristics. deepMatch is false by default.

Here are a couple of addresses that score poorly in geocoder/addresses:
2187 SOUTH ALDER ST S, CAMPBELL RIVER, BC
MAYFAIR MALL 3147 DOUGLAS ST, VICTORIA, BC

The first has a redundant streetDirection, the second an irrelevant building name. These examples look salvageable.

Heuristics should be applied only after the parser has determined that an input address has no good straightforward match, and only if &deepMatch=true.

The need for heuristics comes from the MOH experience of geocoding all 4.8 million MSP addresses and getting a 6% rejection rate. How long does it take to manually clean up 300,000 addresses? At between one minute and 4 minutes per address, it could take between 5,000 and 20,000 person hours. Implementing automated address salvage could cut this time in half or more.

Insufficient anchor points on segment

@BK01 commented on Wed Aug 14 2019

The correct location of the following address is not found using adaptive or linear interpolation. Rather, a block-level match is found.

4760 Roger St, Port Alberni, BC

Following a review of surrounding sites as well as intersections, we can see that insufficient anchor points are available per segment for successful adaptive interpolation. In particular, the segment of Roger St closest to the address is broken by an emergency lane.


@cmhodgson commented on Wed Aug 14 2019

It is true that there are too many segments with not enough address points. However, it is weird that the access point for 4780 Roger St is so far down the road, rather than being placed on the segment immediately in front of the parcel. I suspect that anchor points from other sources are making this worse instead of better. There is definitely room for smarter address assignment logic in cases where there is a shortage of information (likely only a couple of anchor points and a single parcel point). The extra segment breaks due to the rail crossing and emergency lane work against the post-smoothing logic, as does the apparent offset in addressing between the two sides of the street, and the existence of the same-name side-street branches.


@mraross commented on Wed Aug 14 2019

I added an additional address on each side of the emergency lane (4760 and 4724) to sites_databc.csv. That should give the address range maker a little more grist for the mill.

Create a cloud-ready architecture for the online geocoder to support continuous availability

The BC Address Geocoder's current system architecture is tied to Ministry infrastructure which includes both a fixed hardware platform and a private cloud platform running OpenShift. While the current architecture supports georedundancy and fixed horizontal scaling (multiple nodes), it doesn't support autoscaling.

We need a new, cloud-based, non-proprietary architecture that can do the following:

  1. Autodeploy a set of geocoder nodes/pods in a rolling manner to avoid API service interruption.

  2. Autoscale to handle peak loads gracefully and efficiently.

  3. Be deployed to both the gov't cloud platform and a public cloud that supports Kubernetes.

  4. Simplify geocoder pod deployment by replacing Apache Cassandra with per-pod, file-based storage of geocoder configuration data. The ConfigMap capability of Kubernetes may come in handy here.

  5. Be compatible with any new cloud platform that the Ministry is planning to roll out this year.

Autoscaling automates the deployment and release of compute nodes and storage to effectively shadow demand. This will require use of kubernetes to manage online geocoder deployment to private or public cloud infrastructure.

The need for autoscaling arose in November, 2019 when the online geocoder couldn't handle peak load during the BC Referendum on Proportional Representation. The geocoder became unavailable for twenty minutes while DataBC infrastructure staff manually deployed additional geocoder nodes.

The following tasks will be required:

  1. Design of a new architecture including all kubernetes and other scripts for build and deployment. [DataBC]

  2. Architecture review by geocoder developer

  3. Update online geocoder to support new architecture [geocoder developer]

  4. Deployment testing to ensure geocoder can be correctly deployed without API service interruption. [DataBC]

  5. Volume testing to ensure correct operation of autoscaling. [DataBC]

  6. Cloud cost analysis to determine impact of new architecture on operating costs. [DataBC]

Two addresses for the same parcel: 588 Cedar Pl, Duncan, BC and 588 Cedar Ave, Duncan, BC

@mraross commented on Fri May 05 2017

588 Cedar Pl and 588 Cedar Ave occupy the same parcel.


@mraross commented on Fri May 05 2017

[screenshot: geocoder_703 may 05 12 36]


@mraross commented on Fri May 05 2017

In the screenshot above, 588 Cedar Pl and 588 Cedar Ave occupy the same parcel. They came from two different sources. One of them is wrong.


@cmhodgson commented on Fri May 05 2017

Can you think of a way we would programmatically tell the difference between this and for example your favorite, the Central Building, with valid addresses on 4 streets spanning 3 parcels?


@mraross commented on Fri May 05 2017

The civic numbers are identical in this case. In general, the cases are:

  1. Same civic numbers and suffixes, different something else.

    • worth looking into
  2. Different civic numbers and suffixes, same everything else.

    • probably ok
  3. Different everything.

    • worth looking into

@cmhodgson commented on Fri May 05 2017

We could potentially consider developing an address range QA process that put the points onto the parcel fabric and pointed out all of the potential issues.


Incorrect content-type returned by GET requests for .json content when the origin header is specified

@Darv72 commented on Wed May 29 2019

UPDATE
Confirmed that this bug is seen when the Geocoder is deployed with Tomcat version 7.0.94. I set up a Geocoder node on our test server using Tomcat 7.0.94 as a base and ran some test calls; results:

curl -I -XGET -H "Origin: https://office.refractions.net" http://localhost:9606/addresses.json
HTTP/1.1 403 Forbidden
Server: Apache-Coyote/1.1
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Frame-Options: DENY
Content-Security-Policy: script-src 'self' https://code.jquery.com https://unipear.api.gov.bc.ca 'unsafe-inline' 'unsafe-eval'
Content-Type: text/plain
Content-Length: 0
Date: Wed, 29 May 2019 22:05:58 GMT

curl -I -XGET -H "Origin: https://office.refractions.net" http://localhost:9606/addresses.geojson
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Frame-Options: DENY
Content-Security-Policy: script-src 'self' https://code.jquery.com https://unipear.api.gov.bc.ca 'unsafe-inline' 'unsafe-eval'
Content-Type: application/vnd.geo+json;charset=UTF-8
Transfer-Encoding: chunked
Date: Wed, 29 May 2019 22:06:24 GMT

DETAILS
Currently observable via the geocoderdlv.api.gov.bc.ca and geocodertst.api.gov.bc.ca instances

This has been patched for now by configuring the nested proxy entry with a request transformer plugin which removes the origin header from the request. This results in a correct 200 message.

The underlying issue appears to be a problem with the application returning an incorrect content-type for .json requests. These requests are being returned as plain text instead of JSON. The result is the same if a GET request is made directly to the Tomcat instance, bypassing the proxy.

Examples:

GET request for .json format
curl -I -XGET -H "Origin: https://office.refractions.net" https://geocodertst.api.gov.bc.ca/addresses.json
HTTP/1.1 403 Forbidden
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
Connection: keep-alive
Server: Apache-Coyote/1.1
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Frame-Options: DENY
Content-Security-Policy: script-src 'self' https://code.jquery.com https://unipear.api.gov.bc.ca 'unsafe-inline' 'unsafe-eval'
Date: Wed, 29 May 2019 21:06:22 GMT
X-RateLimit-Limit-minute: 10000
X-RateLimit-Remaining-minute: 9999
X-Kong-Upstream-Latency: 3
X-Kong-Proxy-Latency: 14
Via: kong/1.1.1
Access-Control-Allow-Origin: https://office.refractions.net
Access-Control-Allow-Credentials: true
Access-Control-Expose-Headers: Origin,Authorization,Access-Control-Allow-Origin,Access-Control-Allow-Methods,apikey

GET request for geojson
curl -I -XGET -H "Origin: https://office.refractions.net" https://geocodertst.api.gov.bc.ca/addresses.geojson
HTTP/1.1 200 OK
Content-Type: application/vnd.geo+json;charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Server: Apache-Coyote/1.1
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Frame-Options: DENY
Content-Security-Policy: script-src 'self' https://code.jquery.com https://unipear.api.gov.bc.ca 'unsafe-inline' 'unsafe-eval'
Date: Wed, 29 May 2019 21:07:48 GMT
X-RateLimit-Limit-minute: 10000
X-RateLimit-Remaining-minute: 9999
X-Kong-Upstream-Latency: 6
X-Kong-Proxy-Latency: 5
Via: kong/1.1.1
Access-Control-Allow-Origin: https://office.refractions.net
Access-Control-Allow-Credentials: true
Access-Control-Expose-Headers: Origin,Authorization,Access-Control-Allow-Origin,Access-Control-Allow-Methods,apikey

GET request for .json bypassing both proxies
curl -I -XGET -H "Origin: https://office.refractions.net" https://blahblahblah.pathfinder.gov.bc.ca/addresses.json
HTTP/1.1 403 Forbidden
Server: Apache-Coyote/1.1
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Frame-Options: DENY
Content-Security-Policy: script-src 'self' https://code.jquery.com https://unipear.api.gov.bc.ca 'unsafe-inline' 'unsafe-eval'
Content-Type: text/plain
Content-Length: 0
Date: Wed, 29 May 2019 21:09:21 GMT
Set-Cookie: e1fd7fc1519d827d2877f661da3a9231=9c7917238536aaa0c536c3b38daed306; path=/; HttpOnly

The above calls are successful for other file formats as well (for example https://geocodertst.api.gov.bc.ca/addresses.kml). The issue appears to be with the way the Geocoder is handling requests for the .json mime type. In OpenShift, where these instances are deployed, they are using a different version of Tomcat than in prod (the OCP image is using Tomcat 7.0.94, prod is using 7.0.81). I've reviewed the web.xml for Tomcat 7.0.94 and it has a proper mime-mapping for application/json.


@cmhodgson commented on Thu May 30 2019

I think the content-type of text/plain is only the content type of the error message coming back with the 403 Forbidden response - it has no relation to the expected json content. The real question is why we are getting the 403 Forbidden response. I have CORS set up to be wide open. If you don't pass the origin header to the bypassed version, does it return a 200 OK?

Calling our dev server it works fine (it's running Tomcat 8.0.36):
curl -I -XGET -H "Origin: https://office.refractions.net" https://ssl.refractions.net/ols/pub/geocoder/addresses.json
HTTP/1.1 200 OK
Date: Thu, 30 May 2019 18:01:27 GMT
Server: Apache-Coyote/1.1
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Frame-Options: DENY
Content-Security-Policy: script-src 'self' https://code.jquery.com https://unipear.api.gov.bc.ca 'unsafe-inline' 'unsafe-eval'
Access-Control-Allow-Origin: https://office.refractions.net
Access-Control-Allow-Credentials: true
Content-Type: application/json
Transfer-Encoding: chunked

Add tools for occupant/non-civic address management

Currently, we offer data custodians and their data stewards no way of improving the occupants and points of interest they provide to location services. Most of these clients have no tools of their own other than a spreadsheet app and spreadsheet apps don't validate data against a user-defined schema! The DA group spends a substantial amount of time fixing up the addresses in these spreadsheets.

The current set of occupants and non-civic addresses is of low quality, incomplete, and limited to just a few categories of data. This reflects the primary purpose of the Geographic Sites Registry, the current intake tool, which is to load point data that is needed by a map app hosted by DataBC into the BCGW.

With an occupant/non-civic address editor, web service, and datastore, data custodians and stewards can easily improve their data. Occupant/non-civic address data integration should be included in any new automated geocoder data integration tool so that data managers can see their updates take effect within one day, compared to the month or more it currently takes. Also, the time and effort currently required to add new sources of occupant/non-civic address data is eliminated, since the new data schema is fixed to those few metadata attributes actually required by the geocoder. The GSR allows clients to extend the schema to incorporate application-specific attributes.

The occupant/non-civic address web service can be implemented using GeoServer and OGC WFS protocol.

Current geocoder occupant and non-civic address data should be purged and only curated, high-quality address data should be populated, starting with data needed by MOTI to support OnRoute.

These tools will benefit all geocoder data suppliers particularly MOTI, whose OnRoute staff need to be able to add or update places of interest within a single day.

Add support for offline geocoding

@mraross commented on Wed Dec 20 2017

An offline geocoder doesn't need to be hooked up to the web to work. It could take the form of a containerized API or a standalone desktop Java app.

Requested by Elections BC for use onsite at polling stations during elections


@ll911 commented on Wed Jan 03 2018

docker package and allow to run locally or anywhere


@cmhodgson commented on Wed Jan 03 2018

Not sure that a docker package would be an acceptable solution for the clients in mind (on-site elections registration was one?) due to complexities of install and system requirements - however, if the geocoder is to run in its present form this is probably the most straightforward way to package up all of the components.

An alternative approach would be a Java desktop app of some kind: a simpler install and potentially reduced requirements, but it would require much more work to create.

Make site ids immutable in data prep

Problem
Currently, site ids are mutable; they are different every time an address data integration is performed (e.g., currently monthly). If an application makes a geocoder request, hangs on to the site id for days or weeks, then uses the id in another geocoder API request such as sites/{siteID}/subsites, the id will likely be invalid by then. This problem will become more apparent if and when we start more frequent data integrations.

Proposed solution
Copy fullAddress into siteID; it is unique and quite stable. For example, 1175 Douglas St, Victoria, BC doesn't change much from decade to decade.
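A minimal sketch of the proposal, assuming siteID is derived directly from a lightly normalized fullAddress (the normalization shown is an assumption):

// Sketch: a stable identifier derived from the full address rather than a regenerated id.
public class StableSiteId {
    static String siteId(String fullAddress) {
        // Trim, collapse whitespace, and lowercase so minor formatting differences don't change the id.
        return fullAddress.trim().replaceAll("\\s+", " ").toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(siteId("1175 Douglas St, Victoria, BC"));
        // -> 1175 douglas st, victoria, bc
    }
}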


Add support for an address history file

@mraross commented on Wed Nov 15 2017

An address history file contains a table of mappings between a previous address and its current address. For example, Abbotsford Middle School at 2222 Ware St, Abbotsford, BC was torn down and a new one built in its place with a new address of 33231 Bevan Ave, Abbotsford, BC. Another example is the house at 238 Leon Ave, Kelowna, BC, whose owner paid to have its civic number changed to 240.

The address history file for these two examples would look like this:
oldAddress,newAddress
"2222 Ware St, Abbotsford,BC","33231 Bevan Ave, Abbotsford, BC"
"238 Leon Ave, Kelowna, BC","240 Leon Ave, Kelowna, BC"

An address can have more than one previous address, but each oldAddress requires a separate row.

The address history file is to be used during geocoding like an alias address table, attempting to find a match to an old address and returning the new one.
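A minimal sketch of that lookup, assuming the file is loaded into an in-memory map keyed on a normalized old address (the normalization is an assumption):

// Sketch: address history used as an alias table at match time.
import java.util.HashMap;
import java.util.Map;

public class AddressHistory {
    private final Map<String, String> oldToNew = new HashMap<>();

    void add(String oldAddress, String newAddress) {
        oldToNew.put(normalize(oldAddress), newAddress);
    }

    // Returns the current address for an old one, or null if no mapping exists.
    String lookup(String inputAddress) {
        return oldToNew.get(normalize(inputAddress));
    }

    private static String normalize(String a) {
        return a.trim().replaceAll("\\s*,\\s*", ", ").replaceAll("\\s+", " ").toLowerCase();
    }

    public static void main(String[] args) {
        AddressHistory h = new AddressHistory();
        h.add("2222 Ware St, Abbotsford,BC", "33231 Bevan Ave, Abbotsford, BC");
        System.out.println(h.lookup("2222 Ware St, Abbotsford, BC"));
        // -> 33231 Bevan Ave, Abbotsford, BC
    }
}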


@cmhodgson commented on Wed Nov 22 2017

Some concerns about this:

  • There would need to be some way to make it clear to the user what is going on (I guess a fault could explain it)
  • What if the old address has been reused?
  • What if the old address is no longer valid, eg. there is no road segment with an address range including the old civic number? And/or the street name has changed? The "old" addresses would need to be attached to some street segment somewhere in order to be found at all.

For renumbering cases, the linear approximation is going to be pretty close to the right address anyway. For cases involving moving the access to a different street, it seems awfully confusing and, again, the old address will get you to the right place; you may just need to turn the corner to get to the access ... if you are looking for a business then the name-based lookup will give you the current address.

In addresses resource, add a value property to a fault record

Given the following address:

1175 Douglas St, Saanich, BC

the geocoder will return the following address:

1175 Douglas St, Victoria, BC

and the following faults:

"faults":[{"element":"LOCALITY","fault":"isAlias","penalty":1}]

It would be more helpful to include the aliased locality as follows:

"faults":[{"element":"LOCALITY","fault":"isAlias","penalty":1,"value":"Saanich"}]

This enhancement affects the addresses and occupants resources.

Needed by Elections BC to get the aliased locality a user entered, which they are obliged to record under the Elections Act. This approach could also extend to other faulty entered values, such as postal code, which could be of interest to all clients.

In geocoder, add support for error messages in json

Currently, all error responses are in html format.

Change error handling so that:

  1. A request for HTML or XHTML output returns an error in HTML format (as currently supported).
  2. A request for KML output returns an error in KML format (as currently supported).
  3. A request for json, geojson, CSV, or ShapeFile output returns an error in JSON format.

This is needed by JSON developers since JSON is the format they expect error objects to be in.

I propose we adopt the Google json error object as defined in:

https://google.github.io/styleguide/jsoncstyleguide.xml?showone=error#error

So, for example, a request for a non-existent siteID and json output will return an HTTP Response code of 404 and the following JSON error object:

{
  "apiVersion": "3.4",
  "error": {
    "code": 404,
    "message": "Site Not Found"
  }
}

Make logging configurable using environment variables

Previously, log4j properties files were selected based on the environment the package was built for. The environment-specific packaging has been removed in favor of environment-variable-based configuration, which works better within Docker containers on k8s. The logging configuration should be done in the same way, likely with two environment variables: one for location and one for level. Perhaps we can switch to logback or log4j2 for output, especially if it makes this configuration easier. Logging calls would remain based on slf4j as always.
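A minimal sketch of the idea, assuming two environment variables whose names are placeholders:

// Sketch: read log location and level from the environment and pass them to the
// logging backend as system properties before the logger context initializes.
public class LogConfig {
    public static void main(String[] args) {
        String dir = System.getenv().getOrDefault("OLS_LOG_DIR", "/var/log/ols-geocoder");
        String level = System.getenv().getOrDefault("OLS_LOG_LEVEL", "INFO");
        // A logback/log4j2 configuration could then reference ${ols.log.dir} and ${ols.log.level}.
        System.setProperty("ols.log.dir", dir);
        System.setProperty("ols.log.level", level);
    }
}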

Some site-name geocoder queries are very slow

The address:
Lake Cowichan Fire Hall And Ambulance Station -- 3 East North Shore Rd, Lake Cowichan, BC

Takes a very long time (24 sec on current prod, 84 sec on our devel server) to return a result.

It seems to generate an excessive number of parse derivations (375,877) which then become 847,820 matches, which is reduced to 34,730 after removing duplicates.

There are other examples of slow site-name based queries but this is the worst one found so far.

Needed by all clients to improve worst-case performance and user experience.

Create a process to salvage Hwy Hwy and Highway Hwy cases in AddressBC data

@gleeming commented on Mon Jun 12 2017

Until these double highway names are fixed in the source data, we can recover the probable highway number based on address point proximity to ITN highway route segments. An example is "840 HIGHWAY HWY Kaledon BC" which is close to a Hwy 3A feature in ITN and could be renamed to 840 HWY 3A.

It may be easiest to manually create a static route layer to act as the spatial match source. Highways don't change much so this should suffice. The alternative is to set up a dynamic link to the current ITN data, but this would add a lot of likely unnecessary processing to the FME batch prep script. Once the preferred approach has been discussed and agreed upon, an estimate for completing this work can be made.

Needed by all clients to have as large a reference address list as possible in the geocoder.

Improve handling of addresses outside the province

The Ministry of Advanced Education has out-of-province addresses that are being mangled by the current geocoder. When the geocoder is faced with an address that is outside BC, it tries to interpret it as an address somewhere within BC, sometimes to hilarious effect. For example:

13 oakwood ave toronto on

becomes:

Premier, BC

Ministry of Advanced Education would be happy if an out of province address returned a fullAddress of ON, CA with a matchPrecision of Province. If an input address is outside of Canada, fullAddress should return the ISO Alpha-2 country code and a matchPrecision of Country.

Here is one approach to the problem:

Add a global populated places table to the current geocoder, load it from geonames.org, and add support for a parameter that indicates address may be located outside of BC. Here are some details:

scopeGlobal is a new parameter that if true, indicates addressString might be located outside the jurisdiction of the geocoder (e.g., another province, another country).

if scopeGlobal=false (the default), the geocoder assumes address is located within the geocoder's jurisdiction (e.g., BC). This is the current geocoder's behavior.

if scopeGlobal=true, the geocoder will check if the input address is located within a province other than BC or a country other than Canada. If the input address is found outside of BC but within Canada, return a fullAddress of the ISO subCountry code (e.g., ON) plus the ISO Alpha2 Country code (e.g., CA) and a matchPrecision of Province. If input address is found outside of Canada, return a fullAddress of the ISO country code and a matchPrecision of Country. Also raise an address.outsideJurisdiction fault with a penalty of 1 and a match precision of Province or Country.

We could also set lat/lon to province or country point. This will require province and country location tables.

We could recognize only ISO country and sub-country codes and rely on abbreviation mappings to handle common country names such as Canada, Japan, South Korea, China, and the United States of America.
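A rough sketch of the scopeGlobal fallback described above; the populated-places map and the result type are assumptions, and a real implementation would load the table from geonames.org data:

// Sketch: when scopeGlobal=true and the locality is found outside BC, return a
// Province- or Country-precision match instead of forcing a BC interpretation.
import java.util.Map;
import java.util.Optional;

public class GlobalScopeFallback {
    record OutsideMatch(String fullAddress, String matchPrecision) {}

    // Maps a populated-place name to "subCountry,country" ISO codes (sample entries only).
    private final Map<String, String> places = Map.of(
            "toronto", "ON,CA",
            "seattle", "WA,US");

    Optional<OutsideMatch> match(String localityToken, boolean scopeGlobal) {
        if (!scopeGlobal) {
            return Optional.empty(); // current behaviour: assume the address is in BC
        }
        String codes = places.get(localityToken.toLowerCase());
        if (codes == null) {
            return Optional.empty();
        }
        String[] parts = codes.split(",");
        if (parts[1].equals("CA")) {
            // Outside BC but within Canada: subCountry + country, Province precision.
            return Optional.of(new OutsideMatch(parts[0] + ", " + parts[1], "Province"));
        }
        // Outside Canada: country code only, Country precision.
        return Optional.of(new OutsideMatch(parts[1], "Country"));
    }
}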

Prepare geocoder for release under the Apache 2.0 open source license

  1. Perform code provenance
  2. Add appropriate license info to all source code
  3. Create ols-geocoder repo and load all source code and docs
  4. Create test road and address datasets for use in verifying correct deployment
  5. Write readme.md including build and deployment instructions. Include instructions on how to fork and modify the Location Services in Action app to work with third-party deployments of geocoder/route planner.

Incorrect content-type returned by GET requests for .json content when the origin header is specified

@Darv72 commented on Wed May 29 2019

UPDATE
Confirmed that this bug is seen when Geocoder is deployed with Tomcat version 7.0.94. I set up a Geocoder node on our test server using Tomcat 7.0.94 as a base and ran some test calls; results:

curl -I -XGET -H "Origin: https://office.refractions.net" http://localhost:9606/addresses.json
HTTP/1.1 403 Forbidden
Server: Apache-Coyote/1.1
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Frame-Options: DENY
Content-Security-Policy: script-src 'self' https://code.jquery.com https://unipear.api.gov.bc.ca 'unsafe-inline' 'unsafe-eval'
Content-Type: text/plain
Content-Length: 0
Date: Wed, 29 May 2019 22:05:58 GMT

curl -I -XGET -H "Origin: https://office.refractions.net" http://localhost:9606/addresses.geojson
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Frame-Options: DENY
Content-Security-Policy: script-src 'self' https://code.jquery.com https://unipear.api.gov.bc.ca 'unsafe-inline' 'unsafe-eval'
Content-Type: application/vnd.geo+json;charset=UTF-8
Transfer-Encoding: chunked
Date: Wed, 29 May 2019 22:06:24 GMT

DETAILS
Currently observable via the geocoderdlv.api.gov.bc.ca and geocodertst.api.gov.bc.ca instances

This has been patched for now by configuring the nested proxy entry with a request transformer plugin that removes the Origin header from the request. This results in a correct 200 response.

The underlying issue appears to be a problem with the application returning an incorrect content-type for .json requests. These requests are being returned as plain text instead of JSON. The result is the same if a GET request is made directly to the Tomcat instance, bypassing the proxy.

Examples:

GET request for .json format
curl -I -XGET -H "Origin: https://office.refractions.net" https://geocodertst.api.gov.bc.ca/addresses.json
HTTP/1.1 403 Forbidden
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
Connection: keep-alive
Server: Apache-Coyote/1.1
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Frame-Options: DENY
Content-Security-Policy: script-src 'self' https://code.jquery.com https://unipear.api.gov.bc.ca 'unsafe-inline' 'unsafe-eval'
Date: Wed, 29 May 2019 21:06:22 GMT
X-RateLimit-Limit-minute: 10000
X-RateLimit-Remaining-minute: 9999
X-Kong-Upstream-Latency: 3
X-Kong-Proxy-Latency: 14
Via: kong/1.1.1
Access-Control-Allow-Origin: https://office.refractions.net
Access-Control-Allow-Credentials: true
Access-Control-Expose-Headers: Origin,Authorization,Access-Control-Allow-Origin,Access-Control-Allow-Methods,apikey

GET request for geojson
curl -I -XGET -H "Origin: https://office.refractions.net" https://geocodertst.api.gov.bc.ca/addresses.geojson
HTTP/1.1 200 OK
Content-Type: application/vnd.geo+json;charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Server: Apache-Coyote/1.1
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Frame-Options: DENY
Content-Security-Policy: script-src 'self' https://code.jquery.com https://unipear.api.gov.bc.ca 'unsafe-inline' 'unsafe-eval'
Date: Wed, 29 May 2019 21:07:48 GMT
X-RateLimit-Limit-minute: 10000
X-RateLimit-Remaining-minute: 9999
X-Kong-Upstream-Latency: 6
X-Kong-Proxy-Latency: 5
Via: kong/1.1.1
Access-Control-Allow-Origin: https://office.refractions.net
Access-Control-Allow-Credentials: true
Access-Control-Expose-Headers: Origin,Authorization,Access-Control-Allow-Origin,Access-Control-Allow-Methods,apikey

GET request for .json bypassing both proxies
curl -I -XGET -H "Origin: https://office.refractions.net" https://blahblahblah.pathfinder.gov.bc.ca/addresses.json
HTTP/1.1 403 Forbidden
Server: Apache-Coyote/1.1
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Frame-Options: DENY
Content-Security-Policy: script-src 'self' https://code.jquery.com https://unipear.api.gov.bc.ca 'unsafe-inline' 'unsafe-eval'
Content-Type: text/plain
Content-Length: 0
Date: Wed, 29 May 2019 21:09:21 GMT
Set-Cookie: e1fd7fc1519d827d2877f661da3a9231=9c7917238536aaa0c536c3b38daed306; path=/; HttpOnly

The above calls are successful for other file formats as well (for example https://geocodertst.api.gov.bc.ca/addresses.kml). The issue appears to be with the way the Geocoder handles requests for the .json mime type. In OpenShift, where these instances are deployed, a different version of Tomcat is used than in prod (the OCP image uses Tomcat 7.0.94; prod uses 7.0.81). I've reviewed the web.xml for Tomcat 7.0.94 and it has a proper mime-mapping for application/json.


@cmhodgson commented on Thu May 30 2019

I think the content-type of text/plain is only the content type of the error message coming back with the 403 Forbidden response; it has no relation to the expected JSON content. The real question is why we are getting the 403 Forbidden response at all. I have CORS set up to be wide open. If you don't pass the Origin header to the bypassed version, does it return a 200 OK?

Calling our dev server, it works fine (it's running Tomcat 8.0.36):
curl -I -XGET -H "Origin: https://office.refractions.net" https://ssl.refractions.net/ols/pub/geocoder/addresses.json
HTTP/1.1 200 OK
Date: Thu, 30 May 2019 18:01:27 GMT
Server: Apache-Coyote/1.1
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Frame-Options: DENY
Content-Security-Policy: script-src 'self' https://code.jquery.com https://unipear.api.gov.bc.ca 'unsafe-inline' 'unsafe-eval'
Access-Control-Allow-Origin: https://office.refractions.net
Access-Control-Allow-Credentials: true
Content-Type: application/json
Transfer-Encoding: chunked
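
For reference, a wide-open CORS configuration in Spring MVC typically looks like the sketch below. This is illustrative only and is not necessarily the exact configuration used in the geocoder; an empty-bodied 403 like the ones above is the usual symptom when a request's Origin header is rejected before the handler runs.

import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.CorsRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

// Illustrative "wide open" CORS configuration for Spring MVC; if the 403 above
// came from CORS, the effective configuration on that instance differs from this.
@Configuration
public class CorsConfig implements WebMvcConfigurer {

    @Override
    public void addCorsMappings(CorsRegistry registry) {
        registry.addMapping("/**")        // apply to every endpoint, including /addresses.json
                .allowedOrigins("*")      // accept any Origin header
                .allowedMethods("GET", "POST", "OPTIONS");
    }
}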

In address data prep, make site/address ids immutable

@mraross commented on Mon Jun 03 2019

Problem
Currently, site ids are mutable; they are different every time an address data integration is performed (currently monthly). If an application makes a geocoder request, hangs on to the site id for days or weeks, and then uses the id in another geocoder API request such as sites/{siteID}/subsites, the id will likely be invalid by then. This problem will become more apparent if and when we start more frequent data integrations.

Proposed solution
Make site ids immutable. The simplest way to do this is to keep a permanent table of site ids keyed by fullAddress. During each data integration, look up a fullAddress to see if there is already an id for it. If no id is found, create a new id and add an entry to the table for next time. We can also keep track of the date the site id was created and the last time it was used in a data integration.
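
A minimal sketch of the lookup-or-create rule, using an in-memory map as a stand-in for the proposed permanent table; the type names and UUID-based ids below are illustrative assumptions, not the actual data prep implementation:

import java.time.LocalDate;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of an immutable site-id registry keyed by fullAddress. In practice this
// would be a permanent database table consulted during each data integration;
// the in-memory map here only illustrates the lookup-or-create rule.
public class SiteIdRegistry {

    public record SiteIdRecord(UUID siteId, LocalDate created, LocalDate lastUsed) {}

    private final Map<String, SiteIdRecord> idsByFullAddress = new ConcurrentHashMap<>();

    // Called once per site per integration run.
    public UUID resolve(String fullAddress, LocalDate integrationDate) {
        SiteIdRecord rec = idsByFullAddress.compute(fullAddress, (addr, existing) -> {
            if (existing == null) {
                // first time this fullAddress is seen: mint a new permanent id
                return new SiteIdRecord(UUID.randomUUID(), integrationDate, integrationDate);
            }
            // keep the existing id forever; just record that it was used again
            return new SiteIdRecord(existing.siteId(), existing.created(), integrationDate);
        });
        return rec.siteId();
    }
}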

In Geocoder, add a sites/withinParcel resource that returns all sites within a given parcel

@mraross commented on Tue Mar 21 2017

Reverse geocoding is an ill-defined operation. Sometimes you're driving along a street at night and want to know the address of the house on the right; for that, the nearest accessPoint will work fine. Sometimes you have a list of rooftop points and want to know the associated addresses; for this, the nearest parcelPoint isn't always going to work. For example, a building may be near the corner of a parcel, and the nearest parcelPoint may be in the neighbouring parcel. In this case, it's better to find the parcel that contains the rooftop point, then find the site or sites associated with that parcel. sites/withinParcel supports this second case. The BC Ministry of Agriculture has a need to find the address of a farm based on some point in a parcel.

sites/withinParcel takes a parcel id (aka pid) and returns all the sites within the parcel associated with that pid. ParcelMapBC will be used to source the needed parcel boundaries and pids.
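
A minimal sketch of the proposed lookup, assuming the parcel polygons have already been loaded from ParcelMapBC and that each site carries a representative point; the types, in-memory storage, and method names below are illustrative assumptions only:

import org.locationtech.jts.geom.Geometry;
import org.locationtech.jts.geom.Point;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Sketch of the proposed sites/withinParcel resource: resolve the pid to a
// parcel polygon and return every site whose point falls inside it.
public class WithinParcelQuery {

    public record Site(String siteId, String fullAddress, Point location) {}

    private final Map<String, Geometry> parcelsByPid;  // pid -> parcel polygon from ParcelMapBC
    private final List<Site> sites;                    // all sites; ideally spatially indexed

    public WithinParcelQuery(Map<String, Geometry> parcelsByPid, List<Site> sites) {
        this.parcelsByPid = parcelsByPid;
        this.sites = sites;
    }

    public List<Site> sitesWithinParcel(String pid) {
        Geometry parcel = Optional.ofNullable(parcelsByPid.get(pid))
                .orElseThrow(() -> new IllegalArgumentException("Unknown pid: " + pid));
        return sites.stream()
                .filter(s -> parcel.covers(s.location()))  // covers() also keeps boundary points
                .toList();
    }
}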
