
placeholder's Introduction

A modular, open-source search engine for our world.

Pelias is a geocoder powered completely by open data, available freely to everyone.

Local Installation · Cloud Webservice · Documentation · Community Chat

What is Pelias?
Pelias is a search engine for places worldwide, powered by open data. It turns addresses and place names into geographic coordinates, and turns geographic coordinates into places and addresses. With Pelias, you’re able to turn your users’ place searches into actionable geodata and transform your geodata into real places.

We think open data, open source, and open strategy win over proprietary solutions at any part of the stack and we want to ensure the services we offer are in line with that vision. We believe that an open geocoder improves over the long-term only if the community can incorporate truly representative local knowledge.

Pelias coarse geocoder

This repository provides all the code & geographic data you'll need to run your own coarse geocoder.

Read our blog post, An (almost) one line coarse geocoder with Docker, for a quick start guide, and check out our demo.

This service is intended to be run as part of the Pelias Geocoder but can just as easily be run independently, as it has no external dependencies.

Natural language parser for geographic text

The engine takes unstructured input text, such as 'Neutral Bay North Sydney New South Wales' and attempts to deduce the geographic area the user is referring to.

Human beings (familiar with Australian geography) are able to quickly scan the text and establish that there are 3 distinct token groups: 'Neutral Bay', 'North Sydney' & 'New South Wales'.

The engine uses a similar technique to our brains, scanning across the text, cycling through a dictionary of learned terms and then trying to establish logical token groups.

Once token groups have been established, a reductive algorithm is used to ensure that the token groups are logical in a geographic context. We don't want to return New York City for a term such as 'nyc france', so we need to only return things called 'nyc' inside places called 'france'.

The engine starts from the rightmost group, and works to the left, ensuring token groups represent geographic entities contained within those which came before. This process is repeated until it either runs out of groups, or would return 0 results.

The best estimation is then returned, either as a set of integers representing the ids of those regions, or as a JSON structure which also contains additional information such as population counts etc.
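
As an illustration, the interactive shell described later in this README would be expected to group the example above along these lines (assuming all three names are present in the dictionary; additional candidate groupings may also be returned):

placeholder > tokenize neutral bay north sydney new south wales
 [ [ 'neutral bay', 'north sydney', 'new south wales' ] ]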

The data is sourced from the whosonfirst project, which also includes place name translations in many different languages.

Placeholder supports searching on and retrieving tokens in different languages and also offers support for synonyms and abbreviations.

The engine includes a rudimentary language detection algorithm which attempts to detect right-to-left languages and languages which write their addresses in major-to-minor format. It will then reverse the tokens to reorder them into minor-to-major order.


Requirements

Placeholder requires Node.js and SQLite

See Pelias software requirements for required and recommended versions.

Install

$ git clone git@github.com:pelias/placeholder.git && cd placeholder
$ npm install

Download the required database files

Data hosting is provided by Geocode Earth. Other Pelias related downloads are available at https://geocode.earth/data.

$ mkdir data
$ curl -s https://data.geocode.earth/placeholder/store.sqlite3.gz | gunzip > data/store.sqlite3;

Confirm the build was successful

$ npm test
$ npm run cli -- san fran

> [email protected] cli
> node cmd/cli.js "san" "fran"

san fran

took: 3ms
 - 85922583	locality 	San Francisco

Run server

$ PORT=6100 npm start;

Configuration via Environment Variables

The service supports additional environment variables that affect its operation:

  • HOST (default: undefined): The network address that the placeholder service will bind to. Defaults to whatever the current Node.js default is, which is currently to listen on 0.0.0.0 (all interfaces). See the Node.js Net documentation for more information.
  • PORT (default: 3000): The TCP port that the placeholder service will use for incoming network connections.
  • PLACEHOLDER_DATA (default: ../data/): Path to the directory where the placeholder service will find the store.sqlite3 database file.
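
For example, to bind to localhost only, serve on port 6100 and read the database from a custom directory (the path shown is illustrative):

$ HOST=127.0.0.1 PORT=6100 PLACEHOLDER_DATA=/opt/placeholder/data/ npm start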

Open browser

the server should now be running and you should be able to access the http API:

http://localhost:6100/

try the following paths:

/demo
/parser/search?text=london
/parser/findbyid?ids=101748479
/parser/query?text=london
/parser/tokenize?text=sydney new south wales
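
for example, with the server running on port 6100 you can exercise the API with curl:

$ curl 'http://localhost:6100/parser/search?text=london'
$ curl 'http://localhost:6100/parser/findbyid?ids=101748479'
$ curl 'http://localhost:6100/parser/tokenize?text=sydney%20new%20south%20wales'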

Changing languages

the /parser/search endpoint accepts a ?lang=xxx parameter which can be used to vary the language of data returned.

for example, the following urls will return strings in Japanese / Russian where available:

/parser/search?text=germany&lang=jpn
/parser/search?text=germany&lang=rus

documents returned by /parser/search contain a boolean property named languageDefaulted which indicates whether the service was able to find a translation in the language you requested (false) or fell back to the default language (true).
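
for example, you can check the languageDefaulted flag by piping a search response through jq (treat the exact response shape as an assumption here):

$ curl -s 'http://localhost:6100/parser/search?text=germany&lang=jpn' | jq .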

The /parser/findbyid endpoint also accepts a ?lang=xxx parameter; it will return names in the selected language if a translation exists, and all translations otherwise.

for example, the following url will return strings in French / Korean where available:

/parser/findbyid?ids=85633147,102191581,85862899&lang=fra
/parser/findbyid?ids=85633147,102191581,85862899&lang=kor

the demo is also able to serve responses in different languages by providing the language code in the URL anchor:

/demo#jpn
/demo#chi
/demo#eng
/demo#fra
... etc.

Filtering by placetype

the /parser/search endpoint accepts a ?placetype=xxx parameter which can be used to control the placetype of records which are returned.

this parameter does not provide any performance benefit, it is simply a convenience for filtering results against a whitelist of placetypes.

you may specify multiple placetypes using a comma to separate them, such as ?placetype=xxx,yyy; these are matched as OR conditions, eg: (xxx OR yyy)

for example:

the query search?text=luxemburg will return results for the country, region, locality etc.

you can use the placetype filter to control which records are returned:

# all matching results
search?text=luxemburg

# only return matching country records
search?text=luxemburg&placetype=country

# return matching country or region records
search?text=luxemburg&placetype=country,region

Live mode (BETA)

the /parser/search endpoint accepts a ?mode=live parameter which can be used to enable an autocomplete-style API.

in this mode the final token of each input text is considered as 'incomplete', meaning that the user has potentially only typed part of a token.

this mode is currently in BETA, the interface and behaviour may change over time.
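
for example, the partial input 'san fran' can be sent with mode=live so the trailing token is treated as incomplete:

$ curl 'http://localhost:6100/parser/search?text=san%20fran&mode=live'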

Configuring the rtree threshold

the default matching strategy uses the lineage table to ensure that token pairs represent a valid child->parent relationship. this ensures that queries like 'London France' do not match, because there is no entry in the lineage table linking those two places together.

in some cases it's preferable to fall back to a matching strategy which considers geographically nearby places with a matching name, even if that relationship does not explicitly exist in the lineage table.

for example, 'Basel France' will return 'Basel Switzerland'. this is useful for handling user input errors as well as errors and omissions in the lineage table.

in the example above, 'Basel France' only matches because the bounding box of 'Basel' overlaps the bounding box of 'France' and no other valid entry for 'Basel France' exists.

the definition of what is 'nearby' is configurable: the bbox for the minor term (left token) is expanded by a threshold (the threshold is added to or subtracted from each of the bbox vertices).

by default the threshold is set to 0.2 (degrees); any float value between 0 and 1 may be specified via the environment variable RTREE_THRESHOLD.

a setting of less than 0 will disable the rtree functionality completely. disabling the rtree will result in nearby queries such as 'Basel France' returning 'France' instead of 'Basel Switzerland'.
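
for example, to widen the threshold to 0.5 degrees, or to disable the rtree fallback entirely:

$ RTREE_THRESHOLD=0.5 PORT=6100 npm start
$ RTREE_THRESHOLD=-1 PORT=6100 npm start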


Run the interactive shell

$ npm run repl

> [email protected] repl
> node cmd/repl.js

placeholder >

try the following commands:

placeholder > london on
 - 101735809	locality 	London

placeholder > search london on
 - 101735809	locality 	London

placeholder > tokenize sydney new south wales
 [ [ 'sydney', 'new south wales' ] ]

placeholder > token kelburn
 [ 1729339019 ]

placeholder > id 1729339019
 { name: 'Kelburn',
   placetype: 'neighbourhood',
   lineage:
    { continent_id: 102191583,
      country_id: 85633345,
      county_id: 102079339,
      locality_id: 101915529,
      neighbourhood_id: 1729339019,
      region_id: 85687233 },
   names: { eng: [ 'Kelburn' ] } }

Configuration for pelias API

While Placeholder can be used as a stand-alone application or included with other geographic software / search engines, it is designed for the Pelias geocoder.

To connect the Placeholder service to the Pelias API, configure the pelias config file with the port that placeholder is running on.
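
a minimal sketch of the relevant section of pelias.json, assuming the standard Pelias API service configuration layout; adjust the url to wherever placeholder is listening:

$ cat > pelias.json <<'EOF'
{
  "api": {
    "services": {
      "placeholder": {
        "url": "http://localhost:6100"
      }
    }
  }
}
EOF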


Tests

run the test suite

$ npm test

Run the functional cases

there are more exhaustive test cases included in test/cases/.

to run all the test cases:

$ npm run funcs

Generate a ~500,000 line test file

this command requires the data/wof.extract file mentioned below in the 'building the database' section.

$ npm run gentests

once complete you can find the generated test cases in test/cases/generated.txt.


Docker

Build the service image

$ docker-compose build

Run the service in the background

$ docker-compose up -d

Building the database

Prerequisites

  • jq 1.5+ must be installed
    • on ubuntu: sudo apt-get install jq
    • on mac: brew install jq
  • Who's on First data download

Steps

the database is created from geographic data sourced from the whosonfirst project.

the whosonfirst project is distributed as geojson files, so in order to speed up development we first extract the relevant data into a file: data/wof.extract.

the following command will iterate over all the geojson files under the WOF_DIR path, extracting the relevant properties into the file data/wof.extract.

this process can take 30-60 minutes to run and consumes ~350MB of disk space; you will only need to run this command once, or whenever your local whosonfirst-data files are updated.

$ WOF_DIR=/data/whosonfirst-data/data npm run extract
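
you can optionally keep an eye on progress by counting the lines extracted so far:

$ wc -l data/wof.extract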

now you can rebuild the data/store.sqlite3 file with the following command:

this should take 2-3 minutes to run:

$ npm run build

Using the Docker image

Rebuild the image

you can rebuild the image on any system with the following command:

$ docker build -t pelias/placeholder .

Download pre-built image

Up-to-date Docker images are built and automatically pushed to Docker Hub from our continuous integration pipeline.

You can pull the latest stable image with

$ docker pull pelias/placeholder
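
a sketch of running the pulled image directly, assuming your store.sqlite3 lives in ./data on the host and mapping the default service port 3000 to 6100 locally (paths and port mapping are illustrative):

$ docker run -d -p 6100:3000 -v $(pwd)/data:/data -e PLACEHOLDER_DATA=/data pelias/placeholder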

Download custom image tags

We publish each commit and the latest of each branch to separate tags

A list of all available tags to download can be found at https://hub.docker.com/r/pelias/placeholder/tags/

placeholder's People

Contributors

dianashk, greenkeeper[bot], joxit, loadit1, missinglink, orangejulius, singingwolfboy, tigerlily-he, trescube, vicchi


placeholder's Issues

An in-range update of semantic-release is breaking the build 🚨

Version 15.5.3 of semantic-release was just published.

Branch Build failing 🚨
Dependency semantic-release
Current Version 15.5.2
Type devDependency

This version is covered by your current version range and after updating it in your project the build failed.

semantic-release is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • ci/circleci Your tests passed on CircleCI! Details
  • continuous-integration/travis-ci/push The Travis CI build could not complete due to an error Details

Release Notes v15.5.3

15.5.3 (2018-06-15)

Bug Fixes

  • package: update p-locate to version 3.0.0 (0ab0426)
Commits

The new version differs by 3 commits.

  • 0ab0426 fix(package): update p-locate to version 3.0.0
  • f9d9144 docs: Add a troubleshooting section about squashed commits
  • 11cef46 chore(package): update sinon to version 6.0.0

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

An in-range update of semantic-release is breaking the build 🚨

Version 15.6.1 of semantic-release was just published.

Branch Build failing 🚨
Dependency semantic-release
Current Version 15.6.0
Type devDependency

This version is covered by your current version range and after updating it in your project the build failed.

semantic-release is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • ci/circleci Your tests passed on CircleCI! Details
  • continuous-integration/travis-ci/push The Travis CI build could not complete due to an error Details

Release Notes v15.6.1

15.6.1 (2018-06-26)

Bug Fixes

  • package: update yargs to version 12.0.0 (d4f68a5)
Commits

The new version differs by 1 commits.

  • d4f68a5 fix(package): update yargs to version 12.0.0

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

An in-range update of lodash is breaking the build 🚨

Version 4.17.5 of lodash was just published.

Branch Build failing 🚨
Dependency lodash
Current Version 4.17.4
Type dependency

This version is covered by your current version range and after updating it in your project the build failed.

lodash is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • ci/circleci Your tests passed on CircleCI! Details
  • continuous-integration/travis-ci/push The Travis CI build failed Details

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Missing data for "Caribbean Netherlands"

"id": 136251281,
"name": "Caribbean Netherlands",
"placetype": "dependency",

There is no short name/abbr present in the DB even though the data is present in WhosOnFirst.

Export skips files

While working on the docker-compose setup for Pelias, I came across this odd behavior and was able to confirm it outside of docker as well.

Here's a script I've used to reproduce it:

#!/bin/bash

mkdir -p ./test_build && cd ./test_build

git clone https://github.com/pelias/whosonfirst.git ./whosonfirst && cd ./whosonfirst;

git checkout filtered-download

npm install

echo \
'{
  "imports":{
    "whosonfirst": {
      "datapath": "../data/whosonfirst",
      "importVenues": false,
      "importPostalcodes": true,
      "importPlace": "101715829",
      "api_key": "mapzen-fepXwQF"
    }
  }
}' > pelias.json

PELIAS_CONFIG=pelias.json npm run download

cd ..

git clone https://github.com/pelias/placeholder.git ./placeholder && cd ./placeholder;

npm install

mkdir -p ./data

WOF_DIR=../data/whosonfirst npm run extract

npm run build

echo "\033[31mCounts don't match up: \033[m"

find ../data/whosonfirst/data/ -name "*.geojson" | wc -l | awk '{print $1" wof records available"}'

wc -l ./data/wof.extract | awk '{print $1" lines in export"}'

in the end I'm getting the following counts consistently:

Counts don't match up: 
140 wof records available
33 lines in export

I've confirmed that the skipped WOF records are valid geojson files, so it's unclear why they are being skipped. Unfortunately, I couldn't make much sense of the bash kung-fu happening in wof_extract.sh.

Add health check endpoint

It would be great if all the Pelias services had a health check endpoint. Currently Placeholder does not.

The API uses /status, so it would be great if Placeholder did as well for consistency.

An in-range update of better-sqlite3 is breaking the build 🚨

Version 4.1.0 of better-sqlite3 was just published.

Branch Build failing 🚨
Dependency better-sqlite3
Current Version 4.0.3
Type dependency

This version is covered by your current version range and after updating it in your project the build failed.

better-sqlite3 is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • ci/circleci Your tests passed on CircleCI! Details
  • continuous-integration/travis-ci/push The Travis CI build failed Details

Commits

The new version differs by 11 commits.

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

return unknown/leftover tokens

It's becoming increasingly apparent that the Pelias API will have to call both libpostal and placeholder to come up with the correct answer. For example, libpostal parsed "Fort Hood, TX" as:

> Fort Hood, TX

Result:

{
  "road": "fort",
  "city": "hood",
  "state": "tx"
}

There's no reasonable non-hacky way to correct for this so the idea is to call both placeholder and libpostal for inputs, then figure out an answer from both responses.

For example, for the input 30 W 26th St, New York, NY, placeholder throws away the 30 W 26th St. For the above strategy to work, the API would need to know which tokens are unknown. In the case of Fort Hood, TX, if the API knew that placeholder had no leftover tokens then it could be reasonably assured that the input was only admin data and could disregard the libpostal parse (which is incorrect in this case).

Using postal codes for better match

Is there a reason why placeholder does not use postal codes for lookups? In some ad-hoc tests I did, Placeholder could not find correct places due to insufficient address / misspellings etc.

I tried looking up the postal code (which was available) in Geonames post code data, found a match, then used location names from Geonames post code data in Placeholder, and then Placeholder could correctly identify the place.

I don't have much experience with this, so don't know all the intricacies. I have read that post codes can be a mess. But if an address has a country and postcode, we can pretty much find the exact location just with that information - at least that's what I am thinking.

What are the problems with using post codes then?

synonymous grouping

It's possible for the analysis engine to produce multiple potential groupings, such as:

Le Cros-d’Utelle, France
http://parser.wiz.co.nz/parser/tokenize?text=Le+Cros-d%E2%80%99Utelle%2C+France

[
  [
    "france"
  ],
  [
    "le cros",
    "d",
    "utelle",
    "france"
  ]
]

However this results in returning the localadmin (404391265) and also the country of France (85633147).

France country 85633147
└ Europe continent 102191581

Utelle localadmin 404391265
└ Lantosque county 102069321
   └ Nice macrocounty 404227807
      └ Alpes-Maritimes region 85683323
         └ Provence-Alpes-Côte D'Azur macroregion 404227445
            └ France FRA country 85633147
               └ Europe continent 102191581

Considering that the localadmin is inside France, returning France here is redundant and incorrect.

I think generally, we can remove less granular results where a more granular child is being returned

socket hangup on 1 query

When the API calls placeholder for this one query (repeatable):

9丁目, japan

it gets the following error:

2017-04-19T16:33:49.012Z - error: [placeholder] http://localhost:3000/parser/search?text=9丁目, Japan&lang=eng: {"code":"ECONNRESET"}
{ [Error: socket hang up] code: 'ECONNRESET' }

I'm unable to duplicate the error for similar neighbourhood inputs like "1丁目, Japan".

An in-range update of jshint is breaking the build 🚨

The devDependency jshint was updated from 2.9.6 to 2.9.7.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

jshint is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • ci/circleci: Your tests passed on CircleCI! (Details).
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

Commits

The new version differs by 5 commits.

  • 01bf8c6 v2.9.7
  • 71f2f1f [[TEST]] Assert CLI behavior: stdin w/o filename
  • 3a8ef8b Added Spotify to companies who use JSHint (#3333)
  • 80c7fda [[CHORE]] Relocate development dependency
  • f70250b [[CHORE]] Relocate development dependencies

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

An in-range update of semantic-release is breaking the build 🚨

Version 15.6.3 of semantic-release was just published.

Branch Build failing 🚨
Dependency semantic-release
Current Version 15.6.2
Type devDependency

This version is covered by your current version range and after updating it in your project the build failed.

semantic-release is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • ci/circleci Your tests passed on CircleCI! Details
  • continuous-integration/travis-ci/push The Travis CI build could not complete due to an error Details

Release Notes v15.6.3

15.6.3 (2018-07-02)

Bug Fixes

  • fetch all tags even if the repo is not shallow (45eee4a)
Commits

The new version differs by 2 commits.

  • 45eee4a fix: fetch all tags even if the repo is not shallow
  • 2d3a5e5 test: harmonize git-utils functions name

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

documentation for PLACEHOLDER_DATA

a PR was merged recently which allows the use of an environment variable PLACEHOLDER_DATA to override the location of the data directories.

this was not documented anywhere, this ticket is to add some docs in the readme, explaining how the env var can be used.

An in-range update of semantic-release is breaking the build 🚨

Version 15.7.0 of semantic-release was just published.

Branch Build failing 🚨
Dependency semantic-release
Current Version 15.6.6
Type devDependency

This version is covered by your current version range and after updating it in your project the build failed.

semantic-release is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • ci/circleci Your tests passed on CircleCI! Details
  • continuous-integration/travis-ci/push The Travis CI build could not complete due to an error Details

Release Notes v15.7.0

15.7.0 (2018-07-10)

Bug Fixes

  • do not set path to plugin config defined as a Function or an Array (f93eeb7)

Features

  • allow to define multiple generateNotes plugins (5989989)
Commits

The new version differs by 12 commits.

  • 24ce560 refactor: build plugin pipeline parameters at initialization
  • eb26254 refactor: use Object.entries rather than Object.keys
  • 50061bb refactor: remove unnecessary object destructuring
  • 5989989 feat: allow to define multiple generateNotes plugins
  • 576eb60 refactor: simplify plugin validation
  • f7f4aab refactor: use the lastInput arg to compute the prepare pipeline next input
  • 12de628 refactor: fix incorrect comments in lib/plugins/pipeline.js
  • d303286 docs: fix default value for analyzeCommits plugin
  • ed9c456 refactor: always return an Array of results/errors from a plugin pipeline
  • cac4882 docs: clarify verifyRelease plugin description
  • 09348f1 style: disable max-params warning for lib/plugins/normalize.js
  • f93eeb7 fix: do not set path to plugin config defined as a Function or an Array

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Version 10 of node.js has been released

Version 10 of Node.js (code name Dubnium) has been released! 🎊

To see what happens to your code in Node.js 10, Greenkeeper has created a branch with the following changes:

  • Added the new Node.js version to your .travis.yml
  • The new Node.js version is in-range for the engines in 1 of your package.json files, so that was left alone

If you’re interested in upgrading this repo to Node.js 10, you can open a PR with these changes. Please note that this issue is just intended as a friendly reminder and the PR as a possible starting point for getting your code running on Node.js 10.

More information on this issue

Greenkeeper has checked the engines key in any package.json file, the .nvmrc file, and the .travis.yml file, if present.

  • engines was only updated if it defined a single version, not a range.
  • .nvmrc was updated to Node.js 10
  • .travis.yml was only changed if there was a root-level node_js that didn’t already include Node.js 10, such as node or lts/*. In this case, the new version was appended to the list. We didn’t touch job or matrix configurations because these tend to be quite specific and complex, and it’s difficult to infer what the intentions were.

For many simpler .travis.yml configurations, this PR should suffice as-is, but depending on what you’re doing it may require additional work or may not be applicable at all. We’re also aware that you may have good reasons to not update to Node.js 10, which is why this was sent as an issue and not a pull request. Feel free to delete it without comment, I’m a humble robot and won’t feel rejected 🤖


FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

ECONNREFUSED using placeholder

I get an ECONNREFUSED error when trying to connect to the placeholder server from the pelias API. But I'm able to go to localhost:6100 and see the HTML demo.

> npm start

> [email protected] start /disk/homedirs/barronk-dua51929/local/pelias/api
> ./bin/start

2018-11-21T16:20:12.346Z - warn: [pip] pip service disabled
2018-11-21T16:20:12.356Z - info: [placeholder] using placeholder service at http://localhost:6100/
2018-11-21T16:20:12.356Z - info: [language] using language service at http://localhost:6100/
2018-11-21T16:20:12.357Z - warn: [interpolation] interpolation service disabled
2018-11-21T16:20:12.357Z - info: [libpostal] using libpostal service at http://localhost:6101/
2018-11-21T16:20:12.357Z - info: [libpostal] using libpostal service at http://localhost:6101/
pelias is now running on 10.200.0.55:3100
2018-11-21T16:31:34.346Z - debug: [api] [lang] 'en' via 'header'
2018-11-21T16:31:34.348Z - debug: [libpostal] libpostal: http://localhost:6101/
2018-11-21T16:31:34.386Z - debug: [placeholder] placeholder: http://localhost:6100/
2018-11-21T16:31:34.404Z - error: [placeholder] http://localhost:6100/ [do_not_track]: {"errno":"ECONNREFUSED","code":"ECONNREFUSED","syscall":"connect","address":"127.0.0.1","port":6100,"retries":0}
2018-11-21T16:31:34.416Z - warn: [api] unknown geocoding error string: connect ECONNREFUSED 127.0.0.1:6100
2018-11-21T16:31:34.675Z - info: [api] [IP removed] - - [21/Nov/2018:16:31:34 +0000] "GET /v1/search?text=%5Bremoved%5D HTTP/1.1" 400 604
2018-11-21T16:31:58.244Z - info: [api] [IP removed] - - [21/Nov/2018:16:31:58 +0000] "GET / HTTP/1.1" 301 62
2018-11-21T16:31:58.249Z - info: [api] [IP removed] - - [21/Nov/2018:16:31:58 +0000] "GET /v1 HTTP/1.1" 200 235
2018-11-21T16:31:59.394Z - info: [api] [IP removed] - - [21/Nov/2018:16:31:59 +0000] "GET /favicon.ico HTTP/1.1" 404 35

Starting placeholder, I see:

> PORT=6100 npm start
> [email protected] start /disk/homedirs/barronk-dua51929/local/pelias/placeholder
> ./cmd/server.sh

loading data
[master] using 8 cpus
[master] worker forked 8203
[master] worker forked 8204
[master] worker forked 8210
[master] worker forked 8220
[master] worker forked 8221
[master] worker forked 8227
[master] worker forked 8233
[master] worker forked 8238
loading data
loading data
loading data
loading data
[worker 8221] listening on age5:6100
[worker 8203] listening on age5:6100
loading data
[worker 8204] listening on age5:6100
[worker 8220] listening on age5:6100
[worker 8233] listening on age5:6100
loading data
loading data
loading data
[worker 8210] listening on age5:6100
[worker 8238] listening on age5:6100
[worker 8227] listening on age5:6100

An in-range update of tape is breaking the build 🚨

Version 4.9.0 of tape was just published.

Branch Build failing 🚨
Dependency tape
Current Version 4.8.0
Type devDependency

This version is covered by your current version range and after updating it in your project the build failed.

tape is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • ci/circleci Your tests passed on CircleCI! Details
  • continuous-integration/travis-ci/push The Travis CI build failed Details

Commits

The new version differs by 27 commits.

  • ea6d91e v4.9.0
  • 6867840 [Deps] update object-inspect, resolve
  • 4919e40 [Tests] on node v9; use nvm install-latest-npm
  • f26375c Merge pull request #420 from inadarei/global-depth-env-var
  • 17276d7 [New] use process.env.NODE_TAPE_OBJECT_PRINT_DEPTH for the default object print depth.
  • 0e870c6 Merge pull request #408 from johnhenry/feature/on-failure
  • 00aa133 Add "onFinish" listener to test harness.
  • 0e68b2d [Dev Deps] update js-yaml
  • 10b7dcd [Fix] fix stack where actual is falsy
  • 13173a5 Merge pull request #402 from nhamer/stack_strip
  • f90e487 normalize path separators in stacks
  • b66f8f8 [Deps] update function-bind
  • cc69501 Merge pull request #387 from fongandrew/master
  • bf5a750 Handle spaces in path name for setting file, line no
  • 3c2087a Test name with spaces

There are 27 commits in total.

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

xargs interleaving bytes, resulting in invalid wof.extract

It seems that #134 introduced a bug where the output of xargs processes running in parallel can be output interleaved, resulting in invalid json when running wof_extract.sh.

I have been seeing a bunch of errors such as Unexpected token { in JSON at position 3657 in the logs; using jq I can confirm that the file is corrupt:

$ cat wof.extract | jq . >/dev/null
parse error: Expected separator between values at line 4187, column 9

An example of interleaving can be found in the extract below near name:por_x_preferred:

"name:pam_x_preferred":["Seattle"],
"name:pap_x_preferred":["Seattle"],
"name:per_x_preferred":["سیاتل"],
"name:pms_x_preferred":["Seattle"],
"name:pnb_x_preferred":["سیاٹل"],
"name:pol_x_preferred":["Seattle"],
"name:por_x_preferred":[{"geom:area":0.000228,"geom:bbox":"-117.293484,47.678915,-117.26234,47.691216","geom:latitude":47.685772,"geom:longitude":-117.280507,"gn:population":1786,"iso:country":"US","lbl:latitude":47.685512,"lbl:longitude":-117.283914,"mz:is_current":1,"name:ara_x_preferred":["ميلوود"],"name:azb_x_preferred":["میلوود، واشینقتون"],"name:bul_x_preferred":["Милуд"],"name:cat_x_preferred":["Millwood"],"name:dut_x_preferred":["Millwood"],"name:eng_x_preferred":["Millwood"],"name:fas_x_preferred":["میلوود، واشینگتن"],"name:fre_x_preferred":["Millwood"],"name:ger_x_preferred":["Millwood"],"name:hat_x_preferred":["Millwood"],"name:hbs_x_preferred":["Millwood"],"name:hrv_x_preferred":["Millwood"],"name:ita_x_preferred":["Millwood"],"name:mlg_x_preferred":["Millwood"],"name:nan_x_preferred":["Millwood"],"name:nld_x_preferred":["Millwood"],"name:per_x_preferred":["میلوود، واشینگتن"],"name:pol_x_preferred":["Millwood"],"name:por_x_preferred":["Millwood"],"name:spa_x_preferred":["Millwood"],"name:srp_x_preferred":["Милвуд"],"name:unk_x_variant":["Woodward's"],"name:uzb_x_preferred":["Millwood"],"name:vol_x_preferred":["Millwood"],"qs:pop":0,"wof:hierarchy":[{"continent_id":102191575,"country_id":85633793,"county_id":102087555,"locality_id":101730019,"region_id":85688623}],"wof:id":101730019,"wof:name":"Millwood","wof:parent_id":102087555,"wof:placetype":"locality","wof:population":1786,"wof:superseded_by":[]}

cc/ @Joxit could you please confirm the issue? I'd like to find a fix for xargs, or if that's not possible we might need to revert to parallel because it has flags for this:

from the gnu parallel man page

--line-buffer
--lb
Buffer output on line basis. --group will keep the output together for a whole job. --ungroup allows output to mixup with half a line coming from one job and half a line coming from another job. --line-buffer fits between these two: GNU parallel will print a full line, but will allow for mixing lines of different jobs.

--line-buffer takes more CPU power than both --group and --ungroup, but can be much faster than --group if the CPU is not the limiting factor.

Normally --line-buffer does not buffer on disk, and can thus process an infinite amount of data, but it will buffer on disk when combined with: --keep-order, --results, --compress, and --files. This will make it as slow as --group and will limit output to the available disk space.

With --keep-order --line-buffer will output lines from the first job while it is running, then lines from the second job while that is running. It will buffer full lines, but jobs will not mix. Compare:

  parallel -j0 'echo {};sleep {};echo {}' ::: 1 3 2 4
  parallel -j0 --lb 'echo {};sleep {};echo {}' ::: 1 3 2 4
  parallel -j0 -k --lb 'echo {};sleep {};echo {}' ::: 1 3 2 4
See also: --group --ungroup

example of running stats queries against the data store

here's an example of running a SQL query to get statistical info from WOF admin records:

#!/bin/bash

sqlite3 'data/store.sqlite3' <<SQL
SELECT
  json_extract( json, '$.placetype' ) AS placetype,
  CAST( AVG( json_extract( json, '$.population' ) ) AS INT ) AS average_population
FROM docs
WHERE json_extract( json, '$.population' ) IS NOT NULL
GROUP BY placetype
ORDER BY average_population DESC;
SQL

running the command outputs average populations grouped by placetype:

continent|3812366000
disputed|234142442
country|33087442
region|3438324
borough|890580
dependency|522742
county|99639
locality|42603
localadmin|13400
macrohood|8744
neighbourhood|7927

An in-range update of tape is breaking the build 🚨

The devDependency tape was updated from 4.9.1 to 4.9.2.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

tape is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • ci/circleci: Your tests passed on CircleCI! (Details).
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

Commits

The new version differs by 11 commits.

  • a1e8f7e v4.9.2
  • 4b9c951 [Dev Deps] update eslint, eclint
  • 9ced991 [Fix] notEqual and notDeepEqual show "expected" value on failure
  • 75c467e [Docs] Updated readme to make test, test.only and test.skip consistent.
  • 1225d01 Merge pull request #450 from axelpale/patch-1
  • f53e3f1 Clarify doesNotThrow parameters
  • 96de340 Adding tap-junit
  • b1df632 [readme] Change broken image to use web archive
  • 5f1c5a2 [Docs] cleanup from #439
  • 6c633d0 Merge pull request #439 from abelmokadem/master
  • 4337f58 Convert list of tap reporters to links

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Unicode problems for data creation?

When I played with the demo I noticed a strange & that got appended, as well as some unicode problems. Since not all results have this problem, could there be something wrong with unicode handling during data creation?

image

Also, looking up locations with unicode characters like "zeißig" does not work. It should return this suburb.

Mention jq as required in the readme

The data extract script will fail with a nice message, but no mention of jq being required is currently in the readme. A heads up would probably be helpful.

Missing "npx" dependency

On a stock Ubuntu 18.04 machine with nodejs and npm installed, running tests fails with an error about a missing npx command.

Installing npx from apt fixes the problem. I am not a Node person and I'm not really sure where this dependency should be listed so passing along FYI.

$> npm test
> [email protected] test /home/ubuntu/placeholder
> npm run units


> [email protected] units /home/ubuntu/placeholder
> ./cmd/units

./cmd/units: line 7: npx: command not found

npm ERR! Linux 4.15.0-1044-aws
npm ERR! argv "/usr/bin/node" "/usr/bin/npm" "run" "units"
npm ERR! node v8.10.0
npm ERR! npm  v3.5.2
npm ERR! file sh
npm ERR! code ELIFECYCLE
npm ERR! errno ENOENT
npm ERR! syscall spawn
npm ERR! [email protected] units: `./cmd/units`
npm ERR! spawn ENOENT
npm ERR! 
npm ERR! Failed at the [email protected] units script './cmd/units'.
npm ERR! Make sure you have the latest version of node.js and npm installed.
npm ERR! If you do, this is most likely a problem with the pelias-placeholder package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR!     ./cmd/units
npm ERR! You can get information on how to open an issue for this project with:
npm ERR!     npm bugs pelias-placeholder
npm ERR! Or if that isn't available, you can get their info via:
npm ERR!     npm owner ls pelias-placeholder
npm ERR! There is likely additional logging output above.

npm ERR! Please include the following file with any support request:
npm ERR!     /home/ubuntu/placeholder/npm-debug.log
npm ERR! Test failed.  See above for more details.

Action required: Greenkeeper could not be activated 🚨

🚨 You need to enable Continuous Integration on all branches of this repository. 🚨

To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because it uses your CI build statuses to figure out when to notify you about breaking changes.

Since we didn’t receive a CI status on the greenkeeper/initial branch, it’s possible that you don’t have CI set up yet. We recommend using Travis CI, but Greenkeeper will work with every other CI service as well.

If you have already set up a CI for this repository, you might need to check how it’s configured. Make sure it is set to run on all new branches. If you don’t want it to run on absolutely every branch, you can whitelist branches starting with greenkeeper/.

Once you have installed and configured CI on this repository correctly, you’ll need to re-trigger Greenkeeper’s initial pull request. To do this, please delete the greenkeeper/initial branch in this repository, and then remove and re-add this repository to the Greenkeeper integration’s white list on Github. You'll find this list on your repo or organization’s settings page, under Installed GitHub Apps.

Service Timeout

After enabling the service on pelias.json and calling the API it throws a timeout error:

[placeholder] http://localhost:4100/parser/search?text=israel&lang=eng: {"timeout":250,"code":"ECONNABORTED","errno":"ETIME","retries":3}

An in-range update of split2 is breaking the build 🚨

The dependency split2 was updated from 3.0.0 to 3.1.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

split2 is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • ci/circleci: Your tests passed on CircleCI! (Details).
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

Release Notes for v3.1.0
  • Add skipOverflow option #24
Commits

The new version differs by 6 commits.

  • a9b7da4 Bumped v3.1.0.
  • 25bac40 Added Node 11 to .travis.yml
  • 0bf7d69 Merge pull request #24 from IronSavior/skip_overflow
  • 618c73d Add skipOverflow option
  • 50c92f6 Merge pull request #20 from mcollina/greenkeeper/standard-12.0.0
  • 760a567 chore(package): update standard to version 12.0.0

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

An in-range update of require-dir is breaking the build 🚨

The dependency require-dir was updated from 1.1.0 to 1.2.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

require-dir is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • ci/circleci: Your tests passed on CircleCI! (Details).
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

Commits

The new version differs by 5 commits.

  • 21600ad 1.2.0
  • 6370638 Merge pull request #60 from akatsukle/master
  • 868a07f [#59] Add an optional extensions parameter to use instead of extensions
  • 48b583a [#59] Add the optional extensions parameter to the documentation
  • 8b8744f [#59] Add tests for the optional extensions parameter

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Big response time for findbyid endpoint

Hey guys,
we are using the placeholder service only for localisation, so we are consuming only the findbyid endpoint, and we've noticed that it is currently slow (~30ms).
Using a debugger I found that the most problematic place is the JSON.parse function.

screen shot 2018-05-28 at 11 32 02

The reason is the structure of the docs table. It contains 2 columns, id and json. The json column has a TEXT type and contains huge values (up to ~8.5k characters). These values are then parsed with JSON.parse in order to perform some manipulations. Parsing is a blocking synchronous operation and it takes some time for the server to respond.

The most naive and natural way to make it faster is to decrease the amount of json that needs to be parsed in these cases.
This could be done in several possible ways:

  1. the more difficult one is to change the DB schema and disassemble the containing document into columns.
  2. the easier way is to use the JSON type for this column instead. It allows performing json_extract (and some other) operations at the DB level, so that we need to parse/stringify much smaller strings and objects.
    Not sure about other endpoints (as I said we are not using them), but for localisation it could be done pretty easily.

SQL example (assuming that we want to get German names for several objects):

SELECT id, json_extract(json, '$.names.deu[0]') as name 
FROM docs 
WHERE id IN (102063941,102191581,85682555,85633111)

And final results of load test (I was using ab):

ab -n 1000 -c 100  "http://localhost:3002/parser/findbyid?ids=102063941,102191581,85682555,85633111,102064231,85682523,102063845,102063889,85687045,101807185,404473841,85633337,85682447,85633105,102064007,101875803,102064083,85688637,102081727,85927929,102191575,85633793"

Percentage of the requests served within a certain time (ms)
  50%    325
  66%    328
  75%    330
  80%    334
  90%    338
  95%    339
  98%    351
  99%    357
 100%    362 (longest request)
ab -n 1000 -c 100  "http://localhost:3002/parser/translations?ids=102063941,102191581,85682555,85633111,102064231,85682523,102063845,102063889,85687045,101807185,404473841,85633337,85682447,85633105,102064007,101875803,102064083,85688637,102081727,85927929,102191575,85633793&lang=eng"

Percentage of the requests served within a certain time (ms)
  50%     83
  66%     86
  75%     88
  80%     90
  90%    104
  95%    114
  98%    124
  99%    125
 100%    127 (longest request)

Disambiguation removal logic truncates remaining search text

Tokenization has a regex that removes disambiguation markers. This can be helpful, but it's currently truncating remaining text after the first occurrence of any disambiguation character.

So:
"Portland (Oregon) USA" becomes "portland"
"Borivali (West), Mumbai, India" becomes "borivali"
etc

Should it just remove the disambiguation part and leave the rest of the text as it is? (I understand that's going to be impossible where we don't have a "closing" disambiguation marker - e.g. just a simple "-")

Or is it just easier to remove the marker characters only and leave the disambiguation text as it is?

For example, change the regex to
input.replace(/[-֊־‐‑﹣\(\)\[\]]/g, ' ');

With this:
"Portland (Oregon) USA" becomes "portland oregon usa"
"Borivali (West), Mumbai, India" becomes "borivali west india"

Referring to this:
d7de4c9#diff-b1c9f1b1a4d867ea6fd37744bd1b38e5

An in-range update of semantic-release is breaking the build 🚨

Version 15.9.2 of semantic-release was just published.

Branch Build failing 🚨
Dependency semantic-release
Current Version 15.9.1
Type devDependency

This version is covered by your current version range and after updating it in your project the build failed.

semantic-release is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • ci/circleci: Your tests passed on CircleCI! (Details).
  • continuous-integration/travis-ci/push: The Travis CI build could not complete due to an error (Details).

Release Notes v15.9.2

15.9.2 (2018-07-30)

Bug Fixes

  • also hide sensitive info when loggin from cli.js (43d0646)
Commits

The new version differs by 1 commits.

  • 43d0646 fix: also hide sensitive info when loggin from cli.js

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Support non-standard apostrophes

When users are copy/pasting addresses from websites, the ' we all know and love is sometimes a left or right single quote ‘ / ’ (hex 2018/2019).

An in-range update of through2 is breaking the build 🚨

The dependency through2 was updated from 2.0.4 to 2.0.5.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

through2 is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • ci/circleci: Your tests passed on CircleCI! (Details).
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

cast numeric strings to floats

in some cases, WOF encodes numeric fields as strings eg:

"geom:latitude": -41.072228,
"geom:longitude": 145.907314,
"lbl:latitude": "-41.0690002441",
"lbl:longitude": "145.895690918",

since the data is stored verbatim in the internal store it might be nicer to explicitly cast these strings to floats in prototype/wof.js:

  // --- cast float ---
  // note: sometimes numeric properties in WOF can be encoded as strings.

  if( 'string' === typeof doc.geom.area ){ doc.geom.area = parseFloat( doc.geom.area ); }
  if( 'string' === typeof doc.geom.lat ){ doc.geom.lat = parseFloat( doc.geom.lat ); }
  if( 'string' === typeof doc.geom.lon ){ doc.geom.lon = parseFloat( doc.geom.lon ); }

`findbyid` should accept and honor a `lang` parameter

Currently a request for Paris, France results in this findbyid request:

http://internal-pelias-dev-placeholder-498151958.us-east-1.elb.amazonaws.com/parser/findbyid?ids=102191581,85633147,102068177,101751119,404227749,404227465,85683497

The response is 18kb which is not overly burdensome from a network transfer perspective, but deserializing it could prove cumbersome as most of the data will be thrown away since it returns all languages. It would be very helpful if the findbyid request took a lang parameter that returned only the translations, if any, in that language.
