mapbox / osm-tiler

experimental osm pbf tiler for efficiently breaking planet files into small chunks

License: ISC License

Makefile 10.38% C++ 81.68% Shell 7.94%

banished

osm-tiler's Introduction

osm-tiler

osm-tiler is an .osm.pbf tiler for efficiently breaking OpenStreetMap planet files into smaller chunks. It is ideal for distributed processing of OpenStreetMap data, since it groups spatially related data into separate files that do not need to be loaded into memory at once.

Use

osm-tiler planet.osm.pbf -z 7

Build

osm-tiler uses mason (https://github.com/mason) to fetch dependencies, which ensures a uniform development and build environment.

make osm-tiler

Test

The following command will download necessary sample files, rebuild the binary, and run osm-tiler against an OSM extract.

make test

Download

Downloads a small metro extract for testing osm-tiler.

make chs.osm.pbf

Clean

Deletes the osm-tiler binary.

make clean

osm-tiler's Issues

leveldb node-cache -> libosmium mmap node-cache

libosmium's node cache mechanism uses mmap to scale node caching dynamically to disk if there is not enough RAM on a machine. It also allows for some niceties like being able to call node_ref.location() from a node_ref parsed from a way as if you had the node itself in memory. We should use this.

@danpat linked to a great example from the libosmium source in chat.

cc @kcalloway @rclark

geojson => json

remove -g and --geojson flags
add -j and --json flags
document new json format in readme
json writer for nodes using RapidJSON
json writer for ways using RapidJSON
json writer for relations using RapidJSON

cc @rclark

what is osm-split?

osm-split is intended to be a C++ tool using libosmium for efficiently splitting an OSM planet file into tiled chunks of either line-delimited geojson or osm.pbf files, similar to most metro-extract tools. This allows the single-threaded, memory-hungry operations required for processing the planet to be run only once, allowing mapreduce tools to perform additional transformations in JavaScript.

osm-split will make a couple of subjective decisions about the data that any processor could live with. For example, only OSM objects that can be joined to a node's lng/lat will be output. In the case of turn-restrictions, this would mean building an intermediate geometry from the from/via/to objects so that they can be tiled. Beyond these basic spatial operations necessary for indexing, osm-split would not perform any other filtering of tags, which allows a wide array of tile-reduce processors to use the data however they need to.

Why not use existing extract tools?

Most of the existing extract tools allow for arbitrary polygons to be used to define areas of interest. This presents memory and computation challenges that we can easily sidestep with our use case, which is restricted to tiles. Generating a unique list of tiles that an OSM object overlaps with is significantly cheaper than performing the same operation for an arbitrary shape like a city boundary. Also, having tight control over how fuzzy OSM objects like relations are handled should give us a bit more flexibility over general-purpose tools like minjur.

next steps:

get code compiling from scratch with a basic Makefile setup
write the tile indexing logic
output to geojson option
output to osm.pbf option
benchmark performance using leveldb as the node-cache
if leveldb is too slow, switch to libosmium's RAM-based node-cache

cc @rclark @camilleanne

Dynamically generate directories based on tile parameters

We need to create directories from C++ that mirror the structure of the quadtiles. Investigate options.
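One option is the standard library's filesystem support. The sketch below is a minimal illustration, assuming a C++17 compiler and borrowing the X/Y/Z directory order from the layout proposed in the "output to geojson" issue. The name `ensure_tile_dir` is hypothetical. Since the project already links boost_filesystem, `boost::filesystem::create_directories` would be the equivalent call under C++11.

```cpp
#include <cstdint>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: builds <root>/<x>/<y>/<z> and creates every missing
// directory along the way. Calling it again for an existing tile is harmless,
// because create_directories does nothing when the path already exists.
inline fs::path ensure_tile_dir(const fs::path& root,
                                std::uint32_t x,
                                std::uint32_t y,
                                std::uint32_t z) {
    fs::path dir = root / std::to_string(x) / std::to_string(y) / std::to_string(z);
    fs::create_directories(dir);  // throws fs::filesystem_error on failure
    return dir;
}
```

A caller would invoke this once per tile before opening that tile's output files, for example `ensure_tile_dir("output", 35, 51, 7)`.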

get code compiling from scratch with a basic Makefile setup

I believe dependencies are all header-only libraries at the moment. So far, we will need:

libosmium
rapidjson
leveldb
protozero

benchmark performance and minimum specs

How fast can we split a metro area?
How fast can we split a large-ish region?
How fast can we split the planet?
What are the minimum machine specs for each of the above?

The first pass will use leveldb, which should be limited only by disk space. If this method is too slow, we can use libosmium's in-memory node-cache instead. The in-memory cache will likely make processing faster, but it will require a very large machine.

cc @rclark

tile indexing

An OSM object should be included in any tile extract if any of its nodes are contained in that tile. We can vendor the tilebelt implementation of this tile labeling operation.
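As a rough illustration of the per-node labeling step, here is a C++ sketch of the standard Web Mercator point-to-tile conversion, modeled on what tilebelt's `pointToTile` does. The names `Tile` and `point_to_tile` are placeholders, not taken from the osm-tiler source, and the latitude clamp is an added assumption to keep polar nodes inside the tile grid.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical tile address: column x, row y, zoom z.
struct Tile {
    std::uint32_t x;
    std::uint32_t y;
    std::uint32_t z;
};

// Maps a lng/lat pair to the Web Mercator tile that contains it.
inline Tile point_to_tile(double lon, double lat, std::uint32_t zoom) {
    const double pi = 3.14159265358979323846;
    const double max_lat = 85.0511287798;  // Web Mercator latitude limit
    const double n = std::ldexp(1.0, static_cast<int>(zoom));  // 2^zoom

    lat = std::max(-max_lat, std::min(max_lat, lat));
    const double s = std::sin(lat * pi / 180.0);

    double x = n * (lon / 360.0 + 0.5);
    double y = n * (0.5 - 0.25 * std::log((1.0 + s) / (1.0 - s)) / pi);

    // Wrap longitude so that 180 degrees lands in column 0.
    x = std::fmod(x, n);
    if (x < 0.0) x += n;

    // Floor to a tile index and clamp against floating-point edge cases.
    x = std::max(0.0, std::min(n - 1.0, std::floor(x)));
    y = std::max(0.0, std::min(n - 1.0, std::floor(y)));

    return Tile{static_cast<std::uint32_t>(x), static_cast<std::uint32_t>(y), zoom};
}
```

Labeling a way then amounts to calling this for each of its nodes and collecting the distinct results.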

This approach will be extremely fast (~10M nodes per second in JavaScript and possibly much more in C++), and provide "good enough" tile indexing. It fails when an edge crosses a tile without any of its nodes falling in that tile (for example, when it clips a corner). I think we can overlook this for the types of data analysis we are concerned with. If we find that we really don't want to make the tradeoff, we can implement tile-cover in C++, although that would be a bigger lift.

An open question is what we should do with giant geometries like coastlines that span large numbers of tiles. I would be in favor of logic that says something like "if an object touches >100 unique tiles, bail and do not include it". Coastlines and other giant objects do not fit the TileReduce scaling model well, and are not needed for any of the analyses we have planned so far. If a particular processor does need these, it can handle these geometries on its own.
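The proposed cutoff could look roughly like the following sketch. It assumes each node of the object has already been mapped to a tile, and it treats the limit as a parameter, since the 100-tile figure above is only a suggestion. `Tile` and `unique_tiles` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical tile address, ordered so it can be stored in a std::set.
struct Tile {
    std::uint32_t x;
    std::uint32_t y;
    std::uint32_t z;
    bool operator<(const Tile& other) const {
        return std::tie(z, x, y) < std::tie(other.z, other.x, other.y);
    }
};

// Collects the distinct tiles touched by an object's nodes into `out`.
// Returns false, leaving `out` empty, as soon as the object touches more
// than `max_tiles` distinct tiles, so oversized objects are skipped early.
inline bool unique_tiles(const std::vector<Tile>& node_tiles,
                         std::size_t max_tiles,
                         std::set<Tile>& out) {
    out.clear();
    for (const Tile& t : node_tiles) {
        out.insert(t);
        if (out.size() > max_tiles) {
            out.clear();
            return false;
        }
    }
    return true;
}
```

Bailing inside the loop means a coastline-sized way stops consuming memory shortly after it crosses the limit, rather than after all of its nodes have been visited.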

cc @rclark

tests

The current "test" simply runs osm-tiler on a small metro extract, and passes as long as it does not crash. Let's add:

unit tests - tile conversion, directory creation, required flag exceptions, etc
integration tests - verify that data gets tiled where it needs to be in the appropriate

When these are in place, let's also enable tests on travis.

cc @springmeyer

optional leveldb tile cache

For RAM constrained environments, let's add a flag for caching tile indices in leveldb.

cc @rclark

code linting

What should we use?

document performance shortcuts

We are intentionally building this tool to favor speed over completeness. We are also intentionally building it to work for the specific use cases presented by mapreducable network analysis. This means that we can and will take shortcuts that a more general purpose tool might go to great lengths to avoid. We should document the tradeoffs explicitly to set expectations.

Off the bat, we will need to document:

tile index methodology and gaps
large object omissions
relation omissions (or the flip side: which do we support?)

output to geojson

Output to line-delimited geojson. The output would be structured like:

./output/{{X}}/{{Y}}/{{Z}}/nodes.json
./output/{{X}}/{{Y}}/{{Z}}/ways.json
./output/{{X}}/{{Y}}/{{Z}}/relations.json
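To make "line-delimited" concrete, the sketch below serializes one node as a single-line GeoJSON Point feature and writes one feature per line. The property schema (just an `id`) and the function names are assumptions for illustration. The hand-formatted string only stands in for a real JSON writer such as RapidJSON, which the "geojson => json" issue names for this job.

```cpp
#include <cstdint>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical serializer: one node becomes one GeoJSON Feature with no
// embedded newlines. Coordinates are written with 7 decimal places.
inline std::string node_feature_line(std::int64_t id, double lon, double lat) {
    std::ostringstream s;
    s << std::fixed << std::setprecision(7);
    s << "{\"type\":\"Feature\",\"properties\":{\"id\":" << id
      << "},\"geometry\":{\"type\":\"Point\",\"coordinates\":["
      << lon << ',' << lat << "]}}";
    return s.str();
}

// Writes each serialized feature on its own line, so a consumer can
// process the file one record at a time without parsing it whole.
inline void write_lines(std::ostream& out, const std::vector<std::string>& features) {
    for (const std::string& feature : features) {
        out << feature << '\n';
    }
}
```

In the layout above, `out` would be a `std::ofstream` opened on the tile's `nodes.json`.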

output to osm.pbf

Similar to the geojson output option, but this would output a complete osm.pbf tile for each tile. The data would be indexed using the same method as any other case, but the geometries constructed would be tossed out after the osm.pbf subset was created. We can lean on libosmium for writing and compressing this data. The structure would look like:

./output/{{X}}/{{Y}}/{{Z}}/data.osm.pbf

These extracts would take a bit more work to use in processors, but would allow for maximum flexibility using node-osmium.

arg parsing

This should cover all the parameters I am aware of:

Usage:
  osm-split [options] <file>

<file> : osm.pbf file to process
--zoom -z : zoom level of tiles
--pbf -p : output tiled osm.pbf files
--geojson -g : output tiled line-delimited geojson files
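The option table above could be parsed along these lines. This is a dependency-free sketch rather than the project's parser (the build links boost_program_options). `Options` and `parse_args` are hypothetical names, and treating `--zoom` as required is an assumption. It accepts the file in any position relative to the flags.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the options listed in the usage text.
struct Options {
    std::string file;
    int zoom = -1;
    bool pbf = false;
    bool geojson = false;
};

// Parses the arguments that follow the program name. Throws
// std::invalid_argument on unknown flags or missing values.
inline Options parse_args(const std::vector<std::string>& args) {
    Options opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& a = args[i];
        if (a == "-z" || a == "--zoom") {
            if (i + 1 >= args.size()) {
                throw std::invalid_argument("missing value for " + a);
            }
            opts.zoom = std::stoi(args[++i]);
        } else if (a == "-p" || a == "--pbf") {
            opts.pbf = true;
        } else if (a == "-g" || a == "--geojson") {
            opts.geojson = true;
        } else if (!a.empty() && a[0] == '-') {
            throw std::invalid_argument("unknown option " + a);
        } else if (opts.file.empty()) {
            opts.file = a;  // first bare argument is the input file
        } else {
            throw std::invalid_argument("unexpected extra argument " + a);
        }
    }
    if (opts.file.empty()) throw std::invalid_argument("missing <file>");
    if (opts.zoom < 0) throw std::invalid_argument("missing --zoom");
    return opts;
}
```

From `main`, the call would be `parse_args(std::vector<std::string>(argv + 1, argv + argc))`.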

build fails on Ubuntu

error is:

g++ osm-tiler.cpp -o osm-tiler -isystem./mason_packages/.link/include -L./mason_packages/.link/lib  -std=c++11 -fvisibility=hidden -g -Wall -Wextra -Wfloat-equal -Wundef -Wcast-align -Wwrite-strings -Wlong-long -Wmissing-declarations -Wredundant-decls -Wshadow -Woverloaded-virtual -O3 -DNDEBUG  -lz -lpthread -lboost_program_options -lboost_filesystem -lboost_system;
/tmp/ccNL9CFc.o: In function `boost::program_options::validation_error::validation_error(boost::program_options::validation_error::kind_t, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)':
/home/dev/projects/osm-tiler/./mason_packages/.link/include/boost/program_options/errors.hpp:373: undefined reference to `boost::program_options::validation_error::get_template[abi:cxx11](boost::program_options::validation_error::kind_t)'
/home/dev/projects/osm-tiler/./mason_packages/.link/include/boost/program_options/errors.hpp:373: undefined reference to `boost::program_options::error_with_option_name::error_with_option_name(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)'

I'm totally clueless with these dependencies. Wondering if I'm missing a required package
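The `std::__cxx11::basic_string` and `[abi:cxx11]` markers in the undefined symbols are characteristic of libstdc++'s dual ABI: the compiler is asking for Boost symbols built with the newer string ABI (the default since GCC 5), which a Boost binary built against the older ABI does not export. Whether that is the cause here is not confirmed in this thread. The sketch below, with the hypothetical name `libstdcxx_string_abi`, reports which ABI the local compiler targets.

```cpp
#include <string>

// Reports the libstdc++ string ABI selected at compile time.
// _GLIBCXX_USE_CXX11_ABI is defined by libstdc++ once any of its headers
// has been included, and is absent under other standard libraries.
inline std::string libstdcxx_string_abi() {
#if defined(_GLIBCXX_USE_CXX11_ABI)
    return _GLIBCXX_USE_CXX11_ABI ? "cxx11" : "pre-cxx11";
#else
    return "not libstdc++";
#endif
}
```

If this reports `cxx11` and the prebuilt Boost in `mason_packages` turns out to use the older ABI (an assumption), compiling with `-D_GLIBCXX_USE_CXX11_ABI=0` is a commonly used workaround.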

# osm-tiler -z 7 -o output/ planet.osm.pbf 
error: Could not detect file format for filename '-z'

Putting the arguments after the filename works as expected.

cc @morganherlocker
