mapbox / osm-tiler

experimental osm pbf tiler for efficiently breaking planet files into small chunks

License: ISC License

Makefile 10.38% C++ 81.68% Shell 7.94%

osm-tiler's Issues

output to geojson

Output to line-delimited geojson. The output would be structured like:

./output/{{X}}/{{Y}}/{{Z}}/nodes.json
./output/{{X}}/{{Y}}/{{Z}}/ways.json
./output/{{X}}/{{Y}}/{{Z}}/relations.json
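
As a rough illustration of what "line-delimited" means here, the sketch below serializes one node as a single-line GeoJSON Feature. The function name and the exact feature layout (an `id` member, empty `properties`) are assumptions made for this example; the issue does not specify a schema. A writer would append one such line, followed by a newline, to the tile's nodes.json.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: serialize one OSM node as a single-line GeoJSON
// Point Feature. Tags are omitted to keep the sketch short.
std::string node_to_geojson_line(std::int64_t id, double lon, double lat) {
    std::ostringstream out;
    out << std::setprecision(10)  // enough digits for OSM coordinates
        << "{\"type\":\"Feature\",\"id\":" << id
        << ",\"properties\":{},\"geometry\":{\"type\":\"Point\","
        << "\"coordinates\":[" << lon << "," << lat << "]}}";
    return out.str();
}
```

Keeping each feature on its own line means a downstream processor can stream a tile file line by line without parsing the whole file as one JSON document.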

Use osmium::geom::Tile?

Libosmium has a class osmium::geom::Tile that already does a lot of what is implemented in osm-tiler/handler.hpp.

tests

The current "test" simply runs osm-tiler on a small metro extract, and passes as long as it does not crash. Let's add:

unit tests - tile conversion, directory creation, required flag exceptions, etc
integration tests - verify that data gets tiled where it needs to be in the appropriate

When these are in place, let's also enable tests on travis.

cc @springmeyer

build fails on Ubuntu

error is:

g++ osm-tiler.cpp -o osm-tiler -isystem./mason_packages/.link/include -L./mason_packages/.link/lib  -std=c++11 -fvisibility=hidden -g -Wall -Wextra -Wfloat-equal -Wundef -Wcast-align -Wwrite-strings -Wlong-long -Wmissing-declarations -Wredundant-decls -Wshadow -Woverloaded-virtual -O3 -DNDEBUG  -lz -lpthread -lboost_program_options -lboost_filesystem -lboost_system;
/tmp/ccNL9CFc.o: In function `boost::program_options::validation_error::validation_error(boost::program_options::validation_error::kind_t, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)':
/home/dev/projects/osm-tiler/./mason_packages/.link/include/boost/program_options/errors.hpp:373: undefined reference to `boost::program_options::validation_error::get_template[abi:cxx11](boost::program_options::validation_error::kind_t)'
/home/dev/projects/osm-tiler/./mason_packages/.link/include/boost/program_options/errors.hpp:373: undefined reference to `boost::program_options::error_with_option_name::error_with_option_name(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)'

I'm totally clueless with these dependencies. I'm wondering whether I'm missing a required package.

arg parsing

This should cover all the parameters I am aware of:

Usage:
  osm-split [options] <file>

<file> : osm.pbf file to process
--zoom -z : zoom level of tiles
--pbf -p : output tiled osm.pbf files
--geojson -g : output tiled line-delimited geojson files

document performance shortcuts

We are intentionally building this tool to favor speed over completeness. We are also intentionally building it to work for the specific use cases presented by mapreducable network analysis. This means that we can and will take shortcuts that a more general purpose tool might go to great lengths to avoid. We should document the tradeoffs explicitly to set expectations.

Off the bat, we will need to document:

tile index methodology and gaps
large object omissions
relation omissions (or the flip side: which do we support?)

code linting

What should we use?

How are ways or relations running over more than one tile handled?

Couldn't find any explanation about this on the repo. I apologise if it is mentioned somewhere and I missed it. :P

CLI too picky about argument ordering

# osm-tiler -z 7 -o output/ planet.osm.pbf 
error: Could not detect file format for filename '-z'

Putting the arguments after the filename works as expected.
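
For illustration, here is a dependency-free sketch of order-independent parsing: any token that is not a flag or a flag's value is treated as the input file, wherever it appears. This is not the project's parser (the build log elsewhere in this tracker shows it links boost::program_options); the `Options` struct, the `parse_args` name, and the `--output` long form are assumptions made for the example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical option set, loosely following the flags mentioned in the issues.
struct Options {
    int zoom = 0;
    std::string output_dir;
    std::string input_file;
};

// Parse arguments (excluding argv[0]) without caring where the file appears.
Options parse_args(const std::vector<std::string>& args) {
    Options opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        if (arg == "-z" || arg == "--zoom") {
            if (++i >= args.size()) throw std::invalid_argument("missing value for " + arg);
            opts.zoom = std::stoi(args[i]);
        } else if (arg == "-o" || arg == "--output") {
            if (++i >= args.size()) throw std::invalid_argument("missing value for " + arg);
            opts.output_dir = args[i];
        } else if (!arg.empty() && arg[0] == '-') {
            throw std::invalid_argument("unknown option " + arg);
        } else if (opts.input_file.empty()) {
            opts.input_file = arg;  // first non-flag token is the input file
        } else {
            throw std::invalid_argument("unexpected extra argument " + arg);
        }
    }
    if (opts.input_file.empty()) throw std::invalid_argument("no input file given");
    return opts;
}
```

With this approach, `-z 7 -o output/ planet.osm.pbf` and `planet.osm.pbf -z 7 -o output/` parse to the same options.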

cc @morganherlocker

benchmark performance and minimum specs

How fast can we split a metro area?
How fast can we split a large-ish region?
How fast can we split the planet?
What are the minimum machine specs for each of the above?

The first pass will use leveldb, which should only be limited by disk space. If speed is insufficient using this method, we can use libosmium's in-memory node-cache instead. If we use the in-memory cache, processing will likely be faster, but will require a very large machine.

cc @rclark

what is osm-split?

osm-split is intended to be a C++ tool using libosmium for efficiently splitting an OSM planet file into tiled chunks of either line-delimited geojson or osm.pbf files, similar to most metro-extract tools. This allows the single-threaded memory-hungry operations required for processing the planet to be run only once, allowing mapreduce tools to perform additional transformations in JavaScript.

osm-split will make a couple of subjective decisions about the data that any processor could live with. For example, only OSM objects that can be joined to a node's lng/lat will be output. In the case of turn-restrictions, this would mean building an intermediate geometry from the from/via/to objects so that they can be tiled. Beyond these basic spatial operations necessary for indexing, osm-split would not perform any other filtering of tags, which allows a wide array of tile-reduce processors to use the data however they need to.

Why not use existing extract tools?

Most of the existing extract tools allow for arbitrary polygons to be used to define areas of interest. This presents memory and computation challenges that we can easily sidestep with our use case, which is restricted to tiles. Generating a unique list of tiles that an OSM object overlaps with is significantly cheaper than performing the same operation for an arbitrary shape like a city boundary. Also, having tight control over how fuzzy OSM objects like relations are handled should give us a bit more flexibility over general-purpose tools like minjur.

next steps:

get code compiling from scratch with a basic Makefile setup
write the tile indexing logic
output to geojson option
output to osm.pbf option
benchmark performance using leveldb as the node-cache
if leveldb is too slow, switch to libosmium's RAM-based node-cache

cc @rclark @camilleanne

output to osm.pbf

Similar to the geojson output option, but this would output a complete osm.pbf tile for each tile. The data would be indexed using the same method as any other case, but the geometries constructed would be tossed out after the osm.pbf subset was created. We can lean on libosmium for writing and compressing this data. The structure would look like:

./output/{{X}}/{{Y}}/{{Z}}/data.osm.pbf

These extracts would take a bit more work to use in processors, but would allow for maximum flexibility using node-osmium.

optional leveldb tile cache

For RAM constrained environments, let's add a flag for caching tile indices in leveldb.

cc @rclark

get code compiling from scratch with a basic Makefile setup

I believe dependencies are all header-only libraries at the moment. So far, we will need:

libosmium
rapidjson
leveldb
protozero

geojson => json

remove -g and --geojson flags
add -j and --json flags
document new json format in readme
json writer for nodes using RapidJSON
json writer for ways using RapidJSON
json writer for relations using RapidJSON

cc @rclark

Dynamically generate directories based on tile parameters

We need to create directories from C++ to map the structure of the quadtiles. Investigate options.
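
One option worth considering is `std::filesystem::create_directories`, which creates every missing parent in a single call and does not fail if the directory already exists. The sketch below assumes a C++17 compiler (the build log elsewhere in this tracker shows -std=c++11, where `boost::filesystem::create_directories` would be the closest equivalent). The `ensure_tile_dir` name is hypothetical, and the X/Y/Z nesting follows the output layout described in the other issues.

```cpp
#include <cstdint>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: make sure <root>/<x>/<y>/<z> exists and return its path.
fs::path ensure_tile_dir(const fs::path& root,
                         std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    fs::path dir = root / std::to_string(x) / std::to_string(y) / std::to_string(z);
    // Creates all missing parents; returns false (without throwing) if the
    // directory is already there, so repeated calls are safe.
    fs::create_directories(dir);
    return dir;
}
```

Since many objects land in the same tile, it may be worth remembering which tile directories have already been created so that the filesystem is not hit once per object.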

leveldb node-cache -> libosmium mmap node-cache

libosmium's node cache mechanism uses mmap to scale node caching dynamically to disk if there is not enough RAM on a machine. It also allows for some niceties like being able to call node_ref.location() from a node_ref parsed from a way as if you had the node itself in memory. We should use this.

@danpat linked to a great example from the libosmium source in chat.

cc @kcalloway @rclark

tile indexing

An OSM object should be included in any tile extract if any of its nodes are contained in that tile. We can vendor the tilebelt implementation of this tile labeling operation.

This approach will be extremely fast (~10M nodes per second in JavaScript and possibly much more in C++), and provide "good enough" tile indexing. Where it will fail is when an edge crosses over a particular tile without any of its nodes being in that tile (like at a corner). I think we can overlook this for the types of data analysis we are concerned with. If we find that we really don't want to make the tradeoff, we can implement tile-cover in C++, although that would be a bigger lift.
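
A minimal C++ sketch of this node-based labeling, assuming the standard Web Mercator point-to-tile formula. The `Tile` struct and both function names are made up for this example and are not taken from the repository.

```cpp
#include <cmath>
#include <cstdint>
#include <set>
#include <tuple>
#include <utility>
#include <vector>

struct Tile {
    std::uint32_t x, y, z;
    bool operator<(const Tile& o) const {
        return std::tie(z, x, y) < std::tie(o.z, o.x, o.y);
    }
};

// Web Mercator lon/lat (degrees) to the containing tile at the given zoom.
Tile point_to_tile(double lon, double lat, std::uint32_t zoom) {
    const double pi = 3.14159265358979323846;
    const double n = std::ldexp(1.0, static_cast<int>(zoom));  // 2^zoom
    const double s = std::sin(lat * pi / 180.0);
    double x = n * (lon / 360.0 + 0.5);
    double y = n * (0.5 - 0.25 * std::log((1.0 + s) / (1.0 - s)) / pi);
    x = std::fmod(x, n);               // wrap longitudes outside [-180, 180)
    if (x < 0) x += n;
    if (y < 0) y = 0;                  // clamp latitudes beyond the Mercator range
    if (y > n - 1) y = n - 1;
    return Tile{static_cast<std::uint32_t>(std::floor(x)),
                static_cast<std::uint32_t>(std::floor(y)), zoom};
}

// Unique tiles touched by an object's nodes, given as (lon, lat) pairs.
std::set<Tile> tiles_for_nodes(const std::vector<std::pair<double, double>>& nodes,
                               std::uint32_t zoom) {
    std::set<Tile> tiles;
    for (const auto& node : nodes) {
        tiles.insert(point_to_tile(node.first, node.second, zoom));
    }
    return tiles;
}
```

Note that the set only ever contains tiles that hold at least one node, which is exactly the gap described above: a long edge can pass through a tile that never shows up in the result.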

An open question is what we should do with giant geometries like coastlines that span large numbers of tiles. I would be in favor of logic that says something like "if an object touches >100 unique tiles, bail and do not include it". Coastlines and other giant objects do not fit the TileReduce scaling model well, and are not needed for any types of analysis we need so far. If a particular processor does need these, it can handle these geometries on its own.
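
The "bail after N tiles" idea could look roughly like the sketch below. It stops as soon as the limit is exceeded, so a giant object does not have to be fully indexed before it is rejected. The packed 64-bit tile key, the function names, and the default limit of 100 are illustrative assumptions; only the ">100 unique tiles" figure comes from the proposal above.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Pack a tile's x and y into one 64-bit key (zoom is fixed per run).
inline std::uint64_t tile_key(std::uint32_t x, std::uint32_t y) {
    return (static_cast<std::uint64_t>(x) << 32) | y;
}

// Returns true as soon as the object's nodes touch more than max_tiles
// distinct tiles; node_tiles holds one tile key per node.
bool exceeds_tile_limit(const std::vector<std::uint64_t>& node_tiles,
                        std::size_t max_tiles = 100) {
    std::unordered_set<std::uint64_t> seen;
    for (std::uint64_t key : node_tiles) {
        seen.insert(key);
        if (seen.size() > max_tiles) return true;  // early exit for giant objects
    }
    return false;
}
```

An object for which this returns true would simply be skipped, leaving any processor that needs coastlines or similar geometries to source them separately.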

cc @rclark

run on morec2

Let's do a planet run on a morec2 to see how it performs with a realistic dataset.
