mapbox / osm-tiler
experimental osm pbf tiler for efficiently breaking planet files into small chunks
License: ISC License
Let's do a planet run on a morec2 to see how it performs with a realistic dataset.
cc @rclark
Libosmium has a class osmium::geom::Tile that already does a lot of what is implemented in osm-tiler/handler.hpp.
libosmium's node cache mechanism uses mmap to scale node caching dynamically to disk if there is not enough RAM on a machine. It also allows for some niceties like being able to call node_ref.location() from a node_ref parsed from a way as if you had the node itself in memory. We should use this.
@danpat linked to a great example from the libosmium source in chat.
The first pass will use leveldb, which should be limited only by disk space. If that proves too slow, we can use libosmium's in-memory node cache instead. Processing with the in-memory cache will likely be faster, but it will require a very large machine.
cc @rclark
What should we use?
# osm-tiler -z 7 -o output/ planet.osm.pbf
error: Could not detect file format for filename '-z'
Putting the arguments after the filename works as expected.
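One way to make the argument order irrelevant is to treat any token that is not a known flag (or a flag's value) as the input file. The following is a minimal sketch of that idea, not the tool's actual parser: the `Options` struct and `parse_args` function are hypothetical names, and only the `-z` and `-o` flags from the command above are handled.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the parsed command line.
struct Options {
    int zoom = 0;          // value of -z
    std::string output;    // value of -o
    std::string input;     // the single positional argument
};

// Parse the arguments that follow the program name. Flags may appear
// before or after the input file.
Options parse_args(const std::vector<std::string>& args) {
    Options opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        if (arg == "-z" || arg == "-o") {
            if (i + 1 >= args.size()) {
                throw std::invalid_argument("missing value for " + arg);
            }
            const std::string& value = args[++i];  // consume the flag's value
            if (arg == "-z") {
                opts.zoom = std::stoi(value);
            } else {
                opts.output = value;
            }
        } else if (opts.input.empty()) {
            opts.input = arg;  // first non-flag token is the input file
        } else {
            throw std::invalid_argument("unexpected argument: " + arg);
        }
    }
    if (opts.input.empty()) {
        throw std::invalid_argument("no input file given");
    }
    return opts;
}
```

With this approach, `-z 7 -o output/ planet.osm.pbf` and `planet.osm.pbf -z 7 -o output/` parse to the same options.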
The current "test" simply runs osm-tiler on a small metro extract, and passes as long as it does not crash. Let's add:
When these are in place, let's also enable tests on travis.
cc @springmeyer
osm-split is intended to be a C++ tool using libosmium for efficiently splitting an OSM planet file into tiled chunks of either line-delimited geojson or osm.pbf files, similar to most metro-extract tools. This allows the single-threaded, memory-hungry operations required for processing the planet to be run only once, allowing mapreduce tools to perform additional transformations in JavaScript.
osm-split will make a couple subjective decisions about the data that any processor could live with. For example, only OSM objects that can be joined to a node's lng/lat will be output. In the case of turn-restrictions, this would mean building an intermediate geometry from the from/via/to objects so that they can be tiled. Beyond these basic spatial operations necessary for indexing, osm-split would not perform any other filtering of tags, which allows a wide array of tile-reduce processors to use the data however they need to.
Most of the existing extract tools allow for arbitrary polygons to be used to define areas of interest. This presents memory and computation challenges that we can easily sidestep with our use case, which is restricted to tiles. Generating a unique list of tiles that an OSM object overlaps with is significantly cheaper than performing the same operation for an arbitrary shape like a city boundary. Also, having tight control over how fuzzy OSM objects like relations are handled should give us a bit more flexibility over general-purpose tools like minjur.
This should cover all the parameters I am aware of:
Usage:
osm-split [options] <file>
<file> : osm.pbf file to process
--zoom -z : zoom level of tiles
--pbf -p : output tiled osm.pbf files
--geojson -g : output tiled line-delimited geojson files
Output to line-delimited geojson. The output would be structured like:
./output/{{X}}/{{Y}}/{{Z}}/nodes.json
./output/{{X}}/{{Y}}/{{Z}}/ways.json
./output/{{X}}/{{Y}}/{{Z}}/relations.json
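As a rough illustration of the layout above, the path for one tile's file could be assembled like this. The function name `tile_output_path` is hypothetical, and the component order simply follows the X/Y/Z order written in the proposal.

```cpp
#include <string>

// Build "<root>/<x>/<y>/<z>/<filename>", following the X/Y/Z order
// used in the proposed output structure.
std::string tile_output_path(const std::string& root,
                             unsigned x, unsigned y, unsigned z,
                             const std::string& filename) {
    return root + "/" + std::to_string(x) + "/" + std::to_string(y) + "/" +
           std::to_string(z) + "/" + filename;
}
```

For example, `tile_output_path("./output", 35, 48, 7, "nodes.json")` yields `./output/35/48/7/nodes.json`.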
error is:
g++ osm-tiler.cpp -o osm-tiler -isystem./mason_packages/.link/include -L./mason_packages/.link/lib -std=c++11 -fvisibility=hidden -g -Wall -Wextra -Wfloat-equal -Wundef -Wcast-align -Wwrite-strings -Wlong-long -Wmissing-declarations -Wredundant-decls -Wshadow -Woverloaded-virtual -O3 -DNDEBUG -lz -lpthread -lboost_program_options -lboost_filesystem -lboost_system;
/tmp/ccNL9CFc.o: In function
`boost::program_options::validation_error::validation_error(boost::program_options::validation_error::kind_t, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)':
/home/dev/projects/osm-tiler/./mason_packages/.link/include/boost/program_options/errors.hpp:373: undefined reference to `boost::program_options::validation_error::get_template[abi:cxx11](boost::program_options::validation_error::kind_t)'
/home/dev/projects/osm-tiler/./mason_packages/.link/include/boost/program_options/errors.hpp:373: undefined reference to
`boost::program_options::error_with_option_name::error_with_option_name(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)'
I'm totally clueless with these dependencies. Wondering if I'm missing a required package
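The `[abi:cxx11]` and `std::__cxx11::basic_string` markers in the undefined symbols usually point to a libstdc++ dual-ABI mismatch: the compiler (GCC 5 or later) is building osm-tiler.cpp with the new string ABI, while the prebuilt Boost library being linked was probably built with the old one. That is an educated guess from the error text, not something confirmed for this setup. A small check like the one below (the function name is hypothetical) reports which ABI the local compiler defaults to.

```cpp
#include <string>  // including any libstdc++ header defines the ABI macro

// Returns 1 if libstdc++ is using the new (cxx11) ABI, 0 for the old ABI,
// and -1 if the macro is not defined (for example, when not using libstdc++).
int libstdcxx_abi() {
#if defined(_GLIBCXX_USE_CXX11_ABI)
    return _GLIBCXX_USE_CXX11_ABI;
#else
    return -1;
#endif
}
```

If this returns 1, two things may be worth trying: adding `-D_GLIBCXX_USE_CXX11_ABI=0` to the compile flags so the program matches an old-ABI Boost, or linking against a Boost build produced by the same compiler.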
For RAM constrained environments, let's add a flag for caching tile indices in leveldb.
cc @rclark
An OSM object should be included in any tile extract if any of its nodes are contained in that tile. We can vendor the tilebelt implementation of this tile labeling operation.
This approach will be extremely fast (~10M nodes per second in JavaScript and possibly much more in C++), and provide "good enough" tile indexing. Where it will fail is when an edge crosses over a particular tile without any of its nodes being in that tile (like at a corner). I think we can overlook this for the types of data analysis we are concerned with. If we find that we really don't want to make the tradeoff, we can implement tile-cover in C++, although that would be a bigger lift.
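The node-based labeling described above can be sketched as follows. This is an illustration under assumptions, not the vendored tilebelt code: `Tile`, `point_to_tile`, and `tiles_for_nodes` are hypothetical names, and the conversion uses the standard Web Mercator slippy-map tile formula.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <set>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical tile id: (x, y, zoom).
using Tile = std::tuple<uint32_t, uint32_t, uint32_t>;

// Map a lng/lat pair (degrees) to the Web Mercator tile that contains it.
Tile point_to_tile(double lng, double lat, uint32_t zoom) {
    const double pi = 3.14159265358979323846;
    const double n = std::pow(2.0, static_cast<double>(zoom));
    const double sin_lat = std::sin(lat * pi / 180.0);
    double x = n * (lng / 360.0 + 0.5);
    double y = n * (0.5 - 0.25 * std::log((1.0 + sin_lat) / (1.0 - sin_lat)) / pi);
    // Clamp to the valid tile range so lng = 180 or polar latitudes stay in bounds.
    x = std::min(std::max(std::floor(x), 0.0), n - 1.0);
    y = std::min(std::max(std::floor(y), 0.0), n - 1.0);
    return Tile(static_cast<uint32_t>(x), static_cast<uint32_t>(y), zoom);
}

// Collect the unique tiles touched by an object's nodes, given as
// (lng, lat) pairs. Edges that cross a tile without a node inside it
// are deliberately not detected, matching the tradeoff described above.
std::set<Tile> tiles_for_nodes(const std::vector<std::pair<double, double>>& nodes,
                               uint32_t zoom) {
    std::set<Tile> tiles;
    for (const auto& node : nodes) {
        tiles.insert(point_to_tile(node.first, node.second, zoom));
    }
    return tiles;
}
```

Using a `std::set` keeps the result deduplicated, so an object with many nodes in one tile is written to that tile only once.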
An open question is what we should do with giant geometries like coastlines that span large numbers of tiles. I would be in favor of logic that says something like "if an object touches >100 unique tiles, bail and do not include it". Coastlines and other giant objects do not fit the TileReduce scaling model well, and are not needed for any types of analysis we need so far. If a particular processor does need these, it can handle these geometries on its own.
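The "bail on giant objects" rule could be as small as a size check on the set of tiles an object touches. In this sketch the constant and function names are hypothetical, and the default of 100 is just the figure floated above, not a tuned value.

```cpp
#include <cstddef>
#include <set>

// Hypothetical cutoff, taken from the ">100 unique tiles" suggestion.
constexpr std::size_t kMaxTilesPerObject = 100;

// Returns true when an object touches more unique tiles than the limit
// and should be left out of the tiled output. Works with any container
// of unique tiles that exposes size().
template <typename TileSet>
bool should_skip_object(const TileSet& tiles,
                        std::size_t max_tiles = kMaxTilesPerObject) {
    return tiles.size() > max_tiles;
}
```

Keeping the limit as a parameter would let a processor that does need coastlines raise or disable it.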
cc @rclark
Similar to the geojson output option, but this would output a complete osm.pbf tile for each tile. The data would be indexed using the same method as any other case, but the geometries constructed would be tossed out after the osm.pbf subset was created. We can lean on libosmium for writing and compressing this data. The structure would look like:
./output/{{X}}/{{Y}}/{{Z}}/data.osm.pbf
These extracts would take a bit more work to use in processors, but would allow for maximum flexibility using node-osmium.
I believe dependencies are all header-only libraries at the moment. So far, we will need:
We need to create directories from C++ to map the structure of the quadtiles. Investigate options.
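One option that needs no extra dependency is a small `mkdir -p` style helper on top of POSIX `mkdir`. The sketch below assumes a POSIX system, and `make_directories` is a hypothetical name. Since the build already links Boost.Filesystem, `boost::filesystem::create_directories` would be another candidate.

```cpp
#include <cerrno>
#include <cstddef>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>

// Create `path` and any missing parent directories, like `mkdir -p`.
// Returns false on the first failure other than "already exists".
// Note: an existing regular file at a prefix is also reported as EEXIST,
// so this sketch does not distinguish that case.
bool make_directories(const std::string& path) {
    std::size_t pos = 0;
    while (pos != std::string::npos) {
        pos = path.find('/', pos + 1);                  // next separator, or npos at the end
        const std::string prefix = path.substr(0, pos); // whole path when pos == npos
        if (prefix.empty()) {
            continue;
        }
        if (::mkdir(prefix.c_str(), 0755) != 0 && errno != EEXIST) {
            return false;
        }
    }
    return true;
}
```

Calling it once per tile before opening the output file, for example `make_directories("./output/35/48/7")`, is enough to build the nested structure, and repeated calls on an existing directory succeed.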
We are intentionally building this tool to favor speed over completeness. We are also intentionally building it to work for the specific use cases presented by mapreducable network analysis. This means that we can and will take shortcuts that a more general purpose tool might go to great lengths to avoid. We should document the tradeoffs explicitly to set expectations.
Off the bat, we will need to document:
Couldn't find any explanation about this on the repo. I apologise if it is mentioned somewhere and I missed it. :P