Comments (18)
I wrote a sample Java driver using these dependencies:
org.apache.parquet:parquet-avro:1.13.1
org.apache.hadoop:hadoop-client:3.3.6
They double the size of the planetiler fat jar file (from 60 to 120MB) 😭
On an r7g.metal instance, it takes ~1m30s to download all of the parquet files to disk, then about 1m30s to read all of the 1.5B geometries out of it.
Even though the IDs are 128-bit longs (and some are temp ID strings) - if you FNV1A-hash them to 64 bits, there are no collisions.
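For reference, a minimal sketch of 64-bit FNV-1a over an ID's bytes. The driver code is not shown in this thread, so the class name `Fnv1a64` and the choice to hash the UTF-8 bytes of the ID string are assumptions made for illustration; only the offset basis and prime are the standard FNV-1a constants.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of planetiler: standard 64-bit FNV-1a.
class Fnv1a64 {
  private static final long OFFSET_BASIS = 0xcbf29ce484222325L;
  private static final long PRIME = 0x100000001b3L;

  /** Hashes raw bytes: XOR each byte into the hash, then multiply by the FNV prime. */
  static long hash(byte[] data) {
    long hash = OFFSET_BASIS;
    for (byte b : data) {
      hash ^= (b & 0xff);
      hash *= PRIME; // wraps modulo 2^64, as the algorithm requires
    }
    return hash;
  }

  /** Assumption: IDs are hashed as UTF-8 strings. */
  static long hash(String id) {
    return hash(id.getBytes(StandardCharsets.UTF_8));
  }
}
```

The resulting `long` can then stand in for the original ID wherever a 64-bit key is needed.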
from planetiler.
There should be many more buildings than in OSM if this is a superset of previous Daylight MSFT building releases.
The Parquet format is very efficient at encoding column data but the geometry encoding is simply WKB, which should be less compact than the OSM topological way/node model. So I think it's a combination of both.
from planetiler.
Exactly, currently the buildings theme contains a combination of OSM + Microsoft ML + ESRI Community Maps data, for a total of about 786M buildings globally.
from planetiler.
@mactrem gave a nice presentation about possible improvements of the Mapbox Vector Tile format in the last MapLibre Technical Steering Committee meeting. There was some discussion about nested properties. Maybe this is a bit far in the future, but worth thinking about...
If you are interested in discussions around the tile format, feel free to join the #maplibre-tile-format channel in the OSMUS slack.
from planetiler.
Alright, got a first pass (highly experimental!) overture reader up and running. Planetiler can download and build a planet.pmtiles for the planet on an r7g ec2 instance in about 15 minutes, compared to 20+ for an OSM planet pbf. The output is around 50GB.
Here's a demo: https://msbarry.github.io/planetiler-overture-demo/#14/42.35647/-71.07003
The structured attributes definitely present a challenge mapping to vector tile key/value pairs, mostly on the road segment layer. Take a look at the road "segment" attributes: I just left them as JSON for now, but I'll need to do something to split up road lines and apply tags conditionally for different segments with attributes like flags=[{"value":["isBridge"],"at":[0.040321064,0.448427838]}].
from planetiler.
Some initial observations on the dataset:
- It contains 1.5B elements (compared to 1.2B for OSM if you exclude nodes with no tags)
- The initial size is 215GB vs. 75GB for OSM
Size breakdown by theme and type:
theme | type | Size (GB) |
---|---|---|
theme=admins | type=administrativeBoundary | 0.1 |
theme=admins | type=locality | 0.5 |
theme=buildings | type=building | 118.3 |
theme=places | type=place | 8.6 |
theme=transportation | type=connector | 21.1 |
theme=transportation | type=segment | 66.9 |
from planetiler.
It seems the majority of the space is buildings. Are there a lot more buildings than in OSM, or is the size just due to the format used?
from planetiler.
A couple of other options for reading parquet format:
- spark-sql makes it simple to select and transform columns, filter, etc. but pulls in a ton of dependencies (jar goes from 60 to 260MB)
- https://github.com/strategicblue/parquet-floor/ lets you read parquet files without pulling in hadoop dependencies at all, and the jar only goes up to 80MB
from planetiler.
Probably this is stupid so feel free to ignore my comment, but my first thought was let's flatten the nested properties and make a planet.pbf file which looks like OSM...
from planetiler.
I got a little deeper into the low-level parquet format reading this morning. It looks like it should actually work pretty cleanly in planetiler architecture to saturate all cores if I read one row group from a file at a time, then hand it off to a worker to parse and process one element at a time.
I tried playing with https://github.com/joelittlejohn/jsonschema2pojo to generate typed classes from the json schema definition in https://github.com/OvertureMaps/schema/tree/main/schema but it looks like it can't handle allOf/oneOf. I'll play with it a bit more but might just start with a dynamic API like getInt("level") or getDouble("bbox.minx") instead of marshaling it into a typed wrapper for the first pass.
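A rough sketch of what such a dynamic accessor could look like, assuming each row has already been decoded into nested `java.util.Map` values. The class `StructPath` and its methods are hypothetical names for illustration, not the API in the branch.

```java
import java.util.Map;

// Hypothetical dynamic accessor over nested maps; not planetiler's actual API.
class StructPath {
  /** Walks nested maps along a dotted path such as "bbox.minx"; returns null if any step is missing. */
  static Object get(Map<String, ?> root, String path) {
    Object current = root;
    for (String key : path.split("\\.")) {
      if (!(current instanceof Map<?, ?> map)) {
        return null; // tried to descend into a non-struct value
      }
      current = map.get(key);
    }
    return current;
  }

  /** Returns the value as an Integer, or null if it is absent or not numeric. */
  static Integer getInt(Map<String, ?> root, String path) {
    return get(root, path) instanceof Number n ? n.intValue() : null;
  }

  /** Returns the value as a Double, or null if it is absent or not numeric. */
  static Double getDouble(Map<String, ?> root, String path) {
    return get(root, path) instanceof Number n ? n.doubleValue() : null;
  }
}
```

Returning null for missing fields keeps callers simple while the schema is still changing, at the cost of losing compile-time checks that a typed wrapper would give.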
from planetiler.
> Probably this is stupid so feel free to ignore my comment, but my first thought was let's flatten the nested properties and make a planet.pbf file which looks like OSM...

It should be possible but it should definitely use #127. It would be highly inefficient to break out Overture into topological nodes/ways/relations just to re-assemble them again.
from planetiler.
Congratulations @msbarry, this is amazing!
from planetiler.
Very cool @msbarry
Looking at it, it doesn't seem like it would be hard to style those values with a case syntax, similar to the ugly way I did the icons in my recent trail map, like
"icon-image":["case",["in", "Fuel", ["get","description"]],"fuel_15", ["case",["in", "Parking", ["get","description"]],"parking_15", ["case",["in", "View", ["get","description"]],"attraction_15", ["case",["in", "Sales", ["get","description"]],"commercial_15", ["case",["in", "Camping", ["get","description"]],"campsite_15", ["case",["in", "Food", ["get","description"]],"restaurant_15", ["case",["in", "Lodging", ["get","description"]],"hotel_15", ["case",["in", "Restroom", ["get","description"]],"toilets_15", ["case",["in", "Club", ["get","description"]],"warehouse_15", "circle_15"]]]]]]]]],
What does that "at" location mean? Does the line start at that point, or is it longer and telling you that at a certain point it changes to a different style?
I wonder how this looks side by side with a map made from OSM sources. It seems like OSM is the source for a lot of this anyway.
from planetiler.
The at field means that the attribute only applies for a certain segment of the line, so at: [0, 0.5] means it applies for the first half. This is one of the biggest mismatches between overture format and planetiler processing/vector tiles in general.
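To make the `at` semantics concrete, here is a minimal sketch that cuts out the part of a polyline between two fractions of its length. It uses plain planar distance on `double[]{x, y}` points and a hypothetical `LineSlice` class, so it is an illustration of the idea rather than the splitter in the branch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: extracts the sub-line covered by an "at": [start, end] range.
class LineSlice {
  /** Returns the part of the polyline between fractions start and end (0..1) of its total length. */
  static List<double[]> slice(List<double[]> line, double start, double end) {
    double total = 0;
    for (int i = 1; i < line.size(); i++) {
      total += dist(line.get(i - 1), line.get(i));
    }
    double from = start * total, to = end * total;
    List<double[]> result = new ArrayList<>();
    double walked = 0;
    for (int i = 1; i < line.size(); i++) {
      double[] a = line.get(i - 1), b = line.get(i);
      double len = dist(a, b);
      double segStart = walked, segEnd = walked + len;
      if (len > 0 && segEnd >= from && segStart <= to) {
        // Clamp the requested range to this segment and interpolate both ends.
        double t0 = Math.max(0, (from - segStart) / len);
        double t1 = Math.min(1, (to - segStart) / len);
        add(result, lerp(a, b, t0));
        add(result, lerp(a, b, t1));
      }
      walked = segEnd;
    }
    return result;
  }

  private static double dist(double[] a, double[] b) {
    return Math.hypot(b[0] - a[0], b[1] - a[1]);
  }

  private static double[] lerp(double[] a, double[] b, double t) {
    return new double[] {a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t};
  }

  /** Appends a point unless it repeats the previous one (shared vertices between segments). */
  private static void add(List<double[]> out, double[] p) {
    if (out.isEmpty() || out.get(out.size() - 1)[0] != p[0] || out.get(out.size() - 1)[1] != p[1]) {
      out.add(p);
    }
  }
}
```

For an L-shaped line from (0,0) to (10,0) to (10,10), `slice(line, 0, 0.5)` yields just the first leg, which is the portion an `at: [0, 0.5]` attribute would apply to under these assumptions.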
So to "have support for overture" in planetiler probably means it is able to:
- download and read parquet sources (realistically people will probably want to mix and match only certain themes from overture with other sources)
- access structured properties (nested structs and lists) and parse json strings, assuming this isn't a bug: OvertureMaps/data#43
- break apart lines based on attributes that only apply to partial segment lengths
I'd say understanding the current overture schema could be out of scope for now, since it will evolve and people should be able to use new attributes without being blocked on a planetiler update.
What do people think?
from planetiler.
OK I got a prototype profile with those working (see code and demo)
I think it's easiest to work with the structured schema with a dynamic API, so you can do things like:
feature.setAttr("categories.main", struct.get("categories").get("main").asString())
or handle all of the different ways that partial-length road data is provided (some embed an "at": [start, end] field in an object, some use {"at": [start, end], "value": list or value}, some use "values", but it's all handled by this code).
For actually handling the partial-length values I came up with an API
var rangeMap = new RangeMapMap();
rangeMap.put(0, 0.25, Map.of("key", "value"));
rangeMap.put(0.25, 1.0, Map.of("key", "other value"));
var lineSplitter = new LineSplitter(lineString);
for (var range : rangeMap.result()) { // merges overlapping tag maps
  var splitLine = lineSplitter.get(range.start(), range.end());
  features.geometry(sourceFeature.getSourceLayer(), splitLine)
    .putAttrs(range.value());
}
but I could probably simplify it to something like:
features.line(sourceFeature.getSourceLayer())
.setAttrPartialLength(0, 0.25, "key", "value")
.setAttrPartialLength(0.25, 1.0, "key", "other value")
.putAttrsPartialLength(0, 0.5, names)
then have planetiler handle creating multiple line geometries behind the scenes.
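One way the "behind the scenes" step could work is to cut the 0..1 range at every start/end value and merge the attributes of all entries covering each piece. The sketch below does that with a hypothetical `PartialRanges` class; it is an assumption about the approach, not the range map in the branch, and it resolves conflicting keys by letting later entries win.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical range map: merges overlapping partial-length attributes into disjoint pieces.
class PartialRanges {
  record Range(double start, double end, Map<String, Object> attrs) {}

  private final List<Range> entries = new ArrayList<>();

  /** Registers attributes that apply between fractions start and end of the line. */
  void put(double start, double end, Map<String, Object> attrs) {
    entries.add(new Range(start, end, attrs));
  }

  /** Returns consecutive, non-overlapping ranges, each with the merged attributes that cover it. */
  List<Range> result() {
    TreeSet<Double> cuts = new TreeSet<>(List.of(0.0, 1.0));
    for (Range r : entries) {
      cuts.add(r.start());
      cuts.add(r.end());
    }
    List<Range> out = new ArrayList<>();
    Double prev = null;
    for (Double cut : cuts) {
      if (prev != null) {
        Map<String, Object> merged = new LinkedHashMap<>();
        for (Range r : entries) {
          if (r.start() <= prev && r.end() >= cut) {
            merged.putAll(r.attrs()); // later entries overwrite earlier ones on key conflicts
          }
        }
        out.add(new Range(prev, cut, merged));
      }
      prev = cut;
    }
    return out;
  }
}
```

Each returned range can then be paired with the matching slice of the line geometry, so one input road becomes several output features with different tags.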
from planetiler.
If I wanted to try the code at https://github.com/onthegomap/planetiler/tree/overture-generic , how would I use this new profile after compiling planetiler? can it only be used with pmtiles or are mbtiles still possible?
from planetiler.
> If I wanted to try the code at https://github.com/onthegomap/planetiler/tree/overture-generic , how would I use this new profile after compiling planetiler? can it only be used with pmtiles or are mbtiles still possible?
java -jar planetiler.jar overture should be sufficient. It will download by default, but you can set --download=false --overture-path=... to point to a location you've already downloaded to. You can also set --split_roads=true to split road segments (the default just leaves the JSON structs on each full-length road segment), and --connectors=false will disable writing transportation connectors and connector IDs to the output (which double the size). You can write to pmtiles with --output=planet.pmtiles or to mbtiles with --output=planet.mbtiles. You can set --bounds=minlon,minlat,maxlon,maxlat to generate a map for a bounding box. This runs pretty fast because it's able to use a column predicate to avoid reading/parsing entire rows outside of the box.
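The test behind such a predicate is just a bounding-box intersection on each row's bbox columns. A minimal sketch follows, with a hypothetical `BoundsFilter` class and in-memory rows; a real Parquet reader would evaluate the same comparison against column values or statistics before decoding the rest of the row.

```java
import java.util.List;

// Hypothetical illustration of the bbox test a column predicate would apply.
class BoundsFilter {
  record BBox(double minX, double minY, double maxX, double maxY) {
    /** True when the two boxes overlap or touch. */
    boolean intersects(BBox other) {
      return minX <= other.maxX && maxX >= other.minX
        && minY <= other.maxY && maxY >= other.minY;
    }
  }

  /** Keeps only rows whose bbox intersects the requested bounds. */
  static List<BBox> keepIntersecting(List<BBox> rows, BBox bounds) {
    return rows.stream().filter(bounds::intersects).toList();
  }
}
```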
from planetiler.
Also, I'm not sure if we should add full overture support to planetiler while they are still only doing alpha releases if the format might change in the future (for example something besides avro-parquet). So maybe we should split this out into separate independent issues for the generic lower-level capabilities that planetiler needs to work with overture-like data:
- support partial-length line attributes, which effectively splits it up into several vector tile line segments
- support for working with structured input feature attributes (lists, maps...)
- working with an input source composed of many files (which can be listed through s3 api) - the downloads are a little wonky in overture-generic branch
- parquet input format
- pole of inaccessibility/maximum inscribed circle centroid option for polygon centers (not strictly necessary but something I added in overture-generic branch)
Then we could have an example profile that used these to read one of the alpha releases but it's mostly up to consumers if/how they want to use it? Most likely I think people would want to pick individual themes from an overture release to layer-into another map profile.
from planetiler.