Comments (14)
Today:
Built tilemaker from the v3 branch on Ubuntu 22.04
Latest planet (74327MB)
- default options + --fast
- scripts from v3, plus:
-- preferred_language = "ru"
-- preferred_language_attribute = "name:ru"
- VM: 48 cores (2GHz) + 768GB RAM + SSD
- peak memory ~345GB + 158GB (cache/buffers) by top
- time:
real 84m9.543s
user 3596m55.728s
sys 109m37.462s
from tilemaker.
Just wanted to say thanks for all the hard work. I generated the North America tile set in 50 minutes on my 32GB machine, and it never used more than 1GB of swap, where V2 was using several tens of GB. Just throwing Europe through it now.
from tilemaker.
Exciting!
Depending on your timeline, there are two other things that seem promising
- use protozero for PBF reading - the motivation for this is touched on in #621
- write a custom map class for AttributePair and AttributeSet
The motivation for AttributePair/AttributeSet: I think we can save ~600MB of RAM in a planet build. Between that and the protozero change, I think we might get away with fewer, bigger splits when doing the planet on a constrained-memory build, so it's worth doing.
The idea is: we currently refer to AttributePairs and AttributeSets compactly, e.g. by a uint32_t that references their index in a deque<AttributePair>, rather than by a full pointer.
When reading a new attribute, we need to know if we've seen it already, and if so, what its index is. To do this, we have a boost::container::flat_map<AttributePair*, uint32_t>.
It has a custom less-than comparator so that the pointer is compared by its logical contents, not its memory address. This lets us ask "is this new AttributePair, which may be stored in a different memory location than a previously-seen version of it, already logically present in the map?"
But there is some waste: we effectively store the pointer in the map twice. The first copy is the key, which is literally a pointer. The second copy is the index value -- the uint32_t identifies the offset in the deque, which, if you squint, is a pointer to an AttributePair. Deques don't invalidate memory locations when they grow, so the pointer obtained by indexing to that location will be valid throughout the life of the program.
Thus, where today the flat map stores a vector<pair<AttributePair*, uint32_t>>, I think if we write a small wrapper, we ought to be able to get away with a custom container that stores a vector<uint32_t> and supports the same operations. I believe there are ~40M AttributeSets and ~30M AttributePairs, so I expect a savings of 70M * 8 bytes = 560MB. Our task is simpler than a general-purpose map, since we're only interested in insert + find -- iteration, deletion, etc. aren't needed.
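A minimal sketch of the idea above, not tilemaker's actual code: the values live in a deque, and the lookup structure is just a sorted vector<uint32_t> of deque indices, compared by the contents they point at. The names Pair and InternedPairs are hypothetical stand-ins (the real AttributePair has a different layout), and only insert + find are supported.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for AttributePair; ordered by logical contents.
struct Pair {
    std::string key;
    std::string value;
    bool operator<(const Pair& o) const {
        return key != o.key ? key < o.key : value < o.value;
    }
};

// Values are stored once in a deque. The lookup index holds only 4-byte
// offsets into that deque, kept sorted by the contents they refer to.
class InternedPairs {
public:
    // Returns the index of `p`, adding it to the store if it is new.
    uint32_t add(const Pair& p) {
        auto it = std::lower_bound(
            sorted_.begin(), sorted_.end(), p,
            [this](uint32_t idx, const Pair& needle) { return items_[idx] < needle; });
        if (it != sorted_.end() && !(p < items_[*it])) return *it;  // already present
        uint32_t idx = static_cast<uint32_t>(items_.size());
        items_.push_back(p);       // deque growth keeps existing elements in place
        sorted_.insert(it, idx);   // O(n) shift, the same cost profile as a flat_map
        return idx;
    }

    const Pair& get(uint32_t idx) const { return items_[idx]; }
    std::size_t size() const { return items_.size(); }

private:
    std::deque<Pair> items_;
    std::vector<uint32_t> sorted_;
};
```

Calling add() twice with logically equal pairs returns the same index, so the per-entry overhead of the lookup structure is one uint32_t rather than a pointer plus a uint32_t.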
from tilemaker.
Both sound good. I'm not in an inordinate hurry to do the release so happy to wait for those!
from tilemaker.
Incidentally, there might be some possibilities for memory saving in the .pmtiles branch I've just merged in - it creates a big map/vector (depending on size) of the location of each tile within the file, for writing out into the pmtiles directories at the end of the process. I wondered whether this could potentially be mmaped after #618 - writing by z6 tile means that we'll be filling up nearby entries most of the time.
from tilemaker.
Incidentally, there might be some possibilities for memory saving in the .pmtiles branch I've just merged in - it creates a big map/vector (depending on size) of the location of each tile within the file, for writing out into the pmtiles directories at the end of the process. I wondered whether this could potentially be mmaped after #618 - writing by z6 tile means that we'll be filling up nearby entries most of the time.
Oh, good thought, denseIndex does look like a candidate for being mmap-able. I'm guessing the savings would be ~640MB? When I build the planet without shapefiles, I get ~80M tiles, so I'm treating that as the upper bound on the number of tiles with "interesting" things in them, and TileOffset is an 8-byte struct.
It might also be possible to approach it from a different angle -- maybe they could be flushed from memory to the pmtiles archive earlier? If I understand the pmtiles spec, I think things can be scattered throughout the archive willy-nilly when non-clustered mode is used. It might complicate the bookkeeping, and be a little tricky to do without imposing a lot of locking overhead.
from tilemaker.
When I build the planet without shapefiles, I get ~80M tiles, so I'm treating that as the upper bound on the number of tiles with "interesting" things in them, and TileOffset is an 8-byte struct.
I think it'll be more than that - there are many thousands of sea tiles when building with shapefiles, and we need an index for each of them.
It might also be possible to approach it from a different angle -- maybe they could be flushed from memory to the pmtiles archive earlier? If I understand the pmtiles spec, I think things can be scattered throughout the archive willy-nilly when non-clustered mode is used. It might complicate the bookkeeping, and be a little tricky to do without imposing a lot of locking overhead.
Each pmtiles leaf directory is a series of file offsets for contiguously numbered tiles (using pmtiles' Hilbert tile numbering). The leaf directories are all together in the .pmtiles archive.
What this means in practice:
- Either we write all the leaf directories together at the end of file (as we do now), or we reserve space for them at the start of the file, and write them as we go along. But if we do the latter we have to estimate how big the leaf directories are going to be before we start writing tiles.
- If we have space reserved at the start of the file, we can flush a leaf directory to disk once it's complete. Directory completion is obviously not linear though - some tiles take longer than others, and we don't build the tiles in pmtiles Hilbert order. So there'd still be several (maybe many?) incomplete directories in memory before we flush.
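For reference, the Hilbert tile numbering mentioned above can be sketched as follows. This is my reading of the PMTiles v3 tile ID scheme (all tiles of lower zooms come first, then the Hilbert curve position within the zoom level); the function name zxyToTileId is hypothetical and this is not tilemaker's code.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Converts z/x/y to a PMTiles-style tile ID (sketch, assumes x, y < 2^z).
inline uint64_t zxyToTileId(uint8_t z, uint64_t x, uint64_t y) {
    // Tiles at zoom levels below z occupy the first sum(4^i) IDs.
    uint64_t acc = 0;
    for (uint8_t i = 0; i < z; i++) acc += uint64_t(1) << (2 * i);

    // Walk the quadtree from the top, adding the Hilbert quadrant index
    // at each level and rotating/flipping the remaining low bits.
    uint64_t d = 0;
    for (uint64_t s = (uint64_t(1) << z) >> 1; s > 0; s >>= 1) {
        uint64_t rx = (x & s) ? 1 : 0;
        uint64_t ry = (y & s) ? 1 : 0;
        d += s * s * ((3 * rx) ^ ry);
        if (ry == 0) {
            if (rx == 1) {
                x = (s - 1) - (x & (s - 1));
                y = (s - 1) - (y & (s - 1));
            }
            std::swap(x, y);
        }
    }
    return acc + d;
}
```

One consequence is that all descendants of a given tile at a deeper zoom form one contiguous run of IDs, which is why writing output one z6 tile at a time tends to touch neighbouring index entries.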
With that in mind I suspect mmaping denseIndex is possibly easiest - but I haven't tried it!
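A rough sketch of what a file-backed index could look like, using plain POSIX calls (mkstemp, ftruncate, mmap). Everything here is an assumption for illustration: the Entry fields, the MappedIndex class and the /tmp path are hypothetical, and tilemaker's real TileOffset and denseIndex may be laid out differently. The thread only states that the entry is 8 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical 8-byte entry; field names are illustrative only.
struct Entry {
    uint32_t offset;
    uint32_t length;
};
static_assert(sizeof(Entry) == 8, "entry must stay 8 bytes");

// Fixed-size array of entries backed by an unlinked temporary file, so the
// kernel can write cold pages to disk instead of keeping them all resident.
class MappedIndex {
public:
    explicit MappedIndex(std::size_t count) : count_(count) {
        if (count == 0) throw std::invalid_argument("count must be > 0");
        char path[] = "/tmp/dense-index-XXXXXX";
        int fd = mkstemp(path);
        if (fd < 0) throw std::runtime_error("mkstemp failed");
        unlink(path);  // the file disappears once the mapping is released
        if (ftruncate(fd, static_cast<off_t>(bytes())) != 0) {
            close(fd);
            throw std::runtime_error("ftruncate failed");
        }
        void* p = mmap(nullptr, bytes(), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);  // the mapping keeps the file alive
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
        data_ = static_cast<Entry*>(p);
    }
    ~MappedIndex() { munmap(data_, bytes()); }
    MappedIndex(const MappedIndex&) = delete;
    MappedIndex& operator=(const MappedIndex&) = delete;

    Entry& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return count_; }

private:
    std::size_t bytes() const { return count_ * sizeof(Entry); }
    std::size_t count_;
    Entry* data_ = nullptr;
};
```

Entries start out zeroed because ftruncate extends the file with zero bytes. Whether this actually saves resident memory depends on the access pattern; it should help most when writes cluster in nearby entries, as they would when writing by z6 tile.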
from tilemaker.
I think it'll be more than that - there are many thousands of sea tiles when building with shapefiles, and we need an index for each of them.
Ah, right. I misunderstood how isSparse worked.
And re-reading the PMTiles spec, I see that I was hallucinating, and it's not valid to intersperse tile data with leaf directories. Let's pretend I didn't comment. :)
from tilemaker.
I've created a v3 branch: so far it has master + #626, #629, #636 and a bit of tidying. I think I've merged #629 correctly but you might want to check.
from tilemaker.
I think I've merged #629 correctly but you might want to check.
I did a smoke test, seems good to me.
from tilemaker.
A bit of benchmarking for v3 on my current machine:
Great Britain (no --store)
default:
- Elapsed (wall clock) time (h:mm:ss or m:ss): 4:18.12
- Maximum resident set size (kbytes): 12210088
--fast:
- Elapsed (wall clock) time (h:mm:ss or m:ss): 4:17.14
- Maximum resident set size (kbytes): 12119416
--lazy-geometries:
- Elapsed (wall clock) time (h:mm:ss or m:ss): 4:25.12
- Maximum resident set size (kbytes): 9272368
So when running without --store, I'm inclined to default to --lazy-geometries (for the significant memory saving), but turn it off with --fast.
planet (with --store on SSD)
default:
- peak memory ~18.3GB (note this is quite an old planet)
- Elapsed (wall clock) time (h:mm:ss or m:ss): 4:24:27
- Maximum resident set size (kbytes): 131827336
--fast:
- peak memory similar
- Elapsed (wall clock) time (h:mm:ss or m:ss): 4:09:23
- Maximum resident set size (kbytes): 139364168
I haven't yet tried with --materialize-geometries on the planet.
from tilemaker.
Release done!
real 84m9.543s
That's amazing - thank you for running that as a benchmark.
from tilemaker.
👍 No more excuses for me to avoid working on my hobby map project now, I guess. :)
Thanks, @systemed, for your patience in answering my questions and reviewing PRs over the past few months. I really appreciate it.
The in memoriam you added for Wouter van Kleunen is very thoughtful. I stumbled across many of his contributions and discussions here and in the boost geometry repos, learning something from him each time.
from tilemaker.
Thanks, @systemed, for your patience in answering my questions and reviewing PRs over the past few months. I really appreciate it.
Not at all - you've made massive improvements, so thank you!
The in memoriam you added for Wouter van Kleunen is very thoughtful. I stumbled across many of his contributions and discussions here and in the boost geometry repos, learning something from him each time.
A lot of his code was really inspired - particularly the really intense geometry stuff such as intersection-aware simplification and the dissolve/correct algorithm. I hope he's in a better place.
from tilemaker.