Comments (14)

leshak avatar leshak commented on May 27, 2024 1

Today: built tilemaker from the v3 branch on Ubuntu 22.04.

Latest planet (74,327 MB)

  • default options + --fast
  • scripts from v3 +
    -- preferred_language = "ru"
    -- preferred_language_attribute = "name:ru"
  • VM: 48 cores (2GHz) + 768GB RAM + SSD
  • peak memory ~345GB + 158GB (cache/buffers) by top
  • time:
real    84m9.543s
user    3596m55.728s
sys     109m37.462s

from tilemaker.

StephenAtty avatar StephenAtty commented on May 27, 2024 1

Just wanted to say thanks for all the hard work. Generated the North America tile set in 50 minutes on my 32GB machine, and it never used more than 1GB of swap, where v2 was using several tens of GB. Just throwing Europe through it now.

cldellow avatar cldellow commented on May 27, 2024

Exciting!

Depending on your timeline, there are two other things that seem promising:

  1. use protozero for PBF reading - the motivation for this is touched on in #621
  2. write a custom map class for AttributePair and AttributeSet

The motivation for the AttributePair/AttributeSet change: I think we can save ~600MB of RAM in a planet build. Between that and (1), I think we might get away with fewer, bigger splits when doing the planet on a memory-constrained machine, so it's worth doing.

The idea is: we currently refer to AttributePairs and AttributeSets compactly, e.g. by a uint32_t that references their index in a deque<AttributePair>, rather than by a full pointer.

When reading a new attribute, we need to know if we've seen it already, and if so, what its index is. To do this, we have a boost::container::flat_map<AttributePair*, uint32_t>.

It has a custom less-than comparator so that the pointer is compared by its logical contents, not its memory address. This lets us ask: is this new AttributePair, which may be stored at a different memory location than a previously-seen copy of it, already logically present in the map?

But there is some waste: we effectively store the pointer in the map twice. The first copy is the key, which is literally a pointer. The second copy is the index value -- the uint32_t identifies the offset in the deque, which, if you squint, is a pointer to an AttributePair. Deques don't invalidate memory locations when they grow, so the pointer obtained by indexing to that location will be valid throughout the life of the program.

Thus, where today the flat map stores a vector<pair<AttributePair*, uint32_t>>, I think if we write a small wrapper, we ought to be able to get away with a custom container that stores just a vector<uint32_t> and supports the same operations. I believe there are ~40M AttributeSets and ~30M AttributePairs, so I expect a saving of 70M * 8 bytes => 560MB. Our task is simpler than a general-purpose map's, since we're only interested in insert + find -- iteration, deletion, etc. aren't needed.
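A rough sketch of what that wrapper could look like (names like DedupIndex are hypothetical, and AttributePair is reduced to a plain key/value struct here, not tilemaker's real one):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for tilemaker's AttributePair.
struct AttributePair {
    std::string key;
    std::string value;
    bool operator<(const AttributePair& o) const {
        return std::tie(key, value) < std::tie(o.key, o.value);
    }
};

// Insert/find-only map that stores a sorted vector<uint32_t> of indices
// into the backing deque. Entries are ordered by the *contents* they
// reference, so each index doubles as both key and value -- saving the
// 8-byte pointer per entry that flat_map<AttributePair*, uint32_t> keeps.
class DedupIndex {
public:
    explicit DedupIndex(std::deque<AttributePair>& backing) : backing_(backing) {}

    // Returns the index of `p`, appending it to the backing deque if unseen.
    uint32_t findOrInsert(const AttributePair& p) {
        auto it = std::lower_bound(
            sorted_.begin(), sorted_.end(), p,
            [this](uint32_t a, const AttributePair& b) { return backing_[a] < b; });
        if (it != sorted_.end() && !(p < backing_[*it]))
            return *it;  // already present: hand back the existing index
        uint32_t idx = static_cast<uint32_t>(backing_.size());
        backing_.push_back(p);  // deque growth never relocates existing entries
        sorted_.insert(it, idx);
        return idx;
    }

private:
    std::deque<AttributePair>& backing_;
    std::vector<uint32_t> sorted_;  // indices, kept sorted by referenced contents
};
```

Insertion into the sorted vector is O(n) per element, but that's the same cost the flat_map already pays; the win is purely the halved per-entry storage.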

systemed avatar systemed commented on May 27, 2024

Both sound good. I'm not in an inordinate hurry to do the release so happy to wait for those!

systemed avatar systemed commented on May 27, 2024

Incidentally, there might be some possibilities for memory saving in the .pmtiles branch I've just merged in - it creates a big map/vector (depending on size) of the location of each tile within the file, for writing out into the pmtiles directories at the end of the process. I wondered whether this could potentially be mmapped after #618 - writing by z6 tile means that we'll be filling up nearby entries most of the time.

cldellow avatar cldellow commented on May 27, 2024

Incidentally, there might be some possibilities for memory saving in the .pmtiles branch I've just merged in - it creates a big map/vector (depending on size) of the location of each tile within the file, for writing out into the pmtiles directories at the end of the process. I wondered whether this could potentially be mmapped after #618 - writing by z6 tile means that we'll be filling up nearby entries most of the time.

Oh, good thought, denseIndex does look like a candidate for being mmap-able. I'm guessing the savings would be ~640MB? When I build the planet without shapefiles, I get ~80M tiles, so I'm treating that as the upper bound on the number of tiles with "interesting" things in them, and TileOffset is an 8-byte struct.

It might also be possible to approach it from a different angle -- maybe they could be flushed from memory to the pmtiles archive earlier? If I understand the pmtiles spec, I think things can be scattered throughout the archive willy-nilly when non-clustered mode is used. It might complicate the bookkeeping, and be a little tricky to do without imposing a lot of locking overhead.

systemed avatar systemed commented on May 27, 2024

When I build the planet without shapefiles, I get ~80M tiles, so I'm treating that as the upper bound on the number of tiles with "interesting" things in them, and TileOffset is an 8-byte struct.

I think it'll be more than that - there are many thousands of sea tiles when building with shapefiles, and we need an index for each of them.

It might also be possible to approach it from a different angle -- maybe they could be flushed from memory to the pmtiles archive earlier? If I understand the pmtiles spec, I think things can be scattered throughout the archive willy-nilly when non-clustered mode is used. It might complicate the bookkeeping, and be a little tricky to do without imposing a lot of locking overhead.

Each pmtiles leaf directory is a series of file offsets for contiguously numbered tiles (using pmtiles' Hilbert tile numbering). The leaf directories are all together in the .pmtiles archive.
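To make the numbering concrete, here's a sketch of PMTiles tile IDs as I read the v3 spec - lower zooms come first, then the standard Hilbert xy-to-distance position within the zoom (this is illustrative, not tilemaker's code):

```cpp
#include <cassert>
#include <cstdint>

// Standard Hilbert-curve distance for an n x n grid (n a power of two).
uint64_t hilbertD(uint32_t n, uint32_t x, uint32_t y) {
    uint64_t d = 0;
    for (uint32_t s = n / 2; s > 0; s /= 2) {
        uint32_t rx = (x & s) ? 1 : 0;
        uint32_t ry = (y & s) ? 1 : 0;
        d += (uint64_t)s * s * ((3 * rx) ^ ry);
        // rotate the quadrant so the curve stays continuous
        if (ry == 0) {
            if (rx == 1) { x = s - 1 - x; y = s - 1 - y; }
            uint32_t t = x; x = y; y = t;
        }
    }
    return d;
}

// PMTiles tile ID: all tiles at lower zooms come first -- there are
// (4^z - 1) / 3 of them -- then the Hilbert position within zoom z.
uint64_t tileId(uint8_t z, uint32_t x, uint32_t y) {
    uint64_t base = ((1ULL << (2 * z)) - 1) / 3;  // 1 + 4 + ... + 4^(z-1)
    return base + hilbertD(1u << z, x, y);
}
```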

What this means in practice:

  • Either we write all the leaf directories together at the end of file (as we do now), or we reserve space for them at the start of the file, and write them as we go along. But if we do the latter we have to estimate how big the leaf directories are going to be before we start writing tiles.
  • If we have space reserved at the start of the file, we can flush a leaf directory to disk once it's complete. Directory completion is obviously not linear though - some tiles take longer than others, and we don't build the tiles in pmtiles Hilbert order. So there'd still be several (maybe many?) incomplete directories in memory before we flush.

With that in mind I suspect mmapping denseIndex is possibly easiest - but I haven't tried it!
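A minimal sketch of what an mmap-backed denseIndex could look like on Linux/POSIX (the TileOffset layout and the function name are assumptions for illustration, not tilemaker's actual code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Assumed 8-byte layout; tilemaker's real TileOffset may differ.
struct TileOffset {
    uint64_t offset : 40;  // position of tile data within the archive
    uint64_t length : 24;  // size of the tile data
};

// Maps a file-backed array of `count` TileOffsets. The kernel pages
// entries in and out on demand, so resident memory tracks the working
// set rather than the full table (~640MB at 80M tiles x 8 bytes).
// Writing by z6 tile clusters nearby entries, keeping the dirty-page
// set small at any given moment.
TileOffset* mapDenseIndex(const char* path, size_t count, int& fdOut) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return nullptr;
    if (ftruncate(fd, static_cast<off_t>(count * sizeof(TileOffset))) != 0) {
        close(fd);
        return nullptr;
    }
    void* p = mmap(nullptr, count * sizeof(TileOffset),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return nullptr; }
    fdOut = fd;
    return static_cast<TileOffset*>(p);
}
```

MAP_SHARED means completed entries get flushed to disk by the kernel without any explicit bookkeeping, which sidesteps the directory-completion-ordering problem above.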

cldellow avatar cldellow commented on May 27, 2024

I think it'll be more than that - there are many thousands of sea tiles when building with shapefiles, and we need an index for each of them.

Ah, right. I misunderstood how isSparse worked.

And re-reading the PMTiles spec, I see that I was hallucinating, and it's not valid to intersperse tile data with leaf directories. Let's pretend I didn't comment. :)

systemed avatar systemed commented on May 27, 2024

I've created a v3 branch: so far it has master + #626, #629, #636 and a bit of tidying. I think I've merged #629 correctly but you might want to check.

cldellow avatar cldellow commented on May 27, 2024

I think I've merged #629 correctly but you might want to check.

I did a smoke test, seems good to me.

systemed avatar systemed commented on May 27, 2024

A bit of benchmarking for v3 on my current machine:

Great Britain (no --store)

default:

  • Elapsed (wall clock) time (h:mm:ss or m:ss): 4:18.12
  • Maximum resident set size (kbytes): 12210088

--fast:

  • Elapsed (wall clock) time (h:mm:ss or m:ss): 4:17.14
  • Maximum resident set size (kbytes): 12119416

--lazy-geometries:

  • Elapsed (wall clock) time (h:mm:ss or m:ss): 4:25.12
  • Maximum resident set size (kbytes): 9272368

So when running without --store, I'm inclined to default to --lazy-geometries (for the significant memory saving), but turn it off with --fast.

planet (with --store on SSD)

default:

  • peak memory ~18.3GB (note this is quite an old planet)
  • Elapsed (wall clock) time (h:mm:ss or m:ss): 4:24:27
  • Maximum resident set size (kbytes): 131827336

--fast:

  • peak memory similar
  • Elapsed (wall clock) time (h:mm:ss or m:ss): 4:09:23
  • Maximum resident set size (kbytes): 139364168

I haven't yet tried with --materialize-geometries on the planet.

systemed avatar systemed commented on May 27, 2024

Release done!

real 84m9.543s

That's amazing - thank you for running that as a benchmark.

cldellow avatar cldellow commented on May 27, 2024

👍 No more excuses for me to avoid working on my hobby map project now, I guess. :)

Thanks, @systemed, for your patience in answering my questions and reviewing PRs over the past few months. I really appreciate it.

The in memoriam you added for Wouter van Kleunen is very thoughtful. I stumbled across many of his contributions and discussions here and in the boost geometry repos, learning something from him each time.

systemed avatar systemed commented on May 27, 2024

Thanks, @systemed, for your patience in answering my questions and reviewing PRs over the past few months. I really appreciate it.

Not at all - you've made massive improvements, so thank you!

The in memoriam you added for Wouter van Kleunen is very thoughtful. I stumbled across many of his contributions and discussions here and in the boost geometry repos, learning something from him each time.

A lot of his code was really inspired - particularly the really intense geometry stuff such as intersection-aware simplification and the dissolve/correct algorithm. I hope he's in a better place.
