Fetching the web tiles can take a long time when there are many to download. Would

Add the ability to fetch tiles in parallel (contextily issue, closed)

tcihak-fqa commented on September 3, 2024 1

Add the ability to fetch tiles in parallel

from contextily.

Comments (8)

darribas commented on September 3, 2024

Thanks very much for the suggestion. I think that is technically possible and would amount to parallelising this for loop:

contextily/contextily/tile.py

Lines 219 to 224 in d821ac7

for t in mt.tiles(w, s, e, n, [zoom]):
    x, y, z = t.x, t.y, t.z
    tile_url = provider.build_url(x=x, y=y, z=z)
    image = _fetch_tile(tile_url, wait, max_retries)
    tiles.append(t)
    arrays.append(image)

However, I'm not sure parallelisation would speed things up. It would if the bottleneck were in processing the tiles, but my hunch is that most of the time is spent on download and latency. If that's the case, would parallelisation really help? If anything, it might just add more download load on the same bandwidth.

tcihak-fqa commented on September 3, 2024

Thanks for the response! So I'm using contextily to generate static maps from a web api that has high bandwidth. I agree that most of the time is spent on I/O latency.
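
To illustrate why latency-bound work can still benefit from concurrency, here is a minimal sketch that simulates tile downloads with `time.sleep`. Nothing here touches contextily: `fake_fetch`, `timed`, and the 50 ms latency are all assumptions made up for the example. While one thread waits on a (simulated) response, the others can have their own requests in flight, so the waits overlap instead of adding up.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_S = 0.05  # assumed per-request latency, purely illustrative


def fake_fetch(tile_id):
    """Hypothetical stand-in for a tile download: sleeps instead of hitting a server."""
    time.sleep(LATENCY_S)
    return tile_id


def timed(n_workers, n_tiles=16):
    """Fetch n_tiles fake tiles on n_workers threads; return (seconds, results)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(fake_fetch, range(n_tiles)))
    return time.perf_counter() - start, results


serial_s, serial_out = timed(1)
parallel_s, parallel_out = timed(8)
print(f"1 worker: {serial_s:.2f}s, 8 workers: {parallel_s:.2f}s")
```

This only models latency. If the link were saturated, extra threads would not help, so real numbers depend on the endpoint.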

JacobJeppesen commented on September 3, 2024

I have actually been considering making a pull request for this exact functionality. I've been playing around with it in another project, and for large images it can make a rather big difference. The optimal number of parallel downloads differs quite a bit from endpoint to endpoint, but 8-16 is normally a good range. The code below can be used as a drop-in replacement for the for loop:

def bounds2img(
        w, s, e, n, zoom="auto", source=None, ll=False, wait=0, max_retries=2, num_parallel_tile_downloads=16
):
.
.
.
    # download and merge tiles
    # tiles = []
    # arrays = []
    # for t in mt.tiles(w, s, e, n, [zoom]):
    #     x, y, z = t.x, t.y, t.z
    #     tile_url = provider.build_url(x=x, y=y, z=z)
    #     image = _fetch_tile(tile_url, wait, max_retries)
    #     tiles.append(t)
    #     arrays.append(image)
    from joblib import Parallel, delayed  # This should go to the top of the file
    tiles = list(mt.tiles(w, s, e, n, [zoom]))
    tile_urls = [provider.build_url(x=tile.x, y=tile.y, z=tile.z) for tile in tiles]
    max_num_parallel_tile_downloads = 32
    # Note that num_parallel_tile_downloads has been added as an argument to the function
    if num_parallel_tile_downloads < 1 or num_parallel_tile_downloads > max_num_parallel_tile_downloads:
        raise ValueError(
            f"num_parallel_tile_downloads must be between 1 and {max_num_parallel_tile_downloads}"
        )
    arrays = \
        Parallel(n_jobs=num_parallel_tile_downloads, prefer="threads")(
            delayed(_fetch_tile)(tile_url, wait, max_retries) for tile_url in tile_urls)

    merged, extent = _merge_tiles(tiles, arrays)
.
.
.
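
For anyone who would rather avoid the joblib dependency, the same pattern can probably be sketched with the standard library's `concurrent.futures`. This is not what the snippet above does, just an alternative under stated assumptions: `fetch_all` is a hypothetical helper, and `fake_fetch` and the `tiles.example` URLs are placeholders so the sketch runs offline. `pool.map` returns results in input order, which matters because the arrays have to line up with `tiles` when they are merged.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(fetch, tile_urls, num_workers=16):
    """Call fetch(url) for every URL on a thread pool, keeping input order."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() yields results in the order of tile_urls, not completion order.
        return list(pool.map(fetch, tile_urls))


def fake_fetch(url):
    """Hypothetical stand-in for _fetch_tile; returns a marker instead of an image."""
    return url.upper()


# Placeholder URLs, not a real tile server.
urls = [f"https://tiles.example/7/{x}/{y}.png" for x in range(2) for y in range(2)]
arrays = fetch_all(fake_fetch, urls, num_workers=4)
print(len(arrays))
```

In the real function, `fetch` would be something like `lambda url: _fetch_tile(url, wait, max_retries)`.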

I just tested it in the intro_guide.ipynb notebook by downloading an extended version of the ghent image in the Coordinate-based searches section, with the following code:

import time

import contextily as cx  # already imported in the notebook

west, south, east, north = (
    3.616218566894531,
    50.98912458110244,
    5.8483047485351562,
    54.13994019806845,
)
start_time = time.time()
ghent_img, ghent_ext = cx.bounds2img(
    west,
    south,
    east,
    north,
    ll=True,
    zoom=11,
    source=cx.providers.Stamen.Toner,
    num_parallel_tile_downloads=8,
)
print(f"Download time: {time.time() - start_time}")
ghent_img.shape

Note that I had to comment out the @memory.cache decorator on the _fetch_tile function during the test.

The shape of the image was (7680, 3584, 4), and the download times were:
num_parallel_tile_downloads=1: 69.49s
num_parallel_tile_downloads=2: 36.11s
num_parallel_tile_downloads=4: 18.32s
num_parallel_tile_downloads=8: 9.55s
num_parallel_tile_downloads=16: 5.02s
num_parallel_tile_downloads=32: 2.92s

I don't think there should be any downside to always doing it in parallel (the overhead should be minimal), as long as the number of parallel downloads doesn't overwhelm the endpoint. In my experience with the endpoints I've tested, a default value of 16 is a good starting point. That normally gives almost linear improvements. Above that, it differs quite a bit.
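
To put a number on "almost linear", the timings reported above can be turned into speedup and parallel efficiency (speedup divided by the number of parallel downloads). This is just arithmetic on the figures from this comment; the variable names are made up for the sketch.

```python
# Download times in seconds, keyed by num_parallel_tile_downloads (from the test above).
timings_s = {1: 69.49, 2: 36.11, 4: 18.32, 8: 9.55, 16: 5.02, 32: 2.92}

baseline = timings_s[1]
rows = []
for n, t in timings_s.items():
    speedup = baseline / t       # how many times faster than one download at a time
    efficiency = speedup / n     # 1.0 would be perfectly linear scaling
    rows.append((n, speedup, efficiency))
    print(f"{n:>2} downloads: speedup {speedup:5.2f}x, efficiency {efficiency:.2f}")
```

With these numbers the efficiency stays above roughly 0.86 up to 16 parallel downloads and drops to about 0.74 at 32.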

That ended up being quite a bit of text. Hopefully it's useful 😉

Let me know if you would like a pull request with the implementation 👍

JacobJeppesen commented on September 3, 2024

Just made a couple more tests with smaller images:

At zoom=9 the shape was (2048, 1024, 4) and download times were:
num_parallel_tile_downloads=1: 5.38s
num_parallel_tile_downloads=16: 0.50s

At zoom=7 the shape was (512, 512, 4) and download times were:
num_parallel_tile_downloads=1: 0.69s
num_parallel_tile_downloads=16: 0.17s (there are only 4 tiles, so in practice we only do 4 parallel downloads)

I also tested the three zoom levels with the current for loop implementation. The download times were pretty much exactly the same as with num_parallel_tile_downloads=1.
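
As a rough check on how many downloads can actually run in parallel, the tile count can be estimated from the image shape. The sketch below assumes 256-pixel square tiles, which is consistent with the 4 tiles reported for the (512, 512, 4) image; `tile_count` and `effective_workers` are hypothetical helpers, not part of contextily.

```python
TILE_SIZE = 256  # assumed tile edge length in pixels


def tile_count(shape, tile_size=TILE_SIZE):
    """Estimate the number of tiles in a merged image of the given shape."""
    height, width = shape[:2]
    return (height // tile_size) * (width // tile_size)


def effective_workers(shape, requested):
    """Parallel downloads cannot exceed the number of tiles to fetch."""
    return min(requested, tile_count(shape))


# Image shapes reported in this thread, keyed by zoom level.
shapes = {11: (7680, 3584, 4), 9: (2048, 1024, 4), 7: (512, 512, 4)}
for zoom, shape in shapes.items():
    print(zoom, tile_count(shape), effective_workers(shape, 16))
```

Under that assumption, a small request never uses more than a few threads regardless of the requested value.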

tcihak-fqa commented on September 3, 2024

This solution looks very nice Jacob! My vote would be to create a PR and get it into a release at some point.

JacobJeppesen commented on September 3, 2024

Thanks @tcihak-fqa 😃

I've just added a pull request with the changes (#217).

If you'd like to use it now, you can install it with:

pip uninstall -y contextily
pip install git+https://github.com/JacobJeppesen/contextily@parallel_tile_downloads

tcihak-fqa commented on September 3, 2024

Thanks Jacob.
I monkey-patched your original solution and it seems to be working well. I haven't encountered any memory issues, but the number of API requests has been light so far.

martinfleis commented on September 3, 2024

Closed by #217
