Comments (8)

darribas commented on September 3, 2024

Thanks very much for the suggestion. I think that is technically possible and would amount to parallelising this for loop:

tiles = []
arrays = []
# Each tile is downloaded one after another (sequential I/O)
for t in mt.tiles(w, s, e, n, [zoom]):
    x, y, z = t.x, t.y, t.z
    tile_url = provider.build_url(x=x, y=y, z=z)
    image = _fetch_tile(tile_url, wait, max_retries)
    tiles.append(t)
    arrays.append(image)

However, I'm not sure parallelisation would speed things up. It would if the bottleneck were in processing the tiles, but my hunch is that most of the time is spent on download/latency issues. If that's the case, wouldn't parallelisation fail to help (if anything, it would add more download load for the same bandwidth)?

from contextily.

tcihak-fqa commented on September 3, 2024

Thanks for the response! I'm using contextily to generate static maps from a web API that has high bandwidth. I agree that most of the time is spent on I/O latency.
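To illustrate why latency-bound downloads can still benefit from parallelism even when bandwidth is unchanged, here is a minimal simulation. It assumes each tile request costs a fixed round-trip wait (faked with `time.sleep`, which, like a blocking socket read, releases the GIL) and does no real CPU work; `fake_fetch_tile`, `LATENCY_S` and the example URLs are hypothetical stand-ins, not contextily code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_S = 0.1  # hypothetical round-trip latency per tile request


def fake_fetch_tile(url):
    """Stand-in for a tile download: waits on simulated latency, no CPU work."""
    time.sleep(LATENCY_S)
    return url


def download_sequential(urls):
    return [fake_fetch_tile(u) for u in urls]


def download_threaded(urls, n_workers):
    # executor.map returns results in input order, even though requests overlap
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        return list(executor.map(fake_fetch_tile, urls))


def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


urls = [f"https://tiles.example.com/11/{x}/700.png" for x in range(8)]
seq_result, seq_time = timed(download_sequential, urls)
thr_result, thr_time = timed(download_threaded, urls, 8)
print(f"sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s")
```

Sequentially the waits add up (about 8 × 0.1 s), while with 8 threads they overlap, so the total is close to a single round trip. The actual gain on a real tile server also depends on how many concurrent connections it accepts.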

JacobJeppesen commented on September 3, 2024

I have actually been considering making a pull request for this exact functionality. I've been playing around with it in another project, and for large images it can make a rather big difference. The optimal number of parallel downloads differs quite a bit from endpoint to endpoint, but 8-16 is normally a good range. The code below can be used as a drop-in replacement for the for loop:

def bounds2img(
    w, s, e, n, zoom="auto", source=None, ll=False, wait=0, max_retries=2,
    num_parallel_tile_downloads=16,
):
.
.
.
    # download and merge tiles
    # tiles = []
    # arrays = []
    # for t in mt.tiles(w, s, e, n, [zoom]):
    #     x, y, z = t.x, t.y, t.z
    #     tile_url = provider.build_url(x=x, y=y, z=z)
    #     image = _fetch_tile(tile_url, wait, max_retries)
    #     tiles.append(t)
    #     arrays.append(image)
    from joblib import Parallel, delayed  # This should go to the top of the file

    # Build the full list of tiles and their URLs up front
    tiles = list(mt.tiles(w, s, e, n, [zoom]))
    tile_urls = [provider.build_url(x=tile.x, y=tile.y, z=tile.z) for tile in tiles]

    # Note that num_parallel_tile_downloads has been added as an argument to the function
    max_num_parallel_tile_downloads = 32
    if not 1 <= num_parallel_tile_downloads <= max_num_parallel_tile_downloads:
        raise ValueError(
            f"num_parallel_tile_downloads must be between 1 and {max_num_parallel_tile_downloads}"
        )

    # Download in a thread pool (the work is I/O-bound). joblib returns results
    # in input order, so arrays[i] still corresponds to tiles[i].
    arrays = Parallel(n_jobs=num_parallel_tile_downloads, prefer="threads")(
        delayed(_fetch_tile)(tile_url, wait, max_retries) for tile_url in tile_urls
    )

    merged, extent = _merge_tiles(tiles, arrays)
.
.
.
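For comparison, the same idea can be sketched with only the standard library's `concurrent.futures` instead of joblib. This is an illustrative alternative, not the code proposed above: `fetch_tiles_parallel` and its `fetch` callable are hypothetical names, and the worker count is additionally capped at the number of tiles.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

MAX_NUM_PARALLEL_TILE_DOWNLOADS = 32  # same upper bound as in the snippet above


def fetch_tiles_parallel(
    tile_urls: Sequence[str],
    fetch: Callable[[str], T],
    num_parallel_tile_downloads: int = 16,
) -> List[T]:
    """Hypothetical helper: download tiles concurrently, keeping input order."""
    if not 1 <= num_parallel_tile_downloads <= MAX_NUM_PARALLEL_TILE_DOWNLOADS:
        raise ValueError(
            "num_parallel_tile_downloads must be between 1 and "
            f"{MAX_NUM_PARALLEL_TILE_DOWNLOADS}"
        )
    if not tile_urls:
        return []
    # No point in starting more threads than there are tiles
    n_workers = min(num_parallel_tile_downloads, len(tile_urls))
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        # map() yields results in input order, so result i belongs to tile_urls[i]
        return list(executor.map(fetch, tile_urls))
```

Inside `bounds2img` it could be called as `fetch_tiles_parallel(tile_urls, lambda url: _fetch_tile(url, wait, max_retries), num_parallel_tile_downloads)`. Using the standard library avoids a joblib import at this point, at the cost of a few more lines.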

I just tested it in the intro_guide.ipynb notebook by downloading an extended version of the Ghent image from the Coordinate-based searches section, using the following code:

import time

import contextily as cx

west, south, east, north = (
    3.616218566894531,
    50.98912458110244,
    5.8483047485351562,
    54.13994019806845,
)
start_time = time.time()
ghent_img, ghent_ext = cx.bounds2img(
    west,
    south,
    east,
    north,
    ll=True,
    zoom=11,
    source=cx.providers.Stamen.Toner,
    num_parallel_tile_downloads=8,
)
print(f"Download time: {time.time() - start_time}")
ghent_img.shape

Note that I had to comment out the @memory.cache decorator on the _fetch_tile function during the test (otherwise tiles cached by earlier runs would be served from disk and distort the timings).

The shape of the image was (7680, 3584, 4), and the download times were:
num_parallel_tile_downloads=1: 69.49s
num_parallel_tile_downloads=2: 36.11s
num_parallel_tile_downloads=4: 18.32s
num_parallel_tile_downloads=8: 9.55s
num_parallel_tile_downloads=16: 5.02s
num_parallel_tile_downloads=32: 2.92s
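The timings above can be turned into speed-up and parallel-efficiency figures with a few lines of arithmetic. The numbers are exactly those reported above; `scaling_table` is just an illustrative helper.

```python
# Download times (seconds) reported above for the (7680, 3584, 4) image at zoom=11
timings = {1: 69.49, 2: 36.11, 4: 18.32, 8: 9.55, 16: 5.02, 32: 2.92}


def scaling_table(timings):
    """Return {n: (speed-up, efficiency)} relative to the single-download run."""
    baseline = timings[1]
    return {n: (baseline / t, baseline / t / n) for n, t in timings.items()}


for n, (speedup, efficiency) in scaling_table(timings).items():
    # efficiency == 1.0 would mean perfectly linear scaling
    print(f"{n:>2} parallel downloads: {speedup:5.2f}x speed-up, {efficiency:.0%} efficiency")
```

On these measurements, efficiency stays above 85% up to 16 parallel downloads (about 13.8x) and drops to roughly 74% at 32 (about 23.8x).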

I don't think there should be any downside to always downloading in parallel (the overhead should be minimal), as long as the number of parallel downloads doesn't overload the endpoint. In my experience with the endpoints I've tested, a default value of 16 is a good starting point: up to that value the improvement is normally almost linear, while above it the behaviour varies quite a bit between endpoints.

That ended up being quite a bit of text. Hopefully it's useful 😉

Let me know if you would like a pull request with the implementation 👍

JacobJeppesen commented on September 3, 2024

Just made a couple more tests with smaller images:

At zoom=9 the shape was (2048, 1024, 4) and download times were:
num_parallel_tile_downloads=1: 5.38s
num_parallel_tile_downloads=16: 0.50s

At zoom=7 the shape was (512, 512, 4) and download times were:
num_parallel_tile_downloads=1: 0.69s
num_parallel_tile_downloads=16: 0.17s (there are only 4 tiles, so in practice only 4 downloads run in parallel)
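How many downloads can actually overlap follows from the number of tiles. A small sketch of that arithmetic, assuming standard 256 × 256 px web-map tiles (an assumption; the thread does not state the tile size), with hypothetical helper names:

```python
TILE_SIZE_PX = 256  # assumption: standard 256x256 web-map tiles


def tile_count(image_shape, tile_size=TILE_SIZE_PX):
    """Number of tiles in a merged image of shape (height, width, bands)."""
    height, width = image_shape[:2]
    return (height // tile_size) * (width // tile_size)


def effective_parallelism(n_tiles, num_parallel_tile_downloads):
    # A pool can never run more downloads at once than there are tiles
    return min(n_tiles, num_parallel_tile_downloads)


for shape in [(512, 512, 4), (2048, 1024, 4), (7680, 3584, 4)]:
    n = tile_count(shape)
    print(shape, n, "tiles ->", effective_parallelism(n, 16), "concurrent downloads")
```

Under that assumption the zoom=7 image has 4 tiles, zoom=9 has 32 and zoom=11 has 420, so with 16 workers only the smallest image is limited by its tile count.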

I also tested the three zoom levels with the current for loop implementation. The download times were essentially the same as with num_parallel_tile_downloads=1.

tcihak-fqa commented on September 3, 2024

This solution looks very nice, Jacob! My vote would be to create a PR and get it into a release at some point.

JacobJeppesen commented on September 3, 2024

Thanks @tcihak-fqa 😃

I've just added a pull request with the changes (#217).

If you'd like to use it now, you can install it with:

pip uninstall -y contextily
pip install git+https://github.com/JacobJeppesen/contextily@parallel_tile_downloads

tcihak-fqa commented on September 3, 2024

Thanks Jacob.
I monkey-patched your original solution and it seems to be working well. I haven't encountered any memory issues, but the number of API requests has been light so far.

martinfleis commented on September 3, 2024

Closed by #217
