
Terracotta



Terracotta is a pure Python tile server that runs as a WSGI app on a dedicated webserver or as a serverless app on AWS Lambda. It is built on a modern Python stack, powered by awesome open-source software such as Flask, Zappa, and Rasterio.

Read the docs | Try the demo | Explore the API | Satlas, powered by Terracotta | Docker Image

Why Terracotta?

  • It is trivial to get going. Got a folder full of cloud-optimized GeoTIFFs in different projections that you want to have a look at in your browser? terracotta serve -r {name}.tif and terracotta connect localhost:5000 get you there.
  • We make minimal assumptions about your data, so you stay in charge. Keep using the tools you know and love to create and organize your data; Terracotta serves it exactly as it is.
  • Serverless deployment is a first-priority use case, so you don’t have to worry about maintaining or scaling your architecture.
  • Terracotta instances are self-documenting. Everything the frontend needs to know about your data is accessible from only a handful of API endpoints.

The Terracotta workflow

1. Optimize raster files

$ ls -lh
total 1.4G
-rw-r--r-- 1 dimh 1049089 231M Aug 29 16:45 S2A_20160724_135032_27XVB_B02.tif
-rw-r--r-- 1 dimh 1049089 231M Aug 29 16:45 S2A_20160724_135032_27XVB_B03.tif
-rw-r--r-- 1 dimh 1049089 231M Aug 29 16:46 S2A_20160724_135032_27XVB_B04.tif
-rw-r--r-- 1 dimh 1049089 231M Aug 29 16:56 S2A_20170831_171901_25XEL_B02.tif
-rw-r--r-- 1 dimh 1049089 231M Aug 29 16:57 S2A_20170831_171901_25XEL_B03.tif
-rw-r--r-- 1 dimh 1049089 231M Aug 29 16:57 S2A_20170831_171901_25XEL_B04.tif

$ terracotta optimize-rasters *.tif -o optimized/

Optimizing rasters: 100%|██████████████████████████| [05:16<00:00, file=S2A_20170831_...25XEL_B04.tif]

2. Create a database from file name pattern

$ terracotta ingest optimized/S2A_{date}_{}_{tile}_{band}.tif -o greenland.sqlite
Ingesting raster files: 100%|███████████████████████████████████████████| 6/6 [00:49<00:00,  8.54s/it]
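
Under the hood, each {name} placeholder in such a pattern becomes a database key. A rough, stdlib-only sketch of how a pattern like this could be translated into a regex (an illustration, not Terracotta's actual implementation):

```python
import re

def pattern_to_regex(pattern: str):
    """Translate 'S2A_{date}_{}_{tile}_{band}.tif' into a compiled regex.

    Named placeholders become named capture groups; anonymous {} match
    but capture nothing. Values are assumed not to contain '_' or '.'.
    """
    out, pos = [], 0
    for m in re.finditer(r"\{([^}]*)\}", pattern):
        out.append(re.escape(pattern[pos:m.start()]))  # literal text between placeholders
        name = m.group(1)
        out.append(rf"(?P<{name}>[^_.]+)" if name else r"[^_.]+")
        pos = m.end()
    out.append(re.escape(pattern[pos:]))
    return re.compile("".join(out) + "$")

rx = pattern_to_regex("S2A_{date}_{}_{tile}_{band}.tif")
keys = rx.match("S2A_20160724_135032_27XVB_B02.tif").groupdict()
# keys == {'date': '20160724', 'tile': '27XVB', 'band': 'B02'}
```

Every file matching the pattern contributes one dataset row, keyed by the extracted values.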

3. Serve it up

$ terracotta serve -d greenland.sqlite
 * Serving Flask app "terracotta.server" (lazy loading)
 * Environment: production
   WARNING: Do not use the development server in a production environment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://localhost:5000/ (Press CTRL+C to quit)

4. Explore the running server

Manually

You can use any HTTP-capable client, such as curl.

$ curl "localhost:5000/datasets?tile=25XEL"
{"page":0,"limit":100,"datasets":[{"date":"20170831","tile":"25XEL","band":"B02"},{"date":"20170831","tile":"25XEL","band":"B03"},{"date":"20170831","tile":"25XEL","band":"B04"}]}

Modern browsers (e.g. Chrome or Firefox) will render the JSON as a tree.

Interactively

Terracotta also includes a web client. You can start the client (assuming the server is running at http://localhost:5000) using

$ terracotta connect localhost:5000
 * Serving Flask app "terracotta.client" (lazy loading)
 * Environment: production
   WARNING: Do not use the development server in a production environment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5100/ (Press CTRL+C to quit)

Then open the client page (http://127.0.0.1:5100/ in this case) in your browser.

(Screenshot: the Terracotta web client)

Development

We gladly accept bug reports and pull requests via GitHub. For your code to be useful, make sure that it is covered by tests and that it satisfies our linting practices (via mypy and flake8).

To run the tests, just install the necessary dependencies via

$ pip install -e .[test]

Then, you can run

$ pytest

from the root of the repository.

Contributors

atanas-balevsky, bertearazvan, bradh, brianpojo56, chapmanjacobd, charalamm, danmindru, denizyil, dependabot[bot], dionhaefner, ecomodeller, hummeltech, j08lue, jeroenderks, kiksekage, mrpgraae, nickeopti, panakouris, pietertolsma, serj90, tomalrussell, vlro, xanderazuaje, yuhangch


terracotta's Issues

daskify metadata computation

We have a project where we want to use Terracotta to serve up some huge watermasks.
There's no way we can load an entire file into memory and do the computations: a 32 GB machine fails when computing the metadata. This is of course no problem for serving the files, as they are cloud-optimized.

However, the metadata computation when creating the database still assumes that the entire file fits into memory and then some. So we should use Dask to chunk the computations when sizes exceed the memory limit.

To speed up the common case (where files fit into memory) we could do this only when a MemoryError is thrown. Or we could set a memory limit that we think is reasonable and always chunk the files such that we never exceed that and then maybe decrease it if we hit a MemoryError. Thoughts?
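
Whatever the trigger, the chunked path boils down to folding statistics over raster windows instead of the whole array. A minimal NumPy sketch (min/max only; percentiles would need an approximate, e.g. histogram-based, approach):

```python
import numpy as np

def streaming_min_max(blocks):
    """Fold min/max over an iterable of array blocks (e.g. raster windows),
    so the full raster never has to fit into memory at once."""
    lo, hi = np.inf, -np.inf
    for block in blocks:
        lo = min(lo, float(block.min()))
        hi = max(hi, float(block.max()))
    return lo, hi

data = np.arange(12, dtype="float64").reshape(3, 4)
# iterate over row blocks instead of reducing `data` in one go
lo, hi = streaming_min_max(data[i:i + 1] for i in range(3))
# lo == 0.0, hi == 11.0
```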

False-color support

It should be possible to choose the mapping from band to RGB in a multi-band raster.

Presumably, the best method would be for the client to pass the mapping as HTTP query parameters.
Additionally, we could have a mapping from band name (e.g. NIR) to band number in raster. This mapping could be specified in the dataset configuration. The client could then specify the false color mapping as something like ?r=nir&g=blue&b=green.
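
A sketch of how such query parameters could be resolved against a per-dataset band-name mapping (the mapping, parameter names, and syntax here are all hypothetical):

```python
from urllib.parse import parse_qs

# Hypothetical per-dataset mapping from band name to band index in the raster
BAND_NAMES = {"blue": 1, "green": 2, "red": 3, "nir": 4}

def resolve_rgb(query_string: str):
    """Turn 'r=nir&g=blue&b=green' into the band indices to read."""
    params = parse_qs(query_string)
    try:
        return tuple(BAND_NAMES[params[channel][0]] for channel in ("r", "g", "b"))
    except KeyError as exc:
        raise ValueError(f"unknown band or missing channel: {exc}") from exc

resolve_rgb("r=nir&g=blue&b=green")  # -> (4, 1, 2)
```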

This issue is dependent on / related to #12.

Figure out how to handle previews

Possibilities:

  • Store in the database as base64-encoded binary blob. Might lead to significantly larger databases though.
  • Store a file path in the database. But a path to where? This would require additional user input.
  • Generate previews on the fly through /rgb or /singleband with a low zoom level, or add another API endpoint that reads a whole dataset (as opposed to an XYZ tile).

Figure out how to handle categorical data

Challenges:

  • values must be mapped to colors consistently
  • stretching does not make sense
  • legend must be able to return categories
  • whether a dataset is categorical or not must be known at ingestion time
  • or is there a way to provide most of this while keeping terracotta agnostic of categories?
  • how big of a use case is categorical data in the real world™️?
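
For illustration, consistent colorization without stretching could look like this (the category table is hypothetical):

```python
# Hypothetical category -> RGB mapping, stored alongside the dataset metadata
# so values map to colors consistently across requests
CATEGORY_COLORS = {
    0: (0, 0, 0),        # nodata
    1: (34, 139, 34),    # forest
    2: (30, 144, 255),   # water
}

def colorize(values, colors, fallback=(255, 0, 255)):
    """Map categorical pixel values to RGB triples; no contrast stretching."""
    return [colors.get(v, fallback) for v in values]

colorize([1, 2, 7], CATEGORY_COLORS)
# -> [(34, 139, 34), (30, 144, 255), (255, 0, 255)]
```

The same table could back a /legend-style endpoint, addressing the "legend must return categories" point above.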

Add pagination for bulk requests

Returning too many rows overloads both frontend and backend. This is usually solved by introducing page and limit parameters to iterate through results.

Steps to implement:

  • Add page parameter to /datasets schema
  • Add global query limit setting
  • Add LIMIT and OFFSET clauses to SQL queries
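
The steps above can be sketched against an in-memory SQLite table (the schema and column names are illustrative, not Terracotta's actual database layout):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasets (date TEXT, tile TEXT, band TEXT)")
conn.executemany(
    "INSERT INTO datasets VALUES (?, ?, ?)",
    [("20170831", "25XEL", band) for band in ("B02", "B03", "B04", "B08", "B11")],
)

def get_page(page: int, limit: int):
    # LIMIT/OFFSET implements the page & limit parameters from the request
    return conn.execute(
        "SELECT band FROM datasets ORDER BY band LIMIT ? OFFSET ?",
        (limit, page * limit),
    ).fetchall()

get_page(page=1, limit=2)  # -> [('B04',), ('B08',)]
```

The global query limit setting would simply cap the largest `limit` a client may request.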

Out of memory when serving large rasters

I used an overview to compute metadata, to get around the issue in #49. When I serve the data in Terracotta, I sometimes see this:

[2018-08-28 14:41:43,573] ERROR in app: Exception on /singleband/20171231/3/3/5.png [GET]
Traceback (most recent call last):
  File "/home/phgr/.conda/envs/terracotta/lib/python3.6/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/phgr/.conda/envs/terracotta/lib/python3.6/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/phgr/.conda/envs/terracotta/lib/python3.6/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/phgr/.conda/envs/terracotta/lib/python3.6/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/home/phgr/.conda/envs/terracotta/lib/python3.6/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/phgr/.conda/envs/terracotta/lib/python3.6/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/phgr/terracotta/terracotta/api/flask_api.py", line 52, in inner
    return fun(*args, **kwargs)
  File "/home/phgr/terracotta/terracotta/api/singleband.py", line 83, in get_singleband
    parsed_keys, tile_xyz, **options
  File "/home/phgr/terracotta/terracotta/handlers/singleband.py", line 35, in singleband
    tilesize=tile_size)
  File "/home/phgr/terracotta/terracotta/xyz.py", line 27, in get_tile_data
    return driver.get_raster_tile(keys, bounds=target_bounds, tilesize=tilesize, nodata=nodata)
  File "/home/phgr/terracotta/terracotta/drivers/base.py", line 274, in get_raster_tile
    nodata=nodata
  File "/home/phgr/.conda/envs/terracotta/lib/python3.6/site-packages/cachetools/__init__.py", line 87, in wrapper
    v = method(self, *args, **kwargs)
  File "/home/phgr/terracotta/terracotta/drivers/base.py", line 27, in inner
    return fun(self, *args, **kwargs)
  File "/home/phgr/terracotta/terracotta/drivers/base.py", line 208, in _get_raster_tile
    src.crs, target_crs, src.width, src.height, *src.bounds
  File "/home/phgr/.conda/envs/terracotta/lib/python3.6/site-packages/rasterio/env.py", line 363, in wrapper
    return f(*args, **kwds)
  File "/home/phgr/.conda/envs/terracotta/lib/python3.6/site-packages/rasterio/warp.py", line 418, in calculate_default_transform
    src_crs, dst_crs, width, height, left, bottom, right, top, gcps)
  File "rasterio/_warp.pyx", line 646, in rasterio._warp._calculate_default_transform
  File "rasterio/_io.pyx", line 1664, in rasterio._io.InMemoryRaster.__cinit__
  File "rasterio/_err.pyx", line 188, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OutOfMemoryError: memdataset.cpp, 1545: cannot allocate 5816105575 bytes

Which becomes a 500 response. It happens when I zoom out a bit, which might indicate that this could be a problem with loading the overviews. The innermost (highest res) overview is 43846x33163, which corresponds to a size of 1.45 GB (the raster is uint8), so the attempted allocation of 5.8 GB looks like a cast to some 32-bit size dtype of the innermost overview.
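
The arithmetic behind that hypothesis can be checked directly:

```python
width, height = 43846, 33163     # innermost overview, from the report above
uint8_bytes = width * height     # 1 byte per pixel
float32_bytes = uint8_bytes * 4  # hypothetical cast to a 32-bit dtype
# uint8_bytes   == 1_454_064_898  (~1.45 GB, matching the report)
# float32_bytes == 5_816_259_592  (~5.8 GB, close to the failed
#                                  5,816,105,575-byte allocation)
```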

Add parallel preprocessing capabilities

Preprocessing is pretty slow on large rasters. Processing several blocks in parallel could mitigate that. Alternatively, we can process multiple files in parallel (e.g. in optimize_rasters).
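
A sketch of the per-file variant using concurrent.futures (optimize_one is a stand-in for the real optimization step; rasterio/GDAL release the GIL for most raster I/O, so threads can already help here):

```python
from concurrent.futures import ThreadPoolExecutor

def optimize_one(path: str) -> str:
    """Stand-in for optimizing a single raster (tiling, overviews, compression)."""
    return path.replace(".tif", ".optimized.tif")

def optimize_all(paths, workers=4):
    # One file per worker thread; block-level parallelism within a single
    # file would instead need windowed reads (e.g. rasterio windows).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(optimize_one, paths))
```

For CPU-bound recompression a ProcessPoolExecutor with the same interface would be the natural swap-in.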

Introduce multiqueries for dataset lookup

To make it easier to scale to large data collections, we should support queries in /datasets such as

/datasets?year=[2016,2018]

which would return all datasets from 2016 and 2018.

Another consideration could be range-based queries, but that would require the introduction of per-key datatypes, which is something I'd like to avoid for now.
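
A sketch of parsing the proposed syntax into a parametrized SQL filter (purely illustrative):

```python
import json
from urllib.parse import parse_qs

def build_filter(query_string: str):
    """Turn 'year=[2016,2018]' into ('year IN (?, ?)', [2016, 2018])."""
    clauses, params = [], []
    for key, values in parse_qs(query_string).items():
        parsed = json.loads(values[0])  # '[2016,2018]' -> [2016, 2018]
        if not isinstance(parsed, list):
            parsed = [parsed]           # plain 'year=2016' still works
        placeholders = ", ".join("?" * len(parsed))
        clauses.append(f"{key} IN ({placeholders})")
        params.extend(parsed)
    return " AND ".join(clauses), params

build_filter("year=[2016,2018]")
# -> ('year IN (?, ?)', [2016, 2018])
```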

Code Review

Great job so far! Here are the things I stumbled upon:

Documentation

  • Be consistent: timestep vs timestamp
  • I don't think we need to explain the layout of the option files; an example is sufficient.
  • I think it is tremendously helpful to see example responses of the API calls early on.

Configuration

  • Why split path and regex? Just have path_regex.
  • Not sure about the yes/no syntax for boolean settings. How does e.g. Apache or Nginx handle that?

CLI

  • Config path could be a positional argument
  • Please wrap the config path in os.path.expanduser for us poor Windows souls
  • 💡: Accept rasters from the command line to quickly serve up anything: terracotta *.tif (then open a leaflet map in the browser, with the data already added as a layer, for the ultimate wow effect 😄)

API

  • I don't think the API queries should include terracotta. You would either run this as a Flask app on its own port, or configure the proxy in your webserver.
  • Using a non-timestep API endpoint for a timestepped dataset causes an uncaught exception (500 server error); it should return "Bad Request" or similar

I'll have a look at the actual code and do some profiling later. I'll update this issue with my findings.

Let users supply key names before deployment?

Currently, key names are read from the database. Alternatively, we could require users to supply both a database and the associated keys.

Pro:

  • API spec can include key names, and becomes fully OpenAPI compliant
  • API endpoints can fail immediately (without database lookup) if request supplies the wrong keys
  • One less database lookup per request, cleaner code in driver (one less table in database)

Con:

  • Either no guaranteed consistency between keys and database structure (if keys are directly supplied by the user) or requires database connection from deploy machine (if read from DB during deployment)
  • Need to introduce a factory for every API route and request schema

Contrast parameters

The client needs to be able to adjust the contrast of the images through query parameters.

For grayscale / single-band this can be easily done, by passing contrast_min and contrast_max query parameters to the existing contrast_stretch function.

For RGB / false-color images, this could be significantly more complicated.
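
For the single-band case, a minimal sketch of what such a function could do (an illustration, not Terracotta's actual contrast_stretch):

```python
import numpy as np

def contrast_stretch(data, vmin, vmax):
    """Linearly map [vmin, vmax] to [0, 255], clipping values outside the range."""
    scaled = (data.astype("float64") - vmin) / (vmax - vmin)
    return (np.clip(scaled, 0, 1) * 255).astype("uint8")

tile = np.array([0.0, 50.0, 100.0, 200.0])
contrast_stretch(tile, vmin=50, vmax=150)
# -> array([  0,   0, 127, 255], dtype=uint8)
```

The contrast_min/contrast_max query parameters would simply feed vmin and vmax.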

API Spec

We should settle on a stable API that we can document for the front-end developers, as soon as possible.

Add TTL for database retrieval

In certain cases (empty images), most time is spent retrieving remote databases, even when the hashes match. I propose caching remote databases with cachetools.TTLCache, so Terracotta only needs to check for a database update every 10 minutes or so.
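
For illustration, the core of the idea with only the standard library (cachetools.TTLCache provides the same behavior out of the box):

```python
import time

class TTLCache:
    """Minimal time-based cache: entries older than `ttl` seconds are refetched."""

    def __init__(self, ttl: float, clock=time.monotonic):
        self.ttl, self.clock, self._store = ttl, clock, {}

    def get_or_fetch(self, key, fetch):
        entry = self._store.get(key)
        if entry is None or self.clock() - entry[0] > self.ttl:
            entry = (self.clock(), fetch())  # expired or missing: refetch
            self._store[key] = entry
        return entry[1]

calls = []
cache = TTLCache(ttl=600)  # re-check the remote database every 10 minutes
db = cache.get_or_fetch("remote.sqlite", lambda: calls.append(1) or "db")
db = cache.get_or_fetch("remote.sqlite", lambda: calls.append(1) or "db")
# the fetch ran only once: len(calls) == 1
```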

Revisit up- and downsampling

  • Do we really need two separate options?
  • At which zoom level should the breakpoint between up- and downsampling occur?

GDAL errors for very low zoom levels

If a dataset collapses to only a handful of pixels, GDAL fails to read it. We should check for that case beforehand and just return an empty image.
