stac-utils / stac-fastapi

STAC API implementation with FastAPI.

Home Page: https://stac-utils.github.io/stac-fastapi/

License: MIT License

Languages: Dockerfile 0.31%, Makefile 0.25%, Python 98.73%, Shell 0.71%

stac-fastapi's People

vincentsarago c-core-labs lossyrob kylebarron waseem-aidash bitner apparell tomaugspurger mangalbhaskar radiantearth aliasmrchips philvarner moradology rsmith013 davidraleigh zstatmanweil twig-aidash jaysnm edkeeble darrenwiens geogubd nmandery aaronxsu mahir-sparkess asgerpetersen robintw septima nicolailolansen drnextgis cehbrecht alukach cuulee tesselo mmcfarland sreimond-eodc sparkgeo borism iliion jonhealy1 kirill888 gnosys-ccoupal hall-b remicres satelliteapplicationscatapult geoplatform jamesoconnor nkleinbaer carderne echeipesh pt-cervest geosynopsis 69blade69 forrestfwilliams hopkina easierdata norahbrown asfhyp3 c-wygoda tonybaloney bmcandr xwl742 fl-w eseglem danti-ai winnerlbm pjhartzell emiliodangelo keul jpolchlo romeokienzler bopen jeffgillan geohardtke spatialdays biodiversitequebec constantinius sdfidk renzodf roger120981 devakellc ranchodeluxe stijncaerts insilecoinc vprivat-ads thomas-maschler rhysrevans3 juliayun23 zachcoleman dylanlee pygeek33 giorgiobasile ukeodhp jon9595 mneagul tjellicoe-tpzuk adelenelai captaincoordinates

stac-fastapi's Issues

landing page from stac_fastapi.types.core.BaseCoreClient is generic

I would propose moving the landing page method from the sqlalchemy code into the BaseCoreClient and removing the abstractmethod decorator as this code is generic.
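A minimal sketch of the proposal, using a hypothetical stand-in for `BaseCoreClient` (the real class has more methods and different signatures): `landing_page` becomes a concrete method, since it only uses settings and the base URL, while methods that touch the database stay abstract.

```python
import abc
from typing import Any, Dict, List


class BaseCoreClient(abc.ABC):
    """Hypothetical stand-in for stac_fastapi.types.core.BaseCoreClient."""

    title: str = "stac-fastapi"
    description: str = "STAC API"
    stac_version: str = "1.0.0-beta.2"

    def landing_page(self, base_url: str) -> Dict[str, Any]:
        # Concrete (no @abc.abstractmethod): nothing here needs a database,
        # so every backend can share this implementation.
        return {
            "id": self.title,
            "title": self.title,
            "description": self.description,
            "stac_version": self.stac_version,
            "links": [
                {"rel": "self", "type": "application/json", "href": base_url},
                {"rel": "search", "type": "application/json", "href": base_url + "search"},
            ],
        }

    @abc.abstractmethod
    def all_collections(self) -> List[Dict[str, Any]]:
        # Backend-specific, so it stays abstract.
        ...


class SqlalchemyCoreClient(BaseCoreClient):
    """Hypothetical backend: only implements the DB-bound methods."""

    def all_collections(self) -> List[Dict[str, Any]]:
        return []
```

A backend subclass then inherits the landing page for free and can still override it if it needs something custom.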

relock pipfiles, maybe after pypi release

When trying to install one namespace package from another, pipenv cannot find the package because it is not available on PyPI. Not a huge deal, because we can still install from setup.py, but it would be nice to keep the lock files up to date.

Another solution is to get pipenv to install from the local source, but I've so far been unsuccessful at doing this.

Update TiTiler (or pin rio-tiler)

TiTiler is pinned to 0.1a2, which I think is incompatible with the latest rio-tiler version. We should either pin rio-tiler (<= 2.0.0rc1) or update to TiTiler 0.1.0a12, which has some nice improvements.

Also worth noting: we removed any use of pkg_resources in https://github.com/developmentseed/titiler/blob/master/CHANGES.md#010-alpha7-2020-10-13

You can now replace
https://github.com/arturo-ai/arturo-stac-api/blob/457cf8250b78c1e6e6fd519589b99ccee04eff43/stac_api/api/extensions/tiles.py#L17-L28

with

from titiler.templates import templates

rename repo to stac-fastapi

rebuild docs on changes to *.py files

We are building API docs with pdocs, but the GitHub Action that rebuilds the documentation on master builds doesn't trigger on changes to Python files. This means the API docs may be out of date if a merged PR changes function signatures, updates docstrings, or adds new functionality.

https://github.com/stac-utils/stac-fastapi/blob/master/.github/workflows/deploy_mkdocs.yml#L9-L11

bulk transactions

The STAC API spec defines transaction endpoints for POSTing new data. This is quite slow for bulk data ingest, as these operations are atomic and require a single INSERT and commit for each row. Ingest is also currently done through sqlalchemy's ORM, which is slow for bulk loads (sqlalchemy core is much faster for this).

It would be great to expose a bulk transactions extension which allows for more efficient ingest of large amounts of items. I imagine this functionality wouldn't be exposed through the API layer, but instead provide a way to load data server-side without having to write a custom script every time.

Ref https://docs.sqlalchemy.org/en/13/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow
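The difference described above (one INSERT and commit per row vs. one batched statement in a single transaction) can be sketched with the standard library's `sqlite3` as a stand-in for Postgres/sqlalchemy; the `items` table layout is hypothetical and much simpler than the real schema.

```python
import json
import sqlite3
from typing import Any, Dict, Iterable

INSERT_SQL = "INSERT INTO items (id, collection, content) VALUES (?, ?, ?)"


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical, heavily simplified items table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "id TEXT PRIMARY KEY, collection TEXT NOT NULL, content TEXT NOT NULL)"
    )


def insert_one_by_one(conn: sqlite3.Connection, items: Iterable[Dict[str, Any]]) -> None:
    # What the transactions endpoint effectively does: one INSERT + one COMMIT per item.
    for item in items:
        conn.execute(INSERT_SQL, (item["id"], item["collection"], json.dumps(item)))
        conn.commit()


def bulk_insert(conn: sqlite3.Connection, items: Iterable[Dict[str, Any]]) -> None:
    # Bulk path: a single executemany() inside one transaction.
    rows = [(i["id"], i["collection"], json.dumps(i)) for i in items]
    with conn:  # commits once on success, rolls the whole batch back on error
        conn.executemany(INSERT_SQL, rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    bulk_insert(conn, [{"id": f"item-{n}", "collection": "joplin"} for n in range(1000)])
    print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # 1000
```

With sqlalchemy core the analogous pattern is a single `connection.execute(table.insert(), list_of_dicts)` call, which sqlalchemy runs as an executemany.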

Docs updates

When inserting records, the collection id in each item (item['collection']) needs to match the global id.
If you pull a new version, you may need to run docker-compose up --build to rebuild the underlying containers, e.g. if the dependencies have changed.
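A minimal pre-ingest check for the first point, assuming items are plain dicts and that "global id" means the id of the collection the items are loaded into; `mismatched_items` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List


def mismatched_items(collection_id: str, items: Iterable[Dict[str, Any]]) -> List[str]:
    """Return the ids of items whose item['collection'] doesn't match collection_id."""
    return [
        item.get("id", "<no id>")
        for item in items
        if item.get("collection") != collection_id
    ]


items = [
    {"id": "a", "collection": "joplin"},
    {"id": "b", "collection": "Joplin"},  # case differs, so it does not match
    {"id": "c"},                          # collection field missing entirely
]
print(mismatched_items("joplin", items))  # ['b', 'c']
```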

PUT request for item doesn't update the geometry

I am running into this issue when updating existing items with a PUT request on /collections/{collectionId}/items.

I found the place in the code where the geometry is dropped before the update query is formed. Is this the expected behaviour per the STAC spec?

use a mocked out database for test cases

timvt is a good example of this: https://github.com/developmentseed/timvt

Jaccard API extension

It's really useful to calculate the Jaccard score (IoU, intersection over union) when doing spatial queries against really any catalog. The score is 1.0 if the search geometry and item geometry are identical, and 0.0 if the two geometries do not overlap at all. You can make the Jaccard score inclusive by dividing the intersection area by the area of the search geometry instead of the area of the union. This effectively returns a score of 1.0 if the search geometry is completely contained by an asset.

This is really useful for clients which care about how item geometries compare to the request geometry beyond the typical intersects/contains operations. A good example of this is mosaic tiling, where the goal is to use some sort of index (whether mosaicjson, postgres, etc.) to reduce the number of HTTP requests sent to rasters by minimizing the search space. In this case, the tiler can sort the /search response by score and more intelligently send requests to fill that particular tile.

This is one of the downsides of mosaicjson. Because you have to seed it at a particular quadkey (which may or may not align well with your data), the tiler doesn't really know which assets cover which tiles at zooms higher than the seed. As a result, the tiler has to naively send tile requests to every asset until the tile is full, which could be avoided if the index did a better job of minimizing the search space.
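To make the two scores concrete, here is a sketch that computes them for axis-aligned bounding boxes `[minx, miny, maxx, maxy]` only. Using bboxes is a simplifying assumption; real footprints would need polygon areas (e.g. PostGIS `ST_Area` over `ST_Intersection`/`ST_Union`).

```python
from typing import Sequence

BBox = Sequence[float]  # [minx, miny, maxx, maxy]


def _area(b: BBox) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def _intersection_area(a: BBox, b: BBox) -> float:
    # Overlap rectangle; width/height clamp to zero when the boxes don't overlap.
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def jaccard(search: BBox, item: BBox) -> float:
    """Intersection over union: 1.0 for identical boxes, 0.0 for disjoint ones."""
    inter = _intersection_area(search, item)
    union = _area(search) + _area(item) - inter
    return inter / union if union else 0.0


def inclusive_score(search: BBox, item: BBox) -> float:
    """Intersection over the search area: 1.0 when the item fully covers the search box."""
    search_area = _area(search)
    return _intersection_area(search, item) / search_area if search_area else 0.0


tile = [0, 0, 1, 1]
print(jaccard(tile, [0, 0, 1, 1]))          # 1.0
print(jaccard(tile, [5, 5, 6, 6]))          # 0.0
print(jaccard(tile, [0, 0, 2, 2]))          # 0.25
print(inclusive_score(tile, [0, 0, 2, 2]))  # 1.0
```

The last two lines show why the inclusive variant suits tiling: a large asset that fully covers the tile scores only 0.25 on plain IoU but 1.0 on the inclusive score, so a tiler sorting by it would request that asset first.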

Allow exclusion of default fields!

In
https://github.com/arturo-ai/arturo-stac-api/blob/7537eb0ffe179f9e9886a5c80e57a271e7d70aa6/stac_api/config.py#L26-L35

we define the minimal default fields to return (so pydantic is happy). In some personal use cases I've made /search requests where I wanted only the id returned, or the item without its geometry...

cc @geospatial-jeff
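A sketch of what include/exclude could look like on an already-serialized item, assuming dotted paths such as `properties.datetime` in the style of the Fields extension. `apply_fields` is hypothetical, not stac-fastapi's implementation, and conflicts are resolved naively (exclude is applied last, so it wins).

```python
import copy
from typing import Any, Dict, Iterable


def apply_fields(
    item: Dict[str, Any], include: Iterable[str] = (), exclude: Iterable[str] = ()
) -> Dict[str, Any]:
    """Keep only `include` paths (if any are given), then drop `exclude` paths."""
    include = list(include)
    if include:
        result: Dict[str, Any] = {}
        for path in include:
            keys = path.split(".")
            src: Any = item
            for key in keys:
                if not isinstance(src, dict) or key not in src:
                    break  # path doesn't exist in the item: skip it
                src = src[key]
            else:
                # Recreate the nested structure in the result and copy the value.
                dst = result
                for key in keys[:-1]:
                    dst = dst.setdefault(key, {})
                dst[keys[-1]] = copy.deepcopy(src)
    else:
        result = copy.deepcopy(item)  # no include list: start from the full item

    for path in exclude:
        keys = path.split(".")
        parent: Any = result
        for key in keys[:-1]:
            parent = parent.get(key) if isinstance(parent, dict) else None
        if isinstance(parent, dict):
            parent.pop(keys[-1], None)
    return result


item = {
    "id": "a",
    "geometry": {"type": "Point", "coordinates": [0, 0]},
    "properties": {"datetime": "2000-02-02T00:00:00Z", "gsd": 0.5},
}
print(apply_fields(item, include=["id"]))                # {'id': 'a'}
print(sorted(apply_fields(item, exclude=["geometry"])))  # ['id', 'properties']
```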

stac-fastapi does not provide CORS headers (Cross-Origin Resource Sharing)

Hi all,

First of all, thanks for this great software!

When using STAC Browser on a catalog created with stac-fastapi, I get an error message that normally indicates that CORS headers are not present:

"NetworkError when attempting to fetch resource.
Please note that some servers don't allow external access via web browsers (e.g., when CORS headers are not present).
Errored URL: https://localhost:8081"

Any comment is appreciated! If you point me to a location in the source code I can also try to include it.

Provide configurable title/version for service

StacApi should accept title and version parameters, which should then be used in OpenAPI generation.

Delete collection fails with 'collection does not exist'

import requests

r = requests.delete('http://localhost:8081/collections/sentinel-s2-l2a')
r.json()
# {'detail': 'collection does not exist'}

But the collection does exist:

r = requests.get('http://localhost:8081/collections/sentinel-s2-l2a')
r.json()

{'id': 'sentinel-s2-l2a',
 'description': 'Sentinel-2a and Sentinel-2b imagery, processed to Level 2A (Surface Reflectance)',
 'stac_version': '1.0.0-beta.2',

switch to stac-utils

Update setup.py to point to new repo.
Do a pypi release to update pypi metadata.
Mention arturo in the readme

move client.get_search to the base class

https://github.com/stac-utils/arturo-stac-api/blob/d46da52b484a3bdb802c169c9c4f68a4bfbe0997/stac_api/clients/postgres/core.py#L192-L260

This code is not linked to any DB call ;-)
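Because `get_search` only translates query parameters into a search body and hands it to the POST search, a backend-agnostic version could live in the base class. This sketch assumes comma-separated `collections`/`ids`/`bbox` strings and an abstract `post_search` that each backend implements; the class names and signatures are hypothetical.

```python
import abc
from typing import Any, Dict, Optional


class BaseSearchClient(abc.ABC):
    @abc.abstractmethod
    def post_search(self, search_request: Dict[str, Any]) -> Dict[str, Any]:
        """Backend-specific search: the only part that touches the DB."""

    def get_search(
        self,
        collections: Optional[str] = None,
        ids: Optional[str] = None,
        bbox: Optional[str] = None,
        datetime: Optional[str] = None,
        limit: int = 10,
    ) -> Dict[str, Any]:
        # Pure request translation: GET query strings -> POST-style body.
        body: Dict[str, Any] = {"limit": limit}
        if collections:
            body["collections"] = collections.split(",")
        if ids:
            body["ids"] = ids.split(",")
        if bbox:
            body["bbox"] = [float(v) for v in bbox.split(",")]
        if datetime:
            body["datetime"] = datetime
        return self.post_search(body)


class EchoClient(BaseSearchClient):
    """Toy backend that returns the request it received."""

    def post_search(self, search_request: Dict[str, Any]) -> Dict[str, Any]:
        return search_request


print(EchoClient().get_search(collections="joplin", limit=5))
# {'limit': 5, 'collections': ['joplin']}
```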

switch to attrs

We are using dataclasses when we really need to be using attrs, mostly because attrs is more flexible when it comes to defining optional and optionally required attributes.

Originally posted by @geospatial-jeff in #71 (comment)

Help activating TiTiler routes in the new stac-fastapi version

First off, thanks so much for this superb Python library. I have a small dataset that I would like to make publicly available to everyone using stac-api.

I was trying to activate the OGC and Titiler routes in my STAC FastAPI app, like in this video.

from stac_api.config import ApiSettings
from stac_api.api import create_app

settings = ApiSettings(
    add_ons=["tiles"]
)
app = create_app(settings)

But it seems the API has changed a bit. I tried to solve it by adding TilesExtension to stac_fastapi/server/app.py:

from stac_fastapi.api.app import StacApi
from stac_fastapi.extensions.core import (
    FieldsExtension,
    QueryExtension,
    SortExtension,
    TransactionExtension,
)
from stac_fastapi.extensions.third_party import TilesExtension, BaseTilesClient
from stac_fastapi.extensions.third_party import BulkTransactionExtension
from stac_fastapi.sqlalchemy.config import SqlalchemySettings
from stac_fastapi.sqlalchemy.core import CoreCrudClient
from stac_fastapi.sqlalchemy.session import Session
from stac_fastapi.sqlalchemy.transactions import (
    BulkTransactionsClient,
    TransactionsClient,
)

settings = SqlalchemySettings()
session = Session.create_from_settings(settings)
api = StacApi(
    settings=settings,
    extensions=[
        TilesExtension(client=BaseTilesClient()),
        TransactionExtension(client=TransactionsClient(session=session)),
        BulkTransactionExtension(client=BulkTransactionsClient(session=session)),
        FieldsExtension(),
        QueryExtension(),
        SortExtension()
    ],
    client=CoreCrudClient(session=session),
)
app = api.app

But it doesn't work. I'm a very basic user, so sorry if this is a silly question, but I would really appreciate any help.

Support collection summaries

Collection summaries are not stored in the database, but they really should be, to enable collection search for stac-index.
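A sketch of deriving summaries from a collection's items so they could be stored alongside the collection. It assumes numeric properties become range objects with `minimum`/`maximum` (the STAC 1.0.0 form; pre-release versions used different key names) and string properties become sorted lists of distinct values. `summarize` is hypothetical and skips every other property type.

```python
from typing import Any, Dict, Iterable


def summarize(items: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    ranges: Dict[str, Dict[str, float]] = {}
    values: Dict[str, set] = {}
    for item in items:
        for key, value in item.get("properties", {}).items():
            if key == "datetime":
                continue  # temporal coverage belongs in the collection's extent
            if isinstance(value, bool):
                continue  # bool is a subclass of int; skip it explicitly
            if isinstance(value, (int, float)):
                r = ranges.setdefault(key, {"minimum": value, "maximum": value})
                r["minimum"] = min(r["minimum"], value)
                r["maximum"] = max(r["maximum"], value)
            elif isinstance(value, str):
                values.setdefault(key, set()).add(value)
    summaries: Dict[str, Any] = dict(ranges)
    summaries.update({k: sorted(v) for k, v in values.items()})
    return summaries


items = [
    {"properties": {"datetime": "2000-02-02T00:00:00Z", "gsd": 0.5, "orientation": "nadir"}},
    {"properties": {"datetime": "2000-02-03T00:00:00Z", "gsd": 0.6, "orientation": "nadir"}},
]
print(summarize(items))
# {'gsd': {'minimum': 0.5, 'maximum': 0.6}, 'orientation': ['nadir']}
```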

remove `PaginationClient`, support paging in core

This was written as its own class because, before the release of v1.0.0-beta.1, paging was listed as an API extension. Since then this has changed and pagination has become part of core (I think for alignment with OGC), so it makes sense to move the pagination code into core here as well.

Fix GET /search pagination

The GET /search pagination link currently returns a POST pagination link.
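For GET /search, the `next` link should be a GET URL that carries the original query parameters plus the paging token, not a POST link with a body. A standard-library sketch, assuming a hypothetical `token` query parameter:

```python
from typing import Dict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def next_link(request_url: str, token: str) -> Dict[str, str]:
    """Build a GET `next` link from the current request URL and a paging token."""
    parts = urlsplit(request_url)
    # Keep every existing query parameter except a previous token.
    params = [
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "token"
    ]
    params.append(("token", token))
    href = urlunsplit(parts._replace(query=urlencode(params)))
    return {"rel": "next", "type": "application/geo+json", "method": "GET", "href": href}


link = next_link("http://localhost:8081/search?collections=joplin&limit=10&token=abc", "def")
print(link["href"])
# http://localhost:8081/search?collections=joplin&limit=10&token=def
```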

Add stac_extensions to orm models

stac_extensions is included in the alembic migrations but not present in the item and collection database models

items does not display the stac_extensions after deployment

Hi again :)

Although the file tests/data/joplin/index.geojson has some stac_extensions (eo and proj), they are not imported into the database during deployment, so they are not shown in the end. I think it may be related to the STAC Pydantic models.

index.geojson

        {
            "id": "f2cca2a3-288b-4518-8a3e-a4492bb60b08",
            "type": "Feature",
            "collection": "joplin",
            "links": [],
            "geometry": {
                "type": "Polygon",
                "coordinates": [
                    [
                        [
                            -94.6884155,
                            37.0595608
                        ],
                        [
                            -94.6884155,
                            37.0332547
                        ],
                        [
                            -94.6554565,
                            37.0332547
                        ],
                        [
                            -94.6554565,
                            37.0595608
                        ],
                        [
                            -94.6884155,
                            37.0595608
                        ]
                    ]
                ]
            },
            "properties": {
                "proj:epsg": 3857,
                "orientation": "nadir",
                "height": 2500,
                "width": 2500,
                "datetime": "2000-02-02T00:00:00Z",
                "gsd": 0.5971642834779395
            },
            "assets": {
                "COG": {
                    "type": "image/tiff; application=geotiff; profile=cloud-optimized",
                    "href": "https://arturo-stac-api-test-data.s3.amazonaws.com/joplin/images/may24C350000e4102500n.tif",
                    "title": "NOAA STORM COG"
                }
            },
            "bbox": [
                -94.6884155,
                37.0332547,
                -94.6554565,
                37.0595608
            ],
            "stac_extensions": [
                "eo", <----------- HERE
                "proj" <----------- HERE
            ],
            "stac_version": "1.0.0-beta.2"
        }, ....

Run:

docker-compose up --build

Local Browser

http://127.0.0.1:8081/collections/joplin/items/29c53e17-d7d1-4394-a80f-36763c8f42dc

item schema

remove `arturo` from isort config

https://github.com/stac-utils/stac-fastapi/blob/master/tox.ini#L16: there aren't any arturo modules anymore.

Ensure urljoins with base_url use relative paths

In several places a urljoin is used with FastAPI's base_url, with a leading / on the path being joined. This works if there's no root_path set, but when base_url contains a path prefix, the leading / makes the join resolve against the host and discards the root_path.

Examples:
- BaseLinks.root: joining to "/" erases the root_path; it should just use str(self.base_url).
- CollectionLinks.parent
- ItemLinks.self: removing the leading slash will fix this.

The goal of this issue is to find all the instances where a urljoin is used with a leading slash and join to a relative path instead (or avoid the join entirely where base_url can be used directly).
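The behaviour is easy to reproduce with the standard library's `urllib.parse.urljoin`, assuming a base_url like `http://localhost:8081/api/v1/` (host, then a root_path of `/api/v1`, then a trailing slash):

```python
from urllib.parse import urljoin

base_url = "http://localhost:8081/api/v1/"  # assumed: host + root_path + trailing "/"

# Leading slash: resolved against the host, so the root_path is lost.
print(urljoin(base_url, "/collections/joplin"))
# http://localhost:8081/collections/joplin

# Relative path: resolved against base_url, so the root_path is kept.
print(urljoin(base_url, "collections/joplin"))
# http://localhost:8081/api/v1/collections/joplin

# BaseLinks.root case: joining "/" also drops the prefix; use base_url as-is instead.
print(urljoin(base_url, "/"))
# http://localhost:8081/

# Without the trailing slash, the last segment of the base is replaced.
print(urljoin("http://localhost:8081/api/v1", "collections"))
# http://localhost:8081/api/collections
```

The last case is why the relative-path fix relies on base_url ending with a slash.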

Ignore conflict errors in ingest script

When building the docker-compose stack (docker-compose up) we run a python script which ingests a sample dataset into the database (https://github.com/arturo-ai/arturo-stac-api/blob/master/scripts/ingest_joplin.py). If the stack is built when the database container already exists (maybe from a previous build), the POST request to create a new collection returns a 409 Conflict which causes the ingest script to raise an exception.

This exception is confusing because it isn't really an error, it just implies that the collection is already in the database which is after all the purpose of the script in the first place. I think a good solution is to only raise an exception on 5XX codes.
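A sketch of the proposed behaviour as a helper the script could call after each request (e.g. `check_response(r.status_code, r.url)` with `requests`). `check_response` and `IngestError` are hypothetical. Logging, rather than ignoring, the other 4XX codes is an assumption beyond the proposal.

```python
import logging

logger = logging.getLogger("ingest")


class IngestError(RuntimeError):
    """Hypothetical error type for the ingest script."""


def check_response(status_code: int, url: str) -> None:
    """Raise only on server errors; treat 409 Conflict as 'already ingested'."""
    if status_code >= 500:
        raise IngestError(f"{url} returned {status_code}")
    if status_code == 409:
        logger.info("%s already exists, skipping", url)
    elif status_code >= 400:
        # Other client errors don't abort the script, but are worth surfacing.
        logger.warning("%s returned %s", url, status_code)
```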

data.items datetime is timestamp without time zone

Timestamp without time zone should always be avoided as it is ambiguous as to the timezone represented by the field and can be lossy at the DST transition time.
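A quick illustration of the lossiness with Python's `datetime` and fixed UTC offsets: two different instants collapse to the same value once the offset is dropped, which is effectively what a `timestamp without time zone` column stores (PostgreSQL silently ignores a time zone given in the input for that type).

```python
from datetime import datetime, timedelta, timezone

utc = datetime(2000, 2, 2, 0, 0, tzinfo=timezone.utc)
est = datetime(2000, 2, 2, 0, 0, tzinfo=timezone(timedelta(hours=-5)))

print(utc == est)  # False: these are instants five hours apart
# Dropping tzinfo mimics storing them in a naive column.
print(utc.replace(tzinfo=None) == est.replace(tzinfo=None))  # True: now indistinguishable
```

With `timestamp with time zone` (timestamptz), PostgreSQL converts the input to UTC for storage, so the two instants stay distinct.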

Update to stac version 1.0.0-beta.2

stac-pydantic 1.3.x supports 1.0.0-beta.2.

[needs discussion] follow the namespace package convention?

With the recent update we've split the module into multiple namespaced sub-packages. In the current repo architecture those packages are placed at the top level and then dynamically linked in /stac-fastapi.

While the current structure gives a quick overview of all the sub-packages I'm not sure it's well aligned with the namespace convention.

cc @geospatial-jeff @kylebarron

specify async endpoints

For asyncpg or sqlalchemy>=1.4 we need to be able to specify async def endpoints so the code is executed by the event loop rather than in a background thread. Right now only def (sync) endpoints are supported.

https://github.com/stac-utils/stac-fastapi/blob/master/stac_fastapi_api/stac_fastapi/api/routes.py#L29
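A sketch of how a route factory could support both kinds of client methods. It returns an `async def` wrapper for coroutine functions, which is awaited on the event loop, and a plain `def` wrapper otherwise, which FastAPI runs in its threadpool. `create_endpoint` and `Client` are hypothetical. `functools.wraps` keeps the wrapped signature visible to `inspect.signature`, which FastAPI reads to build request parameters.

```python
import asyncio
import functools
import inspect
from typing import Any, Callable


def create_endpoint(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a client method, keeping it async if it is a coroutine function."""
    if inspect.iscoroutinefunction(func):

        @functools.wraps(func)
        async def async_endpoint(*args: Any, **kwargs: Any) -> Any:
            # Awaited directly on the event loop: no thread hop.
            return await func(*args, **kwargs)

        return async_endpoint

    @functools.wraps(func)
    def sync_endpoint(*args: Any, **kwargs: Any) -> Any:
        # Plain def: FastAPI would run this in its threadpool.
        return func(*args, **kwargs)

    return sync_endpoint


class Client:
    """Hypothetical client with one sync and one async method."""

    def get_item(self, item_id: str) -> dict:
        return {"id": item_id}

    async def async_get_item(self, item_id: str) -> dict:
        await asyncio.sleep(0)  # stands in for e.g. an asyncpg query
        return {"id": item_id}


client = Client()
sync_ep = create_endpoint(client.get_item)
async_ep = create_endpoint(client.async_get_item)
print(inspect.iscoroutinefunction(sync_ep), inspect.iscoroutinefunction(async_ep))  # False True
print(asyncio.run(async_ep("a")))  # {'id': 'a'}
```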

support custom data models

Currently there is no good way to use a different data model than the one defined in models/database.py, and it is difficult to change this in a way that is sustainable long term. It should be much easier once #57 is resolved.

stac validator

Running stac-validator against the app produces the following results:

Landing Page

$ stac_validator http://localhost:8081/

[
    {
        "path": "http://localhost:8081/",
        "asset_type": "catalog",
        "valid_stac": false,
        "error_type": "KeyError",
        "error_message": "Key Error: 'id'"
    }
]

Collection

$ stac_validator http://localhost:8081/collections/joplin

[
    {
        "path": "http://localhost:8081/collections/joplin",
        "asset_type": "collection",
        "id": "joplin",
        "validated_version": "1.0.0-beta.2",
        "valid_stac": true
    }
]

Item

$ stac_validator http://localhost:8081/collections/joplin/items/047ab5f0-dce1-4166-a00d-425a3dbefe02
[
    {
        "path": "http://localhost:8081/collections/joplin/items/047ab5f0-dce1-4166-a00d-425a3dbefe02",
        "asset_type": "item",
        "id": "047ab5f0-dce1-4166-a00d-425a3dbefe02",
        "validated_version": "1.0.0-beta.2",
        "valid_stac": true
    }
]

Just need to add an id to the landing page.

`TilesClient` shouldn't subclass `CoreCrudClient`

Makes sense. TilesClient arguably shouldn't subclass CoreCrudClient either. I think composition is a better pattern here.

Originally posted by @geospatial-jeff in #97 (comment)

@kylebarron
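A sketch of the composition alternative, with hypothetical class and method names and a made-up tiler URL pattern: the tiles client holds a reference to a core client and delegates item lookups to it, instead of inheriting from it.

```python
from typing import Any, Dict, List, Protocol


class ItemReader(Protocol):
    """Anything that can fetch an item, e.g. a core client."""

    def get_item(self, item_id: str, collection_id: str) -> Dict[str, Any]: ...


class TilesClient:
    def __init__(self, core: ItemReader, tiler_url: str) -> None:
        self.core = core  # composed, not inherited
        self.tiler_url = tiler_url.rstrip("/")

    def get_tile_links(self, item_id: str, collection_id: str) -> List[Dict[str, str]]:
        item = self.core.get_item(item_id, collection_id)  # delegate DB access
        # Hypothetical URL pattern: one tiles link per asset.
        return [
            {"rel": "tiles", "title": name, "href": f"{self.tiler_url}/{collection_id}/{item_id}/{name}"}
            for name in item.get("assets", {})
        ]


class FakeCore:
    """Toy core client used in place of CoreCrudClient."""

    def get_item(self, item_id: str, collection_id: str) -> Dict[str, Any]:
        return {"id": item_id, "collection": collection_id, "assets": {"COG": {"href": "x.tif"}}}


tiles = TilesClient(core=FakeCore(), tiler_url="http://localhost:8000/")
print(tiles.get_tile_links("a", "joplin"))
```

Because `TilesClient` only depends on something with a `get_item` method, it works with any backend and can be tested with a fake like the one above.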

align OpenAPI and Landing Page titles

They are parameterized in different places, but should probably be set to the same value.

Request to /collections/<missing>/items returns 200

If I make a request to the endpoint for a collection that doesn't exist, I get a 404:

In [14]: import requests

In [19]: r = requests.get("https://pct-pqe-staging.westeurope.cloudapp.azure.com/stac/v1/collections/not-a-collection")

In [20]: r.status_code
Out[20]: 404

But if I make a request to that collection's /items I get a 200, and the response includes an empty FeatureCollection.

In [21]: r = requests.get("https://pct-pqe-staging.westeurope.cloudapp.azure.com/stac/v1/collections/not-a-collection/items")

In [22]: r.status_code
Out[22]: 200

In [23]: r.json()
Out[23]:
{'type': 'FeatureCollection',
 'features': [],
 'links': [],
 'context': {'returned': 0, 'matched': 0}}

I wanted to verify that this is the expected behavior. I didn't find anything in the API spec, but I admittedly didn't look too closely.

Reported in TomAugspurger/stac-dask-discussion#1
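If a missing collection should also return 404 on /items, the fix is to look the collection up before querying its items. In this sketch, the in-memory dicts stand in for the database, and `NotFoundError` is a hypothetical exception that an app-level handler would map to HTTP 404.

```python
from typing import Any, Dict, List


class NotFoundError(Exception):
    """Hypothetical; an exception handler would turn this into HTTP 404."""


COLLECTIONS: Dict[str, Dict[str, Any]] = {"joplin": {"id": "joplin"}, "empty": {"id": "empty"}}
ITEMS: Dict[str, List[Dict[str, Any]]] = {"joplin": [{"id": "a"}, {"id": "b"}]}


def item_collection(collection_id: str, limit: int = 10) -> Dict[str, Any]:
    if collection_id not in COLLECTIONS:
        # Without this check an unknown id silently yields an empty FeatureCollection.
        raise NotFoundError(f"Collection {collection_id} does not exist.")
    matched = ITEMS.get(collection_id, [])
    features = matched[:limit]
    return {
        "type": "FeatureCollection",
        "features": features,
        "links": [],
        "context": {"returned": len(features), "matched": len(matched)},
    }
```

An existing collection with no items still gets a 200 with an empty FeatureCollection, which keeps the two cases distinguishable.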

add a .dockerignore

upgrade sqlalchemy to 1.4 for async/await

This would also require a sprinkling of async/await syntax, as well as a review of the code to make sure any blocking calls are run in a separate thread.
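For the second part, one standard-library option is `asyncio.to_thread` (Python 3.9+). It runs a blocking call in a worker thread so the event loop stays free for other requests. `blocking_query` is a hypothetical stand-in for a synchronous database call.

```python
import asyncio
import time


def blocking_query(n: int) -> int:
    time.sleep(0.1)  # pretend this is a synchronous DB round-trip
    return n * 10


async def handler(n: int) -> int:
    # Offload the blocking call; other coroutines keep running meanwhile.
    return await asyncio.to_thread(blocking_query, n)


async def main() -> list:
    # The three handlers overlap, so this takes roughly 0.1 s instead of 0.3 s.
    return await asyncio.gather(handler(1), handler(2), handler(3))


print(asyncio.run(main()))  # [10, 20, 30]
```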

decouple backends from api layer

There are still some places where the backend is coupled to the API. For example, the sqlalchemy engine and session are created during app startup. This coupling makes it difficult to support additional backends, and forces us to do some hacky things in the code.

A similar treatment was applied to api extensions in #54.

use a single endpoint factory

https://github.com/arturo-ai/arturo-stac-api/blob/fb47dedfbc45df4488f7fa169b76ca1b30a420f1/stac_api/api/routes.py

Endpoint factories wrap a callable in a function that can be executed as a FastAPI route. Doing so lets us "decorate" the callable with specific request/response models. Currently we have two factories: one for routes that define the request as a dataclass and one for routes that define it as a pydantic model.

Dataclasses are used because of their support for dependency injection.
Pydantic models are used for static types (no dependency injection).

It would be much better to have a single factory instead which means either (1) use a single request type for all routes or (2) one factory that can understand both dataclass + pydantic models.

allow more geometry types for search

It seems that right now only Polygon (POST) and bbox (GET) are supported:
https://github.com/arturo-ai/arturo-stac-api/blob/7d4c9572981e935de2521441878f3ffb78f6b9b7/stac_api/clients/postgres/core.py#L321-L327

https://github.com/arturo-ai/arturo-stac-api/blob/master/stac_api/models/schemas.py#L228-L235

The STAC API spec says:

Searches items by performing intersection between their geometry and provided GeoJSON geometry. All GeoJSON geometry types must be supported.
ref:
https://github.com/radiantearth/stac-api-spec/blob/f64a08235cb0ae04dfdb37bd8d6940c3814d057c/item-search/README.md#query-parameter-table

cc @kylebarron @geospatial-jeff
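A minimal sketch of accepting every GeoJSON geometry type for `intersects` instead of only Polygon, assuming geometries arrive as parsed dicts. The type list is the seven geometry types from RFC 7946. The validation is deliberately shallow: it checks structure, not coordinate validity.

```python
from typing import Any, Dict

GEOJSON_GEOMETRY_TYPES = {
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon", "GeometryCollection",
}


def validate_intersects(geometry: Dict[str, Any]) -> Dict[str, Any]:
    gtype = geometry.get("type")
    if gtype not in GEOJSON_GEOMETRY_TYPES:
        raise ValueError(f"Unsupported geometry type: {gtype!r}")
    if gtype == "GeometryCollection":
        members = geometry.get("geometries")
        if not isinstance(members, list):
            raise ValueError("GeometryCollection requires a 'geometries' list")
        for member in members:
            validate_intersects(member)  # recurse into member geometries
    elif not isinstance(geometry.get("coordinates"), list):
        raise ValueError(f"{gtype} requires a 'coordinates' array")
    return geometry
```

Once validated, the geometry can be handed to PostGIS as GeoJSON (e.g. via `ST_GeomFromGeoJSON`), which accepts all of these types, so the intersection query itself doesn't need to special-case Polygon.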

/collections route returns array

According to the API spec, the /collections route returns an object with collections and links keys, but this route returns an array.

The deployed version of the API that I saw this on was based on this branch so I don't know if this is also true on master. If not I won't be offended by a quick close.
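For reference, a sketch of the expected shape. `collections_response` is a hypothetical helper that wraps the collection list, and the `root`/`self` links are assumed from the spec's examples rather than a complete link set.

```python
from typing import Any, Dict, List


def collections_response(collections: List[Dict[str, Any]], base_url: str) -> Dict[str, Any]:
    # Wrap the bare list in an object with `collections` and `links` keys.
    return {
        "collections": collections,
        "links": [
            {"rel": "root", "type": "application/json", "href": base_url},
            {"rel": "self", "type": "application/json", "href": base_url + "collections"},
        ],
    }


print(sorted(collections_response([{"id": "joplin"}], "http://localhost:8081/")))
# ['collections', 'links']
```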

`ItemUri`s should provide collection IDs as well as item IDs

The URL structure for the STAC API makes it clear that collection IDs are potentially required to access items by their ID. A small change to the ItemUri model should do the trick: https://github.com/stac-utils/stac-fastapi/blob/master/stac_fastapi/api/stac_fastapi/api/models.py#L66

Add note to README that `docker-compose up` won't work when other postgres is running

I kept getting

stac-api     | sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL:  role "username" does not exist

when running docker-compose up.

It turned out (with help from this SO answer) that I had a global system Postgres on my Mac that was also running on port 5432, so the Postgres in Docker was being hidden by the system Postgres. When I shut down the system Postgres, docker-compose up worked.

refactor PostgresClient

Make migration depend on stac-api in docker compose?

The migration command in docker-compose.yml is:
https://github.com/arturo-ai/arturo-stac-api/blob/4f0ba30a2300fc3273aca83ed6e118def2529b75/docker-compose.yml#L48-L49

It looks like the sleep 10 is just to make sure the app service is deployed first? Could you just add

    depends_on:
      - database
      - app

to the migration config? Or would that not work?
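Worth noting: plain `depends_on` only orders container start-up. It does not wait for Postgres to accept connections, which is presumably why the `sleep 10` is there. A more deterministic alternative is to poll the database port before running the migrations. The stdlib sketch below assumes the compose service name `database` and port 5432 (e.g. `wait_for_port("database", 5432)`). An open port is a good but not perfect proxy for Postgres being ready to run queries.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```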

Default settings exclude stac_version from the Landing page.

With the default settings, a query to the root catalog doesn't include stac_version. I believe that the spec says it should be included: https://github.com/radiantearth/stac-api-spec/tree/master/core

$ docker-compose build
$ docker-compose up

$ curl --silent http://localhost:8081 | jq .stac_version
null

I think this is because the landing page sets response_model_exclude_unset=True. If I make this change:

diff --git a/stac_fastapi_api/stac_fastapi/api/app.py b/stac_fastapi_api/stac_fastapi/api/app.py
index 56b9493..a336900 100644
--- a/stac_fastapi_api/stac_fastapi/api/app.py
+++ b/stac_fastapi_api/stac_fastapi/api/app.py
@@ -99,7 +99,7 @@ class StacApi:
             name="Landing Page",
             path="/",
             response_model=LandingPage,
-            response_model_exclude_unset=True,
+            response_model_exclude_unset=False,
             response_model_exclude_none=True,
             methods=["GET"],
             endpoint=create_endpoint_with_depends(

then we're able to get the STAC version

$ docker-compose build
$ docker-compose up

$ curl --silent http://localhost:8081 | jq .stac_version
"1.0.0-beta.2"

Does that seem like the right fix, or will it have unintended consequences? Are there other places we should look at?