
openeo-eodc-driver's Introduction

openEO EODC Driver

  • openEO version: 1.0.x (currently in development)
  • openEO version: 0.4.2 (legacy, see openeo-openshift-driver tag v1.1.2)

Information

This repository contains a fully dockerized implementation of the openEO API (openeo.org), written in Python. The web application is based on Flask, while the openEO functionality is implemented as micro-services with Nameko.
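
To illustrate the pattern, here is a minimal sketch of a Nameko service and a Flask gateway talking to it over RabbitMQ; the service and method names are illustrative, not this repository's actual code:

from flask import Flask, jsonify
from nameko.rpc import rpc
from nameko.standalone.rpc import ClusterRpcProxy

# A Nameko micro-service, started with `nameko run <module>`
class DataService:
    name = "data"

    @rpc
    def get_collections(self):
        return {"collections": []}

# The Flask gateway dispatches incoming requests to the services via RabbitMQ
app = Flask(__name__)
CONFIG = {"AMQP_URI": "pyamqp://guest:guest@localhost"}

@app.route("/collections")
def collections():
    with ClusterRpcProxy(CONFIG) as proxy:
        return jsonify(proxy.data.get_collections())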

Additionally, three docker-compose files implement a CSW server for data access (/csw), an Apache Airflow workflow management platform coupled to a Celery cluster for data processing (/airflow), and UDF services for the R and Python languages (/udf). The whole setup can be installed on a laptop to simplify development, but each service can also run on an independent machine or set of machines.

Start the API

In order to start the API web app and its micro-services, a simple docker-compose up is enough. However, some environment variables must be set first.

Configuration

First, some environment variables need to be set.

Copy the sample.env file and the /sample-envs folder to .env and /envs, respectively. Both are included in the .gitignore by default; do not change this. Variables in the .env file are used in docker-compose.yml when bringing up the project, while variables in the individual env files in /envs are available within each respective container. The following is the list of files to update.

Note that most of the env variables are prefixed with OEO_; these are used by dynaconf, the configuration management tool we use (see the sketch after the list below). The env variables that are not prefixed are used outside of Python and are accessed directly as environment variables, without prior validation.

  • .env : note that you MUST manually create the folder specified for 'LOG_DIR'
  • envs/csw.env
  • envs/data.env
  • envs/gateway.env
  • envs/jobs.env
  • envs/files.env
  • envs/processes.env
  • envs/pycsw.env
  • envs/rabbitmq.env
  • envs/users.env
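
As a minimal sketch of the prefix convention (assuming a plain dynaconf setup; the actual settings object in this repo may be configured differently):

import os
from dynaconf import Dynaconf

# dynaconf strips the OEO_ prefix, so OEO_DB_USER becomes settings.DB_USER
os.environ["OEO_DB_USER"] = "openeo"
settings = Dynaconf(envvar_prefix="OEO")
print(settings.DB_USER)  # -> "openeo"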

Then also copy the /gateway/gateway/sample_openapi.yaml file to /gateway/gateway/openapi.yaml and edit the servers and info sections, adding your URLs, a name and a description.

Bring up web app and services

For local development, you will need a docker network shared across the API, CSW and Airflow setups. Create one like this:

docker network create openeo-v1.0

Leave the name as is, or update it in the docker-compose_dev.yml files for the API, CSW and Airflow.

From the main folder, run the following command:

docker-compose -f docker-compose.yml -f docker-compose_dev.yml up -d

The docker-compose_dev.yml file is identical to docker-compose.yml, but additionally exposes some ports and assigns the containers to the docker network created above. Alternatively, the bash functions in dev_openeo_sample can be used (after filling in the relevant fields) to start the services (Nameko) or the gateway (Flask) locally without Docker containers; in this case breakpoints can be used for debugging.

Set up all databases with Alembic

A number of databases are set up with Alembic, namely one for users, one for process graphs and one for jobs. Make sure that the API is up and run the following command to initialize all databases:

bash init_dbs.sh
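
The script presumably just runs the Alembic migrations for each service; a rough Python equivalent (the per-service paths are assumptions):

from alembic import command
from alembic.config import Config

# upgrade each service's database to the latest migration ("head")
for service in ("users", "processes", "jobs"):
    cfg = Config(f"services/{service}/alembic.ini")  # hypothetical path
    command.upgrade(cfg, "head")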

Add admin user

At least one admin user must exist in the users database to allow use of the API functionality via the endpoints. This user must be added to the database manually. To do so, connect to the users database container and run a sequence of commands.

docker exec -it openeo-users-db bash

Once in the container, connect to the database:

psql -U $OEO_DB_USER -p $OEO_DB_PORT -h localhost $OEO_DB_NAME

Before adding users, user profiles must be inserted, as well as identity providers (for OpenID Connect).

The following commands create two user profiles (profile_1, profile_2) with different data access. The specific profile names and the fields in data_access depend on the implementation.

insert into profiles (id, name, data_access) values ('pr-19144eb0-ecde-4821-bc8b-714877203c85', 'profile_1', 'basic,pro');
insert into profiles (id, name, data_access) values ('pr-c36177bf-b544-473f-a9ee-56de7cece055', 'profile_2', 'basic');

Identity providers can be added with the following command:

insert into identity_providers (id, id_openeo, issuer_url, scopes, title, description) values ('ip-c462aab2-fdbc-4e56-9aa1-67a437275f5e', 'google', 'https://accounts.google.com', 'openid,email', 'Google', 'Identity Provider supported in this back-end.');

Finally, users can be added to the users database. In order to add a user for Basic auth, one first needs to create a hashed password. Execute the following in a Python console.

from passlib.apps import custom_app_context as pwd_context
# note: in passlib >= 1.7, encrypt() is a deprecated alias of hash()
print(pwd_context.encrypt("my-secure-password"))
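
For reference, this is how such a hash is later checked against a plaintext password (a minimal illustration, not the repo's actual login code):

from passlib.apps import custom_app_context as pwd_context

# returns True when the plaintext matches the stored hash
pwd_context.verify("my-secure-password", "hash-password-goes-here")  # substitute a real hash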

Then back on the database command line, run the following replacing hash-password-goes-here with the output of the previous command (leave it wrapped in single quotes):

insert into users (id, auth_type, role, username, password_hash, profile_id, created_at, updated_at) values ('us-3eb63b58-9a04-4098-84d7-xxxxxxxxxxxx', 'basic', 'admin', 'my-username', 'hash-password-goes-here', 'pr-c36177bf-b544-473f-a9ee-56de7cece055', '2019-12-18 10:45:18.000000', '2019-12-18 10:45:18.000000');

A user for Basic auth with admin rights is now inserted in the database. Note that the profile_id matches the one of profile_2 above.

The following command creates a user with admin rights for OpenID Connect auth:

insert into users (id, auth_type, role, email, profile_id, identity_provider_id, created_at, updated_at) values ('us-3eb63b58-9a04-4098-84d7-yyyyyyyyyyyy', 'oidc', 'admin', '[email protected]', 'pr-19144eb0-ecde-4821-bc8b-714877203c85', 'ip-c462aab2-fdbc-4e56-9aa1-67a437275f5e', '2019-12-18 10:45:18.000000', '2019-12-18 10:45:18.000000');

Note that the identity_provider_id matches the only one created above, and the profile_id matches the one of profile_1 above.

Add collections and processes

Initially, no collections and no processes are available at the endpoints /collections and /processes.

Copy the sample-auth file to auth and fill in the back-end URL and the credentials of a user with admin rights. Then run the following to add collections (sourced from the CSW server) and processes to the back-end:

source auth
python api_setup.py
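
In essence, the script authenticates as the admin user and writes the collection and process definitions through the API; a rough sketch of that flow (the variable names and the registration endpoint below are assumptions, not this repo's exact code):

import os
import requests

backend = os.environ["BACKEND_URL"]     # hypothetical name, set via `source auth`
user = os.environ["OEO_USERNAME"]       # hypothetical name
password = os.environ["OEO_PASSWORD"]   # hypothetical name

# HTTP Basic login (GET /credentials/basic) returns an access token
token = requests.get(f"{backend}/credentials/basic",
                     auth=(user, password)).json()["access_token"]
headers = {"Authorization": f"Bearer basic//{token}"}

# each definition is then registered; the exact endpoint is back-end specific
requests.post(f"{backend}/processes", json={"id": "absolute"}, headers=headers)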

Bring down web app and services

In order to bring down the API, run the following from the main folder:

docker-compose down

Tests

For development we provide a set of tests including unittests, linting and static type checking. Find more details here.

openeo-eodc-driver's People

Contributors

gunnarbusch, jankovicgd, lforesta, m-mohr, mklan, sophieherrmann

Forkers

bgoesswe, eodcgmbh

openeo-eodc-driver's Issues

Discovery endpoints should be available without login

Today I tried to include this backend in the openEO Hub, but the crawling process failed because all endpoints (except /) require login. I think that the discovery endpoints (/collections, /processes, /output_formats, /service_types) should be publicly available. Could you fix that so we can add this backend to the Hub? :)

Include properties for collections

Currently, fetching a collection with /collections/<collection_id> yields empty properties and other_properties fields. The current implementation should be extended with this data.

synchronous call limits too strict

@sophieherrmann we've talked about this. I'm just logging it here so that it's documented.

In my opinion the limits for the synchronous call are too strict.
On a quite small process graph I receive the error:

SERVER-ERROR: The supplied process graph is too big to be processed synchronously. Either run it as asynchronous job or try to use a shorter time span / smaller bounding box / less collections.

It's a nice error message. Definitely useful for the user.
But the message (or the limits) are misleading. The tested process graph uses one collection, looks at one pixel, has 3 nodes and ca. 25 time steps.
So the reductions in time span, bounding box and collections are not really possible.

Maybe you can reconsider the logic for defining the limits (you mentioned it is currently nodes*collections) and the threshold used to trigger the error, and maybe adapt the error message to the variables that actually define the limit.

Here is the process graph I used:

{
  "id": "test",
  "process_graph": {
    "load_collection_WJSXI6057X": {
      "process_id": "load_collection",
      "arguments": {
        "id": "boa_sentinel_2",
        "spatial_extent": {
          "west": 11.5383,
          "east": 11.5383,
          "south": 46.4867,
          "north": 46.4867
        },
        "temporal_extent": [
          "2016-01-07T12:00:00Z",
          "2016-04-29T12:00:00Z"
        ],
        "bands": [
          "B04",
          "B08"
        ]
      }
    },
    "reduce_dimension_JADHX0569A": {
      "process_id": "reduce_dimension",
      "arguments": {
        "data": {
          "from_node": "load_collection_WJSXI6057X"
        },
        "reducer": {
          "process_graph": {
            "array_element_HROER8569Y": {
              "process_id": "array_element",
              "arguments": {
                "data": {
                  "from_parameter": "data"
                },
                "index": 1,
                "return_nodata": true
              }
            },
            "array_element_VQKRJ1421M": {
              "process_id": "array_element",
              "arguments": {
                "data": {
                  "from_parameter": "data"
                },
                "index": 0,
                "return_nodata": true
              }
            },
            "subtract_DRJET5871T": {
              "process_id": "subtract",
              "arguments": {
                "x": {
                  "from_node": "array_element_HROER8569Y"
                },
                "y": {
                  "from_node": "array_element_VQKRJ1421M"
                }
              }
            },
            "add_GADMZ0974A": {
              "process_id": "add",
              "arguments": {
                "x": {
                  "from_node": "array_element_HROER8569Y"
                },
                "y": {
                  "from_node": "array_element_VQKRJ1421M"
                }
              }
            },
            "divide_INLJN9314W": {
              "process_id": "divide",
              "arguments": {
                "x": {
                  "from_node": "subtract_DRJET5871T"
                },
                "y": {
                  "from_node": "add_GADMZ0974A"
                }
              },
              "result": true
            }
          }
        },
        "dimension": "bands",
        "context": null
      }
    },
    "save_result_CAOGW1055R": {
      "process_id": "save_result",
      "arguments": {
        "data": {
          "from_node": "reduce_dimension_JADHX0569A"
        },
        "format": "NetCDF",
        "options": [
          "{}"
        ]
      },
      "result": true
    }
  },
  "parameters": [],
  "returns": {
    "schema": {
      "type": "boolean"
  }
}
}

Add sample files for OIDC_DIR content

The docker-compose file uses an environment variable OIDC_DIR.

There should be an OIDC_DIR example directory included in the repo, with examples of all needed files inside.

Update arg_parser

Currently the data service's ArgParser has a fixed set of products with aliases, against which it checks whether a requested product matches. Adding a new product to the system therefore also means adding a new product and alias to the ArgParser. It raises the following error even when s3a_ol_1_err is actually available:

{
    "code": 400,
    "id": "50becb12-08a8-4544-badc-a0981bd6d89c",
    "links": [
        "http://openeo.eodc.eu/redoc#tag/EO-Data-Discovery/paths/~1collections~1{name}/get"
    ],
    "message": "Product specifier 's3a_ol_1_err' is not valid.",
    "service": "data"
}

delete file/folder after sync_processing

I think the issue we had is that we try to delete the file/folder while the processed file is still being sent as a response.
Is there a way to check if send_file() has completed passing the file? I didn't find a useful parameter in the Flask docs.
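
One common workaround (an assumption on my side, not what the repo currently does) is Werkzeug's call_on_close hook, which only fires after the WSGI server has finished streaming the response:

import os
import shutil
from flask import send_file

def sync_result(path):
    response = send_file(path)
    # runs once the response is closed, i.e. after the file has been fully sent
    response.call_on_close(lambda: shutil.rmtree(os.path.dirname(path)))
    return response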

All parameter schemas unnecessarily have `"minItems": 0.0`

While comparing the VITO and EODC process listings for openEOPlatform/documentation#11 I noticed that all parameter/return schemas have a "minItems": 0.0 field, e.g.

    {
      "id": "absolute",
      "parameters": [
        {
          "name": "x",
          "schema": {
            "minItems": 0.0,
            "type": ["number", "null"]
          }
        }
      ],
      "returns": {
        "schema": {
          "minItems": 0.0,
          "type": ["number", "null"]
        }
      },

As far as I know jsonschema, minItems only makes sense for arrays (and it should be an integer, not a float).
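
A quick check with the jsonschema package illustrates this: minItems constrains the length of arrays and is ignored for other instance types, so emitting it on number schemas is dead weight.

from jsonschema import ValidationError, validate

validate(instance=[1, 2], schema={"type": "array", "minItems": 1})  # passes
try:
    validate(instance=[], schema={"type": "array", "minItems": 1})
except ValidationError as err:
    print(err.message)  # the empty array is reported as too short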

wrong error code on /jobs

When trying to get the batch job listing, I get the error "User ... does not exist and is not whitelisted.".

Fair enough, I get this error when I'm indeed not whitelisted, but the error JSON has a couple of issues:

{"code":401,
"id":"347d303....1afa7",
"links":["http://openeo-dev.eodc.eu/redoc",""],
"message":"User stefaa... does not exist and is not whitelisted.",
"service":"gateway"}

dynaconf variables needed out of application

Env vars needed in the gateway or services are now checked with dynaconf, and they need to have a prefix (OEO_ in our case).

In some cases an env variable is needed both inside the app (gateway or service) and outside of it (e.g. in a bash script). An example is the DB_USER var for the users_db, which is needed with dynaconf's prefix OEO_ in the gateway, but also without it when running alembic upgrade head (alembic expects this var to be simply DB_USER).
The same issue is present for other variables in the gateway and in the services.

We need to find a general solution (other than duplicating env vars with/without prefix).
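
One possible direction (just a sketch, not implemented anywhere yet): bridge the prefixed and unprefixed names in a single place, e.g. at the top of Alembic's env.py, so every variable only has to be exported once with the OEO_ prefix.

import os

# mirror each OEO_-prefixed variable under its unprefixed name
for key, value in list(os.environ.items()):
    if key.startswith("OEO_"):
        os.environ.setdefault(key[len("OEO_"):], value)

# alembic can now read DB_USER even though only OEO_DB_USER was exported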

process schema issues

Some more process schema issues I found while comparing the VITO and EODC process listings for openEOPlatform/documentation#11

Should non-implemented functions return a validation error or a server error?

Should the /validation endpoint return a successful response with validation errors and code 200?

Response

{
    "code": 500,
    "id": "3a9c9e07-e19d-4873-ae06-1346ff86e983",
    "links": [],
    "message": "'load_collection' is not in the current set of process definitions.",
    "service": "processes"
}

Test process graph

{
  "process_graph": {
    "dc": {
      "process_id": "load_collection",
      "arguments": {
        "id": "Sentinel-2",
        "spatial_extent": {
          "west": 16.1,
          "east": 16.6,
          "north": 48.6,
          "south": 47.2
        },
        "temporal_extent": [
          "2018-01-01",
          "2018-02-01"
        ]
      }
    },
    "bands": {
      "process_id": "filter_bands",
      "description": "Filter and order the bands. The order is important for the following reduce operation.",
      "arguments": {
        "data": {
          "from_node": "dc"
        },
        "bands": [
          "B08",
          "B04",
          "B02"
        ]
      }
    },
    "evi": {
      "process_id": "reduce",
      "description": "Compute the EVI. Formula: 2.5 * (NIR - RED) / (1 + NIR + 6*RED + -7.5*BLUE)",
      "arguments": {
        "data": {
          "from_node": "bands"
        },
        "dimension": "spectral",
        "reducer": {
          "callback": {
            "nir": {
              "process_id": "array_element",
              "arguments": {
                "data": {
                  "from_argument": "data"
                },
                "index": 0
              }
            },
            "red": {
              "process_id": "array_element",
              "arguments": {
                "data": {
                  "from_argument": "data"
                },
                "index": 1
              }
            },
            "blue": {
              "process_id": "array_element",
              "arguments": {
                "data": {
                  "from_argument": "data"
                },
                "index": 2
              }
            },
            "sub": {
              "process_id": "subtract",
              "arguments": {
                "data": [
                  {
                    "from_node": "nir"
                  },
                  {
                    "from_node": "red"
                  }
                ]
              }
            },
            "p1": {
              "process_id": "product",
              "arguments": {
                "data": [
                  6,
                  {
                    "from_node": "red"
                  }
                ]
              }
            },
            "p2": {
              "process_id": "product",
              "arguments": {
                "data": [
                  -7.5,
                  {
                    "from_node": "blue"
                  }
                ]
              }
            },
            "sum": {
              "process_id": "sum",
              "arguments": {
                "data": [
                  1,
                  {
                    "from_node": "nir"
                  },
                  {
                    "from_node": "p1"
                  },
                  {
                    "from_node": "p2"
                  }
                ]
              }
            },
            "div": {
              "process_id": "divide",
              "arguments": {
                "data": [
                  {
                    "from_node": "sub"
                  },
                  {
                    "from_node": "sum"
                  }
                ]
              }
            },
            "p3": {
              "process_id": "product",
              "arguments": {
                "data": [
                  2.5,
                  {
                    "from_node": "div"
                  }
                ]
              },
              "result": true
            }
          }
        }
      }
    },
    "mintime": {
      "process_id": "reduce",
      "description": "Compute a minimum time composite by reducing the temporal dimension",
      "arguments": {
        "data": {
          "from_node": "evi"
        },
        "dimension": "temporal",
        "reducer": {
          "callback": {
            "min": {
              "process_id": "min",
              "arguments": {
                "data": {
                  "from_argument": "data"
                }
              },
              "result": true
            }
          }
        }
      }
    },
    "save": {
      "process_id": "save_result",
      "arguments": {
        "data": {
          "from_node": "mintime"
        },
        "format": "GTiff"
      },
      "result": true
    }
  }
}

Add links to capabilities

The API recommends adding links to the capabilities response so that (also non-openEO) clients can better discover the service:

Links related to this service, e.g. the homepage of the service provider or the terms of service.
It is highly RECOMMENDED to provide links with the following rel (relation) types:

  1. version-history: A link back to the Well-Known URL (see /.well-known/openeo) to allow clients to work on the most recent version.
  2. terms-of-service: A link to the terms of service. If a back-end provides a link to the terms of service, the clients MUST provide a way to read the terms of service and only connect to the back-end after the user agreed to them. The user interface MUST be designed in a way that the terms of service are not agreed to by default, i.e. the user MUST explicitly agree to them.
  3. privacy-policy: A link to the privacy policy (GDPR). If a back-end provides a link to a privacy policy, the clients MUST provide a way to read the privacy policy and only connect to the back-end after the user agreed to them. The user interface MUST be designed in a way that the privacy policy is not agreed to by default, i.e. the user MUST explicitly agree to them.
  4. service-desc or service-doc: A link to the API definition. Use service-desc for machine-readable API definition and service-doc for human-readable API definition. Required if full OGC API compatibility is desired.
  5. conformance: A link to the Conformance declaration (see /conformance). Required if full OGC API compatibility is desired.
  6. data: A link to the collections (see /collections). Required if full OGC API compatibility is desired.

It would be great if you could implement those links, if applicable to your service. I'm especially interested in 1, 2, 3, and 6.
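
For illustration, a links array with the recommended relation types could look like this (all URLs are placeholders):

capabilities_links = [
    {"rel": "version-history", "href": "https://openeo.example.com/.well-known/openeo"},
    {"rel": "terms-of-service", "href": "https://example.com/terms"},
    {"rel": "privacy-policy", "href": "https://example.com/privacy"},
    {"rel": "service-desc", "href": "https://openeo.example.com/openapi.json"},
    {"rel": "conformance", "href": "https://openeo.example.com/conformance"},
    {"rel": "data", "href": "https://openeo.example.com/collections"},
]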

fix redoc documentation

The /redoc endpoint is currently throwing the error:

SyntaxError: Error resolving $ref pointer "https://openeo.eodc.eu/openapi#/components/schemas/process_argument_value". 
Token "process_argument_value" does not exist.
    at Function.syntax (https://cdn.jsdelivr.net/npm/redoc@next/bundles/redoc.standalone.js:55:6276)
    at u.resolve (https://cdn.jsdelivr.net/npm/redoc@next/bundles/redoc.standalone.js:63:7717)
    at o.resolve (https://cdn.jsdelivr.net/npm/redoc@next/bundles/redoc.standalone.js:55:54761)
    at a._resolve (https://cdn.jsdelivr.net/npm/redoc@next/bundles/redoc.standalone.js:108:48408)
    at s (https://cdn.jsdelivr.net/npm/redoc@next/bundles/redoc.standalone.js:108:50908)
    at https://cdn.jsdelivr.net/npm/redoc@next/bundles/redoc.standalone.js:108:50791
    at Array.forEach (<anonymous>)
    at a (https://cdn.jsdelivr.net/npm/redoc@next/bundles/redoc.standalone.js:108:50713)
    at https://cdn.jsdelivr.net/npm/redoc@next/bundles/redoc.standalone.js:108:50810
    at Array.forEach (<anonymous>)

Adding a job with Web Editor fails due to budget/plan=null

Trying to create a job on the deployed EODC backend fails with the Web Editor.

I took the Use Case 1 process graph, replaced the collection ID with s1a_csar_grdh_iw, pasted it in the "Code" section of the Web Editor, switched to "Visual Model", clicked "Run now" and received the following error:

{
  "code": 500,
  "id": "bac82216-2f99-4f6f-bb3e-d86751bd96b4",
  "links": ["http://openeo.eodc.eu/redoc#tag/Job-Management/paths/~1jobs/post"],
  "message": "{'budget': ['Field may not be null.'], 'plan': ['Field may not be null.']}",
  "service": "jobs"
}

The result is the same when I click on "Create" instead.

According to the API spec, null is allowed (it's even the default); in this case, the default plan should be used.

Support for HTTPS

The openEO API requires HTTPS (instead of HTTP), as authentication credentials for HTTP Basic, OAuth2 and OpenID Connect are otherwise transmitted unencrypted to the server and can easily be sniffed by a third party. Therefore the driver should support HTTPS, and the deployed instance should activate it by default.

Another issue is that the Web Editor, which is deployed over HTTPS, can't connect to HTTP back-ends, because browsers forbid "mixed content" (i.e. insecure HTTP connections established by secure/HTTPS web pages). Providing the Web Editor over HTTP would promote insecure workflows and is therefore not desirable.

Add proper link management

Currently hardly any service returns proper links in its responses, though this is at least strongly recommended by the API specification. This should be changed.

Currently there is a very basic LinkHandler in the data service (see links.py). Having one LinkHandler class per service could enable us to add proper and consistent links to all requests in a service. For this, the LinkHandler class needs to store meaningful links related to its service.

Additionally, it should be mentioned that the DNS_URL environment variable will be removed, so it will no longer be possible to add the service_url / base_url in the way it is currently implemented. (The variable is removed because the exact same information needs to be provided in the openapi.yaml file and is therefore redundant and error-prone.)

A solution to this problem would be to add only the link suffix in the services, and the base_url in the ResponseHandler (see response.py) running on the gateway, which has access to the openapi.yaml file where the base_url is stored. Good ideas for implementation details are welcome! ;)
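
A minimal sketch of that split (hypothetical code, just to make the proposal concrete):

class LinkHandler:
    """Per-service helper; returns links with path suffixes only."""

    def __init__(self, service_name):
        self.service_name = service_name

    def self_link(self, resource_id):
        # no scheme, no host - only the suffix
        return {"rel": "self", "href": f"/{self.service_name}/{resource_id}"}

class ResponseHandler:
    """Gateway-side handler with access to the base_url from openapi.yaml."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def absolutize(self, response):
        for link in response.get("links", []):
            link["href"] = self.base_url + link["href"]
        return response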

process graph from python and r client look different

@sophieherrmann, @flahn, @m-mohr,

I am testing the EODC backend a bit and came across a difference in how the process graphs from the R client and the Python client look. The save_result() node from the R client doesn't look as it usually does: the empty curly brackets in options are quoted. I don't know whether that is allowed according to the API spec, or whether it breaks the graph; so far I couldn't get results with either of the two.

Python:

"saveresult1": {
    "process_id": "save_result",
    "arguments": {
      "data": {
        "from_node": "loadcollection1"
      },
      "format": "netCDF",
      "options": {}
    },
    "result": true
  }

R:

"save_result_EZZJF7912M": {
    "process_id": "save_result",
    "arguments": {
      "data": {
        "from_node": "load_collection_ZIVRE6185P"
      },
      "format": "netCDF",
      "options": [
        "{}"
      ]
    },
    "result": true
  }
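
If the quoted form turns out to be invalid, a defensive normalization on the back-end could paper over it until the R client is fixed; a hypothetical sketch:

import json

def normalize_options(options):
    # unwrap the R client's ["{}"] form into a plain dict
    if isinstance(options, list) and len(options) == 1 and isinstance(options[0], str):
        try:
            return json.loads(options[0])
        except ValueError:
            pass
    return options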
