
ngsi-timeseries-api's People

Contributors

c0c0n3, chicco785, daminichopra, dependabot[bot], fisuda, iamarnavgarg, jason-fox, juliozinga, keshavsoni2511, modulartaco, nec-vishal, necravisaketi, ohylli, pooja1pathak, taliaga, wistefan


ngsi-timeseries-api's Issues

Suggestions for refactoring

Looking at the tests, I would expect a main tests folder containing a test folder for each module, rather than the other way around. But that is probably just a matter of taste.

I would also move the module sources into a src folder, so as to better structure the repository.

For the client, we already have a full Python client (https://github.com/smartsdk/ngsi-sdk-python), so there was no need to develop one just for the purpose of the tests: eat your own dog food.

It also looks odd that the folder is named client when it is an Orion client, not the QuantumLeap one.

It is equally odd to have certain files in the root:
conftest.py
run.sh (this runs the tests, right? the name does not suggest that)

I also suggest having a docker folder that includes the Dockerfile, but also a default example composition; more complex ones can live in the recipes repository.
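
For illustration, a possible layout along these lines (folder and file names are just a suggestion, using module names that already exist in the repository):

ngsi-timeseries-api/
├── src/
│   ├── reporter/
│   └── translators/
├── tests/
│   ├── conftest.py
│   ├── reporter/
│   └── translators/
└── docker/
    ├── Dockerfile
    └── docker-compose.yml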

[QUERY] Integration with InfluxDB

Hello!

I have seen that there is the possibility of adding other databases, such as InfluxDB, and we would like to know whether you have implemented the integration with that database or, if not, what the state of that possible implementation is.

Regards.

Implement a name Sanitizer for attribute/column names

Take into account not only Crate's naming restrictions, but also have a look at Orion's naming restrictions. And what if the db changes in the future and the restrictions are different?

'-' is an example of an invalid character in attribute names. For the test case:

{
  "id": "Impeller_Nueva_pieza_1.dmo",
  "type": "MeasurementResult",
  "FA@SP-00": {
    "type": "StructuredValue",
    "value": [0, 0, 44.3366, 0, 0, 1],
    "metadata": {
      "featureType": {"type": "Text", "value": "POINT,CART"}
    }
  }
}

Note the reserved words, which cannot be used for schemas, tables or columns:
https://crate.io/docs/crate/reference/en/latest/sql/general/lexical-structure.html#sql-lexical
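
A minimal sketch of what such a sanitizer could look like (the replacement character, the reserved-word subset and the function name are illustrative assumptions, not the project's actual code):

import re

# Illustrative subset of CrateDB reserved words; the full list is in the
# lexical-structure docs linked above.
RESERVED_WORDS = {"all", "alter", "and", "any", "array", "as", "by", "table"}

# Keep only letters, digits and underscores in identifiers.
INVALID_CHARS = re.compile(r"[^a-z0-9_]")

def sanitize(name):
    """Turn an NGSI attribute name into a valid CrateDB column name."""
    sanitized = INVALID_CHARS.sub("_", name.lower())
    if sanitized[0].isdigit():
        # Identifiers cannot start with a digit.
        sanitized = "_" + sanitized
    if sanitized in RESERVED_WORDS:
        # Avoid clashing with SQL keywords.
        sanitized += "_"
    return sanitized

assert sanitize("FA@SP-00") == "fa_sp_00"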

ql as context broker datasource

The context broker supports "registrations" as a way to register external data sources for entities. This means that, when queried, the context broker will answer by querying the external data source.
Ideally, QL could act as a data source returning the last value of a given entity. This could be particularly interesting in combination with #101.
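
For reference, an NGSIv2 registration pointing Orion at a QL instance might look roughly like this (URLs, entity type and attribute are made up for illustration):

import requests

# Hypothetical registration: Orion would forward queries for Room entities
# to the provider URL instead of answering from its own database.
registration = {
    "description": "QuantumLeap as a data source for Room entities",
    "dataProvided": {
        "entities": [{"idPattern": ".*", "type": "Room"}],
        "attrs": ["temperature"]
    },
    "provider": {
        "http": {"url": "http://quantumleap:8668"}
    }
}

response = requests.post("http://orion:1026/v2/registrations", json=registration)
response.raise_for_status()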

Array parameters

It seems that QuantumLeap struggles with array data. I have noticed that Orion receives the data, but it is never stored as historical data on the QuantumLeap side.

This is an example of the JSON sent:

{
  "id": "smartphone-9845C",
  "type": "Device",
  "category": {
    "value": "smartphone"
  },
  "osVersion": {
    "value": "Android 4.0"
  },
  "softwareVersion": {
    "value": "MA-Test 1.6"
  },
  "hardwareVersion": {
    "value": "GP-P9872"
  },
  "firmwareVersion": {
    "value": "SM-A310F"
  },
  "consistOf": {
    "value": [
      "sensor-9845A",
      "sensor-9845B",
      "sensor-9845C"
    ]
  },
  "refDeviceModel": {
    "value": "myDevice-345"
  },
  "dateCreated": {
    "value": "2016-08-22T10:18:16Z"
  }
}

This is the error I found when checking QuantumLeap's log:
crate.client.exceptions.ProgrammingError: SQLActionException[ColumnValidationException: Validation failed for refdevice: ['device-9845A', 'device-9845B', 'device-9845C'] cannot be cast to type object]
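
The failure suggests the translator infers a plain object type for list values. A rough sketch of how array values could instead be mapped onto CrateDB array column types (function and mapping are illustrative assumptions, not the project's actual translator code):

# Hypothetical sketch of NGSI-to-CrateDB type inference for arrays.
NGSI_TO_CRATE = {
    bool: "boolean",
    int: "long",
    float: "float",
    str: "string",
}

def crate_type_for(value):
    """Infer a CrateDB column type for an NGSI attribute value."""
    if isinstance(value, list):
        # Assume homogeneous arrays; map them to array(<element type>).
        element_type = crate_type_for(value[0]) if value else "string"
        return "array({})".format(element_type)
    if isinstance(value, dict):
        return "object"
    return NGSI_TO_CRATE.get(type(value), "string")

assert crate_type_for(["sensor-9845A", "sensor-9845B"]) == "array(string)"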

Fix doc

  • Remove Subscription comments

Flaky test: test_not_found

test_not_found in reporter/tests/test_1T1E1A.py fails from time to time, usually in a complete suite run. More details in the logs at https://travis-ci.org/smartsdk/ngsi-timeseries-api/builds/409226234?utm_source=github_status&utm_medium=notification

________________________________ test_not_found ________________________________
    def test_not_found():
        query_params = {
            'type': entity_type,
        }
        r = requests.get(query_url(), params=query_params)
>       assert r.status_code == 404, r.text
E       AssertionError: {
E           "detail": "The server encountered an internal error and was unable to complete your request.  Either the server is overloaded or there is an error in the application.",
E           "status": 500,
E           "title": "Internal Server Error",
E           "type": "about:blank"
E         }
E         
E       assert 500 == 404
E        +  where 500 = <Response [500]>.status_code
reporter/tests/test_1T1E1A.py:265: AssertionError
===================== 1 failed, 51 passed in 51.40 seconds =====================

Add status endpoint

Add some form of status endpoint that can be used in liveness/readiness probes and that reports on the status of complementary services (Crate, and possibly Grafana and Redis).
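
A minimal sketch of what such an endpoint could look like with Flask and the Crate client (the route name, Crate endpoint and set of checked services are assumptions, not a settled design):

from crate import client
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health")
def health():
    """Report whether the backing services are usable."""
    try:
        conn = client.connect("http://crate:4200")  # assumed Crate endpoint
        conn.cursor().execute("SELECT 1")
        return jsonify({"status": "pass"})
    except Exception as e:
        # Crate unreachable: report unhealthy so probes can act on it.
        return jsonify({"status": "fail", "detail": str(e)}), 503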

Attribute names lose capitalization

Crate has some naming restrictions (https://crate.io/docs/crate/reference/en/latest/sql/ddl/basics.html#naming-restrictions) that force table and column names to be lowercase.

We need to persist the original entity and attribute names (the skeleton of the entity) somewhere, so that this information can be used when reconstructing the entities in replies to data queries.
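
One possible approach (only a sketch under assumed names, not a committed design) is a small metadata table keeping the original spellings alongside the lowercased ones:

from crate import client

conn = client.connect("http://crate:4200")  # assumed endpoint
cursor = conn.cursor()

# Side table keeping the original spellings next to the lowercased names,
# e.g. original_names = {"osversion": "osVersion", ...}
cursor.execute("""
    CREATE TABLE IF NOT EXISTS md_entity_metadata (
        table_name STRING PRIMARY KEY,
        original_names OBJECT
    )
""")

def restore_names(table_name, row):
    """Map lowercased column names back to the original attribute names."""
    cursor.execute(
        "SELECT original_names FROM md_entity_metadata WHERE table_name = ?",
        (table_name,))
    result = cursor.fetchone()
    originals = result[0] if result else {}
    return {originals.get(name, name): value for name, value in row.items()}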

JSON size on QL component

This error pops up when I POST a long JSON structure.

ERROR in app: Exception on /notify [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "reporter/reporter.py", line 110, in notify
    trans.insert([payload])
  File "/src/ngsi-timeseries-api/translators/crate.py", line 146, in insert
    self.cursor.executemany(stmt, entries)
  File "/usr/local/lib/python3.6/site-packages/crate/client/cursor.py", line 67, in executemany
    self.execute(sql, bulk_parameters=seq_of_parameters)
  File "/usr/local/lib/python3.6/site-packages/crate/client/cursor.py", line 54, in execute
    bulk_parameters)
  File "/usr/local/lib/python3.6/site-packages/crate/client/http.py", line 304, in sql
    content = self._json_request('POST', self.path, data=data)
  File "/usr/local/lib/python3.6/site-packages/crate/client/http.py", line 416, in _json_request
    _raise_for_status(response)
  File "/usr/local/lib/python3.6/site-packages/crate/client/http.py", line 170, in _raise_for_status
    error_trace=error_trace)
crate.client.exceptions.ProgrammingError: IllegalArgumentException[Document contains at least one immense term in field="value" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[45, 49, 46, 48, 50, 57, 54, 44, 52, 46, 54, 52, 54, 53, 53, 44, 45, 49, 46, 54, 51, 50, 53, 52, 44, 50, 48, 49, 55, 45]...', original message: bytes can be at most 32766 in length; got 94981]; nested: MaxBytesLengthExceededException[bytes can be at most 32766 in length; got 94981]; 

Please note that the Orion Context Broker does not have any problem handling such long structures. The error was retrieved from the QuantumLeap component.

Would it be possible to extend the size capacity of the component to handle such long payloads? I personally need it to deal with accelerometer data.
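
The error comes from the 32766-byte limit on indexed string terms, so one candidate workaround (an untested assumption, with made-up table and column names) is to create the offending column with indexing disabled:

from crate import client

conn = client.connect("http://crate:4200")  # assumed endpoint

# Store long strings without indexing them, so the 32766-byte limit on
# indexed terms no longer applies.
conn.cursor().execute("""
    CREATE TABLE IF NOT EXISTS et_device (
        entity_id STRING,
        time_index TIMESTAMP,
        "value" STRING INDEX OFF
    )
""")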

geo:json for everyone

As suggested by @chicco785, it would be interesting to have, for entities expressing location in a non-GeoJSON format (say, with country codes and addresses), an under-the-hood transformation to an equivalent GeoJSON structure, so that geo-queries can be made across different entities expressing locations in different ways.

I am opening this issue to keep track of the discussion of this idea.
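
A rough sketch of the idea using geopy's Nominatim geocoder (the library choice and the function are illustrative assumptions):

from geopy.geocoders import Nominatim

geocoder = Nominatim(user_agent="quantumleap-sketch")

def address_to_geojson(address):
    """Turn a street address into an equivalent GeoJSON Point, if possible."""
    location = geocoder.geocode(address)
    if location is None:
        return None
    # GeoJSON uses (longitude, latitude) ordering.
    return {"type": "Point",
            "coordinates": [location.longitude, location.latitude]}

print(address_to_geojson("Bonn, Germany"))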

OSM geocoder may not return expected osm_type

Since 19 Nov 2018, the following geocoding tests fail:

  • test_entity_add_point: OSM used to return an osm_type of node for the address used in this test, but it now returns an osm_type of way, which results in our geocoding module not being able to extract the geo-location coords from the response, since we expect node for an address in the format <street name> <street number>, <city>, <country code>.
  • test_caching: this fails because we use the same address as in test_entity_add_point, so we can't extract the location from the OSM response and hence it won't be in the cache.

It could be that the data in OSM for that location got corrupted, or that a new version of Nominatim (the geocoder) went live and the API behaviour changed. Or it could be something else. But note that the other full-blown address we use in our tests actually works fine, i.e. we get back an osm_type of node in the response, so we manage to extract the location in that case.

On another note, it doesn't look like the OSM project is in good shape!

We have to decide how to fix this...
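
One possible fix, assuming Nominatim keeps returning usable lat/lon (centroid) for way results, is to stop requiring osm_type == node and accept any result that carries coordinates. A sketch:

# Hypothetical fix: accept both node and way results.
ACCEPTED_OSM_TYPES = {"node", "way"}

def extract_coords(osm_result):
    """Pull (lon, lat) out of a Nominatim result dict, if usable."""
    if osm_result.get("osm_type") not in ACCEPTED_OSM_TYPES:
        return None
    if "lat" not in osm_result or "lon" not in osm_result:
        return None
    return float(osm_result["lon"]), float(osm_result["lat"])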

Query api follow-ups

  • Better treatment for missing "type" param in the query
  • Add test to make sure a reasonable response is returned when no entity matches any of the supported queries

Support multi-tenancy using service path concept of context broker

When we deal with different scenarios that share a common infrastructure but should not share access to data, we need a way to isolate the data.
A simple solution, like the one used by the context broker with MongoDB, is to have one database per tenant (identified by the service path in the request header).
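
In CrateDB terms, this could map the tenant header to a schema name, roughly like this (the header handling and the naming scheme are assumptions for illustration):

import re

from flask import request

def tenant_schema(default="doc"):
    """Derive a CrateDB schema name from the Fiware-Service header."""
    service = request.headers.get("Fiware-Service", "")
    if not service:
        return default
    # Schema names must be lowercase and free of special characters.
    return "mt" + re.sub(r"[^a-z0-9_]", "_", service.lower())

# Data for tenant "smartcity" would then live in, e.g., mtsmartcity.et_device.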

Support a default way in the entity data model to track entity instances ownership.

Scenario:

  1. Keep track of who injected a certain entity instance. This should be a "compulsory" attribute in the data model; it could be the dataProvider field defined in the GSMA Commons:
      "type": "object",
      "properties": {
        "id": {
          "$ref": "#/definitions/EntityIdentifierType"
        },
        "dateCreated": {
          "type": "string",
          "format": "date-time"
        },
        "dateModified": {
          "type": "string",
          "format": "date-time"
        },
        "source": {
          "type": "string"
        },
        "name": {
          "type": "string"
        },
        "alternateName": {
           "type": "string" 
        },
        "description": {
           "type": "string" 
        },
        "dataProvider": {
          "type": "string"
        }
      }
    },
  2. If a model is injected that does not have the field, raise a warning (see the sketch after this list).
  3. Based on an external authz mechanism, allow a user to access a given entity instance based on the policy defined by the data provider. This should work both via the NGSI API and the CrateDB API (see #17).
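
A minimal sketch of the warning in item 2 (the check itself and the logger setup are just an illustration; the attribute name follows the schema above):

import logging

logger = logging.getLogger(__name__)

def check_ownership(entity):
    """Warn when an injected entity carries no dataProvider attribute."""
    if "dataProvider" not in entity:
        logger.warning(
            "Entity %s/%s has no dataProvider: ownership cannot be tracked.",
            entity.get("type"), entity.get("id"))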

Food4Thought: Complementing Analytics Service?

In huge queries, CrateDB limits the response to the first 100 entries.
What if we supported a response of 100 entries, but evenly distributed across the specified query range?

  • How to do the regression so that the user can see the overall trend without having to ingest the whole dataset.

  • It'd be nice to "repeat" the zooming query across different attributes.

It might be that the approach is to have a complementary "data analytics" microservice taking care of this.
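
As a sketch of the "evenly distributed" idea, one could bucket the time range and aggregate per bucket instead of taking the first 100 raw rows (table and column names are made-up assumptions):

from crate import client

conn = client.connect("http://crate:4200")  # assumed endpoint
cursor = conn.cursor()

# One averaged point per hourly bucket, capped at 100 buckets.
cursor.execute("""
    SELECT date_trunc('hour', time_index) AS bucket,
           avg(temperature) AS avg_temperature
    FROM et_device
    WHERE time_index BETWEEN ? AND ?
    GROUP BY date_trunc('hour', time_index)
    ORDER BY bucket
    LIMIT 100
""", (1514764800000, 1546300799000))  # epoch millis for the query range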

Authentication / Authorisation Proxy

May not be a "direct" issue related to QuantumLeap, but a side one.
When accessing from Grafana (assuming we use the existing driver), there should be a way to provide some sort of access control on tables and on rows inside a table.

Handle aggregation on invalid columns

sum and avg should not be allowed on attr_names (attributes) of non-numeric types.

Note that min, max and count still work on things like bool or string.
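
A small sketch of the validation this implies (the type names follow CrateDB, but the mapping itself is an assumption):

NUMERIC_TYPES = {"short", "integer", "long", "float", "double"}
NUMERIC_ONLY_METHODS = {"sum", "avg"}

def validate_aggregation(aggr_method, column_type):
    """Raise if the aggregation method cannot apply to the column type."""
    if aggr_method in NUMERIC_ONLY_METHODS and column_type not in NUMERIC_TYPES:
        raise ValueError(
            "{} is not supported on columns of type {}".format(
                aggr_method, column_type))

validate_aggregation("max", "string")  # fine: min/max/count work on strings

try:
    validate_aggregation("avg", "boolean")
except ValueError as e:
    print(e)  # avg is not supported on columns of type boolean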

private word data into structure

The error below pops up when I include the word "data" as part of the attributes. Not sure whether this is a case similar to the previously mentioned issues (i.e., the one about "dateModified"). I mention this just to let you know that it might be an issue in further developments; I have already worked around it by using a different word.

ERROR in app: Exception on /notify [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "reporter/reporter.py", line 89, in notify
    assert len(payload) == 1, 'Multiple data elements in notifications not supported yet'
AssertionError: Multiple data elements in notifications not supported yet

pin pipenv version in docker file

The latest version of pipenv (2018.11.14 on PyPI) doesn't seem to be able to generate a requirements file when the output is redirected to a file, which is the way we have it in our Dockerfile. In fact, running

pipenv lock -r > requirements.txt

with version 2018.11.14 when building our Docker image produces a requirements.txt containing the text " Running.. ". So we should pin the version to the one we tested, i.e. 2018.10.13.
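
The pin itself would presumably be a one-line change in the Dockerfile, along these lines (the surrounding instructions depend on what our Dockerfile actually looks like):

RUN pip install pipenv==2018.10.13
RUN pipenv lock -r > requirements.txt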

Support multiple endpoints for crate

Suppose you have a Crate cluster in which each node exposes a different endpoint; it would be useful to have some sort of round-robin process that connects to the different hosts in case one of them is not reachable.
This may not be an issue in Docker Swarm where, hopefully, using healthchecks, the swarm removes an endpoint from the "discovery" feature if it is not healthy.
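
For what it's worth, the Crate Python client already accepts a list of servers and rotates through them, temporarily skipping ones that fail, so the fix may be as small as passing all the cluster endpoints through (hostnames are illustrative):

from crate import client

# The client round-robins across the configured servers, giving basic
# failover when one node is unreachable.
conn = client.connect(
    ["crate-node-1:4200", "crate-node-2:4200", "crate-node-3:4200"])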

timestamps

When I include this attribute in the JSON structure:

"dateModified": {
    "value": "2017-01-18T20:45:42.697Z-0800"
  }

This error emerges:

crate.client.exceptions.ProgrammingError: SQLActionException[ColumnValidationException: Validation failed for time_index: {"metadata"={}, "type"='Text', "value"='2017-08-26T21:43:33.00Z'} cannot be cast to type timestamp]
207.249.127.152 - - [28/Dec/2017 20:30:54] "POST /notify HTTP/1.1" 500 -
INFO:werkzeug:207.249.127.152 - - [28/Dec/2017 20:30:54] "POST /notify HTTP/1.1" 500 -
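
Two things stand out: the reported value mixes a Z suffix with a numeric offset, which is not valid ISO 8601, and the error shows the whole attribute object (type, value and metadata) being cast to a timestamp rather than just its value. A sketch of extracting and validating the value first (assuming python-dateutil as the parser; not the project's actual code):

from dateutil.parser import parse

def to_time_index(attr):
    """Extract and validate the timestamp inside a dateModified attribute."""
    # The attribute arrives as {"type": ..., "value": ..., "metadata": ...};
    # only the value can be cast to a timestamp, not the whole object.
    return parse(attr["value"])  # raises ValueError on malformed dates

print(to_time_index({"type": "Text", "value": "2017-08-26T21:43:33.00Z",
                     "metadata": {}}))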

Provide basic documentation

  1. How to deploy using docker (provide a basic stack made of context broker, quantum leap, crate, grafana; see the sketch after this list)
  2. How to send notifications from context broker to quantum leap
  3. How to connect grafana to crate
  4. How to run queries in crate / against crate (using the http endpoint)
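
As a rough starting point for item 1, a minimal composition could look like this (image tags, ports and the CRATE_HOST variable are assumptions to check against the actual images' documentation):

version: "3"
services:
  mongo:
    image: mongo:3.6
  orion:
    image: fiware/orion
    command: -dbhost mongo
    ports:
      - "1026:1026"
  crate:
    image: crate
    ports:
      - "4200:4200"
  quantumleap:
    image: smartsdk/quantumleap
    environment:
      - CRATE_HOST=crate
    ports:
      - "8668:8668"
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"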

virtual entity support

Ideally, we should be able to create "virtual entities" from existing entities. These virtual entities should ideally be created as "views" over the database.

For example, suppose we have a collection of parking sensors. You would like to have aggregated information by parking lot (assuming all parking sensors carry such information, i.e. the lot to which they belong).

Ideally this could be done by having an endpoint that allows creating "virtual entities" rendered via a database view. Of course, the view would have to be "clever" enough.

In fact, computing "a time series" view (made of different entities with different timestamps) is not that easy.

Ideas?
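
As one starting point for the parking example, a plain database view could look like the sketch below (CrateDB has supported views since 3.0; table, column and view names are made up). The harder time-series alignment problem mentioned above is left open:

from crate import client

conn = client.connect("http://crate:4200")  # assumed endpoint

# Aggregate parking-spot readings by lot and hourly bucket.
conn.cursor().execute("""
    CREATE VIEW parking_lot_occupancy AS
    SELECT lot_id,
           date_trunc('hour', time_index) AS bucket,
           avg(CASE WHEN occupied THEN 1 ELSE 0 END) AS occupancy_rate
    FROM et_parkingspot
    GROUP BY lot_id, date_trunc('hour', time_index)
""")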

Delete all entities should drop table.

Delete all entities does not seem to drop the table, just all the records. Check why, and try to force the table drop, as this will allow "starting from scratch" with entity_type foo.

support keyValue mode

I suppose that, using a parser, it should also be possible to support the keyValues mode.
Orion supports it, so there is no reason why QL should not be able to do so.
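
A sketch of such a parser, turning a keyValues payload back into the normalized form QL already understands (the type-inference rules here are simplistic assumptions):

def infer_type(value):
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, (int, float)):
        return "Number"
    if isinstance(value, (dict, list)):
        return "StructuredValue"
    return "Text"

def normalize(entity):
    """Wrap plain keyValues attributes into {"type": ..., "value": ...}."""
    normalized = {"id": entity["id"], "type": entity["type"]}
    for name, value in entity.items():
        if name not in ("id", "type"):
            normalized[name] = {"type": infer_type(value), "value": value}
    return normalized

assert normalize({"id": "Room1", "type": "Room", "temperature": 23.5}) == {
    "id": "Room1", "type": "Room",
    "temperature": {"type": "Number", "value": 23.5}}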

geographical queries

AFAIK, while CrateDB supports geo queries quite well, there is currently no way to issue them through the API.

See geographical queries in the spec:

and CrateDB's support:

The following should be easy to implement:

  • georel=intersects. Denotes that matching entities are those intersecting with the reference geometry.
    • maps to MATCH (column_ident, query_term) using intersects
  • georel=coveredBy. Denotes that matching entities are those that exist entirely within the reference geometry. When resolving a query of this type, the border of the shape must be considered to be part of the shape.
    • maps to MATCH (column_ident, query_term) using within

A bit more complex:

  • georel=near. The near relationship means that matching entities must be located within a certain threshold distance of the reference geometry. It supports the following modifiers: maxDistance, which expresses, in meters, the maximum distance at which matching entities must be located; and minDistance, which expresses, in meters, the minimum distance at which matching entities must be located.

Still, it seems there are quite a few hints here on how to solve that:
https://crate.io/a/geospatial-queries-with-crate-data/
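
A sketch of the easy mappings above (query construction only; parameter parsing and escaping are left out, and names are assumptions):

# Hypothetical translation of NGSI georel values into CrateDB MATCH clauses.
GEOREL_TO_MATCH = {
    "intersects": "intersects",
    "coveredBy": "within",
}

def geo_clause(georel, column, wkt_shape):
    """Build the WHERE fragment for a supported geo relationship."""
    if georel not in GEOREL_TO_MATCH:
        raise ValueError("unsupported georel: {}".format(georel))
    return "MATCH (\"{}\", '{}') USING {}".format(
        column, wkt_shape, GEOREL_TO_MATCH[georel])

print(geo_clause("coveredBy", "location",
                 "POLYGON ((5 5, 10 5, 10 10, 5 10, 5 5))"))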

Support retention policy

How long should we keep the data of a given entity, and at which resolution?
E.g. after 1 year, I may be happy to keep, instead of all the data, only an interpolation of the data at a 1h resolution or so.
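
A sketch of the simplest half of this, dropping raw data older than a cutoff (table name and retention window are made-up assumptions; downsampling the old data before deleting it would be the harder, second half):

import time

from crate import client

ONE_YEAR_MS = 365 * 24 * 3600 * 1000

conn = client.connect("http://crate:4200")  # assumed endpoint
cutoff = int(time.time() * 1000) - ONE_YEAR_MS

# Run periodically, e.g. from a cron job.
conn.cursor().execute(
    "DELETE FROM et_device WHERE time_index < ?", (cutoff,))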

Define and document policy for missing values

Data injectors often send a "null" or empty value for some attributes.

At the moment QL discards received notifications with this kind of "null" input in any of the attributes, but we could implement something more flexible so as not to lose the rest of the incoming valid data.

Define a policy for dealing with these cases.
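
One candidate policy (just a sketch of the "more flexible" option, not a decision): drop the null/empty attributes but keep the rest of the notification.

def strip_missing(entity):
    """Keep id, type and every attribute that carries a usable value."""
    kept = {}
    for name, attr in entity.items():
        if name in ("id", "type"):
            kept[name] = attr
        elif isinstance(attr, dict) and attr.get("value") not in (None, ""):
            kept[name] = attr
    return kept

assert strip_missing({
    "id": "Room1", "type": "Room",
    "temperature": {"type": "Number", "value": 23.5},
    "humidity": {"type": "Number", "value": None},
}) == {"id": "Room1", "type": "Room",
       "temperature": {"type": "Number", "value": 23.5}}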

Problems with aggrPeriod attribute

I'm trying to use the aggrPeriod parameter in a query to /entities/../attrs/...? and I do not know whether it does not work or whether I have the wrong idea about it.

In my opinion, if I add the parameter aggrPeriod=minute together with the method aggrMethod=avg, the query should return the average of the values, minute by minute.

Is this what should happen?
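
For reference, this is the kind of request I mean (entity and attribute names are made up):

GET /v2/entities/Room1/attrs/temperature?aggrMethod=avg&aggrPeriod=minute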
