
historian's Issues

Handle "late records" case

Some timestamped records might reach the Data Historian with high latency (a lag of several minutes, days, or years). These records should integrate seamlessly into the indexing workflow: they should eventually be added to the relevant chunk.

For a given timestamp range, should we consider only one chunk, or is it better to handle multiple chunks?

The server REST API should allow the creation of data points

It should be possible to add points through the REST API, with something like:
https://:XXX/historian-server/v1/points/create
{
  "TagName": "openSpaceSensors.Temperature",
  "points": [
    {
      "TimeStamp": "2020-02-17T20:03:10.000Z",
      "Value": "25",
      "Quality": 3
    }
  ]
}
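
A minimal client-side sketch of what calling such an endpoint could look like, assuming the JSON schema above. The endpoint does not exist yet, so the host, port and path here are placeholders, not the actual API:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreatePointsExample {
    public static void main(String[] args) throws Exception {
        // Request body following the proposed (not yet implemented) schema.
        String body = "{"
                + "\"TagName\": \"openSpaceSensors.Temperature\","
                + "\"points\": [{"
                + "  \"TimeStamp\": \"2020-02-17T20:03:10.000Z\","
                + "  \"Value\": \"25\","
                + "  \"Quality\": 3"
                + "}]}";

        // "localhost:8080" is a placeholder: the real host and port are not defined yet.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/historian-server/v1/points/create"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}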

Atomic Compaction possibly via newly created index

Re-compacting existing data (possibly including "late records") must have no impact on queries. That is to say, a new index (or part of one) could be created, as is done in open analytics, and once the compaction is done and the data is clean, the newly created index replaces the online one.
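
One possible way to get that switch atomically with Solr, sketched below, is to build the compacted data in a fresh collection and then repoint a collection alias once the new data is verified; readers keep querying the alias the whole time. The collection and alias names are made up for the example, and the issue does not prescribe this exact mechanism:

import java.util.Collections;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class AliasSwapSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical names: queries always go through the "historian" alias.
        String alias = "historian";
        String newCollection = "historian_compacted_2020_02";

        try (CloudSolrClient solr = new CloudSolrClient.Builder(
                Collections.singletonList("localhost:9983"), Optional.empty()).build()) {
            // ... the compaction job writes chunks into newCollection and verifies them ...

            // Atomically repoint the alias to the freshly compacted collection:
            // queries hit the old collection until the swap, then only the new one.
            CollectionAdminRequest.createAlias(alias, newCollection).process(solr);
            // The old collection can be dropped once nothing references it anymore.
        }
    }
}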

Provide tar.gz version of major components

To ease simple deployments we need a simple tar.gz we can just unpack. Docker is fine, but not all customers want Docker images, for security reasons; moreover, Docker requires some knowledge. A simple pre-configured tar.gz with simple start/stop scripts for the services is enough.

Test and document the use of Kura to inject points into the historian

Kura offers most of the connectors we need to integrate the factory with our data historian. We can document the use of Kura and, if needed, extend it with some components we already have. This issue depends on the issue for adding data points through the historian server REST API.

[Gateway] search api could be improved

Current behaviour

The search endpoint currently does a "contains" query on the request input.

Expected behaviour

The search endpoint should be more sophisticated, so that what it returns depends on the input in a more flexible way. For example, for the request:

{ target: 'upper_50' }

the answer should return only metrics that look like upper_50:

["upper_25","upper_50","upper_75","upper_90","upper_95"]

and not only names containing the input:

["upper_501","aupper_50","upper_50"]

Re-engineering of the engine to get a light engine

A re-engineering of the engine should allow it to rely on a direct connection to the historian server to push time series, and allow LogIsland real-time processors to be plugged into the historian server directly. This means changing the class-loading mechanism in LogIsland. As a matter of fact, this issue has to be pushed onto LogIsland, but it is kept here for tracking purposes.

[Gateway] Grafana search api output should depend on input

Current behaviour

The search endpoint response currently does not depend on user input; it returns all existing metrics.

Expected behaviour

The search endpoint response should depend on the input. For example, for the request:
{ target: 'upper_50' }
the answer should return only metrics that look like upper_50:
["upper_25","upper_50","upper_75","upper_90","upper_95"]
and not:
["hello","temp","pression"]

You can look at the documentation of the Simple JSON plugin if you want more examples:
https://grafana.com/grafana/plugins/grafana-simple-json-datasource

The maximum number of targets returned should be configurable in the configuration file of the HTTP verticle.

Which algorithm should be used to filter results?

  • Only names starting with the request?
  • Only names containing the request?
  • Note that the name field on which we run the request is of type string and contains a single word.
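
As a rough illustration of the first two options, here is a gateway-side sketch of prefix and contains filtering over the known metric names, capped by the configurable maximum mentioned above. The class and method names are made up, and a real implementation would more likely push the filtering down to Solr:

import java.util.List;
import java.util.stream.Collectors;

public class SearchFilterSketch {

    /** Option 1: keep only names starting with the requested target. */
    static List<String> startingWith(List<String> names, String target, int maxResults) {
        return names.stream()
                .filter(n -> n.startsWith(target))
                .distinct()
                .sorted()
                .limit(maxResults)
                .collect(Collectors.toList());
    }

    /** Option 2: keep only names containing the requested target. */
    static List<String> containing(List<String> names, String target, int maxResults) {
        return names.stream()
                .filter(n -> n.contains(target))
                .distinct()
                .sorted()
                .limit(maxResults)
                .collect(Collectors.toList());
    }
}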

Conditions to close this issue

  • implement a solution
  • add an integration test

Alerts should be stored in the backend

As of today, alerts are created via Grafana and associated with graphs. We need to store alerts in the back end of the data historian, especially since alerts are also created by the real-time LogIsland part.

We should be able to specify more info than just the metric in graph panels

For example, the user should be able to choose, for each metric:

  • The sampling algorithm to use
  • The bucket size to use (but should we allow this parameter if it conflicts with the graph's maxDataPoints?)

I think that if the user specifies a bucket size that is too small, the historian should ignore it, because a bucket size that is not adapted to the number of points in the specified time range could freeze the historian.

Be able to visualize some tag on graph (annotations)

Annotation configuration for the plugin should be similar to the Grafana built-in plugin, i.e. being able to filter on tags.

On the gateway side, it should support a POST /annotations request with a body like:

{
  "range": {
    "from": "2016-04-15T13:44:39.070Z",
    "to": "2016-04-15T14:44:39.070Z"
  },
  "rangeRaw": {
    "from": "now-1h",
    "to": "now"
  },
  "limit": 100,
  "tags": ["tag1", "tag2"],
  "matchAny": true,
  "type": "tags"
}

And it should respond with something like:

[
  {
    "time": 1581075145188,
    "timeEnd": 1581075145188,
    "text": "bbbb",
    "tags": [
      "tag1"
    ]
  }
]

Here are the specifications:

  • "time" is required
  • "timeEnd" is optional (it is used only for events in a range)
  • "text" is required; it describes the event
  • "tags" can be empty, but all tags should be returned if there are any

So those four fields should be saved in Solr documents, for now in a separate collection "annotations".
The historian would return them in the expected response format when it receives the corresponding request.

Here is a description of the fields of the request:

  • "range" and "rangeRaw" describe the time range in which we want to find annotations. Please use the same method to extract the time as is done for other endpoints like /query, for consistency of the code. So we should only return annotations whose "time" is in the requested range.
  • "limit" limits the maximum number of annotations to return.
  • "tags": if the request "type" is "tags", this is used to filter annotations by tags; otherwise it is not used.
  • "matchAny": if true, we should return any annotation containing at least one of the tags. If false, we should return only annotations containing all the tags.
  • "type": it is either "tags" or "all". The "tags" type means we want to filter by tags; the "all" type means we will return all annotations.

The Solr schema should be:

  • time: long
  • time_end: long
  • tags: string, multivalued
  • description: text
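
A sketch of how the /annotations handler could translate the request fields above into a Solr query over the "annotations" collection, using the proposed schema. The class name, method signature and wiring are illustrative only:

import java.util.List;
import java.util.stream.Collectors;

import org.apache.solr.client.solrj.SolrQuery;

public class AnnotationsQuerySketch {

    /** Build a Solr query for the "annotations" collection from a parsed /annotations request. */
    static SolrQuery buildQuery(long from, long to, List<String> tags,
                                boolean matchAny, String type, int limit) {
        SolrQuery query = new SolrQuery("*:*");
        // Only return annotations whose "time" falls inside the requested range.
        query.addFilterQuery("time:[" + from + " TO " + to + "]");

        // Tag filtering only applies when the request type is "tags".
        if ("tags".equals(type) && tags != null && !tags.isEmpty()) {
            String operator = matchAny ? " OR " : " AND ";   // matchAny: at least one tag vs all tags
            query.addFilterQuery(tags.stream()
                    .map(t -> "tags:\"" + t + "\"")
                    .collect(Collectors.joining(operator)));
        }
        query.setRows(limit);   // "limit" caps the number of annotations returned
        return query;
    }
}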

Conditions for closing this issue:

  • Implement the /annotations endpoint according to the specifications
  • Add integration tests
  • Test the solution with Grafana

Minimal installation of Historian should be possible

We should be able to make the data historian available on a single node with the historian server, the Solr back end, and Grafana for visualisation. All other components, like Kafka, Spark, and LogIsland, should only be installed for very large volumes, real time, or very advanced analytics.

Setup grafana dev environments

Follow this tutorial: https://medium.com/@ivanahuckova/how-to-contribute-to-grafana-as-junior-dev-c01fe3064502
I recommend using Visual Studio Code and installing the Go plugin.

Once the setup is done, we will be able to make our own datasource plugin based on the Simple JSON datasource plugin.

For this we forked the Grafana repo (under the Hurence user). So if you have already done the tutorial with the Grafana repo, you can follow this tutorial to change the remote URL to Hurence's one: https://help.github.com/en/github/using-git/changing-a-remotes-url

Importing data using simple CSV or Excel import should be possible

Currently the only way to inject data into the historian is either through big batch Spark processes or through real-time injection via Kafka/LogIsland. A simple mechanism to import CSV or Excel files should be available, and maybe also a REST API to interact with (the current gateway is only for consuming data).
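
A minimal sketch of what a CSV import could look like once a point-creation endpoint exists (see the REST API issue above). The column layout, file name and helper are all assumptions:

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class CsvImportSketch {
    public static void main(String[] args) throws Exception {
        // Assumed layout: tag_name,timestamp,value  (header on the first line).
        List<String> lines = Files.readAllLines(Path.of("points.csv"));
        for (String line : lines.subList(1, lines.size())) {
            String[] cols = line.split(",");
            String json = "{\"TagName\": \"" + cols[0] + "\","
                    + "\"points\": [{\"TimeStamp\": \"" + cols[1] + "\", \"Value\": \"" + cols[2] + "\"}]}";
            // postToHistorian is hypothetical: it would POST the payload to
            // /historian-server/v1/points/create as in the earlier example.
            postToHistorian(json);
        }
    }

    static void postToHistorian(String json) { /* HTTP POST, omitted */ }
}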

Compaction job to handle unitary records in Solr

Real-time indexation of timestamped records stores unitary records in Solr (one value per record). This is useful to give access to these records in real time.
But there is an underlying need to compact these records into chunks on a regular basis for performance and storage concerns.

The compaction job steps:

  • read all the relevant unitary data from Solr
  • process this data into chunks
  • delete the unitary data and inject the corresponding compacted data. This operation must be atomic to prevent any discrepancy in the indexed data

This job should eventually be compatible with other time series backends (other than Solr).
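
A very rough SolrJ sketch of those three steps, assuming hypothetical collection names and a chunkOf() helper that turns unitary documents into one chunk document. Note that plain add-then-delete is not atomic by itself, which is exactly what the "Atomic Compaction" issue above is about:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

public class CompactionSketch {

    // Hypothetical helper: groups the unitary records of one metric into a chunk document.
    static SolrInputDocument chunkOf(SolrDocumentList unitaryDocs) { /* ... */ return new SolrInputDocument(); }

    static void compact(SolrClient solr, String metricName) throws Exception {
        // 1. Read all the relevant unitary data from Solr.
        SolrQuery query = new SolrQuery("name:\"" + metricName + "\"").setRows(Integer.MAX_VALUE);
        SolrDocumentList unitary = solr.query("historian-unitary", query).getResults();

        // 2. Process this data into chunks.
        SolrInputDocument chunk = chunkOf(unitary);

        // 3. Inject the compacted data, then delete the unitary data.
        //    (Not atomic as written: a real job would need an alias swap or a similar scheme.)
        solr.add("historian-chunks", chunk);
        solr.deleteByQuery("historian-unitary", "name:\"" + metricName + "\"");
        solr.commit("historian-chunks");
        solr.commit("historian-unitary");
    }
}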

The trigger to run the compaction job should be configurable, based on:

  • time (every X minutes)
  • number of records (every X records)
  • number of records per partition (every X records per partition)
  • a mixture of previous criteria

Search API should never return duplicate names in the response

Current behaviour

The metric names returned contain duplicates.

Expected behaviour

When I query /search, the response should not contain duplicate names.

How to reproduce

Insert several chunks with the same name, then query the search endpoint: the response will contain duplicates.

Conditions

Fix this and add a test.
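
One way to avoid the duplicates, sketched here, is to let Solr return each distinct name once by faceting on the name field instead of collecting names from individual chunk documents. The field and collection names are assumptions:

import java.util.List;
import java.util.stream.Collectors;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;

public class DistinctNamesSketch {

    /** Return each metric name at most once, using a facet on the "name" field. */
    static List<String> distinctNames(SolrClient solr, String collection, int limit) throws Exception {
        SolrQuery query = new SolrQuery("*:*");
        query.setRows(0);                 // we only care about the facet values, not the documents
        query.setFacet(true);
        query.addFacetField("name");      // assumed field holding the metric name
        query.setFacetLimit(limit);

        return solr.query(collection, query).getFacetField("name").getValues().stream()
                .map(count -> count.getName())
                .collect(Collectors.toList());
    }
}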
