amrc-factoryplus / amrc-connectivity-stack

The AMRC Connectivity Stack (ACS) is an open-source implementation of the AMRC's Factory+ framework

Home Page: https://factoryplus.app.amrc.co.uk

License: MIT License

Smarty 0.27% Dockerfile 1.13% Makefile 0.40% JavaScript 25.26% HTML 0.27% PLpgSQL 0.69% TypeScript 15.45% Shell 0.19% Python 4.02% PHP 21.89% CSS 0.47% Vue 24.16% Blade 0.25% Java 5.56%

amrc amrc-connectivity-stack factory-plus factoryplus mqtt sparkplug

amrc-connectivity-stack's People

Contributors

Stargazers

Watchers

Forkers

grigals derme302

amrc-connectivity-stack's Issues

Create an initial discovery endpoint

Create an endpoint for initial discovery bootstrapping, given nothing but the base cluster domain name.

This should be accessible without authentication.
This should provide at least the Directory service URL.
This should also provide the information needed to build a krb5.conf.

The main traefik configuration then needs adjusting to serve (or redirect to) this endpoint from the base cluster URL. This should only happen for machine access, i.e. with Accept: application/json or something similar. We want to keep the option for human access to be redirected to some human interface, like the Manager.
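As a rough illustration of what a client might do with such an endpoint, the sketch below renders a minimal krb5.conf from a discovery document. The JSON field names (`directory_url`, `kerberos.realm`, `kerberos.kdc`, `kerberos.admin_server`) and the sample hostnames are assumptions invented for this example; the issue does not define a response format.

```python
import json

# Hypothetical discovery document. Every field name and hostname here is an
# assumption for illustration, not a defined ACS format.
SAMPLE_DISCOVERY = json.loads("""
{
  "directory_url": "https://directory.example.com",
  "kerberos": {
    "realm": "EXAMPLE.COM",
    "kdc": "kdc.example.com",
    "admin_server": "kadmin.example.com"
  }
}
""")


def build_krb5_conf(discovery: dict) -> str:
    """Render a minimal krb5.conf from the (assumed) discovery document."""
    krb = discovery["kerberos"]
    realm = krb["realm"]
    lines = [
        "[libdefaults]",
        f"    default_realm = {realm}",
        "",
        "[realms]",
        f"    {realm} = {{",
        f"        kdc = {krb['kdc']}",
        f"        admin_server = {krb['admin_server']}",
        "    }",
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_krb5_conf(SAMPLE_DISCOVERY))
```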

[directory] Dynamic ACLs

There is little point in a client being able to search for e.g. Temperatures if that client doesn't have permission to read the data.

We need some way of automatically granting certain clients access to devices matching certain criteria. To start with I think this wants to be implemented by the Directory automatically adding devices to (Auth service) groups, so clients can then be given access to appropriate groups.

This depends on #188. This could be implemented as a function of the Directory, or as a separate service. Either way the criteria for adding a device to a group needs to be configurable.

[krbkeys] Changes are not always picked up

From time to time there is a problem with the operator not noticing the creation of new KerberosKey objects. It is not yet clear what causes this. Restarting the operator causes it to pick up the new objects as part of its initial scan.

[directory] Preserve BIRTHs

It is becoming clear that for many purposes it would be useful for the Directory to maintain a record of all active BIRTHs.

Ideally these should be stripped of dynamic data values (which will be stale), but have static values (which only change on BIRTH) left in place. The static values include important information like the device and schema UUIDs. Currently I don't think there is a reliable way to distinguish static from dynamic values; probably we should standardise a metric property.

Birth certificates should probably be made available both in some suitable JSON format (for consistency) and as a binary Sparkplug packet which will be easier to consume using libraries that can already decode Sparkplug. Be aware that there are at least two mappings from Sparkplug to JSON (the Tahu JS library mapping, and the mapping used by the protobuf tools).

Move k5start to a sidecar

All services using k5start could have that process moved into a sidecar container. The HiveMQ update already does this as it was simpler than pulling in k5start.

Advantages:

The running process does not need access to its (client) keytab; instead it only has access to time-limited tickets in the ccache. This is a significant security improvement.
Given a standard image containing k5start, there is no need for that binary in the other images.

[edge] Don't restart the whole app when reloading the config file

There is no need to restart the whole app when the config file is reloaded.

The Sparkplug connection can be left entirely alone. We need to republish the config-related metrics.
The southbound connections only need reconnecting if they've changed. Changing one connection should not need to affect any others.
The Devices will need to be rebirthed if their schema mapping has changed, but not otherwise.
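The selective reconnect described above amounts to diffing the old and new configuration. A minimal sketch, assuming (hypothetically) that southbound connections are held as a dict keyed by connection name with plain dict settings; the real Edge Agent config layout may differ:

```python
def diff_connections(old: dict, new: dict) -> dict:
    """Classify connections by name so that only changed ones are touched."""
    return {
        # Present only in the new config: open these.
        "open": sorted(n for n in new if n not in old),
        # Present only in the old config: close these.
        "close": sorted(n for n in old if n not in new),
        # Present in both but with different settings: reconnect these.
        "reconnect": sorted(n for n in new if n in old and new[n] != old[n]),
    }


if __name__ == "__main__":
    old = {"plc1": {"host": "10.0.0.1"}, "plc2": {"host": "10.0.0.2"}}
    new = {"plc1": {"host": "10.0.0.1"}, "plc2": {"host": "10.0.0.9"},
           "plc3": {"host": "10.0.0.3"}}
    print(diff_connections(old, new))
```

Connections that appear in none of the three lists are unchanged and can be left running.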

[configdb] Work out what to do with deleted objects

I do not think it is ever a good idea to delete objects from the ConfigDB under normal circumstances. If a UUID has been used for a particular purpose it should not be reused for a different purpose, so we need to keep the object entry for ever to record that fact.

The real-world object represented by the ConfigDB object, however, might stop existing. We need a way to represent this. Currently I have hacked in a deleted property into General Information, but this is definitely not correct.

Options include:

Moving the deleted field into the Object Registration app, which means putting it directly in the database somewhere.
Making the ConfigDB 4D: keep a full history of all changes, including 'this object existed from this time to this time'. This is the correct answer but may be more work than we can justify.
Something else?

[manager] Device Profiles

The ACS Manager should support the concept of Profiles to enable the rapid onboarding of similar devices.

Profiles are templates that are created separately from device configurations but can be used as a starting point for configuring devices. They should be children of SchemaVersions, as they're only valid for a specific schema.

Features

Allow the ability to define variables in the profile that are substituted before instantiation
Provide the option of having changes to the profile propagate to all devices using the profile
Include a Save as Profile button on completed Schemas
- Only allow this on Valid schemas
- Somehow we need to enable replacement of variables (export as JSON?)
- Use the CDS importer structure behind the scenes?
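One way the variable substitution could work is sketched below, using `string.Template` from the Python standard library to fill `${name}` placeholders in every string of a JSON-like profile. The profile layout and the variable name `machine` are hypothetical; nothing here reflects how the Manager stores profiles.

```python
from string import Template


def instantiate(profile, variables: dict):
    """Return a copy of the profile with ${name} placeholders filled in.

    Raises KeyError if the profile uses a variable that was not supplied.
    """
    if isinstance(profile, dict):
        return {k: instantiate(v, variables) for k, v in profile.items()}
    if isinstance(profile, list):
        return [instantiate(v, variables) for v in profile]
    if isinstance(profile, str):
        return Template(profile).substitute(variables)
    return profile  # numbers, booleans and None are left untouched


if __name__ == "__main__":
    # Hypothetical profile fragment.
    profile = {"Temperature": {"Address": "${machine}.Temp", "Deadband": 0}}
    print(instantiate(profile, {"machine": "Lathe1"}))
```

With this approach a literal dollar sign in a profile string would have to be written as `$$`.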

User Interface

The user should be given the option to create a new configuration or utilise an existing profile when selecting a schema version.

[hivemq-krb] Handle dynamic ACLs better

When ACLs change, existing connections continue with their current permission set. This is starting to cause problems with our dynamic ACL adjustments, particularly when a client gets stuck with no permissions because they hadn't been set up yet at the point where it connected.

Handling this will be tricky. It will require a proper auth plugin rather than just setting a permission list at connection time. It will also require working out how to listen to our own MQTT traffic in order to get change-notify from the services; I think there are APIs for tracking packets, but we don't want to get slow.

[manager] Ensure that the Manager registers with the directory when it starts

The manager does not currently register itself with the directory when it starts, which is a breach of the framework specification. It should use 5960c63c-0245-427d-8923-1b5eca4c97eb as the Service_UUID.

[directory] ACLs for the UUIDs used by a Node

Currently Sparkplug Nodes can publish whatever Device UUIDs they like in their birth certificates. This means a malicious Node can 'steal' a Device from its legitimate publisher, and insert incorrect data into the historical record.

The Directory should verify

That the Instance_UUID in the Node's birth certificate is correct. This information should be preset in the Auth service.
That the Node is allowed to publish data for any Device UUIDs it publishes birth certificates for.

This means we need a permission 'This Node is allowed to publish data for this Device'. In order to avoid locking the system down to only allow Devices that have been defined in the Manager (one of the F+ principles is that we should be driven from the edge, and there are many use cases for Nodes which dynamically create Devices), we need

Automatic insertion of Devices into the ConfigDB.
A Node publishing a new Device (with a never-before-seen UUID) should be granted permission to do so permanently.

In the first instance a Node publishing a bad UUID should probably just raise an alert. Later it may be worth looking into whether the Node can be disabled somehow (switch off its MQTT permission?).
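The "first publisher keeps the Device" rule could be sketched as follows. This is an in-memory illustration only: the class and method names are hypothetical, and a real implementation would record the grant in the Auth service and ConfigDB rather than in a dict.

```python
class DeviceOwnership:
    """Track which Node is allowed to publish each Device UUID."""

    def __init__(self):
        self._owner = {}  # Device UUID -> Node UUID

    def check_birth(self, node_uuid: str, device_uuid: str) -> bool:
        """Return True if the Node may publish this Device.

        A never-before-seen Device UUID is granted permanently to the first
        Node that publishes it. A False result means an alert should be raised.
        """
        owner = self._owner.setdefault(device_uuid, node_uuid)
        return owner == node_uuid


if __name__ == "__main__":
    registry = DeviceOwnership()
    print(registry.check_birth("node-a", "device-1"))  # first sighting
    print(registry.check_birth("node-b", "device-1"))  # different Node
```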

[edge] Publish the current state of our southbound connections

One of the issues with edge clusters will be lack of access to logs. We need problems to be made much more visible.

The Edge Agent already publishes a Node birth certificate regardless of having southbound connections. This means it can report, as Node metrics, the current state of these connections, and any problems detected.

This is different from DDEATH, because connections and devices are not necessarily 1-1.

[manager] CSV importer

Supporting batch import of CSV data into the manager configuration would enable users to configure their devices from existing tag lists.

Features

Add a Download CSV import template button to download a blank template (with correct schema metric names)
Enable dynamic creation of sub-schema objects
There is a challenge around ensuring schema compatibility, so the default behaviour would be to reject the entire import if any metrics do not comply with the schema naming convention. A list of non-compliant tags should be returned to the user to correct.
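The all-or-nothing behaviour could look roughly like this. The column names `Metric` and `Address` and the shape of the return value are assumptions for the sketch, not the Manager's actual template.

```python
import csv
import io


def import_tags(csv_text: str, schema_metrics: set) -> dict:
    """Accept the import only if every row names a metric in the schema."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    bad = [row["Metric"] for row in rows if row["Metric"] not in schema_metrics]
    if bad:
        # Reject the whole import and report the offending names.
        return {"accepted": [], "rejected": bad}
    return {"accepted": rows, "rejected": []}


if __name__ == "__main__":
    schema = {"Temperature", "Pressure"}
    text = "Metric,Address\nTemperature,40001\nTemprature,40003\n"
    print(import_tags(text, schema))
```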

[configdb] Dump save is broken

Saving a dump includes etags in the result.

Remove Minikube references in readme

The Get Started section of the readme should be updated to generalise around K8s and not define the setup process for minikube.

Sparkplug Node identity belongs in the Auth service

After quite a bit of thought about situations (e.g. cmdesc) where we are trying to authenticate data received over MQTT, I have decided:

A Sparkplug Node is a security principal.
The Sparkplug address of the Node is another identity mapping to the principal UUID, alongside the Kerberos UPN.
Sparkplug addresses of Nodes should live in the Auth service.

There is a partial implementation of this already in the JS client library, which looks up addresses from the ConfigDB. It needs replacing with an API on this service.

[hivemq-krb] Pull out the F+ client library

It would be good to be able to make a Java client library available, even if incomplete.

However, working out where to publish the library to such that we can pull it into the build is tricky. The Github Maven repo requires a PAT to download packages.

Feature Req: Support for Data Value Re-Mapping

Allow users to re-map values collected from a device to a different value expected by a schema.

For example:

A Light.state schema defines the state of a light as ON = true and OFF = false. However, devices can return these values inverted due to engineering decisions. Devices in production cannot be modified to change this, so the inversion has to be done further upstream.
A CNC schema defines a spindle direction (0 = off, 1 = CW, 2 = CCW). Other CNCs use different numeric codes for the same states; for example, DMU uses 3 for clockwise spindle rotation, 4 for counterclockwise spindle rotation and 5 for spindle stop.

Expected Feature
Allow users to map a receivedValue to a desiredValue, similar to Grafana.
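A minimal sketch of such a mapping, using the two examples above. The function name and the choice to raise on an unmapped value are assumptions; passing unmapped values through unchanged would be an equally plausible design.

```python
# DMU spindle codes mapped to the schema's codes (0 = off, 1 = CW, 2 = CCW).
DMU_SPINDLE = {3: 1, 4: 2, 5: 0}

# A light whose device reports the inverse of the schema's ON = true.
INVERTED_LIGHT = {True: False, False: True}


def remap(received_value, mapping: dict):
    """Translate a value read from a device into the value the schema expects."""
    try:
        return mapping[received_value]
    except KeyError:
        raise ValueError(f"no mapping defined for {received_value!r}") from None


if __name__ == "__main__":
    print(remap(4, DMU_SPINDLE))
    print(remap(True, INVERTED_LIGHT))
```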

[manager] Allow enum and const values

This is required to allow schemas to specify and hardcode static values for the Value field. See AMRC-FactoryPlus/schemas#28 (comment) for more details. The requirement in the above PR is for a const (read-only input) but having this support enums too (Dropdown) would be a nice addition whilst we're at it.

Deploy the service configurations

We are creating a lot of Auth groups here. These need to be deployed somehow; work out where this lives.

[configdb] We need a more complete ACL language

The predefined permissions are inadequate. For example, if we define an 'Edge Agent' config app, we want to be able to give a particular Edge Agent permission to read its own config and no others. That is currently impossible.

We need an extensible set of permissions based on templates, like the MQTT plugin.

[manager] Editing Non-Active Connection Overwrites it with Current Active connection

Issue

With two node connections, editing the non-active one overwrites its settings with those of the active connection.

Workaround

Change connection to active, then edit it, then switch back.

[directory] Use HTTP command escalation

We should be using the HTTP command escalation interface.

[krbkeys] Pull out the F+ client library

This means working out how to publish it somewhere so we can pull it back in again.

[configdb] Separate HTTP and MQTT pods

Ideally it would be good to scale up the web api part of the ConfigDB. This cannot be done unless the MQTT part is a separate process. This means the change-notify needs to happen via the database, as the Directory does.

[directory] Create ConfigDB objects for discovered Devices

The Directory should create ConfigDB objects for Device UUIDs under the Sparkplug Device class.

This should include raising an alert (and refusing to index a Device) if we get a UUID conflict with a non-Device object.

[edge] We should publish with QoS at least 1

An MQTT publish with QoS 0 does not get any response from the broker. This means the broker will just silently drop packets we are not authorised to publish; we don't get any error. This in turn means that if our ACLs are wrong we will not reconnect.

[hivemq-krb] Correctly handle services being unavailable

If services are unavailable we can end up accepting an MQTT connection but not granting any rights. This is not helpful.

We should reject the connection with an appropriate server-not-ready disconnection code.

Allow selecting a separate data source for each metric

Allow each schema metric to be assigned a metric from a different source (MQTT/REST/etc)

Record birth/death

I think it would be helpful to record an additional tag in the historian indicating when the device was online.

This would be a boolean which is set on BIRTH and cleared on DEATH.

[edge] Back off when reconnecting to MQTT

With more nodes in our deployment we are starting to see 'thundering herd' problems when all the Edge Agents reconnect at once. Back off, with random delay, when we reconnect.
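One common way to do this is exponential backoff with full jitter, sketched below. The issue only asks for a random delay; the exponential growth, the 1 second base and the 60 second cap are choices made for this example.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng=random.random) -> float:
    """Return a random delay in seconds before reconnect attempt `attempt`.

    The upper bound doubles with each attempt (starting from `base`) and is
    limited to `cap`. The delay is drawn uniformly between 0 and that bound,
    so clients that disconnected together spread their reconnects out.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


if __name__ == "__main__":
    for attempt in range(8):
        print(attempt, round(backoff_delay(attempt), 2))
```

The `rng` parameter exists only so the random source can be replaced in tests.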

[edge] REST device sends multiple successive DEATHs

If a REST device gets persistent HTTP errors from the server it is contacting, it will send multiple DDEATHs without an intervening DBIRTH.
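The fix is essentially to track whether the device is currently alive and publish only on a change of state. A small sketch follows; the class, its method names and the `publish` callback are hypothetical and do not mirror the Edge Agent's code.

```python
class DeviceState:
    """Publish DBIRTH and DDEATH only when the device's liveness changes."""

    def __init__(self, publish):
        self._publish = publish  # callable taking a message type string
        self._alive = False

    def on_success(self):
        """Call after a successful poll of the REST server."""
        if not self._alive:
            self._publish("DBIRTH")
            self._alive = True

    def on_error(self):
        """Call after a failed poll; repeated failures publish nothing new."""
        if self._alive:
            self._publish("DDEATH")
            self._alive = False


if __name__ == "__main__":
    sent = []
    dev = DeviceState(sent.append)
    dev.on_success()
    dev.on_error()
    dev.on_error()
    dev.on_success()
    print(sent)
```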

ConfigDB registers itself over MQTT

The ConfigDB registers itself over MQTT. This is currently required so that the service registration includes an MQTT device UUID.

[manager] User management

Admins should have the ability to create and delete users
Admins should have the ability to change the passwords of other users
All users should be able to change their own password

This will use the kadmin interface already present in the manager and will authenticate to kadmin using the credentials that the user logged into the manager with.

[manager] Connection specific configuration UI

Having the SparkplugMetric have a different appearance depending on the connection would ensure that users have a contextualised experience when configuring devices.

For example, Address would be named Topic when using an MQTT connection.

Work out how to populate the git repos

The on-prem git repos driving the edge clusters need to be populated. We need a solution for this.

One possibility might be to set them up to pull from public Github.

[directory] MQTT change notify is not useful

The current MQTT change-notify interface has several problems:

It creates a lot of MQTT traffic no one is interested in.
When a large number of rebirths happen, the change notifications get a long way behind.
We are exposing information about devices when perhaps we shouldn't be.

We need a better interface. I am thinking perhaps

Client makes an HTTP request asking for notification about certain events.
Directory creates a new Sparkplug Device for this client, and arranges for ACLs so only that client can read it.
Directory returns device and metric information to the client saying where to get notifications.

We only need to create one device per client. After that we can create new metrics on the same device.

Open questions:

How do we arrange to dynamically set the ACLs?
When do these devices disappear? What if a client just vanishes without telling us?

Add Edge Deployment Operator manifests

[directory] Service advertisement over MQTT doesn't respect ACLs

Perhaps it should be removed altogether?

Although if we have proper Node identity records we could authenticate MQTT adverts.

[auth] We need to be able to quote groups

Currently, when a group is used in an ACE, no distinction is made between 'assign this ACE to all members of this group' and 'assign this ACE to this group specifically'. This is primarily an issue when granting auth service permissions to edit groups, where the group in question is the target of the ACE.

A partial solution here would be to have typed groups (principal/permission/target), and stop expanding groups when we get to a group of the wrong type. This would allow granting 'you can edit this group of principals', for instance. However it is not sufficient when granting permission to edit groups of targets.

An example is the edge krbkeys permission editing:

Edge krbkeys operators need to be able to set permissions for the principals they are managing.
We don't want to grant unrestricted permission to change any permissions. That would be root-equivalent.
So we grant 'Manage ACL by permission' on the permissions the krbkeys operators should be able to grant.
These permissions are generally groups: e.g. 'monitor for edge agent' which is a group of permissions allowing MQTT read, CCL rebirth, CCL config reload.
The krbkeys operators can now grant any of the included permissions arbitrarily.
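The typed-groups idea mentioned above, where expansion stops at a group of the wrong type, could be sketched as below. The data layout (a dict of groups with `type` and `members` keys) and all names are assumptions for illustration; this is not the Auth service's data model, and it does not address the remaining problem with groups of targets.

```python
def expand(uuid: str, wanted_type: str, groups: dict, seen=None) -> set:
    """Expand a group into its members, but only through groups of wanted_type.

    Anything that is not a group, or is a group of a different type, is
    returned as itself rather than being expanded.
    """
    seen = set() if seen is None else seen
    group = groups.get(uuid)
    if group is None or group["type"] != wanted_type:
        return {uuid}
    if uuid in seen:  # guard against cyclic group membership
        return set()
    seen.add(uuid)
    result = set()
    for member in group["members"]:
        result |= expand(member, wanted_type, groups, seen)
    return result


if __name__ == "__main__":
    groups = {
        "operators": {"type": "principal", "members": ["alice", "edge-perms"]},
        "edge-perms": {"type": "permission", "members": ["mqtt-read"]},
    }
    # The permission group is kept as a single item, not expanded.
    print(expand("operators", "principal", groups))
```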

Kadmin ACLs

Currently ACS deploys with fixed kadmin ACLs in a ConfigMap, which then can't be changed as they are sourced from the Helm chart and Helm upgrades will overwrite them.

We need a strategy to allow user-specified ACLs. Probably this means generating the actual ACL file on KDC startup from some other data source. Integration with the F+ auth framework would be ideal, but this may lead to bootstrapping problems.

[manager] Create UUIDs for southbound connections

It would be helpful to assign a UUID to each southbound connection from an Edge Agent. This will allow e.g. an Alert to link to a particular southbound connection.

Currently the config file does not allow this. Either it needs to be refactored to support this, or each individual southbound connection needs a separate entry in the ConfigDB containing the configuration for that connection.

The latter option opens up the interesting possibility of configuring 'these connections are available, from these hosts on these clusters; I want these devices derived from them' and then the system automatically deploying suitable Edge Agents to handle them.

[edge] Open Protocol driver is not robust enough

The Open Protocol driver doesn't currently handle tools that are offline. If it fails to connect to the tool it does so quietly, which is not ideal. The state management of the OP driver needs investment in maturing if it is to be used outside a lab environment.

[configdb] MQTT change notify is not useful

Allow individual services to be toggled on and off

[configdb] SpecialApp returning Sparkplug Addresses

Currently the Sparkplug Address application is perfectly ordinary, and mostly contains records for Node addresses used in ACLs.

If Node addresses move to the Auth service, it would be useful to turn the Sparkplug Address Application into a SpecialApp proxying that information from the Auth service. We could also proxy current Device addresses from the Directory, making this a unified source for Sparkplug address information.

Error if incorrect permissions

Currently the ingester will sit silently if it does not have the correct permissions. It should throw an error and kill the pod.

[cmdesc] Remove MQTT interface?

We currently still support the old command escalation interface over MQTT. We should consider carefully whether this is useful.

The Directory currently still uses the MQTT interface.

[auth] Investigate ABAC

The current ACL scheme (subject-predicate-object) is not really sufficient, unless we are going to have a lot of dynamic ACL adjusting by daemon services. We should look into Attribute-Based Access Control as an alternative.

We are going to need an access control language (a language in which to express permission grants). It would be better to reuse something existing, if possible, than to design our own.

https://github.com/AMRC-FactoryPlus/amrc-connectivity-stack/blob/bmz/dyn-deploy/acs-auth/docs/redesign

Feature Req: Scaling or Offsetting Values

Issue:
Reading in data from some devices provides non-ideal tag values. For example, it is common for Modbus devices to provide values that are 10x larger than the actual reading in order to convey a decimal place. This creates an issue further downstream, because standard SI units cannot be assigned: there is no SI unit for 10x a value.

Desired Feature:
Allow tags to be scaled by providing a multiplier and offset value. In the case above with Modbus, the multiplier would be 0.1.
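The requested transform is a linear one, scaled = raw * multiplier + offset. A minimal sketch, with a hypothetical function name and a raw Modbus reading of 215 chosen purely as an example:

```python
import math


def scale(raw: float, multiplier: float = 1.0, offset: float = 0.0) -> float:
    """Apply a linear transform to a raw tag value."""
    return raw * multiplier + offset


if __name__ == "__main__":
    # A device reporting tenths of a unit: 215 represents 21.5.
    value = scale(215, multiplier=0.1)
    print(math.isclose(value, 21.5))
```

Because the multiplier is a floating-point number, results are best compared with a tolerance rather than for exact equality.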
