hardware-classification-controller's Issues

Manual validation should be applied when no values are provided in YAML for supported sub-parameters.

Hardware classification currently supports the following parameters:
1. CPU 2. DISK 3. RAM 4. NIC

Suppose a profile is configured in one of the following ways:

  1. CPU/NIC configured with sub-parameters such as min/maxCount or min/maxSpeed, but with no values provided
  2. DISK/RAM configured with sub-parameters such as min/maxSize, but with no values provided

We should validate the YAML while applying such profiles and display errors for appropriate parameters/sub-parameters.
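
A minimal sketch of such a check, assuming the profile's hardwareCharacteristics section has already been parsed into a dictionary in which a key written without a value appears as None. The function name find_empty_subparameters and the message wording are hypothetical and not part of HCC; Python is used only to illustrate the logic.

```python
from typing import Any, Dict, List, Optional


def find_empty_subparameters(
    characteristics: Dict[str, Optional[Dict[str, Any]]]
) -> List[str]:
    """Return one error message per sub-parameter that has no value."""
    errors: List[str] = []
    for parameter, sub_parameters in characteristics.items():
        # A parameter with no sub-parameters at all is a separate case;
        # this check only looks at keys that are present but empty.
        for name, value in (sub_parameters or {}).items():
            if value is None or value == "":
                errors.append(f"{parameter}.{name}: no value provided")
    return errors


# Hypothetical parsed profile: two sub-parameters were left without values.
profile = {
    "cpu": {"minimumCount": None, "maximumCount": 72},
    "disk": {"minimumIndividualSizeGB": None},
    "ram": {"minimumSizeGB": 6},
}
errors = find_empty_subparameters(profile)
```

A non-empty result would be the signal to reject the profile and report each listed sub-parameter to the user.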

Need a summary of failed hosts while updating the Labels in HCC logs

Need a summary of failed hosts while updating the Label/s for them.

Currently the HWC status shows errors per host. Similarly, we need to log a summary of the failed hosts to tell the user which hosts failed during the label update process.

Current Log Snippet :

2020-08-31T06:13:49.376-0400 ERROR controllers.HardwareClassification.HardwareClassification-Controller Failed to update labels of BareMetalHost{"metal3-hardwareclassification": "metal3/hardwareclassification-sample"}
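
One possible shape for the requested summary, as a sketch only: failures are collected per host during the update loop and reported once at the end. The function name summarize_failed_hosts, the message format, and the failure reasons in the example are all hypothetical; Python is used for illustration and is not the controller's code.

```python
from typing import Dict


def summarize_failed_hosts(profile: str, failures: Dict[str, str]) -> str:
    """Build a single log line listing every host whose label update failed."""
    if not failures:
        return f"{profile}: labels updated on all hosts"
    # Sort the host names so the summary is stable between runs.
    hosts = ", ".join(sorted(failures))
    return f"{profile}: failed to update labels on {len(failures)} host(s): {hosts}"


# Hypothetical failures gathered while updating labels (reasons are made up).
failures = {"node-52": "conflict", "node-49": "timeout"}
summary = summarize_failed_hosts("metal3/hardwareclassification-sample", failures)
```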

Update error count in hwcc status field

The hardware-classification-controller should record the number of hosts in each error state (registration error, provisioning error, power management error, inspection error) in the hwcc status field, and also label the error hosts accordingly.

As of now HCC only updates the profile match status and the error state; it should also update the count of error hosts.
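
A rough sketch of the counting step, assuming each host is available as a (name, error type) pair and that the error type strings are exactly the four named above. The function name count_error_hosts and the input shape are hypothetical; this is Python for illustration, not the controller's implementation.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

ERROR_TYPES = (
    "registration error",
    "provisioning error",
    "power management error",
    "inspection error",
)


def count_error_hosts(hosts: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count hosts per known error type; any other value is skipped."""
    counts = Counter(error for _, error in hosts if error in ERROR_TYPES)
    # Report every known type, including those with zero hosts.
    return {error: counts.get(error, 0) for error in ERROR_TYPES}


# Hypothetical host list; an empty string means the host has no error.
hosts = [
    ("node-0", ""),
    ("node-1", "inspection error"),
    ("node-2", "registration error"),
    ("node-3", "inspection error"),
]
error_counts = count_error_hosts(hosts)
```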

Manual validation should be applied to YAML when any of sub-parameters of supported parameters are not present in YAML.

Hardware classification currently supports the following parameters:

  1. CPU 2. DISK 3. RAM 4. NIC

Suppose a profile is configured in one of the following ways:

  1. CPU configured with all its sub-parameters; RAM/DISK/NIC configured without any sub-parameters
  2. CPU and RAM configured with all their sub-parameters; DISK/NIC configured without any sub-parameters

We should validate the YAML before applying such profiles and display errors/warnings for appropriate parameters :

  1. When all main parameters are present but without any sub-parameters, reject the profile at the entry itself.
  2. When some of the parameters are configured properly with their sub-parameters, display a warning for each remaining parameter that is empty or not provided. Hardware classification should not be affected in this case, but the warning should be displayed.
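
The two rules above could be sketched as follows. This assumes the profile has been parsed into a dictionary where a parameter without sub-parameters appears as None or an empty mapping, and it generalizes rule 1 to "every parameter that is present is empty", which is an assumption. The function name validate_profile and the message texts are hypothetical; Python is used only for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple

SUPPORTED = ("cpu", "disk", "ram", "nic")


def validate_profile(
    characteristics: Dict[str, Optional[Dict[str, Any]]]
) -> Tuple[bool, List[str]]:
    """Return (accepted, messages) for a parsed profile."""
    configured = [p for p in SUPPORTED if p in characteristics]
    empty = [p for p in configured if not characteristics[p]]
    # Rule 1: no configured parameter has sub-parameters, so reject.
    if len(empty) == len(configured):
        return False, [f"{p}: no sub-parameters provided" for p in empty]
    # Rule 2: some parameters are usable; warn about the empty ones only.
    return True, [f"{p}: empty or not provided, ignored" for p in empty]


# Hypothetical profile: CPU is configured, DISK and RAM are empty.
accepted, warnings = validate_profile(
    {"cpu": {"minimumCount": 48}, "ram": None, "disk": {}}
)
```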

Extend the HCC to detect and classify bmh hosts based on extra parameters (CPU, System Vendor and Firmware)

HCC should also compare and classify bare-metal hosts based on extra parameters in CPU, Firmware and System Vendor.

While testing HWCC, we found that adding more parameters to the existing resources would let us identify matching hosts more precisely.

CPU architecture is useful when the user requires either a 32-bit or a 64-bit processor.

New parameters such as Firmware and System Vendor are an added advantage when identifying matching hosts.

System Vendor is useful as a new parameter when the user is looking for a particular vendor's hardware to support or deploy an application on.
Firmware is useful as a new parameter when the user is looking for specific features available only in particular releases or versions.

Logs should convey the host-name or ID for which the matching is done

HWCC logs should clearly convey the host for which we are comparing the hardware parameters.

Currently the host name is logged only in the following cases:

  1. When a mismatch occurs for the given parameters during the comparison process.
  2. When all hosts are valid for the applied profile, in which case the list of all valid hosts is logged at the end.

Suggestion: during the comparison step, the host name should be logged whenever parameters for that host are being compared.

Log Snippet :

"metal3/hardwareclassification-sample1", "Expected": 72, "Actual": 48}
"metal3/hardwareclassification-sample1", "Expected": 38, "Actual": 48}
"metal3/hardwareclassification-sample1", "Expected": 3600, "Actual": 1000}
"metal3/hardwareclassification-sample1", "Expected": 1000, "Actual": 1000}
"metal3/hardwareclassification-sample1", "Expected": 200, "Actual": 192}
"metal3/hardwareclassification-sample1", "Expected": 6, "Actual": 192}
"metal3/hardwareclassification-sample1", "Expected": 12, "Actual": 8}
"metal3/hardwareclassification-sample1", "Expected": 1, "Actual": 8}
"metal3/hardwareclassification-sample1", "Expected": 8, "Actual": 1}
"metal3/hardwareclassification-sample1", "Expected": 1, "Actual": 1}
"metal3/hardwareclassification-sample1", "Expected": 4000, "Actual": 836}
"metal3/hardwareclassification-sample1", "Expected": 200, "Actual": 836}
"metal3/hardwareclassification-sample1", "ValidHosts": ["node-52"]} <<<<<<<<< seen at the end of comparison >>>>>

hcc_logs.txt

HWCC logs should include parameters against which comparison is done.

Applied Yaml File:
apiVersion: metal3.io/v1alpha1
kind: HardwareClassification
metadata:
  name: hardwareclassification-sample
  namespace: metal3
  labels:
    hardwareclassification-sample: matches
spec:
  hardwareCharacteristics:
      cpu:
         architecture: "x86_64"

Logs:

2021-05-26T07:15:21.305Z DEBUG controller Successfully Reconciled {"reconcilerGroup": "metal3.io", "reconcilerKind": "HardwareClassification", "controller": "hardware-classification", "name": "hardwareclassification-sample", "namespace": "metal3"}
2021-05-26T07:15:21.306Z DEBUG controller Successfully Reconciled {"reconcilerGroup": "metal3.io", "reconcilerKind": "BareMetalHost", "controller": "baremetalhost", "name": "node-0", "namespace": "metal3"}
2021-05-26T07:15:21.306Z INFO controllers.BareMetalHost reconciling {"host": "metal3/node-1"}
2021-05-26T07:15:21.393Z INFO classifier CPU {"host": "node-1", "profile": "hardwareclassification-sample", "namespace": "metal3", "minCount": 0, "maxCount": 0, "actualCount": 2, "ok": true}
2021-05-26T07:15:21.395Z INFO classifier CPU {"host": "node-1", "profile": "hardwareclassification-sample", "namespace": "metal3", "minSpeed": 0, "maxSpeed": 0, "actualSpeed": 2694, "ok": true}
2021-05-26T07:15:21.395Z INFO classifier CPU {"host": "node-1", "profile": "hardwareclassification-sample", "namespace": "metal3", "architecture": "x86_64", "actualArchitecture": "x86_64", "ok": true}
2021-05-26T07:15:21.395Z DEBUG controller Successfully Reconciled {"reconcilerGroup": "metal3.io", "reconcilerKind": "BareMetalHost", "controller": "baremetalhost", "name": "node-1", "namespace": "metal3"}
2021-05-26T07:15:21.395Z INFO controllers.BareMetalHost reconciling {"host": "metal3/node-19"}
2021-05-26T07:15:21.395Z INFO classifier CPU {"host": "node-19", "profile": "hardwareclassification-sample", "namespace": "metal3", "minCount": 0, "maxCount": 0, "actualCount": 72, "ok": true}
2021-05-26T07:15:21.395Z INFO classifier CPU {"host": "node-19", "profile": "hardwareclassification-sample", "namespace": "metal3", "minSpeed": 0, "maxSpeed": 0, "actualSpeed": 3700, "ok": true}
2021-05-26T07:15:21.395Z INFO classifier CPU {"host": "node-19", "profile": "hardwareclassification-sample", "namespace": "metal3", "architecture": "x86_64", "actualArchitecture": "x86_64", "ok": true}
2021-05-26T07:15:21.395Z DEBUG controller Successfully Reconciled {"reconcilerGroup": "metal3.io", "reconcilerKind": "BareMetalHost", "controller": "baremetalhost", "name": "node-19", "namespace": "metal3"}
2021-05-26T07:15:21.395Z INFO controllers.BareMetalHost reconciling {"host": "metal3/node-53"}
2021-05-26T07:15:21.395Z INFO classifier CPU {"host": "node-53", "profile": "hardwareclassification-sample", "namespace": "metal3", "minCount": 0, "maxCount": 0, "actualCount": 48, "ok": true}
2021-05-26T07:15:21.395Z INFO classifier CPU {"host": "node-53", "profile": "hardwareclassification-sample", "namespace": "metal3", "minSpeed": 0, "maxSpeed": 0, "actualSpeed": 1000, "ok": true}
2021-05-26T07:15:21.395Z INFO classifier CPU {"host": "node-53", "profile": "hardwareclassification-sample", "namespace": "metal3", "architecture": "x86_64", "actualArchitecture": "x86_64", "ok": true}

HWCC Disk parameters log

  1. The minimum and maximum disk counts are logged with the same value (minCount 5, maxCount 5), although the profile sets them to 1 and 8.

Applied Yaml File
apiVersion: metal3.io/v1alpha1
kind: HardwareClassification
metadata:
  name: hardwareclassification-sample
  namespace: metal3
  labels:
    hardwareclassification-sample: matches
spec:
  hardwareCharacteristics:
      disk:
         minimumCount: 1
         maximumCount: 8

Logs:
2021-06-08T11:08:02.568Z DEBUG controller Successfully Reconciled {"reconcilerGroup": "metal3.io", "reconcilerKind": "HardwareClassification", "controller": "hardware-classification", "name": "hardwareclassification-sample", "namespace": "metal3"}
2021-06-08T11:08:02.568Z INFO controllers.BareMetalHost reconciling {"host": "metal3/node-19"}
2021-06-08T11:08:02.568Z INFO classifier Disk Pattern {"host": "node-19", "profile": "hardwareclassification-sample", "namespace": "metal3", "minCount": 5, "maxCount": 5, "actualCount": 21, "ok": false}
2021-06-08T11:08:02.568Z DEBUG controller Successfully Reconciled {"reconcilerGroup": "metal3.io", "reconcilerKind": "BareMetalHost", "controller": "baremetalhost", "name": "node-19", "namespace": "metal3"}
2021-06-08T11:08:02.568Z INFO controllers.BareMetalHost reconciling {"host": "metal3/node-53"}
2021-06-08T11:08:02.568Z INFO classifier Disk Pattern {"host": "node-53", "profile": "hardwareclassification-sample", "namespace": "metal3", "minCount": 5, "maxCount": 5, "actualCount": 2, "ok": false}
2021-06-08T11:08:02.568Z DEBUG controller Successfully Reconciled {"reconcilerGroup": "metal3.io", "reconcilerKind": "BareMetalHost", "controller": "baremetalhost", "name": "node-53", "namespace": "metal3"}
2021-06-08T11:08:02.568Z INFO controllers.BareMetalHost reconciling {"host": "metal3/node-18"}
2021-06-08T11:08:02.568Z INFO classifier Disk Pattern {"host": "node-18", "profile": "hardwareclassification-sample", "namespace": "metal3", "minCount": 5, "maxCount": 5, "actualCount": 3, "ok": false}
2021-06-08T11:08:02.568Z DEBUG controller Successfully Reconciled {"reconcilerGroup": "metal3.io", "reconcilerKind": "BareMetalHost", "controller": "baremetalhost", "name": "node-18", "namespace": "metal3"}

  2. For the disk count parameter, the log entry is titled "classifier Disk Pattern".
    Logs:

2021-06-08T11:08:02.568Z DEBUG controller Successfully Reconciled {"reconcilerGroup": "metal3.io", "reconcilerKind": "HardwareClassification", "controller": "hardware-classification", "name": "hardwareclassification-sample", "namespace": "metal3"}
2021-06-08T11:08:02.568Z INFO controllers.BareMetalHost reconciling {"host": "metal3/node-19"}
2021-06-08T11:08:02.568Z INFO classifier Disk Pattern {"host": "node-19", "profile": "hardwareclassification-sample", "namespace": "metal3", "minCount": 5, "maxCount": 5, "actualCount": 21, "ok": false}
2021-06-08T11:08:02.568Z DEBUG controller Successfully Reconciled {"reconcilerGroup": "metal3.io", "reconcilerKind": "BareMetalHost", "controller": "baremetalhost", "name": "node-19", "namespace": "metal3"}
2021-06-08T11:08:02.568Z INFO controllers.BareMetalHost reconciling {"host": "metal3/node-53"}
2021-06-08T11:08:02.568Z INFO classifier Disk Pattern {"host": "node-53", "profile": "hardwareclassification-sample", "namespace": "metal3", "minCount": 5, "maxCount": 5, "actualCount": 2, "ok": false}
2021-06-08T11:08:02.568Z DEBUG controller Successfully Reconciled {"reconcilerGroup": "metal3.io", "reconcilerKind": "BareMetalHost", "controller": "baremetalhost", "name": "node-53", "namespace": "metal3"}
2021-06-08T11:08:02.568Z INFO controllers.BareMetalHost reconciling {"host": "metal3/node-18"}
2021-06-08T11:08:02.568Z INFO classifier Disk Pattern {"host": "node-18", "profile": "hardwareclassification-sample", "namespace": "metal3", "minCount": 5, "maxCount": 5, "actualCount": 3, "ok": false}
2021-06-08T11:08:02.568Z DEBUG controller Successfully Reconciled {"reconcilerGroup": "metal3.io", "reconcilerKind": "BareMetalHost", "controller": "baremetalhost", "name": "node-18", "namespace": "metal3"}

Error should be updated to "No BareMetalHost Found" in HCC status when there are no baremetal hosts

When there are no BareMetalHosts in the BMH list and we apply the profile below, the status column for the profile is updated to "unmatched". Instead, the error column should be updated to "No BareMetalHost Found".

kind: HardwareClassification
metadata:
  name: hardwareclassification-sample
  namespace: metal3
  labels:
    hardwareclassification-sample: matches
spec:
  hardwareCharacteristics:
      cpu:
         architecture : "x86_64"
         minimumCount: 48
         maximumCount: 72
         minimumSpeedMHz: 1000
         maximumSpeedMHz: 3600
      disk:
         minimumCount: 1
         maximumCount: 8

Label is set to default even if provided in CR for valid host

If a label is provided as a key-value pair in the CR and the profile is applied with kubectl apply -f Profile.yaml, the label on a valid host is not updated with the value from the CR; it is set to the default instead.

apiVersion: metal3.io/v1alpha1
kind: HardwareClassification
metadata:
  name: hardwareclassification-sample
  namespace: metal3
  labels:
    hardwareclassification-sample: match
spec:
  hardwareCharacteristics:
      cpu:
         minimumCount: 48
         maximumCount: 72
         minimumSpeedMHz: 2600
         maximumSpeedMHz: 3800
      disk:
         minimumCount: 1
         maximumCount: 8
         minimumIndividualSizeGB: 200
         maximumIndividualSizeGB: 3000
      ram:
         minimumSizeGB: 6
         maximumSizeGB: 200
      nic:
         minimumCount: 1
         maximumCount: 12

After applying the above Profile CR, the label on valid hosts is set to the default value, matches, instead of match.

node-50@node50-PowerEdge-R740xd:~$ kubectl get bmh -n metal3  --show-labels
NAME      STATUS   PROVISIONING STATUS   CONSUMER   BMC                                                              HARDWARE PROFILE   ONLINE   ERROR   LABELS
node-49   OK       ready                            redfish://192.168.110.49/redfish/v1/Systems/System.Embedded.1/   unknown            false            hardwareclassification.metal3.io/hardwareclassification-sample=matches
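
The expected behaviour can be sketched as below. The label key prefix and the default value matches are taken from the kubectl output above; the assumption that the user's value is stored under a label key equal to the profile name follows the sample CR. The function name host_label is hypothetical, and Python is used for illustration only.

```python
from typing import Dict, Tuple

LABEL_PREFIX = "hardwareclassification.metal3.io/"
DEFAULT_VALUE = "matches"


def host_label(profile_name: str, profile_labels: Dict[str, str]) -> Tuple[str, str]:
    """Return the (key, value) label to set on a matching host."""
    # Prefer the value supplied in the CR under the profile's own name;
    # fall back to the default only when it is missing or empty.
    value = profile_labels.get(profile_name) or DEFAULT_VALUE
    return LABEL_PREFIX + profile_name, value


key, value = host_label(
    "hardwareclassification-sample",
    {"hardwareclassification-sample": "match"},
)
```

With the CR shown above, this sketch would yield the value match rather than the default.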

Provisioned node's labels should not be deleted, if we make changes in the same profile and apply.

Steps to reproduce :

  1. Initially have 2-3 BMH in Ready state, create one HCC profile and apply the same.
  2. Check that HCC applies label/s to matching host/s as per criteria.
  3. Now change state of one of the BMH such that it is in Non-Ready state (E.g. execute provision cluster script so that one BMH changes state from Ready -> Provisioning)
  4. Now re-apply the same HCC profile created in Step-1.
  5. Check the status of labels applied earlier in Step-2.

Observation: for nodes in a provisioned or provisioning state (that is, any non-ready state), previously applied labels should not be deleted in the above scenario.

provision_node_label_deletion_issue.txt

provision_delete_label
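
A sketch of the intended rule, assuming the controller knows each host's provisioning state, which hosts already carry the profile's label, and which hosts match on re-apply. Only hosts in the ready state are considered for label removal. The function name labels_to_remove and the state strings are assumptions; Python is used for illustration only.

```python
from typing import Dict, Set


def labels_to_remove(
    host_states: Dict[str, str], labelled: Set[str], matching: Set[str]
) -> Set[str]:
    """Hosts whose label may be removed when a profile is re-applied."""
    # Only hosts in the "ready" state are re-classified; a labelled host in
    # any other state keeps its label even though it is not in `matching`.
    return {
        host
        for host in labelled
        if host_states.get(host) == "ready" and host not in matching
    }


# Hypothetical cluster: node-50 moved from ready to provisioning.
host_states = {"node-49": "ready", "node-50": "provisioning", "node-51": "ready"}
removable = labels_to_remove(
    host_states,
    labelled={"node-49", "node-50", "node-51"},
    matching={"node-49"},
)
```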

Logs should convey the label value when it's applied to matching hosts

HWCC logs should clearly indicate the label value which is applied to a matching host/s.
Currently this value is printed in "Delete Label" method but not in "Set Label" method.

This will help the user identify which label (default, custom, or user-provided) is being applied to the matching hosts.

Log Snippet :

  1. No label value in Set Label :

2020-08-21T06:15:50.472Z INFO controllers.HardwareClassification.HardwareClassification-Controller Set Label {"metal3-hardwareclassification": "metal3/hardwareclassification-sample", "BareMetalHost": "node-52"}

  2. Label value in Delete Label :

2020-08-21T07:04:39.281Z INFO controllers.HardwareClassification.HardwareClassification-Controller Delete Label {"metal3-hardwareclassification": "metal3/hardwareclassification-sample", "BareMetalHost": "node-52", "hardwareclassification.metal3.io/hardwareclassification-sample": "match"}

Logs should be updated or simplified to reflect the required hardware comparison for a given profile

The HCC logs should be simplified so that the user can follow the comparison for the profile being applied.

Current issue: the logs get bulkier each time a new profile is applied, so it becomes difficult to find the comparison details for a given profile. The list of profiles in a single log line keeps growing, which makes the logs lengthy to read and difficult to debug.

Please find the attached logs for reference.
hcc_logs.txt

forbidden: User "system:serviceaccount:hcc-system:default" Error

While applying the sample Profile, an error appears in the container logs of the hardware-classification-controller.

The controller successfully extracts the data from the Profile, but fails afterwards.

Below are the container logs:

ubuntu@ubuntu-PowerEdge-R640:~$ kubectl logs -f hcc-controller-manager-5b9ccfc986-5k9bt --container manager -n hcc-system
2020-05-14T07:33:40.808Z INFO controller-runtime.metrics metrics server is starting to listen {"addr": "127.0.0.1:8080"}
2020-05-14T07:33:40.808Z INFO setup starting manager
2020-05-14T07:33:40.809Z INFO controller-runtime.manager starting metrics server {"path": "/metrics"}
2020-05-14T07:33:58.207Z DEBUG controller-runtime.manager.events Normal {"object": {"kind":"ConfigMap","namespace":"hcc-system","name":"controller-leader-election-helper","uid":"47c5a3fd-6e16-426f-8fd3-bcd9118a9892","apiVersion":"v1","resourceVersion":"3839251"}, "reason": "LeaderElection", "message": "hcc-controller-manager-5b9ccfc986-5k9bt_edc740b9-034d-4b05-84d6-322abac02804 became leader"}
2020-05-14T07:33:58.208Z INFO controller-runtime.controller Starting EventSource {"controller": "hardwareclassification", "source": "kind source: /, Kind="}
2020-05-14T07:33:58.308Z INFO controller-runtime.controller Starting EventSource {"controller": "hardwareclassification", "source": "kind source: /, Kind="}
2020-05-14T07:33:58.308Z INFO controller-runtime.controller Starting Controller {"controller": "hardwareclassification"}
2020-05-14T07:33:58.408Z INFO controller-runtime.controller Starting workers {"controller": "hardwareclassification", "worker count": 1}
2020-05-14T07:58:02.245Z INFO controllers.HardwareClassification Extracted hardware configurations successfully {"Profile": {"CPU":{"minimumCount":48,"maximumCount":72,"minimumSpeed":"2.7","maximumSpeed":"3.6"},"Disk":{"minimumCount":1,"minimumIndividualSizeGB":200,"maximumCount":7,"maximumIndividualSizeGB":3000},"NIC":{"minimumCount":1,"maximumCount":7},"RAM":{"minimumSizeGB":6,"maximumSizeGB":180}}}
E0514 07:58:02.248759 1 reflector.go:123] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:96: Failed to list *v1alpha1.BareMetalHost: baremetalhosts.metal3.io is forbidden: User "system:serviceaccount:hcc-system:default" cannot list resource "baremetalhosts" in API group "metal3.io" at the cluster scope
E0514 07:58:03.252665 1 reflector.go:123] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:96: Failed to list *v1alpha1.BareMetalHost: baremetalhosts.metal3.io is forbidden: User "system:serviceaccount:hcc-system:default" cannot list resource "baremetalhosts" in API group "metal3.io" at the cluster scope
E0514 07:58:04.255663 1 reflector.go:123] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:96: Failed to list *v1alpha1.BareMetalHost: baremetalhosts.metal3.io is forbidden: User "system:serviceaccount:hcc-system:default" cannot list resource "baremetalhosts" in API group "metal3.io" at the cluster scope
E0514 07:58:05.256648 1 reflector.go:123] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:96: Failed to list *v1alpha1.BareMetalHost: baremetalhosts.metal3.io is forbidden: User "system:serviceaccount:hcc-system:default" cannot list resource "baremetalhosts" in API group "metal3.io" at the cluster scope
E0514 07:58:06.259305 1 reflector.go:123] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:96: Failed to list *v1alpha1.BareMetalHost: baremetalhosts.metal3.io is forbidden: User "system:serviceaccount:hcc-system:default" cannot list resource "baremetalhosts" in API group "metal3.io" at the cluster scope

Delete already added label in BM host CR

During testing we noticed the following case: a user mistakenly configures the wrong hardware parameters and applies profile ABC using kubectl, so HCC processes profile ABC and adds labels to the matching hosts.
After some time, looking at profile ABC, the user realizes that a different configuration is needed, corrects the hardware parameters in the same profile ABC, and applies it again.

In that case, HCC should first delete the labels from the existing hosts, then classify hosts based on the new configuration and add the label to the matching hosts.

Please let me know if you agree with the above workflow for the label deletion feature, or let us know your approach in this case.
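
The proposed workflow could look roughly like this, as a sketch under assumptions: host labels are modelled as plain dictionaries, and the matches callback stands in for classification against the new configuration. The function name reapply_profile and the example label key are hypothetical; Python is used for illustration and is not the controller's code.

```python
from typing import Callable, Dict


def reapply_profile(
    label_key: str,
    label_value: str,
    host_labels: Dict[str, Dict[str, str]],
    matches: Callable[[str], bool],
) -> Dict[str, Dict[str, str]]:
    """Return new per-host labels after re-applying an edited profile."""
    updated = {}
    for host, labels in host_labels.items():
        # Step 1: drop the label left by the previous version of the profile.
        new_labels = {k: v for k, v in labels.items() if k != label_key}
        # Step 2: classify against the new configuration and re-label.
        if matches(host):
            new_labels[label_key] = label_value
        updated[host] = new_labels
    return updated


# Hypothetical state: node-1 matched the old profile, node-2 matches the new one.
key = "hardwareclassification.metal3.io/ABC"
hosts = {"node-1": {key: "matches", "rack": "r1"}, "node-2": {}}
result = reapply_profile(key, "matches", hosts, lambda host: host == "node-2")
```

Labels that do not belong to the profile (rack in this example) are left untouched.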
