openshift / elasticsearch-operator Goto Github PK

License: Apache License 2.0

Shell 22.03% Go 75.69% Dockerfile 0.48% Makefile 1.58% Awk 0.09% Python 0.13%

elasticsearch-operator's Introduction

elasticsearch-operator

Elasticsearch operator to run Elasticsearch cluster on top of Openshift and Kubernetes. Operator uses Operator Framework SDK.

Why Use An Operator?

Operator is designed to provide self-service for the Elasticsearch cluster operations, see Operator Capability Levels.

Elasticsearch operator ensures proper layout of the pods
Elasticsearch operator enables proper rolling cluster restarts
Elasticsearch operator provides kubectl interface to manage your Elasticsearch cluster
Elasticsearch operator provides kubectl interface to monitor your Elasticsearch cluster

To experiment or contribute to the development of elasticsearch-operator, see HACKING.md and REVIEW.md

elasticsearch-operator's People

Contributors

Stargazers

Watchers

Forkers

viaq ewolinetz t0ffel anpingli jcantrill lukas-vlcek josefkarasek charlesakalugwu zak-hassan orz6 richm smarterclayton jpkrohling nhosoi objectiser pavolloffay vladmasarik cfontes sabreoss qiaolingtang openshift-cherrypick-robot bparees bysnupy jaormx gpe-mw-training igor-karpukhin vimalk78 periklis cloud4mama vfreex radave alanconway wewang58 syedriko dinhxuanvu multi-arch andymcc sosiouxme georgettica flacatus tessg22 openshift-bot sabdulrahuman fridmehdi vr4manta 2uasimojo gkarager jupierce huikang arve0 ajaygupta978 yithian sedroche axrayn global-localhost global19 global19-atlassian-net darecoder red-gv akshaybhansali18 keremavci sasagarw isabella232 zhuje sreber84 thinko ronensc rmadamson btaani scaps-nrb shwetaap cahartma vparfonov tim0licious forget6 tucsolo miiraheart xperimental huehnervater doytsujin lm0943111262 frzifus developless clee2691 aminesnow t1seungy kabirbhartirh thiyaguelmails johnrk joaobravecoding luohoufu yasyx

elasticsearch-operator's Issues

Feature request: Allow setting shard allocation awareness

When configuring the replication policy, we can choose between different variants:

ZeroRedundancy
SingleRedundancy
MultipleRedundancy
FullRedundancy

Typically, a production cluster is run across multiple availability zones. If we choose SingleRedundancy or MultipleRedundancy, the shard allocation awareness that Elasticsearch supports should also be set (optionally) when replicating shards to avoid a replica shard ending up within the same availability zone.

When running nodes on multiple VMs on the same physical server, on multiple racks, or across multiple zones or domains, it is more likely that two nodes on the same physical server, in the same rack, or in the same zone or domain will crash at the same time, rather than two unrelated nodes crashing simultaneously.

Upgrade stuck

As of today, I can see to ElasticSearch operators listed in my ClusterServiceVersions:

# oc get csv
NAME                                        DISPLAY                        VERSION              REPLACES                              PHASE
elasticsearch-operator.4.2.1-201910221723   Elasticsearch Operator         4.2.1-201910221723                                         Succeeded
elasticsearch-operator.4.2.4-201911050122   Elasticsearch Operator         4.2.4-201911050122                                         Pending

Describing that pending CSV, I would find in its status:

  - lastTransitionTime: "2019-11-21T15:18:04Z"
    lastUpdateTime: "2019-11-21T15:18:04Z"
    message: conflicting CRD owner in namespace
    phase: Failed
    reason: OwnerConflict
  - lastTransitionTime: "2019-11-21T15:18:07Z"
    lastUpdateTime: "2019-11-21T15:18:07Z"
    message: 'installing: ComponentMissing: missing deployment with name=elasticsearch-operator'
    phase: Pending
    reason: NeedsReinstall

I tried to label the elasticsearch-operator deployment, in the openshift-operators namespace, with name=elasticseach-operator. Did not help.

Looking at the logs from the catalog-operator, in openshift-operator-lifecycle-manager project, we would see:

E1121 15:12:21.581706       1 queueinformer_operator.go:282] sync {"update" "openshift-operators"} failed: logging.openshift.io/v1/Elasticsearch (elasticsearches) already provided by elasticsearch-operator.4.2.4-201911050122
E1121 15:12:21.782068       1 queueinformer_operator.go:282] sync "openshift-operators" failed: logging.openshift.io/v1/Elasticsearch (elasticsearches) already provided by elasticsearch-operator.4.2.4-201911050122
E1121 15:12:23.381951       1 queueinformer_operator.go:282] sync {"update" "openshift-operators"} failed: logging.openshift.io/v1/Elasticsearch (elasticsearches) already provided by elasticsearch-operator.4.2.1-201910221723
E1121 15:12:23.781774       1 queueinformer_operator.go:282] sync "openshift-operators" failed: logging.openshift.io/v1/Elasticsearch (elasticsearches) already provided by elasticsearch-operator.4.2.1-201910221723

Having had a similar issue with Jaeger lately, I suspect this could be fixed by deinstalling then re-installing the operator. Though I'ld rather try and figure out what went wrong.

Any clue what could be going on?

Broken link to maturity model diagram

There is link to operator maturity model in README.md file in Why Use An Operator? section.
It is broken, the diagram seem to have moved to https://github.com/operator-framework/operator-sdk/blob/master/doc/images/operator-maturity-model.png.

Elasticsearch-operator scaling info/documentation/manual

Hi All,
Could you please tell me where i can find any documentation,manual,etc on Elasticsearch scaling (-up, -down) for elasticsearch-operator?

failed to create or get service for metrics

Starting the ElasticSearch operator on OpenShift, we would find the following warning:

# oc logs -f elasticsearch-operator-5d4b85bcf8-65cb9
...
{"level":"info","ts":1574348719.3720665,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"elasticsearch-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1574348719.5793352,"logger":"cmd","msg":"failed to create or get service for metrics: services \"elasticsearch-operator\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
...

Looks like a warning, though could be worth fixing. Couldn't figure out what's wrong: I don't have an elasticsearch-operator service yet. Tried to create it, with then without an ownerReference to my operator deployment.

Source image for the ElasticSearch operator pod: registry.redhat.io/openshift4/ose-elasticsearch-operator@sha256:69968bc98d29773b89ba9b3c3c5b1fc44d8df5226934683e3029ae14f4829fab

Enable travis tests for openshift

As of now the travis tests are running for https://travis-ci.org/ViaQ/elasticsearch-operator but we would like to make it run for https://travis-ci.org/openshift/elasticsearch-operator as well (or primarily, right?).

Among others the build status in README.md is referencing to https://travis-ci.org/ViaQ/elasticsearch-operator which is confusing.

Dangerous to allow individual nodes to overwrite the general characteristics of all nodes

The Elasticsearch CR allows specification of attributes to be applied to all nodes that can be over-ridden by specifications of individual nodes. The simple implementation is to reuse the stuct. In theory, you can override the image of a specific node which seems like a really BAD idea. We should re-evaluate ElasticsearchNodeSpec [1] to determine the attributes for which it does not make sense to override on individual nodes.

[1] https://github.com/openshift/elasticsearch-operator/blob/master/pkg/apis/elasticsearch/v1alpha1/types.go#L107

ERROR: no RBAC policy matched

I am having issues starting the Elasticsearch operator:

$ oc logs elasticsearch-operator-67c65cdbc7-8stnd
[...]
ERROR: logging before flag.Parse: E1003 12:16:56.123572       1 reflector.go:205]
  github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:91: Failed to list
  *unstructured.Unstructured: elasticsearches.elasticsearch.redhat.com is forbidden:
  User "system:serviceaccount:openshift-logging:default" cannot list
  elasticsearches.elasticsearch.redhat.com in the namespace "openshift-logging":
  no RBAC policy matched

[origin@ip-172-18-7-124 elasticsearch-operator]$ oc get all
NAME                                          READY     STATUS    RESTARTS   AGE
pod/elasticsearch-operator-67c65cdbc7-8stnd   1/1       Running   0          6m
pod/kibana-57f79f6b64-nwnjr                   2/2       Running   0          18m

NAME                               TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)     AGE
service/cluster-logging-operator   ClusterIP   172.30.85.35   <none>        60000/TCP   18m
service/kibana                     ClusterIP   172.30.86.23   <none>        443/TCP     18m

NAME                     DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                AGE
daemonset.apps/fluentd   0         0         0         0            0           logging-infra-fluentd=true   18m

NAME                                     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/elasticsearch-operator   1         1         1            1           6m
deployment.apps/kibana                   1         1         1            1           18m

NAME                                                DESIRED   CURRENT   READY     AGE
replicaset.apps/elasticsearch-operator-67c65cdbc7   1         1         1         6m
replicaset.apps/kibana-57f79f6b64                   1         1         1         18m

NAME                    SCHEDULE     SUSPEND   ACTIVE    LAST SCHEDULE   AGE
cronjob.batch/curator   30 3 * * *   False     0         <none>          18m

NAME                              HOST/PORT            PATH      SERVICES   PORT      TERMINATION   WILDCARD
route.route.openshift.io/kibana   kibana.example.com             kibana     <all>                   None

I will prepare complete re-creation script so that we can check if I do everything correctly.

Apply license to source code properly

"Nit picking" but the license should be applied properly in terms of:
https://github.com/openshift/elasticsearch-operator/blob/master/LICENSE#L178

Preferably use license checker.

keytool error: java.io.IOException: parseAlgParameters failed: PBE AlgorithmParameters not available

Describe the bug

 k logs elasticsearch-cdm-8mplmgfq-1-6b89fcb7f5-w4dp4   elasticsearch
[2021-11-15 11:09:04,955][INFO ][container.run            ] Begin Elasticsearch startup script
[2021-11-15 11:09:04,959][INFO ][container.run            ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
[2021-11-15 11:09:04,961][INFO ][container.run            ] Inspecting cgroup version...
[2021-11-15 11:09:04,963][INFO ][container.run            ] Detected cgroup v1
[2021-11-15 11:09:04,965][INFO ][container.run            ] Inspecting the maximum RAM available...
[2021-11-15 11:09:04,969][INFO ][container.run            ] ES_JAVA_OPTS: ' -Xms24576m -Xmx24576m'
[2021-11-15 11:09:04,971][INFO ][container.run            ] Copying certs from /etc/openshift/elasticsearch/secret to /etc/elasticsearch//secret
[2021-11-15 11:09:04,976][INFO ][container.run            ] Building required p12 files and truststore
keytool error: java.io.IOException: parseAlgParameters failed: PBE AlgorithmParameters not available

Environment

OCP 4.8.14
elastic operator: 5.3.0-67 provided by Red Hat
ClusterLogging instance
the cluster is FIPS enabled: https://docs.openshift.com/container-platform/4.8/installing/installing-fips.html

Logs

 k logs elasticsearch-cdm-8mplmgfq-1-6b89fcb7f5-w4dp4   elasticsearch
[2021-11-15 11:09:04,955][INFO ][container.run            ] Begin Elasticsearch startup script
[2021-11-15 11:09:04,959][INFO ][container.run            ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
[2021-11-15 11:09:04,961][INFO ][container.run            ] Inspecting cgroup version...
[2021-11-15 11:09:04,963][INFO ][container.run            ] Detected cgroup v1
[2021-11-15 11:09:04,965][INFO ][container.run            ] Inspecting the maximum RAM available...
[2021-11-15 11:09:04,969][INFO ][container.run            ] ES_JAVA_OPTS: ' -Xms24576m -Xmx24576m'
[2021-11-15 11:09:04,971][INFO ][container.run            ] Copying certs from /etc/openshift/elasticsearch/secret to /etc/elasticsearch//secret
[2021-11-15 11:09:04,976][INFO ][container.run            ] Building required p12 files and truststore
keytool error: java.io.IOException: parseAlgParameters failed: PBE AlgorithmParameters not available

Expected behavior
Should start w/o error

Actual behavior
Fails as described

To Reproduce
Steps to reproduce the behavior:

Upgrade to latest release

Additional context
Add any other context about the problem here.

olm data needs to be updated for 4.3

https://github.com/openshift/elasticsearch-operator/tree/release-4.3/manifests
file not found: manifests/4.3/image-references

Creation of ES instance fails on OCP 4.1.2

When I try to create an elasticsearch backed Jaeger Operator instance using on of the example yaml files (https://github.com/jaegertracing/jaeger-operator/blob/master/deploy/examples/simple-prod-deploy-es.yaml) the operator instance never starts as the elasticsearch pod never gets out of pending state.

There is nothing in the log, but I get the following output from oc get elasticsearch -o yaml:

ovpn-118-43:jaeger-operator kearls$ oc get elasticsearch -o yaml
apiVersion: v1
items:

apiVersion: logging.openshift.io/v1
kind: Elasticsearch
metadata:
creationTimestamp: "2019-06-27T12:56:31Z"
generation: 8
labels:
app: jaeger
app.kubernetes.io/co app.kubernetes.io/co app.kubernetes.io/co app.kubernetes.io/component: elasticsearch
app.kubernetes.io/instance: simple-prod
app.kubernetes.io/name: elasticsearch
app.kubernetes.io/part-of: jaeger
name: elasticsearch
namespace: fud
ownerReferences:
- apiVersion: jaegertracing.io/v1
  controller: true
  kind: Jaeger
  name: simple-prod
  uid: fbd9d6b0-98da-11e9-8a21-fa163e292f36
  resourceVersion: "412161"
  selfLink: /apis/logging.openshift.io/v1/namespaces/fud/elasticsearches/elasticsearch
  uid: fc4ed617-98da-11e9-8a21-fa163e292f36
  spec:
  managementState: Managed
  nodeSpec:
  resources: {}
  nodes:
- nodeCount: 1
  resources: {}
  roles:
  - client
  - data
  - master
    storage: {}
    redundancyPolicy: ""
    status:
    cluster:
    activePrimaryShards: 0
    activeShards: 0
    initializingShards: 0
    numDataNodes: 0
    numNodes: 0
    pendingTasks: 0
    relocatingShards: 0
    status: ""
    unassignedShards: 0
    clusterHealth: ""
    conditions:
- lastTransitionTime: "2019-06-27T12:58:33Z"
  message: Previously used GenUUID "x1s6chde" is no longer found in Spec.Nodes
  reason: Invalid Spec
  status: "True"
  type: InvalidUUID
  nodes:
- conditions:
  - lastTransitionTime: "2019-06-27T12:56:31Z"
    message: '0/5 nodes are available: 5 node(s) didn''t match node selector.'
    reason: Unschedulable
    status: "True"
    type: Unschedulable
    deploymentName: elasticsearch-cdm-x1s6chde-1
    upgradeStatus: {}
    pods:
    client:
    failed: []
    notReady:
    - elasticsearch-cdm-x1s6chde-1-644b55ccdd-b2pzg
      ready: []
      data:
      failed: []
      notReady:
    - elasticsearch-cdm-x1s6chde-1-644b55ccdd-b2pzg
      ready: []
      master:
      failed: []
      notReady:
    - elasticsearch-cdm-x1s6chde-1-644b55ccdd-b2pzg
      ready: []
      shardAllocationEnabled: shard allocation unknown
      kind: List
      metadata:
      resourceVersion: ""
      selfLink: ""

connect to elastic from service

If I use the elastic and cluster logging operators in openshift then presumably the auth used on kibana is the same as the openshift cluster itself, so it requires a bearer token to access. Is this also the case for the elastic? I'm wondering whether there's any easy way to connect to it from a backend service.

Remove duplicate objects in `deploy` directory

Most of the objects in deploy are duplicates of manifests. We should remove those and update our README to point users to the manifests objects or to the make targets.

Please prepare bundle for OCP 4.7

Now that we are fully branched for 4.7, please prepare your operator to supply a 4.7 bundle, so that 4.7 operator publishing works and doesn't overwrite 4.6 bundles. This means at least updating the package.yaml under
https://github.com/openshift/elasticsearch-operator/tree/master/manifests

@jcantrill @richm

Reference: openshift-eng/ocp-build-data#708

Update Alerting rules

ElasticsearchNodeDiskLowForSegmentMerges rule was removed for now.
See lukas-vlcek/elasticsearch-mixin#3

Unable to parse loglevel

Starting the ElasticSearch operator on OpenShift, we would find the following warning:

time="2019-11-21T15:05:18Z" level=warning msg="Unable to parse loglevel \"\""

Source image for that Pod: registry.redhat.io/openshift4/ose-elasticsearch-operator@sha256:69968bc98d29773b89ba9b3c3c5b1fc44d8df5226934683e3029ae14f4829fab

Is there anything I should do?

Secret mountpoint needs to be updated

The old mountpoint for our secret is /etc/elasticsearch/secret per the image change [1] to generate our own jks format keys it needs to be updated to /etc/openshift/elasticsearch/secret

[1] openshift/origin-aggregated-logging@0966dbd

permissions error on install with ClusterLogging using reduced resources

I installed the ClusterLogging operator (4.3.16) on my OpenShift (4.3.13) and then created an instance using:

apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: seldon-system
spec:
  managementState: Managed
  logStore:
    type: elasticsearch
    elasticsearch:
      nodeCount: 1
      redundancyPolicy: ZeroRedundancy
      storage:
        storageClassName: gp2
        size: 50G
  visualization:
    type: kibana
    kibana:
      replicas: 1
  curation:
    type: curator
    curator:
      schedule: 30 3 * * *
  collection:
    logs:
      type: fluentd
      fluentd: {}

I just modified the default to reduce the elastic node count and storage size. It creates an elastic instance with this spec:

apiVersion: logging.openshift.io/v1
kind: Elasticsearch
metadata:
  creationTimestamp: '2020-04-30T14:19:42Z'
  generation: 4
  name: elasticsearch
  namespace: seldon-system
  ownerReferences:
    - apiVersion: logging.openshift.io/v1
      controller: true
      kind: ClusterLogging
      name: instance
      uid: b680d1e1-13e8-4034-8160-55b075eaf08b
  resourceVersion: '1217863'
  selfLink: >-
    /apis/logging.openshift.io/v1/namespaces/seldon-system/elasticsearches/elasticsearch
  uid: 1319505d-7de5-4a50-9eea-2f79a9039f08
spec:
  managementState: Managed
  nodeSpec:
    image: >-
      registry.redhat.io/openshift4/ose-logging-elasticsearch5@sha256:f02e4f75617b706d9b8e2dc06777aa572a443ccc3dd604ce4c21667f55725435
    resources:
      limits:
        memory: 16Gi
      requests:
        cpu: '1'
        memory: 16Gi
  nodes:
    - genUUID: qyaxnpkc
      nodeCount: 1
      resources: {}
      roles:
        - client
        - data
        - master
      storage:
        size: 50G
        storageClassName: gp2
  redundancyPolicy: ZeroRedundancy

But the elasticsearch doesn't start. Doing a describe on its replicaset reveals this error:

  Warning  FailedCreate  18s (x16 over 3m2s)  replicaset-controller  Error creating: pods "elasticsearch-cdm-qyaxnpkc-1-f7cf8c447-" is forbidden: unable to validate against any security context constraint: [fsGroup: Invalid value: []int64{1337}: 1337 is not an allowed group spec.initContainers[0].securityContext.securityContext.runAsUser: Invalid value: 1337: must be in the ranges: [1000610000, 1000619999] spec.containers[2].securityContext.securityContext.runAsUser: Invalid value: 1337: must be in the ranges: [1000610000, 1000619999]]

Likewise for kibana:

  Warning  FailedCreate  4m44s (x18 over 10m)  replicaset-controller  Error creating: pods "kibana-6bc755775d-" is forbidden: unable to validate against any security context constraint: [fsGroup: Invalid value: []int64{1337}: 1337 is not an allowed group spec.initContainers[0].securityContext.securityContext.runAsUser: Invalid value: 1337: must be in the ranges: [1000610000, 1000619999] spec.containers[2].securityContext.securityContext.runAsUser: Invalid value: 1337: must be in the ranges: [1000610000, 1000619999]]

The fluent instances come up and I've other services running in the namespace. I wasn't expecting to have to set any permissions for instances installed through the operator.

Not launching kibana instance from elasticsearch operator

Describe the bug
Not launching kibana instance from elasticsearch operator when created instance for kibana. Not observed any errors or activity related to this.

Environment

Openshift 4.5
Red Hat OpenShift Jaeger 1.17.6 provided by Red Hat. Created Jeager Production strategy instance with elasticsearch storage.
Elasticsearch Operator 4.5.0-202010161522.p0 provided by Red Hat, Inc

Logs
Not observed any errors or activity related to this.

Expected behavior
Should be able to create kibana instance and integrate with elasticsearch

Actual behavior
Not launching kibana instance from elasticsearch operator

To Reproduce
Steps to reproduce the behavior:

Deploy jeager production strategy instance in openshift 4.5 cluster.
Try to create kibana instance

Additional context

Add PrometheusRule object

We need to provide PrometheusRule object holding the file with alerting (and recording) rules.

Basic resource:

The starting documentation can be found here: prometheus-operator/Alerting and check CR example with:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
...

Other resources can be found all around the source code - search prometheus-operator repo for PrometheusRule.
It seems to me the whole build is initiated from within make which generates the prometheus rules files and these are then used in contrib/kube-prometheus/jsonnet/kube-prometheus/prometheus/prometheus.libsonnet
We are currently developing Prometheus rules and alerts here: https://github.com/lukas-vlcek/elasticsearch-mixin

//cc @brancz

Operator installation fails on OCP 4.4 cluster

I tried to install the ElasticSearch operator from the Red Hat operator source using OperatorHub on an OCP 4.4 cluster (from build 4.4.0-0.nightly-2020-04-16-231032) OperatorHub shows ES operator version 4.4.0-202004130743 using image registry.redhat.io/openshift4/ose-elasticsearch-operator@sha256:efa7b8b8225148d459e07071123ffb3d86a053fade04128fdf239a69e2188213

I installed using update channel 4.4, all namespaces for installation mode, and automatic approval strategy. The operator fails to install with:

Failed to pull image "registry.redhat.io/openshift4/ose-elasticsearch-operator@sha256:efa7b8b8225148d459e07071123ffb3d86a053fade04128fdf239a69e2188213": rpc error: code = Unknown desc = Error reading manifest sha256:efa7b8b8225148d459e07071123ffb3d86a053fade04128fdf239a69e2188213 in registry.redhat.io/openshift4/ose-elasticsearch-operator: error parsing HTTP 404 response body: invalid character 'F' looking for beginning of value: "File not found.""

@objectiser @mattmahoneyrh

Use RetryOnConflict for updates to existing objects

First take a look at https://github.com/kubernetes/client-go/blob/master/examples/create-update-delete-deployment/main.go#L102 which describes why RetryOnConflict is needed.

There are several patterns in our code like this:

  client.Get(object)
  object.somefield = "new value"
  client.Update(object)

The problem is that the object can be updated by another client between the Get and the Update and the Update will return a Conflict error. Instead, we need to wrap all such places in our code with RetryOnConflict

I've already seen cases running e2e tests where we get errors from conflicts.

Future Release Branches Frozen For Merging | branch:release-4.8

The following branches are being fast-forwarded from the current development branch (master) as placeholders for future releases. No merging is allowed into these release branches until they are unfrozen for production release.

release-4.8

Contact the Test Platform or Automated Release teams for more information.

Please prepare bundle for OCP 4.4

Now that we are fully branched for 4.4, please prepare your operator to supply a 4.4 bundle, so that 4.4 operator publishing works and doesn't overwrite 4.3 bundles. This means at least updating the package.yaml under https://github.com/openshift/elasticsearch-operator/tree/master/manifests
@jcantrill @richm

Unable to launch a Kibana CR instance using latest ElasticSearch operator, Steps needed

Describe the bug
Unable to launch a Kibana CR instance using latest ElasticSearch operator

Environment

OCP 4.5

Logs
We don't see any Error messages neither any resources created.

Expected behavior
Kibana CR instance created successfully and connected to Elastic instance and able to login via route

Actual behavior
Nothing happens, no pods, no secrets get created.

To Reproduce
Steps to reproduce the behavior:

Install ElasticSearch Operator in namespace
Create a ElasticSearch CR instance and then Kibana CR instance using the default CR
ElasticSearch pods get launched, but KIbana pods dont get launched.
No Error or anything to debug

Additional context
YAML File of Kibana CR instance:

apiVersion: logging.openshift.io/v1
kind: Kibana
metadata:
  name: kibana
  namespace: xyz
spec:
  replicas: 1
  resources:
    limits:
      memory: 512Mi
    requests:
      memory: 512Mi
  managementState: Managed
  nodeSelector: {}

memoryleak

Describe the bug
the operator seems to leak memory:
from top:

32830 1000780+ 20 0 16.6g 14.0g 8592 S 14.8 44.7 916:22.85 elasticsearch-o

RSS 14g should not be needed

Environment

image: registry.redhat.io/openshift-logging/cluster-logging-rhel8-operator@sha256:c39216ac4d18f40b793aeea9b9ce2ee98118526cc3e7422b6721c961590a18c3

OpenShift Elasticsearch Operator 5.1.0-96 provided by Red Hat
ClusterLogging instance [1]
ocp 4.7.23

Logs
N/A

Expected behavior
Should chill and not use so much memory

Actual behavior
Leaks, also no resource settings on the operator pod, so that it can occupy the whole box and influence cluster stability. I suggest setting some sensible defaults, this would make k8s kill the pod and it would come up fresh, which is preferrable

To Reproduce
Steps to reproduce the behavior:

just let it run

Additional context
[1]

apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"logging.openshift.io/v1","kind":"ClusterLogging","metadata":{"annotations":{"kustomize.toolkit.fluxcd.io/checksum":"b1741967b24e5507a83f60074cc06a6aa67aa9a0"},"labels":{"kustomize.toolkit.fluxcd.io/name":"flux-system","kustomize.toolkit.fluxcd.io/namespace":"flux-system"},"name":"instance","namespace":"openshift-logging"},"spec":{"collection":{"logs":{"fluentd":{"resources":{"limits":{"memory":"2Gi"},"requests":{"cpu":"100m","memory":"1Gi"}},"tolerations":[{"operator":"Exists"}]},"type":"fluentd"}},"logStore":{"elasticsearch":{"nodeCount":3,"nodeSelector":{"node-role.kubernetes.io/cluster-logging":""},"proxy":{"resources":{"limits":{"memory":"256Mi"},"requests":{"memory":"256Mi"}}},"redundancyPolicy":"SingleRedundancy","resources":{"limits":{"memory":"48Gi"},"requests":{"cpu":2,"memory":"32Gi"}},"storage":{"size":"2Ti","storageClassName":"openebs-local"},"tolerations":[{"effect":"NoExecute","key":"logging","operator":"Exists"}]},"retentionPolicy":{"application":{"maxAge":"14d"},"audit":{"maxAge":"14d"},"infra":{"maxAge":"14d"}},"type":"elasticsearch"},"managementState":"Managed","visualization":{"kibana":{"nodeSelector":{"node-role.kubernetes.io/cluster-logging":""},"replicas":1,"tolerations":[{"effect":"NoExecute","key":"logging","operator":"Exists"}]},"type":"kibana"}}}
    kustomize.toolkit.fluxcd.io/checksum: b1741967b24e5507a83f60074cc06a6aa67aa9a0
  creationTimestamp: "2021-04-26T10:20:23Z"
  generation: 5
  labels:
    kustomize.toolkit.fluxcd.io/name: flux-system
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  name: instance
  namespace: openshift-logging
  resourceVersion: "642854974"
  selfLink: /apis/logging.openshift.io/v1/namespaces/openshift-logging/clusterloggings/instance
  uid: 9576cbb7-39ec-462f-bfd6-e5688103233c
spec:
  collection:
    logs:
      fluentd:
        resources:
          limits:
            memory: 2Gi
          requests:
            cpu: 100m
            memory: 1Gi
        tolerations:
        - operator: Exists
      type: fluentd
  logStore:
    elasticsearch:
      nodeCount: 3
      nodeSelector:
        node-role.kubernetes.io/cluster-logging: ""
      proxy:
        resources:
          limits:
            memory: 256Mi
          requests:
            memory: 256Mi
      redundancyPolicy: SingleRedundancy
      resources:
        limits:
          memory: 48Gi
        requests:
          cpu: 2
          memory: 32Gi
      storage:
        size: 2Ti
        storageClassName: openebs-local
      tolerations:
      - effect: NoExecute
        key: logging
        operator: Exists
    retentionPolicy:
      application:
        maxAge: 14d
      audit:
        maxAge: 14d
      infra:
        maxAge: 14d
    type: elasticsearch
  managementState: Managed
  visualization:
    kibana:
      nodeSelector:
        node-role.kubernetes.io/cluster-logging: ""
      replicas: 1
      tolerations:
      - effect: NoExecute
        key: logging
        operator: Exists
    type: kibana
status:
  clusterConditions:
  - lastTransitionTime: "2021-04-26T10:21:32Z"
    status: "False"
    type: CollectorDeadEnd
  - lastTransitionTime: "2021-07-25T22:13:53Z"
    message: curator is deprecated in favor of defining retention policy
    reason: ResourceDeprecated
    status: "True"
    type: CuratorRemoved
  collection:
    logs:
      fluentdStatus:
        daemonSet: fluentd
        nodes:
          fluentd-274lv: ilt-ksx-r-c01ocp01
          fluentd-2qqhm: alt-ksx-r-c01oco03
          fluentd-6jskd: ilt-ksx-r-c01ocp03
          fluentd-78gtj: alt-eos-r-c01oco03
          fluentd-b72tv: alt-ksx-r-c01oco04
          fluentd-ddp66: alt-ksx-r-c01oco02
          fluentd-flrmc: alt-eos-r-c01oco02
          fluentd-hkpbp: alt-ebs-r-c01oco01
          fluentd-jgxd6: alt-ksx-r-c01oco01
          fluentd-qsd24: ilt-ksx-r-c01ocp02
          fluentd-t9pjn: alt-eos-r-c01oco01
          fluentd-tzptt: alt-ksx-r-c01oco06
          fluentd-xwp7r: alt-ksx-r-c01oco05
        pods:
          failed: []
          notReady: []
          ready:
          - fluentd-274lv
          - fluentd-2qqhm
          - fluentd-6jskd
          - fluentd-78gtj
          - fluentd-b72tv
          - fluentd-ddp66
          - fluentd-flrmc
          - fluentd-hkpbp
          - fluentd-jgxd6
          - fluentd-qsd24
          - fluentd-t9pjn
          - fluentd-tzptt
          - fluentd-xwp7r
  curation:
    curatorStatus:
    - clusterCondition:
        curator-1620531000-nqd4g:
        - lastTransitionTime: "2021-05-09T03:30:12Z"
          reason: Completed
          status: "True"
          type: ContainerTerminated
      cronJobs: curator
      schedules: 30 3 * * *
      suspended: false
  logStore:
    elasticsearchStatus:
    - cluster:
        activePrimaryShards: 255
        activeShards: 510
        initializingShards: 0
        numDataNodes: 3
        numNodes: 3
        pendingTasks: 0
        relocatingShards: 0
        status: green
        unassignedShards: 0
      clusterName: elasticsearch
      nodeConditions:
        elasticsearch-cdm-jzvuahid-1: []
        elasticsearch-cdm-jzvuahid-2: []
        elasticsearch-cdm-jzvuahid-3: []
      nodeCount: 3
      pods:
        client:
          failed: []
          notReady: []
          ready:
          - elasticsearch-cdm-jzvuahid-1-54878cd66c-pfx9s
          - elasticsearch-cdm-jzvuahid-2-799cd6d9c6-wj8n8
          - elasticsearch-cdm-jzvuahid-3-7dbc9d84c7-nmlkf
        data:
          failed: []
          notReady: []
          ready:
          - elasticsearch-cdm-jzvuahid-1-54878cd66c-pfx9s
          - elasticsearch-cdm-jzvuahid-2-799cd6d9c6-wj8n8
          - elasticsearch-cdm-jzvuahid-3-7dbc9d84c7-nmlkf
        master:
          failed: []
          notReady: []
          ready:
          - elasticsearch-cdm-jzvuahid-1-54878cd66c-pfx9s
          - elasticsearch-cdm-jzvuahid-2-799cd6d9c6-wj8n8
          - elasticsearch-cdm-jzvuahid-3-7dbc9d84c7-nmlkf
      shardAllocationEnabled: all
  visualization:
    kibanaStatus:
    - deployment: kibana
      pods:
        failed: []
        notReady: []
        ready:
        - kibana-846cfd9479-74tmj
      replicaSets:
      - kibana-846cfd9479
      replicas: 1

Lack of fsGroup in ClusterLogging CRD.

Hey,
It looks like CRD of ClusterLogging does not allow us to define fsGroup (https://docs.openshift.com/enterprise/3.1/install_config/persistent_storage/pod_security_context.html#fsgroup).

It huge gap due to the fact that by default CSI creates volume with root as owner whereas the user used inside elasticsearch container is created with uid: 1000. It means that all clusters, deployed with storageClass provided by CSI will raise the following error:

main ERROR Unable to create file
/elasticsearch/persistent/elasticsearch/logs/elasticsearch.log java.io.IOException: Could not create directory /elasticsearch/persistent/elasticsearch/logs

due to the insufficient privileges on PV.

Deployments are not removed when the CR's are removed

Playing around with cluster logging it doesnt appear the Deployments get removed when the CR gets removed. Need to confirm

Kibana POD is not getting created in OCP 4.6 cluster (Elasticsearch instance + POD got created, Kibana instance created but POD is not created)

Describe the bug
In Openshift 4.6 cluster and created a project called "ext-elasticsearch". Using operatorhub (OCP web Console --> operators --> operatorHub) installed "Elasticsearch Operator 4.6.0-202010311441.p0 provided by Red Hat, Inc".

After installing Elasticsearch Operator 4.6.0, i created instances of Elasticsearch and Kibana.

Created ElasticSearch Instance (OCP web Console --> Installed Operator --> Project: ext-elasticsearch)

And able to verify that Elasticsearch instance is created and elasticsearch POD is Running

Issue Created Kibana Instance (OCP web Console --> Installed Operator --> Project: ext-elasticsearch)

Kibana instance is created but POD is not created

Environment

Openshift version is 4.6.1
Elasticsearch operato version is Elasticsearch Operator 4.6.0-202010311441.p0 provided by Red Hat, Inc

Logs of oc get all

$ oc get all

NAME                                                READY   STATUS    RESTARTS   AGE
pod/elasticsearch-cdm-plj1ai36-1-5dd4557c6c-744zq   2/2     Running   0          8m13s

NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service/elasticsearch           ClusterIP   172.25.203.46    <none>        9200/TCP    8m18s
service/elasticsearch-cluster   ClusterIP   172.25.39.99     <none>        9300/TCP    8m19s
service/elasticsearch-metrics   ClusterIP   172.25.110.229   <none>        60001/TCP   8m18s
service/kibana                  ClusterIP   172.25.31.191    <none>        443/TCP     4m48s

NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/elasticsearch-cdm-plj1ai36-1   1/1     1            1           8m16s

NAME                                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/elasticsearch-cdm-plj1ai36-1-5dd4557c6c   1         1         1       8m16s

NAME                              HOST/PORT                                   PATH   SERVICES   PORT    TERMINATION          WILDCARD
route.route.openshift.io/kibana   kibana-ext-elasticsearch.apps-crc.testing          kibana     <all>   reencrypt/Redirect   None

Expected behavior
Kibana POD should be running in namespace "ext-elasticsearch"

Actual behavior
Kibana POD is not running in namespace "ext-elasticsearch"

To Reproduce
Steps to reproduce the behavior:

Install "Elasticsearch Operator 4.6.0-202010311441.p0 provided by Red Hat, Inc" (selecting all namespace option)
Create a project called "ext-elasticsearch"
Create instance of elasticsearch using web console
i. OCP web console --> Administrator --> Operators --> Installed Operators --> project: ext-elasticsearch
ii. Click Create Instance of Elasticsearch and proceed further
Create instance of Kibana using web console
i. OCP web console --> Administrator --> Operators --> Installed Operators --> project: ext-elasticsearch
ii. Click Create Instance of Kibana and proceed further

Additional context

Installed elasticsearch operator
Created Instance of ElasticSearch + Kibana
But Kibana POD is not created

Avoid copying all file to the image

In current Dockerfile, the entire directory is copied to the image by copy . .., which will rebuild the image unnecessarily caused by change made to files under ./test, './hack'.

Add Curator CronJob in ES operator to manage indices of ES

When using ES(elasticsearch) on production env, we consider how to rotate the indices of ES for maintenance. I glanced this repo, Curator is not included. So I think ES operator should include Curator CronJobs, it's reasonable.

If you have workaround of this or some plan, such as providing curator-operator or manual configuration steps of curator job, please close this issue kindly.

Thanks.

Running elasticsearch on CRC

Hi, I am having an issue of bringing up elasticsearch on an CRC cluster with the operator. The es instance is configured to have 2G memory. However, the container keeps failing due to [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

CRC version

CodeReady Containers version: 1.17.0+99f5c87
OpenShift version: 4.5.14 (embedded in binary)

The pod description shows

    Limits:
      memory:  2Gi
    Requests:
      cpu:      100m
      memory:   2Gi
    Readiness:  exec [/usr/share/elasticsearch/probe/readiness.sh] delay=10s timeout=30s period=5s #success=1 #failure=3
    Limits:
      memory:  64Mi
    Requests:
      cpu:     100m
      memory:  64Mi
Collapse

It looks to me that the setting of 2Gi should overwrite the default 64M. But why the container still failed? Thanks.

Initial Tasks to Make This Operator more Generic

Following are an initial set of tasks to make this operator more generic for consumption outside the context of OK cluster logging. This list is by no means exhaustive and additions are welcome.

permission required to write to index

I'm writing to elastic using a serviceaccount. Naturally I need to grant permissions to that service account or else I get:

'no permissions for [indices:data/write/update] and User [name=system:serviceaccount:logs:elasticseldon, roles=[gen_project_operations, gen_user_29547d51e00776c5acf0bed34e1ba5bb9736a4ef, gen_kibana_29547d51e00776c5acf0bed34e1ba5bb9736a4ef]]'

I can write if I grant the serviceaccount cluster-admin with oc adm policy add-cluster-role-to-user cluster-admin system:serviceaccount:logs:elasticseldon.

Am looking for a narrower role that I can allocate instead. I've tried variations on oc adm policy add-role-to-user -n logs indices:write system:serviceaccount:logs:elasticseldon but not found one that enables me to write. Any ideas on this? Apologies if the answer is obvious to someone who knows this permissions model better than I do.

Unable to create new elasticsearch using default settings

When I try to create a new elasticsearch using the default settings on the OpenShift form from the catalog, the operation fails. The following messages are displayed:

Elasticsearch Client Status
  The field status.pods.client is invalid
Elasticsearch Data Status
  The field status.pods.data is invalid
Elasticsearch Master Status
  The field status.pods.master is invalid
...
Invalid SettingsWrong RedundancyPolicy selected. Choose different RedundancyPolicy or add more nodes with data roles

Here is the operator version installed:

Elasticsearch Operator
4.3.37-202009151447.p0 provided by Red Hat, Inc

Empty image error

Hi, we are creating ES CR without specifying image and we get following error:

time="2019-03-07T14:50:13Z" level=error msg="error syncing key (myproject/elasticsearch): Failed to reconcile Elasticsearch deployment spec: Unable to create Elasticsearch node: Could not create node resource: Deployment.apps "elasticsearch-clientdatamaster-0-1" is invalid: spec.template.spec.containers[0].image: Required value"

the operator was deployed via make deploy on 7fab5e3 (HEAD -> master, up/master) Merge pull request #91 from ewolinetz/e2e_test_fix

oc get elasticsearch elasticsearch -o yaml                                                                                                                   3:53 
apiVersion: logging.openshift.io/v1alpha1
kind: Elasticsearch
metadata:
  clusterName: ""
  creationTimestamp: 2019-03-07T14:50:10Z
  generation: 0
  labels:
    app: jaeger
    app.kubernetes.io/component: elasticsearch
    app.kubernetes.io/instance: simple-prod
    app.kubernetes.io/name: elasticsearch
    app.kubernetes.io/part-of: jaeger
    cluster-name: elasticsearch
  name: elasticsearch
  namespace: myproject
  ownerReferences:
  - apiVersion: jaegertracing.io/v1
    controller: true
    kind: Jaeger
    name: simple-prod
    uid: 4ea35642-40e8-11e9-af07-8c16456c84e7
  resourceVersion: "3658"
  selfLink: /apis/logging.openshift.io/v1alpha1/namespaces/myproject/elasticsearches/elasticsearch
  uid: 4eb1d4b8-40e8-11e9-af07-8c16456c84e7
spec:
  managementState: Managed
  nodeSpec:
    resources: {}
  nodes:
  - nodeCount: 1
    resources: {}
    roles:
    - client
    - data
    - master
    storage: {}
  redundancyPolicy: ""
status:
  clusterHealth: ""
  conditions:
  - lastTransitionTime: 2019-03-07T14:50:12Z
    status: "True"
    type: ScalingUp
  - lastTransitionTime: 2019-03-07T14:50:13Z
    message: Config Map is different
    reason: ConfigChange
    status: "True"
    type: UpdatingSettings
  nodes: null
  pods: null
  shardAllocationEnabled: ""

oc logs po/elasticsearch-operator-6cf7579b6b-bdq7l -n openshift-logging                                                                                      3:50 
time="2019-03-07T14:50:03Z" level=info msg="Go Version: go1.10.3"
time="2019-03-07T14:50:03Z" level=info msg="Go OS/Arch: linux/amd64"
time="2019-03-07T14:50:03Z" level=info msg="operator-sdk Version: 0.0.7"
time="2019-03-07T14:50:03Z" level=info msg="Watching logging.openshift.io/v1alpha1, Elasticsearch, , 5000000000"
time="2019-03-07T14:50:13Z" level=info msg="Constructing new resource elasticsearch-clientdatamaster-0-1"
time="2019-03-07T14:50:13Z" level=error msg="error syncing key (myproject/elasticsearch): Failed to reconcile Elasticsearch deployment spec: Unable to create Elasticsearch node: Could not create node resource: Deployment.apps \"elasticsearch-clientdatamaster-0-1\" is invalid: spec.template.spec.containers[0].image: Required value"
time="2019-03-07T14:50:20Z" level=info msg="Constructing new resource elasticsearch-clientdatamaster-0-1"
time="2019-03-07T14:50:20Z" level=error msg="error syncing key (myproject/elasticsearch): Failed to reconcile Elasticsearch deployment spec: Unable to create Elasticsearch node: Could not create node resource: Deployment.apps \"elasticsearch-clientdatamaster-0-1\" is invalid: spec.template.spec.containers[0].image: Required value"
time="2019-03-07T14:50:26Z" level=info msg="Constructing new resource elasticsearch-clientdatamaster-0-1"
time="2019-03-07T14:50:26Z" level=error msg="error syncing key (myproject/elasticsearch): Failed to reconcile Elasticsearch deployment spec: Unable to create Elasticsearch node: Could not create node resource: Deployment.apps \"elasticsearch-clientdatamaster-0-1\" is invalid: spec.template.spec.containers[0].image: Required value"
time="2019-03-07T14:50:33Z" level=info msg="Constructing new resource elasticsearch-clientdatamaster-0-1"
time="2019-03-07T14:50:33Z" level=error msg="error syncing key (myproject/elasticsearch): Failed to reconcile Elasticsearch deployment spec: Unable to create Elasticsearch node: Could not create node resource: Deployment.apps \"elasticsearch-clientdatamaster-0-1\" is invalid: spec.template.spec.containers[0].image: Required value"
time="2019-03-07T14:50:39Z" level=info msg="Constructing new resource elasticsearch-clientdatamaster-0-1"
time="2019-03-07T14:50:40Z" level=error msg="error syncing key (myproject/elasticsearch): Failed to reconcile Elasticsearch deployment spec: Unable to create Elasticsearch node: Could not create node resource: Deployment.apps \"elasticsearch-clientdatamaster-0-1\" is invalid: spec.template.spec.containers[0].image: Required value"

oc describe deploy/elasticsearch-operator  -n openshift-logging                                                                                              3:51 
Name:                   elasticsearch-operator
Namespace:              openshift-logging
CreationTimestamp:      Thu, 07 Mar 2019 15:50:02 +0100
Labels:                 name=elasticsearch-operator
Annotations:            deployment.kubernetes.io/revision=1
Selector:               name=elasticsearch-operator
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           name=elasticsearch-operator
  Service Account:  elasticsearch-operator
  Containers:
   elasticsearch-operator:
    Image:  quay.io/openshift/origin-elasticsearch-operator:latest
    Port:   60000/TCP
    Command:
      elasticsearch-operator
    Environment:
      WATCH_NAMESPACE:  
      OPERATOR_NAME:    elasticsearch-operator
      PROXY_IMAGE:      quay.io/openshift/origin-oauth-proxy:v4.0.0
    Mounts:             <none>
  Volumes:              <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   elasticsearch-operator-6cf7579b6b (1/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  59s   deployment-controller  Scaled up replica set elasticsearch-operator-6cf7579b6b to 1

Is there is a way to specify "ImagePullPolicy"?

Is there is a way to specify "ImagePullPolicy" for elasticsearch image and proxy image on the operator cr file?

Problem: if we use quay.io/openshift/origin-logging-elasticsearch5:latest, it is not pulling a new image if there is update found. since by default, it is marked as imagePullPolicy: IfNotPresent

elastic-search pod not starting up

I have openshift 4.8 running on AWS cloud with 9 worker nodes. After successfully installing the ElasticSearch and Redhat EFK operator. All 3 elastic search pods are staying in the pending state forever. I see the following error in the events tab
“0/15 nodes are available: 12 Insufficient memory, 2 Insufficient cpu, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.”

Also added 3 worker nodes to the cluster and did retry the installation process. Still getting the above error.

Environment
Openshift Cluster version : 4.8.13
Redhat Openshift logging version : 5.2.2-21
Redhat ElasticSearch version: 5.2.2-21

Logs

Expected behavior
The elasticsearch pod should be up and running.

Steps to reproduce the behavior:

Install the Redhat Elasticsearch operator from the openshift console.
Verify the operator installation was successful
Install the Redhat openshift logging operator from the openshift console.
Verify the operator is installed successfully.
created the cluster logging instance. please see the attachment for a copy of the clusterlogging.yaml file

*Additional context
clusterlogging-instance.txt
*

Kibana server is not ready after updating Logging and ES operators to 4.5

After upgrading an OCP 4.4.16 to 4.5.5 and switching (and updating) Elasticsearch and Logging operators to 4.5 channels I see the following when try to open Kibana from the web console:

This is what is in the logs:

  log   [21:01:02.725] [info][migrations] Creating index .kibana_7.
  log   [21:01:02.743] [warning][migrations] Another Kibana instance appears to be migrating the index. Waiting for that migration to complete. If no other Kibana instance is attempting migrations, you can get past this message by deleting index .kibana_7 and restarting Kibana.

search queries timing out - possible setup issue

I'm port-forwarding to the elastic and am able to perform get requests to particular docs and posts to create docs. But when search using _search my queries all time out. I'm running queries that I know match to the docs that I inserted. I've also tried basic example queries like

{
    "query": {
        "query_string" : {
            "query" : "(new york city) OR (big apple)",
            "default_field" : "content"
        }
    }
}

I can't find any errors in the logs and I've not got a timeout in postman so it must be timing out server-side. I've tried to get to the kibana UI but I get an auth error when I port-forward to it and I don't have any Logging option under Monitoring in the openshift console.

Perhaps I don't have the kibana option because I installed to a custom namespace called 'logs' rather than 'openshift-logging' as the guide suggests.

I did reduce the memory allocation so I'll try again with more resource. But I don't see anything in the logs when I run queries.

Is https://docs.openshift.com/container-platform/4.1/logging/efk-logging-deploying.html the intended way to setup the operator? I suspect I'm deviating from intentions by putting things in different namespaces and playing with resources but just want to double-check this.

OLM doesn't trigger CSV update even though catalog has newer version

Describe the bug
The elasticsearch-operator CSV is stuck at "AtLatestKnown" version of 5.0.0-65 even though the catalog reports version 5.0.2-18 as latest available.

Environment

version of OpenShift: 4.7.0-0.okd-2021-03-28-152009
version of Cluster Logging: cluster-logging.5.0.2-18
version of Elasticsearch Operator: elasticsearch-operator.5.0.0-65
ClusterLogging instance: managementState: Managed, elasticsearchStatus: status: green

Expected behavior
The elasticsearch-operator CSV would update to the latest version.

Actual behavior
CSV reports it's "AtLatestKnown" version.

Additional context
Not sure what to check next. The OLM doesn't seem to report any issues. This seems to have happened after a 4.6->4.7 upgrade but went unnoticed as back then there were no newer versions.

[andrei@andrei-nb:~]$ oc get csv -n openshift-logging
NAME                              DISPLAY                            VERSION    REPLACES                                       PHASE
cluster-logging.5.0.2-18          Red Hat OpenShift Logging          5.0.2-18   cluster-logging.5.0.1-23                       Succeeded
elasticsearch-operator.5.0.0-65   OpenShift Elasticsearch Operator   5.0.0-65   elasticsearch-operator.4.6.0-202103060018.p0   Succeeded
[andrei@andrei-nb:~]$ oc get clusterloggings -n openshift-logging
NAME       MANAGEMENT STATE
instance   Managed
[andrei@andrei-nb:~]$ oc get elasticsearch -n openshift-logging
NAME            MANAGEMENT STATE   HEALTH   NODES   DATA NODES   SHARD ALLOCATION   INDEX MANAGEMENT
elasticsearch   Managed            green    3       3            all                
[andrei@andrei-nb:~]$ oc get kibanas -n openshift-logging
NAME     MANAGEMENT STATE   REPLICAS
kibana   Managed            2

certificates error with default config

I'm running openshift 4.3.9 and have installed the elasitcsearch operator through the embedded OperatorHub UI. When I go through the operator page under 'Installed Operators' in OpenShift, I have the option to create an Elasticsearch instance and it gives me a default configuration:

apiVersion: logging.openshift.io/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
  namespace: openshift-operators
spec:
  managementState: Managed
  nodeSpec:
    image: >-
      registry.redhat.io/openshift4/ose-logging-elasticsearch5@sha256:d2047214be2e9c809440803ccf5972d99e72db3172f110e4be3d4b87550b9902
    resources:
      limits:
        memory: 1Gi
      requests:
        memory: 512Mi
  redundancyPolicy: SingleRedundancy
  nodes:
    - nodeCount: 1
      roles:
        - client
        - data
        - master

I had to change SingleRedundancy to ZeroRedundancy as with just one node it resulted in an error in the operator saying SingleRedundancy was invalid. After that change it created an elastic but the Pod gets stuck in ContainerCreating due to a Certificate problem:

  Warning  FailedMount  10m (x3 over 31m)     kubelet, ip-10-0-168-54.eu-west-3.compute.internal  Unable to attach or mount volumes: unmounted volumes=[certificates], unattached volumes=[elasticsearch-metrics elasticsearch-storage elasticsearch-config certificates elasticsearch-token-6vssr]: timed out waiting for the condition
  Warning  FailedMount  2m39s (x23 over 33m)  kubelet, ip-10-0-168-54.eu-west-3.compute.internal  MountVolume.SetUp failed for volume "certificates" : secret "elasticsearch" not found

Remove proxy spec

Since the removal of the need for the ES proxy, we should remove the option to configure it via the operator as well.

Fix hardcoded es operator image name in openshift deployment template

I think we need to get rid of hardcoded value in deploy/openshift/elasticsearch-template.yaml:

elasticsearch-operator/deploy/openshift/elasticsearch-template.yaml

Line 28 in d185999

image: t0ffel/es-operator

Currently it is set to image: t0ffel/es-operator and if I understand correctly to make this work it is required to manually modify Makefile as it does not set $(DOCKER_TAG) from outside:

elasticsearch-operator/Makefile

Line 41 in d185999

@docker build -t $(DOCKER_TAG) . $(DOCKER_OPTS)

Index creation permission

Hello

I'm trying to integrate Elastalert with OCP 4.5 Cluster Logging. This requires permission to get/create indexes. I've created SA with cluster-admin role but I'm still not allowed to access indexes:

[2021-01-07T14:54:21,035][INFO ][c.a.o.s.p.PrivilegesEvaluator] [elasticsearch-cdm-luzbzv0n-1] No index-level perm match for User [name=system:serviceaccount:openshift-elastalert:elastalert, roles=[admin_reader], requestedTenant=null] Resolved [aliases=[], indices=[elastalert_status], allIndices=[elastalert_status], types=[*], originalRequested=[elastalert_status], remoteIndices=[]] [Action [indices:admin/get]] [RolesChecked [admin_user]]
[2021-01-07T14:54:21,035][INFO ][c.a.o.s.p.PrivilegesEvaluator] [elasticsearch-cdm-luzbzv0n-1] No permissions for [indices:admin/get]

Please advice how and what permission should I grant.

Should we continue to reconcile in spite of failures?

Posing the question if in spite of some failed reconciliations if we should continue to with others[1]. Elasticsearch will never deploy even though if these [1] bits are not available on the cluster, the ES cluster could still function. IMO, we should consider specifying some error condition in the status but this should not block the ES cluster from starting. It is still functional but just doesn't have the appropriate objects in place to be scraped. Maybe a warning message is more appropriate here.

[1] https://github.com/openshift/elasticsearch-operator/blob/master/pkg/stub/handler.go#L57-L65

Unable to read /etc/elasticsearch/secret/searchguard-key.p12

Followed the guide https://docs.openshift.com/container-platform/4.6/logging/cluster-logging-deploying.html to install ClusterLogging CRD which creates an ElasticSearch CRD on Openshift 4.6.

The pod elasticsearch-cdm fails to start reporting:

$ oc logs elasticsearch-cdm-e09hxm8j-1-b89556454-djqld -c elasticsearch
[2021-08-09 13:29:31,046][INFO ][container.run            ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof
[2021-08-09 13:29:31,049][INFO ][container.run            ] Checking if Elasticsearch is ready
[2021-08-09 13:29:31,049][INFO ][container.run            ] ES_JAVA_OPTS: ' -Xms1024m -Xmx1024m -XX:HeapDumpPath=/elasticsearch/persistent/heapdump.hprof -Xlog:gc*,gc+age=trace,safepoint:file=/elasticsearch/persistent/elasticsearch/logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m -XX:ErrorFile=/elasticsearch/persistent/elasticsearch/logs/error.log'
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
OpenJDK 64-Bit Server VM warning: Option MaxRAMFraction was deprecated in version 10.0 and will likely be removed in a future release.
OpenJDK 64-Bit Server VM warning: Option InitialRAMFraction was deprecated in version 10.0 and will likely be removed in a future release.
OpenJDK 64-Bit Server VM warning: Option MinRAMFraction was deprecated in version 10.0 and will likely be removed in a future release.
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2021-08-09 13:29:33,267 main ERROR No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
13:29:35.016 [main] ERROR org.elasticsearch.bootstrap.Bootstrap - Exception
java.lang.IllegalStateException: failed to load plugin class [com.amazon.opendistroforelasticsearch.security.OpenDistroSecurityPlugin]
	at org.elasticsearch.plugins.PluginsService.loadPlugin(PluginsService.java:614) ~[elasticsearch-6.8.1.redhat-00007.jar:6.8.1.redhat-00007]
	at org.elasticsearch.plugins.PluginsService.loadBundle(PluginsService.java:556) ~[elasticsearch-6.8.1.redhat-00007.jar:6.8.1.redhat-00007]
	at org.elasticsearch.plugins.PluginsService.loadBundles(PluginsService.java:471) ~[elasticsearch-6.8.1.redhat-00007.jar:6.8.1.redhat-00007]
	at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:163) ~[elasticsearch-6.8.1.redhat-00007.jar:6.8.1.redhat-00007]
	at org.elasticsearch.node.Node.<init>(Node.java:339) ~[elasticsearch-6.8.1.redhat-00007.jar:6.8.1.redhat-00007]
	at org.elasticsearch.node.Node.<init>(Node.java:266) ~[elasticsearch-6.8.1.redhat-00007.jar:6.8.1.redhat-00007]
	at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:212) ~[elasticsearch-6.8.1.redhat-00007.jar:6.8.1.redhat-00007]
	at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:212) ~[elasticsearch-6.8.1.redhat-00007.jar:6.8.1.redhat-00007]
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:333) ~[elasticsearch-6.8.1.redhat-00007.jar:6.8.1.redhat-00007]
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159) ~[elasticsearch-6.8.1.redhat-00007.jar:6.8.1.redhat-00007]
	at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:150) ~[elasticsearch-6.8.1.redhat-00007.jar:6.8.1.redhat-00007]
	at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-6.8.1.redhat-00007.jar:6.8.1.redhat-00007]
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-6.8.1.redhat-00007.jar:6.8.1.redhat-00007]
	at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-6.8.1.redhat-00007.jar:6.8.1.redhat-00007]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:116) ~[elasticsearch-6.8.1.redhat-00007.jar:6.8.1.redhat-00007]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:93) ~[elasticsearch-6.8.1.redhat-00007.jar:6.8.1.redhat-00007]
Caused by: java.lang.reflect.InvocationTargetException
	at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
	at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:?]
	at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:490) ~[?:?]
	at org.elasticsearch.plugins.PluginsService.loadPlugin(PluginsService.java:605) ~[elasticsearch-6.8.1.redhat-00007.jar:6.8.1.redhat-00007]
	... 15 more
Caused by: org.elasticsearch.ElasticsearchException: Unable to read /etc/elasticsearch/secret/searchguard-key.p12 (/etc/elasticsearch/secret/searchguard-key.p12). Please make sure this files exists and is readable regarding to permissions. Property: opendistro_security.ssl.transport.keystore_filepath
	at com.amazon.opendistroforelasticsearch.security.ssl.DefaultOpenDistroSecurityKeyStore.checkPath(DefaultOpenDistroSecurityKeyStore.java:920) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.ssl.DefaultOpenDistroSecurityKeyStore.resolve(DefaultOpenDistroSecurityKeyStore.java:215) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.ssl.DefaultOpenDistroSecurityKeyStore.initTransportSSLConfig(DefaultOpenDistroSecurityKeyStore.java:257) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.ssl.DefaultOpenDistroSecurityKeyStore.initSSLConfig(DefaultOpenDistroSecurityKeyStore.java:236) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.ssl.DefaultOpenDistroSecurityKeyStore.<init>(DefaultOpenDistroSecurityKeyStore.java:156) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.ssl.OpenDistroSecuritySSLPlugin.<init>(OpenDistroSecuritySSLPlugin.java:216) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.OpenDistroSecurityPlugin.<init>(OpenDistroSecurityPlugin.java:231) ~[?:?]
	at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
	at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:?]
	at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:490) ~[?:?]
	at org.elasticsearch.plugins.PluginsService.loadPlugin(PluginsService.java:605) ~[elasticsearch-6.8.1.redhat-00007.jar:6.8.1.redhat-00007]
	... 15 more

Can't install Elasticsearch Operator v4.5 on CodeReady Containers from OperatorHub!

I'm trying to install Elasticsearch Operator (v4.5) in CodeReady Containers.
But I'm having issues... The install fails:

After some investigation I found that I can't pull the image that is needed from elasticsearch-operator deployment:

With 4.4 version everything is ok and I can pull the image.
But I would try OpenShift Service Mesh and, if I'm not wrong, version 4.5 of Elasticsearch Operator is needed.

Configure logger

What about logging in JSON instead of pattern?
Now we log eg:

INFO[1104] cluster elastic1 required action is: ClusterOK

Configuring logrus is easy, see: https://github.com/josefkarasek/formatted-logs/blob/master/main.go#L17

openshift / elasticsearch-operator Goto Github PK

elasticsearch-operator's Introduction

elasticsearch-operator

Why Use An Operator?

elasticsearch-operator's People

Contributors

Stargazers

Watchers

Forkers

elasticsearch-operator's Issues

Recommend Projects

Recommend Topics

Recommend Org