
cluster-ingress-operator's People

Contributors

abhinavdahiya, alebedev87, arkadeepsen, brandisher, candita, csrwng, danehans, deads2k, frobware, gcs278, imcsk8, imiller0, ironcladlou, karthik-k-n, lmzuccarelli, lyt99, miciah, miheer, omertuc, openshift-ci[bot], openshift-merge-bot[bot], openshift-merge-robot, r4f4, ramr, rfredette, sadasu, sgreene570, smarterclayton, staebler, suleymanakbas91


cluster-ingress-operator's Issues

[RFE] Discover number of router replicas based on the number of workers

Currently, in bare-metal environments composed of 3 masters + 1 worker, the number of router replicas is set to 2, so one of the router replicas stays in 'Pending' forever. The current workaround is to patch the ingresscontroller:

oc patch -n openshift-ingress-operator ingresscontroller/default --patch '{"spec":{"replicas": 1}}' --type=merge

While I agree that having 2 routers is best practice, it would be nice for that kind of deployment (3+1 on bare metal) to be able to deploy with a single router and then, when more workers are added, increase the number of replicas.

invalid state in clusteroperator

openshift/installer#1010 reported a case where the credentials provided to the cluster expired before the operator could create the DNS entry for apps. The credential expiry itself is the installer's fault.

The operator was reporting Available: true even though it clearly had not completed its job:

$ oc get clusteroperator openshift-ingress-operator -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: 2019-01-07T19:18:40Z
  generation: 1
  name: openshift-ingress-operator
  resourceVersion: "9771"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/openshift-ingress-operator
  uid: 0a8bc0ad-12b1-11e9-b538-0a63a0bb21a6
spec: {}
status:
  conditions:
  - lastTransitionTime: 2019-01-07T19:18:40Z
    status: "False"
    type: Failing
  - lastTransitionTime: 2019-01-07T19:18:40Z
    status: "False"
    type: Progressing
  - lastTransitionTime: 2019-01-07T19:19:14Z
    status: "True"
    type: Available
  extension: null
  version: ""

As seen in the logs at https://github.com/openshift/installer/files/2734351/ingress-operator.log, the operator was clearly failing.

@openshift/sig-network-edge

Persistent IngressWithoutClassName alerts with no Ingresses present

We have an OKD 4.12 cluster with persistent and increasing IngressWithoutClassName alerts, even though no Ingresses are normally present in the cluster. I believe the Ingress being counted is created as part of the ACME validation process managed by the cert-manager operator with its OpenShift Route addon; those solver Ingresses are torn down once the ACME validation is complete.
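If the solver Ingresses are indeed what the alert is counting, one possible mitigation (a sketch only; the issuer name and class value are illustrative, not from this cluster) is to have cert-manager stamp an ingress class onto the short-lived solver Ingress so it never matches the alert:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt                 # illustrative name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-account-key
    solvers:
    - http01:
        ingress:
          # 'class' sets the legacy kubernetes.io/ingress.class annotation; newer
          # cert-manager releases also appear to accept 'ingressClassName' here,
          # which sets spec.ingressClassName (what the alert checks, I believe).
          class: openshift-default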

Ingress cluster operator version:

ingress 4.12.0-0.okd-2023-04-01-051724


For example, one of the alerts calls out cm-acme-http-solver-46jbt;

however:

❯ kubectl -n openshift-route-controller-manager get ingress cm-acme-http-solver-46jbt
Error from server (NotFound): ingresses.networking.k8s.io "cm-acme-http-solver-46jbt" not found

and in fact:

❯ kubectl -n openshift-route-controller-manager get ingress
No resources found in openshift-route-controller-manager namespace.

❯ kubectl get ingress --all-namespaces
No resources found

I couldn't find a place in Red Hat's Jira to post this, so forgive me if this is the wrong place; I just noticed that the IngressWithoutClassName alert originates from this repo.

Happy to provide any further information if needed or file this issue elsewhere if it's more appropriate.

Ingress Controller degraded when using a different domain for the external LB

Hi,

We are trying to set up the following infrastructure.

An internal cluster in AWS, meaning the API and apps routers use private subnets. Besides that, we want to provide a public router for exposing some services to the internet.

We tried the following:

Updating the cluster CRD:

spec:
  baseDomain: internal.foo.bar
  privateZone:
    tags:
      Name: cluster-n4sbw-int
      kubernetes.io/cluster/cluster-n4sbw: owned
  publicZone: # public.foo.bar
    tags:
      Name: cluster-n4sbw-external
      kubernetes.io/cluster/cluster-n4sbw-external: owned

Create another ingress controller:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: external
  namespace: openshift-ingress-operator
spec:
  domain: public.foo.bar
  endpointPublishingStrategy:
    loadBalancer:
      scope: External
    type: LoadBalancerService

This seems to work, and we are now able to attach public routes to our services, but the IngressController ends up in a degraded state with the following error:

Some ingresscontrollers are degraded: ingresscontroller "default" is degraded: DegradedConditions: One or more other status conditions indicate a degraded state: DNSReady=False (FailedZones: The record failed to provision in some zones: [{ map[Name:cluster-n4sbw-external kubernetes.io/cluster/luster-n4sbw-external:owned]}])

Removing the publicZone from the cluster CRD does not work, because then the operator creates the wildcard DNS entry for the public Ingress in the private hosted zone of Route 53, which is not resolvable from outside the VPC.

Is there a way to solve this problem?

We are using OpenShift 4.6.15.

[RFE] Be able to specify topologyKey for router deployment

The router deployment created by the operator uses the topology key "kubernetes.io/hostname" in its podAntiAffinity rule.

It would be great if we could override this key, e.g. with "topology.kubernetes.io/zone", so that router pods are spread across different zones instead of different hosts.
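A sketch of the anti-affinity term being requested, as it might appear in the rendered router deployment (the label selector shown is illustrative; the operator's actual selector uses a hashed deployment label):

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: topology.kubernetes.io/zone   # instead of kubernetes.io/hostname
      labelSelector:
        matchLabels:
          ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default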

endpointPublishingStrategy.type is ignored

I am trying to change the default ingresscontroller's endpointPublishingStrategy type to HostNetwork, since I want to manage the LB for the routers on Azure myself (I want to lock down ingress access to the cluster by changing Network Security Rules, and if the service type is LoadBalancer, which is the default, Kubernetes keeps changing the rules back to defaults).

Steps:
Once OCP is deployed, I edit ingresscontroller/default in the openshift-ingress-operator namespace and add:

spec:
  endpointPublishingStrategy:
    type: HostNetwork

However, the status stays:

  endpointPublishingStrategy:
    loadBalancer:
      scope: External
    type: LoadBalancerService

and the router-default service in openshift-ingress stays of type LoadBalancer:

$ oc get svc -n openshift-ingress -o wide
NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE       SELECTOR
router-default            LoadBalancer   172.30.111.225   *.*.*.*   80:32184/TCP,443:30618/TCP   15m       ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
router-internal-default   ClusterIP      172.30.91.243    <none>          80/TCP,443/TCP,1936/TCP      15m       ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default

Do I need to add anything else? Or is it not possible to change the default, so I need to create a new ingresscontroller? If so, how can I disable the default controller so it won't create a public-facing LB with open access to the cluster?
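For what it's worth, a workaround pattern I've seen (a sketch only; verify against the docs for your release) is to recreate the default ingresscontroller so the strategy is set at creation time, since endpointPublishingStrategy appears to be honored only when the ingresscontroller is created:

# Dump the current default ingresscontroller, then edit the file: remove the
# status section and set spec.endpointPublishingStrategy.type to HostNetwork.
oc -n openshift-ingress-operator get ingresscontroller/default -o yaml > default-ic.yaml

# Recreate it from the edited file (this briefly disrupts ingress).
oc replace --force --wait -f default-ic.yaml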

Installer version I am running is:

$ ./openshift-install version
./openshift-install unreleased-master-1359-g665a8608e5383e5b929b6fa8eb6b99da0b6b77d8
built from commit 665a8608e5383e5b929b6fa8eb6b99da0b6b77d8
release image registry.svc.ci.openshift.org/origin/release:4.2

Thanks!

Issue using private load balancer type for ingress operator on Azure in OCP 4.2

Description of problem:

Has anyone looked at deploying a private LB for the ingress router yet? See https://docs.openshift.com/container-platform/4.2/release_notes/ocp-4-2-release-notes.html#ocp-4-2-enable-ingress-controllers. It looks to work OK on AWS, but I'm getting issues on Azure.

It fails because the subnet it looks for does not exist:

cluster63-99qm4-vnet/cluster63-99qm4-node-subnet

The actual subnet that gets created in Azure is worker-subnet, not node-subnet; maybe a bug with naming standards?

Version-Release number of selected component (if applicable):

4.2.0 on Azure

How reproducible:

every time

Steps to Reproduce:

  1. Destroy the default ingress router and re-create it using https://docs.openshift.com/container-platform/4.2/release_notes/ocp-4-2-release-notes.html#ocp-4-2-enable-ingress-controllers

Actual results:

The Service for the internal load balancer sits pending with this error:

Events:
  Type     Reason                      Age                  From                Message
  ----     ------                      ----                 ----                -------
  Normal   EnsuringLoadBalancer        2m45s (x9 over 18m)  service-controller  Ensuring load balancer
  Warning  CreatingLoadBalancerFailed  2m45s (x9 over 18m)  service-controller  Error creating load balancer (will retry): failed to ensure load balancer for service openshift-ingress/router-default: ensure(openshift-ingress/router-default): lb(cluster63-99qm4-internal) - failed to get subnet: cluster63-99qm4-vnet/cluster63-99qm4-node-subnet

The subnet

cluster63-99qm4-vnet/cluster63-99qm4-node-subnet

does not exist. The subnets that get created are:

clustername-UID-worker-subnet

clustername-UID-master-subnet

Interestingly, the NSG for clustername-UID-worker-subnet is called clustername-UID-node-nsg.

Expected results:

The service starts correctly with an internal LB IP on Azure; I'm guessing this should be applied against clustername-UID-worker-subnet.

Additional info: This works OK on AWS; I'm just having issues with Azure. Also, if you re-label the subnet clustername-UID-worker-subnet to clustername-UID-node-subnet, the issue is resolved.
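For reference, the internal controller the linked procedure ends up creating looks roughly like this (a sketch; the name and scope follow the documented steps, everything else is defaulted):

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  endpointPublishingStrategy:
    type: LoadBalancerService
    loadBalancer:
      scope: Internal   # the LB whose creation fails against the node-subnet name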

Failed to pull images from the registry in Dockerfile.rhel7

I want to build the code on RHEL 7, but I fail to pull the images referenced in Dockerfile.rhel7.

$ docker pull registry.svc.ci.openshift.org/ocp/builder:golang-1.12

Trying to pull repository registry.svc.ci.openshift.org/ocp/builder ...
Pulling repository registry.svc.ci.openshift.org/ocp/builder
Error: image ocp/builder:golang-1.12 not found

$ docker pull registry.svc.ci.openshift.org/ocp/4.0:base

Trying to pull repository registry.svc.ci.openshift.org/ocp/4.0 ...
Pulling repository registry.svc.ci.openshift.org/ocp/4.0
Error: image ocp/4.0:base not found

Are those images ready?

Switching from AWS CLB to NLB for router-default

Hi.
I am not sure if this is the right place to ask this question, but I really need help.

I installed an OCP 4.5 private cluster on AWS.
During installation, I used AWS CloudFormation to create load balancers, and 2 LBs were created:

  • 1 NLB for api calls to master nodes
  • 1 CLB for the routers on worker nodes

My client needs static IP addresses for the LBs, and because AWS CLBs do not support static IPs,
I need to switch the CLB to a new NLB.

Does anyone have any ideas on how to achieve this?
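One direction worth checking (a sketch only; I'm not certain this field is available in 4.5, and switching the LB type generally means recreating the controller, which replaces the LB hostname) is the AWS provider parameter on the IngressController that selects an NLB:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  endpointPublishingStrategy:
    type: LoadBalancerService
    loadBalancer:
      scope: Internal          # private cluster
      providerParameters:
        type: AWS
        aws:
          type: NLB            # Network Load Balancer instead of Classic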

Thanks.

unable to build images locally

Because the base images are located in a private group, there is no way to build the images locally:

docker pull registry.ci.openshift.org/ocp/builder:rhel-8-golang-1.15-openshift-4.6
Error response from daemon: unauthorized: authentication required

docker pull registry.ci.openshift.org/ocp/4.6:base
Error response from daemon: unauthorized: authentication required

Ingress as a pluggable component

Why: Ingress as a pluggable component makes OpenShift more appealing and flexible

How: examine a number of ingress-like products, compare features and performance, and pick the best of them to determine an interface.

[Question] The process of Ingress Operator adding/removing DNS records.

A problem with the code is blocking me while I work on pull request #308.

What we are doing

A new libvirt provider needs to be implemented in https://github.com/openshift/cluster-ingress-operator/tree/master/pkg/dns.
That provider will need to implement the provider interface:

 // Provider knows how to manage DNS zones only as pertains to routing.
 type Provider interface {
	// Ensure will create or update record.
	Ensure(record *iov1.DNSRecord, zone configv1.DNSZone) error

	// Delete will delete record.
	Delete(record *iov1.DNSRecord, zone configv1.DNSZone) error
}
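For orientation, an implementation of that interface is just a struct with those two methods. A minimal hypothetical skeleton for the libvirt case might look like the following (the package layout, import paths, and Config type are assumptions for illustration; the actual libvirt calls are left as TODOs):

package libvirt

import (
	configv1 "github.com/openshift/api/config/v1"

	iov1 "github.com/openshift/cluster-ingress-operator/pkg/api/v1"
	"github.com/openshift/cluster-ingress-operator/pkg/dns"
)

// Config is a sketch of the libvirt-specific configuration mentioned below.
type Config struct {
	// NetworkName is the libvirt network whose DNS entries we manage (hypothetical field).
	NetworkName string
}

// provider is a hypothetical libvirt-backed implementation of dns.Provider.
type provider struct {
	config Config
}

// New returns a dns.Provider backed by libvirt's built-in DNS.
func New(config Config) (dns.Provider, error) {
	return &provider{config: config}, nil
}

// Ensure creates or updates the DNS record in the libvirt network.
func (p *provider) Ensure(record *iov1.DNSRecord, zone configv1.DNSZone) error {
	// TODO: call the libvirt API to add or replace the host entry for record.Spec.DNSName.
	return nil
}

// Delete removes the DNS record from the libvirt network.
func (p *provider) Delete(record *iov1.DNSRecord, zone configv1.DNSZone) error {
	// TODO: call the libvirt API to remove the host entry for record.Spec.DNSName.
	return nil
}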

Additionally, a new configuration may be needed that is libvirt specific:
See:

func createDNSProvider(cl client.Client, operatorConfig operatorconfig.Config, dnsConfig *configv1.DNS, platformStatus *configv1.PlatformStatus) (dns.Provider, error) {

Question

  1. While the Ingress Controller is running, which part of the code calls the Ensure and Delete methods, and when are they called?

  2. I see that a dnscontroller is created and that cluster-ingress-operator/pkg/operator/controller/ingress/dns.go is implemented. What is the relationship between these two parts, and what do they do when DNS records need to be added for an Ingress?

Adding Custom Tags to Ingress Operator LB

Is there currently a method to add additional annotations to the Service for the Ingress Routers?

I see that the annotations are set up here, but it isn't clear to me whether I can add a tag annotation such as:

service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: "environment=prod,owner=devops"

I imagine I could set this after OpenShift cluster creation, but I would be in trouble if the service/LB needed to be recreated.

Am I missing a way to add these annotations, or is this logic that would need to be added to the operator's configuration?
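As a stop-gap, the annotation can be added to the operator-managed Service directly (sketch; the tag values are examples, the operator may reconcile the annotation away, and the cloud provider only applies these tags when the LB is created):

oc -n openshift-ingress annotate service/router-default \
  service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags="environment=prod,owner=devops"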

Background

The environment I'm working in has strict AWS tagging requirements and even has a service that deletes load balancers missing tags. There are some workarounds I can think of to satisfy this if this level of customisation is to be avoided for the ingress operator.

Environment

Using OpenShift 4.1 in UPI mode with CloudFormation templates on AWS.

Route objects owned by Ingress objects are not using the default certificate of the router in case no secret and host list is specified in the Ingress TLS configuration

If no certificate is specified for a Route object (just termination: edge), then it falls back to using the certificate of the router itself. See [1] (chapter "Edge Termination").
E.g.:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: "testRoute"
spec:
  tls:
    termination: edge
  to:
    kind: Service
    name: testService

This behaviour is pretty convenient if the router itself has a wildcard certificate.

The Kubernetes API reference [2] tells the following about the host list for TLS configuration:

Hosts are a list of hosts included in the TLS certificate. The values in this list must match the name/s used in the tlsSecret. Defaults to the wildcard host setting for the loadbalancer controller fulfilling this Ingress, if left unspecified.
Therefore my assumption would be that if I specify an Ingress object with an empty host list in the TLS configuration section (and without a secret), then I should get the same result as above with the Route object. Unfortunately, this is not the case.
E.g. Ingress definition:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  tls:
  - hosts:
  rules:
  - host: testHost
    http:
      paths:
      - path: /
        backend:
          serviceName: testService
          servicePort: 80

My questions:

Is there a configuration option or annotation I am not aware of that can be used to achieve the same result?
If not, do you see it as feasible to implement this change?
Thanks for any kind of help in advance!

openshift v4.11

Steps To Reproduce

Deploy the Ingress object
Check the created Route object
Current Result

No TLS termination is defined for the created Route object (which was automatically created by deploying the Ingress object).

TLS Termination:
Expected Result

TLS is configured to "edge" and uses the certificate of the router, as in the case when I create the Route object specified above.

TLS Termination: edge

Add tweakable "ROUTER_IP_V4_V6_MODE/v4v6" parameter in the ingress operator for single-stack clusters

Dual-stack networking (IPv4/IPv6) is sadly not supported yet on OpenStack. Therefore, we need a way to enable IPv6 support in the haproxy deployment/router-default in openshift-ingress for our OKD 4.11 environment. Editing the deployment directly is not allowed by the Ingress Operator.

The deployment.go file only adds IPv6 support to the haproxy deployment/router-default if it detects that the cluster has dual-stack networking enabled.
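For reference, this is what the request amounts to in the rendered router deployment, i.e. the env entry that deployment.go currently only emits for dual-stack clusters (snippet for illustration only; a manual edit of the deployment is reverted by the operator, as noted above):

spec:
  template:
    spec:
      containers:
      - name: router
        env:
        - name: ROUTER_IP_V4_V6_MODE
          value: v4v6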

Ingress operator should remove deployment when highAvailability.type changes

By default, highAvailability.type is set to Cloud, so in the BYOR case this should be changed to UserDefined. However, this doesn't remove the router deployment and the service; currently this needs to be done manually.

The operator should detect that the HA type has changed and recreate the deployment automatically.
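For reference, the manual cleanup currently required looks something like this (a sketch; double-check the exact resource names in your cluster before deleting anything):

oc -n openshift-ingress delete deployment/router-default service/router-default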

Bugzilla reference in New Issue template leads to dead end

Walking through the New Issue workflow on this repo, one is encouraged to open a Bugzilla bug, but when you eventually get to Bugzilla from that link you're met with:

Sorry, entering a bug into the product OpenShift Container Platform has been disabled.

OCP is now accepting bugs in Jira! Open a case with customer support, or file directly in Jira: https://issues.redhat.com/secure/CreateIssueDetails!init.jspa?pid=12332330&issuetype=1 

So maybe that link should point somewhere in Jira instead?

Suggestion for AWS: use CNAMEs instead of A Alias records in Route53

Hi,

I already opened a similar issue for the OpenShift installer: openshift/installer#3150

The gist: I'm suggesting using a CNAME record instead of an A alias record for the "*.apps.<cluster-url>" DNS record in Route 53 on AWS, since CNAME records conform to the RFC standard while A alias records (to my knowledge) do not.

Would that be feasible for the cluster-ingress-operator, or are there big advantages to using the A alias that I'm just not seeing because I cannot use them in my enterprise environment?

Another approach that would work for me is parameterizing the operator so that it uses CNAME instead of A alias records. That way, A alias could remain the default and the impact of the change would be lower.

Feature Request: Make router log level configurable via ingress controller

We can enable and configure the access log format in the IngressController CR, but there is no way to configure the log level.
I found a default log level in the code, but there is no way to configure it:

RouterLogLevelEnvName = "ROUTER_LOG_LEVEL"

And in nginx/haproxy conf

images/router/nginx/conf/nginx-config.template
10:{{ $logLevel := firstMatch "info|notice|warn|error|crit|alert|emerg" (env "ROUTER_LOG_LEVEL") "warn" }}

images/router/haproxy/conf/haproxy-config.template
61:  log {{ . }} len {{ env "ROUTER_LOG_MAX_LENGTH" "1024" }} {{ env "ROUTER_LOG_FACILITY" "local1" }} {{ env "ROUTER_LOG_LEVEL" "warning" }}

I propose adding a log-level field to the IngressController CR:

  logging:
    access:
      destination:
        type: Container
      logEmptyRequests: Log
      logLevel: error # Add a new field

https://docs.openshift.com/container-platform/4.12/networking/ingress-operator.html

Replace Existing DNS Manager with ExternalDNS

ExternalDNS supports a wide range of DNS providers, including AWS Route 53. Replacing the existing DNS manager with ExternalDNS would give OpenShift that same breadth of provider support as it targets future deployment environments. Thoughts?

Assign a priority class to pods

Priority classes docs:
https://docs.openshift.com/container-platform/3.11/admin_guide/scheduling/priority_preemption.html#admin-guide-priority-preemption-priority-class

Example: https://github.com/openshift/cluster-monitoring-operator/search?q=priority&unscoped_q=priority

Notes: The pre-configured system priority classes (system-node-critical and system-cluster-critical) can only be assigned to pods in kube-system or openshift-* namespaces. Most likely, core operators and their pods should be assigned system-cluster-critical. Please do not assign system-node-critical (the highest priority) unless you are really sure about it.
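A sketch of the requested change as it would appear in the router deployment's pod template (following the note above; since the router runs in an openshift-* namespace, system-cluster-critical is allowed):

spec:
  template:
    spec:
      priorityClassName: system-cluster-critical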

Documentation not clear about which versions of OpenShift have ingress operator

Here it says

Every new OpenShift installation has an ingresscontroller named default which can be customized, replaced, or supplemented with additional ingress controllers. To view the default ingress controller, use the oc command:

$ oc describe --namespace=openshift-ingress-operator ingresscontroller/default

When I run the same command on an OpenShift 3.11 cluster, I get this:

john@alaptop:~/gits/notes$ oc describe --namespace=openshift-ingress-operator ingresscontroller/default
error: the server doesn't have a resource type "ingresscontroller"
john@alaptop:~/gits/notes$ 

If I run the same command in Minishift, which is also 3.11, I get the same issue. I can't provide a link at the moment, but I have read that ingress is supported in OpenShift from 3.11. What am I doing wrong?

With custom networking, authentication operator never goes healthy

Refiling openshift/cluster-authentication-operator#132 here as suggested by @danwinship:

It's quite possible that this is due to something on my side, but I'm hoping folks here can help point me in the right direction as for what to check next.

I've started a cluster using Calico for pod networking with openshift-install version v0.16.0 (this also seems to be the case on at least v4.1.0-rc0, but I'm blocked on testing this on v4.1.0 due to other issues). The installer gets most of the way through, but then fails to complete due to what appears to be a problem with the authentication operator.

i.e.,

time="2019-06-05T13:50:20-07:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.9: 98% complete"
time="2019-06-05T13:51:54-07:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.9: 99% complete"
time="2019-06-05T13:53:07-07:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.9: 99% complete"
time="2019-06-05T14:01:24-07:00" level=debug msg="Still waiting for the cluster to initialize: Cluster operator authentication is still updating"
time="2019-06-05T14:18:24-07:00" level=debug msg="Still waiting for the cluster to initialize: Cluster operator console has not yet reported success"
time="2019-06-05T14:20:20-07:00" level=fatal msg="failed to initialize the cluster: Cluster operator console has not yet reported success: timed out waiting for the condition"

Running this command:

kubectl get clusteroperators authentication -o yaml                                                                                                                                                                                                                                                                                      

Shows me this:

apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2019-06-05T20:47:23Z"
  generation: 1
  name: authentication
  resourceVersion: "26265"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/authentication
  uid: 1ec781df-87d3-11e9-8411-0a428212f130
spec: {}
status:
  conditions:
  - lastTransitionTime: "2019-06-05T20:49:03Z"
    message: 'Failing: error checking payload readiness: unable to check route health:
      failed to GET route: EOF'
    reason: Failing
    status: "True"
    type: Failing
  - lastTransitionTime: "2019-06-05T20:50:48Z"
    reason: AsExpected
    status: "False"
    type: Progressing
  - lastTransitionTime: "2019-06-05T20:47:29Z"
    reason: Available
    status: "False"
    type: Available
  - lastTransitionTime: "2019-06-05T20:47:23Z"
    reason: NoData
    status: Unknown
    type: Upgradeable
  extension: null
  relatedObjects:
  - group: operator.openshift.io
    name: cluster
    resource: authentications
  - group: config.openshift.io
    name: cluster
    resource: authentications
  - group: config.openshift.io
    name: cluster
    resource: infrastructures
  - group: config.openshift.io
    name: cluster
    resource: oauths
  - group: ""
    name: openshift-config
    resource: namespaces
  - group: ""
    name: openshift-config-managed
    resource: namespaces
  - group: ""
    name: openshift-authentication
    resource: namespaces
  - group: ""
    name: openshift-authentication-operator
    resource: namespaces
  versions:
  - name: integrated-oauth-server
    version: 4.0.0-0.9_openshift

I can confirm that the Route the authentication operator is attempting to hit doesn't seem to be working through the ingress controller. However, the Service backing the route is reachable from within the cluster.

Hitting the service directly:

curl -k https://openshift-authentication:443 
{
  "paths": [
    "/apis",
    "/healthz",
    "/healthz/log",
    "/healthz/ping",
    "/healthz/poststarthook/oauth.openshift.io-startoauthclientsbootstrapping",
    "/metrics",
    "/readyz",
    "/readyz/log",
    "/readyz/ping",
    "/readyz/poststarthook/oauth.openshift.io-startoauthclientsbootstrapping",
    "/readyz/terminating"
  ]
}

Hitting the service through the Route:

curl -k https://openshift-authentication-openshift-authentication.apps.casey-ocp.openshift.crc.aws.eng.tigera.net:443
curl: (35) Encountered end of file

Customize router deployment config to disable namespace ownership check

On OpenShift 3 it was possible to customize the default deployment of the router by adding environment variables with the needed customization. I cannot find a way of doing this with the cluster-ingress-operator, as the CRD does not contain any options for mutating the deployment.

The specific configuration that I need is ROUTER_DISABLE_NAMESPACE_OWNERSHIP_CHECK: https://github.com/openshift/router/blob/master/pkg/cmd/infra/router/router.go#L73-L92

The specific problem in question is related to solving "HostAlreadyClaimed" when binding to the same hostname (with different paths) in multiple namespaces.

Is there any other way of configuring the router/operator to add options? Or should the CRD be updated with the options for customizing the router?
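For what it's worth, later revisions of the IngressController API expose this particular knob as a field rather than a raw env var. If your version has it (a sketch; check your CRD), the following maps to ROUTER_DISABLE_NAMESPACE_OWNERSHIP_CHECK:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  routeAdmission:
    namespaceOwnership: InterNamespaceAllowed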

Bug: affinity rule created in router deployment for single-replica infrastructure and "NodePortService" endpoint publishing strategy

The cluster ingress operator creates router deployments with affinity rules when running in a cluster with a non-HA infrastructure plane (InfrastructureTopology == "SingleReplica") and the "NodePortService" endpoint publishing strategy. With only one worker node available, a rolling update of router-default stalls.

BugZilla report: https://bugzilla.redhat.com/show_bug.cgi?id=2108333

Fails to start with `cannot list oauthclientauthorizations.oauth.openshift.io at the cluster scope`

$ oc describe po/console-operator-7f48d87777-7fqhp -n openshift-console
...
Image:         registry.svc.ci.openshift.org/openshift/origin-v4.0-20181128050751@sha256:21fb0b92ad78b89c31083bfe7d42b7650174eebdc9d9e04d43fae502d28eb83a
...

$ oc logs -f po/console-operator-7f48d87777-7fqhp -n openshift-console
...
I1129 11:27:58.718708       1 reflector.go:240] Listing and watching *v1.OAuthClientAuthorization from github.com/openshift/console-operator/vendor/github.com/openshift/client-go/oauth/informers/externalversions/factory.go:101
E1129 11:27:58.720525       1 reflector.go:205] github.com/openshift/console-operator/vendor/github.com/openshift/client-go/oauth/informers/externalversions/factory.go:101: Failed to list *v1.OAuthClientAuthorization: oauthclientauthorizations.oauth.openshift.io is forbidden: User "system:serviceaccount:openshift-console:console-operator" cannot list oauthclientauthorizations.oauth.openshift.io at the cluster scope: no RBAC policy matched

Define custom annotations for LB Service

Hi,

Is it possible to define our own annotations that would get passed to the LB service created by cluster-ingress-operator? Specifically, I'm interested in adding service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout to override the default idle timeout (60s) for the AWS Classic LB. I know I could add this directly to the created service, but we are using GitOps for OpenShift configuration, and I would prefer not to modify resources created by operators but instead modify the Ingress (cluster) or IngressController (default) resources.

We currently have OS v4.5.15 installed.

Support for custom ports to bind haproxy

When using hostNetwork: true for IngressControllers, the default ports 80 and 443 on the host are bound by HAProxy. However, those ports might be occupied by other processes (such as another set of IngressControllers). In OKD 3.11, it was possible to listen on custom host ports by setting these env vars in the router's DeploymentConfig:

        - name: ROUTER_SERVICE_HTTPS_PORT
          value: "10443"
        - name: ROUTER_SERVICE_HTTP_PORT
          value: "10080"

However, there is no option to specify custom ports in the IngressController.operator.openshift.io/v1 object.

P.S. Running in hostNetwork is a strict requirement in some scenarios (e.g. our current setup with custom PBR rules).
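Newer releases of the IngressController API appear to add port fields under the HostNetwork strategy; if those fields exist in your version, the request would look roughly like this (sketch only, names and values illustrative):

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: custom-ports
  namespace: openshift-ingress-operator
spec:
  endpointPublishingStrategy:
    type: HostNetwork
    hostNetwork:
      httpPort: 10080
      httpsPort: 10443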

Degraded status when starting an OCP private cluster deployed on AWS

When starting an OCP 4.3 private cluster deployed on AWS, the cluster ingress operator stays with "degraded" status.
(By "private cluster", I mean the OCP cluster cannot access the internet.)

It seems that the operator is trying to access "https://tagging.us-east-1.amazonaws.com" and this is causing the problem.

Q1. Is there any workaround for this?
Q2. Is it MANDATORY for the operator to be able to access the internet? (This would make it impossible for any OpenShift cluster to be private...)

Thanks.

oc get dnsrecords -n openshift-ingress-operator -o yaml
The DNS provider failed to ensure the record: failed to find hosted zone for record: failed to get tagged resources: RequestError: send request failed
caused by: Post https://tagging.us-east-1.amazonaws.com/: dial tcp 52.94.224.124:443: i/o timeout
reason: ProviderError
status: "True"
type: Failed

oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.3.25 True False 59d Error while reconciling 4.3.25: the cluster operator ingress is degraded

Add router image field in IngressController CRD

With 3.x I was using a customized router image to hook into various pieces of the OpenShift router deployment, but in 4.x it looks like that integration point has been removed.

I am using UPI on bare metal, and I am planning to move my customizations to a proxy running in front of the cluster ingress controller, but I would like to preserve the source IP with the PROXY protocol. Currently the only way the operator enables the PROXY protocol is if it determines that it is running on AWS, but it would be nice if this option were also exposed in the IngressController CRD.

The ingresscontrollers CRD needs `dnsPolicy` set for the deployment

The router pod uses hostNetwork, and its dnsPolicy is ClusterFirst:

...
      dnsPolicy: ClusterFirst
      hostNetwork: true
[root@master2 ~]# crictl pods --name router-default-fb744fb7f-hmmn5 -q
0a0c7cc6d1ad6815f7613fd758c5329c4265ddb6607f568b69e30fdafdfc0a52
[root@master2 ~]# crictl ps --pod=0a0c7cc6d1ad6815f7613fd758c5329c4265ddb6607f568b69e30fdafdfc0a52
CONTAINER           IMAGE                                                              CREATED             STATE               NAME                ATTEMPT             POD ID
b04c17fb1c58d       dd7aaceb9081f88c9ba418708f32a66f5de4e527a00c7f6ede50d55c93eb04ed   3 days ago          Running             router              1                   0a0c7cc6d1ad6

But because dnsPolicy is ClusterFirst, the pod cannot resolve cluster service hostnames:

[root@master2 ~]# crictl exec b04 cat /etc/resolv.conf
search openshift4.example.com
nameserver 10.226.45.250
[root@master2 ~]# crictl exec b04 curl -s kubernetes.default.svc.cluster.local
FATA[0000] execing command in container failed: command terminated with exit code 6 
[root@master2 ~]# crictl exec b04 curl  kubernetes.default.svc.cluster.local
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (6) Could not resolve host: kubernetes.default.svc.cluster.local; Unknown error
FATA[0000] execing command in container failed: command terminated with exit code 6 
[root@bastion ~]# oc -n openshift-dns get svc
NAME          TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE
dns-default   ClusterIP   172.30.0.10   <none>        53/UDP,53/TCP,9154/TCP   3d17h

Version-Release number of selected component (if applicable):
ocp version 4.5.9

[root@master2 ~]# crictl exec b04 /usr/bin/openshift-router version
openshift-router

majorFromGit: 
minorFromGit: 
commitFromGit: e3b9390202c6f9a9d986d9465c5f25e2214936e3
versionFromGit: 4.0.0-143-ge3b9390
gitTreeState: clean
buildDate: 2020-09-04T14:15:28Z

The router-default deployment is managed by the ingresscontrollers CRD; we need the dnsPolicy key set to ClusterFirstWithHostNet to resolve this problem.
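The requested fix, as it would appear in the router deployment's pod template (snippet for illustration; the change would need to come from the operator's deployment rendering, not a manual edit):

spec:
  template:
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet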

Expected results:

[root@master2 ~]# crictl exec b04 cat /etc/resolv.conf
search openshift-ingress.cluster.local svc.cluster.local cluster.local
options ndots:5
nameserver 172.30.0.10

The operator must honor the cluster proxy configuration

Goal

The ingress-operator itself (not the operand) must honor the cluster proxy configuration.

Invariants

  • The apiserver is always excluded from proxying

Challenges

  • Idiomatic Go libraries tend to rely on http.DefaultTransport (often via http.DefaultClient), which lazily initializes an immutable proxy configuration based on the process environment as of the time of initialization. This makes changing the stdlib default proxy at runtime dangerous given existing clients/connections.

Solution

Refactor ingress-operator to replace usage of http.DefaultTransport with a transport set up from the cluster proxy configuration.
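A minimal sketch of that direction, assuming the proxy URL has already been read from the cluster proxy configuration (the buildTransport helper and its parameter are illustrative, not the operator's actual code):

package proxy

import (
	"net/http"
	"net/url"
	"time"
)

// buildTransport returns a transport wired to an explicit proxy URL instead of
// relying on http.DefaultTransport's one-time snapshot of the process environment.
func buildTransport(httpsProxy string) (*http.Transport, error) {
	proxyURL, err := url.Parse(httpsProxy)
	if err != nil {
		return nil, err
	}
	return &http.Transport{
		Proxy: http.ProxyURL(proxyURL),
		// Explicit client properties, per the implications below.
		ResponseHeaderTimeout: 30 * time.Second,
	}, nil
}

// Consumers then receive a dedicated client at startup rather than reaching for
// http.DefaultClient, e.g.:
//
//	client := &http.Client{Transport: transport, Timeout: 2 * time.Minute}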

Implications

  • Requires refactoring to consolidate client setup closer to operator startup and inject client into various consumers
  • Client properties (e.g. timeouts) can be configured more safely in context than the prior defaults (e.g. to prevent cloud API connection issues from hanging the operator).

Alternatives considered

Teach CVO to inject proxy environment variables into operator deployments.

Given the invariant that the apiserver is always excluded from proxying, it is assumed operators will be able to reach the apiserver to discover the proxy config to be used for any other clients managed by the operator which may be subject to proxying rules. It's not practical to revisit this CVO design aspect in the timeframe we must support proxying.

Mutate the http.DefaultClient or http.DefaultTransport at runtime

It's not clear that mutating any of the stateful net/http package variables in this way will be safe at runtime in the face of existing clients and connections. Finding a safe way to make these mutations seems to be a nontrivial effort. Granted a successful implementation, one apparent advantage would be the fix would apply transparently to the libraries we use. However, client connection properties would remain shared across all contexts.

Operator sidecar

A sidecar process to the operator could read proxy configuration, render it somehow to a shared volume, and the operator could then configure its own process environment from the shared data in a (hopefully) safe place early during startup. This would have the same overall implications as the http.DefaultClient mutation approach but in a likely safer way.
