
ClusterLink

ClusterLink simplifies the connection between application services that are located in different domains, networks, and cloud infrastructures.

Disclaimers and warnings

This is an incomplete work in progress, provided in the interest of sharing experience and gathering feedback. The code is pre-alpha quality right now. This means that it shouldn't be used in production at all.

For more details, visit our website.

ClusterLink in a nutshell

ClusterLink deploys a gateway into each location, facilitating the configuration and access to multi-cloud services.

The ClusterLink gateway contains the following components:

  1. Control Plane is responsible for maintaining the internal state of the gateway, for all communications with remote peer gateways by means of the ClusterLink CP Protocol, and for configuring the local data plane to forward user traffic according to policies. Part of the control plane is the policy engine, which can also apply network policies (ACL, load balancing, etc.).
  2. Data Plane responds to user connection requests, both local and remote, initiates policy resolution in the CP, and maintains the established connections. The ClusterLink DP relies on standard protocols and avoids redundant encapsulations: it presents itself as a K8s service inside the cluster and as a regular HTTP endpoint from outside the cluster, requiring only a single open port (HTTP/443) and leveraging HTTP endpoints for connection multiplexing.

(Figure: ClusterLink gateway components.)

ClusterLink leverages the Kubernetes API, using CRDs to configure cross-cluster communication. ClusterLink management is based on the following key concepts (a rough sketch of these as Go types follows the list):

  • Peer. Represents a remote ClusterLink gateway and holds the metadata necessary for creating protected connections to it.
  • Exported service. Represents an application service hosted in the local cluster and exposed to remote ClusterLink gateways as an Imported Service entity in those peers.
  • Imported service. Represents a remote application service that the gateway makes available locally to clients inside its cluster.
  • Policy. Represents communication rules that must be enforced for all cross-cluster communications at each ClusterLink gateway.
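
For illustration only, the concepts above might map to Go API types along the following lines; all field names here are assumptions, not the actual ClusterLink CRD schema:

// Hedged sketch of the key concepts above as Go API types; the field names
// are illustrative only and do not reflect the actual ClusterLink CRDs.
package sketch

// Endpoint is a reachable host:port pair.
type Endpoint struct {
	Host string
	Port uint16
}

// Peer identifies a remote ClusterLink gateway and how to reach it.
type Peer struct {
	Name     string
	Gateways []Endpoint // public endpoints of the remote gateway
}

// Export makes a local application service available to remote peers.
type Export struct {
	Name    string
	Service Endpoint // the in-cluster service being exposed
}

// Import makes a remote service available to local clients.
type Import struct {
	Name  string
	Port  uint16   // local port the gateway listens on for this service
	Peers []string // peers the service may be consumed from
}

// Policy expresses which cross-cluster connections are allowed or denied.
type Policy struct {
	Name   string
	Action string // "allow" or "deny"
	From   []string
	To     []string
}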

For further information, please refer to the concepts section on the ClusterLink website.

Getting started

ClusterLink can be set up and run in different environments: a local environment (Kind), a bare-metal environment, or a cloud environment. For more details, refer to the Getting Started Guide.

Additionally, here are some other documents you may find helpful:

  • ClusterLink Tutorials - These tutorials describe how to establish multi-cluster connectivity for applications using two or more clusters.
  • ClusterLink Developer's Guide - This guide explains how to configure a development environment and contribute code to ClusterLink.

Contributing

Our project welcomes contributions from any member of our community. To get started contributing, please see our Contributor Guide.

Scope

In Scope

ClusterLink is intended to connect services and applications running in different clusters. As such, the project will implement or has implemented:

  • Remote Service sharing
  • Extending private Cloud service endpoints to remote sites
  • Centralized management (future)

Out of Scope

ClusterLink will be used in a cloud native environment with other tools. The following specific functionality will therefore not be incorporated:

  • Certificate management: ClusterLink uses certificates and trust bundles provided to it. It does not manage certificate lifetimes, rotation, etc. - these are delegated to external tools.
  • Enabling IP level connectivity between sites. ClusterLink uses existing network paths.
  • Pod to Pod communications. ClusterLink works at the level of Services. You can support Pod-to-Pod communications by creating a service per pod.

Communications

License

This project is licensed under the Apache License, v2.0. Code contributions require a Developer Certificate of Origin (DCO).

Code of Conduct

We follow the CNCF Code of Conduct.

Contributors

aviweit, dependabot[bot], elevran, elkanatovey, kathybarabash, kfirtoledo, michalmalka, orozery, praveingk, ronenkat, vadimeisenberg, welisheva, welisheva22, zivnevo


clusterlink's Issues

Support for `-capath` path option in addition to `-cacert`

At the moment, ClusterLink uses statically embedded certs and keys, all signed by the same CA. In a multi-cloud environment, there could be multiple roots of trust, with each gateway given certs/keys signed by a different root. This means ClusterLink would need to accept one, or more likely multiple, CA bundles as input. For this reason, tools such as openssl support both -CApath and -CAfile. Is there a plan to add this support to ClusterLink?

In the meantime, we can use a workaround to deal with multiple CA bundles. Since Go can process multiple certificates in a single PEM file (e.g., https://gist.github.com/laher/5795578), we can accept a concatenation of multiple CA certificates in a single file as an intermediate solution. Bundling into a single file is something some Linux distributions also do.
Support for -CApath might still be preferred if we ever want to support dynamically adding and deleting peers/fabrics in the future. Maintaining a directory of CA files is much easier than managing a concatenated file.

The workaround has been confirmed to work with the Go data plane but has not been tested with Envoy. We may need a control plane change to configure multiple root CAs for Envoy (e.g., via the repeated TlsCertificates entry).
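
For reference, a minimal Go sketch of both approaches - a single concatenated bundle (-cacert style) and a directory of PEM files (-CApath style); the file names are illustrative, not actual ClusterLink options:

// Hedged sketch: load trusted CAs either from one concatenated PEM bundle
// or from a directory of PEM files. Paths below are placeholders.
package main

import (
	"crypto/x509"
	"fmt"
	"os"
	"path/filepath"
)

// poolFromBundle appends every certificate found in a single PEM file
// (which may contain a concatenation of several CA certificates).
func poolFromBundle(path string) (*x509.CertPool, error) {
	pem, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pem) {
		return nil, fmt.Errorf("no certificates parsed from %s", path)
	}
	return pool, nil
}

// poolFromDir appends every *.pem file found in a directory, approximating
// openssl's -CApath behavior (without the hashed symlinks).
func poolFromDir(dir string) (*x509.CertPool, error) {
	pool := x509.NewCertPool()
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	for _, e := range entries {
		if e.IsDir() || filepath.Ext(e.Name()) != ".pem" {
			continue
		}
		pem, err := os.ReadFile(filepath.Join(dir, e.Name()))
		if err != nil {
			return nil, err
		}
		pool.AppendCertsFromPEM(pem) // skip files with no valid certs
	}
	return pool, nil
}

func main() {
	if pool, err := poolFromBundle("ca-bundle.pem"); err == nil {
		fmt.Println("bundle loaded:", pool != nil)
	}
	if pool, err := poolFromDir("cas/"); err == nil {
		fmt.Println("directory loaded:", pool != nil)
	}
}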

Explore certificate management options and variations

Explore certificate management - both internal management and external tool integrations.
Currently we use an internal CA. There are several options for integrating external tools, including SPIFFE/SPIRE, Let's Encrypt, or Vault (the latter two via certmgr, for example).

The current certificate hierarchy is: a fabric/mesh CA signs certificates for all site components (data, control, clients), so (root) fabric CA -> (leaf) certificates.

  • @orozery has suggested an alternative that introduces a site CA as well (which can be multi-use or single-use and then regenerated when needed): (root) fabric CA -> (intermediate) Site CA -> (leaf) certificates. We can also check that leaves are signed by the relevant site CA as extra validation.
  • An extension of that would be to introduce a second intermediate for the site control plane. The final chain is: (root) fabric CA -> (intermediate) Site CA -> (intermediate) CP CA -> (leaf) certificates. Note that the Site and CP CAs can be combined if the Site CA is available to the CP. The advantage would be the ability to generate per-workload certificates for dataplane communications, instead of using a single dataplane certificate and communicating workload attributes separately. The workload certificate can include all attributes (via an x509 extension), perhaps simplifying the authorization flow. This is closer to the way Istio ambient mesh works.

Certificates can carry constraints (e.g., Key Usage). One relevant constraint that we may wish to exploit is Name Constraints, where a CA certificate includes a list of hosts, IPs, or domains it may sign for. If a leaf certificate includes an identity or SAN that violates the name constraints, it is rejected by the client TLS library (assuming default verification is not bypassed).
This can be used with the root fabric CA (all sites share a DNS suffix, e.g., {name}.fabric, so the constraint is .{name}.fabric - with a leading dot) as well as with the intermediate Site CA (e.g., .{site}.peer.{name}.fabric).
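
A minimal Go sketch of issuing a name-constrained intermediate Site CA with crypto/x509; the fabric and site names are placeholders:

// Hedged sketch: a fabric root CA signing an intermediate "Site" CA whose
// x509 Name Constraints restrict it to a per-site DNS subtree. The
// example.fabric / site1 names are placeholders, not a real fabric.
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

func must[T any](v T, err error) T {
	if err != nil {
		panic(err)
	}
	return v
}

func main() {
	rootKey := must(ecdsa.GenerateKey(elliptic.P256(), rand.Reader))
	siteKey := must(ecdsa.GenerateKey(elliptic.P256(), rand.Reader))

	root := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "example.fabric root CA"},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().AddDate(5, 0, 0),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
		// Root constrains everything it signs to the fabric DNS suffix.
		PermittedDNSDomainsCritical: true,
		PermittedDNSDomains:         []string{".example.fabric"},
	}
	rootDER := must(x509.CreateCertificate(rand.Reader, root, root, &rootKey.PublicKey, rootKey))
	rootCert := must(x509.ParseCertificate(rootDER))

	site := &x509.Certificate{
		SerialNumber:          big.NewInt(2),
		Subject:               pkix.Name{CommonName: "site1 CA"},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().AddDate(1, 0, 0),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
		// Intermediate is further constrained to its own site's subtree.
		PermittedDNSDomainsCritical: true,
		PermittedDNSDomains:         []string{".site1.peer.example.fabric"},
	}
	siteDER := must(x509.CreateCertificate(rand.Reader, site, rootCert, &siteKey.PublicKey, rootKey))
	fmt.Println("issued constrained site CA,", len(siteDER), "bytes")
}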


dataplane process dies after a while

I've experienced this problem a few times already. It usually happens within 24 hours of a fresh installation, even with very light traffic between the gateways. Each of the few times it happened, it was the ClusterLink gateway that exports a service that died; I don't know if that's a coincidence or something else.

The first time I noticed the dataplane process dying, I contacted @kfirtoledo and was told it might be related to a memory leak that was later fixed by @praveingk. I pulled the most recent version of the master branch (commit f8910f160edcb06f4d8a34236b9823330f256c68) and built the image, but I am still facing the same issue.

I included the dead dataplane process's log here:

/ # cat /root/.gw/dataplane.log 
INFO   [2023-09-23 21:57:13] Dataplane main started                       
INFO   [2023-09-23 21:57:13] Start Dataplane                               component=DataPlane
INFO   [2023-09-23 21:57:13] Dataplane server listen to port: 443          component=DataPlane
INFO   [2023-09-23 21:58:39] Received connect to service aws1 from MBG: 100.64.0.5  component=DataPlane
INFO   [2023-09-23 21:58:39] Received Incoming Connect request from service: aws1 to service: mc-ztna-s3-02  component=DataPlane
INFO   [2023-09-23 21:58:39] Received control plane response for service aws1 ,connection information : {Allow 169.63.34.218:30443 s3.us-east-2.amazonaws.com:443 aws1:mc-ztna-s3-02:2VodMY7XA5Eg7UyZ8dOP23PrtX9}   component=DataPlane
INFO   [2023-09-23 21:58:39] Starting a Receiver service for s3.us-east-2.amazonaws.com:443 Using serviceEndpoint : 169.63.34.218:30443/aws1:mc-ztna-s3-022VodMZNj0lSD7LP9eE2YwCs3LCC  component=DataPlane
INFO   [2023-09-23 21:58:39] Got {true, mtls, aws1:mc-ztna-s3-022VodMZNj0lSD7LP9eE2YwCs3LCC} from connect   component=DataPlane
INFO   [2023-09-23 21:58:39] Received new Connection at 10.131.0.138:56422, aws1:mc-ztna-s3-022VodMZNj0lSD7LP9eE2YwCs3LCC  component=DataPlane
INFO   [2023-09-23 21:58:39] Starting to initialize mTLS Forwarder for MBG Dataplane at /connectionData/  component=DataPlane
INFO   [2023-09-23 21:58:39] Register new handle func to address =/connectionData/aws1:mc-ztna-s3-022VodMZNj0lSD7LP9eE2YwCs3LCC  component=DataPlane
INFO   [2023-09-23 21:58:39] Connect MBG Target =https://169.63.34.218:30443/connectionData/aws1:mc-ztna-s3-022VodMZNj0lSD7LP9eE2YwCs3LCC  component=DataPlane
INFO   [2023-09-23 21:58:39] Received Connect (aws1:mc-ztna-s3-022VodMZNj0lSD7LP9eE2YwCs3LCC) from 100.64.0.3:38352  component=DataPlane
INFO   [2023-09-23 21:58:39] Connection Hijacked  100.64.0.3:38352->10.131.0.138:443  component=DataPlane
INFO   [2023-09-23 21:58:39] Starting to dispatch MTLS Connection          component=DataPlane
INFO   [2023-09-23 21:58:39] Initiating end of MTLS connection(aws1:mc-ztna-s3-022VodMZNj0lSD7LP9eE2YwCs3LCC)  component=DataPlane
ERROR  [2023-09-23 21:58:39] Dispatch: Read error read tcp 10.131.0.138:56422->52.219.92.113:443: use of closed network connection  connection: (local:10.131.0.138:56422 Remote:52.219.92.113:443)->,(local: 10.131.0.138:443 Remote100.64.0.3:38352)   component=DataPlane file=/gw/pkg/dataplane/mtls_forwarder.go line=220
INFO   [2023-09-23 21:58:39] Initiating end of connection(aws1:mc-ztna-s3-022VodMZNj0lSD7LP9eE2YwCs3LCC)  component=DataPlane
INFO   [2023-09-23 21:58:39] failed to dispatch outgoing connection: read tcp 10.131.0.138:56422->52.219.92.113:443: use of closed network connection  component=DataPlane
ERROR  [2023-09-23 21:58:39] failed to start MTLS receiver: <nil>          component=DataPlane file=/gw/pkg/dataplane/dataplane.go line=121

Simplify/Standardize Makefile and dockerfiles

Currently we use a mix of generic and per-target rules:

  • Extract common patterns (e.g., .dockerfile or /Dockerfile) with generic rules
  • Simplify Dockerfiles based on distroless/alpine with multiple stages
  • Docker image per executable (not needed for the CLI?)
  • Common Makefile targets for build-, dockerize-, etc.

Add go dataplane support for controlplane xDS APIs

xDS client in the dataplane:

Fetches Clusters & Listeners from the control plane, and stores their information.

  1. Typically, Cluster messages contain information about peers (targets to reach) and exported services (address:port).
  2. Listener messages contain information about an imported service (its listening port).
  3. Connecting Peer1 & Peer2 using xDS: Peer1 exports an iperf-server app and Peer2 imports it.

The steps in establishing a connection:

  1. An export of a service by peer1 is propagated to the peer1-dataplane as a cluster.
  2. Upon import of a service by peer2, the peer2-controlplane processes the import and sends the listener config to the peer2-dataplane, which then sets up a listener on the specified port.
  3. When a connection is received at the peer2-dataplane listener, it is sent to the peer2-controlplane for authorization (egressAuthorization).
  4. The peer2-controlplane then sends an authorization request to the peer1-controlplane (via the peer1-dataplane ingressAuthorization).
  5. The peer1-controlplane returns a JWT token with the cluster name (i.e., the exported service name) in the response header if the connection is allowed.
  6. The peer2-controlplane receives this token and relays it back to the peer2-dataplane (egressAuthorization) in the response header.
  7. The peer2-dataplane sends an HTTP POST request with the JWT token as the authorization header to peer1-dataplane:443.
  8. The peer1-dataplane passes the token to the peer1-controlplane, which parses the JWT token to determine the "cluster" to redirect to, and returns the cluster destination (embedded in the header) to the peer1-dataplane.
  9. The peer1-dataplane hijacks the connection and establishes the last-mile connection to the exported service using the cluster information.
  10. For further messages, the channel is now established between the applications.

The above sequence will be implemented in the custom dataplane; a rough sketch of the egress side appears below.
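
For illustration, a minimal Go sketch of the egress side (steps 3, 6 and 7 above), assuming hypothetical endpoint paths (/egressAuthorization, /connect) and header names; this is not the actual dataplane code:

// Hedged sketch of the egress flow: the peer2-dataplane accepts a local
// connection, asks its controlplane for authorization, and opens a request
// to the remote dataplane carrying the returned JWT. URLs, header names and
// TLS setup are illustrative only.
package egress

import (
	"fmt"
	"io"
	"net"
	"net/http"
)

// authorizeEgress asks the local controlplane whether a connection to
// importName is allowed; on success it returns the JWT issued by the
// exporting peer (step 6 in the sequence above).
func authorizeEgress(cpURL, importName string) (string, error) {
	req, err := http.NewRequest(http.MethodPost, cpURL+"/egressAuthorization", nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("x-import", importName) // hypothetical header
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("connection denied: %s", resp.Status)
	}
	return resp.Header.Get("Authorization"), nil
}

// forward opens the cross-cluster leg to the remote dataplane, presenting
// the JWT, and then copies bytes in both directions (step 7 onwards).
func forward(local net.Conn, remoteDataplane, jwt string) error {
	defer local.Close()
	remote, err := net.Dial("tcp", remoteDataplane) // in practice this is mTLS
	if err != nil {
		return err
	}
	defer remote.Close()
	// Minimal request line carrying the token; a real implementation would
	// use a proper CONNECT/upgrade flow.
	fmt.Fprintf(remote, "POST /connect HTTP/1.1\r\nHost: %s\r\nAuthorization: Bearer %s\r\n\r\n",
		remoteDataplane, jwt)
	go io.Copy(remote, local)
	_, err = io.Copy(local, remote)
	return err
}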

Support UDP connections

Support UDP connections for different application use cases.
Another future solution could use HTTP/3 (QUIC), allowing both UDP and TCP to be transported between the gateways.
An optional short-term solution might be to mark UDP sessions as such during establishment, while still using TCP as the transport between the gateways. The gateways would then encode/decode each UDP datagram on the inter-gateway TCP stream, ending up with UDP <=> TCP <=> UDP end to end.
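
A minimal Go sketch of such framing, assuming a simple 2-byte length prefix rather than any actual ClusterLink wire format:

// Hedged sketch: framing UDP datagrams on a TCP stream with a 2-byte length
// prefix, so datagram boundaries survive the stream transport.
package framing

import (
	"encoding/binary"
	"io"
)

// WriteDatagram writes one UDP payload to the TCP stream as [len][payload].
// Standard UDP payloads (< 64 KB) fit in the 16-bit length field.
func WriteDatagram(w io.Writer, payload []byte) error {
	var hdr [2]byte
	binary.BigEndian.PutUint16(hdr[:], uint16(len(payload)))
	if _, err := w.Write(hdr[:]); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// ReadDatagram reads one framed payload back off the TCP stream.
func ReadDatagram(r io.Reader) ([]byte, error) {
	var hdr [2]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, err
	}
	payload := make([]byte, binary.BigEndian.Uint16(hdr[:]))
	if _, err := io.ReadFull(r, payload); err != nil {
		return nil, err
	}
	return payload, nil
}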

Enable additional linters

The current golangci.yaml configuration has some linters disabled as existing code does not pass them.
These should be enabled over time.

To enable a linter:

  1. Set issues.new: false to run the linter on existing code (currently set to true to check only new code)
  2. Uncomment relevant linter under linters.enable and/or the linter's configuration under linters-settings.<name of linter>
  3. Run make lint and fix all reported errors
  4. Revert to issues.new: true, then commit the modified code and golangci.yaml

For additional information and reference, you may want to refer to this blog and the golangci documentation on linters and their configuration.

New controlplane missing features

A list of features which exist in the current controlplane but are missing from the newer one:

  • Create k8s service for imports
  • Create k8s service for export (Export.Spec.ExternalService)
  • Connectivity policies
  • Legacy policies? (ACL / LB)
  • #45
    - [ ] Metrics collection
  • Heartbeat
    - [ ] Support plaintext connections (in addition to mTLS, which is already supported)? Is this needed?

CLI harmonization and enhancements

CLI should be made consistent in terms of

  • concepts exposed to the user
  • parameter names
  • use of CRUD style (similar to kubectl), etc.

Enhancements:

  • saving state so calls are easier (e.g., using kubeconfig/context)
  • logging
  • utility CLIs to help UX/automation (e.g., creating ingress/nodeport YAML for a new gateway, generating GW config including Pod, ServiceAccount, RBAC on per namespace or cluster scope, etc)

Ensure unprivileged deployment works

Complementary to work in #77 - ensure ClusterLink works when deployed in a single namespace.
Correct RBAC is needed to access the single namespace only.
We may want to support a set of related namespaces as a single unit (i.e., control and data planes supporting n namespaces, with n>1).

Improve documentation for new developers

This means, for example, adding

  • tutorials
  • examples for specific use cases (e.g., service sharing, load balancing, access to Cloud Service)
  • sections to the README:
    • project overview
    • testing and development flow
    • scope and roadmap

Some of the information is already in the README and CONTRIBUTING, so we need to decide how best to split it between the website and the GitHub markdown files.

Standardize error and logging message format

Based on Go language style recommendations:

  • Error messages included in error return values (which can be passed up the stack and concatenated with other errors) should
    start with a lower case letter and not end with punctuation.
  • Logging messages (including those at Error level) should be structured as full sentences (capitalized and punctuated).

This should be done across the entire code base.
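
A small illustration of the two conventions (the logger and function names are placeholders, not actual ClusterLink code):

// Hedged illustration of the error vs. log message styles described above.
package style

import (
	"fmt"

	"github.com/sirupsen/logrus"
)

func loadPolicy(name string) error {
	// Error values: lower-case, no trailing punctuation, so they read well
	// when wrapped: "failed to load policy allow-all: file does not exist".
	return fmt.Errorf("failed to load policy %s: file does not exist", name)
}

func Example() {
	if err := loadPolicy("allow-all"); err != nil {
		// Log messages: full sentences, capitalized and punctuated.
		logrus.Errorf("Unable to load the requested policy: %v.", err)
	}
}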

k8s Central management

The goal is to consider how our object model can be mapped into k8s to provide a centralized management function that supports user management, security, persistency, etc.

In essence, we need to identify and define a relevant set of CRDs as well as model their creation and management using k8s facilities (e.g., namespace, RBAC).

A potential mapping would have

  • a Fabric CRD (name, root CA, etc.) - cluster scoped object
  • a Peer/Site CRD (name, certificates, gateway addresses) - these are cluster scoped and create a corresponding namespace. Parts can be read-only so other peers can learn a remote's name, gateways, etc. In that case, we could move the certificates, as Secrets, into the namespace representing the peer.
  • Import/Export/Bindings are per-namespace CRDs and are responsible for configuring the peer identified by the namespace (note that we could also explore the use of the Kubernetes Multi-Cluster Services (MCS) Export/Import objects in some fashion).

The mapping is relatively straightforward, but it limits some of the more advanced checks and use cases we may want to explore. Most of these have to do with k8s RBAC being based on verb+kind in a specific namespace, whereas we may want to control access based on specific attributes.

These could include, for example:

  • restricting which peers you can import a service from (can be emulated by egress/ingress policies)
  • limiting which Services can be exported
  • scaling to multiple fabrics/sites (e.g., when managing multiple fabrics for namespace-scoped ClusterLink)
  • etc.

The above can be built atop KubeStellar as the distribution, where ClusterLink implements a per-cluster agent/operator (reacting to CRDs by calling the API) and a management hub agent (applying management logic such as creating namespaces).
The per-cluster operator and CRDs can be developed to allow declarative, kubectl-like management irrespective of the central management.

Support async control plane operations

Consider adding support for more of a k8s like experience in the management and control plane, where CLI submits requests to the control plane and these are reconciled asynchronously.

For example, an Import command would need to:

  • update the data plane via xDS
  • create k8s Service to point to the new listener in the dataplane
  • possibly manipulate the DNS entries in the cluster DNS
  • etc

The CLI should not wait until all of these complete. In addition, the control plane may face intermediate errors that can be retried at a later time without holding back the CLI.

This likely implies (a rough sketch follows the list):

  • the addition of Spec/Status sub-resources to our management objects
  • updating the Status resource as progress is made (e.g., via generic items such as k8s Conditions and object specific statuses)
  • creating a way for the control plane to inform/kick off reconciliation loops that attempt to match the actual state with desired state.
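
A rough Go sketch of the Spec/Status split and a single asynchronous reconcile step, using k8s-style Conditions; the type and field names are illustrative, not the actual ClusterLink API:

// Hedged sketch: the CLI submits a Spec and returns immediately; the control
// plane reconciles asynchronously and records progress in Status Conditions.
package reconcile

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ImportSpec is the desired state; ImportStatus is filled in over time.
type ImportSpec struct {
	Port uint16
}

type ImportStatus struct {
	Conditions []metav1.Condition
}

type Import struct {
	Spec   ImportSpec
	Status ImportStatus
}

// reconcileImport performs one reconciliation attempt and records the result;
// failed steps are simply retried on the next loop iteration.
func reconcileImport(imp *Import) {
	err := ensureListenerAndService(imp) // update dataplane via xDS, create Service, DNS, ...
	cond := metav1.Condition{
		Type:   "Ready",
		Status: metav1.ConditionTrue,
		Reason: "Reconciled",
	}
	if err != nil {
		cond.Status = metav1.ConditionFalse
		cond.Reason = "ReconcileError"
		cond.Message = err.Error()
	}
	meta.SetStatusCondition(&imp.Status.Conditions, cond)
}

func ensureListenerAndService(imp *Import) error {
	// Placeholder for the actual steps listed above.
	return nil
}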

Externally persisted control plane store

The current store uses a local boltdb file. This requires that a persistent volume claim be set up to ensure Pod restarts don't lose state.
Another option is to allow using an external data store for configuration state. For example, a cloud SQL store could be enabled (Postgres seems to have wide support and also in-cluster options; SQLite could be used for testing or as a boltdb replacement, to avoid multiple embedded store implementations).
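
A rough sketch of what a store abstraction over an external SQL database might look like; the schema, interface and SQL dialect details are assumptions, not a design decision:

// Hedged sketch: a minimal key-value store interface so boltdb and an
// external SQL database are interchangeable behind the control plane.
package store

import (
	"context"
	"database/sql"
	// A driver such as a Postgres or SQLite driver would be imported by the
	// concrete binary, not by this package.
)

// KV is the minimal surface the control plane needs from a store.
type KV interface {
	Put(ctx context.Context, kind, name string, value []byte) error
	Get(ctx context.Context, kind, name string) ([]byte, error)
}

// sqlKV stores objects in a single table keyed by (kind, name).
// Placeholder syntax ($1) and the BLOB column type may need adjusting
// per driver/dialect (e.g., BYTEA on Postgres).
type sqlKV struct{ db *sql.DB }

func NewSQLKV(db *sql.DB) (KV, error) {
	_, err := db.Exec(`CREATE TABLE IF NOT EXISTS objects (
		kind TEXT NOT NULL, name TEXT NOT NULL, value BLOB NOT NULL,
		PRIMARY KEY (kind, name))`)
	if err != nil {
		return nil, err
	}
	return &sqlKV{db: db}, nil
}

func (s *sqlKV) Put(ctx context.Context, kind, name string, value []byte) error {
	_, err := s.db.ExecContext(ctx,
		`INSERT INTO objects (kind, name, value) VALUES ($1, $2, $3)
		 ON CONFLICT (kind, name) DO UPDATE SET value = excluded.value`,
		kind, name, value)
	return err
}

func (s *sqlKV) Get(ctx context.Context, kind, name string) ([]byte, error) {
	var value []byte
	err := s.db.QueryRowContext(ctx,
		`SELECT value FROM objects WHERE kind = $1 AND name = $2`, kind, name).Scan(&value)
	return value, err
}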

Issue with heartbeat mechanism not using certificate correctly

I'm trying to make use of dynamically generated x509 certificates instead of the statically generated certificates. However, I'm having the following error messages in the controlplane component:

/tmp # /controlplane get log
ERROR  [2023-09-11 15:31:17] Unable to send heartbeat to https://172.18.0.3:30443/hb, Error: Post "https://172.18.0.3:30443/hb": x509: certificate is valid for gwctl2, mbg2, localhost, not mbg1  component=controlPlane/health file=/gw/pkg/controlplane/health/healthMonitor.go line=77
ERROR  [2023-09-11 15:31:18] Unable to send heartbeat to https://172.18.0.3:30443/hb, Error: Post "https://172.18.0.3:30443/hb": x509: certificate is valid for gwctl2, mbg2, localhost, not mbg1  component=controlPlane/health file=/gw/pkg/controlplane/health/healthMonitor.go line=77
ERROR  [2023-09-11 15:31:19] Unable to send heartbeat to https://172.18.0.3:30443/hb, Error: Post "https://172.18.0.3:30443/hb": x509: certificate is valid for gwctl2, mbg2, localhost, not mbg1  component=controlPlane/health file=/gw/pkg/controlplane/health/healthMonitor.go line=77

This is the controlplane on mbg1. From the error message, it seems mbg1 is sending heartbeats to mbg2 and expecting mbg2 to present a certificate with mbg1 in the SAN field. This seems like a bug.

I discussed the issue with @kfirtoledo, and he suggested that I change mbg1's cert to also include mbg2 in the SAN field, and likewise mbg2's cert to include mbg1 in its SAN field. This resolved the problem, and everything works again.
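
For reference, in Go's crypto/tls the client verifies the server certificate against tls.Config.ServerName (or the dialed host name). A minimal sketch of a heartbeat client that sets ServerName to the remote peer's name; this illustrates the mechanism only, and is not the actual ClusterLink fix:

// Hedged sketch: for a heartbeat from mbg1 to mbg2, ServerName must be the
// remote peer's name (mbg2), not the local one, so verification matches the
// SAN list in the remote's certificate.
package heartbeat

import (
	"crypto/tls"
	"crypto/x509"
	"net/http"
)

// newPeerClient builds an HTTP client for talking to a remote peer whose
// certificate carries peerName in its SAN list.
func newPeerClient(peerName string, caPool *x509.CertPool, cert tls.Certificate) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				RootCAs:      caPool,
				Certificates: []tls.Certificate{cert}, // our own (mbg1) client cert
				ServerName:   peerName,                // must match the remote's SAN, e.g. "mbg2"
			},
		},
	}
}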

gwctl output format

Currently, the gwctl get * output format is inconsistent, which is fine for human readability but not for use by scripts.
I saw that kubectl get supports many standard output formats via the -o flag (custom-columns,custom-columns-file,go-template,go-template-file,json,jsonpath,jsonpath-as-json,jsonpath-file,name,template,templatefile,wide,yaml).

Which format should our gwctl support, and what should be the default format?
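
A small sketch of what a kubectl-style -o flag could look like for gwctl, supporting a default table view and JSON for scripts; the flag name and fields are illustrative, not a decision:

// Hedged sketch of a -o output flag: table for humans, JSON for scripts.
package main

import (
	"encoding/json"
	"flag"
	"fmt"
	"os"
	"text/tabwriter"
)

type peer struct {
	Name    string `json:"name"`
	Gateway string `json:"gateway"`
}

func main() {
	output := flag.String("o", "table", "output format: table|json")
	flag.Parse()

	peers := []peer{{Name: "mbg1", Gateway: "172.18.0.2:30443"}, {Name: "mbg2", Gateway: "172.18.0.3:30443"}}

	switch *output {
	case "json":
		enc := json.NewEncoder(os.Stdout)
		enc.SetIndent("", "  ")
		_ = enc.Encode(peers)
	default: // human-readable table, kubectl-style
		w := tabwriter.NewWriter(os.Stdout, 0, 4, 2, ' ', 0)
		fmt.Fprintln(w, "NAME\tGATEWAY")
		for _, p := range peers {
			fmt.Fprintf(w, "%s\t%s\n", p.Name, p.Gateway)
		}
		w.Flush()
	}
}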

Explore options for controlling policy and workload associations

Explore how to better control who can set policies on workloads. In k8s the network policy selects Pods within the same namespace only. With ClusterLink the policies can have arbitrary from and to fields.

In addition, some users may want to create the notion of "buckets/containers" for policies and workloads and then ensure that the scope of influence is only within the same bucket/container. For example, one could envisage the use of a "network segment" as the container of everything else. We could then have special rules within the same "segment" and rules governing cross-segment access. Everything (policies, workloads) is defined at the segment level. A segment can be (e.g., in k8s) a set of namespaces, or (e.g., in a VPC) a set of subnets.

Exporting multiple non-Kubernetes services

In a demo scenario, we want an app from one cloud to make use of objects in S3 on AWS. The S3 buckets will be configured so they are only accessible via an AWS VPC, and thus only reachable via a ClusterLink gateway deployed in that VPC. The problem I'm having is that each S3 bucket has a different endpoint, e.g., [S3 bucket name].s3.amazonaws.com. So, if the app is dealing with 10 buckets, I need to create 10 exports in ClusterLink. If buckets are created and deleted dynamically, e.g., by the app, I would also need to dynamically manage exports accordingly. Would it be possible to support wildcards, e.g., *.s3.amazonaws.com?

These are my findings about S3 hostname to IP address mapping:

I created 3 S3 buckets

mc-ztna-s3-01
mc-ztna-s3-02
mc-ztna-s3-03

The first 2 buckets were created in us-east-2, and the last one was created in us-east-1. It looks like AWS puts 8 random load balancers in front of each bucket.

$ dig mc-ztna-s3-01.s3.amazonaws.com

; <<>> DiG 9.10.6 <<>> mc-ztna-s3-01.s3.amazonaws.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35995
;; flags: qr rd ra; QUERY: 1, ANSWER: 9, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1220
;; QUESTION SECTION:
;mc-ztna-s3-01.s3.amazonaws.com.	IN	A

;; ANSWER SECTION:
mc-ztna-s3-01.s3.amazonaws.com.	42821 IN CNAME	s3-w.us-east-2.amazonaws.com.
s3-w.us-east-2.amazonaws.com. 2	IN	A	52.219.178.236
s3-w.us-east-2.amazonaws.com. 2	IN	A	52.219.80.172
s3-w.us-east-2.amazonaws.com. 2	IN	A	52.219.84.156
s3-w.us-east-2.amazonaws.com. 2	IN	A	52.219.102.100
s3-w.us-east-2.amazonaws.com. 2	IN	A	52.219.111.4
s3-w.us-east-2.amazonaws.com. 2	IN	A	52.219.176.68
s3-w.us-east-2.amazonaws.com. 2	IN	A	52.219.176.116
s3-w.us-east-2.amazonaws.com. 2	IN	A	52.219.176.148

;; Query time: 40 msec
;; SERVER: 2620:1f7::1#53(2620:1f7::1)
;; WHEN: Thu Sep 14 11:08:37 EDT 2023
;; MSG SIZE  rcvd: 216

$ dig mc-ztna-s3-02.s3.amazonaws.com

; <<>> DiG 9.10.6 <<>> mc-ztna-s3-02.s3.amazonaws.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40233
;; flags: qr rd ra; QUERY: 1, ANSWER: 10, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1220
;; QUESTION SECTION:
;mc-ztna-s3-02.s3.amazonaws.com.	IN	A

;; ANSWER SECTION:
mc-ztna-s3-02.s3.amazonaws.com.	42821 IN CNAME	s3-1-w.amazonaws.com.
s3-1-w.amazonaws.com.	26	IN	CNAME	s3-w.us-east-1.amazonaws.com.
s3-w.us-east-1.amazonaws.com. 5	IN	A	16.182.104.185
s3-w.us-east-1.amazonaws.com. 5	IN	A	52.217.236.9
s3-w.us-east-1.amazonaws.com. 5	IN	A	54.231.134.209
s3-w.us-east-1.amazonaws.com. 5	IN	A	54.231.196.81
s3-w.us-east-1.amazonaws.com. 5	IN	A	3.5.17.230
s3-w.us-east-1.amazonaws.com. 5	IN	A	3.5.25.99
s3-w.us-east-1.amazonaws.com. 5	IN	A	3.5.28.205
s3-w.us-east-1.amazonaws.com. 5	IN	A	3.5.29.125

;; Query time: 55 msec
;; SERVER: 2620:1f7::1#53(2620:1f7::1)
;; WHEN: Thu Sep 14 11:08:48 EDT 2023
;; MSG SIZE  rcvd: 237

$ dig mc-ztna-s3-03.s3.amazonaws.com

; <<>> DiG 9.10.6 <<>> mc-ztna-s3-03.s3.amazonaws.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57989
;; flags: qr rd ra; QUERY: 1, ANSWER: 10, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1220
;; QUESTION SECTION:
;mc-ztna-s3-03.s3.amazonaws.com.	IN	A

;; ANSWER SECTION:
mc-ztna-s3-03.s3.amazonaws.com.	42821 IN CNAME	s3-1-w.amazonaws.com.
s3-1-w.amazonaws.com.	16	IN	CNAME	s3-w.us-east-1.amazonaws.com.
s3-w.us-east-1.amazonaws.com. 5	IN	A	52.217.32.4
s3-w.us-east-1.amazonaws.com. 5	IN	A	52.217.106.28
s3-w.us-east-1.amazonaws.com. 5	IN	A	52.217.234.169
s3-w.us-east-1.amazonaws.com. 5	IN	A	3.5.2.176
s3-w.us-east-1.amazonaws.com. 5	IN	A	3.5.2.206
s3-w.us-east-1.amazonaws.com. 5	IN	A	3.5.29.188
s3-w.us-east-1.amazonaws.com. 5	IN	A	52.216.43.233
s3-w.us-east-1.amazonaws.com. 5	IN	A	52.216.244.220

;; Query time: 42 msec
;; SERVER: 2620:1f7::1#53(2620:1f7::1)
;; WHEN: Thu Sep 14 11:08:58 EDT 2023
;; MSG SIZE  rcvd: 237

$ dig mc-ztna-s3-03.s3.amazonaws.com

; <<>> DiG 9.10.6 <<>> mc-ztna-s3-03.s3.amazonaws.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2403
;; flags: qr rd ra; QUERY: 1, ANSWER: 10, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1220
;; QUESTION SECTION:
;mc-ztna-s3-03.s3.amazonaws.com.	IN	A

;; ANSWER SECTION:
mc-ztna-s3-03.s3.amazonaws.com.	42616 IN CNAME	s3-1-w.amazonaws.com.
s3-1-w.amazonaws.com.	133	IN	CNAME	s3-w.us-east-1.amazonaws.com.
s3-w.us-east-1.amazonaws.com. 4	IN	A	3.5.28.104
s3-w.us-east-1.amazonaws.com. 4	IN	A	3.5.29.167
s3-w.us-east-1.amazonaws.com. 4	IN	A	16.182.32.33
s3-w.us-east-1.amazonaws.com. 4	IN	A	52.216.94.219
s3-w.us-east-1.amazonaws.com. 4	IN	A	52.217.163.153
s3-w.us-east-1.amazonaws.com. 4	IN	A	52.217.173.225
s3-w.us-east-1.amazonaws.com. 4	IN	A	54.231.199.249
s3-w.us-east-1.amazonaws.com. 4	IN	A	3.5.25.169

I used dig on mc-ztna-s3-03 twice at the end, and you can see it resolved to a different list of load balancers.

Support "allow only from/to" policies

Based on discussions, it seems we can't currently support a (useful?) construct that allows only specific to/from (e.g., "allow ssh only to workload X"). Since it can be thought of as combining allow and deny in the same policy, it could potentially be implemented as a combination of a "privileged allow" with a "normal deny", but this opens up the possibility of normal users removing the deny rule.
Perhaps we need another type level for "allow-only", sitting below the deny and above the allow in each priority level?

@zivnevo commented:

Actually, we can currently support this use case with just privileged deny policies - simply use K8s set-based requirements (a.k.a. matchExpressions) for defining the label selectors. Suppose your workload X is distinguishable from all other workloads by having the unique value X for the workloadName label key (or any other attribute that is not under user control). The following policy should work:

apiVersion: clusterlink/v1alpha1
kind: PrivilegedConnectivityPolicy
metadata:
  name: deny-ssh-to-all-but-x
spec:
  action: deny
  from:
  - workloadSelector:
      labelSelector: {}
  to:
  - workloadSelector:
      matchExpressions:
        - { key: workloadName, operator: NotIn, values: [X] }
  connectionAttrs:
  - protocol: TCP
    port: 22

Now, this uses double negation (deny + NotIn), which some people find confusing. Based on the usefulness of the "allow only" construct, we can decide whether or not to add a new policy type.

Reorg packages, directories and files under pkg/policyengine

Suggestion from @Oro:

  • pkg/policyEngine -> pkg/policy [maybe this is orthogonal to this PR, as this package already exists today]
  • connectivity_pdp.go -> pkg/policy/connectivity/pdp.go
  • policytypes/connectivity_policy.go -> pkg/policy/connectivity/types.go
  • connectivity_policy_crd.go -> pkg/api/policy/types.go
  • The conversion (NativePolicy()) from user-facing types (which are under pkg/api) to the internal types (which are under pkg/policy/types.go) should not be inside pkg/api, but moved into the server that accepts this API, i.e., pkg/server/policy/server.go.
  • move all policyTier code to pkg/policy/tier.go and make it work on a generic policy interface (not just connectivity policies).
  • make PDP generic as well and place in pkg/policy/pdp.go
  • drop connectivity prefix from types defined in pkg/policy/connectivity/ (e.g. ConnectivityPolicy -> Policy).
  • drop all "policy" prefixes from names defined under pkg/policy.

In addition, summarize all exported PDP methods in one interface.
Also, avoid duplication between api.Policy and ConnectivityPolicy.

Improve DNS support

In TLS communication, a client accessing a service via ClusterLink gateways requires that the locally resolved name appear in the certificate presented by the server.

For k8s services, this means we need to keep the source and destination DNS names the same (i.e., maintain the name and namespace between clusters). This may require cluster-wide privileges (i.e., creating a service in a different namespace than the one the gateway runs in).
This is even more apparent when importing a service that is external to a cluster and uses TLS. The client needs to resolve the name as it appears in the certificate (e.g., api.example.com) to the local gateway. Note that a secondary local domain (e.g., "multicluster.local" or "foo.io") would not work, as it too would not be present in the cloud service's certificate.

Kubernetes allows this via a few mechanisms:

  • manually (or via an admission controller) changing the DNS configuration of the client Pod to use a special DNS server that can do the overrides
  • changing the CoreDNS configuration to rewrite/resolve external names in a special way

For reference on changing Pod DNS, see here. For examples of manipulating CoreDNS, see here and here.

In addition, we would need to extend the service import (or binding) to have an optional DNS alias that matches the expected client DNS entry.
The hope is that the above can also be used to resolve issues with protocols that seed clients with bootstrap servers, which are in turn used to get the full list of servers (e.g., Kafka does this). In that case, we would need to alias the full list of servers that can be discovered by the client.

Integration of policy engine into the new controlplane

Policy engine interface, as decided with @orozery:

type PolicyDecider interface {
	AddLBPolicy(lbPolicy *LBPolicy) error
	DeleteLBPolicy(lbPolicy *LBPolicy) error

	AddAccessPolicy(policy *api.Policy) error
	DeleteAccessPolicy(policy *api.Policy) error

	AuthorizeAndRouteConnection(connReq *event.ConnectionRequestAttr) (event.ConnectionRequestResp, error)

	AddPeer(peer *api.Peer) error
	DeletePeer(name string) error

	AddBinding(imp *api.Binding) (event.Action, error)
	DeleteBinding(imp *api.Binding) error

	AddExport(exp *api.Export) (event.ExposeRequestResp, error)
	DeleteExport(name string) error
}

Handling certificate rotation?

In the context of Zero Trust, certificates and tokens are only valid for a short period of time, e.g., a few hours. Afterwards, they are rotated, to minimize the chance of stolen certificates being used to launch attacks. How does the ClusterLink gateway handle such events?
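
This is an open question; one common Go pattern (not necessarily what ClusterLink will adopt) is to resolve the certificate lazily on each TLS handshake, so that an external rotation tool only needs to replace the files on disk. A minimal sketch, assuming certificates are mounted at illustrative paths:

// Hedged sketch: re-read the certificate files (at most once a minute) via
// the GetCertificate callback, picking up rotated certs without a restart.
package rotation

import (
	"crypto/tls"
	"sync"
	"time"
)

type reloader struct {
	mu       sync.RWMutex
	cert     *tls.Certificate
	loadedAt time.Time
}

func (r *reloader) getCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
	r.mu.RLock()
	fresh := r.cert != nil && time.Since(r.loadedAt) < time.Minute
	cert := r.cert
	r.mu.RUnlock()
	if fresh {
		return cert, nil
	}

	// Rotation tooling (cert-manager, SPIRE, etc.) just overwrites the files.
	c, err := tls.LoadX509KeyPair("/certs/tls.crt", "/certs/tls.key")
	if err != nil {
		return nil, err
	}
	r.mu.Lock()
	r.cert, r.loadedAt = &c, time.Now()
	r.mu.Unlock()
	return &c, nil
}

// ServerTLSConfig returns a config whose certificate is resolved per handshake.
func ServerTLSConfig() *tls.Config {
	r := &reloader{}
	return &tls.Config{GetCertificate: r.getCertificate}
}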

Provide ways to choose dataplane type between envoy and go

One option is to use cl-adm to specify the dataplane type - either via a command line option (e.g., --dataplane envoy) or by deploying a different image.
The separate-image method seems to be somewhat preferred (e.g., there is no need to support tweaking of command line options in the Pod definition).

Hardcoded `dataplane:443` in controlplane ?

I have a scenario where ClusterLink components are deployed in AWS, and the dataplane service is exposed as a LoadBalancer type. In my other cluster, on IBM Cloud, I could then peer the gateways using the LoadBalancer hostname and port, e.g.:

NAME           TYPE           CLUSTER-IP      EXTERNAL-IP                                                              PORT(S)         AGE
controlplane   ClusterIP      172.30.48.159   <none>                                                                   443/TCP         26m
dataplane      LoadBalancer   172.30.68.44    a5cbed9a36b704aec91cf741913ded00-760213696.us-east-2.elb.amazonaws.com   30443:32465/TCP   26m

In the above example, the ClusterLink gateway on IBM Cloud would use the address a5cbed9a36b704aec91cf741913ded00-760213696.us-east-2.elb.amazonaws.com:30443 to peer with the one running in AWS. The port 30443 matters, as that port needs to be explicitly opened to allow incoming connections.

However, I noticed this doesn't work: there were a lot of error messages in the AWS controlplane process as it tried to communicate with its local dataplane process via the address dataplane:443, which in our case translates to 172.30.68.44:443, where nothing is listening. Is there a way to configure the controlplane process to use a different port than the default 443 when communicating with its local dataplane process?

Enable control plane metrics

Possible metrics may include (a small Prometheus sketch follows the list):

  • authorization upcalls (grouped by allow/deny, matching ACL policies, service and/or client)
  • CPU and memory
  • objects configured (count of services, peers, etc., and configuration errors)
  • count of requests and latencies to other peers (CP to CP traffic)
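
A small sketch of exposing the first of these (authorization upcalls) with Prometheus; the metric name and labels are illustrative, not an agreed convention:

// Hedged sketch: counting authorization decisions, labeled by decision and
// service, and serving them on /metrics.
package metrics

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var authorizationUpcalls = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "clusterlink_authorization_upcalls_total",
		Help: "Authorization decisions made by the control plane policy engine.",
	},
	[]string{"decision", "service"},
)

// RecordAuthorization is called by the policy engine after each decision.
func RecordAuthorization(allowed bool, service string) {
	decision := "deny"
	if allowed {
		decision = "allow"
	}
	authorizationUpcalls.WithLabelValues(decision, service).Inc()
}

// Serve exposes the default registry on /metrics.
func Serve(addr string) error {
	http.Handle("/metrics", promhttp.Handler())
	return http.ListenAndServe(addr, nil)
}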

Ensure privileged deployment mode works

There are two modes we would like ClusterLink to support:

  • running in a separate (and privileged) namespace, thus limiting access to administrators; and
  • running control and data planes in user namespace (unprivileged deployment)

We need to validate the first use case and ensure that (initial list, more items might be discovered):

  • control and data plane run in a dedicated (e.g., clusterlink-system) namespace
  • control plane has sufficient (but not overly broad) RBAC privileges to, e.g., create Services in other namespaces, watch Pods in other namespaces (for policy attribute determination), etc.

This is likely related to a more k8s-native management approach (e.g., using CRDs), where per-namespace configuration is also possible via (e.g., Export/Import) CRDs. Doing so using the CLI makes separating developer and administrator privileges difficult (e.g., granular roles would have to be encoded and managed via CLI tokens, etc.).
See #28 for the CRD discussion.
