
lighthouse's People

Contributors

anfredette, aswinsuryan, blue-troy, danibachar, davidohana, dependabot-preview[bot], dependabot[bot], dfarrell07, jaanki, maayanf24, mangelajo, mkimuram, mkolesnik, nyechiel, pengbinbin1, roytman, skitt, sridhargaddam, stevemattar, submariner-bot, tpantelis, vthapar, yboaron

lighthouse's Issues

Aggregate ServiceImports

Currently, ServiceImports are distributed as individual copies from each source cluster and remain separate on the destination clusters. Aggregation is done in the plugin code, which is suboptimal and not compliant with the MCS API spec. The agent should aggregate the ServiceImports into a single resource on each destination cluster (sketched below), and the plugin should consume that aggregate. This will also make troubleshooting easier.

See enhancement proposal for details.
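
For illustration, a single aggregated ServiceImport on a destination cluster could look roughly like this under the MCS API (the service name, namespace, and cluster names are made up; the exact shape Lighthouse produces is the subject of the proposal):

apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceImport
metadata:
  name: nginx
  namespace: default
spec:
  type: ClusterSetIP
  ports:
  - name: http
    protocol: TCP
    port: 80
status:
  clusters:            # one entry per exporting source cluster
  - cluster: cluster1
  - cluster: cluster2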

Migrate E2E tests to reuse submariner framework

When the Lighthouse E2E tests were initially added, a snapshot of the submariner project's framework module was copied into lighthouse. We should reuse the submariner framework code instead of maintaining a separate copy.

Add Lighthouse controller to support MCS API design

kubernetes/enhancements#1646 proposes a solution for Multi Cluster Service discovery that requires a central controller to aggregate and distribute resources to all the clusters.

Currently, Lighthouse distributes each cluster's individual MCS CR, and local agents aggregate them. We need a central controller to support distributing an aggregated MCS CR to the clusters.

Add support for Armada-driven K8s deployments

To better reuse deployment logic, submariner-io/armada has been created to abstract multicluster K8s deployments with kind under the hood. It would help the maintainability of Lighthouse to move to this shared tooling.

This work is parallel to submariner-io/submariner#317, which added Armada support to the main Submariner repo. Also related to submariner-io/submariner#369, which will involve sharing scripting around Armada between various submariner-io/* repos.

Test Headless service discovery with Globalnet

I opened this issue to track subctl support for Lighthouse + Globalnet testing when using a headless service.

On my environment with Globalnet, the headless service could not be exported:
https://qe-jenkins-csb-skynet.cloud.paas.psi.redhat.com/job/Submariner-OSP-AWS/797/Test-Report/

Status:
  Conditions:
    Last Transition Time:  2020-08-28T07:56:54Z
    Message:               Service doesn't have a global IP yet
    Reason:                ServiceGlobalIPUnavailable
    Status:                False
    Type:                  Initialized

@vthapar
We should update the website docs and point to submariner-io/submariner#732.

Originally posted by @manosnoam in #271 (comment)

Dependabot can't resolve your Go dependency files

Dependabot can't resolve your Go dependency files.

As a result, Dependabot couldn't update your dependencies.

The error Dependabot encountered was:

verifying github.com/submariner-io/[email protected]/go.mod: checksum mismatch
	downloaded: h1:cPwX5Xwr6tZs7qQZmCPKNFL5LxOHR1W4MlRSZgwVBcw=
	go.sum:     h1:5vxFEjdLY3+kBeXLvxixXRRmcaemptjzJQeYUvmks9A=

SECURITY ERROR
This download does NOT match an earlier download recorded in go.sum.
The bits may have been replaced on the origin server, or an attacker may
have intercepted the download attempt.

For more information, see 'go help module-auth'.

If you think the above is an error on Dependabot's side please don't hesitate to get in touch - we'll do whatever we can to fix it.

View the update logs.

Append to the ServiceExportCondition list

When we update the ServiceExport status, we always overwrite the entire ServiceExportCondition list. Since it's a list, we should append when the new ServiceExportCondition differs from the previous entry. We probably also want to truncate the list to keep only the last 10 or so entries (see the sketch below).
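
A minimal Go sketch of the intended behavior, under stated assumptions: the condition struct below is a stand-in for the Lighthouse API type, and maxExportStatusConditions is an invented name for the cap.

package main

import "fmt"

// Stand-in for the relevant fields of the Lighthouse ServiceExportCondition type.
type ServiceExportCondition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// maxExportStatusConditions is an invented cap; the issue suggests ~10.
const maxExportStatusConditions = 10

// appendCondition appends newCond only if it differs from the most recent
// entry, then truncates the list to the last maxExportStatusConditions entries.
func appendCondition(conds []ServiceExportCondition, newCond ServiceExportCondition) []ServiceExportCondition {
	if n := len(conds); n > 0 {
		last := conds[n-1]
		if last.Type == newCond.Type && last.Status == newCond.Status && last.Reason == newCond.Reason {
			return conds // nothing changed since the last recorded condition
		}
	}
	conds = append(conds, newCond)
	if len(conds) > maxExportStatusConditions {
		conds = conds[len(conds)-maxExportStatusConditions:]
	}
	return conds
}

func main() {
	conds := appendCondition(nil, ServiceExportCondition{Type: "Initialized", Status: "True"})
	conds = appendCondition(conds, ServiceExportCondition{Type: "Initialized", Status: "True"}) // deduplicated
	fmt.Println(len(conds)) // prints 1
}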

Lighthouse-controller is not reachable between private and public clusters

Installing Submariner with service-discovery on OSP (private cluster) and AWS (public cluster), the lighthouse-controller seems to be unreachable between clusters:

Testing the connection between nginx <--> netshoot works with direct IPs, but does not work with the domain name:

export KUBECONFIG=/home/nmanos/automation/ocp-install/nmanos-cluster-a/auth/kubeconfig
/home/nmanos/automation/ocp-install/oc exec netshoot-58785d5fc7-82kc7 -- curl --output /dev/null  --verbose --head --fail 100.96.144.67
*   Trying 100.96.144.67:80...
* TCP_NODELAY set
* Connected to 100.96.144.67 (100.96.144.67) port 80 (#0)
> HEAD / HTTP/1.1
> Host: 100.96.144.67
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: nginx/1.17.8
< Date: Thu, 12 Mar 2020 12:56:09 GMT
< Content-Type: text/html
< Content-Length: 612
< Last-Modified: Tue, 21 Jan 2020 14:39:00 GMT
< Connection: keep-alive
< ETag: "5e270d04-264"
< Accept-Ranges: bytes
< 
* Connection #0 to host 100.96.144.67 left intact
nginx_service_cluster_b=nginx-demo
# Nginx service on Cluster B, will be identified by its Domain Name (with --service-discovery): nginx-demo
/home/nmanos/automation/ocp-install/oc exec netshoot-58785d5fc7-82kc7 -- curl --output /dev/null -m 30 --verbose --head --fail nginx-demo
* Could not resolve host: nginx-demo                                          
* Closing connection 0
curl: (6) Could not resolve host: nginx-demo
command terminated with exit code 6

Lighthouse pod log shows:

{Name:"nmanos-cluster-b-tunnel-jsvwf"}}, Status:v1beta1.KubeFedClusterStatus{Conditions:[]v1beta1.ClusterCondition{v1beta1.ClusterCondition{Type:"Offline", Status:"True", LastProbeTime:v1.Time{Time:time.Time{wall:0x0, ext:63719526907, loc:(*time.Location)(0x1dc4b60)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63719526907, loc:(*time.Location)(0x1dc4b60)}}, Reason:"ClusterNotReachable", Message:"cluster is not reachable"}}, Zones:[]string(nil), Region:""}}

submariner_downstream_11032020_1341.log

Add Status for ServiceExport

ServiceExport has a status field which is currently not being used. Update the status correctly so it can be used for automation and diagnostics.

Lighthouse pod log is flooded with "transform function returned nil"

I would expect to see in LH pod logs:

klog.Info("Lighthouse agent syncer started")

However, it seems the LH pod log is flooded with "transform function returned nil", and there's no indication of "Lighthouse agent syncer started", as in the following pod log:
lh.log

It's a 5 MB LH pod log that was generated in only 2 hours after Submariner was deployed and joined to a completely new cluster.

Add support for StatefulSets

To support StatefulSets (https://kubernetes.io/docs/concepts/workloads/controllers/statefulset) we need to provide a means to access individual pods in the set using hostnames, rather than just returning a list of pod IPs as we do for headless services. This is currently not possible with what we have in ServiceImports and will likely need EndpointSlice support.
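
Under the MCS DNS conventions this would likely surface as per-pod names scoped by source cluster, along the lines of (the pod hostname, cluster ID, and service name below are illustrative):

dig web-0.cluster1.nginx.default.svc.clusterset.local +short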

Create EndpointSlices for a Service

An EndpointSlice needs to be created for each headless service in a cluster (see the sketch after this list).

  • The EndpointSlice shall be in the same namespace as the service.
  • It should follow a naming convention that avoids conflicts with other clusters' EndpointSlices.
  • It should have its owner set to the Lighthouse agent.
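
A rough sketch of such an EndpointSlice, using the labels proposed by the MCS API KEP; the name, namespace, and addresses are illustrative, and the owner reference is elided:

apiVersion: discovery.k8s.io/v1beta1
kind: EndpointSlice
metadata:
  name: nginx-cluster1          # illustrative convention: <service>-<source cluster>
  namespace: default            # same namespace as the exported service
  labels:
    multicluster.kubernetes.io/service-name: nginx
    multicluster.kubernetes.io/source-cluster: cluster1
  # ownerReferences would point at the Lighthouse agent's owning resource
addressType: IPv4
ports:
- name: http
  protocol: TCP
  port: 80
endpoints:
- addresses:
  - "10.1.2.3"
  hostname: nginx-7d4f9c-abcde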

When a service is available in multiple clusters, DNS apparently only returns one

In a setup with two clusters, both exporting the same service, the imports are synchronised:

NAMESPACE               NAME                                              AGE
submariner-k8s-broker   nginx-default-gn2                                 4d16h
submariner-k8s-broker   submariner-test-app-quattro-submariner-test-gn2   15m
submariner-k8s-broker   submariner-test-app-quattro-submariner-test-gn3   35m
submariner-operator     nginx-default-gn2                                 4d16h
submariner-operator     submariner-test-app-quattro-submariner-test-gn2   15m
submariner-operator     submariner-test-app-quattro-submariner-test-gn3   35m

(on the broker cluster) but dig only ever returns one of the IP addresses — the first one to be exported in this case.

Access remote services in a specific cluster explicitly

I'm looking at options for deploying Submariner and Lighthouse in pre-existing clusters. If two clusters, east and west, have the same (default) DNS suffix cluster.local, is there a way for pods in cluster east to access services in cluster west explicitly, using some form of a forward plugin config for CoreDNS in east, so that lookups for *.svc.west.local (or something else) are sent to Lighthouse and will only match services exported by the west cluster?

Note that this is a little different from but not necessarily incompatible with the current (0.4.0) design of Lighthouse which supports multi-cluster services which can be exported from (and therefore serviced by) one or more clusters. Such multi-cluster services are visible in all clusters, and discoverable via a new supercluster.local domain name, and it's assumed that namespaces that export such services will be globally unique across the set of clusters. (I assume this looks like _service_._namespace_.svc.supercluster.local ?)

However, if the clusters are pre-existing, it's possible that they may have existing namespaces that are not globally unique, such as kube-system or monitoring (or in my case kafka). So I'm trying to connect to a specific remote service in a specific remote cluster, not a "multi-cluster" service that may exist in one or more clusters. This makes the use case more similar to the pre-0.4.0 design of Lighthouse, but since both clusters are pre-existing, they do not have unique DNS names and both consider themselves to be .cluster.local.
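
One hypothetical shape for this, assuming Lighthouse could be taught to serve a per-cluster zone such as svc.west.local, would be a forward stanza in east's CoreDNS Corefile (the IP is a placeholder for the Lighthouse DNS server's ClusterIP):

svc.west.local:53 {
    forward . 100.92.253.164
}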

Lighthouse 0.6.0 Docs

The Lighthouse documentation needs to be updated with the 0.6.0 changes:

  • Information about the HA support in the architecture guide
  • Creation of EndpointSlices in the architecture guide
  • Round-robin load balancing in the architecture guide
  • Headless services in the quick start guide and architecture guide
  • StatefulSets in the quick start guide and architecture guide

prune the plugin list

The list of plugins doesn't match the list in CoreDNS 1.5.2. Review and prune it, as most of the plugins are not actually in use.

Change lighthouse tests to use forward plugin.

The Lighthouse implementation shall be changed to make use of the forward plugin available in CoreDNS.

  1. Lighthouse shall run in a separate DNS server (which can be CoreDNS based).
  2. Lighthouse shall be responsible for answering queries on a domain name it owns (svc.supercluster.local.).
  3. The in-cluster CoreDNS can be configured to forward DNS requests for svc.supercluster.local to the Lighthouse CoreDNS server, as in the example below.
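
For example, step 3 amounts to a stanza like the following in the cluster's CoreDNS Corefile (the IP is the ClusterIP of the Lighthouse CoreDNS service and is illustrative):

supercluster.local:53 {
    forward . 100.92.253.164
}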

Add "Ready" status in ServiceExport

Currently, Lighthouse does not support kubectl/oc wait as it does not populate the Ready status (submariner-io/submariner#640):

oc wait --timeout=3m --for=condition=ready serviceexport "nginx-cl-b"

This shall be populated when the service is exported, as below:

  • type: Ready
    status: "True"
    lastTransitionTime: "2020-03-30T01:33:51Z"

E2E tests are failing when deploying with globalnet

While working on #127 I discovered that globalnet isn't supported by the E2E tests.

The failures are recorded in the log - log.txt

Excerpt:

2020-05-11T09:45:50.2149430Z • Failure [136.424 seconds]
2020-05-11T09:45:50.2149647Z [dataplane] Test Service Discovery Across Clusters
2020-05-11T09:45:50.2154525Z /go/src/github.com/submariner-io/lighthouse/test/e2e/dataplane/service_discovery.go:16
2020-05-11T09:45:50.2154684Z   when a pod tries to resolve a service in a remote cluster
2020-05-11T09:45:50.2155033Z   /go/src/github.com/submariner-io/lighthouse/test/e2e/dataplane/service_discovery.go:19
2020-05-11T09:45:50.2155343Z     should be able to discover the remote service successfully [It]
2020-05-11T09:45:50.2155633Z     /go/src/github.com/submariner-io/lighthouse/test/e2e/dataplane/service_discovery.go:20
2020-05-11T09:45:50.2155716Z 
2020-05-11T09:45:50.2156603Z     Failed to verify if service IP is discoverable. expected execution result "; <<>> DiG 9.14.8 <<>> @100.90.0.10 nginx-demo.e2e-tests-dataplane-sd-pmx54.svc.cluster2.local +short\n; (1 server found)\n;; global options: +cmd\n;; connection timed out; no servers could be reached\n169.254.3.81" to contain "100.90.209.156"
2020-05-11T09:45:50.2156764Z     Unexpected error:
2020-05-11T09:45:50.2156844Z         <exec.CodeExitError>: {
2020-05-11T09:45:50.2156938Z             Err: {
2020-05-11T09:45:50.2157034Z                 s: "command terminated with exit code 9",
2020-05-11T09:45:50.2157133Z             },
2020-05-11T09:45:50.2157472Z             Code: 9,
2020-05-11T09:45:50.2157573Z         }
2020-05-11T09:45:50.2157872Z         command terminated with exit code 9
2020-05-11T09:45:50.2158333Z     occurred

...

2020-05-11T09:47:46.3741029Z • Failure [116.159 seconds]
2020-05-11T09:47:46.3742070Z [dataplane] Test Service Discovery Across Clusters
2020-05-11T09:47:46.3744907Z /go/src/github.com/submariner-io/lighthouse/test/e2e/dataplane/service_discovery.go:16
2020-05-11T09:47:46.3746916Z   when a pod tries to resolve a service which is present locally and in a remote cluster
2020-05-11T09:47:46.3750226Z   /go/src/github.com/submariner-io/lighthouse/test/e2e/dataplane/service_discovery.go:25
2020-05-11T09:47:46.3751495Z     should resolve the local service [It]
2020-05-11T09:47:46.3755627Z     /go/src/github.com/submariner-io/lighthouse/test/e2e/dataplane/service_discovery.go:26
2020-05-11T09:47:46.3757270Z 
2020-05-11T09:47:46.3760909Z     Failed to verify if service IP is discoverable
2020-05-11T09:47:46.3761064Z     Unexpected error:
2020-05-11T09:47:46.3761437Z         <exec.CodeExitError>: {
2020-05-11T09:47:46.3761536Z             Err: {
2020-05-11T09:47:46.3761666Z                 s: "command terminated with exit code 9",
2020-05-11T09:47:46.3761768Z             },
2020-05-11T09:47:46.3761844Z             Code: 9,
2020-05-11T09:47:46.3761940Z         }
2020-05-11T09:47:46.3762034Z         command terminated with exit code 9
2020-05-11T09:47:46.3762339Z     occurred

Exporting headless service - Host could not be resolved

This issue is related to #271

I've installed Submariner version v0.6.0-1-gcb7275d with service discovery (on non-overlapping cluster CIDRs and without Globalnet) and exported a headless service, but its DNS name nginx-cl-b.test-submariner-headless.svc.clusterset.local could not be resolved, even after 3 minutes:

08:30:35 $ oc get serviceexport "nginx-cl-b"  -n test-submariner-headless -o yaml

 apiVersion: lighthouse.submariner.io/v2alpha1
 kind: ServiceExport
 metadata:
   creationTimestamp: "2020-08-31T05:27:21Z"
   generation: 3
   name: nginx-cl-b
   namespace: test-submariner-headless
   resourceVersion: "11622968"
   selfLink: /apis/lighthouse.submariner.io/v2alpha1/namespaces/test-submariner-headless/serviceexports/nginx-cl-b
   uid: b4daec8f-7541-4366-ac6f-c8d17ebdb0f9
 status:
   conditions:
   - lastTransitionTime: "2020-08-31T05:30:31Z"
     message: Awaiting sync of the ServiceImport to the broker
     reason: AwaitingSync
     status: "True"
     type: Initialized
   - lastTransitionTime: "2020-08-31T05:30:31Z"
     message: Service was successfully synced to the broker
     reason: ""
     status: "True"
     type: Exported


After 3 minutes:

08:33:36 $ oc exec netshoot-cl-a-new -n test-submariner -- ping -c 1 nginx-cl-b.test-submariner-headless.svc.clusterset.local 

 ping: nginx-cl-b.test-submariner-headless.svc.clusterset.local: Name does not resolve

Full test report:
https://qe-jenkins-csb-skynet.cloud.paas.psi.redhat.com/job/Submariner-OSP-AWS/800/Test-Report/
The last step includes pods logs, and subctl info.

Note that in the same test with a regular (non-headless) service, the connection works fine:

$ oc exec netshoot-cl-a -n test-submariner -- /bin/bash -c "curl --max-time 30 --verbose nginx-cl-b.test-submariner.svc.clusterset.local:8080"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 100.96.76.75:8080...
* Connected to nginx-cl-b.test-submariner.svc.clusterset.local (100.96.76.75) port 8080 (#0)

add lighthouse e2e tests to downstream CI

@Vishal Thapar @mkolesni @tom Pantelis

Hi, do you know how/where these tests are run? Via GitHub Actions / subctl verify? Thanks.

test/e2e/discovery/service_discovery.go:21
var _ = Describe("[discovery] Test Service Discovery Across Clusters", func() {

WIP:

Vishal Thapar 11:45 AM
@pkomarov via both. GitHub Actions runs these in lighthouse CI, e.g. https://github.com/submariner-io/lighthouse/actions/runs/170918828
An example of using subctl verify is in the operator repo's CI, e.g. https://github.com/submariner-io/submariner-operator/runs/876513305?check_suite_focus=true#step:5:8436

Opt-in for multi-cluster service-discovery

Currently, all services (except for a few default services) are discoverable across clusters.

An opt-in feature shall be added, where only services with a specific label (other options could be explored) are discoverable.
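
As a purely hypothetical illustration of the label-based option (the label key below is invented, not an agreed-upon name):

kubectl label service nginx lighthouse.submariner.io/exported=true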

DNS resolution is very erratic sometimes

Deploy three kind clusters (you can use the submariner repo and execute "make clusters"):

  1. subctl deploy-broker --kubeconfig output/kubeconfigs/kind-config-cluster1 --service-discovery
  2. subctl join --kubeconfig output/kubeconfigs/kind-config-cluster2 --clusterid cluster2 --disable-nat --version devel ./broker-info.subm --cable-driver libreswan
  3. subctl join --kubeconfig output/kubeconfigs/kind-config-cluster3 --clusterid cluster3 --disable-nat --version devel ./broker-info.subm --cable-driver libreswan
    Note: Used latest subctl

[sgaddam@localhost submariner]$ export KUBECONFIG=output/kubeconfigs/kind-config-cluster2
[sgaddam@localhost submariner]$ kubectl get svc -n submariner-operator
NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
submariner-lighthouse-coredns   ClusterIP   100.92.253.164   <none>        53/UDP              42m
submariner-operator-metrics     ClusterIP   100.92.66.37     <none>        8383/TCP,8686/TCP   42m
[sgaddam@localhost submariner]$

ConfigMap of CoreDNS:

apiVersion: v1
data:
  Corefile: |
    #lighthouse
    supercluster.local:53 {
        forward . 100.92.253.164    <--- This matches the service IP of lighthouse-coredns
    }
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster2.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }

After removing the submariner.io/gateway label on the active gateway node of cluster2, the engine pod is terminated, and when we try DNS resolution from the pod, it's very erratic (see below).

In the client pod (deployed via kubectl run netshoot-2-1 -i --tty --image nicolaka/netshoot -- /bin/bash) on cluster2:
bash-5.0# The ping requests below were issued without any delay.
bash-5.0# ping nginx.default.svc.supercluster.local
PING nginx.default.svc.supercluster.local (100.93.129.152) 56(84) bytes of data. <--- Here ping resolves
^C
--- nginx.default.svc.supercluster.local ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

bash-5.0#
bash-5.0# ping nginx.default.svc.supercluster.local
ping: nginx.default.svc.supercluster.local: Name does not resolve <--- ping does not resolve
bash-5.0#
bash-5.0# ping nginx.default.svc.supercluster.local
ping: nginx.default.svc.supercluster.local: Name does not resolve
bash-5.0#
bash-5.0# ping nginx.default.svc.supercluster.local
PING nginx.default.svc.supercluster.local (100.93.129.152) 56(84) bytes of data. <--- ping resolves again
^C
--- nginx.default.svc.supercluster.local ping statistics ---
9 packets transmitted, 0 received, 100% packet loss, time 8215ms

After leaving the setup for some 20 minutes, when I tried to run ping/dig again, DNS did not resolve at all.

bash-5.0# ping nginx.default.svc.supercluster.local
ping: nginx.default.svc.supercluster.local: Name does not resolve
bash-5.0#

Re-verified the lighthouse-coredns service IP; it matches the IP address in the CoreDNS ConfigMap, but DNS still does not work.
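
One generic debugging step (not from the original report) is to query the lighthouse-coredns service directly, bypassing the in-cluster CoreDNS forwarding hop:

dig @100.92.253.164 nginx.default.svc.supercluster.local +short

If direct queries resolve consistently while queries through the cluster DNS do not, the problem is in the forwarding path rather than in the Lighthouse server itself.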
