shared-loadbalancer's Introduction

Shared Kubernetes LoadBalancer

Background

In Kubernetes, there are generally three ways to expose workloads publicly:

  • Service (with type NodePort)
  • Service (with type LoadBalancer)
  • Ingress

kubectl proxy and similar dev/debug solutions are not counted.

NodePort Service has existed almost since the birth of Kubernetes. But due to the limited port range (30000-32767), the randomness of the assigned port, and the need to expose (almost) the whole cluster to the public network, NodePort Service is usually not considered a good L4 solution for serious production workloads.

A viable solution today for L4 apps is the LoadBalancer Service. It is implemented differently across Kubernetes offerings, by connecting a Kubernetes Service object with a real/virtual IaaS LoadBalancer, so that traffic going through the LoadBalancer endpoint is routed to the destination pods properly.

However, in reality, L7 (e.g. HTTP) workloads are far more widely used than L4 ones, so the community came up with the Ingress concept. An Ingress object defines how incoming requests are routed to internal Services. Under the hood, an ingress controller (1) handles Ingress objects and sets up mapping rules by leveraging Nginx/Envoy/etc., and (2) is (normally) exposed externally via a LoadBalancer.

There is a misunderstanding that Ingress can also be used to manage L4 workloads. That is not true. Ingress works b/c it can differentiate requests by HTTP headers, whereas an L4 packet offers only IP + port.
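The difference can be shown with a minimal Go sketch. The two lookup functions and all host, port, and service names are hypothetical and only illustrate the routing keys available at each layer; they are not how any real ingress controller is implemented.

```go
package main

import "fmt"

// l7Route models an Ingress-style rule table: requests arriving on the same
// ip:port are told apart by the HTTP Host header.
func l7Route(rules map[string]string, host string) (string, bool) {
	backend, ok := rules[host]
	return backend, ok
}

// l4Route models plain TCP/UDP forwarding: the destination port is the only
// key available, so each port can point at exactly one backend.
func l4Route(rules map[int]string, port int) (string, bool) {
	backend, ok := rules[port]
	return backend, ok
}

func main() {
	// Hypothetical rule tables for one public endpoint.
	l7 := map[string]string{
		"shop.example.com": "shop-svc",
		"blog.example.com": "blog-svc",
	}
	l4 := map[int]string{6379: "redis-svc", 3306: "mysql-svc"}

	// Two L7 backends share port 80 on the same endpoint.
	for _, host := range []string{"shop.example.com", "blog.example.com"} {
		backend, _ := l7Route(l7, host)
		fmt.Printf("port 80, Host=%s -> %s\n", host, backend)
	}
	// L4 backends need one port each.
	for _, port := range []int{6379, 3306} {
		backend, _ := l4Route(l4, port)
		fmt.Printf("port %d -> %s\n", port, backend)
	}
}
```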

Motivation

Ingress makes it possible to expose multiple internal L7 services through one public endpoint, but it doesn't work for L4 workloads.

From the picture above, you might wonder where the missing piece for L4 services is. This is exactly the problem we're trying to solve in this project. The following factors are considered:

  • Cost effective
  • User friendly
  • Reusing existing Kubernetes assets
  • Minimum operation efforts
  • Consistent with Kubernetes roadmap

How It Works

We introduce a "SharedLoadBalancer Controller" to customize current Kubernetes behavior.

Without a "SharedLoadBalancer Controller", it's N Services (of type LoadBalancer) mapped to N LoadBalancer endpoints.

With a "SharedLoadBalancer Controller", it's N SharedLB CR objects mapped to 1 LoadBalancer endpoint (on different ports).
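The N-to-1 mapping can be sketched in Go as a per-LoadBalancer port table. This is only an illustration under assumptions: the sharedLB type, the expose method, the endpoint address, and the CR names are all hypothetical and are not taken from the project's code.

```go
package main

import (
	"errors"
	"fmt"
)

// sharedLB is a hypothetical model of one real LoadBalancer endpoint that
// multiplexes several SharedLB CR objects on distinct external ports.
type sharedLB struct {
	endpoint string           // external IP or hostname of the real LoadBalancer
	ports    map[int32]string // external port -> name of the CR using it
}

var errPortTaken = errors.New("port already in use on this LoadBalancer")

// expose registers crName on the given external port, or reports a conflict
// if another CR already owns that port.
func (lb *sharedLB) expose(crName string, port int32) error {
	if owner, ok := lb.ports[port]; ok && owner != crName {
		return errPortTaken
	}
	lb.ports[port] = crName
	return nil
}

func main() {
	lb := &sharedLB{endpoint: "203.0.113.10", ports: map[int32]string{}}
	// Three CRs end up behind the same endpoint, one port each.
	for i, name := range []string{"redis", "mysql", "mqtt"} {
		port := int32(6000 + i)
		if err := lb.expose(name, port); err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s:%d\n", name, lb.endpoint, port)
	}
}
```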

More Info

Want more info on this? Join us at KubeCon + CloudNativeCon North America 2018 in Seattle, December 11-13, where we will be giving a session on this.

shared-loadbalancer's People

Contributors

brahmaroutu, huang-wei

shared-loadbalancer's Issues

avoid unnecessary LoadBalancer creation

LoadBalancer creation on some cloud providers is time-consuming (especially GKE and AKS, where it takes ~1 min).

Suppose 2 requests come in a row and we're out of capacity. With the current code, the 2 requests will try to create 2 LoadBalancers at the same time, which is unnecessary. We should come up with a solution that holds the 2nd request until the 1st request finishes.

Regarding the solution, we can't simply check whether len(pendingQ) != 0, b/c it could be that the port of the 1st request is already occupied; in this case, the 2nd request is expected to use an existing LB instead of being held "aggressively".
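One possible shape for that decision is sketched below in Go. It is not the project's code: the lb type, the decide function, the capacity numbers, and the idea of tracking reserved ports on a pending LoadBalancer are all assumptions made for illustration. The point is that the check looks at whether some ready or pending LoadBalancer can actually take the port, rather than only at whether anything is pending.

```go
package main

import "fmt"

type decision string

const (
	useExisting    decision = "use-existing"
	waitForPending decision = "wait-for-pending"
	createNew      decision = "create-new"
)

// lb is a hypothetical view of one LoadBalancer: how many ports it can hold
// and which ports are taken (or reserved, for an LB still being created).
type lb struct {
	capacity int
	ports    map[int32]bool
}

// fits reports whether the LB has spare capacity and the port is free.
func (l lb) fits(port int32) bool {
	return len(l.ports) < l.capacity && !l.ports[port]
}

func anyFits(lbs []lb, port int32) bool {
	for _, l := range lbs {
		if l.fits(port) {
			return true
		}
	}
	return false
}

// decide prefers a ready LB, then waits for a pending LB that could take the
// port, and only creates a new LB when neither exists.
func decide(ready, pending []lb, port int32) decision {
	switch {
	case anyFits(ready, port):
		return useExisting
	case anyFits(pending, port):
		return waitForPending
	default:
		return createNew
	}
}

func main() {
	ready := []lb{{capacity: 5, ports: map[int32]bool{80: true}}}

	// 1st request: port 80 is occupied on the only ready LB.
	fmt.Println("req1 (port 80):", decide(ready, nil, 80))

	// The 1st request is now creating an LB and has reserved port 80 on it.
	pending := []lb{{capacity: 5, ports: map[int32]bool{80: true}}}

	// 2nd request: a different port still fits the existing LB, so it is not held.
	fmt.Println("req2 (port 443):", decide(ready, pending, 443))
}
```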

[AKS] oauth2 refresh issue

2018-12-06T13:56:01.188-0800 ERROR providers.aks providers/aks.go:131 cannot query public ip {"pip": "137.135.101.36", "error": "azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/58de4ac8-a2d6-499b-b983-6f1c870d398e/resourceGroups/MC_res-grp-1_wei-aks_eastus/providers/Microsoft.Network/publicIPAddresses/kubernetes-a6d05da86f9a111e8b9676adc8f35aad?api-version=2017-09-01: StatusCode=0 -- Original Error: adal: Failed to execute the refresh request. Error = 'Get http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.azure.com%2F: dial tcp 169.254.169.254:80: i/o timeout'"}

apply scheduling design and practices

Matching an incoming (Shared)LoadBalancer request to a real LoadBalancer is actually a scheduling/horizontal-auto-scaling problem.

  • horizontal auto scaling
    • if LoadBalancer resources are out of capacity, create one dynamically
    • if creating a LoadBalancer service returns an error, you have usually run out of capacity/quota in your account and need to invoke the IaaS SDK to increase the quota; it may be safer for an operator to do this manually
  • scheduling predicates - filtering LB resources
    • ports conflict (#6)
    • protocol (TCP/UDP) matching?
    • service "requests/limits" (or simply use a "weight" term)
    • service "{anti-}affinity"
  • scheduling priorities - prioritizing LB resources
    • least requested (balanced) vs. most requested (packed)
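The filter-then-score flow in the list above can be sketched in Go as follows. This is a simplified illustration, not the project's design: the candidate type and the filter and pick functions are hypothetical, only the port-conflict and capacity predicates are modeled, and protocol matching, weights, and affinity are left out.

```go
package main

import "fmt"

// candidate is a hypothetical view of one real LoadBalancer.
type candidate struct {
	name     string
	capacity int            // maximum number of ports the LB can expose
	used     map[int32]bool // ports already taken
}

func (c candidate) free() int { return c.capacity - len(c.used) }

// filter applies the predicates: the LB must have spare capacity and the
// requested port must not conflict with a port already in use.
func filter(cands []candidate, port int32) []candidate {
	var out []candidate
	for _, c := range cands {
		if c.free() > 0 && !c.used[port] {
			out = append(out, c)
		}
	}
	return out
}

// pick applies the priority: packed=true prefers the most requested LB
// (fewest free ports), packed=false prefers the least requested LB (most
// free ports). Ties keep the earlier candidate.
func pick(cands []candidate, port int32, packed bool) (string, bool) {
	feasible := filter(cands, port)
	if len(feasible) == 0 {
		return "", false // caller would scale out by creating a new LB
	}
	best := feasible[0]
	for _, c := range feasible[1:] {
		if (packed && c.free() < best.free()) || (!packed && c.free() > best.free()) {
			best = c
		}
	}
	return best.name, true
}

func main() {
	cands := []candidate{
		{name: "lb-a", capacity: 5, used: map[int32]bool{80: true, 443: true, 8080: true}},
		{name: "lb-b", capacity: 5, used: map[int32]bool{80: true}},
		{name: "lb-c", capacity: 5, used: map[int32]bool{}},
	}
	balanced, _ := pick(cands, 9000, false)
	packed, _ := pick(cands, 9000, true)
	fmt.Println("balanced:", balanced, "packed:", packed)
}
```

When no candidate passes the predicates, pick returns false, which is the point where the horizontal auto scaling step would take over.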

[General] concurrent CR creations may lead to unexpected results

Creating a LoadBalancer first creates the LoadBalancer Service object, but with status "pending".

If we get 2 incoming CR requests, although they're protected by "pendingQ", it's still problematic that the 2nd CR tries to use a "half-completed" LoadBalancer that is still being created, since cacheMap is already populated once the LoadBalancer Service exists.
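One way to avoid handing out a half-completed LoadBalancer is to treat a cached entry as usable only once it has an external address. The Go sketch below assumes this approach; the lbService type and the usable function are hypothetical stand-ins, and the real cacheMap may be structured differently. The ingress field mirrors a Service's status.loadBalancer.ingress, which stays empty while the cloud provider is still provisioning.

```go
package main

import (
	"fmt"
	"sort"
)

// lbService is a hypothetical, trimmed-down stand-in for a Service of type
// LoadBalancer as it might be held in a cache.
type lbService struct {
	name    string
	ingress []string // external IPs/hostnames; empty while status is "pending"
}

// ready reports whether the cloud provider has finished provisioning.
func (s lbService) ready() bool { return len(s.ingress) > 0 }

// usable returns the names of cached LoadBalancers that are safe to assign
// to an incoming CR, sorted for deterministic output.
func usable(cache map[string]lbService) []string {
	var names []string
	for name, svc := range cache {
		if svc.ready() {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	cache := map[string]lbService{
		"lb-1": {name: "lb-1", ingress: []string{"203.0.113.10"}},
		"lb-2": {name: "lb-2"}, // Service object exists, but is still pending
	}
	fmt.Println("usable:", usable(cache))
}
```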
