youtube / doorman

Doorman: Global Distributed Client Side Rate Limiting.

License: Apache License 2.0

Go 67.59% Protocol Buffer 4.25% Makefile 0.06% Python 27.83% Shell 0.27%

doorman's Introduction

Doorman

Doorman is a solution for Global Distributed Client Side Rate Limiting. Clients that talk to a shared resource (such as a database, a gRPC service, a RESTful API, or whatever) can use Doorman to voluntarily limit their use (usually in requests per second) of the resource. Doorman is written in Go and uses gRPC as its communication protocol. For some high-availability features it needs a distributed lock manager. We currently support etcd, but it should be relatively simple to make it use Zookeeper instead.

Getting started

The purpose of Doorman is to apportion and distribute capacity to clients based on some definition of fairness. The capacity a client gets for a resource depends on four things:

The configured maximum capacity for the resource (world-wide).
The capacity need (wants) of this client.
The capacity needs (wants) of all other clients on the planet.
The exact algorithm used (as defined by the configuration) to apportion the capacity among all the clients.

The Doorman master server remembers all clients that currently have capacity. Whenever a client asks for capacity, the server records the client's request in memory and runs the algorithm to determine what this client should get.

Lease length and refresh interval

Doorman only gives out capacity for a limited amount of time, in the form of leases. Each capacity grant comes with a lease length: The client is guaranteed that amount of capacity for the duration of the lease. A typical lease length is five minutes. On top of the lease length the Doorman server also returns a refresh interval. This is the interval after which the client is expected to check back in to get a new lease. A typical refresh interval is five seconds.

Note: The Doorman system is cooperative. The clients are expected to honour the capacity grant, the lease length, and the refresh interval. The system provides no protection against misbehaving clients.

In normal operation, all clients check in regularly with the server to refresh their capacity. The server knows of all clients and their resource needs, and on every request makes the best possible apportionment of the capacity. As an optimization (to reduce QPS on the Doorman server), the client code refreshes all of its resources in bulk whenever it sends a request to the Doorman server. This means that under specific circumstances (for instance when registering a new resource) a resource might get its capacity refreshed a bit sooner than expected.

The Doorman configuration specifies which algorithm should be used to distribute capacity among all clients. The page on algorithms explains which algorithms currently are available and how they apportion capacity to each client.

The two parameters lease_length and refresh_interval let you tune a number of different behaviors of the system:

The load on the Doorman server.
The speed with which the system converges as resource needs change and clients appear and disappear.
How the system deals with the Doorman server being unreachable or slow.

When the Doorman server goes unreachable and comes back

When a client cannot reach the Doorman server the following happens:

The client misses one or more refresh intervals. This does not matter much for the client other than that the capacity the client has is not adjusted for potentially changed resource needs.
When the Doorman server is unavailable for a longer period of time leases expire and the resources revert to their configured safe capacity. This can be either:
- -1, meaning an unbounded (infinite) rate limit,
- 0, meaning that all access to the resource is blocked, or
- a positive number.
As soon as the Doorman server becomes reachable again the clients will resume requesting capacity.

Note: Doorman uses multiple servers in different clusters and a master election procedure to determine the current master.

Doorman does not share or store its internal database. That means that when a Doorman server becomes the master it starts with an empty repository of clients and outstanding leases. However this is not as problematic as it seems, because once the server is available all clients will start calling it to refresh their leases. Since the Doorman server knows that it does not have enough information to run its algorithms it will simply return the currently assigned capacity, or zero if it is a request from a client which currently does not have capacity for the resource. The server knows the currently assigned capacity because clients helpfully include it in every GetCapacity RPC. This phase of the server is known as learning mode.

During the learning mode of a resource every request will be answered with a new lease for the same capacity the client currently has. Practically speaking after a couple of refresh intervals the server can be reasonably sure that it has been contacted by every existing client out there. However for reasons of safety the default learning mode duration is the same as the lease length. This decision ensures that when the learning mode duration expires we can be sure that there are no leases out there that we don't know about (because these would have expired by then). If you want the system to converge faster after a Doorman master election you can explicitly configure a learning_mode_duration in the resource template (see the page on the Configuration of the system for more information).

Who wants what?

Doorman requires the clients to inform it of the desired capacity (the so-called wants). If you are using the low-level Doorman client, you need to work out your capacity need yourself and call the appropriate methods so that the client library requests that amount of capacity during its refresh cycle. However, if you use the rate limiter objects provided by the Doorman clients, the desired capacity is determined automatically by observing the behavior of the threads that want to access the resource. This automatic wants determination uses a moving average to smooth out any spikes.
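To illustrate how a moving average dampens spikes, here is a minimal sketch of a wants estimator. It assumes a simple moving average over a fixed window of per-second request counts. The type, the window size, and the sampling scheme are assumptions for illustration, and the rate limiters in the Doorman clients may compute their average differently.

```go
package main

import "fmt"

// wantsEstimator is a hypothetical estimator that averages the most
// recent observations of the request rate.
type wantsEstimator struct {
	samples []float64 // ring buffer of recent observations
	next    int       // index of the slot to overwrite next
	filled  int       // number of slots holding real observations
}

func newWantsEstimator(window int) *wantsEstimator {
	if window < 1 {
		window = 1
	}
	return &wantsEstimator{samples: make([]float64, window)}
}

// observe records the request rate seen during the latest interval.
func (e *wantsEstimator) observe(requestsPerSecond float64) {
	e.samples[e.next] = requestsPerSecond
	e.next = (e.next + 1) % len(e.samples)
	if e.filled < len(e.samples) {
		e.filled++
	}
}

// wants returns the mean of the recorded observations.
func (e *wantsEstimator) wants() float64 {
	if e.filled == 0 {
		return 0
	}
	sum := 0.0
	for _, s := range e.samples[:e.filled] {
		sum += s
	}
	return sum / float64(e.filled)
}

func main() {
	e := newWantsEstimator(4)
	for _, rate := range []float64{10, 10, 10, 50} {
		e.observe(rate)
	}
	// The spike to 50 is averaged with the three quieter intervals.
	fmt.Println(e.wants())
}
```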

Next steps

Read the tutorial.
Read more about available algorithms.
Read a Kubernetes deployment tutorial.
Read about Doorman's configuration.
Read the in-depth design doc.
Read the client documentation.

Status and Plans

Doorman should currently be considered alpha-quality software. The server and Go client received a decent amount of testing at Google (both functional and load testing), so we are pretty confident they do what they are supposed to do. However, in the process of open-sourcing the code we switched from internal Google technologies to their open source equivalents, and this needs more testing. Finally, there's no proper versioning scheme at the moment.

Short term plans:

C++ client;
Python client;
Docker image;

Longer term plans:

Ruby client;
Proper semantic versioning.

Installation

First, you need to have Go installed. You can either follow the official installation instructions or, on OS X, just do

brew install go

As part of the initial setup, you have to set GOPATH, which is the location where Go keeps all its sources and binary artifacts.

export GOPATH=...

With this out of the way, Doorman is just one go get away:

go get github.com/youtube/doorman/go/cmd/doorman

If you are interested in a checkout of Doorman that you can modify, you can do:

mkdir -p $GOPATH/src/github.com/youtube
cd $GOPATH/src/github.com/youtube
git clone git@github.com:youtube/doorman.git

Go version <= 1.5

If you are using a version of Go earlier than 1.6, you will need to set an environment variable to enable vendoring (see https://golang.org/s/go15vendor):

export GO15VENDOREXPERIMENT=1

Contributing

See CONTRIBUTING for details on submitting patches and the contribution workflow.

License

Doorman is under the Apache 2.0 license. See the LICENSE file for details.

Note

This is not an official Google product.

doorman's Issues

Looking for a rate limiting project: is this actively maintained?

C++ client for Doorman.

Error while building loadtest docker server

How can this be resolved?

docker build -t glocalregistry/doorman/doorman-server:v0.1.7 docker/server/
Sending build context to Docker daemon 3.584 kB
Step 1 : FROM golang:1.5
1.5: Pulling from library/golang
357ea8c3d80b: Pull complete
52befadefd24: Pull complete
3c0732d5313c: Pull complete
fee55c622298: Pull complete
70ff2aeff174: Pull complete
01195e06f03d: Pull complete
2f0f050412f9: Pull complete
Digest: sha256:3be07b667a868a246b9cee4ddc5ecce2ad1e211958bd6043a25fc1d19d55e6ba
Status: Downloaded newer image for golang:1.5
---> 99668503de15
Step 2 : ADD config.yml .
---> cc142bba6fac
Removing intermediate container 5d804b03735d
Step 3 : RUN go get github.com/youtube/doorman/go/cmd/doorman
---> Running in 2afcdbff0a9d

github.com/youtube/doorman/proto/doorman

src/github.com/youtube/doorman/proto/doorman/doorman.pb.go:879: cannot use _Capacity_Discovery_Handler (type func(interface {}, context.Context, func(interface {}) error) (interface {}, error)) as type grpc.methodHandler in field value
src/github.com/youtube/doorman/proto/doorman/doorman.pb.go:883: cannot use _Capacity_GetCapacity_Handler (type func(interface {}, context.Context, func(interface {}) error) (interface {}, error)) as type grpc.methodHandler in field value
src/github.com/youtube/doorman/proto/doorman/doorman.pb.go:887: cannot use _Capacity_GetServerCapacity_Handler (type func(interface {}, context.Context, func(interface {}) error) (interface {}, error)) as type grpc.methodHandler in field value
src/github.com/youtube/doorman/proto/doorman/doorman.pb.go:891: cannot use _Capacity_ReleaseCapacity_Handler (type func(interface {}, context.Context, func(interface {}) error) (interface {}, error)) as type grpc.methodHandler in field value

Ruby client

Ruby client talking to the Doorman service.
QPS based rate limiter.
Installable as Gem.

Note: this may depend on the state of gRPC for Ruby.

Client-initiated capacity update

Feature request.

For some use cases it would be great to have an option for a client-initiated capacity update.

E.g., despite being granted a lease, the client gets a "rate exceeded" error from the target service. The Doorman client knows this, but there is no way [that I found] to notify the Doorman server of it and to shrink the quotas of the other clients using the same target service.

The ProportionalShare algorithm is incorrect. As a result, subsequent nodes cannot obtain leases.

There's a problem here:

		// If the client wants less than it equal share or
		// if the sum of what all clients want together is less
		// than the available capacity we can give this client what
		// it wants.
		if store.SumWants() <= capacity || r.Wants <= equalSharePerClient {
			return store.Assign(r.Client, length, interval,
				minF(r.Wants, unusedCapacity), r.Wants, r.Subclients)
		}

As a result, subsequent nodes cannot obtain leases.
Once the capacity is used up, newly added nodes are always allocated a capacity of 0.
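The reported behavior can be reproduced on paper with the branch quoted above. The sketch below models only that branch, with hypothetical numbers. It assumes that equalSharePerClient is the capacity divided by the number of clients and that unusedCapacity is the capacity minus what other clients currently hold. Neither definition appears in the snippet, and the sketch does not model the rest of the algorithm or what happens on later refreshes.

```go
package main

import "fmt"

func minF(a, b float64) float64 {
	if a < b {
		return a
	}
	return b
}

// quotedBranch mirrors the condition from the snippet. It returns the
// grant and whether the branch was taken. All parameters are plain
// numbers instead of the store and request objects of the real code.
func quotedBranch(capacity, sumWants, sumHasOthers, wants float64, clients int) (float64, bool) {
	equalSharePerClient := capacity / float64(clients) // assumed definition
	unusedCapacity := capacity - sumHasOthers          // assumed definition
	if sumWants <= capacity || wants <= equalSharePerClient {
		return minF(wants, unusedCapacity), true
	}
	return 0, false
}

func main() {
	// One existing client holds the full capacity of 100 and wants 200.
	// A second client now joins and wants 10.
	grant, taken := quotedBranch(100, 210, 100, 10, 2)
	fmt.Println(grant, taken)
}
```

Here the newcomer wants less than its equal share (10 <= 50), so the branch is taken, but the grant is min(10, 0) because no capacity is unused at that moment.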

Doorman client can't connect to the new master server when the old master is down permanently

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
feature request

What happened:
Assume there is only a root-level node, with three Doorman servers: one master and two slaves.

In the Doorman client, I configure the address of the master server. Over time, the master server goes down and a new master is elected. However, the Doorman client does not learn about this: it keeps retrying the configured address and keeps getting failures, so client-side rate limiting no longer works properly.

What you expected to happen or what your proposal is:
I think we should configure all addresses, both master and slave, in the Doorman client, so that when the master server is down permanently the client can retry against a new address.

prometheus config error related to role

level=error msg="Error loading config: couldn't load configuration (-config.file=/etc/prometheus/prometheus.yml): role missing (one of: container, pod, service, endpoint, node, apiserver)" source="main.go:149"

Is Doorman no longer maintained?

If so, for what reason?

[leader election] master shouldn't give up its leadership easily

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
feature request

What happened:
Leader election in the Doorman server is implemented with etcd: the master sets a key with a TTL of delay and refreshes it every 1/3 of the delay interval.

When the leader goes down, this etcd key expires and a new leader is elected.

see the source code:

	go func() {
		for {
			log.V(2).Infof("trying to become master at %v", e.lock)
			if _, err := kapi.Set(ctx, e.lock, id, &client.SetOptions{
				TTL:       e.delay,
				PrevExist: client.PrevNoExist,
			}); err != nil {
				log.V(2).Infof("failed becoming the master, retrying in %v: %v", e.delay, err)
				time.Sleep(e.delay)
				continue
			}
			e.isMaster <- true
			log.V(2).Info("Became master at %v as %v.", e.lock, id)
			for {
				time.Sleep(e.delay / 3)
				log.V(2).Infof("Renewing mastership lease at %v as %v", e.lock, id)
				_, err := kapi.Set(ctx, e.lock, id, &client.SetOptions{
					TTL:       e.delay,
					PrevExist: client.PrevExist,
					PrevValue: id,
				})

				if err != nil {
					log.V(2).Info("lost mastership")
					e.isMaster <- false
					break
				}
			}
		}
	}()

When the master fails to renew the lease for a transient reason, such as network jitter, it loses leadership immediately, even though a second attempt would have renewed the lease successfully.

This results in an unnecessary learning mode, and the system takes time to converge again.

What you expected to happen or what your proposal is:

I think we should add a retry mechanism to lease renewal, so that the master gives up its leadership only after two (or some other configurable number of) failed attempts.

Security Policy violation Binary Artifacts

This issue was automatically created by Allstar.

Security Policy Violation
Project is out of compliance with Binary Artifacts policy: binaries present in source code

Rule Description
Binary Artifacts are an increased security risk in your repository. Binary artifacts cannot be reviewed, allowing the introduction of possibly obsolete or maliciously subverted executables. For more information see the Security Scorecards Documentation for Binary Artifacts.

Remediation Steps
To remediate, remove the generated executable artifacts from the repository.

Artifacts Found

doc/loadtest/docker/client/client
doc/loadtest/docker/target/target

Additional Information
This policy is drawn from Security Scorecards, which is a tool that scores a project's adherence to security best practices. You may wish to run a Scorecards scan directly on this repository for more details.

Allstar has been installed on all Google managed GitHub orgs. Policies are gradually being rolled out and enforced by the GOSST and OSPO teams. Learn more at http://go/allstar

This issue will auto resolve when the policy is in compliance.

Issue created by Allstar. See https://github.com/ossf/allstar/ for more information. For questions specific to the repository, please contact the owner or maintainer.

Python client

A Python client for Doorman:

A client talking to the Doorman service using gRPC.
A QPS based rate limiter.
Installable with Pip.