dhcplb's Introduction

What is dhcplb?

dhcplb is Facebook's implementation of:

  • a DHCP v4/v6 relayer with load balancing capabilities
  • a DHCP v4/v6 server framework

Both modes currently handle only messages sent by a relayer, which arrive as unicast traffic. Broadcast (v4) and multicast (v6) requests are not supported. Facebook currently uses dhcplb in production, deployed at global scale across all of our data centers. It is based on @insomniacslk's dhcp library.

Why did you do that?

Facebook uses DHCP to provide network configuration to bare-metal machines at provisioning phase and to assign IPs to out-of-band interfaces.

dhcplb was created because the previous DHCP infrastructure, which relied on Anycast+ECMP alone, produced a very unbalanced load across the DHCP servers in a region (for example, 1 server out of 10 would receive >65% of requests).

Facebook's DHCP infrastructure was presented at SRECon15 Ireland.

Later, server mode was added, making dhcplb itself responsible for serving DHCP requests. This was done because a single-threaded application (ISC KEA) queuing up packets while making backend calls to other services wasn't scaling well for us.

Why not use an existing load balancer?

  • All the relayer implementations available on the internet lack the load balancing functionality.
  • Having control of the code gives you the ability to:
    • perform A/B testing on new builds of our DHCP server
    • implement an override mechanism
    • implement anything additional you need

Why not use an existing server?

We needed a server implementation that allows us to have both:

  • Multithreaded design, to avoid blocking requests when doing backend calls
  • An interface to be able to call other services for getting the IP assignment, boot file url, etc.
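The first requirement can be illustrated with a minimal sketch. It is not dhcplb's actual handler: the `request` type, `lookupBackend`, and `handleAll` are hypothetical names, and the backend delay is simulated. The point is only that each request gets its own goroutine, so a slow backend call for one client does not queue up the others.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// request is a hypothetical stand-in for a decoded DHCP packet.
type request struct {
	xid uint32
}

// lookupBackend simulates a slow call to another service, for example one
// that returns the IP assignment. The delay is artificial.
func lookupBackend(r request) string {
	time.Sleep(10 * time.Millisecond)
	return fmt.Sprintf("lease-for-%#x", r.xid)
}

// handleAll serves every request in its own goroutine so that one slow
// backend call does not hold up the others.
func handleAll(reqs []request) []string {
	out := make([]string, len(reqs))
	var wg sync.WaitGroup
	for i, r := range reqs {
		wg.Add(1)
		go func(i int, r request) {
			defer wg.Done()
			out[i] = lookupBackend(r) // each goroutine writes a distinct index
		}(i, r)
	}
	wg.Wait()
	return out
}

func main() {
	reqs := []request{{xid: 1}, {xid: 2}, {xid: 3}}
	for _, answer := range handleAll(reqs) {
		fmt.Println(answer)
	}
}
```

A real server would bound the number of in-flight goroutines and reply on the socket instead of collecting results in a slice.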

How do you use dhcplb at Facebook?

This picture shows how we have deployed dhcplb in our production infrastructure:

DHCPLB deployed at Facebook

TORs (Top of Rack switches) at Facebook run DHCP relayers. These relayers are responsible for relaying broadcast DHCP traffic (DISCOVER and SOLICIT messages) originating within their racks to anycast VIPs, one for DHCPv4 and one for DHCPv6.

In a Cisco switch the configuration would look like this:

ip helper-address 10.127.255.67
ipv6 dhcp relay destination 2401:db00:eef0:a67::

We have a number of dhcplb Tupperware instances in every region listening on those VIPs. They are responsible for receiving the traffic relayed by the TOR agents and load balancing it amongst the actual dhcplb servers distributed across clusters in that same region.

Having 2 layers allows us to A/B test changes of the server implementation.
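One way such an A/B split could work is sketched below. This is an illustration, not dhcplb's code: the pool names "stable" and "rc" are borrowed from the log lines quoted in the issues further down, while `pickPool`, the percentage-based `rcRatio` parameter, and the choice of hashing the client MAC with FNV-1a are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickPool decides whether a client is served by the "rc" pool or the
// "stable" pool. rcRatio is a percentage in the range 0..100 (an assumed
// convention). Hashing the client MAC keeps the decision deterministic, so
// the same client lands in the same pool on every request.
func pickPool(clientMAC string, rcRatio uint32) string {
	h := fnv.New32a()
	h.Write([]byte(clientMAC))
	if h.Sum32()%100 < rcRatio {
		return "rc"
	}
	return "stable"
}

func main() {
	for _, mac := range []string{"00:50:56:aa:0a:79", "00:50:56:aa:0a:7a"} {
		fmt.Println(mac, "->", pickPool(mac, 10))
	}
}
```

With a ratio of 0 every client stays on the stable pool, and with 100 every client moves to the new build, which makes a gradual rollout a matter of raising one number.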

The configuration for dhcplb consists of 3 files:

  • json config file: contains the main configuration for the server as explained in the Getting Started section
  • host lists file: contains a list of DHCP servers, one per line; these are the servers dhcplb will try to balance across
  • overrides file: a file containing per mac overrides. See the Getting Started section.

TODOs / future improvements

dhcplb does not support relaying or responding to broadcast DHCPv4 DISCOVER packets or DHCPv6 SOLICIT packets sent to the ff02::1:2 multicast address. We don't need this in our production environment, but adding that support should be trivial.

TODOs and improvements are tracked here

PRs are welcome!

What does the packet path look like?

When operating in v4 mode, dhcplb relays messages that have already been relayed by other relayers (in our production network those are rack switches). The response from the server is relayed back to the rack switches:

dhcp client <---> rsw relayer ---> dhcplb (relay) ---> dhcplb (server)
                      ^                                      |
                      |                                      |
                      +--------------------------------------+

In DHCPv6, responses from the DHCP server traverse the load balancer.

Installation

To install dhcplb into $GOPATH/bin/dhcplb, simply run:

$ go install github.com/facebookincubator/dhcplb@latest

Cloning

If you wish to clone the repo you can do the following:

$ mkdir -p $GOPATH/src/github.com/facebookincubator
$ cd $_
$ git clone https://github.com/facebookincubator/dhcplb
$ go install github.com/facebookincubator/dhcplb

Run unit tests

You can run tests with:

$ cd $GOPATH/src/github.com/facebookincubator/dhcplb/lib
$ go test

Getting Started and extending dhcplb

dhcplb can be run out of the box after compilation.

To start immediately, you can run sudo dhcplb -config config.json -version 6. That will start the relay in v6 mode using the default configuration.

Should you need to integrate dhcplb with your infrastructure please see Extending DHCPLB.

Virtual lab for development and testing

You can bring up a virtual lab using vagrant. It replicates our production environment, and you can spawn VMs containing components such as:

  • N instances of ISC dhcpd
  • An instance of dhcplb
  • An instance of dhcrelay, simulating a top of rack switch.
  • a VM where you can run dhclient or ISC perfdhcp

All of that is managed by vagrant and chef-solo cookbooks. You can use this lab to test your dhcplb changes. For more information have a look at the vagrant directory.

Who wrote it?

dhcplb started in April 2016 during a 3-day hackathon in the Facebook Dublin office; the hackathon project proved the feasibility of the tool. In June we were joined by Vinnie Magro (@vmagro) for a 3-month internship, in which he worked with two production engineers on turning the hack into a production-ready system.

Original Hackathon project members:

  • Angelo Failla (@pallotron), Production Engineer
  • Roman Gushchin (@rgushchin), Production Engineer
  • Mateusz Kaczanowski (@mkaczanowski), Production Engineer
  • Jake Bunce, Network Engineer

Internship project members:

  • Vinnie Magro (@vmagro), Production Engineer intern
  • Angelo Failla (@pallotron), Intern mentor, Production Engineer
  • Mateusz Kaczanowski (@mkaczanowski), Production Engineer

Other contributors:

  • Emre Cantimur, Production Engineer, Facebook, Throttling support
  • Andrea Barberio, Production Engineer, Facebook
  • Pablo Mazzini, Production Engineer, Facebook

License

BSD License. See the LICENSE file.

dhcplb's Issues

Broken dependency with golang-lru

Hello,

golang-lru has changed their main package:

"Please use github.com/hashicorp/golang-lru/v2 for all new code as this
version supports generics and is faster; old code can specify a specific tag,
e.g. github.com/hashicorp/golang-lru/v0.6.0 for backwards compatibility. "

Can you fix the broken dependency with golang-lru?

Thanks.

Please provide example for using dhcplb as a server.

Please provide an example config.json, hosts-v4.txt and/or hosts-v6.txt when dhcplb is configured as server, or at least give the public a hint on how to use this piece of wonderful open-sourced software as a dhcp server. Kindly improve documentation on the usage of dhcplb as a server.

Thank you.

Change hosts file no update

If I change the hosts file, the list of available servers is not updated. But if I open config.json and save it again, the available servers are updated. I think the code only watches config.json.

the host list file example ?

Can anyone provide an example hosts-v4.txt that lists DHCP servers?

I edited hosts-v4.txt as below and it does not work:

[root@localhost dhcplb]# cat hosts-v4.txt
10.84.8.31
10.84.8.32

Advice on scaling with kea-dhcp4 in Kubernetes?

I am loadtesting a Kea DHCPv4 deployment in Kubernetes, and the objective is to reach 10000 DORA transactions per second, with 0.00% drop ratio.

Current setup:

  • 10 x kea-perdhcp pods running 1000 transactions each (eg perfdhcp -xi -t1 -r1000 -R5000 -p30 DHCPLB_CLUSTERIP)
  • 1 x DHCPLB pod + service (ClusterIP) in relay mode, throttling* params = 0
  • N x Kea-dhcp4 pods - in-memory lease DB per pod

I scale kea-dhcp4 pods from N=1 to N=40.

Unfortunately I cannot get any kind of effective scaling past 7 pods, and the drop-ratio reaches only a minimum of ~10%.
Please refer to the plot below (shows data for dhcplb container running in both privileged and unprivileged modes).
I would be grateful for any suggestions, and can share more test details on request of course.

dhcplb_kea_scaling

dhcpv6 server list filesourcer error

I tried to run dhcplb in IPv6 mode, but got errors at startup:

Mar 30 18:20:02 dhcplb1 systemd[1]: Started dhcplb6.
Mar 30 18:20:02 dhcplb1 dhcplb[6670]: I0330 18:20:02.130078    6670 config.go:114] Loaded 0 override(s)
Mar 30 18:20:02 dhcplb1 dhcplb[6670]: E0330 18:20:02.130933    6670 filesourcer.go:97] Can't convert port 20aa to int
Mar 30 18:20:02 dhcplb1 dhcplb[6670]: E0330 18:20:02.131083    6670 filesourcer.go:97] Can't convert port 25aa to int
Mar 30 18:20:02 dhcplb1 dhcplb[6670]: E0330 18:20:02.131232    6670 filesourcer.go:97] Can't convert port 25aa to int
Mar 30 18:20:02 dhcplb1 dhcplb[6670]: I0330 18:20:02.131489    6670 server.go:94] Setting up throttle: Cache Size: 1024 - Cache Rate: 128 - Request Rate: 256
Mar 30 18:20:02 dhcplb1 dhcplb[6670]: I0330 18:20:02.132771    6670 main.go:84] Starting dhcplb in v6 mode
Mar 30 18:20:02 dhcplb1 dhcplb[6670]: I0330 18:20:02.132925    6670 update_servers.go:17] Starting to update server list...
Mar 30 18:20:02 dhcplb1 dhcplb[6670]: I0330 18:20:02.133058    6670 server.go:43] Started server, processing DHCP requests...
Mar 30 18:20:15 dhcplb1 dhcplb[6670]: I0330 18:20:15.680422    6670 filesourcer.go:134] Event: "/home/vagrant/dhcp-servers-v6.cfg": WRITE File changed, reloading host list
Mar 30 18:20:15 dhcplb1 dhcplb[6670]: E0330 18:20:15.680503    6670 filesourcer.go:97] Can't convert port 25aa to int
Mar 30 18:20:15 dhcplb1 dhcplb[6670]: E0330 18:20:15.680520    6670 filesourcer.go:97] Can't convert port 25aa to int
Mar 30 18:20:15 dhcplb1 dhcplb[6670]: E0330 18:20:15.680535    6670 filesourcer.go:97] Can't convert port 25aa to int

Process is running as:

/opt/go/bin/dhcplb -version 6 -config /home/vagrant/dhcplb.config.json

dhcpv6 server list file:

root@dhcplb1:/home/vagrant# cat /home/vagrant/dhcp-servers-v6.cfg
fdfa:25aa:efc8:1002::114
fdfa:25aa:efc8:1002::115
fdfa:25aa:efc8:1002::116
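The `Can't convert port 25aa to int` errors are consistent with each line being split on `:` so that part of the IPv6 address is read as a port. That is a guess from the log, not a reading of filesourcer.go. The sketch below shows one way a host-list line could be parsed so that bare IPv6 literals survive. `parseHostLine`, the `server` type, and the bracketed `[addr]:port` form for IPv6 with a port are assumptions for illustration, not dhcplb's documented file format.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// server is a hypothetical record for one backend DHCP server.
type server struct {
	IP   net.IP
	Port int
}

// parseHostLine accepts "10.0.0.1", "10.0.0.1:67", "fdfa::114" or
// "[fdfa::114]:547". Hostnames are deliberately not handled in this sketch.
func parseHostLine(line string, defaultPort int) (server, error) {
	line = strings.TrimSpace(line)
	// A bare IPv4 or IPv6 literal carries no port, so try that first.
	if ip := net.ParseIP(line); ip != nil {
		return server{IP: ip, Port: defaultPort}, nil
	}
	// Otherwise expect host:port; IPv6 hosts must be in brackets.
	host, portStr, err := net.SplitHostPort(line)
	if err != nil {
		return server{}, fmt.Errorf("invalid host entry %q: %w", line, err)
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return server{}, fmt.Errorf("invalid IP %q", host)
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return server{}, fmt.Errorf("invalid port %q: %w", portStr, err)
	}
	return server{IP: ip, Port: port}, nil
}

func main() {
	for _, line := range []string{"fdfa:25aa:efc8:1002::114", "10.84.8.31"} {
		s, err := parseHostLine(line, 547)
		fmt.Println(s.IP, s.Port, err)
	}
}
```

Trying `net.ParseIP` before looking for a port is the key step: an unbracketed IPv6 address contains colons, so any split on `:` has to come second.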

broken locking and other issues in modulo.go

Via this post I came across this file, which looks pretty broken to me.

First and foremost, there is a datarace: selectRatioBasedDhcpServer doesn't acquire a read-lock but reads members of m (so there might be a Read-Write race with updateServerList or setRCRatio, which btw also needs locking), while selectServerFromList does acquire a lock, but then doesn't use the receiver at all.

I also think you can significantly cut down the critical sections of that code. For example, the hashing of a message ID doesn't need to be protected by a lock at all. Only very little of that code needs protection (and you probably could even get away without locking at all).

You are also hashing the message ID twice, when using selectRatioBasedDhcpServer (which is probably the main entry point).

I'd refactor that whole file. AIUI it's in the hotpath, but it has several inefficiencies in it and is overall just more complicated than it'd need to be.
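To make the suggestion concrete, here is a minimal sketch of the locking pattern described in this issue. It is not the contents of modulo.go: the `balancer` type and its `update` and `pick` methods are hypothetical, and FNV-1a stands in for whatever hash the real code uses. The message ID is hashed once, outside the lock, and the read lock covers only the access to the shared server list.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
	"sync"
)

// balancer is a hypothetical holder for the shared server list.
type balancer struct {
	mu      sync.RWMutex
	servers []string
}

// update swaps in a new server list. The copy is made before taking the
// write lock so the critical section is a single assignment.
func (b *balancer) update(servers []string) {
	cp := append([]string(nil), servers...)
	b.mu.Lock()
	b.servers = cp
	b.mu.Unlock()
}

// pick maps a transaction ID to a server. Hashing touches no shared state,
// so it happens once and without holding the lock.
func (b *balancer) pick(xid uint32) (string, error) {
	h := fnv.New32a()
	h.Write([]byte{byte(xid >> 24), byte(xid >> 16), byte(xid >> 8), byte(xid)})
	sum := h.Sum32()

	b.mu.RLock()
	defer b.mu.RUnlock()
	if len(b.servers) == 0 {
		return "", errors.New("no servers available")
	}
	return b.servers[sum%uint32(len(b.servers))], nil
}

func main() {
	b := &balancer{}
	b.update([]string{"10.8.2.100", "10.8.2.101"})
	s, err := b.pick(0x731a5494)
	fmt.Println(s, err)
}
```

Because the same xid always hashes to the same index, the messages of one transaction stay on one server for as long as the list does not change.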

(btw, love that post :) I think that is a pretty cool project for an intern :) )

rr algorithm

If I use xid algorithm, I do not have any problem.

I0415 17:22:47.994353 17570 modulo.go:67] List of available stable servers:
I0415 17:22:47.994376 17570 modulo.go:69] Hostname: 10.8.2.100, IP: 10.8.2.100, Port: 67
I0415 17:22:47.994387 17570 modulo.go:69] Hostname: 10.8.2.101, IP: 10.8.2.101, Port: 67
I0415 17:23:01.064832 17570 glog_logger.go:104] client_mac: 00:50:56:aa:0a:79, dhcp_server: 10.8.2.100, giaddr: 10.0.1.252, latency_us: 144, server_is_rc: false, source_ip: 10.0.1.252, success: true, type: DISCOVER, version: 4, xid: 0x731a5494
I0415 17:23:01.069970 17570 glog_logger.go:104] client_mac: 00:50:56:aa:0a:79, dhcp_server: 10.8.2.100, giaddr: 10.0.1.252, latency_us: 47, server_is_rc: false, source_ip: 10.0.1.252, success: true, type: REQUEST, version: 4, xid: 0x731a5494
I0415 17:23:17.994353 17570 modulo.go:67] List of available stable servers:
I0415 17:23:17.994376 17570 modulo.go:69] Hostname: 10.8.2.100, IP: 10.8.2.100, Port: 67
I0415 17:23:17.994387 17570 modulo.go:69] Hostname: 10.8.2.101, IP: 10.8.2.101, Port: 67

But if I switch to the rr algorithm, no packets are sent to the DHCP servers, and the periodic update of available servers also stops, as if the service had stopped.

I0415 17:51:28.471107 19507 main.go:78] Config changed
I0415 17:51:28.471110 19507 server.go:52] Updating server config
I0415 17:51:28.471113 19507 rr.go:82] List of available stable servers:
I0415 17:51:28.471117 19507 rr.go:84] Hostname: 10.8.2.100, IP: 10.8.2.100, Port: 67
I0415 17:51:28.471121 19507 rr.go:84] Hostname: 10.8.2.101, IP: 10.8.2.101, Port: 67
I0415 17:51:28.471125 19507 rr.go:82] List of available rc servers:
I0415 17:51:28.471129 19507 server.go:57] Updated server config
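For reference, round-robin selection itself can be very small. The sketch below is not rr.go and does not diagnose the behaviour reported here; `roundRobin`, `update`, and `pick` are hypothetical names. It shows the two properties a round-robin picker needs: the cursor advances under a lock, and an empty list (such as the empty rc list in the log above) is reported to the caller instead of blocking or panicking.

```go
package main

import (
	"fmt"
	"sync"
)

// roundRobin is a hypothetical picker that hands out servers in order.
type roundRobin struct {
	mu      sync.Mutex
	servers []string
	next    int
}

// update replaces the server list and resets the cursor.
func (r *roundRobin) update(servers []string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.servers = append([]string(nil), servers...)
	r.next = 0
}

// pick returns the next server, or false when the list is empty.
func (r *roundRobin) pick() (string, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.servers) == 0 {
		return "", false
	}
	s := r.servers[r.next]
	r.next = (r.next + 1) % len(r.servers)
	return s, true
}

func main() {
	r := &roundRobin{}
	r.update([]string{"10.8.2.100", "10.8.2.101"})
	for i := 0; i < 3; i++ {
		s, ok := r.pick()
		fmt.Println(s, ok)
	}
}
```

Unlike xid-based selection, round robin does not keep the DISCOVER and REQUEST of one transaction on the same server, which matters when the backends do not share lease state.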

Vendor Dependencies

Considering modern go versions have vendor support, it would be easy to set this up instead of listing them explicitly.

It would add bloat to the codebase but it would make it easier to install for other users.

DHCP server connection check mechanism

I use the KEA 1.5 DHCP server and the default dhcplb configuration. When I stop kea-dhcp4.service, the list of available stable servers is not updated and request packets are still sent to the inactive DHCP server. I then powered off the DHCP server, and dhcplb still tries to send to it. Doesn't free_conn_timeout check the connection?

Update the "Requirements" category section to latest way of installing libraries

$ go get github.com/fsnotify/fsnotify
$ go get github.com/golang/glog
$ go get github.com/facebookgo/ensure
$ go get github.com/hashicorp/golang-lru
$ go get github.com/insomniacslk/dhcp/dhcpv4
$ go get github.com/insomniacslk/dhcp/dhcpv6
$ go get golang.org/x/time/rate

Update it to:
go install example.com/cmd@latest for an easy installation.

Compile error after dhcp lib update

src/github.com/facebookincubator/dhcplb/lib/handler.go:269:20: undefined: dhcpv6.DHCPv6Relay
src/github.com/facebookincubator/dhcplb/lib/handler.go:276:15: undefined: dhcpv6.DHCPv6Message
src/github.com/facebookincubator/dhcplb/lib/handler.go:322:23: undefined: dhcpv6.DHCPv6Relay

Value Change Problem

When I change the throttle values and the port number, the new values do not take effect at runtime.
