
License: MIT

Alcor Control Agent

Next-gen networking control plane - Alcor Control Agent


Introduction

The Cloud Fabric Alcor Control Agent (ACA) runs on each host machine. It serves as a stateless proxy between the Alcor controller and the host machine's networking components for control plane operations. The high-level agent components are shown below.

[Figure: Agent Components]

Repositories

The Alcor project is divided across a few GitHub repositories.

  • alcor/alcor: This is the main repository of the Alcor Regional Controller. It hosts the controllers' source code, build and deployment instructions, and various documents that detail the design of Alcor.

  • alcor/alcor_control_agent: This repository contains the source code for a host-level stateless agent that connects the regional controllers to the host data plane component. It is responsible for programming the on-host data plane with network configurations for CRUD operations on VPCs, subnets, ports, security groups, etc., and for monitoring the network health of containers and VMs on the host.

  • alcor/integration: The integration repository contains code and scripts for end-to-end integration of the Alcor control plane with popular orchestration platforms and data plane implementations. We currently support integration with Kubernetes (via a CNI plugin) and the Mizar Data Plane. We will continue to integrate with other orchestration systems and data plane implementations.

Directory Structure

The Alcor Control Agent repository is organized as follows:

  • build: scripts and Docker files for building
  • docs: design documentation
  • etc/k8s: Kubernetes integration files
  • include: header files
  • src: source code
  • test: unit and integration test code

Notes

Contributors

cj-chung, davidliu506, er1cthe0ne, gzure, huaqingtu, kiran1048, lfu-ps, liangbin-pub, lly00, phudtran, pikapikaw, spagochitarra, tianyuan129, vanderchen, w2520n2520, yanmo96, zhangml, zmn223, zqy11, zzxgzgz

Issues

[Require investigation] Random host IP is assigned and injected into the packet header at the EP host

This capture shows an endpoint on 172.20.52.56 trying to ping an endpoint on 172.20.47.86. 172.20.35.126 is the switch.

02:30:30.285735 IP 4.0.0.0.5273 > 172.20.35.126.6081: Geneve, Flags [none], vni 0xbb8: ARP, Request who-has 10.0.0.6 tell 10.0.0.7, length 28
02:30:31.309734 IP 4.0.0.0.5273 > 172.20.35.126.6081: Geneve, Flags [none], vni 0xbb8: ARP, Request who-has 10.0.0.6 tell 10.0.0.7, length 28

The switch drops all of these packets because it does not know what the outer source IP 4.0.0.0 is.
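For anyone investigating, the capture can be read against the Geneve base header layout in RFC 8926: UDP port 6081 is the Geneve port, and the 24-bit VNI `0xbb8` is decimal 3000. The sketch below parses that 8-byte header and adds a hypothetical guard that rejects an outer source IP differing from the local host IP. The function names and the idea of running such a check inside ACA are assumptions for illustration, not existing ACA code.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstdint>
#include <netinet/in.h>
#include <sys/socket.h>

// Fields of the 8-byte Geneve base header (RFC 8926):
//   byte 0    : Ver (2 bits) | Opt Len (6 bits, in 4-byte words)
//   byte 1    : O (1 bit) | C (1 bit) | Rsvd (6 bits)
//   bytes 2-3 : Protocol Type (0x6558 = Transparent Ethernet Bridging)
//   bytes 4-6 : VNI (24 bits)
//   byte 7    : Reserved
struct GeneveHeader {
  uint8_t version;
  uint8_t opt_len_words;
  uint16_t protocol_type;
  uint32_t vni;
};

// Parses the base header from a buffer holding at least 8 bytes.
GeneveHeader parse_geneve_header(const uint8_t *buf) {
  assert(buf != nullptr);
  GeneveHeader h{};
  h.version = static_cast<uint8_t>(buf[0] >> 6);
  h.opt_len_words = static_cast<uint8_t>(buf[0] & 0x3f);
  h.protocol_type = static_cast<uint16_t>((buf[2] << 8) | buf[3]);
  h.vni = (static_cast<uint32_t>(buf[4]) << 16) |
          (static_cast<uint32_t>(buf[5]) << 8) | buf[6];
  return h;
}

// Hypothetical guard: true only if outer_src parses as IPv4 and equals
// the local host IP that should be the tunnel's outer source.
bool outer_src_matches_host(const char *outer_src, const char *local_host_ip) {
  in_addr src{}, host{};
  if (inet_pton(AF_INET, outer_src, &src) != 1) return false;
  if (inet_pton(AF_INET, local_host_ip, &host) != 1) return false;
  return src.s_addr == host.s_addr;
}
```

On the sending host 172.20.52.56, a packet with outer source 4.0.0.0 would fail this check, which would surface the bad address at programming time instead of at the switch.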

Implement update and delete for port OVS programming

ACA supports "create" port OVS programming on the compute host and accepts goal state messages from the Alcor server for its configuration. The Alcor server can also send "update" and "delete" port OVS programming requests to the compute host, but ACA does not implement them yet.
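A minimal sketch of how the three operations could map onto `ovs-vsctl` commands, assuming that an update changes only the port's VLAN tag. The `PortOperation` enum and `build_ovs_port_command` are hypothetical and are not ACA's goal state types; only the `ovs-vsctl` subcommands and options are real.

```cpp
#include <cassert>
#include <string>

// Hypothetical operation type; ACA's goal state messages define their own.
enum class PortOperation { Create, Update, Delete };

// Builds the ovs-vsctl arguments for one port operation on a bridge.
// Assumes an update only changes the port's VLAN tag (0-4095).
std::string build_ovs_port_command(PortOperation op, const std::string &bridge,
                                   const std::string &port, int vlan_tag) {
  assert(vlan_tag >= 0 && vlan_tag <= 4095);
  const std::string tag = "tag=" + std::to_string(vlan_tag);
  switch (op) {
  case PortOperation::Create:
    // --may-exist makes a repeated create a no-op instead of an error.
    return "--may-exist add-port " + bridge + " " + port + " " + tag;
  case PortOperation::Update:
    return "set Port " + port + " " + tag;
  case PortOperation::Delete:
    // --if-exists makes a repeated delete a no-op instead of an error.
    return "--if-exists del-port " + bridge + " " + port;
  }
  return "";
}
```

Making create and delete idempotent means a goal state that is resent after a timeout does not fail on the second attempt.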

Implement Dataplane Abstraction Layer

With the updated design for the Dataplane Abstraction Layer, we need to:

  1. Add the new core network programming interface.
  2. Refactor the current code to use the new core network programming interface.
  3. Update the existing unit test code and add new test cases as needed.

Switch from shell-out calls to netlink library calls

The current shell-out calls do not provide meaningful error information on failure, which makes issues hard to debug. We should switch to netlink library calls, which provide better performance and better diagnosability.

Note that we may still need shell-out calls for the rescue path, when executing instructions from a helping neighbor control agent.
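The diagnosability gain comes from how netlink reports failures: a request sent with `NLM_F_ACK` is answered with an `NLMSG_ERROR` message whose `error` field is 0 on success or a negative errno. The issue does not name a library (libnl and libmnl are common choices); this sketch uses only the kernel headers, and `describe_netlink_ack` is a hypothetical helper that turns such an ack into readable text.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <linux/netlink.h>

// Describes a netlink acknowledgement. The kernel replies with NLMSG_ERROR;
// nlmsgerr.error is 0 on success or a negative errno on failure.
std::string describe_netlink_ack(const nlmsghdr *nlh) {
  if (nlh->nlmsg_type != NLMSG_ERROR) return "not an ack";
  if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(nlmsgerr))) return "truncated ack";
  const auto *err = static_cast<const nlmsgerr *>(NLMSG_DATA(nlh));
  if (err->error == 0) return "ok";
  return std::string("failed: ") + std::strerror(-err->error);
}
```

Compared with an exit code of 1 from `ip`, an errno such as `EEXIST` or `EPERM` points directly at the cause.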

ACA test failure

After running the tests in a loop a few hundred times, I observe failures:

[==========] 18 tests from 1 test suite ran. (1067 ms total)
[ PASSED ] 12 tests.
[ FAILED ] 6 tests, listed below:
[ FAILED ] net_config_test_cases.create_namespace_valid
[ FAILED ] net_config_test_cases.create_veth_pair_valid
[ FAILED ] net_config_test_cases.setup_peer_device_valid
[ FAILED ] net_config_test_cases.move_to_namespace_valid
[ FAILED ] net_config_test_cases.setup_veth_device_valid
[ FAILED ] net_config_test_cases.rename_veth_device_valid

I have captured some of the failures:

Elapsed time for system command took: 1300363 nanoseconds or 1 milliseconds.
Command failed!!!: ip netns add test_ns
/mnt/host/code/test/gtest/aca_tests.cpp:93: Failure
Expected equality of these values:
rc
Which is: 256
0

Did I use up some resource?

Thanks,
David
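The `rc` of 256 in the failure is the raw wait status returned by `system()`, not the command's exit code; decoded with the standard `<sys/wait.h>` macros it means `ip netns add test_ns` exited with code 1. One plausible cause, not confirmed in the report, is that a previous iteration left `test_ns` behind, so the next `ip netns add` fails because the namespace already exists; deleting it at the start of each test would rule that out. `describe_system_status` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <sys/wait.h>

// system() returns a wait status, not the command's exit code.
// Decodes that status into a short human-readable description.
std::string describe_system_status(int status) {
  if (status == -1) return "system() itself failed";
  if (WIFEXITED(status)) return "exited with code " + std::to_string(WEXITSTATUS(status));
  if (WIFSIGNALED(status)) return "killed by signal " + std::to_string(WTERMSIG(status));
  return "unknown status " + std::to_string(status);
}
```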

Parallel port programming

Port creation on the host is time-consuming because it involves a number of system calls, including namespace creation, veth pair creation and programming, veth renaming, and moving the veth into the namespace. This introduces significant latency in large customer deployments, for example when deploying several VM/container pods to the same host.

To increase Alcor system throughput and reduce latency, we need to parallelize port creation with concurrent threads.
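A minimal sketch of bounded parallel port creation, assuming each port can be created independently by a thread-safe `create_port` callback that returns 0 on success. `create_ports_parallel` is hypothetical; it uses a fixed number of worker threads that claim ports through an atomic index, which caps how many namespace and veth operations run at once.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Creates every port in port_ids using at most max_workers threads.
// create_port must be thread-safe and return 0 on success.
// Returns the number of ports that were created successfully.
int create_ports_parallel(const std::vector<std::string> &port_ids,
                          const std::function<int(const std::string &)> &create_port,
                          std::size_t max_workers) {
  assert(max_workers > 0);
  std::atomic<std::size_t> next{0};  // index of the next unclaimed port
  std::atomic<int> succeeded{0};
  const std::size_t n_workers = std::min(max_workers, port_ids.size());
  std::vector<std::thread> workers;
  workers.reserve(n_workers);
  for (std::size_t w = 0; w < n_workers; ++w) {
    workers.emplace_back([&] {
      // Each worker keeps claiming the next port until none are left.
      for (std::size_t i = next.fetch_add(1); i < port_ids.size(); i = next.fetch_add(1)) {
        if (create_port(port_ids[i]) == 0) succeeded.fetch_add(1);
      }
    });
  }
  for (auto &t : workers) t.join();
  return succeeded.load();
}
```

Because the commands now run concurrently, namespace and veth names must be unique per port, which ACA's per-port naming (e.g. `vethf37810eb-7f` in the logs) appears to provide.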

Data Plane abstraction layer design

To support multiple data plane implementations, including OVS and Mizar, we plan to add a new DP abstraction layer to the Alcor Control Agent so that the APIs of the various data plane implementations are "invisible" to the control plane.
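One way to keep the data plane APIs invisible is to choose the concrete implementation once, from configuration, behind a common interface. A minimal sketch, in which `Dataplane`, `OvsDataplane`, `MizarDataplane`, and `make_dataplane` are hypothetical names and the configuration strings are assumptions:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Minimal hypothetical data plane interface; the control plane sees only this.
class Dataplane {
public:
  virtual ~Dataplane() = default;
  virtual std::string name() const = 0;
};

class OvsDataplane : public Dataplane {
public:
  std::string name() const override { return "ovs"; }
};

class MizarDataplane : public Dataplane {
public:
  std::string name() const override { return "mizar"; }
};

// Picks the implementation from configuration; returns nullptr for an
// unsupported data plane so the agent can fail fast at startup.
std::unique_ptr<Dataplane> make_dataplane(const std::string &kind) {
  if (kind == "ovs") return std::make_unique<OvsDataplane>();
  if (kind == "mizar") return std::make_unique<MizarDataplane>();
  return nullptr;
}
```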

hosted_interface is overwritten by a subsequent call

Details:

Jan 10 23:04:20 ip-172-20-35-126 transit[5930]: [update_ep_1_svc:222] update_ep_1 ep tunid: 3000, ip: 0x600000a, type: 1, veth: vethf37810eb-7f, hosted_interface:peerf37810eb-7f
Jan 10 23:04:20 ip-172-20-35-126 transit[5930]: [update_ep_1_svc:222] update_ep_1 ep tunid: 3000, ip: 0x600000a, type: 1, veth: vethf37810eb-7f, hosted_interface:

The first call sets hosted_interface to the peer interface (peerf37810eb-7f).
The subsequent call overwrites that update, but this time the hosted_interface field is empty.
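Two details help when reading the log. The field `ip: 0x600000a` looks like an IPv4 address in network byte order printed as a host integer on a little-endian machine, which decodes to 10.0.0.6, and `tunid: 3000` matches the Geneve VNI `0xbb8`. The sketch below shows one possible fix, assuming an empty string in an update means "not set": merge the update field by field instead of replacing the whole record. `Endpoint`, `ip_to_string_le`, and `merge_update` are hypothetical and do not claim to be transit's or ACA's code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical endpoint record mirroring the fields in the transit log.
struct Endpoint {
  uint32_t tunid = 0;
  uint32_t ip = 0;  // IPv4 address in network byte order
  std::string veth;
  std::string hosted_interface;
};

// Formats an IPv4 address held in network byte order but read as a host
// integer on a little-endian machine (0x600000a -> "10.0.0.6").
std::string ip_to_string_le(uint32_t v) {
  return std::to_string(v & 0xff) + "." + std::to_string((v >> 8) & 0xff) + "." +
         std::to_string((v >> 16) & 0xff) + "." + std::to_string((v >> 24) & 0xff);
}

// Applies an update without clobbering string fields that the update
// leaves empty, so a later call without hosted_interface keeps the peer.
void merge_update(Endpoint &current, const Endpoint &update) {
  current.tunid = update.tunid;
  current.ip = update.ip;
  if (!update.veth.empty()) current.veth = update.veth;
  if (!update.hosted_interface.empty()) current.hosted_interface = update.hosted_interface;
}
```

The trade-off is that a caller can no longer clear hosted_interface on purpose; if that is needed, an explicit field mask in the update message is the more precise design.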

Implement DHCP server support

Success Criteria:

The agent supports DHCP programming and allows VMs/containers to receive their assigned IP addresses through DHCP.

Details:

We need to implement DHCP support in the OpenStack environment, taking over the responsibility of the Neutron DHCP agent. This task includes:

  1. Done: Finalize the current DHCP design draft document #96 - https://github.com/futurewei-cloud/alcor-control-agent/blob/master/docs/dhcp_programming.adoc
  2. Done: Modify the current Alcor network state message to support DHCP programming - https://github.com/futurewei-cloud/alcor/blob/master/schema/proto3/dhcp.proto
  3. Done: Implement the DHCP programming interface according to the design - https://github.com/futurewei-cloud/alcor-control-agent/blob/master/include/aca_dhcp_programming_if.h
  4. In progress: Implement a DHCP handler class that works with OpenFlow and acts as the DHCP server. Remaining items:
    4a. Add an OpenFlow rule that captures DHCP packets and sends them to the OpenFlow controller (ACA). This can go into the DHCP class init function, called by the Aca_Goal_State_Handler::Aca_Goal_State_Handler() constructor.
    -The add rule should look like: add-flow br-int "table=0,priority=25,udp,udp_src=68,udp_dst=67,actions=CONTROLLER"
    -The delete rule should look like: del-flows br-int udp,udp_src=68,udp_dst=67
    -To program the OpenFlow rules, use: ACA_OVS_L2_Programmer::get_instance().execute_openflow_command
    4b. When the aca_ovs_control code receives a DHCP packet, it needs to call a DHCP function to parse and process it. Please provide the interface to call; @cj-chung can tell you where to change the code to call it.
  5. Unit testing of DHCP functionality
    5a. See DISABLED_2_ports_ROUTING_test_traffic_one_machine in https://github.com/futurewei-cloud/alcor-control-agent/blob/master/test/gtest/aca_tests.cpp for an example of how we used docker + ovs-docker on a physical machine or VM to create containers for testing. We can create a container, assign a MAC address to it, and let it run DHCP to test our DHCP implementation. See https://goldmann.pl/blog/2014/01/30/assigning-ip-addresses-to-docker-containers-via-dhcp/
  6. End-to-end test plan and scenario testing
  7. Scale and performance analysis
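For item 4b, the first thing the DHCP handler needs from a captured packet is its message type. A minimal sketch, assuming the handler receives the UDP payload: offsets follow the BOOTP/DHCP layout in RFC 2131 (236-byte fixed header, then the magic cookie 99.130.83.99) and option 53 comes from RFC 2132. `dhcp_message_type` is a hypothetical name, not the interface ACA defines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the DHCP message type (option 53) from a DHCP payload, or 0 if
// the payload is not a valid DHCP message. Types: 1 DISCOVER, 2 OFFER,
// 3 REQUEST, 4 DECLINE, 5 ACK, 6 NAK, 7 RELEASE, 8 INFORM.
uint8_t dhcp_message_type(const uint8_t *pkt, std::size_t len) {
  assert(pkt != nullptr || len == 0);
  const std::size_t kOptionsOffset = 240;  // 236-byte header + 4-byte cookie
  if (len < kOptionsOffset) return 0;
  // The magic cookie 99.130.83.99 marks the start of the DHCP options.
  if (pkt[236] != 99 || pkt[237] != 130 || pkt[238] != 83 || pkt[239] != 99) return 0;
  std::size_t i = kOptionsOffset;
  while (i < len) {
    const uint8_t code = pkt[i];
    if (code == 255) break;             // End option
    if (code == 0) { ++i; continue; }   // Pad option has no length byte
    if (i + 1 >= len) break;            // length byte missing
    const uint8_t opt_len = pkt[i + 1];
    if (i + 2 + opt_len > len) break;   // option overruns the packet
    if (code == 53 && opt_len == 1) return pkt[i + 2];
    i += 2 + opt_len;
  }
  return 0;
}
```

The handler could then branch on DISCOVER (reply with an OFFER) and REQUEST (reply with an ACK or NAK).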

[Documentation] Integration with OVS

Basic Requirements

  • High-level Architecture
  • Background of the OVS Dataplane (interfaces, components, etc.)
  • Communication between ACA and OVS
  • Workflow of basic port programming
  • Workflow of router programming
  • Workflow of security group programming
  • Workflow of network ACL programming
  • Programming protocol
  • ACA implementation details
  • Comparison with the Neutron implementation
