futurewei-cloud / alcor-control-agent
Cloud native SDN platform - network control agent
License: MIT License
ETA 7/12
The agent should be able to call the Transit daemon API to create a VPC, assuming the Transit daemon dependency is working.
With the updated design for Dataplane Abstraction Layer, we need to:
Port creation at the host is time-consuming because it involves a number of system calls, including namespace creation, veth pair creation and programming, veth renaming, and moving the veth into the namespace. This introduces significant latency in large customer deployments, for example when deploying several VMs or container pods to the same host.
In order to increase Alcor system throughput and reduce latency, we need to parallelize port creations with concurrent threads.
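One way to picture the parallelization is the minimal sketch below. It is not the agent's implementation: `PortResult`, `create_ports_parallel`, and the `create_port` callback are hypothetical names, and the callback stands in for the real per-port work (namespace creation, veth setup, rename, move). It launches one task per port with `std::async` and collects the return codes in input order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical result of creating one port; rc == 0 means success.
struct PortResult {
  std::string port_name;
  int rc;
};

// Runs create_port for every port name concurrently and returns the results
// in the same order as the input. create_port is an assumed callback that
// performs the per-port host programming and returns 0 on success.
std::vector<PortResult> create_ports_parallel(
    const std::vector<std::string> &port_names,
    const std::function<int(const std::string &)> &create_port) {
  assert(static_cast<bool>(create_port));

  // Launch one asynchronous task per port.
  std::vector<std::future<int>> pending;
  pending.reserve(port_names.size());
  for (const auto &name : port_names) {
    pending.push_back(std::async(std::launch::async, create_port, name));
  }

  // Wait for every task and pair each return code with its port name.
  std::vector<PortResult> results;
  results.reserve(port_names.size());
  for (std::size_t i = 0; i < pending.size(); ++i) {
    results.push_back({port_names[i], pending[i].get()});
  }
  return results;
}
```

One thread per port is the simplest form; a bounded thread pool would likely be a better fit when a single goal state carries many ports. Build with `-pthread` on toolchains that require it.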
This is an endpoint on 172.20.52.56 trying to ping an endpoint on 172.20.47.86.
172.20.35.126 is the switch
02:30:30.285735 IP 4.0.0.0.5273 > 172.20.35.126.6081: Geneve, Flags [none], vni 0xbb8: ARP, Request who-has 10.0.0.6 tell 10.0.0.7, length 28
02:30:31.309734 IP 4.0.0.0.5273 > 172.20.35.126.6081: Geneve, Flags [none], vni 0xbb8: ARP, Request who-has 10.0.0.6 tell 10.0.0.7, length 28
The switch drops all of the packets because it does not recognize the source address 4.0.0.0.
The system() call currently used by ACA supports neither a timeout nor reading the output of the command. We want to replace system() with fork()/socketpair()/exec() to gain both capabilities.
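A rough sketch of what such a replacement could look like follows. `CommandResult` and `run_command` are hypothetical names, not ACA functions. The child's stdout and stderr are redirected into one end of a `socketpair()`, and the parent uses `poll()` to read from the other end until EOF or a deadline. Unlike `system()`, which returns a raw wait status (a command exiting with code 1 shows up as 256), the sketch decodes the status with `WEXITSTATUS`.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <string>
#include <vector>

#include <poll.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct CommandResult {
  int exit_code;       // child's exit code, or -1 if it did not exit normally
  bool timed_out;      // true if the deadline passed before the output ended
  std::string output;  // combined stdout and stderr
};

CommandResult run_command(const std::vector<std::string> &args, int timeout_ms) {
  assert(!args.empty());
  CommandResult result{-1, false, ""};

  // Build argv before fork() so the child does not allocate memory.
  std::vector<char *> argv;
  for (const auto &a : args) argv.push_back(const_cast<char *>(a.c_str()));
  argv.push_back(nullptr);

  int sv[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return result;

  pid_t pid = fork();
  if (pid < 0) {
    close(sv[0]);
    close(sv[1]);
    return result;
  }
  if (pid == 0) {
    // Child: send stdout and stderr to the socket, then exec the command.
    close(sv[0]);
    dup2(sv[1], STDOUT_FILENO);
    dup2(sv[1], STDERR_FILENO);
    close(sv[1]);
    execvp(argv[0], argv.data());
    _exit(127);  // exec failed
  }

  // Parent: read until EOF or until the deadline passes.
  close(sv[1]);
  const auto deadline = std::chrono::steady_clock::now() +
                        std::chrono::milliseconds(timeout_ms);
  char buf[4096];
  for (;;) {
    const auto remaining =
        std::chrono::duration_cast<std::chrono::milliseconds>(
            deadline - std::chrono::steady_clock::now()).count();
    if (remaining <= 0) { result.timed_out = true; break; }
    struct pollfd pfd = {sv[0], POLLIN, 0};
    const int ready = poll(&pfd, 1, static_cast<int>(remaining));
    if (ready < 0 && errno == EINTR) continue;
    if (ready < 0) break;
    if (ready == 0) { result.timed_out = true; break; }
    const ssize_t n = read(sv[0], buf, sizeof(buf));
    if (n < 0 && errno == EINTR) continue;
    if (n <= 0) break;  // EOF: the child closed its end of the socket
    result.output.append(buf, static_cast<size_t>(n));
  }
  close(sv[0]);

  if (result.timed_out) kill(pid, SIGKILL);
  int status = 0;
  while (waitpid(pid, &status, 0) < 0 && errno == EINTR) {}
  if (WIFEXITED(status)) result.exit_code = WEXITSTATUS(status);
  return result;
}
```

For example, `run_command({"ip", "netns", "add", "test_ns"}, 2000)` would return the exit code together with whatever error text `ip` printed. The sketch treats end of output as end of the command, so a child that closes its output early but keeps running would still block in `waitpid()`; a production version would need to cover that case.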
A design draft (work in progress) is available at:
https://github.com/er1cthe0ne/alcor-control-agent/blob/docs/design/docs/dhcp_programming.adoc
Put the async gRPC service into a separate thread, and make it possible to shut it down from the Ctrl-C handler.
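The threading and shutdown pattern can be sketched without gRPC itself. In the example below, `BackgroundService` is a hypothetical stand-in for the async gRPC service, and `on_sigint` and `serve_until_interrupted` are illustrative names. The signal handler only sets an atomic flag; the main thread notices the flag and performs the actual shutdown, which keeps the handler itself trivial.

```cpp
#include <atomic>
#include <chrono>
#include <csignal>
#include <thread>

// Set from the signal handler. std::atomic<bool> is lock-free on mainstream
// platforms, which is what makes it safe to write from a handler.
std::atomic<bool> g_stop_requested{false};

extern "C" void on_sigint(int) { g_stop_requested.store(true); }

// Hypothetical stand-in for the async gRPC service.
class BackgroundService {
 public:
  ~BackgroundService() { shutdown(); }

  // Starts the service loop on its own thread.
  void start() {
    running_.store(true);
    worker_ = std::thread([this] {
      while (running_.load()) {
        // A real service would wait on its completion queue here.
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
      }
    });
  }

  // Asks the loop to stop and waits for the thread to finish.
  void shutdown() {
    running_.store(false);
    if (worker_.joinable()) worker_.join();
  }

  bool is_running() const { return worker_.joinable(); }

 private:
  std::atomic<bool> running_{false};
  std::thread worker_;
};

// Blocks the calling thread until Ctrl-C is seen, then stops the service.
void serve_until_interrupted(BackgroundService &service) {
  std::signal(SIGINT, on_sigint);
  service.start();
  while (!g_stop_requested.load()) {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
  service.shutdown();
}
```

With the real gRPC C++ API, `shutdown()` would presumably call `Shutdown()` on the server and then on its completion queue, so that the worker thread's wait returns and the thread can be joined.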
update high-level architecture diagram to:
Success Criteria
Comprehensive design document to support port programming with security group and Network ACL.
Better handling of race conditions. Investigate the use of thread pools.
Abstraction layer to support multiple DP implementations (including OVS)
Functionality to support L2/L3 switching, L3 routing and NAT, and DHCP
XDP programming is done through the corresponding RPC APIs of the Transit daemon.
Details:
Jan 10 23:04:20 ip-172-20-35-126 transit[5930]: [update_ep_1_svc:222] update_ep_1 ep tunid: 3000, ip: 0x600000a, type: 1, veth: vethf37810eb-7f, hosted_interface:peerf37810eb-7f
Jan 10 23:04:20 ip-172-20-35-126 transit[5930]: [update_ep_1_svc:222] update_ep_1 ep tunid: 3000, ip: 0x600000a, type: 1, veth: vethf37810eb-7f, hosted_interface:
The first call updates the endpoint with the peer interface.
The subsequent call overwrites that update, but this time the "hosted_interface" field contains no peer interface.
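When reading these log lines, it helps to decode the `ip` field. The sketch below assumes the daemon logs the address as a network-byte-order value read on a little-endian host, so the lowest byte is the first octet; `ipv4_from_logged_value` is a hypothetical helper name. Under that assumption `0x600000a` is 10.0.0.6, and `tunid: 3000` is the decimal form of the Geneve VNI `0xbb8` seen in the packet capture earlier on this page.

```cpp
#include <cstdint>
#include <string>

// Formats a logged 32-bit address as dotted-quad text, assuming the lowest
// byte of the value holds the first octet.
std::string ipv4_from_logged_value(std::uint32_t value) {
  return std::to_string(value & 0xFFu) + "." +
         std::to_string((value >> 8) & 0xFFu) + "." +
         std::to_string((value >> 16) & 0xFFu) + "." +
         std::to_string((value >> 24) & 0xFFu);
}
```

Calling `ipv4_from_logged_value(0x600000a)` yields the string `10.0.0.6`.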
To support multiple data plane implementations, including OVS and Mizar, we plan to add a new DP abstraction layer to the Alcor Control Agent so that the APIs of the various data plane implementations are invisible to the control plane.
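As an illustration only, such a layer could take the shape of an abstract interface with one implementation per data plane. Every name in the sketch below (`PortConfig`, `DataplaneBackend`, `OvsBackend`, `MizarBackend`, `make_backend`) is hypothetical and not taken from the ACA code base, and the backends are stubs that only validate their input.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical, simplified port description passed down from the control plane.
struct PortConfig {
  std::string port_name;
  std::string ip;
  unsigned tunnel_id;
};

// The control plane programs ports only through this interface.
class DataplaneBackend {
 public:
  virtual ~DataplaneBackend() = default;
  virtual std::string name() const = 0;
  virtual int create_port(const PortConfig &config) = 0;  // 0 on success
};

// Stub: a real implementation would program OVS bridges and flows here.
class OvsBackend : public DataplaneBackend {
 public:
  std::string name() const override { return "ovs"; }
  int create_port(const PortConfig &config) override {
    return config.port_name.empty() ? -1 : 0;
  }
};

// Stub: a real implementation would call the Transit daemon RPC APIs here.
class MizarBackend : public DataplaneBackend {
 public:
  std::string name() const override { return "mizar"; }
  int create_port(const PortConfig &config) override {
    return config.port_name.empty() ? -1 : 0;
  }
};

// Selects a backend by name; returns nullptr for an unknown kind.
std::unique_ptr<DataplaneBackend> make_backend(const std::string &kind) {
  if (kind == "ovs") return std::make_unique<OvsBackend>();
  if (kind == "mizar") return std::make_unique<MizarBackend>();
  return nullptr;
}
```

Code that handles goal state messages would then hold a `std::unique_ptr<DataplaneBackend>` and never refer to OVS or Mizar directly.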
valgrind --tool=memcheck --leak-check=yes --show-reachable=yes --num-callers=20 --track-fds=yes ./networkControlAgent
ETA 07/19
Need to leverage the code from issue #3.
After running the tests in a loop a few hundred times, I observe failures:
[==========] 18 tests from 1 test suite ran. (1067 ms total)
[ PASSED ] 12 tests.
[ FAILED ] 6 tests, listed below:
[ FAILED ] net_config_test_cases.create_namespace_valid
[ FAILED ] net_config_test_cases.create_veth_pair_valid
[ FAILED ] net_config_test_cases.setup_peer_device_valid
[ FAILED ] net_config_test_cases.move_to_namespace_valid
[ FAILED ] net_config_test_cases.setup_veth_device_valid
[ FAILED ] net_config_test_cases.rename_veth_device_valid
I have captured some of the failures:
Elapsed time for system command took: 1300363 nanoseconds or 1 milliseconds.
Command failed!!!: ip netns add test_ns
/mnt/host/code/test/gtest/aca_tests.cpp:93: Failure
Expected equality of these values:
rc
Which is: 256
0
Did I use up some resource?
Thanks,
David
Basic Requirements
To cover the following scenarios:
(1) VPC Creation
(2) Subnet Creation
(3) Port Creation
Deserialize directly from the Kafka message into the protobuf goalstate object to avoid the extra copy: Kafka message (binary) -> string -> protobuf goalstate.
For example, how to group a set of agents as a group of Kafka consumers:
https://stackoverflow.com/questions/34550873/difference-between-groupid-and-consumerid-in-kafka-consumer
Success Criteria
Support basic port creation, update and deletion in the control plane
The current shell-out call does not provide meaningful error information on failure, which makes issues hard to debug. We should switch to netlink library calls, which provide better performance and better diagnosability.
Note that we may still need shell-out calls on the rescue path, when trying to execute an instruction from a helping neighbor control agent.
ACA implements "create" port OVS programming on the compute host and accepts goal state messages from the Alcor server for its configuration. The Alcor server can also send down "update" and "delete" port OVS programming for the compute host, but ACA does not support these operations yet.
Success Criteria:
The agent supports DHCP programming and allows VMs/containers to receive their assigned IP addresses through DHCP.
Details:
We need to implement DHCP support in the OpenStack environment, taking over the responsibility of the Neutron DHCP agent. This task includes:
Success Criteria
The agent supports router programming and full VPC communication.