"This Project has been archived by the owner, who is no longer providing support. The project remains available to authorized users on a "read only" basis."

Merak

Merak is a large-scale cloud emulator that provides the ability to:

  • emulate data center physical topologies and network devices including hosts, switches, and routers.
  • emulate a large volume of compute nodes (more than 100K) with limited physical hardware resources.
  • run performance tests on a target project's control plane (e.g., Alcor's) with a large VPC containing more than 1M VMs.
  • automatically create and conduct different performance test scenarios and collect results.

Platforms

There are many different hardware resource management platforms in the field. We currently use two of them to investigate and build our prototype:

  • Kubernetes cluster with Meshnet CNI
  • Distrinet with LXD containers

Architecture

The following diagram illustrates the high-level architecture of Merak on a Kubernetes cluster using Meshnet CNI, and the basic workflow for emulating Alcor's control plane when creating VMs in the emulated compute nodes.

[Figure: Merak Architecture]

Components

  • Scenario Manager: create the required topology and test scenarios.
  • K8S-Topo: deploy pods with the given topology.
  • Merak Network: create network infrastructure resources, e.g., VPCs, subnets, and security groups.
  • Merak Compute: register compute node information, create VMs, and collect test results from Merak agents.
  • Merak Agent: create virtual network devices (bridges, tap devices, and veth pairs) and network namespaces for VMs, collect test results, and send the results back to Merak Compute.

Scalability

In order to provide more virtual and emulated resources with limited hardware resources, three possible solutions are investigated and developed in this project:

  • Docker-in-Docker
  • Kubernetes-in-Kubernetes (KinK)
  • Kubernetes cluster in virtual machines

For more detailed design information, please refer to the docs folder in this repository.

Kind: Simple Deployment and E2E Test

This test will bring up Merak and Alcor in a single master node Kind Kubernetes cluster.

Prerequisites

  • Minimum Machine Requirements (our tests were run on AWS t2.2xlarge EC2 instances)

    • 16GB RAM
    • 8 Core CPU
  • Update

sudo apt-get update
sudo apt-get install make
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install | sh
export PATH=$PATH:$HOME/.linkerd2/bin
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.17.0/kind-linux-amd64 && chmod +x ./kind && sudo mv ./kind /usr/local/bin/kind
curl -LO https://dl.k8s.io/release/v1.26.0/bin/linux/amd64/kubectl && sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
sudo apt-get install docker.io
  • Add current user to docker group (for running docker without sudo)
sudo groupadd docker
sudo gpasswd -a $USER docker
newgrp docker

Step 1: Deploy

You can deploy Merak and Alcor in Kind with the command below.

git clone https://github.com/futurewei-cloud/merak.git
cd merak
make kind-alcor

Wait until all pods are in the Running state, as shown in the picture below, before proceeding to the next step. This should take approximately 5 minutes.

[Figure: Successful Merak Deployment]
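
The "all pods running" check can also be scripted. The following is a minimal sketch, assuming the text comes from `kubectl get pods -A --no-headers`, whose columns are NAMESPACE, NAME, READY, STATUS, RESTARTS, and AGE. The function name `pods_not_ready` and the pod names in the sample are made up for illustration, and treating `Completed` pods as fine is an assumption.

```python
def pods_not_ready(listing: str) -> list[str]:
    """Return "namespace/name" for every pod that is not yet usable.

    A pod counts as usable when its STATUS is Running with all containers
    ready (READY column like "1/1"), or when its STATUS is Completed.
    """
    pending = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        namespace, name, ready, status = fields[:4]
        ready_count, _, total = ready.partition("/")
        running = status == "Running" and ready_count == total
        if not (running or status == "Completed"):
            pending.append(f"{namespace}/{name}")
    return pending


# Hypothetical listing; real pod names will differ.
sample = """\
default       merak-compute-0            1/1   Running             0   4m
default       merak-network-0            0/1   ContainerCreating   0   4m
kube-system   coredns-565d847f94-abcde   1/1   Running             0   6m
"""
print(pods_not_ready(sample))
```

An empty list means every listed pod is ready. To use it against a live cluster, pass in the captured output of the kubectl command, for example via `subprocess.run(..., capture_output=True, text=True).stdout`.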

Step 2: Run The Test

You can use the prebuilt test tool as shown below.

./tools/teste2e/bin/teste2e

This will create 5 hosts with 10 VMs each. Once everything is created, you can test network connectivity as shown below.

  1. Run kubectl get pods -A to see all vhost pods.

  2. Merak uses network namespaces to emulate VMs. Run kubectl exec -it -n <namespace of the pod> vhost-0 -- ip netns exec v000 ip a to get the IP address of the emulated VM v000 inside the emulated host vhost-0.

  3. Ping the VM v000 on vhost-0 from the VM v000 on vhost-1 with the following command: kubectl exec -it -n <namespace of the pod> vhost-1 -- ip netns exec v000 ping <IP address from step 2>
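
The commands in the steps above follow one pattern: run something inside the network namespace of a VM on a given vhost pod. Here is a small sketch that builds those command lines. The helper name `netns_command`, the `default` namespace, and the IP address `10.0.1.5` are placeholders rather than values from a real run. The sketch drops `-it` because a script needs no terminal, places `--` before the command to run (the form kubectl recommends), and adds `-c 3` so that ping stops after three packets, assuming the `ping` in the pod supports that flag.

```python
import shlex


def netns_command(namespace: str, pod: str, vm: str, *inner: str) -> list[str]:
    """Build a `kubectl exec` argv that runs `inner` in netns `vm` of `pod`."""
    return ["kubectl", "exec", "-n", namespace, pod, "--",
            "ip", "netns", "exec", vm, *inner]


# Step 2: show the addresses of VM v000 on vhost-0.
show_ip = netns_command("default", "vhost-0", "v000", "ip", "a")
# Step 3: ping that VM from VM v000 on vhost-1 (placeholder address).
ping = netns_command("default", "vhost-1", "v000", "ping", "-c", "3", "10.0.1.5")

print(shlex.join(show_ip))
print(shlex.join(ping))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to execute it against the cluster.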

Clean-up:

Run the command below to clean up the Kind environment.

kind delete cluster

Getting Started With Development

To build this project, please make sure the following things are installed:

Then, the project can be built with:

make

How to Deploy a Development Cluster

Prerequisites

Before deploying Merak with Alcor, you will need the following.

  • A Kubernetes cluster with flannel installed
  • Helm
    • curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
  • Needed for Alcor
    • Linkerd installed on the cluster
    • openvswitch-switch installed on every node (apt install openvswitch-switch)

NOTE: Please wait for all LinkerD pods and containers to be fully running before moving on to the steps below.

[Figure: LinkerD]

Deployment

Once your cluster is ready, you can deploy the latest small-scale development build of Merak and Alcor (one replica for every service) with the command below.

kubectl kustomize https://github.com/futurewei-cloud/merak/deployments/kubernetes/alcor --enable-helm | kubectl apply -f -

A successful deployment takes roughly 5 minutes for all pods to reach the Running state. The deployed components are as follows:

  • Merak Microservices and their Redis instances
    • Merak Scenario Manager
    • Merak Topology
    • Merak Network
    • Merak Compute
  • Meshnet CNI
  • Temporal
  • Prometheus
  • Alcor Microservices and their Ignite instances
    • Port Manager
    • Network Config Manager
    • API Manager
    • EIP Manager
    • Dataplane Manager
    • IP Manager
    • Mac Manager
    • Node Manager
    • Quota Manager
    • Route Manager
    • Security Group Manager
    • Subnet Manager
    • VPC Manager
  • LinkerD

[Figure: Successful Merak Deployment]

Deployment settings such as container images and replica counts can be changed by editing the kustomize file at deployments/kubernetes/alcor/kustomization.yaml and redeploying with

kubectl kustomize deployments/kubernetes/alcor --enable-helm | kubectl apply -f -

merak's Issues

Refactor Protobuf files

Currently, all protobuf files are in a single package.
We need to refactor each component's protobuf files into their own package.

Merak-Network: pod sometimes can't connect to the DB on startup

When the Merak Network pod starts, it sometimes can't connect to the DB.
This happens every once in a while. The on-the-fly workaround is to delete the pod; when the pod comes back up, it is able to connect to the DB pod.
The cause appears to be that the DB pod is sometimes slow to start, so when the Network pod starts it cannot find the DB pod.
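
One possible remedy, instead of deleting the pod by hand, is to retry the DB connection with a growing delay until the DB pod is up. The sketch below shows the idea only and is not the project's code: Python is used purely for illustration, and `connect_with_retry`, its parameters, and the fake database are all hypothetical.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call `connect()` until it succeeds, doubling the wait after each failure.

    Re-raises the last ConnectionError if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Demo with a fake database that becomes reachable on the third attempt.
calls = {"n": 0}


def fake_connect():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("db not ready")
    return "connected"


waits = []  # record the delays instead of actually sleeping
result = connect_with_retry(fake_connect, sleep=waits.append)
print(result, waits)
```

Passing `sleep` as a parameter keeps the demo instant; in real use the default `time.sleep` applies.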

Merak Network: old network config info remains in the DB after the network config is deleted

Steps to reproduce:

  1. Deploy a network config (1 vpc, 1 security group, 1 subnet) from scenario manager
  2. Delete the network config from scenario manager
  3. Deploy a new network config again (1 vpc, 1 security group, 1 subnet)

Result:
Merak Network returns 2 VPCs, 2 security groups, and 2 subnets instead of one of each.

Merak Network apparently did not clean up the DB, or it may be keeping the network config in memory in a global variable.

{
    "data": {
        "returnMessage": "returnNetworkMessage Finished",
        "securityGroupIds": [
            "d0ff0ae2120f436ba4b6576a69ea916f",
            "7c2f29a850c24f50966ad391cc7dd391"
        ],
        "vpcs": [
            {
                "projectId": "123456789",
                "subnets": [
                    {
                        "numberVms": 100,
                        "subnetCidr": "10.0.1.0/20",
                        "subnetGw": "10.0.1.1",
                        "subnetId": "590d16bc-cfb6-4e55-9458-1086b54da56f"
                    }
                ],
                "tenantId": "123456789",
                "vpcId": "f15fa247-1c8c-4d22-bf74-232f1007c256"
            },
            {
                "projectId": "123456789",
                "subnets": [
                    {
                        "numberVms": 100,
                        "subnetCidr": "10.0.1.0/20",
                        "subnetGw": "10.0.1.1",
                        "subnetId": "dc5058cd-8bbb-45c2-b3e4-3164f00833ce"
                    }
                ],
                "tenantId": "123456789",
                "vpcId": "7886aa20-8974-445b-8d47-51b75a852c4d"
            }
        ]
    },
    "message": "Action successfully - DEPLOY on Network done",
    "status": "OK"
}
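
A test can catch this kind of leftover state by counting the resources in the response and comparing them with what was requested. The following sketch assumes the response shape shown above (`data.vpcs[].subnets[]` and `data.securityGroupIds`); the function names are hypothetical, and the sample is a trimmed copy of that response.

```python
def count_resources(response: dict) -> dict:
    """Count VPCs, subnets, and security groups in a Merak Network response."""
    data = response.get("data", {})
    vpcs = data.get("vpcs", [])
    return {
        "vpcs": len(vpcs),
        "subnets": sum(len(vpc.get("subnets", [])) for vpc in vpcs),
        "security_groups": len(data.get("securityGroupIds", [])),
    }


def unexpected_counts(response: dict, expected: dict) -> dict:
    """Return {resource: (expected, actual)} for every count that differs."""
    actual = count_resources(response)
    return {name: (want, actual[name])
            for name, want in expected.items() if actual[name] != want}


# Trimmed copy of the response above (IDs kept, other fields dropped).
response = {
    "data": {
        "securityGroupIds": [
            "d0ff0ae2120f436ba4b6576a69ea916f",
            "7c2f29a850c24f50966ad391cc7dd391",
        ],
        "vpcs": [
            {"vpcId": "f15fa247-1c8c-4d22-bf74-232f1007c256",
             "subnets": [{"subnetId": "590d16bc-cfb6-4e55-9458-1086b54da56f"}]},
            {"vpcId": "7886aa20-8974-445b-8d47-51b75a852c4d",
             "subnets": [{"subnetId": "dc5058cd-8bbb-45c2-b3e4-3164f00833ce"}]},
        ],
    },
    "status": "OK",
}

expected = {"vpcs": 1, "subnets": 1, "security_groups": 1}
print(unexpected_counts(response, expected))
```

An empty dictionary means the response matches the requested config; here every resource type is reported as expected 1, actual 2.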

Test: integration test script from scenario manager through compute deployment

We need a script for integration testing. It can be a bash or Python script with functions that:

  1. Create the request bodies for topology, service-config, network-config, and compute-config.
  2. Deploy, check, and delete the topology, network, and compute.
  3. Get the return message.
  4. Decide whether the test passed or failed by checking the return message or code.
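
As a starting point, the flow in the list above could be sketched like this. Everything here is an assumption made for illustration: the `send` callable stands in for the real HTTP call to scenario manager, the request bodies are empty placeholders, service-config is left out, the pass rule (2xx code plus `"status": "OK"` in the body) is inferred from the sample response in another issue, the `"ERROR"` status is invented, and deleting in reverse order is a design guess.

```python
def passed(http_code: int, body: dict) -> bool:
    """A step passes when the HTTP code is 2xx and the body reports status OK."""
    return 200 <= http_code < 300 and body.get("status") == "OK"


def run_scenario(send, configs, actions=("DEPLOY", "CHECK", "DELETE")):
    """Apply each action to each config in order and record the outcome.

    `send(target, action, body)` must return (http_code, response_dict).
    Returns a list of (target, action, ok, message) tuples.
    """
    results = []
    for action in actions:
        targets = list(configs)
        if action == "DELETE":
            targets.reverse()  # tear down compute before network and topology
        for target in targets:
            code, body = send(target, action, configs[target])
            results.append((target, action, passed(code, body),
                            body.get("message", "")))
    return results


# Fake transport: every call succeeds except deleting the network.
def fake_send(target, action, body):
    if target == "network" and action == "DELETE":
        return 500, {"status": "ERROR", "message": "timeout"}
    return 200, {"status": "OK", "message": f"{action} on {target} done"}


configs = {"topology": {}, "network": {}, "compute": {}}  # placeholder bodies
for target, action, ok, message in run_scenario(fake_send, configs):
    print(f"{'PASS' if ok else 'FAIL'} {action:6} {target:8} {message}")
```

Replacing `fake_send` with a function that posts to the real scenario manager endpoint turns the sketch into a usable test driver.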

Merak-Network: issue deleting security groups

2022/09/08 18:07:33 RequestCall Fail  403 {"timestamp":"2022-09-08T18:07:33.666+00:00","status":403,"error":"Forbidden","message":"","path":"/project/123456789/security-groups/6bf03a03b07f4c818fdaa849bce7f948"}
2022/09/08 18:07:33 returnErr {"timestamp":"2022-09-08T18:07:33.666+00:00","status":403,"error":"Forbidden","message":"","path":"/project/123456789/security-groups/6bf03a03b07f4c818fdaa849bce7f948"}

Merak Network: checking network info from scenario manager returns an error

When scenario manager issues a CHECK action on the network, Merak Network returns the following error:

{"Service":"scenario-manager","level":"info","msg":"constructNetConfMessage: config:{format_version:1  revision_number:1  request_id:\"5f85f79d5b0a4d33a47c63442d91cf0a\"  netconfig_id:\"e6b7e72ac496490e8e4d9127798ec802\"  network:{id:\"e6b7e72ac496490e8e4d9127798ec802\"  name:\"network-config-1\"  number_of_vpcs:1  number_of_subnet_per_vpc:1  vpcs:{tenant_id:\"123456789\"  project_id:\"123456789\"  subnets:{subnet_cidr:\"10.0.1.0/20\"  subnet_gw:\"10.0.1.1\"  number_vms:100}}  number_of_security_groups:1  routers:{name:\"string\"  subnets:\"string\"}  gateways:{name:\"string\"  ips:\"string\"}  security_groups:{name:\"sg-1\"  rules:{name:\"string\"  description:\"string\"  ethertype:\"string\"  protocol:\"string\"  port_range:\"string\"  remote_group_id:\"string\"  remote_ip_prefix:\"string\"}  apply_to:\"string\"}}}","time":"2022-08-22T17:48:01Z"}
{"Service":"scenario-manager","level":"error","msg":"Error when calling Merak-Network: rpc error: code = Unknown desc = redis: nil","time":"2022-08-22T17:48:01Z"}
{"Service":"scenario-manager","level":"error","msg":"error return from grpc server: error when calling merak-network grpc server: rpc error: code = Unknown desc = redis: nil","time":"2022-08-22T17:48:01Z"}
{"Service":"scenario-manager","level":"error","msg":"'CHECK' network failed: deploy network failed, Error = 'error return from grpc server: error when calling merak-network grpc server: rpc error: code = Unknown desc = redis: nil'","time":"2022-08-22T17:48:01Z"}

Merak-Network: INFO after DELETE produces a DB not-found error

2022/09/08 18:10:23 OP type INFO
2022/09/08 18:10:23 Info
2022/09/08 18:10:23 VnetInfo
2022/09/08 18:10:23 DB GET netconfig:51800604dd6f4aae87fa0a5ef9b0e102
2022/09/08 18:10:23 DB Get Issue: redis: nil
2022/09/08 18:10:23 networkInfoReturn: %!s(chan *network.ReturnNetworkMessage=0xc00013e120)
2022/09/08 18:10:23 returnNetworkMessage <nil>

Merak Network should return a proper message in this case instead of nil.
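
One shape such a return could take is an explicit not-found reply whenever the key is missing. The sketch below uses a plain dictionary in place of Redis and Python in place of the service's own code, so it only illustrates the idea. The `netconfig:<id>` key format comes from the log above; the function name, the `NOT_FOUND` status, and the message texts are hypothetical.

```python
def network_info(store: dict, netconfig_id: str) -> dict:
    """Look up a network config and always return a well-formed reply."""
    key = f"netconfig:{netconfig_id}"
    config = store.get(key)
    if config is None:
        # Missing key: report it explicitly rather than returning nothing.
        return {"status": "NOT_FOUND",
                "message": f"{key} not found (it may already be deleted)",
                "data": None}
    return {"status": "OK", "message": "INFO done", "data": config}


store = {"netconfig:abc": {"name": "network-config-1"}}
print(network_info(store, "abc")["status"])
print(network_info(store, "51800604dd6f4aae87fa0a5ef9b0e102")["status"])
```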

Document: EKS configuration manual

We need a document or manual that lists the steps for EKS setup and configuration.
Please write the detailed procedures and descriptions either in a OneNote page or on GitHub.

Merak-Network: issue deleting subnet, port in subnet

2022/09/08 18:07:33 RequestCall Fail  409 {"timestamp":"2022-09-08T18:07:33.329+0000","status":409,"error":"Conflict","message":"There is some ports in the subnet, we can Not delete subnet","path":"/project/123456789/subnets/a272393a-1c53-4c6b-be2d-7a50b9188a3e"}
2022/09/08 18:07:33 returnErr {"timestamp":"2022-09-08T18:07:33.329+0000","status":409,"error":"Conflict","message":"There is some ports in the subnet, we can Not delete subnet","path":"/project/123456789/subnets/a272393a-1c53-4c6b-be2d-7a50b9188a3e"}

Merak Network: delete subnet got an error from Alcor and didn't return a proper message back to scenario manager

The Merak Network component got an error when scenario manager issued DELETE network.
Merak Network log:

2022/08/31 18:03:08 returnMessage {"subnet":{"id":"48c8e6dc-3a36-4ad1-ac38-4ec53096fc14","project_id":"123456789","tenant_id":"123456789","name":"YM_sample_subnet","description":null,"network_id":"a87c95cc-8ad7-444f-bb72-70471dcab184","cidr":"10.0.1.0/20","availability_zone":null,"gateway_ip":"10.0.0.1","gatewayPortId":"bdf133a1-af9d-44b2-afe6-bdcedb7854ce","gateway_port_detail":{"gateway_macAddress":"aa:bb:cc:98:6d:a4","gateway_port_id":"bdf133a1-af9d-44b2-afe6-bdcedb7854ce"},"attached_router_id":null,"port_detail":null,"enable_dhcp":true,"primary_dns":null,"secondary_dns":null,"dns_list":null,"ip_version":4,"ipV4_rangeId":"9effc294-726c-444b-b535-f9a62ece4c9c","ipV6_rangeId":null,"ipv6_address_mode":null,"ipv6_ra_mode":null,"revision_number":1,"segment_id":null,"shared":null,"sort_dir":null,"sort_key":null,"subnetpool_id":null,"dns_publish_fixed_ip":false,"tags":[],"tags-any":null,"not-tags":null,"not-tags-any":null,"fields":null,"dns_nameservers":[],"allocation_pools":[{"start":"10.0.0.1","end":"10.0.15.254"}],"host_routes":[],"prefixlen":null,"use_default_subnet_pool":false,"service_types":[],"created_at":"2022-08-31 17:39:27","updated_at":"2022-08-31 17:39:27"}}
2022/08/31 18:03:08 returnJson : {{48c8e6dc-3a36-4ad1-ac38-4ec53096fc14 123456789 123456789 YM_sample_subnet <nil> a87c95cc-8ad7-444f-bb72-70471dcab184 10.0.1.0/20 <nil> 10.0.0.1 bdf133a1-af9d-44b2-afe6-bdcedb7854ce {aa:bb:cc:98:6d:a4 bdf133a1-af9d-44b2-afe6-bdcedb7854ce}  <nil> %!s(bool=true) <nil> <nil> <nil> %!s(int=4) 9effc294-726c-444b-b535-f9a62ece4c9c <nil> <nil> <nil> %!s(int=1) <nil> <nil> <nil> <nil> <nil> %!s(bool=false) [] <nil> <nil> <nil> <nil> [] [{10.0.0.1 10.0.15.254}] [] <nil> %!s(bool=false) [] 2022-08-31 17:39:27 2022-08-31 17:39:27}}
2022/08/31 18:03:08 getSubnetRouter done
2022/08/31 18:03:08 deleteSubnet
2022/08/31 18:03:08 body ""
2022/08/31 18:03:18 RequestCall Fail  500 {"timestamp":"2022-08-31T18:03:18.292+0000","status":500,"error":"Internal Server Error","message":"I/O error on DELETE request for \"http://routemanager-service.default.svc.cluster.local:9003/project/123456789/subnets/48c8e6dc-3a36-4ad1-ac38-4ec53096fc14/routetable\": timeout; nested exception is java.net.SocketTimeoutException: timeout","path":"/project/123456789/subnets/48c8e6dc-3a36-4ad1-ac38-4ec53096fc14"}
2022/08/31 18:03:18 returnErr {"timestamp":"2022-08-31T18:03:18.292+0000","status":500,"error":"Internal Server Error","message":"I/O error on DELETE request for \"http://routemanager-service.default.svc.cluster.local:9003/project/123456789/subnets/48c8e6dc-3a36-4ad1-ac38-4ec53096fc14/routetable\": timeout; nested exception is java.net.SocketTimeoutException: timeout","path":"/project/123456789/subnets/48c8e6dc-3a36-4ad1-ac38-4ec53096fc14"}
2022/08/31 18:03:18 deleteVpc
2022/08/31 18:03:18 body ""
2022/08/31 18:03:18 returnMessage {"id":"a87c95cc-8ad7-444f-bb72-70471dcab184"}
2022/08/31 18:03:18 deleteVpc done
2022/08/31 18:03:18 deleteSg
2022/08/31 18:03:18 body ""
2022/08/31 18:03:18 returnMessage
2022/08/31 18:03:18 deleteVpc done
2022/08/31 18:03:18 returnMessage return_message:"returnNetworkMessage Finished" vpcs:{vpc_id:"a87c95cc-8ad7-444f-bb72-70471dcab184" tenant_id:"123456789" project_id:"123456789" subnets:{subnet_id:"48c8e6dc-3a36-4ad1-ac38-4ec53096fc14" subnet_cidr:"10.0.1.0/20" subnet_gw:"10.0.1.1" number_vms:100}} security_group_ids:"b2cb0a7a2fe94fad8674d5adb862fb3b"

Merak Network didn't return a proper message to scenario manager, which left scenario manager waiting indefinitely for the reply.
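
A possible fix is to collect the failure of each delete step and report it in the final reply, so the caller always gets an answer. The sketch below is illustrative only (Python, not the service's code). The step names are taken from the log above; `delete_network`, the reply fields, and the `ERROR` status are hypothetical, and the 500 message is abbreviated.

```python
def delete_network(steps):
    """Run every (name, action) step and report failures in the reply.

    Each action returns (http_code, message). Later steps still run after a
    failure, matching the log above where deleteVpc follows the failed
    deleteSubnet.
    """
    failures = []
    for name, action in steps:
        code, message = action()
        if code >= 400:
            failures.append({"step": name, "code": code, "message": message})
    return {"status": "ERROR" if failures else "OK", "failures": failures}


steps = [
    ("deleteSubnet", lambda: (500, "I/O error on DELETE request: timeout")),
    ("deleteVpc", lambda: (200, "")),
    ("deleteSg", lambda: (200, "")),
]
reply = delete_network(steps)
print(reply["status"], [f["step"] for f in reply["failures"]])
```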

Merak-Test

A new component for initiating cluster-wide network tests from scenario manager.
