
terraform-google-multinic's Introduction

Multi-nic VM Routing

This terraform module implements a Linux VM acting as an IP router between two VPC networks. The primary use case is an alternative to VPC peering and VPN tunneling for east-west connectivity.

Functionality:

  • ILB as Next Hop for high availability and reliability.
  • Auto-healing with persistence of established TCP connections.
  • Auto scaling based on CPU utilization. See Autoscaler for details.
  • Virtual wire behavior: traffic ingressing eth0 egresses eth1 and vice-versa.
  • Separate health checks for load balancing and auto-healing.
  • Multiple region support. See examples/multiregion/.
  • Cloud logging with structured log examples.
  • Fast startup and shutdown, no packages installed.
  • Systemd integration for easier control and logging.
  • Zero downtime upgrades of this module.
  • CentOS 8 base image.

Upgrades

When upgrading to a new version of this module, follow the process described in UPGRADE.md to avoid downtime.

Getting Started

The core functionality is implemented in the 50_compute/ nested module. This module is intended to be easily reused in your environment.

See examples/compute/ for a complete example which ties together the following resources.

See examples/networksetup for code to create VPC networks and other dependencies necessary to evaluate the solution.
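
A minimal evaluation workflow using these examples might look like the following, assuming a standard Terraform setup and that the required variables (project, region, networks) are supplied via a tfvars file or prompts:

# Create the VPC networks and other dependencies first.
cd examples/networksetup
terraform init && terraform apply

# Then deploy the multinic instances.
cd ../compute
terraform init && terraform apply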

The module operates similarly to GKE's model of one instance group per availability zone. Each VPC network has one Internal TCP/UDP Load Balancer forwarding rule. Each ILB distributes traffic across multiple instances in multiple zones within a single region.

  1. Multiple zonal Instance Groups.
  2. An auto-healing health check.
  3. Two regional backend services, one for each VPC.
  4. A traffic health check to control traffic distribution, separate from auto-healing.
  5. Two ILB forwarding rules, one for each VPC.

OS Images

Version 3.1.0 and later of this module pins the OS image used for multinic instances to a specific value. This ensures the same image is used as instances scale in, scale out, and are auto-healed. In addition, multiple runs of terraform will use the same image specified by the image_name input value.

See Deploying the Latest Image for additional information.
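
One way to look up the current image name to pin via the image_name input is to resolve an image family with gcloud; the centos-8 family and centos-cloud project shown here are assumptions based on the CentOS 8 base image:

gcloud compute images describe-from-family centos-8 \
  --project centos-cloud --format='value(name)'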

Requirements

ILB addresses should be in the same subnet as the associated network interface so that additional forwarding rules can be added and removed at runtime without having to reconfigure each multinic instance.

Routed VPC networks must currently be attached to nic0 and nic1. Additional VPC networks may be attached, but they are not configured for policy routing. A future enhancement may support an arbitrary number of attached networks.

Operational Playbook

Take an instance out of rotation with systemctl stop hc-traffic.service.

Start the auto-healing process with systemctl stop hc-health.service.

Exercise a kernel panic with systemctl start kpanic.service. This is useful for evaluating failure modes and recovery behavior.
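
For example, a drain-then-replace sequence on a single instance might look like the following; the sleep duration is an assumption and should be tuned to your health check configuration:

# Stop accepting new connections, give the ILB time to drain, then trigger auto-healing.
sudo systemctl stop hc-traffic.service
sleep 60
sudo systemctl stop hc-health.service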

Behavior

Draining

Stopping hc-traffic.service causes new connections to use healthy instances, if available. Existing connections flowing through this instance continue to do so.

See Balancing mode for more information.

Planned Maintenance

Stopping hc-health.service causes the instance group to auto-heal the instance. Existing connections flowing through this instance begin flowing through another healthy instance.

Established TCP connections remain established during the process.

The process takes ~30 to ~40 seconds with the following health check configuration.

check_interval_sec  = 10
timeout_sec         = 5
healthy_threshold   = 2
unhealthy_threshold = 3
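
For reference, this window is consistent with the health check arithmetic: roughly unhealthy_threshold × check_interval_sec = 3 × 10 s = 30 s to mark the instance unhealthy, plus up to one additional check interval depending on when the last successful check occurred.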

Auto-healed instances receive the same name as the instance they replace, but have a unique instance ID number.

Unplanned Maintenance

Triggering a kernel panic with systemctl start kpanic.service exercises auto-healing behavior. Existing connections pause for ~45 seconds with check_interval_sec=10 and unhealthy_threshold=3, then recover without disconnect.

Logging

Information regarding startup and shutdown events is logged to the project's Global resource under the multinic log name. For example:

gcloud logging read "logName=\"projects/${PROJECT_ID}/logs/multinic\""
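
To narrow the output, standard gcloud filter expressions and flags can be combined, for example:

gcloud logging read \
  "logName=\"projects/${PROJECT_ID}/logs/multinic\" AND severity>=WARNING" \
  --limit 20 --format json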

Health Checks

There are two types of health checks used in this solution:

  1. Managed Instance Group auto-healing checks.
  2. Load Balancing traffic distribution checks.

The MIG auto-healing health checks arrive on nic0.

The Load Balancing health checks arrive on the NIC attached to the network associated with the ILB forwarding rule being checked.
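
To observe the probes on a running instance, a quick sketch is to capture traffic from Google Cloud's documented health check source ranges; the interface shown is an assumption, so use the NIC relevant to the check you are inspecting:

sudo tcpdump -i eth0 -n 'src net 35.191.0.0/16 or src net 130.211.0.0/22'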

Helper Scripts

Helper scripts are included in the scripts directory.

rolling_update

Use this script after applying changes to the instance template used by the managed instance group. The helper script performs a Rolling Update, which replaces each instance in the vpc-link group with a new instance.
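
For example, to replace the instances in a single zonal group (the group name shown is illustrative):

./scripts/rolling_update vpc-link-myapp-us-central1-a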

panic

Triggers a kernel panic on one of the MIG instances. This is intended to exercise the behavior of auto-healing, and impact to application network flows in unplanned situations.

Policy Routing

Linux Policy Routing is configured with the following behavior:

There are two additional routing tables named nic0 and nic1. The tables are identical except for the default route:

  1. Table nic0 uses nic0's gateway as the default route.
  2. Table nic1 uses nic1's gateway as the default route.

Traffic with a source address in the subnet attached to nic1 uses the nic1 routing table. Similarly, traffic with a source address in the subnet attached to nic0 uses the nic0 table. This source traffic includes ILB addresses. See Requirements above.

Policy routing is configured on each instance using the policy-routing.service unit file, which executes /usr/bin/policy-routing.
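
To inspect the resulting configuration on a running instance:

sudo systemctl status policy-routing.service
ip rule show
ip route show table nic0
ip route show table nic1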

Startup and Shutdown scripts

The startup script is responsible for enabling ip forwarding, configuring policy routing in the instance, and starting the health check endpoints.

To re-run the startup script manually, log into the instance and run:

sudo DEBUG=1 google_metadata_script_runner --script-type startup --debug

The shutdown script is responsible for signaling that the load balancer should take the instance out of rotation. It does this by stopping the hc-traffic service, then sleeping. The sleep is intended to maintain service levels until the load balancer health check takes the instance out of service. To re-run the shutdown script manually:

sudo DEBUG=1 google_metadata_script_runner --script-type shutdown --debug

Benchmarking

Due to VPC network limits in GCP, the number of link instances in the Managed Instance Group determines the total bandwidth available between the VPCs. These limits differ from quotas in that they cannot be changed. To measure the impact of these limits, the following benchmarks were gathered using iPerf2.

Maximum ingress data rate
  • Limit: Depends on machine type.
  • Notes: GCP does not artificially cap VM instance inbound or ingress traffic. VMs are allowed to receive as much traffic as resources and network conditions allow. For purposes of capacity planning, you should assume that each VM instance can handle no more than 10 Gbps of external Internet traffic. This value is an approximation, is not covered by an SLA, and is subject to change. Adding alias IP addresses or multiple network interfaces to a VM does not increase its ingress capacity.

Maximum egress data rate
  • Limit: Depends on the machine type of the VM:
    • All shared-core machine types are limited to 1 Gbps.
    • 2 Gbps per vCPU, up to 32 Gbps per VM, for machine types that use the Skylake CPU platform with 16 or more vCPUs. This egress rate is also available for ultramem machine types.
    • 2 Gbps per vCPU, up to 16 Gbps per VM, for all other machine types with eight or more vCPUs.
  • Notes: Egress traffic is the total outgoing bandwidth shared among all network interfaces of a VM. It includes data transfer to persistent disks connected to the VM.

Tests

Client

iperf -c <INTERNAL IP ADDRESS> -P 100 -t 60

  • -c: run as a client.
  • -P: run multiple parallel client streams (if run with a single client stream, all traffic goes through one VM).
  • -t: seconds to run the test (allow a large number to average out results).

Server

iperf -s

  • -s: run as a server.

Descriptions

  • Client VM - A test VM inside the Service Project, in a local VPC.
  • Server VM - A test VM inside the host project, in a subnet of the Shared VPC.
  • Link VM - A VM used to route traffic between VPCs; lives inside the Service Project.
  • Potential Bandwidth - Limits according to the above chart from the GCP documentation.
  • Actual Bandwidth - Output of the tests run by iperf.
    • These results are consolidated into a single number if there are multiple clients/servers running at once.

Test 1

  • 3 Link VMs, 8 vCPU each - potential egress 16 Gbps each
  • 2 Client VMs, 16 vCPU each - multi-stream clients
  • 2 Server VMs, 16 vCPU each

Run simultaneously:

  • Client 1 to Server 1 - 24.4 Gbps
  • Client 2 to Server 2 - 22.4 Gbps

Potential Bandwidth - 48 Gbps
Actual Bandwidth - 46.8 Gbps

Test 2

  • 1 Link VM, 16 vCPU - potential egress 32 Gbps
  • 1 Client VM, 16 vCPU - multi-stream client
  • 1 Server VM, 16 vCPU

Potential Bandwidth - 32 Gbps
Actual Bandwidth - 30.2 Gbps

Test 3

  • 1 Link VM, 8 vCPU - potential egress 16 Gbps
  • 1 Client VM, 16 vCPU - multi-stream client
  • 1 Server VM, 16 vCPU

Potential Bandwidth - 16 Gbps
Actual Bandwidth - 14.3 Gbps

Test 4

  • 1 Link VM, 8 vCPU - potential egress 16 Gbps
  • 1 Client VM, 16 vCPU - single-stream client
  • 1 Server VM, 16 vCPU

Potential Bandwidth - 16 Gbps
Actual Bandwidth - 13.4 Gbps

References

Red Hat Enterprise Linux Network Performance Tuning Guide provides detailed information on tuning network interfaces. It focuses on TCP, which is not relevant to the stateless IP routing nature of the vpc-link router instances, but it is full of useful information, like detecting dropped packets.


terraform-google-multinic's Issues

Document behavior of systemctl restart google-network-daemon

Summary

When executing systemctl restart google-network-daemon there is a ~3 second window where packets are dropped while the secondary network interfaces are reconfigured.

This needs to be documented in the operational notes.

Affected version: 0.4.3-4-g4ccf4f9

Investigate Autoscaling

Discovery

See: Autoscaling groups of instances

Decisions

Metrics of use:

The capacity and current_utilization metrics of the autoscaler might be useful.

capacity - Utilization target multiplied by number of serving VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 180 seconds.

current_utilization - The sum of the utilization of a specified metric for all serving VMs. Sampled every 60 seconds. After sampling, data is not visible for up to 180 seconds.

instance/network/sent_bytes_count - Delta count of bytes sent over the network. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.

l3/internal/ingress_bytes_count - The number of bytes sent from client to ILB backend (for TCP flows it's counting bytes on application stream only). Sampled every 60 seconds. After sampling, data is not visible for up to 150 seconds.
client_network: Network of the client instance in ILB flow.
client_subnetwork: Subnetwork of the client instance in ILB flow.
client_zone: Zone of the client instance in ILB flow.
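
For reference, a CPU utilization policy (the approach listed in the feature summary above) can be applied to a zonal managed instance group with gcloud; the group name, zone, and values below are placeholders:

gcloud compute instance-groups managed set-autoscaling GROUP_NAME \
  --zone ZONE \
  --min-num-replicas 1 \
  --max-num-replicas 4 \
  --target-cpu-utilization 0.6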

Make systemctl restart google-network-daemon work

Summary

Consider a vpc-link instance which has completed startup and is forwarding packets correctly. When the google-network-daemon service is restarted, the instance no longer forwards packets correctly.

When a client (10.19.16.45) on the local VPC attached to eth1 pings 10.0.0.6 in the shared VPC, the ICMP reply generates a huge number of packets which are "stuck" on eth0:

On the client (10.19.16.45) instance:

ping 10.0.0.6 -c 1

On the IP router instance, note the ICMP echo reply getting "stuck" on eth0 when it should be forwarded out eth1.

sudo tcpdump -i eth0 host 10.0.0.6 -n
00:22:27.876266 IP 10.19.16.45 > 10.0.0.6: ICMP echo request, id 25139, seq 1, length 64
00:22:27.881121 IP 10.0.0.6 > 10.19.16.45: ICMP echo reply, id 25139, seq 1, length 64
00:22:27.881126 IP 10.0.0.6 > 10.19.16.45: ICMP echo reply, id 25139, seq 1, length 64
00:22:27.881225 IP 10.0.0.6 > 10.19.16.45: ICMP echo reply, id 25139, seq 1, length 64
00:22:27.881230 IP 10.0.0.6 > 10.19.16.45: ICMP echo reply, id 25139, seq 1, length 64
<huge number of same id packets>
00:22:27.882245 IP 10.0.0.6 > 10.19.16.45: ICMP echo reply, id 25139, seq 1, length 64
00:22:27.882301 IP 10.0.0.6 > 10.19.16.45: ICMP echo reply, id 25139, seq 1, length 64
00:22:27.882325 IP 10.0.17.55 > 10.0.0.6: ICMP time exceeded in-transit, length 92

Root cause

The default via 10.0.3.1 dev eth1 route should exist in table rt1, but it does not after dhclient completes on eth1 (run by google-network-daemon):

Correct table

# ip route show table rt1
default via 10.0.3.1 dev eth1
10.0.3.0/24 dev eth1 scope link src 10.0.3.55

Incorrect table

The following is the table rt1 after dhclient operates against eth1. Note the missing default route, which causes the packet to go back out eth0.

# ip route show table rt1
10.0.3.0/24 dev eth1 scope link src 10.0.3.55

Next steps

  1. Configure ip route add default via "${gateway}" dev eth1 table rt1 using a dhclient exit hook (see the sketch after this list).
  2. Do not configure route table setup and ip rules via dhclient exit hooks, otherwise they stack up; ensure ip rules are configured only once at boot.
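
A sketch of the exit hook for step 1, assuming the RHEL/CentOS convention that dhclient-script sources /etc/dhcp/dhclient-exit-hooks and exports the interface and new_routers variables:

# /etc/dhcp/dhclient-exit-hooks (sketch; a production hook would also check $reason)
if [ "${interface}" = "eth1" ] && [ -n "${new_routers}" ]; then
  # Restore the default route in table rt1 after dhclient reconfigures eth1.
  ip route replace default via "${new_routers%% *}" dev eth1 table rt1
fi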

/etc/hosts to easily track cases:

10.0.0.6    server-east
10.0.17.55  router-east # eth0
10.0.3.55   router-west # eth1
10.19.16.45 client-west

Verification

  • Client can ping router-west
  • Client can ping server-east
  • Server can ping router-east
  • Server can ping client-west

Remove nic0_cidrs, NIC0_CIDRS, nic1_cidrs, NIC1_CIDRS

The routing policy has been updated to be "like a wire": traffic coming in nic0 flows out nic1 and vice-versa, so the variables and configuration for the CIDR ranges are no longer used.

Remove them from the examples' input variables to avoid confusion.

Never used:

❯ rg NIC0_CIDRS
modules/50_compute/templates/startup-script-config.tpl
16:NIC0_CIDRS="${nic0_cidrs}"
❯ rg NIC1_CIDRS
modules/50_compute/templates/startup-script-config.tpl
18:NIC1_CIDRS="${nic1_cidrs}"

Make scripts/rolling_update create before deleting

Summary

When executing ./scripts/rolling_update vpc-link-myapp-us-central1-a with num_instances=1, the running instance is deleted at the same time its replacement is being created. This creates a window of time where packets are dropped because there are no operational vpc-link instances or associated route resources.

Affected version: 0.4.3-4-g4ccf4f9

Replace iptables fwmark with iproute2 rule

iptables marking is unnecessary and introduces a dependency on netfilter. To eliminate this dependency, switch to using iproute2 rules to implement the "virtual wire":

ip rule add from all iif eth0 lookup viaeth1
ip rule add from all iif eth1 lookup viaeth0

See ip-rule

          iif NAME
                 select the incoming device to match. If the interface
                 is loopback, the rule only matches packets originating
                 from this host. This means that you may create separate
                 routing tables for forwarded and local packets and,
                 hence, completely segregate them.

Resolve race between google-network-daemon and google-startup-scripts

Summary:

The configuration of policy routing is not reliable on boot. A new instance with incorrect policy routing results in packets forwarded out eth0 instead of eth1:

[jmccune@endpoint-vpc-link-shared-vpc-1 ~]$ ping 10.19.16.45
PING 10.19.16.45 (10.19.16.45) 56(84) bytes of data.
From 10.0.17.54 icmp_seq=1 Time to live exceeded
From 10.0.17.54 icmp_seq=2 Time to live exceeded
From 10.0.17.54 icmp_seq=3 Time to live exceeded
From 10.0.17.54 icmp_seq=4 Time to live exceeded
From 10.0.17.54 icmp_seq=5 Time to live exceeded
From 10.0.17.54 icmp_seq=6 Time to live exceeded

The TTL is exceeded because a packet sent out eth0 is sent right back into eth0 by the Shared VPC static route.

The incorrect policy routing configuration is likely caused by a race between the google-network-daemon.service and the google-startup-scripts.service:

journalctl -u google-startup-scripts.service

-- Logs begin at Fri 2019-08-23 20:32:29 UTC, end at Fri 2019-08-23 21:29:27 UTC. --
Aug 23 20:32:43 vpc-link-lfwd-us-central1-a-px22 systemd[1]: Starting Google Compute Engine Startup Scripts...
Aug 23 20:32:44 vpc-link-lfwd-us-central1-a-px22 startup-script[1468]: INFO Starting startup scripts.
Aug 23 20:32:44 vpc-link-lfwd-us-central1-a-px22 startup-script[1468]: INFO Found startup-script in metadata.
Aug 23 20:32:44 vpc-link-lfwd-us-central1-a-px22 startup-script[1468]: INFO startup-script: Fri Aug 23 20:32:44 +0000 2019 Debug [1514]: BEGIN: stdlib::cmd() command=[systemctl restart systemd-sysctl.service]
Aug 23 20:32:45 vpc-link-lfwd-us-central1-a-px22 startup-script[1468]: INFO startup-script: Fri Aug 23 20:32:45 +0000 2019 Debug [1514]: END: stdlib::cmd() command=[systemctl restart systemd-sysctl.service] exit_code=0
Aug 23 20:32:45 vpc-link-lfwd-us-central1-a-px22 startup-script[1468]: INFO startup-script: Fri Aug 23 20:32:45 +0000 2019 Info [1514]: IP Forwarding enabled via /etc/sysctl.d/50-ip-router.conf
Aug 23 20:32:45 vpc-link-lfwd-us-central1-a-px22 startup-script[1468]: INFO startup-script: Fri Aug 23 20:32:45 +0000 2019 Debug [1514]: BEGIN: stdlib::cmd() command=[ip route add 10.0.3.0/24 src 10.0.3.55 dev eth1 table rt1]
Aug 23 20:32:45 vpc-link-lfwd-us-central1-a-px22 startup-script[1468]: INFO startup-script: Fri Aug 23 20:32:45 +0000 2019 Debug [1514]: END: stdlib::cmd() command=[ip route add 10.0.3.0/24 src 10.0.3.55 dev eth1 table rt1] exit_co
Aug 23 20:32:45 vpc-link-lfwd-us-central1-a-px22 startup-script[1468]: INFO startup-script: Fri Aug 23 20:32:45 +0000 2019 Debug [1514]: BEGIN: stdlib::cmd() command=[ip route add default via 10.0.3.1 dev eth1 table rt1]
Aug 23 20:32:45 vpc-link-lfwd-us-central1-a-px22 startup-script[1468]: INFO startup-script: Fri Aug 23 20:32:45 +0000 2019 Debug [1514]: END: stdlib::cmd() command=[ip route add default via 10.0.3.1 dev eth1 table rt1] exit_code=0
Aug 23 20:32:45 vpc-link-lfwd-us-central1-a-px22 startup-script[1468]: INFO startup-script: Fri Aug 23 20:32:45 +0000 2019 Debug [1514]: BEGIN: stdlib::cmd() command=[ip rule add from 10.0.3.55/32 table rt1]
Aug 23 20:32:45 vpc-link-lfwd-us-central1-a-px22 startup-script[1468]: INFO startup-script: Fri Aug 23 20:32:45 +0000 2019 Debug [1514]: END: stdlib::cmd() command=[ip rule add from 10.0.3.55/32 table rt1] exit_code=0
Aug 23 20:32:45 vpc-link-lfwd-us-central1-a-px22 startup-script[1468]: INFO startup-script: Fri Aug 23 20:32:45 +0000 2019 Debug [1514]: BEGIN: stdlib::cmd() command=[ip rule add to 10.0.3.55/32 table rt1]
Aug 23 20:32:45 vpc-link-lfwd-us-central1-a-px22 startup-script[1468]: INFO startup-script: Fri Aug 23 20:32:45 +0000 2019 Debug [1514]: END: stdlib::cmd() command=[ip rule add to 10.0.3.55/32 table rt1] exit_code=0
Aug 23 20:32:45 vpc-link-lfwd-us-central1-a-px22 startup-script[1468]: INFO startup-script: Fri Aug 23 20:32:45 +0000 2019 Debug [1514]: BEGIN: stdlib::cmd() command=[ip rule add to 10.19.16.0/20 table rt1]
Aug 23 20:32:45 vpc-link-lfwd-us-central1-a-px22 startup-script[1468]: INFO startup-script: Fri Aug 23 20:32:45 +0000 2019 Debug [1514]: END: stdlib::cmd() command=[ip rule add to 10.19.16.0/20 table rt1] exit_code=0
journalctl -u google-network-daemon
-- Logs begin at Fri 2019-08-23 20:32:29 UTC, end at Fri 2019-08-23 21:29:27 UTC. --
Aug 23 20:32:43 vpc-link-lfwd-us-central1-a-px22 systemd[1]: Started Google Compute Engine Network Daemon.
Aug 23 20:32:44 vpc-link-lfwd-us-central1-a-px22 google-networking[1462]: INFO Starting Google Networking daemon.
Aug 23 20:32:44 vpc-link-lfwd-us-central1-a-px22 network-setup[1462]: INFO Disabling IPv6 on Ethernet interface: ['eth0'].
Aug 23 20:32:44 vpc-link-lfwd-us-central1-a-px22 network-setup[1462]: INFO Calling Dhclient for IPv6 configuration on the Ethernet interfaces ['eth0'].
Aug 23 20:32:44 vpc-link-lfwd-us-central1-a-px22 dhclient[1519]: Internet Systems Consortium DHCP Client 4.2.5
Aug 23 20:32:44 vpc-link-lfwd-us-central1-a-px22 dhclient[1519]: Copyright 2004-2013 Internet Systems Consortium.
Aug 23 20:32:48 vpc-link-lfwd-us-central1-a-px22 google_network_daemon[1462]: Internet Systems Consortium DHCP Client 4.2.5
Aug 23 20:32:48 vpc-link-lfwd-us-central1-a-px22 google_network_daemon[1462]: Copyright 2004-2013 Internet Systems Consortium.
Aug 23 20:32:48 vpc-link-lfwd-us-central1-a-px22 google_network_daemon[1462]: All rights reserved.
Aug 23 20:32:48 vpc-link-lfwd-us-central1-a-px22 google_network_daemon[1462]: For info, please visit https://www.isc.org/software/dhcp/
Aug 23 20:32:48 vpc-link-lfwd-us-central1-a-px22 google_network_daemon[1462]: Listening on Socket/eth0
Aug 23 20:32:48 vpc-link-lfwd-us-central1-a-px22 google_network_daemon[1462]: Sending on   Socket/eth0
Aug 23 20:32:44 vpc-link-lfwd-us-central1-a-px22 dhclient[1519]: All rights reserved.
Aug 23 20:32:44 vpc-link-lfwd-us-central1-a-px22 dhclient[1519]: For info, please visit https://www.isc.org/software/dhcp/
Aug 23 20:32:44 vpc-link-lfwd-us-central1-a-px22 dhclient[1519]: [2B blob data]
Aug 23 20:32:44 vpc-link-lfwd-us-central1-a-px22 dhclient[1519]: Listening on Socket/eth0
Aug 23 20:32:44 vpc-link-lfwd-us-central1-a-px22 dhclient[1519]: Sending on   Socket/eth0
Aug 23 20:32:49 vpc-link-lfwd-us-central1-a-px22 network-setup[1462]: WARNING Could not release IPv6 lease on interface ['eth0'].
Aug 23 20:32:49 vpc-link-lfwd-us-central1-a-px22 network-setup[1462]: INFO Ethernet interfaces: ['eth1'].
Aug 23 20:32:49 vpc-link-lfwd-us-central1-a-px22 network-setup[1462]: INFO Created config file for interface eth1.
Aug 23 20:32:49 vpc-link-lfwd-us-central1-a-px22 network-setup[1462]: INFO Enabling the Ethernet interfaces ['eth1'].
Aug 23 20:32:49 vpc-link-lfwd-us-central1-a-px22 dhclient[1699]: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 6 (xid=0x66b0146c)
Aug 23 20:32:49 vpc-link-lfwd-us-central1-a-px22 dhclient[1699]: DHCPREQUEST on eth1 to 255.255.255.255 port 67 (xid=0x66b0146c)
Aug 23 20:32:49 vpc-link-lfwd-us-central1-a-px22 dhclient[1699]: DHCPOFFER from 169.254.169.254
Aug 23 20:32:49 vpc-link-lfwd-us-central1-a-px22 dhclient[1699]: DHCPACK from 169.254.169.254 (xid=0x66b0146c)

Next steps:

  • Ensure policy routing is configured after google-network-daemon completes (see the sketch after this list)
  • Determine if TTL exceeded still occurs after boot. If so, create a new issue. If not, record it as verified in this issue.
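
One possible way to enforce that ordering is a systemd drop-in that delays the startup scripts until the network daemon has started; this is a sketch, not the module's current implementation, and note that After= only orders against the daemon starting, not completing its DHCP work:

sudo mkdir -p /etc/systemd/system/google-startup-scripts.service.d
sudo tee /etc/systemd/system/google-startup-scripts.service.d/ordering.conf >/dev/null <<'EOF'
[Unit]
After=google-network-daemon.service
Wants=google-network-daemon.service
EOF
sudo systemctl daemon-reload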

Check the health of IP forwarding

The health check today (0.5.0) does not indicate the health of IP forwarding. To increase the robustness of the health check, an end-to-end test should be performed.

Note there is an officially documented example of this for the ECMP NAT gateway solution; see its startup script (startup.sh).
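
A minimal sketch of what a more robust local check could verify, in addition to a true end-to-end probe (table names follow the Policy Routing section above):

#!/bin/bash
# Fail the health check if IP forwarding or either policy routing default route is missing.
[ "$(cat /proc/sys/net/ipv4/ip_forward)" = "1" ] || exit 1
ip route show table nic0 | grep -q '^default via' || exit 1
ip route show table nic1 | grep -q '^default via' || exit 1
exit 0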

Remove hardcoded zones

Zone a does not exist in at least two GCP regions; read the list of available zones from remote data instead of hardcoding them.
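
For example, the zones actually available in a region can be listed with gcloud (or read at plan time with the google_compute_zones data source); the region shown is a placeholder:

gcloud compute zones list --filter="region:us-west1"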

Reduce logspam from the health check services

Noticed this in /var/log/messages

Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.173 - - [14/Sep/2020 10:12:58] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.161 - - [14/Sep/2020 10:12:59] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.173 - - [14/Sep/2020 10:12:59] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.163 - - [14/Sep/2020 10:13:00] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.165 - - [14/Sep/2020 10:13:00] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.171 - - [14/Sep/2020 10:13:00] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.173 - - [14/Sep/2020 10:13:01] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.161 - - [14/Sep/2020 10:13:02] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.173 - - [14/Sep/2020 10:13:02] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.163 - - [14/Sep/2020 10:13:03] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.165 - - [14/Sep/2020 10:13:03] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.171 - - [14/Sep/2020 10:13:03] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.173 - - [14/Sep/2020 10:13:04] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.161 - - [14/Sep/2020 10:13:05] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.173 - - [14/Sep/2020 10:13:05] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.163 - - [14/Sep/2020 10:13:06] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.165 - - [14/Sep/2020 10:13:06] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.171 - - [14/Sep/2020 10:13:06] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.173 - - [14/Sep/2020 10:13:07] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.161 - - [14/Sep/2020 10:13:08] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.173 - - [14/Sep/2020 10:13:08] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.163 - - [14/Sep/2020 10:13:09] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.165 - - [14/Sep/2020 10:13:09] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.171 - - [14/Sep/2020 10:13:09] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.173 - - [14/Sep/2020 10:13:10] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.161 - - [14/Sep/2020 10:13:11] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.173 - - [14/Sep/2020 10:13:11] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.163 - - [14/Sep/2020 10:13:12] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.165 - - [14/Sep/2020 10:13:12] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.171 - - [14/Sep/2020 10:13:12] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.173 - - [14/Sep/2020 10:13:13] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.161 - - [14/Sep/2020 10:13:14] "GET /status.json HTTP/1.1" 200 -
Sep 14 10:13:39 multinic-a-x9w3 python3[2811]: 130.211.2.173 - - [14/Sep/2020 10:13:14] "GET /status.json HTTP/1.1" 200 -

Align MIG update policy with GKE (maxSurge=1 maxUnavailable=0)

The update policy of max_unavailable_fixed = 1 could result in capacity being reduced during a rolling update, causing disruption. If the autoscaler has a target size of 6, then one instance may be removed while the replacement instance is still in the process of being created.

The default behavior should align with GKE; see Determining your optimal surge configuration:

All new node pools are automatically configured to use surge upgrades (maxSurge=1 maxUnavailable=0).
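
For reference, this corresponds to max_surge_fixed = 1 and max_unavailable_fixed = 0 in the MIG update_policy block, and the same surge settings can be expressed for a one-off rolling update with gcloud; the group, template, and zone below are placeholders:

gcloud compute instance-groups managed rolling-action start-update GROUP_NAME \
  --version=template=TEMPLATE_NAME \
  --max-surge=1 --max-unavailable=0 \
  --zone=ZONE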

Resolve conflict with google-guest-agent.service

Noticed this:

sudo systemctl restart google-guest-agent.service

Causes an established TCP session to stop working:

23:01:03.463575 IP endpoint-main-general-qtw8.c.multinic-networks-18d1.internal.ssh > 10.36.0.6.53152: Flags [.], seq 16135564:16136972, ack 11089, win 328, options [nop,nop,TS val 3866343826 ecr 3328104240], length 1408
23:01:03.464128 IP endpoint-main-general-qtw8.c.multinic-networks-18d1.internal.ssh > 10.36.0.6.53152: Flags [.], seq 16135564:16136972, ack 11089, win 328, options [nop,nop,TS val 3866343826 ecr 3328104240], length 1408
23:01:03.464152 IP multinic-b-jtwx.c.multinic-networks-18d1.internal > endpoint-main-general-qtw8.c.multinic-networks-18d1.internal: ICMP time exceeded in-transit, length 556

After the restart, the nic1 table isn't correct:

[jeff@multinic-b-jtwx ~]$ sudo ip route show table nic1
10.33.0.1 dev eth0 scope link

It should be:

[jeff@multinic-a-x9w3 ~]$ sudo ip route show table nic1
default via 10.37.0.1 dev eth1
10.33.0.1 dev eth0 scope link
10.37.0.1 dev eth1 scope link

For reference, here's the policy based routing setup:

#! /bin/bash
# These tables manage default routes based on policy.
if ! grep -qx '10 nic0' /etc/iproute2/rt_tables; then
  echo "10 nic0" >> /etc/iproute2/rt_tables
fi
if ! grep -qx '11 nic1' /etc/iproute2/rt_tables; then
  echo "11 nic1" >> /etc/iproute2/rt_tables
fi

## These are essentially the same tables, just different default gateways.
# Traffic addresses attached to the nic0 primary interface
ip route add default via "10.33.0.1" dev eth0 table nic0
ip route add "10.33.0.1" dev eth0 scope link table nic0
ip route add "10.37.0.1" dev eth1 scope link table nic0
# Traffic addresses attached to the nic1 primary interface
ip route add default via "10.37.0.1" dev eth1 table nic1
ip route add "10.33.0.1" dev eth0 scope link table nic1
ip route add "10.37.0.1" dev eth1 scope link table nic1

# NOTE: These route rules are not cleared by dhclient, they persist.
ip rule add from "10.33.0.54" table nic0
ip rule add from "10.37.0.54" table nic1
# ILB IP addresses are expected to be in the nic's subnet.
ip rule add from "10.33.0.54/20" table nic0
ip rule add from "10.37.0.54/20" table nic1
# Firewall marking
iptables -A PREROUTING -i eth0 -t mangle -j MARK --set-mark 1
iptables -A PREROUTING -i eth1 -t mangle -j MARK --set-mark 2
# Packets ingress nic0 egress nic1
ip rule add fwmark 1 table nic1
# Packets ingress nic1 egress nic0
ip rule add fwmark 2 table nic0
# Netblocks via VPC default gateways
ip route flush cache
ip rule
[jeff@multinic-b-jtwx ~]$ sudo ip route show table nic1 | tee before
default via 10.37.0.1 dev eth1
10.33.0.1 dev eth0 scope link
10.37.0.1 dev eth1 scope link
[jeff@multinic-b-jtwx ~]$ sudo systemctl restart google-guest-agent.service
[jeff@multinic-b-jtwx ~]$ sudo ip route show table nic1 | tee after
10.33.0.1 dev eth0 scope link

TODO

Investigate NetworkManager-dispatcher-routing-rules

Summary: Likely not a solution #10 (comment)

  • Determine if RHEL7 & CentOS7 policy based routing is a solution. This seems like the most likely solution.
  • Determine if NetworkManager-config-routing-rules package is a solution. "then create /etc/sysconfig/network-scripts/route-XXX files where XXX is the interface name."

Dig through the guest-agent source

Don't set MIG target_size when autoscaler is enabled

Running Terraform when the autoscaler is enabled forces the instances to scale down, which may have a negative effect.

The target_size documentation notes:

(Optional) The target number of running instances for this managed instance group. This value should always be explicitly set unless this resource is attached to an autoscaler, in which case it should never be set. Defaults to 0.

Error: Invalid prefix for given prefix length

Using version 2.0.0. Potentially related to:

Dec 23 16:25:59 multinic-us-east1-trxs systemd[1]: Started System Logging Service.
Dec 23 16:25:59 multinic-us-east1-trxs policy-routing[1587]: Error: Invalid prefix for given prefix length.
Dec 23 16:25:59 multinic-us-east1-trxs policy-routing[1587]: Error: Invalid prefix for given prefix length.
Dec 23 16:25:59 multinic-us-east1-trxs policy-routing[1587]: Error: Invalid prefix for given prefix length.
Dec 23 16:25:59 multinic-us-east1-trxs policy-routing[1587]: Error: Invalid prefix for given prefix length.
Dec 23 16:25:59 multinic-us-east1-trxs systemd[1]: Started Configure Policy Routing to behave as a virtual wire.

"This is causing the health checks to fail on new instances."

Custom startup scripts

Would it be possible to support a custom startup script that would run after the default routing startup and configuration scripts?

This could be used with the image_project/image_name parameters to use a custom centos-8 based image and configure additional monitoring, security, etc. in the running VMs. In our case, we would like to configure monitoring on the VMs.

Set instance group size to 0 when num_instances=0

In the following situation, instances remain running because the instance group target_size is unmanaged when autoscaling is enabled.

First, manage a region using examples/multiregion with num_instances=1, autoscale=true.

Second, re-run terraform with num_instances=0.

Observe instances remain running:

gcloud compute instance-groups managed list
NAME                LOCATION    SCOPE  BASE_INSTANCE_NAME  SIZE  TARGET_SIZE  INSTANCE_TEMPLATE                      AUTOSCALED
multinic-v3-8ea6b9  us-west1-a  zone   multinic-v3         2     2            multinic-v320210114002715562800000001  yes
multinic-v3-d350bf  us-west1-b  zone   multinic-v3         1     1            multinic-v320210114002715562800000001  yes
multinic-v3-6bc66c  us-west1-c  zone   multinic-v3         1     1            multinic-v320210114002715562800000001  yes
multinic-v3-ccc106  us-west2-c  zone   multinic-v3         1     1            multinic-v320210114002715563700000002  yes
multinic-v3-0585e8  us-west2-b  zone   multinic-v3         1     1            multinic-v320210114002715563700000002  yes
multinic-v3-07ea5d  us-west2-a  zone   multinic-v3         1     1            multinic-v320210114002715563700000002  yes

This happens because Terraform does not manage the instance group target size when autoscale is true. This is intentional, because if Terraform managed the instance group's target size it would fight the autoscaler.

The exception is when the target size should be zero. In this case, Terraform should override the autoscaler and force the instance group to a target size of zero.

# 50_compute/main.tf
  # See https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance_group_manager#target_size
  # This value should always be explicitly set unless this resource is attached
  # to an autoscaler, in which case it should never be set.
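  # A possible refinement for this issue (sketch): force a zero target size even
  # when autoscaling is enabled, e.g.
  #   target_size = var.num_instances == 0 ? 0 : (var.autoscale ? null : var.num_instances)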
  target_size = var.autoscale ? null : var.num_instances
