keepalived-based active/standby failover in an AWS VPC network
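Nothing in the cloud strictly requires keepalived; it is mostly on-premises thinking that traditional enterprises cannot shake off. That said, ELB costs money, and saving money is an important factor too, especially when the workload is not that critical but still needs HA.
However, AWS VPC does not support VRRP multicast, which forces many commercial firewall/LB vendors that run on premises (F5, Radware, A10, Sangfor and so on) to modify their products when they move to the cloud. Yet no official document states whether AWS VPC supports it or not. An official blog post can, to some degree, be read as "you can build this yourself, but we will not provide official technical support for it", and some people still miss that point.
Of course, the vagueness of the official blogs is a problem as well. SAs write many posts about setups that are outside the official support scope but do run on the AWS platform, and they do not explain them very clearly. If you follow such a post step by step without understanding it, you will fail.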
First, create two EC2 instances based on Amazon Linux 2.
Install the required packages:
sudo yum install keepalived jq -y
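In the web console, set IMDSv2 to Optional on both instances (the failover script below reads the instance metadata with a plain curl, without a session token).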
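If you would rather leave IMDSv2 set to Required, the instance ID can be read with a session token instead. The following is only a rough sketch of that alternative, not part of the tested setup: the helper name `imds_instance_id` and the 60-second token TTL are illustrative assumptions, while the endpoint path and header names are the standard IMDSv2 ones.

```shell
#!/bin/bash
# Sketch: fetch the instance ID via IMDSv2 (token-based) instead of IMDSv1.
# imds_instance_id is a hypothetical helper name, not from the original setup.
IMDS=http://169.254.169.254/latest

imds_instance_id() {
    local token
    # Step 1: request a short-lived session token (TTL in seconds).
    token=$(curl -qs -X PUT "$IMDS/api/token" \
        -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || return 1
    [ -n "$token" ] || return 1
    # Step 2: present the token when reading the metadata.
    curl -qs -H "X-aws-ec2-metadata-token: $token" \
        "$IMDS/meta-data/instance-id"
}
```

In the failover script, `inst=$(imds_instance_id)` would then take the place of the plain curl call.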
======
ec2 10.4.133.196 is node-1 MASTER
keepalived.conf
global_defs {
    router_id node-1
}
vrrp_script health_check {
    script /etc/keepalived/health-check.sh
    interval 2
}
vrrp_instance VI_1 {
    state MASTER
    debug 2
    interface eth0
    virtual_router_id 10
    priority 100
    advert_int 1
    unicast_peer {
        10.4.129.250
    }
    track_script {
        health_check
    }
    notify_master /etc/keepalived/i-am-master.sh
}
======
ec2 10.4.129.250 is node-2 BACKUP
keepalived.conf
global_defs {
    router_id node-2
}
vrrp_script health_check {
    script /etc/keepalived/health-check.sh
    interval 2
}
vrrp_instance VI_1 {
    state BACKUP
    debug 2
    interface eth0
    virtual_router_id 10
    priority 50
    advert_int 1
    unicast_peer {
        10.4.133.196
    }
    track_script {
        health_check
    }
    notify_master /etc/keepalived/i-am-master.sh
}
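The two configuration files differ only in router_id, state, priority and the unicast peer address. As a hedged sketch, both could be rendered from one template so the shared parts cannot drift apart; `render_keepalived_conf` is a hypothetical helper, and the values passed to it are the ones used in this post.

```shell
#!/bin/bash
# Sketch: render keepalived.conf for either node from a single template.
# render_keepalived_conf is a hypothetical helper name.
# Arguments: router_id, initial state, priority, peer IP.
render_keepalived_conf() {
    local router_id=$1 state=$2 priority=$3 peer_ip=$4
    cat <<EOF
global_defs {
    router_id ${router_id}
}
vrrp_script health_check {
    script /etc/keepalived/health-check.sh
    interval 2
}
vrrp_instance VI_1 {
    state ${state}
    interface eth0
    virtual_router_id 10
    priority ${priority}
    advert_int 1
    unicast_peer {
        ${peer_ip}
    }
    track_script {
        health_check
    }
    notify_master /etc/keepalived/i-am-master.sh
}
EOF
}
```

On node-1 this would be called as `render_keepalived_conf node-1 MASTER 100 10.4.129.250`, and on node-2 as `render_keepalived_conf node-2 BACKUP 50 10.4.133.196`, with the output redirected to /etc/keepalived/keepalived.conf.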
In the web console, create a new ENI (eni-032a8bec5486841fd) and assign it the static IP 10.4.128.9.
Add a health-check script, /etc/keepalived/health-check.sh:
------
#!/bin/bash
exit 0 # Always healthy
------
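The script above always reports healthy, so a failover only happens when the whole node or keepalived itself goes away. As a minimal sketch of a check that follows the service, assuming the service behind the ENI listens on a local TCP port, the script could probe that port instead. `port_is_open` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: a health check that fails when a local TCP port stops accepting
# connections. port_is_open is a hypothetical helper name.
port_is_open() {
    local host=$1 port=$2
    # bash's /dev/tcp pseudo-device attempts a TCP connect;
    # timeout bounds the attempt to one second.
    timeout 1 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

A health-check.sh built on this would end with a call such as `port_is_open 127.0.0.1 80` (port 80 is only an example), so that the probe's exit status becomes the script's exit status, which is what keepalived's vrrp_script evaluates.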
The ENI attach/detach script that keepalived calls when it becomes master: /etc/keepalived/i-am-master.sh
------
#!/bin/bash
ENI=eni-032a8bec5486841fd
METADATA=http://169.254.169.254/latest/meta-data/instance-id
export AWS_DEFAULT_REGION=ap-northeast-3

# get ENI attachment information
attach=$(aws ec2 describe-network-interface-attribute --network-interface-id "$ENI" --attribute attachment --output json)

# check if ENI has already been attached to this instance
inst=$(curl -qs "$METADATA")
if echo "$attach" | jq -e '.Attachment' >/dev/null 2>&1; then
    attachedInst=$(echo "$attach" | jq -r ".Attachment.InstanceId")
    if [ "$inst" = "$attachedInst" ]; then
        exit 0
    fi

    # get attachment ID and detach it
    id=$(echo "$attach" | jq -r ".Attachment.AttachmentId")
    if [[ $id == eni-attach-* ]]; then
        aws ec2 detach-network-interface --attachment-id "$id"

        # Wait for detachment to complete
        echo "Waiting for ENI detachment..."
        sleep 10

        # Wait until ENI is actually detached
        while aws ec2 describe-network-interfaces --network-interface-ids "$ENI" --query 'NetworkInterfaces[0].Status' --output text | grep -q "in-use"; do
            echo "Still detaching..."
            sleep 2
        done
    fi
fi

# attach to this instance
aws ec2 attach-network-interface --network-interface-id "$ENI" --instance-id "$inst" --device-index 1
------
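The polling loop in the script above has no upper bound, so if the ENI never leaves the in-use state the notify script never returns. A bounded variant might look like the sketch below; `wait_for_eni_available` and its `max_tries` and `delay` parameters are hypothetical names, and the defaults are arbitrary.

```shell
#!/bin/bash
# Sketch: poll the ENI status with an upper bound instead of looping forever.
# wait_for_eni_available, max_tries and delay are hypothetical names.
wait_for_eni_available() {
    local eni=$1 max_tries=${2:-30} delay=${3:-2}
    local i=0 status
    while [ "$i" -lt "$max_tries" ]; do
        status=$(aws ec2 describe-network-interfaces \
            --network-interface-ids "$eni" \
            --query 'NetworkInterfaces[0].Status' --output text)
        # A detached ENI reports "available"; an attached one "in-use".
        [ "$status" = "available" ] && return 0
        i=$((i + 1))
        sleep "$delay"
    done
    echo "ENI $eni still not available after $max_tries checks" >&2
    return 1
}
```

It would be called as `wait_for_eni_available "$ENI" || exit 1` right after the detach call. The AWS CLI also has a built-in `aws ec2 wait network-interface-available` waiter, which may be enough if its default polling interval is acceptable.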
Create an IAM role, here named Keepalived-ENI, add the Trusted entities and Permissions policies below, and attach the role to both test instances:
------Trusted entities
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ec2.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
------
------Permissions policies
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeNetworkInterfaceAttribute",
                "ec2:AttachNetworkInterface",
                "ec2:DetachNetworkInterface"
            ],
            "Resource": "*"
        }
    ]
}
------
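The role can also be created from the CLI rather than the console. The sketch below is one possible way to do it, under the assumption that the two JSON documents above have been saved as trust.json and permissions.json in the current directory; `setup_eni_role`, the file names and the inline policy name are all hypothetical.

```shell
#!/bin/bash
# Sketch: create the role and attach it to instances from the CLI.
# setup_eni_role and the file names are hypothetical; trust.json and
# permissions.json are assumed to hold the two documents shown above.
# Arguments: role name, then one or more instance IDs.
setup_eni_role() {
    local role=$1; shift
    aws iam create-role --role-name "$role" \
        --assume-role-policy-document file://trust.json || return 1
    aws iam put-role-policy --role-name "$role" \
        --policy-name "${role}-policy" \
        --policy-document file://permissions.json || return 1
    # EC2 consumes a role through an instance profile.
    aws iam create-instance-profile --instance-profile-name "$role" || return 1
    aws iam add-role-to-instance-profile \
        --instance-profile-name "$role" --role-name "$role" || return 1
    local id
    for id in "$@"; do
        aws ec2 associate-iam-instance-profile --instance-id "$id" \
            --iam-instance-profile "Name=$role" || return 1
    done
}
```

A call would look like `setup_eni_role Keepalived-ENI <instance-id-1> <instance-id-2>`. A freshly created instance profile can take a short while to propagate, so the association step may need a retry.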
Verify from inside the EC2 instance; here we can see that the role is available:
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/
Keepalived-ENI
[root@ip-10-4-133-196 keepalived]# aws sts get-caller-identity
{
"Account": "123454567890",
"UserId": "AGHTAQ3DUCTSWKIOWBGTI:i-0a5aacedc82d1f580",
"Arn": "arn:aws:sts::123454567890:assumed-role/Keepalived-ENI/i-0a5aacedc82d1f580"
}
=======Start testing!
Attach the ENI to keepalived-a, then run:
systemctl enable keepalived
systemctl start keepalived
systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2025-11-22 08:43:32 UTC; 13s ago
Process: 2593 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 2594 (keepalived)
Log in to keepalived-b and run:
systemctl enable keepalived
systemctl start keepalived
systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2025-11-22 08:44:45 UTC; 83ms ago
Process: 2370 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 2371 (keepalived)
======check backup======
sudo journalctl -u keepalived -f
Nov 22 08:43:37 ip-10-4-129-250.ap-northeast-3.compute.internal Keepalived_vrrp[2597]: Opening script file /etc/keepalived/i-am-master.sh
Nov 22 08:47:13 ip-10-4-129-250.ap-northeast-3.compute.internal Keepalived_vrrp[2597]: VRRP_Instance(VI_1) Received advert with higher priority 100, ours 50
Nov 22 08:47:13 ip-10-4-129-250.ap-northeast-3.compute.internal Keepalived_vrrp[2597]: VRRP_Instance(VI_1) Entering BACKUP STATE
ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
inet 10.4.129.250 netmask 255.255.240.0 broadcast 10.4.143.255
inet6 fe80::ce5:7fff:fedd:e065 prefixlen 64 scopeid 0x20<link>
ether 0e:e5:7f:dd:e0:65 txqueuelen 1000 (Ethernet)
RX packets 2000 bytes 3503391 (3.3 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1653 bytes 213478 (208.4 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
======check master======
Nov 22 08:47:12 ip-10-4-133-196.ap-northeast-3.compute.internal Keepalived_healthcheckers[2195]: Opening file '/etc/keepalived/keepalived.conf'.
Nov 22 08:47:13 ip-10-4-133-196.ap-northeast-3.compute.internal Keepalived_vrrp[2196]: VRRP_Instance(VI_1) Transition to MASTER STATE
Nov 22 08:47:14 ip-10-4-133-196.ap-northeast-3.compute.internal Keepalived_vrrp[2196]: VRRP_Instance(VI_1) Entering MASTER STATE
Nov 22 08:47:14 ip-10-4-133-196.ap-northeast-3.compute.internal Keepalived_vrrp[2196]: Opening script file /etc/keepalived/i-am-master.sh
ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
inet 10.4.133.196 netmask 255.255.240.0 broadcast 10.4.143.255
inet6 fe80::c10:b2ff:febb:7d53 prefixlen 64 scopeid 0x20<link>
ether 0e:10:b2:bb:7d:53 txqueuelen 1000 (Ethernet)
RX packets 1294 bytes 3447754 (3.2 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1031 bytes 141789 (138.4 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
inet 10.4.128.9 netmask 255.255.240.0 broadcast 10.4.143.255
inet6 fe80::c61:7eff:fe76:2c57 prefixlen 64 scopeid 0x20<link>
ether 0e:61:7e:76:2c:57 txqueuelen 1000 (Ethernet)
RX packets 81 bytes 7072 (6.9 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 90 bytes 9050 (8.8 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
======failover======
Reboot 10.4.133.196 (node-1). The log on 10.4.129.250 shows it taking over as MASTER:
Nov 22 09:10:52 ip-10-4-129-250.ap-northeast-3.compute.internal Keepalived_vrrp[3334]: VRRP_Instance(VI_1) Entering BACKUP STATE
Nov 22 09:14:20 ip-10-4-129-250.ap-northeast-3.compute.internal Keepalived_vrrp[3334]: VRRP_Instance(VI_1) Transition to MASTER STATE
Nov 22 09:14:21 ip-10-4-129-250.ap-northeast-3.compute.internal Keepalived_vrrp[3334]: VRRP_Instance(VI_1) Entering MASTER STATE
Nov 22 09:14:21 ip-10-4-129-250.ap-northeast-3.compute.internal Keepalived_vrrp[3334]: Opening script file /etc/keepalived/i-am-master.sh
The ENI has moved to 10.4.129.250:
ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
inet 10.4.129.250 netmask 255.255.240.0 broadcast 10.4.143.255
inet6 fe80::ce5:7fff:fedd:e065 prefixlen 64 scopeid 0x20<link>
ether 0e:e5:7f:dd:e0:65 txqueuelen 1000 (Ethernet)
RX packets 5150 bytes 3704335 (3.5 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 2458 bytes 391996 (382.8 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
inet 10.4.128.9 netmask 255.255.240.0 broadcast 10.4.143.255
inet6 fe80::c61:7eff:fe76:2c57 prefixlen 64 scopeid 0x20<link>
ether 0e:61:7e:76:2c:57 txqueuelen 1000 (Ethernet)
RX packets 12 bytes 1444 (1.4 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 18 bytes 2177 (2.1 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
When 10.4.133.196 comes back online, the ENI switches back, because node-1 has the higher priority and keepalived preempts by default:
Nov 22 09:15:08 ip-10-4-133-196.ap-northeast-3.compute.internal systemd[1]: Started LVS and VRRP High Availability Monitor.
Nov 22 09:15:08 ip-10-4-133-196.ap-northeast-3.compute.internal Keepalived_healthcheckers[2438]: Opening file '/etc/keepalived/keepalived.conf'.
Nov 22 09:15:08 ip-10-4-133-196.ap-northeast-3.compute.internal Keepalived_vrrp[2439]: VRRP_Instance(VI_1) Transition to MASTER STATE
Nov 22 09:15:09 ip-10-4-133-196.ap-northeast-3.compute.internal Keepalived_vrrp[2439]: VRRP_Instance(VI_1) Entering MASTER STATE
Nov 22 09:15:09 ip-10-4-133-196.ap-northeast-3.compute.internal Keepalived_vrrp[2439]: Opening script file /etc/keepalived/i-am-master.sh
ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
inet 10.4.133.196 netmask 255.255.240.0 broadcast 10.4.143.255
inet6 fe80::c10:b2ff:febb:7d53 prefixlen 64 scopeid 0x20<link>
ether 0e:10:b2:bb:7d:53 txqueuelen 1000 (Ethernet)
RX packets 1589 bytes 3497704 (3.3 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1264 bytes 170077 (166.0 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
inet 10.4.128.9 netmask 255.255.240.0 broadcast 10.4.143.255
inet6 fe80::c61:7eff:fe76:2c57 prefixlen 64 scopeid 0x20<link>
ether 0e:61:7e:76:2c:57 txqueuelen 1000 (Ethernet)
RX packets 81 bytes 7072 (6.9 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 90 bytes 9050 (8.8 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
======ping
Sat Nov 22 17:25:04 TST 2025: 64 bytes from 10.4.128.9: icmp_seq=263 ttl=253 time=49.366 ms
Sat Nov 22 17:25:05 TST 2025: 64 bytes from 10.4.128.9: icmp_seq=264 ttl=253 time=52.408 ms
Sat Nov 22 17:25:07 TST 2025: Request timeout for icmp_seq 265
Sat Nov 22 17:25:08 TST 2025: Request timeout for icmp_seq 266
Sat Nov 22 17:25:09 TST 2025: Request timeout for icmp_seq 267
Sat Nov 22 17:25:10 TST 2025: Request timeout for icmp_seq 268
Sat Nov 22 17:25:11 TST 2025: Request timeout for icmp_seq 269
Sat Nov 22 17:25:12 TST 2025: Request timeout for icmp_seq 270
Sat Nov 22 17:25:13 TST 2025: Request timeout for icmp_seq 271
Sat Nov 22 17:25:14 TST 2025: Request timeout for icmp_seq 272
Sat Nov 22 17:25:15 TST 2025: Request timeout for icmp_seq 273
Sat Nov 22 17:25:16 TST 2025: Request timeout for icmp_seq 274
Sat Nov 22 17:25:17 TST 2025: Request timeout for icmp_seq 275
Sat Nov 22 17:25:18 TST 2025: Request timeout for icmp_seq 276
Sat Nov 22 17:25:19 TST 2025: Request timeout for icmp_seq 277
Sat Nov 22 17:25:20 TST 2025: Request timeout for icmp_seq 278
Sat Nov 22 17:25:21 TST 2025: Request timeout for icmp_seq 279
Sat Nov 22 17:25:22 TST 2025: Request timeout for icmp_seq 280
Sat Nov 22 17:25:23 TST 2025: Request timeout for icmp_seq 281
Sat Nov 22 17:25:24 TST 2025: Request timeout for icmp_seq 282
Sat Nov 22 17:25:24 TST 2025: 64 bytes from 10.4.128.9: icmp_seq=283 ttl=253 time=47.442 ms
Sat Nov 22 17:25:25 TST 2025: 64 bytes from 10.4.128.9: icmp_seq=284 ttl=253 time=50.554 ms
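To put a number on the outage in a capture like the one above, the timed-out probes can simply be counted. This is a small sketch that assumes the log uses the same line format as shown and that ping sends one probe per second; `count_lost_pings` is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: count timed-out probes in a timestamped ping log read from stdin.
# count_lost_pings is a hypothetical helper name.
count_lost_pings() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function's status at success.
    grep -c 'Request timeout for icmp_seq' || true
}
```

Fed the capture above, it counts icmp_seq 265 through 282, that is 18 lost probes, or roughly 18 seconds during which the ENI was unreachable while it was detached and re-attached.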
Reference: https://aws.amazon.com/cn/blogs/china/routing-redundancy-solution-in-aws-vpc-network/



