Four ways to migrate from an on-premises Redis server / Redis cluster to ElastiCache in the cloud

Redis is usually used as a cache, but with all kinds of applications developing wildly and freely, or let's call it abuse, Redis is well on its way to becoming a database that stores everything. WordPress's object-cache plugin, for example, dumps all of the posts into it, and many game studios keep frequently refreshed boards/scores plus all sorts of character attributes there.

From a cloud platform's point of view, taking AWS as the example, there are roughly four ways to migrate Redis from on-premises:

  1. Dump an rdb file from the on-premises server, upload it to S3, then have the ElastiCache service restore from that rdb file on S3.
  2. ElastiCache offers an online migration path. The official documentation says it supports Redis servers built on EC2, but in practice, as long as your network is fast enough it doesn't matter where your Redis lives: any self-hosted Redis server works, as does a third-party provider that supports the CONFIG/SYNC/PSYNC/SLAVEOF commands.
  3. RIOT, a tool released by Redis itself, presumably because the demand for migration was simply too loud, and always relying on third-party tools made Redis look like it wasn't doing anything.
  4. The open-source redis-shake, widely adopted for real-time synchronization. It needs to call SYNC/PSYNC on the source, and those commands are disabled on AWS/Azure/GCP, so if the source is in the cloud you must open a support ticket to get them enabled before redis-shake can be used.


The VPN setup in this article: the on-premises site connects to the cloud VPC over IPSec-VPN, but not with the VPC's managed IPSec-VPN; instead StrongSwan runs on an EC2 instance. That's why you'll see the on-premises servers reach cloud services via 10.4.4.4, and when the cloud side connects back to on-premises as a slave, it also shows up as 10.4.4.4.

Let's start by building a standalone on-premises Redis server.

First we set up a Redis server on-premises and insert 1 GB of data. Because Redis tried to reach into everyone's pockets a while back with that licensing stunt, many cloud platforms have since moved to Valkey and the highest Redis version on offer stops at v7.2, so I'm using Redis v7.0 as the on-premises demo version.

redis-cli -h 10.1.1.8
10.1.1.8:6379> info
# Server
redis_version:7.0.15

Use Jacky Huang's earlier Python test script to insert data: it randomly generates keys of 5k-50k size with no TTL, for a total of more than 1 GB of data:

python3 -m venv redis-env

source redis-env/bin/activate && pip install redis
Collecting redis
  Downloading redis-7.1.0-py3-none-any.whl.metadata (12 kB)
Downloading redis-7.1.0-py3-none-any.whl (354 kB)
Installing collected packages: redis
Successfully installed redis-7.1.0

(redis-env) Ken@MacBookPro ~ % python3 insert-redis-1g-data.py
...

redis-cli -h 10.1.1.8
info memory
# Memory
used_memory:1154971864
used_memory_human:1.08G

# Keyspace
db0:keys=140286,expires=0,avg_ttl=0

Looks like 140286 keys were written, so the source data is ready.
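
The script itself isn't reproduced here. If you want to generate a similar workload without it, a minimal bash sketch along these lines would do (hypothetical, not Jacky Huang's script; the field counts and value sizes are kept small here, so scale them up to reach roughly 1 GB):

# Hypothetical seeding sketch: stream HSET commands into redis-cli
HOST=10.1.1.8
for i in $(seq 1 140000); do
  key="house_${RANDOM}_${i}"
  for f in $(seq 1 20); do
    printf 'HSET %s field_%d value_%d_%s\n' "$key" "$f" "$RANDOM" \
      "$(head -c 64 /dev/urandom | base64 | tr -dc 'A-Za-z0-9')"
  done
done | redis-cli -h "$HOST" > /dev/null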

Now let's go through these four migration methods in detail:

【1】Dump the rdb file from the on-premises server, upload it to S3, then have the ElastiCache service restore from the rdb file on S3.

⇧⇧⇧

First, generate the rdb file:

10.1.1.8:6379> BGSAVE
Background saving started
10.1.1.8:6379> LASTSAVE
(integer) 1763975643
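
LASTSAVE returning a newer timestamp tells you the snapshot has landed; you can also watch the standard persistence fields directly:

# rdb_bgsave_in_progress goes back to 0 once the background save finishes
redis-cli -h 10.1.1.8 info persistence | grep -E 'rdb_bgsave_in_progress|rdb_last_bgsave_status'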

Copy the rdb file from the server to the local machine; here I'm using Docker:

docker cp 0a9e3e71607e:/data/dump.rdb .
Successfully copied 233MB to /root/.
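
If your source isn't running in Docker, you can ask Redis itself where the snapshot was written:

redis-cli -h 10.1.1.8 config get dir
redis-cli -h 10.1.1.8 config get dbfilename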

Thanks to compression, 1 GB of memory dumps to an rdb of only 200-something MB. Next, follow the official document; it is written clearly enough, people just don't read it carefully:

https://docs.aws.amazon.com/AmazonElastiCache/latest/dg/backups-seeding-redis.html

Create an Amazon S3 bucket and folder: create an S3 bucket and folder in the AWS region where you want to create the ElastiCache cluster.

Upload your backup to Amazon S3: upload the dump.rdb produced by the BGSAVE above into the S3 folder.

Grant ElastiCache read access to the .rdb file: this is the important step; you have to modify the S3 bucket's ACL so the ElastiCache service can read the folder and the rdb file.

Grant ElastiCache read access to the .rdb file in a default Region: for regions enabled by default, follow this official document,

https://docs.aws.amazon.com/AmazonElastiCache/latest/dg/backups-seeding-redis.html#backups-seeding-redis-default-region

At this point you may find you can't modify the bucket's ACL:

This bucket has the bucket owner enforced setting applied for Object Ownership
When bucket owner enforced is applied, use bucket policies to control access. Learn more

Click that bucket owner enforced notice, enable ACLs, then go back to the bucket's ACL settings, click Edit, click Add grantee, and enter the canonical ID:

540804c33a284a299d2547575ce1010f2312ef3da9b3a053c8bc45bf233e4353

This ID is fixed and never changes; don't be afraid, type it in, tick the four permission checkboxes, and Save.

Then select the dump.rdb you just uploaded, click Edit on its Permissions tab, click Add grantee, and enter the canonical ID:

540804c33a284a299d2547575ce1010f2312ef3da9b3a053c8bc45bf233e4353

Tick the three permission checkboxes and Save; that completes the S3 setup.
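
If you'd rather script these grants than click through the ACL UI, the s3api calls look roughly like this. This is only a sketch: PutBucketAcl/PutObjectAcl replace the entire ACL, so you also re-grant full control to your own canonical ID.

# the ElastiCache canonical ID from above, plus your own owner ID
EC_ID=540804c33a284a299d2547575ce1010f2312ef3da9b3a053c8bc45bf233e4353
OWNER_ID=$(aws s3api list-buckets --query Owner.ID --output text)

# bucket: List/Write objects and Read/Write bucket ACL for ElastiCache
aws s3api put-bucket-acl --bucket test20251124002 \
  --grant-full-control id="$OWNER_ID" \
  --grant-read id="$EC_ID" --grant-write id="$EC_ID" \
  --grant-read-acp id="$EC_ID" --grant-write-acp id="$EC_ID"

# object: Read object and Read/Write object ACL for ElastiCache
aws s3api put-object-acl --bucket test20251124002 --key dump.rdb \
  --grant-full-control id="$OWNER_ID" \
  --grant-read id="$EC_ID" \
  --grant-read-acp id="$EC_ID" --grant-write-acp id="$EC_ID"

With the S3 side done, head over to the ElastiCache console and create a new cluster from this rdb file: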

https://console.aws.amazon.com/elasticache

Choose to create a Node-based cluster with the Redis OSS engine, select Restore from backup, choose Other backups, and enter the RDB file S3 location underneath, for example the path of the rdb file I just created and uploaded:

test20251124002/dump.rdb

Cluster mode is Disabled, because the on-premises source is a single standalone node. Then pick the Redis version, keep it at v7.0, and choose a node type with more than 1 GB of available memory; cache.t3.medium will do here.

Encryption in transit can be left unticked for now, since it can be enabled later and the on-premises side isn't encrypted anyway; if your app has already been adapted for it, feel free to tick it. Finally, click Create.
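
If you'd rather not click through the console, roughly the same thing can be done with the CLI; this is a sketch, with the group name and subnet group as placeholders:

aws elasticache create-replication-group \
  --replication-group-id t20251124002 \
  --replication-group-description "restored from on-prem dump.rdb" \
  --engine redis \
  --engine-version 7.0 \
  --cache-node-type cache.t3.medium \
  --num-cache-clusters 2 \
  --cache-subnet-group-name subnet-group-for-elasticache \
  --snapshot-arns arn:aws:s3:::test20251124002/dump.rdb \
  --region ap-northeast-3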

It's created in about 10 minutes; log in and take a look:

redis-cli -h master.t20251124002.t18fhy.apn3.cache.amazonaws.com --tls info
# Server
redis_version:7.0.7

redis-cli -h master.t20251124002.t18fhy.apn3.cache.amazonaws.com --tls info memory
# Memory
used_memory:1162590880
used_memory_human:1.08G

redis-cli -h master.t20251124002.t18fhy.apn3.cache.amazonaws.com --tls info Keyspace
# Keyspace
db0:keys=140286,expires=0,avg_ttl=0

GET a few keys to verify. No problems, and that's this method done.

【2】ElastiCache offers an online migration path. The official documentation says it supports Redis servers built on EC2, but in practice, as long as your network is fast enough it doesn't matter where your Redis lives; in principle any self-hosted Redis server works.

⇧⇧⇧

https://docs.aws.amazon.com/AmazonElastiCache/latest/dg/OnlineMigration.html

Let's recycle what we already have and first flush the cluster we just seeded from the rdb file:

redis-cli -h master.t20251124002.t18fhy.apn3.cache.amazonaws.com --tls flushdb
OK
Ken@MacBookPro ~ % redis-cli -h master.t20251124002.t18fhy.apn3.cache.amazonaws.com --tls scan 0
1) "0"
2) (empty array)

Then go to the cluster detail page in the ElastiCache web console, click Actions, choose Start migration, enter our on-premises Redis server's IP as the Source endpoint, and click Start migration. Because the ElastiCache cluster we just created has Encryption in transit enabled, you'll get the message "Migration to in-transit encryption enabled replication group is not supported."

https://docs.aws.amazon.com/AmazonElastiCache/latest/dg/Migration-Prepare.html

That's because the prerequisites state: "It doesn't have encryption in-transit enabled."

So you need to either create a new cluster for the online migration, or change the current one to Encryption in transit Disabled.

So I created a new cluster, went to its detail page in the ElastiCache web console, clicked Actions, chose Start migration, entered the on-premises Redis server IP 10.1.1.8 as the Source endpoint, and clicked Start migration.
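
The same action exists in the CLI as start-migration; assuming the new cluster's replication group ID is t20251124003 (as its endpoint below suggests), a sketch looks like:

aws elasticache start-migration \
  --replication-group-id t20251124003 \
  --customer-node-endpoint-list "Address=10.1.1.8,Port=6379" \
  --region ap-northeast-3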

Before Start migration, the on-premises Redis server:

redis-cli -h 10.1.1.8 info replication
# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:8f92929994997c298a4dc6971bb07f73fb72fe91
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

Before Start migration, the cloud ElastiCache Redis:

redis-cli -h t20251124003.t18fhy.ng.0001.apn3.cache.amazonaws.com info Replication
# Replication
role:master
connected_slaves:1
slave0:ip=10.7.1.186,port=6379,state=online,offset=164514,lag=0
master_failover_state:no-failover
master_replid:285c1bb5600aa012a8e56de07aaac1e3472301dd
master_replid2:e7aafc319b13948927b1d7d0ef8d3c5951439315
master_repl_offset:164514
second_repl_offset:138947
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:136280
repl_backlog_histlen:28235

In a corporate environment you would normally run IPSec-VPN between the datacenter firewall and the AWS VPC so the two LANs can reach each other. My on-premises site and the cloud are linked over IPSec-VPN, so their LANs are mutually reachable; I trust you don't want to expose your Redis server on some port of a public IP either.

Of course, if you don't have an IPSec-VPN setup, you can also try Client VPN to attach the on-premises Redis server to the VPC as a client; the point is simply to make the network between the two stable and fast.

After Start migration, the on-premises Redis server shows the cloud side has connected as a slave:

redis-cli -h 10.1.1.8 info replication
# Replication
role:master
connected_slaves:1
slave0:ip=10.4.4.4,port=6379,state=online,offset=224,lag=1
master_failover_state:no-failover
master_replid:f9fdefd29e82475434b941db1f6d5b1957b79758
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:224
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:224

After Start migration, the cloud ElastiCache Redis has become a slave. master_link_status:up means it has connected to the master node correctly and is now syncing data from 10.1.1.8.

redis-cli -h t20251124003.t18fhy.ng.0001.apn3.cache.amazonaws.com info Replication
# Replication
role:slave
master_host:10.1.1.8
master_port:6379
master_link_status:up
master_last_io_seconds_ago:2
master_sync_in_progress:0
slave_read_repl_offset:224
slave_repl_offset:224
repl_sync_enabled:1
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:1
slave0:ip=10.7.1.186,port=6379,state=wait_bgsave,offset=0,lag=0
master_failover_state:no-failover
master_replid:f9fdefd29e82475434b941db1f6d5b1957b79758
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:224
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:15
repl_backlog_histlen:210

It finishes quickly. Verify by picking a random key:

redis-cli -h 10.1.1.8 randomkey
"house_821593_41769"

redis-cli -h 10.1.1.8 hgetall house_821593_41769
319) "field_160"
320) "value_60_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

redis-cli -h t20251124003.t18fhy.ng.0001.apn3.cache.amazonaws.com hgetall house_821593_41769
319) "field_160"
320) "value_60_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

Looks fine. Log in and take a look; the metrics match the on-premises side:

redis-cli -h t20251124003.t18fhy.ng.0001.apn3.cache.amazonaws.com info memory              
# Memory
used_memory:1162559880
used_memory_human:1.08G

redis-cli -h t20251124003.t18fhy.ng.0001.apn3.cache.amazonaws.com info keyspace
# Keyspace
db0:keys=140286,expires=0,avg_ttl=0

At this moment the on-premises-to-cloud sync is still running, so a new key written on-premises will also be replicated to the cloud. Let's test it:

redis-cli -h 10.1.1.8 set Japan Beauty            
OK
redis-cli -h 10.1.1.8 set Taiwan Good
OK

redis-cli -h t20251124003.t18fhy.ng.0001.apn3.cache.amazonaws.com get Japan   
"Beauty"
redis-cli -h t20251124003.t18fhy.ng.0001.apn3.cache.amazonaws.com get Taiwan
"Good"

As long as the network between your self-hosted on-premises Redis and the cloud ElastiCache Redis is fast enough, AWS's official online migration works without a hitch.

There is one last thing to do: the cutover.

https://docs.aws.amazon.com/AmazonElastiCache/latest/dg/Migration-Complete.html

You can do this step from the web console or from the CLI. Note that it can take a while, roughly 3 to 5 minutes; it doesn't finish the instant you click.
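
For reference, the CLI version of the cutover is a single call; --force skips the data-sync check, so leave it off unless you really mean it (replication group ID assumed as above):

aws elasticache complete-migration \
  --replication-group-id t20251124003 \
  --region ap-northeast-3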

After completing the migration, look at the replication state on both sides again:

redis-cli -h 10.1.1.8 info replication
# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:f9fdefd29e82475434b941db1f6d5b1957b79758
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1841
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1841
redis-cli -h t20251124003.t18fhy.ng.0001.apn3.cache.amazonaws.com info replication
# Replication
role:master
connected_slaves:1
slave0:ip=10.7.1.186,port=6379,state=online,offset=4455,lag=0
master_failover_state:no-failover
master_replid:d0eb17787813ecedd3292b550b4daa74d2acc926
master_replid2:f9fdefd29e82475434b941db1f6d5b1957b79758
master_repl_offset:4455
second_repl_offset:1842
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:15
repl_backlog_histlen:4441

You can see the replication relationship between on-premises and the cloud has been torn down and the cloud side is now the master. Only after you complete the migration does ElastiCache become the serving master node; before that, all of its nodes are slaves, so this approach does require downtime.

That completes migrating a self-hosted on-premises Redis server to the cloud with AWS ElastiCache's official online migration.

【3】RIOT (Redis Input/Output Tools) is a tool released by Redis itself, presumably because the demand for migration was simply too loud, and always relying on third-party tools made Redis look like it wasn't doing anything.

⇧⇧⇧

The first two methods need no separate middleman server; the two that follow both require a standalone middleman server where the relevant software is installed to carry out the migration.

First wipe the data in the cloud:

redis-cli -h t20251124003.t18fhy.ng.0001.apn3.cache.amazonaws.com flushdb     
OK
Ken@MacBookPro ~ % redis-cli -h t20251124003.t18fhy.ng.0001.apn3.cache.amazonaws.com scan 0       
1) "0"
2) (empty array)

Install RIOT. It can go on an on-premises server or on an EC2 instance in the cloud; same rule as before, anywhere works as long as your network is fast enough.

The command is a single line: the first argument is the source server, the second the target server:

example:riot replicate redis://source redis://target

riot replicate --mode live redis://10.1.1.8:6379 redis://t20251124003.t18fhy.ng.0001.apn3.cache.amazonaws.com:6379

At this point you'll get an error:

Keyspace notifications not property configured. Expected notify-keyspace-events 'KEA' but was ''.

See the official documentation: https://redis.github.io/riot/#_replication

Make sure the source database has keyspace notifications enabled using:
redis.conf: notify-keyspace-events = KEA
CONFIG SET notify-keyspace-events KEA

So first run CONFIG SET notify-keyspace-events KEA:

redis-cli -h 10.1.1.8 CONFIG SET notify-keyspace-events KEA
OK

Try again; I assume that WARN in the middle is just an advertisement:

riot replicate --mode live redis://10.1.1.8:6379 redis://t20251124003.t18fhy.ng.0001.apn3.cache.amazonaws.com:6379
Scanning/Listening  ? % [=                              ] 0/? (0:00:00 / ?) ?/s
2025-11-24 20:29:08.191 [main] WARN RedisSupportCheck - ⚠️ Unsupported Redis detected.
Consider upgrading to Redis Cloud for full-featured, scalable Redis with lower TCO.
https://redis.io/cloud/
Scanning/Listening  ? % [  =                                                ] 30500/? (0:01:01 / ?) 500.0/s

Meanwhile, checking the cloud ElastiCache, the data is indeed growing:

Ken@MacBookPro ~ % redis-cli -h t20251124003.t18fhy.ng.0001.apn3.cache.amazonaws.com info keyspace
# Keyspace
db0:keys=24900,expires=0,avg_ttl=0
Ken@MacBookPro ~ % redis-cli -h t20251124003.t18fhy.ng.0001.apn3.cache.amazonaws.com info keyspace
# Keyspace
db0:keys=25350,expires=0,avg_ttl=0
Ken@MacBookPro ~ % redis-cli -h t20251124003.t18fhy.ng.0001.apn3.cache.amazonaws.com info keyspace
# Keyspace
db0:keys=53586,expires=0,avg_ttl=0
Ken@MacBookPro ~ % redis-cli -h t20251124003.t18fhy.ng.0001.apn3.cache.amazonaws.com info keyspace
# Keyspace
db0:keys=53800,expires=0,avg_ttl=0

Until it appears to stop at 140288 keys:

Scanning/Listening  ? % [                                      =           ] 140288/? (0:05:09 / ?) 454.0/s

Pick a random key to verify; looks fine:

redis-cli -h 10.1.1.8 randomkey            
"house_375502_67309"
redis-cli -h 10.1.1.8 hgetall house_375502_67309
249) "field_125"
250) "value_414_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

redis-cli -h t20251124003.t18fhy.ng.0001.apn3.cache.amazonaws.com hgetall house_375502_67309
249) "field_125"
250) "value_414_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

Now let's write a few new keys from the on-premises side:

redis-cli -h 10.1.1.8 set Japan sabishii
OK
redis-cli -h 10.1.1.8 set Taiwan nightmarket
OK

redis-cli -h t20251124003.t18fhy.ng.0001.apn3.cache.amazonaws.com get Japan    
"sabishii"
redis-cli -h t20251124003.t18fhy.ng.0001.apn3.cache.amazonaws.com get Taiwan
"nightmarket"

No problem at all, and you can see RIOT's counter ticked up by two:

Scanning/Listening  ? % [                                        =         ] 140290/? (0:08:59 / ?) 260.3/s

Just CTRL+C to stop it. That completes migrating a self-hosted on-premises Redis server to the cloud with Redis's official tool RIOT.

But note the official caveat:

The live replication mechanism does not guarantee data consistency. Redis sends keyspace notifications over pub/sub which does not provide guaranteed delivery. It is possible that RIOT can miss some notifications in case of network failures for example.

Also, depending on the type, size, and rate of change of data structures on the source it is possible that RIOT cannot keep up with the change stream.

Put simply, it does not guarantee data consistency: pub/sub has no delivery guarantee, and the tool may not cope with large, frequently updated keys; when the rate of change exceeds what RIOT can keep up with, data gets dropped. Still, it is the official tool, and plenty of people use it.
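
Given that caveat, a quick spot check after the sync settles is cheap insurance. A rough sketch for this demo's hash keys (endpoints as above; this is a sanity check, not a consistency proof):

SRC=10.1.1.8
DST=t20251124003.t18fhy.ng.0001.apn3.cache.amazonaws.com

# compare total key counts
echo "source dbsize: $(redis-cli -h "$SRC" dbsize)"
echo "target dbsize: $(redis-cli -h "$DST" dbsize)"

# sample 20 random keys and compare their hash contents
for i in $(seq 1 20); do
  key=$(redis-cli -h "$SRC" randomkey)
  src_sum=$(redis-cli -h "$SRC" hgetall "$key" | sort | cksum)
  dst_sum=$(redis-cli -h "$DST" hgetall "$key" | sort | cksum)
  if [ "$src_sum" = "$dst_sum" ]; then echo "OK   $key"; else echo "DIFF $key"; fi
done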

【4】The open-source redis-shake for online migration. This tool offers several sync modes, including a SCAN mode similar to what RIOT uses, but most people prefer sync mode, and SYNC/PSYNC are disabled on AWS/Azure/GCP. If the source Redis server is in the cloud, say migrating from AWS to Azure or from GCP to AWS, you'll need to open a support ticket to enable SYNC/PSYNC on the source before redis-shake can be used.

⇧⇧⇧

Here we're migrating from on-premises to the cloud, so there's nothing to enable: a self-hosted on-premises Redis server has no such restriction. First, of course, flush the target:

redis-cli -h t20251125001.ioudks.ng.0001.apne1.cache.amazonaws.com flushdb
OK
Ken@MacBookPro ~ % redis-cli -h t20251125001.ioudks.ng.0001.apne1.cache.amazonaws.com scan 0
1) "0"
2) (empty array)

Next, install redis-shake; on-premises or in the cloud both work as long as the network is fast. Since this thing has no official Docker image I still have to bring up an on-premises server or an EC2 instance. For large data volumes, pick a machine with generous memory and CPU, because this tool burns a lot of resources during migration:

ssh myserver
curl -O https://github.com/tair-opensource/RedisShake/releases/download/v4.4.1/redis-shake-v4.4.1-linux-amd64.tar.gz -L

tar zxf redis-shake-v4.4.1-linux-amd64.tar.gz 

vi shake.toml 

Unpacking gives you example files; use one if it suits you, or write a new one:

[sync_reader]
address = "10.1.1.8:6379"
[redis_writer]
address = "t20251125001.ioudks.ng.0001.apne1.cache.amazonaws.com:6379" 

From that server, test connectivity to both Redis endpoints:

nc -v 10.1.1.8 6379
10.1.1.8: inverse host lookup failed: Unknown host
(UNKNOWN) [10.1.1.8] 6379 (redis) open
^C

nc -v t20251125001.ioudks.ng.0001.apne1.cache.amazonaws.com 6379
Warning: inverse host lookup failed for 10.2.10.6: Unknown host
t20251125001-001.ioudks.0001.apne1.cache.amazonaws.com [10.2.10.6] 6379 (redis) open
^C

Run it and you can see the middleman server has connected to both Redis servers:

./redis-shake shake.toml
2025-11-25 09:11:07 INF load config from file: shake.toml
2025-11-25 09:11:07 INF log_level: [info], log_file: [/tmp/data/shake.log]
2025-11-25 09:11:07 INF changed work dir. dir=[/tmp/data]
2025-11-25 09:11:07 INF GOMAXPROCS defaults to the value of runtime.NumCPU [8]
2025-11-25 09:11:07 INF not set pprof port
2025-11-25 09:11:07 INF create SyncStandaloneReader
2025-11-25 09:11:07 INF * address: 10.1.1.8:6379
2025-11-25 09:11:07 INF * username: 
2025-11-25 09:11:07 INF * password: 
2025-11-25 09:11:07 INF * tls: false
2025-11-25 09:11:07 INF create RedisStandaloneWriter
2025-11-25 09:11:07 INF * address: t20251125001.ioudks.ng.0001.apne1.cache.amazonaws.com:6379
2025-11-25 09:11:07 INF * username: 
2025-11-25 09:11:07 INF * password: 
2025-11-25 09:11:07 INF * tls: false
2025-11-25 09:11:07 INF start syncing...
2025-11-25 09:11:07 INF [reader_10.1.1.8_6379] source db is not doing bgsave! continue.
2025-11-25 09:11:12 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], receiving rdb, size=[126 MiB/222 MiB]
2025-11-25 09:11:17 INF read_count=[134519], read_ops=[26903.33], write_count=[134518], write_ops=[26903.13], syncing rdb, size=[1.5 MiB/222 MiB]
2025-11-25 09:11:22 INF read_count=[307304], read_ops=[34557.19], write_count=[307303], write_ops=[34557.19], syncing rdb, size=[3.4 MiB/222 MiB]
2025-11-25 09:11:27 INF read_count=[477718], read_ops=[34081.22], write_count=[477717], write_ops=[34081.22], syncing rdb, size=[5.4 MiB/222 MiB]
2025-11-25 09:11:32 INF read_count=[643506], read_ops=[33157.12], write_count=[643505], write_ops=[33157.12], syncing rdb, size=[7.3 MiB/222 MiB]
2025-11-25 09:11:37 INF read_count=[811787], read_ops=[33651.83], write_count=[811786], write_ops=[33651.83], syncing rdb, size=[9.1 MiB/222 MiB]

Check the source: the middleman server 10.1.1.5 has attached to 10.1.1.8 as a slave:

redis-cli -h 10.1.1.8 info replication
# Replication
role:master
connected_slaves:1
slave0:ip=10.1.1.5,port=0,state=online,offset=2001,lag=0
master_failover_state:no-failover
master_replid:5149a6a7041bd225b514c3a58d81fe4136043e6d
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:2001
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1918
repl_backlog_histlen:84

Check ElastiCache: not much seems to change there, because redis-shake inserts data into the target as an ordinary client:

redis-cli -h t20251125001.ioudks.ng.0001.apne1.cache.amazonaws.com info replication
# Replication
role:master
connected_slaves:1
slave0:ip=10.3.1.78,port=6379,state=online,offset=265757422,lag=1
master_failover_state:no-failover
master_replid:e7cb8aa566b3ba70c055281ea21fb14334e663d2
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:267641877
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:266578481
repl_backlog_histlen:1063397

The sync finished after ten-odd minutes; the time depends on the data volume and the network speed among the three parties.

Principle: redis-shake impersonates a slave attached to the master, and the master sends it data, both the full snapshot and the incremental stream. redis-shake receives both and stages them on the middleman server. During the full sync phase it first parses the RDB file into individual Redis commands and then sends those commands to the target; this is the phase where the middleman server is most likely to run into trouble, such as CPU maxing out or the network being too slow. During the incremental phase, redis-shake keeps streaming its locally staged AOF data to the target.

Guarantee the network speed and the middleman server's resources!
Guarantee the network speed and the middleman server's resources!
Guarantee the network speed and the middleman server's resources!

Important things get said three times, because someone always doesn't listen.
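
One practical way to keep an eye on the middleman while the full sync runs, assuming the log path from the run above:

# terminal 1: follow the phase transitions in the redis-shake log
tail -f /tmp/data/shake.log | grep -E 'receiving rdb|syncing rdb|syncing aof'

# terminal 2: watch CPU and memory of the redis-shake process itself
top -p "$(pgrep -f redis-shake)"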

Pick a random key to verify; looks fine:

redis-cli -h 10.1.1.8 randomkey            
"house_629971_43548"
redis-cli -h 10.1.1.8 hgetall house_629971_43548
209) "field_105"
210) "value_4392_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

redis-cli -h t20251125001.ioudks.ng.0001.apne1.cache.amazonaws.com hgetall house_629971_43548
209) "field_105"
210) "value_4392_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

The sync is still running at this point, so write a few new keys on-premises to test:

redis-cli -h 10.1.1.8 set Japan Wasabi
OK
redis-cli -h 10.1.1.8 set Taiwan Bubbletea
OK

redis-cli -h t20251125001.ioudks.ng.0001.apne1.cache.amazonaws.com get Japan    
"Wasabi"
redis-cli -h t20251125001.ioudks.ng.0001.apne1.cache.amazonaws.com get Taiwan
"Bubbletea"

No problem at all, and redis-shake's terminal ticked forward a bit:

2025-11-25 09:26:22 INF read_count=[19662867], read_ops=[0.00], write_count=[19662867], write_ops=[0.00], syncing aof, diff=[0]
2025-11-25 09:26:27 INF read_count=[19662868], read_ops=[0.20], write_count=[19662868], write_ops=[0.20], syncing aof, diff=[0]
2025-11-25 09:26:32 INF read_count=[19662869], read_ops=[0.20], write_count=[19662869], write_ops=[0.20], syncing aof, diff=[0]

The last step is the cutover: just CTRL+C to stop redis-shake.

^C2025-11-25 09:30:52 INF Got signal: interrupt to exit. Press Ctrl+C again to force exit.
2025-11-25 09:30:52 INF all done

At this point the source Redis will see the slave disconnect:

redis-cli -h 10.1.1.8 info replication
# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:5149a6a7041bd225b514c3a58d81fe4136043e6d
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:3647
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1918
repl_backlog_histlen:1730

That completes migrating a self-hosted on-premises Redis server to the cloud with redis-shake.

That covers the four ways to migrate a standalone on-premises Redis server to the cloud. What about a cluster that already exists on-premises? Does a cluster need a special approach, or does it have to be split apart?

⇧⇧⇧

First build the on-premises environment, again with Redis v7.0. Start by adding multiple IP addresses on a Debian host:

vi /etc/network/interfaces

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

allow-hotplug ens192
iface ens192 inet static
address 10.1.1.31
netmask 255.255.255.0
gateway 10.1.1.1

auto ens192:1
iface ens192:1 inet static
address 10.1.1.32
netmask 255.255.255.0

auto ens192:2
iface ens192:2 inet static
address 10.1.1.33
netmask 255.255.255.0

auto ens192:3
iface ens192:3 inet static
address 10.1.1.34
netmask 255.255.255.0

auto ens192:4
iface ens192:4 inet static
address 10.1.1.35
netmask 255.255.255.0

auto ens192:5
iface ens192:5 inet static
address 10.1.1.36
netmask 255.255.255.0
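
If you'd rather not edit /etc/network/interfaces, the same secondary addresses can be added on the fly with iproute2 (they won't survive a reboot):

for ip in 10.1.1.32 10.1.1.33 10.1.1.34 10.1.1.35 10.1.1.36; do
  ip addr add "$ip/24" dev ens192
done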

Use docker-compose to start six Redis servers:

services:
  redis-node-1:
    image: redis:7.0
    restart: always
    command: redis-server --port 6379 --bind 10.1.1.31 --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes --cluster-announce-ip 10.1.1.31
    network_mode: host
    volumes:
      - /data/redis-1:/data

  redis-node-2:
    image: redis:7.0
    restart: always
    command: redis-server --port 6379 --bind 10.1.1.32 --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes --cluster-announce-ip 10.1.1.32
    network_mode: host
    volumes:
      - /data/redis-2:/data

  redis-node-3:
    image: redis:7.0
    restart: always
    command: redis-server --port 6379 --bind 10.1.1.33 --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes --cluster-announce-ip 10.1.1.33
    network_mode: host
    volumes:
      - /data/redis-3:/data

  redis-node-4:
    image: redis:7.0
    restart: always
    command: redis-server --port 6379 --bind 10.1.1.34 --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes --cluster-announce-ip 10.1.1.34
    network_mode: host
    volumes:
      - /data/redis-4:/data

  redis-node-5:
    image: redis:7.0
    restart: always
    command: redis-server --port 6379 --bind 10.1.1.35 --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes --cluster-announce-ip 10.1.1.35
    network_mode: host
    volumes:
      - /data/redis-5:/data

  redis-node-6:
    image: redis:7.0
    restart: always
    command: redis-server --port 6379 --bind 10.1.1.36 --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes --cluster-announce-ip 10.1.1.36
    network_mode: host
    volumes:
      - /data/redis-6:/data

Create the cluster:

redis-cli --cluster create \
  10.1.1.31:6379 10.1.1.32:6379 10.1.1.33:6379 \
  10.1.1.34:6379 10.1.1.35:6379 10.1.1.36:6379 \
  --cluster-replicas 1
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 10.1.1.35:6379 to 10.1.1.31:6379
Adding replica 10.1.1.36:6379 to 10.1.1.32:6379
Adding replica 10.1.1.34:6379 to 10.1.1.33:6379
M: c6ef420cfa82e31f4d9382378ad906cc21ececa0 10.1.1.31:6379
   slots:[0-5460] (5461 slots) master
M: c07d24e66600b61bb6832ccd71cc176e7fe1532a 10.1.1.32:6379
   slots:[5461-10922] (5462 slots) master
M: 9893a61210a15ad97528c4929dc01a84eb78f34c 10.1.1.33:6379
   slots:[10923-16383] (5461 slots) master
S: ef300a94843cd15a06dbe8b9376b3d236367c28a 10.1.1.34:6379
   replicates 9893a61210a15ad97528c4929dc01a84eb78f34c
S: 2d897556dbd6d883da535a872d364d0d0c24f1d1 10.1.1.35:6379
   replicates c6ef420cfa82e31f4d9382378ad906cc21ececa0
S: 1f143cbe3922b43f97a293697bcfd5d3bde9c7ba 10.1.1.36:6379
   replicates c07d24e66600b61bb6832ccd71cc176e7fe1532a
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
.
>>> Performing Cluster Check (using node 10.1.1.31:6379)
M: c6ef420cfa82e31f4d9382378ad906cc21ececa0 10.1.1.31:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
M: c07d24e66600b61bb6832ccd71cc176e7fe1532a 10.1.1.32:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 2d897556dbd6d883da535a872d364d0d0c24f1d1 10.1.1.35:6379
   slots: (0 slots) slave
   replicates c6ef420cfa82e31f4d9382378ad906cc21ececa0
S: 1f143cbe3922b43f97a293697bcfd5d3bde9c7ba 10.1.1.36:6379
   slots: (0 slots) slave
   replicates c07d24e66600b61bb6832ccd71cc176e7fe1532a
S: ef300a94843cd15a06dbe8b9376b3d236367c28a 10.1.1.34:6379
   slots: (0 slots) slave
   replicates 9893a61210a15ad97528c4929dc01a84eb78f34c
M: 9893a61210a15ad97528c4929dc01a84eb78f34c 10.1.1.33:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

Test by writing a few keys:

redis-cli -c -h 10.1.1.31

10.1.1.31:6379> set Japan Happy
-> Redirected to slot [5595] located at 10.1.1.32:6379
OK
10.1.1.32:6379> get Japan
"Happy"
10.1.1.32:6379> set Taiwan Joy
-> Redirected to slot [2038] located at 10.1.1.31:6379
OK
10.1.1.31:6379> get Taiwan
"Joy"

Look at cluster info and each node's role:

redis-cli -c -h 10.1.1.31 cluster info 
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:581
cluster_stats_messages_pong_sent:599
cluster_stats_messages_sent:1180
cluster_stats_messages_ping_received:594
cluster_stats_messages_pong_received:581
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:1180
total_cluster_links_buffer_limit_exceeded:0


redis-cli -c -h 10.1.1.31 cluster nodes
c6ef420cfa82e31f4d9382378ad906cc21ececa0 10.1.1.31:6379@16379 myself,master - 0 1764121511000 1 connected 0-5460
c07d24e66600b61bb6832ccd71cc176e7fe1532a 10.1.1.32:6379@16379 master - 0 1764121511566 2 connected 5461-10922
2d897556dbd6d883da535a872d364d0d0c24f1d1 10.1.1.35:6379@16379 slave c6ef420cfa82e31f4d9382378ad906cc21ececa0 0 1764121511566 1 connected
1f143cbe3922b43f97a293697bcfd5d3bde9c7ba 10.1.1.36:6379@16379 slave c07d24e66600b61bb6832ccd71cc176e7fe1532a 0 1764121512000 2 connected
ef300a94843cd15a06dbe8b9376b3d236367c28a 10.1.1.34:6379@16379 slave 9893a61210a15ad97528c4929dc01a84eb78f34c 0 1764121512068 3 connected
9893a61210a15ad97528c4929dc01a84eb78f34c 10.1.1.33:6379@16379 master - 0 1764121511064 3 connected 10923-16383

You can see 10.1.1.31/32/33 are masters and 10.1.1.34/35/36 are slaves.

Again using Jacky Huang's earlier Python test script to insert data, this time randomly generating keys of 2k-10k size with no TTL, for a total of more than 3 GB of data:

python3 -m venv redis-env

source redis-env/bin/activate && pip install redis
Collecting redis
  Downloading redis-7.1.0-py3-none-any.whl.metadata (12 kB)
Downloading redis-7.1.0-py3-none-any.whl (354 kB)
Installing collected packages: redis
Successfully installed redis-7.1.0

(redis-env) Ken@MacBookPro ~ % python3 insert-redis-cluster-3g-data.py
...

redis-cli -c -h 10.1.1.31 info memory
used_memory_human:1.11G
redis-cli -c -h 10.1.1.32 info memory
used_memory_human:1.23G
redis-cli -c -h 10.1.1.33 info memory
used_memory_human:1.23G

redis-cli -c -h 10.1.1.31 info keyspace
db0:keys=710691,expires=0,avg_ttl=0
redis-cli -c -h 10.1.1.32 info keyspace
db0:keys=790502,expires=0,avg_ttl=0
redis-cli -c -h 10.1.1.33 info keyspace
db0:keys=788941,expires=0,avg_ttl=0

With that, the on-premises source data is ready. Now the same four methods again:

【5】Dump the rdb files from the on-premises servers, name them in order redis-0001.rdb / redis-0002.rdb / redis-0003.rdb, upload them to S3, then have the ElastiCache service restore from those rdb files on S3.

⇧⇧⇧

First run BGSAVE on every master node:

redis-cli -c -h 10.1.1.31 BGSAVE     
Background saving started
redis-cli -c -h 10.1.1.32 BGSAVE
Background saving started
redis-cli -c -h 10.1.1.33 BGSAVE
Background saving started

Copy them from the on-premises server to an EC2 instance, then upload them to S3:

root@debian:/data# ls
redis-1  redis-2  redis-3  redis-4  redis-5  redis-6
root@debian:/data# mv redis-1/dump.rdb redis-0001.rdb
root@debian:/data# mv redis-2/dump.rdb redis-0002.rdb
root@debian:/data# mv redis-3/dump.rdb redis-0003.rdb

ssh myec2-ip
sftp ken@10.1.1.31
Connected to 10.1.1.31.
sftp> ls
redis-0001.rdb  redis-0002.rdb  redis-0003.rdb  
sftp> get *.rdb
Fetching /data/redis-0001.rdb to redis-0001.rdb
redis-0001.rdb                                100%  264MB   5.8MB/s   00:45    
Fetching /data/redis-0002.rdb to redis-0002.rdb
redis-0002.rdb                                100%  293MB   5.3MB/s   00:55    
Fetching /data/redis-0003.rdb to redis-0003.rdb
redis-0003.rdb                                100%  293MB   5.0MB/s   00:58    
sftp> bye


aws s3 cp redis-0001.rdb s3://test20251124002/
upload: ./redis-0001.rdb to s3://test20251124002/redis-0001.rdb     
aws s3 cp redis-0002.rdb s3://test20251124002/
upload: ./redis-0002.rdb to s3://test20251124002/redis-0002.rdb     
aws s3 cp redis-0003.rdb s3://test20251124002/
upload: ./redis-0003.rdb to s3://test20251124002/redis-0003.rdb 

After the upload, set the appropriate permissions on these three rdb files in the web console, as described earlier.

This time the web console can't do the rdb restore; you need the aws cli:

First look back at the on-premises cluster nodes we checked earlier: slots 0-5460 live on 10.1.1.31 (redis-0001.rdb), slots 5461-10922 on 10.1.1.32 (redis-0002.rdb), and slots 10923-16383 on 10.1.1.33 (redis-0003.rdb). Each node's slot layout has to be spelled out in the aws cli command.

Confirm the subnet group name; here I'm using subnet-group-for-elasticache.

aws elasticache create-replication-group \
    --replication-group-id m3s3-20251126002 \
    --replication-group-description "Imported from on-premises" \
    --engine redis \
    --cache-node-type cache.t3.medium \
    --cache-subnet-group-name subnet-group-for-elasticache \
    --automatic-failover-enabled \
    --multi-az-enabled \
    --node-group-configuration \
        "ReplicaCount=1,Slots=0-5460" \
        "ReplicaCount=1,Slots=5461-10922" \
        "ReplicaCount=1,Slots=10923-16383" \
    --snapshot-arns \
        arn:aws:s3:::test20251124002/redis-0001.rdb \
        arn:aws:s3:::test20251124002/redis-0002.rdb \
        arn:aws:s3:::test20251124002/redis-0003.rdb \
    --region ap-northeast-3

The result:

{
    "ReplicationGroup": {
        "ReplicationGroupId": "m3s3-20251126002",
        "Description": "Imported from on-premises",
        "GlobalReplicationGroupInfo": {},
        "Status": "creating",
        "PendingModifiedValues": {},
        "MemberClusters": [
            "m3s3-20251126002-0001-001",
            "m3s3-20251126002-0001-002",
            "m3s3-20251126002-0002-001",
            "m3s3-20251126002-0002-002",
            "m3s3-20251126002-0003-001",
            "m3s3-20251126002-0003-002"
        ],
        "AutomaticFailover": "enabled",
        "MultiAZ": "enabled",
        "SnapshotRetentionLimit": 0,
        "SnapshotWindow": "03:00-04:00",
        "ClusterEnabled": true,
        "CacheNodeType": "cache.t3.medium",
        "TransitEncryptionEnabled": false,
        "AtRestEncryptionEnabled": false,
        "ARN": "arn:aws:elasticache:ap-northeast-3:234234345678:replicationgroup:m3s3-20251126002",
        "LogDeliveryConfigurations": [],
        "ReplicationGroupCreateTime": "2025-11-26T07:18:50.593000+00:00",
        "AutoMinorVersionUpgrade": true,
        "NetworkType": "ipv4",
        "IpDiscovery": "ipv4",
        "ClusterMode": "enabled",
        "Engine": "redis"
    }
}
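
Instead of refreshing the console, you can also poll the creation status from the CLI:

aws elasticache describe-replication-groups \
  --replication-group-id m3s3-20251126002 \
  --query 'ReplicationGroups[0].Status' \
  --output text \
  --region ap-northeast-3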

In the web console you can see the cluster being created. Wait for it to finish, then verify with a random key:

The on-premises data:

redis-cli -c -h 10.1.1.31 randomkey                   
"house_375704_1761851"

redis-cli -c -h 10.1.1.31 hgetall house_375704_1761851
65) "field_33"
66) "value_462_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

The data in the cloud looks identical:

redis-cli -c -h m3s3-20251126002.t18fhy.clustercfg.apn3.cache.amazonaws.com hgetall house_375704_1761851
65) "field_33"
66) "value_462_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

ElastiCache is addressed by domain name rather than IP; the endpoint domain resolves to all of the nodes:

nslookup m3s3-20251126002.t18fhy.clustercfg.apn3.cache.amazonaws.com

Name:    m3s3-20251126002.t18fhy.clustercfg.apn3.cache.amazonaws.com
Address: 10.4.46.180
Name:    m3s3-20251126002.t18fhy.clustercfg.apn3.cache.amazonaws.com
Address: 10.4.140.223
Name:    m3s3-20251126002.t18fhy.clustercfg.apn3.cache.amazonaws.com
Address: 10.4.22.184
Name:    m3s3-20251126002.t18fhy.clustercfg.apn3.cache.amazonaws.com
Address: 10.4.134.234
Name:    m3s3-20251126002.t18fhy.clustercfg.apn3.cache.amazonaws.com
Address: 10.4.35.110
Name:    m3s3-20251126002.t18fhy.clustercfg.apn3.cache.amazonaws.com
Address: 10.4.23.255

Looks good. That completes restoring a Redis cluster from on-premises to the cloud using dumped rdb files.

【6】Next, let's see whether AWS ElastiCache's online migration supports a Redis cluster.

⇧⇧⇧

First, flush ElastiCache:

redis-cli -c -h m3s3-20251126002-0001-001.t18fhy.0001.apn3.cache.amazonaws.com FLUSHALL
redis-cli -c -h m3s3-20251126002-0002-001.t18fhy.0001.apn3.cache.amazonaws.com FLUSHALL
redis-cli -c -h m3s3-20251126002-0003-001.t18fhy.0001.apn3.cache.amazonaws.com FLUSHALL

Then go to the cluster detail page in the ElastiCache web console, click Actions, choose Start migration, enter the on-premises Redis cluster IP as the Source endpoint, and click Start migration.
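
For the record, the CLI form mirrors what the console does here; judging by the event further below, a single seed endpoint of the source cluster is all it takes (a sketch):

aws elasticache start-migration \
  --replication-group-id m3s3-20251126002 \
  --customer-node-endpoint-list "Address=10.1.1.31,Port=6379" \
  --region ap-northeast-3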

Once it starts, check the on-premises Redis cluster; the cloud ElastiCache has connected in as slave1:

redis-cli -h 10.1.1.31 info replication
# Replication
role:master
connected_slaves:2
slave0:ip=10.1.1.31,port=6379,state=online,offset=9142,lag=0
slave1:ip=10.4.4.4,port=6379,state=wait_bgsave,offset=0,lag=0
master_failover_state:no-failover
master_replid:2fe6aab233322a786dd282d72e60540dcc2d81ab
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:9142
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:9142

redis-cli -h 10.1.1.32 info replication
# Replication
role:master
connected_slaves:2
slave0:ip=10.1.1.31,port=6379,state=online,offset=9156,lag=0
slave1:ip=10.4.4.4,port=6379,state=wait_bgsave,offset=0,lag=0
master_failover_state:no-failover
master_replid:320cf1ec6b27eb5cef506a17a844ded5b76b0334
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:9156
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:9156

redis-cli -h 10.1.1.33 info replication
# Replication
role:master
connected_slaves:2
slave0:ip=10.1.1.31,port=6379,state=online,offset=9184,lag=0
slave1:ip=10.4.4.4,port=6379,state=wait_bgsave,offset=0,lag=0
master_failover_state:no-failover
master_replid:f4f2e31d8529a5e413d72bec66fa84431702fdb0
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:9184
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:9184

In the AWS web console you can observe the Events:

Starting migration operation for target cluster m3s3-20251126002 with source cluster endpoint 10.1.1.31 and port 6379.

From the CLI you can see that ElastiCache's three master nodes have each become slaves of 10.1.1.31/32/33 respectively and are syncing data.

redis-cli -c -h m3s3-20251126002-0001-001.t18fhy.0001.apn3.cache.amazonaws.com info replication
# Replication
role:slave
master_host:10.1.1.31
master_port:6379
master_link_status:up
master_last_io_seconds_ago:7
master_sync_in_progress:0
slave_read_repl_offset:9562
slave_repl_offset:9562
repl_sync_enabled:1
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:1
slave0:ip=10.7.1.128,port=6379,state=online,offset=9562,lag=1
master_failover_state:no-failover
master_replid:2fe6aab233322a786dd282d72e60540dcc2d81ab
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:9562
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:9157
repl_backlog_histlen:406


redis-cli -c -h m3s3-20251126002-0002-001.t18fhy.0001.apn3.cache.amazonaws.com info replication
# Replication
role:slave
master_host:10.1.1.32
master_port:6379
master_link_status:up
master_last_io_seconds_ago:6
master_sync_in_progress:0
slave_read_repl_offset:9660
slave_repl_offset:9660
repl_sync_enabled:1
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:1
slave0:ip=10.7.0.199,port=6379,state=online,offset=9660,lag=0
master_failover_state:no-failover
master_replid:320cf1ec6b27eb5cef506a17a844ded5b76b0334
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:9660
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:9157
repl_backlog_histlen:504


redis-cli -c -h m3s3-20251126002-0003-001.t18fhy.0001.apn3.cache.amazonaws.com info replication
# Replication
role:slave
master_host:10.1.1.33
master_port:6379
master_link_status:up
master_last_io_seconds_ago:7
master_sync_in_progress:0
slave_read_repl_offset:9702
slave_repl_offset:9702
repl_sync_enabled:1
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:1
slave0:ip=10.7.1.162,port=6379,state=online,offset=9702,lag=1
master_failover_state:no-failover
master_replid:f4f2e31d8529a5e413d72bec66fa84431702fdb0
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:9702
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:9171
repl_backlog_histlen:532

It finishes quickly. Verify by picking three random keys:

redis-cli -c -h 10.1.1.31 randomkey
"house_266631_415719"

redis-cli -c -h 10.1.1.31 hgetall house_266631_415719
49) "field_25"
50) "value_8049_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

redis-cli -c -h m3s3-20251126002.t18fhy.clustercfg.apn3.cache.amazonaws.com hgetall house_266631_415719
49) "field_25"
50) "value_8049_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"


redis-cli -c -h 10.1.1.32 randomkey
"house_431791_751049"

redis-cli -c -h 10.1.1.31 hgetall house_431791_751049
41) "field_21"
42) "value_3492_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

redis-cli -c -h m3s3-20251126002.t18fhy.clustercfg.apn3.cache.amazonaws.com hgetall house_431791_751049
41) "field_21"
42) "value_3492_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"


redis-cli -c -h 10.1.1.33 randomkey
"house_214409_581622"

redis-cli -c -h 10.1.1.31 hgetall house_214409_581622
51) "field_26"
52) "value_2867_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

redis-cli -c -h m3s3-20251126002.t18fhy.clustercfg.apn3.cache.amazonaws.com hgetall house_214409_581622
51) "field_26"
52) "value_2867_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

Looks good. Once you complete the migration, that's the official AWS ElastiCache online migration of a self-hosted on-premises Redis cluster to the cloud done.

Only after you complete the migration does ElastiCache become the serving master; before that, all six nodes are slaves, so downtime is required.

【7】RIOT naturally supports Redis cluster.

⇧⇧⇧

First, flush ElastiCache:

redis-cli -c -h m3s3-20251126002-0001-001.t18fhy.0001.apn3.cache.amazonaws.com flushall
OK
redis-cli -c -h m3s3-20251126002-0002-001.t18fhy.0001.apn3.cache.amazonaws.com flushall
OK
redis-cli -c -h m3s3-20251126002-0003-001.t18fhy.0001.apn3.cache.amazonaws.com flushall
OK

Then set CONFIG SET notify-keyspace-events KEA:

redis-cli -h 10.1.1.31 -p 6379 CONFIG SET notify-keyspace-events KEA
OK
redis-cli -h 10.1.1.32 -p 6379 CONFIG SET notify-keyspace-events KEA
OK
redis-cli -h 10.1.1.33 -p 6379 CONFIG SET notify-keyspace-events KEA
OK

Refer to the official documentation: for a cluster you need to state the source and target types explicitly. Just do it; that WARN in the middle is surely an advertisement:

riot replicate --source-cluster --target-cluster --mode live redis://10.1.1.31:6379 redis://m3s3-20251126002-0001-001.t18fhy.0001.apn3.cache.amazonaws.com:6379

2025-11-26 16:57:09.323 [main] WARN RedisSupportCheck - ⚠️ Unsupported Redis detected.
Consider upgrading to Redis Cloud for full-featured, scalable Redis with lower TCO.
https://redis.io/cloud/
Scanning/Listening  ? % [             =                   ] 2293513/? (1:14:59 / ?) 509.8/s

Once the counter stops completely at 2293513, CTRL+C to stop the sync, then pick three random keys to check:

redis-cli -c -h 10.1.1.31 randomkey
"house_963499_1605766"

redis-cli -c -h 10.1.1.31 hgetall house_963499_1605766
49) "field_25"
50) "value_4519_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

redis-cli -c -h m3s3-20251126002.t18fhy.clustercfg.apn3.cache.amazonaws.com hgetall house_963499_1605766
49) "field_25"
50) "value_4519_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

redis-cli -c -h 10.1.1.32 randomkey
"house_916075_108747"

redis-cli -c -h 10.1.1.31 hgetall house_916075_108747
39) "field_20"
40) "value_4597_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

redis-cli -c -h m3s3-20251126002.t18fhy.clustercfg.apn3.cache.amazonaws.com hgetall house_916075_108747
39) "field_20"
40) "value_4597_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

redis-cli -c -h 10.1.1.33 randomkey
"house_284171_1158349"

redis-cli -c -h 10.1.1.31 hgetall house_284171_1158349
47) "field_24"
48) "value_2640_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

redis-cli -c -h m3s3-20251126002.t18fhy.clustercfg.apn3.cache.amazonaws.com hgetall house_284171_1158349
47) "field_24"
48) "value_2640_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

Looks good. RIOT only stops syncing once you CTRL+C. That completes the online migration of a self-hosted on-premises Redis cluster to the cloud with Redis's official tool.

【8】The open-source redis-shake.

⇧⇧⇧

Install redis-shake and configure it:

curl -O https://github.com/tair-opensource/RedisShake/releases/download/v4.4.1/redis-shake-v4.4.1-linux-amd64.tar.gz -L
tar zxf redis-shake-v4.4.1-linux-amd64.tar.gz 

vi shake.toml

Unpacking gives you example files; use one if it suits you, or write a new one. The difference from before is that here I added cluster = true:

[sync_reader]
address = "10.1.1.31:6379"
cluster = true
[redis_writer]
address = "t20251201.ioudks.clustercfg.apne1.cache.amazonaws.com:6379" 
cluster = true

From the middleman server, test connectivity to both Redis clusters:

nc -v 10.1.1.31 6379
Ncat: Version 7.93 ( https://nmap.org/ncat )
Ncat: Connected to 10.1.1.31:6379.

nc -v t20251201.ioudks.clustercfg.apne1.cache.amazonaws.com 6379
Ncat: Version 7.93 ( https://nmap.org/ncat )
Ncat: Connected to 10.2.10.9:6379.

Run it and you can see the middleman server has connected to both Redis clusters and started syncing data:

./redis-shake shake.toml
2025-12-03 10:04:30 INF load config from file: shake.toml
2025-12-03 10:04:30 INF log_level: [info], log_file: [/home/ec2-user/data/shake.log]
2025-12-03 10:04:30 INF changed work dir. dir=[/home/ec2-user/data]
2025-12-03 10:04:30 INF GOMAXPROCS defaults to the value of runtime.NumCPU [2]
2025-12-03 10:04:30 INF not set pprof port
2025-12-03 10:04:30 INF create SyncClusterReader
2025-12-03 10:04:30 INF * address (should be the address of one node in the Redis cluster): 10.1.1.31:6379
2025-12-03 10:04:30 INF * username: 
2025-12-03 10:04:30 INF * password: 
2025-12-03 10:04:30 INF * tls: false
2025-12-03 10:04:30 INF address=10.1.1.31:6379, reply=c6ef420cfa82e31f4d9382378ad906cc21ececa0 10.1.1.31:6379@16379 myself,master - 0 1764756268000 1 connected 0-5460
c07d24e66600b61bb6832ccd71cc176e7fe1532a 10.1.1.32:6379@16379 master - 0 1764756269310 8 connected 5461-10922
ef300a94843cd15a06dbe8b9376b3d236367c28a 10.1.1.34:6379@16379 slave 9893a61210a15ad97528c4929dc01a84eb78f34c 0 1764756270315 3 connected
9893a61210a15ad97528c4929dc01a84eb78f34c 10.1.1.33:6379@16379 master - 0 1764756269000 3 connected 10923-16383
2d897556dbd6d883da535a872d364d0d0c24f1d1 10.1.1.35:6379@16379 slave c6ef420cfa82e31f4d9382378ad906cc21ececa0 0 1764756269000 1 connected
1f143cbe3922b43f97a293697bcfd5d3bde9c7ba 10.1.1.36:6379@16379 slave c07d24e66600b61bb6832ccd71cc176e7fe1532a 0 1764756268507 8 connected
2025-12-03 10:04:31 INF create RedisClusterWriter
2025-12-03 10:04:31 INF * address (should be the address of one node in the Redis cluster): t20251201.ioudks.clustercfg.apne1.cache.amazonaws.com:6379
2025-12-03 10:04:31 INF * username: 
2025-12-03 10:04:31 INF * password: 
2025-12-03 10:04:31 INF * tls: false
2025-12-03 10:04:31 INF address=t20251201.ioudks.clustercfg.apne1.cache.amazonaws.com:6379, reply=2285cf3fccd4b4f06b3527b336df979e2b235c85 10.2.10.153:6379@1122 myself,slave fe207aa28dd47b9b56fb224450ad152fae632527 0 1764756270000 0 connected
4b8c8333f0bdef2acc84869166a82e8cb81e828a 10.2.10.172:6379@1122 master - 0 1764756268000 2 connected 10923-16383
546af11d185ea9f64aa12af138b0e413532477ed 10.2.10.169:6379@1122 slave fbb9018ab8a1e7774ea8f947518c45f56057cd2b 0 1764756269000 3 connected
fe207aa28dd47b9b56fb224450ad152fae632527 10.2.10.13:6379@1122 master - 0 1764756270923 0 connected 0-5461
fbb9018ab8a1e7774ea8f947518c45f56057cd2b 10.2.10.9:6379@1122 master - 0 1764756269000 3 connected 5462-10922
49f51c2e56f8c516a77064d26527daae9dabd821 10.2.10.150:6379@1122 slave 4b8c8333f0bdef2acc84869166a82e8cb81e828a 0 1764756269914 2 connected
2025-12-03 10:04:31 INF redisClusterWriter connected to redis cluster successful. addresses=[10.2.10.172:6379 10.2.10.13:6379 10.2.10.9:6379]
2025-12-03 10:04:31 INF start syncing...
2025-12-03 10:04:31 INF [reader_10.1.1.31_6379] source db is not doing bgsave! continue.
2025-12-03 10:04:31 INF [reader_10.1.1.32_6379] source db is not doing bgsave! continue.
2025-12-03 10:04:31 INF [reader_10.1.1.33_6379] source db is not doing bgsave! continue.
2025-12-03 10:04:36 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, waiting bgsave
2025-12-03 10:04:41 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave
2025-12-03 10:04:46 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, waiting bgsave
2025-12-03 10:04:51 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, receiving rdb, size=[56 MiB/293 MiB]
2025-12-03 10:04:56 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave
2025-12-03 10:05:01 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, waiting bgsave
2025-12-03 10:05:06 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, receiving rdb, size=[88 MiB/293 MiB]
2025-12-03 10:05:16 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, waiting bgsave
2025-12-03 10:05:21 INF read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, receiving rdb, size=[120 MiB/293 MiB]
(lots of log lines omitted here)
2025-12-03 10:07:16 INF read_count=[3189549], read_ops=[232994.48], write_count=[3189548], write_ops=[232994.48], src-0, receiving rdb, size=[204 MiB/264 MiB]
2025-12-03 10:10:21 INF read_count=[44218279], read_ops=[43355.58], write_count=[44218277], write_ops=[43355.78], src-1, syncing rdb, size=[243 MiB/293 MiB]
2025-12-03 10:10:26 INF read_count=[44421017], read_ops=[40272.06], write_count=[44421016], write_ops=[40272.25], src-2, syncing rdb, size=[180 MiB/293 MiB]
2025-12-03 10:18:01 INF read_count=[64151444], read_ops=[41212.64], write_count=[64151444], write_ops=[41213.04], src-0, syncing rdb, size=[261 MiB/264 MiB]
2025-12-03 10:18:06 INF read_count=[64362597], read_ops=[41902.90], write_count=[64362597], write_ops=[41902.90], src-1, syncing aof, diff=[0]
2025-12-03 10:18:11 INF read_count=[64566606], read_ops=[41119.00], write_count=[64566605], write_ops=[41118.80], src-2, syncing rdb, size=[285 MiB/293 MiB]
2025-12-03 10:18:16 INF read_count=[64768590], read_ops=[40073.36], write_count=[64768590], write_ops=[40073.56], src-0, syncing aof, diff=[0]
2025-12-03 10:18:21 INF read_count=[64961378], read_ops=[38889.17], write_count=[64961377], write_ops=[38888.97], src-1, syncing aof, diff=[0]
2025-12-03 10:18:26 INF read_count=[65125571], read_ops=[32962.70], write_count=[65125571], write_ops=[32962.90], src-2, syncing aof, diff=[0]
2025-12-03 10:18:31 INF read_count=[65125571], read_ops=[0.00], write_count=[65125571], write_ops=[0.00], src-0, syncing aof, diff=[0]
2025-12-03 10:18:36 INF read_count=[65125571], read_ops=[0.00], write_count=[65125571], write_ops=[0.00], src-1, syncing aof, diff=[0]
2025-12-03 10:18:41 INF read_count=[65125571], read_ops=[0.00], write_count=[65125571], write_ops=[0.00], src-2, syncing aof, diff=[0]

Log in to one node and you can see the key count still growing:

t20251201.ioudks.clustercfg.apne1.cache.amazonaws.com:6379> info keyspace
# Keyspace
db0:keys=583406,expires=0,avg_ttl=0
t20251201.ioudks.clustercfg.apne1.cache.amazonaws.com:6379> info keyspace
# Keyspace
db0:keys=583785,expires=0,avg_ttl=0
...
t20251201.ioudks.clustercfg.apne1.cache.amazonaws.com:6379> info keyspace
# Keyspace
db0:keys=732815,expires=0,avg_ttl=0
t20251201.ioudks.clustercfg.apne1.cache.amazonaws.com:6379> info keyspace
# Keyspace
db0:keys=733346,expires=0,avg_ttl=0
...
t20251201.ioudks.clustercfg.apne1.cache.amazonaws.com:6379> info keyspace
# Keyspace
db0:keys=790048,expires=0,avg_ttl=0
t20251201.ioudks.clustercfg.apne1.cache.amazonaws.com:6379> info keyspace
# Keyspace
db0:keys=790048,expires=0,avg_ttl=0

Once the progress output goes static at write_ops=[0.00], it has entered the continuous sync phase; a new key written on-premises will now show up in the cloud, which we already verified earlier, so I won't repeat it.

Pick a random key to verify; looks fine:

redis-cli -c -h 10.1.1.31 randomkey  
"house_862255_977410"
redis-cli -h 10.1.1.31 hgetall house_862255_977410
39) "field_20"
40) "value_5799_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

redis-cli -c -h t20251201.ioudks.clustercfg.apne1.cache.amazonaws.com hgetall house_862255_977410
39) "field_20"
40) "value_5799_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

Finally, CTRL+C to stop redis-shake, and that completes migrating an on-premises Redis cluster to the cloud with redis-shake.

Each of these four methods has its own strengths; here's a comparison:

⇧⇧⇧

  1. RDB file migration (restore from S3)

Pros: simple to operate and suited to one-off migrations; no sustained network connection required; high data integrity.

Cons: requires downtime; S3 permissions have to be configured (the ACL setup is a bit fiddly).

  2. AWS ElastiCache online migration

Pros: officially supported by AWS, stable and reliable; friendly console workflow.

Cons: needs a cutover window (it is not an HA-style switchover); clusters with encryption in transit are not supported; the on-premises Redis must be reachable from the cloud, which calls for a stable network connection (VPN recommended).

  3. RIOT (the official tool)

Pros: Redis's own tool, so it carries some authority; supports live synchronization; cross-platform.

Cons: no data-consistency guarantee (pub/sub based); may miss data that updates at high frequency; requires keyspace notifications to be enabled.

  4. redis-shake

Pros: PSYNC-based, so data consistency is strong; transfers in batches, suited to large data volumes; multiple sync modes; strong real-time synchronization.

Cons: needs a dedicated middleman server, and with large data volumes it must be given enough resources.

To sum up:

The key factors for a successful migration:

  • Network stability and bandwidth.
  • Resource sizing of the middleman server.
  • Data volume and update frequency.
  • The business's tolerance for downtime: a full outage, or just a window for the cutover.

Bonus wins:

  • A low on-premises version can be upgraded to a newer one along the way.
  • An on-premises setup without TLS can have TLS enabled after the migration.

So, does AWS's online migration tool work against a third-party cloud Redis server that already has PSYNC enabled?

答案:不行,因為該工具會調用CONFIG 命令,開啟sync/psync 並不會解禁CONFIG。