How to Use Terraform for Rolling Updates of a WordPress Spot Instance

Anyone who uses AWS Spot Instances knows they are cheap: some instance types cost up to 90% less than On-Demand.

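Spot requests come in two modes, one-time and persistent. When AWS has plenty of spare capacity, a persistent request can keep you running at a low price for a long time. The problem is that AWS capacity changes dynamically. Say you run a persistent c5.large Spot Instance: when others launch more c5.large On-Demand capacity, your Spot Instance is forcibly reclaimed, and the outage can last anywhere from a few minutes to several hours, with no way to predict it. At that point the only fix is to manually create an AMI and launch a new instance of a different instance type from it, which usually works.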

So how can a Spot Instance keep itself alive across different instance types automatically, with no manual intervention?

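When others launch more On-Demand capacity, they can hardly exhaust every instance type at once. Following that idea, we can use a Spot Fleet request and list several different instance types in the fleet.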

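Which instance types should you pick, then? You might assume t3/t4g would be the cheapest, but that is not how Spot works: Spot prices follow supply and demand, so the less a type is used, the cheaper it tends to be. Here is an AWS CLI query for some ARM64 types in the Osaka region (ap-northeast-3):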

aws ec2 describe-spot-price-history \
  --region ap-northeast-3 \
  --instance-types t4g.medium c6g.medium c6gd.medium c7g.medium c7gd.medium c8g.medium m6g.medium m6gd.medium m7g.medium m8g.medium r6g.medium r7g.medium r8g.medium \
  --product-descriptions "Linux/UNIX" \
  --start-time $(date -u +"%Y-%m-%dT%H:%M:%SZ") \
  --end-time $(date -u +"%Y-%m-%dT%H:%M:%SZ") \
  --query "sort_by(SpotPriceHistory[?AvailabilityZone=='ap-northeast-3a' || AvailabilityZone=='ap-northeast-3c'], &SpotPrice)[*].[InstanceType,SpotPrice,AvailabilityZone]" \
  --output table

------------------------------------------------
|           DescribeSpotPriceHistory           |
+-------------+------------+-------------------+
|  c8g.medium |  0.012400  |  ap-northeast-3c  |
|  c8g.medium |  0.012500  |  ap-northeast-3a  |
|  m6g.medium |  0.012600  |  ap-northeast-3c  |
|  c7g.medium |  0.013600  |  ap-northeast-3a  |
|  m6g.medium |  0.014200  |  ap-northeast-3a  |
|  c7gd.medium|  0.014200  |  ap-northeast-3a  |
|  m8g.medium |  0.014500  |  ap-northeast-3c  |
|  m8g.medium |  0.014500  |  ap-northeast-3a  |
|  m6gd.medium|  0.014600  |  ap-northeast-3a  |
|  m6gd.medium|  0.014800  |  ap-northeast-3c  |
|  m7g.medium |  0.014800  |  ap-northeast-3c  |
|  m7g.medium |  0.015000  |  ap-northeast-3a  |
|  r6g.medium |  0.015200  |  ap-northeast-3c  |
|  r6g.medium |  0.015300  |  ap-northeast-3a  |
|  r7g.medium |  0.016300  |  ap-northeast-3a  |
|  c6gd.medium|  0.017400  |  ap-northeast-3c  |
|  c6g.medium |  0.017700  |  ap-northeast-3c  |
|  c6g.medium |  0.017700  |  ap-northeast-3a  |
|  r7g.medium |  0.018200  |  ap-northeast-3c  |
|  r8g.medium |  0.018200  |  ap-northeast-3a  |
|  c7g.medium |  0.018200  |  ap-northeast-3c  |
|  r8g.medium |  0.018400  |  ap-northeast-3c  |
|  c6gd.medium|  0.018700  |  ap-northeast-3a  |
|  t4g.medium |  0.019800  |  ap-northeast-3c  |
|  t4g.medium |  0.020100  |  ap-northeast-3a  |
+-------------+------------+-------------------+
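
To turn this into a short candidate list for the fleet, a sketch like the one below keeps only the cheapest Availability Zone per instance type. It assumes the same query is run with --output text instead of --output table, which prints one tab-separated "type price az" row per line; cheapest_per_type is a hypothetical helper name, not part of the AWS CLI.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "instance-type price az" rows on stdin (the shape of
# the describe-spot-price-history query above with --output text) and print the
# cheapest price per instance type, cheapest first.
cheapest_per_type() {
  awk '
    # Remember the lowest price (and its AZ) seen so far for each instance type
    !($1 in best) || $2 + 0 < best[$1] + 0 { best[$1] = $2; az[$1] = $3 }
    END { for (t in best) print t, best[t], az[t] }
  ' | LC_ALL=C sort -k2,2n -k1,1
}
```

Usage: append `| cheapest_per_type` to the query above after switching it to --output text. LC_ALL=C keeps sort from misreading the decimal point in locales that use a comma.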

The On-Demand price of t4g.small is $0.0218 per hour, and every Spot price above is lower than that. Why not use them?
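
To put these hourly figures in monthly terms, multiply by about 730 hours, the month length the AWS Pricing Calculator assumes. The sketch below is illustrative arithmetic only; monthly_cost is a hypothetical helper, and since Spot prices move, the result is a snapshot rather than a quote.

```bash
#!/usr/bin/env bash
# Illustrative helper: convert an hourly USD price into an approximate monthly
# cost, assuming 730 hours per month.
monthly_cost() {
  awk -v p="$1" 'BEGIN { printf "%.2f\n", p * 730 }'
}

monthly_cost 0.0124   # cheapest c8g.medium Spot price in the table -> 9.05
monthly_cost 0.0218   # t4g.small On-Demand price quoted above     -> 15.91
```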

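For sustained CPU-bound workloads, choose the C/M families; for workloads optimized for multithreading, choose t4g.medium (it has 2 vCPUs while the other .medium types have 1, but it is a burstable type that runs on CPU credits); for memory-first workloads, choose the R/M families.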

Note, however, that m8g/c8g run on the newer Graviton4 chip. Even with a single vCPU, its faster cores may still outperform a t4g with two Graviton2 vCPUs, and it is cheaper as well.

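ARM64 machines on every major cloud are something of a black box: there are no official benchmarks of real-world performance. My guess is that performance varies widely between workloads, because mainstream development still focuses mostly on x86_64.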

First, create a launch template that uses an AMI made from your current EC2 instance, and note the launch template ID; you will need it later.

Then, under Spot Requests, click Create Spot Fleet request, and start by choosing to use a launch template.

Set Target capacity to whatever you actually need; for a blog like mine, 1 is enough.

For Network, if it is already set in the launch template you can skip it here; do not set it in more than one place, to avoid conflicts.

Next comes the most important part, Instance type requirements: choose Manually select instance types, then use Add instance types to add the types you want.

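For Allocation strategy, keep the default, price capacity optimized, which picks the lowest-priced pools among those with the most available capacity (it appears as priceCapacityOptimized in the JSON below).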

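Do not click Create yet. Next to Launch there is a JSON config button; click it to download the config to your machine and save it as ~/Downloads/fleet.json.

Then create the Spot Fleet directly with the following AWS CLI command:

aws ec2 request-spot-fleet --spot-fleet-request-config file://$HOME/Downloads/fleet.json --region ap-northeast-3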

For reference, the exported fleet.json looks like this:

{
    "IamFleetRole": "arn:aws:iam::123454567890:role/aws-ec2-spot-fleet-tagging-role",
    "AllocationStrategy": "priceCapacityOptimized",
    "TargetCapacity": 1,
    "ValidFrom": "2026-01-06T08:19:06.000Z",
    "ValidUntil": "2027-01-06T08:19:06.000Z",
    "TerminateInstancesWithExpiration": true,
    "Type": "maintain",
    "OnDemandAllocationStrategy": "lowestPrice",
    "LaunchSpecifications": [],
    "LaunchTemplateConfigs": [
        {
            "LaunchTemplateSpecification": {
                "LaunchTemplateId": "lt-0716c882cb57a921d",
                "Version": "$Latest"
            },
            "Overrides": [
                {
                    "InstanceType": "m6g.medium",
                    "WeightedCapacity": 1,
                    "SubnetId": "subnet-0d2535d7e7cacf658",
                    "SpotPrice": "0.017"
                },
                {
                    "InstanceType": "m6g.medium",
                    "WeightedCapacity": 1,
                    "SubnetId": "subnet-0c255e6c5c642da2a",
                    "SpotPrice": "0.017"
                },
                {
                    "InstanceType": "m6g.medium",
                    "WeightedCapacity": 1,
                    "SubnetId": "subnet-06ee85eed40f5261b",
                    "SpotPrice": "0.017"
                },
                {
                    "InstanceType": "m7g.medium",
                    "WeightedCapacity": 1,
                    "SubnetId": "subnet-0d2535d7e7cacf658",
                    "SpotPrice": "0.017"
                },
                {
                    "InstanceType": "m7g.medium",
                    "WeightedCapacity": 1,
                    "SubnetId": "subnet-0c255e6c5c642da2a",
                    "SpotPrice": "0.017"
                },
                {
                    "InstanceType": "m7g.medium",
                    "WeightedCapacity": 1,
                    "SubnetId": "subnet-06ee85eed40f5261b",
                    "SpotPrice": "0.017"
                },
                {
                    "InstanceType": "m8g.medium",
                    "WeightedCapacity": 1,
                    "SubnetId": "subnet-0d2535d7e7cacf658",
                    "SpotPrice": "0.017"
                },
                {
                    "InstanceType": "m8g.medium",
                    "WeightedCapacity": 1,
                    "SubnetId": "subnet-0c255e6c5c642da2a",
                    "SpotPrice": "0.017"
                },
                {
                    "InstanceType": "m8g.medium",
                    "WeightedCapacity": 1,
                    "SubnetId": "subnet-06ee85eed40f5261b",
                    "SpotPrice": "0.017"
                },
                {
                    "InstanceType": "r6g.medium",
                    "WeightedCapacity": 1,
                    "SubnetId": "subnet-0d2535d7e7cacf658",
                    "SpotPrice": "0.017"
                },
                {
                    "InstanceType": "r6g.medium",
                    "WeightedCapacity": 1,
                    "SubnetId": "subnet-0c255e6c5c642da2a",
                    "SpotPrice": "0.017"
                },
                {
                    "InstanceType": "r6g.medium",
                    "WeightedCapacity": 1,
                    "SubnetId": "subnet-06ee85eed40f5261b",
                    "SpotPrice": "0.017"
                },
                {
                    "InstanceType": "r7g.medium",
                    "WeightedCapacity": 1,
                    "SubnetId": "subnet-0d2535d7e7cacf658",
                    "SpotPrice": "0.017"
                },
                {
                    "InstanceType": "r7g.medium",
                    "WeightedCapacity": 1,
                    "SubnetId": "subnet-0c255e6c5c642da2a",
                    "SpotPrice": "0.017"
                },
                {
                    "InstanceType": "r7g.medium",
                    "WeightedCapacity": 1,
                    "SubnetId": "subnet-06ee85eed40f5261b",
                    "SpotPrice": "0.017"
                },
                {
                    "InstanceType": "r8g.medium",
                    "WeightedCapacity": 1,
                    "SubnetId": "subnet-0d2535d7e7cacf658",
                    "SpotPrice": "0.017"
                },
                {
                    "InstanceType": "r8g.medium",
                    "WeightedCapacity": 1,
                    "SubnetId": "subnet-0c255e6c5c642da2a",
                    "SpotPrice": "0.017"
                },
                {
                    "InstanceType": "r8g.medium",
                    "WeightedCapacity": 1,
                    "SubnetId": "subnet-06ee85eed40f5261b",
                    "SpotPrice": "0.017"
                },
                {
                    "InstanceType": "t4g.medium",
                    "WeightedCapacity": 1,
                    "SubnetId": "subnet-0d2535d7e7cacf658",
                    "SpotPrice": "0.017"
                },
                {
                    "InstanceType": "t4g.medium",
                    "WeightedCapacity": 1,
                    "SubnetId": "subnet-0c255e6c5c642da2a",
                    "SpotPrice": "0.017"
                },
                {
                    "InstanceType": "t4g.medium",
                    "WeightedCapacity": 1,
                    "SubnetId": "subnet-06ee85eed40f5261b",
                    "SpotPrice": "0.017"
                }
            ]
        }
    ]
}
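
The 21 entries under Overrides are simply every combination of 7 instance types and 3 subnets, all sharing the same max SpotPrice of 0.017. If you want to add or drop a type without hand-editing the JSON, a small generator such as the hypothetical gen_overrides below can print the array; the subnet IDs in the usage line are the ones from the config above, and you would still paste the output into fleet.json yourself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a Spot Fleet "Overrides" JSON array containing one
# entry per (instance type, subnet) pair, all with the same max SpotPrice.
# Usage: gen_overrides MAX_PRICE "TYPE ..." "SUBNET ..."
gen_overrides() {
  local max_price="$1" types="$2" subnets="$3"
  local sep="" t s
  echo "["
  # $types and $subnets are deliberately unquoted so they split on spaces
  for t in $types; do
    for s in $subnets; do
      # The separator is empty before the first entry and ",<newline>" afterwards
      printf '%s  {"InstanceType": "%s", "WeightedCapacity": 1, "SubnetId": "%s", "SpotPrice": "%s"}' \
        "$sep" "$t" "$s" "$max_price"
      sep=$',\n'
    done
  done
  printf '\n]\n'
}

gen_overrides 0.017 \
  "m6g.medium m7g.medium m8g.medium r6g.medium r7g.medium r8g.medium t4g.medium" \
  "subnet-0d2535d7e7cacf658 subnet-0c255e6c5c642da2a subnet-06ee85eed40f5261b"
```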

Done!

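Note down the request ID, sfr-c7ccc145-d71d-4268-8b67-089161e02af6; you will need it later. Also keep in mind that the request only runs until ValidUntil, and because TerminateInstancesWithExpiration is true, its instances are terminated when the request expires.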

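Now, when others launch more On-Demand capacity and your current Spot Instance is reclaimed, Spot Fleet (the request Type is maintain) launches a new Spot Instance from one of these instance types. It is very unlikely that Spot capacity runs out for all of them at the same time. Note that each override also carries a max price (SpotPrice 0.017): at the prices shown earlier, pools above that cap, such as t4g.medium at about $0.020 per hour, will not be used until their price drops below it.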

So how do we handle updates, such as security patches or upgrading nginx, PHP, or WordPress?

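WordPress updates very frequently, and I do not want to publish a new post only to have the instance terminated right afterwards, so all WordPress files are kept on EFS, where they survive any instance replacement.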

nginx and PHP cannot live on EFS, though, so how can Terraform automate their updates?

Following this line of thought, let's first list the steps involved.

First, after changing nginx, PHP, or the OS, create an AMI from the current instance ID and give the AMI a name:

aws ec2 create-image \
    --instance-id i-0716c882cb57a921d \
    --name "web2026" \
    --no-reboot \
    --tag-specifications 'ResourceType=image,Tags=[{Key=auto-delete,Value=no}]' \
    --region ap-northeast-3
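
One thing to watch: an AMI name must be unique within your account and Region, so running create-image a second time with the fixed name web2026 fails. A simple workaround, the same idea the Terraform configuration later in this post uses with formatdate and timestamp(), is to stamp the name with the current UTC time. ami_name is a hypothetical helper for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build an AMI name such as web-20260106-0819 from the
# current UTC time (unique as long as you create at most one AMI per minute).
ami_name() {
  local prefix="${1:-web}"
  printf '%s-%s\n' "$prefix" "$(date -u +%Y%m%d-%H%M)"
}

# Example, same flags as above with only the name changed:
# aws ec2 create-image --instance-id i-0716c882cb57a921d --name "$(ami_name)" \
#     --no-reboot --region ap-northeast-3
```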

Then update the launch template to use this AMI and make the new version the default:

# Create a new version with the updated AMI (fill in the AMI ID returned by create-image)
# and keep the new version number
VERSION=$(aws ec2 create-launch-template-version \
    --launch-template-id lt-0716c882cb57a921d \
    --source-version '$Latest' \
    --launch-template-data '{"ImageId":"ami-"}' \
    --region ap-northeast-3 \
    --query 'LaunchTemplateVersion.VersionNumber' --output text)

# Set the new version as the default
aws ec2 modify-launch-template \
    --launch-template-id lt-0716c882cb57a921d \
    --default-version "$VERSION" \
    --region ap-northeast-3

# Take the AMI ID from the create-image step and check its progress (ready when it prints available)
aws ec2 describe-images \
    --image-ids ami-xxxxxxxx \
    --region ap-northeast-3 \
    --query 'Images[0].State' \
    --output text
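
Rather than re-running describe-images by hand, you can poll it. In the sketch below, wait_for_state is a hypothetical helper that retries a command until it prints the expected state, and gives up early if the state becomes failed or error, or after a maximum number of attempts. The AWS CLI also has a built-in waiter, aws ec2 wait image-available, if you prefer not to write your own loop.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command repeatedly until it prints the expected
# value. Returns 0 on success, 1 if the value is "failed"/"error" or if
# MAX_TRIES attempts are used up.
# Usage: wait_for_state EXPECTED MAX_TRIES INTERVAL_SECONDS command [args...]
wait_for_state() {
  local expected="$1" max_tries="$2" interval="$3"
  shift 3
  local i state
  for ((i = 1; i <= max_tries; i++)); do
    state="$("$@")"
    case "$state" in
      "$expected") return 0 ;;
      failed|error) echo "state is $state, giving up" >&2; return 1 ;;
    esac
    echo "attempt $i/$max_tries: state=$state, retrying in ${interval}s" >&2
    sleep "$interval"
  done
  return 1
}

# Example: wait up to 40 x 30 s (20 minutes) for the AMI. --output text matters,
# because the default JSON output would print "available" with quotes.
# wait_for_state available 40 30 aws ec2 describe-images --image-ids ami-xxxxxxxx \
#     --region ap-northeast-3 --query 'Images[0].State' --output text
```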

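If the AMI was created successfully (its state is available), terminate the old instance. Spot Fleet then launches a new Spot Instance from the new AMI in the launch template, since the fleet config references the template as $Latest.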

To automate the flow above, first install Terraform and check its version:

terraform -v
Terraform v1.5.7 on darwin_arm64

Create a working directory and change into it: mkdir ami-update-workflow && cd ami-update-workflow

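Create the main.tf configuration file. What you need to change: the correct Region, the launch template ID created earlier, the Spot Fleet request ID, and the root volume in --block-device-mappings (here /dev/xvda, 10 GiB gp3), which should match your instance: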

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "ap-northeast-3"
}

# Get instance ID from spot fleet using external data source
data "external" "spot_instance" {
  program = ["bash", "-c", <<-EOT
    INSTANCE_ID=$(aws ec2 describe-spot-fleet-instances \
      --spot-fleet-request-id sfr-c7ccc145-d71d-4268-8b67-089161e02af6 \
      --region ap-northeast-3 \
      --query 'ActiveInstances[0].InstanceId' --output text)
    echo "{\"instance_id\":\"$INSTANCE_ID\"}"
  EOT
  ]
}

# Create AMI with correct EBS settings using AWS CLI
resource "null_resource" "create_ami_with_ebs" {
  provisioner "local-exec" {
    command = <<-EOT
      # Abort instead of writing an empty AMI ID if create-image fails
      set -e
      AMI_ID=$(aws ec2 create-image \
        --instance-id ${data.external.spot_instance.result.instance_id} \
        --name "web-${formatdate("YYYYMMDD-hhmm", timestamp())}" \
        --no-reboot \
        --block-device-mappings '[{"DeviceName":"/dev/xvda","Ebs":{"DeleteOnTermination":true,"VolumeSize":10,"VolumeType":"gp3"}}]' \
        --tag-specifications 'ResourceType=image,Tags=[{Key=auto-delete,Value=no}]' \
        --region ap-northeast-3 \
        --query 'ImageId' --output text)
      echo "{\"ami_id\":\"$AMI_ID\"}" > ami_output.json
    EOT
  }
}

# Get the AMI ID from the output file
data "local_file" "ami_output" {
  depends_on = [null_resource.create_ami_with_ebs]
  filename   = "ami_output.json"
}

locals {
  ami_data = jsondecode(data.local_file.ami_output.content)
  ami_id   = local.ami_data.ami_id
}

# Update launch template and terminate instance
resource "null_resource" "update_launch_template" {
  depends_on = [data.local_file.ami_output]

  provisioner "local-exec" {
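    command = <<-EOT
      # Abort before terminating anything if a step fails
      set -e

      # Wait for the AMI to be available; give up if it ends up failed
      while true; do
        STATE=$(aws ec2 describe-images --image-ids ${local.ami_id} --query 'Images[0].State' --output text --region ap-northeast-3 || true)
        if [ "$STATE" = "available" ]; then
          break
        fi
        if [ "$STATE" = "failed" ] || [ "$STATE" = "error" ]; then
          echo "AMI ${local.ami_id} is in state $STATE" >&2
          exit 1
        fi
        echo "Waiting for AMI to be available..."
        sleep 30
      done

      # Create a new launch template version with the new AMI and keep its number
      VERSION=$(aws ec2 create-launch-template-version \
        --launch-template-id lt-0716c882cb57a921d \
        --source-version '$Latest' \
        --launch-template-data '{"ImageId":"${local.ami_id}"}' \
        --region ap-northeast-3 \
        --query 'LaunchTemplateVersion.VersionNumber' --output text)

      # Set that version as the default
      aws ec2 modify-launch-template \
        --launch-template-id lt-0716c882cb57a921d \
        --default-version "$VERSION" \
        --region ap-northeast-3

      # Terminate the old instance; the fleet (Type maintain) launches a replacement
      aws ec2 terminate-instances \
        --instance-ids ${data.external.spot_instance.result.instance_id} \
        --region ap-northeast-3
    EOT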
  }
}

output "ami_id" {
  value = local.ami_id
}

output "instance_id" {
  value = data.external.spot_instance.result.instance_id
}
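
One weak spot in the external data source above: if the fleet has no running instance at that moment (for example right after an interruption), ActiveInstances is empty and --output text prints None, which then flows into create-image and terminate-instances. The external data source treats a non-zero exit status as an error, so a guard like the hypothetical emit_instance_json below, defined in that bash script and called as its last command in place of the final echo, would stop the run at plan time instead.

```bash
#!/usr/bin/env bash
# Hypothetical guard for the external data source: print {"instance_id":"..."}
# only if the argument looks like an EC2 instance ID (i- followed by 8 or 17
# hex digits); otherwise write an error to stderr and return non-zero.
emit_instance_json() {
  local id="$1"
  if [[ "$id" =~ ^i-([0-9a-f]{8}|[0-9a-f]{17})$ ]]; then
    printf '{"instance_id":"%s"}\n' "$id"
  else
    echo "no active Spot Fleet instance found (got: '$id')" >&2
    return 1
  fi
}

# In the data source script this would replace the final echo:
#   emit_instance_json "$INSTANCE_ID"
```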

In the same directory, create run.sh with the following content:

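#!/usr/bin/env bash
# Start from an empty state every time, so Terraform treats both null_resources
# as new and runs their local-exec steps again on the next apply
rm -f terraform.tfstate*
rm -rf .terraform/
terraform init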

Make it executable and run it:

chmod +x run.sh
./run.sh

Initializing the backend...
......
Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Run terraform plan:

......

Plan: 2 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + ami_id      = (known after apply)
  + instance_id = "i-0ef93b2b937468a98"

───────────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so Terraform can't
guarantee to take exactly these actions if you run "terraform apply" now.

Then run terraform apply (you can also skip plan and go straight to this step):

......
null_resource.update_launch_template: Creation complete after 3m19s [id=3702937929507644703]

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

Outputs:

ami_id = "ami-0dd89a88bf12cb2bd"
instance_id = "i-0ef93b2b937468a98"

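When it finishes, the Spot Instance has been captured as a new AMI, a new launch template version using that AMI has been set as the default, and finally the current Spot Instance has been terminated so that the fleet replaces it.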

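It has been running stably so far with no problems found. An Auto Scaling group (ASG) could probably achieve the same goal; let's discuss that next time.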

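The CIA has been very active lately, releasing several video ads aimed at recruiting informants in China. Many people believe this means its agents in China have all been executed. I disagree: I think it is a smokescreen deliberately released to protect some important intelligence source. When everyone around you looks like a "500k" (Chinese slang for a spy), the real one stays hidden.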