Category: Tech

  • How to use Lambda to perform rolling updates of a WordPress Spot instance

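    The AI's verdict was that Terraform is not a good fit for this scenario and that Lambda is more suitable. Fair enough. You can also run the same script on an EC2 instance or locally, which saves you the Lambda cost.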

    Add a Lambda function named RotateSpotInstance and raise its Timeout to 15 min (the maximum Lambda allows), because the process can take a long time, especially the AMI creation step.
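    A quick back-of-the-envelope check shows why the 15-minute limit matters. The sketch below simply adds up the maximum waits configured in the function further down (AMI waiter Delay 30 s × MaxAttempts 40, the fixed 60 s pause after updating the template, and wait_for_new_instance's max_wait of 600 s); API call latency is ignored, and fits_in_lambda is a hypothetical helper name.

```python
# Worst-case wall-clock budget of the rotation function, using the wait
# settings from the code (API call latency is ignored).
LAMBDA_MAX_TIMEOUT_S = 15 * 60          # Lambda's upper limit for Timeout

ami_waiter_s = 30 * 40                   # image_available waiter: Delay x MaxAttempts
template_pause_s = 60                    # time.sleep(60) after updating the template
new_instance_wait_s = 600                # wait_for_new_instance(max_wait=600)

worst_case_s = ami_waiter_s + template_pause_s + new_instance_wait_s


def fits_in_lambda(total_s, limit_s=LAMBDA_MAX_TIMEOUT_S):
    """Return True if the summed waits fit inside the Lambda timeout."""
    return total_s <= limit_s


print(f"worst case: {worst_case_s}s vs limit {LAMBDA_MAX_TIMEOUT_S}s")
print("fits" if fits_in_lambda(worst_case_s) else "may time out")
```

    In practice the waits rarely all hit their maximums, but keep in mind that when Lambda reaches its timeout the invocation is simply stopped: the except block's capacity rollback does not run, so a timeout during step 6 could leave the fleet at capacity 2.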

    Add the following permissions to the auto-generated IAM role RotateSpotInstance-role-mhe3v2sg. Why these? Each one maps to an API call made in the steps listed below –

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "ec2:DescribeSpotFleetInstances",
            "ec2:CreateImage",
            "ec2:DescribeImages",
            "ec2:CreateLaunchTemplateVersion",
            "ec2:ModifyLaunchTemplate",
            "ec2:DescribeLaunchTemplates",
            "ec2:ModifySpotFleetRequest",
            "ec2:TerminateInstances",
            "ec2:CreateTags",
            "ec2:DescribeInstanceStatus",
            "ec2:DescribeInstances"
          ],
          "Resource": "*"
        }
      ]
    }
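    One way to cross-check the policy is to derive the action list from the boto3 calls the function makes. The sketch below assumes the usual boto3 convention that a client method name is the snake_case form of the API operation (describe_instances → DescribeInstances). The call list is transcribed by hand from the code below, and the image_available waiter appears as describe_images because that is the call it polls. to_action and build_policy are hypothetical helper names.

```python
import json

# boto3 EC2 client methods used by the function (transcribed by hand);
# the 'image_available' waiter polls DescribeImages, so it is listed too.
BOTO3_CALLS = [
    "describe_spot_fleet_instances",
    "create_image",
    "describe_images",
    "create_launch_template_version",
    "modify_launch_template",
    "describe_launch_templates",
    "modify_spot_fleet_request",
    "describe_instance_status",
    "describe_instances",
    "terminate_instances",
]


def to_action(method_name, service="ec2"):
    """Convert a boto3 snake_case method name into an IAM action name."""
    return service + ":" + "".join(part.capitalize() for part in method_name.split("_"))


def build_policy(calls, extra_actions=()):
    """Build an identity policy allowing every action in calls plus extras."""
    actions = sorted({to_action(c) for c in calls} | set(extra_actions))
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": "*"}],
    }


# create_image with TagSpecifications (tag on create) also needs ec2:CreateTags.
policy = build_policy(BOTO3_CALLS, extra_actions=["ec2:CreateTags"])
print(json.dumps(policy, indent=2))
```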

    Set the Lambda environment variables (Configuration → Environment variables) to match your own resources, for example:

    LAUNCH_TEMPLATE_ID = lt-0716c882cb57a921d
    SPOT_FLEET_ID = sfr-c7ccc145-d71d-4268-8b67-089161e02af6
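    A typo in these values may only surface after an AMI has already been created, so it can help to validate them up front. The sketch below is a hypothetical helper (load_config is not part of the original function); it only checks the lt- and sfr- prefixes these IDs carry, not whether the resources actually exist.

```python
import os


def load_config(env=None):
    """Read and sanity-check the two environment variables the function needs.

    Hypothetical helper: it only checks the ID prefixes, not that the
    launch template or Spot Fleet request actually exists.
    """
    env = os.environ if env is None else env
    lt_id = env.get("LAUNCH_TEMPLATE_ID", "").strip()
    fleet_id = env.get("SPOT_FLEET_ID", "").strip()

    errors = []
    if not lt_id.startswith("lt-"):
        errors.append(f"LAUNCH_TEMPLATE_ID should start with 'lt-', got {lt_id!r}")
    if not fleet_id.startswith("sfr-"):
        errors.append(f"SPOT_FLEET_ID should start with 'sfr-', got {fleet_id!r}")
    if errors:
        raise ValueError("; ".join(errors))
    return {"LAUNCH_TEMPLATE_ID": lt_id, "SPOT_FLEET_ID": fleet_id}
```

    Calling load_config() at the top of lambda_handler, instead of reading os.environ directly, turns a misconfiguration into an immediate error before any AMI is created.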

    The function does the job in the following steps –

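    # 1. Get current instance – get the ID of the running instance from the current Spot Fleet.
    # 2. Create AMI – create an AMI from the running instance.
    # 3. Wait for AMI – poll until the AMI has finished being created.
    # 4. Update Launch Template – create a new launch template version with the new AMI ID and make it the default.
    # 5. Increase capacity – set the Spot Fleet request's Total target capacity to 2, so that a new instance is launched first.
    # 6. Wait for new instance to be healthy – wait until the new instance is running, uses the new AMI, and answers on HTTP port 80.
    # 7. Terminate old instance – once the new instance passes the health check, terminate the old one.
    # 8. Restore capacity – immediately set Total target capacity back to 1, so that no further instances are launched.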

    import boto3
    import os
    import time
    import urllib.request
    import urllib.error
    from datetime import datetime
    
    
    def lambda_handler(event, context):
        SPOT_FLEET_ID = os.environ['SPOT_FLEET_ID']
        LAUNCH_TEMPLATE_ID = os.environ['LAUNCH_TEMPLATE_ID']
    
        ec2 = boto3.client('ec2')  # automatically uses the region the Lambda runs in
    
        try:
            print(f"Starting rotation for Spot Fleet: {SPOT_FLEET_ID}")
    
            # 1. Get the current instance
            response = ec2.describe_spot_fleet_instances(
                SpotFleetRequestId=SPOT_FLEET_ID
            )
    
            if not response['ActiveInstances']:
                return {'statusCode': 400, 'error': 'No active instances'}
    
            instance_id = response['ActiveInstances'][0]['InstanceId']
            print(f"Current instance: {instance_id}")
    
            # 2. Create the AMI
            ami_name = f"wordpress-{datetime.now().strftime('%Y%m%d-%H%M')}"
            ami_response = ec2.create_image(
                InstanceId=instance_id,
                Name=ami_name,
                NoReboot=True,
                TagSpecifications=[{
                    'ResourceType': 'image',
                    'Tags': [{'Key': 'auto-delete', 'Value': 'no'}]
                }]
            )
            ami_id = ami_response['ImageId']
            print(f"Creating AMI: {ami_id}")
    
            # 3. Wait until the AMI is available
            print("Waiting for AMI to be available...")
            waiter = ec2.get_waiter('image_available')
            waiter.wait(
                ImageIds=[ami_id],
                WaiterConfig={'Delay': 30, 'MaxAttempts': 40}
            )
            print(f"AMI {ami_id} is available")
    
            # 4. Update the Launch Template (the key fix)
            # 4a. Create a new version and capture its actual version number
            new_version_response = ec2.create_launch_template_version(
                LaunchTemplateId=LAUNCH_TEMPLATE_ID,
                SourceVersion='$Latest',
                LaunchTemplateData={'ImageId': ami_id}
            )
            new_version_number = str(
                new_version_response['LaunchTemplateVersion']['VersionNumber']
            )
            print(f"Created launch template version {new_version_number} with AMI {ami_id}")
    
            # 4b. Set the default using the actual version number (not '$Latest')
            ec2.modify_launch_template(
                LaunchTemplateId=LAUNCH_TEMPLATE_ID,
                DefaultVersion=new_version_number
            )
    
            # 4c. Verify that the default version really changed
            lt_desc = ec2.describe_launch_templates(
                LaunchTemplateIds=[LAUNCH_TEMPLATE_ID]
            )
            actual_default = str(lt_desc['LaunchTemplates'][0]['DefaultVersionNumber'])
            if actual_default != new_version_number:
                raise Exception(
                    f"Default version mismatch: expected {new_version_number}, "
                    f"got {actual_default}"
                )
            print(f"Verified launch template default version = {actual_default}")
    
            # 4d. Give Spot Fleet a moment to pick up the new default
            print("Waiting 60s for Spot Fleet to pick up new launch template version...")
            time.sleep(60)
    
            # 5. Increase capacity; the new instance launches from the new AMI
            ec2.modify_spot_fleet_request(
                SpotFleetRequestId=SPOT_FLEET_ID,
                TargetCapacity=2
            )
            print("Increased capacity to 2")
    
            # 6. Wait for the new instance to become healthy and verify it runs the new AMI
            new_instance_id = wait_for_new_instance(
                ec2, SPOT_FLEET_ID, instance_id, expected_ami_id=ami_id
            )
            print(f"New instance {new_instance_id} is healthy and using new AMI")
    
            # 7. Terminate the old instance
            ec2.terminate_instances(InstanceIds=[instance_id])
            print(f"Terminated old instance: {instance_id}")
    
            # 8. Restore capacity
            ec2.modify_spot_fleet_request(
                SpotFleetRequestId=SPOT_FLEET_ID,
                TargetCapacity=1
            )
            print("Restored capacity to 1")
    
            return {
                'statusCode': 200,
                'ami_id': ami_id,
                'launch_template_version': new_version_number,
                'old_instance': instance_id,
                'new_instance': new_instance_id
            }
    
        except Exception as e:
            print(f"Error: {str(e)}")
            try:
                ec2.modify_spot_fleet_request(
                    SpotFleetRequestId=SPOT_FLEET_ID,
                    TargetCapacity=1
                )
                print("Rolled back capacity to 1")
            except Exception as rollback_error:
                print(f"Rollback failed: {rollback_error}")
            return {'statusCode': 500, 'error': str(e)}
    
    
    def check_http_port(ec2, instance_id, port=80):
        """Check via an HTTP HEAD request whether the port is reachable"""
        try:
            response = ec2.describe_instances(InstanceIds=[instance_id])
            instance = response['Reservations'][0]['Instances'][0]
            public_ip = instance.get('PublicIpAddress') or instance.get('PrivateIpAddress')
    
            if not public_ip:
                return False
    
            try:
                req = urllib.request.Request(f'http://{public_ip}:{port}', method='HEAD')
                urllib.request.urlopen(req, timeout=5)
                print(f"HTTP port {port} is responding on {public_ip}")
                return True
            except (urllib.error.URLError, urllib.error.HTTPError, TimeoutError):
                return False
    
        except Exception as e:
            print(f"Error checking HTTP port: {e}")
            return False
    
    
    def get_instance_ami(ec2, instance_id):
        """Return the AMI ID the instance was actually launched from"""
        try:
            resp = ec2.describe_instances(InstanceIds=[instance_id])
            return resp['Reservations'][0]['Instances'][0].get('ImageId')
        except Exception as e:
            print(f"Error getting AMI for {instance_id}: {e}")
            return None
    
    
    def wait_for_new_instance(ec2, spot_fleet_id, old_instance_id,
                              expected_ami_id=None, max_wait=600):
        """
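        Wait for the new instance to appear, reach the running state, and answer on HTTP.
        If expected_ami_id is given, verify that the new instance really uses that AMI,
        so that a launch from the old template is not mistaken for success.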
        """
        print("Waiting for new instance to be healthy...")
    
        for i in range(max_wait // 10):
            response = ec2.describe_spot_fleet_instances(
                SpotFleetRequestId=spot_fleet_id
            )
            instances = response['ActiveInstances']
    
            if len(instances) < 2:
                print(f"Waiting for new instance... ({i*10}s)")
                time.sleep(10)
                continue
    
            new_instances = [inst for inst in instances
                             if inst['InstanceId'] != old_instance_id]
    
            if not new_instances:
                print(f"No new instance found yet... ({i*10}s)")
                time.sleep(10)
                continue
    
            new_instance_id = new_instances[0]['InstanceId']
    
                # Verify that the new instance uses the expected AMI
            if expected_ami_id:
                actual_ami = get_instance_ami(ec2, new_instance_id)
                if actual_ami != expected_ami_id:
                    raise Exception(
                        f"New instance {new_instance_id} is using AMI {actual_ami}, "
                        f"expected {expected_ami_id}. "
                        f"Launch template default version may not have been picked up."
                    )
                print(f"Confirmed new instance {new_instance_id} uses AMI {actual_ami}")
    
            try:
                status_response = ec2.describe_instance_status(
                    InstanceIds=[new_instance_id],
                    IncludeAllInstances=True
                )
    
                if not status_response['InstanceStatuses']:
                    print(f"Instance {new_instance_id} status not available yet...")
                    time.sleep(10)
                    continue
    
                status = status_response['InstanceStatuses'][0]
                instance_state = status['InstanceState']['Name']
    
                print(f"Instance: {instance_state}")
    
                    # Once running, check HTTP port 80
                if instance_state == 'running':
                    if check_http_port(ec2, new_instance_id, port=80):
                        print(f"Instance {new_instance_id} is fully healthy!")
                        return new_instance_id
                    else:
                        print("Instance running but HTTP port 80 not ready yet...")
    
            except Exception as e:
                print(f"Error checking status: {e}")
    
            time.sleep(10)
    
        raise Exception(f"New instance didn't become healthy within {max_wait}s")
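    One caveat about check_http_port: urllib.request follows redirects and raises HTTPError for 4xx/5xx responses. If nginx answers port 80 with a redirect to HTTPS (common for WordPress), the outcome may depend on where the redirect points (for example a domain that still resolves to the old instance, or an HTTPS URL whose certificate does not match the IP) rather than on the new instance itself. A possible alternative, sketched below, sends a single HEAD request with http.client, which does not follow redirects, and treats any status below 500 as "the web server is up". port_is_serving is a hypothetical name, and the 500 threshold is a design choice, not something from the original code.

```python
import http.client


def port_is_serving(host, port=80, timeout=5):
    """Send one HEAD request and report whether a web server answered.

    http.client does not follow redirects, so a 301/302 to HTTPS still
    counts as 'up'. Any status below 500 is treated as healthy; a 502
    from nginx (e.g. PHP-FPM not running yet) is treated as not ready.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        status = conn.getresponse().status
        return status < 500
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, reset, or malformed response
        return False
    finally:
        conn.close()
```

    It could replace the urllib call inside check_http_port, keeping the existing IP lookup as-is.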

    Deploy to Lambda and publish!

    Trigger it –

    aws lambda invoke \
      --function-name RotateSpotInstance \
      --region ap-northeast-3 \
      --invocation-type Event \
      response.json

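    View the logs in real time (the sample below was captured with an earlier revision of the function, so some messages differ from the code above, for example the status-check lines) –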

    aws logs tail /aws/lambda/RotateSpotInstance \
      --follow \
      --region ap-northeast-3
    
    
    2026-03-05T06:33:57.808000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 INIT_START Runtime Version: python:3.14.v35 Runtime Version ARN: arn:aws:lambda:ap-northeast-3::runtime:35b4fe1ff6a2b42e1513619f35af63e09acce626823e1d0e547d6393c854bc71
    2026-03-05T06:33:58.102000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 START RequestId: 7bf945e2-dcbf-4345-a7fb-8e811842cf69 Version: $LATEST
    2026-03-05T06:34:00.771000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Starting rotation for Spot Fleet: sfr-c7ccc145-d71d-4268-8b67-089161e02af6
    2026-03-05T06:34:01.142000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Current instance: i-0e187b0c5195efaa6
    2026-03-05T06:34:01.538000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Creating AMI: ami-07590290f72731092
    2026-03-05T06:34:01.538000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Waiting for AMI to be available...
    2026-03-05T06:34:59.256000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b INIT_START Runtime Version: python:3.14.v35 Runtime Version ARN: arn:aws:lambda:ap-northeast-3::runtime:35b4fe1ff6a2b42e1513619f35af63e09acce626823e1d0e547d6393c854bc71
    2026-03-05T06:34:59.567000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b START RequestId: 3e94eeea-257e-4203-b1af-9cd12adab8d4 Version: $LATEST
    2026-03-05T06:35:02.268000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b Starting rotation for Spot Fleet: sfr-c7ccc145-d71d-4268-8b67-089161e02af6
    2026-03-05T06:35:02.664000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b Current instance: i-0e187b0c5195efaa6
    2026-03-05T06:35:03.024000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b Creating AMI: ami-0c75c75b4f29b190e
    2026-03-05T06:35:03.024000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b Waiting for AMI to be available...
    2026-03-05T06:36:01.324000+00:00 2026/03/05/[$LATEST]5fd07f95f7fb4ff087fdefaab8e80e9f INIT_START Runtime Version: python:3.14.v35 Runtime Version ARN: arn:aws:lambda:ap-northeast-3::runtime:35b4fe1ff6a2b42e1513619f35af63e09acce626823e1d0e547d6393c854bc71
    2026-03-05T06:36:01.627000+00:00 2026/03/05/[$LATEST]5fd07f95f7fb4ff087fdefaab8e80e9f START RequestId: d9621da3-bd40-4aed-a028-1ac35f2ed22c Version: $LATEST
    2026-03-05T06:36:01.980000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 AMI ami-07590290f72731092 is available
    2026-03-05T06:36:02.354000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Updated Launch Template to use ami-07590290f72731092
    2026-03-05T06:36:02.542000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Increased capacity to 2
    2026-03-05T06:36:02.542000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Waiting for new instance to be healthy...
    2026-03-05T06:36:02.652000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Waiting for new instance... (0s)
    2026-03-05T06:36:04.407000+00:00 2026/03/05/[$LATEST]5fd07f95f7fb4ff087fdefaab8e80e9f Starting rotation for Spot Fleet: sfr-c7ccc145-d71d-4268-8b67-089161e02af6
    2026-03-05T06:36:04.783000+00:00 2026/03/05/[$LATEST]5fd07f95f7fb4ff087fdefaab8e80e9f Current instance: i-0e187b0c5195efaa6
    2026-03-05T06:36:05.128000+00:00 2026/03/05/[$LATEST]5fd07f95f7fb4ff087fdefaab8e80e9f Creating AMI: ami-035bf50417603be87
    2026-03-05T06:36:05.128000+00:00 2026/03/05/[$LATEST]5fd07f95f7fb4ff087fdefaab8e80e9f Waiting for AMI to be available...
    2026-03-05T06:36:12.748000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Waiting for new instance... (10s)
    2026-03-05T06:36:22.920000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Instance: running, System: initializing, Check: initializing
    2026-03-05T06:36:33.082000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Instance: running, System: initializing, Check: initializing
    2026-03-05T06:36:43.263000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Instance: running, System: initializing, Check: initializing
    2026-03-05T06:36:53.430000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Instance: running, System: initializing, Check: initializing
    2026-03-05T06:37:03.489000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b AMI ami-0c75c75b4f29b190e is available
    2026-03-05T06:37:03.598000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Instance: running, System: initializing, Check: initializing
    2026-03-05T06:37:03.889000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b Updated Launch Template to use ami-0c75c75b4f29b190e
    2026-03-05T06:37:04.054000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b Increased capacity to 2
    2026-03-05T06:37:04.054000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b Waiting for new instance to be healthy...
    2026-03-05T06:37:04.212000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b Instance: running, System: initializing, Check: initializing
    2026-03-05T06:37:13.770000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Instance: running, System: initializing, Check: initializing
    2026-03-05T06:37:14.379000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b Instance: running, System: initializing, Check: initializing
    2026-03-05T06:37:23.922000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Instance: running, System: initializing, Check: initializing
    2026-03-05T06:37:24.542000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b Instance: running, System: initializing, Check: initializing
    2026-03-05T06:37:34.083000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Instance: running, System: initializing, Check: initializing
    2026-03-05T06:37:34.704000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b Instance: running, System: initializing, Check: initializing
    2026-03-05T06:37:44.269000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Instance: running, System: initializing, Check: initializing
    2026-03-05T06:37:44.869000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b Instance: running, System: initializing, Check: initializing
    2026-03-05T06:37:54.661000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Instance: running, System: initializing, Check: initializing
    2026-03-05T06:37:55.047000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b Instance: running, System: initializing, Check: initializing
    2026-03-05T06:38:04.816000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Instance: running, System: ok, Check: ok
    2026-03-05T06:38:04.816000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Instance i-0638e625f7b42d66a is fully healthy!
    2026-03-05T06:38:04.816000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 New instance i-0638e625f7b42d66a is healthy
    2026-03-05T06:38:05.113000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Terminated old instance: i-0e187b0c5195efaa6
    2026-03-05T06:38:05.229000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b Instance: running, System: ok, Check: ok
    2026-03-05T06:38:05.229000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b Instance i-0638e625f7b42d66a is fully healthy!
    2026-03-05T06:38:05.229000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b New instance i-0638e625f7b42d66a is healthy
    2026-03-05T06:38:05.309000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 Restored capacity to 1
    2026-03-05T06:38:05.333000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 END RequestId: 7bf945e2-dcbf-4345-a7fb-8e811842cf69
    2026-03-05T06:38:05.333000+00:00 2026/03/05/[$LATEST]4e889110fba78fe4e51e9fb26315e4d6 REPORT RequestId: 7bf945e2-dcbf-4345-a7fb-8e811842cf69  Duration: 247229.90 ms  Billed Duration: 247521 ms  Memory Size: 128 MB Max Memory Used: 98 MB  Init Duration: 290.80 ms
    2026-03-05T06:38:05.496000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b Terminated old instance: i-0e187b0c5195efaa6
    2026-03-05T06:38:05.580000+00:00 2026/03/05/[$LATEST]5fd07f95f7fb4ff087fdefaab8e80e9f AMI ami-035bf50417603be87 is available
    2026-03-05T06:38:05.630000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b Restored capacity to 1
    2026-03-05T06:38:05.647000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b END RequestId: 3e94eeea-257e-4203-b1af-9cd12adab8d4
    2026-03-05T06:38:05.647000+00:00 2026/03/05/[$LATEST]4907e5924ad374a64d41ffd64da3c19b REPORT RequestId: 3e94eeea-257e-4203-b1af-9cd12adab8d4  Duration: 186080.32 ms  Billed Duration: 186387 ms  Memory Size: 128 MB Max Memory Used: 97 MB  Init Duration: 306.42 ms
    2026-03-05T06:38:05.944000+00:00 2026/03/05/[$LATEST]5fd07f95f7fb4ff087fdefaab8e80e9f Updated Launch Template to use ami-035bf50417603be87
    2026-03-05T06:38:06.115000+00:00 2026/03/05/[$LATEST]5fd07f95f7fb4ff087fdefaab8e80e9f Error: An error occurred (FleetNotInModifiableState) when calling the ModifySpotFleetRequest operation: Fleet Request: sfr-c7ccc145-d71d-4268-8b67-089161e02af6 is not in a modifiable state.
    2026-03-05T06:38:06.555000+00:00 2026/03/05/[$LATEST]5fd07f95f7fb4ff087fdefaab8e80e9f Rolled back capacity to 1
    2026-03-05T06:38:06.588000+00:00 2026/03/05/[$LATEST]5fd07f95f7fb4ff087fdefaab8e80e9f END RequestId: d9621da3-bd40-4aed-a028-1ac35f2ed22c
    2026-03-05T06:38:06.588000+00:00 2026/03/05/[$LATEST]5fd07f95f7fb4ff087fdefaab8e80e9f REPORT RequestId: d9621da3-bd40-4aed-a028-1ac35f2ed22c  Duration: 124960.33 ms  Billed Duration: 125260 ms  Memory Size: 128 MB Max Memory Used: 97 MB  Init Duration: 299.41 ms
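    The interleaved lines are easier to read once they are grouped by request ID. The sketch below assumes the line format shown above (timestamp, then log stream name, then the message), pulls out each START/END pair, and reports how many invocations were running at once; three overlapping invocations explain the three AMIs and the final FleetNotInModifiableState error. invocation_windows and max_overlap are hypothetical helper names.

```python
import re
from datetime import datetime

# Matches: <timestamp> <log stream> START|END RequestId: <id>
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<stream>\S+)\s+(?P<kind>START|END) RequestId: (?P<rid>[0-9a-f-]+)"
)


def invocation_windows(lines):
    """Return {request_id: (start_datetime, end_datetime_or_None)}."""
    windows = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # INIT_START, REPORT and application messages are ignored
        ts = datetime.fromisoformat(m["ts"])
        start, end = windows.get(m["rid"], (None, None))
        if m["kind"] == "START":
            start = ts
        else:
            end = ts
        windows[m["rid"]] = (start, end)
    return windows


def max_overlap(windows):
    """Largest number of complete invocations running at the same moment."""
    events = []
    for start, end in windows.values():
        if start is None or end is None:
            continue
        events.append((start, 1))
        events.append((end, -1))
    events.sort(key=lambda e: (e[0], e[1]))  # on ties, count ends before starts
    running = best = 0
    for _, delta in events:
        running += delta
        best = max(best, running)
    return best
```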

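    The log shows a concurrency problem: three invocations (three different log streams) ran at the same time, each created its own AMI, and the last one failed with FleetNotInModifiableState. Limit the function to a single concurrent execution –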

    aws lambda put-function-concurrency \
      --function-name RotateSpotInstance \
      --reserved-concurrent-executions 1 \
      --region ap-northeast-3
    {
        "ReservedConcurrentExecutions": 1
    }
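    Reserved concurrency of 1 prevents overlap, but it does not drop duplicates: throttled asynchronous (Event) invocations are returned to Lambda's queue and retried later (by default for up to six hours), so a queued rotation can still run right after the first one. A cheap extra safeguard, sketched below as a hypothetical helper (fleet_is_idle is not in the original code), is to refuse to start unless the fleet is active, fully fulfilled, and running exactly one instance. It will not stop a redundant back-to-back rotation once the fleet has settled, but it avoids starting while a previous run is still settling or has left the fleet at capacity 2. It calls describe_spot_fleet_requests, so the IAM policy would also need ec2:DescribeSpotFleetRequests.

```python
def fleet_is_idle(ec2, spot_fleet_id, expected_instances=1):
    """Return (ok, reason); ok is True only when it looks safe to start a rotation.

    ec2 is a boto3 EC2 client (or any object exposing the same two methods).
    """
    cfg = ec2.describe_spot_fleet_requests(
        SpotFleetRequestIds=[spot_fleet_id]
    )["SpotFleetRequestConfigs"][0]

    # 'modifying' means a previous ModifySpotFleetRequest is still in progress
    state = cfg.get("SpotFleetRequestState")
    if state != "active":
        return False, f"fleet state is {state!r}, not 'active'"

    activity = cfg.get("ActivityStatus")
    if activity != "fulfilled":
        return False, f"fleet activity is {activity!r}, not 'fulfilled'"

    active = ec2.describe_spot_fleet_instances(
        SpotFleetRequestId=spot_fleet_id
    )["ActiveInstances"]
    if len(active) != expected_instances:
        return False, f"{len(active)} active instances, expected {expected_instances}"

    return True, "ok"
```

    In lambda_handler this check would run before step 1, returning early (for example with statusCode 409 and the reason) when ok is False.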

    That is much cleaner, and there is no service interruption.

    Of course, for service continuity you need Dynamic DNS to report the new instance's IP to your DNS server, with a script configured to run automatically at boot.

    For example, ddclient updating Cloudflare –

    INFO:    [cloudflare][private.bbken.org]> getting Cloudflare Zone ID
    INFO:    [cloudflare][private.bbken.org]> Zone ID is 0933028cb8e70c5cb4f0c736be6fee37
    INFO:    [cloudflare][private.bbken.org]> setting IPv4 address to 10.4.41.150
    SUCCESS: [cloudflare][private.bbken.org]> IPv4 address set to 10.4.41.150
    INFO:    [cloudflare][kix.bbken.org]> getting Cloudflare Zone ID
    INFO:    [cloudflare][kix.bbken.org]> Zone ID is 0933028cb8e70c5cb4f0c736be6fee37
    INFO:    [cloudflare][kix.bbken.org]> setting IPv4 address to 172.15.168.113
    SUCCESS: [cloudflare][kix.bbken.org]> IPv4 address set to 172.15.168.113
    INFO:    [cloudflare][kix.bbken.org]> setting IPv6 address to 2406:da16:a8d:2bc6:5a03:80fb:3a46:796
    SUCCESS: [cloudflare][kix.bbken.org]> IPv6 address set to 2406:da16:a8d:2bc6:5a03:80fb:3a46:796
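    The boot-time script needs the instance's current addresses before it can report them. ddclient has its own ways of detecting them; as an illustration of where they come from on EC2, the sketch below reads them from the instance metadata service with IMDSv2 (a PUT for a session token, then GETs that carry the token). The base URL is a parameter only so the code can be tested off EC2; on an instance it is http://169.254.169.254. public-ipv4 does not exist when the instance has no public IPv4 address, which the function reports as None. instance_addresses and its helpers are hypothetical names.

```python
import urllib.error
import urllib.request

IMDS_BASE = "http://169.254.169.254"  # EC2 instance metadata service


def _imds_token(base, ttl=300, timeout=2):
    """Request an IMDSv2 session token."""
    req = urllib.request.Request(
        f"{base}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl)},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode()


def _imds_get(base, path, token, timeout=2):
    """GET a metadata path; return None if the key does not exist (404)."""
    req = urllib.request.Request(
        f"{base}/latest/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode().strip()
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return None
        raise


def instance_addresses(base=IMDS_BASE):
    """Return the instance's private and public IPv4 addresses via IMDSv2."""
    token = _imds_token(base)
    return {
        "private_ipv4": _imds_get(base, "local-ipv4", token),
        "public_ipv4": _imds_get(base, "public-ipv4", token),
    }
```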
  • How to use Terraform to perform rolling updates of a WordPress Spot instance

    Anyone who uses AWS Spot Instances knows how cheap they are: some instance types cost up to 90% less than On-Demand.

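    Spot requests come in two types, one-time and persistent. When AWS has spare capacity, a persistent request can keep you running at a low price for a long time. The catch is that AWS capacity changes dynamically: if you run a persistent c5.large Spot Instance and someone launches more c5.large On-Demand capacity, your Spot Instance is forcibly reclaimed, and the outage can last anywhere from a few minutes to several hours, with no way to predict it. At that point the only option is to create an AMI by hand and launch a new instance of a different instance type from it, which usually works.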

    So how can a Spot Instance keep itself alive by hopping between instance types, with no manual intervention?

    Someone launching more On-Demand capacity can hardly exhaust every instance type at once. Following this idea, we can use a Spot Fleet request and put several different instance types into the fleet.

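    So which instance types should you choose? You might assume t3/t4g are the cheapest, but that is not how Spot pricing works: the less demand there is for a type, the cheaper it tends to be. Here are some ARM64 types in the Osaka region, queried with the AWS CLI: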

    aws ec2 describe-spot-price-history \
      --instance-types t4g.medium c6g.medium c6gd.medium c7g.medium c7gd.medium c8g.medium m6g.medium m6gd.medium m7g.medium m8g.medium r6g.medium r7g.medium r8g.medium \
      --product-descriptions "Linux/UNIX" \
      --start-time $(date -u +"%Y-%m-%dT%H:%M:%SZ") \
      --end-time $(date -u +"%Y-%m-%dT%H:%M:%SZ") \
      --query "sort_by(SpotPriceHistory[?AvailabilityZone=='ap-northeast-3a' || AvailabilityZone=='ap-northeast-3c'], &SpotPrice)[*].[InstanceType,SpotPrice,AvailabilityZone]" \
      --output table
    
    ------------------------------------------------
    |           DescribeSpotPriceHistory           |
    +-------------+------------+-------------------+
    |  c8g.medium |  0.012400  |  ap-northeast-3c  |
    |  c8g.medium |  0.012500  |  ap-northeast-3a  |
    |  m6g.medium |  0.012600  |  ap-northeast-3c  |
    |  c7g.medium |  0.013600  |  ap-northeast-3a  |
    |  m6g.medium |  0.014200  |  ap-northeast-3a  |
    |  c7gd.medium|  0.014200  |  ap-northeast-3a  |
    |  m8g.medium |  0.014500  |  ap-northeast-3c  |
    |  m8g.medium |  0.014500  |  ap-northeast-3a  |
    |  m6gd.medium|  0.014600  |  ap-northeast-3a  |
    |  m6gd.medium|  0.014800  |  ap-northeast-3c  |
    |  m7g.medium |  0.014800  |  ap-northeast-3c  |
    |  m7g.medium |  0.015000  |  ap-northeast-3a  |
    |  r6g.medium |  0.015200  |  ap-northeast-3c  |
    |  r6g.medium |  0.015300  |  ap-northeast-3a  |
    |  r7g.medium |  0.016300  |  ap-northeast-3a  |
    |  c6gd.medium|  0.017400  |  ap-northeast-3c  |
    |  c6g.medium |  0.017700  |  ap-northeast-3c  |
    |  c6g.medium |  0.017700  |  ap-northeast-3a  |
    |  r7g.medium |  0.018200  |  ap-northeast-3c  |
    |  r8g.medium |  0.018200  |  ap-northeast-3a  |
    |  c7g.medium |  0.018200  |  ap-northeast-3c  |
    |  r8g.medium |  0.018400  |  ap-northeast-3c  |
    |  c6gd.medium|  0.018700  |  ap-northeast-3a  |
    |  t4g.medium |  0.019800  |  ap-northeast-3c  |
    |  t4g.medium |  0.020100  |  ap-northeast-3a  |
    +-------------+------------+-------------------+

    The On-Demand price of t4g.small is $0.0218 per hour, and every Spot price in the table above is lower than that. Why not use them?
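    The same comparison can be scripted. The sketch below takes rows in the shape the query returns ([InstanceType, SpotPrice, AvailabilityZone], prices as strings), keeps the lowest price per instance type, and computes the discount relative to the t4g.small On-Demand price of $0.0218 quoted above. The rows are a few lines copied from the table; live prices will differ. cheapest_by_type and discount_vs are hypothetical helper names.

```python
from decimal import Decimal

T4G_SMALL_ON_DEMAND = Decimal("0.0218")  # USD/hour, from the text above


def cheapest_by_type(rows):
    """rows: iterable of [instance_type, spot_price_str, az].

    Returns (instance_type, price, az) tuples sorted by price, keeping
    only the cheapest Availability Zone for each instance type.
    """
    best = {}
    for itype, price, az in rows:
        p = Decimal(price)  # Decimal avoids float rounding on prices
        if itype not in best or p < best[itype][0]:
            best[itype] = (p, az)
    return sorted(((t, p, az) for t, (p, az) in best.items()), key=lambda r: r[1])


def discount_vs(price, reference=T4G_SMALL_ON_DEMAND):
    """Percentage saved compared with the reference hourly price."""
    return (reference - price) / reference * 100


rows = [
    ["c8g.medium", "0.012400", "ap-northeast-3c"],
    ["c8g.medium", "0.012500", "ap-northeast-3a"],
    ["m6g.medium", "0.012600", "ap-northeast-3c"],
    ["t4g.medium", "0.019800", "ap-northeast-3c"],
    ["t4g.medium", "0.020100", "ap-northeast-3a"],
]
for itype, price, az in cheapest_by_type(rows):
    print(f"{itype:12} {price} {az}  {discount_vs(price):.1f}% below t4g.small On-Demand")
```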

    For sustained CPU-bound workloads you can choose the C/M families; for workloads optimized for multithreading, t4g.medium (2 vCPUs); and for memory-first workloads, the R/M families.

    Note, however, that m8g/c8g use the newer Graviton4 chip, so even with a single vCPU their faster cores may still outperform a t4g with two Graviton2 vCPUs, and they are cheaper too.

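    ARM64 performance on every major cloud is something of a black art, with no official benchmark of real-world performance. My guess is that results vary widely between workloads, since mainstream development still focuses on x86_64.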

    First, create a launch template that contains an AMI made from the current EC2 instance. Note down the launch template ID; you will need it later.

    Then, under Spot Requests, click Create Spot Fleet request, and first choose to use the launch template.

    Set Target capacity to whatever you need; for this blog, 1 is enough.

    For Network, if it is already set in the launch template you don't need to set it here; avoid configuring it in more than one place, to prevent conflicts.

    Next comes the most important part, Instance type requirements: choose Manually select instance types, then use Add instance types to add the types you need.

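    For Allocation strategy, keep the default; the exported config below shows it as priceCapacityOptimized, which picks the lowest-priced pools among those with available capacity.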

    Don't click Create yet. Next to Launch there is a JSON config button; click it to download the config and save it as Downloads/fleet.json under your home directory.

    Then create the Spot Fleet directly with the following AWS CLI command, run from your home directory –

    aws ec2 request-spot-fleet --spot-fleet-request-config file://Downloads/fleet.json

    The contents of fleet.json –

    {
        "IamFleetRole": "arn:aws:iam::123454567890:role/aws-ec2-spot-fleet-tagging-role",
        "AllocationStrategy": "priceCapacityOptimized",
        "TargetCapacity": 1,
        "ValidFrom": "2026-01-06T08:19:06.000Z",
        "ValidUntil": "2027-01-06T08:19:06.000Z",
        "TerminateInstancesWithExpiration": true,
        "Type": "maintain",
        "OnDemandAllocationStrategy": "lowestPrice",
        "LaunchSpecifications": [],
        "LaunchTemplateConfigs": [
            {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateId": "lt-0716c882cb57a921d",
                    "Version": "$Latest"
                },
                "Overrides": [
                    {
                        "InstanceType": "m6g.medium",
                        "WeightedCapacity": 1,
                        "SubnetId": "subnet-0d2535d7e7cacf658",
                        "SpotPrice": "0.017"
                    },
                    {
                        "InstanceType": "m6g.medium",
                        "WeightedCapacity": 1,
                        "SubnetId": "subnet-0c255e6c5c642da2a",
                        "SpotPrice": "0.017"
                    },
                    {
                        "InstanceType": "m6g.medium",
                        "WeightedCapacity": 1,
                        "SubnetId": "subnet-06ee85eed40f5261b",
                        "SpotPrice": "0.017"
                    },
                    {
                        "InstanceType": "m7g.medium",
                        "WeightedCapacity": 1,
                        "SubnetId": "subnet-0d2535d7e7cacf658",
                        "SpotPrice": "0.017"
                    },
                    {
                        "InstanceType": "m7g.medium",
                        "WeightedCapacity": 1,
                        "SubnetId": "subnet-0c255e6c5c642da2a",
                        "SpotPrice": "0.017"
                    },
                    {
                        "InstanceType": "m7g.medium",
                        "WeightedCapacity": 1,
                        "SubnetId": "subnet-06ee85eed40f5261b",
                        "SpotPrice": "0.017"
                    },
                    {
                        "InstanceType": "m8g.medium",
                        "WeightedCapacity": 1,
                        "SubnetId": "subnet-0d2535d7e7cacf658",
                        "SpotPrice": "0.017"
                    },
                    {
                        "InstanceType": "m8g.medium",
                        "WeightedCapacity": 1,
                        "SubnetId": "subnet-0c255e6c5c642da2a",
                        "SpotPrice": "0.017"
                    },
                    {
                        "InstanceType": "m8g.medium",
                        "WeightedCapacity": 1,
                        "SubnetId": "subnet-06ee85eed40f5261b",
                        "SpotPrice": "0.017"
                    },
                    {
                        "InstanceType": "r6g.medium",
                        "WeightedCapacity": 1,
                        "SubnetId": "subnet-0d2535d7e7cacf658",
                        "SpotPrice": "0.017"
                    },
                    {
                        "InstanceType": "r6g.medium",
                        "WeightedCapacity": 1,
                        "SubnetId": "subnet-0c255e6c5c642da2a",
                        "SpotPrice": "0.017"
                    },
                    {
                        "InstanceType": "r6g.medium",
                        "WeightedCapacity": 1,
                        "SubnetId": "subnet-06ee85eed40f5261b",
                        "SpotPrice": "0.017"
                    },
                    {
                        "InstanceType": "r7g.medium",
                        "WeightedCapacity": 1,
                        "SubnetId": "subnet-0d2535d7e7cacf658",
                        "SpotPrice": "0.017"
                    },
                    {
                        "InstanceType": "r7g.medium",
                        "WeightedCapacity": 1,
                        "SubnetId": "subnet-0c255e6c5c642da2a",
                        "SpotPrice": "0.017"
                    },
                    {
                        "InstanceType": "r7g.medium",
                        "WeightedCapacity": 1,
                        "SubnetId": "subnet-06ee85eed40f5261b",
                        "SpotPrice": "0.017"
                    },
                    {
                        "InstanceType": "r8g.medium",
                        "WeightedCapacity": 1,
                        "SubnetId": "subnet-0d2535d7e7cacf658",
                        "SpotPrice": "0.017"
                    },
                    {
                        "InstanceType": "r8g.medium",
                        "WeightedCapacity": 1,
                        "SubnetId": "subnet-0c255e6c5c642da2a",
                        "SpotPrice": "0.017"
                    },
                    {
                        "InstanceType": "r8g.medium",
                        "WeightedCapacity": 1,
                        "SubnetId": "subnet-06ee85eed40f5261b",
                        "SpotPrice": "0.017"
                    },
                    {
                        "InstanceType": "t4g.medium",
                        "WeightedCapacity": 1,
                        "SubnetId": "subnet-0d2535d7e7cacf658",
                        "SpotPrice": "0.017"
                    },
                    {
                        "InstanceType": "t4g.medium",
                        "WeightedCapacity": 1,
                        "SubnetId": "subnet-0c255e6c5c642da2a",
                        "SpotPrice": "0.017"
                    },
                    {
                        "InstanceType": "t4g.medium",
                        "WeightedCapacity": 1,
                        "SubnetId": "subnet-06ee85eed40f5261b",
                        "SpotPrice": "0.017"
                    }
                ]
            }
        ]
    }
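    The Overrides list is simply every instance type crossed with every subnet, all with the same WeightedCapacity and SpotPrice, so it can be generated instead of edited by hand. The types, subnets, and 0.017 price in the sketch below are copied from the config above; build_overrides and patch_fleet_config are hypothetical helper names. Note that SpotPrice is the maximum price you are willing to pay: in the price table earlier, t4g.medium (0.0198 to 0.0201), r8g.medium (0.0182 to 0.0184), and r7g.medium in ap-northeast-3c (0.0182) were above 0.017, so those overrides would not launch at those prices. If SpotPrice is omitted, the maximum defaults to the On-Demand price.

```python
import json

INSTANCE_TYPES = [
    "m6g.medium", "m7g.medium", "m8g.medium",
    "r6g.medium", "r7g.medium", "r8g.medium",
    "t4g.medium",
]
SUBNETS = [
    "subnet-0d2535d7e7cacf658",
    "subnet-0c255e6c5c642da2a",
    "subnet-06ee85eed40f5261b",
]


def build_overrides(types, subnets, max_price="0.017", weight=1):
    """One override per (instance type, subnet) pair, in the same order as fleet.json."""
    overrides = []
    for itype in types:
        for subnet in subnets:
            entry = {"InstanceType": itype, "WeightedCapacity": weight, "SubnetId": subnet}
            if max_price is not None:  # omit SpotPrice to default to the On-Demand price
                entry["SpotPrice"] = max_price
            overrides.append(entry)
    return overrides


def patch_fleet_config(path, types, subnets, **kwargs):
    """Rewrite the Overrides of the first LaunchTemplateConfig in a fleet.json file."""
    with open(path) as f:
        config = json.load(f)
    config["LaunchTemplateConfigs"][0]["Overrides"] = build_overrides(types, subnets, **kwargs)
    with open(path, "w") as f:
        json.dump(config, f, indent=4)
    return config
```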

    Done!

    Note down the request ID sfr-c7ccc145-d71d-4268-8b67-089161e02af6; you will need it later.

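    This way, when someone launches more On-Demand capacity and your current Spot Instance is terminated, Spot Fleet launches a new Spot Instance from one of these instance types. Spot capacity running out for every one of these types at the same time is very unlikely.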

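    So how do we handle updates, such as security patches or upgrading nginx, PHP, and WordPress?

    WordPress updates very often. I don't want to publish a new post and then have the instance terminated right afterwards, taking those changes with it. So all WordPress files live on EFS.

    nginx and PHP can't be handled that way, though. How can we automate their updates with Terraform?

    Following this idea, let's first lay out the concrete steps.

    First, after changing nginx, PHP, or the OS, create an AMI from the current instance ID and give it a name: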

    aws ec2 create-image \
        --instance-id i-0716c882cb57a921d \
        --name "web2026" \
        --no-reboot \
        --tag-specifications 'ResourceType=image,Tags=[{Key=auto-delete,Value=no}]' \
        --region ap-northeast-3
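    The Python sketch below makes the same create-image call through boto3, which is useful if you script this step in Python as the Lambda version does. The helper create_image_params is hypothetical. ec2.create_image and its InstanceId, Name, NoReboot and TagSpecifications parameters are the standard boto3 EC2 client API. The tag mirrors the auto-delete=no tag in the CLI command above.

```python
def create_image_params(instance_id, name):
    """Build kwargs for boto3's ec2.create_image, mirroring the CLI call above."""
    return {
        "InstanceId": instance_id,
        "Name": name,
        "NoReboot": True,  # same as --no-reboot: image is taken without stopping the instance
        "TagSpecifications": [
            {
                "ResourceType": "image",
                "Tags": [{"Key": "auto-delete", "Value": "no"}],
            }
        ],
    }


def create_image(instance_id, name, region="ap-northeast-3"):
    """Start AMI creation and return the new ImageId (state is still 'pending')."""
    import boto3  # imported here so the params helper works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_image(**create_image_params(instance_id, name))
    return response["ImageId"]
```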

    Next, use the AMI ID returned by the previous step to check its progress, and wait until the state is available:

    aws ec2 describe-images \
        --image-ids ami-xxxxxxxx \
        --region ap-northeast-3 \
        --query 'Images[0].State'

    Then update the launch template to use this AMI and set that version as the default.

    Create a new version with the updated AMI:
    aws ec2 create-launch-template-version \
        --launch-template-id lt-0716c882cb57a921d \
        --source-version '$Latest' \
        --launch-template-data '{"ImageId":"ami-xxxxxxxx"}' \
        --region ap-northeast-3
    
    Set the new version as the default:
    aws ec2 modify-launch-template \
        --launch-template-id lt-0716c882cb57a921d \
        --default-version '$Latest' \
        --region ap-northeast-3
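    In a script, the one-off describe-images check becomes a loop. It waits for available and stops early if the AMI ends up in a state it will never leave, so the launch template is never pointed at a broken image. This is a minimal Python sketch. It assumes you pass in a get_state callable (hypothetical) that returns the AMI's State string. boto3 also ships a ready-made waiter, ec2.get_waiter("image_available"), though its default polling limits may be too short for a large AMI.

```python
import time

# States after which the AMI will never become available.
TERMINAL_FAILURES = {"failed", "error", "invalid", "deregistered"}


def wait_for_ami(get_state, timeout=1800, interval=30, sleep=time.sleep):
    """Poll get_state() until the AMI is 'available'; return the seconds waited.

    Raises RuntimeError on a terminal failure state, TimeoutError after `timeout`.
    """
    waited = 0
    while True:
        state = get_state()
        if state == "available":
            return waited
        if state in TERMINAL_FAILURES:
            raise RuntimeError(f"AMI creation ended in state '{state}'")
        if waited >= timeout:
            raise TimeoutError(f"AMI still '{state}' after {waited}s")
        sleep(interval)
        waited += interval
```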

    Once the AMI is ready, terminate the old instance. The Spot Fleet will then launch a new Spot Instance from the new AMI in the launch template.
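    This setup runs one instance, so terminating the old one leaves the site without a server until the replacement boots. It helps to know when that has happened. The sketch below assumes a list_active_ids callable (hypothetical), such as the fleet_instance_ids wrapper included here, which calls boto3's describe_spot_fleet_instances. The old ID is filtered out because an instance that is shutting down may still be listed for a short while.

```python
import time


def fleet_instance_ids(ec2, fleet_id):
    """Active instance IDs of a Spot Fleet, via boto3's describe_spot_fleet_instances."""
    resp = ec2.describe_spot_fleet_instances(SpotFleetRequestId=fleet_id)
    return [item["InstanceId"] for item in resp["ActiveInstances"]]


def wait_for_replacement(list_active_ids, old_id, timeout=900, interval=15, sleep=time.sleep):
    """Poll until the fleet reports an active instance other than old_id; return its ID."""
    waited = 0
    while waited <= timeout:
        new_ids = [i for i in list_active_ids() if i != old_id]
        if new_ids:
            return new_ids[0]
        sleep(interval)
        waited += interval
    raise TimeoutError(f"no replacement for {old_id} after {timeout}s")
```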

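    With that workflow in mind, install Terraform and check its version:

    terraform -v
    Terraform v1.5.7 on darwin_arm64

    Create a working directory and change into it: mkdir ami-update-workflow && cd ami-update-workflow

    Create the main.tf configuration file. Change these values to match your setup: the region, the Launch Template ID configured earlier, and the ID of the Spot Fleet Request you launched: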

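    terraform {
      required_providers {
        aws = {
          source  = "hashicorp/aws"
          version = "~> 5.0"
        }
        # Also used below by data "external", null_resource and data "local_file"
        external = {
          source = "hashicorp/external"
        }
        null = {
          source = "hashicorp/null"
        }
        local = {
          source = "hashicorp/local"
        }
      }
    }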
    
    provider "aws" {
      region = "ap-northeast-3"
    }
    
    # Get instance ID from spot fleet using external data source
    data "external" "spot_instance" {
      program = ["bash", "-c", <<-EOT
        INSTANCE_ID=$(aws ec2 describe-spot-fleet-instances \
          --spot-fleet-request-id sfr-c7ccc145-d71d-4268-8b67-089161e02af6 \
          --region ap-northeast-3 \
          --query 'ActiveInstances[0].InstanceId' --output text)
        echo "{\"instance_id\":\"$INSTANCE_ID\"}"
      EOT
      ]
    }
    
    # Create AMI with correct EBS settings using AWS CLI
    resource "null_resource" "create_ami_with_ebs" {
      provisioner "local-exec" {
        command = <<-EOT
          # Stop here if create-image fails, instead of writing an empty AMI ID
          set -e
          AMI_ID=$(aws ec2 create-image \
            --instance-id ${data.external.spot_instance.result.instance_id} \
            --name "web-${formatdate("YYYYMMDD-hhmm", timestamp())}" \
            --no-reboot \
            --block-device-mappings '[{"DeviceName":"/dev/xvda","Ebs":{"DeleteOnTermination":true,"VolumeSize":10,"VolumeType":"gp3"}}]' \
            --tag-specifications 'ResourceType=image,Tags=[{Key=auto-delete,Value=no}]' \
            --region ap-northeast-3 \
            --query 'ImageId' --output text)
          echo "{\"ami_id\":\"$AMI_ID\"}" > ami_output.json
        EOT
      }
    }
    
    # Get the AMI ID from the output file
    data "local_file" "ami_output" {
      depends_on = [null_resource.create_ami_with_ebs]
      filename   = "ami_output.json"
    }
    
    locals {
      ami_data = jsondecode(data.local_file.ami_output.content)
      ami_id   = local.ami_data.ami_id
    }
    
    # Update launch template and terminate instance
    resource "null_resource" "update_launch_template" {
      depends_on = [data.local_file.ami_output]
    
      provisioner "local-exec" {
        command = <<-EOT
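          # Wait for the AMI to be available; abort on failure so the old instance is not terminated
          while true; do
            STATE=$(aws ec2 describe-images --image-ids ${local.ami_id} --query 'Images[0].State' --output text --region ap-northeast-3)
            if [ "$STATE" = "available" ]; then break; fi
            if [ "$STATE" = "failed" ]; then echo "AMI ${local.ami_id} failed"; exit 1; fi
            echo "Waiting for AMI to be available (state: $STATE)..."
            sleep 30
          done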
    
          # Create new version
          aws ec2 create-launch-template-version \
            --launch-template-id lt-0716c882cb57a921d \
            --source-version '$Latest' \
            --launch-template-data '{"ImageId":"${local.ami_id}"}' \
            --region ap-northeast-3
    
          # Set as default
          aws ec2 modify-launch-template \
            --launch-template-id lt-0716c882cb57a921d \
            --default-version '$Latest' \
            --region ap-northeast-3
    
          # Terminate the old instance
          aws ec2 terminate-instances \
            --instance-ids ${data.external.spot_instance.result.instance_id} \
            --region ap-northeast-3
        EOT
      }
    }
    
    output "ami_id" {
      value = local.ami_id
    }
    
    output "instance_id" {
      value = data.external.spot_instance.result.instance_id
    }
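    The data "external" source runs a program that must print a single JSON object whose values are all strings. If the fleet has no active instance, the bash snippet above prints {"instance_id":"None"}, and the later steps then fail in confusing ways. The Python sketch below could replace it and exits non-zero with a clear message instead. It assumes boto3 is installed where Terraform runs. It also assumes the data block is changed to something like program = ["python3", "${path.module}/spot_instance.py"] with query = { fleet_id = "sfr-c7ccc145-d71d-4268-8b67-089161e02af6", region = "ap-northeast-3" }. The filename and the query keys are hypothetical.

```python
import json
import sys


def extract_instance_id(fleet_response):
    """First active instance ID from a describe_spot_fleet_instances response."""
    active = fleet_response.get("ActiveInstances", [])
    if not active:
        raise ValueError("spot fleet has no active instances")
    return active[0]["InstanceId"]


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Entry point when Terraform runs this file as the external program."""
    import boto3  # assumption: boto3 is available where terraform runs

    query = json.load(stdin)  # Terraform sends the data source's query map as JSON on stdin
    ec2 = boto3.client("ec2", region_name=query.get("region", "ap-northeast-3"))
    resp = ec2.describe_spot_fleet_instances(SpotFleetRequestId=query["fleet_id"])
    try:
        instance_id = extract_instance_id(resp)
    except ValueError as err:
        sys.exit(str(err))  # non-zero exit: Terraform reports the message and stops
    json.dump({"instance_id": instance_id}, stdout)  # one object, string values only


# When using this file as the external program, add a final line that calls main().
```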

    In the same directory, create run.sh with the following content, and make it executable with chmod +x run.sh:

    #!/bin/bash
    # Delete old state so both null_resources are treated as new and run again
    rm -f terraform.tfstate*
    rm -rf .terraform/
    terraform init
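    run.sh only prepares the directory; plan and apply are run by hand below. The null_resources have no triggers, so deleting the state is what makes Terraform create, and therefore run, them again on every rotation. If you would rather drive the whole rotation from one script, the Python sketch below does the same cleanup and then runs terraform init and terraform apply -auto-approve, which skips the yes/no prompt. The function names reset_state and rotate are hypothetical.

```python
import glob
import os
import shutil
import subprocess


def reset_state(workdir="."):
    """Delete terraform.tfstate* and .terraform/ like run.sh; return what was removed."""
    removed = []
    for path in sorted(glob.glob(os.path.join(workdir, "terraform.tfstate*"))):
        os.remove(path)
        removed.append(os.path.basename(path))
    dot_terraform = os.path.join(workdir, ".terraform")
    if os.path.isdir(dot_terraform):
        shutil.rmtree(dot_terraform)
        removed.append(".terraform")
    return removed


def rotate(workdir="."):
    """Reset state, then init and apply without the interactive confirmation."""
    reset_state(workdir)
    subprocess.run(["terraform", "init"], cwd=workdir, check=True)
    subprocess.run(["terraform", "apply", "-auto-approve"], cwd=workdir, check=True)
```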

    Run it:

    ./run.sh 
    
    Initializing the backend...
    ......
    Terraform has been successfully initialized!
    
    You may now begin working with Terraform. Try running "terraform plan" to see
    any changes that are required for your infrastructure. All Terraform commands
    should now work.
    
    If you ever set or change modules or backend configuration for Terraform,
    rerun this command to reinitialize your working directory. If you forget, other
    commands will detect it and remind you to do so if necessary.

    Run “terraform plan”

    ......
    
    Plan: 2 to add, 0 to change, 0 to destroy.
    
    Changes to Outputs:
      + ami_id      = (known after apply)
      + instance_id = "i-0ef93b2b937468a98"
    
    ───────────────────────────────────────────────────────────────────────────────
    
    Note: You didn't use the -out option to save this plan, so Terraform can't
    guarantee to take exactly these actions if you run "terraform apply" now.

    Run “terraform apply”. You can also skip plan and go straight to this step.

    ......
    null_resource.update_launch_template: Creation complete after 3m19s [id=3702937929507644703]
    
    Apply complete! Resources: 2 added, 0 changed, 0 destroyed.
    
    Outputs:
    
    ami_id = "ami-0dd89a88bf12cb2bd"
    instance_id = "i-0ef93b2b937468a98"
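    To record which AMI each rotation produced, for example in a log, you can read the outputs as JSON rather than scraping the apply output. terraform output -json prints one object per output, with the value under a "value" key. The Python sketch below flattens that structure; the helper names are hypothetical.

```python
import json
import subprocess


def read_outputs(raw_json):
    """Flatten `terraform output -json` text into {output_name: value}."""
    return {name: item["value"] for name, item in json.loads(raw_json).items()}


def current_outputs(workdir="."):
    """Run `terraform output -json` in workdir and return the flattened outputs."""
    raw = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir, check=True, capture_output=True, text=True,
    ).stdout
    return read_outputs(raw)
```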

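    When the run finishes, the Spot Instance has been captured as a new AMI and that AMI is the launch template's default. The old Spot Instance has also been terminated.

    It has been running stably so far with no problems found. An Auto Scaling group (ASG) could probably do the same job; that is a topic for another time.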

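    The CIA has been very active lately, releasing several videos aimed at recruiting informants in China. Many people take this to mean its agents in China have all been executed. I disagree: I think it is a deliberate smokescreen to protect certain important intelligence sources. When everyone around you looks like a spy, the real spy stays hidden.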
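  • How to upgrade macOS on AWS EC2

    Just follow the official documentation: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/mac-instance-updates.html

    Very old macOS versions need an ENA driver upgrade first, but that rarely comes up. Mac instances are mostly used for iOS development and fast iteration, and nobody writes code on a five-year-old macOS. Supporting the latest iPhones takes a recent Xcode, which in turn needs a recent macOS. Also, not every version can be upgraded; see that page for the supported versions.

    SSH into the EC2 instance and set a password for ec2-user: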

    sudo /usr/bin/dscl . -passwd /Users/ec2-user

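    To enable the secure token for ec2-user, you have to change the password once. The new password itself is not the point; the change is what enables the secure token. Put the current password in old_password.txt and the new one in new_password.txt: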

    vi old_password.txt
    
    vi new_password.txt
    
    sysadminctl -oldPassword `cat old_password.txt` -newPassword `cat new_password.txt`
    
    2026-01-21 06:53:17.708 sysadminctl[705:6829] Attempting to change password for ec2-user…
    2026-01-21 06:53:19.287 sysadminctl[705:6829] SecKeychainCopyLogin returned -25294
    2026-01-21 06:53:19.287 sysadminctl[705:6829] Failed to update keychain password (-25294)
    2026-01-21 06:53:19.288 sysadminctl[705:6829] - Done

    Confirm that it is actually enabled:

    sysadminctl -secureTokenStatus ec2-user
    
    2026-01-21 06:58:35.755 sysadminctl[806:8689] Secure token is ENABLED for user ec2-user

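    Create a JSON file used to delegate ownership of the Amazon EBS root volume to the EBS root volume administrative user. On Apple silicon Mac instances (mac2, mac2-m2, mac-m4, etc.), OS updates need specific administrative rights. Before updating macOS, you must delegate ownership of the EBS root volume from the internal disk administrative user (aws-managed-user) to the EBS root volume administrative user (usually ec2-user). Set rootVolumepassword to the new ec2-user password from the previous step: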

    vi mac-credentials.json
    
    {
      "internalDiskPassword":"",
      "rootVolumeUsername":"ec2-user",
      "rootVolumepassword":"newPasswordHere"
    }
    
    aws ec2 create-delegate-mac-volume-ownership-task \
    --instance-id i-079180f283a7b0baf \
    --mac-credentials file://mac-credentials.json
    
    {
        "MacModificationTask": {
            "InstanceId": "i-079180f283a7b0baf",
            "MacModificationTaskId": "macmodification-05b40255b8b15176c",
            "MacSystemIntegrityProtectionConfig": {},
            "StartTime": "2026-01-21T08:09:43.417000+00:00",
            "TaskState": "pending",
            "TaskType": "volume-ownership-delegation"
        }
    }
    
    aws ec2 describe-mac-modification-tasks \
    --mac-modification-task-id macmodification-05b40255b8b15176c
    
    {
        "MacModificationTasks": [
            {
                "InstanceId": "i-079180f283a7b0baf",
                "MacModificationTaskId": "macmodification-05b40255b8b15176c",
                "MacSystemIntegrityProtectionConfig": {},
                "StartTime": "2026-01-21T08:09:43.417000+00:00",
                "Tags": [],
                "TaskState": "in-progress",
                "TaskType": "volume-ownership-delegation"
            }
        ]
    }
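    Writing the credentials file with vi leaves a plaintext password on disk with default permissions. The Python sketch below writes the same JSON with owner-only permissions (0600). The key names are copied exactly from mac-credentials.json above, including the lowercase p in rootVolumepassword. write_mac_credentials is a hypothetical helper, and it is worth deleting the file once the task has been accepted.

```python
import json
import os


def write_mac_credentials(path, root_password, root_user="ec2-user", internal_disk_password=""):
    """Write the credentials file for create-delegate-mac-volume-ownership-task.

    Keys match mac-credentials.json above. A newly created file gets mode 0600
    (owner read/write only), since it contains passwords in plain text.
    """
    creds = {
        "internalDiskPassword": internal_disk_password,
        "rootVolumeUsername": root_user,
        "rootVolumepassword": root_password,
    }
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(creds, f)
    return path
```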

    This task usually takes 30 to 90 minutes; mine took a little over forty.
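    The task runs for a long time, so you can poll it rather than re-running the describe command by hand. The Python sketch below reads TaskState from the JSON that describe-mac-modification-tasks prints, as in the output above. It repeats the command, copied with the same flag as above, until the task leaves pending or in-progress, then returns the final state. The function names are hypothetical, and the exact set of final states is an assumption worth checking in the API reference.

```python
import json
import subprocess
import time

IN_PROGRESS = {"pending", "in-progress"}


def task_state(cli_output):
    """TaskState of the first task in describe-mac-modification-tasks JSON output."""
    return json.loads(cli_output)["MacModificationTasks"][0]["TaskState"]


def wait_for_task(task_id, interval=300):
    """Re-run the describe command every `interval` seconds until the task finishes."""
    while True:
        out = subprocess.run(
            ["aws", "ec2", "describe-mac-modification-tasks",
             "--mac-modification-task-id", task_id],
            check=True, capture_output=True, text=True,
        ).stdout
        state = task_state(out)
        if state not in IN_PROGRESS:
            return state
        time.sleep(interval)
```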

    Log in again after the reboot:
    
        ┌───┬──┐   __|  __|_  )
        │ ╷╭╯╷ │   _|  (     /
        │  └╮  │  ___|\___|___|
        │ ╰─┼╯ │  Amazon EC2
        └───┴──┘  macOS Sequoia 15.7.3
    
    
    softwareupdate --list
    Software Update Tool
    
    Finding available software
    Software Update found the following new or updated software:
    * Label: macOS Tahoe 26.2-25C56
        Title: macOS Tahoe 26.2, Version: 26.2, Size: 8111826KiB, Recommended: YES, Action: restart, 
    
    sudo softwareupdate --install --all --restart
    Software Update Tool
    
    Finding available software
    Downloading macOS Tahoe 26.2
    Password: 
    Downloading: 62.47%
    Downloading: 93.73%
    Downloading: 100.00%
    Downloaded: macOS Tahoe 26.2
    Restarting...
    
    
        ┌───┬──┐   __|  __|_  )
        │ ╷╭╯╷ │   _|  (     /
        │  └╮  │  ___|\___|___|
        │ ╰─┼╯ │  Amazon EC2
        └───┴──┘  macOS Tahoe 26.2
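    Instead of --all, you may want to install one specific update. The Python sketch below pulls the labels out of softwareupdate --list output and installs one by label with softwareupdate --install <label> --restart. It assumes the output uses "* Label:" lines as shown above. As in the run above, it may stop to ask for the password. The function names are hypothetical.

```python
import re
import subprocess

# Matches lines such as "* Label: macOS Tahoe 26.2-25C56" (leading spaces/tabs allowed).
LABEL_RE = re.compile(r"^[ \t]*\* Label: (.+)$", re.MULTILINE)


def update_labels(list_output):
    """Extract update labels from `softwareupdate --list` output."""
    return [label.strip() for label in LABEL_RE.findall(list_output)]


def install(label, restart=True):
    """Install a single update by label, optionally restarting when it is done."""
    cmd = ["sudo", "softwareupdate", "--install", label]
    if restart:
        cmd.append("--restart")
    subprocess.run(cmd, check=True)
```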