Due to seasonal peaks (e.g., Amazon Prime Day, holiday sales), the checkout service needs to be scalable, resilient, and responsive. In this case study, we'll analyze how to determine the optimal number of pods for the checkout service to handle both regular and peak traffic loads, ensuring smooth customer experiences and minimizing latency during high-demand periods.
Requirements for the Checkout Service
High Availability and Fault Tolerance
- The service should be distributed across multiple nodes to avoid a single point of failure.
- It should handle failures seamlessly to maintain a smooth user experience.
Scalability
- The system should be able to scale up during peak demand (e.g., sale events) and scale down during off-peak periods to save costs.
Low Latency
- As the checkout process directly affects user satisfaction, it must respond quickly, ideally under 200 ms for 95% of transactions.
High Throughput
- It should be able to handle a high number of concurrent users during peak times without degradation in performance.
Service Level Objectives (SLOs)
- Response time <200 ms for 95% of requests.
- Availability >99.9% for the checkout service.
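The 99.9% availability target translates into a concrete monthly error budget. A quick back-of-the-envelope calculation (assuming a 30-day month):

```python
# Downtime allowed by a 99.9% availability SLO over a 30-day month.
minutes_per_month = 30 * 24 * 60          # 43,200 minutes
allowed_downtime = minutes_per_month * (1 - 0.999)

print(round(allowed_downtime, 1))  # 43.2 minutes of downtime per month
```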
Step 1: Profiling the Workload
Normal Traffic Load:
- During regular hours, the checkout service handles around 500 requests per second.
Peak Traffic Load:
- During peak sale events, traffic can increase by 10x, reaching up to 5000 requests per second.
Average Resource Consumption Per Request:
- CPU Usage: 0.2 CPU cores
- Memory Usage: 250 MB
- Latency target: <200 ms
From load testing, it’s determined that each pod can process up to 50 requests per second under typical configurations without exceeding latency or CPU limits.
Step 2: Estimating Pod Count
Base Pod Count Calculation
Normal Load Requirement
Pods required = normal requests per second ÷ requests per pod per second = 500 / 50 = 10 pods.
Peak Load Requirement
Pods required = peak requests per second ÷ requests per pod per second = 5000 / 50 = 100 pods.
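The arithmetic above can be sketched as a small calculation (a minimal sketch; ceiling division guards against fractional pod counts when the load doesn't divide evenly):

```python
import math

def pods_required(requests_per_second: float, requests_per_pod: float) -> int:
    """Return the number of pods needed to serve the given load."""
    return math.ceil(requests_per_second / requests_per_pod)

# Figures from the load-testing step above.
PER_POD_CAPACITY = 50  # requests per second per pod

print(pods_required(500, PER_POD_CAPACITY))   # normal traffic: 10
print(pods_required(5000, PER_POD_CAPACITY))  # peak traffic: 100
```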
Establishing Pod Autoscaling Ranges
- Minimum Pods: 10 (for normal traffic)
- Maximum Pods: 100 (for peak traffic)
By configuring Kubernetes' Horizontal Pod Autoscaler (HPA), we allow the checkout service to dynamically adjust the number of pods between these values based on actual CPU usage and traffic.
Step 3: Implementing Autoscaling
Kubernetes’ Horizontal Pod Autoscaler (HPA) can scale the number of pods based on resource metrics such as CPU utilization or, with a metrics adapter, on custom metrics such as request rate.
Autoscaling Configuration for the Checkout Service
CPU-Based Autoscaling
- Set target CPU utilization to 60%. This threshold ensures that as traffic rises, the number of pods will increase to maintain optimal CPU usage.
Latency-Based Autoscaling (if supported)
- Use custom metrics if available to trigger autoscaling when request latency approaches 200 ms, adding more pods to reduce latency.
Example HPA Configuration:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-service
  minReplicas: 10
  maxReplicas: 100
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
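If a custom metrics pipeline (e.g., the Prometheus Adapter) is in place, a latency-oriented entry can sit alongside the CPU target in the same `metrics` list. This is a sketch; the metric name `http_request_duration_ms` is hypothetical and depends on what your adapter actually exposes:

```yaml
# Additional metric entry for the HPA above (sketch; assumes a custom
# metrics adapter exposes a per-pod latency metric under this name).
- type: Pods
  pods:
    metric:
      name: http_request_duration_ms   # hypothetical metric name
    target:
      type: AverageValue
      averageValue: "200"              # scale out as latency nears 200 ms
```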
Step 4: Ensuring High Availability
To improve availability and prevent single-point failures:
Pod Distribution Across Zones
- Ensure that pods are spread across multiple availability zones to minimize the impact of zone-specific outages.
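One way to express this zone spreading is a topology spread constraint in the Deployment's pod template. The skew value and label below are illustrative:

```yaml
# Pod template snippet: spread checkout pods evenly across zones.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: checkout-service
```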
Pod Disruption Budget (PDB)
- Configure PDBs to maintain a minimum number of available pods during rolling updates or node failures.
Node Affinity and Anti-Affinity
- Use pod anti-affinity rules to spread pods across different nodes and prevent them from clustering on a single node.
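A pod anti-affinity rule like the following, placed in the Deployment's pod spec, discourages the scheduler from co-locating checkout pods on the same node (a sketch; the weight is illustrative):

```yaml
# Pod spec snippet: prefer scheduling checkout pods on distinct nodes.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app: checkout-service
```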
Example Pod Disruption Budget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: checkout-pdb
spec:
  minAvailable: 80%
  selector:
    matchLabels:
      app: checkout-service
Step 5: Monitoring and Optimization
Monitoring Metrics:
- Use monitoring tools (e.g., Prometheus, Grafana) to track CPU, memory, and latency metrics for the checkout service.
Autoscaling Adjustment:
- Periodically review the scaling thresholds and adjust HPA configurations based on observed traffic patterns and resource usage.
Cost Optimization:
- Evaluate usage patterns to fine-tune resource requests and limits, balancing cost with performance.
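For example, the container's resource requests and limits on the checkout Deployment might look like this. The values are illustrative and should be derived from the profiling data above; note that the HPA's utilization percentage is computed against the request, so the request value directly shapes scaling behavior:

```yaml
# Container snippet from the checkout-service Deployment (illustrative values).
resources:
  requests:
    cpu: "500m"      # baseline the HPA's utilization math is computed against
    memory: "512Mi"
  limits:
    cpu: "1"
    memory: "1Gi"
```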
This approach not only ensures that the checkout service meets its latency and availability targets but also manages costs efficiently by scaling down during off-peak periods.
This scalable, responsive architecture makes the checkout service resilient and ensures a seamless shopping experience for users, even during high-demand events.