Next.js server-side rendering (SSR) is CPU-intensive. Every request triggers React rendering, data fetching, and HTML serialization—all executed on Node.js’s single-threaded event loop. In Kubernetes, how you allocate and schedule CPU is one of the most important performance levers for SSR workloads (often more impactful than memory tuning or network optimizations).
This post explains why the default “many small pods” pattern destroys tail latency, shows real benchmarks, and gives production-ready configurations that routinely deliver 2–4× better p95 latency and throughput.
Most teams start with the obvious configuration:
```yaml
# Default configuration with small pods leading to high tail latency
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "1Gi"
```
This creates ~8 pods per 4-core node. It feels efficient—until you measure tail latency.
| Configuration | Throughput (req/s) | p95 Latency | CPU per node |
|---|---|---|---|
| 8 × 0.5 CPU pods (baseline) | 920 | 285 ms | 4 cores |
| 2 × 2 CPU pods | 1,850 | 98 ms | 4 cores |
| 1 × 4 CPU pod + clustering | 2,210 | 72 ms | 4 cores |
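The post doesn't say which load generator produced these numbers. If you want to run the same comparison against your own cluster, here is a sketch using autocannon's programmatic API (the tool choice, service URLs, connection count, and duration are all assumptions; any HTTP load generator that reports latency percentiles works just as well):

```ts
// Hypothetical reproduction sketch: drive each pod-shape variant with identical load
// and compare throughput and latency percentiles. Requires `npm install autocannon`.
import autocannon from "autocannon";

async function benchmark(label: string, url: string) {
  const result = await autocannon({
    url,              // Service/Ingress URL fronting one pod-shape variant (assumption)
    connections: 200, // concurrent connections; match your real traffic profile
    duration: 120,    // seconds per run
  });
  console.log(label, {
    "req/s (avg)": result.requests.average,
    "latency percentiles (ms)": result.latency,
  });
}

async function main() {
  await benchmark("8 x 0.5 CPU pods", "http://nextjs-small.example.internal/");
  await benchmark("2 x 2 CPU pods", "http://nextjs-large.example.internal/");
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

Point each run at a Service that fronts only one pod shape so the two deployments are measured under identical load.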
Why is the difference so dramatic?
Fewer, larger pods → fewer context switches → warmer caches → dramatically better tail latency.
Modern AMD EPYC and Intel Xeon CPUs (Milan, Genoa, Sapphire Rapids, Emerald Rapids) have large L3 caches shared between groups of cores, which makes cache locality a first-order factor for SSR throughput.
When Kubernetes schedules eight 0.5-core pods on one socket, they thrash the shared L3 cache. When you instead schedule two 2-core pods (or one 4-core pod with Node.js clustering), each pod owns a large slice of cache and almost never evicts its own working set.
```yaml
# Recommended production configuration for Next.js SSR in Kubernetes
# Optimized for fewer, larger pods
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nextjs
spec:
  replicas: 10
  selector:
    matchLabels:
      app: nextjs
  template:
    metadata:
      labels:
        app: nextjs
    spec:
      containers:
        - name: nextjs
          image: my-app:latest
          resources:
            requests:
              cpu: "2200m"    # 2.2 cores per pod
              memory: "4Gi"
            limits:
              # Headroom above requests absorbs short bursts without immediate throttling.
              # Note: requests != limits means Burstable, not Guaranteed, QoS; set them
              # equal if you need Guaranteed QoS.
              cpu: "3200m"
              memory: "6Gi"
          env:
            - name: NODE_OPTIONS
              value: "--max-old-space-size=4096"
            - name: CLUSTER_CONCURRENCY
              value: "4"      # Or use Node.js built-in cluster module / PM2
```
Why these numbers?

- CPU requests of `2200m` give each pod a bit more than the 2-core shape that performed well in the benchmark above.
- CPU limits above requests let a pod absorb short rendering bursts without immediate CFS throttling, at the cost of Burstable rather than Guaranteed QoS.
- `--max-old-space-size=4096` caps the V8 heap at the 4Gi memory request; the 6Gi limit leaves room for buffers and native memory.
- `CLUSTER_CONCURRENCY=4` runs several render workers per pod so a single slow render cannot monopolize the only event loop (a minimal cluster entrypoint is sketched below).
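`CLUSTER_CONCURRENCY` is an application-level convention here, not something Kubernetes acts on: the container entrypoint has to read it and fork that many workers. A minimal sketch using Node's built-in cluster module and the Next.js custom-server API (the file name, port handling, and restart policy are assumptions; PM2 in cluster mode achieves the same thing):

```ts
// server.ts - hypothetical cluster entrypoint for the deployment above.
// The primary process forks CLUSTER_CONCURRENCY workers; each worker runs the
// Next.js request handler. Workers listen on the same port, and the cluster
// module distributes incoming connections across them.
import cluster from "node:cluster";
import http from "node:http";
import next from "next";

const workers = Number(process.env.CLUSTER_CONCURRENCY ?? "4");
const port = Number(process.env.PORT ?? "3000");

if (cluster.isPrimary) {
  for (let i = 0; i < workers; i++) cluster.fork();

  // Replace a worker if it crashes or is OOM-killed, instead of losing capacity.
  cluster.on("exit", (worker, code, signal) => {
    console.error(`worker ${worker.process.pid} exited (${signal ?? code}); forking a replacement`);
    cluster.fork();
  });
} else {
  const app = next({ dev: false });
  const handle = app.getRequestHandler();

  app.prepare().then(() => {
    http
      .createServer((req, res) => handle(req, res))
      .listen(port, () => console.log(`worker ${process.pid} listening on :${port}`));
  });
}
```

The primary process does no rendering; it only supervises workers, so its CPU cost is negligible.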
HorizontalPodAutoscaler (standard CPU-based):
```yaml
# Horizontal Pod Autoscaler for CPU-based scaling
# Targets 60% CPU utilization as a safe default for Node.js SSR
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nextjs
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nextjs
  minReplicas: 4
  maxReplicas: 100
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # Safe default for Node.js SSR
```
Pure CPU utilization is a poor scaling signal for Node.js. A busy event loop doing async I/O can sit at 15% CPU while being completely saturated.
Better: scale on Event Loop Utilization (ELU) or Event Loop Lag.
```yaml
# Custom HPA scaling based on Event Loop Utilization (ELU)
# Scales out when average ELU exceeds 70%
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nextjs-elu
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nextjs
  minReplicas: 4
  maxReplicas: 100
  metrics:
    - type: Pods
      pods:
        metric:
          name: nodejs_event_loop_utilization   # Exposed via app instrumentation (e.g. prom-client) and a metrics adapter
        target:
          type: AverageValue
          averageValue: 70   # Add pods when average ELU > 70% (assumes the metric is exported as a percentage)
```
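This HPA only works if each pod actually exposes a `nodejs_event_loop_utilization` gauge and something (typically Prometheus plus prometheus-adapter, or another custom-metrics adapter) maps it into the Pods metrics API. Here is a sketch of exposing the gauge from the app with `perf_hooks` and prom-client (the metric name, sample interval, port, and percentage scaling are assumptions that must match your adapter configuration):

```ts
// metrics.ts - hypothetical metrics endpoint for the SSR pods.
// Samples event loop utilization (ELU) on an interval and exposes it as a
// 0-100 gauge on /metrics for Prometheus to scrape. Requires `npm install prom-client`.
import http from "node:http";
import { performance } from "node:perf_hooks";
import client from "prom-client";

const registry = new client.Registry();

const eluGauge = new client.Gauge({
  name: "nodejs_event_loop_utilization", // must match the HPA's Pods metric name
  help: "Event loop utilization over the last sample window, as a percentage",
  registers: [registry],
});

let lastElu = performance.eventLoopUtilization();

// Sample every 5 seconds (assumption; roughly align with your scrape interval).
setInterval(() => {
  const delta = performance.eventLoopUtilization(lastElu); // delta since the previous sample
  lastElu = performance.eventLoopUtilization();
  eluGauge.set(delta.utilization * 100);
}, 5_000).unref();

http
  .createServer(async (req, res) => {
    if (req.url === "/metrics") {
      res.setHeader("Content-Type", registry.contentType);
      res.end(await registry.metrics());
    } else {
      res.statusCode = 404;
      res.end();
    }
  })
  .listen(Number(process.env.METRICS_PORT ?? "9464"));
```

The metrics adapter then needs a rule mapping this gauge to the Pods metric the HPA queries; without that mapping the HPA will report the metric as missing.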
Real-world results from BlaBlaCar and Nearform show 2–3× better resource efficiency than CPU-only scaling.
For Next.js SSR in Kubernetes in 2025:

- Prefer fewer, larger pods (roughly 2 to 4 cores each) over many 0.5-CPU pods.
- Run multiple render workers per pod via the cluster module or PM2.
- Size the V8 heap (`--max-old-space-size`) to fit inside the memory request.
- Scale on Event Loop Utilization or event loop lag rather than raw CPU utilization.
Do this and you’ll typically see 2–4× better p95 latency and throughput—often at lower cost.