Next.js server-side rendering (SSR) is CPU-intensive. Every request triggers React rendering, data fetching, and HTML serialization—all executed on Node.js’s single-threaded event loop. In Kubernetes, how you allocate and schedule CPU is one of the most important performance levers for SSR workloads (often more impactful than memory tuning or network optimizations).
This post explains why the default “many small pods” pattern destroys tail latency, shows real benchmarks, and gives production-ready configurations that routinely deliver 2–4× better p95 latency and throughput.
Most teams start with the obvious configuration:
```yaml
# Default configuration with small pods leading to high tail latency
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "1Gi"
```
This creates ~8 pods per 4-core node. It feels efficient—until you measure tail latency.
| Configuration | Throughput (req/s) | p95 Latency | CPU per node |
|---|---|---|---|
| 8 × 0.5 CPU pods (baseline) | 920 | 285 ms | 4 cores |
| 2 × 2 CPU pods | 1,850 | 98 ms | 4 cores |
| 1 × 4 CPU pod + clustering | 2,210 | 72 ms | 4 cores |
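The post doesn't say which load generator produced these numbers. If you want to run the same comparison against your own cluster, here is a sketch using autocannon's programmatic API (the tool choice, service URLs, connection count, and duration are all assumptions; any HTTP load generator that reports latency percentiles works just as well):

```ts
// Hypothetical reproduction sketch: drive each pod-shape variant with identical load
// and compare throughput and latency percentiles. Requires `npm install autocannon`.
import autocannon from "autocannon";

async function benchmark(label: string, url: string) {
  const result = await autocannon({
    url,              // Service/Ingress URL fronting one pod-shape variant (assumption)
    connections: 200, // concurrent connections; match your real traffic profile
    duration: 120,    // seconds per run
  });
  console.log(label, {
    "req/s (avg)": result.requests.average,
    "latency percentiles (ms)": result.latency,
  });
}

async function main() {
  await benchmark("8 x 0.5 CPU pods", "http://nextjs-small.example.internal/");
  await benchmark("2 x 2 CPU pods", "http://nextjs-large.example.internal/");
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

Point each run at a Service that fronts only one pod shape so the two deployments are measured under identical load.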
Why is the difference so dramatic?
Fewer, larger pods → fewer context switches → warmer caches → dramatically better tail latency.
Modern AMD EPYC and Intel Xeon CPUs (Milan, Genoa, Sapphire Rapids, Emerald Rapids) have large L3 caches shared between groups of cores, which makes cache locality a first-order factor for SSR throughput.
When Kubernetes schedules eight 0.5-core pods on one socket, they thrash the shared L3 cache. When you instead schedule two 2-core pods (or one 4-core pod with Node.js clustering), each pod owns a large slice of cache and almost never evicts its own working set.
```yaml
# Recommended production configuration for Next.js SSR in Kubernetes
# Optimized for fewer, larger pods
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nextjs
spec:
  replicas: 10
  selector:
    matchLabels:
      app: nextjs
  template:
    metadata:
      labels:
        app: nextjs
    spec:
      containers:
        - name: nextjs
          image: my-app:latest
          resources:
            requests:
              cpu: "2200m"    # 2.2 cores per pod
              memory: "4Gi"
            limits:
              # Headroom above requests absorbs short bursts without immediate throttling.
              # Note: requests != limits means Burstable, not Guaranteed, QoS; set them
              # equal if you need Guaranteed QoS.
              cpu: "3200m"
              memory: "6Gi"
          env:
            - name: NODE_OPTIONS
              value: "--max-old-space-size=4096"
            - name: CLUSTER_CONCURRENCY
              value: "4"      # Or use Node.js built-in cluster module / PM2
```
Why these numbers?

- CPU requests of `2200m` give each pod a bit more than the 2-core shape that performed well in the benchmark above.
- CPU limits above requests let a pod absorb short rendering bursts without immediate CFS throttling, at the cost of Burstable rather than Guaranteed QoS.
- `--max-old-space-size=4096` caps the V8 heap at the 4Gi memory request; the 6Gi limit leaves room for buffers and native memory.
- `CLUSTER_CONCURRENCY=4` runs several render workers per pod so a single slow render cannot monopolize the only event loop (a minimal cluster entrypoint is sketched below).
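`CLUSTER_CONCURRENCY` is an application-level convention here, not something Kubernetes acts on: the container entrypoint has to read it and fork that many workers. A minimal sketch using Node's built-in cluster module and the Next.js custom-server API (the file name, port handling, and restart policy are assumptions; PM2 in cluster mode achieves the same thing):

```ts
// server.ts - hypothetical cluster entrypoint for the deployment above.
// The primary process forks CLUSTER_CONCURRENCY workers; each worker runs the
// Next.js request handler. Workers listen on the same port, and the cluster
// module distributes incoming connections across them.
import cluster from "node:cluster";
import http from "node:http";
import next from "next";

const workers = Number(process.env.CLUSTER_CONCURRENCY ?? "4");
const port = Number(process.env.PORT ?? "3000");

if (cluster.isPrimary) {
  for (let i = 0; i < workers; i++) cluster.fork();

  // Replace a worker if it crashes or is OOM-killed, instead of losing capacity.
  cluster.on("exit", (worker, code, signal) => {
    console.error(`worker ${worker.process.pid} exited (${signal ?? code}); forking a replacement`);
    cluster.fork();
  });
} else {
  const app = next({ dev: false });
  const handle = app.getRequestHandler();

  app.prepare().then(() => {
    http
      .createServer((req, res) => handle(req, res))
      .listen(port, () => console.log(`worker ${process.pid} listening on :${port}`));
  });
}
```

The primary process does no rendering; it only supervises workers, so its CPU cost is negligible.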
HorizontalPodAutoscaler (standard CPU-based):
```yaml
# Horizontal Pod Autoscaler for CPU-based scaling
# Targets 60% CPU utilization as a safe default for Node.js SSR
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nextjs
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nextjs
  minReplicas: 4
  maxReplicas: 100
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # Safe default for Node.js SSR
```
Pure CPU utilization is a poor scaling signal for Node.js. A busy event loop doing async I/O can sit at 15% CPU while being completely saturated.
Better: scale on Event Loop Utilization (ELU) or Event Loop Lag.
```yaml
# Custom HPA scaling based on Event Loop Utilization (ELU)
# Scales out when average ELU exceeds 70%
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nextjs-elu
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nextjs
  minReplicas: 4
  maxReplicas: 100
  metrics:
    - type: Pods
      pods:
        metric:
          name: nodejs_event_loop_utilization   # Exposed via app instrumentation (e.g. prom-client) and a metrics adapter
        target:
          type: AverageValue
          averageValue: 70   # Add pods when average ELU > 70% (assumes the metric is exported as a percentage)
```
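This HPA only works if each pod actually exposes a `nodejs_event_loop_utilization` gauge and something (typically Prometheus plus prometheus-adapter, or another custom-metrics adapter) maps it into the Pods metrics API. Here is a sketch of exposing the gauge from the app with `perf_hooks` and prom-client (the metric name, sample interval, port, and percentage scaling are assumptions that must match your adapter configuration):

```ts
// metrics.ts - hypothetical metrics endpoint for the SSR pods.
// Samples event loop utilization (ELU) on an interval and exposes it as a
// 0-100 gauge on /metrics for Prometheus to scrape. Requires `npm install prom-client`.
import http from "node:http";
import { performance } from "node:perf_hooks";
import client from "prom-client";

const registry = new client.Registry();

const eluGauge = new client.Gauge({
  name: "nodejs_event_loop_utilization", // must match the HPA's Pods metric name
  help: "Event loop utilization over the last sample window, as a percentage",
  registers: [registry],
});

let lastElu = performance.eventLoopUtilization();

// Sample every 5 seconds (assumption; roughly align with your scrape interval).
setInterval(() => {
  const delta = performance.eventLoopUtilization(lastElu); // delta since the previous sample
  lastElu = performance.eventLoopUtilization();
  eluGauge.set(delta.utilization * 100);
}, 5_000).unref();

http
  .createServer(async (req, res) => {
    if (req.url === "/metrics") {
      res.setHeader("Content-Type", registry.contentType);
      res.end(await registry.metrics());
    } else {
      res.statusCode = 404;
      res.end();
    }
  })
  .listen(Number(process.env.METRICS_PORT ?? "9464"));
```

The metrics adapter then needs a rule mapping this gauge to the Pods metric the HPA queries; without that mapping the HPA will report the metric as missing.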
Real-world results from BlaBlaCar and Nearform show 2–3× better resource efficiency than CPU-only scaling.
For Next.js SSR in Kubernetes in 2025:

- Prefer fewer, larger pods (roughly 2 to 4 cores each) over many 0.5-CPU pods.
- Run multiple render workers per pod via the cluster module or PM2.
- Size the V8 heap (`--max-old-space-size`) to fit inside the memory request.
- Scale on Event Loop Utilization or event loop lag rather than raw CPU utilization.
Do this and you’ll typically see 2–4× better p95 latency and throughput—often at lower cost.