
Fix: Kubernetes HPA Not Scaling — HorizontalPodAutoscaler Shows Unknown or Doesn't Scale

FixDevs

Quick Answer

Most HPA failures trace back to a missing metrics-server or missing CPU requests. Install metrics-server, set resources.requests.cpu on the target containers, and tune scale-down behavior; for application-level or event-driven metrics, add Prometheus Adapter or KEDA.

The Problem

A Kubernetes HorizontalPodAutoscaler shows <unknown> for the current metric value:

kubectl get hpa
# NAME      REFERENCE            TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
# my-hpa    Deployment/my-app    <unknown>/50%   2         10        2          5m

Or the HPA doesn’t scale up even when the application is clearly overloaded:

# CPU usage visible in kubectl top pods
kubectl top pods
# NAME                      CPU(cores)   MEMORY(bytes)
# my-app-7d9f8b6c4-xk2p9   950m         256Mi

# But the HPA won't scale up
kubectl describe hpa my-hpa
# Warning  FailedGetResourceMetric  unable to fetch metrics from resource metrics API

Or the HPA scales up but never scales back down, leaving excess replicas running.

Why This Happens

HPA relies on the metrics API to make scaling decisions. Common failure causes:

  • metrics-server not installed — HPA’s default CPU and memory metrics require metrics-server in the cluster. Without it, all metrics show <unknown>.
  • No CPU requests on the container — HPA calculates CPU utilization as current usage / requested CPU. If resources.requests.cpu is not set, HPA can’t calculate a percentage and shows <unknown>.
  • metrics-server not accessible — metrics-server reads kubelet's resource endpoints. In some setups (kubeadm, kind, minikube), the kubelet's serving certificate isn't trusted, requiring --kubelet-insecure-tls.
  • Scale-down cooldown — by default, HPA waits 5 minutes before scaling down to avoid flapping. Replicas won’t decrease immediately after load drops.
  • Wrong metric target type — Utilization (percentage) and AverageValue (absolute) have different meanings and requirements.
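
For intuition, the core scaling decision is a single formula: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A minimal Python sketch of that rule (illustrative only, not the controller's actual code):

```python
import math

def desired_replicas(current_replicas, current_value, target_value):
    """Sketch of the HPA scaling rule:
    desiredReplicas = ceil(currentReplicas * currentValue / targetValue)."""
    return math.ceil(current_replicas * (current_value / target_value))

# 2 pods averaging 90% CPU against a 50% target -> scale to 4
print(desired_replicas(2, 90, 50))  # 4
```

This is also why missing CPU requests break everything: without a request, the "current value as a percentage of target" ratio is undefined.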

Fix 1: Install metrics-server

HPA scaling on CPU and memory requires metrics-server. Verify it's installed and working:

# Check if metrics-server is installed
kubectl get deployment metrics-server -n kube-system

# If not found, install with Helm
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm upgrade --install metrics-server metrics-server/metrics-server \
  --namespace kube-system

# Or with kubectl
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify metrics are working
kubectl top nodes
kubectl top pods -A

For kubeadm, kind, minikube — add --kubelet-insecure-tls:

# The default metrics-server deployment fails in clusters where
# kubelet serving certificates aren't signed by the cluster CA

# Patch the deployment to add the insecure flag
kubectl patch deployment metrics-server -n kube-system \
  --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'

# Or in Helm values:
helm upgrade --install metrics-server metrics-server/metrics-server \
  --namespace kube-system \
  --set 'args[0]=--kubelet-insecure-tls'

For minikube:

minikube addons enable metrics-server

Fix 2: Set CPU Requests on the Container

HPA requires resources.requests.cpu to calculate utilization percentage:

# WRONG — no resource requests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          # No resources section → HPA shows <unknown>

---
# CORRECT — set CPU requests (and ideally limits)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          resources:
            requests:
              cpu: "200m"       # 200 millicores = 0.2 CPU cores
              memory: "256Mi"
            limits:
              cpu: "1000m"      # 1 CPU core maximum
              memory: "512Mi"

Create the HPA targeting CPU utilization:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50  # Scale when avg CPU > 50% of requests

Or with kubectl autoscale:

# Create HPA targeting 50% CPU utilization
kubectl autoscale deployment my-app --cpu-percent=50 --min=2 --max=10

# Verify
kubectl get hpa my-app
kubectl describe hpa my-app
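
To see the HPA react, generate some load against the app. A quick sketch, assuming a Service named my-app serving HTTP on port 80 in the current namespace:

```shell
# Generate load to drive CPU up (assumes a Service named "my-app")
kubectl run load-gen --rm -it --image=busybox:1.36 --restart=Never -- \
  /bin/sh -c "while true; do wget -q -O- http://my-app; done"

# In another terminal, watch the HPA respond
kubectl get hpa my-app --watch
```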

Fix 3: Configure Scale Behavior to Prevent Flapping

The default scale-down policy is conservative. Customize it for your use case:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60   # Wait 60s before scaling up again
      policies:
        - type: Pods
          value: 4                     # Add at most 4 pods at once
          periodSeconds: 60
        - type: Percent
          value: 100                   # Or double the current count
          periodSeconds: 60
      selectPolicy: Max               # Use the policy that allows more scaling
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
        - type: Pods
          value: 1                     # Remove at most 1 pod at a time
          periodSeconds: 120

Aggressive scale-down (for cost savings):

behavior:
  scaleDown:
    stabilizationWindowSeconds: 60  # Shorter wait
    policies:
      - type: Percent
        value: 50          # Remove up to 50% of pods at once
        periodSeconds: 60

Prevent scale-down entirely (for critical services):

behavior:
  scaleDown:
    selectPolicy: Disabled  # Never scale down (manual only)

Fix 4: Use Multiple Metrics

Scale on both CPU and memory, or combine with custom metrics:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    # CPU utilization
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    # Memory utilization
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
    # Custom metric from Prometheus (requires Prometheus Adapter)
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"  # Scale when each pod handles > 100 req/s

Note: When multiple metrics are defined, HPA scales to satisfy ALL of them — it uses the metric that requires the most replicas. This is conservative by design.
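
The take-the-max rule can be sketched in a few lines of Python (illustrative, not the controller's code): compute a per-metric recommendation, then keep the largest.

```python
import math

def desired_from_metrics(current_replicas, metrics):
    """metrics: list of (current_value, target_value) pairs.
    HPA-style rule: take the largest per-metric recommendation."""
    return max(
        math.ceil(current_replicas * current / target)
        for current, target in metrics
    )

# CPU at 30% of a 60% target, memory at 90% of a 70% target:
# memory drives the scale-up
print(desired_from_metrics(4, [(30, 60), (90, 70)]))  # 6
```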

Fix 5: Set Up Custom Metrics with Prometheus Adapter

For application-level metrics (queue depth, request rate), use the Prometheus Adapter:

# Install Prometheus Adapter
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --set prometheus.url=http://prometheus-server.monitoring.svc.cluster.local

# Configure the adapter to expose a custom metric
# (Helm chart values under rules.custom; rendered into the adapter's ConfigMap)
rules:
  custom:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: { resource: "namespace" }
          pod: { resource: "pod" }
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[2m])'

# HPA using the custom metric
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "50"

Fix 6: Use KEDA for Event-Driven Autoscaling

KEDA (Kubernetes Event-Driven Autoscaling) scales based on external event sources like Kafka, Redis, AWS SQS, and more:

# Install KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace

# Scale based on Redis queue length
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: redis-scaledobject
spec:
  scaleTargetRef:
    name: my-worker
  minReplicaCount: 0     # Scale to zero when queue is empty
  maxReplicaCount: 30
  triggers:
    - type: redis
      metadata:
        address: redis:6379
        listName: jobs
        listLength: "10"   # 1 pod per 10 items in queue

---
# Scale based on Kafka consumer lag
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer
spec:
  scaleTargetRef:
    name: kafka-worker
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: my-group
        topic: events
        lagThreshold: "100"  # 1 pod per 100 unprocessed messages

Note: KEDA supports scale-to-zero, which the built-in HPA doesn’t. This is useful for batch workloads or workers that should be inactive when there’s no work.
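
Under the hood, KEDA creates and manages a standard HPA for each ScaledObject (named keda-hpa-<scaledobject-name> by KEDA's default convention), so the usual HPA debugging commands still apply:

```shell
# Inspect the ScaledObject and the HPA KEDA generated for it
kubectl get scaledobject redis-scaledobject
kubectl get hpa keda-hpa-redis-scaledobject

# Conditions and Events show trigger or connection errors
kubectl describe scaledobject redis-scaledobject
```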

Still Not Working?

HPA events show metric failures — if kubectl describe hpa shows FailedComputeMetricsReplicas or similar warnings, the events section usually names the cause. Common messages:

  • "unable to fetch metrics from resource metrics API" → metrics-server not running
  • "missing request for cpu" → no CPU requests on pod spec
  • "invalid metrics" → wrong metric type or name in the HPA spec

HPA scales up but Cluster Autoscaler doesn’t add nodes — if all pods are Pending after HPA scales up, the cluster may be out of capacity. Check if Cluster Autoscaler is installed and configured. HPA and Cluster Autoscaler work together: HPA scales pods, Cluster Autoscaler scales nodes.

Verify the metrics API is accessible:

# Test the metrics API directly
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods"

# Check metrics-server logs
kubectl logs -n kube-system deployment/metrics-server

HPA shows correct metrics but doesn’t scale — check minReplicas and maxReplicas. If REPLICAS already equals maxReplicas, HPA can’t scale up further. Also check if the deployment has a PodDisruptionBudget preventing scale-down.

For related Kubernetes issues, see Fix: Kubernetes CrashLoopBackOff and Fix: Kubernetes Pod Pending.


FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
