Fix: Docker HEALTHCHECK Failing — Container Marked Unhealthy Despite Running
Quick Answer
How to fix Docker HEALTHCHECK failures — command syntax, curl vs wget availability, start period, interval tuning, health check in docker-compose, and debugging unhealthy containers.
The Problem
A Docker container starts and runs correctly, but its status shows unhealthy:
docker ps
CONTAINER ID IMAGE STATUS
a1b2c3d4e5f6 myapp Up 2 minutes (unhealthy)
# Container is running, but health check is failing
Or the container keeps getting replaced because an orchestrator such as Docker Swarm restarts tasks whose containers report unhealthy (Kubernetes ignores HEALTHCHECK and relies on its own probes):
docker inspect myapp --format='{{.State.Health.Status}}'
# unhealthy
docker inspect myapp --format='{{json .State.Health.Log}}'
# [{"Start":"...","End":"...","ExitCode":1,"Output":"curl: (7) Failed to connect to localhost port 8080: Connection refused"}]
Or the start_period isn’t long enough, causing the container to be marked unhealthy before the application finishes starting:
# Container starts, health check runs immediately, app not ready yet
# Health check fails 3 times → container marked unhealthy
# But app would have been healthy 10 seconds later
Why This Happens
Docker’s HEALTHCHECK instruction runs a command inside the container on a schedule. The container is marked unhealthy after a configurable number of consecutive failures (--retries, default 3). Common causes:
- curl or wget not in the image: many examples use curl http://localhost:8080/health as the check command, but minimal images (Alpine, distroless) don’t include curl.
- Wrong port or path: the health check targets a port or path that doesn’t exist or isn’t reachable from inside the container.
- Localhost vs 0.0.0.0: an app listening on 0.0.0.0 is reachable at 127.0.0.1, so this normally works, but an app bound to one specific interface is not.
- start_period too short: the default start period is 0s, so failures count from the first check. Slow-starting applications (JVM, large Node.js apps) may not be ready in time.
- Exit code not zero: the check command must exit 0 to be healthy. If the HTTP request succeeds but the command exits non-zero for another reason, Docker records a failure.
- Shell unavailable: the exec form, HEALTHCHECK CMD ["executable", "arg"], runs the command directly without a shell, so shell features (&&, ||, pipes) don’t work. The shell form, HEALTHCHECK CMD command, needs /bin/sh in the image.
Fix 1: Install curl or Use Alternatives
Minimal images often lack curl. Either install it or use an alternative:
# Alpine — install curl (adds ~1MB)
FROM node:20-alpine
RUN apk add --no-cache curl
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
CMD curl -f http://localhost:3000/health || exit 1
# Alternatively — use wget (included in busybox/Alpine)
HEALTHCHECK CMD wget -qO- http://localhost:3000/health || exit 1
# Or use nc (netcat) to check if port is open (no HTTP check)
HEALTHCHECK CMD nc -z localhost 3000 || exit 1
Distroless images: there is no shell and no curl, so copy in a compiled health check binary:
# Distroless has no shell and no curl, so use a compiled health check binary.
# Stage 1: build the probe. grpc-health-probe speaks the gRPC health protocol,
# so this only works if the app exposes a gRPC health service (not plain HTTP).
FROM golang:1.22-alpine AS healthcheck-builder
# Pin a release tag instead of @latest for reproducible builds
RUN GOBIN=/out go install github.com/grpc-ecosystem/grpc-health-probe@latest
# Stage 2: copy the binary into the distroless image
FROM gcr.io/distroless/nodejs20-debian12
COPY --from=healthcheck-builder /out/grpc-health-probe /healthcheck
HEALTHCHECK --interval=30s --timeout=10s \
  CMD ["/healthcheck", "-addr=:3000"]
For Node.js images, use a JavaScript health check script:
FROM node:20-alpine
# health-check.js included in the image
COPY health-check.js .
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s \
  CMD node health-check.js
// health-check.js
const http = require('http');
const options = {
  host: 'localhost',
  port: process.env.PORT || 3000,
  path: '/health',
  timeout: 5000, // socket idle timeout in ms
};
const req = http.request(options, (res) => {
  // Exit 0 (healthy) only on HTTP 200
  process.exit(res.statusCode === 200 ? 0 : 1);
});
req.on('error', () => process.exit(1));
// req.abort() is deprecated; destroy() tears down the socket
req.on('timeout', () => { req.destroy(); process.exit(1); });
req.end();
Fix 2: Set Correct Timing Parameters
The default HEALTHCHECK timing often marks real-world applications unhealthy while they are still starting up:
# Default values (if not specified):
# --interval=30s Check every 30 seconds
# --timeout=30s Fail if check takes longer than 30 seconds
# --start-period=0s Start counting failures immediately
# --retries=3 Mark unhealthy after 3 consecutive failures
# Tuned for a typical web application:
#   interval 10s      check frequently (useful during development)
#   timeout 5s        fail fast if unresponsive
#   start-period 60s  failures in the first 60s do not count (startup time)
#   retries 5         five consecutive failures before marking unhealthy
# A comment cannot follow a trailing backslash, so the notes are listed here.
HEALTHCHECK \
  --interval=10s \
  --timeout=5s \
  --start-period=60s \
  --retries=5 \
  CMD curl -f http://localhost:3000/health || exit 1
# For a JVM/Spring Boot application (slow startup, often 30-90s):
HEALTHCHECK \
  --interval=20s \
  --timeout=10s \
  --start-period=120s \
  --retries=5 \
  CMD curl -f http://localhost:8080/actuator/health || exit 1
start-period explained:
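During the start period, failed checks don’t count toward retries, and the container reports the starting status. If a check succeeds during this window, the container is considered started and is marked healthy; from then on, consecutive failures count toward retries. Once the start period elapses, all failures count. This prevents a false unhealthy status during legitimate startup.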
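The effect of the start period can be illustrated with a small simulation. This is a simplified model of the rules described above, not Docker's actual implementation; the function name simulateHealth and the shape of the checks array are made up for illustration.

```javascript
// Simplified model of health status transitions (an approximation only).
// checks: array of { at: secondsSinceStart, ok: boolean }, in time order.
function simulateHealth(checks, { startPeriod = 0, retries = 3 } = {}) {
  let status = 'starting';
  let streak = 0;       // consecutive failures that count toward retries
  let started = false;  // true once any check has succeeded
  const timeline = [];

  for (const { at, ok } of checks) {
    if (ok) {
      streak = 0;
      started = true;
      status = 'healthy';
    } else if (started || at >= startPeriod) {
      // Failures count only after the start period ends,
      // or after a check has already succeeded once.
      streak += 1;
      if (streak >= retries) status = 'unhealthy';
    }
    timeline.push({ at, status });
  }
  return timeline;
}

// Hypothetical app that becomes ready 45s after start; checks run every 10s.
const checks = [10, 20, 30, 40, 50, 60].map((at) => ({ at, ok: at >= 45 }));

const noGrace = simulateHealth(checks, { startPeriod: 0, retries: 3 });
const withGrace = simulateHealth(checks, { startPeriod: 60, retries: 3 });

console.log(noGrace.map((t) => `${t.at}s:${t.status}`).join(' '));
console.log(withGrace.map((t) => `${t.at}s:${t.status}`).join(' '));
```

With no start period, the third failed check at 30s flips the status to unhealthy even though the app would have been ready at 45s. With a 60s start period, the same failures are ignored and the container goes straight from starting to healthy.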
# Check the health status timeline
docker inspect myapp | jq '.State.Health.Log[-5:]'
# Look at Start timestamps to see when checks began
# Compare against the container's start time in .State.StartedAt
Fix 3: Fix Health Check Command Syntax
HEALTHCHECK CMD comes in two forms with different behavior:
# Exec form (JSON array): runs the binary directly, no shell, so no ||, && or pipes
HEALTHCHECK CMD ["curl", "-f", "http://localhost:3000/health"]
# Roughly equivalent to: docker exec container curl -f http://localhost:3000/health
# Shell form (plain string): runs through /bin/sh -c
HEALTHCHECK CMD curl -f http://localhost:3000/health || exit 1
# Roughly equivalent to: docker exec container /bin/sh -c "curl -f ... || exit 1"
# The || exit 1 requires a shell, so it only works in the shell form
# "CMD-SHELL" is not Dockerfile syntax. It belongs to the Compose/API test array:
#   test: ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"]
The curl -f flag (--fail) makes curl exit with code 22 when the server responds with an HTTP status of 400 or higher. Without it, curl exits 0 even on 404 or 500 responses:
# WRONG — exits 0 even on 500 Internal Server Error
HEALTHCHECK CMD curl http://localhost:3000/health
# CORRECT — exits non-zero on 4xx/5xx HTTP responses
HEALTHCHECK CMD curl -f http://localhost:3000/health
Test the health check command manually:
# Run the exact command inside the container
docker exec myapp curl -f http://localhost:3000/health
echo "Exit code: $?" # Should be 0 for healthy
# Run as root to rule out permission issues
docker exec -u root myapp curl -f http://localhost:3000/health
# Check if curl is available
docker exec myapp which curl || echo "curl not found"
Fix 4: Configure Health Checks in docker-compose
docker-compose.yml supports overriding or adding health checks:
# docker-compose.yml
services:
  api:
    image: myapp:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 15s
      timeout: 5s
      retries: 5
      start_period: 30s
  # Service that waits for api to be healthy
  worker:
    image: myworker:latest
    depends_on:
      api:
        condition: service_healthy  # Wait for api to pass health check
        # Without this, worker starts immediately and api may not be ready
  # Database with built-in health check
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s
  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      retries: 3
Disable health check for a service (override Dockerfile’s HEALTHCHECK):
services:
  myservice:
    image: myapp:latest
    healthcheck:
      disable: true  # Ignore the HEALTHCHECK from the Dockerfile
Fix 5: Implement a Proper Health Endpoint
The health check’s value depends on what /health actually checks. A proper health endpoint verifies that the application can serve requests:
// Express.js (TypeScript): comprehensive health endpoint
app.get('/health', async (req, res) => {
  const health: Record<string, unknown> = {
    status: 'ok',
    uptime: process.uptime(),
    timestamp: new Date().toISOString(),
  };
  // Check database connection
  try {
    await db.query('SELECT 1');
    health.database = 'ok';
  } catch (err) {
    health.database = 'error';
    health.status = 'degraded';
  }
  // Check Redis connection
  try {
    await redis.ping();
    health.cache = 'ok';
  } catch (err) {
    health.cache = 'error';
    health.status = 'degraded';
  }
  // 503 makes curl -f exit non-zero, so the check fails when degraded
  const statusCode = health.status === 'ok' ? 200 : 503;
  res.status(statusCode).json(health);
});
Separate liveness vs readiness (Kubernetes pattern, useful in Docker too):
// Liveness: "is the process alive?" (simple, rarely fails)
app.get('/health/live', (req, res) => {
  res.status(200).json({ status: 'alive' });
});
// Readiness: "can it serve traffic?" (checks dependencies)
app.get('/health/ready', async (req, res) => {
  try {
    await Promise.all([
      db.query('SELECT 1'),
      redis.ping(),
    ]);
    res.status(200).json({ status: 'ready' });
  } catch (err) {
    res.status(503).json({ status: 'not ready', error: err.message });
  }
});
# Use the liveness check for Docker HEALTHCHECK (avoid killing due to DB outage)
HEALTHCHECK CMD curl -f http://localhost:3000/health/live || exit 1
Fix 6: Debug an Unhealthy Container
When a container is unhealthy, inspect the recent health check history:
# View last 5 health check results
docker inspect myapp --format='{{json .State.Health.Log}}' | \
python3 -m json.tool | head -50
# Each log entry contains:
# Start: when the check began
# End: when it finished
# ExitCode: 0=healthy, non-zero=unhealthy
# Output: stdout/stderr from the check command
# Watch health status in real time
watch -n 2 'docker inspect myapp --format="Status: {{.State.Health.Status}}"'
# Get full container inspect output
docker inspect myapp | jq '.[0].State.Health'
Common output messages and their meanings:
"curl: (7) Failed to connect to localhost port 3000: Connection refused"
→ App not listening on port 3000 yet, or crashed
→ Fix: Increase start-period or check app startup
"curl: (22) The requested URL returned error: 500"
→ App is running but returning 500 error
→ Fix: Debug the health endpoint — check app logs
"curl: (6) Could not resolve host: localhost"
→ Unusual — network configuration issue
→ Fix: Use 127.0.0.1 instead of localhost
"OCI runtime exec failed: exec: 'curl': executable file not found"
→ curl not in the image
→ Fix: Install curl or use alternative
Fix 7: Use Health Checks in Production Orchestration
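In production, orchestrators act on health status to recover automatically. Docker Swarm uses the container’s HEALTHCHECK directly; Kubernetes ignores it and uses its own liveness and readiness probes, so the same endpoint has to be configured there separately.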
Docker Swarm — restart unhealthy replicas:
# docker-compose.yml (Swarm mode)
services:
  api:
    image: myapp:latest
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s
Health check during rolling updates: Swarm waits for each new task to pass its health check before it moves on to the next replica, which keeps the service available during a deployment when more than one replica is running.
Still Not Working?
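Different user inside the container: the health check runs as the container’s user (often non-root). Connecting to a port below 1024 needs no special privileges, but binding to one can, depending on the runtime configuration, so a non-root app may fail to start on a privileged port and the check then reports connection refused. Prefer a port above 1024, and make sure the check command and any script it runs are readable and executable by that user.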
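IPv6 vs IPv4 binding: if the app binds to ::1 (IPv6 localhost) but the check targets 127.0.0.1 (IPv4), or localhost resolves to ::1 while the app listens only on IPv4, the connection fails. Use an explicit address in the check URL (http://127.0.0.1:3000/health or http://[::1]:3000/health) or bind the app to 0.0.0.0.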
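Health check timing with depends_on: in Docker Compose, condition: service_healthy only works if the dependency has a health check, defined either in its healthcheck section or by a HEALTHCHECK in its image. Without one there is no health status to wait for, and Compose v2 reports an error for the dependency rather than silently skipping the condition.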
Misleading exit codes — some commands return non-zero for reasons unrelated to the actual health. Test the command manually inside the container with docker exec to verify the exit code and output before trusting it in a HEALTHCHECK.
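When reading the health log, the Output field usually says more than the exit code alone. As a rough sketch, the messages listed under Fix 6 can be matched against a small table of patterns. The names HINTS and diagnose are hypothetical, the patterns cover only the messages shown in this article, and the sample entries are hand-written to mirror the shape of .State.Health.Log rather than taken from a real container.

```javascript
// Hypothetical helper: map the Output of a failed check to a likely cause.
// Patterns are based only on the messages listed under Fix 6.
const HINTS = [
  [/executable file not found/i, 'check command is not installed in the image'],
  [/\(7\) Failed to connect|Connection refused/i, 'app is not listening yet, or it crashed'],
  [/\(22\)|returned error: \d{3}/i, 'endpoint answered with an HTTP error status'],
  [/\(6\) Could not resolve host/i, 'name resolution failed; try 127.0.0.1'],
];

function diagnose(entry) {
  if (entry.ExitCode === 0) return 'healthy';
  const match = HINTS.find(([pattern]) => pattern.test(entry.Output || ''));
  return match ? match[1] : 'unrecognized failure; run the command manually';
}

// Sample entries shaped like the JSON from .State.Health.Log (illustrative data).
const log = [
  { ExitCode: 1, Output: 'curl: (7) Failed to connect to localhost port 3000: Connection refused' },
  { ExitCode: 1, Output: 'curl: (22) The requested URL returned error: 500' },
  { ExitCode: 0, Output: '{"status":"ok"}' },
];

for (const entry of log) {
  console.log(entry.ExitCode, diagnose(entry));
}
```

In practice the log array would come from parsing the output of docker inspect with JSON.parse; anything the table does not recognize still needs a manual docker exec run to confirm.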
For related Docker issues, see Fix: Docker Build Cache Invalidated and Fix: Docker Multi-Stage Build Failed.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.