
Fix: Docker Compose Healthcheck Not Working — depends_on Not Waiting or Always Unhealthy

FixDevs

Quick Answer

Define an explicit healthcheck block on the dependency and use depends_on with condition: service_healthy on the consumer. If a container shows (unhealthy), verify the health command exists inside the image, use the right CMD / CMD-SHELL syntax, and give slow-starting services a longer start_period. Inspect failures with docker inspect --format='{{json .State.Health}}'.

The Problem

A service starts before its dependency is ready, despite depends_on being configured:

services:
  app:
    depends_on:
      - db  # App starts before DB is accepting connections
  db:
    image: postgres:16

Or depends_on with condition: service_healthy causes the dependent service to never start:

services:
  app:
    depends_on:
      db:
        condition: service_healthy  # App waits forever — DB stays 'starting'
  db:
    image: postgres:16
    # No healthcheck defined — condition never satisfied

Or a healthcheck is defined but the container shows (unhealthy) despite the service being fine:

docker ps
# CONTAINER  STATUS
# my-db      Up 2 minutes (unhealthy)

Why This Happens

Docker’s depends_on by default only waits for the container to start, not for the service inside it to be ready. Common failures:

  • depends_on without condition — the default condition: service_started means “wait for the container to start,” not “wait for the database to accept connections.” Your app may start while Postgres is still initializing.
  • No healthcheck defined — condition: service_healthy requires an explicit healthcheck block on the dependency. Without one, the container never transitions from starting to healthy.
  • Wrong healthcheck command — if the health command uses a binary not available in the container, it fails immediately with exit code 1 or 127.
  • start_period too short — Postgres, MySQL, and other databases take several seconds (sometimes 30+) to initialize on first boot. If the health check runs during this window, it fails and the container is marked unhealthy.
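The "no healthcheck defined" case is easy to verify from a shell. A small sketch — the helper name has_healthcheck is ours, and my-db is a placeholder container name:

```shell
# Returns success if the container defines a healthcheck.
# docker inspect prints "null" when none is configured —
# and in that state, condition: service_healthy can never be satisfied.
has_healthcheck() {
  docker inspect --format '{{json .Config.Healthcheck}}' "$1" 2>/dev/null \
    | grep -vq '^null$'
}

# Usage:
# has_healthcheck my-db && echo "healthcheck present" || echo "none defined"
```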

Fix 1: Add a Healthcheck to the Dependency

condition: service_healthy only works when the dependency has a healthcheck:

services:
  app:
    image: myapp:latest
    depends_on:
      db:
        condition: service_healthy  # Wait until db is healthy
      redis:
        condition: service_healthy
    environment:
      DATABASE_URL: postgresql://user:pass@db:5432/mydb

  db:
    image: postgres:16
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: mydb
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d mydb"]
      interval: 5s       # Check every 5 seconds
      timeout: 5s        # Fail if no response within 5 seconds
      retries: 5         # Mark unhealthy after 5 consecutive failures
      start_period: 10s  # Don't count failures during first 10s (init time)

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 5s

Fix 2: Healthcheck Commands for Common Services

Correct health commands for popular services:

# PostgreSQL
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-postgres} -d ${POSTGRES_DB:-postgres}"]
  interval: 5s
  timeout: 5s
  retries: 5
  start_period: 15s

# MySQL / MariaDB
healthcheck:
  # $$ defers expansion to the container's shell — with exec-form CMD,
  # Compose would substitute ${MYSQL_ROOT_PASSWORD} from the *host* environment
  test: ["CMD-SHELL", "mysqladmin ping -h localhost -u root -p$$MYSQL_ROOT_PASSWORD"]
  interval: 5s
  timeout: 5s
  retries: 5
  start_period: 30s  # MySQL takes longer to initialize

# MongoDB
healthcheck:
  test: ["CMD", "mongosh", "--eval", "db.adminCommand('ping')"]
  interval: 5s
  timeout: 5s
  retries: 5
  start_period: 20s

# Redis
healthcheck:
  test: ["CMD", "redis-cli", "ping"]
  interval: 5s
  timeout: 3s
  retries: 3

# RabbitMQ
healthcheck:
  test: ["CMD", "rabbitmq-diagnostics", "ping"]
  interval: 10s
  timeout: 10s
  retries: 5
  start_period: 30s

# Elasticsearch
healthcheck:
  test: ["CMD-SHELL", "curl -fs http://localhost:9200/_cluster/health | grep -vq '\"status\":\"red\"'"]
  interval: 10s
  timeout: 10s
  retries: 5
  start_period: 60s

# Custom HTTP service
healthcheck:
  test: ["CMD-SHELL", "curl -fs http://localhost:8080/health || exit 1"]
  interval: 10s
  timeout: 5s
  retries: 3
  start_period: 20s

CMD vs CMD-SHELL syntax:

# CMD — exec form, no shell, each word is a separate array element
test: ["CMD", "pg_isready", "-U", "postgres"]

# CMD-SHELL — runs via /bin/sh -c, supports shell features
test: ["CMD-SHELL", "pg_isready -U postgres && echo healthy"]

# String form — equivalent to CMD-SHELL
test: "pg_isready -U postgres"

Note: Prefer CMD when the check is a single binary with fixed arguments — it runs the command directly, so it works even in minimal images that ship without /bin/sh and avoids quoting surprises. Use CMD-SHELL only when you need shell features like &&, pipes, or variable expansion.
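The difference is easy to see outside Docker, since CMD-SHELL is essentially /bin/sh -c "your string":

```shell
# What CMD-SHELL actually runs — shell features like && and pipes work:
sh -c 'echo ready && echo ok | tr a-z A-Z'
# ready
# OK

# Exec-form CMD passes each array element verbatim to the binary:
# ["CMD", "echo", "$HOME"] prints the literal text $HOME,
# because no shell is present to expand the variable.
```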

Fix 3: Tune start_period for Slow Services

start_period prevents failures during initialization from counting toward retries:

healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres"]
  interval: 5s       # How often to run the check
  timeout: 5s        # How long to wait for each check
  retries: 5         # Failures after start_period before marking unhealthy
  start_period: 30s  # Grace period — failures here don't count

When to increase start_period:

# Postgres with large schemas / initial data — 30s+
# MySQL with InnoDB recovery — 60s+
# Elasticsearch with large indices — 60-120s
# Kafka/Zookeeper cluster — 60s+
# Services with slow JVM startup (Spring Boot) — 30-60s

# For development, a longer start_period reduces false unhealthy states
start_period: 60s

# Where fast feedback matters more than tolerance (smoke tests, quick scripts),
# a shorter start_period surfaces real failures sooner
start_period: 10s
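A useful rule of thumb for how long Docker can take, at worst, to flag a container unhealthy once these values are set:

```
time_to_unhealthy ≈ start_period + retries × interval
e.g. start_period: 30s, retries: 5, interval: 5s  →  30s + 5 × 5s = 55s
```

If each failed check also hits its timeout, the gap between checks grows, so treat this as a lower-bound estimate.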

Fix 4: Debug Unhealthy Containers

When a container shows (unhealthy), inspect the health check output:

# See health status and last check output
docker inspect --format='{{json .State.Health}}' my-db | python3 -m json.tool

# Example output:
# {
#   "Status": "unhealthy",
#   "FailingStreak": 3,
#   "Log": [
#     {
#       "Start": "2026-03-26T10:00:00Z",
#       "End": "2026-03-26T10:00:05Z",
#       "ExitCode": 1,
#       "Output": "pg_isready: error: could not connect to server: FATAL: password authentication failed for user \"postgres\""
#     }
#   ]
# }

# Or use docker events to watch health transitions
docker events --filter "type=container" --filter "event=health_status"

Run the health check command manually inside the container:

# Connect to the container and run the health check manually
docker exec my-db pg_isready -U postgres -d mydb

# Run the exact CMD from the healthcheck
docker exec my-db sh -c "pg_isready -U postgres -d mydb"

# Check if the command exists in the container
docker exec my-db which pg_isready
docker exec my-db which redis-cli
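For scripting around this (e.g. in a test harness), a small polling helper is often enough. A sketch — the function name wait_healthy is ours, and my-db is a placeholder:

```shell
# Poll a container's health status until it reports healthy, or give up.
wait_healthy() {
  name=$1
  tries=${2:-30}   # default: 30 attempts, 2s apart
  i=0
  while [ "$i" -lt "$tries" ]; do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)
    [ "$status" = "healthy" ] && return 0
    i=$((i + 1))
    sleep 2
  done
  echo "$name did not become healthy after $tries checks" >&2
  return 1
}

# Usage: wait_healthy my-db 60
```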

Common unhealthy causes by exit code:

Exit 0 — healthy
Exit 1 — health check failed (service not ready)
Exit 127 — command not found (binary doesn't exist in container)
Exit 124 — the check wrapped its command in timeout(1) and exceeded the limit (when Docker's own timeout fires, the check is simply recorded as failed)
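These are ordinary process exit codes, so each one can be reproduced without Docker:

```shell
sh -c 'exit 0'; echo $?            # 0 — what a passing check returns
sh -c 'exit 1'; echo $?            # 1 — generic failure (service not ready)
sh -c 'no-such-binary'; echo $?    # 127 — command not found in the image
timeout 1 sleep 5; echo $?         # 124 — timeout(1) convention for exceeded time
```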

Fix 5: Healthcheck in Custom Application Images

Add healthcheck to your own Dockerfiles:

# Dockerfile — add HEALTHCHECK instruction
FROM node:20-alpine

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev  # production dependencies only (--only=production is deprecated)
COPY . .

EXPOSE 3000

# Install curl for the health check (alpine doesn't have it by default)
RUN apk add --no-cache curl

HEALTHCHECK --interval=10s --timeout=5s --start-period=15s --retries=3 \
  CMD curl -fs http://localhost:3000/health || exit 1

CMD ["node", "server.js"]

# Python / FastAPI
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .

EXPOSE 8000

HEALTHCHECK --interval=10s --timeout=5s --start-period=20s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Implement a /health endpoint in your app:

// Express health endpoint
app.get('/health', (req, res) => {
  // Check critical dependencies
  const healthy = db.isConnected() && redis.isReady();
  if (healthy) {
    res.json({ status: 'ok' });
  } else {
    res.status(503).json({ status: 'unhealthy', reason: 'dependency unavailable' });
  }
});

Fix 6: Full Example with All Conditions

A production-ready docker-compose.yml with proper health checks:

version: '3.8'  # obsolete in Compose v2 — safe to omit

services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgresql://user:pass@db:5432/appdb
      REDIS_URL: redis://redis:6379
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "curl -fs http://localhost:3000/health || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 20s

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: appdb
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d appdb"]
      interval: 5s
      timeout: 5s
      retries: 5
      start_period: 15s
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 5s
    restart: unless-stopped

volumes:
  postgres_data:
  redis_data:

Still Not Working?

depends_on is ignored by docker compose up --no-deps — the --no-deps flag skips dependency resolution. Remove it if you need health check waiting.

Service marked healthy but app still fails — depends_on with condition: service_healthy only ensures the container’s health check passes. Your app may still need a retry loop for the actual connection, since TCP acceptance and application readiness aren’t always the same. Add retry logic in your app’s startup code.
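A minimal POSIX-sh retry wrapper for that startup code (a sketch — wait_for is a hypothetical helper, db is the compose service hostname, and the probe command should match your service):

```shell
#!/bin/sh
# Retry a readiness probe before starting the app, instead of relying
# solely on depends_on.
wait_for() {
  retries=$1; shift
  n=0
  until "$@"; do
    n=$((n + 1))
    if [ "$n" -ge "$retries" ]; then
      echo "command still failing after $retries attempts: $*" >&2
      return 1
    fi
    sleep 2
  done
}

# Example: wait up to ~60s for Postgres, then launch the app
# wait_for 30 pg_isready -h db -U user && exec node server.js
```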

Healthcheck passes but container restarts — the process exiting (crash) is separate from the health check. A container can be healthy but still restart if the main process crashes. Check docker logs for the crash reason.

Health check works locally but fails in CI — CI environments often have less CPU/memory, causing services to start slower. Increase start_period and retries in CI, or use a separate docker-compose.ci.yml with longer timeouts.
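Compose merges later -f files over earlier ones key by key, so a CI override only needs the values that change (the file name and numbers below are illustrative):

```yaml
# docker-compose.ci.yml
services:
  db:
    healthcheck:
      start_period: 60s   # slower CI runners need a longer grace period
      retries: 10
```

Run it with docker compose -f docker-compose.yml -f docker-compose.ci.yml up -d.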

For related Docker issues, see Fix: Docker Container Keeps Restarting and Fix: Docker Compose depends_on Not Working.

FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
