Fix: Docker Compose depends_on Not Waiting for Service to Be Ready

Q: How do I fix "Docker Compose depends_on Not Waiting for Service to Be Ready"?

depends_on controls only the order in which containers start, not whether the service inside each container is ready. Services can start in the right order and the app can still crash on startup. The fixes below center on healthchecks.

The Error

You set depends_on in your docker-compose.yml to ensure services start in order, but your application still crashes on startup:

app_1   | Error: connect ECONNREFUSED 127.0.0.1:5432
app_1   | Connection refused — PostgreSQL not ready
db_1    | LOG:  database system is ready to accept connections

Or:

web_1   | redis.exceptions.ConnectionError: Error 111 connecting to redis:6379. Connection refused.

The app container starts before the database or Redis is ready to accept connections, even though depends_on is configured. The dependent service starts, but crashes before the dependency finishes initializing.

Why This Happens

depends_on in Docker Compose only controls container start order — it does not wait for the service inside the container to be ready. It starts containers in dependency order, but immediately moves to the next service as soon as the container process starts (not when the service is accepting connections).

From the Docker Compose documentation:

depends_on does not wait for db and redis to be “ready” before starting web — only until they have been started.

This is a common misconception. The database container may start in seconds, but PostgreSQL, MySQL, or Redis may take several more seconds to initialize, run migrations, or set up data directories before accepting connections.

Platform and Environment Differences

depends_on behavior changed significantly between Compose v1 and v2, and the surrounding orchestrator (plain Docker, Swarm, Kubernetes, Podman) handles dependency ordering differently. The same docker-compose.yml can pass on one host and fail on another.

Compose v1 (Python) vs v2 (Go). The original Compose CLI was written in Python and installed as docker-compose (with a hyphen). Compose v2 is a Go plugin installed as docker compose (with a space) and is the default in Docker Desktop and recent Docker Engine releases. Compose v1 reached end of life in mid-2023. The Python implementation had partial condition support; the Go implementation honors condition: service_healthy, condition: service_started, and condition: service_completed_successfully reliably. Run docker compose version and docker-compose --version to see which binary you are actually invoking — both may exist on the same host.

File format version: 2.x, 3.x, and the unversioned Compose Spec. The version: "2.x" schema (2.1 and later) supports condition in depends_on. The version: "3.x" schema removed it when Docker pushed Compose toward Swarm compatibility, and it stayed unsupported under 3.x until docker-compose 1.27 merged the 2.x and 3.x formats into the Compose Spec. The current recommendation is to omit version: entirely. Compose v2 treats the field as obsolete, ignores it, and accepts both condition and healthcheck. If you copy a tutorial that uses version: "3.0" and run it with a docker-compose release older than 1.27, the condition is not honored, and the file is typically rejected with a schema validation error.

Docker Swarm vs Compose vs Podman Compose. docker stack deploy (Swarm) does not honor depends_on at all. Swarm assumes services come up independently and reconcile via retries. If you deploy the same file with docker stack deploy -c docker-compose.yml mystack, the depends_on block is ignored and your app starts in parallel with the database. Podman's podman-compose is a separate implementation written in Python; it added healthcheck condition support later than Docker Compose v2, and the behavior under service_healthy may differ. Note that podman compose (with a space) is a thin wrapper that hands the file to an external provider, either docker-compose or podman-compose, so the result depends on which provider is installed. Test the same file under each tool before assuming portability.

Kubernetes init containers as the alternative. Kubernetes does not have depends_on. The equivalent pattern is an init container that runs pg_isready -h db -p 5432 in a loop and exits 0 only when the dependency is ready. The main container does not start until all init containers complete. If you migrate from Compose to Kubernetes, expect to translate every condition: service_healthy into an init container plus a readiness probe.

Restart policies and dependency timing. restart: unless-stopped and restart: on-failure are evaluated by the Docker daemon, not Compose. After the initial docker compose up, if the dependency restarts (database OOM, network blip), Compose does not re-apply depends_on. The app container restarts and races the database again. Use a healthcheck inside the app’s own container so the supervisor sees a clear unhealthy state rather than a crash loop.

Docker Desktop vs Linux engine differences. Docker Desktop on macOS and Windows runs Docker inside a Linux VM. Filesystem timing in the VM is slower than a native Linux Docker Engine, especially with bind mounts. A start_period: 10s that works on a Linux CI runner may be too short on a developer’s MacBook. The healthcheck mechanism itself is identical; only the timing assumptions change.

Compose Watch and develop. Compose v2.22 added develop.watch for syncing files into running containers without a rebuild. develop.watch does not re-run depends_on conditions on file change — it only rebuilds or syncs the specific service. If your app depends on a database and you change the app code, the dependency is not re-checked; the running app container keeps using the existing database connection.

Fix 1: Use healthcheck with depends_on condition

Compose file format 2.1+ and every Compose Spec implementation (docker-compose 1.27+, Compose v2) support condition in depends_on combined with healthcheck. This is the correct, built-in solution:

# No version: key is needed; Compose v2 ignores it and uses the Compose Spec

services:
  db:
    image: postgres:15
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: mydb
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d mydb"]
      interval: 5s
      timeout: 5s
      retries: 10
      start_period: 10s

  app:
    image: myapp:latest
    depends_on:
      db:
        condition: service_healthy  # Wait until db passes healthcheck
    environment:
      DATABASE_URL: postgres://user:password@db:5432/mydb

With condition: service_healthy, Compose waits until the db service’s healthcheck reports healthy before starting app.

Healthcheck commands for common services:

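# PostgreSQL
# $$ defers expansion to the shell inside the container; a single $ would be
# interpolated by Compose from the host environment or .env file instead.
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB}"]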
  interval: 5s
  timeout: 5s
  retries: 10

# MySQL / MariaDB
# CMD-SHELL plus $$ lets the container's own MYSQL_ROOT_PASSWORD be expanded.
healthcheck:
  test: ["CMD-SHELL", "mysqladmin ping -h localhost -u root -p$${MYSQL_ROOT_PASSWORD}"]
  interval: 5s
  timeout: 5s
  retries: 10

# Redis
healthcheck:
  test: ["CMD", "redis-cli", "ping"]
  interval: 5s
  timeout: 3s
  retries: 5

# MongoDB
healthcheck:
  test: ["CMD", "mongosh", "--eval", "db.adminCommand('ping')"]
  interval: 10s
  timeout: 5s
  retries: 5

# Generic HTTP service
healthcheck:
  test: ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
  interval: 10s
  timeout: 5s
  retries: 5
  start_period: 15s

Pro Tip: Use start_period when a service takes a long time to initialize. During the start period, failed healthchecks do not count toward the retry limit — preventing false failures during initial startup. Set it slightly longer than the typical startup time of the service.
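To get a feel for how these settings add up, the arithmetic can be sketched in Python. This is only a rough upper-bound estimate, assuming every probe runs for its full timeout and that failures start counting once start_period has elapsed. The function name is hypothetical, and the formula approximates Docker's scheduling rather than reproducing it.

```python
def worst_case_unhealthy_after(start_period: float, interval: float,
                               timeout: float, retries: int) -> float:
    """Rough upper bound (seconds) before a never-ready service is marked unhealthy.

    Assumption: each probe waits `interval`, then runs for the full `timeout`,
    and failures only count once `start_period` is over.
    """
    return start_period + retries * (interval + timeout)


# Values from the PostgreSQL example in Fix 1.
print(worst_case_unhealthy_after(start_period=10, interval=5, timeout=5, retries=10))
```

With the Fix 1 values this comes to roughly 110 seconds, which is about how long a dependent service could be kept waiting before Compose gives up on a database that never becomes ready.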

Fix 2: Add Retry Logic to Your Application

Even with healthchecks, network conditions or race conditions can cause connection failures. Build retry logic directly into your application:

Node.js (retry with linear backoff; the delay grows by delayMs on each attempt):

const { Pool } = require("pg");

async function connectWithRetry(maxRetries = 10, delayMs = 2000) {
  const pool = new Pool({ connectionString: process.env.DATABASE_URL });

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const client = await pool.connect();
      console.log("Database connected successfully");
      client.release();
      return pool;
    } catch (err) {
      console.error(`Attempt ${attempt}/${maxRetries} failed:`, err.message);
      if (attempt === maxRetries) throw err;
      await new Promise(resolve => setTimeout(resolve, delayMs * attempt));
    }
  }
}

module.exports = connectWithRetry();

Python — retry with tenacity:

from tenacity import retry, stop_after_attempt, wait_fixed
import psycopg2
import os

@retry(stop=stop_after_attempt(10), wait=wait_fixed(2))
def connect_to_db():
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    print("Database connected")
    return conn

db = connect_to_db()

Application-level retry is a good practice regardless of depends_on — in production, databases restart, network blips happen, and connections drop. An app that retries gracefully is more resilient than one that crashes on first failure.
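If you would rather not add a dependency, the same idea can be sketched with the standard library only. The helper below is a minimal illustration, not a drop-in replacement for tenacity: the names backoff_delays and retry are hypothetical, and the base, factor, and cap defaults are assumptions to tune for your service.

```python
import random
import time


def backoff_delays(attempts, base=0.5, factor=2.0, cap=30.0):
    """Yield the delay (seconds) to sleep after each failed attempt but the last."""
    for n in range(attempts - 1):
        yield min(cap, base * factor ** n)


def retry(fn, attempts=5, base=0.5, factor=2.0, cap=30.0,
          jitter=True, sleep=time.sleep, exceptions=(Exception,)):
    """Call fn() until it succeeds or attempts run out; re-raise the last error."""
    delays = list(backoff_delays(attempts, base, factor, cap))
    for attempt in range(attempts):
        try:
            return fn()
        except exceptions:
            if attempt == attempts - 1:
                raise
            delay = delays[attempt]
            # Full jitter: sleep a random amount up to the computed delay so
            # several restarting containers do not reconnect in lockstep.
            sleep(random.uniform(0, delay) if jitter else delay)
```

A call such as retry(connect_to_db, attempts=10, exceptions=(psycopg2.OperationalError,)) would wrap an undecorated version of the connection function above. The cap keeps late attempts from sleeping for minutes, which plain exponential growth would otherwise cause.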

Fix 3: Use a Wait Script (Legacy Approach)

Before condition: service_healthy was available, the common pattern was a wait-for-it.sh or dockerize script that polls until a port is open:

Using wait-for-it.sh:

# In your app's Dockerfile
COPY wait-for-it.sh /wait-for-it.sh
RUN chmod +x /wait-for-it.sh

# In docker-compose.yml
services:
  app:
    image: myapp:latest
    command: ["/wait-for-it.sh", "db:5432", "--", "node", "server.js"]
    depends_on:
      - db

Download wait-for-it.sh from https://github.com/vishnubob/wait-for-it.

Using dockerize:

ENV DOCKERIZE_VERSION v0.7.0
RUN wget https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz \
    && tar -C /usr/local/bin -xzvf dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz

command: dockerize -wait tcp://db:5432 -timeout 60s node server.js

Note: The healthcheck + condition: service_healthy approach (Fix 1) is preferred over wait scripts — it is cleaner, does not require modifying the Dockerfile, and is officially supported.
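For readers curious what such a wait script does internally, here is a minimal Python sketch of the polling loop. It is not the code of wait-for-it.sh or dockerize; wait_for_port and its parameters are hypothetical names chosen for illustration.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  poll_interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection resolves the name and completes the TCP handshake.
            with socket.create_connection((host, port), timeout=poll_interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval)
```

Keep in mind that an open port only proves something is listening. A database can accept TCP connections before it is ready to serve queries, which is one more reason to prefer a real healthcheck such as pg_isready.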

Fix 4: Choose the Right depends_on Condition

Compose v2 supports three conditions for depends_on (docker-compose v1 gained service_completed_successfully in release 1.29):

depends_on:
  db:
    condition: service_started    # Default — just waits for container to start
  migrations:
    condition: service_completed_successfully  # Waits for a one-shot container to exit 0
  cache:
    condition: service_healthy    # Waits for healthcheck to pass

service_completed_successfully is useful for migration containers that run once and exit:

services:
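  db:
    image: postgres:15
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password  # The postgres image will not initialize without one
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user"]
      interval: 5s
      retries: 10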

  migrate:
    image: myapp:latest
    command: ["npm", "run", "db:migrate"]
    depends_on:
      db:
        condition: service_healthy
    restart: "no"  # Don't restart after migration completes

  app:
    image: myapp:latest
    depends_on:
      db:
        condition: service_healthy
      migrate:
        condition: service_completed_successfully  # Wait for migrations to finish

This ensures: db starts → db is healthy → migrations run → app starts.
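The ordering above can be reasoned about as a topological sort of the depends_on graph. The sketch below is an illustration of that idea, not Compose's actual scheduler; start_waves is a hypothetical helper and the graph is typed in by hand rather than parsed from the file.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def start_waves(depends_on: dict) -> list:
    """Group services into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on a dependency cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Dependency map for the example above: service -> services it waits for.
graph = {"db": [], "migrate": ["db"], "app": ["db", "migrate"]}
print(start_waves(graph))  # [['db'], ['migrate'], ['app']]
```

The waves only describe who waits for whom. How long each wait lasts is decided by the condition on the edge: service_healthy for db, service_completed_successfully for migrate.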

Fix 5: Fix restart Policy Masking the Real Issue

If your app has restart: always or restart: on-failure, Docker restarts it repeatedly when it fails to connect. Eventually the database is ready and the app connects — but the root cause (no readiness check) is hidden:

services:
  app:
    image: myapp:latest
    restart: on-failure   # Hides the depends_on problem
    depends_on:
      - db

This works in practice but is fragile. A restart loop wastes resources and generates misleading error logs. Use condition: service_healthy instead and keep restart: on-failure as a safety net, not the primary solution.

Common Mistake: Setting restart: always and calling it fixed. The application may crash and restart several times before the database is ready. Each restart adds confusing error logs, and monitoring systems may alert on the crashes. Use healthchecks for a clean startup.

Fix 6: Debug depends_on Issues

Check if healthchecks are passing:

# Watch service health status
docker compose ps

# Or watch in real time
watch docker compose ps

# Check a specific service's health
docker inspect --format='{{json .State.Health}}' container_name | jq

Check healthcheck logs:

docker inspect container_name | jq '.[0].State.Health.Log'

This shows the last few healthcheck command outputs — useful to see why a healthcheck is failing.
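If you check health often, the same fields can be pulled out with a few lines of Python. This is a minimal sketch that assumes the standard docker inspect JSON layout (State.Health with Status, FailingStreak, and Log entries carrying ExitCode and Output); summarize_health is a hypothetical helper name.

```python
import json


def summarize_health(inspect_output: str) -> dict:
    """Summarize the health section of `docker inspect <container>` JSON output."""
    containers = json.loads(inspect_output)
    health = containers[0].get("State", {}).get("Health")
    if health is None:
        # No healthcheck is defined in the image or the Compose file.
        return {"status": "none", "failing_streak": 0,
                "last_exit_code": None, "last_output": ""}
    log = health.get("Log") or []
    last = log[-1] if log else {}
    return {
        "status": health.get("Status"),
        "failing_streak": health.get("FailingStreak", 0),
        "last_exit_code": last.get("ExitCode"),
        "last_output": (last.get("Output") or "").strip(),
    }
```

One way to feed it is summarize_health(subprocess.check_output(["docker", "inspect", "container_name"], text=True)). A status of "none" means no healthcheck exists at all, which is a different problem from a healthcheck that fails.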

Force a slow startup to reproduce the issue:

db:
  image: postgres:15
  command: ["sh", "-c", "sleep 10 && docker-entrypoint.sh postgres"]

Adding a sleep artificially delays the database, making the race condition obvious for debugging.

Fix 7: Multi-Service Dependency Chains

For complex dependency graphs:

services:
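  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password  # The postgres image will not initialize without one
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user"]
      interval: 5s
      retries: 10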

  redis:
    image: redis:7
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      retries: 5

  api:
    image: myapi:latest
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy

  worker:
    image: myworker:latest
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy

  nginx:
    image: nginx:alpine
    depends_on:
      api:
        condition: service_started  # Just needs api container running, not healthy
    ports:
      - "80:80"

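Note: condition: service_healthy requires the dependency (the service being waited on, not the one doing the waiting) to have a healthcheck, either in the Compose file or in its image. If you add condition: service_healthy but no healthcheck exists, Compose raises an error along these lines (the exact wording varies by Compose version):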

service "db" is not healthy because it has no healthcheck defined
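A small lint can catch this before docker compose up. The sketch below assumes the Compose file has already been parsed into a dict (for example with a YAML library of your choice) and that the services mapping is passed in; missing_healthchecks is a hypothetical name. It inspects only the Compose file, so a HEALTHCHECK baked into the image will show up as a false positive.

```python
def missing_healthchecks(services: dict) -> list:
    """Return (service, dependency) pairs that wait on service_healthy
    although the dependency declares no healthcheck in the Compose file."""
    problems = []
    for name, spec in services.items():
        deps = spec.get("depends_on") or {}
        if isinstance(deps, list):
            # Short syntax (a plain list) always means service_started.
            continue
        for dep, options in deps.items():
            condition = (options or {}).get("condition", "service_started")
            has_check = "healthcheck" in services.get(dep, {})
            if condition == "service_healthy" and not has_check:
                problems.append((name, dep))
    return problems
```

An empty list means every service_healthy edge points at a service with a healthcheck block in the file.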

Still Not Working?

Check Compose file version. The condition field in depends_on was introduced in Compose file format 2.1. The 3.x format did not support it until Compose adopted the unified Compose Spec, so a version: "3.x" file needs docker-compose 1.27+ or Compose v2 (docker compose, not docker-compose):

# This works with docker compose (v2 CLI)
version: "3.8"
services:
  app:
    depends_on:
      db:
        condition: service_healthy

Run docker compose version to confirm you have Compose v2.

Check that the healthcheck command exits with 0 on success. A healthcheck command that always exits with a non-zero code keeps the service in an unhealthy state permanently. Test the healthcheck command inside the running container:

docker exec container_name pg_isready -U user
echo $?  # Should print 0 for healthy

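Check for network issues between containers. Even when services are healthy, DNS resolution between containers requires them to be on the same Docker network. Compose puts every service in a project on a shared default network automatically, so this matters mainly when you declare custom networks. In that case, make sure the services that talk to each other share one: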

services:
  db:
    networks:
      - app-network
  app:
    networks:
      - app-network

networks:
  app-network:
    driver: bridge
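To check this on a larger file, the comparison can be sketched in Python. The helper names are hypothetical, the input is assumed to be the already parsed services mapping, and special cases such as network_mode are deliberately ignored.

```python
def service_networks(spec: dict) -> set:
    """Networks a service joins; Compose uses 'default' when none are listed."""
    networks = spec.get("networks")
    if not networks:
        return {"default"}
    # Compose accepts either a list of names or a mapping of name -> options.
    return set(networks)


def shared_networks(services: dict, a: str, b: str) -> set:
    """Return the networks that services a and b have in common."""
    return service_networks(services[a]) & service_networks(services[b])
```

An empty result for two services that need to talk to each other explains a name resolution failure even when both containers report healthy.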

Check extra_hosts and DNS for cross-stack dependencies. When the dependency lives in a different Compose project (database in stack-data, app in stack-web), Compose-level depends_on cannot reach across stacks. Either join both stacks into one file, attach both to a shared network that each file declares under its top-level networks key with external: true, or use extra_hosts to point the app at a known IP. The application-level retry pattern from Fix 2 becomes mandatory in this layout.

Check that the healthcheck binary exists in the image. Slim images (e.g., postgres:15-alpine, node:20-slim) sometimes lack curl, wget, or pg_isready. A healthcheck that calls a missing binary returns exit code 127 (command not found) and the service is stuck unhealthy forever. Use docker compose exec <service> which pg_isready to confirm the tool is present before relying on it.

Check for cyclic dependency or self-reference. A service that declares depends_on pointing back at itself or forming a cycle (a depends on b, b depends on a) causes Compose to fail with a configuration error. The error message names the cycle but does not always make it obvious where the loop started. Run docker compose config to validate the file; once it parses, the command prints the fully resolved configuration, including each service's depends_on entries.
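When the file is large, tracing the loop by hand is tedious. The following depth-first search is a minimal sketch of how a cycle can be located. It assumes you have already extracted a mapping of service name to its depends_on entries (a list or a mapping both work, since only the names are iterated); find_cycle is a hypothetical helper, not part of Compose.

```python
def find_cycle(depends_on: dict):
    """Return one dependency cycle as a list of service names, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {name: WHITE for name in depends_on}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for dep in depends_on.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, which closes a loop.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for name in depends_on:
        if color[name] == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None
```

The returned list starts and ends with the same service, for example ['a', 'b', 'a'], so it can be read as a chain of "depends on" links.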

Check that the healthcheck does not require shell features the test runner lacks. test: ["CMD-SHELL", "..."] runs inside /bin/sh, which in Alpine is ash and in Debian-based images is dash. Bash-only syntax such as [[ ... ]] can fail under these shells with a non-zero exit code, so the healthcheck never passes, and the error is visible only in the healthcheck log. Use POSIX sh syntax or switch to ["CMD", ...] with a static binary.

For other Docker startup errors, see Fix: Docker container exited with code 137 (OOMKilled) and Fix: Docker no space left on device. When the dependency container restarts repeatedly during boot, see Fix: Docker container keeps restarting. For env vars in Compose dependency chains, see Fix: Docker Compose env file not loaded.