
Fix: nginx Upstream Load Balancing Not Working — All Traffic Hitting One Server

FixDevs

Quick Answer

How to fix nginx load balancing issues — upstream block configuration, health checks, least_conn vs round-robin, sticky sessions, upstream timeouts, and SSL termination.

The Problem

nginx is configured for load balancing but all requests go to the same upstream server:

upstream backend {
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
    server 10.0.0.3:3000;
}

# Requests only reach 10.0.0.1 — other servers get no traffic

Or one server gets most of the traffic due to a misconfigured weight:

upstream backend {
    server 10.0.0.1:3000 weight=10;   # Gets 10x the traffic
    server 10.0.0.2:3000;             # Default weight=1
}

Or nginx marks a healthy upstream server as down and stops sending traffic to it:

# Error log shows:
# upstream timed out (110: Connection timed out) while reading response header
# from upstream, client: ..., upstream: "http://10.0.0.2:3000/"
# no live upstreams while connecting to upstream

Or all connections go to one server because of sticky sessions configured incorrectly.

Why This Happens

nginx’s upstream load balancing has several configuration pitfalls:

  • Round-robin state is per worker process — unless the upstream defines a shared memory zone, each worker balances independently, so a small number of test requests can look heavily skewed toward one server.
  • ip_hash makes requests sticky per client IP — all requests from the same client go to the same server. Looks like “load balancing isn’t working” when testing from a single machine.
  • Failed upstream servers marked as down — if a server fails, nginx marks it as unavailable and stops sending traffic. The defaults max_fails=1 and fail_timeout=10s mean a single failure takes a server out of rotation for 10 seconds.
  • DNS caching — upstream server hostnames are resolved at startup. If an IP changes, nginx keeps using the old address.
  • keepalive connection reuse — nginx reuses idle upstream connections before opening new ones, so at low request rates the same cached connections (and therefore the same servers) keep getting picked.
  • upstream block in wrong context — upstream must be defined in the http block, not inside server or location.
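Round-robin balancing state lives per worker process by default; open source nginx 1.9.0+ can share it across workers with the zone directive. A minimal sketch (zone name and size are illustrative):

```nginx
upstream backend {
    zone backend_shared 64k;   # shared memory: all workers see one balancing state
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
    server 10.0.0.3:3000;
}
```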

Fix 1: Verify the upstream Block Configuration

The upstream block must be at the http level:

# nginx.conf — CORRECT structure
http {
    # upstream must be in http block
    upstream backend {
        server 10.0.0.1:3000;
        server 10.0.0.2:3000;
        server 10.0.0.3:3000;
    }

    server {
        listen 80;
        server_name example.com;

        location / {
            proxy_pass http://backend;   # Use upstream name
            proxy_http_version 1.1;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}

# WRONG — upstream inside a server block
server {
    upstream backend { ... }   # nginx -t fails: "upstream" directive is not allowed here
}

Verify nginx config is valid and loaded:

# Test configuration
nginx -t

# Reload without downtime
nginx -s reload

# Check which config file is loaded
nginx -T | grep -E "upstream|server"

# View current nginx version and compiled modules
nginx -V

Fix 2: Choose the Right Load Balancing Algorithm

nginx supports several algorithms:

# Round-robin (default) — requests distributed sequentially
upstream backend {
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
    server 10.0.0.3:3000;
    # Request 1 → server 1, Request 2 → server 2, Request 3 → server 3, repeat
}

# Weighted round-robin — proportional distribution
upstream backend {
    server 10.0.0.1:3000 weight=3;   # Gets 3/5 of traffic
    server 10.0.0.2:3000 weight=2;   # Gets 2/5 of traffic
    # Use when servers have different capacities
}

# least_conn — send to server with fewest active connections
# Better for requests with varying processing times (API calls, DB queries)
upstream backend {
    least_conn;
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
    server 10.0.0.3:3000;
}

# ip_hash — sticky sessions: same client always goes to same server
# WARNING: This causes apparent "load imbalance" when testing from one IP
upstream backend {
    ip_hash;
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
}

# hash — route by custom key (URL, cookie, header)
upstream backend {
    hash $request_uri consistent;   # Same URL always goes to same server
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
}

# random — pick randomly (nginx 1.15.1+)
upstream backend {
    random two least_conn;   # Pick 2 random servers, forward to one with least connections
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
    server 10.0.0.3:3000;
}

Fix 3: Configure Health Checks and Failure Handling

Control how nginx handles failed upstream servers:

upstream backend {
    server 10.0.0.1:3000 max_fails=3 fail_timeout=30s;
    server 10.0.0.2:3000 max_fails=3 fail_timeout=30s;
    server 10.0.0.3:3000 backup;   # Only used when all others are down
}

# Parameters:
# max_fails=3      — mark server down after 3 failed attempts within fail_timeout (default: 1)
# fail_timeout=30s — length of the failure-counting window, and how long the server
#                    stays out of rotation once marked down (default: 10s)
# backup           — only receives traffic when all primary servers are down
# down             — permanently marks server as down (manual removal)
# weight=N         — relative weight for round-robin

Active health checks (nginx Plus or ngx_upstream_check_module):

# nginx Open Source — passive health checks only (based on failed requests)
# nginx Plus — active health checks available

# For open source nginx with upstream_check module:
upstream backend {
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;

    check interval=3000 rise=2 fall=3 timeout=1000 type=http;
    check_http_send "GET /health HTTP/1.0\r\n\r\n";
    check_http_expect_alive http_2xx http_3xx;
}

Configure proxy timeouts to match your upstream:

location / {
    proxy_pass http://backend;

    # How long to wait for upstream to accept the connection
    proxy_connect_timeout 5s;

    # How long to wait for upstream to send response headers
    proxy_read_timeout 60s;

    # How long to wait for upstream to accept data we're sending
    proxy_send_timeout 60s;

    # Retry on failure — try next upstream server
    proxy_next_upstream error timeout http_500 http_502 http_503;
    proxy_next_upstream_tries 3;      # Max attempts (0 = unlimited)
    proxy_next_upstream_timeout 10s;  # Total time limit for retries
}
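One caveat: since nginx 1.9.13, non-idempotent requests (POST, PATCH, LOCK) are not retried on the next upstream unless you opt in explicitly:

```nginx
location / {
    proxy_pass http://backend;
    # Opt in to retrying POST/PATCH on another server — only safe if the
    # backend tolerates duplicate submissions (e.g. idempotency keys).
    proxy_next_upstream error timeout non_idempotent;
}
```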

Fix 4: Debug Traffic Distribution

Verify traffic is actually being distributed:

# Check nginx access log — look at upstream addresses
tail -f /var/log/nginx/access.log

# Add upstream address to access log format
# nginx.conf:
log_format upstream_log '$remote_addr - $upstream_addr - $request - $status - $upstream_response_time';
access_log /var/log/nginx/access.log upstream_log;

# Now each log line shows which upstream server handled the request:
# 203.0.113.1 - 10.0.0.2:3000 - GET /api/data HTTP/1.1 - 200 - 0.045
# 203.0.113.1 - 10.0.0.1:3000 - GET /api/data HTTP/1.1 - 200 - 0.032

# Count requests per upstream server
grep "10.0.0" /var/log/nginx/access.log | \
  grep -oP '\d+\.\d+\.\d+\.\d+:\d+' | \
  sort | uniq -c | sort -rn
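The same tally can be done with awk against the upstream_log format above — shown here against an inline sample for illustration (point it at /var/log/nginx/access.log in practice):

```shell
# Count requests per upstream server ($3 is $upstream_addr in the
# upstream_log format). The here-doc is sample data.
awk '{ count[$3]++ } END { for (u in count) print count[u], u }' <<'EOF'
203.0.113.1 - 10.0.0.2:3000 - GET /api/data HTTP/1.1 - 200 - 0.045
203.0.113.1 - 10.0.0.1:3000 - GET /api/data HTTP/1.1 - 200 - 0.032
203.0.113.5 - 10.0.0.2:3000 - GET /api/data HTTP/1.1 - 200 - 0.051
EOF
```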

Check overall connection stats (note: stub_status reports aggregate counters only; per-upstream state requires nginx Plus or a third-party status module):

# Enable the stub_status module
server {
    listen 8080;
    server_name localhost;
    location /nginx_status {
        stub_status;
        allow 127.0.0.1;
        deny all;
    }
}
curl http://localhost:8080/nginx_status
# Active connections: 15
# server accepts handled requests
#  1234 1234 5678
# Reading: 0 Writing: 3 Waiting: 12

Fix 5: Handle WebSocket Load Balancing

WebSocket connections require special upstream configuration:

upstream websocket_backend {
    # An established WebSocket is pinned to one upstream for its lifetime;
    # ip_hash additionally sends reconnects from the same client IP back to
    # the same server — needed if the app keeps per-client state in memory.
    ip_hash;
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
    keepalive 64;   # reuse idle upstream connections (non-upgraded requests)
}

server {
    listen 80;

    location /ws/ {
        proxy_pass http://websocket_backend;
        proxy_http_version 1.1;

        # Required for WebSocket upgrade
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;

        # WebSocket connections are long-lived — extend timeout
        proxy_read_timeout 3600s;   # 1 hour
        proxy_send_timeout 3600s;
    }
}
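A common refinement is to close the upstream connection cleanly when the client did not actually request an upgrade, using a map — $connection_upgrade here is a conventional variable name, not a built-in:

```nginx
# In the http block:
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

# Then inside the location, instead of the hardcoded header:
#     proxy_set_header Connection $connection_upgrade;
```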

Fix 6: SSL Termination at nginx

Terminate SSL at nginx and load balance over plain HTTP to upstream:

upstream backend {
    least_conn;
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
    server 10.0.0.3:3000;

    keepalive 32;   # Reuse connections — reduces overhead
}

server {
    listen 443 ssl;
    server_name api.example.com;

    ssl_certificate /etc/nginx/ssl/cert.pem;
    ssl_certificate_key /etc/nginx/ssl/key.pem;
    ssl_protocols TLSv1.2 TLSv1.3;

    location / {
        proxy_pass http://backend;   # Plain HTTP to upstream (SSL terminated)
        proxy_http_version 1.1;
        proxy_set_header Connection "";   # Required for keepalive
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;   # Tell upstream it was HTTPS
    }
}

# Redirect HTTP to HTTPS
server {
    listen 80;
    server_name api.example.com;
    return 301 https://$host$request_uri;
}
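If the upstreams must receive TLS end-to-end instead (e.g. for compliance), nginx can re-encrypt — a sketch, assuming the upstream servers speak TLS and present certificates signed by the given CA (paths are illustrative):

```nginx
location / {
    proxy_pass https://backend;          # re-encrypt to upstream
    proxy_ssl_verify on;
    proxy_ssl_trusted_certificate /etc/nginx/ssl/upstream-ca.pem;
    proxy_ssl_server_name on;            # send SNI to upstream
}
```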

Fix 7: Rate Limiting Per Upstream

Apply rate limits before traffic reaches upstreams:

http {
    # Define rate limit zone — 10MB stores ~160,000 IP states
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=100r/m;

    upstream backend {
        least_conn;
        server 10.0.0.1:3000;
        server 10.0.0.2:3000;
    }

    server {
        listen 80;

        location /api/ {
            # Apply rate limit
            limit_req zone=api_limit burst=20 nodelay;
            limit_req_status 429;

            proxy_pass http://backend;
            proxy_http_version 1.1;
        }

        # Stricter limit for auth endpoints
        location /api/auth/ {
            limit_req zone=api_limit burst=5;
            proxy_pass http://backend;
        }
    }
}
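Note that both locations above reference the same api_limit zone, so they share one 100r/m counter per client IP; only the burst allowance differs. For a genuinely stricter auth limit, define a second zone (zone name and rate are illustrative):

```nginx
# In the http block:
limit_req_zone $binary_remote_addr zone=auth_limit:10m rate=10r/m;

# In the server block:
location /api/auth/ {
    limit_req zone=auth_limit burst=5;
    limit_req_status 429;
    proxy_pass http://backend;
}
```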

Still Not Working?

upstream DNS resolution — nginx resolves upstream hostnames at startup. If you use container names or dynamic hosts, set resolver and use variables in proxy_pass for runtime DNS resolution:

resolver 127.0.0.53 valid=30s;

location / {
    set $upstream "http://backend-service:3000";
    proxy_pass $upstream;   # Re-resolved via DNS regularly
}

proxy_cache serving stale responses — if proxy caching is enabled, all clients may get the same cached response regardless of which upstream server processed it. This isn’t a load balancing issue but looks like one.

Keepalive connections and worker processes — nginx uses multiple worker processes. keepalive connections are per-worker, not global. With worker_processes 4 and keepalive 16, there are 64 total keepalive connections, distributed across workers.

For related nginx issues, see Fix: nginx 502 Bad Gateway and Fix: nginx Upstream Timed Out.


FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
