Fix: nginx Upstream Load Balancing Not Working — All Traffic Hitting One Server
Quick Answer
How to fix nginx load balancing issues — upstream block configuration, health checks, least_conn vs round-robin, sticky sessions, upstream timeouts, and SSL termination.
The Problem
nginx is configured for load balancing but all requests go to the same upstream server:
upstream backend {
server 10.0.0.1:3000;
server 10.0.0.2:3000;
server 10.0.0.3:3000;
}
# Requests only reach 10.0.0.1 — other servers get no traffic
Or one server gets most of the traffic due to a misconfigured weight:
upstream backend {
server 10.0.0.1:3000 weight=10; # Gets 10x the traffic
server 10.0.0.2:3000; # Default weight=1
}
Or nginx marks a healthy upstream server as down and stops sending traffic to it:
# Error log shows:
# upstream timed out (110: Connection timed out) while reading response header
# from upstream, client: ..., upstream: "http://10.0.0.2:3000/"
# no live upstreams while connecting to upstream
Or all connections go to one server because of sticky sessions configured incorrectly.
Why This Happens
nginx’s upstream load balancing has several configuration pitfalls:
- Default round-robin doesn’t distribute short connections — for very fast requests, one worker process handles many requests sequentially, making it appear all traffic goes to one server.
- ip_hash makes requests sticky per client IP — all requests from the same client go to the same server. Looks like “load balancing isn’t working” when testing from a single machine.
- Failed upstream servers marked as down — if a server fails, nginx marks it as unavailable and stops sending traffic. The defaults max_fails=1 and fail_timeout=10s mean one failure takes a server out for 10 seconds.
- DNS caching — upstream server hostnames are resolved at startup. If an IP changes, nginx keeps using the old address.
- keepalive connection reuse — with keepalive, nginx reuses connections. If connections are reused faster than they are distributed, some servers get more traffic.
- upstream block in wrong context — upstream must be in the http block, not inside server or location.
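The ip_hash pitfall is easy to see in miniature. The shell sketch below is a toy model with made-up addresses, not nginx's actual hash function; it only mimics the documented behavior of keying on the first three octets of an IPv4 address:

```shell
# Toy ip_hash: route a client IP to one of two upstreams using only the
# first three octets, as nginx's ip_hash does (the real hash differs)
pick_server() {
  key=$(echo "$1" | cut -d. -f1-3 | tr -d .)
  if [ $(( key % 2 )) -eq 0 ]; then
    echo "10.0.0.1:3000"
  else
    echo "10.0.0.2:3000"
  fi
}

pick_server 203.0.113.7    # same upstream every time for this client
pick_server 203.0.113.7
pick_server 198.51.100.9   # a client in a different /24 may map elsewhere
```

Every call with the same client IP prints the same upstream, so single-machine testing against an ip_hash pool will always hit one server. That is expected behavior, not a broken balancer.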
Fix 1: Verify the upstream Block Configuration
The upstream block must be at the http level:
# nginx.conf — CORRECT structure
http {
# upstream must be in http block
upstream backend {
server 10.0.0.1:3000;
server 10.0.0.2:3000;
server 10.0.0.3:3000;
}
server {
listen 80;
server_name example.com;
location / {
proxy_pass http://backend; # Use upstream name
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
}
}
# WRONG — upstream inside server block (doesn't work)
server {
upstream backend { ... } # Syntax error or ignored
}
Verify nginx config is valid and loaded:
# Test configuration
nginx -t
# Reload without downtime
nginx -s reload
# Check which config file is loaded
nginx -T | grep -E "upstream|server"
# View current nginx version and compiled modules
nginx -V
Fix 2: Choose the Right Load Balancing Algorithm
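Weights map directly to traffic shares. The toy shell simulation below illustrates the proportions only; it is not nginx's smooth weighted round-robin algorithm. It models two servers with weight=3 and weight=2:

```shell
# Simulate 100 requests against weight=3 and weight=2 by cycling an
# index 0..4: slots 0-2 belong to A (weight 3), slots 3-4 to B (weight 2)
a=0; b=0; i=0
while [ $i -lt 100 ]; do
  case $(( i % 5 )) in
    0|1|2) a=$(( a + 1 )) ;;
    3|4)   b=$(( b + 1 )) ;;
  esac
  i=$(( i + 1 ))
done
echo "A=$a B=$b"   # A=60 B=40, i.e. 3/5 and 2/5 of the traffic
```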
nginx supports several algorithms:
# Round-robin (default) — requests distributed sequentially
upstream backend {
server 10.0.0.1:3000;
server 10.0.0.2:3000;
server 10.0.0.3:3000;
# Request 1 → server 1, Request 2 → server 2, Request 3 → server 3, repeat
}
# Weighted round-robin — proportional distribution
upstream backend {
server 10.0.0.1:3000 weight=3; # Gets 3/5 of traffic
server 10.0.0.2:3000 weight=2; # Gets 2/5 of traffic
# Use when servers have different capacities
}
# least_conn — send to server with fewest active connections
# Better for requests with varying processing times (API calls, DB queries)
upstream backend {
least_conn;
server 10.0.0.1:3000;
server 10.0.0.2:3000;
server 10.0.0.3:3000;
}
# ip_hash — sticky sessions: same client always goes to same server
# WARNING: This causes apparent "load imbalance" when testing from one IP
upstream backend {
ip_hash;
server 10.0.0.1:3000;
server 10.0.0.2:3000;
}
# hash — route by custom key (URL, cookie, header)
upstream backend {
hash $request_uri consistent; # Same URL always goes to same server
server 10.0.0.1:3000;
server 10.0.0.2:3000;
}
# random — pick randomly (nginx 1.15.1+)
upstream backend {
random two least_conn; # Pick 2 random servers, forward to one with least connections
server 10.0.0.1:3000;
server 10.0.0.2:3000;
server 10.0.0.3:3000;
}
Fix 3: Configure Health Checks and Failure Handling
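The max_fails bookkeeping used in this section can be pictured as a simple counter. This is a simplification with a made-up request sequence; real nginx counts failures within the fail_timeout window rather than strictly consecutively:

```shell
# Toy passive health check: three failed attempts trip max_fails=3,
# after which the peer would be skipped for fail_timeout seconds
max_fails=3
fails=0
state=up
for result in ok fail fail fail; do
  if [ "$result" = fail ]; then
    fails=$(( fails + 1 ))
  else
    fails=0   # simplification: a success clears the counter here
  fi
  if [ "$fails" -ge "$max_fails" ]; then
    state=down
  fi
done
echo "$state"   # down
```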
Control how nginx handles failed upstream servers:
upstream backend {
server 10.0.0.1:3000 max_fails=3 fail_timeout=30s;
server 10.0.0.2:3000 max_fails=3 fail_timeout=30s;
server 10.0.0.3:3000 backup; # Only used when all others are down
}
# Parameters:
# max_fails=3 — mark server down after 3 failed attempts within fail_timeout (default: 1)
# fail_timeout=30s — don't retry for 30 seconds after marking down (default: 10s)
# backup — only receives traffic when all primary servers are down
# down — permanently marks server as down (manual removal)
# weight=N — relative weight for round-robin
Active health checks (nginx Plus or ngx_upstream_check_module):
# nginx Open Source — passive health checks only (based on failed requests)
# nginx Plus — active health checks available
# For open source nginx with upstream_check module:
upstream backend {
server 10.0.0.1:3000;
server 10.0.0.2:3000;
check interval=3000 rise=2 fall=3 timeout=1000 type=http;
check_http_send "GET /health HTTP/1.0\r\n\r\n";
check_http_expect_alive http_2xx http_3xx;
}
Configure proxy timeouts to match your upstream:
location / {
proxy_pass http://backend;
# How long to wait for upstream to accept the connection
proxy_connect_timeout 5s;
# Max time between two successive reads of the upstream response
# (the timer resets on each read; it is not a whole-response limit)
proxy_read_timeout 60s;
# Max time between two successive writes of the request to upstream
proxy_send_timeout 60s;
# Retry on failure — try next upstream server
proxy_next_upstream error timeout http_500 http_502 http_503;
proxy_next_upstream_tries 3; # Max retries
proxy_next_upstream_timeout 10s; # Total time limit for retries
}
Fix 4: Debug Traffic Distribution
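Request counts are one signal; per-upstream latency is another. The awk below averages the response-time field per upstream address. It runs here against hypothetical sample lines in the `$remote_addr - $upstream_addr - $request - $status - $upstream_response_time` format used later in this section:

```shell
# Sample access-log lines (hypothetical) in the custom upstream_log format
cat <<'EOF' > /tmp/sample_upstream.log
203.0.113.1 - 10.0.0.2:3000 - GET /api/data HTTP/1.1 - 200 - 0.045
203.0.113.1 - 10.0.0.1:3000 - GET /api/data HTTP/1.1 - 200 - 0.032
203.0.113.1 - 10.0.0.2:3000 - GET /api/data HTTP/1.1 - 200 - 0.055
EOF

# Mean response time per upstream server (field 2 = address, field 5 = time)
awk -F' - ' '{ sum[$2] += $5; n[$2]++ }
             END { for (s in sum) printf "%s %.3f\n", s, sum[s]/n[s] }' \
    /tmp/sample_upstream.log | sort
# 10.0.0.1:3000 0.032
# 10.0.0.2:3000 0.050
```

Point the awk at your real access.log once the log format is in place; a server with a much higher average than its peers is either overloaded or unhealthy.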
Verify traffic is actually being distributed:
# Check nginx access log — look at upstream addresses
tail -f /var/log/nginx/access.log
# Add upstream address to access log format
# nginx.conf:
log_format upstream_log '$remote_addr - $upstream_addr - $request - $status - $upstream_response_time';
access_log /var/log/nginx/access.log upstream_log;
# Now each log line shows which upstream server handled the request:
# 203.0.113.1 - 10.0.0.2:3000 - GET /api/data HTTP/1.1 - 200 - 0.045
# 203.0.113.1 - 10.0.0.1:3000 - GET /api/data HTTP/1.1 - 200 - 0.032
# Count requests per upstream server
grep "10.0.0" /var/log/nginx/access.log | \
grep -oP '\d+\.\d+\.\d+\.\d+:\d+' | \
sort | uniq -c | sort -rn
Check upstream server status:
# stub_status shows overall connection counts (per-upstream state
# requires the nginx Plus API or a third-party module)
server {
listen 8080;
server_name localhost;
location /nginx_status {
stub_status;
allow 127.0.0.1;
deny all;
}
}
curl http://localhost:8080/nginx_status
# Active connections: 15
# server accepts handled requests
# 1234 1234 5678
# Reading: 0 Writing: 3 Waiting: 12
Fix 5: Handle WebSocket Load Balancing
WebSocket connections require special upstream configuration:
upstream websocket_backend {
# A single WebSocket connection always stays on one upstream once
# established; ip_hash keeps reconnects from the same client on the
# same server (useful when per-connection state lives on that server)
ip_hash;
server 10.0.0.1:3000;
server 10.0.0.2:3000;
# keepalive for reusing connections to upstream
keepalive 64;
}
server {
listen 80;
location /ws/ {
proxy_pass http://websocket_backend;
proxy_http_version 1.1;
# Required for WebSocket upgrade
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
# WebSocket connections are long-lived — extend timeout
proxy_read_timeout 3600s; # 1 hour
proxy_send_timeout 3600s;
}
}
Fix 6: SSL Termination at nginx
Terminate SSL at nginx and load balance over plain HTTP to upstream:
upstream backend {
least_conn;
server 10.0.0.1:3000;
server 10.0.0.2:3000;
server 10.0.0.3:3000;
keepalive 32; # Reuse connections — reduces overhead
}
server {
listen 443 ssl;
server_name api.example.com;
ssl_certificate /etc/nginx/ssl/cert.pem;
ssl_certificate_key /etc/nginx/ssl/key.pem;
ssl_protocols TLSv1.2 TLSv1.3;
location / {
proxy_pass http://backend; # Plain HTTP to upstream (SSL terminated)
proxy_http_version 1.1;
proxy_set_header Connection ""; # Required for keepalive
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme; # Tell upstream it was HTTPS
}
}
# Redirect HTTP to HTTPS
server {
listen 80;
server_name api.example.com;
return 301 https://$host$request_uri;
}
Fix 7: Rate Limiting Per Upstream
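Zone sizing is arithmetic. Assuming roughly 64 bytes per $binary_remote_addr state (the basis of the ~160,000 figure used below; some platforms use larger states, halving capacity), a 10 MB zone holds:

```shell
# limit_req_zone capacity estimate (assumption: ~64 bytes per
# $binary_remote_addr state)
zone_bytes=$(( 10 * 1024 * 1024 ))
state_bytes=64
capacity=$(( zone_bytes / state_bytes ))
echo "$capacity"   # 163840, i.e. ~160k tracked client IPs
```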
Apply rate limits before traffic reaches upstreams:
http {
# Define rate limit zone — 10MB stores ~160,000 IP states
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=100r/m;
upstream backend {
least_conn;
server 10.0.0.1:3000;
server 10.0.0.2:3000;
}
server {
listen 80;
location /api/ {
# Apply rate limit
limit_req zone=api_limit burst=20 nodelay;
limit_req_status 429;
proxy_pass http://backend;
proxy_http_version 1.1;
}
# Smaller burst for auth endpoints (same zone and rate; only the queue differs)
location /api/auth/ {
limit_req zone=api_limit burst=5;
proxy_pass http://backend;
}
}
}
Still Not Working?
upstream DNS resolution — nginx resolves upstream hostnames at startup. If you use container names or dynamic hosts, set resolver and use variables in proxy_pass for runtime DNS resolution:
resolver 127.0.0.53 valid=30s;
location / {
set $upstream "http://backend-service:3000";
proxy_pass $upstream; # Re-resolved via DNS regularly
}
proxy_cache serving stale responses — if proxy caching is enabled, all clients may get the same cached response regardless of which upstream server processed it. This isn’t a load balancing issue but looks like one.
Keepalive connections and worker processes — nginx uses multiple worker processes. keepalive connections are per-worker, not global. With worker_processes 4 and keepalive 16, there are 64 total keepalive connections, distributed across workers.
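The worker arithmetic from the paragraph above, made explicit (the numbers are the example's, not recommendations):

```shell
# keepalive caps IDLE upstream connections per worker, not globally
worker_processes=4
keepalive_per_worker=16
total=$(( worker_processes * keepalive_per_worker ))
echo "$total"   # 64 idle upstream connections across all workers
```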
For related nginx issues, see Fix: nginx 502 Bad Gateway and Fix: nginx Upstream Timed Out.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.