Fix: AWS RDS Proxy Not Working — Endpoint, IAM Auth, Connection Pinning, and Lambda VPC

Q: How do I fix "AWS RDS Proxy Not Working — Endpoint, IAM Auth, Connection Pinning, and Lambda VPC"?

How to fix AWS RDS Proxy errors — IAM authentication token mismatch, connection pinning blocking reuse, Lambda VPC routing, Secrets Manager rotation, max_connections, read/write splitter, and TLS requirement.

The Error

You connect via the RDS Proxy endpoint and authentication fails:

FATAL: password authentication failed for user "app_user"

Or every Lambda invocation opens a fresh DB connection despite the proxy:

[CloudWatch] Connections: 50 → 100 → 150 → 200 → max_connections reached

Or transactions inside a Lambda function break with pinning errors:

psycopg2.errors.OperationalError: FATAL: terminating connection due to administrator command

Or Lambda can’t reach the proxy at all:

psycopg2.OperationalError: could not connect to server: Connection timed out

Why This Happens

RDS Proxy is a managed connection pooler that sits between your app (typically Lambda) and an RDS database. Its purpose is to prevent connection exhaustion when many concurrent Lambda instances each open a fresh connection to the same database. Without a pooler, a burst of 500 Lambda invocations creates 500 Postgres backends. Smaller RDS instance classes often top out at 100-300 connections (the default max_connections scales with instance memory), and once you exhaust the cap every subsequent client is rejected with a too-many-connections error (FATAL: sorry, too many clients already on Postgres, Too many connections on MySQL). The proxy multiplexes thousands of client-side connections onto a much smaller real backend pool.

The pain points fall into four buckets. IAM authentication is per-connection — each new connection requires a freshly generated 15-minute token, and caching the token across long-lived clients eventually fails authentication. Pinning forces the proxy to dedicate a backend connection to a single client until that client disconnects; once pinned, that connection is no longer pooled, and a high pinning rate effectively bypasses the proxy. Lambda VPC configuration is required because RDS Proxy lives in a VPC, so Lambdas outside the VPC simply cannot reach it. And Secrets Manager rotation, while gracefully handled by the proxy, can break client-side connection pools that cached the old password.

What makes this hard to diagnose is that the symptoms look like generic database problems. “Too many connections” looks like an under-provisioned database when it’s really a pinning issue. “Connection refused” looks like a security group problem when it’s actually IAM auth. “Slow queries” look like missing indexes when the proxy is paying a TLS handshake for every borrowed connection. The fix usually starts with CloudWatch — the proxy emits granular metrics that tell you exactly which of these failure modes is active.

Diagnostic Timeline

Trace a “RDS Proxy started returning FATAL: too many connections even though we just added the proxy” failure.

Minute 0 — first suspicion: increase the pool. The first reflex is to bump MaxConnectionsPercent from 50 to 100, or scale the RDS instance from t3.medium to t3.large. Neither helps. The proxy already has its own pool, separate from the application’s pool — increasing the proxy cap doesn’t change client behavior.

Minute 3 — first evidence: read the pinning metric. Open CloudWatch and graph DatabaseConnectionsCurrentlySessionPinned against DatabaseConnectionsCurrentlyBorrowed. If the pinned count is climbing toward the borrowed count, the proxy can’t multiplex — every borrowed connection is pinned to one client. That is the real exhaustion, not raw connection count.
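The pinned-versus-borrowed comparison can be reduced to a single ratio. The sketch below is a minimal illustration, assuming you have already read one datapoint for each metric from CloudWatch; the function names and the 0.5 threshold are hypothetical choices, not AWS defaults.

```python
def pinning_ratio(pinned: float, borrowed: float) -> float:
    """Fraction of borrowed backend connections that are session-pinned."""
    if borrowed <= 0:
        return 0.0
    return min(pinned / borrowed, 1.0)


def diagnose(pinned: float, borrowed: float, threshold: float = 0.5) -> str:
    """Label a datapoint; the threshold is an assumed cut-off, tune it per workload."""
    ratio = pinning_ratio(pinned, borrowed)
    if ratio >= threshold:
        return f"pinning-bound ({ratio:.0%} of borrowed connections are pinned)"
    return f"multiplexing ok ({ratio:.0%} pinned)"
```

A reading of 45 pinned out of 50 borrowed would be labeled pinning-bound, which points at client behavior rather than at the pool cap.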

Minute 6 — next check: IAM token lifecycle. Open the Lambda’s CloudWatch logs. Look for Authentication token expired or password authentication failed immediately after warm starts older than 15 minutes. A long-lived Lambda execution environment that cached the IAM auth token at init will start failing reconnects after 15 minutes. Every failed reconnect leaves a half-open connection that the proxy counts against your cap until the TCP timeout fires.

Minute 9 - discriminating evidence: network path. If the Lambda was recently moved into a different subnet or security group, it may no longer have a route to the proxy. Generating the IAM token is a local signing step that makes no network call, so it keeps succeeding even when the proxy itself is unreachable. Run a socket.create_connection test from inside the Lambda. If it times out, the connection never reached the proxy at all, which can masquerade as an authentication failure in the application logs.

Minute 12 - actual root cause: pinning from PREPARE statements. Your driver or ORM (explicit PREPARE/EXECUTE issued through psycopg2, or PREPARE/EXECUTE in MySQL) pins every connection on first query. Switch to client-side parameterized queries (the %s placeholders in psycopg2, the driver-level ? binds in MySQL), or scope session state to SET LOCAL inside a transaction. The pinning rate drops to near zero, the proxy starts multiplexing again, and the “too many connections” errors disappear without touching the pool cap.

Fix 1: Use the Right Endpoint and Auth Method

Get the proxy endpoint from the AWS Console (RDS → Proxies → your-proxy → Proxy endpoints):

proxy-name.proxy-abc123def456.us-east-1.rds.amazonaws.com

Don’t connect to the underlying RDS instance — use the proxy endpoint.

For password authentication (no IAM):

import os

import psycopg2

conn = psycopg2.connect(
    host="proxy-name.proxy-abc123.us-east-1.rds.amazonaws.com",
    port=5432,
    dbname="myapp",
    user="app_user",
    password=os.environ["DB_PASSWORD"],
    sslmode="require",  # RDS Proxy requires TLS
)

For IAM authentication:

import boto3
import psycopg2

rds_client = boto3.client("rds")
token = rds_client.generate_db_auth_token(
    DBHostname="proxy-name.proxy-abc123.us-east-1.rds.amazonaws.com",
    Port=5432,
    DBUsername="app_user",
)

conn = psycopg2.connect(
    host="proxy-name.proxy-abc123.us-east-1.rds.amazonaws.com",
    port=5432,
    dbname="myapp",
    user="app_user",
    password=token,
    sslmode="require",
)

The token is valid for 15 minutes. Generate fresh for each new connection — don’t cache across invocations.
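One way to make "fresh token per connection" hard to get wrong is to generate the token inside the function that opens the connection. The sketch below is an illustration only: make_connector is a hypothetical helper, and the token source and connect function are passed in so that in a real Lambda they would be a wrapper around generate_db_auth_token and psycopg2.connect.

```python
from typing import Any, Callable


def make_connector(
    get_token: Callable[[], str],
    connect: Callable[..., Any],
    **base_kwargs: Any,
) -> Callable[[], Any]:
    """Return a function that opens a connection with a newly generated token."""

    def open_connection() -> Any:
        # The token is requested at connect time and never stored,
        # so a warm execution environment cannot reuse an expired one.
        return connect(password=get_token(), sslmode="require", **base_kwargs)

    return open_connection
```

Because the token is never held in module state, a reconnect after a long idle period goes through the same path as the first connection.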

For Node:

import pg from "pg";
import { Signer } from "@aws-sdk/rds-signer";

const signer = new Signer({
  hostname: "proxy-name.proxy-abc123.us-east-1.rds.amazonaws.com",
  port: 5432,
  username: "app_user",
});

// Generate a fresh token for each new connection (valid for 15 minutes)
const token = await signer.getAuthToken();

const client = new pg.Client({
  host: "proxy-name.proxy-abc123.us-east-1.rds.amazonaws.com",
  port: 5432,
  database: "myapp",
  user: "app_user",
  password: token,
  // rejectUnauthorized: false skips certificate verification;
  // prefer verifying against a trusted CA in production
  ssl: { rejectUnauthorized: false },
});

await client.connect();

Pro Tip: Use IAM auth in production. Passwords in env vars need rotation; IAM tokens are short-lived, generated fresh for each connection, and tied to your Lambda’s execution role.

Fix 2: Configure the Proxy Authentication

In the proxy config (AWS Console → RDS → Proxies → your-proxy → Authentication):

Secret: arn:aws:secretsmanager:us-east-1:123456:secret:rds-myapp-AbCdEf
IAM authentication: REQUIRED (or DISABLED)
Client TLS: REQUIRED

The Secrets Manager secret stores the actual database credentials. RDS Proxy uses these to connect to the underlying RDS instance — your app connects to the proxy with either:

The password from the secret (rotated by Secrets Manager).
An IAM token (no secret needed by the client).

Set IAM authentication: REQUIRED if your clients always use IAM. Then password-based clients are rejected — safer for production.

Common Mistake: Storing different passwords in Secrets Manager vs your environment. The proxy uses the secret; your client uses the env var; they don’t match → authentication fails. Source-of-truth must be one place.
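The same authentication settings can be applied through the API instead of the console. This is a hedged sketch using boto3's modify_db_proxy call; the helper names are hypothetical and the proxy name and secret ARN are whatever you pass in. Check the resulting proxy configuration after applying it rather than relying on this snippet.

```python
from typing import Any, Dict


def proxy_auth_settings(secret_arn: str, require_iam: bool = True) -> Dict[str, Any]:
    """Build the Auth / RequireTLS arguments for modify_db_proxy."""
    return {
        "Auth": [
            {
                "AuthScheme": "SECRETS",
                "SecretArn": secret_arn,
                "IAMAuth": "REQUIRED" if require_iam else "DISABLED",
            }
        ],
        "RequireTLS": True,
    }


def apply_proxy_auth(proxy_name: str, secret_arn: str, require_iam: bool = True):
    """Apply the settings to an existing proxy (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above has no dependencies

    rds = boto3.client("rds")
    return rds.modify_db_proxy(
        DBProxyName=proxy_name, **proxy_auth_settings(secret_arn, require_iam)
    )
```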

Fix 3: Detect and Eliminate Pinning

Pinning happens automatically for certain operations. Check via CloudWatch:

Namespace: AWS/RDS
Metric: DatabaseConnectionsCurrentlySessionPinned

If this is > 0, connections are pinned.

What causes pinning (Postgres):

PREPARE / EXECUTE statements (named prepared statements).
Session variables set with SET SESSION (transaction-scoped SET LOCAL is fine).
Listening on channels (LISTEN).
Advisory locks (pg_advisory_lock).
Temporary tables not cleaned at end of transaction.
Large objects (lo_*).

What causes pinning (MySQL):

User variables (@var set across statements).
Temporary tables.
LOCK TABLES.
SET outside transaction with session scope.
Prepared statements (yes, this is common!).

The fix is to avoid these or scope them to a single transaction:

BEGIN;
SET LOCAL statement_timeout = '5s';  -- LOCAL = transaction-scoped, no pinning
SELECT ...;
COMMIT;

For prepared statements in MySQL, consider client-side parameterized queries (the driver’s ? placeholders are usually fine; PREPARE/EXECUTE statements are the issue).

Pro Tip: Pinning isn’t a fatal error — it just means the proxy can’t pool that connection. If your workload has 5 pinned connections but max_connections is 200, you’re still fine. Worry when pinning rate is high and exhausts your pool.
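To find where pinning comes from, it can help to scan the SQL your application sends for the statement types listed above. The following is a rough heuristic, not a parser: the patterns and the pinning_risks name are illustrative assumptions, and they will miss statements generated inside the driver.

```python
import re
from typing import List

# Each entry pairs a label with a pattern for a statement that tends to pin.
_PINNING_PATTERNS = [
    ("named prepared statement", re.compile(r"^\s*(PREPARE|EXECUTE|DEALLOCATE)\b", re.I)),
    # SET followed by anything other than LOCAL (transaction-scoped SET LOCAL is fine)
    ("SET without LOCAL", re.compile(r"^\s*SET\s+(?!LOCAL\b)\S", re.I)),
    ("LISTEN", re.compile(r"^\s*LISTEN\b", re.I)),
    ("advisory lock", re.compile(r"\bpg_advisory_lock\w*\s*\(", re.I)),
    ("temporary table", re.compile(r"^\s*CREATE\s+(TEMP|TEMPORARY)\b", re.I)),
    ("LOCK TABLES", re.compile(r"^\s*LOCK\s+TABLES\b", re.I)),
]


def pinning_risks(sql: str) -> List[str]:
    """Return labels for pinning-prone constructs found in one SQL statement."""
    return [label for label, pattern in _PINNING_PATTERNS if pattern.search(sql)]
```

Running this over a query log or a statement-level hook in the application gives a quick list of candidates to rewrite or wrap in a transaction.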

Fix 4: Lambda VPC Configuration

The Lambda function must be in the same VPC (or a peered VPC) as the proxy:

# SAM template:
MyLambda:
  Type: AWS::Serverless::Function
  Properties:
    VpcConfig:
      SecurityGroupIds:
        - !Ref LambdaSecurityGroup
      SubnetIds:
        - subnet-abc
        - subnet-def

The Lambda security group must allow outbound to the proxy on port 5432 (Postgres) or 3306 (MySQL). The proxy security group must allow inbound from the Lambda security group.

For Lambdas that also need internet access (calling external APIs), put them in private subnets with NAT Gateway. Lambdas in public subnets can’t get a public IP — they’d fail to reach the internet.

# Wrong: lambda in public subnet — no internet
# Right: lambda in private subnet, with NAT Gateway for internet

Common Mistake: Lambda placed in the same security group as RDS without an explicit allow rule. Members of the same security group can’t reach each other unless a rule allows it. Add an inbound rule on the proxy SG for TCP port 5432 whose source is the Lambda SG ID (not 0.0.0.0/0).
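An inbound rule that names the Lambda security group as its source might look like the sketch below. It uses boto3's authorize_security_group_ingress; the helper names and the security group IDs you pass in are placeholders, and the call needs credentials allowed to modify the proxy security group.

```python
from typing import Any, Dict, List


def proxy_ingress_rule(lambda_sg_id: str, port: int = 5432) -> List[Dict[str, Any]]:
    """IpPermissions entry allowing TCP on one port from a source security group."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # The source is a security group reference, not a CIDR range
            "UserIdGroupPairs": [{"GroupId": lambda_sg_id}],
        }
    ]


def allow_lambda_to_proxy(proxy_sg_id: str, lambda_sg_id: str, port: int = 5432):
    """Add the rule to the proxy security group (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the rule builder has no dependencies

    ec2 = boto3.client("ec2")
    return ec2.authorize_security_group_ingress(
        GroupId=proxy_sg_id, IpPermissions=proxy_ingress_rule(lambda_sg_id, port)
    )
```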

For testing connectivity from inside a Lambda:

import socket
sock = socket.create_connection(
    ("proxy-name.proxy-abc.us-east-1.rds.amazonaws.com", 5432),
    timeout=3,
)
sock.close()
print("Reachable")

If this times out, you have a networking problem (VPC/SG/route table), not a database problem.
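A slightly more detailed probe can separate the common failure shapes. This is a sketch using only the standard library; the probe name and the returned labels are made up for illustration. As a rule of thumb, a timeout suggests dropped packets (security group or routing), a refusal means something answered but nothing is listening on that port, and a DNS failure means the endpoint name did not resolve from inside the VPC.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connect and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "reachable"
    except socket.gaierror:
        return "dns-failure"  # hostname did not resolve
    except socket.timeout:
        return "timeout"  # packets silently dropped somewhere on the path
    except ConnectionRefusedError:
        return "refused"  # host answered, port is closed
    except OSError as exc:
        return f"error: {exc}"
```

Logging the label from inside the Lambda handler narrows the search before you look at any database-level error.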

Fix 5: Connection Pool Settings

On the proxy:

MaxConnectionsPercent — what % of the underlying RDS max_connections the proxy can use (default 100).
MaxIdleConnectionsPercent — what % of the pool can be idle (default 50).
ConnectionBorrowTimeout — how long a client waits for a connection from the pool (default 120s).

For high-traffic workloads:

MaxConnectionsPercent: 90    # Leave some headroom for the RDS instance
MaxIdleConnectionsPercent: 50  # Keep many idle for fast burst response
ConnectionBorrowTimeout: 60   # Fail faster than the default
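To see what these percentages mean in absolute terms, and to apply them outside the console, something like the following could be used. It is a sketch: backend_budget is simple arithmetic (the proxy's own rounding may differ), the helper names are hypothetical, and the boto3 call shown is modify_db_proxy_target_group on the target group named default.

```python
from typing import Dict


def backend_budget(max_connections: int, max_connections_percent: int) -> int:
    """Approximate number of backend connections the proxy may open."""
    return max_connections * max_connections_percent // 100


def pool_config(max_pct: int = 90, idle_pct: int = 50, borrow_timeout_s: int = 60) -> Dict[str, int]:
    """ConnectionPoolConfig matching the high-traffic settings above."""
    return {
        "MaxConnectionsPercent": max_pct,
        "MaxIdleConnectionsPercent": idle_pct,
        "ConnectionBorrowTimeout": borrow_timeout_s,
    }


def apply_pool_config(proxy_name: str, **overrides: int):
    """Apply the pool settings (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above have no dependencies

    rds = boto3.client("rds")
    return rds.modify_db_proxy_target_group(
        TargetGroupName="default",
        DBProxyName=proxy_name,
        ConnectionPoolConfig=pool_config(**overrides),
    )
```

For example, with max_connections of 200 on the instance, a MaxConnectionsPercent of 90 leaves the proxy roughly 180 backend connections and 20 for anything connecting directly.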

On the client side, each Lambda invocation creates a connection but the proxy multiplexes them. Don’t create your own connection pool in Lambda — it defeats the purpose of the proxy.

# WRONG: client-side pool of size 10 on each Lambda instance
from psycopg2.pool import ThreadedConnectionPool
pool = ThreadedConnectionPool(2, 10, ...)  # Bad

# RIGHT: single connection per invocation, proxy pools across invocations
conn = psycopg2.connect(...)
try:
    # use conn
    conn.commit()
finally:
    conn.close()

Pro Tip: For Lambda with the proxy, prefer one connection per invocation. The proxy handles pooling across Lambda instances; client-side pools just complicate things.

Fix 6: Secrets Manager Rotation

When Secrets Manager rotates the underlying password, the proxy seamlessly switches. Your IAM-authenticated clients are unaffected. Password-authenticated clients with the old password from the secret will fail next time they reconnect — that’s expected during rotation.

To handle rotation gracefully:

# Always read the secret fresh:
import json

import boto3
import psycopg2

def get_db_password():
    sm = boto3.client("secretsmanager")
    response = sm.get_secret_value(SecretId="rds-myapp")
    return json.loads(response["SecretString"])["password"]

def connect():
    return psycopg2.connect(
        host="proxy-name...",
        password=get_db_password(),
        ...
    )

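For long-running services (not Lambda), cache the secret for a short time (1-5 minutes) and retry the connection once on auth failure. Note that lru_cache never expires on its own, so the version below refreshes only when authentication fails; add a time-based expiry if you want the short cache lifetime as well:

from functools import lru_cache

@lru_cache(maxsize=1)
def get_secret_cached():
    # Cached until cache_clear() is called; lru_cache has no TTL
    return get_db_password()

def connect_with_retry():
    try:
        return psycopg2.connect(password=get_secret_cached(), ...)
    except psycopg2.OperationalError as e:
        if "authentication failed" in str(e):
            # Password was probably rotated: drop the cached value and retry once
            get_secret_cached.cache_clear()
            return psycopg2.connect(password=get_secret_cached(), ...)
        raise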
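A cache that actually expires after the 1-5 minute window could be sketched as below. TTLSecret is a hypothetical helper, the 120-second default is an arbitrary choice inside that window, and the fetch function is passed in so it can be the get_db_password function shown earlier.

```python
import time
from typing import Callable, Optional


class TTLSecret:
    """Cache one secret value and refetch it after ttl_seconds."""

    def __init__(
        self,
        fetch: Callable[[], str],
        ttl_seconds: float = 120.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            # First use, expired, or invalidated: read the secret again
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value

    def invalidate(self) -> None:
        """Force a refetch on the next get(), e.g. after an auth failure."""
        self._value = None
```

Calling invalidate() in the authentication-failure branch gives the same retry behavior as cache_clear(), while the TTL bounds how long a rotated password can linger.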

Common Mistake: Hardcoding the password in env vars. Then rotation breaks your app until you redeploy. Always read from Secrets Manager (or use IAM auth).

Fix 7: Read/Write Endpoints

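For Aurora with read replicas, RDS Proxy supports read/write splitting through separate endpoints (the proxy does not route individual queries for you; your app picks the endpoint):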

Writer endpoint: proxy-name.proxy-abc.us-east-1.rds.amazonaws.com — connects to the writer.
Reader endpoint: proxy-name-ro-abc.proxy-abc.us-east-1.rds.amazonaws.com — connects to a reader.

Your app routes:

# Writes:
write_conn = psycopg2.connect(host="proxy-name...", ...)

# Reads:
read_conn = psycopg2.connect(host="proxy-name-ro-abc...", ...)

The proxy load-balances reads across the read replicas. Writes go to the writer.

Common Mistake: Using only the writer endpoint for everything. You’re not using the read replicas — wasted capacity. Route SELECTs to the reader endpoint.

Beware: replica lag means reads against a reader endpoint can return stale data. For “read your own writes,” route reads to the writer (or use Aurora’s aurora_replica_read_consistency setting).
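The routing rule, including the read-your-own-writes exception, fits in a few lines. This is an illustrative sketch; pick_endpoint and its parameters are hypothetical names, and the endpoint hostnames are whatever your proxy exposes.

```python
def pick_endpoint(
    writer: str,
    reader: str,
    *,
    is_write: bool,
    needs_own_writes: bool = False,
) -> str:
    """Choose the proxy endpoint for one unit of work."""
    # Writes, and reads that must observe a write just made,
    # go to the writer to avoid stale data from replica lag.
    if is_write or needs_own_writes:
        return writer
    return reader
```

Keeping the decision in one function makes it easier to audit which code paths tolerate stale reads.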

Fix 8: Monitoring and Cost

Key CloudWatch metrics:

DatabaseConnectionsCurrentlyBorrowed — pool connections in use.
DatabaseConnectionsCurrentlyInTransaction — actively running a transaction.
DatabaseConnectionsCurrentlySessionPinned — pinning rate.
ClientConnections — total client connections to the proxy.
QueryDatabaseResponseLatency — proxy-to-DB latency.

Set alarms on:

High pinning rate (your client code is creating pinning operations).
ClientConnections near max — proxy can’t keep up.
Low DatabaseConnectionsCurrentlyBorrowed — proxy not multiplexing well (could mean pinning).

RDS Proxy is billed per vCPU of the underlying RDS instance per hour. For db.t3.medium (2 vCPU) at ~$0.015/vCPU/hour, that’s roughly $22/month for the proxy alone. Bigger instances cost more.
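The arithmetic behind that estimate, as a small sketch. The per-vCPU price is the approximate figure quoted above and 730 hours per month is an assumed average; check current pricing for your region before relying on the result.

```python
def monthly_proxy_cost(
    vcpus: int,
    price_per_vcpu_hour: float = 0.015,
    hours_per_month: int = 730,
) -> float:
    """Estimated monthly RDS Proxy charge in USD."""
    return vcpus * price_per_vcpu_hour * hours_per_month
```

For a 2 vCPU instance this comes to about $21.90 per month, which matches the rough $22 figure.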

Pro Tip: RDS Proxy makes sense for Lambda-heavy workloads. For containerized apps with long-lived processes (ECS, EKS), the runtime’s own connection pool may suffice — RDS Proxy’s overhead isn’t worth it.

Still Not Working?

A few less-obvious failures:

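Connection refused immediately. Something answered but nothing is listening on that port: check that the port matches the engine (5432 vs 3306) and that you are using the proxy endpoint. A security group block usually shows up as a timeout rather than a refusal; for that, check Lambda SG outbound + proxy SG inbound.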
SSL connection has been closed unexpectedly. Proxy requires TLS but client isn’t using it. Set sslmode=require (psycopg2) or equivalent.
Slow queries via proxy but fast direct. Proxy adds ~1-3ms per query. For very-low-latency reads, sometimes worth bypassing the proxy. But high-volume Lambdas benefit more from pooling.
Too many connections on RDS even with proxy. Your MaxConnectionsPercent is set too low, or pinning is consuming your pool. Increase the cap or fix pinning.
Lambda timing out on connection. ConnectionBorrowTimeout is high and the proxy is exhausted. Reduce timeout to fail fast, then scale the proxy.
Aurora Serverless v2 + Proxy weirdness. Aurora Serverless v2 has its own scaling layer. Combining with RDS Proxy is supported but can produce confusing metrics. For serverless workloads, sometimes Aurora’s Data API is simpler.
Authentication token expired. IAM token is older than 15 minutes. Don’t cache tokens; regenerate per connection.
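pg_hba.conf doesn’t allow connection. On RDS you can’t edit pg_hba.conf directly; AWS manages it. A “no pg_hba.conf entry” error that mentions SSL off usually means the instance parameter group has rds.force_ssl enabled and the client connected without TLS, so check the parameter group and the client’s TLS settings.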
RDS Proxy in available state but new connections hang. The proxy is healthy but the underlying RDS reboot, failover, or maintenance window dropped the backend pool. Check the RDS instance event log — a failover that completed silently still drains the proxy’s warm connections, and the first wave of clients pays the reconnection cost.
Aurora replica added but proxy doesn’t route reads to it. RDS Proxy discovers replicas through the Aurora cluster endpoint at creation time. Adding a replica later requires either restarting the proxy or waiting for the discovery interval. Force-refresh by toggling the read endpoint config in the console.
TLS handshake amplifies latency. Every new client connection to the proxy requires a TLS handshake. For short-lived Lambdas this can dominate the connection cost. Reuse the connection for the whole Lambda execution and close it at the end, not after every query.

For related AWS database connectivity issues, see AWS RDS connection timed out, Postgres connection refused, AWS Lambda timeout, and AWS IAM permission denied.