
Fix: OpenAI API Not Working — RateLimitError, 401, 429, and Connection Issues

FixDevs

Quick Answer

How to fix OpenAI API errors — RateLimitError (429), AuthenticationError (401), APIConnectionError, context length exceeded, model not found, and SDK v0-to-v1 migration mistakes.

The Error

You call the OpenAI API and get a 429:

openai.RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details.', 'type': 'insufficient_quota', 'code': 'insufficient_quota'}}

Or a 401:

openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-proj-...', 'type': 'invalid_request_error', 'code': 'invalid_api_key'}}

Or a connection error:

openai.APIConnectionError: Connection error.

Or a 400 for context length:

openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, you requested 132000 tokens.", 'type': 'invalid_request_error', 'code': 'context_length_exceeded'}}

Or a broken import that made sense six months ago:

AttributeError: module 'openai' has no attribute 'ChatCompletion'

The last one means you upgraded the SDK without reading the migration guide. All of these are fixable.

Why This Happens

OpenAI’s API has several distinct failure modes that look similar on the surface but have completely different root causes and fixes. The error class, status code, and error type field together tell you exactly what went wrong and whether retrying will help.

The Python SDK (v1.x) and Node.js SDK (v4.x) both expose named error classes that make this handling straightforward once you know them.
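As a rough triage map, here is that relationship in one place (a summary sketch, not part of the SDK — the class names are the real Python v1 / Node v4 error classes, the advice strings summarize the fixes in this article):

```python
# Rough triage map from SDK error class name to whether retrying can help.
RETRY_HINT = {
    "RateLimitError": "maybe: retry with backoff, unless type is insufficient_quota",
    "AuthenticationError": "no: fix the API key or org/project headers",
    "BadRequestError": "no: fix the request (context length, model name)",
    "APIConnectionError": "yes: retry; check network/proxy if it persists",
    "APITimeoutError": "yes: retry, and consider a longer timeout",
    "InternalServerError": "yes: retry with backoff; check status.openai.com",
}

def retry_hint(exc: Exception) -> str:
    """Look up triage advice for an exception by its class name."""
    return RETRY_HINT.get(type(exc).__name__, "unknown: inspect status_code and body")
```

Each of these classes gets its own fix below.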

Fix 1: RateLimitError (HTTP 429)

A 429 response can mean two completely different things. Check the type field before doing anything:

  • rate_limit_exceeded: You’re sending too many requests or too many tokens per minute. Safe to retry with backoff.
  • insufficient_quota: You’ve run out of credits or your payment failed. Retrying does nothing — fix billing first.

In Python:

from openai import OpenAI, RateLimitError

client = OpenAI()

try:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError as e:
    # e.body may be the full JSON or just the inner "error" object, depending on SDK version
    body = e.body if isinstance(e.body, dict) else {}
    error_type = body.get("type") or body.get("error", {}).get("type", "")
    if error_type == "insufficient_quota":
        # Billing issue — check platform.openai.com/settings/billing
        raise RuntimeError("OpenAI quota exhausted — check your billing") from e
    else:
        # Actual rate limit — retry with backoff
        raise

In Node.js:

import OpenAI, { RateLimitError } from "openai";

const client = new OpenAI();

try {
    const response = await client.chat.completions.create({ ... });
} catch (error) {
    if (error instanceof RateLimitError) {
        const type = error.error?.type;
        if (type === "insufficient_quota") {
            throw new Error("OpenAI quota exhausted — check billing");
        }
        // rate_limit_exceeded: retry with backoff
    }
}

Common Mistake: When you get a 429, the instinct is to immediately retry. But if error.type is insufficient_quota, retrying is pointless — your account is out of credits and no amount of backoff will change that. Always check the error type before writing retry logic. Only rate_limit_exceeded is worth retrying.

For rate limits, use exponential backoff with jitter. Both SDKs retry automatically by default (2 retries on 429), but if you need more control:

import time
import random
from openai import RateLimitError

def call_with_backoff(client, **kwargs):
    max_retries = 5
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError as e:
            if "insufficient_quota" in str(e):
                raise
            if attempt == max_retries - 1:
                raise
            # Exponential backoff + jitter
            delay = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Retrying in {delay:.1f}s (attempt {attempt + 1}/{max_retries})")
            time.sleep(delay)

For production Python workloads, the tenacity library handles this cleanly:

from tenacity import retry, stop_after_attempt, wait_random_exponential
from openai import RateLimitError

@retry(
    wait=wait_random_exponential(multiplier=1, min=4, max=60),
    stop=stop_after_attempt(6),
    reraise=True
)
def call_api(**kwargs):
    return client.chat.completions.create(**kwargs)

OpenAI’s rate limits are per model and cover RPM (requests per minute), TPM (tokens per minute), and RPD (requests per day). If you’re hitting limits regularly, check your tier at platform.openai.com/settings/organization/limits — quotas increase automatically as you spend more.
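The API also reports your remaining headroom on every successful response via `x-ratelimit-*` headers. A sketch of reading them (the header names are OpenAI's; the helper below is ours, and in the Python SDK you reach raw headers through `with_raw_response`):

```python
def rate_limit_snapshot(headers) -> dict:
    """Pull OpenAI's rate-limit headers out of a response's headers mapping."""
    keys = [
        "x-ratelimit-limit-requests", "x-ratelimit-remaining-requests",
        "x-ratelimit-limit-tokens", "x-ratelimit-remaining-tokens",
    ]
    return {k: headers.get(k) for k in keys}

# Usage with the Python SDK (v1):
#   raw = client.chat.completions.with_raw_response.create(model=..., messages=...)
#   print(rate_limit_snapshot(raw.headers))
#   response = raw.parse()  # the normal completion object
```

Logging these on each request tells you whether you are close to RPM or TPM limits before the 429s start.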

Fix 2: AuthenticationError (HTTP 401)

A 401 almost always comes down to one of three things: wrong key, wrong format, or wrong organization/project header.

Verify your key is set correctly:

# Check the variable is actually set
echo $OPENAI_API_KEY

# Common mistake: trailing whitespace or newline in .env
cat -A .env | grep OPENAI_API_KEY

In Python:

from openai import OpenAI
import os

# Don't hardcode — use environment variable
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

In Node.js:

import OpenAI from "openai";

// SDK reads OPENAI_API_KEY automatically if you don't pass apiKey
const client = new OpenAI();

// Or explicit (don't declare both in the same module):
// const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

Common key format mistakes:

Problem             Example                          Fix
Missing prefix      abc123...                        Must start with sk-
Extra whitespace    sk-proj-... + trailing newline   Trim whitespace
Quotes included     "sk-proj-..."                    Remove quotes from .env
Wrong env var name  OPENAI_KEY                       Must be OPENAI_API_KEY
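A quick sanity check that catches these mistakes before the first request (a hypothetical helper, not part of the SDK):

```python
def check_api_key(raw: str) -> list:
    """Return a list of problems with an API key string (empty list means it looks OK)."""
    problems = []
    if raw != raw.strip():
        problems.append("has leading/trailing whitespace or a newline")
    key = raw.strip()
    if key.startswith('"') or key.startswith("'"):
        problems.append("includes quote characters (remove them from .env)")
    elif not key.startswith("sk-"):
        problems.append("missing the sk- prefix")
    return problems
```

Run it against `os.environ.get("OPENAI_API_KEY", "")` at startup and fail fast instead of debugging a 401 later.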

Organization and project header mismatch:

If you’re a member of multiple organizations or projects, passing the wrong organization or project parameter causes 401 even with a valid key:

# Wrong: org ID from a different account
client = OpenAI(
    api_key="sk-proj-...",
    organization="org-WRONGID"  # Must match the key's org
)

Fix: If you only belong to one organization, omit the organization parameter entirely. If you have multiple, find the correct ID at platform.openai.com/account/org-home/organization-settings.

Project-scoped keys vs. legacy keys:

New API keys created in the OpenAI dashboard are project-scoped (sk-proj-...). These don’t need organization or project headers — the key already carries that context. If you’re using an older sk-... key, it’s a legacy user key; consider rotating to a project-scoped key to avoid header complexity.

Fix 3: APIConnectionError and APITimeoutError

Connection errors happen before you get a response. Timeout errors happen when the API takes longer than your configured limit.

Configure an appropriate timeout:

import httpx
from openai import OpenAI

client = OpenAI(
    timeout=httpx.Timeout(
        60.0,        # Total timeout
        connect=5.0, # Connection establishment
        read=55.0,   # Reading the response body
        write=5.0    # Sending the request
    ),
    max_retries=3
)

In Node.js:

const client = new OpenAI({
    timeout: 60 * 1000,  // 60 seconds in ms
    maxRetries: 3
});

The default timeout is 10 minutes for the full request in both the Python and Node.js SDKs. That’s fine for most requests but too long for latency-sensitive applications.

For streaming, long completions need more time. If you’re generating thousands of tokens, a 30-second timeout will cut you off mid-stream. Set the timeout based on worst-case output length:

# A 2000-token response at ~60 tokens/second takes ~33 seconds
client = OpenAI(timeout=90.0)  # 90 second total

Connection errors from proxies or firewalls:

If APIConnectionError is consistent (not intermittent), check:

  1. Your environment uses a proxy that intercepts SSL — set REQUESTS_CA_BUNDLE to your corporate CA bundle
  2. A firewall blocks api.openai.com on port 443
  3. You’re running inside a container or restricted network

For the proxy case:

import httpx
from openai import OpenAI

# If you need a proxy:
client = OpenAI(
    http_client=httpx.Client(proxy="http://your-proxy:8080")
)

Fix 4: BadRequestError (HTTP 400)

Context length exceeded:

The error message tells you exactly what happened:

This model's maximum context length is 128000 tokens. However, you requested 132000 tokens
(2000 in the messages, 130000 in the completion).

The total token count is input tokens + max_completion_tokens. Reduce one or both:

import tiktoken

def count_tokens(messages, model="gpt-4o-mini"):
    enc = tiktoken.encoding_for_model(model)
    return sum(len(enc.encode(m["content"])) for m in messages)

messages = [...]
model = "gpt-4o-mini"  # context window: 128k tokens
max_context = 128000
max_completion = 4096

input_tokens = count_tokens(messages, model)
if input_tokens + max_completion > max_context:
    # Truncate oldest messages (keep system prompt)
    while count_tokens(messages, model) + max_completion > max_context:
        messages.pop(1)  # Remove oldest non-system message

Model not found:

The model 'gpt-4' does not exist or you do not have access to it.

OpenAI deprecates model versions and requires you to use specific snapshot names or the latest aliases. Check platform.openai.com/docs/models for current model names. Common mistakes:

What you wrote       What to use instead
gpt-4                gpt-4o or gpt-4o-mini
gpt-3.5-turbo-16k    gpt-4o-mini (faster, cheaper)
gpt-4-32k            gpt-4o

Also verify your account has access. GPT-4 class models require at least Tier 1 (you’ve made a successful payment).
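To see which models your key can actually reach, `client.models.list()` returns exactly the set you have access to. A small helper (ours, not the SDK's) around it:

```python
def model_available(client, name: str) -> bool:
    """True if this API key can see the named model (client is an OpenAI() instance)."""
    return any(m.id == name for m in client.models.list())

# Usage:
#   from openai import OpenAI
#   client = OpenAI()
#   if not model_available(client, "gpt-4o"):
#       print("This key cannot use gpt-4o, check your tier")
```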

Fix 5: InternalServerError (HTTP 500+)

500 errors are OpenAI’s fault, not yours. The SDK retries them automatically up to 2 times. If they persist:

  1. Check status.openai.com for ongoing incidents
  2. Log the request_id from the error — if you need to contact support, this is the ID they need:
from openai import OpenAI, InternalServerError

client = OpenAI()

try:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}]
    )
except InternalServerError as e:
    print(f"Server error. Request ID: {e.request_id}")
    print(f"Status: {e.status_code}")
    # Log and retry or fail gracefully

In Node.js:

import OpenAI, { InternalServerError } from "openai";

const client = new OpenAI();

try {
    const response = await client.chat.completions.create({ ... });
} catch (error) {
    if (error instanceof InternalServerError) {
        console.error(`Server error. Request ID: ${error.request_id}`);
    }
}

500 errors are safe to retry. Build a circuit breaker if you see sustained 500s — don’t hammer the API when OpenAI is having an incident.
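A minimal circuit-breaker sketch (the threshold and cooldown values are arbitrary placeholders; tune them to your traffic): after N consecutive failures it stops sending requests for a cooldown period, then lets one probe request through.

```python
import time

class CircuitBreaker:
    """Stop calling after `threshold` consecutive failures; reopen after `cooldown` seconds."""

    def __init__(self, threshold: int = 5, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (requests allowed)

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            # Half-open: cooldown elapsed, let one probe request through
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()
```

Call `allow()` before each request and `record_success()` / `record_failure()` after; when it returns False, fail fast instead of hitting the API.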

Fix 6: SDK Migration Errors (v0→v1 Python, v3→v4 Node.js)

If you upgraded the SDK and your existing code broke, the APIs changed significantly between major versions.

Python: v0 → v1

# OLD (v0 — no longer works):
import openai
openai.api_key = "sk-..."
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[...]
)
text = response["choices"][0]["message"]["content"]

# NEW (v1):
from openai import OpenAI
client = OpenAI(api_key="sk-...")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...]
)
text = response.choices[0].message.content  # Object access, not dict

Error handling changed too:

# OLD (v0):
import openai
try:
    ...
except openai.error.RateLimitError:  # error submodule — broken in v1
    ...

# NEW (v1) — two equivalent styles:
import openai
try:
    ...
except openai.RateLimitError:
    ...

# Or with explicit import:
from openai import RateLimitError
try:
    ...
except RateLimitError:
    ...

Node.js: v3 → v4

// OLD (v3 — no longer works):
const { Configuration, OpenAIApi } = require("openai");
const openai = new OpenAIApi(new Configuration({ apiKey: "..." }));
const response = await openai.createChatCompletion({ ... });
const text = response.data.choices[0].message.content; // .data

// NEW (v4):
import OpenAI from "openai";
const client = new OpenAI({ apiKey: "..." });
const response = await client.chat.completions.create({ ... });
const text = response.choices[0].message.content; // No .data wrapper

The .data wrapper is the most common v3→v4 breakage — removing it fixes Cannot read properties of undefined (reading 'choices').

Fix 7: Streaming Errors

Errors during streaming can occur before the stream starts (same as normal errors) or mid-stream. Always wrap streaming code in a try-catch:

from openai import OpenAI, APIConnectionError, APITimeoutError, RateLimitError

client = OpenAI()

try:
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Write a long essay."}],
        stream=True
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
except RateLimitError:
    print("\nRate limited — retry later")
except APIConnectionError:
    print("\nConnection dropped mid-stream")
except APITimeoutError:
    print("\nRequest timed out — increase timeout for long completions")

In Node.js:

import OpenAI, { APIConnectionError, RateLimitError } from "openai";

const client = new OpenAI();

try {
    const stream = await client.chat.completions.create({
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: "Write a long essay." }],
        stream: true
    });

    for await (const chunk of stream) {
        process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
    }
} catch (error) {
    if (error instanceof RateLimitError) {
        console.error("\nRate limited");
    } else if (error instanceof APIConnectionError) {
        console.error("\nConnection error");
    }
}

Note: If a connection drops after the stream starts, the error message may be generic. The request_id from the error object is the most reliable way to identify which request failed when debugging with OpenAI support.
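One practical defense is to accumulate the partial text as it streams, so a mid-stream drop doesn't lose everything already received. A sketch (the helper is ours; the chunk shape matches the SDK's streamed chat-completion chunks):

```python
def collect_stream(chunks):
    """Consume a chat-completion stream, returning (text_so_far, completed_flag).

    If the connection drops mid-stream, you keep everything received so far
    instead of losing the whole response.
    """
    parts = []
    try:
        for chunk in chunks:
            if chunk.choices and chunk.choices[0].delta.content:
                parts.append(chunk.choices[0].delta.content)
    except Exception:
        return "".join(parts), False  # partial output, stream died
    return "".join(parts), True

# Usage:
#   stream = client.chat.completions.create(model=..., messages=..., stream=True)
#   text, ok = collect_stream(stream)
```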

Still Not Working?

The SDK retries twice and still fails

The SDK’s built-in retry only covers 408, 429 (rate_limit_exceeded only), and 5xx errors. It does not retry 400 errors (context, model not found) or 401 errors (authentication). If you’re seeing consistent failures, check which error class you’re actually getting:

import openai

try:
    client.chat.completions.create(...)
except openai.APIError as e:
    print(type(e).__name__)  # Print the actual class name
    print(getattr(e, "status_code", None))  # connection errors have no status code
    print(getattr(e, "body", None))

Environment variable not loading

In Python, python-dotenv doesn’t auto-load your .env file — you have to call load_dotenv(). If your env vars aren’t loading at all, see dotenv not loading for the full list of causes.

from dotenv import load_dotenv
load_dotenv()  # Must call this before OpenAI()

from openai import OpenAI
client = OpenAI()  # Now picks up OPENAI_API_KEY from .env

In Next.js, .env.local variables with OPENAI_API_KEY are server-only by default — they won’t be available in browser-side code. Use them only in API routes, Server Actions, or getServerSideProps. See Next.js env variables not working for the full breakdown.

429 on first request (free tier)

Free tier accounts have very low limits (typically 3 RPM). If you hit 429 on your first few calls after signing up, you’re hitting the free tier cap. Add a payment method at platform.openai.com/settings/billing to move to Tier 1, which has significantly higher limits.

Model access denied (400 or 404)

Some models require a minimum spend tier. GPT-4o and GPT-4o-mini are accessible to all paid accounts. GPT-4 Turbo and newer models may require Tier 1 or higher. If you get model not found for a model that definitely exists, your account tier may be too low — check platform.openai.com/settings/organization/limits.

Async code not running in Python

If you’re using the async client (AsyncOpenAI) and the call never completes or throws RuntimeError: Event loop is closed, you’re mixing async and sync incorrectly. See Python asyncio not running for common async/await pitfalls:

from openai import AsyncOpenAI
import asyncio

client = AsyncOpenAI()

async def main():
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())

Requests work locally but fail in production

Check for these environment-specific issues:

  1. Missing env var — production environment (Vercel, Railway, Fly.io) doesn’t have OPENAI_API_KEY set
  2. Timeout too short — serverless functions have execution time limits; a 30-second max function timeout will cut off long completions
  3. Cold start latency — first request on a cold serverless function takes extra time; add this to your timeout calculation
  4. Egress blocked — some cloud environments block outbound traffic by default; explicitly allow api.openai.com:443
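A preflight check run at deploy time covers the first and last items (a sketch; the host, port, and timeout are assumptions, and the TCP probe verifies egress only, not authentication):

```python
import os
import socket

def preflight(check_network: bool = True) -> list:
    """Return a list of environment problems (empty list means the basics look OK)."""
    problems = []
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        problems.append("OPENAI_API_KEY is not set in this environment")
    elif not key.startswith("sk-"):
        problems.append("OPENAI_API_KEY does not look like an OpenAI key (no sk- prefix)")
    if check_network:
        try:
            # Verifies outbound TCP to the API endpoint is allowed
            socket.create_connection(("api.openai.com", 443), timeout=3.0).close()
        except OSError as exc:
            problems.append(f"cannot reach api.openai.com:443 ({exc})")
    return problems
```

Running this as a startup hook turns a vague production failure into a specific message in your logs.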

For Vercel deployments, also check Vercel deployment failed for environment variable configuration in the project settings.


FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
