
Fix: OpenAI API Not Working — RateLimitError, 401, 429, and Connection Issues

FixDevs

Quick Answer

How to fix OpenAI API errors — RateLimitError (429), AuthenticationError (401), APIConnectionError, context length exceeded, model not found, and SDK v0-to-v1 migration mistakes.

The Error

You call the OpenAI API and get a 429:

openai.RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details.', 'type': 'insufficient_quota', 'code': 'insufficient_quota'}}

Or a 401:

openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-proj-...', 'type': 'invalid_request_error', 'code': 'invalid_api_key'}}

Or a connection error:

openai.APIConnectionError: Connection error.

Or a 400 for context length:

openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, you requested 132000 tokens.", 'type': 'invalid_request_error', 'code': 'context_length_exceeded'}}

Or a broken import that made sense six months ago:

AttributeError: module 'openai' has no attribute 'ChatCompletion'

The last one means you upgraded the SDK without reading the migration guide. All of these are fixable.

Why This Happens

OpenAI’s API has several distinct failure modes that look similar on the surface but have completely different root causes and fixes. The error class, status code, and error type field together tell you exactly what went wrong and whether retrying will help.

The Python SDK (v1.x) and Node.js SDK (v4.x) both expose named error classes that make this handling straightforward once you know them.
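As a rough triage map, here is that relationship in one place (a summary sketch, not part of the SDK — the class names are the real Python v1 / Node v4 error classes, the advice strings summarize the fixes in this article):

```python
# Rough triage map from SDK error class name to whether retrying can help.
RETRY_HINT = {
    "RateLimitError": "maybe: retry with backoff, unless type is insufficient_quota",
    "AuthenticationError": "no: fix the API key or org/project headers",
    "BadRequestError": "no: fix the request (context length, model name)",
    "APIConnectionError": "yes: retry; check network/proxy if it persists",
    "APITimeoutError": "yes: retry, and consider a longer timeout",
    "InternalServerError": "yes: retry with backoff; check status.openai.com",
}

def retry_hint(exc: Exception) -> str:
    """Look up triage advice for an exception by its class name."""
    return RETRY_HINT.get(type(exc).__name__, "unknown: inspect status_code and body")
```

Each of these classes gets its own fix below.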

Fix 1: RateLimitError (HTTP 429)

A 429 response can mean two completely different things. Check the type field before doing anything:

  • rate_limit_exceeded: You’re sending too many requests or too many tokens per minute. Safe to retry with backoff.
  • insufficient_quota: You’ve run out of credits or your payment failed. Retrying does nothing — fix billing first.

In Python:

from openai import OpenAI, RateLimitError

client = OpenAI()

try:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError as e:
    # e.body may be the full JSON or just the inner "error" object, depending on SDK version
    body = e.body if isinstance(e.body, dict) else {}
    error_type = body.get("type") or body.get("error", {}).get("type", "")
    if error_type == "insufficient_quota":
        # Billing issue — check platform.openai.com/settings/billing
        raise RuntimeError("OpenAI quota exhausted — check your billing") from e
    else:
        # Actual rate limit — retry with backoff
        raise

In Node.js:

import OpenAI, { RateLimitError } from "openai";

const client = new OpenAI();

try {
    const response = await client.chat.completions.create({ ... });
} catch (error) {
    if (error instanceof RateLimitError) {
        const type = error.error?.type;
        if (type === "insufficient_quota") {
            throw new Error("OpenAI quota exhausted — check billing");
        }
        // rate_limit_exceeded: retry with backoff
    }
}

Common Mistake: When you get a 429, the instinct is to immediately retry. But if error.type is insufficient_quota, retrying is pointless — your account is out of credits and no amount of backoff will change that. Always check the error type before writing retry logic. Only rate_limit_exceeded is worth retrying.

For rate limits, use exponential backoff with jitter. Both SDKs retry automatically by default (2 retries on 429), but if you need more control:

import time
import random
from openai import RateLimitError

def call_with_backoff(client, **kwargs):
    max_retries = 5
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError as e:
            if "insufficient_quota" in str(e):
                raise
            if attempt == max_retries - 1:
                raise
            # Exponential backoff + jitter
            delay = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Retrying in {delay:.1f}s (attempt {attempt + 1}/{max_retries})")
            time.sleep(delay)

For production Python workloads, the tenacity library handles this cleanly:

from tenacity import retry, stop_after_attempt, wait_random_exponential
from openai import RateLimitError

@retry(
    wait=wait_random_exponential(multiplier=1, min=4, max=60),
    stop=stop_after_attempt(6),
    reraise=True
)
def call_api(**kwargs):
    return client.chat.completions.create(**kwargs)

OpenAI’s rate limits are per model and cover RPM (requests per minute), TPM (tokens per minute), and RPD (requests per day). If you’re hitting limits regularly, check your tier at platform.openai.com/settings/organization/limits — quotas increase automatically as you spend more.
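The API also reports your remaining headroom on every successful response via `x-ratelimit-*` headers. A sketch of reading them (the header names are OpenAI's; the helper below is ours, and in the Python SDK you reach raw headers through `with_raw_response`):

```python
def rate_limit_snapshot(headers) -> dict:
    """Pull OpenAI's rate-limit headers out of a response's headers mapping."""
    keys = [
        "x-ratelimit-limit-requests", "x-ratelimit-remaining-requests",
        "x-ratelimit-limit-tokens", "x-ratelimit-remaining-tokens",
    ]
    return {k: headers.get(k) for k in keys}

# Usage with the Python SDK (v1):
#   raw = client.chat.completions.with_raw_response.create(model=..., messages=...)
#   print(rate_limit_snapshot(raw.headers))
#   response = raw.parse()  # the normal completion object
```

Logging these on each request tells you whether you are close to RPM or TPM limits before the 429s start.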

Fix 2: AuthenticationError (HTTP 401)

A 401 almost always comes down to one of three things: wrong key, wrong format, or wrong organization/project header.

Verify your key is set correctly:

# Check the variable is actually set
echo $OPENAI_API_KEY

# Common mistake: trailing whitespace or newline in .env
cat -A .env | grep OPENAI_API_KEY

In Python:

from openai import OpenAI
import os

# Don't hardcode — use environment variable
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

In Node.js:

import OpenAI from "openai";

// SDK reads OPENAI_API_KEY automatically if you don't pass apiKey
const client = new OpenAI();

// Or explicit (don't declare both in the same module):
// const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

Common key format mistakes:

Problem             Example                          Fix
Missing prefix      abc123...                        Must start with sk-
Extra whitespace    sk-proj-... + trailing newline   Trim whitespace
Quotes included     "sk-proj-..."                    Remove quotes from .env
Wrong env var name  OPENAI_KEY                       Must be OPENAI_API_KEY
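A quick sanity check that catches these mistakes before the first request (a hypothetical helper, not part of the SDK):

```python
def check_api_key(raw: str) -> list:
    """Return a list of problems with an API key string (empty list means it looks OK)."""
    problems = []
    if raw != raw.strip():
        problems.append("has leading/trailing whitespace or a newline")
    key = raw.strip()
    if key.startswith('"') or key.startswith("'"):
        problems.append("includes quote characters (remove them from .env)")
    elif not key.startswith("sk-"):
        problems.append("missing the sk- prefix")
    return problems
```

Run it against `os.environ.get("OPENAI_API_KEY", "")` at startup and fail fast instead of debugging a 401 later.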

Organization and project header mismatch:

If you’re a member of multiple organizations or projects, passing the wrong organization or project parameter causes 401 even with a valid key:

# Wrong: org ID from a different account
client = OpenAI(
    api_key="sk-proj-...",
    organization="org-WRONGID"  # Must match the key's org
)

Fix: If you only belong to one organization, omit the organization parameter entirely. If you have multiple, find the correct ID at platform.openai.com/account/org-home/organization-settings.

Project-scoped keys vs. legacy keys:

New API keys created in the OpenAI dashboard are project-scoped (sk-proj-...). These don’t need organization or project headers — the key already carries that context. If you’re using an older sk-... key, it’s a legacy user key; consider rotating to a project-scoped key to avoid header complexity.

Fix 3: APIConnectionError and APITimeoutError

Connection errors happen before you get a response. Timeout errors happen when the API takes longer than your configured limit.

Configure an appropriate timeout:

import httpx
from openai import OpenAI

client = OpenAI(
    timeout=httpx.Timeout(
        60.0,        # Total timeout
        connect=5.0, # Connection establishment
        read=55.0,   # Reading the response body
        write=5.0    # Sending the request
    ),
    max_retries=3
)

In Node.js:

const client = new OpenAI({
    timeout: 60 * 1000,  // 60 seconds in ms
    maxRetries: 3
});

The default timeout is 10 minutes for the full request in both the Python and Node.js SDKs. That’s fine for most requests but too long for latency-sensitive applications.

For streaming, long completions need more time. If you’re generating thousands of tokens, a 30-second timeout will cut you off mid-stream. Set the timeout based on worst-case output length:

# A 2000-token response at ~60 tokens/second takes ~33 seconds
client = OpenAI(timeout=90.0)  # 90 second total

Connection errors from proxies or firewalls:

If APIConnectionError is consistent (not intermittent), check:

  1. Your environment uses a proxy that intercepts SSL — set REQUESTS_CA_BUNDLE to your corporate CA bundle
  2. A firewall blocks api.openai.com on port 443
  3. You’re running inside a container or restricted network

For the proxy case:

import httpx
from openai import OpenAI

# If you need a proxy:
client = OpenAI(
    http_client=httpx.Client(proxy="http://your-proxy:8080")
)

Fix 4: BadRequestError (HTTP 400)

Context length exceeded:

The error message tells you exactly what happened:

This model's maximum context length is 128000 tokens. However, you requested 132000 tokens
(2000 in the messages, 130000 in the completion).

The total token count is input tokens + max_completion_tokens. Reduce one or both:

import tiktoken

def count_tokens(messages, model="gpt-4o-mini"):
    enc = tiktoken.encoding_for_model(model)
    return sum(len(enc.encode(m["content"])) for m in messages)

messages = [...]
model = "gpt-4o-mini"  # context window: 128k tokens
max_context = 128000
max_completion = 4096

input_tokens = count_tokens(messages, model)
if input_tokens + max_completion > max_context:
    # Truncate oldest messages (keep system prompt)
    while count_tokens(messages, model) + max_completion > max_context:
        messages.pop(1)  # Remove oldest non-system message

Model not found:

The model 'gpt-4' does not exist or you do not have access to it.

OpenAI deprecates model versions and requires you to use specific snapshot names or the latest aliases. Check platform.openai.com/docs/models for current model names. Common mistakes:

What you wrote       What to use instead
gpt-4                gpt-4o or gpt-4o-mini
gpt-3.5-turbo-16k    gpt-4o-mini (faster, cheaper)
gpt-4-32k            gpt-4o

Also verify your account has access. GPT-4 class models require at least Tier 1 (you’ve made a successful payment).
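To see which models your key can actually reach, `client.models.list()` returns exactly the set you have access to. A small helper (ours, not the SDK's) around it:

```python
def model_available(client, name: str) -> bool:
    """True if this API key can see the named model (client is an OpenAI() instance)."""
    return any(m.id == name for m in client.models.list())

# Usage:
#   from openai import OpenAI
#   client = OpenAI()
#   if not model_available(client, "gpt-4o"):
#       print("This key cannot use gpt-4o, check your tier")
```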

Fix 5: InternalServerError (HTTP 500+)

500 errors are OpenAI’s fault, not yours. The SDK retries them automatically up to 2 times. If they persist:

  1. Check status.openai.com for ongoing incidents
  2. Log the request_id from the error — if you need to contact support, this is the ID they need:
from openai import OpenAI, InternalServerError

client = OpenAI()

try:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}]
    )
except InternalServerError as e:
    print(f"Server error. Request ID: {e.request_id}")
    print(f"Status: {e.status_code}")
    # Log and retry or fail gracefully

In Node.js:

import OpenAI, { InternalServerError } from "openai";

const client = new OpenAI();

try {
    const response = await client.chat.completions.create({ ... });
} catch (error) {
    if (error instanceof InternalServerError) {
        console.error(`Server error. Request ID: ${error.request_id}`);
    }
}

500 errors are safe to retry. Build a circuit breaker if you see sustained 500s — don’t hammer the API when OpenAI is having an incident.
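A minimal circuit-breaker sketch (the threshold and cooldown values are arbitrary placeholders; tune them to your traffic): after N consecutive failures it stops sending requests for a cooldown period, then lets one probe request through.

```python
import time

class CircuitBreaker:
    """Stop calling after `threshold` consecutive failures; reopen after `cooldown` seconds."""

    def __init__(self, threshold: int = 5, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (requests allowed)

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            # Half-open: cooldown elapsed, let one probe request through
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()
```

Call `allow()` before each request and `record_success()` / `record_failure()` after; when it returns False, fail fast instead of hitting the API.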

Fix 6: SDK Migration Errors (v0→v1 Python, v3→v4 Node.js)

If you upgraded the SDK and your existing code broke, the APIs changed significantly between major versions.

Python: v0 → v1

# OLD (v0 — no longer works):
import openai
openai.api_key = "sk-..."
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[...]
)
text = response["choices"][0]["message"]["content"]

# NEW (v1):
from openai import OpenAI
client = OpenAI(api_key="sk-...")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...]
)
text = response.choices[0].message.content  # Object access, not dict

Error handling changed too:

# OLD (v0):
import openai
try:
    ...
except openai.error.RateLimitError:  # error submodule — broken in v1
    ...

# NEW (v1) — two equivalent styles:
import openai
try:
    ...
except openai.RateLimitError:
    ...

# Or with explicit import:
from openai import RateLimitError
try:
    ...
except RateLimitError:
    ...

Node.js: v3 → v4

// OLD (v3 — no longer works):
const { Configuration, OpenAIApi } = require("openai");
const openai = new OpenAIApi(new Configuration({ apiKey: "..." }));
const response = await openai.createChatCompletion({ ... });
const text = response.data.choices[0].message.content; // .data

// NEW (v4):
import OpenAI from "openai";
const client = new OpenAI({ apiKey: "..." });
const response = await client.chat.completions.create({ ... });
const text = response.choices[0].message.content; // No .data wrapper

The .data wrapper is the most common v3→v4 breakage — removing it fixes Cannot read properties of undefined (reading 'choices').

Fix 7: Streaming Errors

Errors during streaming can occur before the stream starts (same as normal errors) or mid-stream. Always wrap streaming code in a try-catch:

from openai import OpenAI, APIConnectionError, APITimeoutError, RateLimitError

client = OpenAI()

try:
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Write a long essay."}],
        stream=True
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
except RateLimitError:
    print("\nRate limited — retry later")
except APIConnectionError:
    print("\nConnection dropped mid-stream")
except APITimeoutError:
    print("\nRequest timed out — increase timeout for long completions")

In Node.js:

import OpenAI, { APIConnectionError, RateLimitError } from "openai";

const client = new OpenAI();

try {
    const stream = await client.chat.completions.create({
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: "Write a long essay." }],
        stream: true
    });

    for await (const chunk of stream) {
        process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
    }
} catch (error) {
    if (error instanceof RateLimitError) {
        console.error("\nRate limited");
    } else if (error instanceof APIConnectionError) {
        console.error("\nConnection error");
    }
}

Note: If a connection drops after the stream starts, the error message may be generic. The request_id from the error object is the most reliable way to identify which request failed when debugging with OpenAI support.
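One practical defense is to accumulate the partial text as it streams, so a mid-stream drop doesn't lose everything already received. A sketch (the helper is ours; the chunk shape matches the SDK's streamed chat-completion chunks):

```python
def collect_stream(chunks):
    """Consume a chat-completion stream, returning (text_so_far, completed_flag).

    If the connection drops mid-stream, you keep everything received so far
    instead of losing the whole response.
    """
    parts = []
    try:
        for chunk in chunks:
            if chunk.choices and chunk.choices[0].delta.content:
                parts.append(chunk.choices[0].delta.content)
    except Exception:
        return "".join(parts), False  # partial output, stream died
    return "".join(parts), True

# Usage:
#   stream = client.chat.completions.create(model=..., messages=..., stream=True)
#   text, ok = collect_stream(stream)
```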

Still Not Working?

The SDK retries twice and still fails

The SDK’s built-in retry only covers 408, 429 (rate_limit_exceeded only), and 5xx errors. It does not retry 400 errors (context, model not found) or 401 errors (authentication). If you’re seeing consistent failures, check which error class you’re actually getting:

import openai

try:
    client.chat.completions.create(...)
except openai.APIError as e:
    print(type(e).__name__)  # Print the actual class name
    print(getattr(e, "status_code", None))  # connection errors have no status code
    print(getattr(e, "body", None))

Environment variable not loading

In Python, python-dotenv doesn’t auto-load your .env file — you have to call load_dotenv(). If your env vars aren’t loading at all, see dotenv not loading for the full list of causes.

from dotenv import load_dotenv
load_dotenv()  # Must call this before OpenAI()

from openai import OpenAI
client = OpenAI()  # Now picks up OPENAI_API_KEY from .env

In Next.js, .env.local variables with OPENAI_API_KEY are server-only by default — they won’t be available in browser-side code. Use them only in API routes, Server Actions, or getServerSideProps. See Next.js env variables not working for the full breakdown.

429 on first request (free tier)

Free tier accounts have very low limits (typically 3 RPM). If you hit 429 on your first few calls after signing up, you’re hitting the free tier cap. Add a payment method at platform.openai.com/settings/billing to move to Tier 1, which has significantly higher limits.

Model access denied (400 or 404)

Some models require a minimum spend tier. GPT-4o and GPT-4o-mini are accessible to all paid accounts. GPT-4 Turbo and newer models may require Tier 1 or higher. If you get model not found for a model that definitely exists, your account tier may be too low — check platform.openai.com/settings/organization/limits.

Async code not running in Python

If you’re using the async client (AsyncOpenAI) and the call never completes or throws RuntimeError: Event loop is closed, you’re mixing async and sync incorrectly. See Python asyncio not running for common async/await pitfalls:

from openai import AsyncOpenAI
import asyncio

client = AsyncOpenAI()

async def main():
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())

Requests work locally but fail in production

Check for these environment-specific issues:

  1. Missing env var — production environment (Vercel, Railway, Fly.io) doesn’t have OPENAI_API_KEY set
  2. Timeout too short — serverless functions have execution time limits; a 30-second max function timeout will cut off long completions
  3. Cold start latency — first request on a cold serverless function takes extra time; add this to your timeout calculation
  4. Egress blocked — some cloud environments block outbound traffic by default; explicitly allow api.openai.com:443
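A preflight check run at deploy time covers the first and last items (a sketch; the host, port, and timeout are assumptions, and the TCP probe verifies egress only, not authentication):

```python
import os
import socket

def preflight(check_network: bool = True) -> list:
    """Return a list of environment problems (empty list means the basics look OK)."""
    problems = []
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        problems.append("OPENAI_API_KEY is not set in this environment")
    elif not key.startswith("sk-"):
        problems.append("OPENAI_API_KEY does not look like an OpenAI key (no sk- prefix)")
    if check_network:
        try:
            # Verifies outbound TCP to the API endpoint is allowed
            socket.create_connection(("api.openai.com", 443), timeout=3.0).close()
        except OSError as exc:
            problems.append(f"cannot reach api.openai.com:443 ({exc})")
    return problems
```

Running this as a startup hook turns a vague production failure into a specific message in your logs.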

For Vercel deployments, also check Vercel deployment failed for environment variable configuration in the project settings.


FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
