Fix: OpenAI API Not Working — RateLimitError, 401, 429, and Connection Issues
Quick Answer
How to fix OpenAI API errors — RateLimitError (429), AuthenticationError (401), APIConnectionError, context length exceeded, model not found, and SDK v0-to-v1 migration mistakes.
The Error
You call the OpenAI API and get a 429:

```
openai.RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details.', 'type': 'insufficient_quota', 'code': 'insufficient_quota'}}
```

Or a 401:

```
openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-proj-...', 'type': 'invalid_request_error', 'code': 'invalid_api_key'}}
```

Or a connection error:

```
openai.APIConnectionError: Connection error.
```

Or a 400 for context length:

```
openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, you requested 132000 tokens.", 'type': 'invalid_request_error', 'code': 'context_length_exceeded'}}
```

Or a broken import that made sense six months ago:

```
AttributeError: module 'openai' has no attribute 'ChatCompletion'
```

The last one means you upgraded the SDK without reading the migration guide. All of these are fixable.
Why This Happens
OpenAI’s API has several distinct failure modes that look similar on the surface but have completely different root causes and fixes. The error class, status code, and error type field together tell you exactly what went wrong and whether retrying will help.
The Python SDK (v1.x) and Node.js SDK (v4.x) both expose named error classes that make this handling straightforward once you know them.
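Every fix below follows one rule of thumb: the status code plus the error type decide whether a retry can ever succeed. As an illustrative sketch, here is that decision written as a pure function (`should_retry` is a hypothetical helper, not part of either SDK):

```python
# Illustrative helper, not part of the OpenAI SDKs.
# Maps (status code, error type) to "is retrying with backoff worthwhile?"
def should_retry(status_code, error_type=""):
    if status_code == 429:
        # Throttling is retryable; an exhausted quota is a billing problem
        return error_type != "insufficient_quota"
    if status_code in (408, 409):
        # Request timeout / conflict: transient
        return True
    if status_code >= 500:
        # Server-side errors are OpenAI's fault and safe to retry
        return True
    # 400 (bad request) and 401 (bad key) never fix themselves
    return False

print(should_retry(429, "rate_limit_exceeded"))  # True
print(should_retry(429, "insufficient_quota"))   # False
print(should_retry(401, "invalid_api_key"))      # False
```

Each of the fixes below expands on one branch of this table.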
Fix 1: RateLimitError (HTTP 429)
A 429 response can mean two completely different things. Check the type field before doing anything:
- `rate_limit_exceeded`: You’re sending too many requests or too many tokens per minute. Safe to retry with backoff.
- `insufficient_quota`: You’ve run out of credits or your payment failed. Retrying does nothing — fix billing first.
```python
from openai import OpenAI, RateLimitError

client = OpenAI()

try:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError as e:
    if e.code == "insufficient_quota":
        # Billing issue — check platform.openai.com/settings/billing
        raise RuntimeError("OpenAI quota exhausted — check your billing") from e
    # Actual rate limit — retry with backoff
    raise
```

```javascript
import OpenAI, { RateLimitError } from "openai";

const client = new OpenAI();

try {
  const response = await client.chat.completions.create({ ... });
} catch (error) {
  if (error instanceof RateLimitError) {
    const type = error.error?.type;
    if (type === "insufficient_quota") {
      throw new Error("OpenAI quota exhausted — check billing");
    }
    // rate_limit_exceeded: retry with backoff
  }
}
```

Common Mistake: When you get a 429, the instinct is to immediately retry. But if the error type is insufficient_quota, retrying is pointless — your account is out of credits and no amount of backoff will change that. Always check the error type before writing retry logic. Only rate_limit_exceeded is worth retrying.
For rate limits, use exponential backoff with jitter. Both SDKs retry automatically by default (2 retries on 429), but if you need more control:
```python
import time
import random

from openai import RateLimitError

def call_with_backoff(client, **kwargs):
    max_retries = 5
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError as e:
            if "insufficient_quota" in str(e):
                raise
            if attempt == max_retries - 1:
                raise
            # Exponential backoff + jitter
            delay = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Retrying in {delay:.1f}s (attempt {attempt + 1}/{max_retries})")
            time.sleep(delay)
```

For production Python workloads, the tenacity library handles this cleanly:
```python
from openai import OpenAI, RateLimitError
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential

client = OpenAI()

@retry(
    retry=retry_if_exception_type(RateLimitError),  # only retry rate limits
    wait=wait_random_exponential(multiplier=1, min=4, max=60),
    stop=stop_after_attempt(6),
    reraise=True
)
def call_api(**kwargs):
    return client.chat.completions.create(**kwargs)
```

OpenAI’s rate limits are per model and cover RPM (requests per minute), TPM (tokens per minute), and RPD (requests per day). If you’re hitting limits regularly, check your tier at platform.openai.com/settings/organization/limits — quotas increase automatically as you spend more.
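If you are close to a limit, the API reports it in `x-ratelimit-*` response headers. A sketch of reading them, assuming the v1 Python SDK's `with_raw_response` accessor; `summarize_ratelimit` is a hypothetical helper:

```python
# Hypothetical helper: condense OpenAI's x-ratelimit-* response headers.
def summarize_ratelimit(headers):
    keys = (
        "x-ratelimit-limit-requests",
        "x-ratelimit-remaining-requests",
        "x-ratelimit-limit-tokens",
        "x-ratelimit-remaining-tokens",
        "x-ratelimit-reset-requests",
    )
    return {k: headers.get(k) for k in keys if headers.get(k) is not None}

# Usage against the live API (requires OPENAI_API_KEY):
# from openai import OpenAI
# client = OpenAI()
# raw = client.chat.completions.with_raw_response.create(
#     model="gpt-4o-mini",
#     messages=[{"role": "user", "content": "Hello"}],
# )
# print(summarize_ratelimit(raw.headers))  # remaining requests/tokens this minute
# response = raw.parse()                   # the usual ChatCompletion object
```

Logging this summary alongside 429s tells you whether you are bumping the request limit or the token limit.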
Fix 2: AuthenticationError (HTTP 401)
A 401 almost always comes down to one of three things: wrong key, wrong format, or wrong organization/project header.
Verify your key is set correctly:
```bash
# Check the variable is actually set
echo $OPENAI_API_KEY

# Common mistake: trailing whitespace or newline in .env
cat -A .env | grep OPENAI_API_KEY
```

In Python:
```python
import os

from openai import OpenAI

# Don't hardcode — use environment variable
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
```

In Node.js:
```javascript
import OpenAI from "openai";

// SDK reads OPENAI_API_KEY automatically if you don't pass apiKey
const client = new OpenAI();

// Or pass it explicitly:
// const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
```

Common key format mistakes:
| Problem | Example | Fix |
|---|---|---|
| Missing prefix | `abc123...` | Must start with `sk-` |
| Extra whitespace | `sk-proj-...` | Trim whitespace |
| Quotes included | `"sk-proj-..."` | Remove quotes from .env |
| Wrong env var name | `OPENAI_KEY` | Must be `OPENAI_API_KEY` |
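The checks in this table are easy to automate at startup. A minimal sketch (`check_api_key` is an illustrative helper, not part of the SDK):

```python
# Illustrative startup check for the key-format mistakes in the table above.
def check_api_key(raw):
    """Return a list of problems with an OPENAI_API_KEY value."""
    if raw is None:
        return ["OPENAI_API_KEY is not set"]
    problems = []
    if raw != raw.strip():
        problems.append("key has surrounding whitespace (check your .env)")
    key = raw.strip()
    if key.startswith(('"', "'")) or key.endswith(('"', "'")):
        problems.append("key is wrapped in quotes (remove them from .env)")
    elif not key.startswith("sk-"):
        problems.append("key does not start with sk-")
    return problems
```

Call it with `check_api_key(os.environ.get("OPENAI_API_KEY"))` before constructing the client, and fail fast with the returned messages instead of waiting for a 401.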
Organization and project header mismatch:

If you’re a member of multiple organizations or projects, passing the wrong organization or project parameter causes a 401 even with a valid key:

```python
# Wrong: org ID from a different account
client = OpenAI(
    api_key="sk-proj-...",
    organization="org-WRONGID"  # Must match the key's org
)
```

Fix: If you only belong to one organization, omit the organization parameter entirely. If you have multiple, find the correct ID at platform.openai.com/account/org-home/organization-settings.
Project-scoped keys vs. legacy keys:
New API keys created in the OpenAI dashboard are project-scoped (sk-proj-...). These don’t need organization or project headers — the key already carries that context. If you’re using an older sk-... key, it’s a legacy user key; consider rotating to a project-scoped key to avoid header complexity.
Fix 3: APIConnectionError and APITimeoutError
Connection errors happen before you get a response. Timeout errors happen when the API takes longer than your configured limit.
Configure an appropriate timeout:

```python
import httpx
from openai import OpenAI

client = OpenAI(
    timeout=httpx.Timeout(
        60.0,          # Total timeout
        connect=5.0,   # Connection establishment
        read=55.0,     # Reading the response body
        write=5.0      # Sending the request
    ),
    max_retries=3
)
```

```javascript
const client = new OpenAI({
  timeout: 60 * 1000, // 60 seconds in ms
  maxRetries: 3
});
```

The default timeout is 10 minutes for the full request in both the Python and Node.js SDKs. That’s fine for most requests but too long for latency-sensitive applications.
For streaming, long completions need more time. If you’re generating thousands of tokens, a 30-second timeout will cut you off mid-stream. Set the timeout based on worst-case output length:

```python
# A 2000-token response at ~60 tokens/second takes ~33 seconds
client = OpenAI(timeout=90.0)  # 90 second total
```

Connection errors from proxies or firewalls:
If APIConnectionError is consistent (not intermittent), check:

- Your environment uses a proxy that intercepts SSL — set `REQUESTS_CA_BUNDLE` to your corporate CA bundle
- A firewall blocks `api.openai.com` on port 443
- You’re running inside a container or restricted network

```python
import httpx
from openai import OpenAI

# If you need a proxy:
client = OpenAI(
    http_client=httpx.Client(proxy="http://your-proxy:8080")
)
```

Fix 4: BadRequestError (HTTP 400)
Context length exceeded:

The error message tells you exactly what happened:

```
This model's maximum context length is 128000 tokens. However, you requested 132000 tokens
(2000 in the messages, 130000 in the completion).
```

The total token count is input tokens + max_completion_tokens. Reduce one or both:
```python
import tiktoken

def count_tokens(messages, model="gpt-4o-mini"):
    enc = tiktoken.encoding_for_model(model)
    return sum(len(enc.encode(m["content"])) for m in messages)

messages = [...]
model = "gpt-4o-mini"  # context window: 128k tokens
max_context = 128000
max_completion = 4096

input_tokens = count_tokens(messages, model)
if input_tokens + max_completion > max_context:
    # Truncate oldest messages (keep system prompt)
    while count_tokens(messages, model) + max_completion > max_context:
        messages.pop(1)  # Remove oldest non-system message
```

Model not found:
```
The model 'gpt-4' does not exist or you do not have access to it.
```

OpenAI deprecates model versions and requires you to use specific snapshot names or the latest aliases. Check platform.openai.com/docs/models for current model names. Common mistakes:
| What you wrote | What to use instead |
|---|---|
| `gpt-4` | `gpt-4o` or `gpt-4o-mini` |
| `gpt-3.5-turbo-16k` | `gpt-4o-mini` (faster, cheaper) |
| `gpt-4-32k` | `gpt-4o` |
Also verify your account has access. GPT-4 class models require at least Tier 1 (you’ve made a successful payment).
Fix 5: InternalServerError (HTTP 500+)
500 errors are OpenAI’s fault, not yours. The SDK retries them automatically up to 2 times. If they persist:
- Check status.openai.com for ongoing incidents
- Log the `request_id` from the error — if you need to contact support, this is the ID they need:
```python
from openai import OpenAI, InternalServerError

client = OpenAI()

try:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}]
    )
except InternalServerError as e:
    print(f"Server error. Request ID: {e.request_id}")
    print(f"Status: {e.status_code}")
    # Log and retry or fail gracefully
```

```javascript
import OpenAI, { InternalServerError } from "openai";

const client = new OpenAI();

try {
  const response = await client.chat.completions.create({ ... });
} catch (error) {
  if (error instanceof InternalServerError) {
    console.error(`Server error. Request ID: ${error.request_id}`);
  }
}
```

500 errors are safe to retry. Build a circuit breaker if you see sustained 500s — don’t hammer the API when OpenAI is having an incident.
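A circuit breaker can be sketched in a few lines. This `CircuitBreaker` class is a hypothetical minimal version (production code often reaches for a library such as pybreaker):

```python
import time

# Minimal circuit-breaker sketch: stop calling after repeated failures,
# then allow a single trial attempt once a cooldown has passed.
class CircuitBreaker:
    def __init__(self, max_failures=5, cooldown=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock          # injectable for testing
        self.failures = 0
        self.opened_at = None

    def allow(self):
        """Is a call permitted right now?"""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            # Half-open: let one attempt through after the cooldown
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()
```

Usage: gate each API call with `breaker.allow()`, call `record_success()` after a good response, and `record_failure()` when you catch `InternalServerError`.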
Fix 6: SDK Migration Errors (v0→v1 Python, v3→v4 Node.js)
If you upgraded the SDK and your existing code broke, the APIs changed significantly between major versions.
Python: v0 → v1
```python
# OLD (v0 — no longer works):
import openai

openai.api_key = "sk-..."
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[...]
)
text = response["choices"][0]["message"]["content"]
```

```python
# NEW (v1):
from openai import OpenAI

client = OpenAI(api_key="sk-...")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...]
)
text = response.choices[0].message.content  # Object access, not dict
```

Error handling changed too:
```python
# OLD (v0):
import openai

try:
    ...
except openai.error.RateLimitError:  # error submodule — broken in v1
    ...
```

```python
# NEW (v1) — two equivalent styles:
import openai

try:
    ...
except openai.RateLimitError:
    ...

# Or with explicit import:
from openai import RateLimitError

try:
    ...
except RateLimitError:
    ...
```

Node.js: v3 → v4
```javascript
// OLD (v3 — no longer works):
const { Configuration, OpenAIApi } = require("openai");

const openai = new OpenAIApi(new Configuration({ apiKey: "..." }));
const response = await openai.createChatCompletion({ ... });
const text = response.data.choices[0].message.content; // .data
```

```javascript
// NEW (v4):
import OpenAI from "openai";

const client = new OpenAI({ apiKey: "..." });
const response = await client.chat.completions.create({ ... });
const text = response.choices[0].message.content; // No .data wrapper
```

The `.data` wrapper is the most common v3→v4 breakage — removing it fixes `Cannot read properties of undefined (reading 'choices')`.
Fix 7: Streaming Errors
Errors during streaming can occur before the stream starts (same as normal errors) or mid-stream. Always wrap streaming code in a try-catch:
```python
from openai import OpenAI, APIConnectionError, APITimeoutError, RateLimitError

client = OpenAI()

try:
    # stream=True returns an iterator of chunks
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Write a long essay."}],
        stream=True
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
except RateLimitError:
    print("\nRate limited — retry later")
except APIConnectionError:
    print("\nConnection dropped mid-stream")
except APITimeoutError:
    print("\nRequest timed out — increase timeout for long completions")
```

```javascript
import OpenAI, { APIConnectionError, RateLimitError } from "openai";

const client = new OpenAI();

try {
  const stream = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Write a long essay." }],
    stream: true
  });
  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  }
} catch (error) {
  if (error instanceof RateLimitError) {
    console.error("\nRate limited");
  } else if (error instanceof APIConnectionError) {
    console.error("\nConnection error");
  }
}
```

Note: If a connection drops after the stream starts, the error message may be generic. The `request_id` from the error object is the most reliable way to identify which request failed when debugging with OpenAI support.
Still Not Working?
The SDK retries twice and still fails
The SDK’s built-in retry only covers 408, 429 (rate_limit_exceeded only), and 5xx errors. It does not retry 400 errors (context, model not found) or 401 errors (authentication). If you’re seeing consistent failures, check which error class you’re actually getting:
```python
import openai

try:
    client.chat.completions.create(...)
except openai.APIError as e:
    print(type(e).__name__)  # Print the actual class name
    print(e.status_code)
    print(e.body)
```

Environment variable not loading
In Python, python-dotenv doesn’t auto-load your .env file — you have to call load_dotenv(). If your env vars aren’t loading at all, see dotenv not loading for the full list of causes.
```python
from dotenv import load_dotenv

load_dotenv()  # Must call this before OpenAI()

from openai import OpenAI

client = OpenAI()  # Now picks up OPENAI_API_KEY from .env
```

In Next.js, `.env.local` variables with OPENAI_API_KEY are server-only by default — they won’t be available in browser-side code. Use them only in API routes, Server Actions, or getServerSideProps. See Next.js env variables not working for the full breakdown.
429 on first request (free tier)
Free tier accounts have very low limits (typically 3 RPM). If you hit 429 on your first few calls after signing up, you’re hitting the free tier cap. Add a payment method at platform.openai.com/settings/billing to move to Tier 1, which has significantly higher limits.
Model access denied (400 or 404)
Some models require a minimum spend tier. GPT-4o and GPT-4o-mini are accessible to all paid accounts. GPT-4 Turbo and newer models may require Tier 1 or higher. If you get model not found for a model that definitely exists, your account tier may be too low — check platform.openai.com/settings/organization/limits.
Async code not running in Python
If you’re using the async client (AsyncOpenAI) and the call never completes or throws RuntimeError: Event loop is closed, you’re mixing async and sync incorrectly. See Python asyncio not running for common async/await pitfalls:
```python
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI()

async def main():
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```

Requests work locally but fail in production
Check for these environment-specific issues:
- Missing env var — the production environment (Vercel, Railway, Fly.io) doesn’t have `OPENAI_API_KEY` set
- Timeout too short — serverless functions have execution time limits; a 30-second max function timeout will cut off long completions
- Cold start latency — the first request on a cold serverless function takes extra time; add this to your timeout calculation
- Egress blocked — some cloud environments block outbound traffic by default; explicitly allow `api.openai.com:443`
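The checklist above can be folded into a preflight function that runs once at startup. This is an illustrative sketch (`preflight` and its messages are made up for this article; the optional network probe assumes outbound access to api.openai.com:443 can be tested from your environment):

```python
import os
import socket

# Illustrative startup preflight for the production checklist above.
def preflight(env=None, check_network=False):
    """Return a list of deployment problems found; empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set in this environment")
    if check_network:
        try:
            # Egress probe: can we open a TCP connection to the API host?
            socket.create_connection(("api.openai.com", 443), timeout=5).close()
        except OSError:
            problems.append("cannot reach api.openai.com:443 (egress blocked?)")
    return problems
```

Run it in your app's entrypoint and log or raise on a non-empty result, so a misconfigured deploy fails loudly at startup instead of on the first user request.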
For Vercel deployments, also check Vercel deployment failed for environment variable configuration in the project settings.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.