Fix: Instructor Not Working — Validation Loops, Mode Mismatch, Streaming, and Anthropic / Gemini Issues
Quick Answer
How to fix Python Instructor errors — ValidationError loops, max_retries exhausted, mode=Mode.TOOLS vs JSON, partial streaming type errors, Anthropic and Gemini client patching, token usage tracking.
The Error
You call an Instructor-patched OpenAI client and the request keeps retrying until it dies:
    import instructor
    from openai import OpenAI
    from pydantic import BaseModel

    class User(BaseModel):
        name: str
        age: int

    client = instructor.from_openai(OpenAI())
    user = client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=User,
        messages=[{"role": "user", "content": "Tell me about Alice"}],
        max_retries=3,
    )
    # instructor.exceptions.InstructorRetryException:
    # 1 validation error for User
    # age
    #   Input should be a valid integer [type=int_type, input_value='unknown']

Or you switch to Anthropic and it complains about the mode:
    ValueError: Mode TOOLS is not supported by Anthropic. Use Mode.ANTHROPIC_TOOLS or Mode.ANTHROPIC_JSON.

Or your partial streaming returns objects with None everywhere:
    for partial in client.chat.completions.create_partial(...):
        print(partial)
    # User(name=None, age=None)
    # User(name=None, age=None)

Or you upgraded openai and now nothing patches:
    AttributeError: 'OpenAI' object has no attribute 'chat'

Why This Happens
Instructor wraps an LLM client and re-prompts the model when its output fails Pydantic validation. The wrapping happens in three layers, and most failures map to one of them:
- Mode mismatch: Instructor uses different prompting strategies depending on what the underlying API supports. OpenAI defaults to Mode.TOOLS (function calling). Anthropic needs Mode.ANTHROPIC_TOOLS. Gemini needs Mode.GEMINI_JSON. Passing the wrong mode for the provider produces silent garbage or hard errors.
- Validation loop exhaustion: If your response_model is too strict (a required field the model can’t infer) or your prompt doesn’t ask for the data, every retry fails and you hit max_retries. The library does the right thing here: it surfaces the last validation error so you can fix the prompt or the model.
- Client version skew: instructor.from_openai() and instructor.from_anthropic() expect specific client API shapes. Mixing an old openai<1.0 client with new Instructor breaks the patching.
The streaming None issue is by design but trips everyone the first time: partials yield as fields arrive, so early partials genuinely don’t have most fields set yet.
A subtler source of pain is schema drift. Pydantic models get tightened over time (a field becomes required, an enum gets a new value, a string gets a regex pattern). Each tightening is a chance for the LLM to fall behind. The model that handled the schema fine a month ago now fails validation 5% of the time, and the failure mode is InstructorRetryException after burning three retries’ worth of tokens. The thing degrading isn’t Instructor or the model — it’s the gap between your schema and what the model can reliably produce, and that gap widens silently as schemas evolve.
Production Incident Lens: When Structured Output Validation Collapses
The Instructor production incident usually looks like this: an LLM-powered endpoint that has been stable for weeks suddenly starts returning 500s at 2% then 5% then 20% of requests. The traceback is always InstructorRetryException: max_retries exceeded with a Pydantic validation error attached. The blast radius is every endpoint that depends on this structured-output call — typically the highest-margin AI features (intake classification, extract-from-PDF, agent tool selection).
Diagnose by partitioning. Which model? Which response_model? Which prompt template? Hit the three axes one at a time:
- Model-level regression. A provider silently updated their snapshot (you pinned claude-3-5-sonnet instead of claude-3-5-sonnet-20241022, the alias rolled forward, behavior shifted). Re-pin to a dated snapshot and the error rate drops back. Always pin dated snapshots in production; alias-only pins are an outage waiting to happen.
- Schema-level regression. Someone added a tighter validator last week (a regex on email, a min_length on summary). Diff the response_model against last month’s version. Revert the validator or relax the constraint, and the failure rate drops the same hour.
- Input distribution shift. Real user inputs got harder than your test fixtures. A new customer segment posts longer documents, or a marketing campaign drove non-English traffic into an English-only prompt. Log the inputs that triggered validation failures and read them; the pattern is usually obvious within five examples.
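The schema-level check can be sketched as a diff over two JSON Schema dictionaries, such as the ones Pydantic's model_json_schema() produces. The function name diff_schemas and the two sample schemas below are hypothetical, and the comparison only looks at top-level properties, so treat it as a starting point and not a full schema differ.

```python
from typing import Any


def diff_schemas(old: dict[str, Any], new: dict[str, Any]) -> dict[str, list[str]]:
    """Report top-level differences between two JSON Schema dicts."""
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    old_required = set(old.get("required", []))
    new_required = set(new.get("required", []))

    # A field counts as "changed" if its property definition differs at all
    # (new pattern, new minLength, new enum value, and so on).
    changed = [
        name
        for name in sorted(old_props.keys() & new_props.keys())
        if old_props[name] != new_props[name]
    ]
    return {
        "newly_required": sorted(new_required - old_required),
        "added_fields": sorted(new_props.keys() - old_props.keys()),
        "changed_constraints": changed,
    }


# Hypothetical schemas standing in for last month's and today's response_model.
last_month = {
    "properties": {"email": {"type": "string"}, "summary": {"type": "string"}},
    "required": ["email"],
}
today = {
    "properties": {
        "email": {"type": "string", "pattern": r"^\S+@\S+$"},
        "summary": {"type": "string", "minLength": 20},
    },
    "required": ["email", "summary"],
}

report = diff_schemas(last_month, today)
print(report)
```

Every entry in newly_required or changed_constraints is a candidate for the validator that started failing.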
The right monitoring stack for Instructor-backed endpoints is per-response_model retry rate, per-response_model exhaustion rate, and input-token p95. The retry rate is the leading indicator: when retries climb from 0.5 average to 1.2 average per call, you’re heading toward exhaustion. Alert at 1.0 average retries, not just at exhaustion — by the time exhaustion fires, your token bill is already 3x normal.
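A minimal in-process sketch of those counters, assuming you can observe the attempt count and outcome of each call (for example from e.n_attempts on failure). The RetryStats class and its method names are hypothetical; a real deployment would push these numbers to a metrics backend.

```python
from collections import defaultdict


class RetryStats:
    """Tracks retry and exhaustion rates per response_model name."""

    def __init__(self, alert_threshold: float = 1.0) -> None:
        self.alert_threshold = alert_threshold
        self.calls: dict[str, int] = defaultdict(int)
        self.retries: dict[str, int] = defaultdict(int)
        self.exhausted: dict[str, int] = defaultdict(int)

    def record(self, model_name: str, attempts: int, succeeded: bool) -> None:
        self.calls[model_name] += 1
        # The first attempt is not a retry.
        self.retries[model_name] += max(attempts - 1, 0)
        if not succeeded:
            self.exhausted[model_name] += 1

    def avg_retries(self, model_name: str) -> float:
        calls = self.calls.get(model_name, 0)
        return self.retries.get(model_name, 0) / calls if calls else 0.0

    def exhaustion_rate(self, model_name: str) -> float:
        calls = self.calls.get(model_name, 0)
        return self.exhausted.get(model_name, 0) / calls if calls else 0.0

    def should_alert(self, model_name: str) -> bool:
        # Alert on the leading indicator, not only on exhaustion.
        return self.avg_retries(model_name) >= self.alert_threshold


stats = RetryStats(alert_threshold=1.0)
stats.record("User", attempts=1, succeeded=True)
stats.record("User", attempts=3, succeeded=True)
stats.record("User", attempts=3, succeeded=False)
print(stats.avg_retries("User"), stats.exhaustion_rate("User"), stats.should_alert("User"))
```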
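Wire Instructor’s hooks to your metrics backend so each validation error surfaces with the response_model name and the offending field: client.on("parse:error", ...) fires when a response fails to parse into the model, and client.on("completion:error", ...) fires when the API call itself errors. Hooks are available in recent 1.x releases, so check the docs for the version you run. Aggregating those errors by field name shows you exactly which schema constraint is failing, which is the single most useful piece of information during the incident.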
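The aggregation itself needs nothing beyond Pydantic's ValidationError.errors(). The sketch below feeds hand-written bad payloads through a model to show the counting; record_validation_error is a hypothetical helper, and how it gets called (from a hook handler or an except block) is left out because that wiring depends on your Instructor version.

```python
from collections import Counter

from pydantic import BaseModel, ValidationError


class User(BaseModel):
    name: str
    age: int


field_failures = Counter()


def record_validation_error(model_name: str, error: ValidationError) -> None:
    """Count one failure per offending field, keyed as 'Model.field'."""
    for item in error.errors():
        field_path = ".".join(str(part) for part in item["loc"])
        field_failures[f"{model_name}.{field_path}"] += 1


# Simulated bad model outputs; in production these come from failed calls.
bad_payloads = [
    {"name": "Alice", "age": "unknown"},
    {"name": "Bob", "age": "n/a"},
    {"age": 30},
]
for payload in bad_payloads:
    try:
        User.model_validate(payload)
    except ValidationError as exc:
        record_validation_error("User", exc)

print(field_failures.most_common())
```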
Fix 1: Pick the Right Mode for Your Provider
Each provider has supported modes. Use the helper instead of guessing:
    import instructor
    from openai import OpenAI
    from anthropic import Anthropic
    import google.generativeai as genai

    # OpenAI (default mode is TOOLS, usually correct)
    openai_client = instructor.from_openai(OpenAI())

    # Anthropic: must specify a supported mode
    anthropic_client = instructor.from_anthropic(
        Anthropic(),
        mode=instructor.Mode.ANTHROPIC_TOOLS,
    )

    # Gemini
    genai_client = instructor.from_gemini(
        client=genai.GenerativeModel("gemini-1.5-flash"),
        mode=instructor.Mode.GEMINI_JSON,
    )

The supported modes per provider, as of Instructor 1.x:
- OpenAI: TOOLS (default), TOOLS_STRICT, JSON, MD_JSON, PARALLEL_TOOLS
- Anthropic: ANTHROPIC_TOOLS, ANTHROPIC_JSON
- Gemini: GEMINI_JSON, GEMINI_TOOLS
- Cohere / Mistral / Groq / Ollama: each has its own; check the docs
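If the provider and mode come from configuration, a small guard can fail fast at startup instead of at the first request. This is a sketch that works on mode names as plain strings; SUPPORTED_MODES simply mirrors the list above and check_mode is a hypothetical helper, not part of Instructor.

```python
# Mirrors the list in this article; extend it from the Instructor docs as needed.
SUPPORTED_MODES = {
    "openai": ["TOOLS", "TOOLS_STRICT", "JSON", "MD_JSON", "PARALLEL_TOOLS"],
    "anthropic": ["ANTHROPIC_TOOLS", "ANTHROPIC_JSON"],
    "gemini": ["GEMINI_JSON", "GEMINI_TOOLS"],
}


def check_mode(provider: str, mode_name: str) -> str:
    """Return mode_name if it is listed for the provider, else raise ValueError."""
    supported = SUPPORTED_MODES.get(provider)
    if supported is None:
        raise ValueError(f"Unknown provider {provider!r}; check the Instructor docs.")
    if mode_name not in supported:
        raise ValueError(
            f"Mode {mode_name} is not listed for {provider}. "
            f"Try one of: {', '.join(supported)}"
        )
    return mode_name


print(check_mode("anthropic", "ANTHROPIC_TOOLS"))
try:
    check_mode("anthropic", "TOOLS")
except ValueError as exc:
    print(exc)
```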
Pro Tip: Use TOOLS_STRICT on OpenAI when you need structured output to exactly match the schema. It enables OpenAI’s strict schema mode and eliminates a whole class of validation retries — at the cost of slightly higher latency.
Fix 2: Inspect the Last Validation Error, Don’t Just Raise max_retries
InstructorRetryException contains the last attempt’s exception. Use it to see what the model actually returned:
    # Assumes `client` and `User` from the first example.
    try:
        user = client.chat.completions.create(
            model="gpt-4o-mini",
            response_model=User,
            messages=[...],
            max_retries=3,
        )
    except instructor.exceptions.InstructorRetryException as e:
        print("Attempts:", e.n_attempts)
        print("Last completion:", e.last_completion)
        print("Validation errors:", e.messages[-1])

Most of the time you’ll see the model returning a string (“unknown”, “not specified”) for a required int field, or refusing the question entirely. The fix is usually one of:
- Make the field Optional[int]
- Add a Field(description=...) hint so the model knows what you want
- Make the prompt explicit (“Return age as a number, or null if unknown”)
    from typing import Optional
    from pydantic import BaseModel, Field

    class User(BaseModel):
        name: str = Field(description="Full name of the person")
        age: Optional[int] = Field(default=None, description="Age in years, null if unknown")

Fix 3: Validate Computed Fields with @field_validator
Instructor re-prompts on Pydantic validation errors, so your validators become part of the LLM correction loop. This is the killer feature — use it:
    from pydantic import BaseModel, field_validator

    class Answer(BaseModel):
        answer: str
        citations: list[str]

        @field_validator("citations")
        @classmethod
        def must_have_citations(cls, v):
            if len(v) < 1:
                raise ValueError("Answer must include at least one citation URL.")
            return v

If the model returns an answer with no citations, Instructor automatically asks it again with the validator’s error message. After a few iterations you usually get a valid response without writing any orchestration code.
Common Mistake: Using raise ValueError("invalid") with no useful message. The error text is what the LLM sees on retry — write it as if you’re telling a junior dev what’s wrong. “Must include at least one URL starting with https://” beats “invalid.”
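You can preview the feedback a validator produces without calling any model by running hand-written payloads through Pydantic directly. The sketch below is a variant of the Answer model with a more specific message; retry_feedback is a hypothetical helper, and the exact wording Instructor wraps around these messages on retry is not shown here.

```python
from pydantic import BaseModel, ValidationError, field_validator


class Answer(BaseModel):
    answer: str
    citations: list[str]

    @field_validator("citations")
    @classmethod
    def must_cite_https_urls(cls, v: list[str]) -> list[str]:
        if not v:
            raise ValueError("Must include at least one citation URL starting with https://")
        bad = [url for url in v if not url.startswith("https://")]
        if bad:
            raise ValueError(
                f"These citations are not https:// URLs: {bad}. "
                "Replace each one with a full https:// URL."
            )
        return v


def retry_feedback(payload: dict) -> str:
    """Return the validator messages for a payload, or '' if it is valid."""
    try:
        Answer.model_validate(payload)
    except ValidationError as exc:
        return "; ".join(err["msg"] for err in exc.errors())
    return ""


feedback = retry_feedback({"answer": "42", "citations": ["see wikipedia"]})
print(feedback)
```

If the printed text would not tell a person what to change, it is unlikely to steer the model either.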
Fix 4: Streaming Partials and Iterables
Partial streaming yields a model where each field arrives as the tokens stream in. Early yields have None for fields the model hasn’t produced yet — that’s expected, not a bug:
    # Assumes `client` and `User` from the first example.
    # create_partial takes the plain model; no Partial import is needed here.
    stream = client.chat.completions.create_partial(
        model="gpt-4o-mini",
        response_model=User,
        messages=[...],
    )
    for partial in stream:
        # partial.name fills in first, partial.age later
        print(partial.model_dump())

For a list of items where you want each item complete before yielding, use create_iterable:
    from pydantic import BaseModel

    class City(BaseModel):
        name: str
        country: str

    cities = client.chat.completions.create_iterable(
        model="gpt-4o-mini",
        response_model=City,  # singular: Instructor handles the list shape
        messages=[{"role": "user", "content": "List 5 European capitals"}],
    )
    for city in cities:
        print(city.name, city.country)

Don’t pass list[City] as response_model for streaming. Pass the element type and use create_iterable.
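Consumers of a partial stream need to tolerate unset fields. The sketch below simulates a stream with a plain dataclass, PartialUser, as a hypothetical stand-in for the objects create_partial yields, and only emits an update the first time a field becomes non-None.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class PartialUser:
    """Stand-in for a partial object: every field may still be None."""
    name: Optional[str] = None
    age: Optional[int] = None


def fake_stream() -> Iterator[PartialUser]:
    # Simulates fields arriving one at a time as tokens stream in.
    yield PartialUser()
    yield PartialUser(name="Alice")
    yield PartialUser(name="Alice", age=30)


def render_updates(stream: Iterable[PartialUser]) -> list[str]:
    """Collect one update per field, the first time it is set."""
    seen: set[str] = set()
    updates: list[str] = []
    for partial in stream:
        for field_name in ("name", "age"):
            value = getattr(partial, field_name)
            if value is not None and field_name not in seen:
                seen.add(field_name)
                updates.append(f"{field_name}={value}")
    return updates


updates = render_updates(fake_stream())
print(updates)
```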
Fix 5: Async Clients
Use the async constructors and await the call:
    import asyncio
    import instructor
    from openai import AsyncOpenAI

    aclient = instructor.from_openai(AsyncOpenAI())

    async def main():
        user = await aclient.chat.completions.create(
            model="gpt-4o-mini",
            response_model=User,
            messages=[...],
        )
        print(user)

    asyncio.run(main())

For async iteration over partials or items:

    async for partial in aclient.chat.completions.create_partial(...):
        print(partial)

Note: Don’t mix sync and async clients. instructor.from_openai(OpenAI()) returns a sync wrapper; calling await client.chat.completions.create(...) on it raises TypeError: object ... can't be used in 'await' expression.
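The failure in that note is plain Python behavior and can be reproduced without any LLM client. In this sketch, sync_create and async_create are hypothetical stand-ins for the sync and async create calls.

```python
import asyncio


class FakeUser:
    """Stand-in for a validated response object."""


def sync_create() -> FakeUser:
    # Like a sync client: returns the object directly.
    return FakeUser()


async def async_create() -> FakeUser:
    # Like an async client: returns a coroutine that must be awaited.
    return FakeUser()


async def main() -> list[str]:
    outcomes = []
    try:
        await sync_create()  # wrong: a plain object is not awaitable
    except TypeError:
        outcomes.append("sync call: TypeError")
    result = await async_create()
    outcomes.append(f"async call: {type(result).__name__}")
    return outcomes


outcomes = asyncio.run(main())
print(outcomes)
```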
Fix 6: Track Token Usage Without Losing the Validated Object
When you call client.chat.completions.create(...), you get back the parsed Pydantic model — the raw response with usage info is gone. To get both, use create_with_completion:
    user, completion = client.chat.completions.create_with_completion(
        model="gpt-4o-mini",
        response_model=User,
        messages=[...],
    )
    print(user.name)
    print("Tokens used:", completion.usage.total_tokens)

This returns a tuple: your validated model plus the raw provider response (with usage, id, system_fingerprint, etc.).
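To total usage across many calls, the raw completion can be passed to a small accumulator. This sketch assumes OpenAI-style field names (usage.prompt_tokens, usage.completion_tokens); other providers name these differently, so adjust the attribute names. UsageTotals is hypothetical, and SimpleNamespace objects stand in for real completions.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class UsageTotals:
    calls: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def add(self, completion) -> None:
        """Add one raw completion's usage; tolerate a missing usage block."""
        self.calls += 1
        usage = getattr(completion, "usage", None)
        if usage is None:
            return
        self.prompt_tokens += getattr(usage, "prompt_tokens", 0) or 0
        self.completion_tokens += getattr(usage, "completion_tokens", 0) or 0

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


# Fake completions standing in for the second element of the returned tuple.
fake_completions = [
    SimpleNamespace(usage=SimpleNamespace(prompt_tokens=120, completion_tokens=30)),
    SimpleNamespace(usage=SimpleNamespace(prompt_tokens=200, completion_tokens=45)),
    SimpleNamespace(usage=None),
]

totals = UsageTotals()
for completion in fake_completions:
    totals.add(completion)
print(totals.calls, totals.total_tokens)
```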
Fix 7: OpenAI Client Version
Instructor 1.x requires openai>=1.0. If you’re seeing 'OpenAI' object has no attribute 'chat' or similar attribute errors after the from_openai patch, you’re probably on the old SDK:
    pip install -U "openai>=1.40" "instructor>=1.4"

Pin them together in pyproject.toml so a future openai major release doesn’t break your patching:

    [project]
    dependencies = [
        "instructor>=1.4,<2.0",
        "openai>=1.40,<2.0",
    ]

If you can’t upgrade the OpenAI SDK, pin to an older Instructor that supports openai<1.0 (versions before 1.0, but you really should upgrade).
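A startup check can report version skew before the first request fails. The sketch below uses importlib.metadata from the standard library; parse_version is a deliberately simple hypothetical helper that compares leading numeric components only, so prefer a proper version parser if your pins include pre-releases.

```python
from importlib import metadata


def parse_version(text: str) -> tuple[int, ...]:
    """Parse leading numeric components: '1.40.0rc1' becomes (1, 40, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    return parse_version(installed) >= parse_version(minimum)


def check_package(name: str, minimum: str) -> str:
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return f"{name}: not installed"
    status = "ok" if meets_minimum(installed, minimum) else f"too old, need >={minimum}"
    return f"{name} {installed}: {status}"


for package, minimum in (("openai", "1.40"), ("instructor", "1.4")):
    print(check_package(package, minimum))
```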
Fix 8: Pydantic v1 Models
Instructor 1.x requires Pydantic v2. If you still have v1 models lying around:
    from pydantic import BaseModel  # v2

    # Won't work:
    # from pydantic.v1 import BaseModel

Common v1→v2 changes that bite Instructor users:
- Config class → model_config dict
- validator → field_validator (with @classmethod)
- parse_obj → model_validate
- .dict() → .model_dump()
- Field(..., regex=...) → Field(..., pattern=...)
Run bump-pydantic on your codebase to migrate the obvious cases automatically.
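For reference, here is a small model written in the v2 style with each v1 equivalent noted in a comment. The Contact model and its fields are made up for illustration.

```python
from pydantic import BaseModel, ConfigDict, Field, field_validator


class Contact(BaseModel):
    # v1: class Config: extra = "forbid"
    model_config = ConfigDict(extra="forbid")

    # v1: Field(..., regex=...)
    email: str = Field(pattern=r"^[^@\s]+@[^@\s]+$")
    name: str

    # v1: @validator("name")
    @field_validator("name")
    @classmethod
    def strip_name(cls, v: str) -> str:
        return v.strip()


# v1: Contact.parse_obj(...)
contact = Contact.model_validate({"email": "alice@example.com", "name": "  Alice "})

# v1: contact.dict()
print(contact.model_dump())
```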
Still Not Working?
A few less-common failures:
- response_model=str behaves unexpectedly. Recent Instructor releases accept simple types, but older ones expect a Pydantic model. Wrapping primitives is the portable option: class Result(BaseModel): value: str.
- Anthropic returns <thinking> blocks in your strings. Set the system prompt to forbid them, or use Mode.ANTHROPIC_JSON, which is stricter about output format.
- max_retries doesn’t seem to apply. Pass it explicitly on the call (max_retries=3) rather than relying on defaults, which have changed between versions. For backoff, pass a tenacity.Retrying instance instead of an int.
- Cost spikes after enabling retries. Each retry is a full chat completion. Cap with max_retries=3 and prefer schema fixes over higher retry counts.
- InstructorRetryException on a valid-looking response. Print e.last_completion.choices[0].message; the model may be wrapping JSON in markdown fences. Switch to Mode.MD_JSON or Mode.TOOLS_STRICT.
- Optional[Foo] field becomes a string "None". The model is hallucinating the literal string. Tighten the field description: “Return null if unknown, not the string ‘None’.”
- Local Ollama / vLLM gives empty objects. Smaller open models often can’t follow tool-use schemas reliably. Use Mode.JSON or Mode.MD_JSON with a stricter prompt, and validate aggressively.
- from_anthropic raises BadRequestError: tool_use_id. You’re sending an Anthropic message that mixes old/new tool formats. Reset the conversation or use Instructor’s helpers instead of building messages by hand.
Validation Retries Inflate Latency Past SLA
Each retry is another full round-trip to the LLM. A max_retries=3 call that hits the cap takes roughly 3x to 4x the latency of a clean call (depending on whether your Instructor version counts the first attempt toward max_retries), and the timeout that downstream services were sized against suddenly fires. For latency-sensitive endpoints, cap retries at 1 (often 0), and surface the validation failure to the caller as a clean 422 instead of silently burning seconds and tokens. Use max_retries=tenacity.Retrying(...) for fine-grained policy if you do need retries.
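The retry cap can be derived from the latency budget instead of guessed. This is back-of-the-envelope arithmetic under the assumption that every attempt costs about the same p95 latency; the function names and the numbers are illustrative.

```python
def max_attempts_within(sla_seconds: float, per_call_p95_seconds: float) -> int:
    """How many full LLM round-trips fit inside the latency budget."""
    if per_call_p95_seconds <= 0:
        raise ValueError("per_call_p95_seconds must be positive")
    return int(sla_seconds // per_call_p95_seconds)


def worst_case_latency(attempts: int, per_call_p95_seconds: float) -> float:
    """Latency if every attempt runs to completion and the last one still fails."""
    return attempts * per_call_p95_seconds


sla = 5.0        # seconds the caller is willing to wait (assumed)
per_call = 2.0   # observed p95 of a single call (assumed)

budgeted_attempts = max_attempts_within(sla, per_call)
print("attempts that fit the SLA:", budgeted_attempts)
print("worst case with 3 attempts:", worst_case_latency(3, per_call), "seconds")
```

With these numbers only two attempts fit the budget, so a three-attempt policy can overrun the SLA on its own.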
Tool-Call IDs Drift Across Provider Versions
When you store conversation history and replay it later, tool_use IDs that worked at write time can fail at replay time because Anthropic or OpenAI changed the ID format between SDK versions. Always replay against the same model snapshot the conversation was written with, or strip the tool-use turns and reconstruct them from the structured result. Replaying a year-old conversation against the latest provider SDK is the most reliable way to surface this bug, usually one week before a customer demo.
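One way to sketch the strip-and-reconstruct approach, assuming OpenAI-style chat message dicts where assistant tool requests carry a tool_calls key and tool results use role "tool". Anthropic-style content blocks are shaped differently and would need their own handling. Both function names are hypothetical.

```python
import json
from typing import Any


def strip_tool_turns(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop assistant tool-call turns and tool-result turns from a history."""
    cleaned = []
    for message in messages:
        if message.get("role") == "tool":
            continue
        if message.get("role") == "assistant" and message.get("tool_calls"):
            continue
        cleaned.append(message)
    return cleaned


def replayable_history(
    messages: list[dict[str, Any]], structured_result: dict[str, Any]
) -> list[dict[str, Any]]:
    """Replace tool turns with one plain assistant turn holding the stored result."""
    history = strip_tool_turns(messages)
    history.append({"role": "assistant", "content": json.dumps(structured_result)})
    return history


stored = [
    {"role": "user", "content": "Tell me about Alice"},
    {"role": "assistant", "content": None, "tool_calls": [{"id": "call_1"}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "ok"},
]
history = replayable_history(stored, {"name": "Alice", "age": 30})
print(history)
```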
Streaming Partials Trigger Validators That Fail Mid-Stream
If your response_model has a @field_validator that runs on partial state (e.g., requires a list to have at least one element), partial yields fail validation because the list is still empty halfway through generation. Mark validators as mode="after" and use Partial[YourModel] for streaming — partials skip required-field checks but still respect type checks. If you need a validator that runs only on the final complete object, run it outside Instructor’s loop on the final yielded partial.
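The final-object-only check can live in an ordinary function applied to the last partial. In this sketch, PartialAnswer is a hypothetical dataclass standing in for the streamed objects, and check_final and consume are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class PartialAnswer:
    """Stand-in for a streamed partial: fields fill in over time."""
    answer: Optional[str] = None
    citations: list[str] = field(default_factory=list)


def check_final(answer: PartialAnswer) -> PartialAnswer:
    """Completeness rules that only make sense once streaming has finished."""
    if not answer.answer:
        raise ValueError("Answer text is missing.")
    if not answer.citations:
        raise ValueError("Answer must include at least one citation URL.")
    return answer


def consume(stream: Iterable[PartialAnswer]) -> PartialAnswer:
    last = None
    for partial in stream:
        last = partial  # render the partial here; no completeness checks yet
    if last is None:
        raise ValueError("Stream produced no partials.")
    return check_final(last)


final = consume([
    PartialAnswer(),
    PartialAnswer(answer="Paris"),
    PartialAnswer(answer="Paris", citations=["https://example.com/paris"]),
])
print(final)
```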
For related Pydantic and LLM client issues, see Pydantic validation error, OpenAI API not working, LangChain Python not working, and Ollama not working.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.