Fix: Instructor Not Working — Validation Loops, Mode Mismatch, Streaming, and Anthropic / Gemini Issues

How to fix Python Instructor errors — ValidationError loops, max_retries exhausted, mode=Mode.TOOLS vs JSON, partial streaming type errors, Anthropic and Gemini client patching, token usage tracking.

The Error

You call an Instructor-patched OpenAI client and the request keeps retrying until it dies:

import instructor
from openai import OpenAI
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=User,
    messages=[{"role": "user", "content": "Tell me about Alice"}],
    max_retries=3,
)
# instructor.exceptions.InstructorRetryException:
# 1 validation error for User
# age
#   Input should be a valid integer [type=int_type, input_value='unknown']

Or you switch to Anthropic and it complains about the mode:

ValueError: Mode TOOLS is not supported by Anthropic. Use Mode.ANTHROPIC_TOOLS or Mode.ANTHROPIC_JSON.

Or your partial streaming returns objects with None everywhere:

for partial in client.chat.completions.create_partial(...):
    print(partial)
# User(name=None, age=None)
# User(name=None, age=None)

Or you upgraded openai and now nothing patches:

AttributeError: 'OpenAI' object has no attribute 'chat'

Why This Happens

Instructor wraps an LLM client and re-prompts the model when its output fails Pydantic validation. The wrapping happens in three layers, and most failures map to one of them:

Mode mismatch — Instructor uses different prompting strategies depending on what the underlying API supports. OpenAI defaults to Mode.TOOLS (function calling). Anthropic needs Mode.ANTHROPIC_TOOLS. Gemini needs Mode.GEMINI_JSON. Passing the wrong mode for the provider produces silent garbage or hard errors.
Validation loop exhaustion — If your response_model is too strict (required field the model can’t infer) or your prompt doesn’t ask for the data, every retry fails and you hit max_retries. The library does the right thing — it surfaces the last validation error so you can fix the prompt or model.
Client version skew — instructor.from_openai() and instructor.from_anthropic() expect specific client API shapes. Mixing an old openai<1.0 client with new Instructor breaks the patching.

The streaming None issue is by design but trips everyone the first time: partials yield as fields arrive, so early partials genuinely don’t have most fields set yet.

A subtler source of pain is schema drift. Pydantic models get tightened over time (a field becomes required, an enum gets a new value, a string gets a regex pattern). Each tightening is a chance for the LLM to fall behind. The model that handled the schema fine a month ago now fails validation 5% of the time, and the failure mode is InstructorRetryException after burning three retries’ worth of tokens. The thing degrading isn’t Instructor or the model — it’s the gap between your schema and what the model can reliably produce, and that gap widens silently as schemas evolve.
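One way to catch this kind of drift before production is to replay saved model outputs against the current schema in CI. The following is a minimal sketch, assuming you keep a few raw outputs as fixtures; SAVED_OUTPUTS and drift_report are hypothetical names for illustration, not part of Instructor.

```python
from pydantic import BaseModel, Field, ValidationError


class User(BaseModel):
    name: str
    # Assumed scenario: this field was recently tightened from Optional[int].
    age: int = Field(ge=0)


# Hypothetical fixtures: raw outputs captured from earlier successful calls.
SAVED_OUTPUTS = [
    {"name": "Alice", "age": 31},
    {"name": "Bob", "age": None},  # valid under the old schema only
    {"name": "Carol", "age": 44},
]


def drift_report(model_cls, outputs):
    """Return (failure_rate, failing_field_names) for outputs replayed against model_cls."""
    failures = 0
    failed_fields = []
    for raw in outputs:
        try:
            model_cls.model_validate(raw)
        except ValidationError as exc:
            failures += 1
            failed_fields.extend(str(err["loc"][0]) for err in exc.errors())
    return failures / len(outputs), failed_fields


rate, fields = drift_report(User, SAVED_OUTPUTS)
print(f"{rate:.0%} of saved outputs fail the current schema; fields: {fields}")
```

Running this on every schema change shows how many previously valid outputs the tightened model would now reject, before any tokens are spent on retries.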

Production Incident Lens: When Structured Output Validation Collapses

The Instructor production incident usually looks like this: an LLM-powered endpoint that has been stable for weeks suddenly starts returning 500s at 2% then 5% then 20% of requests. The traceback is always InstructorRetryException: max_retries exceeded with a Pydantic validation error attached. The blast radius is every endpoint that depends on this structured-output call — typically the highest-margin AI features (intake classification, extract-from-PDF, agent tool selection).

Diagnose by partitioning. Which model? Which response_model? Which prompt template? Hit the three axes one at a time:

Model-level regression. A provider silently updated their snapshot (you pinned claude-3-5-sonnet instead of claude-3-5-sonnet-20241022, the alias rolled forward, behavior shifted). Re-pin to a dated snapshot and the error rate drops back. Always pin dated snapshots in production — alias-only pins are an outage waiting to happen.
Schema-level regression. Someone added a tighter validator last week (a regex on email, a min_length on summary). Diff the response_model against last month’s version. Revert the validator or relax the constraint, and the failure rate drops the same hour.
Input distribution shift. Real user inputs got harder than your test fixtures. A new customer segment posts longer documents, or a marketing campaign drove non-English traffic into an English-only prompt. Log the inputs that triggered validation failures and read them; the pattern is usually obvious within five examples.
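The partitioning step can be sketched as a simple count per axis. This assumes you already log one record per exhausted call; the failure_log entries and field names below are made up for illustration.

```python
from collections import Counter

# Hypothetical failure records; in practice these come from your own logging.
failure_log = [
    {"model": "model-a", "response_model": "Invoice", "prompt": "extract_v2"},
    {"model": "model-a", "response_model": "Invoice", "prompt": "extract_v2"},
    {"model": "model-b", "response_model": "Invoice", "prompt": "extract_v1"},
    {"model": "model-a", "response_model": "Ticket", "prompt": "classify_v1"},
]


def partition(failures, axis):
    """Count failures along one axis, most frequent value first."""
    return Counter(entry[axis] for entry in failures).most_common()


for axis in ("model", "response_model", "prompt"):
    print(axis, partition(failure_log, axis))
```

An axis where one value dominates the counts is the one to investigate first.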

The right monitoring stack for Instructor-backed endpoints is per-response_model retry rate, per-response_model exhaustion rate, and input-token p95. The retry rate is the leading indicator: when the average climbs from 0.5 to 1.2 retries per call, you’re heading toward exhaustion. Alert at an average of 1.0 retries per call, not just at exhaustion; by the time exhaustion fires, your token bill is already about 3x normal.
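A rolling retry-rate check along those lines might look like the sketch below. RetryRateMonitor is a hypothetical helper, not an Instructor class, and it assumes you can obtain the number of attempts made for each call from your own instrumentation.

```python
from collections import defaultdict, deque


class RetryRateMonitor:
    """Tracks average retries per call over a sliding window, per response model."""

    def __init__(self, window=100, alert_threshold=1.0):
        self.alert_threshold = alert_threshold
        self._retries = defaultdict(lambda: deque(maxlen=window))

    def record(self, response_model_name, attempts):
        # attempts counts every request made, so retries = attempts - 1.
        self._retries[response_model_name].append(max(attempts - 1, 0))

    def average_retries(self, response_model_name):
        window = self._retries[response_model_name]
        return sum(window) / len(window) if window else 0.0

    def should_alert(self, response_model_name):
        return self.average_retries(response_model_name) >= self.alert_threshold


monitor = RetryRateMonitor()
for attempts in (1, 1, 2, 1):
    monitor.record("User", attempts)
for attempts in (3, 2, 2, 1):
    monitor.record("Invoice", attempts)

print(monitor.average_retries("User"), monitor.should_alert("User"))
print(monitor.average_retries("Invoice"), monitor.should_alert("Invoice"))
```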

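Wire Instructor’s hooks to your metrics backend so each validation error surfaces with the response_model name and the offending field. In releases that provide client.on(...), validation failures are reported through the "parse:error" event and API-level failures through "completion:error"; check that your installed version supports hooks before relying on them. Aggregating those errors by field name shows you exactly which schema constraint is failing, which is the single most useful piece of information during the incident.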
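The aggregation itself needs nothing from Instructor. The sketch below assumes the handler you register with client.on(...) hands you error entries in the shape of Pydantic's ValidationError.errors(); count_failing_fields and the sample entries are illustrative only.

```python
from collections import Counter


def count_failing_fields(error_entries):
    """Count validation failures by (field path, error type)."""
    counts = Counter()
    for entry in error_entries:
        field = ".".join(str(part) for part in entry.get("loc", ())) or "<model>"
        counts[(field, entry.get("type", "unknown"))] += 1
    return counts


# Hand-written entries in the shape of pydantic's ValidationError.errors().
sample_errors = [
    {"loc": ("age",), "type": "int_type"},
    {"loc": ("age",), "type": "int_type"},
    {"loc": ("address", "zip"), "type": "string_pattern_mismatch"},
]

field_counts = count_failing_fields(sample_errors)
for (field, error_type), n in field_counts.most_common():
    print(f"{field:15} {error_type:28} {n}")
```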

Fix 1: Pick the Right Mode for Your Provider

Each provider has supported modes. Use the helper instead of guessing:

import instructor
from openai import OpenAI
from anthropic import Anthropic
import google.generativeai as genai

# OpenAI (default mode is TOOLS — usually correct)
openai_client = instructor.from_openai(OpenAI())

# Anthropic — must specify a supported mode
anthropic_client = instructor.from_anthropic(
    Anthropic(),
    mode=instructor.Mode.ANTHROPIC_TOOLS,
)

# Gemini
genai_client = instructor.from_gemini(
    client=genai.GenerativeModel("gemini-1.5-flash"),
    mode=instructor.Mode.GEMINI_JSON,
)

The supported modes per provider, as of Instructor 1.x:

OpenAI: TOOLS (default), TOOLS_STRICT, JSON, MD_JSON, PARALLEL_TOOLS
Anthropic: ANTHROPIC_TOOLS, ANTHROPIC_JSON
Gemini: GEMINI_JSON, GEMINI_TOOLS
Cohere / Mistral / Groq / Ollama: each has its own — check the docs
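If the provider is chosen from configuration, a startup guard can catch a mismatched mode before the first request. This sketch only encodes the list above as strings; SUPPORTED_MODES and check_mode are hypothetical helpers, and the table should be kept in sync with the Instructor version you actually run.

```python
# Mode names copied from the list above; not read from the instructor package.
SUPPORTED_MODES = {
    "openai": {"TOOLS", "TOOLS_STRICT", "JSON", "MD_JSON", "PARALLEL_TOOLS"},
    "anthropic": {"ANTHROPIC_TOOLS", "ANTHROPIC_JSON"},
    "gemini": {"GEMINI_JSON", "GEMINI_TOOLS"},
}


def check_mode(provider, mode_name):
    """Return True if mode_name is listed for provider; raise on unknown providers."""
    supported = SUPPORTED_MODES.get(provider.lower())
    if supported is None:
        raise KeyError(f"No mode table for provider {provider!r}; check the docs")
    return mode_name in supported


print(check_mode("openai", "TOOLS"))
print(check_mode("anthropic", "TOOLS"))
```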

Pro Tip: Use TOOLS_STRICT on OpenAI when you need structured output to exactly match the schema. It enables OpenAI’s strict schema mode and eliminates a whole class of validation retries — at the cost of slightly higher latency.

Fix 2: Inspect the Last Validation Error, Don’t Just Raise `max_retries`

InstructorRetryException contains the last attempt’s exception. Use it to see what the model actually returned:

try:
    user = client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=User,
        messages=[...],
        max_retries=3,
    )
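except instructor.exceptions.InstructorRetryException as e:
    print("Attempts:", e.n_attempts)
    print("Last completion:", e.last_completion)
    # str(e) carries the text of the final validation error
    print("Validation errors:", e)
    # e.messages is the message list from the failed call, not the error itself
    print("Last message sent:", e.messages[-1])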

Most of the time you’ll see the model returning a string (“unknown”, “not specified”) for a required int field, or refusing the question entirely. The fix is usually one of:

Add Optional[int] to the field
Add a Field(description=...) hint so the model knows what you want
Make the prompt explicit (“Return age as a number, or null if unknown”)

from typing import Optional
from pydantic import BaseModel, Field

class User(BaseModel):
    name: str = Field(description="Full name of the person")
    age: Optional[int] = Field(default=None, description="Age in years, null if unknown")

Fix 3: Validate Computed Fields with `@field_validator`

Instructor re-prompts on Pydantic validation errors, so your validators become part of the LLM correction loop. This is the killer feature — use it:

from pydantic import BaseModel, field_validator

class Answer(BaseModel):
    answer: str
    citations: list[str]

    @field_validator("citations")
    @classmethod
    def must_have_citations(cls, v):
        if len(v) < 1:
            raise ValueError("Answer must include at least one citation URL.")
        return v

If the model returns an answer with no citations, Instructor automatically asks it again with the validator’s error message. After a few iterations you usually get a valid response without writing any orchestration code.

Common Mistake: Using raise ValueError("invalid") with no useful message. The error text is what the LLM sees on retry — write it as if you’re telling a junior dev what’s wrong. “Must include at least one URL starting with https://” beats “invalid.”
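You can preview the text a validator produces without calling any model. The sketch below shows only Pydantic's error output; how Instructor wraps that text into the retry prompt is not shown here, and retry_feedback is a hypothetical helper.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError, field_validator


class Answer(BaseModel):
    answer: str
    citations: List[str]

    @field_validator("citations")
    @classmethod
    def must_have_https_citation(cls, v: List[str]) -> List[str]:
        if not any(url.startswith("https://") for url in v):
            raise ValueError("Must include at least one URL starting with https://")
        return v


def retry_feedback(raw: dict) -> Optional[str]:
    """Return Pydantic's error text for raw, or None if raw validates."""
    try:
        Answer.model_validate(raw)
    except ValidationError as exc:
        return str(exc)
    return None


feedback = retry_feedback({"answer": "Paris", "citations": []})
print(feedback)
```

Reading this output before shipping a validator is a quick check that the message is specific enough to act on.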

Fix 4: Streaming Partials and Iterables

Partial streaming yields a model where each field arrives as the tokens stream in. Early yields have None for fields the model hasn’t produced yet — that’s expected, not a bug:

from instructor import Partial

stream = client.chat.completions.create_partial(
    model="gpt-4o-mini",
    response_model=User,
    messages=[...],
)

for partial in stream:
    # partial.name fills in first, partial.age later
    print(partial.model_dump())

For a list of items where you want each item complete before yielding, use create_iterable:

from typing import Iterable

class City(BaseModel):
    name: str
    country: str

cities = client.chat.completions.create_iterable(
    model="gpt-4o-mini",
    response_model=City,  # singular — Instructor handles the list shape
    messages=[{"role": "user", "content": "List 5 European capitals"}],
)

for city in cities:
    print(city.name, city.country)

Don’t pass list[City] as response_model for streaming — pass the element type and use create_iterable.

Fix 5: Async Clients

Use the async constructors and await the call:

import asyncio
import instructor
from openai import AsyncOpenAI

aclient = instructor.from_openai(AsyncOpenAI())

async def main():
    user = await aclient.chat.completions.create(
        model="gpt-4o-mini",
        response_model=User,
        messages=[...],
    )
    print(user)

asyncio.run(main())

For async iteration over partials or items:

async for partial in aclient.chat.completions.create_partial(...):
    print(partial)

Note: Don’t mix sync and async clients. instructor.from_openai(OpenAI()) returns a sync wrapper; await client.chat.completions.create(...) on that raises TypeError: object ... can't be used in 'await' expression.
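The failure is easy to reproduce without any API call. In this sketch, sync_create and async_create are stand-ins for the sync and async create methods, not real Instructor functions; the exact TypeError wording varies between Python versions.

```python
import asyncio


class FakeUser:
    """Stand-in for a parsed response model."""


def sync_create():
    # Plays the role of a sync client's create(): returns the object directly.
    return FakeUser()


async def async_create():
    # Plays the role of an async client's create(): returns a coroutine to await.
    return FakeUser()


async def main():
    try:
        await sync_create()  # awaiting a plain object raises TypeError
    except TypeError as exc:
        error = str(exc)
    else:
        error = None
    user = await async_create()  # the async version awaits cleanly
    return error, user


error, user = asyncio.run(main())
print(error)
```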

Fix 6: Track Token Usage Without Losing the Validated Object

When you call client.chat.completions.create(...), you get back the parsed Pydantic model — the raw response with usage info is gone. To get both, use create_with_completion:

user, completion = client.chat.completions.create_with_completion(
    model="gpt-4o-mini",
    response_model=User,
    messages=[...],
)

print(user.name)
print("Tokens used:", completion.usage.total_tokens)

This returns a tuple: your validated model plus the raw provider response (with usage, id, system_fingerprint, etc.).
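To keep a running total across calls, the raw completion can be fed into a small accumulator. UsageTracker is a hypothetical helper, and the sketch assumes an OpenAI-style response that exposes usage.total_tokens; SimpleNamespace objects stand in for real completions so the example runs offline.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class UsageTracker:
    calls: int = 0
    total_tokens: int = 0

    def record(self, completion):
        # Assumes completion.usage.total_tokens exists; missing usage counts as 0.
        usage = getattr(completion, "usage", None)
        self.calls += 1
        self.total_tokens += getattr(usage, "total_tokens", 0) or 0

    @property
    def tokens_per_call(self):
        return self.total_tokens / self.calls if self.calls else 0.0


tracker = UsageTracker()
for tokens in (120, 95, 310):
    fake_completion = SimpleNamespace(usage=SimpleNamespace(total_tokens=tokens))
    tracker.record(fake_completion)

print(tracker.total_tokens, tracker.tokens_per_call)
```

In real code you would call tracker.record(completion) with the second element of the tuple returned by create_with_completion.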

Fix 7: OpenAI Client Version

Instructor 1.x requires openai>=1.0. If you’re seeing 'OpenAI' object has no attribute 'chat' or similar attribute errors after the from_openai patch, you’re probably on the old SDK:

pip install -U "openai>=1.40" "instructor>=1.4"

Pin them together in pyproject.toml so a future openai minor bump doesn’t break your patching:

[project]
dependencies = [
    "instructor>=1.4,<2.0",
    "openai>=1.40,<2.0",
]

If you can’t upgrade the OpenAI SDK, pin to an older Instructor that supports openai<1.0 (versions before 1.0 — but you really should upgrade).
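A startup check can turn the confusing attribute error into a clear message. This is a sketch using only the standard library; major_version and check_openai_sdk are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(package):
    """Return the installed major version of package, or None if it is not installed."""
    try:
        return int(version(package).split(".")[0])
    except PackageNotFoundError:
        return None


def check_openai_sdk():
    """Describe whether the installed openai SDK is new enough for Instructor 1.x."""
    major = major_version("openai")
    if major is None:
        return "openai is not installed"
    if major < 1:
        return f"openai {major}.x is too old; Instructor 1.x needs openai>=1.0"
    return "ok"


print(check_openai_sdk())
```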

Fix 8: Pydantic v1 Models

Instructor 1.x requires Pydantic v2. If you still have v1 models lying around:

from pydantic import BaseModel  # v2

# Won't work:
# from pydantic.v1 import BaseModel

Common v1→v2 changes that bite Instructor users:

Config class → model_config dict
validator → field_validator (with @classmethod)
parse_obj → model_validate
.dict() → .model_dump()
Field(..., regex=...) → Field(..., pattern=...)

Run bump-pydantic on your codebase to migrate the obvious cases automatically.
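For reference, here is a small v2 model that touches each of the changes listed above, with the v1 spelling noted in comments. The Contact model is invented for illustration.

```python
from pydantic import BaseModel, ConfigDict, Field, field_validator


class Contact(BaseModel):
    # v1: class Config: extra = "forbid"
    model_config = ConfigDict(extra="forbid")

    # v1: Field(..., regex=...)
    email: str = Field(pattern=r"^[^@\s]+@[^@\s]+$")
    name: str

    # v1: @validator("name")
    @field_validator("name")
    @classmethod
    def strip_name(cls, v: str) -> str:
        return v.strip()


# v1: Contact.parse_obj(...)
contact = Contact.model_validate({"email": "a@example.com", "name": "  Alice "})

# v1: contact.dict()
print(contact.model_dump())
```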

Still Not Working?

A few less-common failures:

response_model=str fails on older versions. Recent Instructor releases accept simple types such as str, int, and bool directly; if yours does not, wrap primitives: class Result(BaseModel): value: str.
Anthropic returns <thinking> blocks in your strings. Set the system prompt to forbid them, or use Mode.ANTHROPIC_JSON which is stricter about output format.
max_retries doesn’t seem to apply. Pass it explicitly on the call (max_retries=3) rather than relying on defaults, which have changed between versions. For backoff, pass a tenacity.Retrying instance instead of an int.
Cost spikes after enabling retries. Each retry is a full chat completion. Cap with max_retries=3 and prefer schema fixes over higher retry counts.
InstructorRetryException on a valid-looking response. Print e.last_completion.choices[0].message — the model may be wrapping JSON in markdown fences. Switch to Mode.MD_JSON or Mode.TOOLS_STRICT.
Optional[Foo] field becomes a string "None". The model is hallucinating the literal string. Tighten the field description: “Return null if unknown — not the string ‘None’.”
Local Ollama / vLLM gives empty objects. Smaller open models often can’t follow tool-use schemas reliably. Use Mode.JSON or Mode.MD_JSON with a stricter prompt, and validate aggressively.
from_anthropic raises BadRequestError: tool_use_id. You’re sending an Anthropic message that mixes old/new tool formats. Reset the conversation or use Instructor’s helpers instead of building messages by hand.

Validation Retries Inflate Latency Past SLA

Each retry is another full round-trip to the LLM. A max_retries=3 call that hits the cap takes roughly 3x to 4x the latency of a clean call (depending on whether your Instructor version counts the first attempt toward max_retries), and the timeout that downstream services were sized against suddenly fires. For latency-sensitive endpoints, cap retries at 1 (often 0), and surface the validation failure to the caller as a clean 422 instead of silently burning seconds and tokens. Use max_retries=tenacity.Retrying(...) for fine-grained policy if you do need retries.
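The arithmetic is worth doing explicitly when sizing timeouts. This sketch assumes attempts run sequentially with no backoff delay; the 2.5 s per-call latency and 6 s SLA are made-up numbers, and both function names are hypothetical.

```python
def worst_case_latency(per_call_seconds, attempts):
    """Total latency if every attempt runs to completion, one after another."""
    return per_call_seconds * attempts


def max_attempts_within(sla_seconds, per_call_seconds):
    """Largest number of sequential attempts that fits the SLA (at least one)."""
    return max(1, int(sla_seconds // per_call_seconds))


per_call = 2.5  # assumed p95 latency of one completion, in seconds
sla = 6.0       # assumed endpoint latency budget, in seconds

print(worst_case_latency(per_call, attempts=4))
print(max_attempts_within(sla, per_call))
```

With these numbers only two attempts fit the budget, so anything beyond one retry cannot succeed before the caller times out.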

Tool-Call IDs Drift Across Provider Versions

When you store conversation history and replay it later, tool_use IDs that worked at write time can fail at replay time because Anthropic or OpenAI changed the ID format between SDK versions. Always replay against the same model snapshot the conversation was written with, or strip the tool-use turns and reconstruct them from the structured result. Replaying a year-old conversation against the latest provider SDK is the most reliable way to surface this bug, usually one week before a customer demo.
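Stripping the tool-use turns can be sketched as a filter over the stored history. This assumes OpenAI-style chat messages (role "tool" and an assistant "tool_calls" key); Anthropic's content-block format would need a different filter. strip_tool_turns and the sample history are illustrative only.

```python
import json


def strip_tool_turns(messages):
    """Drop tool-call plumbing, keeping plain system/user/assistant turns."""
    cleaned = []
    for message in messages:
        role = message.get("role")
        if role == "tool":
            continue  # tool results reference IDs that may no longer be accepted
        if role == "assistant" and message.get("tool_calls"):
            continue  # assistant turn that only carried tool-call IDs
        cleaned.append(message)
    return cleaned


history = [
    {"role": "user", "content": "Tell me about Alice"},
    {"role": "assistant", "content": None,
     "tool_calls": [{"id": "call_123", "type": "function",
                     "function": {"name": "User", "arguments": "{}"}}]},
    {"role": "tool", "tool_call_id": "call_123", "content": "ok"},
]
stored_result = {"name": "Alice", "age": 31}

# Rebuild the assistant turn from the structured result saved at write time.
replayable = strip_tool_turns(history)
replayable.append({"role": "assistant", "content": json.dumps(stored_result)})
print(replayable)
```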

Streaming Partials Trigger Validators That Fail Mid-Stream

If your response_model has a @field_validator that runs on partial state (e.g., requires a list to have at least one element), partial yields fail validation because the list is still empty halfway through generation. Mark validators as mode="after" and use Partial[YourModel] for streaming — partials skip required-field checks but still respect type checks. If you need a validator that runs only on the final complete object, run it outside Instructor’s loop on the final yielded partial.
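Running the completeness check once, on the final object, can be sketched with two models: a lenient one for streaming and a strict subclass for the end. The snapshots below are hand-written stand-ins for streamed partial yields, and Draft and FinalAnswer are hypothetical names.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError, field_validator


class Draft(BaseModel):
    """Lenient shape used while streaming: no completeness checks."""
    answer: str = ""
    citations: List[str] = Field(default_factory=list)


class FinalAnswer(Draft):
    """Strict shape, validated once after the stream has finished."""

    @field_validator("citations")
    @classmethod
    def must_have_citations(cls, v: List[str]) -> List[str]:
        if not v:
            raise ValueError("Answer must include at least one citation URL.")
        return v


# Stand-ins for successive partial yields from a stream.
snapshots = [
    {"answer": "Par"},
    {"answer": "Paris", "citations": []},
    {"answer": "Paris", "citations": ["https://example.com/paris"]},
]

last = None
for snapshot in snapshots:
    last = Draft.model_validate(snapshot)  # an empty list is fine mid-stream

# The strict validator runs only here, on the complete object.
final = FinalAnswer.model_validate(last.model_dump())
print(final)
```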

For related Pydantic and LLM client issues, see Pydantic validation error, OpenAI API not working, LangChain Python not working, and Ollama not working.
