Fix: Langfuse Not Working — SDK Init, Tracing Generations, LangChain Wrapper, and Self-Hosted Setup
Quick Answer
How to fix Langfuse errors — Python/JS SDK init, trace/span/generation hierarchy, LangChain CallbackHandler, OpenAI wrapper, missing usage/cost data, prompt management, and self-hosted Postgres setup.
The Error
You initialize Langfuse but no traces appear in the dashboard:
from langfuse import Langfuse
langfuse = Langfuse()
# Make some LLM calls...
# Dashboard stays empty.
Or the LangChain integration doesn’t attach traces:
from langfuse.callback import CallbackHandler
handler = CallbackHandler()
result = llm.invoke("Hello", config={"callbacks": [handler]})
# No trace shows in Langfuse.
Or generations have no token usage or cost:
langfuse.generation(
name="gpt-4o-call",
model="gpt-4o",
input=...,
output=...,
# usage missing!
)
Or self-hosted Langfuse can’t connect to Postgres:
Error: Connection refused at db:5432
Why This Happens
Langfuse is an open-source LLM observability platform. It ingests traces (one request or session), spans (units of work in a trace), and generations (LLM calls) via SDKs and ships them to a backend (cloud-hosted or self-hosted) where they’re stored, indexed, and made queryable through a UI. The architecture is deliberately fire-and-forget: SDKs batch events in memory and flush them in the background so that adding observability never adds latency to the user’s request. That decoupling is the source of most “I don’t see my traces” reports: the SDK accepted the data but the process exited before the batch was sent.
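To make the fire-and-forget behavior concrete, here is a toy model of an in-memory batching client. It is not Langfuse's implementation; the class name `ToyBatchingClient` and its `track` method are hypothetical and exist only to show why events that were "accepted" can still be lost if nothing flushes them.

```python
import threading


class ToyBatchingClient:
    """Toy model of a fire-and-forget SDK: events queue in memory until flushed."""

    def __init__(self, send):
        self._send = send       # callable that "ships" a list of events
        self._pending = []      # in-memory buffer, lost if the process dies
        self._lock = threading.Lock()

    def track(self, event):
        # Returns immediately; nothing has left the process yet.
        with self._lock:
            self._pending.append(event)

    def flush(self):
        # Drain the buffer and hand the whole batch to the sender.
        with self._lock:
            batch, self._pending = self._pending, []
        if batch:
            self._send(batch)
        return len(batch)


delivered = []  # stands in for the backend
client = ToyBatchingClient(send=delivered.extend)
client.track({"name": "trace-1"})
client.track({"name": "trace-2"})

# If the script ended here, the backend would have received nothing.
events_before_flush = len(delivered)
flushed = client.flush()
```

In this sketch `events_before_flush` is zero even though both `track` calls returned successfully, which mirrors the empty-dashboard symptom described above.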
Common root causes:
- SDK initialization order. SDKs are async. If your process exits before traces flush, data is lost. Always flush() before exit.
- Env vars vs constructor args. SDKs read LANGFUSE_SECRET_KEY / LANGFUSE_PUBLIC_KEY / LANGFUSE_HOST. Wrong host (defaults to cloud) or wrong keys silently send to /dev/null because the SDK doesn’t error on auth failure: the failed requests sit in a retry queue and disappear when the process exits.
- LangChain integration is a callback. You must pass the CallbackHandler as a callback when invoking chains/LLMs. Without it, no instrumentation.
- Token usage requires explicit data (or use the OpenAI wrapper). Manual generations need usage filled in; Langfuse won’t auto-compute it. If you log a generation with output but no usage, the trace appears but cost/token metrics are blank.
A subtler failure mode is project/key mismatch. Langfuse organizes data into projects, and each project has its own public/secret key pair. If you generate one set of keys, copy them to a teammate, and they later create a second project, dashboards split: half your team’s traces go to project A, half to project B, and neither view shows the full picture. Always confirm the project ID in the dashboard URL matches the project you intended.
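Because wrong or swapped keys fail silently, a quick pre-flight check of the environment can save time. The sketch below uses only the standard library; the helper names `check_langfuse_env` and `describe_target` are hypothetical, and the prefix checks assume keys follow the `pk-lf-` / `sk-lf-` pattern shown in this article.

```python
import os

REQUIRED = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


def check_langfuse_env(env=None):
    """Return a list of human-readable problems with the Langfuse env vars."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED:
        if not env.get(name):
            problems.append(f"{name} is not set")
    public = env.get("LANGFUSE_PUBLIC_KEY", "")
    secret = env.get("LANGFUSE_SECRET_KEY", "")
    # Assumed key format: pk-lf-... for public keys, sk-lf-... for secret keys.
    if public and not public.startswith("pk-lf-"):
        problems.append("LANGFUSE_PUBLIC_KEY does not start with 'pk-lf-'")
    if secret and not secret.startswith("sk-lf-"):
        problems.append("LANGFUSE_SECRET_KEY does not start with 'sk-lf-'")
    if not env.get("LANGFUSE_HOST"):
        problems.append("LANGFUSE_HOST is not set; the SDK will use its default host")
    return problems


def describe_target(env=None):
    """Summarize where traces will go, without revealing the secret key."""
    env = os.environ if env is None else env
    host = env.get("LANGFUSE_HOST", "(default host)")
    public = env.get("LANGFUSE_PUBLIC_KEY", "")
    return f"host={host} public_key={public[:10]}..."


if __name__ == "__main__":
    for problem in check_langfuse_env():
        print("WARNING:", problem)
    print(describe_target())
```

Logging the output of `describe_target()` at startup makes it easy to compare the key prefix against the project shown in the dashboard, which is usually enough to spot a project/key mismatch.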
How Other Tools Handle This
LLM observability is now a crowded category. Each tool makes different choices about what to capture, where the data lives, and how invasive the instrumentation is.
- Langfuse. Open-source backend plus Python/JS SDKs. Trace/span/generation primitives; OpenAI and Anthropic auto-instrumentation wrappers; LangChain CallbackHandler; built-in prompt management and evals. Strengths: self-hostable on Postgres + ClickHouse, predictable pricing, evals and prompt versioning in one product. Weaknesses: SDK still requires explicit flush() for short-lived processes; usage normalization across providers is your job.
- Helicone. Proxy-based. You change base_url to Helicone’s URL and every OpenAI call flows through it. Strengths: zero-code instrumentation if you control the client config, automatic caching, rate-limit shimming. Weaknesses: adds a network hop, harder to use with non-HTTP providers (Bedrock SDK doesn’t proxy easily), only sees what the proxy sees.
- LangSmith. LangChain’s first-party observability tool. Tightest integration with LangChain primitives (Runnables, LangGraph). Strengths: best-in-class for LangChain users, integrated evals, no extra config when you already use LangChain. Weaknesses: hosted-only, opinionated toward the LangChain stack, more expensive at scale.
- Arize Phoenix. Open-source, OpenTelemetry-native. Strengths: builds on OTel semantic conventions for LLMs (the GenAI working group’s spec), so traces work across multiple backends. Weaknesses: setup is more OTel-flavored (collectors, exporters, semantic attributes) and not as plug-and-play for one-script demos.
- OpenLLMetry (Traceloop). OTel instrumentation library, vendor-neutral. Strengths: works with any OTel backend (Datadog, Honeycomb, Jaeger, Langfuse). Weaknesses: you still need a backend; for “just give me a UI” use cases, you end up pairing it with one of the above.
If your stack is LangChain-only and you want minimum config, LangSmith wins. If you want self-hosted and complete control, Langfuse or Phoenix. If you want zero-code via proxy, Helicone. The “traces don’t appear” debugging steps below are most acute on SDK-based tools (Langfuse, LangSmith) because they require explicit flush; proxy-based tools (Helicone) instead suffer from “the proxy isn’t being used.”
Fix 1: Initialize the SDK Correctly
Python:
from langfuse import Langfuse
langfuse = Langfuse(
public_key="pk-lf-...",
secret_key="sk-lf-...",
host="https://cloud.langfuse.com", # or "https://us.cloud.langfuse.com" or your self-hosted URL
)
Or via env vars (cleaner):
# .env
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com
from langfuse import Langfuse
langfuse = Langfuse() # Reads env vars
JS:
import { Langfuse } from "langfuse";
const langfuse = new Langfuse({
publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
secretKey: process.env.LANGFUSE_SECRET_KEY!,
baseUrl: process.env.LANGFUSE_HOST,
});
Critical: flush before exit. Traces are batched and sent async. Without a flush, short scripts lose data:
# At the end of your script:
langfuse.flush()
// JS:
await langfuse.shutdownAsync();
For long-running services (web servers), Langfuse flushes periodically, so no explicit flush is needed. For batch scripts, lambdas, and CLIs, always flush.
Pro Tip: Add a process exit handler:
import atexit
atexit.register(langfuse.flush)
Catches the common “forgot to flush” mistake.
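Where an exit handler is not enough (for example, a function handler that returns while the process stays alive), one option is to flush in a `finally` block around each handler. The decorator below is a minimal sketch of that pattern; `flush_after` is a hypothetical helper, and `FakeClient` is a stand-in so the example runs without the SDK. Any object with a `flush()` method, such as the `langfuse` client above, could be passed instead.

```python
import functools


def flush_after(client):
    """Decorator: call client.flush() after the wrapped function, even on error."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on normal return and when an exception propagates.
                client.flush()
        return wrapper
    return decorator


class FakeClient:
    """Stand-in for a tracing client; it only counts flush() calls."""

    def __init__(self):
        self.flush_calls = 0

    def flush(self):
        self.flush_calls += 1


fake = FakeClient()


@flush_after(fake)
def handler(event, context=None):
    # Real code would create traces here before returning.
    return {"status": 200, "echo": event}


result = handler({"q": "hello"})
```

The trade-off is that the flush now sits on the request path, so each invocation pays for one network round trip before returning.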
Fix 2: Trace Hierarchy — Trace → Span → Generation
The three primary primitives:
- Trace — a request/session, holds the overall context.
- Span — a unit of work within a trace (e.g. “retrieve documents,” “format prompt”).
- Generation — an LLM call (special span type with model, input, output, usage).
# Create a trace:
trace = langfuse.trace(name="answer-question", user_id="user-42")
# Add a span:
span = trace.span(name="retrieve-docs", input={"query": "..."})
# ... do work ...
span.end(output=[{"doc": "..."}])
# Add a generation:
generation = trace.generation(
name="generate-answer",
model="gpt-4o",
input=messages,
output={"role": "assistant", "content": "..."},
usage={
"prompt_tokens": 250,
"completion_tokens": 50,
"total_tokens": 300,
},
model_parameters={"temperature": 0.7},
)
For nested spans (an LLM call inside a span inside a trace):
trace = langfuse.trace(name="rag-query")
span = trace.span(name="retrieve")
# ... retrieval logic ...
span.end()
generation = trace.generation(
name="answer",
model="gpt-4o",
input=...,
output=...,
)
For deeper nesting, span children:
parent_span = trace.span(name="parent")
child_span = parent_span.span(name="child")
child_span.end()
parent_span.end()
Common Mistake: Forgetting .end() on spans. Without end, spans show “in progress” forever and duration is wrong.
For convenient decorators:
from langfuse.decorators import observe
@observe()
def my_function(query: str):
    docs = retrieve(query)
    return generate(query, docs)
# Every call to my_function creates a trace automatically.
Fix 3: LangChain Integration
Pass the CallbackHandler to LangChain operations:
from langfuse.callback import CallbackHandler
from langchain_openai import ChatOpenAI  # older LangChain releases exposed this as langchain.chat_models
langfuse_handler = CallbackHandler(
public_key="pk-lf-...",
secret_key="sk-lf-...",
host="https://cloud.langfuse.com",
)
llm = ChatOpenAI(model="gpt-4o")
result = llm.invoke("Hello", config={"callbacks": [langfuse_handler]})
The handler captures the LLM call as a generation, including prompts, response, and token usage (from OpenAI’s response metadata).
For chains:
from langchain_core.prompts import ChatPromptTemplate
chain = ChatPromptTemplate.from_template("Answer: {q}") | llm
result = chain.invoke({"q": "What is Python?"}, config={"callbacks": [langfuse_handler]})
The chain’s full structure (each runnable, retrieval steps, LLM calls) traces as a hierarchy.
For LangGraph:
from langfuse.callback import CallbackHandler
config = {"callbacks": [CallbackHandler()]}
result = graph.invoke({"input": "..."}, config=config)
LangGraph’s nodes show as nested spans under the graph’s trace.
Common Mistake: Not passing config={"callbacks": [...]}. Without it, LangChain uses the default callback manager (which doesn’t include Langfuse). Pass the handler on every call, or bind it once with chain.with_config({"callbacks": [langfuse_handler]}) and invoke the returned runnable.
Fix 4: OpenAI / Anthropic Wrappers
The simplest auto-instrumentation: wrap your OpenAI client.
Python:
from langfuse.openai import openai # Drop-in replacement
response = openai.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
)
# Logged to Langfuse automatically, no extra code.
The langfuse.openai module exports an openai that proxies the real OpenAI SDK and logs every call.
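For Anthropic, the import below assumes your installed SDK version ships a langfuse.anthropic module. Verify that against your version; if the import fails, wrap the call in a function decorated with @observe() instead: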
from langfuse.anthropic import Anthropic
client = Anthropic(api_key="...")
response = client.messages.create(
model="claude-3-5-sonnet-20241022",
max_tokens=1024,
messages=[{"role": "user", "content": "Hello"}],
)
JS:
import OpenAI from "openai";
import { observeOpenAI } from "langfuse";
const openai = observeOpenAI(new OpenAI());
const response = await openai.chat.completions.create({...});
These wrappers automatically extract prompt, response, model, and token usage. No manual generation() calls needed.
Pro Tip: Use the wrapper for the bulk of your LLM calls. Reserve manual generation() for cases where the wrapper doesn’t fit (custom protocols, batched calls).
Fix 5: Cost Calculation
Langfuse computes cost from usage + model name. For built-in OpenAI/Anthropic models, costs are pre-defined. For custom models, register them in the Langfuse dashboard:
Dashboard → Settings → Models → Add model
Match pattern: my-custom-model.*
Input price: $0.001 per 1K tokens
Output price: $0.002 per 1K tokens
Or via API for self-hosted:
curl -X POST "$LANGFUSE_HOST/api/public/models" \
-u "$LANGFUSE_PUBLIC_KEY:$LANGFUSE_SECRET_KEY" \
-H "content-type: application/json" \
-d '{
"modelName": "my-custom-model",
"matchPattern": "my-custom-model.*",
"inputPrice": 0.000001,
"outputPrice": 0.000002,
"tokenizerModel": "gpt-4",
"unit": "TOKENS"
}'
Common Mistake: Logging usage in the wrong field. Langfuse accepts OpenAI-style prompt_tokens + completion_tokens or its own input/output/total keys. Anthropic’s input_tokens + output_tokens match neither, so map them explicitly:
langfuse.generation(
name="claude-call",
model="claude-3-5-sonnet",
input=...,
output=...,
usage={
"input": response.usage.input_tokens, # Langfuse alias
"output": response.usage.output_tokens,
"total": response.usage.input_tokens + response.usage.output_tokens,
},
)
The input/output/total keys are Langfuse’s own usage format, which it maps internally.
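The mapping and the cost arithmetic can be sketched as two small functions. This is illustrative only: `normalize_usage` and `estimate_cost` are hypothetical helpers, they take plain dicts (convert a provider's usage object to a dict first), and the prices are the per-token example values from the model registration above, not real provider prices.

```python
def normalize_usage(usage):
    """Map OpenAI- or Anthropic-style token counts to input/output/total keys."""
    if "prompt_tokens" in usage or "completion_tokens" in usage:
        # OpenAI-style naming.
        inp = usage.get("prompt_tokens", 0)
        out = usage.get("completion_tokens", 0)
    elif "input_tokens" in usage or "output_tokens" in usage:
        # Anthropic-style naming.
        inp = usage.get("input_tokens", 0)
        out = usage.get("output_tokens", 0)
    else:
        raise ValueError(f"unrecognized usage keys: {sorted(usage)}")
    return {"input": inp, "output": out, "total": inp + out}


def estimate_cost(usage, input_price, output_price):
    """Cost in dollars from normalized usage and per-token (not per-1K) prices."""
    return usage["input"] * input_price + usage["output"] * output_price


openai_style = normalize_usage(
    {"prompt_tokens": 250, "completion_tokens": 50, "total_tokens": 300}
)
anthropic_style = normalize_usage({"input_tokens": 250, "output_tokens": 50})
cost = estimate_cost(openai_style, input_price=0.000001, output_price=0.000002)
```

Both inputs normalize to the same dict, and with these example prices 250 input tokens plus 50 output tokens come to roughly $0.00035. Running the same arithmetic locally is a quick way to sanity-check the cost the dashboard reports.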
Fix 6: Sampling for Cost Control
In production, sample to reduce ingestion volume:
import random
# Sample 10%:
if random.random() < 0.1:
    trace = langfuse.trace(...)
    # ... full instrumentation
else:
    # Run without Langfuse
    pass
Or let the SDK sample for you by setting a rate on the client:
langfuse = Langfuse(
    sample_rate=0.1,  # 10% of traces ingested
)
For LangChain:
handler = CallbackHandler(sample_rate=0.1)
Sampling is per-trace: once a trace is included, all its spans/generations are. No partial traces.
Pro Tip: Sample at request boundaries (entry points), not at every span. Including a full trace or none keeps the data consistent.
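If you sample manually at the entry point, a deterministic decision derived from a request or session identifier keeps retries and related calls in the same bucket, unlike `random.random()`. The helper below is a sketch of that idea, not how Langfuse samples internally; `should_sample` and the `request-…` keys are hypothetical.

```python
import hashlib


def should_sample(trace_key, rate):
    """Decide once per request whether to trace it, deterministically from a key."""
    if rate <= 0:
        return False
    if rate >= 1:
        return True
    digest = hashlib.sha256(trace_key.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


# The same key always yields the same decision.
first_decision = should_sample("request-123", 0.1)
second_decision = should_sample("request-123", 0.1)

# Over many distinct keys, the kept share approaches the requested rate.
kept = sum(should_sample(f"request-{i}", 0.1) for i in range(10_000))
```

Make the decision once at the request boundary and pass it down, so every span and generation in that request follows the same choice.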
Fix 7: Prompt Management
Manage prompts centrally:
# Fetch the latest version of a prompt:
prompt = langfuse.get_prompt("answer-prompt")
formatted = prompt.compile(question="What is Python?")
# Use `formatted` as the LLM input.
Prompts are versioned in Langfuse Dashboard → Prompts. Updates to prompts don’t require code deploys; fetch the latest at runtime.
For caching to avoid hitting Langfuse on every request:
prompt = langfuse.get_prompt("answer-prompt", cache_ttl_seconds=300)
# Cached for 5 minutes.
For version pinning:
prompt = langfuse.get_prompt("answer-prompt", version=3)
To link traces to prompts:
trace = langfuse.trace(name="...")
generation = trace.generation(
name="...",
prompt=prompt, # Links this generation to the prompt version
model="gpt-4o",
input=...,
output=...,
)
Langfuse shows aggregate metrics per prompt version (latency, cost, error rate), which is useful for A/B testing prompts.
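To see what a time-to-live setting like `cache_ttl_seconds` implies for freshness, here is a toy TTL cache with an injectable clock. It is a model of the concept only, not the SDK's cache; `TTLCache` and `fake_fetch` are hypothetical names.

```python
import time


class TTLCache:
    """Tiny time-based cache: refetch a value only after ttl_seconds have passed."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # name -> (expires_at, value)

    def get(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh, no fetch
        value = self._fetch(name)
        self._entries[name] = (now + self._ttl, value)
        return value


calls = []


def fake_fetch(name):
    # Stand-in for a network fetch; returns a new "version" on every call.
    calls.append(name)
    return f"{name}-v{len(calls)}"


now = [0.0]  # fake clock, advanced by hand
cache = TTLCache(fake_fetch, ttl_seconds=300, clock=lambda: now[0])
first = cache.get("answer-prompt")
now[0] = 299.0
second = cache.get("answer-prompt")  # served from cache
now[0] = 301.0
third = cache.get("answer-prompt")   # TTL elapsed, fetched again
```

The practical consequence is that a prompt edited in the dashboard can take up to the TTL to reach running processes, so pick a TTL that matches how quickly you need prompt changes to roll out.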
Fix 8: Self-Hosted Setup
Langfuse is open source and can be self-hosted. Docker Compose:
# docker-compose.yml
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: postgres
    volumes:
      - db_data:/var/lib/postgresql/data
  langfuse:
    # Pinned to the v2 image: this Postgres-only setup matches v2; newer major versions also need ClickHouse.
    image: langfuse/langfuse:2
    depends_on:
      - db
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgresql://postgres:postgres@db:5432/postgres
      NEXTAUTH_SECRET: "your-secret-here"
      NEXTAUTH_URL: "http://localhost:3000"
      SALT: "your-salt-here"
      # Placeholder only; replace with a generated key.
      ENCRYPTION_KEY: "0000000000000000000000000000000000000000000000000000000000000000"
      TELEMETRY_ENABLED: "false"
volumes:
  db_data:
docker compose up -d
# Open http://localhost:3000
# Sign up for the first admin user.
Generate the encryption key:
openssl rand -hex 32
For production deployment with HA:
- External Postgres (RDS, Cloud SQL, etc.).
- ClickHouse for analytics (newer Langfuse versions).
- Object storage (S3/R2) for large traces.
Point your SDK at the self-hosted URL:
langfuse = Langfuse(host="https://langfuse.example.com")
Common Mistake: Forgetting to generate unique NEXTAUTH_SECRET, SALT, ENCRYPTION_KEY. The defaults in docs are placeholders. Generate per-deployment.
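If `openssl` is not available, Python's standard `secrets` module can produce equivalent values. This is a minimal sketch; the function name `generate_langfuse_secrets` is hypothetical, and only the 64-hex-character shape of ENCRYPTION_KEY is taken from the `openssl rand -hex 32` command above. The formats chosen for the other two values are an assumption.

```python
import secrets


def generate_langfuse_secrets():
    """Generate fresh random values for the three secrets named above."""
    return {
        # Assumption: any sufficiently long random string is acceptable here.
        "NEXTAUTH_SECRET": secrets.token_urlsafe(32),
        "SALT": secrets.token_urlsafe(32),
        # 32 random bytes as 64 hex characters, the same shape as `openssl rand -hex 32`.
        "ENCRYPTION_KEY": secrets.token_hex(32),
    }


generated = generate_langfuse_secrets()
for name, value in generated.items():
    # Lines are in KEY=value form, ready to paste into an env file.
    print(f"{name}={value}")
```

Run it once per deployment and store the output in your secret manager rather than committing it alongside the compose file.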
Still Not Working?
A few less-obvious failures:
- Traces appear but with wrong project. Multiple API key sets exist. Verify which project your LANGFUSE_PUBLIC_KEY belongs to.
- flush() hangs. Network issue or wrong host. Check LANGFUSE_HOST is reachable.
- High latency on every request. SDK is sync by default in some configs. Ensure batching is enabled; Langfuse batches every ~1 second by default.
- Some LangChain calls untracked. Specific LangChain components may not call back. Use the @observe() decorator to wrap them manually.
- Token counts wrong. Some models (Anthropic, Bedrock) report differently. Map the response’s usage to Langfuse’s expected format.
- Streaming generations lose output. Stream events fire incrementally; aggregate the final output before logging. The OpenAI wrapper handles this; manual code must accumulate.
- Self-hosted Langfuse out of disk. Traces accumulate. Set retention policies in Dashboard → Settings, or drop old data from Postgres directly.
- CallbackHandler not capturing async ops. For async LangChain, use the async handler form: from langfuse.callback import AsyncCallbackHandler (older versions) or just pass CallbackHandler; recent versions support both.
- Lambda/serverless traces missing the last call. The container freezes before the SDK’s background flush thread runs. Explicit langfuse.flush() at the end of every handler is required; atexit does not fire in Lambda.
- Traces appear with correct hierarchy but input/output are null. You passed raw objects with non-serializable values (file handles, Pydantic models without .model_dump()). Langfuse serializes to JSON before send; non-JSON types silently become null. Convert with .model_dump(), dict(), or a custom serializer first.
- Self-hosted Langfuse is slow. On older versions backed only by Postgres, large workspaces degrade as the traces table grows. Recent versions split storage between Postgres (metadata) and ClickHouse (event data). Upgrade and run the migration if you’re on a v2.x release.
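For the null input/output case, a small pre-serialization step makes the conversion explicit before anything is handed to the SDK. The function below is a best-effort sketch using only the standard library; `to_jsonable` is a hypothetical helper, and it detects Pydantic-style models by duck typing on `model_dump` rather than importing Pydantic.

```python
import dataclasses
import json


def to_jsonable(value):
    """Best-effort conversion of arbitrary objects into JSON-safe structures."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    if isinstance(value, dict):
        return {str(k): to_jsonable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set, frozenset)):
        return [to_jsonable(v) for v in value]
    if dataclasses.is_dataclass(value) and not isinstance(value, type):
        return to_jsonable(dataclasses.asdict(value))
    # Objects exposing model_dump() (for example Pydantic v2 models).
    model_dump = getattr(value, "model_dump", None)
    if callable(model_dump):
        return to_jsonable(model_dump())
    # Last resort: keep a readable string instead of losing the value.
    return repr(value)


@dataclasses.dataclass
class Doc:
    title: str
    tags: tuple


payload = to_jsonable({"doc": Doc("intro", ("a", "b")), "ids": {1}})
encoded = json.dumps(payload)  # succeeds because every value is now JSON-safe
```

Passing `to_jsonable(obj)` as the `input` or `output` of a span or generation means an unexpected type shows up as a readable string in the trace instead of disappearing.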
For related LLM observability and tracing issues, see LangChain Python not working, LiteLLM not working, OpenAI API not working, and OpenTelemetry not working.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.