Fix: Langfuse Not Working — SDK Init, Tracing Generations, LangChain Wrapper, and Self-Hosted Setup

Q: How do I fix "Langfuse Not Working — SDK Init, Tracing Generations, LangChain Wrapper, and Self-Hosted Setup"?

How to fix Langfuse errors — Python/JS SDK init, trace/span/generation hierarchy, LangChain CallbackHandler, OpenAI wrapper, missing usage/cost data, prompt management, and self-hosted Postgres setup.

The Error

You initialize Langfuse but no traces appear in the dashboard:

from langfuse import Langfuse
langfuse = Langfuse()

# Make some LLM calls...
# Dashboard stays empty.

Or the LangChain integration doesn’t attach traces:

from langfuse.callback import CallbackHandler
handler = CallbackHandler()

result = llm.invoke("Hello", config={"callbacks": [handler]})
# No trace shows in Langfuse.

Or generations have no token usage or cost:

langfuse.generation(
    name="gpt-4o-call",
    model="gpt-4o",
    input=...,
    output=...,
    # usage missing!
)

Or self-hosted Langfuse can’t connect to Postgres:

Error: Connection refused at db:5432

Why This Happens

Langfuse is an open-source LLM observability platform. It ingests traces (one per request or session), spans (units of work within a trace), and generations (LLM calls) via SDKs and ships them to a backend (cloud-hosted or self-hosted), where they’re stored, indexed, and made queryable through a UI. The architecture is deliberately fire-and-forget: SDKs batch events in memory and flush them in the background, so that observability adds as little latency as possible to the user’s request. That decoupling is the source of most “I don’t see my traces” reports: the SDK accepted the data, but the process exited before the batch was sent.
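To make the failure mode concrete, here is a toy model of a batching client. It is a sketch of the general fire-and-forget pattern, not Langfuse's actual implementation; the class name ToyBatchingClient and its methods are hypothetical and use only the standard library.

```python
import queue
import threading
import time


class ToyBatchingClient:
    """Stand-in for an SDK that buffers events and ships them from a background thread."""

    def __init__(self, flush_interval: float = 1.0):
        self.sent = []                   # events that reached the "backend"
        self._queue = queue.Queue()      # events accepted but not yet shipped
        self._flush_interval = flush_interval
        # Daemon thread: it dies with the process, taking unsent events with it.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def track(self, event: dict) -> None:
        # Returns immediately; the caller never waits on the network.
        self._queue.put(event)

    def _drain(self) -> None:
        while True:
            try:
                self.sent.append(self._queue.get_nowait())
            except queue.Empty:
                return

    def _run(self) -> None:
        while True:
            time.sleep(self._flush_interval)
            self._drain()

    def flush(self) -> None:
        # Ship whatever is still buffered, on the caller's thread.
        self._drain()


client = ToyBatchingClient(flush_interval=60.0)
client.track({"name": "answer-question"})
print(len(client.sent))  # 0: accepted, but still sitting in the buffer
client.flush()
print(len(client.sent))  # 1: shipped only because flush() ran before exit
```

If the script ended right after track(), the event would never leave the buffer, which is the same shape as a short script that exits before the real SDK's background flush runs.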

Common root causes:

Process exits before the flush. The SDKs send data asynchronously. If your process exits before the queued traces are flushed, the data is lost. Always call flush() before exit.
Env vars vs constructor args. SDKs read LANGFUSE_SECRET_KEY / LANGFUSE_PUBLIC_KEY / LANGFUSE_HOST. A wrong host (the default is Langfuse Cloud) or wrong keys fail silently, because the SDK doesn’t raise on auth failure: the failed requests sit in a retry queue and disappear when the process exits.
LangChain integration is a callback. You must pass the CallbackHandler as a callback when invoking chains/LLMs. Without it, nothing is instrumented.
Token usage requires explicit data (or the OpenAI wrapper). Manual generations need usage filled in; Langfuse won’t compute it for you. If you log a generation with output but no usage, the trace appears but cost/token metrics are blank.

A subtler failure mode is project/key mismatch. Langfuse organizes data into projects, and each project has its own public/secret key pair. If you generate one set of keys, copy them to a teammate, and they later create a second project, dashboards split: half your team’s traces go to project A, half to project B, and neither view shows the full picture. Always confirm the project ID in the dashboard URL matches the project you intended.
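Because bad configuration fails silently, a preflight check at startup can help. The sketch below uses only the standard library and checks the three environment variables named above. The function name check_langfuse_env is hypothetical, and the pk-lf- / sk-lf- prefixes are assumed from the key examples in this article.

```python
import os
from typing import List, Mapping, Optional


def check_langfuse_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return human-readable problems; an empty list means the basics look sane."""
    env = os.environ if env is None else env
    problems = []

    public_key = env.get("LANGFUSE_PUBLIC_KEY", "")
    secret_key = env.get("LANGFUSE_SECRET_KEY", "")
    host = env.get("LANGFUSE_HOST", "")

    # Prefixes are an assumption based on the placeholder keys shown in this article.
    if not public_key.startswith("pk-lf-"):
        problems.append("LANGFUSE_PUBLIC_KEY is missing or does not start with 'pk-lf-'")
    if not secret_key.startswith("sk-lf-"):
        problems.append("LANGFUSE_SECRET_KEY is missing or does not start with 'sk-lf-'")
    if not host:
        problems.append("LANGFUSE_HOST is unset; the SDK will fall back to its default host")
    elif not host.startswith(("http://", "https://")):
        problems.append("LANGFUSE_HOST should include the scheme, e.g. https://...")
    return problems


for problem in check_langfuse_env():
    print("config problem:", problem)
```

This only validates the shape of the values. It cannot tell you which project a key belongs to, so the dashboard check described above still applies. If your SDK version provides langfuse.auth_check(), calling it once at startup additionally verifies the keys against the server.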

How Other Tools Handle This

LLM observability is now a crowded category. Each tool makes different choices about what to capture, where the data lives, and how invasive the instrumentation is.

Langfuse. Open-source backend plus Python/JS SDKs. Trace/span/generation primitives; a drop-in OpenAI wrapper and decorator-based instrumentation for other providers; LangChain CallbackHandler; built-in prompt management and evals. Strengths: self-hostable on Postgres + ClickHouse, predictable pricing, evals and prompt versioning in one product. Weaknesses: SDK still requires explicit flush() for short-lived processes; usage normalization across providers is your job.
Helicone. Proxy-based. You change base_url to Helicone’s URL and every OpenAI call flows through it. Strengths: zero-code instrumentation if you control the client config, automatic caching, rate-limit shimming. Weaknesses: adds a network hop, harder to use with non-HTTP providers (Bedrock SDK doesn’t proxy easily), only sees what the proxy sees.
LangSmith. LangChain’s first-party observability tool. Tightest integration with LangChain primitives (Runnables, LangGraph). Strengths: best-in-class for LangChain users, integrated evals, no extra config when you already use LangChain. Weaknesses: hosted-only, opinionated toward the LangChain stack, more expensive at scale.
Arize Phoenix. Open-source, OpenTelemetry-native. Strengths: builds on OTel semantic conventions for LLMs (the GenAI working group’s spec), so traces work across multiple backends. Weaknesses: setup is more OTel-flavored (collectors, exporters, semantic attributes) and not as plug-and-play for one-script demos.
OpenLLMetry (Traceloop). OTel instrumentation library, vendor-neutral. Strengths: works with any OTel backend (Datadog, Honeycomb, Jaeger, Langfuse). Weaknesses: you still need a backend; for “just give me a UI” use cases, you end up pairing it with one of the above.

If your stack is LangChain-only and you want minimum config, LangSmith wins. If you want self-hosted and complete control, Langfuse or Phoenix. If you want zero-code via proxy, Helicone. The “traces don’t appear” debugging steps below are most acute on SDK-based tools (Langfuse, LangSmith) because they require explicit flush; proxy-based tools (Helicone) instead suffer from “the proxy isn’t being used.”

Fix 1: Initialize the SDK Correctly

Python:

from langfuse import Langfuse

langfuse = Langfuse(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="https://cloud.langfuse.com",  # or "https://us.cloud.langfuse.com" or your self-hosted URL
)

Or via env vars (cleaner):

# .env
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com

from langfuse import Langfuse
langfuse = Langfuse()  # Reads env vars

JS:

import { Langfuse } from "langfuse";

const langfuse = new Langfuse({
  publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
  secretKey: process.env.LANGFUSE_SECRET_KEY!,
  baseUrl: process.env.LANGFUSE_HOST,
});

Critical: flush before exit. Traces are batched and sent async. Without a flush, short scripts lose data:

# At the end of your script:
langfuse.flush()

// JS:
await langfuse.shutdownAsync();

For long-running services (web servers), Langfuse flushes periodically — no explicit flush needed. For batch scripts, lambdas, CLIs — always flush.

Pro Tip: Add a process exit handler:

import atexit
atexit.register(langfuse.flush)

Catches the common “forgot to flush” mistake.
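Where atexit is not reliable (serverless handlers, for example), one option is to flush in a finally block after each invocation. The decorator below is a minimal sketch. The name flush_after is hypothetical, and it assumes only that the client object has a flush() method, as the Langfuse Python client does.

```python
import functools
from typing import Any, Callable


def flush_after(client: Any) -> Callable:
    """Decorator factory: call client.flush() after the wrapped function, even if it raises."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on both success and failure, before control returns to the runtime.
                client.flush()
        return wrapper
    return decorator


# Hypothetical usage with a real client:
#
#   langfuse = Langfuse()
#
#   @flush_after(langfuse)
#   def handler(event, context):
#       ...
```

The trade-off is that each invocation now waits for the flush, which is usually acceptable for batch jobs and short handlers but adds latency on hot paths.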

Fix 2: Trace Hierarchy — Trace → Span → Generation

The three primary primitives:

Trace — a request/session, holds the overall context.
Span — a unit of work within a trace (e.g. “retrieve documents,” “format prompt”).
Generation — an LLM call (special span type with model, input, output, usage).

# Create a trace:
trace = langfuse.trace(name="answer-question", user_id="user-42")

# Add a span:
span = trace.span(name="retrieve-docs", input={"query": "..."})
# ... do work ...
span.end(output=[{"doc": "..."}])

# Add a generation:
generation = trace.generation(
    name="generate-answer",
    model="gpt-4o",
    input=messages,
    output={"role": "assistant", "content": "..."},
    usage={
        "prompt_tokens": 250,
        "completion_tokens": 50,
        "total_tokens": 300,
    },
    model_parameters={"temperature": 0.7},
)

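For nested spans (an LLM call inside a span inside a trace), create the generation from the span, not from the trace:

trace = langfuse.trace(name="rag-query")

span = trace.span(name="retrieve")
# ... retrieval logic ...

# Created from the span, so it nests under "retrieve" instead of sitting beside it
generation = span.generation(
    name="rewrite-query",
    model="gpt-4o",
    input=...,
    output=...,
)
span.end()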

For deeper nesting, span children:

parent_span = trace.span(name="parent")
child_span = parent_span.span(name="child")
child_span.end()
parent_span.end()
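The object you call .span() or .generation() on decides where the new node lands in the tree. The following standard-library sketch illustrates that parent/child relationship only. The Node class and render function are hypothetical and are not part of the Langfuse SDK.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str                     # "trace", "span", or "generation"
    name: str
    children: List["Node"] = field(default_factory=list)

    def add(self, kind: str, name: str) -> "Node":
        # The new node becomes a child of whichever node add() was called on.
        child = Node(kind, name)
        self.children.append(child)
        return child


def render(node: Node, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, one per node."""
    lines = ["  " * depth + f"{node.kind}: {node.name}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


trace = Node("trace", "rag-query")
retrieve = trace.add("span", "retrieve")
retrieve.add("generation", "rewrite-query")   # called on the span: nested under it
trace.add("generation", "answer")             # called on the trace: a sibling of the span
print("\n".join(render(trace)))
# trace: rag-query
#   span: retrieve
#     generation: rewrite-query
#   generation: answer
```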

Common Mistake: Forgetting .end() on spans. Without end, spans show “in progress” forever and duration is wrong.
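One way to make a missing .end() impossible is to wrap span creation in a context manager. This is a sketch, not an SDK feature. The helper name ended_span is hypothetical, it assumes the parent exposes .span(name=...) and the span exposes .end(), and the level / status_message keyword arguments passed to end() on failure are an assumption to check against your SDK version.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def ended_span(parent: Any, name: str, **kwargs: Any) -> Iterator[Any]:
    """Open parent.span(...) and always call .end(), recording the error if one occurs."""
    span = parent.span(name=name, **kwargs)
    try:
        yield span
    except Exception as exc:
        # Assumed keyword arguments; drop them if your SDK's end() does not accept them.
        span.end(level="ERROR", status_message=str(exc))
        raise
    else:
        span.end()


# Hypothetical usage with a real trace object:
#
#   with ended_span(trace, "retrieve-docs", input={"query": "..."}) as span:
#       docs = retrieve(query)
```

Because the exception is re-raised, application error handling is unchanged; the span simply closes with the failure attached instead of staying "in progress".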

For convenient decorators:

from langfuse.decorators import observe

@observe()
def my_function(query: str):
    docs = retrieve(query)
    return generate(query, docs)

# Every call to my_function creates a trace automatically.

Fix 3: LangChain Integration

Pass the CallbackHandler to LangChain operations:

from langfuse.callback import CallbackHandler
from langchain_openai import ChatOpenAI  # pip install langchain-openai; the langchain.chat_models path is deprecated

langfuse_handler = CallbackHandler(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="https://cloud.langfuse.com",
)

llm = ChatOpenAI(model="gpt-4o")
result = llm.invoke("Hello", config={"callbacks": [langfuse_handler]})

The handler captures the LLM call as a generation, including prompts, response, and token usage (from OpenAI’s response metadata).

For chains:

from langchain_core.prompts import ChatPromptTemplate

chain = ChatPromptTemplate.from_template("Answer: {q}") | llm

result = chain.invoke({"q": "What is Python?"}, config={"callbacks": [langfuse_handler]})

The chain’s full structure (each runnable, retrieval steps, LLM calls) traces as a hierarchy.

For LangGraph:

from langfuse.callback import CallbackHandler

config = {"callbacks": [CallbackHandler()]}
result = graph.invoke({"input": "..."}, config=config)

LangGraph’s nodes show as nested spans under the graph’s trace.

Common Mistake: Not passing config={"callbacks": [...]}. Without it, LangChain uses the default callback manager, which doesn’t include Langfuse. Pass the handler on every call, or bind it once with chain.with_config({"callbacks": [langfuse_handler]}) so every invocation of that runnable carries it.

Fix 4: OpenAI / Anthropic Wrappers

The simplest auto-instrumentation: wrap your OpenAI client.

Python:

from langfuse.openai import openai  # Drop-in replacement

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
# Logged to Langfuse automatically — no extra code.

The langfuse.openai module exports an openai that proxies the real OpenAI SDK and logs every call.

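For Anthropic, the v2 Python SDK documentation does not list an equivalent drop-in module; the documented route is to wrap the call with @observe(as_type="generation") and report model and usage yourself:

from anthropic import Anthropic
from langfuse.decorators import observe, langfuse_context

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

@observe(as_type="generation")
def claude_call(messages, model="claude-3-5-sonnet-20241022"):
    langfuse_context.update_current_observation(model=model, input=messages)
    response = client.messages.create(model=model, max_tokens=1024, messages=messages)
    # Anthropic reports input_tokens / output_tokens; map them to Langfuse's keys
    langfuse_context.update_current_observation(
        usage={"input": response.usage.input_tokens, "output": response.usage.output_tokens}
    )
    return response.content[0].text

claude_call([{"role": "user", "content": "Hello"}])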

JS:

import OpenAI from "openai";
import { observeOpenAI } from "langfuse";

const openai = observeOpenAI(new OpenAI());
const response = await openai.chat.completions.create({...});

These wrappers automatically extract prompt, response, model, and token usage. No manual generation() calls needed.

Pro Tip: Use the wrapper for the bulk of your LLM calls. Reserve manual generation() for cases where the wrapper doesn’t fit (custom protocols, batched calls).

Fix 5: Cost Calculation

Langfuse computes cost from usage + model name. For built-in OpenAI/Anthropic models, costs are pre-defined. For custom models, register them in the Langfuse dashboard:

Dashboard → Settings → Models → Add model
  Match pattern: my-custom-model.*
  Input price: $0.001 per 1K tokens
  Output price: $0.002 per 1K tokens

Or via API for self-hosted:

curl -X POST "$LANGFUSE_HOST/api/public/models" \
  -u "$LANGFUSE_PUBLIC_KEY:$LANGFUSE_SECRET_KEY" \
  -H "content-type: application/json" \
  -d '{
    "modelName": "my-custom-model",
    "matchPattern": "my-custom-model.*",
    "inputPrice": 0.000001,
    "outputPrice": 0.000002,
    "tokenizerId": "openai",
    "tokenizerConfig": {"tokenizerModel": "gpt-4"},
    "unit": "TOKENS"
  }'
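The dashboard form quotes prices per 1K tokens while the API payload uses per-token prices, so it is worth checking the conversion. The sketch below assumes cost is simply tokens multiplied by the per-token price for input and output; the function names are hypothetical.

```python
def per_token_price(usd_per_1k_tokens: float) -> float:
    """Convert a '$ per 1K tokens' list price into a per-token price."""
    return usd_per_1k_tokens / 1000


def generation_cost(input_tokens: int, output_tokens: int,
                    input_price: float, output_price: float) -> float:
    """Cost in USD for one generation, given per-token prices."""
    return input_tokens * input_price + output_tokens * output_price


# The dashboard example: $0.001 and $0.002 per 1K tokens.
input_price = per_token_price(0.001)    # 0.000001 per token, as in the curl payload
output_price = per_token_price(0.002)   # 0.000002 per token

# The usage numbers from Fix 2: 250 prompt tokens, 50 completion tokens.
cost = generation_cost(250, 50, input_price, output_price)
print(f"{cost:.6f}")  # 0.000350
```

If the cost shown in the dashboard is off by a factor of 1000, a per-1K price pasted into a per-token field (or the reverse) is a likely cause.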

Common Mistake: Logging usage under keys Langfuse doesn’t recognize. It accepts either its own keys (input, output, total) or the OpenAI-style keys (prompt_tokens, completion_tokens, total_tokens). Anthropic’s input_tokens and output_tokens match neither, so map them:

langfuse.generation(
    name="claude-call",
    model="claude-3-5-sonnet",
    input=...,
    output=...,
    usage={
        "input": response.usage.input_tokens,    # Langfuse's own key
        "output": response.usage.output_tokens,
        "total": response.usage.input_tokens + response.usage.output_tokens,
    },
)

The OpenAI-style keys shown in Fix 2 work as well; Langfuse maps them to input/output internally.
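If you log generations from more than one provider, a small normalizing helper keeps the key mapping in one place. This is a sketch under the assumption that usage arrives as a plain mapping with either Anthropic-style or OpenAI-style keys. The function name to_langfuse_usage is hypothetical.

```python
from typing import Any, Dict, Mapping


def to_langfuse_usage(usage: Mapping[str, Any]) -> Dict[str, int]:
    """Normalize Anthropic- or OpenAI-style usage into input/output/total keys."""
    if "input_tokens" in usage:            # Anthropic-style
        inp, out = usage["input_tokens"], usage["output_tokens"]
    elif "prompt_tokens" in usage:         # OpenAI-style
        inp, out = usage["prompt_tokens"], usage["completion_tokens"]
    else:
        # Fail loudly rather than log a generation with blank cost metrics.
        raise ValueError(f"unrecognized usage keys: {sorted(usage)}")
    return {"input": inp, "output": out, "total": inp + out}


print(to_langfuse_usage({"input_tokens": 250, "output_tokens": 50}))
# {'input': 250, 'output': 50, 'total': 300}
```

Provider SDKs typically return usage as an object rather than a dict, so in practice you would pass something like {"input_tokens": response.usage.input_tokens, "output_tokens": response.usage.output_tokens}.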

Fix 6: Sampling for Cost Control

In production, sample to reduce ingestion volume:

import random

# Sample 10%:
if random.random() < 0.1:
    trace = langfuse.trace(...)
    # ... full instrumentation
else:
    # Run without Langfuse
    pass

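Or use the SDK’s built-in sampling, which is configured on the client (or through the LANGFUSE_SAMPLE_RATE environment variable) rather than on individual traces:

langfuse = Langfuse(
    sample_rate=0.1,  # roughly 10% of traces are ingested
)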

For LangChain:

handler = CallbackHandler(sample_rate=0.1)

Sampling is per-trace — once a trace is included, all its spans/generations are. No partial traces.

Pro Tip: Sample at request boundaries (entry points), not at every span. Including a full trace or none keeps the data consistent.
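The random.random() approach makes an independent decision each time it is called, so two services handling the same request can disagree. A deterministic alternative is to hash a stable request ID. This is a sketch of that general technique, not a description of how Langfuse samples internally; should_sample and the request ID format are hypothetical.

```python
import hashlib


def should_sample(request_id: str, rate: float) -> bool:
    """Deterministically keep roughly `rate` of requests, keyed on a stable ID."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64) and compare against the threshold.
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < rate * 2**64


# The same ID always yields the same decision, in any process.
decision = should_sample("req-12345", 0.1)
assert decision == should_sample("req-12345", 0.1)
```

Every component that sees the same request ID reaches the same verdict, which preserves the "full trace or none" property across process boundaries.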

Fix 7: Prompt Management

Manage prompts centrally:

# Fetch the latest version of a prompt:
prompt = langfuse.get_prompt("answer-prompt")
formatted = prompt.compile(question="What is Python?")
# Use `formatted` as the LLM input.

Prompts are versioned in Langfuse Dashboard → Prompts. Updates to prompts don’t require code deploys — fetch the latest at runtime.

For caching to avoid hitting Langfuse on every request:

prompt = langfuse.get_prompt("answer-prompt", cache_ttl_seconds=300)
# Cached for 5 minutes.

For version pinning:

prompt = langfuse.get_prompt("answer-prompt", version=3)

To link traces to prompts:

trace = langfuse.trace(name="...")
generation = trace.generation(
    name="...",
    prompt=prompt,  # Links this generation to the prompt version
    model="gpt-4o",
    input=...,
    output=...,
)

Langfuse shows aggregate metrics per prompt version (latency, cost, error rate) — useful for A/B testing prompts.

Fix 8: Self-Hosted Setup

Langfuse is open source and can be self-hosted. Docker Compose:

# docker-compose.yml
services:
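  db:
    image: postgres:16
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: postgres
    volumes:
      - db_data:/var/lib/postgresql/data
    # Report healthy only once Postgres accepts connections
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 3s
      timeout: 3s
      retries: 10

  langfuse:
    # Pin the v2 line: this Postgres-only stack matches v2; newer major versions need additional services
    image: langfuse/langfuse:2
    depends_on:
      db:
        # Wait for the health check, not just container start (avoids "Connection refused at db:5432")
        condition: service_healthy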
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgresql://postgres:postgres@db:5432/postgres
      NEXTAUTH_SECRET: "your-secret-here"
      NEXTAUTH_URL: "http://localhost:3000"
      SALT: "your-salt-here"
      ENCRYPTION_KEY: "0000000000000000000000000000000000000000000000000000000000000000"
      TELEMETRY_ENABLED: "false"

volumes:
  db_data:

docker compose up -d
# Open http://localhost:3000
# Sign up for the first admin user.
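The "Connection refused at db:5432" error usually means something tried to connect before Postgres was listening. For scripts that run outside Compose (migrations, smoke tests), a small standard-library wait loop can serve the same purpose. This is a sketch: wait_for_port is a hypothetical helper, and a successful TCP connection only shows that the port is open, not that the database has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError) until something listens.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Hypothetical usage before running anything that needs the database:
#
#   if not wait_for_port("localhost", 5432, timeout=60):
#       raise SystemExit("Postgres never became reachable")
```

Note that the compose file above does not publish port 5432 to the host, so from the host this check only works if you add a ports mapping for the db service; inside the Compose network the hostname is db.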

Generate the encryption key:

openssl rand -hex 32

For production deployment with HA:

External Postgres (RDS, Cloud SQL, etc.).
ClickHouse for analytics (newer Langfuse versions).
Object storage (S3/R2) for large traces.

Point your SDK at the self-hosted URL:

langfuse = Langfuse(host="https://langfuse.example.com")

Common Mistake: Forgetting to generate unique NEXTAUTH_SECRET, SALT, ENCRYPTION_KEY. The defaults in docs are placeholders. Generate per-deployment.
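If openssl is not available, the same values can be generated with Python's secrets module. In this sketch the ENCRYPTION_KEY mirrors openssl rand -hex 32 (64 hex characters); the format used for NEXTAUTH_SECRET and SALT is an assumption (any long random string), and the function name is hypothetical.

```python
import base64
import secrets


def generate_langfuse_secrets() -> dict:
    """Fresh random values for the three placeholders in the compose file."""
    return {
        # 32 random bytes as 64 hex characters, the same shape as `openssl rand -hex 32`
        "ENCRYPTION_KEY": secrets.token_hex(32),
        # Format is an assumption: base64 of 32 random bytes
        "NEXTAUTH_SECRET": base64.b64encode(secrets.token_bytes(32)).decode("ascii"),
        "SALT": base64.b64encode(secrets.token_bytes(32)).decode("ascii"),
    }


for key, value in generate_langfuse_secrets().items():
    print(f'{key}: "{value}"')
```

Run it once per deployment and store the output in your secret manager rather than committing it to the compose file.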

Still Not Working?

A few less-obvious failures:

Traces appear but with wrong project. Multiple API key sets exist. Verify which project your LANGFUSE_PUBLIC_KEY belongs to.
flush() hangs. Network issue or wrong host. Check LANGFUSE_HOST is reachable.
High latency on every request. SDK is sync by default in some configs. Ensure batching is enabled — Langfuse batches every ~1 second by default.
Some LangChain calls untracked. Specific LangChain components may not call back. Use the @observe() decorator to wrap them manually.
Token counts wrong. Some models (Anthropic, Bedrock) report differently. Map the response’s usage to Langfuse’s expected format.
Streaming generations lose output. Stream events fire incrementally; aggregate the final output before logging. The OpenAI wrapper handles this; manual code must accumulate.
Self-hosted Langfuse out of disk. Traces accumulate. Set retention policies in Dashboard → Settings, or drop old data from Postgres directly.
CallbackHandler not capturing async ops. For async LangChain, use the async handler form: from langfuse.callback import AsyncCallbackHandler (older versions) or just pass CallbackHandler — recent versions support both.
Lambda/serverless traces missing the last call. The container freezes before the SDK’s background flush thread runs. Explicit langfuse.flush() at the end of every handler is required; atexit does not fire in Lambda.
Traces appear with correct hierarchy but input/output are null. You passed raw objects with non-serializable values (file handles, Pydantic models without .model_dump()). Langfuse serializes to JSON before send; non-JSON types silently become null. Convert with .model_dump(), dict(), or a custom serializer first.
Self-hosted Langfuse is slow. On older versions backed only by Postgres, large workspaces degrade as the traces table grows. Recent versions split storage between Postgres (metadata) and ClickHouse (event data). Upgrade and run the migration if you’re on a v2.x release.
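For the null input/output case in the list above, a conversion pass before logging avoids relying on the SDK's serializer. The following is a best-effort sketch using only the standard library. The helper name to_jsonable is hypothetical, and falling back to repr() for unknown types is a design choice, not SDK behavior.

```python
import dataclasses
import datetime
import json
from typing import Any


def to_jsonable(obj: Any) -> Any:
    """Best-effort conversion of common Python objects into JSON-safe values."""
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    if hasattr(obj, "model_dump"):                        # Pydantic v2 models
        return to_jsonable(obj.model_dump())
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return to_jsonable(dataclasses.asdict(obj))
    if isinstance(obj, dict):
        return {str(k): to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set, frozenset)):
        return [to_jsonable(v) for v in obj]
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    return repr(obj)                                      # last resort: keep something readable


payload = to_jsonable({"when": datetime.date(2024, 1, 2), "tags": ("a", "b")})
print(json.dumps(payload))  # {"when": "2024-01-02", "tags": ["a", "b"]}
```

Passing input=to_jsonable(value) to a span or generation means a stray file handle shows up as its repr string instead of silently nulling the whole field.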

For related LLM observability and tracing issues, see LangChain Python not working, LiteLLM not working, OpenAI API not working, and OpenTelemetry not working.