Fix: Redis Streams Not Working — Consumer Groups, XACK, Pending Entries, MAXLEN, and Claiming

Q: How do I fix "Redis Streams Not Working — Consumer Groups, XACK, Pending Entries, MAXLEN, and Claiming"?

How to fix Redis Streams errors — XADD/XREAD basics, consumer group XGROUP CREATE, XACK for ack, XPENDING for stuck messages, MAXLEN ~ for trimming, XAUTOCLAIM for redelivery, and Cluster hash slot constraints.

The Error

You add to a stream but readers see nothing:

XADD events * type signup userId 42
"1716240000000-0"

# In another shell:
XREAD COUNT 10 STREAMS events 0
# Returns nil after the first read.

Or messages don’t redistribute among consumers in a group:

XREADGROUP GROUP grp1 worker-a COUNT 10 STREAMS events >
# worker-a gets all messages; worker-b sees nothing.

Or a stream grows forever despite trimming:

XADD events MAXLEN 10000 * field value
XLEN events
# Returns 50000 — way more than 10000.

Or XACK doesn’t clear the pending entries list:

XACK events grp1 1716240000000-0
XPENDING events grp1
# Still shows 100 pending entries.

Why This Happens

Redis Streams is an append-only log with consumer-group semantics, modeled after Kafka but built into Redis. The data structure looks simple from outside — entries with monotonic IDs, append on one end, read from the other — but the operational semantics carry the complexity. Each consumer group tracks its own last-delivered ID, each consumer within a group has its own Pending Entries List, and acknowledgments are explicit. Skip any of that and messages either get lost or never go away.

The most subtle area is the interaction between reads, acks, and idle timeouts. XREADGROUP with > advances the group’s cursor and adds the entry to the consumer’s PEL. The PEL is the source of truth for “in-flight” work; the cursor is just for new deliveries. If a consumer dies mid-process, the PEL still holds the entry — nothing redelivers automatically unless you run XAUTOCLAIM. Treat Streams as a database where you must explicitly mark each row “done”; otherwise it stays forever.

XREAD vs XREADGROUP. XREAD returns messages by ID (any subscriber sees everything). XREADGROUP distributes among group consumers — only one consumer per message.
> vs explicit ID. With XREADGROUP, > means “messages never delivered to this group.” Any other ID means “from the pending list.”
PEL (Pending Entries List). Once XREADGROUP delivers a message, it stays in the consumer’s PEL until XACK or claimed by another consumer. Acks are mandatory; otherwise messages accumulate.
MAXLEN with ~ is approximate. XADD events MAXLEN ~ 10000 * is fast but allows some over-shoot. Exact trimming (MAXLEN 10000) is slower.
Cluster + Streams = same key in one slot. Streams aren’t easily sharded across cluster nodes.

Fix 1: Use XADD / XREAD for Simple Publish/Subscribe

# Add an entry (auto-generated ID with *):
XADD events * type signup userId 42

# Read everything from the start:
XREAD COUNT 100 STREAMS events 0

# Read new entries (blocking):
XREAD BLOCK 5000 COUNT 100 STREAMS events $
# $ means "from the latest ID at the time of the call."

# Read by specific ID range:
XRANGE events - +
XRANGE events 1716240000000 1716250000000

XREAD is fan-out — every consumer that calls it sees the same messages. Suitable for fire-and-forget broadcast.

For Node:

import { createClient } from "redis";

const client = createClient();
await client.connect();

await client.xAdd("events", "*", { type: "signup", userId: "42" });

// Block until new messages arrive:
const messages = await client.xRead(
  { key: "events", id: "$" },
  { BLOCK: 5000, COUNT: 100 },
);

For Python:

import redis

r = redis.Redis()
r.xadd("events", {"type": "signup", "userId": 42})

messages = r.xread({"events": "$"}, block=5000, count=100)

Pro Tip: Use XADD ... NOMKSTREAM to fail if the stream doesn’t exist (avoids accidentally creating streams from typos).

Fix 2: Consumer Groups for Distributed Processing

For load-balanced consumption (one message per consumer in the group):

# Create the group (only once):
XGROUP CREATE events grp1 0 MKSTREAM
# 0 = start from the beginning; $ = only new messages
# MKSTREAM = create the stream if it doesn't exist

# Consumer A:
XREADGROUP GROUP grp1 worker-a COUNT 10 BLOCK 5000 STREAMS events >

# Consumer B (in parallel):
XREADGROUP GROUP grp1 worker-b COUNT 10 BLOCK 5000 STREAMS events >

Each worker gets distinct messages. Redis tracks which group has seen which IDs.

In Node:

// First-time setup (idempotent):
try {
  await client.xGroupCreate("events", "grp1", "0", { MKSTREAM: true });
} catch (e) {
  if (!e.message.includes("BUSYGROUP")) throw e;
}

// Consumer loop:
async function consume(workerId: string) {
  while (true) {
    const result = await client.xReadGroup(
      "grp1",
      workerId,
      { key: "events", id: ">" },
      { BLOCK: 5000, COUNT: 10 },
    );
    
    if (!result) continue;
    
    for (const { messages } of result) {
      for (const { id, message } of messages) {
        try {
          await process(message);
          await client.xAck("events", "grp1", id);
        } catch (err) {
          console.error("failed to process:", id, err);
          // Don't ack — will be redelivered or claimed later.
        }
      }
    }
  }
}

Common Mistake: Calling XGROUP CREATE on every startup without catching BUSYGROUP. The group already exists; the error is fatal. Catch and ignore.

Fix 3: ACK and PEL Management

After XREADGROUP, messages sit in your consumer’s Pending Entries List (PEL) until acked:

# Inspect the PEL:
XPENDING events grp1
# Shows: total count, min/max ID, consumer counts

XPENDING events grp1 - + 10
# Shows up to 10 pending entries with details

Without XACK, the PEL grows. Memory leak.

After successful processing:

XACK events grp1 1716240000000-0
# Returns 1 if acked, 0 if the message wasn't in the PEL (already acked or never delivered to this group).

For batch ack:

XACK events grp1 1716240000000-0 1716240000001-0 1716240000002-0
# Acks multiple at once.

Pro Tip: Always ack inside a try/finally or use exception-safe wrappers. A crash between processing and ack leaves the message in PEL for redelivery — that’s the at-least-once delivery guarantee in action.

Fix 4: MAXLEN and Trimming

Append with trim:

# Approximate trim (fast, may overshoot slightly):
XADD events MAXLEN ~ 10000 * field value

# Exact trim (slower, hits a precise limit):
XADD events MAXLEN 10000 * field value

# Cap by minimum ID (delete everything older than X):
XADD events MINID ~ 1716200000000 * field value

~ allows the trim to happen in whole macro-nodes (faster). = (or no symbol) forces exact trimming.

For periodic trim outside XADD:

XTRIM events MAXLEN ~ 10000
XTRIM events MINID 1716200000000

Run via a cron job for streams that grow during quiet periods.

For unbounded growth diagnostics:

XLEN events
XINFO STREAM events

XINFO STREAM shows total length, last-delivered ID per group, and approximate radix tree size.

Common Mistake: Setting MAXLEN per XADD but only ~ approximate, then complaining about overshoot. Switch to = for strict caps, or run a separate XTRIM periodically.

Fix 5: Claim Stalled Messages

If a consumer crashes mid-process, its PEL entries are stuck. XAUTOCLAIM (or older XPENDING + XCLAIM) reassigns them:

# Auto-claim messages idle > 60000ms (1 minute):
XAUTOCLAIM events grp1 worker-b 60000 0 COUNT 10
# 0 = start scanning from ID 0
# Returns up to 10 messages claimed by worker-b

In a loop:

async function claimStalled(workerId: string) {
  const result = await client.xAutoClaim("events", "grp1", workerId, 60000, "0", {
    COUNT: 10,
  });
  
  for (const { id, message } of result.messages) {
    try {
      await process(message);
      await client.xAck("events", "grp1", id);
    } catch (err) {
      // Will be auto-claimed again next time.
    }
  }
}

// Run claimStalled periodically alongside the main consumer loop.

XAUTOCLAIM does atomically what XPENDING + XCLAIM would do manually — and is much simpler. Available in Redis 6.2+.

For older Redis:

XPENDING events grp1 - + 10
# Returns list of pending entries with idle times.

XCLAIM events grp1 worker-b 60000 1716240000000-0
# Move that specific entry to worker-b.

Pro Tip: Run XAUTOCLAIM on a dedicated worker or in the main loop with low frequency (every 10-30 seconds). Avoid running it from many workers concurrently — they’ll fight over claims.

Fix 6: Dead Letter Queue Pattern

For messages that fail repeatedly, route to a DLQ:

async function processWithDLQ(id: string, message: Record<string, string>, retryCount: number) {
  try {
    await actuallyProcess(message);
    await client.xAck("events", "grp1", id);
  } catch (err) {
    if (retryCount >= 3) {
      // Move to DLQ:
      await client.xAdd("events:dlq", "*", {
        ...message,
        original_id: id,
        error: String(err),
        retries: String(retryCount),
      });
      await client.xAck("events", "grp1", id);
    }
    // Else: don't ack, let it be redelivered.
  }
}

Track retry counts via XPENDING delivery count:

XPENDING events grp1 - + 10 worker-a
# Each row includes: id, consumer, idle_time, delivery_count

After 3+ deliveries, your code decides to give up.

Fix 7: Cluster Constraints

Redis Cluster shards by key hash slot. Streams are a single key — events lives in one slot, on one master. You can’t split a stream across nodes.

For scaling beyond one node’s capacity:

Multiple streams. Hash partition application-side: events:{user_id % 16}.
Curly braces {hash_tag} to keep related keys on the same slot if needed.

const userId = 42;
const streamKey = `events:{${userId % 16}}`;  // {bracketed} part is the hash tag
await client.xAdd(streamKey, "*", { userId: String(userId), ... });

This sharding pattern produces 16 streams, each on potentially different cluster nodes. Consumers subscribe to all 16.

Common Mistake: Trying to use a single stream beyond ~50K msgs/sec on one node. Even fast Redis hits limits — partition by key.

Fix 8: Memory and Operational Concerns

Streams accumulate data. Without trimming, a busy stream can fill all Redis memory:

# Memory used by a stream:
MEMORY USAGE events

For tight memory budgets:

Approximate MAXLEN on every XADD. Cheap; keeps growth bounded.
Periodic XTRIM with MINID. Time-based — keep last 24 hours.
Persistence (AOF / RDB). Streams are persisted like other data; massive streams slow down BGSAVE.

For production:

# In redis.conf:
maxmemory 8gb
maxmemory-policy noeviction   # Or volatile-lru, allkeys-lru depending on use case.

noeviction makes Redis return errors on memory exhaustion rather than silently dropping data — safer for streams where eviction would corrupt the log.

Pro Tip: Use a separate Redis instance for high-volume streams. Mixing cache (eviction-tolerant) and streams (eviction-intolerant) in one Redis instance forces uncomfortable trade-offs.

Redis Streams vs Other Message Log Primitives

Redis Streams gets compared to Kafka constantly, but the design space includes NATS JetStream, RabbitMQ streams, and Apache Pulsar — each with different operational profiles. Picking based on “we already run Redis” is fine until throughput or retention requirements outgrow it.

Redis Streams. Already-deployed Redis, sub-millisecond latency, simple consumer-group model. Single-key-per-slot in Cluster limits a single stream to one node’s capacity (roughly 50-100K msgs/sec). Persistence via AOF/RDB; no native tiered storage. Sweet spot: app-internal job queues, presence, real-time fanout under 1M msgs/sec.

Apache Kafka. Partitioned log with disk-first design. Topics scale horizontally by partition count; consumers parallelize across partitions. Higher operational complexity (ZooKeeper or KRaft, brokers, schema registry), but proven at millions of msgs/sec. Long retention is cheap with tiered storage to S3. Sweet spot: org-wide event backbone, log aggregation, stream processing with ksqlDB/Flink.

NATS JetStream. Lightweight Go binary, KV + stream + object store in one. Single-binary deployment, multi-region replication via leaf nodes. Acks, consumer groups, and replay all built in. Sweet spot: microservice mesh where you want pub/sub plus durable streams without standing up Kafka.

RabbitMQ Streams. Streams added in 3.9+ as a complement to classic queues. Plays well with existing AMQP infrastructure. Lower throughput ceiling than Kafka but simpler ops if you already run RabbitMQ. Sweet spot: existing RabbitMQ shops that want log semantics without migrating.

Apache Pulsar. Brokers separated from storage (BookKeeper), tiered storage to S3 native, multi-tenancy built in. Higher cognitive load, but the cleanest scaling story for huge multi-team deployments. Sweet spot: large orgs with cross-team data and strict tenancy isolation requirements.

Rough decision rule: under 100K msgs/sec and Redis is already on the stack → Streams. Org-wide event bus with multi-day retention → Kafka. Microservice mesh wanting one tool for KV + streams → JetStream. Multi-tenant SaaS platform → Pulsar.

Still Not Working?

A few less-obvious failures:

BUSYGROUP Consumer Group name already exists. Catch this — the group exists from a previous run. Not an error.
NOGROUP No such key 'events' or consumer group 'grp1' in XREADGROUP. Stream or group missing. Run XGROUP CREATE ... MKSTREAM.
XREADGROUP returns nothing. Either no new messages (with >) or your group has consumed everything. Check XPENDING for stuck entries.
Memory usage grows even with MAXLEN. Approximate trim allows some overshoot. Use = (exact) periodically or a smaller threshold.
Producer faster than consumer. Backlog grows. Either add consumers, faster processing, or accept lag.
Different libraries return different formats. node-redis v4 differs from ioredis. Always check your client’s docs.
Crash recovery loses recent messages. AOF with appendfsync everysec may lose up to 1 second. For critical streams, use appendfsync always (slower) or external persistence (Kafka).
XADD with explicit ID fails. XADD events 123-0 * — the ID must be greater than the stream’s last ID. Use * to auto-generate or pick a higher ID.
PEL grows after consumer rename. Each unique consumer name keeps its own PEL even after the worker is gone. Use XGROUP DELCONSUMER events grp1 worker-old to clean up.
XINFO CONSUMERS shows ghost workers. Same root cause — consumers persist by name. Periodically delete consumers idle for hours/days.
min-idle-time mismatch in XAUTOCLAIM. The threshold compares against the entry’s idle time, not when the consumer last responded. Set it long enough that healthy workers don’t get their work stolen.

For related Redis and event streaming issues, see Redis connection refused, Redis pub sub not working, Valkey not working, and Kafka not working.