
Fix: NATS Not Working — Connection Auth, JetStream Streams, Consumer Ack, and Subject Wildcards

FixDevs


Quick Answer

How to fix NATS errors — no responders to request, JetStream stream not found, consumer redelivery loop, durable vs ephemeral consumers, subject wildcard mismatch, TLS auth setup, and KV bucket basics.

The Error

You call nc.request(...) and get this:

nats: no responders available for request

Or your JetStream publish fails because the stream doesn’t exist:

nats: stream not found: ORDERS

Or the consumer keeps redelivering the same message:

[consumer] received msg seq=1
[consumer] received msg seq=1   ← same one again, 30 seconds later
[consumer] received msg seq=1

Or a wildcard subscription matches nothing:

await nc.subscribe("orders.*.processed", cb=handle)
# Publishing to "orders.us.east.processed" → no delivery.

Why This Happens

NATS has two layers and most issues come from mixing them up:

  • Core NATS — fire-and-forget pub/sub and request/reply. No persistence. A request without a subscriber gets “no responders.” Subscribers come and go; nothing stored.
  • JetStream — persistent messaging built on top. Streams capture messages by subject pattern. Consumers (durable or ephemeral) read from streams with ack semantics.

Other common pitfalls:

  • Subject wildcards. * matches one token; > matches one or more. orders.* matches orders.a but not orders.a.b. orders.> matches both.
  • Ack timeouts. A consumer that doesn’t ack within AckWait (default 30s) gets the message redelivered. Long-running handlers must extend the deadline or ack early.
  • Durable vs ephemeral consumers. Durable consumers survive client restarts and resume from their last acked position. Ephemerals are tied to a connection and disappear with it.
  • TLS and auth. Bare nats://... is unencrypted and unauthenticated. Production needs tls:// or nats://user:pass@... or NKeys/JWT.

The deeper architectural reason for so many “messages disappear” reports is that Core NATS was originally designed as a real-time bus, closer in spirit to UDP multicast than to Kafka. If nobody is listening, the message is dropped on the floor. JetStream was added in 2020 specifically to bolt persistence onto that bus, but it lives as a layer above the wire protocol rather than inside it. The result is that the same client can publish to a subject captured by a stream (durable) and to a plain Core NATS subject (not durable) with identical-looking code; the only thing that decides which path a message takes is whether some stream’s subject filter matches it. Teams who treat NATS as “always durable like Kafka” eventually lose a message and only then discover the layering.

JetStream’s ack semantics are also subtly different from RabbitMQ’s. In RabbitMQ, an unacked message is held by the broker until the consumer’s channel closes; in JetStream, an unacked message is redelivered after AckWait, which defaults to 30 seconds. A consumer that processes a long-running job (image transcoding, ML inference, anything pushing past 30s) receives the same message again every AckWait interval while the first attempt is still running, up to max_deliver times (unlimited by default), so the same work can be repeated indefinitely. The fix is either msg.in_progress() to extend the deadline, or a longer ack_wait on the consumer config; either way, the default is calibrated for fast handlers, not for batch jobs.

Fix 1: “No Responders” Means No Subscriber

Request/reply needs at least one subscriber:

# Subscriber side:
import asyncio
import nats

async def main():
    nc = await nats.connect("nats://localhost:4222")
    
    async def handler(msg):
        await msg.respond(b"pong")
    
    await nc.subscribe("ping", cb=handler)
    await asyncio.sleep(3600)  # Stay alive

asyncio.run(main())

# Requester side (run as a separate process):
import asyncio
import nats

async def main():
    nc = await nats.connect("nats://localhost:4222")
    try:
        response = await nc.request("ping", b"", timeout=1.0)
        print(response.data)
    except nats.errors.NoRespondersError:
        print("No subscriber on 'ping'")
    await nc.close()

asyncio.run(main())

Check both sides are connected to the same NATS server and using the same subject. From a NATS CLI:

nats sub "ping"          # Listen
nats req "ping" ""       # Send

Pro Tip: For mission-critical request/reply patterns, never rely on a single responder being online. Use JetStream with consumer pulls so messages queue when no one is listening.
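
When a responder is only briefly absent (a rolling deploy, a service that starts a moment after its caller), retrying the request a few times is often enough. The following is a minimal sketch, not part of nats-py: request_with_retry and its parameters are hypothetical names, and the caller is assumed to pass nats.errors.NoRespondersError as the exception to retry on.

```python
import asyncio


async def request_with_retry(send, attempts=5, delay=0.2, retry_on=(Exception,)):
    """Await send() until it succeeds or `attempts` tries are used up.

    `send` is a zero-argument coroutine function, for example
    lambda: nc.request("ping", b"", timeout=1.0) (assumed usage).
    `retry_on` is the tuple of exception types that trigger a retry.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await send()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Linear backoff: wait a little longer after each failure.
            await asyncio.sleep(delay * attempt)
```

With a real connection this would be called as request_with_retry(lambda: nc.request("ping", b""), retry_on=(nats.errors.NoRespondersError,)). Retrying only papers over short gaps; it does not queue the request the way a JetStream stream would.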

Fix 2: Create the JetStream Stream Before Publishing

JetStream needs explicit stream creation:

import nats
from nats.js.api import StreamConfig, RetentionPolicy

async def setup():
    nc = await nats.connect("nats://localhost:4222")
    js = nc.jetstream()
    
    await js.add_stream(
        name="ORDERS",
        subjects=["orders.>"],
        retention=RetentionPolicy.LIMITS,
        max_msgs=1_000_000,
        max_bytes=10 * 1024**3,   # 10 GB
        max_age=7 * 24 * 3600,    # 7 days, in seconds
    )

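Or via CLI (the server treats a repeat request with an identical configuration as a no-op and rejects one that conflicts with an existing stream of the same name):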

nats stream add ORDERS \
    --subjects "orders.>" \
    --retention limits \
    --max-msgs 1000000 \
    --max-age 7d \
    --storage file

Subjects use > to capture everything under orders.. Messages published to orders.us.placed, orders.eu.shipped, etc. all land in this stream.

Common Mistake: Adding a stream with subjects orders (no wildcard) and wondering why orders.us.placed doesn’t get captured. Use orders.> for “everything under orders.”

Fix 3: Use Durable Consumers for Reliable Processing

Durable consumers survive restarts. Define one with explicit ack:

from nats.js.api import ConsumerConfig, AckPolicy, DeliverPolicy

await js.add_consumer(
    "ORDERS",
    config=ConsumerConfig(
        durable_name="ORDER_PROCESSOR",
        ack_policy=AckPolicy.EXPLICIT,
        deliver_policy=DeliverPolicy.ALL,
        max_deliver=5,
        ack_wait=60,  # seconds
    ),
)

Then pull or push:

# Pull-based (recommended for backpressure control):
sub = await js.pull_subscribe("orders.>", "ORDER_PROCESSOR", stream="ORDERS")

while True:
    try:
        msgs = await sub.fetch(batch=10, timeout=5)
        for msg in msgs:
            try:
                process(msg.data)
                await msg.ack()
            except TransientError:
                await msg.nak(delay=5)  # Retry in 5s
            except PermanentError:
                await msg.term()  # Don't retry
    except asyncio.TimeoutError:
        continue

Four ack responses:

  • ack(): successfully processed. Won’t be redelivered.
  • nak(delay=N): failed transiently. Redeliver after N seconds.
  • term(): failed permanently. Don’t redeliver, regardless of how many max_deliver attempts remain. The server emits a MSG_TERMINATED advisory for the message.
  • in_progress(): still working. Resets the ack deadline.

Pro Tip: For handlers that take longer than ack_wait, call msg.in_progress() periodically to extend the deadline. Otherwise NATS redelivers thinking the handler died.
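
One way to send that periodic signal is a background task that runs alongside the handler. This is a sketch under stated assumptions: run_with_heartbeat is a hypothetical helper, `msg` is any object with async in_progress() and ack() methods (as nats-py JetStream messages have), and `interval` should be comfortably shorter than the consumer's ack_wait.

```python
import asyncio
import contextlib


async def run_with_heartbeat(msg, work, interval=10.0):
    """Run work() while calling msg.in_progress() every `interval` seconds.

    Acks the message only if work() returns without raising.
    """
    async def heartbeat():
        while True:
            await asyncio.sleep(interval)
            await msg.in_progress()  # reset the server-side ack deadline

    beat = asyncio.create_task(heartbeat())
    try:
        result = await work()
    finally:
        # Stop the heartbeat whether the work succeeded or failed.
        beat.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await beat
    await msg.ack()
    return result
```

If work() raises, the message is left unacked, so the caller can still decide between nak() and term() in its own exception handler.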

Fix 4: Subject Wildcards — * vs >

Two wildcard tokens:

  • * matches exactly one token. orders.*.placed matches orders.us.placed but not orders.us.east.placed.
  • > matches one or more tokens, only at the end. orders.> matches orders.us, orders.us.east, orders.us.east.placed, etc.

# Match all order events from any region (one-deep):
await nc.subscribe("orders.*.placed", cb=...)
# Matches: orders.us.placed, orders.eu.placed
# Doesn't match: orders.us.east.placed

# Match every order event:
await nc.subscribe("orders.>", cb=...)
# Matches all of the above.

For JetStream stream subjects, use the same wildcard rules. A stream with subjects=["orders.>"] captures everything under orders..

Common Mistake: Trying to use > in the middle: orders.>.placed is invalid. > must be the final token.
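
To check a pattern before deploying it, the matching rules above can be reproduced in a few lines of plain Python. This is an illustrative sketch of the token rules as described in this section, not the server's implementation; subject_matches is a hypothetical helper name.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if `subject` matches `pattern` under NATS wildcard rules."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # ">" is only valid as the final token and needs
            # at least one remaining subject token to match.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False  # pattern is longer than the subject
        if p != "*" and p != s_tokens[i]:
            return False  # literal token mismatch
    # Without ">", both sides must have the same number of tokens.
    return len(p_tokens) == len(s_tokens)
```

For example, subject_matches("orders.*.processed", "orders.us.east.processed") is False, which is the failure shown at the top of this article, while subject_matches("orders.>", "orders.us.east.processed") is True.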

Fix 5: Connection Auth and TLS

Bare nats:// connections work for local dev but are unencrypted and unauthenticated. For production:

User/password:

nc = await nats.connect(
    "nats://user:pass@nats.example.com:4222",
)

Token:

nc = await nats.connect(
    "nats://nats.example.com:4222",
    token="s3cr3t",
)

NKey + JWT (recommended for prod):

nc = await nats.connect(
    "tls://nats.example.com:4222",
    user_credentials="./user.creds",
)

user.creds is the file you get from nsc generate creds. It contains both the NKey seed and the signed JWT.

TLS:

import ssl

ctx = ssl.create_default_context()
nc = await nats.connect(
    "tls://nats.example.com:4222",
    tls=ctx,
)

For mutual TLS (client cert auth):

ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile="ca.pem")
ctx.load_cert_chain(certfile="client.pem", keyfile="client.key")
nc = await nats.connect("tls://nats.example.com:4222", tls=ctx)

Fix 6: Key-Value (KV) Buckets

JetStream KV is a key-value store built on streams. Common API:

js = nc.jetstream()

# Create / open a bucket:
kv = await js.create_key_value(bucket="config", history=5, ttl=3600)

# Set / get:
await kv.put("api.endpoint", b"https://api.example.com")
entry = await kv.get("api.endpoint")
print(entry.value)  # b"https://api.example.com"

# Watch for changes:
async for entry in await kv.watchall():
    print(entry.key, entry.value)

KV buckets are a thin layer over streams — under the hood, kv.put("key", value) is js.publish("$KV.config.key", value). Useful for:

  • Feature flags
  • Runtime config
  • Service discovery
  • Distributed locks (via CAS operations)

Note: KV is not a high-throughput cache. For per-request caching, use Redis. NATS KV shines for low-volume, consistency-sensitive state that multiple services need to watch.
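
Because a bucket is just a stream, its keys map directly onto subjects. The sketch below spells out that naming convention (a backing stream named KV_<bucket> and subjects under $KV.<bucket>.); the helper names are hypothetical and exist only for illustration.

```python
def kv_stream_name(bucket: str) -> str:
    """Name of the JetStream stream that backs a KV bucket."""
    return f"KV_{bucket}"


def kv_subject(bucket: str, key: str) -> str:
    """Subject that a put on `key` publishes to."""
    return f"$KV.{bucket}.{key}"


def kv_key_from_subject(bucket: str, subject: str) -> str:
    """Recover the key from a message subject seen on the backing stream."""
    prefix = f"$KV.{bucket}."
    if not subject.startswith(prefix):
        raise ValueError(f"{subject!r} does not belong to bucket {bucket!r}")
    return subject[len(prefix):]
```

One practical consequence: nats stream info KV_config shows the bucket's storage usage, and nats sub '$KV.config.>' tails every change to the bucket.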

Fix 7: Reconnection and max_reconnect_attempts

NATS clients auto-reconnect by default. Tune:

nc = await nats.connect(
    servers=["nats://1.example.com:4222", "nats://2.example.com:4222"],
    max_reconnect_attempts=-1,  # Infinite
    reconnect_time_wait=2.0,    # 2s between attempts
    error_cb=on_error,
    closed_cb=on_closed,
    reconnected_cb=on_reconnected,
)

Pass multiple servers so the client can fail over. nats-py shuffles the list by default, for the initial connect as well as for reconnects; pass dont_randomize=True to try servers in the order given.

Handle reconnection events:

async def on_disconnected():
    print("disconnected")

async def on_reconnected():
    print(f"reconnected to {nc.connected_url.netloc}")

nc = await nats.connect(
    "nats://...",
    disconnected_cb=on_disconnected,
    reconnected_cb=on_reconnected,
)

Common Mistake: Treating reconnects as errors. For long-lived clients, disconnect/reconnect is normal — handle it gracefully. Only closed_cb (terminal close, often after exhausting reconnects) is an actual failure.
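
A small state tracker makes that distinction explicit and gives health checks something to read. This is a sketch: ConnectionMonitor is a hypothetical class, and it assumes the callbacks are invoked without arguments, as in the examples above.

```python
import time


class ConnectionMonitor:
    """Tracks connection state from NATS client callbacks."""

    def __init__(self):
        self.state = "connected"
        self.reconnects = 0
        self.total_downtime = 0.0  # seconds spent disconnected
        self._down_since = None

    async def on_disconnected(self):
        # Expected during server restarts and network blips.
        self.state = "disconnected"
        self._down_since = time.monotonic()

    async def on_reconnected(self):
        if self._down_since is not None:
            self.total_downtime += time.monotonic() - self._down_since
            self._down_since = None
        self.state = "connected"
        self.reconnects += 1

    async def on_closed(self):
        # Terminal: the client has given up. This is the one to alert on.
        self.state = "closed"
```

The bound methods can be passed straight to nats.connect as disconnected_cb, reconnected_cb, and closed_cb; a readiness probe would then report unhealthy only when state is "closed" or downtime exceeds some threshold of your choosing.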

Fix 8: Monitor With the NATS CLI

The nats CLI is invaluable for debugging:

# Server health:
nats server check connection

# Stream stats:
nats stream info ORDERS

# Consumer pending messages, redelivery counts:
nats consumer info ORDERS ORDER_PROCESSOR

# Tail a subject:
nats sub "orders.>"

# Publish a test message:
nats pub orders.us.placed '{"id":1}'

# Inspect KV bucket:
nats kv get config api.endpoint
nats kv ls config

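For continuous monitoring, enable the server’s HTTP monitoring port (http_port: 8222 in the server config, or -m 8222 on the command line) and query the /varz, /connz, /jsz, and /healthz endpoints: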

curl -s http://localhost:8222/varz | jq
curl -s "http://localhost:8222/jsz?accounts=true" | jq

The /jsz endpoint shows JetStream usage per account, stream sizes, consumer lag — feed it to Prometheus via the prometheus-nats-exporter.

Version History and Tooling Context

NATS reached version 2.0 in 2019 with a fundamentally new server. The features users actually depend on landed across a series of point releases since then, and the choice of feature dictates the minimum server version you can deploy against:

  • NATS 2.2 (March 2021) introduced JetStream as a general-availability feature. Before this, persistence was provided by a separate NATS Streaming Server (now deprecated). Code written against NATS Streaming uses a different API entirely and won’t talk to JetStream.
  • NATS 2.9 (October 2022) stabilized Key-Value (KV) and Object Store. Earlier versions had a beta KV layer with a slightly different API; production deployments should be on 2.9 or later for KV.
  • NATS 2.10 (September 2023) added stream-level clustering improvements, subject-mapping per cluster, and the replicas argument on consumers. This is the version where multi-region JetStream became operationally simple.
  • NATS 2.11 (March 2025) added consumer priority groups, scheduled message delivery, and a more efficient leaf-node connection protocol. The 2.11 client libraries dropped some Python 3.7 support, so older runtimes need pinned client versions.
  • NATS 2.12+ ships ongoing JetStream performance work and tighter limits on consumer fan-out. If your monitoring shows a sudden change in JetStream API response times, check the release notes for limit changes before assuming a bug.

Compared to alternatives: Kafka is the heavyweight option — partitioned logs, infinite retention, ecosystem-wide tooling. Kafka makes more sense than NATS when you have ETL pipelines, exactly-once semantics across many consumers, or compaction requirements. RabbitMQ is the closest direct competitor for request/reply and work queues; it’s older, has richer routing primitives (topic exchanges, headers exchanges, RPC patterns), and ships with a polished management UI. NATS wins on operational simplicity and raw throughput for the cases it covers: a single binary, no Zookeeper or Erlang VM, and core pub/sub in microseconds. Redis Streams sits at the lightweight end — fine for a single-instance buffer but lacking real persistence guarantees across replication. If you’re choosing between them: pick NATS for microservice meshes that need leaf-node routing across regions, Kafka for large-scale event sourcing, RabbitMQ for complex routing and existing AMQP integration.

Still Not Working?

A few less-obvious failures:

  • Messages published but never delivered. Subject mismatch. Use nats sub ">" from CLI to see everything hitting the server, then check the actual subject your publisher uses.
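  • Stream full and rejecting. max_msgs/max_bytes hit on a stream whose discard policy is new (the default, old, drops the oldest messages instead of rejecting publishes). Either purge old messages (nats stream purge ORDERS), expand limits, or switch retention to WORK_QUEUE (auto-removes acked messages).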
  • Consumer lag grows unbounded. Processing is slower than ingestion. Batch with fetch(batch=N), or add workers that all pull from the same durable consumer; the server load-balances messages across them.
  • Random “context deadline exceeded” (Go client) or nats.errors.TimeoutError (nats-py). Default request timeout is short (0.5 seconds for nc.request in nats-py). Pass timeout= explicitly for longer-running operations.
  • JetStream API works for one account but not another. Each NATS account has its own JetStream. Check the account context (nsc list keys, nats context).
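  • max_deliver exhausted, messages disappear. The consumer stops delivering them, but under limits retention they remain in the stream. The server publishes an advisory on $JS.EVENT.ADVISORY.CONSUMER.MAX_DELIVERIES.<stream>.<consumer> containing the stream sequence; subscribe to it and copy the affected messages to a separate stream to build a dead-letter queue.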
  • Server fills disk. JetStream is writing to disk. Check stream sizes and either reduce retention or add storage.
  • Cross-region replication needed. Use NATS mirror or source streams, or a leaf node topology. These need explicit config — they’re not automatic.
  • Consumer redelivery loop after a deploy. Your handler is slower than ack_wait and never calls msg.in_progress(). Either extend the deadline on the consumer config or wrap long-running work in a periodic in-progress signal. The loop will burn CPU until you do.
  • Stream replication shows replicas: 1 but you asked for 3. JetStream silently downgrades replicas when fewer than N servers in the cluster are JetStream-enabled. Check nats server list and confirm every server has jetstream: {} in its config.
  • KV watch fires twice for the same revision. A reconnect during the watch replays from the last known revision. Idempotency on the consumer side is mandatory; track the highest revision you’ve seen and skip anything not strictly greater.
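
For the KV watch case in the last item, the revision check can be as small as the sketch below. RevisionFilter is a hypothetical helper; it tracks the highest revision per key, which is the conservative choice, and assumes each watch entry exposes key and revision attributes as nats-py KV entries do.

```python
class RevisionFilter:
    """Drops KV watch entries whose revision was already processed."""

    def __init__(self):
        self._highest = {}  # key -> highest revision applied so far

    def is_new(self, key: str, revision: int) -> bool:
        """Return True (and record it) only for a strictly newer revision."""
        if revision <= self._highest.get(key, 0):
            return False  # replayed after a reconnect: skip
        self._highest[key] = revision
        return True
```

Inside the watch loop this becomes: if seen.is_new(entry.key, entry.revision): apply(entry), where apply is whatever your service does with a change.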

For related messaging, queue, and broker connection issues, see RabbitMQ connection refused, Kafka not working, Redis pub sub not working, and Celery task not received.


FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
