Fix: NATS Not Working — Connection Auth, JetStream Streams, Consumer Ack, and Subject Wildcards
Quick Answer
How to fix NATS errors — no responders to request, JetStream stream not found, consumer redelivery loop, durable vs ephemeral consumers, subject wildcard mismatch, TLS auth setup, and KV bucket basics.
The Error
You call nc.request and get this:
nats: no responders available for request
Or a JetStream call fails because the stream doesn’t exist:
nats: stream not found: ORDERS
Or the consumer keeps redelivering the same message:
[consumer] received msg seq=1
[consumer] received msg seq=1 ← same one again, 30 seconds later
[consumer] received msg seq=1
Or a wildcard subscription matches nothing:
await nc.subscribe("orders.*.processed", cb=handle)
# Publishing to "orders.us.east.processed" → no delivery.
Why This Happens
NATS has two layers, and most issues come from mixing them up:
- Core NATS: fire-and-forget pub/sub and request/reply. No persistence. A request without a subscriber gets “no responders.” Subscribers come and go; nothing is stored.
- JetStream: persistent messaging built on top. Streams capture messages by subject pattern. Consumers (durable or ephemeral) read from streams with ack semantics.
Other common pitfalls:
- Subject wildcards. * matches exactly one token; > matches one or more trailing tokens. orders.* matches orders.a but not orders.a.b; orders.> matches both.
- Ack timeouts. A consumer that doesn’t ack within AckWait (default 30s) gets the message redelivered. Long-running handlers must extend the deadline or ack early.
- Durable vs ephemeral consumers. Durable consumers survive client restarts and resume from their last acked position. Ephemeral consumers are tied to their subscriber and are cleaned up after it goes away.
- TLS and auth. A bare nats://host URL carries no credentials and, unless the server forces TLS, no encryption. Production needs TLS (tls://) plus authentication: user/password (nats://user:pass@...), tokens, or NKeys/JWT.
The deeper architectural reason for so many “messages disappear” reports is that Core NATS was originally designed as a real-time bus, closer in spirit to UDP multicast than to Kafka. If nobody is listening, the message is dropped on the floor. JetStream was added later (previewed in 2020, generally available in NATS 2.2 in 2021) specifically to bolt persistence onto that bus, but it lives as a layer above the wire protocol rather than inside it. As a result, the same publish call can target a subject captured by a JetStream stream (stored) or a plain Core NATS subject (not stored); only the stream configuration on the server determines which path the message takes. Teams who treat NATS as “always durable like Kafka” eventually lose a message and only then discover the layering.
JetStream’s ack semantics also differ subtly from RabbitMQ’s. In RabbitMQ, an unacked message is held by the broker until the consumer’s channel closes; in JetStream, an unacked message is redelivered after AckWait, which defaults to 30 seconds. A consumer that runs a long job (image transcoding, ML inference, anything past 30s) sees the same message delivered again after every AckWait interval until some copy is acked, so the work is duplicated. Because max_deliver is unlimited by default, handlers that keep crashing or missing the deadline get the message forever, and the duplicate work can slow them further and feed a redelivery loop. The fix is either msg.in_progress() to extend the deadline or a longer ack_wait on the consumer config; either way, the default is calibrated for fast handlers, not batch jobs.
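To see why the default bites long handlers, here is a minimal Python model of the redelivery schedule. It is a simplified sketch, not the server’s algorithm: it assumes each delivery restarts an AckWait timer, that an ack from the first handler copy settles the message, and that every in_progress() call resets the timer. The function and its parameters are illustrative names, not part of any NATS client.

```python
from typing import List, Optional


def redelivery_times(
    handler_seconds: float,
    ack_wait: float = 30.0,
    max_deliver: int = -1,
    in_progress_every: Optional[float] = None,
) -> List[float]:
    """Seconds (after the first delivery) at which one message is delivered.

    Simplified model: the first handler copy acks after handler_seconds and
    that ack settles the message. Until then, the server redelivers every
    ack_wait seconds. max_deliver=-1 means unlimited, the JetStream default.
    """
    times = [0.0]  # initial delivery
    if in_progress_every is not None and in_progress_every < ack_wait:
        return times  # each in_progress() resets the AckWait timer, so it never fires
    t = ack_wait
    while t < handler_seconds and (max_deliver == -1 or len(times) < max_deliver):
        times.append(t)  # timer fired before the ack arrived: redeliver
        t += ack_wait
    return times


print(redelivery_times(10))                         # → [0.0]
print(redelivery_times(100))                        # → [0.0, 30.0, 60.0, 90.0]
print(redelivery_times(100, in_progress_every=10))  # → [0.0]
```

In this model a 100s job under the 30s default is delivered four times; raising ack_wait above the job time, or calling in_progress() more often than ack_wait, brings it back to a single delivery.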
Fix 1: “No Responders” Means No Subscriber
Request/reply needs at least one subscriber:
# Subscriber side:
import asyncio
import nats
async def main():
    nc = await nats.connect("nats://localhost:4222")
    async def handler(msg):
        await msg.respond(b"pong")
    await nc.subscribe("ping", cb=handler)
    await asyncio.sleep(3600)  # Stay alive so the subscription keeps answering
asyncio.run(main())
# Requester side:
import asyncio
import nats
async def ping():
    nc = await nats.connect("nats://localhost:4222")
    try:
        response = await nc.request("ping", b"", timeout=1.0)
        print(response.data)
    except nats.errors.NoRespondersError:
        print("No subscriber on 'ping'")
    finally:
        await nc.close()
asyncio.run(ping())
Check that both sides are connected to the same NATS server and use the same subject. From the NATS CLI:
nats sub "ping"     # Listen
nats req "ping" ""  # Send
Pro Tip: For mission-critical request/reply patterns, never rely on a single responder being online. Use JetStream with consumer pulls so messages queue when no one is listening.
Fix 2: Create the JetStream Stream Before Publishing
JetStream needs explicit stream creation:
import asyncio
import nats
from nats.js.api import RetentionPolicy
async def setup():
    nc = await nats.connect("nats://localhost:4222")
    js = nc.jetstream()
    await js.add_stream(
        name="ORDERS",
        subjects=["orders.>"],
        retention=RetentionPolicy.LIMITS,
        max_msgs=1_000_000,
        max_bytes=10 * 1024**3,  # 10 GiB
        max_age=7 * 24 * 3600,   # 7 days; nats-py takes seconds here
    )
    await nc.close()
asyncio.run(setup())
Or via CLI (idempotent, so safe to re-run):
nats stream add ORDERS \
  --subjects "orders.>" \
  --retention limits \
  --max-msgs 1000000 \
  --max-age 7d \
  --storage file
The subject filter orders.> captures everything under orders. Messages published to orders.us.placed, orders.eu.shipped, and so on all land in this stream.
Common Mistake: Adding a stream with subjects orders (no wildcard) and wondering why orders.us.placed doesn’t get captured. Use orders.> for “everything under orders.”
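The limits in the stream config decide what gets thrown away. With LIMITS retention and the default discard policy (old), JetStream removes the oldest messages once a limit is exceeded. The toy model below sketches that behavior for max_msgs and max_age only; it is an illustration under those assumptions, not the server’s implementation, and it ignores max_bytes and per-subject limits.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class LimitsStream:
    """Toy model of a LIMITS-retention stream using the default "discard old" policy."""

    def __init__(self, max_msgs: int = -1, max_age: Optional[float] = None) -> None:
        self.max_msgs = max_msgs  # -1 means unlimited, as in JetStream
        self.max_age = max_age    # seconds; None keeps messages forever
        self.msgs: Deque[Tuple[int, float, str]] = deque()  # (seq, timestamp, subject)
        self.last_seq = 0

    def publish(self, subject: str, now: float) -> int:
        """Append a message and return its stream sequence number."""
        self.last_seq += 1
        self.msgs.append((self.last_seq, now, subject))
        self._enforce(now)
        return self.last_seq

    def sequences(self, now: float) -> List[int]:
        """Sequences still stored at time `now`, oldest first."""
        self._enforce(now)
        return [seq for seq, _, _ in self.msgs]

    def _enforce(self, now: float) -> None:
        # Discard old: evict from the front (oldest) until every limit holds again.
        while self.max_msgs != -1 and len(self.msgs) > self.max_msgs:
            self.msgs.popleft()
        while self.max_age is not None and self.msgs and now - self.msgs[0][1] > self.max_age:
            self.msgs.popleft()
```

Note that sequence numbers keep growing even as old messages are evicted, which is why a consumer can see gaps after a stream hits its limits.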
Fix 3: Use Durable Consumers for Reliable Processing
Durable consumers survive restarts. Define one with explicit ack:
from nats.js.api import ConsumerConfig, AckPolicy, DeliverPolicy
# Inside an async function, with js = nc.jetstream():
await js.add_consumer(
    "ORDERS",
    config=ConsumerConfig(
        durable_name="ORDER_PROCESSOR",
        ack_policy=AckPolicy.EXPLICIT,
        deliver_policy=DeliverPolicy.ALL,
        max_deliver=5,
        ack_wait=60,  # seconds
    ),
)
Then pull or push:
# Pull-based (recommended for backpressure control).
# process(), TransientError and PermanentError stand in for your own code.
import asyncio
sub = await js.pull_subscribe("orders.>", "ORDER_PROCESSOR", stream="ORDERS")
while True:
    try:
        msgs = await sub.fetch(batch=10, timeout=5)
    except asyncio.TimeoutError:
        continue  # Nothing arrived within 5s; poll again
    for msg in msgs:
        try:
            process(msg.data)
            await msg.ack()
        except TransientError:
            await msg.nak(delay=5)  # Ask for redelivery in ~5s
        except PermanentError:
            await msg.term()  # Stop redelivering this message
Four ways to respond to a message:
- ack(): successfully processed. Won’t be redelivered.
- nak(delay=N): failed transiently. Redeliver after N seconds.
- term(): failed permanently. The server stops redelivering it, even if max_deliver attempts remain.
- in_progress(): still working. Resets the ack deadline.
Pro Tip: For handlers that take longer than ack_wait, call msg.in_progress() periodically to extend the deadline. Otherwise NATS redelivers thinking the handler died.
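One way to do that is to run the handler alongside a background task that calls in_progress() on a timer. A minimal asyncio sketch; run_with_heartbeat is a hypothetical helper, and msg is assumed to be anything with async in_progress() and ack() methods, such as a nats-py JetStream message. Keep interval well below the consumer’s ack_wait.

```python
import asyncio
from typing import Any, Awaitable


async def run_with_heartbeat(msg: Any, work: Awaitable[Any], interval: float) -> Any:
    """Await `work`, calling msg.in_progress() every `interval` seconds, then ack."""

    async def beat() -> None:
        while True:
            await asyncio.sleep(interval)
            await msg.in_progress()  # resets the server-side AckWait timer

    heartbeat = asyncio.create_task(beat())
    try:
        result = await work
    finally:
        # Stop the heartbeat whether the work succeeded or raised.
        heartbeat.cancel()
        try:
            await heartbeat
        except asyncio.CancelledError:
            pass
    await msg.ack()  # only reached when the work succeeded
    return result
```

On failure the heartbeat stops and the exception propagates without an ack, so the caller can still choose nak() or term(). With ack_wait=60, a call might look like await run_with_heartbeat(msg, transcode(msg.data), interval=15), where transcode is a placeholder for your own coroutine.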
Fix 4: Subject Wildcards — * vs >
Two wildcard tokens:
- * matches exactly one token. orders.*.placed matches orders.us.placed but not orders.us.east.placed.
- > matches one or more tokens, and only as the last token. orders.> matches orders.us, orders.us.east, orders.us.east.placed, etc.
# handle is your own async callback.
# Match placed events from any region (exactly one token in the middle):
await nc.subscribe("orders.*.placed", cb=handle)
# Matches: orders.us.placed, orders.eu.placed
# Doesn't match: orders.us.east.placed
# Match every order event:
await nc.subscribe("orders.>", cb=handle)
# Matches all of the above.
JetStream stream subjects follow the same wildcard rules. A stream with subjects=["orders.>"] captures everything under orders.
Common Mistake: Trying to use > in the middle: orders.>.placed is invalid. > must be the final token.
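To check a pattern against a subject before debugging the server, the rules above fit in a few lines of Python. This is a sketch of the matching rules as described here, not the server’s own matcher; it recognizes * and > only as whole tokens.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """True if `subject` matches NATS `pattern` ('*' = one token, '>' = one or more trailing tokens)."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    if ">" in p_tokens[:-1]:
        raise ValueError(f"'>' must be the last token: {pattern}")
    for i, p in enumerate(p_tokens):
        if p == ">":
            return len(s_tokens) > i  # '>' needs at least one token left to match
        if i >= len(s_tokens):
            return False  # subject ran out of tokens before the pattern did
        if p != "*" and p != s_tokens[i]:
            return False  # literal token mismatch
    return len(p_tokens) == len(s_tokens)  # no '>', so lengths must agree
```

The last check is what makes a stream with subjects=["orders"] miss orders.us.placed: without a wildcard, the token counts must match exactly.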
Fix 5: Connection Auth and TLS
Bare nats:// connections work for local dev but are unencrypted and unauthenticated. For production:
User/password:
nc = await nats.connect(
    "nats://user:pass@nats.example.com:4222",
)
Token:
nc = await nats.connect(
    "nats://nats.example.com:4222",
    token="s3cr3t",
)
NKey + JWT (recommended for prod):
nc = await nats.connect(
    "tls://nats.example.com:4222",
    user_credentials="./user.creds",
)
user.creds is the file you get from nsc generate creds. It contains both the NKey seed and the signed JWT.
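Whichever method you choose, keep secrets out of source code. A small sketch that assembles nats.connect() keyword arguments from environment variables; the variable names NATS_URL, NATS_CREDS and NATS_TOKEN are hypothetical conventions for this example, while servers, user_credentials and token are the nats-py connect parameters used above.

```python
import os
from typing import Any, Dict, Mapping, Optional


def nats_connect_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build nats.connect() kwargs from (hypothetical) NATS_URL / NATS_CREDS / NATS_TOKEN."""
    env = os.environ if env is None else env
    url = env.get("NATS_URL", "nats://localhost:4222")
    # Allow a comma-separated server list, e.g. "tls://a:4222,tls://b:4222".
    kwargs: Dict[str, Any] = {"servers": [u.strip() for u in url.split(",") if u.strip()]}
    if env.get("NATS_CREDS"):
        kwargs["user_credentials"] = env["NATS_CREDS"]  # path to a .creds file from nsc
    elif env.get("NATS_TOKEN"):
        kwargs["token"] = env["NATS_TOKEN"]  # fallback when no creds file is configured
    return kwargs
```

Usage: nc = await nats.connect(**nats_connect_kwargs()). Preferring the creds file over a token mirrors the recommendation above.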
TLS:
import ssl
ctx = ssl.create_default_context()
nc = await nats.connect(
    "tls://nats.example.com:4222",
    tls=ctx,
)
For mutual TLS (client cert auth):
ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile="ca.pem")
ctx.load_cert_chain(certfile="client.pem", keyfile="client.key")
nc = await nats.connect("tls://nats.example.com:4222", tls=ctx)
Fix 6: Key-Value (KV) Buckets
JetStream KV is a key-value store built on streams. Common API:
js = nc.jetstream()
# Create / open a bucket (ttl is in seconds):
kv = await js.create_key_value(bucket="config", history=5, ttl=3600)
# Set / get:
await kv.put("api.endpoint", b"https://api.example.com")
entry = await kv.get("api.endpoint")
print(entry.value)  # b"https://api.example.com"
# Watch for changes:
async for entry in await kv.watchall():
    print(entry.key, entry.value)
KV buckets are a thin layer over streams: under the hood, kv.put("key", value) is essentially js.publish("$KV.config.key", value). Useful for:
- Feature flags
- Runtime config
- Service discovery
- Distributed locks (via CAS operations)
Note: KV is not a high-throughput cache. For per-request caching, use Redis. NATS KV shines for low-volume, consistency-sensitive state that multiple services need to watch.
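The distributed-lock use case rests on revision checks: a write succeeds only if the key is still at the revision you last read. nats-py exposes this through kv.create() (fails if the key already exists) and kv.update(key, value, last=revision); the in-memory model below only illustrates those semantics and ignores TTL, history and deletes. ToyKV and RevisionConflict are hypothetical names.

```python
from typing import Dict, Optional, Tuple


class RevisionConflict(Exception):
    """The key is not at the revision the caller expected."""


class ToyKV:
    """In-memory model of revision-checked (compare-and-swap) writes."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[bytes, int]] = {}  # key -> (value, revision)
        self._revision = 0  # bucket-wide counter, like the backing stream's sequence

    def get(self, key: str) -> Optional[Tuple[bytes, int]]:
        return self._data.get(key)

    def create(self, key: str, value: bytes) -> int:
        """Write only if the key does not exist yet (e.g. acquiring a lock)."""
        if key in self._data:
            raise RevisionConflict(f"{key} already exists")
        return self._write(key, value)

    def update(self, key: str, value: bytes, last: int) -> int:
        """Write only if the key is still at revision `last`."""
        current = self._data.get(key)
        if current is None or current[1] != last:
            raise RevisionConflict(f"{key} is not at revision {last}")
        return self._write(key, value)

    def _write(self, key: str, value: bytes) -> int:
        self._revision += 1
        self._data[key] = (value, self._revision)
        return self._revision
```

Two workers racing for the same lock both call create(); exactly one succeeds, and the loser learns immediately instead of overwriting the winner.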
Fix 7: Reconnection and max_reconnect_attempts
NATS clients auto-reconnect by default. Tune:
nc = await nats.connect(
    servers=["nats://1.example.com:4222", "nats://2.example.com:4222"],
    max_reconnect_attempts=-1,  # Infinite
    reconnect_time_wait=2.0,    # 2s between attempts
    error_cb=on_error,          # Callbacks are defined below
    closed_cb=on_closed,
    reconnected_cb=on_reconnected,
)
Pass multiple servers so the client can fail over: if one is unreachable, it tries the others. The client shuffles the list by default; pass dont_randomize=True to keep the order you gave.
Handle reconnection events:
async def on_disconnected():
    print("disconnected")
async def on_reconnected():
    print(f"reconnected to {nc.connected_url.netloc}")
async def on_error(e):
    print(f"error: {e}")
async def on_closed():
    print("connection closed for good")
nc = await nats.connect(
    "nats://...",
    disconnected_cb=on_disconnected,
    reconnected_cb=on_reconnected,
    error_cb=on_error,
    closed_cb=on_closed,
)
Common Mistake: Treating reconnects as errors. For long-lived clients, disconnect/reconnect is normal, so handle it gracefully. Only closed_cb (terminal close, often after exhausting reconnects) signals an actual failure.
Fix 8: Monitor With the NATS CLI
The nats CLI is invaluable for debugging:
# Server health:
nats server check connection
# Stream stats:
nats stream info ORDERS
# Consumer pending messages, redelivery counts:
nats consumer info ORDERS ORDER_PROCESSOR
# Tail a subject:
nats sub "orders.>"
# Publish a test message:
nats pub orders.us.placed '{"id":1}'
# Inspect KV bucket:
nats kv get config api.endpoint
nats kv ls config
For continuous monitoring, enable the server’s HTTP monitoring port (nats-server -m 8222, or http_port: 8222 in the config file) and poll its /varz, /connz, /jsz, and /healthz endpoints:
curl http://localhost:8222/varz | jq
curl 'http://localhost:8222/jsz?accounts=true' | jq
The /jsz endpoint shows JetStream usage per account, stream sizes, and consumer lag; feed it to Prometheus via the prometheus-nats-exporter.
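If you want a probe without extra tooling, the monitoring endpoints are plain JSON over HTTP. A stdlib-only sketch, assuming the monitoring port is enabled at http://localhost:8222. /healthz reports {"status": "ok"} when healthy; the /jsz field names read by jetstream_summary are assumptions to verify against your server’s actual output.

```python
import json
import urllib.request
from typing import Any, Dict


def fetch_monitoring(base_url: str, endpoint: str, timeout: float = 5.0) -> Dict[str, Any]:
    """GET one monitoring endpoint (e.g. "healthz", "varz", "jsz") and decode its JSON."""
    url = f"{base_url.rstrip('/')}/{endpoint}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def is_healthy(healthz: Dict[str, Any]) -> bool:
    return healthz.get("status") == "ok"


def check_health(base_url: str = "http://localhost:8222", timeout: float = 5.0) -> bool:
    """True only if /healthz answers with status "ok"; any error counts as unhealthy."""
    try:
        return is_healthy(fetch_monitoring(base_url, "healthz", timeout))
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts and HTTP error statuses
        # (urllib.error.URLError and HTTPError subclass it); ValueError covers bad JSON.
        return False


def jetstream_summary(jsz: Dict[str, Any]) -> Dict[str, int]:
    """Pick a few top-level /jsz counters (field names assumed; missing ones default to 0)."""
    return {key: int(jsz.get(key, 0)) for key in ("streams", "consumers", "messages", "bytes")}
```

For example, print(jetstream_summary(fetch_monitoring("http://localhost:8222", "jsz"))) gives a one-line snapshot; for dashboards and alerting, the exporter above is the sturdier choice.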
Version History and Tooling Context
NATS reached version 2.0 in 2019 with a fundamentally new server. The features users actually depend on landed across a series of point releases since then, and the choice of feature dictates the minimum server version you can deploy against:
- NATS 2.2 (March 2021) introduced JetStream as a general-availability feature. Before this, persistence was provided by a separate NATS Streaming Server (now deprecated). Code written against NATS Streaming uses a different API entirely and won’t talk to JetStream.
- NATS 2.9 (October 2022) stabilized Key-Value (KV) and Object Store. Earlier versions had a beta KV layer with a slightly different API; production deployments should be on 2.9 or later for KV.
- NATS 2.10 (September 2023) added stream-level clustering improvements, subject-mapping per cluster, and the replicas argument on consumers. This is the version where multi-region JetStream became operationally simple.
- NATS 2.11 (2024) added consumer priority groups, scheduled message delivery, and a more efficient leaf-node connection protocol. The 2.11 client libraries dropped some Python 3.7 support, so older runtimes need pinned client versions.
- NATS 2.12+ ships ongoing JetStream performance work and tighter limits on consumer fan-out. If your monitoring shows a sudden change in JetStream API response times, check the release notes for limit changes before assuming a bug.
Compared to alternatives: Kafka is the heavyweight option — partitioned logs, infinite retention, ecosystem-wide tooling. Kafka makes more sense than NATS when you have ETL pipelines, exactly-once semantics across many consumers, or compaction requirements. RabbitMQ is the closest direct competitor for request/reply and work queues; it’s older, has richer routing primitives (topic exchanges, headers exchanges, RPC patterns), and ships with a polished management UI. NATS wins on operational simplicity and raw throughput for the cases it covers: a single binary, no Zookeeper or Erlang VM, and core pub/sub in microseconds. Redis Streams sits at the lightweight end — fine for a single-instance buffer but lacking real persistence guarantees across replication. If you’re choosing between them: pick NATS for microservice meshes that need leaf-node routing across regions, Kafka for large-scale event sourcing, RabbitMQ for complex routing and existing AMQP integration.
Still Not Working?
A few less-obvious failures:
- Messages published but never delivered. Usually a subject mismatch. Run nats sub ">" from the CLI to see everything hitting the server, then compare it with the actual subject your publisher uses.
- Stream full and rejecting. max_msgs/max_bytes was hit. With the default discard policy (old), a LIMITS stream evicts its oldest messages instead of rejecting; rejections come from streams using discard new. Either purge old messages (nats stream purge ORDERS), raise the limits, or move to WORK_QUEUE retention (acked messages are removed automatically).
- Consumer lag grows unbounded. Processing is slower than ingestion. Add workers that pull from the same durable consumer (pull consumers spread messages across every client fetching from them), batch with fetch(batch=N), or use a WORK_QUEUE stream so each message is processed once and then removed.
- Random “context deadline exceeded”. Default request timeouts are short (nats-py’s nc.request defaults to 0.5s and raises a TimeoutError rather than this Go-style message). Pass timeout= explicitly for longer-running operations.
- JetStream API works for one account but not another. Each NATS account has its own JetStream, which must be enabled for that account. Check the account context (nsc list keys, nats context).
- max_deliver exhausted, messages disappear. The consumer stops delivering a message once it has been delivered max_deliver times, and the server publishes an advisory on $JS.EVENT.ADVISORY.CONSUMER.MAX_DELIVERIES.<stream>.<consumer>. Subscribe to that advisory, which carries the message’s stream sequence, to build a dead-letter queue.
- Server fills disk. JetStream is writing to disk. Check stream sizes and either reduce retention or add storage.
- Cross-region replication needed. Use NATS mirror or source streams, or a leaf node topology. These need explicit config; they’re not automatic.
- Consumer redelivery loop after a deploy. Your handler is slower than ack_wait and never calls msg.in_progress(). Either extend ack_wait on the consumer config or wrap long-running work in a periodic in-progress signal. The loop will burn CPU until you do.
- Stream replication shows replicas: 1 but you asked for 3. JetStream can end up with fewer replicas than requested when fewer than N servers in the cluster are JetStream-enabled. Check nats server list and confirm every server has jetstream: {} in its config.
- KV watch fires twice for the same revision. A reconnect during the watch replays from the last known revision. Idempotency on the consumer side is mandatory; track the highest revision you’ve seen and skip anything not strictly greater.
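A minimal sketch of that revision filter in Python. KV revisions are the sequence numbers of the bucket’s backing stream, so they only increase; RevisionFilter is a hypothetical helper, and the commented usage assumes a nats-py watcher whose entries carry a revision attribute (apply() stands in for your own handler).

```python
class RevisionFilter:
    """Accept each KV revision at most once, given revisions that only increase."""

    def __init__(self, start_revision: int = 0) -> None:
        self.highest = start_revision  # persist this to survive process restarts

    def accept(self, revision: int) -> bool:
        if revision <= self.highest:
            return False  # replayed after a reconnect, or already applied
        self.highest = revision
        return True


# Usage with a nats-py watcher:
# seen = RevisionFilter()
# async for entry in await kv.watchall():
#     if entry is None or not seen.accept(entry.revision):
#         continue
#     apply(entry.key, entry.value)
```

A single high-water mark is enough for one watcher because updates arrive in revision order; a replay only re-sends revisions at or below it.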
For related messaging, queue, and broker connection issues, see RabbitMQ connection refused, Kafka not working, Redis pub sub not working, and Celery task not received.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.