Fix: Milvus Not Working — Connection Errors, Schema Setup, and Index Build Failures
Quick Answer
How to fix Milvus errors — pymilvus connection refused localhost 19530, collection schema mismatch, index not built before search, partition not found, embedded vs standalone vs cluster, and flush before search.
The Error
You install pymilvus and the first connection fails:
MilvusException: <MilvusException: (code=2, message=Fail connecting to server on localhost:19530. Timeout)>
Or you create a collection and inserts fail with schema errors:
MilvusException: (code=1, message=field dimension mismatch, expected 1536, got 768)
Or you query without building an index:
MilvusException: (code=22, message=collection not loaded)
MilvusException: (code=15, message=index not exist)
Or you insert data and immediately search, but no results come back:
collection.insert(rows)
results = collection.search(query_vec, ...)
# Empty results, even though you just inserted matching data
Or partitions get confused:
MilvusException: (code=23, message=partition not found)
Or you try to use Milvus Lite and the API differs from Milvus standalone:
client = MilvusClient("milvus_lite.db")
client.search(...)
# But your code uses the Collection() API, which is incompatible
Milvus is an open-source vector database built for production scale and used at companies with billions of vectors. It’s heavier than Chroma or Qdrant (cluster mode requires multiple coordinators, query nodes, and data nodes) but scales further. The Python client pymilvus has two APIs (legacy Collection and modern MilvusClient), the server ships in three deployment modes (Lite, Standalone, Cluster), and the index-then-load workflow is unusual among vector DBs. This guide covers each common failure.
Why This Happens
Milvus separates insert and search more strictly than other vector DBs. Inserted vectors land in a “growing segment” until it is “sealed”, and indexes are built on sealed segments. A search also needs the collection to be indexed and loaded first. Searching before indexing and loading returns an error, and a search issued right after an insert may miss the newest rows unless you flush() or raise the consistency level.
The two client APIs (Collection vs MilvusClient) reflect two generations of pymilvus. MilvusClient (added in pymilvus 2.3+) is the modern, simpler API; Collection is the lower-level legacy API still widely documented in tutorials.
Fix 1: Choosing a Deployment Mode
Milvus Lite (embedded, file-based, great for prototypes):
pip install pymilvus
from pymilvus import MilvusClient
client = MilvusClient("./milvus_lite.db")
# All data in a single SQLite-like file
# No server needed
Milvus Standalone (single-process server):
# Docker
docker run -d --name milvus-standalone \
  -p 19530:19530 -p 9091:9091 \
  -v $(pwd)/volumes/milvus:/var/lib/milvus \
  milvusdb/milvus:latest milvus run standalone
from pymilvus import MilvusClient
client = MilvusClient(uri="http://localhost:19530")
Milvus Cluster (distributed, multiple nodes, requires Kubernetes for serious deployments):
# Helm
helm install my-milvus milvus/milvus --set cluster.enabled=true
Comparison:
| Mode | Vectors | Use case |
|---|---|---|
| Lite | < 1M | Prototyping, embedded apps, single user |
| Standalone | < 10M | Small production, single-machine |
| Cluster | Billions+ | Production at scale |
Common Mistake: Starting with Milvus Cluster for a 100k-vector workload. The operational complexity (8+ pods, etcd, MinIO, Pulsar) is overkill: Lite or Standalone runs the same workload in one process with far less operational burden. Scale up to Cluster only when measurements show you need it.
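The comparison table can be turned into a small sanity check for provisioning scripts. This is a minimal sketch: the function name `suggest_deployment_mode` is hypothetical, and the cut-offs are the rough guidelines from the table, not limits enforced by Milvus.

```python
def suggest_deployment_mode(vector_count: int) -> str:
    """Map an expected vector count to a deployment mode (rule of thumb only)."""
    if vector_count < 0:
        raise ValueError("vector_count must be non-negative")
    if vector_count < 1_000_000:       # table row: Lite, < 1M
        return "Lite"
    if vector_count < 10_000_000:      # table row: Standalone, < 10M
        return "Standalone"
    return "Cluster"                   # everything larger


if __name__ == "__main__":
    for n in (100_000, 5_000_000, 2_000_000_000):
        print(n, suggest_deployment_mode(n))
```

Treat the result as a starting point; memory per vector, query load, and availability requirements can push the decision either way.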
Fix 2: MilvusClient vs Collection API
Modern API (MilvusClient, recommended):
from pymilvus import MilvusClient
client = MilvusClient(uri="http://localhost:19530")
# Create collection with schema
client.create_collection(
collection_name="articles",
dimension=1536,
metric_type="COSINE",
primary_field_name="id",
vector_field_name="embedding",
)
# Insert
client.insert(
collection_name="articles",
data=[
{"id": 1, "embedding": [0.1, 0.2, ...], "title": "Article 1"},
{"id": 2, "embedding": [0.3, 0.4, ...], "title": "Article 2"},
],
)
# Search
results = client.search(
collection_name="articles",
data=[query_embedding],
limit=10,
output_fields=["title"],
)
Legacy API (Collection):
from pymilvus import (
connections, Collection, FieldSchema, CollectionSchema, DataType,
)
connections.connect(host="localhost", port="19530")
# Define schema
fields = [
FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1536),
FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=200),
]
schema = CollectionSchema(fields=fields)
collection = Collection(name="articles", schema=schema)
# Build index
collection.create_index(
field_name="embedding",
index_params={"metric_type": "COSINE", "index_type": "HNSW", "params": {"M": 16, "efConstruction": 200}},
)
# Load to memory
collection.load()
# Insert
collection.insert([[1, 2], [vec1, vec2], ["a", "b"]])
# Search
results = collection.search(
[query_vec], "embedding", {"metric_type": "COSINE"}, limit=10,
)
Use MilvusClient for new code. The legacy API still works, but the modern client is simpler and talks to the same backend. Most online tutorials still use Collection, so translate them to MilvusClient for cleaner code.
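One practical difference when translating tutorials: the legacy example passes data column by column, while the MilvusClient example passes a list of row dictionaries. A minimal sketch of that conversion follows; `columns_to_rows` is a hypothetical helper name, not part of pymilvus.

```python
from typing import Any, Dict, List, Sequence


def columns_to_rows(
    field_names: Sequence[str],
    columns: Sequence[Sequence[Any]],
) -> List[Dict[str, Any]]:
    """Convert column-oriented data (legacy style) into row dicts (MilvusClient style)."""
    if len(field_names) != len(columns):
        raise ValueError("one column is required per field name")
    if len({len(col) for col in columns}) > 1:
        raise ValueError("all columns must have the same number of rows")
    # zip(*columns) walks the columns in parallel, yielding one tuple per row.
    return [dict(zip(field_names, values)) for values in zip(*columns)]


if __name__ == "__main__":
    rows = columns_to_rows(
        ["id", "embedding", "title"],
        [[1, 2], [[0.1, 0.2], [0.3, 0.4]], ["a", "b"]],
    )
    print(rows)
```

The resulting list can be passed as `data=` to `client.insert(...)` in the modern example above.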
Fix 3: Schema Setup and Field Types
from pymilvus import MilvusClient, DataType
client = MilvusClient(uri="http://localhost:19530")
# Explicit schema with multiple fields
schema = client.create_schema(auto_id=True, enable_dynamic_field=True)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=1536)
schema.add_field(field_name="title", datatype=DataType.VARCHAR, max_length=500)
schema.add_field(field_name="published_at", datatype=DataType.INT64) # Unix timestamp
schema.add_field(field_name="tags", datatype=DataType.ARRAY, element_type=DataType.VARCHAR, max_capacity=10, max_length=50)
# Prepare index parameters
index_params = client.prepare_index_params()
index_params.add_index(
field_name="embedding",
index_type="HNSW",
metric_type="COSINE",
params={"M": 16, "efConstruction": 200},
)
client.create_collection(
collection_name="articles",
schema=schema,
index_params=index_params,
)
Data types:
| DataType | Use for |
|---|---|
| INT8, INT16, INT32, INT64 | Integer fields (use INT64 for an integer primary key) |
| FLOAT, DOUBLE | Numeric fields |
| BOOL | Boolean |
| VARCHAR | Strings (specify max_length) |
| JSON | Arbitrary JSON values |
| ARRAY | Lists of primitives (specify element_type and max_capacity) |
| FLOAT_VECTOR | Dense vectors (specify dim) |
| BINARY_VECTOR | Binary vectors (dimension in bits) |
| SPARSE_FLOAT_VECTOR | Sparse vectors (for hybrid search) |
Dynamic fields — let you add arbitrary fields per row without schema changes:
schema = client.create_schema(auto_id=True, enable_dynamic_field=True)
# Now insert can include any fields
client.insert(
collection_name="articles",
data=[
{"embedding": vec, "title": "...", "any_extra_field": "value"},
],
)
Common Mistake: Forgetting max_length on VARCHAR fields. Milvus requires explicit length limits; without one, schema creation fails with a confusing error about “field has no max length”. Always set max_length=N for VARCHAR, and pick generously since the cost is small.
Fix 4: Index Building (Required Before Search)
MilvusException: (code=15, message=index not exist)
Milvus requires an index on the vector field before you can search. Unlike Chroma/Qdrant where indexes are automatic, Milvus needs explicit create_index():
# After creating the collection
index_params = client.prepare_index_params()
index_params.add_index(
field_name="embedding",
index_type="HNSW", # Or IVF_FLAT, IVF_SQ8, IVF_PQ, FLAT, AUTOINDEX
metric_type="COSINE",
params={"M": 16, "efConstruction": 200},
)
client.create_index(
collection_name="articles",
index_params=index_params,
)
Index types:
| Index | Best for |
|---|---|
| FLAT | Exact search, < 10k vectors |
| IVF_FLAT | Medium datasets (10k–1M) |
| IVF_SQ8 | Like IVF_FLAT but 4x less memory (slight accuracy loss) |
| IVF_PQ | Large datasets (>1M) where memory matters |
| HNSW | Best speed/accuracy tradeoff, default for many workloads |
| AUTOINDEX | Let Milvus pick based on data size (Milvus 2.4+) |
| DISKANN | Datasets too large for RAM (disk-based) |
| GPU_IVF_FLAT, GPU_IVF_PQ | GPU-accelerated (Milvus GPU build) |
Load collection to memory before searching:
client.load_collection(collection_name="articles")
# Vectors and index now in memory; search works
# When done, free memory
client.release_collection(collection_name="articles")
Pro Tip: Use AUTOINDEX for the simplest path: Milvus picks a sensible index type and parameters based on your data size. Only specify HNSW/IVF/PQ explicitly when you’ve benchmarked and know a specific choice wins. AUTOINDEX defaults are tuned by the Milvus team based on extensive testing.
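The index-then-load order can be wrapped in one helper so that search code never runs against an unindexed or unloaded collection. This is a sketch under assumptions: `ensure_searchable` is a hypothetical name, and it relies on `list_indexes`, `create_index`, and `load_collection` being available on the MilvusClient version you have installed.

```python
def ensure_searchable(client, collection_name: str, index_params) -> None:
    """Create an index if the collection has none, then load it into memory."""
    # list_indexes returns the names of existing indexes (empty if there are none).
    if not client.list_indexes(collection_name=collection_name):
        client.create_index(
            collection_name=collection_name,
            index_params=index_params,
        )
    # Loading is required before search regardless of who created the index.
    client.load_collection(collection_name=collection_name)
```

Call it once at startup with the `index_params` object built by `client.prepare_index_params()`; any object exposing the same three methods works, which also makes the helper easy to unit test with a stub.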
Fix 5: Flush Before Search (or Wait)
client.insert(collection_name="articles", data=[...])
results = client.search(collection_name="articles", data=[query_vec], limit=10)
# Empty results: the new rows are not yet visible to this search
Milvus buffers inserts in growing segments, and under the default Bounded consistency level a search may read a slightly stale view that does not include the newest rows yet. To search immediately after insert:
client.insert(collection_name="articles", data=[...])
client.flush(collection_name="articles") # Force seal of current growing segment
client.search(...)  # Now sees the inserts
Or use consistency_level:
results = client.search(
collection_name="articles",
data=[query_vec],
limit=10,
consistency_level="Strong", # Wait for all data to be searchable
)
Consistency levels:
| Level | Behavior |
|---|---|
| Strong | Always search latest data (slowest) |
| Bounded | Search data up to N seconds old (default) |
| Session | See your own writes |
| Eventually | Search whatever’s available (fastest, may miss recent inserts) |
Common Mistake: Calling flush() after every insert. Each flush seals the current growing segment, so frequent flushing leaves many small segments that have to be compacted later, and that kills throughput. For bulk loads, insert thousands of rows, then flush once. For real-time apps, use consistency_level="Session" so your queries see your own writes without forcing a flush.
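The "insert in batches, flush once" advice can be sketched as follows. The helper names `chunked` and `bulk_insert` are hypothetical, the default batch size is an arbitrary illustration, and the code assumes a client exposing `insert(collection_name=..., data=...)` and `flush(collection_name=...)` as used earlier in this section.

```python
from typing import Any, Dict, Iterator, List, Sequence


def chunked(rows: Sequence[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield list(rows[start:start + size])


def bulk_insert(client, collection_name: str, rows: Sequence[Dict[str, Any]],
                batch_size: int = 5_000) -> int:
    """Insert rows in batches and flush a single time at the end.

    Returns the number of insert calls made.
    """
    batches = 0
    for batch in chunked(rows, batch_size):
        client.insert(collection_name=collection_name, data=batch)
        batches += 1
    if batches:  # nothing to flush if no rows were inserted
        client.flush(collection_name=collection_name)
    return batches
```

With 12 rows and `batch_size=5`, this issues three inserts followed by exactly one flush.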
Fix 6: Searching with Filters
results = client.search(
collection_name="articles",
data=[query_vec],
limit=10,
output_fields=["title", "published_at"],
filter='published_at > 1700000000 and ARRAY_CONTAINS(tags, "ml")',
search_params={"params": {"ef": 50}}, # HNSW search-time parameter
)
Filter syntax uses boolean expressions:
# Comparison
filter='age > 18'
filter='name == "Alice"'
filter='status != "deleted"'
# Logical
filter='age > 18 and status == "active"'
filter='age < 13 or age > 65'
# IN
filter='category in ["news", "blog"]'
# String functions
filter='title like "Intro to %"'
# Array contains
filter='ARRAY_CONTAINS(tags, "machine-learning")'
filter='ARRAY_CONTAINS_ANY(tags, ["ai", "ml"])'
# JSON field access
filter='metadata["author_id"] == 123'
Filter performance: Milvus uses scalar indexes if you create them:
# Build the params object first; add_index() updates it in place
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="published_at",
    index_type="STL_SORT",  # Sorted index for range queries
)
client.create_index(
    collection_name="articles",
    index_params=index_params,
)
Without scalar indexes, filtered queries scan all rows, which is slow for selective filters on large collections.
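Filter strings are easy to get wrong when they are assembled by hand from user input. The sketch below builds the comparison, IN, and AND forms shown above; the helper names are hypothetical, field names are trusted as-is, and string values are quoted with `json.dumps`, which is an assumption that works for plain strings and numbers.

```python
import json
from typing import Iterable, Optional, Union

Number = Union[int, float]


def range_filter(field: str, gt: Optional[Number] = None,
                 lt: Optional[Number] = None) -> str:
    """Build 'field > a', 'field < b', or both joined with 'and'."""
    parts = []
    if gt is not None:
        parts.append(f"{field} > {gt}")
    if lt is not None:
        parts.append(f"{field} < {lt}")
    if not parts:
        raise ValueError("at least one bound is required")
    return " and ".join(parts)


def in_filter(field: str, values: Iterable[Union[str, Number]]) -> str:
    """Build 'field in [...]' with double-quoted string literals."""
    return f"{field} in {json.dumps(list(values))}"


def all_of(*clauses: str) -> str:
    """AND together non-empty clauses, parenthesizing each one."""
    return " and ".join(f"({c})" for c in clauses if c)


if __name__ == "__main__":
    expr = all_of(
        range_filter("published_at", gt=1700000000),
        in_filter("category", ["news", "blog"]),
    )
    print(expr)
```

Pass the resulting string as `filter=` to `client.search(...)`. Parenthesizing every clause keeps operator precedence explicit when clauses themselves contain `or`.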
Fix 7: Partitions for Multi-Tenancy
# Create partitions
client.create_partition(
collection_name="articles",
partition_name="2024",
)
client.create_partition(
collection_name="articles",
partition_name="2025",
)
# Insert into specific partition
client.insert(
collection_name="articles",
partition_name="2025",
data=[...],
)
# Search in specific partition (much faster than searching all)
results = client.search(
collection_name="articles",
partition_names=["2025"],
data=[query_vec],
limit=10,
)
Use partitions for time-based or tenant-based data separation. Partition queries skip unrelated data entirely, which is a major speedup compared with scanning everything and filtering afterwards.
Common Mistake: Creating thousands of partitions (e.g., one per user). Milvus has overhead per partition and supports a few hundred efficiently. For high-cardinality tenancy, use scalar field filters instead. Partitions work best for low-cardinality coarse groupings (year, region, environment).
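A low-cardinality scheme such as one partition per year can be derived from the row itself. The following is a minimal sketch: `partition_for` and `ensure_partition` are hypothetical helpers, timestamps are assumed to be Unix seconds interpreted in UTC, and the code assumes the client exposes `has_partition` and `create_partition`.

```python
from datetime import datetime, timezone


def partition_for(unix_ts: int) -> str:
    """Return the year (UTC) of a Unix timestamp as a partition name, e.g. '2025'."""
    return str(datetime.fromtimestamp(unix_ts, tz=timezone.utc).year)


def ensure_partition(client, collection_name: str, partition_name: str) -> bool:
    """Create the partition if it is missing. Returns True if it was created."""
    if client.has_partition(collection_name=collection_name,
                            partition_name=partition_name):
        return False
    client.create_partition(collection_name=collection_name,
                            partition_name=partition_name)
    return True
```

Checking for the partition before inserting is one way to avoid the "partition not found" error shown at the top of this article when the first row of a new year arrives.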
For comparing Milvus partitions to Weaviate’s multi-tenancy or Qdrant’s collections, see Weaviate not working and Qdrant not working.
Fix 8: Hybrid Search (Sparse + Dense)
Milvus 2.4+ supports hybrid search combining dense and sparse vectors:
from pymilvus import MilvusClient, DataType, AnnSearchRequest, RRFRanker
client = MilvusClient(uri="http://localhost:19530")
# Schema with both vector types
schema = client.create_schema()
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("dense_vec", DataType.FLOAT_VECTOR, dim=1536)
schema.add_field("sparse_vec", DataType.SPARSE_FLOAT_VECTOR)
schema.add_field("text", DataType.VARCHAR, max_length=1000)
# Index both
index_params = client.prepare_index_params()
index_params.add_index("dense_vec", index_type="HNSW", metric_type="COSINE", params={"M": 16})
index_params.add_index("sparse_vec", index_type="SPARSE_INVERTED_INDEX", metric_type="IP")
client.create_collection(collection_name="hybrid_docs", schema=schema, index_params=index_params)
# Insert with both vector types
client.insert(
collection_name="hybrid_docs",
data=[
{
"id": 1,
"dense_vec": dense_embedding, # From embedding model
"sparse_vec": {0: 0.5, 42: 0.3, 100: 0.8}, # From BM25/SPLADE
"text": "Article content...",
},
],
)
# Hybrid search
dense_req = AnnSearchRequest(
data=[query_dense],
anns_field="dense_vec",
param={"params": {"ef": 50}},
limit=10,
)
sparse_req = AnnSearchRequest(
data=[query_sparse],
anns_field="sparse_vec",
param={},
limit=10,
)
results = client.hybrid_search(
collection_name="hybrid_docs",
reqs=[dense_req, sparse_req],
ranker=RRFRanker(k=60), # Reciprocal Rank Fusion
limit=5,
)
Ranker options:
- RRFRanker(k=60): Reciprocal Rank Fusion, which combines result lists by rank position using a single smoothing constant k
- WeightedRanker(0.7, 0.3): Weighted combination of scores (dense 0.7, sparse 0.3)
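To see what Reciprocal Rank Fusion does with two result lists, here is a pure-Python illustration of the textbook formula, score(d) = sum over lists of 1 / (k + rank). It is a sketch for intuition only: it assumes 1-based ranks and is not the Milvus implementation, which runs server-side.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked ID lists with Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    dense_hits = ["a", "b", "c"]
    sparse_hits = ["c", "a", "d"]
    print(rrf_fuse([dense_hits, sparse_hits]))  # ['a', 'c', 'b', 'd']
```

Documents that appear in both lists ("a" and "c") outrank those found by only one retriever, and the raw similarity scores never enter the calculation, which is why dense and sparse scores do not need to be normalized first.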
Pinecone also supports sparse-dense hybrid via its serverless API; the trade-off is that Milvus exposes the underlying sparse index type, while Pinecone treats sparse vectors as a black box.
Platform Differences: Milvus Lite vs Standalone vs Cluster vs Zilliz Cloud
Most Milvus errors trace back to which deployment flavor you’re using — the API surface looks identical but the internals diverge sharply.
Milvus Lite (embedded, pymilvus[lite]) — Pure-Python wrapper over SQLite-style storage. No coordinators, no indexers, no etcd. Only runs on Linux and macOS (officially), and Apple Silicon (M1/M2/M3/M4) is supported as of pymilvus 2.4.2+. Windows users must use WSL2 or Docker — Lite does not ship a Windows wheel. Lite supports only FLAT and HNSW indexes; IVF_*, DISKANN, and GPU indexes are unavailable. Partitions exist but multi-partition search has known caveats. Best for unit tests and notebooks.
Milvus Standalone (Docker, single process) — One container packs the proxy, coord, querynode, datanode, and indexnode. Embeds etcd and MinIO inside. Runs on x86_64 and arm64 (M-series Mac via Docker Desktop). Memory floor sits around 1.5GB even idle — Docker Desktop on macOS with default 2GB limit will OOM under load. Bump Docker memory to 4GB minimum.
Milvus Cluster (Helm/Operator on Kubernetes): Each role runs as a separate StatefulSet. Requires etcd (external or built-in), MinIO/S3 for object storage, and Pulsar/Kafka for the write-ahead log. Use the Milvus Operator (milvus-operator) rather than raw Helm for upgrades, because Helm chart upgrades between minor versions sometimes mangle the CR. GPU indexes (GPU_IVF_FLAT, GPU_IVF_PQ, GPU_CAGRA) require the GPU image on Standalone or Cluster, and disk-based DISKANN is likewise limited to server deployments; neither is available in Lite.
Zilliz Cloud (managed Milvus) — Same API via MilvusClient(uri, token=api_key). Differences: collection names are namespaced per cluster, free-tier clusters auto-pause after idle and the first query after a pause times out (retry after ~10s wakes them). Zilliz uses its own AUTOINDEX implementation that doesn’t expose the underlying algorithm.
Attu UI — The official admin UI. Bundled into the Standalone Docker image at http://localhost:9091/webui (Milvus 2.4+). Self-managed Kubernetes deployments must install Attu separately as a sidecar. Zilliz Cloud has its own web console instead of Attu.
GPU support — Only available in the milvusdb/milvus-gpu:latest image (Standalone) or via Helm value image.all.repository=milvusdb/milvus-gpu. Requires CUDA 11.8+ and NVIDIA Container Toolkit. AMD GPUs are not supported at any level. On Apple Silicon, GPU indexes are unavailable everywhere — fall back to HNSW on CPU.
Each platform handles tenant isolation at a different layer — Milvus uses partitions and scalar filters, Weaviate has dedicated multi-tenancy primitives, Qdrant uses per-collection sharding.
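For the Zilliz Cloud cold-start case described above, where the first query after an auto-pause times out, a small retry wrapper is often enough. This is a generic sketch: `call_with_retry` is a hypothetical helper, the 10-second delay mirrors the rough figure mentioned above, and in real code you would likely catch a narrower exception type than `Exception`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T], attempts: int = 3,
                    delay_s: float = 10.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on failure wait delay_s and try again, up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            sleep(delay_s)
    raise RuntimeError("unreachable")  # keeps type checkers satisfied
```

Usage might look like `call_with_retry(lambda: client.search(collection_name="articles", data=[query_vec], limit=10))`. The injectable `sleep` argument exists only so tests can run without waiting.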
Still Not Working?
Milvus vs Other Vector DBs
- Milvus — Production-scale, multiple index types, GPU support, complex deployment. Best for billion-vector workloads.
- Chroma — Simplest, embedded. See ChromaDB not working.
- Qdrant — Self-hosted with rich filters. Lighter ops than Milvus.
- Weaviate — Hybrid plus GraphQL plus generative built-in.
- Pinecone — Managed, zero ops, no self-host option.
Milvus is the right choice when you’ve outgrown Chroma/Qdrant and need to scale to billions of vectors. For smaller workloads, the deployment complexity isn’t worth it.
Connection Timeouts in Docker
MilvusException: (code=2, message=Fail connecting to server)
If running Milvus in Docker and connecting from outside:
docker run -d --name milvus-standalone \
  -p 19530:19530 -p 9091:9091 \
  milvusdb/milvus:latest milvus run standalone
Then in Python:
client = MilvusClient(uri="http://localhost:19530")
If Milvus is on a remote host, ensure the firewall allows port 19530 (gRPC) and optionally 9091 (HTTP).
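Before debugging pymilvus itself, it helps to confirm that the port is reachable at the TCP level. The sketch below uses only the standard library; `port_open` is a hypothetical helper, and a successful connection shows only that something is listening, not that Milvus has finished starting.

```python
import socket


def port_open(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


if __name__ == "__main__":
    print("19530 reachable:", port_open("localhost", 19530))
```

If this prints False, the problem is in Docker port mapping, the firewall, or the container not running, and no client-side setting will fix it.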
Authentication
Milvus 2.3+ supports user authentication:
client = MilvusClient(
uri="http://localhost:19530",
token="root:Milvus", # username:password
)
For Zilliz Cloud (managed Milvus):
client = MilvusClient(
uri="https://your-cluster.api.gcp-us-west1.zillizcloud.com",
token="your-api-key",
)
Embedding Models
Pymilvus integrates with embedding providers:
from pymilvus.model.dense import OpenAIEmbeddingFunction
ef = OpenAIEmbeddingFunction(model_name="text-embedding-3-small", api_key="sk-...")
embeddings = ef.encode_documents(["text 1", "text 2"])
client.insert(
collection_name="articles",
data=[
{"id": 1, "embedding": embeddings[0], "text": "text 1"},
],
)
OpenAI’s text-embedding-3-small and HuggingFace’s sentence-transformers/all-MiniLM-L6-v2 are the most common pairings. Match the dim field exactly to the model’s output size (1536 for text-embedding-3-small, 384 for MiniLM).
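A dimension mismatch is cheaper to catch on the client than to decode from a server error. The sketch below checks vectors against the expected size before insert; `MODEL_DIMS` and `check_dims` are hypothetical names, and the two sizes are the ones quoted above.

```python
from typing import Dict, Sequence

# Output sizes quoted in this article; extend for other models as needed.
MODEL_DIMS: Dict[str, int] = {
    "text-embedding-3-small": 1536,
    "sentence-transformers/all-MiniLM-L6-v2": 384,
}


def check_dims(vectors: Sequence[Sequence[float]], expected_dim: int) -> None:
    """Raise ValueError naming the first vector whose length differs from expected_dim."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(
                f"vector {i} has dimension {len(vec)}, expected {expected_dim}"
            )


if __name__ == "__main__":
    dim = MODEL_DIMS["sentence-transformers/all-MiniLM-L6-v2"]
    check_dims([[0.0] * 384], dim)  # passes silently
```

Run the check right before `client.insert(...)`, using the same number you passed as `dim` when defining the vector field.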
LangChain and LlamaIndex Integration
# LangChain
from langchain_milvus import Milvus
from langchain_openai import OpenAIEmbeddings
vector_store = Milvus(
embedding_function=OpenAIEmbeddings(),
collection_name="articles",
connection_args={"uri": "http://localhost:19530"},
)
vector_store.add_texts(["doc 1", "doc 2"])
results = vector_store.similarity_search("query", k=5)
For LangChain integration patterns, see LangChain Python not working.
Quota and Pool Errors Under Concurrent Writes
MilvusException: (code=29, message=DataNode resource not enough)
Standalone Milvus has a fixed datanode pool, and concurrent bulk inserts from many workers exhaust it. Either reduce parallel writers, batch larger (10k+ rows per insert() call), or move to Cluster mode where datanodes scale horizontally. On Zilliz Cloud the equivalent error mentions “quota exceeded” and the fix is upgrading the cluster tier; rate-limit your client to half the documented QPS to leave headroom.
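One way to "reduce parallel writers" without rewriting the ingestion pipeline is to funnel all batches through a small thread pool. This is a sketch under assumptions: `insert_with_limited_writers` is a hypothetical helper, `insert_batch` stands for whatever function performs one `client.insert(...)` call, and the default of two writers is an arbitrary starting point to tune against your own deployment.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List, Sequence


def insert_with_limited_writers(
    insert_batch: Callable[[Sequence[Any]], Any],
    batches: Iterable[Sequence[Any]],
    max_writers: int = 2,
) -> List[Any]:
    """Run insert_batch over all batches with at most max_writers in flight."""
    if max_writers < 1:
        raise ValueError("max_writers must be at least 1")
    # The pool size is the concurrency cap; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=max_writers) as pool:
        return list(pool.map(insert_batch, batches))
```

If the error persists with a single writer, concurrency is not the bottleneck and larger batches or a bigger deployment are the remaining options listed above.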
Mixed-Version Client and Server
Pymilvus 2.4.x clients can connect to Milvus 2.3.x servers in most cases, but specific features (sparse vectors, hybrid search, prepare_index_params) silently no-op or throw code=1100, message=unsupported against older servers. Run client.get_server_version() first and compare against pymilvus.__version__ — keep the minor versions aligned. Zilliz Cloud auto-upgrades, so always run the latest pymilvus there.
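The "keep the minor versions aligned" check can be automated at startup. The sketch below compares only major and minor components; the helper names are hypothetical, and the assumption that the server string may carry a leading "v" (for example "v2.4.1") should be verified against what your server actually returns.

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as 'v2.4.1' or '2.4.9'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def versions_aligned(server_version: str, client_version: str) -> bool:
    """True when server and client share the same major.minor."""
    return major_minor(server_version) == major_minor(client_version)


if __name__ == "__main__":
    print(versions_aligned("v2.4.1", "2.4.9"))
    print(versions_aligned("v2.3.5", "2.4.4"))
```

In an application you would pass `client.get_server_version()` and `pymilvus.__version__` and log a warning, or refuse to start, when the result is False.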
Storage Backend Out of Space
When MinIO (or the configured S3) fills up, Milvus reports vague flush failures rather than a clean ENOSPC. Check the MinIO console at :9001 (Standalone default) or your S3 bucket’s metrics. Set dataCoord.gc.dropTolerance and dataCoord.gc.missingTolerance to shorter values so deleted segments are physically removed faster; the defaults are conservative and keep tombstoned data for hours.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
Related Articles
Fix: ChromaDB Not Working — Persistent Client, Collection Errors, and Embedding Function Issues
How to fix ChromaDB errors — persistent client not saving data, collection already exists error, dimension mismatch in embeddings, embedding function required, HTTP client connection refused, and memory growing unbounded.
Fix: Pinecone Not Working — Index Creation, Serverless vs Pod, and Python SDK v3 Migration
How to fix Pinecone errors — ApiException 401 unauthorized, index not found, dimension mismatch, serverless spec required, Python SDK v3 breaking changes, namespace confusion, and upsert rate limit 429.
Fix: Qdrant Not Working — Connection Errors, Collection Setup, and Filter Syntax Issues
How to fix Qdrant errors — connection refused to localhost 6333, collection not found create_collection, vector size mismatch, filter must match schema, payload index missing slow queries, and timeout on large batch uploads.
Fix: Weaviate Not Working — Client v4 Migration, Schema Setup, and Vectorizer Errors
How to fix Weaviate errors — client v3 to v4 migration breaking imports, schema creation property mismatch, vectorizer module not loaded, connection refused localhost 8080, batch import errors, and hybrid search alpha tuning.