Vector databases have become one of the most debated infrastructure choices for AI applications. Two years ago, the choice was relatively simple — most teams defaulted to Pinecone because it was the only mature managed option. In 2026, the landscape has matured significantly: Qdrant has become production-grade, Weaviate has accelerated its managed offering, and even PostgreSQL's pgvector extension has become a legitimate contender for certain workloads.
This guide cuts through the noise. We start with the fundamentals of how vector databases work, then compare five options — Pinecone, Weaviate, Qdrant, Chroma, and pgvector — with real Python examples, a cost analysis at 10M vectors, and a decision framework for choosing the right database for your RAG pipeline.
How Vector Databases Work: The Essential Concepts
Before comparing products, you need to understand the indexing algorithms they use. The choice of algorithm is the most consequential architectural decision a vector database makes.
HNSW (Hierarchical Navigable Small World)
HNSW is a graph-based approximate nearest neighbor algorithm. It builds a layered graph structure where:
- Each vector is a node in the graph
- Connections are established to nearby vectors during indexing
- Search navigates the graph from high (coarse) layers to low (fine) layers
- Higher layers have fewer nodes and longer edges; lower layers are denser
Properties:
- Excellent recall (typically 95–99% at reasonable ef values)
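- Fast query time: search cost grows roughly logarithmically with dataset size, approximately O(log n)
- High memory usage: the graph structure requires significant RAM
- Slow index builds for very large datasets
- Usually memory-resident: most implementations keep the full index in RAM, though some engines can place it on disk at the cost of query latency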
HNSW is used by: Qdrant, Weaviate, Chroma (via hnswlib), and pgvector (which ships its own HNSW implementation).
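To make the graph traversal concrete, here is a minimal sketch of the best-first search that HNSW runs inside a single layer. It is a simplification under stated assumptions: there is only one layer (real HNSW stacks several and descends through them), the graph is built by brute force, and the names `build_graph` and `search_layer` are illustrative rather than taken from any library. The `ef` argument plays the role of the ef value mentioned above: a larger value keeps more candidates and trades speed for recall.

```python
import heapq
import math
import random


def build_graph(vectors, m):
    """Link every node to its m nearest neighbours (brute force, for clarity only)."""
    graph = {i: set() for i in range(len(vectors))}
    for i, v in enumerate(vectors):
        others = (j for j in range(len(vectors)) if j != i)
        nearest = sorted(others, key=lambda j: math.dist(v, vectors[j]))[:m]
        for j in nearest:
            graph[i].add(j)
            graph[j].add(i)  # bidirectional edges keep the graph navigable
    return graph


def search_layer(vectors, graph, query, entry, ef):
    """Best-first search that keeps the ef closest nodes seen so far."""
    d0 = math.dist(query, vectors[entry])
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node first
    best = [(-d0, entry)]       # max-heap via negation: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break  # nothing left can improve on the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = math.dist(query, vectors[nb])
            if len(best) < ef or d_nb < -best[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(best, (-d_nb, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # evict the current worst result
    return sorted((-neg_d, node) for neg_d, node in best)


random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(300)]
graph = build_graph(vectors, m=8)
query = [random.random() for _ in range(8)]

top5 = search_layer(vectors, graph, query, entry=0, ef=32)[:5]
exact = sorted((math.dist(query, v), i) for i, v in enumerate(vectors))[:5]
recall = len({i for _, i in top5} & {i for _, i in exact}) / 5
print(f"recall@5 with ef=32: {recall:.2f}")
```

Rerunning the search with a smaller `ef` is a quick way to see the recall and latency tradeoff on your own data.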
IVF (Inverted File Index)
IVF partitions the vector space into clusters using k-means. During search:
- The query vector is compared to cluster centroids
- Only the nprobe closest clusters are searched exhaustively
- The nprobe parameter controls how many clusters are searched (precision vs. speed tradeoff)
Properties:
- More memory-efficient than HNSW (cluster centroids + IDs, not full graph)
- Scales better to very large datasets
- Requires training on representative data before indexing
- Can be combined with product quantization (IVF-PQ) to dramatically reduce memory at some recall cost
- Slower for small datasets than HNSW
IVF variants are used by: Pinecone (proprietary variant), Milvus (Faiss backend).
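The IVF flow described above (train centroids, assign vectors to clusters, probe only the nearest clusters) can be sketched in plain Python. This is an illustrative toy, not how Pinecone or Faiss implement it: the function names are made up for this example, k-means is a bare-bones Lloyd's loop, and no quantization is applied.

```python
import math
import random


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def nearest_centroid(v, centroids):
    return min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))


def train_centroids(vectors, n_clusters, iterations=10):
    """Plain k-means (Lloyd's algorithm): the training step IVF needs before indexing."""
    centroids = random.sample(vectors, n_clusters)
    for _ in range(iterations):
        buckets = [[] for _ in range(n_clusters)]
        for v in vectors:
            buckets[nearest_centroid(v, centroids)].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(b) if b else centroids[c] for c, b in enumerate(buckets)]
    return centroids


def build_inverted_lists(vectors, centroids):
    """One list of vector IDs per cluster: the 'inverted file'."""
    lists = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        lists[nearest_centroid(v, centroids)].append(idx)
    return lists


def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    candidates = [idx for c in order[:nprobe] for idx in lists[c]]
    return sorted(candidates, key=lambda i: (math.dist(query, vectors[i]), i))[:k]


random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(500)]
centroids = train_centroids(vectors, n_clusters=10)
lists = build_inverted_lists(vectors, centroids)
query = [random.random() for _ in range(8)]

exact = sorted(range(len(vectors)), key=lambda i: (math.dist(query, vectors[i]), i))[:5]
fast = ivf_search(query, vectors, centroids, lists, k=5, nprobe=2)
full = ivf_search(query, vectors, centroids, lists, k=5, nprobe=10)
print(f"nprobe=2 found {len(set(fast) & set(exact))} of the 5 true neighbours")
print(f"nprobe=10 matches brute force: {full == exact}")
```

Setting `nprobe` equal to the number of clusters degenerates into an exhaustive scan, which is why it always matches brute force; the interesting range is the small values in between.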
DiskANN
DiskANN (Disk-based Approximate Nearest Neighbor) is a newer algorithm that stores the graph structure on disk rather than RAM, enabling billion-scale indexes without proportionally large RAM requirements.
Properties:
- Much lower RAM requirements than HNSW at scale
- Slightly slower queries due to disk I/O
- Ideal for datasets that exceed available RAM
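DiskANN is used by: Azure Cosmos DB and Milvus (as an optional index type). Qdrant and Weaviate do not implement DiskANN itself; Qdrant addresses the same problem with on-disk HNSW storage, and both offer quantization to reduce RAM.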
The Five Contenders: Architecture Overview
Pinecone
Pinecone is a fully managed, purpose-built vector database service. It offers no self-hosting option — you use their cloud infrastructure or you don't use Pinecone.
Architecture: Pinecone separates storage and compute, using a proprietary index format that combines aspects of IVF and flat indexing. It shards indexes across multiple pods for horizontal scaling. The "serverless" tier (introduced in 2024) stores vectors in object storage and builds indexes on-demand, dramatically lowering costs for sparse or intermittent workloads.
Strengths: Easiest to get started, excellent managed experience, strong SLA, metadata filtering is fast.
Weaknesses: No self-hosting, limited customization of indexing behavior, pricing becomes expensive at very high query volumes.
Weaviate
Weaviate is an open-source vector database built in Go. It offers both self-hosting and a managed cloud offering (Weaviate Cloud Services).
Architecture: Weaviate uses HNSW by default with a custom implementation optimized for vector+metadata hybrid search. It has a native GraphQL API and supports multi-vector embeddings. It provides a unique "modules" system for integrating embedding models directly into the database (so the database can vectorize at write time using Cohere, OpenAI, or HuggingFace).
Strengths: GraphQL API is powerful for complex queries, strong hybrid search (BM25 + vector), active development, good Kubernetes operator.
Weaknesses: More complex to configure than simpler alternatives, HNSW memory requirements can be significant, the GraphQL API has a learning curve.
Qdrant
Qdrant is an open-source vector search engine written in Rust, developed by Qdrant Solutions. It offers self-hosting and a managed cloud tier.
Architecture: Qdrant uses HNSW with several Rust-native optimizations: quantization (scalar, product, binary) to reduce memory usage, on-disk indexing for large collections, and payload-based filtering that is applied within the HNSW traversal (not as a post-filter, which dramatically improves filtered search performance).
Strengths: Best-in-class filtered vector search performance, Rust performance characteristics, excellent Python and gRPC clients, good documentation, reasonable cloud pricing.
Weaknesses: Managed cloud is newer and has less track record than Pinecone, smaller ecosystem than Weaviate.
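The difference between post-filtering and filtering during the search is easy to see in a toy example. The sketch below uses brute-force ranking as a stand-in for HNSW traversal, so it says nothing about Qdrant's internals; it only illustrates why a post-filter can return fewer than k hits when the filter is selective. All names and data here are invented for the illustration.

```python
import math
import random

random.seed(1)
# 200 points, of which only every tenth one matches the filter.
points = [
    {
        "id": i,
        "vector": [random.random() for _ in range(4)],
        "category": "ai-news" if i % 10 == 0 else "other",
    }
    for i in range(200)
]
query = [random.random() for _ in range(4)]


def distance(point):
    return math.dist(query, point["vector"])


def post_filter_search(k, category):
    """Rank everything first, then drop hits that fail the filter."""
    top_k = sorted(points, key=distance)[:k]
    return [p["id"] for p in top_k if p["category"] == category]


def filter_aware_search(k, category):
    """Apply the filter while selecting candidates, so k matches can still be found."""
    matching = (p for p in points if p["category"] == category)
    return [p["id"] for p in sorted(matching, key=distance)[:k]]


post = post_filter_search(5, "ai-news")
aware = filter_aware_search(5, "ai-news")
print(f"post-filter returned {len(post)} hits; filter-aware search returned {len(aware)}")
```

A common workaround for post-filtering is to over-fetch (ask for many more than k results and filter afterwards), which costs latency and still offers no guarantee on highly selective filters.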
Chroma
Chroma is an open-source embedding database designed explicitly for AI applications and local development. It prioritizes developer experience over production scalability.
Architecture: Chroma uses hnswlib for indexing and SQLite for metadata storage in its embedded mode. A server mode uses DuckDB for analytics queries. It has a simple, Python-first API.
Strengths: Zero-configuration setup, excellent for development and prototyping, seamless LangChain and LlamaIndex integration, in-memory mode for testing.
Weaknesses: Not suitable for production at scale (single-node, no horizontal scaling), limited query capabilities, no native cloud offering.
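For a sense of how little setup Chroma needs, here is a minimal sketch of the same insert, filtered query, and delete pattern used elsewhere in this guide. It assumes the `chromadb` package is installed and uses its in-memory client; `make_records` and `run_chroma_demo` are hypothetical helper names, and the random embeddings stand in for real model output.

```python
import random


def make_records(n, dim):
    """Build ids, embeddings and metadata as the parallel lists Chroma expects."""
    ids = [f"doc_{i}" for i in range(n)]
    embeddings = [[random.random() for _ in range(dim)] for _ in range(n)]
    metadatas = [{"category": "ai-news", "year": 2026} for _ in range(n)]
    return ids, embeddings, metadatas


def run_chroma_demo():
    # Imported lazily so make_records can be used without chromadb installed.
    import chromadb

    client = chromadb.Client()  # in-memory client, nothing is persisted
    collection = client.create_collection(name="articles")

    ids, embeddings, metadatas = make_records(1000, 1536)
    collection.add(ids=ids, embeddings=embeddings, metadatas=metadatas)

    # Filtered similarity search: `where` restricts results by metadata.
    results = collection.query(
        query_embeddings=[[random.random() for _ in range(1536)]],
        n_results=5,
        where={"category": "ai-news"},
    )
    for doc_id, dist in zip(results["ids"][0], results["distances"][0]):
        print(f"ID: {doc_id}, Distance: {dist:.4f}")

    # Delete by ID
    collection.delete(ids=["doc_0", "doc_1"])


if __name__ == "__main__":
    run_chroma_demo()
```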
pgvector
pgvector is a PostgreSQL extension that adds vector similarity search to PostgreSQL tables. It is not a dedicated vector database but an extension of an existing relational database.
Architecture: pgvector supports both HNSW and IVF indexes on PostgreSQL columns. Metadata filtering is standard SQL — WHERE clauses. It runs within a standard PostgreSQL instance.
Strengths: No additional infrastructure if you already run PostgreSQL, SQL familiarity, ACID transactions, existing backup and monitoring tooling applies.
Weaknesses: Lower query performance than dedicated vector DBs at large scale, HNSW index build is significantly slower, limited to PostgreSQL ecosystem.
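Because pgvector is plain SQL, the whole workflow fits in a few statements. The sketch below assumes psycopg 3 as the driver, a reachable PostgreSQL instance with the pgvector extension available, and a hypothetical `articles` table; the connection string is a placeholder. Vectors are passed in pgvector's text form and cast with `::vector`, which avoids needing an adapter package.

```python
import random


def to_pgvector_literal(vec):
    """Format a Python sequence in pgvector's text input form, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(str(float(x)) for x in vec) + "]"


SCHEMA_STATEMENTS = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    """CREATE TABLE IF NOT EXISTS articles (
           id bigserial PRIMARY KEY,
           category text,
           year int,
           embedding vector(1536)
       )""",
    # HNSW index using cosine distance, matching the <=> operator below.
    """CREATE INDEX IF NOT EXISTS articles_embedding_hnsw
           ON articles USING hnsw (embedding vector_cosine_ops)""",
]

INSERT_SQL = "INSERT INTO articles (category, year, embedding) VALUES (%s, %s, %s::vector)"

# Metadata filtering is an ordinary WHERE clause; <=> is cosine distance.
SEARCH_SQL = """
    SELECT id, embedding <=> %s::vector AS distance
    FROM articles
    WHERE category = %s
    ORDER BY embedding <=> %s::vector
    LIMIT 5
"""


def run_pgvector_demo(dsn):
    # Imported lazily so the helpers above work without a driver installed.
    import psycopg

    with psycopg.connect(dsn) as conn:
        for statement in SCHEMA_STATEMENTS:
            conn.execute(statement)

        rows = [
            ("ai-news", 2026, to_pgvector_literal(random.random() for _ in range(1536)))
            for _ in range(1000)
        ]
        with conn.cursor() as cur:
            cur.executemany(INSERT_SQL, rows)

        q = to_pgvector_literal(random.random() for _ in range(1536))
        for row_id, distance in conn.execute(SEARCH_SQL, (q, "ai-news", q)):
            print(f"ID: {row_id}, Distance: {distance:.4f}")

        conn.execute("DELETE FROM articles WHERE year = %s", (2026,))


if __name__ == "__main__":
    run_pgvector_demo("postgresql://localhost/postgres")  # placeholder DSN
```

Since inserts, searches, and deletes all run inside one connection's transaction, this is also where the ACID guarantees listed among the strengths come into play.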
Feature Comparison Table
| Feature | Pinecone | Weaviate | Qdrant | Chroma | pgvector |
|---|---|---|---|---|---|
| License | Proprietary | BSD-3-Clause | Apache 2.0 | Apache 2.0 | PostgreSQL License |
| Self-hostable | No | Yes | Yes | Yes | Yes |
| Managed cloud | Yes | Yes (WCS) | Yes (Qdrant Cloud) | No | Various (RDS, Supabase, Neon) |
| Primary index | Proprietary (IVF-based) | HNSW | HNSW + quantization | HNSW (hnswlib) | HNSW or IVF |
| Filtered vector search | Good | Good | Excellent | Basic | Good (via SQL) |
| Hybrid search (dense+sparse) | Yes | Yes | Yes | No | Via extensions |
| Multi-vector per object | Yes (namespaces) | Yes | Yes | No | No (one column) |
| Metadata storage | Yes | Yes | Yes (payload) | Yes | Yes (table columns) |
| Horizontal scaling | Yes | Yes | Yes | No | Limited |
| Client languages | Python, JS, Go, Java | Python, JS, Go, Java, others | Python, JS, Rust, Go | Python, JS | Any SQL client |
| gRPC support | Yes (optional Python gRPC client) | Yes | Yes | No | No |
| ACID transactions | No | No | No | No | Yes |
Python Code Examples: CRUD Operations
The following examples show a standard pattern (insert vectors, query by similarity, filter by metadata, and delete) in Qdrant, Pinecone, and Weaviate. Random vectors stand in for real embeddings.
Qdrant
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct, Filter, FieldCondition, MatchValue,
    FilterSelector,
)
import numpy as np

client = QdrantClient(url="http://localhost:6333")

# Create collection
client.create_collection(
    collection_name="articles",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# Upsert vectors (batch)
points = [
    PointStruct(
        id=i,
        vector=np.random.rand(1536).tolist(),
        payload={"category": "ai-news", "year": 2026, "source": f"doc_{i}"},
    )
    for i in range(1000)
]
client.upsert(collection_name="articles", points=points)

# Filtered similarity search
# (query_points replaces the deprecated search method in recent client versions)
query_vector = np.random.rand(1536).tolist()
response = client.query_points(
    collection_name="articles",
    query=query_vector,
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="ai-news"))]
    ),
    limit=5,
    with_payload=True,
)
for r in response.points:
    print(f"ID: {r.id}, Score: {r.score:.4f}, Source: {r.payload['source']}")

# Delete by filter
client.delete(
    collection_name="articles",
    points_selector=FilterSelector(
        filter=Filter(
            must=[FieldCondition(key="year", match=MatchValue(value=2026))]
        )
    ),
)
Pinecone
from pinecone import Pinecone, ServerlessSpec
import numpy as np

pc = Pinecone(api_key="your-api-key")

# Create serverless index
pc.create_index(
    name="articles",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
index = pc.Index("articles")

# Upsert vectors as (id, values, metadata) tuples
vectors = [
    (
        f"doc_{i}",
        np.random.rand(1536).tolist(),
        {"category": "ai-news", "year": 2026},
    )
    for i in range(1000)
]
index.upsert(vectors=vectors, batch_size=100)

# Filtered similarity search
results = index.query(
    vector=np.random.rand(1536).tolist(),
    filter={"category": {"$eq": "ai-news"}},
    top_k=5,
    include_metadata=True,
)
for match in results.matches:
    print(f"ID: {match.id}, Score: {match.score:.4f}")

# Delete by ID
index.delete(ids=["doc_0", "doc_1"])
Weaviate
import weaviate
from weaviate.classes.config import Configure, Property, DataType
from weaviate.classes.query import Filter, MetadataQuery
import numpy as np

client = weaviate.connect_to_local()

# Create collection (schema); vectors are supplied by the caller, not a module
client.collections.create(
    name="Article",
    vectorizer_config=Configure.Vectorizer.none(),
    properties=[
        Property(name="category", data_type=DataType.TEXT),
        Property(name="year", data_type=DataType.INT),
        Property(name="source", data_type=DataType.TEXT),
    ],
)
articles = client.collections.get("Article")

# Batch insert
with articles.batch.dynamic() as batch:
    for i in range(1000):
        batch.add_object(
            properties={"category": "ai-news", "year": 2026, "source": f"doc_{i}"},
            vector=np.random.rand(1536).tolist(),
        )

# Filtered similarity search
results = articles.query.near_vector(
    near_vector=np.random.rand(1536).tolist(),
    limit=5,
    filters=Filter.by_property("category").equal("ai-news"),
    # Vector queries report a distance; score is only populated for BM25/hybrid queries
    return_metadata=MetadataQuery(distance=True),
)
for obj in results.objects:
    print(f"UUID: {obj.uuid}, Distance: {obj.metadata.distance:.4f}")

client.close()
Cloud vs. Self-Hosting Comparison
| Dimension | Managed Cloud | Self-Hosted |
|---|---|---|
| Setup time | Minutes | Hours–days |
| Operational overhead | Near zero | Significant |
| Scaling | Automatic | Manual or via Kubernetes |
| Cost predictability | Variable (usage-based) | Fixed (instance costs) |
| Data sovereignty | Vendor's cloud region | Full control |
| Customization | Limited | Full |
| SLA | Vendor SLA (99.9%+) | Self-managed |
| Best for | Startups, rapid prototyping, small teams | Large data volumes, cost optimization at scale, compliance requirements |
For most teams building their first AI application, managed cloud wins — the operational overhead of self-hosting a distributed database is not the right place to spend engineering time early in a project.
For teams with regulatory requirements (HIPAA, GDPR with strict data residency), cost constraints at scale, or the engineering capacity to operate infrastructure, self-hosting Qdrant or Weaviate on Kubernetes is the right choice.
Cost Analysis: 10 Million Vectors
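Assumptions: 1536-dimensional vectors (the default output size of OpenAI text-embedding-3-small), approximately 61 GB of raw float32 vector data (10M × 1536 × 4 bytes), 100K queries/month, p99 latency requirement of <100ms. At that size the full-precision vectors exceed the RAM of the smaller instances listed below, so those configurations rely on quantization or on-disk storage.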
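The raw storage figure is worth working out by hand, because it drives instance sizing. The sketch below computes the size of the vectors alone at three precisions; it deliberately ignores index overhead (HNSW graph links, metadata, replicas), which varies by engine and configuration. The function name is illustrative.

```python
def raw_vector_bytes(n_vectors, dims, bits_per_value):
    """Size of the vector data alone, excluding any index or metadata overhead."""
    return n_vectors * dims * bits_per_value // 8


N_VECTORS = 10_000_000
DIMS = 1536

float32_gb = raw_vector_bytes(N_VECTORS, DIMS, 32) / 1e9  # full precision
int8_gb = raw_vector_bytes(N_VECTORS, DIMS, 8) / 1e9      # scalar quantization
binary_gb = raw_vector_bytes(N_VECTORS, DIMS, 1) / 1e9    # binary quantization

print(f"float32: {float32_gb:.2f} GB")  # 61.44 GB
print(f"int8:    {int8_gb:.2f} GB")     # 15.36 GB
print(f"binary:  {binary_gb:.2f} GB")   # 1.92 GB
```

The 4x and 32x reductions are why quantization features so heavily in any plan to serve 10M vectors from a single mid-sized machine.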
| Database | Option | Estimated Monthly Cost |
|---|---|---|
| Pinecone | Serverless (on-demand) | $70–$120 |
| Pinecone | Standard pod (p1.x1) | $70/month (fixed) |
| Weaviate Cloud | Standard | $145–$220 |
| Qdrant Cloud | Managed | $65–$120 |
| Chroma | Self-hosted on EC2 (r5.large) | $120–$150 (instance + storage) |
| pgvector | Supabase Pro + vector add-on | $100–$180 |
| Qdrant | Self-hosted on EC2 (r6i.xlarge) | $170/month (instance only) |
| Weaviate | Self-hosted on EC2 (r6i.2xlarge) | $350/month (instance only) |
Note: Self-hosted costs do not include engineering time for operations, monitoring, backups, and upgrades — which can be substantial.
The Pinecone serverless and Qdrant Cloud options are cheapest for this profile. As query volume scales above ~1M queries/month, the cost gap between managed and self-hosted narrows significantly.
RAG Pipeline Decision Framework
When selecting a vector database for a RAG application, work through these questions:
1. What is your dataset size?
- Under 100K vectors: Any option works. Use Chroma for development, any managed option for production.
- 100K–10M vectors: All options are viable. Cost and operational considerations dominate.
- Over 10M vectors: Qdrant or Weaviate self-hosted with quantization (and, in Qdrant, on-disk index storage) to keep RAM in check, or Pinecone serverless for cost efficiency.
2. Do you need filtered vector search?
- Basic filters (1–2 conditions): All options handle this.
- Complex multi-condition filters on high-cardinality fields: Qdrant has the most efficient implementation.
3. Do you already run PostgreSQL?
- If yes: Consider pgvector first. Eliminating infrastructure is a real benefit. Only migrate to a dedicated vector DB if you hit performance limits.
4. Do you need hybrid search (keyword + vector)?
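- Pinecone, Weaviate, and Qdrant support this natively in 2026; pgvector needs extensions or PostgreSQL full-text search alongside it, and Chroma does not offer it. Weaviate has the most mature implementation.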
5. What are your compliance requirements?
- Data must stay on-premises or in a specific region: Self-hosted Qdrant or Weaviate.
- No special requirements: Managed cloud is fine.
6. How important is development experience?
- If your team is building quickly and doesn't want to manage infrastructure: Pinecone serverless or Qdrant Cloud.
- If your team wants local parity with production: Qdrant (Docker image matches cloud API exactly).
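One way to keep these questions honest in a design review is to write them down as a function. The sketch below is only an illustration: the function name and parameters are hypothetical, and the order in which the questions are checked (existing PostgreSQL first, then compliance, then filtering, then scale) is an assumption of this example rather than something the framework prescribes.

```python
def recommend_vector_db(
    n_vectors,
    runs_postgres=False,
    needs_on_prem=False,
    complex_filters=False,
    wants_managed=True,
):
    """Return a shortlist of options; the priority order is an assumption."""
    if runs_postgres:
        # Question 3: start with the database you already operate.
        return ["pgvector"]
    if needs_on_prem:
        # Question 5: data residency rules out vendor-hosted options.
        return ["Qdrant (self-hosted)", "Weaviate (self-hosted)"]
    if complex_filters:
        # Question 2: multi-condition filters on high-cardinality fields.
        return ["Qdrant"]
    if n_vectors > 10_000_000:
        # Question 1: the largest size bracket.
        return ["Qdrant (self-hosted)", "Weaviate (self-hosted)", "Pinecone serverless"]
    if wants_managed:
        # Question 6: no appetite for running infrastructure.
        return ["Pinecone serverless", "Qdrant Cloud"]
    # Question 6: local parity with production.
    return ["Qdrant"]


print(recommend_vector_db(50_000, runs_postgres=True))
print(recommend_vector_db(2_000_000, complex_filters=True))
print(recommend_vector_db(50_000_000))
```

Treat the output as a starting shortlist to benchmark, not a verdict; reordering the checks to match your own priorities is the whole point of writing it down.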
Final Recommendations by Use Case
| Use Case | Recommended Option |
|---|---|
| Development and prototyping | Chroma (in-memory) |
| Small production app, no ops team | Pinecone serverless |
| RAG with complex filtered queries | Qdrant |
| Existing PostgreSQL infrastructure | pgvector |
| GraphQL API preference | Weaviate |
| Large-scale, cost-optimized | Qdrant self-hosted |
| Enterprise with compliance requirements | Weaviate or Qdrant self-hosted |
The honest answer in 2026 is that Qdrant has emerged as the strongest all-around option for teams that need a dedicated vector database: it combines Rust performance, excellent filtered search, reasonable cloud pricing, and the ability to self-host with full feature parity. But "best in general" is rarely "best for your situation" — work through the decision framework above before committing.