The Problem: Stateless LLMs Can't Actually Learn
As AI agents move into production, teams run into the same wall: LLMs are stateless. Every API call starts from scratch. The model knows nothing about the user's preferences from last week, the decision the agent made in a previous session, or the feedback the user gave yesterday.
The workarounds most teams reach for don't scale:
- Replaying full conversation histories into the context window — costs explode as histories grow
- Hand-curating fact files — becomes a maintenance nightmare as users and data multiply
- Building custom memory systems — months of engineering work before you can even start on the actual product
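A rough back-of-the-envelope sketch shows why replaying full histories gets expensive. The figure of 200 tokens per turn is an assumption chosen for illustration, not a measurement, and `replay_cost` is a hypothetical helper.

```python
def replay_cost(turns: int, tokens_per_turn: int = 200) -> int:
    """Total input tokens sent over a conversation when every call replays
    the full history so far (call k resends all k turns)."""
    return sum(k * tokens_per_turn for k in range(1, turns + 1))


for turns in (10, 100, 1000):
    print(f"{turns} turns -> {replay_cost(turns):,} input tokens in total")
```

Because each call resends everything before it, total input tokens grow roughly with the square of the number of turns: ten times more turns costs about a hundred times more.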
Weaviate CEO Bob van Luijt put it plainly: "Memory is the difference between an agent that answers a question and an agent that gets better at its job."
With Engram now generally available (GA), Weaviate is offering that memory layer as managed infrastructure, so teams can skip straight to building the product.
How Engram Works: The Memory Pipeline
Engram treats memory as structured, evolving infrastructure — not an ever-growing pile of context. The core is a three-stage async pipeline:
1. Extract — Pull discrete facts from raw text, conversations, or pre-extracted data. ("Lives in Berlin." "Prefers dark mode.")
2. Transform — Reconcile new facts against what's already stored: deduplicate, handle preference changes, update time-evolving facts.
3. Commit — Persist the clean memory state to the Weaviate vector database.
The pipeline is fire-and-forget: applications hand off raw events and keep working while memory builds in the background — no latency added to the critical path.
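The three stages and the fire-and-forget handoff can be illustrated with a toy in-process version. This is only a sketch of the pattern, not Engram's implementation: the class `ToyMemoryPipeline`, the `attribute=value` input format (standing in for LLM-based extraction), and the dictionary store (standing in for the vector database) are all assumptions made for the example.

```python
import queue
import threading
import uuid


class ToyMemoryPipeline:
    """Illustrative only: a background worker that mimics extract -> transform -> commit."""

    def __init__(self) -> None:
        self._events: queue.Queue = queue.Queue()
        self.store: dict[str, str] = {}  # committed memory state: attribute -> value
        threading.Thread(target=self._worker, daemon=True).start()

    def add(self, text: str) -> str:
        """Hand off a raw event and return a run id without waiting for processing."""
        run_id = str(uuid.uuid4())
        self._events.put((run_id, text))
        return run_id

    def wait(self) -> None:
        """Block until all queued events are processed (useful for demos and tests)."""
        self._events.join()

    @staticmethod
    def _extract(text: str) -> list[tuple[str, str]]:
        # Stand-in for extraction: expects "attribute=value" pairs separated by ";".
        facts = []
        for part in text.split(";"):
            if "=" in part:
                key, value = part.split("=", 1)
                facts.append((key.strip().lower(), value.strip()))
        return facts

    def _transform(self, facts: list[tuple[str, str]]) -> dict[str, str]:
        # Reconcile with stored state: skip exact duplicates, let newer values win.
        return {key: value for key, value in facts if self.store.get(key) != value}

    def _worker(self) -> None:
        while True:
            _run_id, text = self._events.get()
            try:
                changes = self._transform(self._extract(text))
                self.store.update(changes)  # commit the reconciled facts
            finally:
                self._events.task_done()


pipeline = ToyMemoryPipeline()
pipeline.add("city=Berlin; theme=light")
pipeline.add("theme=dark")  # a preference change overrides the earlier value
pipeline.wait()
print(pipeline.store)
```

The caller only ever touches `add()`, which returns as soon as the event is queued. All reconciliation happens on the worker thread, which is what keeps memory building off the critical path.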
Once stored, memories are served through Weaviate's hybrid search — combining vector similarity (semantic understanding) with BM25 keyword search — so agents can retrieve relevant context using natural language queries.
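To make the idea of combining the two signals concrete, here is a minimal sketch of one common way to fuse a vector-similarity ranking with a BM25 ranking: rescale each score set to the range 0 to 1, then take a weighted sum. The function names, the `alpha` weight, the min-max normalization, and the sample scores are assumptions for illustration. The fusion Weaviate actually applies may differ.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    vector_scores: dict[str, float],
    bm25_scores: dict[str, float],
    alpha: float = 0.5,
) -> list[tuple[str, float]]:
    """Fuse two score sets; alpha=1.0 is pure vector, alpha=0.0 is pure keyword."""
    v, k = min_max(vector_scores), min_max(bm25_scores)
    fused = {
        doc: alpha * v.get(doc, 0.0) + (1 - alpha) * k.get(doc, 0.0)
        for doc in set(v) | set(k)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for four stored memories.
vector_scores = {"m1": 0.82, "m2": 0.75, "m3": 0.40}
bm25_scores = {"m2": 7.1, "m3": 2.3, "m4": 5.0}

ranked = hybrid_rank(vector_scores, bm25_scores, alpha=0.5)
print([doc for doc, _ in ranked])
```

In this example `m2` ranks first because it scores well on both signals, while `m1` (semantic match only) and `m4` (keyword match only) follow.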
Scoping: The Right Memory to the Right Agent
One of Engram's design pillars is isolation by default with sharing when needed. Memory is scoped at multiple levels:
- Project scope: memories stay within a project
- User scope: each user's memories are isolated from other users (required when using the `UserKnowledge` topic)
- Custom scope properties: add `conversation_id` or any other property to scope memories further
In multi-agent scenarios, agents can also be granted access to a shared memory pool, enabling coordinated handoffs and collaborative workflows.
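Layered scoping can be pictured as a filter in which every scope property on the request must match the stored memory. The sketch below is a conceptual model only: the `Memory` dataclass, the `visible` helper, and the `project` and `user_id` keys are hypothetical, and only `conversation_id` is named in the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    text: str
    scope: dict[str, str] = field(default_factory=dict)


def visible(memories: list[Memory], **required: str) -> list[Memory]:
    """Return the memories whose scope matches every required property."""
    return [
        m for m in memories
        if all(m.scope.get(key) == value for key, value in required.items())
    ]


memories = [
    Memory("Prefers dark mode", {"project": "support-bot", "user_id": "alice"}),
    Memory("Lives in Berlin",
           {"project": "support-bot", "user_id": "alice", "conversation_id": "c1"}),
    Memory("Prefers light mode", {"project": "support-bot", "user_id": "bob"}),
]

# User scope: Alice's memories only, never Bob's.
alice = visible(memories, project="support-bot", user_id="alice")
# A custom scope property narrows the result further.
alice_c1 = visible(memories, project="support-bot", user_id="alice", conversation_id="c1")

print([m.text for m in alice])
print([m.text for m in alice_c1])
```

Adding a property to the request can only shrink the result set, which is the sense in which isolation is the default and wider access has to be granted explicitly.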
Ready-Made Templates for Common Use Cases
Rather than forcing teams to understand the full pipeline architecture before getting started, Engram ships with templates for the most common memory patterns:
| Template | What It Does |
|---|---|
| Personalization | Remembers user preferences, past interactions, and stated goals across sessions |
| Continual Learning | Lets agents improve from feedback over time, updating what they know |
| Multi-Agent Shared State | Gives multiple agents access to a shared context pool for coordination |
Teams that outgrow templates can drop down to direct pipeline control — customizing individual extraction prompts, reconciliation logic, and commit strategies — without leaving the platform.
A custom memory layer requires choosing an extraction LLM, writing deduplication logic, operating a vector store, tuning retrieval, and handling edge cases like preference changes and conflicting facts. Engram ships all of that as a managed service, backed by the same Weaviate infrastructure that serves over 150 million downloads per month.
A Concrete Example: Adding Long-Term Memory to a Chat App
The quickstart pattern shows how straightforward integration can be:
- After each conversation turn: send messages to Engram via `memories.add()`, which returns a `run_id` immediately
- Background pipeline: extracts structured facts like "lives in Berlin," "prefers specialty coffee," "uses dark mode"
- Before the next response: query `memories.search(query=user_input, user_id="alice")` to retrieve relevant context
- LLM call: inject retrieved memories into the system prompt for a personalized, context-aware response
Stop and restart the process — the agent still knows what it learned about Alice from three sessions ago.
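The flow of that pattern can be sketched with an in-memory stand-in so it runs without any account or network access. `StubMemories` is hypothetical and is not the Engram SDK: it only mirrors the two call names mentioned above. The argument list of `add()` is an assumption, naive keyword overlap stands in for hybrid search, and `build_system_prompt` is an illustrative helper.

```python
import uuid


class StubMemories:
    """In-memory stand-in exposing add() and search(); not the real client."""

    def __init__(self) -> None:
        self._facts: list[tuple[str, str]] = []  # (user_id, fact)

    def add(self, text: str, user_id: str) -> str:
        # The managed service extracts facts in the background;
        # this stub simply stores the text as given.
        self._facts.append((user_id, text))
        return str(uuid.uuid4())  # plays the role of the run_id

    def search(self, query: str, user_id: str) -> list[str]:
        # Naive keyword overlap, restricted to the requesting user's facts.
        terms = set(query.lower().split())
        return [
            fact for uid, fact in self._facts
            if uid == user_id and terms & set(fact.lower().split())
        ]


def build_system_prompt(found: list[str]) -> str:
    """Inject retrieved memories into the system prompt."""
    base = "You are a helpful assistant."
    if not found:
        return base
    return base + "\nKnown facts about the user:\n" + "\n".join(f"- {m}" for m in found)


memories = StubMemories()

# After each conversation turn: hand the content off and keep going.
run_id = memories.add("prefers specialty coffee", user_id="alice")
memories.add("uses dark mode", user_id="bob")

# Before the next response: retrieve context for this user only.
user_input = "Any coffee places you would recommend?"
context = memories.search(query=user_input, user_id="alice")

# LLM call: the prompt below would be passed to whichever model the app uses.
system_prompt = build_system_prompt(context)
print(system_prompt)
```

Unlike this stub, which loses everything when the process exits, a managed memory service keeps the state outside the application, which is what makes the restart scenario above work.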
The same pattern scales to multi-agent architectures by using shared group scopes, letting a scheduler agent, executor agent, and reviewer agent all draw from the same organizational memory.
Key Takeaways
- Weaviate Engram is now GA: managed memory and context service for AI agents, built on open-source Weaviate vector DB
- Three-stage async pipeline: Extract → Transform (dedup + reconcile) → Commit
- Scoping ensures memory isolation per project/user, with optional sharing for multi-agent workflows
- Ready-made templates: Personalization, Continual Learning, Multi-Agent Shared State
- Hybrid retrieval: vector similarity + BM25 keyword search
- Available now in Weaviate Cloud — free tier (1,000 runs/month) and paid plans from $45/month