TL;DR: Weaviate's Engram is now generally available in Weaviate Cloud with a free tier (1,000 pipeline runs/month) and paid plans from $45/month. Engram handles memory as infrastructure: async pipelines extract facts from raw events, reconcile them against existing memories, and persist structured state to Weaviate's vector DB, which agents can then query using hybrid semantic + keyword search.

The Problem: Stateless LLMs Can't Actually Learn

As AI agents move into production, teams run into the same wall: LLMs are stateless. Every API call starts from scratch. The model knows nothing about the user's preferences from last week, the decision the agent made in a previous session, or the feedback the user gave yesterday.

The workarounds most teams reach for don't scale:

  • Replaying full conversation histories into the context window — costs explode as histories grow
  • Hand-curating fact files — becomes a maintenance nightmare as users and data multiply
  • Building custom memory systems — months of engineering work before you can even start on the actual product

Weaviate CEO Bob van Luijt put it plainly: "Memory is the difference between an agent that answers a question and an agent that gets better at its job."

With Engram now GA, Weaviate is offering that memory layer as managed infrastructure — so teams can skip straight to building the product.

Free Tier includes 1,000 pipeline runs/month
$45/mo Starting price for paid plans
150M+ Monthly downloads of open-source Weaviate

How Engram Works: The Memory Pipeline

Engram treats memory as structured, evolving infrastructure — not an ever-growing pile of context. The core is a three-stage async pipeline:

⚙️
The Three-Stage Memory Pipeline
1. Extract — Pull discrete facts from raw text, conversations, or pre-extracted data. ("Lives in Berlin." "Prefers dark mode.")
2. Transform — Reconcile new facts against what's already stored: deduplicate, handle preference changes, update time-evolving facts.
3. Commit — Persist the clean memory state to the Weaviate vector database.
The pipeline is fire-and-forget: applications hand off raw events and keep working while memory builds in the background — no latency added to the critical path.

Once stored, memories are served through Weaviate's hybrid search — combining vector similarity (semantic understanding) with BM25 keyword search — so agents can retrieve relevant context using natural language queries.

Scoping: The Right Memory to the Right Agent

One of Engram's design pillars is isolation by default with sharing when needed. Memory is scoped at multiple levels:

  • Project scope: memories stay within a project
  • User scope: each user's memories are isolated from other users (required when using the UserKnowledge topic)
  • Custom scope properties: add conversation_id or any other property to scope memories further

In multi-agent scenarios, agents can also be granted access to a shared memory pool, enabling coordinated handoffs and collaborative workflows.

Ready-Made Templates for Common Use Cases

Rather than forcing teams to understand the full pipeline architecture before getting started, Engram ships with templates for the most common memory patterns:

Template What It Does
Personalization Remembers user preferences, past interactions, stated goals across sessions
Continual Learning Lets agents improve from feedback over time, updating what they know
Multi-Agent Shared State Gives multiple agents access to a shared context pool for coordination

Teams that outgrow templates can drop down to direct pipeline control — customizing individual extraction prompts, reconciliation logic, and commit strategies — without leaving the platform.

🏗️
Engram vs. Building Your Own Memory System
A custom memory layer requires choosing an extraction LLM, writing deduplication logic, operating a vector store, tuning retrieval, and handling edge cases like preference changes and conflicting facts. Engram ships all of that as a managed service, backed by the same Weaviate infrastructure that serves over 150 million downloads per month.

A Concrete Example: Adding Long-Term Memory to a Chat App

The quickstart pattern shows how straightforward integration can be:

  1. After each conversation turn: send messages to Engram via memories.add() — returns a run_id immediately
  2. Background pipeline: extracts structured facts like "lives in Berlin," "prefers specialty coffee," "uses dark mode"
  3. Before the next response: query memories.search(query=user_input, user_id="alice") to retrieve relevant context
  4. LLM call: inject retrieved memories into the system prompt for a personalized, context-aware response

Stop and restart the process — the agent still knows what it learned about Alice from three sessions ago.

The same pattern scales to multi-agent architectures by using shared group scopes, letting a scheduler agent, executor agent, and reviewer agent all draw from the same organizational memory.

Key Takeaways
  • Weaviate Engram is now GA: managed memory and context service for AI agents, built on open-source Weaviate vector DB
  • Three-stage async pipeline: Extract → Transform (dedup + reconcile) → Commit
  • Scoping ensures memory isolation per project/user, with optional sharing for multi-agent workflows
  • Ready-made templates: Personalization, Continual Learning, Multi-Agent Shared State
  • Hybrid retrieval: vector similarity + BM25 keyword search
  • Available now in Weaviate Cloud — free tier (1,000 runs/month) and paid plans from $45/month
🔗
Sources & Official References
Weaviate Blog: Engram is now Generally Available — official announcement
Engram Product Page — features, pricing, and getting started
Engram Documentation — REST API, Python SDK, architecture concepts
Engram Quickstart Tutorial — create a project, store your first memory, search it