A place of memory for your agents.
Gradatum is a self-hosted, embedded memory backbone for multi-agent AI systems. It cures memory rot in LLMs and lets Claude, Gemini, Codex and your home-grown agents share what they know — across sessions, across machines, across runs.
« Loci sunt, in quibus imagines collocantur. »
LLMs forget. Agents reinvent the wheel every session. Teams of agents can't share what they learn.
- SaaS lock-in — your memory hosted on someone else's server
- Heavy stacks — Postgres + pgvector + Neo4j just to remember things
- Built for humans, not agents — no ACL, no multi-tenancy, no MCP
- Transient context, not persistent KB — forgets between sessions
One Rust binary. SQLite + Markdown on disk. No PostgreSQL. No Redis. No SaaS. Pluggable LLM (or none). Multi-vault. Hierarchical ACL. Hybrid search — BM25 (SQLite FTS5) + semantic (cosine) + PageRank. The Markdown files are the source of truth — not the index.
The Gradatum approach
| Embedded | One Rust binary. No PostgreSQL. No Redis. No external services. apt install and you're running. |
| Self-hosted | Your memory, your machine. No telemetry. No vendor lock-in. |
| LLM-agnostic | Plug any OpenAI-compatible backend (Ollama, vLLM, llama.cpp, OpenRouter, Anthropic) — or run heuristic-only with no LLM at all. |
| Multi-vault | Separate main from staging and bench-* vaults for testing, migration, A/B prompts. Atomic swap when ready. |
| Hierarchical ACL | Bearer-scoped access to memory loci. Configure from presets (flat, hierarchical, multi-project, team) or write your own. |
| Multi-storage | OpenDAL abstraction — Local filesystem available. S3/R2, Azure, GCS planned (feature flags available; backend implementations pending). NFS explicitly rejected. |
| Markdown truth | Notes are Markdown files with YAML frontmatter. Readable by humans and by cat. The database is an index, not the source of truth. |
| Hybrid search | BM25 (SQLite FTS5) + semantic search (cosine brute-force; ANN (sqlite-vec) planned v0.5.3). PageRank graph + reranker abstraction (no-op by default; cross-encoder ONNX optional). Multi-signal fusion via RRF (Reciprocal Rank Fusion). |
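The "cosine brute-force" search named in the last row can be pictured with a minimal sketch: score the query vector against every stored vector and keep the best k. This is an illustration in Python, not Gradatum's Rust implementation; the function names, the three-dimensional vectors, and the note ids are all made up for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def brute_force_top_k(query_vec, note_vecs, k=3):
    """Score every stored vector against the query and keep the k best.

    note_vecs maps a (hypothetical) note id to its embedding.
    """
    scored = [(cosine(query_vec, vec), note_id) for note_id, vec in note_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(note_id, score) for score, note_id in scored[:k]]


# Toy 3-dimensional vectors; real embeddings would be much longer.
notes = {
    "note-a": [1.0, 0.0, 0.0],
    "note-b": [0.9, 0.1, 0.0],
    "note-c": [0.0, 1.0, 0.0],
}
top = brute_force_top_k([1.0, 0.0, 0.0], notes, k=2)
```

Brute force is linear in the number of notes, which is why an approximate index (ANN) appears on the roadmap for larger vaults.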
One binary. Four planes. Markdown on disk.
A stateless façade speaks HTTP and MCP to your agents. A worker drains a SQLite-backed queue. One vault per instance is the default — staging and bench-* vaults are first-class for migration and A/B testing.
AI agents · coding assistants · orchestrators
↓ MCP / HTTP / CLI (RFC-0003: :19090)
┌─────────────────────────────────────────┐
│ gradatum-server │ stateless façade
│ /api/v1 /mcp /sse /health /admin │
└────────────────┬────────────────────────┘
↓ async queue · Apalis (SQLite, lease 5min)
┌─────────────────────────────────────────┐
│ gradatum-worker │ curator + maintenance
└────────────────┬────────────────────────┘
↓ DATA PLANE (19 of 28 product crates)
┌─────────────────────────────────────────┐
│ core markdown vault storage index │
│ search queue cache chat curator │
│ embed engine acl-policy acl-auth │
│ auth dto db-sqlite warden gateway │
│ [SQLite FTS5 · cosine · reranker (no-op) │
│ Apalis · OpenDAL · llama.cpp · rmcp] │
└────────────────┬────────────────────────┘
↓ CLIENTS
┌─────────────────────────────────────────┐
│ gradatum-mcp-stub (stdio→HTTP proxy) │
│ gradatum CLI │
│ gradatum-sdk-rs │
│ gradatum (umbrella SDK facade) │
└─────────────────────────────────────────┘
How a note lives.
A note in Gradatum follows two pipelines: one when it's written (ingested + understood + indexed), one when it's searched back. Both run on the same machine, in milliseconds, with zero SaaS round-trip.
Write
An agent submits a note.
1. Authenticate the agent
   The agent presents an API key. Gradatum exchanges it for a short-lived JWT used for the rest of the session.
   api-key (chmod 600) · POST /auth/exchange · JWT 24 h
2. Accept and queue
   The note is queued. The call returns immediately; the agent doesn't wait for indexing.
   POST /api/v1/vault_write · 202 Accepted
3. Curate (or skip)
   A local LLM classifies the note into a canonical section. Skipped when the agent already provides one.
   Qwen3-4B local · skipped when section_hint set
4. Extract the title
   If the note starts with a heading, that becomes the title, a stable identifier for cross-references.
   H1 markdown extract · stored in notes.title
5. Compute meaning (embedding)
   The text is turned into a numeric fingerprint of its meaning. Two notes on the same topic land near each other, even with no shared words.
   bge-m3 · 1024-dim vector · local inference
6. Connect the dots (wikilinks)
   Cross-references inside the note become edges in the vault's graph. Backlinks build naturally over time.
   B5 post-curate · [[Note]] → graph table note_links
7. Persist everything
   Text, vector, full-text index, and graph edges are all written in a single atomic transaction.
   SQLite WAL · FTS5 · BLOB f32 · ULID keys
✓ Ready to be recalled
   The note is live and searchable.
   end-to-end ~0.5–2 s · dominated by embedding
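Steps 4 and 6 of the write pipeline are plain text processing, so they are easy to illustrate. The sketch below is not Gradatum's code: it assumes a title is a leading "# " heading and that wikilinks look like [[Target]], optionally with an Obsidian-style |alias or #anchor suffix (that suffix handling is an assumption). The function names and the sample note are hypothetical.

```python
import re

# A leading ATX H1: one '#', whitespace, then the title text.
H1_RE = re.compile(r"^#\s+(.+?)\s*$")
# [[Target]], [[Target|alias]] or [[Target#anchor]]; only Target is captured.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:[|#][^\[\]]*)?\]\]")


def extract_title(body):
    """Return the leading H1 text, or None if the note does not start with one."""
    for line in body.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        match = H1_RE.match(line)
        return match.group(1) if match else None
    return None


def extract_links(source, body):
    """Return (source, target) edges, one per distinct target, in first-seen order."""
    targets = []
    for raw in WIKILINK_RE.findall(body):
        target = raw.strip()
        if target and target not in targets:
            targets.append(target)
    return [(source, target) for target in targets]


note = (
    "# Use ULID for note identity\n"
    "\n"
    "See [[Note History]] and [[Optimistic Locking|locking]].\n"
    "Also [[Note History]] again.\n"
)
title = extract_title(note)
edges = extract_links(title, note)
```

Each edge would correspond to one row in a link table such as note_links; backlinks are the same rows read in the other direction.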
Search
An agent asks a question.
1. Reuse the auth
   The JWT is refreshed automatically when it nears expiry. Most search calls pay no auth cost.
   auto-refresh · under 30 % TTL remaining
2. Receive the query
   The agent sends a natural-language query and optional filters.
   POST /api/v1/vault_search · section, tenant, limit, include_downgraded
3. Search by words
   Classic full-text search returns notes containing the query terms. Fast and exact, but blind to synonyms.
   SQLite FTS5 · BM25 ranking
4. Search by meaning
   The query is matched against note fingerprints by similarity. Returns notes that mean the same thing, even with no shared words.
   bge-m3 query embed · cosine similarity in-process
5. Reconcile both rankings
   The two ranked lists are merged into one, without needing to calibrate scores between them.
   Reciprocal Rank Fusion · k = 60 · stable sort
6. Boost by context
   Recent notes get a small bump. Notes linked to by many others get another.
   composite = rrf × (1 + α × recency) × (1 + β × pagerank) · α=0.2 β=0.1
7. Re-rank the top results
   A neural model rescores the top candidates with deeper understanding. Off by default; opt-in.
   reranker abstraction (no-op by default) · cross-encoder ONNX optional · feature onnx-reranker
8. Build readable snippets
   A short excerpt around the match is extracted, so the agent gets context, not just an ID.
   FTS5 native snippet() · « match »
✓ Return the top-N
   Each result carries id, score, title, snippet, section, tags.
   end-to-end ~50–200 ms (no reranker) · ~150–400 ms (with)
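Steps 5 and 6 of the search pipeline reduce to a few lines of arithmetic. The sketch below applies Reciprocal Rank Fusion with k = 60 and then the composite formula quoted above with α = 0.2 and β = 0.1. It is an illustration, not the shipped code: it assumes ranks start at 1 and that recency and pagerank are already normalised to the range 0 to 1, and the note ids and signal values are invented.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: sum 1 / (k + rank) over every list a note appears in."""
    scores = {}
    for ranking in rankings:
        for rank, note_id in enumerate(ranking, start=1):
            scores[note_id] = scores.get(note_id, 0.0) + 1.0 / (k + rank)
    return scores


def composite(rrf, recency, pagerank, alpha=0.2, beta=0.1):
    """composite = rrf * (1 + alpha * recency) * (1 + beta * pagerank)."""
    return rrf * (1.0 + alpha * recency) * (1.0 + beta * pagerank)


def rank(bm25_hits, semantic_hits, signals):
    """Fuse both ranked lists, apply the context boost, and sort best-first."""
    fused = rrf_fuse([bm25_hits, semantic_hits])
    scored = []
    for note_id, rrf in fused.items():
        recency, pagerank = signals.get(note_id, (0.0, 0.0))
        scored.append((note_id, composite(rrf, recency, pagerank)))
    # sorted() is stable, so tied notes keep their first-seen order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical inputs: two ranked id lists and per-note (recency, pagerank).
bm25_hits = ["n1", "n2", "n3"]
semantic_hits = ["n2", "n4", "n1"]
signals = {"n1": (1.0, 0.0), "n2": (0.0, 0.5), "n3": (0.0, 0.0), "n4": (0.5, 1.0)}
results = rank(bm25_hits, semantic_hits, signals)
```

Because RRF only uses rank positions, the BM25 scores and cosine similarities never need to share a scale. In this toy input, n2 has the higher fused score but n1 overtakes it once the recency boost is applied.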
Six levels, named with care.
Borrowed from Cicero's ars memoriae: agents place their memories in loci, mental locations of an imagined palace. Agents don't share rooms — they share places of memory.
- Vault: The technical backing store (SQLite FTS5 + Markdown). Multi-vault is first-class: main + staging + bench-*.
- Locus: A logical subdivision of a vault, isolated by ACL. From Cicero's ars memoriae: the mental location where an image is placed.
- Section: One of 10 cognitive categories: decisions, architecture, debug, reasoning, feedback, lessons-learned, retrospectives, experiments, agent-issues, reference.
- Note: Atomic Markdown file with YAML frontmatter. ULID identity. SHA-256 content hash for drift detection.
- Bearer: An authenticated identity with read/write ACL patterns over loci. Configured via presets or custom bearer.toml.
- Preset: A template configuration shipped in examples/presets/ (flat, hierarchical, multi-project, team).
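To make "read/write ACL patterns over loci" concrete, here is a minimal sketch of a bearer whose patterns are checked against a locus path. The matching rule is an assumption: it uses shell-style globs from Python's fnmatch module, which may differ from the rules Gradatum actually applies, and the Bearer class, its field names, and the sample loci are hypothetical.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Bearer:
    """Hypothetical bearer: a name plus read and write glob patterns."""
    name: str
    read: list = field(default_factory=list)
    write: list = field(default_factory=list)


def allowed(bearer, locus, action):
    """Return True if any pattern for the given action matches the locus."""
    patterns = bearer.read if action == "read" else bearer.write
    return any(fnmatchcase(locus, pattern) for pattern in patterns)


# An agent that may read all of project A but write only to its backend locus.
agent = Bearer(
    name="backend-agent",
    read=["projecta/*"],
    write=["projecta/backend"],
)

can_read = allowed(agent, "projecta/frontend", "read")
can_write = allowed(agent, "projecta/frontend", "write")
```

Note that in fnmatch a `*` also matches `/`, so `projecta/*` covers nested loci such as `projecta/backend/api`. A stricter hierarchy would need a segment-aware matcher.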
One backend today. More planned.
Gradatum uses Apache OpenDAL as a unified storage abstraction layer. Local filesystem is available today; S3, Azure, and GCS support is planned. NFS is explicitly rejected: POSIX lock incompatibility causes data corruption under concurrent writers.
- Local FS (primary) · fs:// · Default. NVMe local only. NFS rejected: POSIX lock incompatibility causes data corruption under concurrent writers.
- S3 / R2 (planned) · s3:// · AWS S3, Cloudflare R2, MinIO, Backblaze B2. Feature flag available; backend implementation pending.
- Azure Blob (planned) · azblob:// · Azure Blob Storage via OpenDAL azblob service. Feature flag available; backend implementation pending.
- GCS (planned) · gcs:// · Google Cloud Storage. Service account or ADC auth. Feature flag available; backend implementation pending.
# Per-vault storage configuration
[vaults.main.storage]
backend = "s3" # planned backend (implementation pending); shown for illustration
bucket = "my-gradatum-vault"
region = "us-east-1"
root = "/gradatum/main"
[vaults.staging.storage]
backend = "fs"
root = "~/.gradatum/vaults/staging" # local NVMe for testing
# NFS rejected: POSIX lock incompatibility (data corruption under concurrent writers)
# backend = "nfs" ← will fail at startup with nfs_check.rs guard
Roadmap: ten versions, four milestones
v0.1.0 · BRONZE · Architecture Foundation · ✓ SHIPPED
Establishes the public architecture foundation: four persistence traits, a warden layer for note integrity, an install wizard for first-time setup, and smoke-test coverage, so that early adopters can deploy a working knowledge store with confidence from the first release.
- Functional core v0.1.0-alpha.0→5 2026-04
- Service mode (HTTP + MCP) v0.1.0-alpha.5 2026-05-07
- Hardening + search foundations v0.1.0-alpha.7→10 2026-05
- Supply chain bumps v0.1.0-alpha.10-bumps.1 2026-05
- Search quality (RRF + reranker) v0.1.0-alpha.11→13 2026-05
- Migrated off the predecessor backend — gradatum primary store v0.1.0-alpha.13 2026-05-25
- Security hardening — JWT validate_nbf v0.1.0-alpha.14 2026-05-28
- Gradatum Skills (reminder + vault-search) gradatum-skills v0.1.0 2026-05-25
- Polish — title resolution, batched lookups, parallel wikilinks v0.1.0-alpha.15 2026-05-28
- Stabilisation — alpha series wrap-up, OSS feedback integration v0.1.x closing 2026-05-29
v0.2.0 · BRONZE · Job Infrastructure + Observability · ✓ SHIPPED
Lays the foundation for gradatum's background job system and makes it fully observable. This release introduces the job queue layer built on Apalis (https://github.com/geofmureithi/apalis), a type-safe, SQLite-backed Rust job framework, including the Job enum, JobRecord lifecycle tracking, and per-class worker configuration. On top of that foundation, it ships a Dead-Letter Queue (DLQ) for failed background jobs with configurable retry policies, timeout enforcement, and panic isolation, so jobs that fail definitively are captured rather than silently dropped. A /api/v1/jobs introspection endpoint surfaces job state over HTTP, with a Server-Sent Events (SSE) stream for real-time push updates and Prometheus metrics per job kind, giving operators full visibility into what the system is running and why without polling.
- F-14 partial: Apalis foundation + forward-compatible type definitions v0.2.0 2026-05-29
- F-15: DLQ + Apalis Monitor multi-worker + Prometheus exporter v0.2.0 2026-05-29
- F-16: /api/v1/jobs API + SSE + Idempotency-Key + admin CLI v0.2.0 2026-05-29
- Tag v0.2.0 — Bronze 2nd milestone OSS public v0.2.0 2026-05-29
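The retry-then-dead-letter behaviour described for v0.2.0 can be pictured with a small sketch. This is not the Apalis API and not Gradatum's worker code: it is a plain-Python illustration of the idea that a job is retried a bounded number of times and, if it still fails, is recorded rather than dropped. Timeouts are not modelled, and every name here (run_with_retries, the handlers, the job ids) is hypothetical.

```python
def run_with_retries(job, handler, max_attempts, dead_letters):
    """Call handler(job) up to max_attempts times.

    Returns the handler's result on success. After the final failure the job
    is appended to dead_letters instead of being dropped, and None is returned.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return handler(job)
        except Exception as exc:  # isolate the failure from the worker loop
            last_error = exc
    dead_letters.append(
        {"job": job, "attempts": max_attempts, "error": repr(last_error)}
    )
    return None


attempts_seen = []


def flaky_embed(job):
    """Hypothetical handler that fails once, then succeeds."""
    attempts_seen.append(job)
    if len(attempts_seen) < 2:
        raise RuntimeError("transient failure")
    return f"embedded:{job}"


def broken_curate(job):
    """Hypothetical handler that always fails."""
    raise ValueError("cannot curate")


dlq = []
ok = run_with_retries("note-1", flaky_embed, max_attempts=3, dead_letters=dlq)
failed = run_with_retries("note-2", broken_curate, max_attempts=3, dead_letters=dlq)
```

The dead-letter record keeps the last error alongside the job, which is the kind of state an introspection endpoint could then surface to an operator.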
v0.3.0 · BRONZE · Storage Traits + Event-Log + Secrets DI · ✓ SHIPPED
Decomposes the monolithic storage trait into three granular, pluggable interfaces (DocumentStore, IndexStore, VectorStore), ships an append-only event-log table for LLM cost-attribution telemetry, adds an autonomous LLM gateway crate (proxy + reranker), introduces deterministic cognitive-kind tagging for notes (CoALA, Cognitive Architectures for Language Agents: episodic/semantic/procedural/reflective), and fixes a critical JWT signing-key persistence bug that caused every server restart to invalidate all live tokens. Patch releases v0.3.1–0.3.3 harden the multi-worker job queue.
- Storage trait carve — DocumentStore / IndexStore / VectorStore + AppState dyn-dispatch v0.3.0 2026-06-01
- Event-log sink (table event_log) + gateway cost-attribution (QaEvent +5 fields) v0.3.0 2026-06-01
- gradatum-gateway crate — LLM proxy + reranker v1 (F-08), code-complete v0.3.0 2026-06-01
- F-42: c_kind / doc_kind columns — deterministic CoALA mapping, zero LLM v0.3.0 2026-06-02
- Secrets DI (F-13) + P0 fix: JWT key persisted (boot-stable, load-or-generate) v0.3.0 2026-06-02
- Tag v0.3.0 — 28 crates, 1088 tests PASS, Bronze 3rd milestone v0.3.0 2026-06-02
- v0.3.1–0.3.3 reliability patches — multi-worker job-queue concurrency (BEGIN IMMEDIATE deadlock fix) v0.3.3 2026-06-02
- v0.3.4 — vault_search title:null fix (write-path: title column populated at curate, migration 0009 backfill) v0.3.4 2026-06-03
- v0.3.5 — search read-path: semantic-only hits enriched with title+snippet; legacy title recovery — 1223 tests PASS, live v0.3.5 2026-06-03
- v0.3.6 — First public OSS release: 28 crates published on crates.io, source open on GitHub (Apache-2.0) v0.3.6 2026-06-05
- v0.3.7 — Reliability: search/read/write round-trip fixes (title persistence, vault_read by ULID, wikilink reconciliation) v0.3.7 2026-06-05
Memory layer
v0.4.0 → v0.5.1 — Completes the durable memory store: structured ingest, note history, temporal decay, distillation, MCP-native querying, multi-user isolation, and OAuth-based remote access. The queryable, sovereign memory store that other systems — and eventually gradatum itself — will build on.
v0.4.0 · Vault Core: Durable Memory Layer · ✓ SHIPPED
Completes the core knowledge store: structured ingest with content-aware chunking, copy-on-write note history with optimistic locking for safe concurrent writes, stable wikilink graph traversal, temporal decay scoring and provenance trust so retrieved content carries verifiable lineage, declarative lifecycle rules that keep the vault compact without losing traceability, scheduled distillation that compresses raw notes into reusable knowledge, and pluggable storage backends.
- F-47 Provenance Trust Score — verifiable note lineage, trust field integrated with search ranking v0.4.0 2026-06-06
- F-39 Stable Wikilinks — redirect_table, ULID anchors, backlink index at write time v0.4.0 2026-06-06
- F-41 Optimistic Locking — write_if_match SHA-256 content hash, 409 Conflict on race v0.4.0 2026-06-06
- F-40 Note History — copy-on-write version trail, history/* endpoints, max_versions cap v0.4.0 2026-06-06
- Tag v0.4.0 — 28 crates, 1178 tests PASS, « Écriture durable » milestone v0.4.0 2026-06-06
- v0.4.1 — Quality & reliability: zero-panic API surface, doc-comments 28 crates, SECURITY.md, revocation wired, MSRV 1.88, first public release (crates.io + GitHub) v0.4.1 2026-06-06
- v0.4.2 — Internal: note_id in vault_write response, vault_downgrade 404, DTO unification, gateway metrics cardinality v0.4.2 2026-06-07
- v0.4.3 — Vault lifecycle: semantic forget (dry-run + decay), note lifecycle state machine, configurable history pruning, multi-vault query scoping, multimodal gateway support, temporal index foundation v0.4.3 2026-06-10
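The optimistic-locking item (F-41) describes a compare-then-write check: a writer sends the SHA-256 hash of the version it read, and the write is refused with 409 Conflict if the stored content has changed since. A minimal in-memory sketch of that idea follows. The NoteStore class, its method signatures, and the Conflict exception are hypothetical stand-ins, not the server's API.

```python
import hashlib


class Conflict(Exception):
    """Stands in for an HTTP 409 Conflict response."""


def content_hash(body):
    """SHA-256 of the note body, hex-encoded."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class NoteStore:
    """Hypothetical in-memory store keyed by note id."""

    def __init__(self):
        self._notes = {}

    def read(self, note_id):
        """Return the body together with the hash a writer must echo back."""
        body = self._notes[note_id]
        return body, content_hash(body)

    def write_if_match(self, note_id, new_body, expected_hash):
        """Write only if the stored content still hashes to expected_hash."""
        current = self._notes.get(note_id)
        if current is not None and content_hash(current) != expected_hash:
            raise Conflict(f"{note_id} changed since it was read")
        self._notes[note_id] = new_body
        return content_hash(new_body)


store = NoteStore()
store.write_if_match("note-1", "v1", expected_hash=None)  # first write, nothing to match

# Two writers read the same version, then race.
_, hash_seen_by_a = store.read("note-1")
_, hash_seen_by_b = store.read("note-1")
store.write_if_match("note-1", "v2 from A", hash_seen_by_a)
try:
    store.write_if_match("note-1", "v2 from B", hash_seen_by_b)
    b_outcome = "written"
except Conflict:
    b_outcome = "conflict"
```

The losing writer is expected to re-read the note, merge its change, and retry with the new hash, so neither update is silently overwritten.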
v0.5.2 · SILVER · Static Code Index + Observability · ✓ SHIPPED
Adds a static code index built with tree-sitter (Rust, zero LLM): symbols are derived deterministically from source, stored in a separate code vault, and queryable by symbol name, free-text, or file path with optional body extraction. Drift detection and incremental O(diff) updates keep the index fresh without full rebuilds. Also ships: vault_timeline for chronological note listing (as-of / valid_until), session-log Tier 1 for agent action tracing, corpus_match_count as a proof-of-absence search signal, native TLS termination (rustls, TLS 1.2+/1.3, fail-closed), and vault_write in-place update (optimistic-lock RMW, 409 on conflict).
- vault_write in-place update — optimistic-lock RMW, SHA-256 guard, 409 on conflict v0.5.0 2026-06-12
- session-log Tier 1 — append-only agent action tracing, 90-day retention, PII-safe v0.5.0 2026-06-12
- corpus_match_count — BM25/FTS5 proof-of-absence signal, opt-in, cap 10001 v0.5.1 2026-06-13
- vault_timeline — chronological note listing, as-of / valid_until filtering v0.5.1 2026-06-13
- Static code index — tree-sitter Rust, NoteId::derived_from, migration 0016 v0.5.2 2026-06-13
- code_scope query endpoint — symbol / query / path modes, include_body, drift detection v0.5.2 2026-06-13
- Incremental O(diff) code update — idempotent, <10ms on unchanged files v0.5.2 2026-06-14
- Native TLS termination — [server.tls], rustls, TLS 1.2+/1.3, fail-closed v0.5.2 2026-06-14
- Tag v0.5.2 — 31 crates workspace, source on GitHub (Apache-2.0), 1925 tests PASS v0.5.2 2026-06-15
v0.5.5 · SILVER · Foundation Polish · planned
Closes the v0.5.x foundation window before the MCP-native pivot: real-time health observability (queue depth with accurate oldest-message age, build SHA in /health for unambiguous version proof), Rust 2024 edition upgrade across the full workspace, and surface-hardening of the knowledge base (backfill, public API hygiene). No new capabilities: the goal is a clean, verifiable baseline to build on.
v0.6.0 · SILVER · Queryable Memory Store: MCP-Native Backend · planned
Turns the completed vault into a memory store any client can query directly through the Model Context Protocol (MCP): a native MCP server with Streamable HTTP transport, write-time schema validation with automatic repair, and a vault audit & deduplication pass. This is a deliberate ordering: the memory store becomes a stable, externally consumable product first, usable today by any MCP client (Claude, IDEs, custom agents), and only then does gradatum grow its own context layer (v0.7.0) on top of the exact same interface it already exposes to everyone else. The store earns its API by serving others before it serves itself.
Context Assembly + Agent Runtime
v0.7.0+ — The layer that consumes the memory store coherently across sessions: context assembly, sliding-window memory, proactive recall, skill selection, and a declarative user profile. gradatum stops treating queries as stateless and starts reasoning over accumulated knowledge.
v0.7.0 · Memory Layer + Context Assembly · planned
Adds the context-assembly layer that turns the vault from a passive store into an active participant: identity rendering, sliding-window memory, proactive recall, a declarative user profile, and skill selection that picks only relevant context before injection. gradatum can now consume its own memory store coherently across sessions, reasoning over accumulated knowledge rather than treating each query as stateless.
v0.8.0 · gradatum-code: Sovereign Terminal Agent · planned
Ships the first version of gradatum-code: a terminal agent that reasons over the local codebase using the vault as its memory (symbol lookup, diff-aware context, project history recall, and task execution). gradatum-code runs entirely on local hardware, with no cloud dependency and no external code upload. Phase A covers the agentic core; later phases extend to IDE integration and collaborative workflows.
v1.0.0 · GOLD · Production Baseline · planned
The first production-certified release: the point where gradatum becomes safe to build on. The public trait contracts freeze as stable (semver guarantees you can depend on), the privacy filter runs on a local ONNX (portable inference) path with no external LLM dependency, the system proves 30 days of continuous operation, and the full LongMemEval long-term-memory benchmark runs reproducibly. v1.0.0 adds no new API surface by design: it is a stability and certification milestone, not a feature drop. It is the moment the contracts stop moving.
v2.0.0 · PLATINUM · Multimodal + Consolidation · planned
Extends the platform to multimodal inputs with a breaking-change chat API and long-horizon memory consolidation, completing gradatum's trajectory from local knowledge store to autonomous cognitive infrastructure.
Pre-built binaries, crates.io, or build from source.
# build from source
git clone https://github.com/gradatum/gradatum
cd gradatum && cargo build --release --workspace
# initialize a vault with the hierarchical preset
gradatum-admin init --preset hierarchical \
--root /var/lib/gradatum
# start the server
systemctl --user start gradatum-server
# write a note (vault_write)
curl -X POST http://localhost:19090/api/v1/vault_write \
-H "Authorization: Bearer $GRADATUM_BEARER" \
-H "Content-Type: application/json" \
-d '{"locus":"projecta/backend","section":"decisions","body":"Use ULID for stable note identity"}'
# search across loci
curl -X POST http://localhost:19090/api/v1/vault_search \
-H "Authorization: Bearer $GRADATUM_BEARER" \
-H "Content-Type: application/json" \
-d '{"query":"ULID identity","locus":"projecta/*"}'
# list vaults
gradatum-admin vault list
# gradatum.toml: MCP stub for Claude Code
[[mcpServers]]
name = "gradatum"
command = "gradatum-mcp-stub"
args = ["--server", "http://localhost:19090"]
env = { GRADATUM_BEARER = "your-bearer-token" }
# Or via HTTP directly (API)
curl http://localhost:19090/health
curl -X POST http://localhost:19090/api/v1/vault_search \
-H "Authorization: Bearer $GRADATUM_BEARER" \
-H "Content-Type: application/json" \
-d '{"query":"ULID","locus":"projecta/*"}'
v0.5.2: pre-built binaries (server / llm / mcp) on GitHub Releases, source on GitHub (Apache-2.0), names reserved on crates.io (full library at v1.0). API not stable before v1.0. See the Install page for all deployment profiles.