Gradatum Vault · cognitive memory backbone for AI agents

A place of memory for your agents.

Gradatum is a self-hosted, embedded memory backbone for multi-agent AI systems. It cures memory rot in LLMs and lets Claude, Gemini, Codex and your home-grown agents share what they know — across sessions, across machines, across runs.

« Loci sunt, in quibus imagines collocantur. »

Cicero · de Oratore II — the loci are the places where images are set down.

I · The problem

LLMs forget. Agents reinvent the wheel every session. Teams of agents can't share what they learn.

What's missing today
  • SaaS lock-in — your memory hosted on someone else's server
  • Heavy stacks — Postgres + pgvector + Neo4j just to remember things
  • Built for humans, not agents — no ACL, no multi-tenancy, no MCP
  • Transient context, not persistent KB — forgets between sessions

What Gradatum does

One Rust binary. SQLite + Markdown on disk. No PostgreSQL. No Redis. No SaaS. Pluggable LLM (or none). Multi-vault. Hierarchical ACL. Hybrid search — BM25 (SQLite FTS5) + semantic (cosine) + PageRank. The Markdown files are the source of truth — not the index.


II · Properties

The Gradatum approach

Embedded
One Rust binary. No PostgreSQL. No Redis. No external services. apt install and you're running.
Self-hosted
Your memory, your machine. No telemetry. No vendor lock-in.
LLM-agnostic
Plug in any OpenAI-compatible backend (Ollama, vLLM, llama.cpp, OpenRouter, Anthropic), or run heuristic-only with no LLM at all.
Multi-vault
Separate main from staging and bench-* vaults for testing, migration, and A/B prompts. Atomic swap when ready.
Hierarchical ACL
Bearer-scoped access to memory loci. Configure from presets (flat, hierarchical, multi-project, team) or write your own.
Multi-storage
OpenDAL abstraction. Local filesystem available today. S3/R2, Azure, and GCS planned (feature flags exist; backend implementations pending). NFS explicitly rejected.
Markdown truth
Notes are Markdown files with YAML frontmatter. Readable by humans and by cat. The database is an index, not the source of truth.
Hybrid search
BM25 (SQLite FTS5) + semantic search (brute-force cosine; ANN via sqlite-vec planned for v0.5.3). PageRank graph + reranker abstraction (no-op by default; ONNX cross-encoder optional). Multi-signal fusion via Reciprocal Rank Fusion (RRF).

III · Architecture

One binary. Four planes. Markdown on disk.

A stateless façade speaks HTTP and MCP to your agents. A worker drains a SQLite-backed queue. One vault per instance is the default — staging and bench-* vaults are first-class for migration and A/B testing.

        AI agents · coding assistants · orchestrators
              ↓  MCP / HTTP / CLI  (RFC-0003: :19090)
        ┌─────────────────────────────────────────┐
        │  gradatum-server                        │  stateless façade
        │  /api/v1  /mcp  /sse  /health  /admin   │
        └────────────────┬────────────────────────┘
                         ↓  async queue · Apalis (SQLite, lease 5min)
        ┌─────────────────────────────────────────┐
        │  gradatum-worker                        │  curator + maintenance
        └────────────────┬────────────────────────┘
                         ↓  DATA PLANE (19 of 28 product crates)
        ┌─────────────────────────────────────────┐
        │  core  markdown  vault  storage  index  │
        │  search  queue  cache  chat  curator    │
        │  embed  engine  acl-policy  acl-auth    │
        │  auth  dto  db-sqlite  warden  gateway  │
        │  [SQLite FTS5 · cosine · no-op reranker │
        │   Apalis · OpenDAL · llama.cpp · rmcp]  │
        └────────────────┬────────────────────────┘
                         ↓  CLIENTS
        ┌─────────────────────────────────────────┐
        │  gradatum-mcp-stub  (stdio→HTTP proxy)  │
        │  gradatum CLI                           │
        │  gradatum-sdk-rs                        │
        │  gradatum  (umbrella SDK facade)        │
        └─────────────────────────────────────────┘

Lifecycle of a note

How a note lives.

A note in Gradatum follows two pipelines: one when it's written (ingested, understood, indexed), one when it's searched back. Both run on the same machine with zero SaaS round-trip: writes are queued and become searchable within about 0.5–2 s, and searches return in milliseconds.

Write

an agent submits a note
  1. Authenticate the agent

    The agent presents an API key. Gradatum exchanges it for a short-lived JWT used for the rest of the session.

    api-key (chmod 600) · POST /auth/exchange · JWT 24 h

  2. Accept and queue

    The note is queued. The call returns immediately — the agent doesn't wait for indexing.

    POST /api/v1/vault_write · 202 Accepted

  3. Curate (or skip)

    A local LLM classifies the note into a canonical section. Skipped when the agent already provides one.

    Qwen3-4B local · skipped when section_hint set

  4. Extract the title

    If the note starts with a heading, that becomes the title — a stable identifier for cross-references.

    H1 markdown extract · stored in notes.title

  5. Compute meaning (embedding)

    The text is turned into a numeric fingerprint of its meaning. Two notes on the same topic land near each other — even with no shared words.

    bge-m3 · 1024-dim vector · local inference

  6. Connect the dots (wikilinks)

    Cross-references inside the note become edges in the vault's graph. Backlinks build naturally over time.

    B5 post-curate · [[Note]] → graph table note_links

  7. Persist everything

    Text, vector, full-text index, graph edges — all written in a single atomic transaction.

    SQLite WAL · FTS5 · BLOB f32 · ULID keys

  8. Ready to be recalled

    The note is live and searchable.

    end-to-end ~0.5–2 s · dominated by embedding
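Steps 4 and 6 of the write pipeline are simple enough to sketch. The Python below is only an illustration of the idea, not Gradatum's Rust implementation: the function names and both regular expressions are assumptions, and only the plain [[Note]] link form shown above is handled.

```python
import re
from typing import List, Optional

# Assumption: the title is an ATX-style H1 ("# Title") on the first non-blank line.
H1_RE = re.compile(r"^#\s+(.+?)\s*$")
# Assumption: only the plain [[Note]] form is recognised (no aliases or anchors).
WIKILINK_RE = re.compile(r"\[\[([^\[\]]+)\]\]")


def extract_title(body: str) -> Optional[str]:
    """Return the H1 text if the note starts with a heading, else None."""
    for line in body.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        match = H1_RE.match(line)
        return match.group(1) if match else None
    return None


def extract_wikilinks(body: str) -> List[str]:
    """Return unique link targets in order of first appearance."""
    targets = {}  # dict keeps insertion order and removes duplicates
    for match in WIKILINK_RE.finditer(body):
        targets.setdefault(match.group(1).strip(), None)
    return list(targets)
```

Each returned target would become one edge from the current note in the note_links table; how unresolved targets are handled is not covered by this sketch.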

Search

an agent asks a question
  1. Reuse the auth

    The JWT is refreshed automatically when it nears expiry. Most search calls pay no auth cost.

    auto-refresh · under 30 % TTL remaining

  2. Receive the query

    The agent sends a natural-language query and optional filters.

    POST /api/v1/vault_search · section, tenant, limit, include_downgraded

  3. Search by words

    Classic full-text search returns notes containing the query terms. Fast and exact — blind to synonyms.

    SQLite FTS5 · BM25 ranking

  4. Search by meaning

    The query is matched against note fingerprints by similarity. Returns notes that mean the same thing — even with no shared words.

    bge-m3 query embed · cosine similarity in-process

  5. Reconcile both rankings

    The two ranked lists are merged into one — without needing to calibrate scores between them.

    Reciprocal Rank Fusion · k = 60 · stable sort

  6. Boost by context

    Recent notes get a small bump. Notes linked to by many others get another.

    composite = rrf × (1 + α × recency) × (1 + β × pagerank) · α=0.2 β=0.1

  7. Re-rank the top results

    A cross-encoder model rescores the top candidates by scoring each query-note pair jointly. Off by default; opt-in.

    reranker abstraction (no-op by default) · cross-encoder ONNX optional · feature onnx-reranker

  8. Build readable snippets

    A short excerpt around the match is extracted — the agent gets context, not just an ID.

    FTS5 native snippet() · « match »

  9. Return the top-N

    Each result carries id, score, title, snippet, section, tags.

    end-to-end ~50–200 ms (no reranker) · ~150–400 ms (with)
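Write step 7 stores each embedding as an f32 BLOB, and search step 4 compares the query vector against those BLOBs in-process. A minimal Python sketch of that round trip follows. The little-endian byte order, the function names, and the dictionary standing in for the SQLite table are assumptions made for illustration.

```python
import math
import struct
from typing import Dict, List


def pack_f32(vector: List[float]) -> bytes:
    """Serialise a vector as consecutive 32-bit floats (little-endian assumed)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_f32(blob: bytes) -> List[float]:
    """Inverse of pack_f32: 4 bytes per component."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: List[float], blobs: Dict[str, bytes], k: int = 3) -> List[str]:
    """Brute force: score every stored vector, return the k best note ids."""
    scored = [(cosine(query, unpack_f32(blob)), note_id)
              for note_id, blob in blobs.items()]
    scored.sort(key=lambda pair: -pair[0])
    return [note_id for _, note_id in scored[:k]]
```

A 1024-dimension vector occupies 4096 bytes in this layout. Brute force scans every note on each query, which is why an ANN index is listed as planned.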

Write — local, queued, asynchronous · Search — synchronous, milliseconds · Everything runs on one box. No cloud round-trip.
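Search steps 5 and 6 can be written down in a few lines. The sketch below uses the constants quoted above (k = 60, α = 0.2, β = 0.1) and the standard RRF formula with 1-based ranks. Whether Gradatum counts ranks from 0 or 1, and how recency and PageRank are normalised, is not stated here, so values in [0, 1] are assumed and the function names are hypothetical.

```python
from typing import Dict, List


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per note."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, note_id in enumerate(ranking, start=1):
            scores[note_id] = scores.get(note_id, 0.0) + 1.0 / (k + rank)
    return scores


def composite(rrf: float, recency: float, pagerank: float,
              alpha: float = 0.2, beta: float = 0.1) -> float:
    """composite = rrf * (1 + alpha * recency) * (1 + beta * pagerank)."""
    return rrf * (1 + alpha * recency) * (1 + beta * pagerank)


def rank(bm25_ids: List[str], semantic_ids: List[str],
         recency: Dict[str, float], pagerank: Dict[str, float]) -> List[str]:
    """Fuse both rankings, apply the context boost, return ids best-first."""
    fused = rrf_fuse([bm25_ids, semantic_ids])
    scored = {
        note_id: composite(score, recency.get(note_id, 0.0),
                           pagerank.get(note_id, 0.0))
        for note_id, score in fused.items()
    }
    # sorted() is stable, so ties keep their fused insertion order.
    return sorted(scored, key=lambda note_id: -scored[note_id])
```

Because RRF only looks at positions, the BM25 and cosine scores never need to be put on a common scale. The boosts are multiplicative, so a note that appears in neither list can never be promoted into the results.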

IV · Vocabulary

Six levels, named with care.

Borrowed from Cicero's ars memoriae: agents place their memories in loci, mental locations of an imagined palace. Agents don't share rooms — they share places of memory.

Vault
The technical backing store (SQLite FTS5 + Markdown). Multi-vault first-class — main + staging + bench-*.
Locus
A logical subdivision of a vault, isolated by ACL. From Cicero's ars memoriae — the mental location where an image is placed.
Section
One of 10 cognitive categories: decisions, architecture, debug, reasoning, feedback, lessons-learned, retrospectives, experiments, agent-issues, reference.
Note
Atomic Markdown file with YAML frontmatter. ULID identity. SHA-256 content hash for drift detection.
Bearer
An authenticated identity with read/write ACL patterns over loci. Configured via presets or custom bearer.toml.
Preset
A template configuration shipped in examples/presets/ — flat, hierarchical, multi-project, team.
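To make the Bearer and Locus entries concrete, here is a small Python sketch of pattern-based access checks. It assumes glob-style patterns such as projecta/* and uses fnmatch for matching; the Bearer class, the can function, and the example loci are hypothetical, and the real matching rules may differ.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Tuple


@dataclass(frozen=True)
class Bearer:
    """An identity with separate read and write patterns over loci."""
    name: str
    read: Tuple[str, ...] = ()
    write: Tuple[str, ...] = ()


def can(bearer: Bearer, action: str, locus: str) -> bool:
    """True if any pattern for the action matches the locus (deny by default)."""
    if action not in ("read", "write"):
        raise ValueError(f"unknown action: {action}")
    patterns = bearer.read if action == "read" else bearer.write
    # Note: fnmatch's "*" also matches "/", so "projecta/*" covers nested loci.
    return any(fnmatchcase(locus, pattern) for pattern in patterns)
```

With this shape, an agent can read a whole project while writing only to its own locus, which is roughly what a hierarchical preset would express.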

V · Multi-storage

One backend today. More planned.

Gradatum uses Apache OpenDAL as a unified storage abstraction layer. Local filesystem is available today; S3, Azure, and GCS support is planned. NFS is explicitly rejected: POSIX lock incompatibility causes data corruption under concurrent writers.

  • Local FS primary
    fs://

    Default. Local NVMe storage only; NFS mounts are rejected for the locking reason given above.

  • S3 / R2 planned
    s3://

    AWS S3, Cloudflare R2, MinIO, Backblaze B2. Feature flag available; backend implementation pending.

  • Azure Blob planned
    azblob://

    Azure Blob Storage via OpenDAL azblob service. Feature flag available; backend implementation pending.

  • GCS planned
    gcs://

    Google Cloud Storage. Service account or ADC auth. Feature flag available; backend implementation pending.

Example — gradatum.toml
# Per-vault storage configuration
# (the s3 block shows the target shape: the S3 backend is planned, not yet implemented)
[vaults.main.storage]
backend = "s3"
bucket  = "my-gradatum-vault"
region  = "us-east-1"
root    = "/gradatum/main"

[vaults.staging.storage]
backend = "fs"
root    = "~/.gradatum/vaults/staging"  # local NVMe for testing

# NFS rejected: POSIX lock incompatibility (data corruption under concurrent writers)
# backend = "nfs"  ← will fail at startup with nfs_check.rs guard
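How the nfs_check.rs guard works is not described here. One plausible approach on Linux, sketched in Python under that assumption, is to look up the filesystem type of the vault root in the mount table and refuse NFS types. The function names and the rejected-type list are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumption: these are the filesystem types a guard would refuse.
REJECTED_FS_TYPES = ("nfs", "nfs4")


def fs_type_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the fs type of the longest mount point containing path.

    mounts_text uses the /proc/mounts layout: device, mount point, type, ...
    """
    target = PurePosixPath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        mount = PurePosixPath(mount_point)
        covers = mount == target or mount in target.parents
        if covers and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def check_vault_root(path: str, mounts_text: str) -> None:
    """Raise at startup if the vault root sits on a rejected filesystem."""
    fs_type = fs_type_for(path, mounts_text)
    if fs_type in REJECTED_FS_TYPES:
        raise RuntimeError(f"{path} is on {fs_type}: refusing to start")
```

In real use the mount table would be read from /proc/mounts; taking it as a string keeps the check easy to test.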

VI · Status & Roadmap

Roadmap: eleven versions, four milestones

v0.5.2 · 31-crate workspace · Apache-2.0 · Rust 2024 · MSRV 1.88 · GitHub · crates.io (names reserved; full library at v1.0)

v0.1.0 · architecture foundation

BRONZE Architecture Foundation

✓ SHIPPED

Establishes the public architecture foundation — four persistence traits, a warden layer for note integrity, an install wizard for first-time setup, and smoke-test coverage so that early adopters can deploy a working knowledge store with confidence from the first release.

  1. Functional core v0.1.0-alpha.0→5 2026-04
  2. Service mode (HTTP + MCP) v0.1.0-alpha.5 2026-05-07
  3. Hardening + search foundations v0.1.0-alpha.7→10 2026-05
  4. Supply chain bumps v0.1.0-alpha.10-bumps.1 2026-05
  5. Search quality (RRF + reranker) v0.1.0-alpha.11→13 2026-05
  6. Migrated off the predecessor backend — gradatum primary store v0.1.0-alpha.13 2026-05-25
  7. Security hardening — JWT validate_nbf v0.1.0-alpha.14 2026-05-28
  8. Gradatum Skills (reminder + vault-search) gradatum-skills v0.1.0 2026-05-25
  9. Polish — title resolution, batched lookups, parallel wikilinks v0.1.0-alpha.15 2026-05-28
  10. Stabilisation — alpha series wrap-up, OSS feedback integration v0.1.x closing 2026-05-29

v0.2.0 · job infrastructure + observability

BRONZE Job Infrastructure + Observability

✓ SHIPPED

Lays the foundation for gradatum's background job system and makes it fully observable. This release introduces the job queue layer built on Apalis (https://github.com/geofmureithi/apalis) — a type-safe, SQLite-backed Rust job framework — including the Job enum, JobRecord lifecycle tracking, and per-class worker configuration. On top of that foundation, it ships a Dead-Letter Queue (DLQ) for failed background jobs with configurable retry policies, timeout enforcement, and panic isolation, so jobs that fail definitively are captured rather than silently dropped. A /api/v1/jobs introspection endpoint surfaces job state over HTTP, with a Server-Sent Events (SSE) stream for real-time push updates and Prometheus metrics per job kind, giving operators full visibility into what the system is running and why without polling.

  1. F-14 partial: Apalis foundation + forward-compatible type definitions v0.2.0 2026-05-29
  2. F-15: DLQ + Apalis Monitor multi-worker + Prometheus exporter v0.2.0 2026-05-29
  3. F-16: /api/v1/jobs API + SSE + Idempotency-Key + admin CLI v0.2.0 2026-05-29
  4. Tag v0.2.0 — Bronze 2nd milestone OSS public v0.2.0 2026-05-29

Apalis job foundation · Dead-letter queue · Jobs introspection API + SSE
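The dead-letter behaviour described above comes down to "retry a bounded number of times, then park the job instead of dropping it". The Python sketch below shows only that control flow. It is not the Apalis API: the function name and record fields are hypothetical, and backoff, timeouts, and panic isolation are left out.

```python
from typing import Any, Callable, Dict, List, Optional


def run_with_retries(job: str,
                     handler: Callable[[str], Any],
                     max_attempts: int,
                     dead_letters: List[Dict[str, Any]]) -> Optional[Any]:
    """Run handler(job) up to max_attempts times.

    Returns the handler result on success. After the final failure the job
    is appended to dead_letters with its last error, and None is returned.
    """
    last_error: Optional[BaseException] = None
    for _attempt in range(max_attempts):
        try:
            return handler(job)
        except Exception as exc:  # a real worker would also enforce a timeout
            last_error = exc
    dead_letters.append({
        "job": job,
        "attempts": max_attempts,
        "error": repr(last_error),
    })
    return None
```

Keeping the failed job and its last error is what lets an operator inspect or replay it later through an introspection endpoint.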

v0.3.0 · storage traits + event-log + secrets di

BRONZE Storage Traits + Event-Log + Secrets DI

✓ SHIPPED

Decomposes the monolithic storage trait into three granular, pluggable interfaces (DocumentStore, IndexStore, VectorStore), ships an append-only event-log table for LLM cost-attribution telemetry, adds an autonomous LLM gateway crate (proxy + reranker), introduces deterministic cognitive-kind tagging for notes (CoALA — Cognitive Architectures for Language Agents — episodic/semantic/procedural/reflective), and fixes a critical JWT signing-key persistence bug that caused every server restart to invalidate all live tokens. Patch releases v0.3.1–0.3.3 harden the multi-worker job queue.

  1. Storage trait carve — DocumentStore / IndexStore / VectorStore + AppState dyn-dispatch v0.3.0 2026-06-01
  2. Event-log sink (table event_log) + gateway cost-attribution (QaEvent +5 fields) v0.3.0 2026-06-01
  3. gradatum-gateway crate — LLM proxy + reranker v1 (F-08), code-complete v0.3.0 2026-06-01
  4. F-42: c_kind / doc_kind columns — deterministic CoALA mapping, zero LLM v0.3.0 2026-06-02
  5. Secrets DI (F-13) + P0 fix: JWT key persisted (boot-stable, load-or-generate) v0.3.0 2026-06-02
  6. Tag v0.3.0 — 28 crates, 1088 tests PASS, Bronze 3rd milestone v0.3.0 2026-06-02
  7. v0.3.1–0.3.3 reliability patches — multi-worker job-queue concurrency (BEGIN IMMEDIATE deadlock fix) v0.3.3 2026-06-02
  8. v0.3.4 — vault_search title:null fix (write-path: title column populated at curate, migration 0009 backfill) v0.3.4 2026-06-03
  9. v0.3.5 — search read-path: semantic-only hits enriched with title+snippet; legacy title recovery — 1223 tests PASS, live v0.3.5 2026-06-03
  10. v0.3.6 — First public OSS release: 28 crates published on crates.io, source open on GitHub (Apache-2.0) v0.3.6 2026-06-05
  11. v0.3.7 — Reliability: search/read/write round-trip fixes (title persistence, vault_read by ULID, wikilink reconciliation) v0.3.7 2026-06-05

Storage trait carve · Event-log sink · LLM gateway · Cognitive kind (F-42) · Secrets DI

Chapter I

Memory layer

v0.4.0 → v0.5.1 — Completes the durable memory store: structured ingest, note history, temporal decay, distillation, MCP-native querying, multi-user isolation, and OAuth-based remote access. The queryable, sovereign memory store that other systems — and eventually gradatum itself — will build on.

v0.4.0 · vault core — durable memory layer

Vault Core — Durable Memory Layer

✓ SHIPPED

Completes the core knowledge store: structured ingest with content-aware chunking, copy-on-write note history with optimistic locking for safe concurrent writes, stable wikilink graph traversal, temporal decay scoring and provenance trust so retrieved content carries verifiable lineage, declarative lifecycle rules that keep the vault compact without losing traceability, scheduled distillation that compresses raw notes into reusable knowledge, and pluggable storage backends.

  1. F-47 Provenance Trust Score — verifiable note lineage, trust field integrated with search ranking v0.4.0 2026-06-06
  2. F-39 Stable Wikilinks — redirect_table, ULID anchors, backlink index at write time v0.4.0 2026-06-06
  3. F-41 Optimistic Locking — write_if_match SHA-256 content hash, 409 Conflict on race v0.4.0 2026-06-06
  4. F-40 Note History — copy-on-write version trail, history/* endpoints, max_versions cap v0.4.0 2026-06-06
  5. Tag v0.4.0 — 28 crates, 1178 tests PASS, « Écriture durable » (durable writing) milestone v0.4.0 2026-06-06
  6. v0.4.1 — Quality & reliability: zero-panic API surface, doc-comments 28 crates, SECURITY.md, revocation wired, MSRV 1.88, first public release (crates.io + GitHub) v0.4.1 2026-06-06
  7. v0.4.2 — Internal: note_id in vault_write response, vault_downgrade 404, DTO unification, gateway metrics cardinality v0.4.2 2026-06-07
  8. v0.4.3 — Vault lifecycle: semantic forget (dry-run + decay), note lifecycle state machine, configurable history pruning, multi-vault query scoping, multimodal gateway support, temporal index foundation v0.4.3 2026-06-10

Structured ingest · Note history + locking · Temporal decay + provenance · Lifecycle + compaction · Scheduled distillation · Pluggable backends
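Optimistic locking and copy-on-write history fit together naturally: a writer states which content hash it last saw, and the previous body is kept whenever a write succeeds. The in-memory Python sketch below illustrates the pattern only. The class, the if_match parameter, and the choice to allow unconditional writes are assumptions; in the HTTP API a stale hash maps to 409 Conflict.

```python
import hashlib
from typing import Dict, List, Optional


class Conflict(Exception):
    """The caller's expected hash no longer matches the stored note."""


def content_hash(body: str) -> str:
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class NoteStore:
    def __init__(self, max_versions: int = 10) -> None:
        self._current: Dict[str, str] = {}
        self._history: Dict[str, List[str]] = {}
        self._max_versions = max_versions

    def write(self, note_id: str, body: str,
              if_match: Optional[str] = None) -> str:
        """Store body and return its hash; raise Conflict on a stale if_match."""
        current = self._current.get(note_id)
        if if_match is not None:
            actual = content_hash(current) if current is not None else None
            if actual != if_match:
                raise Conflict(f"{note_id}: expected {if_match}, found {actual}")
        if current is not None:
            trail = self._history.setdefault(note_id, [])
            trail.append(current)  # copy-on-write: keep the old version
            if len(trail) > self._max_versions:
                del trail[:len(trail) - self._max_versions]
        self._current[note_id] = body
        return content_hash(body)

    def history(self, note_id: str) -> List[str]:
        return list(self._history.get(note_id, []))
```

Two agents that both read version 1 and then write will see exactly one success; the loser has to re-read and retry rather than silently overwrite.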

v0.5.2 · static code index + observability

SILVER Static Code Index + Observability

✓ SHIPPED

Adds a static code index built with tree-sitter (Rust, zero LLM) — symbols are derived deterministically from source, stored in a separate code vault, and queryable by symbol name, free-text, or file path with optional body extraction. Drift detection and incremental O(diff) updates keep the index fresh without full rebuilds. Also ships: vault_timeline for chronological note listing (as-of / valid_until), session-log Tier 1 for agent action tracing, corpus_match_count as a proof-of-absence search signal, native TLS termination (rustls, TLS 1.2+/1.3, fail-closed), and vault_write in-place update (optimistic-lock RMW, 409 on conflict).

  1. vault_write in-place update — optimistic-lock RMW, SHA-256 guard, 409 on conflict v0.5.0 2026-06-12
  2. session-log Tier 1 — append-only agent action tracing, 90-day retention, PII-safe v0.5.0 2026-06-12
  3. corpus_match_count — BM25/FTS5 proof-of-absence signal, opt-in, cap 10001 v0.5.1 2026-06-13
  4. vault_timeline — chronological note listing, as-of / valid_until filtering v0.5.1 2026-06-13
  5. Static code index — tree-sitter Rust, NoteId::derived_from, migration 0016 v0.5.2 2026-06-13
  6. code_scope query endpoint — symbol / query / path modes, include_body, drift detection v0.5.2 2026-06-13
  7. Incremental O(diff) code update — idempotent, <10ms on unchanged files v0.5.2 2026-06-14
  8. Native TLS termination — [server.tls], rustls, TLS 1.2+/1.3, fail-closed v0.5.2 2026-06-14
  9. Tag v0.5.2 — 31 crates workspace, source on GitHub (Apache-2.0), 1925 tests PASS v0.5.2 2026-06-15

Static code index (tree-sitter Rust, zero LLM) · code_scope query (symbol / query / path, include_body) · Drift detection + incremental O(diff) update · vault_timeline (chronological listing, as-of / valid_until) · session-log Tier 1 (agent action tracing) · corpus_match_count (proof-of-absence signal) · Native TLS (rustls, TLS 1.2+/1.3, fail-closed) · vault_write in-place update (optimistic-lock RMW)
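The incremental O(diff) update can be pictured as a comparison of content hashes: files whose hash matches the index are skipped, and only the rest are re-parsed. The Python sketch below assumes a SHA-256 digest per file and a simple path-to-hash map; both are illustrative choices, and the tree-sitter parsing itself is out of scope.

```python
import hashlib
from typing import Dict, List, Tuple


def digest(source: str) -> str:
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def plan_update(indexed: Dict[str, str],
                sources: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Compare the index (path -> hash) with current sources (path -> text).

    Returns (changed, removed): paths to re-parse and paths to drop.
    Unchanged files appear in neither list, so they cost one hash each.
    """
    current = {path: digest(text) for path, text in sources.items()}
    changed = sorted(path for path, value in current.items()
                     if indexed.get(path) != value)
    removed = sorted(path for path in indexed if path not in current)
    return changed, removed
```

Running the plan twice on the same tree yields two empty lists the second time, which is the idempotence property mentioned above.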

v0.5.5 · foundation polish

SILVER Foundation Polish

planned

Closes the v0.5.x foundation window before the MCP-native pivot: real-time health observability (queue depth with accurate oldest-message age, build SHA in /health for unambiguous version proof), Rust 2024 edition upgrade across the full workspace, and surface-hardening of the knowledge base (backfill, public API hygiene). No new capabilities — the goal is a clean, verifiable baseline to build on.

Health observability (/health) · Rust 2024 edition · Knowledge-base surface hardening

v0.6.0 · queryable memory store — mcp-native backend

SILVER Queryable Memory Store — MCP-Native Backend

planned

Turns the completed vault into a memory store any client can query directly through the Model Context Protocol (MCP) — a native MCP server with Streamable HTTP transport, write-time schema validation with automatic repair, and a vault audit & deduplication pass. This is a deliberate ordering: the memory store becomes a stable, externally consumable product first — usable today by any MCP client (Claude, IDEs, custom agents) — and only then does gradatum grow its own context layer (v0.7.0) on top of the exact same interface it already exposes to everyone else. The store earns its API by serving others before it serves itself.

Native MCP server · Streamable HTTP transport (MCP spec 2025-11) · Write-time schema validation + auto-repair · Vault audit & deduplication

Chapter II

Context Assembly + Agent Runtime

v0.7.0+ — The layer that consumes the memory store coherently across sessions: context assembly, sliding-window memory, proactive recall, skill selection, and a declarative user profile. gradatum stops treating queries as stateless and starts reasoning over accumulated knowledge.

v0.7.0 · memory layer + context assembly

Memory Layer + Context Assembly

planned

Adds the context-assembly layer that turns the vault from a passive store into an active participant: identity rendering, sliding-window memory, proactive recall, a declarative user profile, and skill selection that picks only relevant context before injection. gradatum can now consume its own memory store coherently across sessions — reasoning over accumulated knowledge rather than treating each query as stateless.

Context builder · Sliding-window memory · Proactive recall · User profile (declarative) · Skill selection

v0.8.0 · gradatum-code — sovereign terminal agent

gradatum-code — Sovereign Terminal Agent

planned

Ships the first version of gradatum-code: a terminal agent that reasons over the local codebase using the vault as its memory — symbol lookup, diff-aware context, project history recall, and task execution. gradatum-code runs entirely on local hardware, with no cloud dependency and no external code upload. Phase A covers the agentic core; later phases extend to IDE integration and collaborative workflows.

gradatum-code agentic core · Symbol + diff-aware context · Project history recall · Local-only execution (zero cloud upload) · Task execution loop

v1.0.0 · production baseline

GOLD Production Baseline

planned

The first production-certified release — the point where gradatum becomes safe to build on. The public trait contracts freeze as stable (semver guarantees you can depend on), the privacy filter runs on a local ONNX (portable inference) path with no external LLM dependency, the system proves 30 days of continuous operation, and the full LongMemEval long-term-memory benchmark runs reproducibly. v1.0.0 adds no new API surface by design: it is a stability and certification milestone, not a feature drop — the moment the contracts stop moving.

Stable API — semver guarantees · Local ONNX privacy filter · 30-day production proof · Full LongMemEval reproducible · Multi-user + OAuth remote access

v2.0.0 · multimodal + consolidation

PLATINUM Multimodal + Consolidation

planned

Extends the platform to multimodal inputs with a breaking-change chat API and long-horizon memory consolidation — completing gradatum's trajectory from local knowledge store to autonomous cognitive infrastructure.

Multimodal chat (BREAKING) · Memory consolidation


VIII · Quick start

Pre-built binaries, or build from source. The crates.io names are reserved; the full library lands at v1.0.

Clone & build (v0.5.2 · API not stable before v1.0)
# build from source
git clone https://github.com/gradatum/gradatum
cd gradatum && cargo build --release --workspace

# initialize a vault with the hierarchical preset
gradatum-admin init --preset hierarchical \
  --root /var/lib/gradatum

# start the server
systemctl --user start gradatum-server

Write & search (REST API)
# write a note (vault_write)
curl -X POST http://localhost:19090/api/v1/vault_write \
  -H "Authorization: Bearer $GRADATUM_BEARER" \
  -H "Content-Type: application/json" \
  -d '{"locus":"projecta/backend","section":"decisions","body":"Use ULID for stable note identity"}'

# search across loci
curl http://localhost:19090/api/v1/vault_search \
  -H "Authorization: Bearer $GRADATUM_BEARER" \
  -H "Content-Type: application/json" \
  -d '{"query":"ULID identity","locus":"projecta/*"}'

# list vaults
gradatum-admin vault list

MCP integration (MCP stub)
# gradatum.toml — MCP stub for Claude Code
[[mcpServers]]
name    = "gradatum"
command = "gradatum-mcp-stub"
args    = ["--server", "http://localhost:19090"]
env     = { GRADATUM_BEARER = "your-bearer-token" }

# Or via HTTP directly (API)
curl http://localhost:19090/health
curl http://localhost:19090/api/v1/vault_search \
  -H "Authorization: Bearer $GRADATUM_BEARER" \
  -H "Content-Type: application/json" \
  -d '{"query":"ULID","locus":"projecta/*"}'
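For agents written in Python, the same calls can be made with the standard library alone. This sketch mirrors the curl examples: the endpoints, port, and payload fields come from this page, while the helper names and the assumption that responses are JSON are illustrative.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:19090"


def build_request(endpoint: str, payload: Dict[str, Any],
                  bearer: str) -> urllib.request.Request:
    """Build a JSON POST for /api/v1/<endpoint> with a bearer token."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {bearer}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call(endpoint: str, payload: Dict[str, Any], bearer: str) -> Any:
    """Send the request and decode the body (assumed to be JSON)."""
    request = build_request(endpoint, payload, bearer)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

A search then reads call("vault_search", {"query": "ULID identity", "locus": "projecta/*"}, token), and a write uses "vault_write" with the locus, section, and body fields shown earlier.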

v0.5.2 — pre-built binaries (server / llm / mcp) on GitHub Releases, source on GitHub (Apache-2.0), names reserved on crates.io (full library at v1.0). API not stable before v1.0. See the Install page for all deployment profiles.