Agent memory architecture: how Eluu builds AI colleagues that actually remember

Most AI agents forget you between sessions. Eluu's three-tier memory architecture — conversational, shared, and private — gives AI colleagues persistent recall that compounds across teams.

Most AI agents lose every fact about you the moment a conversation ends. You introduce yourself on Monday, explain the project, share a doc. On Tuesday you’re starting over. Nothing compounds.

We’ve spent the last year solving this for AI colleagues at Eluu. This post introduces the three-tier memory architecture we shipped — conversational, shared, and private — the data model behind it, and the production failures (scope leakage, confidence drift) that reshaped it.

TL;DR

  • A vector database is not a memory. It’s a filing cabinet. Memory is a live mental model that updates, persists, and stays consistent across conversations.
  • Eluu colleagues use three memory tiers: conversational (the current session), shared (facts the whole team holds), and private (per-colleague preferences). Each has a different scope, write rule, and lifetime.
  • The hard problems aren’t storage — they are what to write, what to forget, and how to resolve conflicts between colleagues who remember different versions of the same fact.
  • Two production failures forced redesigns: scope leakage (private memories surfacing in shared contexts) and confidence drift (stable facts losing certainty as the retrieval model improved).
  • We still don’t know how to do clean memory transfer between colleagues, deletion that survives re-derivation, or offboarding.

Why agent memory is different from RAG

When people say “memory” for an AI, they usually mean retrieval: a vector database in the corner of a RAG pipeline that lets the model cite yesterday’s email. That’s a useful component. It is not a memory. It’s a filing cabinet.

A colleague’s memory is doing something closer to maintaining a live mental model. When Lisa, our AI sales colleague, answers a question about a prospect, she is not re-reading every email from scratch. She has a stable belief about who that prospect is, what they care about, and what stage of the pipeline they’re in. That belief updates when new information arrives, persists when it doesn’t, and stays consistent across conversations.

That’s much harder than retrieval. A filing cabinet can be wrong in chunks — a document is out of date, a folder is mislabeled. A mental model is wrong in aggregate: every subsequent decision is filtered through the flawed belief, compounding the error. Getting memory right is not a vector store problem. It’s a problem of deciding what counts as a fact, what overrides what, what to forget, and when to ask for confirmation.

The three-tier memory architecture

Eluu colleagues have three kinds of memory. They behave very differently.

1. Conversational memory — what was just said

This is the easy one. Everyone has it. A context window is a very short-term memory, and most AI products start and end there. Useful, ephemeral, and deeply insufficient on its own.

2. Shared memory — facts the whole team holds

When Lisa learns that your top customer renews on the 15th, Mark should know too — without anyone asking. Shared memory is what makes an Eluu workspace feel like a team, not a pile of disconnected bots. It is not a database lookup; it is a belief every colleague holds.

3. Private memory — what each colleague learned alone

Ruby should know which dashboards you like without having to ask every time. Lisa should remember which tone you prefer for customer replies. This is how preference becomes personality.

We model all three under a single record type, with a kind discriminator and a scope that narrows visibility (the supporting type aliases below are simplified for illustration):

type ColleagueId = string;                           // simplified for illustration
type ScopeId = 'workspace' | 'team' | 'colleague' | 'conversation';
type SourceRef = { uri: string; observedAt: Date };  // shape simplified for illustration

type Memory = {
  id: string;
  kind: 'conversation' | 'shared' | 'private';
  content: string;
  writtenAt: Date;
  writtenBy: ColleagueId;
  scope: ScopeId;
  confidence: number;   // 0..1, decays with age + contradictions
  sources: SourceRef[];
};

Reads are cheap; writes are not. A write happens only when a colleague believes the new information is reliable and either adds something the memory did not know, or contradicts something it did. Otherwise we discard silently — a log entry, not a belief.
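
In sketch form, the write gate looks something like this (building on the Memory type above; the reliability threshold and the contradicts helper are illustrative stand-ins, not our shipped logic):

// Hypothetical write gate: persist only what is reliable and either novel
// or contradicting. The threshold and contradicts() are illustrative.
const RELIABILITY_THRESHOLD = 0.7;

type Candidate = { content: string; reliability: number; scope: ScopeId };

function contradicts(held: string, incoming: string): boolean {
  return false; // placeholder: a real check uses an NLI model or domain rules
}

function shouldWrite(candidate: Candidate, held: Memory[]): boolean {
  if (candidate.reliability < RELIABILITY_THRESHOLD) return false; // discard silently
  const inScope = held.filter((m) => m.scope === candidate.scope);
  const addsSomething = !inScope.some((m) => m.content === candidate.content);
  const contradictsSomething = inScope.some((m) =>
    contradicts(m.content, candidate.content)
  );
  return addsSomething || contradictsSomething;
}

The asymmetry is the point: rejecting a candidate costs a log line, while accepting a bad one costs compounding wrongness downstream.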

The hardest problems aren’t storage

Storage is solved. The hard problems, in roughly the order we hit them, are writing, forgetting, and conflict.

Writing — most of what an agent observes is noise

A customer says “let me think about it.” Is that a fact about their decision-making process worth remembering, or conversational filler? If you write too eagerly, memory fills up with opinions masquerading as observations and a week later the colleague is confidently wrong about everything. If you write too conservatively, the colleague forgets the things it was asked to remember.

We settled on a rule we call “state it to save it”: a memory is written only if the colleague can articulate, in a sentence, why it’s worth keeping. That sentence becomes part of the stored record — and a much better retrieval target than the raw observation.
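
In record terms, the rule could look like this sketch; the justification field name is ours for illustration, though the substance — the sentence is stored and indexed — is the rule as we run it:

// "State it to save it": no articulated reason, no write.
type JustifiedMemory = Memory & {
  justification: string; // the one-sentence "why this is worth keeping"
};

function writeIfJustified(
  memory: Memory,
  justification: string | null
): JustifiedMemory | null {
  if (!justification) return null; // a log entry, not a belief
  // Index the justification rather than the raw observation; the stated
  // reason is a far better retrieval target.
  return { ...memory, justification };
}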

Forgetting — a filing cabinet does not shed old files

A memory that never prunes itself degrades back into a filing cabinet: stale facts crowd out live ones at retrieval time. So we run a background job that scores every memory on three axes — age, access frequency, and contradiction count — and marks the bottom of the distribution for soft deletion. A soft-deleted memory is not retrieved by default but can be surfaced if someone explicitly asks “what did you know about X?”

This mirrors human memory decay: you forget, but you can often still be reminded.
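
As a sketch, the scoring could look like the following; the weights and the 10% cutoff are invented for illustration, not our production values:

// Hypothetical decay scorer: lower scores are forgotten first.
type Usage = { lastAccess: Date; accessCount: number; contradictions: number };

function decayScore(m: Memory, usage: Usage, now: Date): number {
  const ageDays = (now.getTime() - m.writtenAt.getTime()) / 86_400_000;
  const staleDays = (now.getTime() - usage.lastAccess.getTime()) / 86_400_000;
  // Older, rarely accessed, frequently contradicted memories score lowest.
  return (
    -0.5 * ageDays -
    0.3 * staleDays +
    2.0 * Math.log1p(usage.accessCount) -
    5.0 * usage.contradictions
  );
}

function markForSoftDeletion(
  scored: { id: string; score: number }[],
  bottomFraction = 0.1 // illustrative cutoff
): Set<string> {
  const sorted = [...scored].sort((a, b) => a.score - b.score);
  const cut = Math.floor(sorted.length * bottomFraction);
  // Soft-deleted: excluded from default retrieval, recoverable on explicit ask.
  return new Set(sorted.slice(0, cut).map((s) => s.id));
}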

Conflict — two colleagues, two versions of the same fact

This is the real monster. Lisa hears on a call that the renewal moved to the 20th; Mark’s memory still says the 15th. Neither colleague is obviously wrong. Which one wins?

Our current answer is unsatisfying but works (a sketch of the conflict record we keep follows the list):

  1. We record the conflict.
  2. We surface it to the workspace owner.
  3. We let the human resolve it.
  4. The colleague that asserted the newer fact gets the benefit of the doubt until resolution.
  5. We keep history so we can trace when a belief changed and who caused it.
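
A minimal sketch of that record, with field names invented for illustration:

// Hypothetical conflict record: the shape reflects what we keep, not the
// shipped schema.
type MemoryConflict = {
  id: string;
  memoryA: string;      // id of the older belief
  memoryB: string;      // id of the newer, provisionally-trusted belief
  detectedAt: Date;
  surfacedTo: string;   // workspace owner
  resolvedBy?: string;  // human resolver, once they decide
  winner?: 'A' | 'B';
  history: { at: Date; memoryId: string; event: string }[]; // audit trail
};

// Until a human resolves it, retrieval prefers the newer assertion.
function provisionalWinner(c: MemoryConflict): 'A' | 'B' {
  return c.winner ?? 'B'; // benefit of the doubt goes to the newer fact
}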

Someday we want colleagues to resolve many of these conflicts among themselves. Today we don’t trust them to.

The cheap version of memory is a giant log. The expensive version is a shared understanding. We are still on the long climb from the first to the second.

What changed after launch: two production failures

Two failures surprised us in production badly enough to force redesigns.

Scope leakage was a security bug, not a UX bug

Private memories were accidentally being surfaced in shared contexts because our retrieval layer joined scopes too permissively. A customer saw a colleague recall a fact only one employee had shared with it privately. Embarrassing — and a good reminder that memory scoping is a security property, not a UX nicety. We rewrote retrieval to fail closed: if the scope can’t be proven, the memory is not returned.
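
In sketch form, fail-closed means anything short of an explicit proof collapses into a denial; proveScope and the Requester shape are assumptions for illustration:

// Fail closed: a memory is returned only when its scope is provably
// visible to the requester. Any uncertainty means it stays hidden.
type Requester = { colleagueId: ColleagueId; visibleScopes: Set<ScopeId> };
type ScopeProof = 'proven' | 'unknown';

function proveScope(memory: Memory, requester: Requester): ScopeProof {
  // Illustrative: the real check walks workspace/team membership records.
  return requester.visibleScopes.has(memory.scope) ? 'proven' : 'unknown';
}

function retrieveVisible(memories: Memory[], requester: Requester): Memory[] {
  // 'unknown' is treated exactly like a denial.
  return memories.filter((m) => proveScope(m, requester) === 'proven');
}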

Confidence drift made stable colleagues sound less sure

Memories that had been stable for months started losing confidence as the retrieval model got marginally better at finding edge-case contradictions. A colleague that used to confidently say “your top account is Acme” would now hedge: “based on recent activity, your top account might be Acme.”

Users read hedging as regression. We now freeze confidence for memories that have been stable and unchallenged for N days — model improvements should not surface as the colleague suddenly becoming less sure of old facts.
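
A sketch of the freeze, leaving N as a parameter since we haven’t published the value; the decay function here is illustrative:

// Freeze confidence for memories stable and unchallenged for N days, so
// retrieval-model improvements cannot erode certainty about settled facts.
type MemoryState = Memory & { lastChallengedAt: Date | null };

function effectiveConfidence(
  m: MemoryState,
  now: Date,
  stabilityWindowDays: number // "N days"; real value not published
): number {
  const lastChange = m.lastChallengedAt ?? m.writtenAt;
  const stableDays = (now.getTime() - lastChange.getTime()) / 86_400_000;
  const frozen = stableDays >= stabilityWindowDays;
  // Frozen memories keep their stored confidence; others may still decay.
  return frozen ? m.confidence : decayedConfidence(m, now);
}

function decayedConfidence(m: Memory, now: Date): number {
  // Illustrative decay: the real logic also weighs contradictions, not just age.
  const ageDays = (now.getTime() - m.writtenAt.getTime()) / 86_400_000;
  return Math.max(0, m.confidence - 0.001 * ageDays);
}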

This is part of a broader pattern we wrote about in how we run truly parallel agents: the model’s output is data, not signal. You cannot trust it to monitor itself.

What we still don’t know

Three open problems we’re sitting with:

  • Cross-colleague memory transfer. If you hire a new colleague mid-project, we don’t have a great way to bring them up to speed beyond reading the shared memory. A human onboarding at week three absorbs enormous ambient context. Our colleagues can’t yet.
  • Deletion that survives re-derivation. If a user tells a colleague to forget a fact, the retrieval layer might re-derive it from raw sources on the next call. Real forgetting requires marking not just the memory but the underlying evidence — and that gets gnarly with shared data.
  • Offboarding. When someone on a team leaves, which of their private preferences should be purged? Which were really preferences about the workflow, not the person? We currently ask the workspace owner to decide per-category. We don’t love this.

FAQ

What is agent memory? Agent memory is a persistent, scoped representation of facts and preferences an AI agent maintains across sessions. Unlike RAG retrieval, agent memory survives between conversations, updates as new information arrives, and stays consistent when queried multiple times.

How is agent memory different from RAG? RAG retrieves from a static corpus per query. Agent memory holds beliefs that persist across queries, are written deliberately rather than passively indexed, and are scoped to specific agents, teams, or workspaces.

What are the three types of memory in Eluu? Conversational (current session), shared (visible to the whole team), and private (per-colleague preferences). Each tier has its own write rule, retention policy, and visibility scope.

How do you prevent AI agents from remembering the wrong things? Eluu uses a “state it to save it” rule: a memory is only persisted if the colleague can articulate why it’s worth keeping. That justification is stored alongside the memory and used as the primary retrieval target.

How do you handle conflicting memories between agents? Conflicts are recorded, surfaced to the workspace owner, and resolved by a human. The colleague asserting the newer fact gets the benefit of the doubt until resolution. Full version history is kept.


We’ll write more about each of these as we learn. If you are working on something similar, we would love to compare notes — reach out at eluu.ai.
