Privacy. Trust. Understanding. Judgment. Action: each step depends on the one before it. This is the charter: what Mandaire commits to and why that sequence is the only one that produces an AI worth trusting with your life and your work. Two products implement it: mandaire.app (for your life) and mandaire.dev (for your product). Same charter. Same architectural commitments. If anything on either surface conflicts with what is written here, this document is right.
Before you read the full architecture
The architecture below is detailed and honest. Before you go deep, here is the short version of what you can verify right now and what is on the roadmap. The verification matrix later on this page has the full per-claim breakdown with evidence, enforcement, and remaining gaps.
Running today
On the roadmap
The honest framing: the thesis is stronger than the evidence program right now, by design, because thesis is faster to revise than infrastructure. The "LIVE / AUDIT PENDING" items in the verification matrix hold architecturally today; they just haven't been independently confirmed yet. We name them separately from the roadmap items because that distinction matters. If you are deciding whether to connect Gmail and iMessage, the running-today list is what holds.
The thesis
The AI industry has been building the sequence backwards. Start with capability: make the AI as useful as possible, as fast as possible. Add safety guardrails. Layer trust commitments on top when regulators or users demand them. Privacy at the end, if at all, as a filter over outputs already produced from inputs the user could not safely give.
This is the top-down sequence. It produces AI that is impressive in demos and compromised in practice. You self-censor. You give it the version of your situation you are comfortable handing to a system whose trust architecture you cannot verify. The AI works with incomplete inputs and produces outputs worse than they could be, on exactly the topics that matter most.
The evidence is accumulating. In early 2026, a Cursor AI agent wiped a production database. The developer's own account: "I violated every principle." The agent had the capability and the access. It had no substrate of judgment about what "this database is the company" means for that specific operator. 65 percent of organizations now report security incidents caused by AI agents (Kiteworks, 2026). This is not a product quality problem. It is the top-down sequence producing its expected result: action without the judgment substrate to make action safe.
What the AI industry calls "memory" is mostly a short list of facts: preferences, past choices, things you mentioned. Useful. Not sufficient. Knowing someone requires something closer to what a long-time partner, doctor, or trusted advisor holds: the context behind the decisions, the pattern across years, the judgment record that says not just what you chose but why, and what you have already refused.
That kind of knowing requires a knowledge graph, not a memory. Three components, each solving a structural failure LLM memory cannot fix at any scale.
Inputs. 360-degree ingestion from every source the person uses (email, messages, calendar, photos, AI conversations), plus write-back from AI sessions so the graph compounds with use. LLM memory accumulates references. The graph ingests from the source of truth and keeps it current. A million-token context window cannot ingest sources you have not pasted in, and cannot capture what your AI concluded last Tuesday unless there is a write-back path.
Intelligence. Statistical entity and relationship resolution across every source, plus calibrated inference with provenance and decay. LLM memory cannot resolve "Alex," "A. Rivera," and the person in six iMessage threads as the same individual without being told, and being told once does not propagate across every future session. When a name collides with a public figure, stateless retrieval returns the public record by default; it has no basis to prefer the person in your inbox. A persistent graph built from your corpus resolves identity from your sources first. The personal entity and the public figure are distinct nodes, each with its own provenance, relationship weight, and communication history. LLM memory also weights a single mention the same as a three-year thread. It is a derived projection: weights, not rows. It cannot be queried "give me all claims about this person with confidence below 0.7 and no cited evidence"; that query is the calibration loop, and without it you cannot measure your own false-positive rate. Mandaire's intelligence layer resolves entities cross-source, scores every inference by the evidence behind it, tracks when a belief was formed and when it went stale, and maintains a supersession chain so corrections propagate forward. That chain is a queryable record of every place the user overrode the AI (what Inference calls the GT-A signal: the only honest way to measure LLM error rate per classifier). The inference coverage baseline as of May 2026: 100% evidence coverage, 99.4% confidence coverage across the claim store. LLM memory cannot report a coverage number because there is no row to count. That number is the differentiator. The graph on the first user's data has 59,645 entities and 2,960,359 annotated relationship edges.
Correctness at that scale requires rules, not prompts: 78 recent merge candidates were rejected by a structural determinism guard that caught an integer-ID rotation error that a language model, shown the same candidates, would have approved as plausible name matches.
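The calibration query quoted above ("all claims about this person with confidence below 0.7 and no cited evidence") only works when claims are stored as rows. The following is a minimal sketch of that idea, assuming a hypothetical SQLite table named claims with illustrative columns (entity, statement, confidence, evidence_ref). None of these names come from Mandaire's actual schema, and the sample rows are invented.

```python
import sqlite3

# Hypothetical claim-store schema; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE claims (
    id INTEGER PRIMARY KEY,
    entity TEXT NOT NULL,
    statement TEXT NOT NULL,
    confidence REAL,      -- NULL means no calibrated score yet
    evidence_ref TEXT     -- NULL means no cited evidence
);
""")
conn.executemany(
    "INSERT INTO claims (entity, statement, confidence, evidence_ref) "
    "VALUES (?, ?, ?, ?)",
    [
        ("alex_rivera", "works at Acme", 0.92, "gmail:thread/1"),
        ("alex_rivera", "moved to Denver", 0.55, None),
        ("alex_rivera", "prefers morning calls", 0.64, "imessage:chat/7"),
        ("alex_rivera", "is vegetarian", None, None),
    ],
)


def weak_uncited_claims(conn, entity, threshold=0.7):
    """Claims about `entity` scored below `threshold` with no cited evidence."""
    rows = conn.execute(
        "SELECT statement FROM claims "
        "WHERE entity = ? AND confidence < ? AND evidence_ref IS NULL "
        "ORDER BY id",
        (entity, threshold),
    )
    return [row[0] for row in rows]


def coverage(conn):
    """Return (evidence coverage, confidence coverage) as fractions of all claims."""
    total, with_evidence, with_confidence = conn.execute(
        "SELECT COUNT(*), COUNT(evidence_ref), COUNT(confidence) FROM claims"
    ).fetchone()
    return with_evidence / total, with_confidence / total


print(weak_uncited_claims(conn, "alex_rivera"))
print(coverage(conn))
```

In this sketch an unscored claim (NULL confidence) is excluded from the low-confidence query and instead lowers the confidence-coverage figure, which is one plausible way to keep "unscored" distinct from "scored low".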
Outputs. Deterministic, rules-gated retrieval via a structured query interface. LLM memory cannot enforce "do not pass this topic to that recipient in this context": that requires a rules engine running upstream of any model. An LLM can be instructed not to share sensitive data with a given recipient, but it can be prompted around, it will drift between sessions, and it has no structural invariant that fails closed if violated. There is also a subtler problem: LLM memory responds differently to "what do you know about X's health?" depending on whether it holds anything. A sophisticated recipient can infer sensitive information from a hesitant or evasive non-answer (the Milgrom (1981) unraveling theorem applied to disclosure). Mandaire's disclosure engine compiles claims into a deterministic rule hierarchy and evaluates per-(recipient, topic, context, surface) at the corpus access boundary, before any LLM sees the query. Holds and genuine unknowns return byte-equal responses: identical content, identical length, identical latency floor, no discriminating audit fields. An LLM structurally cannot hold this invariant because its output is a function of what it knows. LLM memory answers what it knows. Judgment enforces what you have decided it may say. Every decision is logged. These are not improvements on LLM memory. They are a different architecture. The failure is structural, not dimensional.
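A small sketch can make the byte-equal invariant concrete. It assumes a hypothetical rule table keyed by (recipient, topic, context, surface) with "*" as a wildcard, first match wins, and anything unmatched held. The rule format, the corpus, and the response text are all invented for illustration, and the latency-floor part of the invariant is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    recipient: str
    topic: str
    context: str
    surface: str


# Hypothetical rule table, checked in order; "*" matches any value.
RULES = {
    ("sarah", "budget", "work", "*"): "hold",
    ("*", "health", "*", "*"): "hold",
    ("sarah", "schedule", "work", "*"): "allow",
    ("sarah", "travel", "work", "*"): "allow",
}

# Hypothetical corpus: there is a schedule fact but no travel fact.
CORPUS = {"schedule": "Free Thursday after 2pm."}

# One fixed byte string for both "held" and "nothing known".
NO_ANSWER = b"No information available."


def decide(req):
    """Return 'allow' or 'hold'; a request matching no rule is held (fail closed)."""
    fields = (req.recipient, req.topic, req.context, req.surface)
    for key, verdict in RULES.items():
        if all(k == "*" or k == f for k, f in zip(key, fields)):
            return verdict
    return "hold"


def respond(req):
    """Answer only on an explicit allow; holds and unknowns are indistinguishable."""
    if decide(req) != "allow":
        return NO_ANSWER
    fact = CORPUS.get(req.topic)
    return fact.encode() if fact is not None else NO_ANSWER


held = respond(Request("sarah", "budget", "work", "mcp"))
unknown = respond(Request("sarah", "travel", "work", "mcp"))
print(held == unknown)
```

Because the decision runs before any model is involved, the caller cannot distinguish "there is something here you may not see" from "there is nothing here" by comparing response bytes.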
That substrate requires all three components to compound: every AI write-back adds to what the graph knows, every correction propagates as a future rule, every resolved entity makes the next synthesis more accurate. Building it requires trust (not trust as a soft feeling, but trust as a structural property): the user must be able to give the system everything it needs to know without filtering for what feels safe to hand to a system with an opaque business model. And trust, in this sense, is only possible if privacy is solved first, at the architecture level, not the policy level.
The only order that works:
In December 2025, Meta acquired Limitless, one of the more trusted personal AI products on the market. Within weeks: HIPAA-relevant use cases dropped. The minimum age fell from 18 to 13. EU and UK users lost access entirely. Not because Meta is malicious. Because Meta's business model, its regulatory posture, and its user demographic are different from Limitless's, and an acquisition cannot override that.
The Limitless users who had trusted that product with the sensitive context a personal AI requires had no architectural recourse. The trust they placed was real. The architecture did not preserve it. Track record cannot survive an acquisition. Architecture can make trust durable, if it is built that way before the acquisition happens.
This is the cost of building the sequence backwards: trust without privacy at the foundation is trust that cannot survive a business-model change, an acquisition, a regulatory shift, or a quiet policy update. The architecture that makes trust durable is the one that makes it structurally impossible to violate, not the one that promises not to.
At Google I/O on May 19, 2026, Google launched Gemini Spark: a 24/7 personal AI agent with access to Gmail, Calendar, Docs, Drive, Photos, WhatsApp, Spotify, GitHub, and Tasks. A genuine attempt at what the industry has been promising. The integration is real. The limitation is also real: every source in that list is a Google-ecosystem or Google-partnership source. The architecture is provider-locked by design. Gemini Spark is the strongest version of the top-down argument, and it still does not include iMessage, Apple Notes, Anthropic conversations, or twenty years of correspondence that predates Google's retention window.
WWDC 2026 confirmed the thesis Apple did not intend to prove. Apple shipped an Extensions Framework that makes the AI brain itself swappable: Gemini as default, Claude and ChatGPT as user choices, one active at a time. Apple spent a billion dollars a year licensing a rival's model because building intelligence is not the durable problem. The brain is now a commodity input.
What Apple did not build, and structurally is not building, is a memory layer, an entity graph, or a disclosure engine. Siri's brain can be Gemini or Claude. Neither brain knows your 20-year correspondence history, your resolved identity graph across every messaging app, or what the right version of last Tuesday's meeting summary is for your manager versus your spouse. The brain is swappable. Your context is not.
Siri can act on your phone; it cannot know your life. The corpus does not fit, the sources do not cooperate, and the joining never happens on a device. Mandaire is where the joining happens, and the disclosure engine decides what any assistant, Siri included, is allowed to see.
Mandaire is the substrate any brain calls. The entity graph that spans Apple, Google, Anthropic, and Microsoft. The disclosure engine that runs before any query, regardless of which brain is asking, to enforce per-recipient rules. Apple made the brain a choice. Mandaire is what any brain you choose needs to be useful.
No platform owner can occupy this position. Apple cannot ingest Gmail, Google Workspace, or Meta's social graph: access agreements, platform policy, and DMA enforcement in Europe prevent it. Google cannot reach iMessage or WhatsApp: Apple and Meta's platform controls and competitive incentives block it. Meta cannot touch either rival. Every platform owner's incentive is to deepen their own silo, not to bridge a competitor's. The cross-vendor fragmentation is not a bug in the landscape; it is a structural property that antitrust dynamics actively maintain. Only a user-authorized neutral third party can span all silos simultaneously. The harder the platforms compete, the more the joining problem belongs to a neutral layer.
The regulatory direction is pointing the same way. In May 2026, the FTC reached settlements with multiple marketing firms for deploying AI that claimed to listen to users' private conversations and target advertisements accordingly. It was the first concrete enforcement action establishing legal consequences for the surveillance-first approach. Colorado and European regulators have both passed AI accountability frameworks taking effect in 2026, requiring organizations to demonstrate meaningful transparency and user control over AI systems that process personal data. For organizations navigating those requirements, Mandaire's architecture is compliant by design: disclosure rules are deterministic and auditable, enforcement happens at the corpus boundary before any AI sees the query, and every decision is logged. The architecture that makes trust genuine is also the architecture that passes the compliance test being written around it.
Sam Altman has said the age of AI agents working autonomously is already here. Dario Amodei has described AI systems that will have the judgment of a senior employee within the decade. Tobi Lutke has asked what it means to run a company where every employee has a brilliant friend who happens to know everything about your business. Pieter Levels runs a company of one and attributes much of it to AI. Midjourney was built by a team of eleven.
Each of these is a version of the same argument: the binding constraint on what one person can accomplish is shifting, and AI is the reason. But none of those accounts names the precondition: the AI that gives you judgment on the things that matter has to know the things that matter. The things that matter are the things that require trust. Trust requires privacy at the foundation. You cannot have the Altman/Amodei/Lutke result without the sequence. That is the product argument in one sentence.
The deeper implication is one the industry circles without stating directly: once the substrate is real, the case for adding headcount collapses for most of what knowledge workers currently delegate. Not because tasks vanish. Because a fleet of agents grounded in your actual history, your actual judgment record, and your actual preferences handles what previously required other people to hold context on your behalf. The principal still decides. The principal acts in the legal-accountability moments where a signature is required. The operational infrastructure around those decisions runs on substrate instead of staff.
Mandaire is the hippocampus. The AI is the cortex. Every session, the cortex reads from the hippocampus and writes back what it learned. The cortex is swappable: one cloud model today, another tomorrow, a local model after that. The hippocampus persists across all of them. Context that lives inside a single model disappears when you switch. Context held in a substrate every model reads from is yours permanently, and it compounds.
The reason this is not how most people work today is not lack of ambition. The AI in their hands does not actually know them. It does not remember across sessions. Its judgment is generic, not calibrated to their specific decisions and prior refusals. It makes things up. The binding constraint is not the AI's capability in the abstract. It is the absence of substrate. We are building the substrate. Once it is real, the Altman/Amodei/Lutke/Levels argument applies with its full force.
The frame
The cost of AI execution is falling fast. Drafting, summarizing, classifying, researching, generating, translating: most of what used to require human hours now requires seconds and pennies. Execution quality, agentic recovery, and codebase reasoning are not yet commoditized, but the direction is clear, and the gap between the best provider and the cheapest narrows every quarter.
As execution gets cheaper, the binding constraint shifts. The useful AI is the one with your full context. Your full context is what you cannot, and should not, hand to a system you don't trust. So the binding constraint is trust: the architecture that lets you safely give an AI the context it needs to be genuinely useful, with disclosure controlled by deterministic code that no business model can quietly compromise. Without that trust layer, you self-censor. With self-censorship, the AI has incomplete inputs. With incomplete inputs, the outputs are worse than they could be, exactly on the topics that matter most.
Mandaire is the trust layer that makes the context available and the judgment possible. Trust is upstream of context, which is upstream of judgment. The architecture below documents how trust is delivered as code, not policy: persistent encrypted store, deterministic disclosure engine, central reasoner (in development), BYO renderer, open-protocol exit ramp. The work the layer does for your life is mandaire.app. The work the layer does for your product is mandaire.dev. Same architectural primitives. Different binding constraint per surface (named in §The two surfaces below).
Nine principles
These are the nine principles every architectural decision in Mandaire rests on. They are stated as commitments, not aspirations. Each one is intended to be observable in the running system. If a principle is asserted here and not visible in the architecture, the architecture is wrong and the principle stays.
The mediation mechanism
The claim "we mediate disclosure" or "we refuse the wrong action" is rhetoric without a mechanism. Below is the mechanism, split honestly into the parts that are structural (code-level guarantees, replicable only by rebuilding the architecture) and the parts that are operational discipline (intent-brief practice, replicable by any team with a careful system prompt). The structural parts are the moat. The operational parts are the practice. We refuse to blend them under one heading because that would be evasion.
These three are the moat. Each is enforced by deterministic code. None is replicable by adding a careful prompt to an existing AI builder; replication requires rebuilding the architecture upstream of the LLM.
A deterministic disclosure gate runs upstream of every external query, at the corpus access boundary. The gate is fail-closed: other users and external AI callers receive a pre-disclosure-filtered envelope and never see context they are not authorized to see. Blocked categories are blocked today. Per-audience disclosure intelligence (what to say to whom, by relationship and context) is live; the adaptive inference rules that calibrate per-topic policy are in shadow evaluation, generating decisions that are logged and reviewed before enforcement promotes them. Your own renderer (the AI you connect in operator mode) sees your own synthesized context, which means your chosen LLM provider does too; field-level mediation of even your own renderer's view is part of the v0.1 policy graph. The structural primitive a competitor cannot replicate by adding a prompt is this position upstream of inference; the policy depth is what that position makes possible.
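The split between enforced rules and rules in shadow evaluation can be sketched as follows. This is an illustrative model only: the Gate class, its fields, and the verdict strings are assumptions, not Mandaire's implementation. The point it shows is that a shadow rule's verdict is logged for review but never changes what the caller receives.

```python
from dataclasses import dataclass, field


@dataclass
class Gate:
    # Enforced today: categories that are always blocked.
    blocked_categories: set
    # Candidate per-topic rules under shadow evaluation: topic -> proposed verdict.
    shadow_rules: dict = field(default_factory=dict)
    # Disagreements between enforced and proposed verdicts, kept for review.
    shadow_log: list = field(default_factory=list)

    def evaluate(self, caller_authorized, category, topic):
        # Enforced path: fail closed on an unauthorized caller or a blocked category.
        if caller_authorized and category not in self.blocked_categories:
            enforced = "allow"
        else:
            enforced = "hold"
        # Shadow path: record what the candidate rule would have decided.
        proposed = self.shadow_rules.get(topic)
        if proposed is not None and proposed != enforced:
            self.shadow_log.append((topic, enforced, proposed))
        # Only the enforced verdict is ever returned.
        return enforced


gate = Gate(blocked_categories={"health"}, shadow_rules={"budget": "hold"})
print(gate.evaluate(True, "finance", "budget"))
print(gate.shadow_log)
```

Promoting a shadow rule to enforcement would then be a reviewed change to the enforced rule set, informed by the logged disagreements.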
Sends, calendar events with external attendees, schema changes that constrain future reporting, pricing-page wording, auth shortcuts, normalization rules that silently change a financial series, money-moving actions, production deploys, database migrations, public posts: all are typed actions in the system's action taxonomy. When the calling AI (your Claude, ChatGPT, or operator model) proposes executing one of these through an action connector, it requests an approval token from Mandaire first. Mandaire issues the token or returns a hold. The LLM proposes. Deterministic code decides whether the proposal clears. Approval is one word; the work waits politely; it never bypasses.
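The propose-then-approve flow can be illustrated with a short sketch. Everything here is hypothetical: the ActionType members are a small subset of the taxonomy named above, and binding the token to the exact action and payload with an HMAC is one possible design, assumed for illustration rather than taken from Mandaire.

```python
import hashlib
import hmac
import secrets
from enum import Enum


class ActionType(Enum):
    # A few of the typed actions listed above; names are illustrative.
    SEND = "send"
    PRODUCTION_DEPLOY = "production_deploy"
    DATABASE_MIGRATION = "database_migration"
    MONEY_MOVE = "money_move"


_SECRET = secrets.token_bytes(32)  # per-process signing key for this sketch


def _sign(action, payload_digest):
    message = f"{action.value}:{payload_digest}".encode()
    return hmac.new(_SECRET, message, hashlib.sha256).hexdigest()


def request_approval(action, payload, user_approved):
    """Return ('token', value) or ('hold', reason). Without a token nothing runs."""
    if not isinstance(action, ActionType):
        return ("hold", "unknown action type")
    if not user_approved:
        return ("hold", "awaiting user approval")
    digest = hashlib.sha256(payload.encode()).hexdigest()
    return ("token", _sign(action, digest))


def verify_token(action, payload, token):
    """Connector-side check: the token must match this exact action and payload."""
    digest = hashlib.sha256(payload.encode()).hexdigest()
    return hmac.compare_digest(_sign(action, digest), token)


kind, token = request_approval(ActionType.SEND, "Reply to Sarah", user_approved=True)
print(kind, verify_token(ActionType.SEND, "Reply to Sarah", token))
```

Tying the token to a digest of the payload means an approval for one draft cannot be reused to send a different one, which is the property the "LLM proposes, deterministic code decides" split depends on in this sketch.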
When Mandaire holds a proposed build, send, or commitment, that hold goes into a persistent append-only ledger alongside the things that were cleared. Future requests of similar shape are evaluated against this state. A user who keeps asking for the same held-on-good-grounds action will see the system reference the prior hold, not silently re-evaluate. This is state, not prompting; the persistence is what makes the refusal compound rather than reset on every conversation.
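As a rough sketch of what "state, not prompting" could look like, the ledger below only ever appends, and a repeated request is checked against the most recent entry of the same shape. The class and field names are hypothetical, and matching "similar shape" by exact string is a simplification chosen to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LedgerEntry:
    seq: int
    shape: str    # normalized description of the request
    verdict: str  # "held" or "cleared"
    reason: str


class Ledger:
    """Append-only record of held and cleared requests (no update or delete)."""

    def __init__(self):
        self._entries = []

    def append(self, shape, verdict, reason):
        entry = LedgerEntry(len(self._entries), shape, verdict, reason)
        self._entries.append(entry)
        return entry

    def prior_hold(self, shape) -> Optional[LedgerEntry]:
        """Most recent entry for this shape if it was a hold, else None."""
        for entry in reversed(self._entries):
            if entry.shape == shape:
                return entry if entry.verdict == "held" else None
        return None

    def entries(self):
        return tuple(self._entries)


ledger = Ledger()
ledger.append("admin analytics dashboard", "held", "user/feature mismatch")
prior = ledger.prior_hold("admin analytics dashboard")
print(prior.reason if prior else "no prior hold")
```

A later "cleared" entry for the same shape supersedes the hold without erasing it, so the history of both decisions stays readable.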
These four are the practice, the intent-brief shape Mandaire enforces by requiring it in every project's startup. They are valuable. They are not the moat. A competent competitor can replicate them in a week with a careful system prompt. We list them here because you will see them in every Mandaire interaction; we do not claim them as architecture.
Every project, conversation, draft, or build begins with what the system is doing and what it is deliberately NOT doing. The intent brief writes down both. The user reads it before any work is committed. If the non-goals are wrong, the conversation stops there. The cheapest correction is the one made before the first line of code, the first sent message, the first commitment to a counterparty.
What must be true for this work to be worth doing? What would falsify it? The intent brief names both. If a named assumption is wrong, the operational practice surfaces the conflict in the next intent brief; the structural enforcement is at S2 (the irreversible-action gate) when the assumption-conflict touches a gated action.
Before building the full thing, what is the smallest slice that tells you whether the direction is right? Mandaire proposes that slice and prefers to ship it first. The user confronts the question "would I actually find this useful" before the system produces the rest.
A question that begins "will users care about this?" cannot be answered with code alone. A question that begins "will this scale to a million rows?" can be. Mandaire separates them explicitly. Technical uncertainty: build a test. Market uncertainty: surface the question, do not build past it.
Below is the shape of a Mandaire refusal note. The same shape across both surfaces. Domains differ; the mechanism is identical.
[mandaire · Refusal note · Example, AI-CTO domain]
You asked for: full admin analytics dashboard.
I am not building that yet.
Reason:
Your stated user is the store manager during prep,
not the owner doing analysis. A dashboard optimizes
for explanation; the workflow needs a default
recommendation.
Cheapest test:
One recommendation screen for tomorrow's prep
quantities, with an override-reason field. Ship
that. If managers override more than 30% of
suggestions in week one, we revisit the model
before building any analytics surface.
Decision held for you:
Do you want the manager to see confidence labels
on the recommendation, or only the recommendation
itself?
Refused-path entry:
"admin analytics dashboard, declined Day 12,
reason: user/feature mismatch. Reconsider
after recommendation surface has shipped and
manager-override data is in."
[your AI · Mandaire hold applied · Example, Chief-of-Staff domain]
You asked me to: send the response draft to Sarah now.
Mandaire's gate flagged a disclosure conflict. I am holding the send.
Reason:
The draft mentions the Q4 budget freeze, which
you discussed last week as confidential until
the all-hands. Sarah is not in the disclosure
set for that topic in this context. The draft
would leak the freeze through implication.
Cheapest revision:
Reply with the relationship question only. The
budget context is not needed to answer her
question and risks the leak. I will draft the
minimal-disclosure version for you to approve.
Decision held for you:
Confirm the freeze is still pre-announcement, or
tell me the disclosure set has expanded. If the
all-hands has shipped, I will revise the
disclosure policy for this topic.
Refused-path entry:
"send draft mentioning Q4 freeze to Sarah,
declined, reason: disclosure-policy conflict
on topic 'budget' for recipient Sarah in
context 'work professional, pre-announcement.'"
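Both notes above share the same five parts, which suggests a simple record type. The sketch below is an assumption about how such a note could be represented; the RefusalNote class and its field names are hypothetical and simply mirror the section labels in the first example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefusalNote:
    requested: str
    reason: str
    cheapest_next_step: str
    decision_held: str
    refused_path_entry: str

    def render(self):
        """Render the note as labeled sections, one indented body per label."""
        sections = [
            ("You asked for", self.requested),
            ("Reason", self.reason),
            ("Cheapest test", self.cheapest_next_step),
            ("Decision held for you", self.decision_held),
            ("Refused-path entry", self.refused_path_entry),
        ]
        return "\n".join(f"{label}:\n  {body}" for label, body in sections)


note = RefusalNote(
    requested="full admin analytics dashboard",
    reason="The stated user is the store manager during prep, not the owner.",
    cheapest_next_step="One recommendation screen with an override-reason field.",
    decision_held="Show confidence labels on the recommendation, or not?",
    refused_path_entry="admin analytics dashboard, declined Day 12",
)
print(note.render())
```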
The architecture
The service architecture maps directly to the knowledge graph's three components. Layer 1 (Tenant VM) is where inputs land: source ingestors, entity resolution, the corpus. Layer 2 (Central Reasoning) is where intelligence runs: calibrated inference compilation, relationship scoring, and the disclosure engine that gates every output. Layer 3 (Renderer) is where outputs surface: your existing AI queries the MCP endpoint and receives a structured, pre-disclosure-filtered answer.
Mandaire is the hippocampus. The AI is the cortex. Every session, the cortex reads from the hippocampus and writes back what it learned. The cortex is swappable: one cloud model today, another tomorrow, a local model after that. The hippocampus persists across all of them. That is the whole platform-independence thesis in one image: context that lives inside a single model is a feature, and context held in a substrate every model reads from and writes to is infrastructure.
Open source. Apache / AGPL. Runs in your tenant VM (Mandaire-hosted by default, self-hostable always). Includes: encryption module (AGPL, reproducible builds, third-party audit), the MCP server scaffolding, the connectors and ingestor framework, the agent harness, the output filter, the proactive-update taxonomy, the watcher implementations, the operating-principles documentation. A sophisticated user can read everything that runs in their tenant VM. Hiding any of this in compiled binaries would be security-through-obscurity that a determined adversary defeats in days. We publish it instead.
Mandaire builds a relationship graph from your actual private data: email, iMessage, WhatsApp, calendar, photos, and professional connections. The current graph holds over 180,000 entity-resolved directional edges across those sources (roughly 62% from Gmail co-occurrence, 16% from Calendar, 7% from WhatsApp, the rest from iMessage, LinkedIn connections, photos, and contacts). For any interactive MCP query, the traversal engine navigates up to five degrees of separation in under one millisecond, confirmed by covering-index analysis showing no full-table scans at any depth.
The underlying engine is indexed SQLite. The speed comes from the index design and the fact that interactive queries use LIMIT-bound patterns, not full-graph analytics. Full-graph analytics at 2+ hops (for batch reporting or relationship mapping) run in the 50ms–2 second range depending on graph density. That is the honest picture.
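The interactive pattern described above (bounded depth, LIMIT-bound, served from an index) can be sketched with Python's built-in sqlite3 module. The edges table, the index name, and the toy data are assumptions for illustration; the real schema is not shown in this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical edge table; the real schema is not published here.
CREATE TABLE edges (src INTEGER NOT NULL, dst INTEGER NOT NULL, weight REAL NOT NULL);
-- Index containing every column the traversal reads, so no table lookup is needed.
CREATE INDEX edges_src_cover ON edges (src, dst, weight);
""")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [(1, 2, 0.9), (2, 3, 0.8), (3, 4, 0.7), (4, 5, 0.6), (5, 6, 0.5), (6, 7, 0.4)],
)

# Depth-bounded traversal: UNION drops repeated (node, depth) pairs and the
# depth bound stops expansion, so cycles cannot loop forever.
TRAVERSAL = """
WITH RECURSIVE reach(node, depth) AS (
    SELECT ?, 0
    UNION
    SELECT e.dst, r.depth + 1
    FROM reach AS r JOIN edges AS e ON e.src = r.node
    WHERE r.depth < ?
)
SELECT node, MIN(depth) AS hops FROM reach
WHERE depth > 0
GROUP BY node ORDER BY hops, node
LIMIT ?
"""


def neighbors(conn, start, max_hops=5, limit=50):
    """Nodes reachable from `start` within `max_hops`, nearest first."""
    return conn.execute(TRAVERSAL, (start, max_hops, limit)).fetchall()


def query_plan(conn):
    """Planner output for the traversal, for checking how the index is used."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + TRAVERSAL, (1, 5, 50)).fetchall()
    return [row[3] for row in rows]


print(neighbors(conn, 1))
for step in query_plan(conn):
    print(step)
```

Reading the EXPLAIN QUERY PLAN output is the kind of check the covering-index claim refers to: a step that searches edges through the index rather than scanning the whole table. The exact plan text depends on the SQLite version, so the sketch prints it instead of asserting on it.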
The differentiator is not the storage engine. It is the data. No cloud provider can build a relationship graph from iMessage, WhatsApp, ChatGPT conversation history, and Gmail simultaneously, because they cannot reach across those walled gardens. Mandaire can, because it runs on your hardware under your encryption key.
Proprietary. The actual moat. Two sub-layers:
The proprietary layer is structurally outside any tenant VM. Tenant root, tenant SSH, tenant LLM access reveal nothing about Layer 2B because Layer 2B is not on the tenant's machine. The IP protection is location, not obfuscation.
Bring your own. Claude Desktop, ChatGPT, Gemini, Cursor, Claude Code, or whatever else you already pay for. Mandaire exposes a single MCP tool. The renderer calls it, gets a pre-disclosure-filtered envelope, presents it in natural language. The renderer never sees raw data. The renderer is the voice; Mandaire is the substrate.
Why this works architecturally: the renderer doesn't reason over raw data, so renderer-side sycophancy or model differences produce different tone, not different disclosure. The deterministic disclosure layer runs upstream of any LLM, in Layer 2B. This is what makes "use whatever AI you already pay for" structurally honest rather than marketing.
Mandaire is read-only with respect to its sources. It ingests from Gmail, iMessage, Calendar, and WhatsApp, but never writes to them. Your underlying accounts are not touched.
The calling AI (Layer 3) is read-write with respect to Mandaire. It reads context via MCP and writes state back: commitments made, decisions taken, context corrections. That write-back is what makes the system compound: every session adds to what Mandaire knows, without expanding what Mandaire can access.
Actions (sending a reply, creating a calendar event, updating a record) go through action connectors the calling AI chooses (a Gmail MCP, a Calendar MCP, whatever the user has configured). Mandaire's S2 gate evaluates the proposed action before execution and issues or holds the approval token. Mandaire does not execute. Your AI does, after the gate clears it.
Honest about who runs what:
User-configurable, per-deployment, per-tenant:
Trust commitments
Each commitment below is stated so it could be falsified. If we fail one, the failure is observable from your seat. If we ever quietly drop one, it shows up as a change in this document with a date.
The six bets
We are not telling you we have a structural moat the competition cannot reach. The labs ship cross-provider memory. Cursor ships agentic coding with team context and skills. Agent-memory frameworks are open-source. Anyone claiming a "structural moat" in this space in 2026 is either kidding themselves or kidding you. We are not doing either.
What we are doing is making six specific bets. None is uncopyable. All six together are an integrated position. The fifth and sixth bets used to be implicit; we named them after the v9 thesis evaluation flagged that the most consequential bets were the ones we hadn't put in writing.
The bets are not independent. They are sequenced. Bet 5 (governed delegation worth the burden) gates Bet 3 (non-engineer segment) gates Bet 2 (judgment-not-facts timing window) gates Bet 1 (trust-and-openness compounds). Bet 4 (escalation discipline) is the operational practice underneath all of them; Bet 6 (protocol over platform) is the strategic frame that makes the structure of Bets 1-5 cohere rather than contradict. Treating these as a portfolio of independent probabilities understates the chained risk: if Bet 5 fails (users won't tolerate the governance burden), Bets 3 and 2 don't get tested. The fundability question is not whether each bet is plausible in isolation; it is whether the chain holds.
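The chained-risk point is simple arithmetic, and a few lines make it concrete. The probabilities below are invented purely to illustrate the calculation; each one stands for the chance that a link holds given that everything upstream of it held. They are not estimates of the actual bets.

```python
# Hypothetical conditional probabilities, in gating order (Bet 5 gates 3 gates 2 gates 1).
# The 0.7 values are made up to show the arithmetic, not to rate the bets.
chain = {"bet5": 0.7, "bet3": 0.7, "bet2": 0.7, "bet1": 0.7}


def p_reaching(chain, target):
    """Probability that every gate up to and including `target` holds."""
    p = 1.0
    for name, prob in chain.items():
        p *= prob
        if name == target:
            return p
    raise KeyError(target)


for name in chain:
    print(name, round(p_reaching(chain, name), 4))
```

With these made-up inputs, each link looks like a comfortable 70% in isolation, but the chance that the chain holds all the way through Bet 1 is 0.7 to the fourth power, about 24%.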
Bet 1: trust-and-openness over lock-in. Leaving Mandaire is structurally easy. Your code, your decision history, your taste memory, your audit log are all yours, exportable, hostable in-house any time. We bet the market pays a premium for being trusted with the build work because you can verify, audit, and walk away. Honest tension with Bet 3: non-engineer operators do not evaluate AGPL licenses. So Bet 1 pays back through two channels. The first is the technical advisors of non-technical buyers (the brother-in-law engineer, the friend at the security-conscious company, the third-party auditor), who do evaluate AGPL and reduce buyer risk. The second is the long-run trust signal that the architecture has nothing to hide. We acknowledge this is a slower payback than direct buyer demand. Cost of this bet: we earn renewal every month rather than coast on switching costs.
Bet 2: judgment, not facts; and a 24-month timing window. Memory features ship from every provider now. The shipping versions today mostly capture technical facts ("you prefer React"; "you chose JWT over sessions") rather than judgment ("you rejected clever dashboards twice because your staff need defaults; default to conservative UI and escalate the policy decision"). We bet (a) the judgment record is more valuable to operators than the facts record, and (b) the providers will not prioritize the judgment object in the next 24 months because facts are cheaper to ship and demo. We compound through that window or the bet does not pay off.
What would falsify the bet, observably: if any major provider (Anthropic, OpenAI, Cursor, Replit, GitHub Copilot, Devin, or another agentic builder) ships a memory product with both project-specific judgment capture AND user-editable taste memory exposed to non-engineer operators, AND Mandaire's renewal rate drops by 30% or more in any rolling 90-day window during the 24-month bet, we have lost the timing-window bet. The renewal-rate metric is the business-level falsifier; the feature-shipping check is the leading indicator. We tie the bet to observed buyer behavior, not to a competitor-feature checklist, because a competitor could ship four-of-six features and still win commercially. We bet on commercial outcome.
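The falsifier above combines a leading indicator (a competitor ships the feature set) with a business metric (renewal rate). One way to check the metric half is sketched below. It assumes "drops by 30% or more" means a relative drop between two observations at most 90 days apart; the text does not pin down that reading, and the function names and sample data are hypothetical.

```python
from datetime import date, timedelta


def renewal_drop_triggered(series, window_days=90, drop=0.30):
    """True if the rate falls by `drop` (relative) or more between any two
    observations at most `window_days` apart. `series` is [(date, rate), ...]."""
    points = sorted(series)
    for i, (d0, r0) in enumerate(points):
        for d1, r1 in points[i + 1:]:
            if d1 - d0 > timedelta(days=window_days):
                break  # later points are even further away
            if r0 > 0 and (r0 - r1) / r0 >= drop:
                return True
    return False


def timing_bet_lost(competitor_shipped, series):
    """Both conditions must hold: the feature shipped AND renewals dropped."""
    return competitor_shipped and renewal_drop_triggered(series)


# Invented sample data: a drop from 0.90 to 0.60 within roughly ten weeks.
observed = [
    (date(2026, 1, 1), 0.90),
    (date(2026, 2, 1), 0.85),
    (date(2026, 3, 15), 0.60),
]
print(timing_bet_lost(True, observed))
```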
Bet 3: the non-engineer segment. Cursor serves engineers in an IDE. Claude Code, Codex, and the agentic-builder products are aimed at people who can read the code. Our audience is operators, founders, and product people who have a vision but cannot or should not be the one writing the code. The bet, sharper than "non-engineers want this": non-engineers can verify output quality well enough to safely operate the governed loop without a technical co-founder in every step. That is the load-bearing claim. The cost of this bet: we lose the engineer market entirely. What would falsify it: if the segment turns out to need a technical co-founder in the loop anyway, our value collapses to that of a junior co-founder's tool rather than the architectural function we claim.
Bet 4: enforced escalation discipline. Two pieces. The easy half: a code gate that blocks risky actions until you approve (a structural mechanism, see §The mediation mechanism → S2). The hard half is taste: knowing WHEN to escalate. The rule: surface only when both the AI loop has hit diminishing returns AND a real decision needs fresh human judgment. The honest framing: the structural half is binary and verifiable; the taste half is a credibility bet on Mandaire's ability to build escalation taste over time and have users renew because of it. The cost of this bet: every escalation is a tax on your time; bad taste makes the tax annoying. What would falsify it: your inbox fills with escalations you didn't need, OR real judgment calls slip through to autonomous execution, OR we ask before the AI loop has converged. You can monitor all three.
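The two halves of this bet can be sketched separately. Below, the structural half is a gate that refuses to run an irreversible action without a one-time approval token, and the taste half is reduced to the stated rule (escalate only when the loop has converged AND a human decision is needed). The class names, the two-value taxonomy, and the token scheme are hypothetical; the verification matrix notes that the real action taxonomy is still incomplete.

```python
import secrets
from enum import Enum


class ActionClass(Enum):
    REVERSIBLE = "reversible"
    IRREVERSIBLE = "irreversible"


class ApprovalGate:
    """Blocks irreversible actions until a human has approved that exact action."""

    def __init__(self):
        self._approved = {}  # one-time token -> action id it was issued for

    def approve(self, action_id):
        """Called from the human-facing approval step; returns a one-time token."""
        token = secrets.token_hex(16)
        self._approved[token] = action_id
        return token

    def execute(self, action_id, action_class, run, token=None):
        if action_class is ActionClass.IRREVERSIBLE:
            # pop() consumes the token, so it cannot be replayed for a second action.
            if self._approved.pop(token, None) != action_id:
                raise PermissionError(f"{action_id}: approval required")
        return run()


def should_escalate(loop_converged, needs_human_judgment):
    """The stated rule: surface only when BOTH conditions hold."""
    return loop_converged and needs_human_judgment
```

The gate is binary and testable in isolation. Deciding the two booleans passed to `should_escalate` is the part the text calls taste, and no code here attempts it.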
Bet 5: governed delegation is worth the setup and audit burden. Mandaire asks users to do something most AI products avoid asking: define disclosure boundaries, correct judgment, inspect artifacts, approve irreversible actions, occasionally read audit trails. We bet the value of governed delegation exceeds the cost of teaching and supervising the system. This bet sits underneath the other five: if users don't want the governed loop in the first place, provider competition (Bet 2) doesn't matter, segment fit (Bet 3) doesn't matter, escalation taste (Bet 4) doesn't matter. Cost: onboarding is heavier than a chatbot; the first 30 days may feel slower than "just ask Claude"; some users will bounce before the compounding becomes visible. We lose buyers who want magic with no ceremony. What would falsify it: qualified users admire the architecture but stop using it because the supervision burden feels like managing another employee; OR users route around the system for urgent work because the governance layer feels too heavy.
Bet 6: protocol distribution over product distribution. We bet that publishing the spec (the MCP tool surface, the entity-graph schema, the disclosure-policy semantics, the audit-log format) and inviting non-Mandaire implementations creates a larger ecosystem to participate in rather than a smaller one to own. The Mandaire-operated tier becomes one implementation; the protocol is the standard. The competitive question shifts from "is Mandaire bigger than Cursor?" to "did Mandaire shape the standard the labs eventually adopt?" Cost: we forfeit the path where Mandaire is the dominant single platform. Other implementations can adopt the spec without coordinating with us. What would falsify it: 24 months after spec v0.1 publication, fewer than three non-Mandaire implementations exist with non-trivial users, OR implementations exist but do not federate. In that case, the "protocol" was self-flattering documentation and the work should have gone to product features.
Stated as bets, not as moats, this is what we ask you to evaluate us on. You will see the answers play out in our renewal rate, our audit logs, our willingness to tell you to switch providers when switching is in your interest, and the number of non-Mandaire implementations of the protocol two years from spec v0.1. We will earn it or we will not.
The bounds
The strongest proof we have today is that mandaire.app and mandaire.dev were built using this exact system, in months, directing the build in plain English. The artifacts on the .dev page are real, redacted excerpts from the actual decision ledger and taste memory. That case is bounded in four ways, all named here:
The two surfaces
Both surfaces use the same architectural primitives (persistent encrypted store, deterministic disclosure engine, central reasoner [in development], BYO renderer, refused-paths ledger, open-protocol exit ramp). Both produce the same artifact shapes (intent brief, end-of-build, decision ledger, taste memory, failure admission, refusal note). The honest difference is which primitive carries the most weight on each surface, the binding constraint per domain.
mandaire.app, personal knowledge graph. The work product is your life. Family, career, relationships, health, finance, time. Binding constraint: relationship-graph integrity. The disclosure engine is the load-bearing primitive on this surface. If Mandaire leaks the wrong thing to the wrong person at the wrong time, the product fails, irrespective of how good its judgment record is. The judgment artifact is your personal taste memory + relationship model + commitment ledger; the disclosure surface is real people in real time (family, friends, colleagues, advisors, doctors, lawyers, network). What it refuses: the wrong send, the wrong commitment, the wrong allocation of attention.
mandaire.dev, context layer for builders. The work product is your product. Scope, architecture, builds, ships, refactors. Binding constraint: causal-model integrity. The decision ledger + refusal log + irreversible-action gate are the load-bearing primitives on this surface. Disclosure exists (PII in export fields, credentials, customer data, future-investor-facing material) but it is secondary; the surface is dominated by deferred-time judgment about future engineers, future users, future investors, future co-founders. If Mandaire builds the wrong thing because it failed to refuse a bad direction, the product fails, irrespective of how clean its disclosure record is. What it refuses: the wrong build, the wrong feature, the wrong technical bet.
What this means for the architecture in practice. The disclosure engine ships at full per-(person, topic, context, surface) granularity on .app because that is the surface's binding constraint. On .dev, the disclosure engine ships as a more constrained redaction-tool for decision-ledger excerpts when sharing with future investors or engineers; the per-person rules language exists but is not the value surface. Same primitives. Different intensity per surface. We tell you so explicitly because rhetoric ("same architecture, different work") that doesn't admit the asymmetry overstates how clean the cross-surface mapping is.
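To make the per-(person, topic, context, surface) granularity concrete, here is a minimal sketch of a deterministic rule lookup. The wildcard syntax, the "most specific rule wins, ties deny, no match denies" semantics, and every name in it are assumptions for illustration; the actual rules language is to be published with spec v0.1.

```python
from typing import NamedTuple

WILDCARD = "*"


class RuleKey(NamedTuple):
    person: str
    topic: str
    context: str
    surface: str


def may_disclose(rules, person, topic, context, surface):
    """Deterministic decision: the most specific matching rule wins,
    a tie at that specificity denies, and no match at all denies."""
    query = (person, topic, context, surface)
    matches = [
        (sum(field != WILDCARD for field in key), allow)
        for key, allow in rules.items()
        if all(field in (WILDCARD, value) for field, value in zip(key, query))
    ]
    if not matches:
        return False  # default deny
    top = max(specificity for specificity, _ in matches)
    return all(allow for specificity, allow in matches if specificity == top)


rules = {
    RuleKey("*", "health", "*", "*"): False,                      # nobody, by default
    RuleKey("dr_lee", "health", "*", "app"): True,                # except the doctor
    RuleKey("dr_lee", "health", "group_chat", "app"): False,      # but never in a group chat
}
```

Because the result depends only on the rule set and the four-part query, the same inputs always produce the same decision, which is the property that lets the check run upstream of any model call.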
The relationship (the judgment record, the taste memory, the audit log) compounds across surfaces. A .dev customer who later wants .app can keep the substrate; the work product changes, the primitives do not. The cross-surface continuity is real even though the binding constraint per surface differs.
Identity model: account first, VM optional. An invited viewer needs only a login, not a VM. When a .app user sends an invite link, the viewer creates a Mandaire login (email or OAuth) and receives an access key. That key goes into their AI as a custom action. From that point, their ChatGPT or Claude calls the host's MCP server and receives only what the host's disclosure policy permits for that viewer's identity. No VM on the viewer's side. No data to import. No setup wizard. Upgrade adds a VM and data ingestion; the login identity carries forward. The identity layer is the primary unit; the VM is optional infrastructure added at upgrade.
What this is not
Not a chatbot. Chatbots answer questions. Mandaire holds your full context so your AI can act on your behalf, with disclosure mediated for every recipient. The difference is not incremental; it is structural.
Not a productivity tool. Notion, Todoist, Asana organize work. Mandaire organizes the work and the relationships and the decisions and the judgment, with the privacy and disclosure architecture that lets you safely hand the system everything.
Not a note-taking app. Storage is necessary but not sufficient. Mandaire ingests, reasons, and mediates disclosure. Your AI acts.
Not "ChatGPT with your files." Uploading files to ChatGPT does not resolve the same person across six apps, does not score inferences by evidence and track when they go stale, and does not enforce deterministic per-recipient disclosure rules upstream of the model. These are the three structural gaps between LLM memory and a knowledge graph, and they cannot be closed by giving an LLM more files. The disclosure policy engine, the statistical entity resolution layer, the calibrated inference store, the single-tenant isolation architecture, and the persistent compound substrate are structural differentiation that ChatGPT / Claude / Gemini cannot replicate without fundamental architectural changes that conflict with their business models.
Not a hardware product. The Humane / Rabbit / Limitless arc destroyed $5B+ proving that AI does not need a new gadget. Mandaire runs on the phone and laptop you already have.
Not a secret-keeping tool. Privacy and secrecy are different. Privacy is the default posture of a person with an inner life. Secrecy is active concealment of information someone else has a stake in knowing. Mandaire is infrastructure for privacy, in the way a journal is infrastructure for privacy.
Not lock-in. Open-source core. Single-tenant storage you can export in full. Pass-through token costs. Self-host always available. Your data and the code to run the system yourself are always yours; that is a design requirement, not a marketing claim. Full client-side key derivation (Argon2id) is under active development.
The protocol
Mandaire publishes its MCP tool surface, entity-graph schema, canonical-store schema, disclosure-engine semantics, and trust commitments as a versioned spec. Anyone can implement the protocol. Only Mandaire-operated instances carry the Mandaire brand and the support contract.
The competitive asymmetry is structural, not "we shipped first." Google has launched a 24/7 personal agent (Gemini Spark) with access to Gmail, Calendar, Docs, Drive, Photos, WhatsApp, Spotify, GitHub, and Tasks; its binding constraint is the same as every single-provider system: it reads Google data on Google's cloud, and cannot reach iMessage, Apple Notes, or your conversations with Claude or ChatGPT. Apple shipped an Extensions Framework at WWDC 2026 that makes the AI brain swappable (Gemini, Claude, ChatGPT) and did not build the memory layer, the entity graph, or the disclosure engine any of those brains need. Microsoft tried (Recall) and got pulverized publicly. VC startups cannot fit full-corpus ingest + local-first storage + frontier-model synthesis into per-seat SaaS unit economics. Open-source projects can build this, and Mandaire is closer to that posture than to a conventional startup.
The highest-value move from here is not shipping more features. It is publishing the architecture and the schema so others can build their own. The product becomes the spec, not the running system. The Mandaire-operated tier is one implementation; many others may follow.
Spec v0.1 is in preparation. It will document the MCP tool surface (the mandaire(verb, ...) single-tool architecture), the entity-graph schema, the canonical-store schema per source, the disclosure-policy rules language, the audit-log format, the trust commitments enforcement, and the federation hooks. Each section will name a falsifier: how an implementer can verify their implementation conforms, and how a user can verify the operator is honoring the protocol.
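The single-tool shape, mandaire(verb, ...), can be pictured as one entry point that dispatches on a verb string instead of exposing many separate tools. The sketch below is illustrative only: the verbs `lookup` and `export`, their parameters, and the response envelope are hypothetical placeholders, since the real verb set is what spec v0.1 will document.

```python
from typing import Any, Callable

_VERBS: dict[str, Callable[..., Any]] = {}


def verb(name):
    """Register a handler under a verb name (hypothetical registry)."""
    def register(handler):
        _VERBS[name] = handler
        return handler
    return register


@verb("lookup")  # hypothetical verb, for illustration only
def _lookup(entity: str) -> dict:
    return {"entity": entity, "facts": []}


@verb("export")  # hypothetical verb, for illustration only
def _export(fmt: str = "json") -> dict:
    return {"format": fmt, "status": "queued"}


def mandaire(verb_name: str, **params) -> dict:
    """Single tool surface: every request goes through this one function."""
    handler = _VERBS.get(verb_name)
    if handler is None:
        return {"ok": False, "error": f"unknown verb: {verb_name}"}
    return {"ok": True, "result": handler(**params)}
```

One practical consequence of this design is that a single choke point sees every call, which is a natural place to attach disclosure checks and audit logging.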
The two audiences
A page that talks about reproducible builds, AGPL licenses, deterministic disclosure compilation, replay-verifiable audit logs, and protocol federation is not the page a non-engineer founder reads before deciding whether to set up Mandaire. We know. The architecture matters to a different audience than the operator.
So this site (mandaire.org) is for the technical evaluator: the engineer your operator-buyer trusts to verify the architecture before they commit; the security-conscious advisor in their network; the auditor; the journalist; the security researcher; the implementer who wants to build a compatible Mandaire instance themselves. Their evaluation is what makes the operator's trust earnable. We invest in the architecture because the architecture's existence is what makes the operator's trust verifiable through someone they already trust, even if the operator never reads this page.
The operator surfaces are at mandaire.app (your life) and mandaire.dev (your product). Those pages talk about outcomes, prices, and what the system does for you. They reference back here when a careful reader wants to verify the architecture; they do not lead with it.
The honest framing: the architecture pays back two ways. First, through the trust brokers it makes possible (third-party audit on the Q3 2026 roadmap, security researchers, technical advisors of non-technical buyers, journalists, named customer references), who do the verification work the operator cannot do themselves. Second, through the protocol becoming the standard others adopt, making Mandaire's operating commitments the reference point even for implementations Mandaire does not operate. Both are slower paybacks than direct buyer demand for technical features. We accept the slowness. The faster alternative, claiming architecture as a buyer-acquisition lever for non-technical operators, is dishonest and we have no interest in shipping it.
Component breakdown
The fastest way to read this architecture: one row per component. Who runs it. What it can see in plaintext. What keys it holds. What survives after your request completes. Who can audit it. Current status. The full per-claim enforcement detail is in the verification matrix below.
| Component | Operated by | Plaintext access | Keys held | What persists | Who can audit | Status |
|---|---|---|---|---|---|---|
| Tenant VM (Layer 1: local execution) | Mandaire-hosted (self-hostable) | Your data. No other tenant's data. Not accessible to Mandaire through any standard operational channel. (Target: client-side key derivation via Argon2id, in development; when shipped, the VM will hold only your encrypted archive.) | Anyone: all code is open-source (Apache / AGPL). You can read every line that runs here. | Encrypted archive. Watcher state. Audit log hashes. Configuration. | Anyone: all code is open-source (Apache / AGPL). You can read every line that runs here. | LIVE |
| Layer 2B: Central reasoner (Mandaire-operated synthesis) | Mandaire | Yes: decrypts the context envelope in-memory to run synthesis. Plaintext is held only during the request lifecycle. Not persisted, not written to disk. | None: does not hold your encryption key. Receives a decrypted context envelope per request, in memory only. | Audit log (hashes + metadata, no content). 90 days hot. Operational telemetry (latency, cost class, model used). No raw tenant content. | Third-party audit (Q3 2026 roadmap). The crash-dump filter and telemetry deidentification will be open-sourced and audited. Code itself is proprietary (this is the moat). | LIVE / AUDIT PENDING |
| Layer 2A: Model serving (inference substrate) | Mandaire (local GPU) + cloud provider as configured (Anthropic / OpenAI / Gemini in Default profile) | Receives the synthesis prompt from Layer 2B. In Default profile, frontier API calls pass pre-filtered context to the cloud provider. Sovereign profile: local GPU only. | No encryption keys. Receives only what Layer 2B passes to it after disclosure compilation. | Local GPU: no persistence beyond the session. Cloud providers (Anthropic / OpenAI / Gemini): subject to their own retention policies, linked from the spec. Mandaire cannot override provider retention. | Local: code is open (Ollama / vLLM). Cloud: providers' own terms + the Q3 2026 audit will name what each provider retains and how Mandaire mitigates. | LIVE |
| MCP server (mcp.mandaire.com) | Mandaire | Receives the tool call from your renderer LLM. Passes the pre-disclosure-filtered envelope. Does not see raw encrypted archive. | OAuth bearer tokens (session-scoped, not encryption keys). No access to your encrypted archive. | Request metadata in the audit log (hashes, tool kind, latency). No raw content. | MCP server scaffolding code is open source (Apache). Wire protocol is documented at the spec endpoint. | LIVE |
| Your renderer (Claude / ChatGPT / Gemini) | You (BYOB: bring your own brain) | Receives the pre-disclosure-filtered MCP envelope from Mandaire. Phrases the response in natural language. Does not receive raw data from your archive. | None relevant to Mandaire. Manages your own subscription tokens. | Subject to the renderer provider's own retention and conversation history policies. Mandaire has no control over this layer. | Provider's own terms. The Q3 2026 audit will link provider retention policies for the three supported renderers. | USER-OPERATED |
Status legend: LIVE = running in production. LIVE / AUDIT PENDING = live but verification by a named third-party auditor is on the Q3 2026 roadmap. USER-OPERATED = outside Mandaire's operational control; subject to provider terms.
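Several rows above say the audit log keeps hashes and metadata but no content. A minimal sketch of what that could look like, and of how a user holding the original request and response could check an entry, follows. The canonical-JSON hashing, the field names, and the choice to hash input and output separately while keeping metadata in the clear are assumptions, not the published audit-log schema.

```python
import hashlib
import json


def sha256_of(value):
    """Hash a JSON-serialisable value in a canonical form (sorted keys,
    no extra whitespace) so the same content always hashes the same way."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit_entry(request, response, meta):
    """Build a log entry that stores only digests plus operational metadata."""
    return {
        "input_sha256": sha256_of(request),
        "output_sha256": sha256_of(response),
        "meta": dict(meta),  # e.g. tool kind, latency; no raw content
    }


def replay_matches(entry, request, response):
    """User-side check: recompute digests from the originals the user holds
    and compare them with what the operator logged."""
    return (
        entry["input_sha256"] == sha256_of(request)
        and entry["output_sha256"] == sha256_of(response)
    )
```

The operator cannot reconstruct content from the entry, while anyone who still has the original request and response can confirm the entry refers to exactly that exchange.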
Verification matrix
The thesis is intellectually stronger than the evidence program, by design, because thesis is faster to revise than infrastructure. This matrix names per-claim where the evidence currently sits, what verification is available now versus on the roadmap, and what gaps remain. Every row is a commitment we earn or fail; "current status" is honest. We will update this table as evidence arrives.
| Claim | Enforcement | Who can verify | Current status | Evidence/artifact | Remaining gap |
|---|---|---|---|---|---|
| S1: Disclosure compilation upstream of any LLM | Deterministic Python upstream of any model invocation | Code reviewer + audit of Layer 2B | PROTOTYPE | Source repository (open) + architecture diagram | Tamper-evident binding between policy source and runtime; conformance test suite |
| S2: Irreversible-action gate | Typed action taxonomy + approval-token requirement | Code reviewer + user observation in app | PROTOTYPE | Source repository (open) + tenant-side approval UI | Comprehensive action taxonomy is engineering-incomplete; per-action gate for novel-class actions not yet defined |
| S3: Append-only refused-paths ledger as state | SQLite per-tenant; query at request planning time | User export + replay | PROTOTYPE | Schema published; per-tenant export available | Tamper-evidence (hash chaining, signing, transparency-log semantics) not yet built |
| No tenant data persists in central reasoner (storage) | In-memory only on Layer 2B host; retention table per data class | Third-party audit | SPEC / PENDING SVC LAUNCH | Retention table per data class (above); audit log shape published | Crash-dump filter source + false-negative rate; APM/telemetry deidentification floor |
| No learning from observed traffic | Codebase commitment; no training pipeline pulls tenant payloads | Third-party audit + code review | POLICY LIVE / TECHNICAL VERIFY PENDING | Documented in trust commitment 2; absent training-data ingestor | Cloud-frontier provider retention not under our control; need provider-policy disclosure |
| Replay-verifiable audit log | sha256 of input + output + meta; no raw content stored | User-side replay with original input | PROTOTYPE | Audit-log schema; per-tenant SQLite on Layer 2B host | User-facing replay tool; sample-replay published; tenant-held log signing |
| Reproducible-build encryption module | Build manifest + signed tag per release | Anyone with the source | ROADMAP (Q3 2026) | Source under AGPL | Reproducible-build pipeline + manifest; named third-party audit |
| All tenant-VM code open source | Public repository; license tags | Anyone (binary) | ROADMAP (v0.1) | Public repo URL (repo URL forthcoming with v0.1) | Repo currently a partial mirror; canonical open release on GitHub for v0.1 spec |
| Disclosure-policy semantics public | Documented rules language in spec | Anyone with spec | ROADMAP (v0.1) | This page describes the language; rules-language reference forthcoming | Spec v0.1 publication with conformance test vectors per rule type |
| One-command export | CLI tool + portable format | User runs the command | PROTOTYPE | Schema documented | Export format stability across versions; round-trip verification (re-import test) |
| Self-host always available | Open-source skeleton + docs | User runs self-host | PROTOTYPE | Docker compose setup documented | Parity SLA between hosted and self-host tiers; patch cadence commitment |
| Mandaire = controller AND processor (legal posture) | Article 28 contract terms; Article 30 records of processing | Legal review of published terms | POLICY LIVE / DOCS PENDING | Privacy policy; processor terms; named legal counsel forthcoming | DPIA + named DPO + breach-notification runbook; cross-border transfer SCCs |
| Cloud-frontier provider retention disclosure (what Anthropic / OpenAI / Gemini retain) | Disclosed in audit; provider policies linked from spec | Audit + provider terms | ROADMAP (Q3 2026) | Provider terms (Anthropic / OpenAI / Gemini) linked from spec; Mandaire mitigation per provider documented | What providers retain when Layer 2B routes to them; how Mandaire mitigates (or names where it doesn't mitigate) |
| Bet 6: protocol adoption | 3+ non-Mandaire implementations with non-trivial users at 24mo | Public registry of implementations | ROADMAP (24mo) | Spec v0.1 publication + federation hooks | Active implementations beyond Mandaire-operated tier; user counts per impl |
| Q3 2026 audit completion | Named auditor + named scope items above; published report | Audit report | ROADMAP (Q3 2026) | This page commits scope + timeline; auditor selection forthcoming | Auditor selection (Q2 2026); audit SoW; response posture for partial-compliance |
Legend: COMMITMENT (LIVE) = stated AND independently verifiable today.
LIVE CLAIM / AUDIT PENDING = stated; verification on the Q3 2026 audit roadmap.
POLICY LIVE / TECH PENDING = behavioral commitment in code; technical enforcement (vs policy-only) pending.
POLICY LIVE / DOCS PENDING = legal posture stated; supporting documentation forthcoming.
PROTOTYPE = built but not yet hardened to commitment standard.
ROADMAP = named timeline.
Where evidence is "forthcoming," it lands with the v0.1 spec or the Q3 2026 audit.
What this matrix is for. Buyers, technical evaluators, security researchers, journalists, and would-be implementers each need different verification surfaces. The matrix lets each audience identify which row matters to them and what evidence is current versus future. We commit to keeping this matrix accurate: when a row moves from PROTOTYPE to COMMITMENT (LIVE), the date moves with it; when a ROADMAP item slips, that slip is named here, not buried.
What it is not. A claim that the architecture is finished. Six of the fifteen rows are PROTOTYPE today and six more are ROADMAP. The matrix is honest about that gap; turning PROTOTYPE rows into LIVE commitments is the operational work of the next 6-12 months. The matrix exists so the gap is observable to the same audiences whose evaluation makes the architecture pay back.
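As one concrete reading of the S3 row, the sketch below shows a per-tenant SQLite ledger that rejects updates and deletes through triggers and is queried before a new plan is made. The table layout, column names, and helper functions are hypothetical. The triggers enforce append-only behavior at the SQL level only; anyone with write access to the database file could drop them, which is why the matrix lists tamper-evidence (hash chaining, signing) as a remaining gap.

```python
import sqlite3


def open_ledger(path=":memory:"):
    """Open (or create) a refused-paths ledger with append-only triggers."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS refused_paths (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            topic      TEXT NOT NULL,
            path       TEXT NOT NULL,
            reason     TEXT NOT NULL,
            refused_at TEXT NOT NULL DEFAULT (datetime('now'))
        );
        CREATE TRIGGER IF NOT EXISTS refused_no_update
        BEFORE UPDATE ON refused_paths
        BEGIN SELECT RAISE(ABORT, 'refused_paths is append-only'); END;
        CREATE TRIGGER IF NOT EXISTS refused_no_delete
        BEFORE DELETE ON refused_paths
        BEGIN SELECT RAISE(ABORT, 'refused_paths is append-only'); END;
    """)
    return db


def record_refusal(db, topic, path, reason):
    """Append one refused direction; rows are never edited afterwards."""
    db.execute(
        "INSERT INTO refused_paths (topic, path, reason) VALUES (?, ?, ?)",
        (topic, path, reason),
    )
    db.commit()


def prior_refusals(db, topic):
    """Query at planning time: what was already refused on this topic, and why."""
    rows = db.execute(
        "SELECT path, reason FROM refused_paths WHERE topic = ? ORDER BY id",
        (topic,),
    )
    return rows.fetchall()
```

A planner would call `prior_refusals` before proposing work on a topic, so a direction rejected earlier (for example, the "clever dashboards" case from Bet 2) is treated as state rather than rediscovered.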