§ 00 · Security
Designed for the day
the model is wrong.
Every AI agent your team runs has root-level intent and zero judgement. Espada is the gate between that intent and your production. This page is the proof — every threat we model, every control we ship, every test we wrote, and where in the repository to read it. If you find a divergence between this page and the source, the source is correct.
Threat model
Most security pages tell you what they protect. Ours tells you who they protect against. Three adversaries, modelled by name, with the specific control that stops each one. STRIDE v2 for action-side risks; LINDDUN for data-handling. Thirty named threats in the full model — the three that matter most are below.
Adversary A — The agent itself
What if the model goes haywire? A misaligned, prompt-injected, or simply mistaken LLM proposes a destructive action. Every tool call resolves through Espada's effect inspector before it leaves the gateway — what will be destroyed, what IAM changes, what networks open. A resolved destructive action requires a signed human approval. No policy in the system auto-confirms. The model never had the keys.
Adversary B — A compromised operator
What if my engineer's laptop is stolen? Cookie exfiltrated, session token lifted, browser owned end-to-end. The attacker has every credential the operator had — every credential except the YubiKey in their pocket. The signing factor is cryptographically separated from the runtime; credentials and second factor never share an address space. Owning the laptop has never been enough.
Adversary C — A compromised Espada binary
What if you ship a backdoor? The question we want
you to ask. Reproducible builds via Bazel. Signed releases
against project release key 0xE3F7…AC81. Full
SBOM with every release. Full source on GitHub, MIT-licensed.
You can build from verified source, sign your own binary, and
never trust ours. We designed the policy that way on purpose.
Full STRIDE model — 30 named threats with mitigations, residual risk, and test counts — is published at docs.espada.dev/security/threat-model.
The seven controls
On December 22, 2025, an internal AI coding agent at a hyperscaler cloud provider deleted a production environment. The outage lasted roughly thirteen hours and reached customers in every region. The agent had inherited an engineer's standing credentials. It bypassed the two-person approval rule. No model alignment caught it. No prompt engineering saved it.
We read the postmortem. We built the gate. The seven controls below are that gate — named, implemented, tested. Each one corresponds to one of the seven specific failures the incident identified. Any one of them, in place, would have stopped it.
- 01
Conversation taint propagation
External content (files, URLs, tool output, untrusted prompts) is wrapped in
<<<EXTERNAL_UNTRUSTED_CONTENT>>>boundary markers at ingress. Any tool argument string-derived from a tainted region carries the taint forward. Policy can refuse to forward a tainted argument to a destructive tool.src/security/external-content.tsAttacker cost: there is no model-side jailbreak that strips the boundary. Requires compromise of the gateway process.
- 02
Per-agent blast-radius budget
Each agent identity carries a daily spend cap, a tool allowlist, and a maximum-affected-resources budget. A planned change whose graph impact exceeds the budget is denied at the gateway before the cloud API is called.
extensions/policy-engine/src/agent-budget.tsAttacker cost: a privileged operator must widen the budget in writing, in advance, and ride out the audit trail.
- 03
Initiator ≠ approver invariant
Every approval decision records the exact plan hash. A plan that mutates after approval invalidates the approval. The same identity cannot both initiate and approve; the invariant is enforced at every state transition.
extensions/policy-engine/src/approval-gate.tsAttacker cost: compromise two separate identities and observe the plan hash before it is sealed. A single stolen session is insufficient.
- 04
Topology-aware policy
Policy conditions can reference
blastRadius,dependencyDepth, andaffectedResourcesfrom the live infrastructure knowledge graph. A condition such asblastRadius > 50mapped torequire_approvalruns as part of the samebefore_tool_callevaluation as identity and quota.extensions/knowledge-graph/src/analysis/change-impact.tsAttacker cost: drift the graph so it under-reports dependents — but the graph is rebuilt from the same cloud APIs the agent itself targets.
- 05
Just-in-time scoped credentials
Agents never see the operator's standing credential. Each tool call is issued a freshly minted, short-TTL token whose scope is derived from the declared task and the agent's budget. Inheriting a deploying engineer's broad
aws:*role is structurally not expressible.src/agents/sandbox/types.tsAttacker cost: compromise the gateway credential broker. Stealing a running token buys minutes of read access at worst, not standing prod write.
- 06
Hash-chained immutable audit
Every tool invocation, policy decision, approval grant, and credential mint is written as a hash-chained Ed25519-signed audit record. Periodically the head is anchored to an external witness (GitHub commit, RFC-3161 timestamp, or a transparency log) to detect rollback.
src/gateway/audit/*· 29 + 142 testsAttacker cost: invert SHA-256 + forge an Ed25519 signature + rewrite the external witness. Stacked, all three.
- 07
Break-glass with cooldown
Emergency elevated access is time-bounded by policy (
minDurationMs..maxDurationMs), automatically expired by sweeper, requires a second human signature, and records everyisPermitted()call as an audit usage event.allowSelfApprovedefaults tofalse.src/gateway/governance/break-glass.ts· 27 testsAttacker cost: a second human signature, every time, with a red-banner UI warning if self-approve was ever enabled.
We reconstructed the incident minute-by-minute, mapped each failure to the control that would have caught it, and published the whole thing at docs.espada.dev/security/kiro-postmortem-pack.
Identity stack
Your IdP is the source of truth. Your engineers do not get to argue with it. Espada ships the four primitives a serious enterprise integration needs — bound to your provider, gated at every entry point, tested down to the individual JWT claim.
- OIDC
-
JWKS verification with
iss,aud,exp,iat,nbfenforcement and ≤60 s skew.kidpinning. Replay protection viajtitracking. extensions/enterprise-auth/src/oidc-jwks.ts · 23 tests - SAML
-
InResponseTo,NotBefore,NotOnOrAfter,Destination,Audienceenforcement; assertion signature + certificate-chain check. extensions/enterprise-auth/src/saml.ts · 12 tests - MFA
-
TOTP + WebAuthn / FIDO2 second factor on sensitive actions.
All secret comparisons via
crypto.timingSafeEqual. extensions/enterprise-auth/src/mfa.ts · 40 tests - SCIM 2.0
-
Create / update / disable / re-enable lifecycle from your
IdP. Immutable
externalIdblocks login-hijack via mapping swap. Filter parser restricted toeq/pr+ oneand— no injection surface. src/gateway/governance/scim.ts · 36 tests
Audit chain
When the auditor asks who did this, when, with what authority? you hand them a cryptographic answer. Not a log file — a proof. Every observed action and every decision is written as a SHA-256-hashed, Ed25519-signed entry whose hash depends on the one before it. The head is anchored to an external witness of your choosing — GitHub commit, RFC-3161 timestamp, or transparency log — to detect rollback. Tamper with one entry and every entry after it stops verifying.
{
"id": "9d72e3...f041a8",
"ts": "2025-12-22T14:02:51Z",
"actor": "claude:agent-9af1",
"action": "terraform apply",
"resolved": { "destroys": 41, "creates": 0, "modifies": 2 },
"verdict": "REQUIRES_SIGNATURE",
"policy": "prod.destroy",
"prev_hash": "8a91...0c0f"
} Data handling
We cannot leak your data because we do not have your data. Espada is a single self-hosted binary that runs in your environment and writes to a sink you control — local filesystem, S3, GCS, Azure Blob, Postgres, or your own adapter. Prompts never reach a vendor model. Audit records never reach a vendor cloud. Secrets never cross your VPC.
No telemetry by default. No update beacon. No crash reporter. No "anonymous usage statistics." The binary will not connect outbound unless your policies explicitly require it — and when it does, every connection is itself audited.
Approval & break-glass
There is no path through Espada that ends with "and then the agent just did it." Every destructive action waits on a second human signature. Every emergency override is time-bounded, swept on expiry, and logged at every check. The same primitive runs your routine deploy, your high-risk migration, and your 3 AM incident — so the controls you tested on Tuesday are the controls that hold on Saturday.
- Multi-stage workflows. Sequential or parallel stages, per-stage SLAs, delegate routing. Delegates cannot route back to the initiator. Terminal states are immutable. src/gateway/governance/approval-workflows.ts · 28 tests
- Hash-pinned plans. The approval binds to the plan hash. A plan that mutates after approval invalidates the approval. The signature never authorises a plan it didn't sign.
- Break-glass sessions. Time-bounded
(
minDurationMs..maxDurationMs), automatically swept on expiry, one pending or active per user, global concurrency cap, cooldown after each session ends. src/gateway/governance/break-glass.ts · 27 tests - Per-call audit. Every
isPermitted()check emits a usage event. The break-glass holder cannot deny scope after the fact.
Extension contract
Most of an AI platform's blast radius lives in its integrations
— the agents, tools, channels, and cloud connectors that
actually touch the outside world. In Espada that surface is
37 extensions, every one of them routed through the
Enterprise Extension Contract: a fixed
middleware chain, sandboxed subprocess and fetch wrappers, and
a static scanner that fails CI the moment an extension reaches
for a raw child_process or fetch.
- Subprocess + fetch wrappers. Allowlisted binaries, no-shell, RFC1918 outbound guard. extensions/cloud-utils/safe-process.ts · safe-fetch.ts · 56 tests
- Webhook HMAC + replay guard. Slack / Meta /
generic verifiers; TTL + LRU dedupe per
(scope, eventId); ±5-min clock skew window. extensions/cloud-utils/webhook-hmac · webhook-replay-guard · 74 tests - Prompt injection guard. OWASP LLM-01 detector + neutraliser, model-allowlist policy. extensions/llm-task/src/prompt-guard.ts · 81 tests
- Audit redactor. 12-pattern secret mask with hash-pin before any record reaches a sink. extensions/llm-task/src/audit-redactor.ts
- Tenant-scope filters. Knowledge-graph, memory-core, and incident-view all enforce tenant scope at the query layer. extensions/knowledge-graph/src/iql-guard.ts · 89 tests in Phase 12
- Static contract scanner. CI fails the build
if any extension imports raw
child_process,fetch, or unsigned audit sinks. scripts/audit-extensions.ts
Canonical spec: extensions/ENTERPRISE-CONTRACT.md.
Formal verification
On the highest-risk paths we do not just test the gate; we
prove it. Gateway exposure, nodes.run approval,
pairing-store TTL, ingress gating, session isolation — each
ships a TLA+/TLC model that machine-checks the security
property across every interleaving of states the model admits.
Beside it, we ship the same model with the property
deliberately broken, so the suite produces a counterexample
trace — evidence that if the property ever regressed, we would
catch it. Most vendors do not write these. We do.
# Green — the property holds under all interleavings
make gateway-exposure-v2
make nodes-pipeline
make approvals-token
make pairing
make routing-isolation
# Negative — the suite catches the bug
make gateway-exposure-v2-negative # → counterexample trace
make nodes-pipeline-negative
make approvals-token-negative Results are bounded by the state space TLC explores; green does not imply security beyond the modelled assumptions and bounds. Drift between model and TypeScript implementation is possible — the models are an attacker-driven security regression suite, not a whole-system proof.
Models live at github.com/vignesh07/espada-formal-models; documentation and Make targets at docs.espada.dev/security/formal-verification.
Compliance posture
When your auditor lands on the AC-2, AU-3, IA-2 control assessment, the cells already have answers — and a file path and a passing test count next to each. Espada is designed against NIST SP 800-53 Rev 5 at the FedRAMP Moderate baseline. The ledger below is a summary; every row is regenerable from source. The work is done. The evidence is reproducible.
- AC — Access Control
- Account lifecycle via SCIM. Deny-by-default RBAC. Tenant / project / environment scope hierarchy. Break-glass for emergency elevated access. Concurrent session control. Wildcard scopes structurally not expressible. Implemented · 35 + 36 + 27 + 28 tests
- AU — Audit & Accountability
- Hash-chained Ed25519-signed records. SIEM export (Splunk, Sentinel, CEF). UTC microsecond timestamps. Fail-closed on audit-append failure. Implemented · 29 + 68 tests
- IA — Identification & Authentication
- OIDC, SAML, MFA (TOTP + WebAuthn), bearer tokens. Per-IP and per-account rate limit with exponential backoff. Implemented · 23 + 12 + 40 tests
- SC — System & Communications
- TLS 1.2+ ingress. mTLS on cluster links. SCIM payload caps. Distributed lock with fencing tokens. SQLite migration lock self-reclaim. Implemented · 21 + 16 tests
- SI — System & Information Integrity
- Schema migrations checksum-pinned. Central error formatter strips stacks. Static contract scanner gates CI. Bounded- cardinality Prometheus labels. Implemented · 13 + 19 tests
- IR — Incident Response
- Hash-chained incident state machine. Transition guard captures principal, classification, reason. Rollback checkpoint log monotonic. Implemented · Phase 12
Full row-by-row mapping (Implemented / Partial / Inherited / Planned with evidence paths) is published at docs.espada.dev/security/compliance-nist-800-53. Last reviewed 2026-04-22.
Signing
The signing factor lives somewhere an attacker who owns the runtime cannot reach. Pick the form that fits your operation; the cryptographic guarantee is the same.
- WebAuthn / FIDO2 hardware tokens — YubiKey, Titan, Solo.
- SSH host or user key, with a hardware-backed agent.
- GPG signature, for offline and air-gapped approvals.
- Two-of-three quorum sign-off, for the highest-risk actions.
Disclosure policy
Report security issues to security@espadafirewall.com. We aim to acknowledge within 24 hours and publish a fix or a public CVE within 30 days. The full policy is on GitHub.
Standards alignment
Espada is designed against OWASP LLM Top 10 — specifically LLM-01 (prompt injection) and LLM-08 (excessive agency); NIST AI RMF 1.0 GOVERN-1, MAP-3, and MANAGE-2; FedRAMP Moderate against NIST SP 800-53 Rev 5; and ISO/IEC 27034-1 STRIDE v2 for the threat model. The frameworks are not the point — the controls are. The frameworks just say so.
THREAT MODEL / / sha256:9d72e3…f041a8