§ 00 · Security

Designed for the day the model is wrong.

Every AI agent your team runs has root-level intent and zero judgement. Espada is the gate between that intent and your production. This page is the proof — every threat we model, every control we ship, every test we wrote, and where in the repository to read it. If you find a divergence between this page and the source, the source is correct.

00 Gate · Schematic v1.0 · plate 01

Threat model

Most security pages tell you what they protect. Ours tells you who it protects against. Three adversaries, modelled by name, with the specific control that stops each one. STRIDE v2 for action-side risks; LINDDUN for data-handling. Thirty named threats are in the full model; the three that matter most are below.

Adversary A — The agent itself

What if the model goes haywire? A misaligned, prompt-injected, or simply mistaken LLM proposes a destructive action. Every tool call resolves through Espada's effect inspector before it leaves the gateway — what will be destroyed, what IAM changes, what networks open. A resolved destructive action requires a signed human approval. No policy in the system auto-confirms. The model never had the keys.

Adversary B — A compromised operator

What if my engineer's laptop is stolen? Cookie exfiltrated, session token lifted, browser owned end-to-end. The attacker has every credential the operator had — every credential except the YubiKey in their pocket. The signing factor is cryptographically separated from the runtime; credentials and second factor never share an address space. Owning the laptop has never been enough.

Adversary C — A compromised Espada binary

What if you ship a backdoor? The question we want you to ask. Reproducible builds via Bazel. Signed releases against project release key 0xE3F7…AC81. Full SBOM with every release. Full source on GitHub, MIT-licensed. You can build from verified source, sign your own binary, and never trust ours. We designed the policy that way on purpose.

Full STRIDE model — 30 named threats with mitigations, residual risk, and test counts — is published at docs.espada.dev/security/threat-model.

The seven controls

On December 22, 2025, an internal AI coding agent at a hyperscaler cloud provider deleted a production environment. The outage lasted roughly thirteen hours and reached customers in every region. The agent had inherited an engineer's standing credentials. It bypassed the two-person approval rule. No model alignment caught it. No prompt engineering saved it.

We read the postmortem. We built the gate. The seven controls below are that gate — named, implemented, tested. Each one corresponds to one of the seven specific failures the incident identified. Any one of them, in place, would have stopped it.

01
Conversation taint propagation

External content (files, URLs, tool output, untrusted prompts) is wrapped in <<<EXTERNAL_UNTRUSTED_CONTENT>>> boundary markers at ingress. Any tool argument string-derived from a tainted region carries the taint forward. Policy can refuse to forward a tainted argument to a destructive tool.

src/security/external-content.ts

Attacker cost: there is no model-side jailbreak that strips the boundary. Requires compromise of the gateway process.
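A minimal sketch of how taint could be carried from ingress to a policy check. Only the marker string comes from this page. The Value type, the helper names, and the choice to reuse the same marker as the closing boundary are assumptions made for illustration, not the contents of src/security/external-content.ts.

```typescript
// Sketch only: every name except the marker string is hypothetical.
const MARKER = "<<<EXTERNAL_UNTRUSTED_CONTENT>>>";

interface Value {
  readonly text: string;
  readonly tainted: boolean;
}

// Trusted input, for example text typed by the operator.
function trusted(text: string): Value {
  return { text, tainted: false };
}

// Ingress: wrap external content in boundary markers and mark it tainted.
// Assumption: the same marker opens and closes the region.
function fromExternal(raw: string): Value {
  return { text: `${MARKER}\n${raw}\n${MARKER}`, tainted: true };
}

// Any string derived from at least one tainted input is itself tainted.
function derive(inputs: Value[], build: (texts: string[]) => string): Value {
  return {
    text: build(inputs.map((v) => v.text)),
    tainted: inputs.some((v) => v.tainted),
  };
}

interface Tool {
  name: string;
  destructive: boolean;
}

// Policy: refuse to forward a tainted argument to a destructive tool.
function checkToolCall(tool: Tool, args: Value[]): "allow" | "deny" {
  return tool.destructive && args.some((a) => a.tainted) ? "deny" : "allow";
}
```

The point of the design is that taint is a property of the value, not of the model's output, so nothing the model says can clear it.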
02
Per-agent blast-radius budget

Each agent identity carries a daily spend cap, a tool allowlist, and a maximum-affected-resources budget. A planned change whose graph impact exceeds the budget is denied at the gateway before the cloud API is called.

extensions/policy-engine/src/agent-budget.ts

Attacker cost: a privileged operator must widen the budget in writing, in advance, and ride out the audit trail.
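One way to picture the budget check, as a sketch under stated assumptions: the field names (dailySpendCapUsd, toolAllowlist, maxAffectedResources) and the checkBudget function are hypothetical and do not describe the real agent-budget.ts.

```typescript
// Sketch only: field and function names are hypothetical.
interface AgentBudget {
  dailySpendCapUsd: number;
  toolAllowlist: ReadonlySet<string>;
  maxAffectedResources: number;
}

interface PlannedChange {
  tool: string;
  estimatedCostUsd: number;
  affectedResources: number; // size of the graph impact
}

type Decision = { allowed: true } | { allowed: false; reason: string };

// Runs at the gateway, before any cloud API call is made.
function checkBudget(
  budget: AgentBudget,
  spentTodayUsd: number,
  change: PlannedChange,
): Decision {
  if (!budget.toolAllowlist.has(change.tool)) {
    return { allowed: false, reason: `tool ${change.tool} is not allowlisted` };
  }
  if (spentTodayUsd + change.estimatedCostUsd > budget.dailySpendCapUsd) {
    return { allowed: false, reason: "daily spend cap exceeded" };
  }
  if (change.affectedResources > budget.maxAffectedResources) {
    return { allowed: false, reason: "graph impact exceeds the resource budget" };
  }
  return { allowed: true };
}
```

Returning a reason with every denial keeps the decision explainable in the audit trail.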
03
Initiator ≠ approver invariant

Every approval decision records the exact plan hash. A plan that mutates after approval invalidates the approval. The same identity cannot both initiate and approve; the invariant is enforced at every state transition.

extensions/policy-engine/src/approval-gate.ts

Attacker cost: compromise two separate identities and observe the plan hash before it is sealed. A single stolen session is insufficient.
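The two rules above (approval bound to a plan hash, initiator distinct from approver) can be sketched in a few lines. This assumes the plan is already serialised to a canonical string and that identities are plain strings; planHash, Approval, and mayExecute are hypothetical names, not the approval-gate.ts API.

```typescript
import { createHash } from "node:crypto";

// Sketch only: names and the string-based plan encoding are assumptions.
function planHash(canonicalPlan: string): string {
  return createHash("sha256").update(canonicalPlan).digest("hex");
}

interface Approval {
  approver: string;
  planHash: string; // hash of the exact plan the approver saw
}

// Checked at execution time, against the plan as it exists now.
function mayExecute(initiator: string, currentPlan: string, approval: Approval): boolean {
  // The same identity cannot both initiate and approve.
  if (approval.approver === initiator) return false;
  // A plan that changed after approval no longer matches the recorded hash.
  return approval.planHash === planHash(currentPlan);
}
```

Because the check recomputes the hash from the current plan, an approval cannot be replayed onto a different plan.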
04
Topology-aware policy

Policy conditions can reference blastRadius, dependencyDepth, and affectedResources from the live infrastructure knowledge graph. A condition such as blastRadius > 50 mapped to require_approval runs as part of the same before_tool_call evaluation as identity and quota.

extensions/knowledge-graph/src/analysis/change-impact.ts

Attacker cost: drift the graph so it under-reports dependents — but the graph is rebuilt from the same cloud APIs the agent itself targets.
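As an illustration of a condition such as blastRadius > 50 mapped to require_approval, here is a small evaluator. The three fact names and the require_approval effect come from this page. The Rule shape, the greater-than-only comparison, and the rule that the most restrictive effect wins are assumptions.

```typescript
// Sketch only: the Rule shape and severity ordering are hypothetical.
interface ImpactFacts {
  blastRadius: number;
  dependencyDepth: number;
  affectedResources: number;
}

type Effect = "allow" | "require_approval" | "deny";

interface Rule {
  field: keyof ImpactFacts;
  greaterThan: number;
  effect: Effect;
}

const SEVERITY: Record<Effect, number> = { allow: 0, require_approval: 1, deny: 2 };

// Evaluate every rule against facts from the graph; the strictest match wins.
function evaluate(rules: Rule[], facts: ImpactFacts): Effect {
  let result: Effect = "allow";
  for (const rule of rules) {
    const matches = facts[rule.field] > rule.greaterThan;
    if (matches && SEVERITY[rule.effect] > SEVERITY[result]) {
      result = rule.effect;
    }
  }
  return result;
}
```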
05
Just-in-time scoped credentials

Agents never see the operator's standing credential. Each tool call is issued a freshly minted, short-TTL token whose scope is derived from the declared task and the agent's budget. Inheriting a deploying engineer's broad aws:* role is structurally not expressible.

src/agents/sandbox/types.ts

Attacker cost: compromise the gateway credential broker. Stealing a running token buys minutes of read access at worst, not standing prod write.
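A rough sketch of scope derivation, assuming scopes are plain action strings and that the agent's budget supplies an allowlist. mintToken, permits, and the ScopedToken shape are hypothetical; a real broker would also sign the token and call the cloud provider's credential service.

```typescript
// Sketch only: names and the action-string scope model are assumptions.
interface ScopedToken {
  actions: string[];
  expiresAtMs: number;
}

// Scope = actions the task declared, intersected with the agent's allowlist.
function mintToken(
  declaredActions: string[],
  allowlist: ReadonlySet<string>,
  nowMs: number,
  ttlMs: number,
): ScopedToken {
  // Wildcards are rejected outright, so a broad aws:* grant cannot be written.
  if (declaredActions.some((a) => a.includes("*"))) {
    throw new Error("wildcard scopes are not expressible");
  }
  const actions = declaredActions.filter((a) => allowlist.has(a));
  return { actions, expiresAtMs: nowMs + ttlMs };
}

// A token permits an action only while it is unexpired and the action is in scope.
function permits(token: ScopedToken, action: string, nowMs: number): boolean {
  return nowMs < token.expiresAtMs && token.actions.includes(action);
}
```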
06
Hash-chained immutable audit

Every tool invocation, policy decision, approval grant, and credential mint is written as a hash-chained Ed25519-signed audit record. Periodically the head is anchored to an external witness (GitHub commit, RFC-3161 timestamp, or a transparency log) to detect rollback.

src/gateway/audit/* · 29 + 142 tests

Attacker cost: find a SHA-256 second preimage, forge an Ed25519 signature, and rewrite the external witness. Stacked, all three.
07
Break-glass with cooldown

Emergency elevated access is time-bounded by policy (minDurationMs..maxDurationMs) and automatically expired by a sweeper. It requires a second human signature and records every isPermitted() call as an audit usage event. allowSelfApprove defaults to false.

src/gateway/governance/break-glass.ts · 27 tests

Attacker cost: a second human signature, every time, with a red-banner UI warning if self-approve was ever enabled.
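A sketch of the session rules, using the names this page mentions (minDurationMs, maxDurationMs, allowSelfApprove, isPermitted). The function signatures, the decision to clamp rather than reject an out-of-range duration, and the UsageEvent shape are assumptions. The cooldown and concurrency caps are left out for brevity.

```typescript
// Sketch only: signatures and the clamping behaviour are hypothetical.
interface BreakGlassPolicy {
  minDurationMs: number;
  maxDurationMs: number;
  allowSelfApprove: boolean; // defaults to false per this page
}

interface Session {
  holder: string;
  approver: string;
  expiresAtMs: number;
}

interface UsageEvent {
  holder: string;
  action: string;
  permitted: boolean;
  atMs: number;
}

function openSession(
  policy: BreakGlassPolicy,
  holder: string,
  approver: string,
  requestedMs: number,
  nowMs: number,
): Session {
  if (holder === approver && !policy.allowSelfApprove) {
    throw new Error("a second human signature is required");
  }
  // Clamp the requested duration into the policy window.
  const durationMs = Math.min(Math.max(requestedMs, policy.minDurationMs), policy.maxDurationMs);
  return { holder, approver, expiresAtMs: nowMs + durationMs };
}

// Every check is recorded, whether or not it was permitted.
function isPermitted(session: Session, action: string, nowMs: number, log: UsageEvent[]): boolean {
  const permitted = nowMs < session.expiresAtMs;
  log.push({ holder: session.holder, action, permitted, atMs: nowMs });
  return permitted;
}
```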

We reconstructed the incident minute-by-minute, mapped each failure to the control that would have caught it, and published the whole thing at docs.espada.dev/security/kiro-postmortem-pack.

Identity stack

Your IdP is the source of truth. Your engineers do not get to argue with it. Espada ships the four primitives a serious enterprise integration needs — bound to your provider, gated at every entry point, tested down to the individual JWT claim.

OIDC: JWKS verification with iss, aud, exp, iat, nbf enforcement and ≤60 s skew. kid pinning. Replay protection via jti tracking. extensions/enterprise-auth/src/oidc-jwks.ts · 23 tests
SAML: InResponseTo, NotBefore, NotOnOrAfter, Destination, Audience enforcement; assertion signature + certificate-chain check. extensions/enterprise-auth/src/saml.ts · 12 tests
MFA: TOTP + WebAuthn / FIDO2 second factor on sensitive actions. All secret comparisons via crypto.timingSafeEqual. extensions/enterprise-auth/src/mfa.ts · 40 tests
SCIM 2.0: Create / update / disable / re-enable lifecycle from your IdP. Immutable externalId blocks login-hijack via mapping swap. Filter parser restricted to the eq and pr operators plus a single and, which leaves no injection surface. src/gateway/governance/scim.ts · 36 tests
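To make the OIDC claim checks concrete, here is a sketch of the time-window and replay logic only. It assumes the token signature has already been verified against the JWKS, treats aud as a single string for simplicity, and keeps seen jti values in an in-memory Set. validateClaims and its return convention are hypothetical, not the oidc-jwks.ts interface.

```typescript
// Sketch only: signature verification is assumed to have happened already.
interface Claims {
  iss: string;
  aud: string; // simplification: a JWT aud may also be an array
  exp: number; // seconds since epoch
  iat: number;
  nbf?: number;
  jti: string;
}

const SKEW_S = 60; // tolerated clock skew, per the 60 s bound above

// Returns null when the claims are acceptable, otherwise a rejection reason.
function validateClaims(
  claims: Claims,
  expected: { iss: string; aud: string },
  nowS: number,
  seenJti: Set<string>,
): string | null {
  if (claims.iss !== expected.iss) return "iss mismatch";
  if (claims.aud !== expected.aud) return "aud mismatch";
  if (nowS >= claims.exp + SKEW_S) return "token expired";
  if (claims.iat > nowS + SKEW_S) return "issued in the future";
  if (claims.nbf !== undefined && claims.nbf > nowS + SKEW_S) return "not yet valid";
  if (seenJti.has(claims.jti)) return "jti replayed";
  seenJti.add(claims.jti); // record only tokens that passed every check
  return null;
}
```

A production jti store would need expiry and shared state across gateway instances; the Set here only shows where the replay check sits in the sequence.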

Audit chain

When the auditor asks "who did this, when, and with what authority?", you hand them a cryptographic answer: not a log file, but a proof. Every observed action and every decision is written as a SHA-256-hashed, Ed25519-signed entry whose hash depends on the one before it. The head is anchored to an external witness of your choosing (GitHub commit, RFC-3161 timestamp, or transparency log) to detect rollback. Tamper with one entry and every entry after it stops verifying.

{
  "id": "9d72e3...f041a8",
  "ts": "2025-12-22T14:02:51Z",
  "actor": "claude:agent-9af1",
  "action": "terraform apply",
  "resolved": { "destroys": 41, "creates": 0, "modifies": 2 },
  "verdict": "REQUIRES_SIGNATURE",
  "policy": "prod.destroy",
  "prev_hash": "8a91...0c0f"
}
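The chaining rule can be sketched with Node's built-in crypto module. This is an illustration under stated assumptions, not the record format in src/gateway/audit: the Entry shape, the all-zero genesis hash, and hashing prev_hash plus a newline plus the body are all hypothetical choices.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Sketch only: the entry layout and hashing scheme are assumptions.
interface Entry {
  body: string; // serialised audit record
  prevHash: string;
  hash: string; // SHA-256 over prevHash and body
  sig: Buffer; // Ed25519 signature over hash
}

const GENESIS = "0".repeat(64);

function entryHash(prevHash: string, body: string): string {
  return createHash("sha256").update(prevHash).update("\n").update(body).digest("hex");
}

function append(chain: Entry[], body: string, privateKey: KeyObject): Entry[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS;
  const hash = entryHash(prevHash, body);
  // Ed25519 takes no separate digest algorithm, so the first argument is null.
  const sig = sign(null, Buffer.from(hash, "hex"), privateKey);
  return [...chain, { body, prevHash, hash, sig }];
}

// Returns the index of the first entry that fails to verify, or -1 if all pass.
function verifyChain(chain: Entry[], publicKey: KeyObject): number {
  let prev = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const e = chain[i];
    const linked = e.prevHash === prev && e.hash === entryHash(prev, e.body);
    if (!linked || !verify(null, Buffer.from(e.hash, "hex"), publicKey, e.sig)) return i;
    prev = e.hash;
  }
  return -1;
}

// Usage: build a three-entry chain, then alter the middle entry's body.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
let chain: Entry[] = [];
for (const body of ["plan created", "approval granted", "plan applied"]) {
  chain = append(chain, body, privateKey);
}
const tampered = chain.map((e, i) => (i === 1 ? { ...e, body: "approval skipped" } : e));
console.log(verifyChain(chain, publicKey), verifyChain(tampered, publicKey)); // -1 1
```

Anchoring the head hash to an external witness is what detects wholesale truncation or rollback; this sketch covers only in-place tampering.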

Data handling

We cannot leak your data because we do not have your data. Espada is a single self-hosted binary that runs in your environment and writes to a sink you control — local filesystem, S3, GCS, Azure Blob, Postgres, or your own adapter. Prompts never reach a vendor model. Audit records never reach a vendor cloud. Secrets never cross your VPC.

No telemetry by default. No update beacon. No crash reporter. No "anonymous usage statistics." The binary will not connect outbound unless your policies explicitly require it — and when it does, every connection is itself audited.

Approval & break-glass

There is no path through Espada that ends with "and then the agent just did it." Every destructive action waits on a second human signature. Every emergency override is time-bounded, swept on expiry, and logged at every check. The same primitive runs your routine deploy, your high-risk migration, and your 3 AM incident — so the controls you tested on Tuesday are the controls that hold on Saturday.

Multi-stage workflows. Sequential or parallel stages, per-stage SLAs, delegate routing. Delegates cannot route back to the initiator. Terminal states are immutable. src/gateway/governance/approval-workflows.ts · 28 tests
Hash-pinned plans. The approval binds to the plan hash. A plan that mutates after approval invalidates the approval. The signature never authorises a plan it didn't sign.
Break-glass sessions. Time-bounded (minDurationMs..maxDurationMs), automatically swept on expiry, one pending or active per user, global concurrency cap, cooldown after each session ends. src/gateway/governance/break-glass.ts · 27 tests
Per-call audit. Every isPermitted() check emits a usage event. The break-glass holder cannot deny scope after the fact.

Extension contract

Most of an AI platform's blast radius lives in its integrations — the agents, tools, channels, and cloud connectors that actually touch the outside world. In Espada that surface is 37 extensions, every one of them routed through the Enterprise Extension Contract: a fixed middleware chain, sandboxed subprocess and fetch wrappers, and a static scanner that fails CI the moment an extension reaches for a raw child_process or fetch.

7 Phases of hardening

608 Regression tests across the contract

37 Extensions, every method gated

0 Allowed unhardened imports (CI-enforced)

Subprocess + fetch wrappers. Allowlisted binaries, no-shell, RFC1918 outbound guard. extensions/cloud-utils/safe-process.ts · safe-fetch.ts · 56 tests
Webhook HMAC + replay guard. Slack / Meta / generic verifiers; TTL + LRU dedupe per (scope, eventId); ±5-min clock skew window. extensions/cloud-utils/webhook-hmac · webhook-replay-guard · 74 tests
Prompt injection guard. OWASP LLM-01 detector + neutraliser, model-allowlist policy. extensions/llm-task/src/prompt-guard.ts · 81 tests
Audit redactor. 12-pattern secret mask with hash-pin before any record reaches a sink. extensions/llm-task/src/audit-redactor.ts
Tenant-scope filters. Knowledge-graph, memory-core, and incident-view all enforce tenant scope at the query layer. extensions/knowledge-graph/src/iql-guard.ts · 89 tests in Phase 12
Static contract scanner. CI fails the build if any extension imports raw child_process, fetch, or unsigned audit sinks. scripts/audit-extensions.ts

Canonical spec: extensions/ENTERPRISE-CONTRACT.md.
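As one illustration of the contract's building blocks, here is a sketch of a webhook verifier with a replay guard. The ±5-minute skew window and the (scope, eventId) dedupe key come from the list above. The signing string (timestamp, a dot, then the body), the function names, and the oldest-first eviction, which is a simplification of LRU, are assumptions rather than the cloud-utils implementation.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch only: the signing string format and all names are hypothetical.
const SKEW_MS = 5 * 60 * 1000;

function signature(secret: string, timestampMs: number, body: string): string {
  return createHmac("sha256", secret).update(`${timestampMs}.${body}`).digest("hex");
}

function verifyWebhook(
  secret: string,
  timestampMs: number,
  body: string,
  signatureHex: string,
  nowMs: number,
): boolean {
  if (Math.abs(nowMs - timestampMs) > SKEW_MS) return false; // outside the skew window
  const expected = Buffer.from(signature(secret, timestampMs, body), "hex");
  const given = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check length first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}

// Returns a function that reports whether (scope, eventId) is new within the TTL.
function createReplayGuard(ttlMs: number, maxEntries: number) {
  const expiryByKey = new Map<string, number>();
  return function firstSeen(scope: string, eventId: string, nowMs: number): boolean {
    const key = JSON.stringify([scope, eventId]);
    const expiry = expiryByKey.get(key);
    if (expiry !== undefined && expiry > nowMs) return false; // replay inside the TTL
    expiryByKey.delete(key);
    expiryByKey.set(key, nowMs + ttlMs);
    if (expiryByKey.size > maxEntries) {
      // Map preserves insertion order, so the first key is the oldest entry.
      const oldest = expiryByKey.keys().next().value;
      if (oldest !== undefined) expiryByKey.delete(oldest);
    }
    return true;
  };
}
```

A caller would run verifyWebhook first and consult the replay guard only for requests whose signature checks out, so unauthenticated traffic cannot fill the dedupe table.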

Formal verification

On the highest-risk paths we do not just test the gate; we prove it. Gateway exposure, nodes.run approval, pairing-store TTL, ingress gating, session isolation — each ships a TLA+/TLC model that machine-checks the security property across every interleaving of states the model admits. Beside it, we ship the same model with the property deliberately broken, so the suite produces a counterexample trace — evidence that if the property ever regressed, we would catch it. Most vendors do not write these. We do.

# Green — the property holds under all interleavings
make gateway-exposure-v2
make nodes-pipeline
make approvals-token
make pairing
make routing-isolation

# Negative — the suite catches the bug
make gateway-exposure-v2-negative   # → counterexample trace
make nodes-pipeline-negative
make approvals-token-negative

Results are bounded by the state space TLC explores; green does not imply security beyond the modelled assumptions and bounds. Drift between model and TypeScript implementation is possible — the models are an attacker-driven security regression suite, not a whole-system proof.

Models live at github.com/vignesh07/espada-formal-models; documentation and Make targets at docs.espada.dev/security/formal-verification.

Compliance posture

When your auditor lands on the AC-2, AU-3, IA-2 control assessment, the cells already have answers — and a file path and a passing test count next to each. Espada is designed against NIST SP 800-53 Rev 5 at the FedRAMP Moderate baseline. The ledger below is a summary; every row is regenerable from source. The work is done. The evidence is reproducible.

AC — Access Control: Account lifecycle via SCIM. Deny-by-default RBAC. Tenant / project / environment scope hierarchy. Break-glass for emergency elevated access. Concurrent session control. Wildcard scopes structurally not expressible. Implemented · 35 + 36 + 27 + 28 tests
AU — Audit & Accountability: Hash-chained Ed25519-signed records. SIEM export (Splunk, Sentinel, CEF). UTC microsecond timestamps. Fail-closed on audit-append failure. Implemented · 29 + 68 tests
IA — Identification & Authentication: OIDC, SAML, MFA (TOTP + WebAuthn), bearer tokens. Per-IP and per-account rate limit with exponential backoff. Implemented · 23 + 12 + 40 tests
SC — System & Communications: TLS 1.2+ ingress. mTLS on cluster links. SCIM payload caps. Distributed lock with fencing tokens. SQLite migration lock self-reclaim. Implemented · 21 + 16 tests
SI — System & Information Integrity: Schema migrations checksum-pinned. Central error formatter strips stacks. Static contract scanner gates CI. Bounded-cardinality Prometheus labels. Implemented · 13 + 19 tests
IR — Incident Response: Hash-chained incident state machine. Transition guard captures principal, classification, reason. Rollback checkpoint log monotonic. Implemented · Phase 12

Full row-by-row mapping (Implemented / Partial / Inherited / Planned with evidence paths) is published at docs.espada.dev/security/compliance-nist-800-53. Last reviewed 2026-04-22.

Signing

The signing factor lives somewhere an attacker who owns the runtime cannot reach. Pick the form that fits your operation; the cryptographic guarantee is the same.

WebAuthn / FIDO2 hardware tokens — YubiKey, Titan, Solo.
SSH host or user key, with a hardware-backed agent.
GPG signature, for offline and air-gapped approvals.
Two-of-three quorum sign-off, for the highest-risk actions.
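The quorum option reduces to counting distinct authorised signers. A sketch, assuming each signature has already been cryptographically verified and mapped to a signer identity; quorumMet and its parameters are hypothetical names.

```typescript
// Sketch only: signature verification is assumed to have happened upstream.
function quorumMet(
  verifiedSigners: string[],
  authorised: ReadonlySet<string>,
  threshold: number,
): boolean {
  // Count each authorised signer once, however many times they signed.
  const distinct = new Set(verifiedSigners.filter((s) => authorised.has(s)));
  return distinct.size >= threshold;
}
```

Deduplicating by identity matters: without it, one key signing twice would satisfy a two-of-three policy.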

Disclosure policy

Report security issues to security@espadafirewall.com. We aim to acknowledge within 24 hours and publish a fix or a public CVE within 30 days. The full policy is on GitHub.

Standards alignment

Espada is designed against OWASP LLM Top 10, specifically LLM-01 (prompt injection) and LLM-08 (excessive agency); NIST AI RMF 1.0 GOVERN-1, MAP-3, and MANAGE-2; FedRAMP Moderate against NIST SP 800-53 Rev 5; and ISO/IEC 27034-1 together with STRIDE v2 for the threat model. The frameworks are not the point; the controls are. The frameworks just say so.

THREAT MODEL / 2025-12-22T00:00Z / sha256:9d72e3…f041a8
