SR 11-7 Compliance for AI Agents: a practical framework.
Twenty-four pages on how to apply the Federal Reserve's SR 11-7 model risk management guidance to the generative AI agents your business units are now deploying. Written for Chief Model Risk Officers and their teams.
Written by Ashish K. Saxena · Founder, Caventia
What the SR 11-7 framework asks, and why generative AI breaks it.
SR 11-7 was written in 2011 for a world of credit scorecards, fraud rules, and stress-test econometrics. Its three pillars — conceptual soundness, ongoing monitoring, and outcomes analysis — were designed around models you could specify, freeze, and reason about as mathematical objects. The generative AI agent your business unit is piloting today meets none of those assumptions, and your examiner has noticed.
The first place SR 11-7 strains is conceptual soundness. The guidance asks for a clear statement of what the model is supposed to do, what data it was trained on, and what its expected error modes are. For a foundation model invoked through prompts, those statements either become trivially true (“it processes natural language”) or impossible to produce (“we cannot enumerate the training data”). Neither answer satisfies the examiner.
The second place it strains is ongoing monitoring. SR 11-7 expects the model risk function to detect drift, retrain on schedule, and document each change. With prompt-based agents the change cadence is daily, the “retraining” is a prompt rewrite, and the population is shifting under your feet because user behaviour adapts to the agent. The bank needs a monitoring substrate that captures every prompt-version, every feature snapshot, and every outcome — and produces drift evidence on the schema your examiner reads.
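The record that monitoring substrate needs can be sketched as a single immutable row per agent invocation. This is an illustrative Python sketch only, not the whitepaper's schema; every field name, the agent ID, and the hashing choice are assumptions made for the example:

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class AgentAuditRecord:
    """One log row per agent invocation, keyed to the exact prompt
    version and input snapshot so a later reconstruction can replay
    the decision. Field names are illustrative, not prescriptive."""
    agent_id: str
    prompt_version: str      # e.g. a version identifier for the prompt template
    feature_snapshot: dict   # the inputs exactly as the agent saw them
    outcome: str             # the agent's final answer or action
    timestamp: str           # UTC, ISO 8601

    def record_hash(self) -> str:
        # Content-address the record so post-hoc tampering is detectable.
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

# Hypothetical invocation of a hypothetical agent:
record = AgentAuditRecord(
    agent_id="credit-memo-agent",
    prompt_version="2024-06-03-rev4",
    feature_snapshot={"applicant_segment": "SME", "requested_limit": 250_000},
    outcome="escalate_to_human",
    timestamp=datetime.now(timezone.utc).isoformat(),
)
print(record.record_hash())
```

Because each row pins the prompt version alongside the inputs and outcome, a prompt rewrite shows up in the log the same way a retrain would in a classical model inventory, which is what makes drift evidence reconstructible on an examiner's schedule.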
The third place it strains is independent validation. SR 11-7 requires a validation function, independent of the model's developers, that can opine on the model's fitness. The standard answer, hiring a Big 4 firm, is a six-month, half-million-dollar engagement that scales poorly when your bank has fifty agents under review. A productized validator network, bonded and trained on a common evidence model, is the only economically viable answer at scale.
The remainder of this whitepaper proposes a concrete framework for each pillar, with templates your model risk function can adopt this quarter. [Download to continue reading.]
Five sections, twenty-four pages, three appendices.
I. Where SR 11-7 strains
Pillar-by-pillar diagnosis of where the 2011 guidance breaks under generative AI workloads.
II. The conceptual-soundness rewrite
How to write a model documentation pack that satisfies an OCC examiner for a prompt-based agent.
III. Monitoring substrate
The minimum data structure for runtime audit logs that survive an SR 11-7 reconstruction.
IV. Independent validation, productized
A framework for a bonded validator network and the economics of moving from $500K engagements to $40K cycles.
V. Implementation checklist
A page-by-page checklist your model risk function can adapt this quarter.
App. Templates and appendices
Model documentation template, drift detection schema, examiner briefing one-pager.