Model Risk Management Checklist for AI


This checklist extends SR 11-7 and SS1/23 to cover ML and GenAI models. Use it as an intake gate before model development begins, and again as a pre-deployment validation scope document. Tick items as you confirm them with your model risk team.

1 · Intake & classification

2 · Data & documentation

3 · Development & testing

4 · Independent validation

5 · Deployment & monitoring

6 · Generative AI extensions


SR 11-7 & SS1/23 section references

This checklist is aligned with the Federal Reserve's SR 11-7 and the Bank of England PRA's SS1/23, both of which set supervisory expectations for model risk management. Key cross-references:

SR 11-7 / SS1/23 Principle	Checklist Sections	Key Requirements
Governance	Section 1 (Intake & classification): risk tier, owner, approval	Board & executive accountability; independent oversight; risk appetite defined
Model Development	Section 3 (Development & testing): methodology, testing, explainability	Documented process; rigorous testing; version control; reproducibility
Model Validation	Section 4 (Independent validation): independent validation, sign-off	Pre-deployment validation; conceptual, data, performance soundness; exception tracking
Data Governance	Section 2 (Data & documentation): data quality, PII, lineage, bias	Data quality standards; PII protection; representativeness; fairness testing
Monitoring & Reporting	Section 5 (Deployment & monitoring): SLOs, drift detection, alerting, revalidation	Real-time monitoring; performance tracking; drift detection; escalation procedures; periodic revalidation
Documentation	All sections, especially Sections 1, 3 and 5	Model inventory, purpose, methodology, testing results, deployment plan, monitoring metrics
GenAI-specific (SS1/23)	Section 6 (Generative AI extensions): foundation model, prompts, hallucination, human-in-the-loop	Model & vendor assessment; prompt governance; grounding; hallucination controls; human oversight

Validation scope template (for Tier 1 & 2 models)

For Tier 1 & 2 models, use this template to scope independent validation before deployment:

Validation Area	Scope for Tier 1	Scope for Tier 2	Owner / Timeline
Conceptual Soundness	Full review of business case, model choice, design	Review of model design & business rationale	Model Risk / 2–3 weeks
Data Review	Full data audit: sources, quality, bias, representativeness	Sampling of data quality & bias testing	Model Risk / 1–2 weeks
Performance Testing	Backtesting on holdout set; stress testing; fairness testing all cohorts	Backtesting; fairness testing on main cohorts	Model Risk / 1–2 weeks
Robustness & Stress	Adversarial testing, edge cases, degraded data scenarios	Key edge case testing	Model Risk / 1 week
Implementation	Code review, integration testing, production readiness	Code review, basic integration testing	Tech Risk / 1 week
Explainability	Full explainability testing; SHAP/LIME plots; comparison models	Explainability for top features; simpler challenger	Model Risk / 1 week
Governance	Full validation report; sign-off; condition documentation	Validation report; sign-off	Model Risk / 1 week
Timeline Total	6–8 weeks (typical for Tier 1, with some areas run in parallel)	3–4 weeks (typical for Tier 2)	Plan accordingly in roadmap
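The "fairness testing all cohorts" row can be made concrete with a small sketch. The checklist does not prescribe a fairness metric, so the one below is an assumption: the disparate impact ratio (each cohort's approval rate divided by the highest cohort's rate), with 0.8 as a placeholder threshold taken from the commonly cited "four-fifths" rule of thumb. The function names are hypothetical.

```python
from collections import defaultdict


def approval_rates(records):
    """records: iterable of (cohort, approved) pairs -> {cohort: approval rate}."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for cohort, ok in records:
        totals[cohort] += 1
        approved[cohort] += int(bool(ok))
    return {c: approved[c] / totals[c] for c in totals}


def disparate_impact(records, threshold=0.8):
    """Return {cohort: (ratio_to_best_cohort, passes_threshold)}.

    threshold=0.8 is an assumed default, not a value from the checklist.
    """
    rates = approval_rates(records)
    best = max(rates.values())
    if best == 0:
        # No approvals in any cohort: the ratio is undefined, report parity.
        return {c: (1.0, True) for c in rates}
    return {c: (r / best, r / best >= threshold) for c, r in rates.items()}


def fairness_gate(records, threshold=0.8):
    """Go/no-go: True only if every cohort meets the threshold."""
    return all(ok for _, ok in disparate_impact(records, threshold).values())
```

For Tier 1 this would run over every cohort in the holdout set; for Tier 2, over the main cohorts only, matching the scope column above.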

Model inventory template (minimum fields)

Your central AI/ML inventory should track at minimum these fields for each model:

Field	Description	Example	Owner
Model ID	Unique identifier for the model	LOAN_SCORING_V2.1	Tech/Data
Business Name	Non-technical name for the model	Mortgage Pre-Approval Scorer	Business
Business Owner	Executive accountable for the model	Head of Retail Lending	Business
Model Owner (Tech)	Person responsible for day-to-day operation	[model-owner@yourbank.com]	Data Science
Risk Owner	Executive accountable for risk management	Chief Risk Officer / Head of Model Risk	Risk
Risk Tier	1 (Critical), 2 (High), 3 (Moderate), or 4 (Low)	1	Risk
Regulatory Classification	EU AI Act (prohibited/high-risk/limited/minimal), SR 11-7, GDPR ADM, Fair Lending	High-risk (EU AI Act), SR 11-7, GDPR ADM	Compliance
Model Type	Traditional ML, DL, GenAI, ensemble, prompt engineering	Logistic Regression + XGBoost ensemble	Data Science
Status	Development, Validation, Production, Retired	Production	Data Science
Production Date	When model went live	2025-06-15	Data Science
Last Revalidation Date	Most recent validation/revalidation completed	2026-03-20	Risk
Next Revalidation Due	Scheduled revalidation date by risk tier	2026-06-20 (quarterly for Tier 1)	Risk
Monitoring Status	Active, Alert, Suspended, Decommissioning	Active	Data Science
Fairness Testing Completed	Y/N and date of last fairness audit	Y (2026-03-15)	Risk
Explainability Available	Y/N; explanation method	Y (SHAP, LIME)	Data Science
Data Governance Owner	Team responsible for training data quality	Data Platform team	Data
Foundation Model (if GenAI)	OpenAI GPT-4, Anthropic Claude, etc.	OpenAI GPT-4 Turbo	Data Science
Vendor/Third-party	If model or foundation model from external vendor	OpenAI (or N/A)	Vendor Risk
Audit Trail / Documentation	Link to model card, validation report, incident log	SharePoint link to MRM documentation folder	Risk
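As a rough illustration, a subset of the fields above could be held in a typed record with a basic completeness check. This is a minimal sketch, not a reference schema: the class name, attribute names, and validation rules are assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Status(Enum):
    DEVELOPMENT = "Development"
    VALIDATION = "Validation"
    PRODUCTION = "Production"
    RETIRED = "Retired"


@dataclass
class ModelRecord:
    """Hypothetical inventory entry covering a subset of the minimum fields."""
    model_id: str
    business_name: str
    business_owner: str
    model_owner: str
    risk_owner: str
    risk_tier: int  # 1 (Critical) to 4 (Low)
    status: Status
    production_date: Optional[date] = None
    last_revalidation: Optional[date] = None
    next_revalidation_due: Optional[date] = None
    foundation_model: Optional[str] = None  # only for GenAI models

    def problems(self) -> list[str]:
        """Return a list of inventory gaps; an empty list means none found."""
        issues = []
        if self.risk_tier not in (1, 2, 3, 4):
            issues.append("risk_tier must be 1-4")
        for name in ("model_id", "business_owner", "model_owner", "risk_owner"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is empty")
        # Assumed rule: anything live must carry its key dates.
        if self.status is Status.PRODUCTION:
            if self.production_date is None:
                issues.append("production model has no production_date")
            if self.next_revalidation_due is None:
                issues.append("production model has no next_revalidation_due")
        return issues
```

Running `problems()` across the whole inventory gives a simple input for the quarterly inventory audit.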

The most common failure: models deployed once and never reassessed. Automate revalidation triggers, schedule cadence in your inventory, and track compliance monthly. Silent degradation is the biggest risk in production AI.
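Scheduling the cadence can be as simple as deriving the due date from the last revalidation date and the risk tier. The sketch below uses only the standard library. Only the quarterly cadence for Tier 1 comes from the inventory example; the intervals for Tiers 2 to 4 and all function names are placeholder assumptions to replace with your own policy.

```python
import calendar
from datetime import date

# Months between revalidations per risk tier. Tier 1 (quarterly) follows the
# inventory example; the other values are assumptions for illustration.
CADENCE_MONTHS = {1: 3, 2: 6, 3: 12, 4: 24}


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of the month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def next_revalidation_due(last_revalidation: date, risk_tier: int) -> date:
    """Due date implied by the tier's cadence."""
    return add_months(last_revalidation, CADENCE_MONTHS[risk_tier])


def is_overdue(last_revalidation: date, risk_tier: int, today: date) -> bool:
    """True once today is past the due date (the due date itself is on time)."""
    return today > next_revalidation_due(last_revalidation, risk_tier)
```

With the inventory example, a Tier 1 model last revalidated on 2026-03-20 comes due on 2026-06-20. A monthly compliance report would then count the records for which `is_overdue` is true.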

Common Model Risk Management gaps

Watch out for these common failures in AI/ML model risk:

Validation bottleneck. Independent validation is understaffed; the model backlog grows; reviews become superficial. Fix: Tier models by risk so validation effort is not spent over-validating low-risk models; bring in external validators if needed.
Stale inventory. Model inventory built once but never updated; shadow AI untracked. Fix: Automate discovery; run quarterly inventory audits; reconcile the inventory against monitoring and spending data.
Silent failures. Models degrade in production but no one notices; fairness test failures go unaddressed. Fix: Real-time monitoring; SLOs & automatic alerting; escalation SLAs enforced.
Documentation debt. Models in production with no validation reports, testing results or explainability documentation. Fix: Templates & automation; gating on documentation quality before deployment approval.
Unfair outcomes ignored. Fairness testing shows disparate impact but model stays in production unchanged. Fix: Make fairness outcomes a go/no-go gate; tie business incentives to fairness SLOs.
No GenAI governance. LLMs and prompt engineering running wild outside MRM scope. Fix: Extend MRM policy to cover all generative AI; deploy enterprise GenAI gateway.
Revalidation never happens. Models deployed once and never reassessed; risk profile drifts silently. Fix: Automate revalidation triggers; schedule cadence in inventory; track compliance monthly.
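For the "silent failures" gap, one widely used drift signal is the Population Stability Index (PSI) between a reference sample (for example, validation-time scores) and recent production scores. The checklist does not name a drift metric, so this is an illustrative choice. The equal-width binning, the function names, and the 0.10 / 0.25 thresholds (common rules of thumb) are all assumptions.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between two non-empty numeric samples.

    Bins are equal-width over the range of `expected`; values of `actual`
    outside that range fall into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # constant reference sample: avoid division by zero

    def shares(values):
        counts = [0] * bins
        for v in values:
            i = int((v - lo) / width)
            counts[min(max(i, 0), bins - 1)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_status(value, warn=0.10, alert=0.25):
    """Map a PSI value to an escalation level (thresholds are assumptions)."""
    if value >= alert:
        return "alert"
    if value >= warn:
        return "warn"
    return "ok"
```

An "alert" result is a natural candidate for an automatic revalidation trigger, alongside the scheduled cadence.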
