Federal AI Governance

High-Impact AI Classification: The Federal Risk Assessment Framework Under M-25-21

17 min read · Updated May 2, 2026

Bottom Line Up Front

High-impact AI classification under OMB M-25-21 applies to systems whose outputs materially affect individual rights, access to government services, personal safety, or sensitive federal resources. The AI governance board conducts the classification analysis. The Chief AI Officer (CAIO) approves the determination. Classified systems require pre-deployment testing, ongoing performance monitoring, human oversight mechanisms, and decision chain documentation before deployment.

How many of your agency’s AI systems qualify for high-impact AI classification under OMB M-25-21? Not the number you reported in last year’s use case inventory under M-24-10. The number that actually qualifies today, under the single standard the Office of Management and Budget (OMB) put in place with M-25-21 on April 3, 2025.

Most agencies have not answered that question with precision. They migrated their existing M-24-10 risk classifications forward, relabeled them, and called it compliant. That approach misses what M-25-21 actually changed. The new framework does not refine the old categories. It replaces them with a materially different analytical structure, and the classification process is where agencies either build durable governance or inherit the same accountability gaps they have been carrying for years.

The classification decision triggers every downstream requirement: pre-deployment testing, performance monitoring, human oversight documentation, and the public inventory entry that auditors, oversight bodies, and OMB will review. Get the classification wrong, and every governance structure built on top of it is misaligned. The four domains and the outcome proximity test are the foundation. Both require closer analysis than most agencies have applied.

Classify an AI system as high-impact by applying the outcome proximity test: does the system’s output materially determine a consequential action, or is it one input among several that a human evaluates independently? Run each system through four domains (individual rights, government services access, personal safety, sensitive federal resources). One domain match triggers high-impact status. The AI governance board documents the analysis, and the CAIO approves the determination in writing.

What “Materially Affect” Means for High-Impact AI Classification

The operative threshold in M-25-21 is “materially affect.” That phrase carries more analytical weight than most agencies are giving it. An AI system does not become high-impact because it operates in a sensitive domain. It becomes high-impact because its outputs, in the normal course of operations, produce consequential outcomes for the people or resources in that domain without a substantive human check between the AI output and the action taken.

The Outcome Proximity Test

The practical classification test is outcome proximity: how close is the AI output to the final decision or action? An AI system that screens benefits eligibility and produces a determination a caseworker accepts without independent review is high-impact. The AI output is, functionally, the decision. An AI system that summarizes regulations for an analyst who then exercises independent judgment is not. The AI output is one input among several.

The test requires looking at actual operational behavior, not intended design. A system designed as a decision-support tool often functions as a decision-making tool in practice. Case volume, staffing ratios, and organizational culture can all compress the intended human check into a perfunctory review. Classification based on design intent, without examining operational reality, produces systematically incorrect results.

Material Versus Informational Effects

Not every AI system that touches a regulated domain qualifies as high-impact. M-25-21 draws the line at material effect, which excludes purely informational applications. An AI chatbot that answers questions about federal program eligibility criteria is informational. An AI system that determines whether a specific applicant meets those criteria is material. The distinction is whether the system characterizes the world in general or determines outcomes for individuals.

That line is meaningful in practice because informational AI systems are governed under M-25-21’s streamlined framework. They still require governance, but the documentation, testing, and oversight obligations are calibrated to a lower risk profile. Misclassifying a material system as informational does not reduce the compliance obligation. It just means the obligation goes unmet.

The audit fix. For each AI system under review, document three facts: what the system’s output is, who receives that output, and what typically happens between output delivery and the consequential action. If the answer to the third question is “nothing substantial,” treat the system as a high-impact classification candidate and proceed through the full four-domain analysis.
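
A minimal sketch of that three-fact record, in Python. The field names, the candidate-flag heuristic, and the example system are illustrative assumptions, not M-25-21 terminology.

```python
from dataclasses import dataclass

@dataclass
class OutputAuditRecord:
    """One record per AI system under review."""
    system_name: str
    output_description: str   # fact 1: what the system's output is
    output_recipient: str     # fact 2: who receives that output
    intervening_review: str   # fact 3: what happens between output and action

    def is_high_impact_candidate(self) -> bool:
        # Flag the system for the full four-domain analysis when nothing
        # substantial sits between the AI output and the consequential action.
        return self.intervening_review.strip().lower() in {
            "", "none", "nothing substantial",
        }

record = OutputAuditRecord(
    system_name="benefits-eligibility-screener",
    output_description="per-applicant eligibility determination",
    output_recipient="caseworker",
    intervening_review="nothing substantial",
)
print(record.is_high_impact_candidate())  # True -> run the four-domain analysis
```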

The Four Domains of High-Impact AI Classification

M-25-21 specifies four classification domains. Each domain represents a category of consequential federal activity where AI outputs can materially affect people or resources. A system qualifies as high-impact if it materially affects any one of the four. The domains are not a checklist where multiple boxes must be checked. A single domain match triggers high-impact classification.

Domain 1: Individual Rights

The individual rights domain covers AI systems whose outputs affect the legal rights, civil liberties, or constitutional protections of persons. This domain applies regardless of whether the person is a federal employee, contractor, benefit applicant, or member of the public interacting with a federal program.

Concrete examples include systems that make or inform determinations about law enforcement stops, surveillance authorization, background check outcomes, employment eligibility, or immigration status. The common thread is that the AI output affects a legal standing that the individual has a recognized right to contest or protect. The National Institute of Standards and Technology (NIST) AI Risk Management Framework (AI RMF) identifies individual rights applications as among the highest-stakes categories for AI risk management, warranting governance structures proportionate to the severity of potential harm.

Domain 2: Access to Government Services

The access to services domain covers AI systems that affect whether individuals receive federal benefits, program participation, or government-administered support. Benefits eligibility screening, grant application scoring, healthcare program enrollment determinations, and social services case management fall into this category when the AI output materially influences the access decision.

This domain carries particular weight because denial of access often produces cascading harm. A wrongful benefits denial does not simply delay a payment. It may prevent medical care, housing stability, or other outcomes that compound over time. Classification decisions in this domain should weigh not just the probability of error but the severity of the downstream consequence if an error occurs.

Domain 3: Personal Safety

The personal safety domain covers AI systems whose outputs affect physical safety outcomes for individuals. Emergency response dispatch systems, clinical decision-support tools used in federal healthcare facilities, critical infrastructure monitoring, and hazard assessment platforms all qualify when AI outputs materially influence safety-critical decisions.

The safety domain often involves the highest consequence scenarios in an agency’s AI portfolio. An incorrect output does not produce a document error or an administrative delay. It produces a physical outcome. NIST AI RMF treats safety-critical AI as requiring the most rigorous pre-deployment testing protocols, including adversarial testing, failure mode analysis, and defined performance thresholds with mandatory human escalation when thresholds are not met.

Domain 4: Sensitive Federal Resources

The sensitive federal resources domain covers AI systems that manage, protect, or make decisions about federal assets that would cause significant harm if compromised, mismanaged, or improperly disclosed. This domain includes AI systems in cybersecurity operations centers, classified information management, national security analysis, and critical infrastructure protection.

The resource domain is the one most likely to be underclassified. Agencies sometimes treat cybersecurity AI tools as purely technical systems rather than governance-relevant AI, and route them outside the classification process entirely. M-25-21 does not support that interpretation. Any AI system making material decisions about federal resource protection qualifies for the domain analysis.

The audit fix. Run each AI system through the four-domain sequence in writing. Document which domains apply, which do not, and the specific reason. A system that clears all four domains without triggering any does not require high-impact governance under M-25-21. A system that triggers one domain requires the full governance framework. Keep this analysis in the system’s official record for AI governance board review and CAIO approval.
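
One way to keep that written analysis uniform across systems is a structured record per domain. A sketch under assumed field names; the domain labels follow this article, and the methods encode the two rules above: document all four domains, and treat a single match as high-impact.

```python
from dataclasses import dataclass, field

DOMAINS = (
    "individual_rights",
    "access_to_services",
    "personal_safety",
    "sensitive_federal_resources",
)

@dataclass
class DomainFinding:
    domain: str
    applies: bool
    reason: str  # the specific documented reason, applicable or not

@dataclass
class FourDomainAnalysis:
    system_name: str
    findings: list[DomainFinding] = field(default_factory=list)

    def is_complete(self) -> bool:
        # The audit fix calls for documenting all four domains in writing.
        return {f.domain for f in self.findings} == set(DOMAINS)

    def classification(self) -> str:
        # A single domain match triggers high-impact status.
        if any(f.applies for f in self.findings):
            return "high-impact"
        return "non-high-impact"
```

A record like this gives the governance board a uniform artifact to review and the CAIO a concrete determination to approve.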

How the Four Domains Improve on M-24-10

The four-domain structure in M-25-21 is not a simplified version of the M-24-10 framework. It is a more operationally precise one. M-24-10’s bifurcated categories invited definitional disputes that delayed governance action. M-25-21 routes every system through the same outcome proximity test against the same four domains. The point is speed to accurate classification, not reduced accountability.

Governance Requirements for High-Impact AI Systems

Classification as high-impact triggers four governance requirements under M-25-21, each of which must be documented before the system deploys or, for existing systems, before the next use case inventory submission. These are not aspirational standards. They are conditions for lawful deployment of systems in the high-impact category.

Pre-Deployment Testing

High-impact AI systems require pre-deployment testing that validates system performance against documented requirements before the system goes into production use. The testing record must demonstrate that the system performs as intended across the population of inputs it will encounter in operation, including edge cases and adversarial inputs where relevant.

NIST AI RMF provides the methodology for structuring pre-deployment testing plans. The framework calls for documenting the test population, the performance metrics, the acceptable thresholds, and the outcome of testing against each threshold. A testing record that documents only that testing occurred, without recording what was tested, how it was measured, and what the results were, does not satisfy the M-25-21 requirement.
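
That structure maps naturally onto a per-metric test record. A sketch with assumed names; it treats every metric as lower-is-better (an error rate), which is an assumption the real testing plan would state per metric.

```python
from dataclasses import dataclass

@dataclass
class PreDeploymentTest:
    metric: str           # what was measured
    test_population: str  # what was tested, including edge cases
    threshold: float      # the documented acceptable threshold
    result: float         # the measured outcome

    @property
    def passed(self) -> bool:
        # Assumes lower-is-better metrics such as error rates.
        return self.result <= self.threshold

tests = [
    PreDeploymentTest(
        metric="false denial rate",
        test_population="12 months of historical applications plus edge cases",
        threshold=0.02,
        result=0.013,
    ),
]
print(f"cleared for deployment: {all(t.passed for t in tests)}")
```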

Ongoing Performance Monitoring

High-impact AI systems require ongoing performance monitoring after deployment. The monitoring obligation does not expire when a system passes pre-deployment testing. AI system performance can degrade as operational data distributions shift, as the population of affected individuals changes, or as the system encounters inputs outside its training distribution.

Effective monitoring defines the metrics being tracked, the frequency of assessment, the threshold that triggers review, and the escalation path when performance falls below threshold. Monitoring logs should feed directly into the AI governance board’s review cycle. A system with pre-deployment testing records but no post-deployment monitoring plan has satisfied half the requirement.
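
A monitoring plan reduces to a small set of rules: the metric, the assessment frequency, the review threshold, and the escalation path. A sketch with hypothetical values; M-25-21 does not prescribe specific metrics or thresholds.

```python
from dataclasses import dataclass

@dataclass
class MonitoringRule:
    metric: str              # what is tracked post-deployment
    frequency: str           # how often it is assessed
    review_threshold: float  # the value that triggers review
    escalation_path: str     # who acts when the threshold is breached

    def check(self, observed: float) -> str | None:
        if observed > self.review_threshold:
            return f"escalate '{self.metric}' to {self.escalation_path}"
        return None  # within threshold; log and continue

rule = MonitoringRule(
    metric="error-rate drift",
    frequency="monthly",
    review_threshold=0.05,
    escalation_path="AI governance board",
)
print(rule.check(observed=0.08))  # breach -> escalation message
```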

Human Oversight Mechanisms

M-25-21 requires documented human oversight mechanisms for high-impact AI systems. The requirement is specific: the oversight must be a mechanism, not a policy statement. That means a defined process, assigned to a named role, with a documented trigger for when human review activates and a documented record of reviews conducted.

Human oversight that exists only on paper does not satisfy this requirement. The governance board’s review of a system’s human oversight mechanism should include evidence of actual reviews, not just a description of the review process. NIST AI RMF treats human oversight as one of the primary controls for managing AI risk in high-stakes applications. Where the oversight record is thin, auditors will treat it as a gap.
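
The difference between a mechanism and a policy statement is checkable: a named role, a documented trigger, and review records that actually exist. A sketch of that check; the 90-day recency window and the example trigger are illustrative assumptions, not figures from M-25-21.

```python
from datetime import date, timedelta

def oversight_mechanism_is_live(
    named_role: str,
    review_trigger: str,
    review_dates: list[date],
    window_days: int = 90,  # assumed recency window
) -> bool:
    """True only when the mechanism is assigned to a role, has a documented
    trigger, and shows at least one review conducted within the window."""
    if not named_role or not review_trigger:
        return False
    cutoff = date.today() - timedelta(days=window_days)
    return any(d >= cutoff for d in review_dates)

print(oversight_mechanism_is_live(
    named_role="program integrity supervisor",
    review_trigger="model confidence below 0.80 or applicant appeal",
    review_dates=[date(2026, 4, 14), date(2026, 3, 2)],
))
```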

Decision Chain Documentation

High-impact AI systems require documentation of the system’s role in the decision chain. That documentation describes how the AI output connects to the downstream action, who receives the output, what authority that person has to act on or override it, and what record is created of the decision.

Decision chain documentation serves two functions. First, it makes the classification analysis auditable: the board and the CAIO can verify that the system’s operational role aligns with its classification. Second, it creates the accountability record needed when a high-impact decision is later reviewed, challenged, or subject to oversight inquiry. Gaps in the decision chain record are among the most common findings in federal AI governance audits.

The audit fix. Build a governance package for each high-impact AI system that contains four documents: the pre-deployment testing record, the ongoing monitoring plan with current performance data, the human oversight mechanism description with review logs, and the decision chain narrative. If any of the four documents does not exist or has not been updated in the past 12 months, treat that system as carrying a material governance gap. Bring it to the AI governance board for remediation before the next inventory submission.
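
That audit fix amounts to a completeness-and-currency check over four documents. A sketch, assuming each document is tracked by its last-updated date; the 12-month rule comes from the paragraph above, and the document keys are illustrative.

```python
from datetime import date, timedelta

REQUIRED_DOCS = (
    "pre_deployment_testing_record",
    "ongoing_monitoring_plan",
    "human_oversight_mechanism_with_logs",
    "decision_chain_narrative",
)

def governance_gaps(last_updated: dict[str, date | None]) -> list[str]:
    """Return documents that are missing or stale (older than 12 months).
    Any non-empty result is a material governance gap for board remediation."""
    cutoff = date.today() - timedelta(days=365)
    return [
        doc for doc in REQUIRED_DOCS
        if last_updated.get(doc) is None or last_updated[doc] < cutoff
    ]
```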

Migrating from M-24-10 to M-25-21 Classification

Agencies that classified AI systems under M-24-10’s rights-impacting and safety-impacting categories need to migrate those classifications to the M-25-21 framework. The migration is not purely administrative. The new framework’s outcome proximity test and four-domain structure will produce different results for some systems, and identifying those differences is where the compliance risk lives.

Systems That May Change Classification Status

Three categories of systems warrant particular attention in the migration analysis. First, systems that were classified as rights-impacting or safety-impacting under M-24-10 but whose outputs are primarily informational. These systems may qualify for M-25-21’s streamlined framework if the outcome proximity test confirms they do not materially affect the relevant domain. The governance overhead reduction is real, but the reclassification must be documented.

Second, systems that were not classified as high-impact under M-24-10 but that operate in the sensitive federal resources domain. M-24-10’s framework did not give that domain equivalent weight. M-25-21 does. AI systems managing cybersecurity operations, classified data, or critical infrastructure that escaped M-24-10’s categories should go through a fresh four-domain analysis.

Third, systems that have materially changed in capability or operational scope since their M-24-10 classification. An AI tool that was a narrow decision-support assistant in 2024 and has since expanded to cover a broader population or a broader decision set needs to be reclassified under current operational facts, not the facts at original deployment.

The AI Governance Board’s Role in Migration

The AI governance board owns the classification decision. In the migration context, that means the board must formally review each existing system, apply the M-25-21 framework, and produce a documented classification determination. Systems that carried over from M-24-10 without a formal board review have not been classified under M-25-21. They have been assumed classified, which is a different and weaker position.

The CAIO approves classifications. That approval creates the official record and the accountability trail. If a system is later found to be misclassified, the inquiry will start with the classification record. Boards that produce thorough written analyses with explicit application of the four domains and the outcome proximity test create a defensible record. Boards that produce thin approvals do not.

The audit fix. Schedule a formal M-24-10 migration review session with the AI governance board. For each system previously classified under M-24-10, the board should produce a written analysis applying the M-25-21 four-domain framework and the outcome proximity test. The CAIO should approve or reject each determination in writing. Complete this review before the public AI use case inventory submission. The inventory reflects your current classification status; the board review creates the record supporting it.
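
A migration review record can make the "assumed classified" gap visible. A sketch with assumed field names; the defensibility test encodes the two requirements above, a written board analysis plus a dated CAIO approval.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class MigrationDetermination:
    system_name: str
    m24_10_category: str        # e.g. "rights-impacting", "safety-impacting", "none"
    m25_21_classification: str  # "high-impact" or "non-high-impact"
    board_analysis: str         # written four-domain + outcome proximity analysis
    caio_approval_date: date | None

    def is_defensible(self) -> bool:
        # A carried-over label with no written analysis or dated approval
        # is assumed classified, not classified under M-25-21.
        return bool(self.board_analysis.strip()) and self.caio_approval_date is not None
```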

| Classification Factor | High-Impact Indicator | Non-High-Impact Indicator | Evidence to Collect |
| --- | --- | --- | --- |
| Outcome proximity | AI output is effectively the decision; human review is perfunctory | AI output is one input; human exercises independent judgment | Workflow documentation, staffing ratios, review time logs |
| Individual rights domain | Output affects legal standing, civil liberties, or constitutional rights | Output provides general legal information without individual determination | Description of affected population, legal authority applied |
| Access to services domain | Output affects whether an individual receives federal benefits or program access | Output provides program information without eligibility determination | Decision authority documentation, program participation records |
| Personal safety domain | Output influences a physical safety decision in real time | Output informs safety research or planning without operational use | Operational context, integration with safety-critical systems |
| Sensitive federal resources domain | Output affects protection, access, or management of classified or critical federal assets | Output supports administrative functions with no direct resource access | Asset classification level, data access permissions, system integrations |
| Governance documentation | Pre-deployment testing, monitoring plan, human oversight log, decision chain narrative required | Standard inventory entry with use case description and basic risk documentation | All four documents for high-impact; use case inventory record for non-high-impact |
| Classification authority | AI governance board review and written determination required; CAIO approval required | AI governance board review required; CAIO approval required | Board meeting minutes, CAIO approval record, dated determination |

High-impact AI classification under M-25-21 is not a paperwork exercise. It is a risk allocation decision. Get the classification right, and the governance structure the agency builds is proportionate to actual risk. Get it wrong, and the agency either over-governs low-risk systems while under-governing high-risk ones, or carries material compliance gaps into the public inventory that oversight bodies will find. The four-domain analysis and the outcome proximity test are the tools. The AI governance board and CAIO are the accountable parties. Both need to produce written records that hold up under examination, not approvals that read like they were signed without analysis.

Frequently Asked Questions

What is the high-impact AI classification standard under M-25-21?

High-impact AI classification under M-25-21 applies to systems whose outputs materially affect individual rights, access to government services, personal safety, or sensitive federal resources. The AI governance board conducts the analysis and the CAIO approves the determination. Systems that meet this standard require pre-deployment testing, ongoing performance monitoring, human oversight mechanisms, and decision chain documentation.

How does M-25-21 differ from M-24-10 on AI classification?

M-24-10 used two separate categories: rights-impacting AI and safety-impacting AI. M-25-21 replaces both with a single standard: high-impact AI. The new framework applies a single outcome proximity test against four defined domains rather than routing systems into separate category-specific checklists. Some systems classified under M-24-10 will change classification under M-25-21, in either direction.

Who has authority to classify an AI system as high-impact?

The AI governance board conducts the classification review and produces the written determination. The CAIO approves the classification. Both steps are required; neither alone satisfies the framework. The approval record should document the board’s analysis, the domains reviewed, the outcome proximity determination, and the CAIO’s formal approval with date.

What governance requirements apply to high-impact AI systems?

M-25-21 requires four elements for high-impact AI systems: pre-deployment testing that validates performance before production use, ongoing performance monitoring with defined thresholds, a documented human oversight mechanism with evidence of actual reviews, and a decision chain narrative describing how the AI output connects to the consequential action. All four must be documented before deployment or before the next inventory submission for existing systems.

Does a non-high-impact AI system require any governance under M-25-21?

Non-high-impact AI systems still require governance under M-25-21, but the requirements are scaled to the lower risk profile. At minimum, non-high-impact systems must appear in the annual AI use case inventory with a documented classification basis. The AI governance board must still review the classification. The pre-deployment testing, performance monitoring, and human oversight documentation requirements that apply to high-impact systems do not automatically apply to non-high-impact systems at the same depth.

How does the outcome proximity test work in practice?

The outcome proximity test asks how close the AI output is to the consequential action. An AI system whose output a human decision-maker accepts without substantive independent review is high-impact regardless of how the system was designed. Agencies should examine actual operational behavior, not intended design, because case volume, staffing, and organizational culture regularly compress intended human review into a perfunctory step.

How does high-impact AI classification feed into the public AI use case inventory?

The annual AI use case inventory required under M-25-21 reflects each system’s current classification status. High-impact systems must appear with their governance documentation records current. The inventory is a public accountability document; OMB and oversight bodies use it to assess agency compliance. Classification decisions made by the governance board and approved by the CAIO become the official basis for each inventory entry.

What role does NIST AI RMF play in M-25-21 classification?

NIST AI RMF provides the risk assessment methodology that agencies apply to M-25-21 classification. The framework’s MAP function structures the context assessment for determining whether a system’s outputs materially affect the four domains. The MANAGE function governs the ongoing monitoring and human oversight requirements for classified systems. M-25-21 does not mandate NIST AI RMF by name, but the framework is the recognized federal standard for AI risk assessment that operationalizes the M-25-21 governance requirements.

Subscribe to The Authority Brief for next week’s analysis.

Discipline in preparation. Confidence in the room.

Josef Kamara
CPA · CISSP · CISA · Security+

Former KPMG and BDO. Senior manager over third-party risk attestations and IT audits at a top-five global firm, and former technology risk leader directing the IT audit function at a Fortune 500 medical technology company. Advises growth-stage SaaS companies on SOC 2, HIPAA, and AI governance certifications.

The Authority Brief

One compliance analysis per week from Josef Kamara, CPA, CISSP, CISA. Federal and private compliance, written for practitioners.