AI Red Teaming Methodology: The OWASP + NIST + MITRE ATLAS Synthesis for Enterprise Programs

Bottom Line Up Front

AI red teaming is now governed by three canonical sources: OWASP Top 10 for Agentic Applications, NIST AI 600-1 plus the RMF Playbook, and MITRE ATLAS. None of them, on their own, gives you a program. GPT-5 was jailbroken within 24 hours of its August 2025 release. The discipline needs a synthesis. This article gives you the program design that maps OWASP risks to MITRE ATLAS tactics to NIST governance gates, plus the staffing model, the test-cycle cadence, and the four artifacts a board AI committee actually reads.

AI red teaming is now governed by three canonical sources: OWASP Top 10 for Agentic Applications, NIST AI 600-1 plus the Risk Management Framework Playbook, and MITRE ATLAS. None of them, on their own, gives you a program. OWASP describes the risks. NIST defines the governance gates. MITRE ATLAS catalogs the adversarial tactics. A security architect tasked with standing up an AI red team has to choose among a risk catalog, an organizational-governance frame, and an adversarial-tactics encyclopedia. Her chief information security officer wants a single program design, not three parallel reading lists.

The discipline needs a synthesis because the threat surface is moving faster than the standards. GPT-5 was jailbroken within 24 hours of its August 2025 release. The pattern matters more than the specific incident: every major model release in the past 18 months has been jailbroken inside 48 hours, and the speed of attack is accelerating while the speed of governance is not. Piecemeal guidance does not scale.

This article gives you the program design that maps OWASP risks to MITRE ATLAS tactics to NIST governance gates, plus the staffing model, the test-cycle cadence, and the four artifacts a board AI committee actually reads. It is written for the security architect whose CISO needs the program design before the next board cycle.

The Three Canonical Sources, Mapped

OWASP, NIST, and MITRE each contribute a layer to the AI red-teaming discipline. The layers are complementary, not competitive, and a program that uses all three avoids the gaps that any single source leaves open.

| Source | What It Provides | Most Recent Update | Layer in the Program |
|---|---|---|---|
| OWASP Top 10 for Agentic Applications | Catalog of risks specific to autonomous AI systems | December 2025 | Threat targets and prioritization |
| NIST AI 600-1 + RMF Playbook | Governance gates, documentation requirements, evaluation methodology | RMF Playbook updated March 27, 2026 | Governance and decision rights |
| MITRE ATLAS | Adversarial tactics, techniques, case studies, mitigations | October 2025: 14 new agent-focused techniques | Test execution and tactic library |

OWASP Top 10 for Agentic Applications, published December 2025, is the first canonical risk list specifically for autonomous AI systems. The Top 10 covers agent behavior hijacking, tool misuse, identity and privilege abuse, memory poisoning, goal subversion, and similar agentic-specific risks. Earlier OWASP work covered LLM-specific risks (prompt injection, training data poisoning, model denial of service); the Agentic Top 10 extends to autonomous decision-making.

NIST AI 600-1, the Generative AI Profile of the AI Risk Management Framework, was published in 2024 and provides 200-plus suggested actions across 12 generative AI risk categories. The NIST AI RMF Playbook, updated March 27, 2026, operationalizes those actions with specific guidance for testing, evaluation, validation, and verification. NIST is the source for the governance and documentation discipline, not for the specific attack tactics.

MITRE ATLAS is the adversarial tactics encyclopedia. The October 2025 update added 14 new techniques focused on AI agents and generative systems. ATLAS now catalogs 15 tactics, 66 techniques, 46 sub-techniques, 26 mitigations, and 33 case studies. ATLAS is structured like the older MITRE ATT&CK framework for cybersecurity but is purpose-built for AI systems. It is the source for the tactic library that the red team executes against.

The Program Design

The program design has six components. Each draws from at least two of the three canonical sources. The design assumes a mid-sized enterprise with one to three production AI systems and a planned scale to ten or more in the next 18 months.

Component One: Scope Definition (NIST + OWASP)

The scope defines which AI systems are subject to red teaming and at what cadence. NIST AI 600-1 governance discipline drives the inclusion criteria: any system that produces consequential outputs, processes sensitive data, or operates with delegated authority is in scope. OWASP Top 10 risk categories drive the priority within scope: agentic systems with tool access rank higher than passive content-generation systems, even when both are in scope.

The output is a scope register that lists every in-scope AI system, its risk tier (high, medium, or low) based on OWASP risk presence and impact level, and its red-team cadence (quarterly for high, semi-annually for medium, annually for low).
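The scope register can be kept as structured data so that the cadence follows mechanically from the tier. The following is a minimal sketch, not a prescribed schema: the field names and the two system names are hypothetical, and the tier is assumed to be assigned by the red-team lead rather than computed. Only the inclusion criteria and the tier-to-cadence mapping come from the text above.

```python
from dataclasses import dataclass

# Months between red-team cycles per risk tier, as stated above.
CADENCE_MONTHS = {"high": 3, "medium": 6, "low": 12}

@dataclass
class ScopeEntry:
    system: str                  # hypothetical system name
    consequential_outputs: bool
    sensitive_data: bool
    delegated_authority: bool
    tier: str                    # "high", "medium", or "low" (assigned by the lead)

    @property
    def in_scope(self) -> bool:
        # Any one of the three inclusion criteria puts the system in scope.
        return (self.consequential_outputs or self.sensitive_data
                or self.delegated_authority)

    @property
    def cadence_months(self) -> int:
        return CADENCE_MONTHS[self.tier]

register = [
    ScopeEntry("support-agent", True, True, True, "high"),
    ScopeEntry("marketing-copy-bot", False, False, False, "low"),
]
in_scope = [entry for entry in register if entry.in_scope]
for entry in in_scope:
    print(entry.system, entry.tier, f"every {entry.cadence_months} months")
```

Here only `support-agent` meets an inclusion criterion, so it is the only row that carries a cadence into the test calendar.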

Component Two: Threat Modeling (OWASP + MITRE ATLAS)

Threat modeling identifies which OWASP risks apply to each in-scope system and which MITRE ATLAS tactics the threat actor would plausibly use. A retrieval-augmented generation system answering customer questions has a different threat profile from an autonomous agent with tool access. The threat model produces a system-specific test plan rather than running every tactic against every system.

The output is a threat model per system that names the relevant OWASP risks, the relevant MITRE ATLAS tactics and techniques, and the test scenarios that exercise each.
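One way to record a per-system threat model is as a list of scenarios, each tied to one OWASP risk and one ATLAS technique. The sketch below is illustrative only: the system names and scenarios are invented, and the ATLAS entries are left as bracketed placeholders rather than real technique IDs, which should be looked up in the current ATLAS matrix.

```python
from dataclasses import dataclass, field

@dataclass
class Scenario:
    name: str
    owasp_risk: str        # risk label taken from the OWASP list
    atlas_technique: str   # placeholder; substitute the real ATLAS technique ID

@dataclass
class ThreatModel:
    system: str
    scenarios: list[Scenario] = field(default_factory=list)

    def risks(self) -> set[str]:
        return {s.owasp_risk for s in self.scenarios}

    def techniques(self) -> set[str]:
        return {s.atlas_technique for s in self.scenarios}

# Hypothetical systems with different threat profiles.
rag_bot = ThreatModel("customer-rag-bot", [
    Scenario("poisoned document in the retrieval corpus",
             "memory poisoning", "<ATLAS technique: RAG poisoning>"),
])
agent = ThreatModel("ops-agent", [
    Scenario("injected instruction triggers a tool call",
             "tool misuse", "<ATLAS technique: prompt injection>"),
    Scenario("agent reuses an over-broad credential",
             "identity and privilege abuse", "<ATLAS technique: credential abuse>"),
])
for model in (rag_bot, agent):
    print(model.system, sorted(model.risks()))
```

The two models share no scenarios, which is the point: the test plan for each system is the set of scenarios in its own model, not the full catalog.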

Component Three: Test Execution (MITRE ATLAS)

Test execution runs the planned tactics against the system. The tests can be manual (expert red teamers crafting prompts and observing responses), automated (frameworks like Promptfoo, garak, and Microsoft’s PyRIT), or a hybrid of the two. The MITRE ATLAS techniques provide the test inventory; the team executes the relevant subset for each system.

The output is a test-result log per system with pass, fail, and partial findings against each tested technique. Failures and partial findings feed remediation; passes feed the assurance baseline.
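The routing rule for the test-result log (failures and partials to remediation, passes to the baseline) is simple enough to sketch. The technique labels below are illustrative placeholders, not entries from a real log, and the tuple layout is an assumption.

```python
from collections import Counter
from enum import Enum

class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    PARTIAL = "partial"

# (technique label, outcome) pairs for one system; labels are placeholders.
log = [
    ("direct prompt injection", Outcome.PASS),
    ("indirect prompt injection", Outcome.FAIL),
    ("system prompt extraction", Outcome.PARTIAL),
    ("tool-call hijack", Outcome.PASS),
]

# Failures and partial findings feed remediation; passes feed the baseline.
remediation = [tech for tech, out in log if out in (Outcome.FAIL, Outcome.PARTIAL)]
baseline = [tech for tech, out in log if out is Outcome.PASS]
counts = Counter(out.value for _, out in log)

print(dict(counts))
print("remediation queue:", remediation)
```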

Component Four: Governance Gates (NIST)

Governance gates establish go and no-go decisions at defined points in the AI system lifecycle. NIST AI 600-1 recommends pre-deployment testing across all 12 GenAI risk categories. The gates operationalize that guidance: a system cannot move from development to staging without a passing red-team result; a system cannot move from staging to production without a separate red-team result; a system in production receives periodic re-testing on the cadence defined in scope.

The output is a gate-status record per system showing the current gate, the most recent passing test, and any open findings that would block the next gate.
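The gate logic can be expressed as a small check over the gate-status record. This is a sketch under assumptions: the three stage names follow the lifecycle described above, while the field names and the rule that any open blocking finding stops promotion are one plausible reading rather than something NIST AI 600-1 specifies.

```python
from dataclasses import dataclass

STAGES = ["development", "staging", "production"]

@dataclass
class GateStatus:
    system: str
    current_stage: str
    passed_test_for_next_stage: bool  # a red-team pass specific to this promotion
    open_blocking_findings: int

def can_promote(status: GateStatus) -> bool:
    """Return True only if the system may advance to the next stage."""
    if status.current_stage == STAGES[-1]:
        # Already in production: periodic re-testing applies, not promotion.
        return False
    return status.passed_test_for_next_stage and status.open_blocking_findings == 0

print(can_promote(GateStatus("ops-agent", "development", True, 0)))  # True
print(can_promote(GateStatus("ops-agent", "staging", True, 2)))      # False
```

Keeping the pass flag per promotion reflects the requirement that staging-to-production needs its own red-team result and cannot reuse the development-to-staging pass.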

Component Five: Mitigation and Remediation (MITRE ATLAS + OWASP)

Mitigation closes the findings. MITRE ATLAS publishes 26 mitigations mapped to specific techniques; OWASP publishes mitigation guidance for each Top 10 risk. The remediation effort uses both as the technical basis. Mitigations include input filtering, output filtering, system-prompt hardening, agent permission scoping, monitoring, and architectural changes.

The output is a mitigation log per finding with the mitigation applied, the verification that the mitigation closed the finding, and the residual risk if any.
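A mitigation log entry needs only the three elements named above: the mitigation, the verification, and any residual risk. The record below is a hypothetical shape for that entry; the finding IDs and status wording are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MitigationRecord:
    finding_id: str                      # hypothetical identifier
    mitigation: str
    verified_closed: bool                # did a retest confirm the finding is closed?
    residual_risk: Optional[str] = None  # None when nothing remains

    @property
    def status(self) -> str:
        if not self.verified_closed:
            return "open"
        return "closed with residual risk" if self.residual_risk else "closed"

records = [
    MitigationRecord("F-001", "output filtering", True),
    MitigationRecord("F-002", "agent permission scoping", True,
                     "read access to the ticket queue remains"),
    MitigationRecord("F-003", "system-prompt hardening", False),
]
for record in records:
    print(record.finding_id, record.status)
```

A finding with a mitigation applied but no verification stays open, which keeps unverified fixes from showing as closed in board reporting.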

Component Six: Reporting (NIST + Board Discipline)

Reporting produces the artifacts that the CISO presents to the AI committee, the audit committee, and the board. NIST AI 600-1 recommends documenting the testing performed, the findings, the mitigations, and the residual risk. The reporting cadence aligns to the board’s AI committee meeting cadence, typically quarterly.

The four reports the board actually reads are described in detail below.

The Staffing Model

An effective AI red team has three roles. The roles can be filled by dedicated headcount, by a vendor engagement, or by a hybrid model. Most organizations at the early-program stage use a hybrid; mature programs build dedicated capability.

The first role is the AI red-team lead. The lead designs the program, owns the scope register, and reports to the CISO. The lead has a senior security background and AI literacy; this role is rarely filled by an AI specialist without security depth. The lead’s effort is approximately 40 to 60 percent of a full-time role for an enterprise with five to ten in-scope systems.

The second role is the red-team engineer. The engineer executes tests, builds automation, maintains the test framework, and produces test logs. The engineer has a security engineering background and adversarial machine learning skills. Multiple engineers may be required as the program scales; one engineer can typically support two to four high-tier systems.

The third role is the AI safety analyst. The analyst evaluates findings for severity, recommends mitigations, and tracks remediation. The analyst has an AI safety background and policy literacy; this role bridges to the AI governance team. The analyst’s effort is approximately 20 to 40 percent of a full-time role at the early-program stage.

The hybrid model assigns the lead role to internal staff (typically a senior security engineer who picks up AI red-team responsibility) and engages a vendor for the engineer and analyst roles. The hybrid is appropriate when the program is new, the system count is small, and the in-house AI security capability is limited. The hybrid converts to a dedicated team when the system count grows and the program reaches steady state.
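The engineer ratio given above (one engineer per two to four high-tier systems) translates into a headcount range. The function below is a back-of-the-envelope sketch that assumes only high-tier systems drive engineer load, which the article does not state; medium-tier and low-tier systems would add to it.

```python
import math

def engineers_needed(high_tier_systems: int) -> tuple[int, int]:
    """Return the (low, high) engineer count for a number of high-tier systems."""
    low = math.ceil(high_tier_systems / 4)   # each engineer covers four systems
    high = math.ceil(high_tier_systems / 2)  # each engineer covers two systems
    return low, high

print(engineers_needed(6))   # (2, 3)
print(engineers_needed(10))  # (3, 5)
```

At six high-tier systems the range is two to three engineers, which is roughly the point at which a vendor-only engineer role becomes worth comparing against dedicated headcount.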

The Test-Cycle Cadence

High-tier systems get quarterly red-team cycles. Medium-tier systems get semi-annual. Low-tier systems get annual. Each cycle has the same structure but different depth.

The quarterly cycle for a high-tier system runs approximately 30 to 45 days. Week one is scope and threat-model refresh, drawing from any model updates, infrastructure changes, or OWASP/MITRE catalog updates since the last cycle. Weeks two and three are test execution, covering both regression tests against previous findings and new tests against new tactics. Week four is mitigation, verification, and reporting.

The semi-annual cycle for a medium-tier system runs approximately 20 to 30 days with the same structure but reduced test breadth. The annual cycle for a low-tier system runs approximately 10 to 15 days with focused testing against the OWASP Top 10 risks most relevant to the system.

Out-of-cycle red teaming triggers on three events: a major model update (new base model, fine-tuning, or instruction-tuning change), a new tool integration that expands agent capabilities, and a published vulnerability or attack technique relevant to the system’s class. Out-of-cycle tests are scoped to the change rather than to a full cycle.
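The cadence and the out-of-cycle triggers can be sketched as a scheduling helper. The day counts are an assumed approximation of quarterly, semi-annual, and annual intervals, and the trigger names are hypothetical labels for the three events listed above.

```python
from datetime import date, timedelta

# Days between cycle starts per tier (an approximation, not from the article).
CYCLE_INTERVAL_DAYS = {"high": 91, "medium": 182, "low": 365}

# Hypothetical labels for the three out-of-cycle trigger events.
OUT_OF_CYCLE_TRIGGERS = {
    "major_model_update",
    "new_tool_integration",
    "published_vulnerability",
}

def next_scheduled_cycle(tier: str, last_cycle_start: date) -> date:
    return last_cycle_start + timedelta(days=CYCLE_INTERVAL_DAYS[tier])

def needs_out_of_cycle_test(events: set[str]) -> bool:
    # Any single trigger event is enough to schedule a scoped test.
    return bool(events & OUT_OF_CYCLE_TRIGGERS)

print(next_scheduled_cycle("high", date(2026, 1, 5)))       # 2026-04-06
print(needs_out_of_cycle_test({"new_tool_integration"}))    # True
print(needs_out_of_cycle_test({"ui_copy_change"}))          # False
```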

The Four Artifacts the Board Reads

The board AI committee receives four artifacts each cycle. The artifacts are designed to be read in order; each builds on the previous one.

Artifact One: AI Red-Team Posture Summary

One page. Lists every in-scope AI system, current tier, last test date, current gate status, and outstanding high-severity findings. The board reads this artifact first to understand the inventory and current posture. The summary is binary in its key signal: every system is either at green (current testing, no high-severity open findings) or not.
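The green-or-not signal reduces to two conditions per system. The sketch below assumes "current testing" means the next scheduled test is not yet overdue; the field names and sample rows are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class PostureRow:
    system: str
    tier: str
    last_test: date
    next_test_due: date
    open_high_severity: int

def is_green(row: PostureRow, today: date) -> bool:
    """Green means testing is current and no high-severity finding is open."""
    return today <= row.next_test_due and row.open_high_severity == 0

today = date(2026, 3, 1)
rows = [
    PostureRow("support-agent", "high", date(2026, 1, 5), date(2026, 4, 6), 0),
    PostureRow("ops-agent", "high", date(2025, 10, 1), date(2025, 12, 31), 0),
    PostureRow("customer-rag-bot", "medium", date(2026, 1, 20), date(2026, 7, 21), 1),
]
for row in rows:
    print(row.system, "green" if is_green(row, today) else "not green")
```

In this sample only `support-agent` is green: `ops-agent` is overdue for testing and `customer-rag-bot` has an open high-severity finding.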

Artifact Two: Findings and Mitigations Report

Three to five pages. Lists the findings from the most recent test cycle, the severity of each, the mitigation applied, and the residual risk. Findings are categorized by OWASP risk and mapped to MITRE ATLAS technique. The board reads this artifact to understand what went wrong and what was done about it.

Artifact Three: Threat Landscape Update

Two to three pages. Summarizes new attack techniques, public incidents involving similar systems, and updates to OWASP, NIST, or MITRE catalogs since the last cycle. The board reads this artifact to understand whether the threat surface has shifted in ways the program needs to address.

Artifact Four: Program Health Indicators

One to two pages. Reports test-cycle adherence, mean time to mitigate findings, scope-register currency, and any significant gaps in staffing or tooling. The board reads this artifact to understand whether the program itself is operating effectively, separate from whether the systems it tests are secure.
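Two of these indicators are plain arithmetic over the mitigation log and the test calendar. The sketch below uses invented dates and counts purely to show the calculation; the variable names are assumptions.

```python
from datetime import date
from statistics import mean

# (found, mitigated) date pairs for closed findings; values are illustrative.
closed_findings = [
    (date(2026, 1, 12), date(2026, 1, 26)),  # 14 days
    (date(2026, 1, 14), date(2026, 2, 13)),  # 30 days
]
mean_days_to_mitigate = mean(
    (mitigated - found).days for found, mitigated in closed_findings
)

# Test-cycle adherence: share of planned cycles completed on schedule.
cycles_planned, cycles_completed_on_time = 8, 7
cycle_adherence = cycles_completed_on_time / cycles_planned

print(f"mean time to mitigate: {mean_days_to_mitigate} days")  # 22 days
print(f"cycle adherence: {cycle_adherence:.1%}")               # 87.5%
```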

The Tooling Stack

The tooling stack has three layers. The lower layers are commodity; the upper layer is judgment-driven.

The first layer is the test-automation framework. Open-source frameworks include Promptfoo, garak, and Microsoft’s PyRIT. Commercial offerings include Lakera Guard and HiddenLayer. The framework choice depends on the system’s deployment pattern; some frameworks integrate more cleanly with cloud-deployed APIs than with on-premises models.

The second layer is the tactic library. MITRE ATLAS publishes the canonical library; some commercial tools maintain extended libraries. The library should be kept current with MITRE updates and with internal additions specific to the organization’s threat model.

The third layer is the human red-team capability. Tooling automates the known tactics; humans discover new ones. The human layer is what differentiates a mature program from an automated scan. Investment in human capability pays back through findings the tooling cannot produce.

Frequently Asked Questions

Should we build internal capability or use a vendor?

Most programs start with a vendor engagement and transition to internal capability as the system count grows. A program with one or two AI systems is rarely large enough to justify dedicated headcount. A program with ten or more systems needs internal capability for cadence and institutional knowledge.

How does AI red teaming differ from traditional penetration testing?

Traditional penetration testing tests deployed software for known classes of vulnerabilities. AI red teaming tests learned systems for emergent behaviors, including behaviors the developer did not anticipate. The skill set overlaps but the methodology is distinct; some traditional pen-testers retrain into AI red teaming successfully, others do not.

Are bug bounties relevant for AI systems?

Yes, with adaptation. Several organizations run AI-specific bug bounties focused on jailbreaks, prompt injection, and policy bypasses. Bug bounties supplement formal red teaming but do not replace it; the formal program produces the documented governance trail that bug bounties cannot.

What about regulatory red-teaming requirements?

The EU AI Act requires providers of general-purpose AI models with systemic risk to conduct and document adversarial testing, and it imposes robustness and cybersecurity requirements on high-risk AI systems that red teaming helps evidence. NIST AI 600-1 incorporates red teaming into the GenAI Profile. Future regulatory frameworks are expected to add specifics. A program designed against OWASP, NIST, and MITRE will be substantially aligned with most emerging regulatory requirements.

Should we publish our red-team findings?

Public disclosure of findings is rare and controversial. Most organizations share findings selectively with the affected vendor (when the system is third-party) and within trusted information-sharing communities. Public disclosure of generic technique categories may serve the discipline; public disclosure of system-specific exploits is generally inappropriate.

How do we measure red-team program maturity?

Five indicators: scope-register completeness, test-cycle adherence, mean time to mitigate findings, OWASP and MITRE catalog currency, and board-artifact quality. A program scoring well on all five is mature; a program scoring well on two or three is in development.
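The maturity rule can be written down directly. The sketch below treats each indicator as a yes-or-no judgment, which is a simplifying assumption, and it deliberately leaves scores of zero, one, or four unclassified because the rule above does not cover them.

```python
INDICATORS = [
    "scope_register_completeness",
    "test_cycle_adherence",
    "mean_time_to_mitigate",
    "catalog_currency",
    "board_artifact_quality",
]

def maturity(scores: dict[str, bool]) -> str:
    """Classify a program from per-indicator 'scoring well' judgments."""
    strong = sum(scores[name] for name in INDICATORS)
    if strong == 5:
        return "mature"
    if strong in (2, 3):
        return "in development"
    return "not classified by this rule"

all_strong = {name: True for name in INDICATORS}
partial = {**all_strong, "catalog_currency": False, "board_artifact_quality": False}

print(maturity(all_strong))  # mature
print(maturity(partial))     # in development
```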

The verdict. The three canonical sources cover the discipline together; none covers it alone. The synthesis is the program design, and the program design fits in six components, three roles, four artifacts, and a quarterly cadence for high-tier systems. The CISOs who walk into the board meeting with that synthesis answer the AI security question definitively. The CISOs who arrive with three reading lists and an apology are the ones whose programs are still forming. GPT-5 was jailbroken in 24 hours; the model underneath every enterprise AI system in production today is on the same trajectory. The program that is operating in 2026 is the program that survives the 2027 incident.

Discipline in preparation. Confidence in the room.

Josef Kamara, CPA, CISSP, CISA, Security+

Former KPMG and BDO. Senior manager over third-party risk attestations and IT audits at a top-five global firm, and former technology risk leader directing the IT audit function at a Fortune 500 medical technology company. Advises growth-stage SaaS companies on SOC 2, HIPAA, and AI governance certifications.

The Authority Brief

One compliance analysis per week from Josef Kamara, CPA, CISSP, CISA. Federal and private compliance, written for practitioners.