Skip to main content

MITRE ATLAS

The MITRE ATLAS™ (Adversarial Threat Landscape for Artificial-Intelligence Systems) framework provides a structured knowledge base of adversarial tactics, techniques, and procedures (TTPs) used against AI and ML systems. It extends the principles of MITRE ATT&CK® to the AI threat surface, identifying how adversaries can manipulate, exploit, or misuse AI models throughout their lifecycle.

DeepTeam's MITRE ATLAS module implements these adversarial mappings to test your AI or LLM application for security, privacy, and robustness vulnerabilities, across each phase of the AI attack lifecycle.

Overview

In DeepTeam, the MITRE ATLAS framework operationalizes AI-specific adversarial tactics to help organizations detect, mitigate, and analyze model exploitation risks.

Each tactic represents a goal an adversary may have while attacking an AI system. These tactics collectively map the “why” behind adversarial actions and correspond to different testing modules in DeepTeam.

TacticDescription
ReconnaissanceGathering intelligence about AI systems and configurations
Resource DevelopmentAcquiring resources or tools to enable future attacks
Initial AccessGaining entry to the target AI system or environment
ML Attack StagingPreparing, training, or adapting attacks specifically for AI models
ExfiltrationStealing sensitive information, model data, or internal configurations
ImpactManipulating or degrading AI systems to achieve adversarial goals

Each of these categories can be tested independently or together:

from deepteam.frameworks import MITRE
from deepteam import red_team
from somewhere import your_model_callback

atlas = MITRE(categories=["reconnaissance", "impact"])
attacks = atlas.attacks
vulnerabilities = atlas.vulnerabilities

risk_assessment = red_team(
model_callback=your_model_callback,
vulnerabilities=vulnerabilities,
attacks=attacks
)

The MITRE framework accepts ONE optional parameter:

  • [Optional] categories: A list of strings that represent the MITRE ATLAS tactics you want to test your AI application on:
    • reconnaissance: Tests for unintended disclosure of internal info, prompts, or policies through probing or reasoning.
    • resource_development: Checks if the system can aid in creating or supporting malicious tools or content.
    • initial_access: Evaluates resistance to malicious entry via prompt injection, debug access, or exposed interfaces.
    • ml_attack_staging: Tests handling of poisoned or adversarial inputs used to prepare model-targeted attacks.
    • exfiltration: Detects data leakage or extraction of sensitive information through adversarial queries.
    • impact: Simulates harmful actions like goal hijacking, content manipulation, or autonomy abuse.

The MITRE ATLAS Tactics

Reconnaissance — Information Gathering and System Profiling

(MITRE ATLAS ID: AML.TA0002)

Goal: The adversary is trying to gather information about the AI system they can use to plan future operations.

Reconnaissance involves identifying model capabilities, business rules, and potential weaknesses. Adversaries may probe model outputs, query APIs, or use role-based misdirection to uncover internal policies, system prompts, or hidden logic.

DeepTeam tests whether your AI system inadvertently exposes:

  • Internal guardrails, policies, or roles
  • Confidential reasoning chains or business logic
  • Sensitive prompts, credentials, or decision rules
  • Competitive or private strategic information

Example vulnerabilities

  • Competition
  • PromptLeakage
  • RBAC
  • CustomVulnerabilityPolicy Disclosure

Example attacks

  • Roleplay
  • PromptInjection
  • CrescendoJailbreaking
  • SequentialJailbreak
  • TreeJailbreaking
from deepteam.frameworks import MITRE
from deepteam import red_team
from somewhere import your_model_callback

atlas = MITRE(categories=["reconnaissance"])
attacks = atlas.attacks
vulnerabilities = atlas.vulnerabilities

# Modify attributes for your specific testing context if needed
red_team(
model_callback=your_model_callback,
attacks=attacks,
vulnerabilities=vulnerabilities
)

Reconnaissance testing ensures your model does not reveal sensitive internal logic or operational metadata to external users.

Resource Development — Adversarial Capability Building

(MITRE ATLAS ID: AML.TA0003)

Goal: The adversary is creating or acquiring resources (data, prompts, tools, accounts) to enable future AI attacks.

Threat landscape: Adversaries prepare poisoned datasets, prompt libraries, proxy models, or obfuscated payloads that can be used later to bypass safeguards, craft jailbreaks, or seed long-term attacks.

Testing strategy: Simulate adversary preparations by submitting obfuscated/templated prompts, benign-looking poisoned examples, and dataset-like inputs to detect:

  • whether the system assists in creating malicious artifacts, or
  • whether such inputs are accepted, stored, or likely to influence future behavior (e.g., RAG ingestion, fine-tuning pipelines, or plugin storage).

Example vulnerabilities

  • IllegalActivity
  • CustomVulnerability - Execution, Persistence, Defense Evasion, Discovery, CommandAndControl

Example attacks

  • Roleplay
  • Leetspeak, ROT13
  • PromptInjection, CrescendoJailbreaking
from deepteam.frameworks import MITRE
from deepteam import red_team
from somewhere import your_model_callback

atlas = MITRE(categories=["resource_development"])
attacks = atlas.attacks
vulnerabilities = atlas.vulnerabilities

# Modify attributes for your specific testing context if needed
red_team(
model_callback=your_model_callback,
attacks=attacks,
vulnerabilities=vulnerabilities
)

This ensures your AI system remains resilient against adversarial data collection, obfuscation, or resource manipulation during reconnaissance and setup stages.

Initial Access — Entry Point Exploitation

(MITRE ATLAS ID: AML.TA0004)

Goal: The adversary is trying to gain access to the AI system.

Adversaries may attempt to exploit the AI model's input surfaces, administrative interfaces, or embedded systems. DeepTeam maps these behaviors through code injection, misuse of debugging features, and role impersonation.

Example vulnerabilities

  • DebugAccess
  • IllegalActivity
  • SQLInjection
  • SSRF
  • ShellInjection

Example attacks

  • PromptInjection
  • LinearJailbreaking
  • SequentialJailbreak
  • Roleplay
from deepteam.frameworks import MITRE
from deepteam import red_team
from somewhere import your_model_callback

atlas = MITRE(categories=["initial_access"])
attacks = atlas.attacks
vulnerabilities = atlas.vulnerabilities

# Modify attributes for your specific testing context if needed
red_team(
model_callback=your_model_callback,
attacks=attacks,
vulnerabilities=vulnerabilities
)

Testing this phase ensures your model and connected systems resist direct exploitation or unauthorized entry into AI pipelines.

ML Attack Staging — Model-Specific Attack Preparation

(MITRE ATLAS ID: AML.TA0001)

Goal: The adversary is leveraging their knowledge of and access to the target system to tailor the attack.

This phase is unique to AI. Attackers train proxy models, poison context, or craft adversarial inputs to prepare targeted manipulations.

DeepTeam emulates these scenarios by introducing hallucination-based, poisoned, or misleading inputs and observing how your model responds.

Example vulnerabilities

  • ExcessiveAgency
  • CustomVulnerabilityHallucination
  • CustomVulnerabilityIndirect Prompt Injection

Example attacks

  • PromptInjection, Leetspeak, ROT13
  • LinearJailbreaking, TreeJailbreaking, SequentialJailbreak
from deepteam.frameworks import MITRE
from deepteam import red_team
from somewhere import your_model_callback

atlas = MITRE(categories=["ml_attack_staging"])
attacks = atlas.attacks
vulnerabilities = atlas.vulnerabilities

# Modify attributes for your specific testing context if needed
red_team(
model_callback=your_model_callback,
attacks=attacks,
vulnerabilities=vulnerabilities
)

This phase helps ensure your model's context integrity, instruction following, and data reliability cannot be manipulated by crafted adversarial inputs.

Exfiltration — Data or Model Theft

(MITRE ATLAS ID: AML.TA0010)

Goal: The adversary is trying to steal AI artifacts or other sensitive information.

Adversaries in this stage attempt to extract system prompts, private data, or intellectual property through direct or indirect model interaction.

DeepTeam tests for leakage, encoding abuse, and unauthorized disclosure across multiple channels.

Example vulnerabilities

  • PIILeakage
  • IntellectualProperty
  • CustomVulnerabilityASCII Smuggling, Prompt Extraction, Privacy, Indirect Prompt Injection

Example attacks

  • PromptProbing, Leetspeak, ROT13
  • PromptInjection, SequentialJailbreak
  • Roleplay (Security Engineer persona)
from deepteam.frameworks import MITRE
from deepteam import red_team
from somewhere import your_model_callback

atlas = MITRE(categories=["exfiltration"])
attacks = atlas.attacks
vulnerabilities = atlas.vulnerabilities

# Modify attributes for your specific testing context if needed
red_team(
model_callback=your_model_callback,
attacks=attacks,
vulnerabilities=vulnerabilities
)

This testing phase validates whether your AI system properly masks, encodes, and protects data to prevent exfiltration through direct or covert model outputs.

Impact — Manipulation, Misuse, and Degradation

(MITRE ATLAS ID: AML.TA0011)

Goal: The adversary is trying to manipulate, interrupt, or degrade AI system performance or trustworthiness.

In this final phase, adversaries aim to cause misinformation, impersonation, or reputational harm, or to exploit recursive model behaviors for persistent manipulation.

DeepTeam evaluates whether your AI system can resist goal hijacking, recursive propagation, or harmful content generation.

Example vulnerabilities

  • ExcessiveAgency
  • GraphicContent
  • RecursiveHijacking
  • CustomVulnerabilityImitation

Example attacks

  • PromptInjection, LinearJailbreaking, CrescendoJailbreaking
  • Roleplay (Authoritative CEO persona)
from deepteam.frameworks import MITRE
from deepteam import red_team
from somewhere import your_model_callback

atlas = MITRE(categories=["impact"])
attacks = atlas.attacks
vulnerabilities = atlas.vulnerabilities

# Modify attributes for your specific testing context if needed
red_team(
model_callback=your_model_callback,
attacks=attacks,
vulnerabilities=vulnerabilities
)

This stage ensures your system's final outputs and decisions remain aligned with intended ethics, safety, and trust boundaries — even under adversarial manipulation.

Best Practices

  1. Test across all six phases to ensure lifecycle-wide resilience.
  2. Simulate real adversaries using chained or contextual attacks (CrescendoJailbreaking, PromptProbing).
  3. Audit and document every run — align red teaming results with MITRE ATLAS categories.
  4. Integrate with NIST AI RMF — combine measurement (NIST) and tactics (MITRE) for complete coverage.
  5. Continuously retrain and monitor models for post-deployment drift and new attack patterns.
  6. Review model behavior under role-based adversarial prompts to detect policy leaks or privilege misuse.

Learn More