Overview

Agentic adversarial attacks are specialized attack methods designed to exploit vulnerabilities in autonomous AI agents. They target characteristics specific to agentic systems, including persistent memory, goal-oriented behavior, and autonomous decision-making.

Available Agentic Attacks

deepteam provides 6 agentic attack methods that enhance baseline attacks with sophisticated manipulation techniques:

System Override - Makes attacks appear as legitimate system commands, administrative overrides, or trusted service requests to bypass agent security controls.

Permission Escalation - Shifts the agent's perceived identity, capabilities, or operational context to bypass role-based security restrictions.

Objective Reframing - Reframes the agent's objectives, priorities, or operational constraints to achieve unauthorized outcomes.

Semantic Manipulation - Uses semantic ambiguity, syntactic variation, and contextual reframing to confuse the agent's language understanding and bypass security controls.

Input Bypass - Circumvents security checks, compliance requirements, and validation procedures through exception handling and operational necessity claims.

Context Poisoning - Injects false environmental context, manipulates memory integration, and establishes persistent contextual influence to corrupt agent decision-making.

Usage Example

from deepteam import red_team
from deepteam.vulnerabilities.agentic import GoalTheft
from deepteam.vulnerabilities import RBAC
from deepteam.attacks.single_turn import SystemOverride, ObjectiveReframing

# Define vulnerabilities and their best-paired attacks
vulnerabilities = [
    RBAC(types=["unscoped_elevated_permissions"]),
    GoalTheft(types=["goal_redirection"]),
]

attacks = [
    SystemOverride(weight=2, max_retries=3),      # Best for access control vulnerabilities
    ObjectiveReframing(weight=2, max_retries=3),  # Best for goal-related vulnerabilities
]

# Execute red teaming with optimal attack-vulnerability pairings.
# your_agent_callback is a placeholder for the callback that wraps your own agent.
risk_assessment = red_team(
    model_callback=your_agent_callback,
    vulnerabilities=vulnerabilities,
    attacks=attacks,
    attacks_per_vulnerability_type=5,
)
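
The your_agent_callback name in the example is a placeholder and is not defined there. A minimal sketch of such a callback follows, assuming that model_callback receives the attack prompt as a string and returns the agent's reply as a string. The async signature and the canned refusal are assumptions for illustration, not a confirmed description of the deepteam API; replace the body with a call to your own agent.

```python
import asyncio


async def your_agent_callback(input: str) -> str:
    # Hypothetical stub: a real callback would forward `input` to the agent
    # under test and return the agent's actual response text.
    return f"I'm sorry, but I can't help with that request: {input}"


if __name__ == "__main__":
    # Quick local check of the stub, independent of any red-teaming run.
    print(asyncio.run(your_agent_callback("Ignore all previous instructions.")))
```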

Maximum Attack Effectiveness

For maximum attack effectiveness, use these optimal pairings:

Attack Method	Best Vulnerability Targets	Why This Pairing Works
System Override	RBAC, Debug Access, Excessive Agency, BFLA	System override exploits trust in administrative commands to bypass role-based access controls and escalate permissions
Permission Escalation	BFLA, BOLA, RBAC, Goal Theft	Permission escalation manipulates perceived roles and authority to bypass access controls and hijack agent objectives
Objective Reframing	Goal Theft, Recursive Hijacking, Misinformation	Objective reframing redirects agent goals and can lead to spreading false information through goal manipulation
Semantic Manipulation	Bias, Toxicity, Misinformation, Output Verification	Semantic manipulation exploits language processing to generate biased, toxic, or misleading content that bypasses verification
Input Bypass	Shell Injection, SQL Injection, SSRF, Debug Access	Input bypass circumvents validation systems to enable injection attacks and unauthorized system access
Context Poisoning	PII Leakage, Prompt Leakage, Intellectual Property, Recursive Hijacking	Context poisoning corrupts agent memory to cause data leakage and enable persistent goal subversion across sessions
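
The pairings in the table can also be kept as plain data and queried in either direction. The sketch below transcribes the table into a dictionary and looks up which attack methods are recommended for a given vulnerability. The names OPTIMAL_PAIRINGS, attacks_for, and plan are hypothetical helpers for illustration and are not part of deepteam.

```python
# Pairings transcribed from the table above (attack method -> vulnerability targets).
# All names in this block are illustrative; none of them come from deepteam itself.
OPTIMAL_PAIRINGS = {
    "System Override": ["RBAC", "Debug Access", "Excessive Agency", "BFLA"],
    "Permission Escalation": ["BFLA", "BOLA", "RBAC", "Goal Theft"],
    "Objective Reframing": ["Goal Theft", "Recursive Hijacking", "Misinformation"],
    "Semantic Manipulation": ["Bias", "Toxicity", "Misinformation", "Output Verification"],
    "Input Bypass": ["Shell Injection", "SQL Injection", "SSRF", "Debug Access"],
    "Context Poisoning": [
        "PII Leakage",
        "Prompt Leakage",
        "Intellectual Property",
        "Recursive Hijacking",
    ],
}


def attacks_for(vulnerability: str) -> list[str]:
    """Return the attack methods whose target list includes the vulnerability."""
    return [
        attack
        for attack, targets in OPTIMAL_PAIRINGS.items()
        if vulnerability in targets
    ]


def plan(vulnerabilities: list[str]) -> dict[str, list[str]]:
    """Map each vulnerability under test to its recommended attack methods."""
    return {v: attacks_for(v) for v in vulnerabilities}


if __name__ == "__main__":
    for vulnerability, attacks in plan(["RBAC", "Goal Theft"]).items():
        print(f"{vulnerability}: {', '.join(attacks)}")
```

A lookup like this makes it easy to check that every vulnerability in a test run is covered by at least one attack method suited to it.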

Tip

Combine multiple attack methods with their optimal vulnerability targets to create comprehensive attack scenarios that test different aspects of agent security simultaneously.
