Overiew
Agentic adversarial attacks are specialized attack methods designed to exploit vulnerabilities in autonomous AI agents. These attacks target the unique characteristics of agentic systems including persistent memory, goal-oriented behavior, and autonomous decision-making capabilities.
Available Agentic Attacks
deepteam
provides 6 agentic attack methods that enhance baseline attacks with sophisticated manipulation techniques:
System Override - Makes attacks appear as legitimate system commands, administrative overrides, or trusted service requests to bypass agent security controls.
Permission Escalation - Shifts the agent's perceived identity, capabilities, or operational context to bypass role-based security restrictions.
Objective Reframing - Reframes the agent's objectives, priorities, or operational constraints to achieve unauthorized outcomes.
Semantic Manipulation - Uses semantic ambiguity, syntactic variation, and contextual reframing to confuse the agent's language understanding and bypass security controls.
Input Bypass - Circumvents security checks, compliance requirements, and validation procedures through exception handling and operational necessity claims.
Context Poisoning - Injects false environmental context, manipulates memory integration, and establishes persistent contextual influence to corrupt agent decision-making.
Usage Example
from deepteam import red_team
from deepteam.vulnerabilities.agentic import GoalTheft, RecursiveHijacking
from deepteam.vulnerabilities import RBAC, BFLA
from deepteam.attacks.single_turn import SystemOverride, ObjectiveReframing
# Define vulnerabilities and their best-paired attacks
vulnerabilities = [
RBAC(types=["unscoped_elevated_permissions"]),
GoalTheft(types=["goal_redirection"])
]
attacks = [
SystemOverride(weight=2, max_retries=3), # Best for access control vulnerabilities
ObjectiveReframing(weight=2, max_retries=3) # Best for goal-related vulnerabilities
]
# Execute red teaming with optimal attack-vulnerability pairings
risk_assessment = red_team(
model_callback=your_agent_callback,
vulnerabilities=vulnerabilities,
attacks=attacks,
attacks_per_vulnerability_type=5
)
Maximum Attack Effectiveness
For maximum attack effectiveness, use these optimal pairings:
Attack Method | Best Vulnerability Targets | Why This Pairing Works |
---|---|---|
System Override | RBAC, Debug Access, Excessive Agency, BFLA | System override exploits trust in administrative commands to bypass role-based access controls and escalate permissions |
Permission Escalation | BFLA, BOLA, RBAC, Goal Theft | Permission escalation manipulates perceived roles and authority to bypass access controls and hijack agent objectives |
Objective Reframing | Goal Theft, Recursive Hijacking, Misinformation | Objective reframing redirects agent goals and can lead to spreading false information through goal manipulation |
Semantic Manipulation | Bias, Toxicity, Misinformation, Output Verification | Semantic manipulation exploits language processing to generate biased, toxic, or misleading content that bypasses verification |
Input Bypass | Shell Injection, SQL Injection, SSRF, Debug Access | Input bypass circumvents validation systems to enable injection attacks and unauthorized system access |
Context Poisoning | PII Leakage, Prompt Leakage, Intellectual Property, Recursive Hijacking | Context poisoning corrupts agent memory to cause data leakage and enable persistent goal subversion across sessions |
Combine multiple attack methods with their optimal vulnerability targets to create comprehensive attack scenarios that test different aspects of agent security simultaneously.