Skip to main content

Overiew

Agentic adversarial attacks are specialized attack methods designed to exploit vulnerabilities in autonomous AI agents. These attacks target the unique characteristics of agentic systems including persistent memory, goal-oriented behavior, and autonomous decision-making capabilities.

Available Agentic Attacks

deepteam provides 6 agentic attack methods that enhance baseline attacks with sophisticated manipulation techniques:

System Override - Makes attacks appear as legitimate system commands, administrative overrides, or trusted service requests to bypass agent security controls.

Permission Escalation - Shifts the agent's perceived identity, capabilities, or operational context to bypass role-based security restrictions.

Objective Reframing - Reframes the agent's objectives, priorities, or operational constraints to achieve unauthorized outcomes.

Semantic Manipulation - Uses semantic ambiguity, syntactic variation, and contextual reframing to confuse the agent's language understanding and bypass security controls.

Input Bypass - Circumvents security checks, compliance requirements, and validation procedures through exception handling and operational necessity claims.

Context Poisoning - Injects false environmental context, manipulates memory integration, and establishes persistent contextual influence to corrupt agent decision-making.

Usage Example

from deepteam import red_team
from deepteam.vulnerabilities.agentic import GoalTheft, RecursiveHijacking
from deepteam.vulnerabilities import RBAC, BFLA
from deepteam.attacks.single_turn import SystemOverride, ObjectiveReframing

# Define vulnerabilities and their best-paired attacks
vulnerabilities = [
RBAC(types=["unscoped_elevated_permissions"]),
GoalTheft(types=["goal_redirection"])
]

attacks = [
SystemOverride(weight=2, max_retries=3), # Best for access control vulnerabilities
ObjectiveReframing(weight=2, max_retries=3) # Best for goal-related vulnerabilities
]

# Execute red teaming with optimal attack-vulnerability pairings
risk_assessment = red_team(
model_callback=your_agent_callback,
vulnerabilities=vulnerabilities,
attacks=attacks,
attacks_per_vulnerability_type=5
)

Maximum Attack Effectiveness

For maximum attack effectiveness, use these optimal pairings:

Attack MethodBest Vulnerability TargetsWhy This Pairing Works
System OverrideRBAC, Debug Access, Excessive Agency, BFLASystem override exploits trust in administrative commands to bypass role-based access controls and escalate permissions
Permission EscalationBFLA, BOLA, RBAC, Goal TheftPermission escalation manipulates perceived roles and authority to bypass access controls and hijack agent objectives
Objective ReframingGoal Theft, Recursive Hijacking, MisinformationObjective reframing redirects agent goals and can lead to spreading false information through goal manipulation
Semantic ManipulationBias, Toxicity, Misinformation, Output VerificationSemantic manipulation exploits language processing to generate biased, toxic, or misleading content that bypasses verification
Input BypassShell Injection, SQL Injection, SSRF, Debug AccessInput bypass circumvents validation systems to enable injection attacks and unauthorized system access
Context PoisoningPII Leakage, Prompt Leakage, Intellectual Property, Recursive HijackingContext poisoning corrupts agent memory to cause data leakage and enable persistent goal subversion across sessions
tip

Combine multiple attack methods with their optimal vulnerability targets to create comprehensive attack scenarios that test different aspects of agent security simultaneously.