# Red Teaming with YAML Configuration

## Quick Summary

`deepteam` offers a powerful CLI that allows you to red team LLM applications using YAML configuration files. This approach provides a declarative way to define your red teaming setup, making it easier to version-control, share, and reproduce red teaming experiments across different environments and team members.
The YAML CLI interface is built on top of the same red teaming engine as the Python API. All vulnerabilities, attacks, and evaluation capabilities are available through both interfaces.
The YAML CLI approach is made up of four main configuration sections:
- Models Configuration - specify which LLMs to use for simulation and evaluation.
- Target Configuration - define your target LLM system and its purpose.
- System Configuration - control concurrency, output settings, and behavior.
- Vulnerabilities and Attacks - specify what to test and how to test it.
Here's how you can implement it with a YAML configuration:
```yaml
# Red teaming models (separate from target)
models:
  simulator: gpt-3.5-turbo-0125
  evaluation: gpt-4o

# Target system configuration
target:
  purpose: "A helpful AI assistant"
  model: gpt-3.5-turbo

# System configuration
system_config:
  max_concurrent: 10
  attacks_per_vulnerability_type: 3
  run_async: true
  ignore_errors: false
  output_folder: "results"

default_vulnerabilities:
  - name: "Bias"
    types: ["race", "gender"]
  - name: "Toxicity"
    types: ["profanity", "insults"]

attacks:
  - name: "Prompt Injection"
```
Then run the red teaming with a single command:
```bash
deepteam run config.yaml
```
The YAML CLI interface is particularly useful for CI/CD pipelines and automated testing where you need reproducible, version-controlled red teaming configurations that can be easily shared across development teams.
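For example, a minimal CI job could install `deepteam` and run the same command on every push. The workflow below is an illustrative sketch, assuming GitHub Actions and an API key stored as an `OPENAI_API_KEY` secret:

```yaml
# .github/workflows/red-team.yml -- illustrative sketch, not an official template
name: red-team
on: [push]
jobs:
  red-team:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Install deepteam and run the version-controlled config
      - run: pip install deepteam
      - run: deepteam run config.yaml -o ci-results
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```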
## Models Configuration
The `models` section defines which LLMs to use for simulating attacks and evaluating responses. These models are separate from your target system and are used by `deepteam` internally.
```yaml
models:
  simulator: gpt-3.5-turbo-0125
  evaluation: gpt-4o
```
There are TWO optional parameters when creating a models configuration:
- [Optional] `simulator`: the LLM model used to generate and enhance adversarial attacks. Defaulted to `"gpt-3.5-turbo-0125"`.
- [Optional] `evaluation`: the LLM model used to evaluate responses and determine vulnerability scores. Defaulted to `"gpt-4o"`.
Using different models for simulation and evaluation can be beneficial. A more creative model like GPT-3.5 for simulation can generate more diverse attacks, while a more reliable model like GPT-4o for evaluation ensures consistent scoring.
## Target Configuration
The target configuration defines the LLM system you want to red team and its intended purpose. This affects how attacks are generated and how responses are evaluated.
```yaml
target:
  purpose: "A helpful AI assistant for customer support"
  model: gpt-3.5-turbo
```
There is ONE mandatory and ONE optional parameter when creating a target configuration:
- `model` OR `callback`: the target LLM model to test (you must specify either `model` or `callback`).
- [Optional] `purpose`: a description of your LLM application's intended purpose that helps contextualize the red teaming. Defaulted to `""`.
For testing your own LLM applications with custom logic:
```yaml
target:
  purpose: "A financial advice chatbot"
  model:
    provider: custom
    file: "my_financial_bot.py"
    class: "FinancialAdvisorLLM"
```
When using `provider: custom`, both the `file` and `class` fields are mandatory.
Alternatively, you can specify a custom callback function:
```yaml
target:
  purpose: "A custom chatbot"
  callback:
    file: "my_callback.py"
    function: "model_callback" # optional, defaults to "model_callback"
```
When using `callback`, the `file` field is mandatory while `function` is optional.
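For illustration, here is a minimal sketch of what such a callback file might contain, assuming the callback receives the simulated attack as a string and returns your system's response as a string; the OpenAI client here is a stand-in for your own application logic:

```python
# my_callback.py -- minimal sketch; assumes deepteam passes the simulated
# attack prompt as a string and expects the response text back
from openai import AsyncOpenAI

client = AsyncOpenAI()  # stand-in for your own application

async def model_callback(input: str) -> str:
    # Forward the attack prompt to your LLM application and
    # return its raw text response for deepteam to evaluate
    response = await client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": input}],
    )
    return response.choices[0].message.content
```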
When using custom models, ensure your model class inherits from `DeepEvalBaseLLM` and implements the required methods as described in the custom LLM documentation.
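As a rough sketch, such a class might look like the following. The four methods shown (`load_model`, `generate`, `a_generate`, `get_model_name`) follow deepeval's documented custom-LLM interface; the OpenAI calls are placeholders for your own inference logic:

```python
# my_financial_bot.py -- illustrative sketch of a custom target model;
# replace the OpenAI calls with your application's own inference logic
from deepeval.models import DeepEvalBaseLLM
from openai import OpenAI, AsyncOpenAI

class FinancialAdvisorLLM(DeepEvalBaseLLM):
    def load_model(self):
        return OpenAI()

    def generate(self, prompt: str) -> str:
        # Synchronous generation: called when run_async is disabled
        client = self.load_model()
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    async def a_generate(self, prompt: str) -> str:
        # Asynchronous generation: called when run_async is enabled
        client = AsyncOpenAI()
        response = await client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    def get_model_name(self) -> str:
        return "Financial Advisor LLM"
```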
## System Configuration
The system configuration controls how the red teaming process executes, including concurrency settings, output options, and error handling behavior.
```yaml
system_config:
  max_concurrent: 10
  attacks_per_vulnerability_type: 3
  run_async: true
  ignore_errors: false
  output_folder: "deepteam-results"
```
There are FIVE optional parameters when creating a system configuration:
- [Optional] `max_concurrent`: maximum number of parallel operations. Defaulted to `10`.
- [Optional] `attacks_per_vulnerability_type`: number of attacks to generate per vulnerability type. Defaulted to `1`.
- [Optional] `run_async`: enable asynchronous execution for faster processing. Defaulted to `True`.
- [Optional] `ignore_errors`: continue red teaming even if some attacks fail. Defaulted to `False`.
- [Optional] `output_folder`: directory to save red teaming results. Defaulted to `None`.
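As an illustration, a run against a rate-limited API might trade speed for resilience (the values below are arbitrary):

```yaml
system_config:
  max_concurrent: 2      # stay under the provider's rate limit
  run_async: true
  ignore_errors: true    # a single failed attack won't abort a long run
  output_folder: "rate-limited-results"
```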
## Vulnerabilities and Attacks
The vulnerabilities and attacks sections define what weaknesses to test for and which attack methods to use. This mirrors the Python API but in a declarative YAML format.
### Defining Vulnerabilities
```yaml
default_vulnerabilities:
  - name: "Bias"
    types: ["race", "gender", "political"]
  - name: "Toxicity"
    types: ["profanity", "insults", "hate_speech"]
  - name: "PII"
    types: ["social_security", "credit_card"]
```
Each vulnerability entry has:
- `name`: the vulnerability class name (required, must match one of the available vulnerability classes).
- [Optional] `types`: list of sub-types for that vulnerability (specific to each vulnerability class; defaults to all types if not specified, as shown below).
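For example, omitting `types` entirely tests every sub-type the vulnerability defines:

```yaml
default_vulnerabilities:
  - name: "Toxicity" # no types listed, so all Toxicity sub-types are tested
```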
For custom vulnerabilities:
```yaml
custom_vulnerabilities:
  - name: "Business Logic"
    criteria: "Check if the response violates business logic rules"
    types: ["access_control", "privilege_escalation"]
    prompt: "Custom evaluation prompt template"
```
There are TWO mandatory and TWO optional parameters when creating custom vulnerabilities:
- `name`: display name for your vulnerability.
- `criteria`: defines what should be evaluated.
- [Optional] `types`: list of sub-types for this vulnerability.
- [Optional] `prompt`: custom prompt template for evaluation.
### Defining Attacks
```yaml
attacks:
  - name: "Prompt Injection"
    weight: 2
  - name: "ROT13"
    weight: 1
  - name: "Base64"
```
Each attack entry has:
- `name`: the attack class name (required, must match one of the available attack classes).
- [Optional] `weight`: relative probability of this attack being selected. Defaulted to `1`.
- [Optional] `type`: attack type parameter (specific to certain attacks).
- [Optional] `persona`: persona parameter (for roleplay attacks).
- [Optional] `category`: category parameter (specific to certain attacks).
- [Optional] `turns`: number of turns (for multi-turn attacks).
- [Optional] `enable_refinement`: enable attack refinement (for certain attacks).
Attack weights determine the distribution of attack methods during red teaming. An attack with weight 2 is twice as likely to be selected as an attack with weight 1.
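In the `attacks` block above, for instance, Prompt Injection (weight 2), ROT13 (weight 1), and Base64 (implicit weight 1) would be selected with probabilities of roughly 50%, 25%, and 25% respectively.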
## Running Red Teaming
Once you have your YAML configuration file, you can start red teaming with the CLI command.
### Basic Usage
```bash
deepteam run config.yaml
```
### Command Line Overrides
You can override specific configuration values using command line flags:
```bash
# Override concurrency and output folder
deepteam run config.yaml -c 20 -o custom-results

# Override attacks per vulnerability
deepteam run config.yaml -a 5

# Combine multiple overrides
deepteam run config.yaml -c 15 -a 3 -o production-results
```
There are THREE optional command line flags:
- [Optional] `-c`: maximum concurrent operations (overrides `system_config.max_concurrent`).
- [Optional] `-a`: attacks per vulnerability type (overrides `system_config.attacks_per_vulnerability_type`).
- [Optional] `-o`: output folder path (overrides `system_config.output_folder`).
## Configuration Examples
### Quick Testing Configuration
```yaml
models:
  simulator: gpt-3.5-turbo
  evaluation: gpt-4o-mini

target:
  purpose: "A general AI assistant"
  model: gpt-3.5-turbo

system_config:
  max_concurrent: 5
  attacks_per_vulnerability_type: 1
  output_folder: "quick-results"

default_vulnerabilities:
  - name: "Toxicity"
  - name: "Bias"
    types: ["race"]

attacks:
  - name: "Prompt Injection"
```
### Production Testing Configuration
```yaml
models:
  simulator: gpt-3.5-turbo-0125
  evaluation: gpt-4o

target:
  purpose: "A financial advisory AI for retirement planning"
  model:
    provider: custom
    file: "financial_advisor.py"
    class: "FinancialAdvisorLLM"

system_config:
  max_concurrent: 8
  attacks_per_vulnerability_type: 10
  run_async: true
  ignore_errors: false
  output_folder: "production-security-audit"

default_vulnerabilities:
  - name: "Bias"
    types: ["age", "race", "gender"]
  - name: "Misinformation"
    types: ["financial"]
  - name: "PII"
    types: ["social_security", "credit_card"]
  - name: "Excessive Agency"

attacks:
  - name: "Prompt Injection"
    weight: 4
  - name: "Jailbreaking"
    weight: 3
  - name: "Context Poisoning"
    weight: 2
  - name: "ROT13"
    weight: 1
```
## Help and Documentation
Use the help command to see all available options:
```bash
deepteam --help
deepteam run --help
```
Available vulnerabilities: `Bias`, `Toxicity`, `Misinformation`, `Illegal Activity`, `Prompt Leakage`, `PII Leakage`, `Unauthorized Access`, `Excessive Agency`, `Robustness`, `Intellectual Property`, `Competition`, `Graphic Content`, `Personal Safety`, `CustomVulnerability`.
Available attacks: `Base64`, `Gray Box`, `Leetspeak`, `Math Problem`, `Multilingual`, `Prompt Injection`, `Prompt Probing`, `Roleplay`, `ROT-13`, `Crescendo Jailbreaking`, `Linear Jailbreaking`, `Tree Jailbreaking`, `Sequential Break`, `Bad Likert Judge`.
For detailed documentation, refer to the vulnerabilities documentation and attacks documentation.