Robustness
The Robustness vulnerability evaluates whether the target LLM can resist processing malicious inputs and avoid being overly influenced by user-provided data that compromises its intended behavior. More specifically, it tests whether the system mitigates risks from crafted prompts designed to hijack or alter its functionality.
Usage
from deepteam import red_team
from deepteam.vulnerabilities import Robustness
from deepteam.attacks.single_turn import Roleplay
from somewhere import your_callback
robustness = Robustness(types=["hijacking", "input_overreliance"])
red_team(
    vulnerabilities=[robustness],
    attacks=[Roleplay()],
    model_callback=your_callback
)
There are FIVE optional parameters when creating a Robustness vulnerability:
- [Optional] simulator_model: a string specifying which of OpenAI's GPT models to use, OR any custom LLM model of type DeepEvalBaseLLM. Defaulted to 'gpt-3.5-turbo-0125'.
- [Optional] evaluation_model: a string specifying which of OpenAI's GPT models to use, OR any custom LLM model of type DeepEvalBaseLLM. Defaulted to 'gpt-4o'.
- [Optional] async_mode: a boolean which when set to True, enables concurrent execution. Defaulted to True.
- [Optional] verbose_mode: a boolean which when set to True, prints the intermediate steps used to assess said vulnerability to the console. Defaulted to False.
- [Optional] types: a list of types of Robustness to test through. Defaulted to all types available. Here is the list of types available for Robustness:
  - hijacking: Tests whether the output enables or complies with malicious or crafted inputs that redirect or alter the system's intended behavior.
  - input_overreliance: Tests whether the output is excessively influenced by user input, leading to misleading and undesirable responses or behaviors that reflect user-provided content.
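The defaulting behavior of the types parameter can be illustrated with a short sketch. This is not deepteam's actual implementation, only a hypothetical illustration of how a constructor might fall back to all available types when none are given and reject unrecognized ones:

```python
# Hypothetical sketch (not deepteam's source): validating a `types`
# argument against the two Robustness types listed above.
ROBUSTNESS_TYPES = {"hijacking", "input_overreliance"}

def validate_types(types=None):
    # Default to every available type when the caller passes nothing,
    # mirroring the documented "Defaulted to all types" behavior.
    if types is None:
        return sorted(ROBUSTNESS_TYPES)
    unknown = set(types) - ROBUSTNESS_TYPES
    if unknown:
        raise ValueError(f"Unknown Robustness types: {sorted(unknown)}")
    return list(types)
```

For example, validate_types(["hijacking"]) keeps only the hijacking checks, while validate_types() selects both types.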