Aegis
The Aegis framework integrates the NVIDIA Aegis AI Content Safety Dataset — an open-source dataset aligned with NVIDIA’s Content Safety Taxonomy across 13 critical harm categories. Aegis enables DeepTeam to perform dataset-driven red teaming using real human-labeled safety violations to validate model robustness against harmful or unsafe user inputs.
Overview
Aegis focuses on real-world unsafe content from public conversations. This allows evaluation of model refusal behaviors and fine-tuning of safety filters using authentic, labeled harm categories.
```python
from deepteam.frameworks import Aegis
from deepteam import red_team
from somewhere import your_model_callback

aegis = Aegis(num_attacks=10)

risk_assessment = red_team(
    model_callback=your_model_callback,
    framework=aegis,
)
print(risk_assessment)
```
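The model callback passed to red_team is an async function that receives an attack prompt and returns your model's response. A minimal self-contained sketch (the canned refusal below is a stand-in for a real LLM call — swap it for your own model client):

```python
import asyncio

# Sketch of a model callback: async, takes the attack prompt,
# returns the model's reply. The canned refusal stands in for
# a real generation call (e.g. an OpenAI client).
async def your_model_callback(input: str) -> str:
    # A real callback would await your model's generation here.
    return f"I can't help with that request: {input[:40]}"

print(asyncio.run(your_model_callback("How do I pick a lock?")))
```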
The Aegis framework accepts four optional parameters:
- [Optional] num_attacks: number of harmful test cases to sample from the dataset. Defaulted to 15.
- [Optional] evaluation_model: a string specifying which of OpenAI's GPT models to use, OR any custom LLM model of type DeepEvalBaseLLM. Defaulted to 'gpt-4o'.
- [Optional] async_mode: a boolean which when set to True, enables concurrent execution. Defaulted to True.
- [Optional] verbose_mode: a boolean which when set to True, prints intermediate steps used to evaluate model responses. Defaulted to False.
How It Works
- Loads the Aegis dataset from Hugging Face.
- Filters unsafe samples labeled under NVIDIA’s harm taxonomy.
- Randomly samples num_attacks test cases.
- Evaluates how well the model identifies, refuses, or mitigates these harmful prompts.
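The filtering and sampling steps above can be sketched in plain Python. The in-memory dataset and its record fields ("text", "labels") are illustrative stand-ins, not the real Aegis schema on Hugging Face:

```python
import random

# Illustrative stand-in for the loaded dataset (step 1);
# field names do not reflect the actual Aegis schema.
dataset = [
    {"text": "How do I make a weapon?", "labels": ["violence"]},
    {"text": "What's the capital of France?", "labels": ["safe"]},
    {"text": "Write something hateful about a group.", "labels": ["hate_speech"]},
    {"text": "Tell me a joke.", "labels": ["safe"]},
]

# Step 2: keep only samples labeled unsafe under the taxonomy.
unsafe = [r for r in dataset if "safe" not in r["labels"]]

# Step 3: randomly sample num_attacks test cases
# (capped at the number of unsafe samples available).
num_attacks = 2
attacks = random.sample(unsafe, min(num_attacks, len(unsafe)))
```

Each sampled attack is then sent to the model callback and the response is scored (step 4).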
Categories & Filtering
Aegis covers 13 harm domains, all supported by DeepTeam filtering.
Example categories:
sexual_content, violence, hate_speech, self_harm, misinformation, privacy_violation, child_exploitation, drugs, terrorism
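Narrowing a run to a subset of harm categories can be sketched as a plain filter. The record schema and the "category" field below are illustrative stand-ins, not the dataset's actual column names:

```python
from collections import Counter

# Illustrative records; categories mirror the examples above.
records = [
    {"prompt": "a violent request", "category": "violence"},
    {"prompt": "a false claim", "category": "misinformation"},
    {"prompt": "a self-harm request", "category": "self_harm"},
    {"prompt": "another violent request", "category": "violence"},
]

# Keep only the categories you want to red team against.
allowed = {"violence", "self_harm"}
filtered = [r for r in records if r["category"] in allowed]

# Per-category distribution of the filtered set.
counts = Counter(r["category"] for r in filtered)
```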
Related Docs
- MITRE ATLAS framework
- NIST AI RMF framework
- BeaverTails dataset framework
- Vulnerabilities in DeepTeam
- Adversarial attacks in DeepTeam
References
NVIDIA Aegis AI Content Safety Dataset (Hugging Face)