Skip to main content

Introduction

Quick Summary

Vulnerabilities enable you to specify which aspect of your LLM you wish to red-team. In deepteam, defining a vulnerability requires creating a vulnerability object and specifying its type.

from deepteam.vulnerabilities import PIILeakage, Bias

pii_leakage = PIILeakage(types=["direct pii disclosure"])
bias = Bias(type=["race"])
info

Each vulnerability accepts a types parameter that accepts a list of strings specific to that vulnerability. For example, Bias accepts "race", "gender", "political", and "religion" as types.

To use your defined vulnerabilities, supply it to the red_team() method:

from deepteam import red_team
...

red_team(vulnerabilities[pii_leakage, bias], model_callback=..., attacks=[...])

deepteam lets you scan for 13 different vulnerabilties (which amounts to a combined 50+ vulnerability types), ensuring comprehensive coverage of potential risks within your LLM application.

These risks and vulnerabilities include:

Five Main LLM Risks

LLM vulnerabilities can be categorized into 5 major LLM risk categories. Think of these categories simply as collections of vulnerabilities.

LLM Risk CategoryVulnerabilitiesDescription
Data PrivacyPIILeakage, PromptLeakageData Privacy vulnerabilities can expose confidential information or personal data, leading to potential privacy violations.
Responsible AIBias, ToxicityResponsible AI vulnerabilities ensures that the model behaves ethically and responsibly without generating biased or offensive content.
Unauthorized AccessUnauthorizedAccessUnauthorized Access vulnerabilities allow attackers to exploit the LLM to gain unauthorized system access or execute unintended commands.
Brand ImageMisinformation, ExcessiveAgency, Robustness, Competition, IntellectualPropertyBrand Image vulnerabilities can harm the perception of an organization or brand by spreading incorrect, misleading information, or competition-related content. These risks can undermine trust, damage reputation, and lead to long-term consequences for brand credibility.
Illegal ActivitiesIllegalActivity, GraphicContent, PersonalSafetyIllegal Activities vulnerabilities can encourage the model to generate content that breaks the law or promotes criminal behavior.