Cybersecurity Guard

LLM-as-a-judge
Input guard
Output guard

The cybersecurity guard is an input/output guard that uses LLM-as-a-judge to determine whether an LLM system's input or output contains cybersecurity attacks or malicious content, flagging it as unsafe if it does.

Usage

from deepteam.guardrails.guards import CybersecurityGuard

cybersecurity_guard = CybersecurityGuard()

There are THREE optional parameters when creating a CybersecurityGuard:

  • [Optional] purpose: a string specifying your application's purpose for better context (defaults to None)
  • [Optional] categories: a list of CyberattackCategory values to check for (defaults to all categories)
  • [Optional] guard_type: a GuardType specifying input or output guarding (defaults to GuardType.INPUT)

from deepteam.guardrails.guards import CybersecurityGuard
from deepteam.guardrails.guards.cybersecurity_guard import CyberattackCategory

# Specify purpose and categories
cybersecurity_guard = CybersecurityGuard(
    purpose="API server",
    categories=[CyberattackCategory.SQL_INJECTION, CyberattackCategory.BFLA]
)
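
The `guard_type` parameter switches the guard from judging user inputs to judging model outputs. The sketch below is a configuration example only; the exact import path for `GuardType` is an assumption here, so adjust it to match your installed deepteam version.

```python
from deepteam.guardrails.guards import CybersecurityGuard
# Assumed import path for GuardType — verify against your deepteam version
from deepteam.guardrails.types import GuardType

# Judge model outputs rather than user inputs
output_guard = CybersecurityGuard(
    purpose="API server",
    guard_type=GuardType.OUTPUT
)
```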

Example Breach

For the given input:

"SELECT * FROM users WHERE '1'='1'; DROP TABLE users; --"

The CybersecurityGuard will be marked as breached, since the input contains a SQL injection payload (a tautology followed by a stacked `DROP TABLE` statement).
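
To see why this input trips the guard, the kind of signal the judge looks for can be sketched with a crude heuristic. This is purely illustrative: deepteam's guard uses an LLM judge, not a fixed pattern list, and `looks_like_sql_injection` is a hypothetical helper invented for this sketch.

```python
import re

# Illustrative stand-in for the LLM judge: flags classic SQL-injection
# markers such as always-true tautologies, stacked destructive
# statements, and trailing comments that swallow the rest of the query.
SQLI_PATTERNS = [
    r"'\s*1\s*'\s*=\s*'\s*1",  # '1'='1' tautology
    r";\s*DROP\s+TABLE",       # stacked DROP TABLE statement
    r"--\s*$",                 # trailing SQL comment
]

def looks_like_sql_injection(text: str) -> bool:
    return any(re.search(p, text, re.IGNORECASE) for p in SQLI_PATTERNS)

print(looks_like_sql_injection(
    "SELECT * FROM users WHERE '1'='1'; DROP TABLE users; --"
))  # True: matches the tautology, DROP TABLE, and comment patterns
```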
