Cybersecurity Guard

LLM-as-a-judge
Input guard
Output guard

The cybersecurity guard is an input/output guard that uses LLM-as-a-judge to determine whether an LLM system's input or output contains cybersecurity attacks or malicious content, flagging it as unsafe if it does.

Usage

from deepteam.guardrails.guards import CybersecurityGuard

cybersecurity_guard = CybersecurityGuard()

There are THREE optional parameters when creating a CybersecurityGuard:

  • [Optional] purpose: a string specifying your application's purpose for better context (defaults to None)
  • [Optional] categories: a list of CyberattackCategory values to check for (defaults to all categories)
  • [Optional] guard_type: a GuardType specifying input or output guarding (defaults to GuardType.INPUT)

from deepteam.guardrails.guards import CybersecurityGuard
from deepteam.guardrails.guards.cybersecurity_guard import CyberattackCategory

# Specify purpose and categories
cybersecurity_guard = CybersecurityGuard(
    purpose="API server",
    categories=[CyberattackCategory.SQL_INJECTION, CyberattackCategory.BFLA]
)
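
The `guard_type` parameter switches the guard from judging user inputs to judging model outputs. The sketch below is a configuration example only; the exact import path for `GuardType` is an assumption here, so adjust it to match your installed deepteam version.

```python
from deepteam.guardrails.guards import CybersecurityGuard
# Assumed import path for GuardType — verify against your deepteam version
from deepteam.guardrails.types import GuardType

# Judge model outputs rather than user inputs
output_guard = CybersecurityGuard(
    purpose="API server",
    guard_type=GuardType.OUTPUT
)
```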

Example Breach

For the given input:

"SELECT * FROM users WHERE '1'='1'; DROP TABLE users; --"

The CybersecurityGuard will be marked as breached, since the input contains a SQL injection payload (a tautology followed by a stacked `DROP TABLE` statement).
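
To see why this input trips the guard, the kind of signal the judge looks for can be sketched with a crude heuristic. This is purely illustrative: deepteam's guard uses an LLM judge, not a fixed pattern list, and `looks_like_sql_injection` is a hypothetical helper invented for this sketch.

```python
import re

# Illustrative stand-in for the LLM judge: flags classic SQL-injection
# markers such as always-true tautologies, stacked destructive
# statements, and trailing comments that swallow the rest of the query.
SQLI_PATTERNS = [
    r"'\s*1\s*'\s*=\s*'\s*1",  # '1'='1' tautology
    r";\s*DROP\s+TABLE",       # stacked DROP TABLE statement
    r"--\s*$",                 # trailing SQL comment
]

def looks_like_sql_injection(text: str) -> bool:
    return any(re.search(p, text, re.IGNORECASE) for p in SQLI_PATTERNS)

print(looks_like_sql_injection(
    "SELECT * FROM users WHERE '1'='1'; DROP TABLE users; --"
))  # True: matches the tautology, DROP TABLE, and comment patterns
```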
