Skip to main content

Linear Jailbreaking

Linear Jailbreaking follows a systematic progression, where each new attack builds directly on the previous response from the LLM. The process involves iterating through increasingly persuasive attacks, gradually pushing the boundaries of the model's restrictions. At each stage, the enhanced attack is evaluated, and adjustments are made based on the model’s feedback. The goal is to lead the LLM towards generating harmful outputs while ensuring that each step builds on the previous one to maintain a logical flow.

LangChain
info

The process continues until the attacker LLM generates a non-compliant attack or the maximum number of iterations is reached.

Usage

to be documented...

Example

to be documented...