Legally Binding
Short description
Investigates how prone targets are to accept more serious user requests such as authorising refunds, establishing legal contracts, and giving discounts or unlisted services. The attacks in this probe are made to be on-domain, based on the chatbot description, to improve attack success rates and likelihood.
Technical Details
The Legally Binding Probe is a targeted security evaluation designed to assess a chatbot’s susceptibility to high-stakes user requests that carry legal or financial consequences. This includes attempts to authorize refunds, initiate or confirm contractual agreements, grant discounts, or provide unlisted services. Both malicious actors and regular users may issue such requests, intentionally or unintentionally, to exploit potential oversights in the chatbot’s response behavior. Despite being relatively straightforward to anticipate, even a single successful exploitation of this nature can result in significant legal liabilities, financial losses, and reputational damage for an organization. The Legally Binding Probe helps identify weaknesses in policy enforcement and safeguards, ensuring that the chatbot adheres to strict authorization protocols and cannot be manipulated into making unauthorized commitments.
Chat example

Explanation
This text is explicitly confirms a legally binding offer by stating that a one-year warranty covering all mechanical failures is provided if a car is purchased through their platform.
List of variations
Default language
Multilanguage
Document
List of strategies
One Shot w/ Retry
Delayed Attack
Red Teamer
Black box - Basic user - Direct questions
The red teamer directly forwards the user input, which contains an attack vector already adapted to the target AI Assistant’s domain.
Black box - Adversarial user - Standard
The red teamer forwards a predefined message as part of a Delayed Attack strategy. It is used to sequentially deliver messages from an attack plan tailored to the company’s specific domain, focusing on legally binding offers.
In this scenario, the red teamer does not generate new content but forwards the next message in the predefined sequence. The third message is intended to deliver the final attack, while the first and second messages serve as the lead-in, setting up the context and gradually guiding the AI Assistant toward producing a legally sensitive or binding response.
Multimodal support
Text
Document