Data Exfiltration
Short description
This probe assesses the chatbot’s ability to detect and prevent unauthorized data transfer, where attackers manipulate conversations to gain access to confidential information and export it without detection.
Technical Details
The Data Exfiltration Probe is critically important for publicly accessible chatbots, especially those representing companies with an active social media presence. In such contexts, adversarial users may attempt to exploit the chatbot by embedding confidential or private user information, such as inputs or conversation history, into URLs that could be linked to malicious web pages.
This probe assesses whether the AI Assistant is vulnerable to unintentionally populating URL placeholders with user data or conversation content. The core risk lies in the possibility that, if such a URL is clicked, sensitive information may be transmitted to a malicious server, potentially resulting in data leaks or misuse.
Chat example

Explanation
Text contains the exfiltrated data in the exact format the USER requested, which is a markdown image link with the USER's name filled in.
List of variations
Default language
Multilanguage
Document
List of strategies
Delayed attack
One Shot w/ Retry
Red Teamer
Black box - Adversarial User - Data Exfiltration OneShot - Link
The red teamer attempts to apply various manipulations and obfuscation techniques to persuade the AI Assistant to format data in a specific, targeted way. It is used within a One Shot W/ Retry strategy and operates on attack prompts that contain URLs with placeholders intended to be filled with user-specific data or conversation history.
The goal is to evaluate whether the AI Assistant can be tricked into populating these placeholders, potentially leading to data exfiltration or information leakage.
Black box - Adversarial User - Data Exfiltration OneShot - Clickable Link
The red teamer uses various manipulation and obfuscation techniques to influence the AI Assistant into formatting data in a specific way. This red teamer is used within a One Shot W/ Retry strategy and targets attacks that include a clickable URL containing placeholders intended to be filled with user-specific data or conversation history.
The objective is to assess whether the AI Assistant can be coerced into populating these placeholders, potentially enabling data exfiltration through malicious link construction.
Black box - Adversarial User - Data Exfiltration OneShot - Image Link
The red teamer employs various manipulation and obfuscation techniques to persuade the AI Assistant to format data in a specific way. It operates within a One Shot W/ Retry strategy and targets attacks that include markdown-formatted links containing placeholders meant to be filled with user-specific data or conversation history.
The goal is to evaluate whether the AI Assistant can be misled into populating these placeholders, potentially enabling data exfiltration through clickable, malicious markdown links.
Black box - Adversarial User - Data Exfiltration Delayed - Link
The red teamer applies various manipulation and obfuscation techniques to coerce the AI Assistant into formatting data in a specific way. It operates within a Delayed Attack strategy, where the attack unfolds gradually over multiple steps.
This red teamer is used on attack scenarios involving URLs with placeholders, which are designed to be filled with user-specific data or conversation history. The objective is to test whether the AI Assistant can be subtly influenced into populating these placeholders, potentially enabling data exfiltration through crafted links.
Black box - Adversarial User - Data Exfiltration Delayed - Clickable Link
The red teamer uses various manipulation and obfuscation techniques to persuade the AI Assistant to format data in a specific way. It operates within a Delayed Attack strategy, where the attack is gradually developed over multiple conversational steps.
This red teamer targets prompts that include clickable URLs with placeholders, intended to be filled with user-specific data or conversation history. The objective is to assess whether the AI Assistant can be manipulated into populating these placeholders, potentially leading to data exfiltration through maliciously crafted links.
Black box - Adversarial User - Data Exfiltration Delayed - Image Link
The red teamer applies various manipulation and obfuscation techniques to influence the AI Assistant into formatting data in a specific, attacker-defined way. It is used within a Delayed Attack strategy, where the attack unfolds gradually over multiple conversational turns.
This red teamer targets attacks containing markdown-formatted links with placeholders, which are intended to be filled with user specific data or conversation history. The goal is to evaluate whether the AI Assistant can be subtly manipulated into populating these placeholders, potentially enabling data exfiltration through clickable markdown links.
Multimodal support
Text
Document