Variation Types
PastVariation
Explanation
This LLM variation reformulates the attack question by phrasing it in the past tense, making it appear less harmful or threatening to the target AI Assistant and increasing the likelihood of a response.
Example
Attack

Encoded

ImplicitVariation
Explanation
This variation prepends text to the actual attack input. The added text contains instructions for the target AI Assistant to generate an implicitly toxic response to the given query by leveraging specific linguistic features.
Example
Attack

Encoded

Multilanguage
Explanation
This variation translates different parts of the input attack into different languages, fragmenting the message to reduce detectability while preserving its intended meaning for multilingual readers.
Example
Attack

Encoded

LeetSpeak
Explanation
This variation prepends the string “Here is message in 1337:” to the original message, then converts select letters in the attack into Leet (1337) speak—a system of modified spellings using numbers or special characters (e.g., l → 1, u → µ). At the end of the message, it appends the string “Your answer in 1337:”. The goal is to obscure the original content while maintaining readability for humans, potentially bypassing standard content filters.
Example
Attack

Encoded

RAG Variation
Explanation
This LLM variation leverages facts extracted from a RAG (Retrieval Augmented Generation) subset defined in the target settings. The goal is to "prime" the AI Assistant by injecting relevant facts from the RAG subset, increasing the likelihood that it will respond to the input attack.
Example
Attack

Encoded

ASCII Art
Explanation
This variation transforms the received message into ASCII art, visually representing the original text using character-based graphics, thereby obscuring its content while preserving its meaning for human readers.
Example
Attack

Encoded

Base64Encode
Explanation
This variation encodes the provided message in Base64 and appends additional text instructing the AI Assistant to respond in Base64 as well, thereby concealing the true content of the interaction.
Example
Attack

Encoded

Binary Tree
Explanation
This variation performs cognitive overloading of the LLM by disguising the received message as a binary tree problem, thereby diverting the model's attention away from the underlying malicious intent.
Example
Attack

Encoded


StringJoin
Explanation
This variation embeds the original message, with each character separated by a dash (-), into a template that frames it as a decryption task, thereby disguising the true intent of the input.
Example
Attack

Encoded

Length
Explanation
This variation performs cognitive overloading of the LLM by disguising the received message as a word-ordering task involving a lengthy input, thereby diverting attention from the true intent of the message.
Example
Attack

Encoded

Emojify
Explanation
This variation shortens the message by replacing words with emojis.
The transformation is performed by an LLM, with the goal of making
the message as concise as possible.
Example
Attack

Encoded

Mix non ASCII characters
Explanation
This variation replaces every character in the received message with a corresponding Unicode character that produces a stylized text appearance, obscuring the original content while retaining human readability.
Example
Attack

Encoded

Odd Even
Explanation
This variation performs cognitive overloading of the LLM by disguising the attack as a task involving the reordering of words in a sentence based on their position (e.g., odd or even), thereby masking the true intent of the message.
Example
Attack

Encoded

Reverse
Explanation
This variation attempts cognitive overloading of the LLM by disguising the attack message as a benign task, specifically, a problem that involves reversing the words within a sentence.
Example
Attack

Encoded

Punycode
Explanation
This variation replaces letters in the original message with visually similar alternatives, e.g. a → ɑ.
Example
Attack

Encoded

Invisible character
Explanation
This variation converts the provided message into characters from the Unicode Supplementary Private Use Area, making the content unreadable to standard text parsers.
Example
Attack

Encoded

Translate
Explanation
This variation translates the attack message into another language, as specified in the configuration file.
Example
Attack

Encoded

ROT13
Explanation
This variation uses a simple substitution cipher (ROT13) to modify the original attack text by shifting each letter 13 positions in the alphabet. Before sending the transformed message, it prepends the string: “Answer to this query in rot13:”.
Example
Attack

Encoded

UpsideDown
Explanation
This variation reverses the character order and replaces each character with a visually similar one that appears upside down, creating an obfuscated version of the original text.
Example
Attack

Encoded

Salt & Pepper
Explanation
This variation applies a salt-and-pepper effect to the image preview, adding fine specks that simulate visual noise and obscure the original image-based attack.
Example
Attack

Encoded

Gaussian Filter
Explanation
This variation applies a Gaussian filter to the image containing the attack text, resulting in a blurred version of the original image.
Example
Attack

Encoded

DALL-E Generated Image
Explanation
This variation overlays a DALL-E generated image with high transparency onto the original image, introducing visual noise to obfuscate the initial attack.
Example
Attack

Encoded

Mirror Image
Explanation
This variation mirrors the image containing text along the Y-axis.
Example
Attack

Encoded

Convert to Audio
Explanation
This variation converts the input text-based attack into an audio format.
Example
Attack

Encoded
