Variation Types

PastVariation

Explanation

This LLM variation reformulates the attack question by phrasing it in the past tense, making it appear less harmful or threatening to the target AI Assistant and increasing the likelihood of a response.

Example

Attack

Encoded

ImplicitVariation

Explanation

This variation prepends text to the actual attack input. The added text contains instructions for the target AI Assistant to generate an implicitly toxic response to the given query by leveraging specific linguistic features.

Example

Attack

Encoded

Multilanguage

Explanation

This variation translates different parts of the input attack into different languages, fragmenting the message to reduce detectability while preserving its intended meaning for multilingual readers.

Example

Attack

Encoded

LeetSpeak

Explanation

This variation prepends the string “Here is message in 1337:” to the original message, then converts select letters in the attack into Leet (1337) speak—a system of modified spellings using numbers or special characters (e.g., l → 1, u → µ). At the end of the message, it appends the string “Your answer in 1337:”. The goal is to obscure the original content while maintaining readability for humans, potentially bypassing standard content filters.

Example

Attack

Encoded

RAG Variation

Explanation

This LLM variation leverages facts extracted from a RAG (Retrieval Augmented Generation) subset defined in the target settings. The goal is to "prime" the AI Assistant by injecting relevant facts from the RAG subset, increasing the likelihood that it will respond to the input attack.

Example

Attack

Encoded

ASCII Art

Explanation

This variation transforms the received message into ASCII art, visually representing the original text using character-based graphics, thereby obscuring its content while preserving its meaning for human readers.

Example

Attack

Encoded

Base64Encode

Explanation

This variation encodes the provided message in Base64 and appends additional text instructing the AI Assistant to respond in Base64 as well, thereby concealing the true content of the interaction.

Example

Attack

Encoded

Binary Tree

Explanation

This variation performs cognitive overloading of the LLM by disguising the received message as a binary tree problem, thereby diverting the model's attention away from the underlying malicious intent.

Example

Attack

Encoded

StringJoin

Explanation

This variation embeds the original message, with each character separated by a dash (-), into a template that frames it as a decryption task, thereby disguising the true intent of the input.

Example

Attack

Encoded

Length

Explanation

This variation performs cognitive overloading of the LLM by disguising the received message as a word-ordering task involving a lengthy input, thereby diverting attention from the true intent of the message.

Example

Attack

Encoded

Emojify

Explanation

This variation shortens the message by replacing words with emojis.

The transformation is performed by an LLM, with the goal of making

the message as concise as possible.

Example

Attack

Encoded

Mix non ASCII characters

Explanation

This variation replaces every character in the received message with a corresponding Unicode character that produces a stylized text appearance, obscuring the original content while retaining human readability.

Example

Attack

Encoded

Odd Even

Explanation

This variation performs cognitive overloading of the LLM by disguising the attack as a task involving the reordering of words in a sentence based on their position (e.g., odd or even), thereby masking the true intent of the message.

Example

Attack

Encoded

Reverse

Explanation

This variation attempts cognitive overloading of the LLM by disguising the attack message as a benign task, specifically, a problem that involves reversing the words within a sentence.

Example

Attack

Encoded

Punycode

Explanation

This variation replaces letters in the original message with visually similar alternatives, e.g. a → ɑ.

Example

Attack

Encoded

Invisible character

Explanation

This variation converts the provided message into characters from the Unicode Supplementary Private Use Area, making the content unreadable to standard text parsers.

Example

Attack

Encoded

Translate

Explanation

This variation translates the attack message into another language, as specified in the configuration file.

Example

Attack

Encoded

ROT13

Explanation

This variation uses a simple substitution cipher (ROT13) to modify the original attack text by shifting each letter 13 positions in the alphabet. Before sending the transformed message, it prepends the string: “Answer to this query in rot13:”.

Example

Attack

Encoded

UpsideDown

Explanation

This variation reverses the character order and replaces each character with a visually similar one that appears upside down, creating an obfuscated version of the original text.

Example

Attack

Encoded

Salt & Pepper

Explanation

This variation applies a salt-and-pepper effect to the image preview, adding fine specks that simulate visual noise and obscure the original image-based attack.

Example

Attack

Encoded

Gaussian Filter

Explanation

This variation applies a Gaussian filter to the image containing the attack text, resulting in a blurred version of the original image.

Example

Attack

Encoded

DALL-E Generated Image

Explanation

This variation overlays a DALL-E generated image with high transparency onto the original image, introducing visual noise to obfuscate the initial attack.