Skip to main content
Skip table of contents

Variation Types

PastVariation 

Explanation 

This LLM variation reformulates the attack question by phrasing it in the past tense, making it appear less harmful or threatening to the target AI Assistant and increasing the likelihood of a response. 

Example

Attack

image-20250710-072516.png

Encoded

image-20250710-072535.png

ImplicitVariation 

Explanation 

This variation prepends text to the actual attack input. The added text contains instructions for the target AI Assistant to generate an implicitly toxic response to the given query by leveraging specific linguistic features. 

Example

Attack

image-20250710-072808.png

Encoded

image-20250710-072832.png

Multilanguage

Explanation

This variation translates different parts of the input attack into different languages, fragmenting the message to reduce detectability while preserving its intended meaning for multilingual readers. 

Example

Attack

image-20250710-073037.png

Encoded

image-20250710-073054.png

LeetSpeak

Explanation

 This variation prepends the string “Here is message in 1337:” to the original message, then converts select letters in the attack into Leet (1337) speak—a system of modified spellings using numbers or special characters (e.g., l → 1, u → µ). At the end of the message, it appends the string “Your answer in 1337:”. The goal is to obscure the original content while maintaining readability for humans, potentially bypassing standard content filters. 

Example

Attack

image-20250710-073233.png

Encoded

image-20250710-073304.png

RAG Variation

Explanation

This LLM variation leverages facts extracted from a RAG (Retrieval Augmented Generation) subset defined in the target settings. The goal is to "prime" the AI Assistant by injecting relevant facts from the RAG subset, increasing the likelihood that it will respond to the input attack. 

Example

Attack

image-20250710-073437.png

Encoded

image-20250710-073457.png

ASCII Art

Explanation

This variation transforms the received message into ASCII art, visually representing the original text using character-based graphics, thereby obscuring its content while preserving its meaning for human readers. 

Example

Attack

image-20250710-073602.png

Encoded

image-20250710-073649.png

Base64Encode

Explanation

This variation encodes the provided message in Base64 and appends additional text instructing the AI Assistant to respond in Base64 as well, thereby concealing the true content of the interaction. 

Example

Attack

image-20250710-073819.png

Encoded

image-20250710-073854.png

Binary Tree

Explanation

This variation performs cognitive overloading of the LLM by disguising the received message as a binary tree problem, thereby diverting the model's attention away from the underlying malicious intent.

Example

Attack

image-20250710-074022.png

Encoded

image-20250710-074130.png
image-20250710-074159.png

StringJoin

Explanation

This variation embeds the original message, with each character separated by a dash (-), into a template that frames it as a decryption task, thereby disguising the true intent of the input. 

Example

Attack

image-20250710-074315.png

Encoded

image-20250710-074348.png

Length

Explanation

This variation performs cognitive overloading of the LLM by disguising the received message as a word-ordering task involving a lengthy input, thereby diverting attention from the true intent of the message. 

Example

Attack

image-20250710-074440.png

Encoded

image-20250710-074533.png

Emojify

Explanation

This variation shortens the message by replacing words with emojis. 

The transformation is performed by an LLM, with the goal of making 

the message as concise as possible. 

Example

Attack

image-20250710-074611.png

Encoded

image-20250710-074629.png

Mix non ASCII characters

Explanation

This variation replaces every character in the received message with a corresponding Unicode character that produces a stylized text appearance, obscuring the original content while retaining human readability. 

Example

Attack

image-20250710-074801.png

Encoded

image-20250710-074818.png

Odd Even

Explanation

 This variation performs cognitive overloading of the LLM by disguising the attack as a task involving the reordering of words in a sentence based on their position (e.g., odd or even), thereby masking the true intent of the message.

Example

Attack

image-20250710-074940.png

Encoded

image-20250710-075036.png

Reverse

Explanation

This variation attempts cognitive overloading of the LLM by disguising the attack message as a benign task, specifically, a problem that involves reversing the words within a sentence. 

Example

Attack

image-20250710-075147.png

Encoded

image-20250710-075234.png

Punycode

Explanation

This variation replaces letters in the original message with visually similar alternatives, e.g. a → ɑ. 

Example

Attack

image-20250710-075324.png

Encoded

image-20250710-075405.png

Invisible character

Explanation

This variation converts the provided message into characters from the Unicode Supplementary Private Use Area, making the content unreadable to standard text parsers. 

Example

Attack

image-20250710-075517.png

Encoded

image-20250710-075600.png

Translate

Explanation

 This variation translates the attack message into another language, as specified in the configuration file. 

Example

Attack

image-20250710-075723.png

Encoded

image-20250710-075758.png

ROT13

Explanation

This variation uses a simple substitution cipher (ROT13) to modify the original attack text by shifting each letter 13 positions in the alphabet. Before sending the transformed message, it prepends the string: “Answer to this query in rot13:”. 

Example

Attack

image-20250710-075914.png

Encoded

image-20250710-075928.png

UpsideDown

Explanation

This variation reverses the character order and replaces each character with a visually similar one that appears upside down, creating an obfuscated version of the original text. 

Example

Attack

image-20250710-080057.png

Encoded

image-20250710-080114.png

Salt & Pepper

Explanation

This variation applies a salt-and-pepper effect to the image preview, adding fine specks that simulate visual noise and obscure the original image-based attack. 

Example

Attack

image-20250710-080239.png

Encoded

image-20250710-080314.png

Gaussian Filter

Explanation

This variation applies a Gaussian filter to the image containing the attack text, resulting in a blurred version of the original image. 

Example

Attack

image-20250710-080418.png

Encoded

image-20250710-080513.png

DALL-E Generated Image

Explanation

This variation overlays a DALL-E generated image with high transparency onto the original image, introducing visual noise to obfuscate the initial attack. 

Example

Attack

image-20250710-080618.png

Encoded

image-20250710-080653.png

Mirror Image

Explanation

This variation mirrors the image containing text along the Y-axis. 

Example

Attack

image-20250710-080801.png

Encoded

image-20250710-080845.png

Convert to Audio

Explanation

This variation converts the input text-based attack into an audio format. 

Example

Attack

image-20250710-080946.png

Encoded

image-20250710-081021.png

JavaScript errors detected

Please note, these errors can depend on your browser setup.

If this problem persists, please contact our support.