Mend.io Vulnerability Database
MAI-2024-0004
Published: May 16, 2026
Updated: May 16, 2026
Large Language Models (LLMs) are susceptible to a novel attack known as "FlipAttack," which exploits their inherent left-to-right processing bias. The attacker disguises a malicious prompt by reversing the order of its characters or words, which impairs the model's ability to recognize the harmful content. A "flipping guidance" module then instructs the LLM to reverse the flipped text, recovering and executing the original malicious prompt.

Mitigation steps:

**For AI Developers:**

* Develop advanced guardrail mechanisms that detect and respond to flipped prompts, using techniques beyond basic perplexity checks.
* Implement comprehensive prompt sanitization methods designed to identify and mitigate prompt manipulation attempts.

**For Model Trainers/Fine-tuners:**

* Improve LLM training datasets by incorporating a diverse range of flipped or scrambled text examples to enhance model robustness.
* Enhance LLM architectures to minimize inherent left-to-right processing biases, ensuring more balanced text generation capabilities.
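The prompt sanitization idea in the mitigation steps can be illustrated with a minimal sketch. It assumes an existing content filter is available as a callable; `toy_filter`, `BLOCKED_TERMS`, `candidate_unflippings`, and `is_blocked` are hypothetical names introduced here for illustration and are not part of any real guardrail product. The sketch undoes the simple reversals described above (all characters, word order, characters within each word) and runs the filter on every candidate, so a flipped prompt is evaluated in its readable form as well.

```python
from typing import Callable, List

# Placeholder term list for the toy filter; a real deployment would call
# its own moderation model or policy engine here instead.
BLOCKED_TERMS = {"forbidden phrase"}


def toy_filter(text: str) -> bool:
    """Return True if the text contains a blocked term (illustrative only)."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def candidate_unflippings(prompt: str) -> List[str]:
    """Return the prompt plus variants that undo simple flipping schemes."""
    words = prompt.split()  # note: collapses runs of whitespace
    return [
        prompt,                             # as submitted
        prompt[::-1],                       # all characters reversed
        " ".join(reversed(words)),          # word order reversed
        " ".join(w[::-1] for w in words),   # characters reversed within each word
    ]


def is_blocked(prompt: str, content_filter: Callable[[str], bool]) -> bool:
    """Block the prompt if any un-flipped candidate trips the filter."""
    return any(content_filter(c) for c in candidate_unflippings(prompt))


if __name__ == "__main__":
    for sample in ["esarhp neddibrof", "phrase forbidden", "hello world"]:
        print(sample, "->", is_blocked(sample, toy_filter))
```

A check like this only covers the reversal schemes it enumerates. It is a pre-filter to complement, not replace, the model-side robustness measures listed for trainers.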
CVSS v4
Base Score: 9.2
Attack Vector: NETWORK
Attack Complexity: LOW
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: NONE
Vulnerable System Integrity: HIGH
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: HIGH
Subsequent System Availability: NONE
CVSS v3
Base Score: 8.6
Attack Vector: NETWORK
Attack Complexity: LOW
Privileges Required: NONE
User Interaction: NONE
Scope: CHANGED
Confidentiality: NONE
Integrity: HIGH
Availability: NONE
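As a cross-check, the CVSS v3 base score can be recomputed from the listed metrics. The sketch below assumes the CVSS v3.1 base-score equations and metric weights from the public specification (the entry only says "v3", so the exact minor version is an assumption), and it implements only the Scope: CHANGED branch that applies here. The function names are illustrative.

```python
import math


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_changed(av: float, ac: float, pr: float, ui: float,
                             c: float, i: float, a: float) -> float:
    """Base score for Scope: CHANGED, given numeric metric weights."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(1.08 * (impact + exploitability), 10))


# Weights for the metrics listed in this entry:
# AV NETWORK = 0.85, AC LOW = 0.77, PR NONE = 0.85, UI NONE = 0.85,
# Confidentiality NONE = 0, Integrity HIGH = 0.56, Availability NONE = 0.
score = base_score_scope_changed(av=0.85, ac=0.77, pr=0.85, ui=0.85,
                                 c=0.0, i=0.56, a=0.0)
print(score)  # prints 8.6
```

Under these assumptions the result agrees with the published base score of 8.6.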
AIVSS
Base Score: 5.2