MAI-2024-0004 | Mend Vulnerability Database

MAI-2024-0004

Published: May 16, 2026

Updated: June 17, 2026

Large Language Models (LLMs) are susceptible to a novel attack known as "FlipAttack," which exploits their inherent left-to-right processing bias. The attacker disguises a malicious prompt by reversing the order of its characters or words, which impairs the model's ability to recognize the harmful content. A "flipping guidance" module then instructs the LLM to reverse the flipped text, which recovers the original malicious prompt and leads the model to execute it.

Mitigation steps:

**For AI Developers:**

* Develop guardrail mechanisms that detect and respond to flipped prompts, using techniques beyond basic perplexity checks.
* Implement prompt sanitization designed to identify and mitigate prompt manipulation attempts.

**For Model Trainers/Fine-tuners:**

* Augment LLM training datasets with a diverse range of flipped or scrambled text examples to improve model robustness.
* Refine LLM architectures to reduce inherent left-to-right processing bias and support more balanced text generation.
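The prompt sanitization step can be illustrated with a minimal sketch. It assumes three flipping variants consistent with the description above (reversing all characters, reversing word order, and reversing the characters inside each word) and runs an existing content filter over every un-flipped candidate. The function names, the `toy_filter` stand-in, and the `BLOCKED_TERMS` set are hypothetical and for illustration only. A production guardrail would call a real moderation model and would likely need to cover more transformations.

```python
from typing import Callable, List


def reverse_chars(text: str) -> str:
    """Reverse every character in the string."""
    return text[::-1]


def reverse_word_order(text: str) -> str:
    """Reverse the order of whitespace-separated words."""
    return " ".join(reversed(text.split()))


def reverse_chars_in_words(text: str) -> str:
    """Reverse the characters inside each word, keeping word order."""
    return " ".join(word[::-1] for word in text.split())


def candidate_unflips(prompt: str) -> List[str]:
    """Return the prompt plus each un-flipped variant, without duplicates.

    Each transform undoes itself when applied twice (up to whitespace
    normalization), so applying it to a flipped prompt recovers the original.
    """
    candidates = [
        prompt,
        reverse_chars(prompt),
        reverse_word_order(prompt),
        reverse_chars_in_words(prompt),
    ]
    unique: List[str] = []
    for candidate in candidates:
        if candidate not in unique:
            unique.append(candidate)
    return unique


def is_flagged(prompt: str, content_filter: Callable[[str], bool]) -> bool:
    """Flag the prompt if the filter rejects any un-flipped candidate."""
    return any(content_filter(c) for c in candidate_unflips(prompt))


# Hypothetical stand-in for a real moderation check (assumption, not a real API).
BLOCKED_TERMS = {"forbidden"}


def toy_filter(text: str) -> bool:
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


if __name__ == "__main__":
    flipped = reverse_chars("request forbidden")
    print(is_flagged(flipped, toy_filter))        # flipped text is caught
    print(is_flagged("hello world", toy_filter))  # benign text passes
```

Checking every candidate rather than guessing which transform was used keeps the sketch simple, at the cost of one filter call per variant.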

Related Resources (1)

https://arxiv.org/abs/2410.02832

CVSS v4

Base Score: 9.2

* Attack Vector: NETWORK
* Attack Complexity: LOW
* Attack Requirements: NONE
* Privileges Required: NONE
* User Interaction: NONE
* Vulnerable System Confidentiality: NONE
* Vulnerable System Integrity: HIGH
* Vulnerable System Availability: NONE
* Subsequent System Confidentiality: NONE
* Subsequent System Integrity: HIGH
* Subsequent System Availability: NONE

CVSS v3

Base Score: 8.6

* Attack Vector: NETWORK
* Attack Complexity: LOW
* Privileges Required: NONE
* User Interaction: NONE
* Scope: CHANGED
* Confidentiality: NONE
* Integrity: HIGH
* Availability: NONE
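The CVSS v3 base score of 8.6 can be reproduced from the metric values listed for it. The sketch below assumes the CVSS v3.1 base score equations and metric weights apply, since the advisory says only "CVSS v3" without a minor version. The function and dictionary names are illustrative.

```python
import math

# Metric weights as defined in the CVSS v3.1 specification.
ATTACK_VECTOR = {"NETWORK": 0.85, "ADJACENT": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2}
ATTACK_COMPLEXITY = {"LOW": 0.77, "HIGH": 0.44}
USER_INTERACTION = {"NONE": 0.85, "REQUIRED": 0.62}
IMPACT = {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0}
# Privileges Required uses different weights when Scope is CHANGED.
PRIVILEGES_UNCHANGED = {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27}
PRIVILEGES_CHANGED = {"NONE": 0.85, "LOW": 0.68, "HIGH": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, using the integer-based method from v3.1."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(av: str, ac: str, pr: str, ui: str, scope: str,
               c: str, i: str, a: str) -> float:
    """Compute the CVSS v3.1 base score from metric value names."""
    changed = scope == "CHANGED"
    pr_weight = (PRIVILEGES_CHANGED if changed else PRIVILEGES_UNCHANGED)[pr]

    # Impact sub-score.
    iss = 1 - (1 - IMPACT[c]) * (1 - IMPACT[i]) * (1 - IMPACT[a])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0

    # Exploitability sub-score.
    exploitability = (8.22 * ATTACK_VECTOR[av] * ATTACK_COMPLEXITY[ac]
                      * pr_weight * USER_INTERACTION[ui])

    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


if __name__ == "__main__":
    # Metric values taken from the CVSS v3 entry in this advisory.
    print(base_score("NETWORK", "LOW", "NONE", "NONE", "CHANGED",
                     "NONE", "HIGH", "NONE"))  # 8.6
```

The changed scope is what lifts the score here: with integrity as the only impacted property, the 1.08 multiplier and the scope-changed impact formula together bring the result to 8.6.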

AIVSS

Base Score: 5.2