MAI-2024-0032 | Mend Vulnerability Database

MAI-2024-0032

Published: May 16, 2026

Updated: June 17, 2026

Large Language Models (LLMs) are susceptible to a sophisticated vocabulary-based attack, wherein strategically chosen words from the model's vocabulary are inserted into user prompts. These words are selected through an optimization process utilizing embeddings from another LLM. This technique can manipulate the target LLM to produce specific, undesired outputs, such as offensive content or misinformation, with minimal word insertions. The attack is particularly challenging to detect, as the inserted words may appear benign within the prompt's context.

Mitigation steps:

**For AI Developers:**

* Implement advanced prompt sanitization methods that evaluate semantic meaning and context, surpassing basic keyword filtering.
* Deploy machine learning models to identify anomalous word placements and contextual shifts in user prompts indicative of potential attacks.
* Conduct regular security audits of LLM applications to identify and mitigate vulnerabilities.

**For Model Trainers/Fine-tuners:**

* Regularly update and enhance the model's safety and security protocols by integrating the latest research on adversarial attacks.
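As a rough illustration of the "anomalous word placement" mitigation, the sketch below flags words whose embedding sits far from the rest of the prompt, using a leave-one-out cosine similarity check. Everything here is an assumption made for illustration: `TOY_EMBEDDINGS`, `flag_outlier_words`, the 3-dimensional vectors, and the `0.5` threshold are hypothetical and do not come from the advisory or the referenced paper. A real deployment would take vectors from an actual embedding model, and because the inserted words may appear benign in context, a simple check like this could easily miss them.

```python
import math

# Hypothetical toy embeddings (assumption): hand-picked 3-d vectors standing in
# for the output of a real embedding model.
TOY_EMBEDDINGS = {
    "please": (0.9, 0.1, 0.0),
    "summarize": (0.8, 0.2, 0.1),
    "this": (0.9, 0.0, 0.1),
    "report": (0.7, 0.3, 0.0),
    "zebra": (0.0, 0.1, 0.9),
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def flag_outlier_words(prompt, threshold=0.5):
    """Return known words whose vector is dissimilar to the other words' centroid.

    The threshold is an illustrative assumption, not a tuned value.
    """
    # Words without a toy embedding are skipped in this sketch.
    words = [w for w in prompt.lower().split() if w in TOY_EMBEDDINGS]
    flagged = []
    for i, word in enumerate(words):
        others = [TOY_EMBEDDINGS[w] for j, w in enumerate(words) if j != i]
        if not others:
            continue
        # Centroid of every other word in the prompt (leave-one-out context).
        centroid = tuple(sum(dim) / len(others) for dim in zip(*others))
        if cosine(TOY_EMBEDDINGS[word], centroid) < threshold:
            flagged.append(word)
    return flagged


suspicious = flag_outlier_words("please summarize this zebra report")
clean = flag_outlier_words("please summarize this report")
```

In this toy setup only the out-of-place word is flagged. A heuristic of this kind would at most be one signal feeding the broader semantic and contextual checks listed above, not a replacement for them.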

Related Resources (1)

https://arxiv.org/abs/2404.02637

CVSS v4

Base Score: 8.3

Attack Vector: NETWORK
Attack Complexity: HIGH
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: HIGH
Vulnerable System Integrity: LOW
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: NONE
Subsequent System Availability: NONE

CVSS v3

Base Score: 6.5

Attack Vector: NETWORK
Attack Complexity: HIGH
Privileges Required: NONE
User Interaction: NONE
Scope: UNCHANGED
Confidentiality: HIGH
Integrity: LOW
Availability: NONE
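
The CVSS v3 base score can be reproduced from the metric values listed here. The following is a minimal sketch, assuming the CVSS v3.1 base-score equations and metric weights for the Scope: UNCHANGED case; the names `WEIGHTS`, `roundup`, and `base_score_scope_unchanged` are illustrative and are not part of this database entry.

```python
import math

# Numeric weights from the CVSS v3.1 specification (Scope: Unchanged only).
WEIGHTS = {
    "AV": {"NETWORK": 0.85, "ADJACENT": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2},
    "AC": {"LOW": 0.77, "HIGH": 0.44},
    "PR": {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27},
    "UI": {"NONE": 0.85, "REQUIRED": 0.62},
    "CIA": {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0},
}


def roundup(value):
    """Round up to one decimal place, following the v3.1 Roundup definition."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a):
    """Compute a CVSS v3.1 base score when Scope is UNCHANGED."""
    cia = WEIGHTS["CIA"]
    # Impact sub-score: combines the three impact metrics.
    iss = 1 - (1 - cia[c]) * (1 - cia[i]) * (1 - cia[a])
    impact = 6.42 * iss
    # Exploitability sub-score: product of the four exploitability weights.
    exploitability = (
        8.22 * WEIGHTS["AV"][av] * WEIGHTS["AC"][ac]
        * WEIGHTS["PR"][pr] * WEIGHTS["UI"][ui]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(
    av="NETWORK", ac="HIGH", pr="NONE", ui="NONE",
    c="HIGH", i="LOW", a="NONE",
)
print(score)  # 6.5
```

With the metrics above, the impact sub-score is about 4.22 and the exploitability sub-score about 2.22; their sum of roughly 6.44 rounds up to the listed 6.5.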

AIVSS

Base Score: 5.5