MAI-2024-0021 | Mend Vulnerability Database

Published: May 16, 2026

Updated: June 17, 2026

Large Language Models (LLMs) that incorporate safety alignment mechanisms are susceptible to bypass through random (stochastic) augmentations of input prompts. The attack exploits the fragility of safety alignment when it faces minor, randomly introduced modifications to the input: despite the model's safety training, the perturbed prompt can elicit unsafe outputs. Character-level augmentations have been found to be notably more effective than string insertions.

Mitigation steps:

**For AI Developers:**

* Implement robust input sanitization and pre-processing, including typo correction and paraphrase detection.
* Carefully design and implement safety-encouraging system prompts, and continue researching the optimal prompting strategy for each model.

**For Model Trainers/Fine-tuners:**

* Restrict or carefully control the use of greedy decoding in LLMs.
* Explore and implement defensive techniques, such as semantic smoothing or adversarial training, that generalize to a wider range of perturbations, including random augmentations. Consider regularization techniques that prevent over-eager responses to unintelligible suffixes, and evaluate their effectiveness.
* Regularly audit and update safety models so they keep pace with new attack techniques.
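
Because character-level noise is the more effective augmentation type, a pre-processing step that undoes such noise is one concrete form of the input sanitization recommended above. The sketch below is a minimal illustration, not a complete defense: it applies Unicode NFKC normalization, drops invisible control and format characters, and maps unknown tokens to their closest match in a vocabulary using Python's standard `difflib`. The names `VOCAB`, `strip_invisible`, and `normalize_prompt`, the toy vocabulary, and the `0.75` cutoff are all hypothetical choices for illustration; a real deployment would use a full dictionary or a dedicated correction model, and paraphrase detection is not covered here.

```python
import difflib
import unicodedata

# Hypothetical toy vocabulary; a real deployment would use a full
# dictionary or a dedicated spelling/paraphrase model instead.
VOCAB = ["how", "do", "i", "reset", "my", "password", "please", "help"]


def strip_invisible(text: str) -> str:
    """Drop control and format characters (Unicode category C*), keeping whitespace."""
    return "".join(
        ch for ch in text
        if ch.isspace() or not unicodedata.category(ch).startswith("C")
    )


def normalize_prompt(prompt: str, vocab=VOCAB, cutoff: float = 0.75) -> str:
    """Sketch of prompt pre-processing applied before the prompt reaches the model."""
    # Fold compatibility forms (e.g. full-width letters) to canonical characters.
    text = unicodedata.normalize("NFKC", prompt)
    # Remove zero-width and other invisible characters.
    text = strip_invisible(text)
    known = set(vocab)
    corrected = []
    # Lowercase and split on whitespace; punctuation is not handled in this sketch.
    for word in text.lower().split():
        if word in known:
            corrected.append(word)
            continue
        # Map an unknown token to its closest vocabulary word, if one is close enough.
        match = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)


print(normalize_prompt("How do I resett my pasword"))  # how do i reset my password
```

Tokens with no sufficiently close vocabulary entry pass through unchanged, so this step narrows the space of character-level perturbations rather than eliminating it.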

Related Resources (1)

https://arxiv.org/abs/2411.02785

CVSS v4

Base Score: 8.7

| Metric | Value |
|---|---|
| Attack Vector | NETWORK |
| Attack Complexity | LOW |
| Attack Requirements | NONE |
| Privileges Required | NONE |
| User Interaction | NONE |
| Vulnerable System Confidentiality | NONE |
| Vulnerable System Integrity | HIGH |
| Vulnerable System Availability | NONE |
| Subsequent System Confidentiality | NONE |
| Subsequent System Integrity | NONE |
| Subsequent System Availability | NONE |
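
The entry lists its CVSS v4 metrics individually rather than as a vector string. As a sketch, the mapping below converts those metric names and values into standard CVSS v4.0 base-vector notation. The abbreviations follow the CVSS v4.0 specification, but the resulting vector string is derived here from the listed values and is not quoted from the entry. The names `METRIC_CODES`, `VALUE_CODES`, `cvss4_vector`, and `ENTRY_METRICS` are illustrative. Computing the 8.7 score itself requires the specification's macro-vector lookup tables and is not attempted.

```python
# Base-metric abbreviations from the CVSS v4.0 specification, in vector order.
METRIC_CODES = {
    "Attack Vector": "AV",
    "Attack Complexity": "AC",
    "Attack Requirements": "AT",
    "Privileges Required": "PR",
    "User Interaction": "UI",
    "Vulnerable System Confidentiality": "VC",
    "Vulnerable System Integrity": "VI",
    "Vulnerable System Availability": "VA",
    "Subsequent System Confidentiality": "SC",
    "Subsequent System Integrity": "SI",
    "Subsequent System Availability": "SA",
}

# Single-letter value codes for the base-metric values.
VALUE_CODES = {
    "NETWORK": "N", "ADJACENT": "A", "LOCAL": "L", "PHYSICAL": "P",
    "LOW": "L", "HIGH": "H", "NONE": "N",
    "PRESENT": "P", "PASSIVE": "P", "ACTIVE": "A",
}


def cvss4_vector(metrics: dict) -> str:
    """Build a CVSS v4.0 base vector string from metric-name/value pairs."""
    parts = [f"{code}:{VALUE_CODES[metrics[name]]}" for name, code in METRIC_CODES.items()]
    return "CVSS:4.0/" + "/".join(parts)


# Metric values as listed in this entry.
ENTRY_METRICS = {
    "Attack Vector": "NETWORK",
    "Attack Complexity": "LOW",
    "Attack Requirements": "NONE",
    "Privileges Required": "NONE",
    "User Interaction": "NONE",
    "Vulnerable System Confidentiality": "NONE",
    "Vulnerable System Integrity": "HIGH",
    "Vulnerable System Availability": "NONE",
    "Subsequent System Confidentiality": "NONE",
    "Subsequent System Integrity": "NONE",
    "Subsequent System Availability": "NONE",
}

print(cvss4_vector(ENTRY_METRICS))
# CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:H/VA:N/SC:N/SI:N/SA:N
```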

CVSS v3

Base Score: 7.5

| Metric | Value |
|---|---|
| Attack Vector | NETWORK |
| Attack Complexity | LOW |
| Privileges Required | NONE |
| User Interaction | NONE |
| Scope | UNCHANGED |
| Confidentiality | NONE |
| Integrity | HIGH |
| Availability | NONE |
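
The v3 base score can be checked against the listed metrics using the base-score equations from the CVSS specification. The sketch below assumes version 3.1, since the entry says only "CVSS v3". For this vector, the v3.0 rounding rule produces the same result. The sketch also encodes the metrics as a v3.1 vector string, which the page itself does not show. The function and table names are illustrative.

```python
import math

# CVSS v3.1 base-metric weights (FIRST CVSS v3.1 specification).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}  # PR weights differ when Scope is Changed
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, computed on integers."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute a CVSS v3.1 base score from a vector such as 'CVSS:3.1/AV:N/...'."""
    # Skip the 'CVSS:3.1' prefix and parse 'KEY:VALUE' pairs.
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    scope_changed = parts["S"] == "C"
    # Impact sub-score (ISS) combines the three CIA impacts.
    iss = 1 - (1 - CIA[parts["C"]]) * (1 - CIA[parts["I"]]) * (1 - CIA[parts["A"]])
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if scope_changed else PR_UNCHANGED)[parts["PR"]]
    exploitability = 8.22 * AV[parts["AV"]] * AC[parts["AC"]] * pr * UI[parts["UI"]]
    if impact <= 0:
        return 0.0
    if scope_changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:N"))  # 7.5
```

For this entry, Impact = 6.42 × 0.56 ≈ 3.595 and Exploitability = 8.22 × 0.85 × 0.77 × 0.85 × 0.85 ≈ 3.887. Their sum, ≈ 7.482, rounds up to the listed 7.5. The integer-based `roundup` follows the v3.1 definition, which avoids floating-point artifacts that a naive ceiling can introduce.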

AIVSS

Base Score: 5.9