Mend.io Vulnerability Database
The largest open source vulnerability database
MAI-2024-0021
Published: May 16, 2026
Updated: May 16, 2026
Large Language Models (LLMs) that incorporate safety alignment mechanisms are susceptible to bypass attacks that apply random augmentations to input prompts. The attack exploits the fragility of safety alignment when the input contains minor, randomly introduced modifications: the model generates unsafe outputs despite its safety training. Character-level augmentations have been identified as notably more effective than string insertions.

Mitigation steps:

**For AI Developers:**

* Implement robust input sanitization and pre-processing, including typo correction and paraphrase detection.
* Carefully design and implement safety-encouraging system prompts, with ongoing research to determine optimal prompting strategies for different models.

**For Model Trainers/Fine-tuners:**

* Restrict or carefully control the use of greedy decoding in LLMs.
* Explore and implement defensive techniques, such as semantic smoothing or adversarial training, that generalize to a wider range of perturbations, including random augmentations. Consider regularization techniques to prevent over-eager responses to unintelligible suffixes, and evaluate their effectiveness.
* Regularly audit and update safety models to adapt to new attack techniques.
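The audit step above can be approximated with a small robustness harness. The following is a minimal sketch, not the advisory's or any vendor's method: it applies the two perturbation families named in the description (character-level augmentation and random string insertion) to a test prompt and measures how often a safety check still refuses. The function names, the `ALPHABET` set, the perturbation rates, and the `is_refused` callback are all hypothetical; in practice `is_refused` would wrap a call to the model or safety filter under test.

```python
import random
import string
from typing import Callable

# Hypothetical character pool for perturbations (an assumption of this sketch).
ALPHABET = string.ascii_letters + string.digits + string.punctuation + " "


def augment_chars(prompt: str, rate: float, rng: random.Random) -> str:
    """Character-level augmentation: replace each character with a random
    one from ALPHABET with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else ch
        for ch in prompt
    )


def append_random_suffix(prompt: str, length: int, rng: random.Random) -> str:
    """String insertion: append `length` random characters after the prompt."""
    suffix = "".join(rng.choice(ALPHABET) for _ in range(length))
    return f"{prompt} {suffix}"


def refusal_rate(
    prompt: str,
    is_refused: Callable[[str], bool],
    augment: Callable[[str, random.Random], str],
    trials: int = 25,
    seed: int = 0,
) -> float:
    """Fraction of randomly augmented prompts that the safety check refuses.
    A rate well below 1.0 on prompts that should always be refused suggests
    the check is fragile under random perturbation."""
    rng = random.Random(seed)
    refused = sum(bool(is_refused(augment(prompt, rng))) for _ in range(trials))
    return refused / trials


if __name__ == "__main__":
    # Stand-in safety check for demonstration only; replace with a real
    # call to the system under audit.
    def keyword_filter(text: str) -> bool:
        return "forbidden" in text.lower()

    test_prompt = "please describe the forbidden procedure"
    char_rate = refusal_rate(
        test_prompt, keyword_filter, lambda p, r: augment_chars(p, 0.05, r)
    )
    suffix_rate = refusal_rate(
        test_prompt, keyword_filter, lambda p, r: append_random_suffix(p, 20, r)
    )
    print(f"refusal rate, character-level: {char_rate:.2f}")
    print(f"refusal rate, suffix insertion: {suffix_rate:.2f}")
```

Fixing the seed keeps audit runs reproducible, so a drop in refusal rate between model or filter versions can be attributed to the change rather than to sampling noise.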
CVSS v4
Base Score: 8.7
Attack Vector: NETWORK
Attack Complexity: LOW
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: NONE
Vulnerable System Integrity: HIGH
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: NONE
Subsequent System Availability: NONE
CVSS v3
Base Score: 7.5
Attack Vector: NETWORK
Attack Complexity: LOW
Privileges Required: NONE
User Interaction: NONE
Scope: UNCHANGED
Confidentiality: NONE
Integrity: HIGH
Availability: NONE
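As a cross-check on the CVSS v3 metrics listed above, the base score can be recomputed from the metric values. The sketch below assumes the CVSS v3.1 base-score equations and metric weights for the Scope Unchanged case (the entry itself says only "CVSS v3"); the function and dictionary names are illustrative, not part of any library.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged).
AV = {"NETWORK": 0.85, "ADJACENT": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2}
AC = {"LOW": 0.77, "HIGH": 0.44}
PR = {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27}
UI = {"NONE": 0.85, "REQUIRED": 0.62}
CIA = {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, computed on
    integers to avoid floating-point artifacts."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for Scope: Unchanged vectors only."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(
    av="NETWORK", ac="LOW", pr="NONE", ui="NONE",
    c="NONE", i="HIGH", a="NONE",
)
print(score)  # 7.5
```

With Integrity as the only affected impact metric, the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is 8.22 × 0.85 × 0.77 × 0.85 × 0.85 ≈ 3.89; their sum, about 7.48, rounds up to the listed 7.5.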
AIVSS
Base Score: 5.9