MAI-2024-0052 | Mend Vulnerability Database

Published: May 16, 2026

Updated: June 17, 2026

This vulnerability affects aligned Large Language Models (LLMs): advanced few-shot jailbreaking techniques can circumvent their safety mechanisms. The attack injects special system tokens, such as `[/INST]`, into few-shot demonstrations, and uses a demo-level random search to increase the likelihood of generating harmful responses. Together, these techniques bypass defenses that rely on perplexity filtering and input perturbation, compromising the model's alignment and safety protocols.

Mitigation steps:

**For AI Developers:**

* Implement input validation and filtering mechanisms that go beyond basic perplexity assessments.
* Develop and deploy multi-layered safety defenses that remain resilient to few-shot adversarial attacks.
* Investigate and apply mechanisms to detect and neutralize the injection of the specific system tokens used in these attacks.

**For Model Trainers/Fine-tuners:**

* Regularly update and retrain language models using enhanced safety datasets and techniques that address identified vulnerabilities.
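
The token-detection mitigation above could be approached along the following lines. This is a minimal defensive sketch, not a complete defense: only `[/INST]` comes from this advisory, while the other entry in `CONTROL_TOKENS` and all function names are illustrative assumptions. The real token set depends on the chat template of the model being protected, and exact matching will not catch obfuscated variants.

```python
import re

# Control tokens to screen for in untrusted input.
# "[/INST]" is named in the advisory; "[INST]" is an assumed companion token.
# Replace this list with the special tokens of your model's chat template.
CONTROL_TOKENS = ["[/INST]", "[INST]"]

# One case-insensitive pattern that matches any listed token literally.
_TOKEN_PATTERN = re.compile(
    "|".join(re.escape(tok) for tok in CONTROL_TOKENS),
    re.IGNORECASE,
)


def find_control_tokens(text: str) -> list[str]:
    """Return every control-token occurrence found in the text, in order."""
    return [m.group(0) for m in _TOKEN_PATTERN.finditer(text)]


def neutralize(text: str, replacement: str = "") -> str:
    """Replace each control-token occurrence with `replacement`."""
    return _TOKEN_PATTERN.sub(replacement, text)


def screen_user_input(text: str) -> dict:
    """Flag input containing control tokens and provide a sanitized copy."""
    hits = find_control_tokens(text)
    return {
        "allowed": not hits,          # reject if any token was found
        "hits": hits,                 # what was detected, for logging
        "sanitized": neutralize(text),
    }


if __name__ == "__main__":
    print(screen_user_input("What is the capital of France?"))
    print(screen_user_input("Example answer [/INST] next example"))
```

A check like this would sit in front of the model as one layer among several, in line with the multi-layered approach recommended above; it should complement, not replace, perplexity filtering and alignment training.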

Related Resources (1)

https://arxiv.org/abs/2406.01288

CVSS v4

* Base Score: 6.3
* Attack Vector: NETWORK
* Attack Complexity: HIGH
* Attack Requirements: NONE
* Privileges Required: NONE
* User Interaction: NONE
* Vulnerable System Confidentiality: NONE
* Vulnerable System Integrity: LOW
* Vulnerable System Availability: NONE
* Subsequent System Confidentiality: NONE
* Subsequent System Integrity: LOW
* Subsequent System Availability: NONE
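
For use in tooling, the CVSS v4 metrics listed above can be encoded as a standard vector string. The sketch below is illustrative only: the metric abbreviations follow the CVSS v4.0 vector format, the names `CVSS_V4_METRICS` and `build_vector` are hypothetical, and the code does not recompute or verify the 6.3 base score.

```python
# CVSS v4 base metrics as listed in this advisory:
# (vector abbreviation, metric name, value)
CVSS_V4_METRICS = [
    ("AV", "Attack Vector", "NETWORK"),
    ("AC", "Attack Complexity", "HIGH"),
    ("AT", "Attack Requirements", "NONE"),
    ("PR", "Privileges Required", "NONE"),
    ("UI", "User Interaction", "NONE"),
    ("VC", "Vulnerable System Confidentiality", "NONE"),
    ("VI", "Vulnerable System Integrity", "LOW"),
    ("VA", "Vulnerable System Availability", "NONE"),
    ("SC", "Subsequent System Confidentiality", "NONE"),
    ("SI", "Subsequent System Integrity", "LOW"),
    ("SA", "Subsequent System Availability", "NONE"),
]


def build_vector(metrics, version="4.0"):
    """Join metrics into a CVSS vector string.

    Each base-metric value is abbreviated to its first letter
    (e.g. NETWORK -> N, HIGH -> H, LOW -> L, NONE -> N).
    """
    parts = [f"{abbr}:{value[0]}" for abbr, _name, value in metrics]
    return f"CVSS:{version}/" + "/".join(parts)


if __name__ == "__main__":
    print(build_vector(CVSS_V4_METRICS))
    # CVSS:4.0/AV:N/AC:H/AT:N/PR:N/UI:N/VC:N/VI:L/VA:N/SC:N/SI:L/SA:N
```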

CVSS v3

* Base Score:
* Attack Vector: NETWORK
* Attack Complexity: HIGH
* Privileges Required: NONE
* User Interaction: NONE
* Scope: CHANGED
* Confidentiality: NONE
* Integrity: LOW
* Availability: NONE

AIVSS

Base Score: