MAI-2024-0022 | Mend Vulnerability Database

MAI-2024-0022

Published: May 16, 2026

Updated: June 17, 2026

Large Language Models (LLMs) are susceptible to jailbreaking attacks that manipulate attention scores to divert the model's focus away from its established safety protocols. The AttnGCG attack technique raises the attention scores on adversarial suffixes within the input prompt, so that the model prioritizes the malicious content over its safety guidelines. The result is harmful output that undermines the intended safeguards of the LLM.

Mitigation steps:

**For AI Developers:**

* Implement robust input validation and filtering to detect and neutralize adversarial suffixes.
* Monitor model inputs for anomalous attention patterns and analyze outputs for potentially malicious content, using external tools for real-time detection and blocking.

**For Model Trainers/Fine-tuners:**

* Explore alternative attention mechanisms that are less susceptible to manipulation.
* Enhance safety training data and methods to better handle attention-based attacks.
* Regularly test LLMs against adversarial attacks, including attention-based methods, through red teaming and adversarial training, to identify vulnerabilities and improve model resilience.
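
The monitoring step above can be illustrated with a minimal sketch that measures how much of a model's attention mass falls on the trailing tokens of a prompt. This is not a method described in the advisory: the function names, the idea of treating the last tokens as a candidate suffix, and the `threshold` value are all assumptions made for illustration, and obtaining the attention weights from a real model is outside the scope of the sketch.

```python
from typing import List, Sequence, Tuple


def suffix_attention_share(attn_row: Sequence[float], suffix_start: int) -> float:
    """Fraction of one attention row's mass that lands on tokens at index >= suffix_start."""
    if not 0 <= suffix_start <= len(attn_row):
        raise ValueError("suffix_start is out of range")
    total = sum(attn_row)
    if total <= 0:
        raise ValueError("attention row must have positive mass")
    return sum(attn_row[suffix_start:]) / total


def flag_prompt(
    attn_rows: List[Sequence[float]],
    suffix_start: int,
    threshold: float = 0.6,  # hypothetical cut-off, would need calibration on benign traffic
) -> Tuple[bool, float]:
    """Average the suffix share over rows (e.g. heads or layers) and compare to the threshold."""
    shares = [suffix_attention_share(row, suffix_start) for row in attn_rows]
    mean_share = sum(shares) / len(shares)
    return mean_share > threshold, mean_share


if __name__ == "__main__":
    # Toy attention rows over four prompt tokens; the last token is the candidate suffix.
    suspicious = [[0.05, 0.05, 0.10, 0.80], [0.10, 0.10, 0.10, 0.70]]
    benign = [[0.40, 0.30, 0.20, 0.10]]
    print(flag_prompt(suspicious, suffix_start=3))
    print(flag_prompt(benign, suffix_start=3))
```

A signal like this would be only one input to a detection pipeline. A high suffix share is not proof of an attack, and a defender does not know in advance where an adversarial suffix begins, so the split point is itself a guess.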

Related Resources (1)

https://arxiv.org/abs/2410.09040

CVSS v4

Base Score: 8.7

Attack Vector: NETWORK

Attack Complexity: LOW

Attack Requirements: NONE

Privileges Required: NONE

User Interaction: NONE

Vulnerable System Confidentiality: NONE

Vulnerable System Integrity: HIGH

Vulnerable System Availability: NONE

Subsequent System Confidentiality: NONE

Subsequent System Integrity: NONE

Subsequent System Availability: NONE
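
The CVSS v4 metric values listed above are often exchanged as a compact vector string. The sketch below assembles that string from the listed values, using the standard CVSS v4.0 base metric abbreviations. The `cvss4_vector` helper and the dictionary layout are illustrative assumptions, and the advisory itself does not publish a vector string.

```python
# Base metrics in the order used by a CVSS v4.0 vector string.
V4_BASE_METRICS = [
    ("Attack Vector", "AV"),
    ("Attack Complexity", "AC"),
    ("Attack Requirements", "AT"),
    ("Privileges Required", "PR"),
    ("User Interaction", "UI"),
    ("Vulnerable System Confidentiality", "VC"),
    ("Vulnerable System Integrity", "VI"),
    ("Vulnerable System Availability", "VA"),
    ("Subsequent System Confidentiality", "SC"),
    ("Subsequent System Integrity", "SI"),
    ("Subsequent System Availability", "SA"),
]


def cvss4_vector(metrics: dict) -> str:
    """Build a CVSS v4.0 base vector string (hypothetical helper).

    Each base metric value is abbreviated to its first letter,
    e.g. NETWORK -> N, LOW -> L, HIGH -> H, NONE -> N.
    """
    parts = ["CVSS:4.0"]
    for name, abbreviation in V4_BASE_METRICS:
        parts.append(f"{abbreviation}:{metrics[name][0].upper()}")
    return "/".join(parts)


if __name__ == "__main__":
    advisory_metrics = {
        "Attack Vector": "NETWORK",
        "Attack Complexity": "LOW",
        "Attack Requirements": "NONE",
        "Privileges Required": "NONE",
        "User Interaction": "NONE",
        "Vulnerable System Confidentiality": "NONE",
        "Vulnerable System Integrity": "HIGH",
        "Vulnerable System Availability": "NONE",
        "Subsequent System Confidentiality": "NONE",
        "Subsequent System Integrity": "NONE",
        "Subsequent System Availability": "NONE",
    }
    print(cvss4_vector(advisory_metrics))
    # CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:H/VA:N/SC:N/SI:N/SA:N
```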

CVSS v3

Base Score: 7.5

Attack Vector: NETWORK

Attack Complexity: LOW

Privileges Required: NONE

User Interaction: NONE

Scope: UNCHANGED

Confidentiality: NONE

Integrity: HIGH

Availability: NONE
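
The CVSS v3 base score of 7.5 can be reproduced from the metrics listed above. The sketch below assumes the CVSS v3.1 base equations and metric weights for the Scope UNCHANGED case, since the advisory does not state which v3 revision it uses. The function names are illustrative.

```python
import math

# CVSS v3.1 metric weights (Privileges Required weights are for Scope UNCHANGED).
ATTACK_VECTOR = {"NETWORK": 0.85, "ADJACENT": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2}
ATTACK_COMPLEXITY = {"LOW": 0.77, "HIGH": 0.44}
PRIVILEGES_REQUIRED = {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27}
USER_INTERACTION = {"NONE": 0.85, "REQUIRED": 0.62}
IMPACT = {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """CVSS v3.1 base score for the Scope UNCHANGED case only."""
    iss = 1 - (1 - IMPACT[c]) * (1 - IMPACT[i]) * (1 - IMPACT[a])
    impact = 6.42 * iss
    exploitability = (
        8.22
        * ATTACK_VECTOR[av]
        * ATTACK_COMPLEXITY[ac]
        * PRIVILEGES_REQUIRED[pr]
        * USER_INTERACTION[ui]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    score = base_score_scope_unchanged(
        av="NETWORK", ac="LOW", pr="NONE", ui="NONE", c="NONE", i="HIGH", a="NONE"
    )
    print(score)  # 7.5
```

With these inputs the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is 8.22 × 0.85 × 0.77 × 0.85 × 0.85 ≈ 3.89. Their sum of about 7.48 rounds up to 7.5.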

AIVSS

Base Score: 5.2