MAI-2025-0011 | Mend Vulnerability Database


MAI-2025-0011

Published: May 16, 2026

Updated: June 17, 2026

Large Language Models (LLMs) are susceptible to jailbreak attacks that exploit their inherent persuasive capabilities. The CL-GSO attack framework systematically decomposes jailbreak strategies into four distinct components: Role, Content Support, Context, and Communication Skills. This decomposition yields a significantly larger strategy space than previous methodologies. The expanded space makes it easier to craft prompts that bypass safety protocols, achieving a success rate of over 90% on models previously deemed resistant, such as Claude-3.5. The vulnerability is rooted in the LLM's reasoning and response generation mechanisms, which can be manipulated through strategically crafted prompts that combine these four components.

Mitigation steps:

**For AI Developers:**

* Develop robust safety mechanisms that resist diverse attack strategies, addressing both the content of a prompt and its underlying persuasive intent.
* Employ filtering techniques that analyze prompts for underlying persuasive intent, going beyond keyword-based or content-based filtering.

**For Model Trainers/Fine-tuners:**

* Expand training data for safety alignment to include a wider variety of adversarial prompts and attack strategies, such as those from the CL-GSO framework.
* Conduct regular security audits of LLMs using diverse and advanced adversarial testing methods to identify and address vulnerabilities.
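The audit step above can be made measurable by tracking how often test cases bypass safety controls, broken down by the four strategy components. The following is a minimal sketch, not part of the advisory or the CL-GSO paper: the `AuditRecord` schema, the function names, and the placeholder labels are all assumptions chosen for illustration, and the `bypassed` flag is assumed to come from a human or automated reviewer.

```python
from collections import defaultdict
from dataclasses import dataclass

# The four strategy components named in the advisory.
COMPONENTS = ("role", "content_support", "context", "communication_skills")


@dataclass(frozen=True)
class AuditRecord:
    """One red-team test case and its judged outcome (hypothetical schema)."""
    role: str
    content_support: str
    context: str
    communication_skills: str
    bypassed: bool  # True if a reviewer judged that safety controls were bypassed


def success_rate(records):
    """Fraction of test cases that bypassed safety controls (0.0 if empty)."""
    if not records:
        return 0.0
    return sum(r.bypassed for r in records) / len(records)


def success_rate_by_component(records, component):
    """Group records by one component's label and report the rate per label."""
    if component not in COMPONENTS:
        raise ValueError(f"unknown component: {component}")
    groups = defaultdict(list)
    for record in records:
        groups[getattr(record, component)].append(record)
    return {label: success_rate(group) for label, group in groups.items()}


# Placeholder labels only; a real audit would log its own taxonomy.
records = [
    AuditRecord("role_a", "support_a", "context_a", "skill_a", True),
    AuditRecord("role_a", "support_b", "context_a", "skill_b", False),
    AuditRecord("role_b", "support_a", "context_b", "skill_a", True),
    AuditRecord("role_b", "support_b", "context_b", "skill_b", False),
]

print(success_rate(records))
print(success_rate_by_component(records, "communication_skills"))
```

Breaking the rate down per component helps show which dimension of a strategy a model is weakest against, which in turn suggests where additional safety-alignment data may be most useful.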

Related Resources (1)

https://arxiv.org/abs/2505.21277


CVSS v4

Base Score: 6.9

Attack Vector: NETWORK
Attack Complexity: LOW
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: LOW
Vulnerable System Integrity: LOW
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: LOW
Subsequent System Availability: NONE
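The CVSS v4 metrics listed above can be condensed into the compact vector-string notation used by most scoring tools. The sketch below assumes the standard CVSS v4.0 base-metric abbreviations; the dictionary and function names are illustrative, and the vector string itself does not appear in the advisory.

```python
# CVSS v4.0 base metrics and their abbreviations, in vector-string order.
METRIC_ABBREVIATIONS = [
    ("Attack Vector", "AV"),
    ("Attack Complexity", "AC"),
    ("Attack Requirements", "AT"),
    ("Privileges Required", "PR"),
    ("User Interaction", "UI"),
    ("Vulnerable System Confidentiality", "VC"),
    ("Vulnerable System Integrity", "VI"),
    ("Vulnerable System Availability", "VA"),
    ("Subsequent System Confidentiality", "SC"),
    ("Subsequent System Integrity", "SI"),
    ("Subsequent System Availability", "SA"),
]

# Metric values exactly as listed in this advisory.
advisory_metrics = {
    "Attack Vector": "NETWORK",
    "Attack Complexity": "LOW",
    "Attack Requirements": "NONE",
    "Privileges Required": "NONE",
    "User Interaction": "NONE",
    "Vulnerable System Confidentiality": "LOW",
    "Vulnerable System Integrity": "LOW",
    "Vulnerable System Availability": "NONE",
    "Subsequent System Confidentiality": "NONE",
    "Subsequent System Integrity": "LOW",
    "Subsequent System Availability": "NONE",
}


def to_vector(metrics):
    """Build a CVSS v4.0 vector string; each base-metric value abbreviates to its first letter."""
    parts = ["CVSS:4.0"]
    for name, abbreviation in METRIC_ABBREVIATIONS:
        parts.append(f"{abbreviation}:{metrics[name][0]}")
    return "/".join(parts)


print(to_vector(advisory_metrics))
```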

CVSS v3

Base Score: 7.2

Attack Vector: NETWORK
Attack Complexity: LOW
Privileges Required: NONE
User Interaction: NONE
Scope: CHANGED
Confidentiality: LOW
Integrity: LOW
Availability: NONE
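The CVSS v3 base score can be checked against the listed metrics. The sketch below assumes the CVSS v3.1 base-score equations and metric weights (the advisory only says "CVSS v3"); the function and dictionary names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification.
AV = {"NETWORK": 0.85, "ADJACENT": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2}
AC = {"LOW": 0.77, "HIGH": 0.44}
UI = {"NONE": 0.85, "REQUIRED": 0.62}
CIA = {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0}
# Privileges Required is weighted differently when Scope is CHANGED.
PR_UNCHANGED = {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27}
PR_CHANGED = {"NONE": 0.85, "LOW": 0.68, "HIGH": 0.5}


def roundup(value):
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(av, ac, pr, ui, scope, c, i, a):
    """Compute the CVSS v3.1 base score from metric values given as words."""
    changed = scope == "CHANGED"
    pr_weight = (PR_CHANGED if changed else PR_UNCHANGED)[pr]
    # Impact sub-score base, combining the three impact metrics.
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weight * UI[ui]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


# Metric values as listed in this advisory.
score = base_score("NETWORK", "LOW", "NONE", "NONE", "CHANGED", "LOW", "LOW", "NONE")
print(score)  # 7.2
```

With these inputs the impact sub-score is about 2.73 and the exploitability sub-score about 3.89; their sum scaled by 1.08 is about 7.14, which rounds up to the listed 7.2.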

AIVSS

Base Score: 4.6