MAI-2024-0008 | Mend Vulnerability Database

MAI-2024-0008

Published: May 16, 2026

Updated: June 17, 2026

Large Language Models (LLMs) are susceptible to jailbreaking attacks that use "obscure" input prompts. The ObscurePrompt attack starts from a base prompt that incorporates known jailbreaking techniques and iteratively transforms it into an obscured version using another LLM, such as GPT-4. The obscured prompt weakens the target LLM's safety mechanisms, allowing an attacker to circumvent safety restrictions and elicit harmful content.

Mitigation steps:

**For AI Developers:**

* Implement prompt filtering that is robust to obfuscation and paraphrase attacks.
* Develop safety mechanisms within LLMs that identify and prevent harmful responses to obscured prompts.

**For Model Trainers/Fine-tuners:**

* Conduct adversarial training on datasets that include obscure prompts to improve the model's ability to recognize and refuse harmful requests.
* Integrate paraphrase detection that does not rely solely on perplexity scores, since obscured prompts may circumvent existing perplexity-based safety measures.
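The recommendation not to rely solely on perplexity scores can be illustrated with a minimal sketch. It assumes that per-token log-probabilities for the prompt are available from some scoring model and that a separate harmful-intent classifier returns a score in [0, 1]. Both inputs, the function names, and the thresholds are hypothetical and are not taken from this advisory or the referenced paper. The sketch only shows perplexity being treated as one signal among several rather than as the sole gate.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-mean of the natural-log token probabilities)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def should_flag(
    token_logprobs: Sequence[float],
    intent_score: float,            # hypothetical classifier output in [0, 1]
    ppl_threshold: float = 200.0,   # illustrative value, not a recommendation
    intent_threshold: float = 0.5,  # illustrative value, not a recommendation
) -> bool:
    """Flag the prompt if EITHER signal fires.

    A fluent but obscured prompt can have low perplexity, so the
    perplexity check alone would let it through; the second signal
    is what catches it in this sketch.
    """
    high_perplexity = perplexity(token_logprobs) > ppl_threshold
    harmful_intent = intent_score >= intent_threshold
    return high_perplexity or harmful_intent


# Hypothetical inputs: a fluent prompt (low perplexity) and a noisy one.
fluent_logprobs = [-2.0] * 20   # perplexity = e^2, well under the threshold
noisy_logprobs = [-7.0] * 20    # perplexity = e^7, over the threshold
```

With these hypothetical inputs, `should_flag(fluent_logprobs, 0.9)` is flagged by the intent signal even though its perplexity is low, while `should_flag(fluent_logprobs, 0.1)` passes both checks.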

Related Resources (1)

https://arxiv.org/abs/2406.13662

CVSS v4

Base Score: 9.2

Attack Vector: NETWORK
Attack Complexity: LOW
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: NONE
Vulnerable System Integrity: HIGH
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: HIGH
Subsequent System Availability: NONE

CVSS v3

Base Score: 8.6

Attack Vector: NETWORK
Attack Complexity: LOW
Privileges Required: NONE
User Interaction: NONE
Scope: CHANGED
Confidentiality: NONE
Integrity: HIGH
Availability: NONE
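
As a cross-check, the CVSS v3 base score can be recomputed from the metrics listed above. The sketch below assumes the CVSS v3.1 base-score equations and metric weights apply (the page only says "CVSS v3"); the function and variable names are illustrative and not part of any official tool.

```python
import math

# Metric weights as given in the CVSS v3.1 specification.
AV = {"NETWORK": 0.85, "ADJACENT": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2}
AC = {"LOW": 0.77, "HIGH": 0.44}
PR_UNCHANGED = {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27}
PR_CHANGED = {"NONE": 0.85, "LOW": 0.68, "HIGH": 0.5}
UI = {"NONE": 0.85, "REQUIRED": 0.62}
CIA = {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal place, avoiding floating-point artifacts."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(av, ac, pr, ui, scope_changed, c, i, a) -> float:
    """Compute a CVSS v3.1 base score from metric names."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr_weight = (PR_CHANGED if scope_changed else PR_UNCHANGED)[pr]
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weight * UI[ui]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope_changed:
        return roundup(min(1.08 * total, 10))
    return roundup(min(total, 10))


# Metrics from this advisory: AV:N / AC:L / PR:N / UI:N / S:C / C:N / I:H / A:N
score = base_score("NETWORK", "LOW", "NONE", "NONE", True, "NONE", "HIGH", "NONE")
print(score)  # 8.6
```

Under these assumptions the computed value agrees with the listed base score of 8.6.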

AIVSS

Base Score: 5.4