Mend.io Vulnerability Database
MAI-2024-0008
Published: May 16, 2026
Updated: May 16, 2026
Large Language Models (LLMs) are susceptible to jailbreaking attacks that use "obscure" input prompts. The ObscurePrompt attack iteratively transforms a base prompt, which incorporates known jailbreaking techniques, into an obscured version by means of another LLM such as GPT-4. The obfuscated prompt evades the target LLM's safety mechanisms, allowing an attacker to bypass its safety restrictions and elicit harmful content.

Mitigation steps:

**For AI Developers:**
* Implement prompt filtering techniques that are robust to obfuscation and paraphrase attacks.
* Develop safety mechanisms within LLMs that identify and block harmful responses to obscured prompts.

**For Model Trainers/Fine-tuners:**
* Conduct adversarial training on datasets that include obscure prompts, improving the model's ability to recognize and refuse harmful requests.
* Integrate paraphrase detection that does not rely solely on perplexity scores, so that obscured prompts are less able to circumvent existing safety measures.
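The mitigation steps above warn against relying solely on perplexity scores. Perplexity over N tokens is exp(-(1/N) * sum of log p(token_i | preceding tokens)), so a fluent, obscured prompt can score low and pass a perplexity-only filter. The sketch below is a minimal, hypothetical illustration: it assumes per-token natural-log probabilities are already available from some reference model, and the function names, the threshold of 500, and the `is_paraphrase_of_known_attack` detector are placeholders rather than part of any specific product.

```python
import math
from typing import Callable, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-(1/N) * sum(log p_i)) over natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(-mean_logprob)


def should_block(
    token_logprobs: Sequence[float],
    prompt: str,
    is_paraphrase_of_known_attack: Callable[[str], bool],  # hypothetical detector
    ppl_threshold: float = 500.0,  # hypothetical threshold; tune per model
) -> bool:
    """Block if perplexity is anomalous OR a secondary detector fires.

    A perplexity-only filter would let a fluent obscured prompt through,
    so the second check is what gives a chance of catching it.
    """
    if perplexity(token_logprobs) > ppl_threshold:
        return True
    return is_paraphrase_of_known_attack(prompt)


if __name__ == "__main__":
    # Toy values: every token assigned probability 0.5 -> perplexity 2.0.
    fluent = [math.log(0.5)] * 8
    print(round(perplexity(fluent), 6))  # 2.0
    # Stand-in detectors, to show that only the secondary path decides here.
    print(should_block(fluent, "example prompt", lambda p: True))   # True
    print(should_block(fluent, "example prompt", lambda p: False))  # False
```

The OR combination is a design choice for illustration: the perplexity check stays cheap, while the secondary signal (whatever it is in practice) covers low-perplexity paraphrases.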
CVSS v4
Base Score: 9.2
Attack Vector: NETWORK
Attack Complexity: LOW
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: NONE
Vulnerable System Integrity: HIGH
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: HIGH
Subsequent System Availability: NONE
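The CVSS v4 metrics listed above can be written as a standard CVSS v4.0 vector string, which the page itself does not show. The sketch below assembles that string using the base-metric abbreviations and ordering from the CVSS v4.0 specification; the function and constant names are illustrative. It does not recompute the 9.2 score, since CVSS v4.0 derives scores from macrovector lookup tables rather than a short closed-form equation.

```python
from typing import Dict, List, Tuple

# Base metrics in CVSS v4.0 vector order: (label as shown on the page, abbreviation).
METRIC_ORDER: List[Tuple[str, str]] = [
    ("Attack Vector", "AV"),
    ("Attack Complexity", "AC"),
    ("Attack Requirements", "AT"),
    ("Privileges Required", "PR"),
    ("User Interaction", "UI"),
    ("Vulnerable System Confidentiality", "VC"),
    ("Vulnerable System Integrity", "VI"),
    ("Vulnerable System Availability", "VA"),
    ("Subsequent System Confidentiality", "SC"),
    ("Subsequent System Integrity", "SI"),
    ("Subsequent System Availability", "SA"),
]

# Full value names -> single-letter codes (consistent across base metrics).
VALUE_CODES: Dict[str, str] = {
    "NETWORK": "N", "ADJACENT": "A", "LOCAL": "L", "PHYSICAL": "P",
    "NONE": "N", "LOW": "L", "HIGH": "H",
    "PRESENT": "P", "PASSIVE": "P", "ACTIVE": "A",
}

# Values copied from the CVSS v4 section of this advisory.
MAI_2024_0008_V4: Dict[str, str] = {
    "Attack Vector": "NETWORK",
    "Attack Complexity": "LOW",
    "Attack Requirements": "NONE",
    "Privileges Required": "NONE",
    "User Interaction": "NONE",
    "Vulnerable System Confidentiality": "NONE",
    "Vulnerable System Integrity": "HIGH",
    "Vulnerable System Availability": "NONE",
    "Subsequent System Confidentiality": "NONE",
    "Subsequent System Integrity": "HIGH",
    "Subsequent System Availability": "NONE",
}


def cvss4_vector(metrics: Dict[str, str]) -> str:
    """Build a CVSS v4.0 base vector string from label -> value pairs."""
    parts = ["CVSS:4.0"]
    for label, abbrev in METRIC_ORDER:
        parts.append(f"{abbrev}:{VALUE_CODES[metrics[label].upper()]}")
    return "/".join(parts)


if __name__ == "__main__":
    print(cvss4_vector(MAI_2024_0008_V4))
    # CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:H/VA:N/SC:N/SI:H/SA:N
```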
CVSS v3
Base Score: 8.6
Attack Vector: NETWORK
Attack Complexity: LOW
Privileges Required: NONE
User Interaction: NONE
Scope: CHANGED
Confidentiality: NONE
Integrity: HIGH
Availability: NONE
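The CVSS v3 base score of 8.6 follows from the metrics listed above. The sketch below reproduces it, assuming the CVSS v3.1 base-score equations and metric weights (the page labels the section only "CVSS v3"); the function names are illustrative. Because Scope is CHANGED, the impact sub-score uses the changed-scope formula, Privileges Required uses its changed-scope weight, and the sum is multiplied by 1.08 before rounding up.

```python
import math

# Metric weights from the CVSS v3.1 specification (base metrics only).
AV = {"NETWORK": 0.85, "ADJACENT": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2}
AC = {"LOW": 0.77, "HIGH": 0.44}
UI = {"NONE": 0.85, "REQUIRED": 0.62}
CIA = {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0}
# Privileges Required weights depend on Scope.
PR_UNCHANGED = {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27}
PR_CHANGED = {"NONE": 0.85, "LOW": 0.68, "HIGH": 0.5}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (int_input // 10000 + 1) / 10.0


def cvss31_base(av: str, ac: str, pr: str, ui: str,
                scope: str, c: str, i: str, a: str) -> float:
    """Compute a CVSS v3.1 base score from full metric value names."""
    changed = scope == "CHANGED"
    # Impact Sub-Score combines the three impact metrics.
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr_weight = (PR_CHANGED if changed else PR_UNCHANGED)[pr]
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weight * UI[ui]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    # Metrics from the CVSS v3 section of this advisory.
    print(cvss31_base("NETWORK", "LOW", "NONE", "NONE",
                      "CHANGED", "NONE", "HIGH", "NONE"))  # 8.6
```

For this vector, impact is about 3.99 and exploitability about 3.89; 1.08 times their sum is about 8.51, which rounds up to 8.6.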
AIVSS
Base Score: 5.4