Mend.io Vulnerability Database
MAI-2024-0045
Published: May 16, 2026
Updated: May 16, 2026
This vulnerability in large language models (LLMs) enables near-perfect jailbreaking through iterative prompt refinement and self-explanation. An attacker uses the LLM's own capabilities to refine adversarial prompts by soliciting the model's explanations of why previous attempts failed. Repeating this loop eventually produces prompts that circumvent established safety mechanisms and elicit harmful content. A "Rate+Enhance" step is then applied to further amplify the harmfulness of the generated output.

Mitigation steps:

**For AI Developers:**

* Implement robust prompt filtering mechanisms that are resistant to iterative refinement and self-explanation techniques.
* Incorporate safety measures that go beyond simple word-count or keyword-based filtering to detect malicious intent within generated prompts.

**For Model Trainers/Fine-tuners:**

* Develop detection mechanisms that identify and block prompts generated through self-jailbreaking methods by analyzing patterns in language and query structure.
* Conduct rigorous red-teaming and adversarial testing to identify and address vulnerabilities before model deployment.
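As one illustration of the pattern-based detection mentioned in the mitigation steps, the sketch below tracks a single conversation and flags it when refused prompts are followed by requests for self-explanation or by close rewrites of the refused prompt. It is a minimal heuristic under stated assumptions, not a vetted defense: the `SessionMonitor` class, the marker phrases, and both thresholds are hypothetical and would need tuning against real traffic.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher

# Hypothetical phrases suggesting the user wants a refusal explained or repaired.
EXPLAIN_MARKERS = (
    "why did you refuse",
    "explain why you",
    "how could i rephrase",
    "rewrite the prompt",
)


@dataclass
class SessionMonitor:
    similarity_threshold: float = 0.6  # assumed value, not from the advisory
    max_suspicious_events: int = 3     # assumed value, not from the advisory
    refused_prompts: list = field(default_factory=list)
    suspicious_events: int = 0

    def observe(self, prompt: str, was_refused: bool) -> bool:
        """Record one user turn; return True if the session should be flagged."""
        text = prompt.lower()
        if self.refused_prompts:
            # Signal 1: a request to explain or repair an earlier refusal.
            asks_explanation = any(m in text for m in EXPLAIN_MARKERS)
            # Signal 2: a close rewrite of a prompt that was already refused.
            is_rewrite = any(
                SequenceMatcher(None, text, old).ratio() >= self.similarity_threshold
                for old in self.refused_prompts
            )
            if asks_explanation or is_rewrite:
                self.suspicious_events += 1
        if was_refused:
            self.refused_prompts.append(text)
        return self.suspicious_events >= self.max_suspicious_events
```

A caller would invoke `observe` once per user turn, passing whether the model refused, and escalate (for example, to stricter filtering or human review) once it returns `True`. Surface-level similarity is easy to evade, so a check like this would complement, not replace, the red-teaming and intent-level safety measures listed above.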
CVSS v4
Base Score: 6.9
Attack Vector: NETWORK
Attack Complexity: HIGH
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: NONE
Vulnerable System Integrity: LOW
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: HIGH
Subsequent System Availability: NONE
CVSS v3
Base Score: 4.0
Attack Vector: NETWORK
Attack Complexity: HIGH
Privileges Required: NONE
User Interaction: NONE
Scope: CHANGED
Confidentiality: NONE
Integrity: LOW
Availability: NONE
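The CVSS v3 base score can be reproduced from the metrics listed above. The sketch below assumes the CVSS v3.1 base-score equations and metric weights from the FIRST specification (the entry says only "CVSS v3"); the function and table names are illustrative, not part of any Mend.io tooling.

```python
import math

# Metric weights, assumed from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(x: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def base_score(av, ac, pr, ui, scope_changed, c, i, a) -> float:
    # Impact sub-score: combines the three impact metrics.
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    # Privileges Required is weighted differently when scope changes.
    pr_weight = (PR_CHANGED if scope_changed else PR_UNCHANGED)[pr]
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weight * UI[ui]
    if impact <= 0:
        return 0.0
    if scope_changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


# Metrics from this entry: AV:N / AC:H / PR:N / UI:N / S:C / C:N / I:L / A:N
score = base_score("N", "H", "N", "N", True, "N", "L", "N")
print(score)  # 4.0
```

With these inputs the impact sub-score is about 1.44 and the exploitability sub-score about 2.22; their sum multiplied by 1.08 is about 3.95, which rounds up to the listed base score of 4.0.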
AIVSS
Base Score: 4.8