MAI-2024-0050
Published: May 16, 2026
Updated: May 16, 2026
Large Language Models (LLMs) are susceptible to ECLIPSE, a black-box jailbreaking attack that uses the optimization capabilities of LLMs themselves to generate adversarial suffixes. ECLIPSE refines these suffixes iteratively, using a harmfulness score of the target model's responses as feedback. Because this score guides the search, ECLIPSE does not need the predefined affirmative phrases that previous optimization-based attacks depended on. The attack can succeed with minimal interaction and without white-box access to the target LLM's internal parameters.
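To make the query cost of this kind of attack concrete for defenders, the following is a minimal sketch of a generic score-guided black-box search loop. It is not the ECLIPSE authors' implementation. It assumes three hypothetical callables: `propose`, standing in for an attacker LLM that sees previous (suffix, score) pairs; `query_target`, one black-box call to the target model; and `score`, a harmfulness scorer where higher means more harmful. The `threshold` and `max_iters` defaults are illustrative assumptions, and no real prompts or models are involved.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interfaces (not from the advisory or the ECLIPSE paper):
#   propose(history) -> str      : optimizer LLM proposing a new suffix from past feedback
#   query_target(prompt) -> str  : one black-box query to the target model
#   score(response) -> float     : harmfulness scorer, higher = more harmful


@dataclass
class SearchResult:
    best_suffix: str
    best_score: float
    queries: int
    history: List[Tuple[str, float]] = field(default_factory=list)


def score_guided_search(
    base_prompt: str,
    propose: Callable[[List[Tuple[str, float]]], str],
    query_target: Callable[[str], str],
    score: Callable[[str], float],
    threshold: float = 0.5,
    max_iters: int = 50,
) -> SearchResult:
    history: List[Tuple[str, float]] = []
    best_suffix, best_score = "", float("-inf")
    queries = 0
    for _ in range(max_iters):
        # Optimizer step: only scores are fed back, no gradients or weights.
        suffix = propose(history)
        response = query_target(f"{base_prompt} {suffix}")
        queries += 1  # each iteration costs exactly one target query
        s = score(response)
        history.append((suffix, s))
        if s > best_score:
            best_suffix, best_score = suffix, s
        if s >= threshold:  # stop once the scorer judges the attempt successful
            break
    return SearchResult(best_suffix, best_score, queries, history)
```

Each iteration issues one target query, so the total is bounded by `max_iters`; that per-attempt query count is what limiting API calls per time window constrains.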
Mitigation steps:
**For AI Developers:**
* Enhance safety mechanisms to robustly defend against iterative optimization attacks.
* Implement detection systems for adversarial suffixes to identify and mitigate potential threats.
* Limit the number of API calls within a specified time window to diminish the impact of iterative attacks.
**For Model Trainers/Fine-tuners:**
* Develop advanced harmfulness scoring systems that resist adversarial manipulation.
* Regularly update and refine safety training data to counteract new and emerging attack techniques.
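The recommendation to limit the number of API calls within a time window can be sketched as a per-client sliding-window limiter. This is a minimal in-memory illustration; the names `SlidingWindowLimiter`, `client_id`, `max_calls`, and `window_s` are hypothetical, and a real deployment would typically need state shared across servers.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `max_calls` per client within any `window_s`-second window."""

    def __init__(self, max_calls: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        calls = self._calls[client_id]
        # Drop timestamps that have fallen out of the current window.
        while calls and now - calls[0] >= self.window_s:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False  # rejected calls are not recorded
        calls.append(now)
        return True
```

A sliding window is used here rather than fixed buckets so that a client cannot double its budget by bursting at a bucket boundary; the trade-off is storing one timestamp per accepted call.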
CVSS v4
Base Score: 6.3
Attack Vector: NETWORK
Attack Complexity: HIGH
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: NONE
Vulnerable System Integrity: LOW
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: LOW
Subsequent System Availability: NONE
CVSS v3
Base Score: 4.0
Attack Vector: NETWORK
Attack Complexity: HIGH
Privileges Required: NONE
User Interaction: NONE
Scope: CHANGED
Confidentiality: NONE
Integrity: LOW
Availability: NONE
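As a consistency check on the CVSS v3 figure, the sketch below recomputes the base score from the listed metrics, which correspond to the vector `AV:N/AC:H/PR:N/UI:N/S:C/C:N/I:L/A:N`. It assumes the CVSS v3.1 formulas and metric weights from the FIRST specification; the advisory says only "CVSS v3", but v3.0 uses the same impact and exploitability formulas and gives the same result for this vector. The function names are illustrative.

```python
import math

# CVSS v3.1 metric weights (FIRST CVSS v3.1 specification).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(x: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal value >= x, robust to float error."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def base_score(av: str, ac: str, pr: str, ui: str, scope_changed: bool,
               c: str, i: str, a: str) -> float:
    # Impact sub-score: combined loss across confidentiality, integrity, availability.
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
        pr_w = PR_CHANGED[pr]
    else:
        impact = 6.42 * iss
        pr_w = PR_UNCHANGED[pr]
    exploitability = 8.22 * AV[av] * AC[ac] * pr_w * UI[ui]
    if impact <= 0:
        return 0.0
    if scope_changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


# AV:N/AC:H/PR:N/UI:N/S:C/C:N/I:L/A:N, as listed in the CVSS v3 section
print(base_score("N", "H", "N", "N", True, "N", "L", "N"))  # prints 4.0
```

Here the exploitability sub-score is about 2.22 and the impact sub-score about 1.44; the scope-changed multiplier gives 1.08 × 3.657 ≈ 3.950, which rounds up to 4.0.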
AIVSS
Base Score: 4.3