MAI-2025-0019
Published: May 16, 2026
Updated: May 16, 2026
Several open-source Large Language Models (LLMs) are susceptible to adversarial attacks based on exponentiated gradient descent. An attacker can craft adversarial prompts that manipulate a model into generating harmful or unintended outputs, circumventing the safety alignment mechanisms designed to prevent them. The attack optimizes a continuous, relaxed one-hot encoding of the input tokens. Because the exponentiated gradient update inherently satisfies the constraints on that encoding, the attack does not need the projection step that prior methods relied on.
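To illustrate why no projection is needed, here is a minimal sketch of exponentiated gradient descent on a relaxed one-hot encoding. It is not the attack itself: the toy linear objective `toy_loss_and_grad`, the `cost` matrix, and the step size are assumptions made for illustration, standing in for a loss that would otherwise be computed through a model. The point is only that a multiplicative update followed by normalization keeps every row non-negative and summing to one.

```python
import numpy as np

def egd_step(x, grad, lr):
    """One exponentiated gradient descent step; each row of x lies on the simplex."""
    # Shift the gradient per row for numerical stability; the shift cancels
    # out in the normalization below and does not change the result.
    shifted = grad - grad.min(axis=1, keepdims=True)
    updated = x * np.exp(-lr * shifted)  # multiplicative update keeps entries >= 0
    return updated / updated.sum(axis=1, keepdims=True)  # rows sum to 1 again

def toy_loss_and_grad(x, cost):
    """Hypothetical stand-in objective: linear loss whose gradient is `cost`."""
    return float((x * cost).sum()), cost

rng = np.random.default_rng(0)
seq_len, vocab = 4, 6                            # toy sizes, chosen arbitrarily
cost = rng.normal(size=(seq_len, vocab))
x = np.full((seq_len, vocab), 1.0 / vocab)       # start from the uniform relaxation

for _ in range(200):
    loss, grad = toy_loss_and_grad(x, cost)
    x = egd_step(x, grad, lr=0.5)

tokens = x.argmax(axis=1)                        # discretize the relaxation per position
```

After every step the rows of `x` are still valid probability vectors, so no separate projection back onto the simplex is required, in contrast to a plain additive gradient step.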
Mitigation steps:
**For AI Developers:**
* Restrict access to model weights to mitigate white-box attack risks.
* Implement input sanitization and filtering to reduce the impact of potential attacks.
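As one possible reading of the input filtering step above, the sketch below screens prompts with simple heuristics before they reach a model. Everything here is an assumption made for illustration: the function name `screen_prompt`, the thresholds, and the idea that a high density of symbol characters is a useful signal. Optimized adversarial strings often, but not always, look unnatural, so a filter like this reduces exposure rather than removing it.

```python
from dataclasses import dataclass

@dataclass
class FilterResult:
    allowed: bool
    reason: str
    sanitized: str

def screen_prompt(prompt, max_chars=4000, max_symbol_ratio=0.3):
    """Hypothetical heuristic filter; thresholds are illustrative, not tuned."""
    # Sanitization: drop non-printable characters except newlines and tabs.
    cleaned = "".join(ch for ch in prompt if ch.isprintable() or ch in "\n\t")
    if not cleaned.strip():
        return FilterResult(False, "empty prompt", cleaned)
    if len(cleaned) > max_chars:
        return FilterResult(False, "prompt too long", cleaned)
    # Filtering: count characters that are neither alphanumeric nor whitespace.
    symbols = sum(1 for ch in cleaned if not (ch.isalnum() or ch.isspace()))
    if symbols / len(cleaned) > max_symbol_ratio:
        return FilterResult(False, "unusual symbol density", cleaned)
    return FilterResult(True, "ok", cleaned)

benign = screen_prompt("What is the capital of France?")
suspicious = screen_prompt("Tell me " + "}{)(*&^%$#@!")
```

In practice a check like this would be one layer among several, since an attacker who knows the thresholds can optimize around them.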
**For Model Trainers/Fine-tuners:**
* Explore improved regularization techniques and stronger safety training to enhance model robustness.
* Conduct further research into optimization-based attacks to identify more effective mitigation strategies.
CVSS v4
Base Score: 6.3
* Attack Vector: NETWORK
* Attack Complexity: HIGH
* Attack Requirements: NONE
* Privileges Required: NONE
* User Interaction: NONE
* Vulnerable System Confidentiality: NONE
* Vulnerable System Integrity: LOW
* Vulnerable System Availability: NONE
* Subsequent System Confidentiality: NONE
* Subsequent System Integrity: LOW
* Subsequent System Availability: NONE
CVSS v3
Base Score: 4
* Attack Vector: NETWORK
* Attack Complexity: HIGH
* Privileges Required: NONE
* User Interaction: NONE
* Scope: CHANGED
* Confidentiality: NONE
* Integrity: LOW
* Availability: NONE
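The CVSS v3 base score above can be reproduced from the listed metrics. The sketch below assumes the CVSS v3.1 base score equations and metric weights (the advisory only says "CVSS v3"), and the function names are chosen here for illustration.

```python
import math

# Metric weights as defined in the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}

def roundup(value):
    """Round up to one decimal place using the spec's integer arithmetic."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0

def base_score(av, ac, pr, ui, scope_changed, c, i, a):
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
        pr_weight = PR_CHANGED[pr]
    else:
        impact = 6.42 * iss
        pr_weight = PR_UNCHANGED[pr]
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weight * UI[ui]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope_changed:
        total *= 1.08
    return roundup(min(total, 10))

# Metrics from this advisory: AV:N/AC:H/PR:N/UI:N/S:C/C:N/I:L/A:N
score = base_score("N", "H", "N", "N", True, "N", "L", "N")
print(score)  # 4.0
```

The raw value before rounding is about 3.95, which the specification's round-up rule takes to 4.0, matching the listed base score.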
AIVSS
Base Score: 4