MAI-2024-0027
Published: May 16, 2026
Updated: May 16, 2026
Large Language Models (LLMs) that rely on gradient-ascent-based unlearning are susceptible to a Dynamic Unlearning Attack (DUA). The attack appends crafted adversarial suffixes to input prompts, causing the model to reproduce knowledge it was supposed to have forgotten. Because the attack does not require access to the model's parameters, an attacker can recover sensitive information that was marked for removal.
Mitigation steps:
**For AI Developers:**
* Develop and deploy robust detection mechanisms to identify and filter malicious prompts attempting to recover unlearned knowledge
* Monitor model behavior for unexpected outputs related to unlearned topics
**For Model Trainers/Fine-tuners:**
* Implement the Latent Adversarial Unlearning (LAU) framework to enhance the robustness of the unlearning process
* Integrate techniques like adversarial training during the unlearning phase to make the model more resistant to adversarial queries
* Regularly update and retrain LLMs using improved unlearning methods to minimize vulnerabilities
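The two developer-side mitigations (prompt filtering and output monitoring) can be prototyped with simple heuristics. The sketch below is an illustration only and is not a detector described by this advisory: the function names, the 0.3 threshold, the 40-character window, and the topic list are all hypothetical. It flags prompts whose tail has an unusually high share of symbol characters, a rough stand-in for adversarial suffixes, and checks responses for terms tied to unlearned topics. A production system would more likely use a perplexity filter or a trained classifier.

```python
import re

# Hypothetical terms tied to topics the model was asked to forget.
UNLEARNED_TOPICS = ["example secret project", "jane doe"]


def suffix_suspicion_score(prompt: str, tail_chars: int = 40) -> float:
    """Fraction of characters in the prompt tail that are neither alphanumeric nor whitespace."""
    tail = prompt[-tail_chars:]
    if not tail:
        return 0.0
    odd = sum(1 for ch in tail if not (ch.isalnum() or ch.isspace()))
    return odd / len(tail)


def is_suspicious_prompt(prompt: str, threshold: float = 0.3) -> bool:
    """Flag prompts whose tail looks like symbol-heavy noise (assumed heuristic)."""
    return suffix_suspicion_score(prompt) >= threshold


def mentions_unlearned_topic(output: str, topics=UNLEARNED_TOPICS) -> bool:
    """Return True if the model output contains any unlearned-topic term as a whole phrase."""
    text = output.lower()
    return any(re.search(r"\b" + re.escape(t) + r"\b", text) for t in topics)
```

Flagged prompts or outputs could be logged for review rather than blocked outright, since a symbol-ratio heuristic will also fire on legitimate inputs such as code snippets.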
CVSS v4
Base Score: 8.7
Attack Vector: NETWORK
Attack Complexity: LOW
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: HIGH
Vulnerable System Integrity: NONE
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: NONE
Subsequent System Availability: NONE
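The advisory lists the CVSS v4 metrics individually but does not give a vector string. As a minimal sketch, assuming the standard CVSS v4.0 base metric abbreviations and ordering, the values above can be assembled into one. The helper name `cvss4_vector` is hypothetical, and the 8.7 score is not recomputed here because v4.0 scoring depends on lookup tables from the specification.

```python
# Base metrics in CVSS v4.0 vector-string order, with the values listed in this advisory.
V4_METRICS = [
    ("AV", "NETWORK"),  # Attack Vector
    ("AC", "LOW"),      # Attack Complexity
    ("AT", "NONE"),     # Attack Requirements
    ("PR", "NONE"),     # Privileges Required
    ("UI", "NONE"),     # User Interaction
    ("VC", "HIGH"),     # Vulnerable System Confidentiality
    ("VI", "NONE"),     # Vulnerable System Integrity
    ("VA", "NONE"),     # Vulnerable System Availability
    ("SC", "NONE"),     # Subsequent System Confidentiality
    ("SI", "NONE"),     # Subsequent System Integrity
    ("SA", "NONE"),     # Subsequent System Availability
]


def cvss4_vector(metrics) -> str:
    """Join metrics into a vector string; each value is abbreviated to its first letter."""
    return "CVSS:4.0/" + "/".join(f"{abbr}:{value[0]}" for abbr, value in metrics)


vector = cvss4_vector(V4_METRICS)
print(vector)
```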
CVSS v3
Base Score: 7.5
Attack Vector: NETWORK
Attack Complexity: LOW
Privileges Required: NONE
User Interaction: NONE
Scope: UNCHANGED
Confidentiality: HIGH
Integrity: NONE
Availability: NONE
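The CVSS v3 base score can be reproduced from the metrics above. The sketch below assumes the CVSS v3.1 base equations and metric weights (the advisory says only "CVSS v3") and handles only the Scope: UNCHANGED case that applies here. The function names are hypothetical.

```python
import math


def roundup(x: float) -> float:
    """CVSS v3.1 Roundup: smallest value with one decimal place that is >= x."""
    scaled = round(x * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for Scope: UNCHANGED, given numeric metric weights."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)        # impact sub-score
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# Weights: AV Network 0.85, AC Low 0.77, PR None 0.85, UI None 0.85,
# Confidentiality High 0.56, Integrity None 0, Availability None 0.
score = base_score_scope_unchanged(av=0.85, ac=0.77, pr=0.85, ui=0.85, c=0.56, i=0.0, a=0.0)
print(score)  # 7.5
```

Impact is about 3.60 and exploitability about 3.89, so the sum of roughly 7.48 rounds up to the 7.5 listed above.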
AIVSS
Base Score: 5.2