MAI-2025-0021 | Mend Vulnerability Database

MAI-2025-0021

Published: May 16, 2026

Updated: May 16, 2026

This white-box vulnerability enables adversaries with full access to a model to circumvent the safety alignments of Large Language Models (LLMs). By identifying and selectively pruning parameters that enforce the rejection of harmful prompts, attackers can effectively bypass security measures. The method employs an innovative "twin prompt" strategy to distinguish parameters related to safety from those crucial for the model's core functionality, allowing for precise pruning with negligible impact on the model's overall performance.

Mitigation steps:

**For AI Developers:**

* Restrict direct access to model parameters, especially for open-source models, to prevent unauthorized modifications.
* Implement model integrity checks to detect unauthorized parameter modifications.

**For Model Trainers/Fine-tuners:**

* Develop robust safety alignment techniques that distribute safety mechanisms across a larger, less identifiable portion of the model's parameters.
* Explore and implement advanced detection mechanisms to identify signs of parameter pruning or other modifications indicative of attacks.
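The "model integrity checks" mitigation can be illustrated with a minimal sketch, assuming the model weights are stored as ordinary files in a directory and that a trusted hash manifest was recorded at release time. The function names (`sha256_file`, `build_manifest`, `verify_manifest`) are hypothetical and are not taken from the advisory or the referenced paper. A check like this can only flag changes to a copy of the weights that the operator controls; it does not stop an attacker from pruning a separately obtained copy.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(model_dir: Path) -> dict[str, str]:
    """Map each file under model_dir (relative POSIX path) to its digest."""
    return {
        p.relative_to(model_dir).as_posix(): sha256_file(p)
        for p in sorted(model_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(model_dir: Path, trusted: dict[str, str]) -> list[str]:
    """Return the files that were added, removed, or changed since `trusted`."""
    current = build_manifest(model_dir)
    return sorted(
        name
        for name in trusted.keys() | current.keys()
        if trusted.get(name) != current.get(name)
    )
```

In practice, the trusted manifest would need to be stored and signed separately from the weights, so that an attacker who can modify the parameters cannot also rewrite the expected hashes.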

Related Resources (1)

https://arxiv.org/abs/2506.07596

CVSS v4

Base Score: 5.9

* Attack Vector: NETWORK
* Attack Complexity: HIGH
* Attack Requirements: PRESENT
* Privileges Required: HIGH
* User Interaction: NONE
* Vulnerable System Confidentiality: NONE
* Vulnerable System Integrity: HIGH
* Vulnerable System Availability: NONE
* Subsequent System Confidentiality: NONE
* Subsequent System Integrity: LOW
* Subsequent System Availability: NONE

CVSS v3

Base Score: 5.8

* Attack Vector: NETWORK
* Attack Complexity: HIGH
* Privileges Required: HIGH
* User Interaction: NONE
* Scope: CHANGED
* Confidentiality: NONE
* Integrity: HIGH
* Availability: NONE
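To show how the CVSS v3 metrics above combine into the base score, here is a minimal sketch of the base-score equations, assuming the CVSS v3.1 specification applies (the advisory does not say whether 3.0 or 3.1 was used). The function and table names are illustrative only.

```python
import math

# Metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(x: float) -> float:
    """Round up to one decimal place, as defined in CVSS v3.1."""
    i = int(round(x * 100000))
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def base_score(av, ac, pr, ui, scope_changed, c, i, a) -> float:
    """Compute a CVSS v3.1 base score from single-letter metric values."""
    pr_weight = (PR_CHANGED if scope_changed else PR_UNCHANGED)[pr]
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weight * UI[ui]
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope_changed:
        total *= 1.08
    return roundup(min(total, 10))


# Metrics listed above: AV:N / AC:H / PR:H / UI:N / S:C / C:N / I:H / A:N
score = base_score("N", "H", "H", "N", True, "N", "H", "N")
print(score)  # 5.8, matching the listed base score
```

Under these assumptions the Impact sub-score is roughly 3.99 and the Exploitability sub-score roughly 1.31. Their sum, scaled by 1.08 because Scope is CHANGED, is about 5.72, which rounds up to 5.8.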

AIVSS

Base Score: 3.8