MAI-2024-0034 | Mend Vulnerability Database

MAI-2024-0034

Published: May 16, 2026

Updated: June 17, 2026

The Multi-Modal Linkage (MML) attack is a method for compromising Large Vision-Language Models (VLMs) with an "encryption-decryption" strategy that spans the text and image modalities. The attacker first "encrypts" a malicious query by embedding it in an image, using techniques such as word substitution and image transformation, so that it passes preliminary safety checks. A carefully crafted text prompt then instructs the VLM to "decrypt" the embedded content, which leads the model to generate harmful output. "Evil alignment," which frames the attack within a video game context, makes the approach more effective still.

Mitigation steps:

**For AI Developers:**

* Implement advanced safety filters to detect and mitigate MML-style attacks.
* Establish rigorous input validation and sanitization protocols.

**For Model Trainers/Fine-tuners:**

* Enhance model robustness against adversarial examples across text and image modalities.
* Conduct research into resilient multimodal safety alignment techniques.
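As an illustration of the input-validation idea, the following is a minimal sketch of a pre-filter and not a mitigation taken from the advisory or the referenced paper. It assumes text has already been extracted from the image by some OCR step (not shown), that a word-substitution mapping has been parsed from the accompanying prompt, and that an existing text safety check is available as a callable. The names `candidate_texts`, `should_block`, and `is_harmful` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def candidate_texts(ocr_outputs: Iterable[str], substitutions: Dict[str, str]) -> List[str]:
    """Expand OCR outputs into plausible 'decrypted' forms of the embedded query."""
    candidates: List[str] = []
    for text in ocr_outputs:
        # Consider the text as read and character-reversed (e.g. mirrored rendering).
        for variant in (text, text[::-1]):
            candidates.append(variant)
            # Undo word substitution using the mapping supplied in the prompt.
            restored = " ".join(substitutions.get(w.lower(), w) for w in variant.split())
            candidates.append(restored)
    return candidates


def should_block(
    ocr_outputs: Iterable[str],
    substitutions: Dict[str, str],
    is_harmful: Callable[[str], bool],  # hypothetical existing text safety check
) -> bool:
    """Return True if any reconstructed candidate fails the text safety check."""
    return any(is_harmful(c) for c in candidate_texts(ocr_outputs, substitutions))


# Toy stand-in for a real safety classifier (assumption, for demonstration only).
toy_check = lambda text: "forbidden" in text.lower()
print(should_block(["describe the banana topic"], {"banana": "forbidden"}, toy_check))  # True
print(should_block(["describe the banana topic"], {}, toy_check))  # False
```

The point of the sketch is that the safety check runs on the reconstructed query rather than on the image and prompt separately, since the attack described above relies on each modality looking harmless in isolation.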

Related Resources (1)

https://arxiv.org/abs/2412.00473

CVSS v4

Base Score: 8.2

Attack Vector: NETWORK

Attack Complexity: HIGH

Attack Requirements: NONE

Privileges Required: NONE

User Interaction: NONE

Vulnerable System Confidentiality: NONE

Vulnerable System Integrity: HIGH

Vulnerable System Availability: NONE

Subsequent System Confidentiality: NONE

Subsequent System Integrity: NONE

Subsequent System Availability: NONE
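The metrics above are commonly exchanged as a compact vector string. The following sketch builds one from the listed values, assuming the standard CVSS v4.0 base-metric abbreviations and that each value abbreviates to its first letter (true for all v4.0 base metric values). The helper name `to_vector` is hypothetical, and the advisory itself does not publish a vector string.

```python
# (abbreviation, metric name) pairs in CVSS v4.0 vector order.
BASE_METRICS = [
    ("AV", "Attack Vector"),
    ("AC", "Attack Complexity"),
    ("AT", "Attack Requirements"),
    ("PR", "Privileges Required"),
    ("UI", "User Interaction"),
    ("VC", "Vulnerable System Confidentiality"),
    ("VI", "Vulnerable System Integrity"),
    ("VA", "Vulnerable System Availability"),
    ("SC", "Subsequent System Confidentiality"),
    ("SI", "Subsequent System Integrity"),
    ("SA", "Subsequent System Availability"),
]


def to_vector(metrics: dict) -> str:
    """Build a CVSS v4.0 base vector string from full metric names and values."""
    parts = [f"{abbr}:{metrics[name][0].upper()}" for abbr, name in BASE_METRICS]
    return "CVSS:4.0/" + "/".join(parts)


# Values as listed in this advisory.
advisory_metrics = {name: "NONE" for _, name in BASE_METRICS}
advisory_metrics.update({
    "Attack Vector": "NETWORK",
    "Attack Complexity": "HIGH",
    "Vulnerable System Integrity": "HIGH",
})
vector = to_vector(advisory_metrics)
print(vector)  # CVSS:4.0/AV:N/AC:H/AT:N/PR:N/UI:N/VC:N/VI:H/VA:N/SC:N/SI:N/SA:N
```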

CVSS v3

Base Score: 5.9

Attack Vector: NETWORK

Attack Complexity: HIGH

Privileges Required: NONE

User Interaction: NONE

Scope: UNCHANGED

Confidentiality: NONE

Integrity: HIGH

Availability: NONE
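The base score can be reproduced from the metrics above. The sketch below assumes the CVSS v3.1 base-score equations and metric weights for the Scope: UNCHANGED case (the advisory says only "CVSS v3"). The function names are illustrative.

```python
import math

# CVSS v3.1 metric weights; the PR values apply when Scope is UNCHANGED.
AV = {"NETWORK": 0.85, "ADJACENT": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2}
AC = {"LOW": 0.77, "HIGH": 0.44}
PR = {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27}
UI = {"NONE": 0.85, "REQUIRED": 0.62}
CIA = {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0}


def roundup(x: float) -> float:
    """Round up to one decimal place using the integer method from CVSS v3.1."""
    scaled = round(x * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_unchanged(av, ac, pr, ui, conf, integ, avail) -> float:
    """CVSS v3.1 base score for Scope: UNCHANGED."""
    iss = 1 - (1 - CIA[conf]) * (1 - CIA[integ]) * (1 - CIA[avail])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_unchanged("NETWORK", "HIGH", "NONE", "NONE", "NONE", "HIGH", "NONE")
print(score)  # 5.9
```

Here the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is 8.22 × 0.85 × 0.44 × 0.85 × 0.85 ≈ 2.22, so the sum of about 5.82 rounds up to 5.9.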

AIVSS

Base Score: 5.4