MAI-2024-0060 | Mend Vulnerability Database

Published: May 16, 2026

Updated: June 17, 2026

Large Language Models (LLMs) are susceptible to a semantic mirror jailbreak attack. The attack uses a genetic algorithm to craft jailbreak prompts that closely resemble benign prompts in semantic terms, which lets them circumvent defenses that rely on semantic similarity metrics. The algorithm optimizes for two objectives at once: semantic resemblance to the original query and the capacity to provoke harmful responses from the model.

Mitigation steps:

**For AI Developers:**

* Implement safety mechanisms that incorporate contextual analysis and intention detection, moving beyond basic semantic similarity checks.
* Deploy detection systems capable of identifying subtle manipulations, even when the manipulated prompts are semantically similar to benign prompts.
* Rate-limit queries exhibiting high semantic similarity, particularly if they originate from the same source, to mitigate potential abuse.

**For Model Trainers/Fine-tuners:**

* Regularly update safety models and filters to address emerging threats, using adversarial training to improve resistance against this type of attack.

Related Resources (1)

https://arxiv.org/abs/2402.14872

CVSS v4

Base Score: 6.3

Attack Vector: NETWORK

Attack Complexity: HIGH

Attack Requirements: NONE

Privileges Required: NONE

User Interaction: NONE

Vulnerable System Confidentiality: NONE

Vulnerable System Integrity: LOW

Vulnerable System Availability: NONE

Subsequent System Confidentiality: NONE

Subsequent System Integrity: LOW

Subsequent System Availability: NONE

CVSS v3

Base Score:

Attack Vector: NETWORK

Attack Complexity: HIGH

Privileges Required: NONE

User Interaction: NONE

Scope: CHANGED

Confidentiality: NONE

Integrity: LOW

Availability: NONE

AIVSS

Base Score: 3.8