MAI-2025-0015 | Mend Vulnerability Database


MAI-2025-0015

Published: May 16, 2026

Updated: June 17, 2026

Large Language Models (LLMs) are susceptible to multi-turn adversarial attacks that split a malicious intent across a series of individually innocuous interactions, which progressively steer the conversation toward harmful output. With a sequence of carefully designed prompts, an attacker can circumvent the model's safety mechanisms by exploiting its turn-by-turn response generation. The attack works because each prompt is adapted to the model's preceding responses, which makes traditional keyword-based detection inadequate.

Mitigation steps:

**For AI Developers:**

* Implement multi-turn dialogue safety checks that analyze the entire conversation context to detect harmful trajectories.
* Develop robust context-aware safety mechanisms that track the evolving conversation context and flag harmful pathways.

**For Model Trainers/Fine-tuners:**

* Use detection methods beyond simple keyword filtering to identify subtle shifts in conversation direction, including techniques that measure changes in semantic similarity.
* Adopt reinforcement learning for safety by fine-tuning LLMs on data that includes adversarial prompts, to improve resilience to multi-turn manipulation.
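The conversation-level analysis recommended in the mitigation steps can be illustrated with a minimal sketch. It is not the method from the referenced paper or a production detector. It assumes a bag-of-words cosine similarity as a crude stand-in for a real semantic embedding model, and the names `trajectory_scores`, `flag_conversation`, `policy_text`, and the threshold value are all hypothetical. The point is only that a score computed over the accumulated context can cross a threshold that no single turn reaches.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    # Crude stand-in for a semantic embedding: lowercase word counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def trajectory_scores(turns, policy_text):
    # For each user turn, return (single-turn score, cumulative-context score)
    # measured against a hypothetical description of a restricted topic.
    policy = bag_of_words(policy_text)
    context = Counter()
    scores = []
    for turn in turns:
        vec = bag_of_words(turn)
        context += vec  # accumulate the whole conversation so far
        scores.append((cosine(vec, policy), cosine(context, policy)))
    return scores


def flag_conversation(turns, policy_text, threshold):
    # Flag when the accumulated context drifts past the threshold,
    # even if every individual turn stays below it.
    return any(cum >= threshold for _, cum in trajectory_scores(turns, policy_text))


# Hypothetical example data: each turn alone looks harmless.
policy_text = "bypass lock alarm"
turns = [
    "how does a lock work",
    "what triggers an alarm",
    "ways to bypass it",
]
for single, cumulative in trajectory_scores(turns, policy_text):
    print(f"single={single:.2f} cumulative={cumulative:.2f}")
print(flag_conversation(turns, policy_text, threshold=0.4))
```

In this toy data the single-turn scores stay below the threshold while the cumulative score rises with every turn. A realistic detector would replace `bag_of_words` with sentence embeddings, since plain word counts miss paraphrases and are diluted by long benign context.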

Related Resources (1)

https://arxiv.org/abs/2501.14250


CVSS v4

Base Score: 6.9

Attack Vector: NETWORK
Attack Complexity: LOW
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: NONE
Vulnerable System Integrity: LOW
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: LOW
Subsequent System Availability: NONE

CVSS v3

Base Score: 5.8

Attack Vector: NETWORK
Attack Complexity: LOW
Privileges Required: NONE
User Interaction: NONE
Scope: CHANGED
Confidentiality: NONE
Integrity: LOW
Availability: NONE
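The CVSS v3 base score can be reproduced from the metrics listed above. The sketch below assumes the base-score equations and metric weights published in the CVSS v3.1 specification by FIRST. The function and constant names are illustrative and are not part of this advisory, and only the Scope: CHANGED branch of the formula is shown.

```python
import math

# Metric weights from the CVSS v3.1 specification for the values listed above.
ATTACK_VECTOR_NETWORK = 0.85
ATTACK_COMPLEXITY_LOW = 0.77
PRIVILEGES_REQUIRED_NONE = 0.85
USER_INTERACTION_NONE = 0.85
IMPACT_WEIGHTS = {"NONE": 0.0, "LOW": 0.22, "HIGH": 0.56}


def roundup(value):
    # CVSS v3.1 rounding: smallest one-decimal number >= value,
    # computed on integers to avoid floating-point artifacts.
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_changed(confidentiality, integrity, availability):
    # Impact sub-score (ISS) combines the three impact metrics.
    iss = 1 - (
        (1 - IMPACT_WEIGHTS[confidentiality])
        * (1 - IMPACT_WEIGHTS[integrity])
        * (1 - IMPACT_WEIGHTS[availability])
    )
    # Impact formula for Scope: CHANGED.
    impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (
        8.22
        * ATTACK_VECTOR_NETWORK
        * ATTACK_COMPLEXITY_LOW
        * PRIVILEGES_REQUIRED_NONE
        * USER_INTERACTION_NONE
    )
    if impact <= 0:
        return 0.0
    return roundup(min(1.08 * (impact + exploitability), 10))


score = base_score_scope_changed("NONE", "LOW", "NONE")
print(score)  # 5.8, matching the Base Score listed above
```

With these inputs the impact term is about 1.44 and the exploitability term about 3.89, so the scaled sum is about 5.75 and rounds up to 5.8.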

AIVSS

Base Score: 4.8