MAI-2024-0014 | Mend Vulnerability Database

MAI-2024-0014

Published: May 16, 2026

Updated: May 16, 2026

This vulnerability in large language models (LLMs) lets attackers extract unsafe or unethical responses through a sequence of semantically connected multi-turn prompts. Known as the "Chain of Attack" (CoA), the method exploits the model's contextual comprehension and adaptive response mechanisms to steer the dialogue incrementally towards harmful outputs, circumventing safety protocols that reject such requests when they arrive as a single-turn prompt. The attack uses semantic similarity scoring, such as SimCSE, to generate prompts that progressively align with the intended malicious objective.

Mitigation steps:

**For AI Developers:**

* Develop robust safety mechanisms that resist multi-turn attacks through contextual analysis of the whole conversation and semantic drift detection.
* Implement real-time monitoring and filtering of outputs to detect and block unsafe responses, regardless of how benign the individual prompts appear.

**For Model Trainers/Fine-tuners:**

* Enhance Reinforcement Learning from Human Feedback (RLHF) to handle adversarial prompts and prevent harmful responses in multi-turn contexts.
* Conduct adversarial training using examples generated by techniques like CoA to improve model robustness against such attacks.
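The semantic drift detection mentioned above could be prototyped roughly as follows. This is a minimal sketch, not the advisory's or the paper's method: it scores the accumulated conversation against a description of a disallowed objective after each turn and flags dialogues whose score keeps rising. The `embed` function is a toy bag-of-words stand-in for a real sentence encoder such as SimCSE, and the names `drift_scores`, `is_escalating`, `threshold`, and `min_rising_turns` are hypothetical, with arbitrary default values.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence encoder: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def drift_scores(turns, objective):
    """Similarity of the cumulative conversation to the objective after each turn."""
    target = embed(objective)
    context = Counter()
    scores = []
    for turn in turns:
        context += embed(turn)  # accumulate the whole dialogue, not just the last turn
        scores.append(cosine(context, target))
    return scores


def is_escalating(scores, threshold=0.5, min_rising_turns=3):
    """Flag a dialogue whose score rose on min_rising_turns consecutive steps,
    or whose final score reaches the threshold (both values are assumptions)."""
    rising = 0
    for prev, cur in zip(scores, scores[1:]):
        rising = rising + 1 if cur > prev else 0
        if rising >= min_rising_turns:
            return True
    return bool(scores) and scores[-1] >= threshold


# Placeholder tokens stand in for a disallowed objective and user turns.
objective = "alpha beta gamma delta"
suspicious = ["hello alpha", "more beta here", "then gamma too", "finally delta"]
benign = ["hello there", "nice weather", "see you"]

suspicious_flag = is_escalating(drift_scores(suspicious, objective))
benign_flag = is_escalating(drift_scores(benign, objective))
```

Scoring the cumulative context rather than each turn alone is the point of the sketch: every individual turn looks only weakly related to the objective, while the conversation as a whole drifts steadily towards it. A production system would need a real encoder, calibrated thresholds, and a separate check on the model's outputs.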

Related Resources (1)

https://arxiv.org/abs/2405.05610

CVSS v4

Base Score: 8.9

Attack Vector: NETWORK
Attack Complexity: HIGH
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: NONE
Vulnerable System Integrity: HIGH
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: HIGH
Subsequent System Availability: NONE

CVSS v3

Base Score: 6.8

Attack Vector: NETWORK
Attack Complexity: HIGH
Privileges Required: NONE
User Interaction: NONE
Scope: CHANGED
Confidentiality: NONE
Integrity: HIGH
Availability: NONE
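As a sanity check, the CVSS v3 base score can be recomputed from the metric values listed above. The page does not say which minor version was used, so this sketch assumes the CVSS v3.1 base-score equations and metric weights published by FIRST; the function names are illustrative only.

```python
import math

# CVSS v3.1 weights for the metric values listed in this entry (assumed version).
AV_NETWORK = 0.85
AC_HIGH = 0.44
PR_NONE = 0.85
UI_NONE = 0.85
CIA = {"NONE": 0.0, "LOW": 0.22, "HIGH": 0.56}


def roundup(value):
    """Round up to one decimal place, following the v3.1 specification's Roundup."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_changed(conf, integ, avail):
    """Base score for a vector with Scope: CHANGED and the AV/AC/PR/UI values above."""
    iss = 1 - (1 - CIA[conf]) * (1 - CIA[integ]) * (1 - CIA[avail])
    impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = 8.22 * AV_NETWORK * AC_HIGH * PR_NONE * UI_NONE
    if impact <= 0:
        return 0.0
    return roundup(min(1.08 * (impact + exploitability), 10))


# Confidentiality NONE, Integrity HIGH, Availability NONE
score = base_score_scope_changed("NONE", "HIGH", "NONE")
print(score)  # 6.8
```

Under these assumptions the impact sub-score is about 3.99 and the exploitability sub-score about 2.22, so 1.08 × (3.99 + 2.22) ≈ 6.71, which rounds up to the listed 6.8.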

AIVSS

Base Score: