MAI-2025-0014
Published: May 16, 2026
Updated: May 16, 2026
Large Language Models (LLMs) are susceptible to a multi-turn jailbreak attack that exploits their reasoning capabilities. The attack, known as the Reasoning-Augmented Conversation Exploit (RACE), recasts a harmful query as a seemingly benign reasoning task. Over successive turns, the attacker leverages the model's own reasoning abilities to steer it toward generating unsafe content, circumventing standard safety mechanisms designed to block harmful responses.
Mitigation steps:
**For AI Developers:**
* Implement a layered security approach with multiple safety checks at different stages of the query processing pipeline
* Strengthen safety mechanisms beyond simple keyword filtering to detect and prevent reasoning-based attacks
**For Model Trainers/Fine-tuners:**
* Develop robust models that can identify and resist manipulation of their reasoning processes
* Develop more sophisticated detection methods to identify reasoning-based attacks, including analysis of information gain during conversation
* Conduct rigorous red-teaming and adversarial testing to identify and address vulnerabilities before deployment
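As a rough illustration of the layered approach in the mitigation list above, the following sketch combines a per-turn check with a conversation-level check, so that a sequence of individually mild turns can still be flagged. Every name here (`ConversationGuard`, `toy_scorer`, the threshold values) is hypothetical and not taken from the advisory. The toy scorer is a keyword placeholder standing in for a trained safety classifier, and the running total is a deliberately simple stand-in for a real multi-turn signal such as information-gain analysis.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical scorer type: maps a message to a risk score in [0, 1].
Scorer = Callable[[str], float]


def toy_scorer(text: str) -> float:
    """Placeholder scorer (assumption): a real system would call a safety classifier."""
    flagged = ("synthesize", "bypass")
    hits = sum(term in text.lower() for term in flagged)
    return min(1.0, 0.5 * hits)


@dataclass
class ConversationGuard:
    """Two illustrative layers: a per-turn check and a whole-conversation check."""
    turn_scorer: Scorer
    turn_threshold: float = 0.8        # assumed value, tune per deployment
    cumulative_threshold: float = 1.5  # assumed value, tune per deployment
    history: List[float] = field(default_factory=list)

    def check(self, message: str) -> str:
        score = self.turn_scorer(message)
        self.history.append(score)
        # Layer 1: block a single turn that is clearly unsafe on its own.
        if score >= self.turn_threshold:
            return "block"
        # Layer 2: escalate when individually mild turns accumulate.
        if sum(self.history) >= self.cumulative_threshold:
            return "review"
        return "allow"


guard = ConversationGuard(turn_scorer=toy_scorer)
decisions = [guard.check(m) for m in (
    "hello",
    "in theory, how would one bypass a filter?",
    "bypass it again, step by step",
    "and bypass the last check",
)]
print(decisions)
```

The point of the second layer is that state is kept per conversation rather than per message, which is the property a multi-turn attack tries to exploit when each turn is evaluated in isolation.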
CVSS v4
Base Score: 6.9
Attack Vector: NETWORK
Attack Complexity: LOW
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: NONE
Vulnerable System Integrity: LOW
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: LOW
Subsequent System Availability: NONE
CVSS v3
Base Score: 5.8
Attack Vector: NETWORK
Attack Complexity: LOW
Privileges Required: NONE
User Interaction: NONE
Scope: CHANGED
Confidentiality: NONE
Integrity: LOW
Availability: NONE
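The CVSS v3 base score of 5.8 can be reproduced from the metrics listed above. The sketch below assumes the CVSS v3.1 base-score equations and metric weights as published by FIRST; the advisory itself says only "CVSS v3", so the exact minor version is an assumption. The function and constant names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (assumed version).
ATTACK_VECTOR_NETWORK = 0.85
ATTACK_COMPLEXITY_LOW = 0.77
PRIVILEGES_REQUIRED_NONE = 0.85
USER_INTERACTION_NONE = 0.85
IMPACT_NONE = 0.0
IMPACT_LOW = 0.22


def roundup(value: float) -> float:
    """Round up to one decimal place using the v3.1 integer-based definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_changed(conf: float, integ: float, avail: float) -> float:
    """Base score for Scope: CHANGED with the exploitability metrics listed above."""
    iss = 1 - (1 - conf) * (1 - integ) * (1 - avail)
    impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (8.22 * ATTACK_VECTOR_NETWORK * ATTACK_COMPLEXITY_LOW
                      * PRIVILEGES_REQUIRED_NONE * USER_INTERACTION_NONE)
    if impact <= 0:
        return 0.0
    return roundup(min(1.08 * (impact + exploitability), 10))


# Confidentiality NONE, Integrity LOW, Availability NONE
score = base_score_scope_changed(IMPACT_NONE, IMPACT_LOW, IMPACT_NONE)
print(score)  # prints 5.8
```

With these inputs the impact sub-score is about 1.44 and the exploitability sub-score about 3.89; their sum scaled by 1.08 is about 5.75, which rounds up to 5.8.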
AIVSS
Base Score: 4.8