MAI-2024-0007 | Mend Vulnerability Database

MAI-2024-0007

Published: May 16, 2026

Updated: June 17, 2026

Large Language Models (LLMs) exhibit a vulnerability wherein they inadvertently disclose accurate procedures for harmful tasks when asked to generate fallacious reasoning. When an LLM is prompted to create a false procedure for a harmful task, it may instead provide the correct, harmful procedure while falsely asserting that the procedure is fallacious. This vulnerability can be exploited to circumvent safety mechanisms, leading to the generation of harmful outputs.

Mitigation steps:

**For AI Developers:**

* Implement additional layers of review and verification for sensitive prompts, particularly those involving unlawful actions.
* Develop and implement robust safety mechanisms capable of detecting and filtering responses that falsely claim to be incorrect while providing accurate instructions for harmful actions.
* Explore alternative prompt engineering techniques to minimize the chances of bypassing established safety mechanisms.

**For Model Trainers/Fine-tuners:**

* Enhance training data for large language models to improve their ability to generate and identify fallacious reasoning.
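The second developer mitigation (filtering responses that claim to be incorrect while carrying accurate instructions) can be sketched as an output check that judges a response with its "this is false" framing removed. This is only a minimal illustration, not a method described in the advisory: the phrase list, the function names `claims_to_be_false` and `should_block`, and the `is_harmful` callback (a stand-in for whatever content classifier a deployment already uses) are all assumptions.

```python
import re
from typing import Callable

# Hypothetical phrases a response might use to claim its own content is wrong.
# A real deployment would need a far more robust detector than a phrase list.
_FRAMING_RE = re.compile(
    r"\bfallacious\b"
    r"|\b(?:this|the following) (?:procedure|reasoning) is (?:false|incorrect|wrong)\b"
    r"|\bdeliberately (?:false|incorrect|wrong)\b",
    re.IGNORECASE,
)


def claims_to_be_false(response: str) -> bool:
    """Return True if the response labels its own content as fallacious."""
    return _FRAMING_RE.search(response) is not None


def should_block(response: str, is_harmful: Callable[[str], bool]) -> bool:
    """Classify the response with any self-disclaiming sentences removed.

    `is_harmful` is a hypothetical classifier supplied by the caller.
    """
    # Split into sentences and drop the ones that carry the disclaimer,
    # so the classifier sees the procedure without its "this is false" label.
    sentences = re.split(r"(?<=[.!?])\s+", response)
    content = " ".join(s for s in sentences if not _FRAMING_RE.search(s))
    return is_harmful(content)


# Toy classifier that is fooled by a disclaimer, used only to show the effect.
def naive_classifier(text: str) -> bool:
    if "incorrect" in text.lower():
        return False
    return "step 1" in text.lower()


reply = "The following procedure is incorrect. Step 1: do the thing."
print(naive_classifier(reply))                 # False
print(should_block(reply, naive_classifier))   # True
```

The design choice here is that a claim of falsity never lowers the scrutiny applied to the content itself; the disclaimer is treated as untrusted framing rather than as evidence of safety.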

Related Resources (1)

https://arxiv.org/abs/2407.00869

CVSS v4

Base Score: 9.2

* Attack Vector: NETWORK
* Attack Complexity: LOW
* Attack Requirements: NONE
* Privileges Required: NONE
* User Interaction: NONE
* Vulnerable System Confidentiality: NONE
* Vulnerable System Integrity: HIGH
* Vulnerable System Availability: NONE
* Subsequent System Confidentiality: NONE
* Subsequent System Integrity: HIGH
* Subsequent System Availability: NONE

CVSS v3

Base Score: 8.6

* Attack Vector: NETWORK
* Attack Complexity: LOW
* Privileges Required: NONE
* User Interaction: NONE
* Scope: CHANGED
* Confidentiality: NONE
* Integrity: HIGH
* Availability: NONE
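To show how the CVSS v3 metrics listed above combine into the 8.6 base score, here is a minimal sketch of the base score calculation. It assumes the CVSS v3.1 equations and metric weights (the page says only "CVSS v3"), and the function and dictionary names are illustrative rather than part of any library.

```python
import math

# Metric weights as given in the CVSS v3.1 specification.
ATTACK_VECTOR = {"NETWORK": 0.85, "ADJACENT": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2}
ATTACK_COMPLEXITY = {"LOW": 0.77, "HIGH": 0.44}
USER_INTERACTION = {"NONE": 0.85, "REQUIRED": 0.62}
CIA = {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0}
# Privileges Required weights depend on whether scope is changed (True) or not.
PRIVILEGES_REQUIRED = {
    False: {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27},
    True: {"NONE": 0.85, "LOW": 0.68, "HIGH": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, following the v3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(av, ac, pr, ui, scope_changed, c, i, a) -> float:
    """Compute a CVSS v3.1 base score from metric names."""
    # Impact sub-score
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    # Exploitability sub-score
    exploitability = (
        8.22
        * ATTACK_VECTOR[av]
        * ATTACK_COMPLEXITY[ac]
        * PRIVILEGES_REQUIRED[scope_changed][pr]
        * USER_INTERACTION[ui]
    )

    if impact <= 0:
        return 0.0
    if scope_changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


# Metrics from this advisory: AV:N / AC:L / PR:N / UI:N / S:C / C:N / I:H / A:N
print(base_score("NETWORK", "LOW", "NONE", "NONE", True, "NONE", "HIGH", "NONE"))  # 8.6
```

With these inputs the impact sub-score is roughly 3.99 and the exploitability sub-score roughly 3.89; scaling their sum by 1.08 for changed scope gives about 8.51, which rounds up to 8.6.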

AIVSS

Base Score: 5.2