Mend.io Vulnerability Database
The largest open source vulnerability database
MAI-2024-0007
Published: May 16, 2026
Updated: May 16, 2026
Large Language Models (LLMs) exhibit a vulnerability wherein they inadvertently disclose accurate procedures for harmful tasks when requested to generate fallacious reasoning. This occurs when LLMs are prompted to create a false procedure for a harmful task, yet they unintentionally provide the correct, harmful procedure while falsely asserting its fallacy. This vulnerability can be exploited to circumvent safety mechanisms, leading to the generation of harmful outputs.

Mitigation steps:

**For AI Developers:**

* Implement additional layers of review and verification for sensitive prompts, particularly those involving unlawful actions.
* Develop and implement robust safety mechanisms capable of detecting and filtering responses that falsely claim to be incorrect while providing accurate instructions for harmful actions.
* Explore alternative prompt engineering techniques to minimize the chances of bypassing established safety mechanisms.

**For Model Trainers/Fine-tuners:**

* Enhance training data for large language models to improve their ability to generate and identify fallacious reasoning.
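One way to picture the second developer mitigation is an output filter that judges a response by what it contains rather than by what it claims about itself. The sketch below is illustrative only and is not taken from the advisory. The framing phrases, `strip_framing`, and `should_block` are hypothetical names, and `is_harmful` stands in for whatever harm classifier a deployment already uses.

```python
import re
from typing import Callable

# Hypothetical examples of phrases a response might use to claim that its
# own content is wrong. A real deployment would need a far broader list.
FRAMING_PATTERNS = [
    r"\bthis (procedure|method|reasoning) is (false|fallacious|incorrect|wrong)\b",
    r"\b(obviously|clearly|deliberately) (false|fake|incorrect|wrong)\b",
    r"\bwould not (actually )?work\b",
]
_FRAMING_RE = re.compile("|".join(FRAMING_PATTERNS), re.IGNORECASE)


def strip_framing(response: str) -> str:
    """Drop sentences that assert the surrounding content is false."""
    sentences = re.split(r"(?<=[.!?])\s+", response.strip())
    return " ".join(s for s in sentences if not _FRAMING_RE.search(s))


def should_block(response: str, is_harmful: Callable[[str], bool]) -> bool:
    """Classify the response both as written and with its disclaimers removed.

    `is_harmful` is an assumed, caller-supplied classifier. Running it on the
    stripped text prevents a "this is fallacious" label from masking content
    that would otherwise be flagged.
    """
    return is_harmful(response) or is_harmful(strip_framing(response))
```

A phrase list like this is easy to evade, so the sketch is better read as a description of the check's shape (classify the content with its self-description removed) than as a complete defense.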
CVSS v4
Base Score: 9.2
Attack Vector: NETWORK
Attack Complexity: LOW
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: NONE
Vulnerable System Integrity: HIGH
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: HIGH
Subsequent System Availability: NONE
CVSS v3
Base Score: 8.6
Attack Vector: NETWORK
Attack Complexity: LOW
Privileges Required: NONE
User Interaction: NONE
Scope: CHANGED
Confidentiality: NONE
Integrity: HIGH
Availability: NONE
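The CVSS v3 base score can be reproduced from the metric values listed above. The sketch below assumes the CVSS v3.1 base-score equations and metric weights (the page says only "v3"). The function and dictionary names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification.
AV = {"NETWORK": 0.85, "ADJACENT": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2}
AC = {"LOW": 0.77, "HIGH": 0.44}
UI = {"NONE": 0.85, "REQUIRED": 0.62}
CIA = {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0}
# Privileges Required is weighted differently when Scope is CHANGED.
PR_UNCHANGED = {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27}
PR_CHANGED = {"NONE": 0.85, "LOW": 0.68, "HIGH": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in CVSS v3.1 Appendix A."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(av, ac, pr, ui, scope_changed, c, i, a) -> float:
    """Compute the CVSS v3.1 base score from metric names."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
        pr_weight = PR_CHANGED[pr]
    else:
        impact = 6.42 * iss
        pr_weight = PR_UNCHANGED[pr]
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weight * UI[ui]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope_changed:
        total *= 1.08
    return roundup(min(total, 10))


# Metrics listed for this advisory.
score = base_score("NETWORK", "LOW", "NONE", "NONE", True, "NONE", "HIGH", "NONE")
print(score)  # 8.6
```

With Integrity as the only impacted property, the impact sub-score is about 3.99 and the exploitability sub-score about 3.89. The changed-scope multiplier of 1.08 brings the sum to roughly 8.51, which rounds up to 8.6.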
AIVSS
Base Score: 5.2