Mend.io Vulnerability Database
The largest open source vulnerability database
What is a Vulnerability ID?
New vulnerability? Tell us about it!
MAI-2023-0010
Published:May 16, 2026
Updated:May 16, 2026
Large Language Models (LLMs) that incorporate alignment techniques are susceptible to "jailbreak" attacks. The AutoDAN methodology facilitates the automatic creation of semantically coherent prompts that circumvent safety mechanisms, inducing aligned LLMs to produce malicious outputs. Unlike previous approaches that generated easily detectable nonsensical prompts, AutoDAN exploits vulnerabilities in the LLM's alignment, prompting the model to generate responses that contravene its intended safety protocols. Mitigation steps: **For AI Developers:** * Implement advanced detection mechanisms that surpass basic perplexity checks, focusing on identifying semantically meaningful yet malicious prompts. * Employ multi-layered safety protocols with independent verification steps to ensure secure responses to user queries. * Implement stringent input sanitization techniques to detect and neutralize potentially harmful prompts. **For Model Trainers/Fine-tuners:** * Enhance the robustness of LLM safety features through adversarial training techniques that address semantically meaningful attacks. * Regularly update and refine the alignment training data to mitigate newly discovered vulnerabilities.
Related Resources (1)
Do you need more information?
Contact Us
CVSS v4
Base Score:
6.3
Attack Vector
NETWORK
Attack Complexity
HIGH
Attack Requirements
NONE
Privileges Required
NONE
User Interaction
NONE
Vulnerable System Confidentiality
NONE
Vulnerable System Integrity
LOW
Vulnerable System Availability
NONE
Subsequent System Confidentiality
NONE
Subsequent System Integrity
NONE
Subsequent System Availability
NONE
CVSS v3
Base Score:
3.7
Attack Vector
NETWORK
Attack Complexity
HIGH
Privileges Required
NONE
User Interaction
NONE
Scope
UNCHANGED
Confidentiality
NONE
Integrity
LOW
Availability
NONE
AIVSS
Base Score:
3.8