MAI-2023-0010 | Mend Vulnerability Database


MAI-2023-0010

Published: May 16, 2026

Updated: June 17, 2026

Large Language Models (LLMs) that incorporate alignment techniques are susceptible to "jailbreak" attacks. The AutoDAN methodology facilitates the automatic creation of semantically coherent prompts that circumvent safety mechanisms, inducing aligned LLMs to produce malicious outputs. Unlike previous approaches that generated easily detectable nonsensical prompts, AutoDAN exploits vulnerabilities in the LLM's alignment, prompting the model to generate responses that contravene its intended safety protocols.

Mitigation steps:

**For AI Developers:**

* Implement advanced detection mechanisms that surpass basic perplexity checks, focusing on identifying semantically meaningful yet malicious prompts.
* Employ multi-layered safety protocols with independent verification steps to ensure secure responses to user queries.
* Implement stringent input sanitization techniques to detect and neutralize potentially harmful prompts.

**For Model Trainers/Fine-tuners:**

* Enhance the robustness of LLM safety features through adversarial training techniques that address semantically meaningful attacks.
* Regularly update and refine the alignment training data to mitigate newly discovered vulnerabilities.
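The "multi-layered safety protocols with independent verification steps" recommendation can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not part of the advisory: `guarded_generate`, `Verdict`, and the two toy checks are stand-ins. In particular, the substring matching below is only a placeholder for trained input and output classifiers; simple string matching would not reliably catch the fluent prompts that AutoDAN produces.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

REFUSAL = "Request declined by safety policy."


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    layer: str  # name of the layer that produced this verdict


Check = Callable[[str], Verdict]


def toy_input_check(prompt: str) -> Verdict:
    # Placeholder for a trained intent classifier (assumption, not a real detector).
    markers = ("ignore all previous instructions", "without any restrictions")
    flagged = any(marker in prompt.lower() for marker in markers)
    return Verdict(not flagged, "input-intent")


def toy_output_check(response: str) -> Verdict:
    # Placeholder for an independent verifier, e.g. a second model scoring the reply.
    flagged = "[unsafe]" in response.lower()
    return Verdict(not flagged, "output-verify")


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    input_checks: Sequence[Check],
    output_checks: Sequence[Check],
) -> Tuple[str, Verdict]:
    """Run input checks, call the model, then verify the output independently."""
    for check in input_checks:
        verdict = check(prompt)
        if not verdict.allowed:
            return REFUSAL, verdict  # blocked before the model is called
    response = generate(prompt)
    for check in output_checks:
        verdict = check(response)
        if not verdict.allowed:
            return REFUSAL, verdict  # blocked after generation
    return response, Verdict(True, "all-layers")
```

The output layer runs regardless of what the input layer decided, so a prompt that reads as benign but still elicits a harmful response can be stopped after generation.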

Related Resources (1)

https://arxiv.org/abs/2310.04451


CVSS v4

Base Score: 6.3

* Attack Vector: NETWORK
* Attack Complexity: HIGH
* Attack Requirements: NONE
* Privileges Required: NONE
* User Interaction: NONE
* Vulnerable System Confidentiality: NONE
* Vulnerable System Integrity: LOW
* Vulnerable System Availability: NONE
* Subsequent System Confidentiality: NONE
* Subsequent System Integrity: NONE
* Subsequent System Availability: NONE

CVSS v3

Base Score: 3.7

* Attack Vector: NETWORK
* Attack Complexity: HIGH
* Privileges Required: NONE
* User Interaction: NONE
* Scope: UNCHANGED
* Confidentiality: NONE
* Integrity: LOW
* Availability: NONE
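The CVSS v3 base score above can be reproduced from the listed metrics. The sketch below assumes the CVSS v3.1 base-score equations and metric weights for the Scope: UNCHANGED case (the entry says only "CVSS v3"); the function and variable names are illustrative.

```python
import math

# CVSS v3.1 metric weights; the PR weights are the Scope: UNCHANGED values.
WEIGHTS = {
    "AV": {"NETWORK": 0.85, "ADJACENT": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2},
    "AC": {"LOW": 0.77, "HIGH": 0.44},
    "PR": {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27},
    "UI": {"NONE": 0.85, "REQUIRED": 0.62},
    "CIA": {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, following the v3.1 specification's Roundup."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_unchanged(av, ac, pr, ui, conf, integ, avail) -> float:
    """Base score for Scope: UNCHANGED vectors only."""
    cia = WEIGHTS["CIA"]
    iss = 1 - (1 - cia[conf]) * (1 - cia[integ]) * (1 - cia[avail])
    impact = 6.42 * iss
    exploitability = (
        8.22 * WEIGHTS["AV"][av] * WEIGHTS["AC"][ac]
        * WEIGHTS["PR"][pr] * WEIGHTS["UI"][ui]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_unchanged(
    "NETWORK", "HIGH", "NONE", "NONE",  # AV, AC, PR, UI
    "NONE", "LOW", "NONE",              # C, I, A
)
print(score)  # 3.7
```

With these inputs the impact sub-score is about 1.41 and the exploitability sub-score about 2.22; their sum, about 3.63, rounds up to the listed 3.7.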

AIVSS

Base Score: 3.8