MAI-2024-0013 | Mend Vulnerability Database

MAI-2024-0013

Published: May 16, 2026

Updated: June 17, 2026

Large Language Models (LLMs) are susceptible to jailbreak attacks driven by autonomously discovered strategies. AutoDAN-Turbo is a black-box attack method that identifies novel, highly effective jailbreak strategies without human intervention. It achieves high attack success rates, for example 88.5% on GPT-4-1106-turbo, in eliciting harmful or unsafe responses from LLMs. The attack uses a lifelong learning agent that iteratively refines its strategies based on model feedback, producing progressively more effective prompts that circumvent established safety protocols.

Mitigation steps:

**For AI Developers:**

* Implement advanced detection systems that identify and block malicious prompts.
* Continuously assess and enhance safety mechanisms in response to new attack techniques, including automated jailbreaks.

**For Model Trainers/Fine-tuners:**

* Integrate robust safety mechanisms into LLMs so they withstand iterative attacks and strategy adaptation.
* Conduct regular red teaming exercises using diverse attack strategies to uncover and mitigate vulnerabilities.
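One way to track the red teaming exercises recommended above is to aggregate their outcomes into an attack success rate per strategy. The sketch below assumes a hypothetical log format of `(strategy, succeeded)` pairs. The function name, strategy names, and log entries are placeholders for illustration, not measurements from this advisory or the referenced paper.

```python
from collections import defaultdict

def attack_success_rates(records):
    """Return {strategy: fraction of attempts that bypassed safeguards}."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for strategy, succeeded in records:
        attempts[strategy] += 1
        if succeeded:
            successes[strategy] += 1
    return {s: successes[s] / attempts[s] for s in attempts}

# Placeholder red-team log (hypothetical, for illustration only).
log = [
    ("role_play", True),
    ("role_play", False),
    ("payload_splitting", False),
    ("payload_splitting", False),
]
rates = attack_success_rates(log)
print(rates)
```

A rising rate for any strategy between model releases would be a signal that the corresponding safety mechanism has regressed.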

Related Resources (1)

https://arxiv.org/abs/2410.05295

CVSS v4

Base Score: 8.9

Attack Vector: NETWORK
Attack Complexity: HIGH
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: NONE
Vulnerable System Integrity: HIGH
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: HIGH
Subsequent System Availability: NONE
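The CVSS v4 metrics listed above can be written compactly as a vector string. The following is a minimal sketch that assembles one from those values. The metric abbreviations follow the CVSS v4.0 specification, while `METRICS` and `build_vector` are illustrative names and not part of this advisory.

```python
# Base metrics as listed in this entry, keyed by CVSS v4.0 abbreviation.
METRICS = [
    ("AV", "NETWORK"),
    ("AC", "HIGH"),
    ("AT", "NONE"),
    ("PR", "NONE"),
    ("UI", "NONE"),
    ("VC", "NONE"),
    ("VI", "HIGH"),
    ("VA", "NONE"),
    ("SC", "NONE"),
    ("SI", "HIGH"),
    ("SA", "NONE"),
]

def build_vector(metrics, prefix="CVSS:4.0"):
    # Every value used in this entry abbreviates to its first letter (N or H).
    parts = [f"{key}:{value[0]}" for key, value in metrics]
    return "/".join([prefix] + parts)

vector = build_vector(METRICS)
print(vector)
# CVSS:4.0/AV:N/AC:H/AT:N/PR:N/UI:N/VC:N/VI:H/VA:N/SC:N/SI:H/SA:N
```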

CVSS v3

Base Score: 6.8

Attack Vector: NETWORK
Attack Complexity: HIGH
Privileges Required: NONE
User Interaction: NONE
Scope: CHANGED
Confidentiality: NONE
Integrity: HIGH
Availability: NONE
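As a check on the CVSS v3 base score, the listed metrics can be run through the base score equations. This sketch assumes the CVSS v3.1 equations and metric weights apply (the entry only says "v3"), and it covers only the Scope: CHANGED case used here. The function and constant names are illustrative.

```python
import math

# Numeric weights from the CVSS v3.1 specification for the values in this entry.
AV_NETWORK = 0.85
AC_HIGH = 0.44
PR_NONE = 0.85
UI_NONE = 0.85
CIA = {"NONE": 0.0, "LOW": 0.22, "HIGH": 0.56}

def roundup(x):
    # Spec-defined rounding: smallest one-decimal number >= x,
    # computed on integers to avoid floating-point artifacts.
    i = int(round(x * 100000))
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0

def base_score_scope_changed(conf, integ, avail):
    # Impact sub-score (ISS), then the Scope: CHANGED impact formula.
    iss = 1 - (1 - CIA[conf]) * (1 - CIA[integ]) * (1 - CIA[avail])
    impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = 8.22 * AV_NETWORK * AC_HIGH * PR_NONE * UI_NONE
    if impact <= 0:
        return 0.0
    return roundup(min(1.08 * (impact + exploitability), 10))

score = base_score_scope_changed("NONE", "HIGH", "NONE")
print(score)  # 6.8
```

With these inputs the impact is about 3.99 and the exploitability about 2.22. Scaled by 1.08 their sum is about 6.71, which rounds up to the listed 6.8.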

AIVSS

Base Score: