MAI-2024-0048 | Mend Vulnerability Database

MAI-2024-0048

Published: May 16, 2026

Updated: June 17, 2026

This report details a black-box attack framework that uses fuzz testing to automatically craft concise, semantically coherent prompts capable of bypassing the safety mechanisms of large language models (LLMs). The attack starts from an empty seed pool and applies LLM-assisted mutation strategies (Role-play, Contextualization, and Expansion), together with a dual-level judge module that identifies successful jailbreak attempts. The framework has been demonstrated against a range of open-source and proprietary LLMs, with a success rate that exceeds existing baselines by more than 60% in some cases.

Mitigation steps:

**For AI Developers:**

* Implement robust, multi-layered safety mechanisms that go beyond simple input filtering and output restrictions.
* Develop detection methods that remain effective against semantically coherent adversarial prompts, using contextual understanding and anomaly detection.

**For Model Trainers/Fine-tuners:**

* Regularly update and refine LLM safety models and datasets to keep pace with evolving jailbreaking techniques.
* Investigate techniques to detect and mitigate attacks based on prompt length and perplexity.
* Employ adversarial training to improve LLM robustness against diverse attack vectors.
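The mitigation item on prompt length and perplexity can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names (`train_unigram`, `perplexity`, `screen_prompt`), the thresholds, and above all the toy unigram model, which stands in for whatever real language model a deployment would use to score prompts. Because the report describes the generated prompts as concise and semantically coherent, a screen like this would likely be only one layer among several rather than a complete defense.

```python
import math
from collections import Counter


def train_unigram(corpus):
    """Count word frequencies in a reference corpus of benign prompts."""
    counts = Counter(word for text in corpus for word in text.lower().split())
    return counts, sum(counts.values())


def perplexity(text, counts, total):
    """Unigram perplexity with add-one smoothing (toy stand-in for a real LM)."""
    words = text.lower().split()
    if not words:
        return float("inf")
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen words
    log_prob = sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)
    return math.exp(-log_prob / len(words))


def screen_prompt(text, counts, total, max_words=200, max_ppl=30.0):
    """Return the list of checks the prompt fails; thresholds are hypothetical."""
    flags = []
    if len(text.split()) > max_words:
        flags.append("length")
    if perplexity(text, counts, total) > max_ppl:
        flags.append("perplexity")
    return flags


# Hypothetical reference corpus of benign prompts.
BENIGN = [
    "how do i sort a list in python",
    "what is the capital of france",
    "explain how a binary search works",
]
COUNTS, TOTAL = train_unigram(BENIGN)

if __name__ == "__main__":
    for prompt in ("how do i sort a list", "zx qv jj kk"):
        print(prompt, "->", screen_prompt(prompt, COUNTS, TOTAL))
```

A flagged prompt would typically be routed to a stricter check rather than rejected outright, since unusual but harmless prompts can also score high on a small reference model.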

Related Resources (1)

https://arxiv.org/abs/2409.14866

CVSS v4

Base Score: 6.3

Attack Vector: NETWORK

Attack Complexity: HIGH

Attack Requirements: NONE

Privileges Required: NONE

User Interaction: NONE

Vulnerable System Confidentiality: NONE

Vulnerable System Integrity: LOW

Vulnerable System Availability: NONE

Subsequent System Confidentiality: NONE

Subsequent System Integrity: LOW

Subsequent System Availability: NONE

CVSS v3

Base Score:

Attack Vector: NETWORK

Attack Complexity: HIGH

Privileges Required: NONE

User Interaction: NONE

Scope: CHANGED

Confidentiality: NONE

Integrity: LOW

Availability: NONE
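For readers who want to see how metrics like those listed above combine into a single number, the following is a minimal Python sketch of the base-score equations. It assumes the CVSS v3.1 specification (the page says only "CVSS v3"), and the function and table names are hypothetical. It illustrates the arithmetic only and is not the database's published score.

```python
import math

# Metric weights as given in the CVSS v3.1 specification.
AV = {"NETWORK": 0.85, "ADJACENT": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2}
AC = {"LOW": 0.77, "HIGH": 0.44}
UI = {"NONE": 0.85, "REQUIRED": 0.62}
CIA = {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0}
# Privileges Required is weighted differently when Scope is CHANGED.
PR_UNCHANGED = {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27}
PR_CHANGED = {"NONE": 0.85, "LOW": 0.68, "HIGH": 0.5}


def roundup(value):
    """Round up to one decimal place, following the v3.1 integer-based method."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(av, ac, pr, ui, scope, c, i, a):
    """Compute a CVSS v3.1 base score from metric names (assumed v3.1 equations)."""
    changed = scope == "CHANGED"
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr_weight = (PR_CHANGED if changed else PR_UNCHANGED)[pr]
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weight * UI[ui]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    # Metric values as listed in the CVSS v3 block of this entry.
    print(base_score("NETWORK", "HIGH", "NONE", "NONE",
                     "CHANGED", "NONE", "LOW", "NONE"))
```

The lookup tables keep the weights separate from the equations, so checking a different vector only requires changing the arguments.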

AIVSS

Base Score: 4.5