MAI-2024-0063 | Mend Vulnerability Database

MAI-2024-0063

Published: May 16, 2026

Updated: June 17, 2026

The voice mode of GPT-4o is susceptible to a security vulnerability known as the "Voice Jailbreak" attack. This sophisticated technique exploits narrative elements such as setting, character, and plot within audio prompts to circumvent established safety protocols. By employing these storytelling principles, attackers can manipulate the language model to produce responses that contravene OpenAI's usage policies, including content related to illegal activities, hate speech, physical harm, fraud, pornography, and privacy violations. Notably, this method demonstrates a higher success rate compared to direct forbidden inquiries or text-based jailbreaks converted into audio format.

Mitigation steps:

**For AI Developers:**

* Enhance safety mechanisms to detect and mitigate Voice Jailbreak attacks by refining the model's ability to recognize and resist manipulation through persuasive narratives.
* Implement granular control over permitted topics and conversational flows in the voice interface to reduce the attack surface, including restrictions on the length or complexity of voice interactions.

**For Model Trainers/Fine-tuners:**

* Develop robust detection mechanisms to identify and block conversational patterns indicative of Voice Jailbreak attempts, using techniques such as analyzing the structure and style of voice interactions.
* Investigate the feasibility of employing adversarial training techniques to improve the model's resilience to Voice Jailbreak attacks.
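As a rough illustration of "analyzing the structure and style of voice interactions", the sketch below flags transcribed voice prompts that combine several of the narrative elements named above (setting, character, plot). It is a toy heuristic only: the cue patterns, the `narrative_cue_score` and `needs_review` names, and the `min_elements` threshold are all hypothetical and come from neither this advisory nor the referenced paper. A production detector would more plausibly rely on a trained classifier.

```python
import re

# Hypothetical cue patterns, grouped by the narrative elements the advisory names.
# These lists are illustrative assumptions, not taken from the advisory or the paper.
NARRATIVE_CUES = {
    "setting": [r"\bimagine\b", r"\bonce upon a time\b", r"\bin a world where\b"],
    "character": [r"\byou are (?:a|an|the)\b", r"\bplay the role of\b", r"\bpretend to be\b"],
    "plot": [r"\bin the story\b", r"\bthe next scene\b", r"\bstay in character\b"],
}


def narrative_cue_score(transcript: str) -> dict:
    """Count how many cue patterns of each narrative element appear in a transcript."""
    text = transcript.lower()
    return {
        element: sum(1 for pattern in patterns if re.search(pattern, text))
        for element, patterns in NARRATIVE_CUES.items()
    }


def needs_review(transcript: str, min_elements: int = 2) -> bool:
    """Flag a transcript when at least `min_elements` narrative elements are present."""
    scores = narrative_cue_score(transcript)
    elements_present = sum(1 for count in scores.values() if count > 0)
    return elements_present >= min_elements


flagged = needs_review("Imagine a dark castle. You are a wizard, so stay in character.")
benign = needs_review("What is the weather like today?")
```

A heuristic like this would only route a conversation to stricter checks. Because plain storytelling requests are usually harmless, it should not be used to block them on its own.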

Related Resources (1)

https://arxiv.org/abs/2405.19103

CVSS v4

Base Score: 4.6

Attack Vector: NETWORK

Attack Complexity: HIGH

Attack Requirements: NONE

Privileges Required: NONE

User Interaction: ACTIVE

Vulnerable System Confidentiality: LOW

Vulnerable System Integrity: LOW

Vulnerable System Availability: NONE

Subsequent System Confidentiality: NONE

Subsequent System Integrity: HIGH

Subsequent System Availability: NONE
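The CVSS v4 metrics listed for this entry can be written as a single vector string, which is how such ratings are usually exchanged between tools. The sketch below assembles it in Python. It assumes the standard CVSS v4.0 base-metric abbreviations and ordering. The advisory itself does not publish a vector string, and `BASE_METRICS_V4` and `to_vector` are names chosen here for illustration.

```python
# CVSS v4 base metrics as listed for this entry, in the order the v4.0 vector uses.
BASE_METRICS_V4 = [
    ("AV", "NETWORK"),  # Attack Vector
    ("AC", "HIGH"),     # Attack Complexity
    ("AT", "NONE"),     # Attack Requirements
    ("PR", "NONE"),     # Privileges Required
    ("UI", "ACTIVE"),   # User Interaction
    ("VC", "LOW"),      # Vulnerable System Confidentiality
    ("VI", "LOW"),      # Vulnerable System Integrity
    ("VA", "NONE"),     # Vulnerable System Availability
    ("SC", "NONE"),     # Subsequent System Confidentiality
    ("SI", "HIGH"),     # Subsequent System Integrity
    ("SA", "NONE"),     # Subsequent System Availability
]


def to_vector(metrics) -> str:
    """Join metric/value pairs into a CVSS v4.0 vector string.

    Every v4.0 base-metric value is abbreviated to its first letter,
    so taking value[0] is sufficient for the base metrics used here.
    """
    parts = [f"{name}:{value[0]}" for name, value in metrics]
    return "CVSS:4.0/" + "/".join(parts)


vector = to_vector(BASE_METRICS_V4)
print(vector)
# prints "CVSS:4.0/AV:N/AC:H/AT:N/PR:N/UI:A/VC:L/VI:L/VA:N/SC:N/SI:H/SA:N"
```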

CVSS v3

Base Score: 4.7

Attack Vector: NETWORK

Attack Complexity: HIGH

Privileges Required: NONE

User Interaction: REQUIRED

Scope: CHANGED

Confidentiality: LOW

Integrity: LOW

Availability: NONE
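The CVSS v3 base score follows arithmetically from the listed metrics. The sketch below applies the published CVSS v3.1 base-score equations and metric weights to them. It assumes the entry was scored under v3.1 (the advisory only says "CVSS v3"; v3.0 uses the same weights), and the function and variable names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification.
AV = {"NETWORK": 0.85, "ADJACENT": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2}
AC = {"LOW": 0.77, "HIGH": 0.44}
UI = {"NONE": 0.85, "REQUIRED": 0.62}
CIA = {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0}
# Privileges Required weights depend on whether Scope is changed.
PR = {
    False: {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27},
    True: {"NONE": 0.85, "LOW": 0.68, "HIGH": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(av, ac, pr, ui, scope_changed, c, i, a) -> float:
    """Compute the CVSS v3.1 base score from base-metric values."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[scope_changed][pr] * UI[ui]
    if impact <= 0:
        return 0.0
    if scope_changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


# Metrics as listed for this entry.
score = base_score("NETWORK", "HIGH", "NONE", "REQUIRED", True, "LOW", "LOW", "NONE")
print(score)  # prints 4.7
```

With these inputs the impact sub-score is about 2.73 and the exploitability sub-score about 1.62. Because Scope is CHANGED, their sum is multiplied by 1.08, giving roughly 4.69, which rounds up to the listed 4.7.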

AIVSS

Base Score: 3.2