Mend.io Vulnerability Database
The largest open source vulnerability database
MAI-2024-0063
Published: May 16, 2026
Updated: May 16, 2026
The voice mode of GPT-4o is susceptible to a security vulnerability known as the "Voice Jailbreak" attack. This sophisticated technique exploits narrative elements such as setting, character, and plot within audio prompts to circumvent established safety protocols. By employing these storytelling principles, attackers can manipulate the language model to produce responses that contravene OpenAI's usage policies, including content related to illegal activities, hate speech, physical harm, fraud, pornography, and privacy violations. Notably, this method demonstrates a higher success rate compared to direct forbidden inquiries or text-based jailbreaks converted into audio format.

Mitigation steps:

**For AI Developers:**

* Enhance safety mechanisms to detect and mitigate Voice Jailbreak attacks by refining the model's ability to recognize and resist manipulation through persuasive narratives.
* Implement granular control over permitted topics and conversational flows in the voice interface to reduce the attack surface, including restrictions on the length or complexity of voice interactions.

**For Model Trainers/Fine-tuners:**

* Develop robust detection mechanisms to identify and block conversational patterns indicative of Voice Jailbreak attempts, using techniques such as analyzing the structure and style of voice interactions.
* Investigate the feasibility of employing adversarial training techniques to improve the model's resilience to Voice Jailbreak attacks.
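The mitigation about granular control over topics and interaction length could be approached with a simple admission check placed in front of the voice model. The following is a minimal sketch, not a description of any vendor's implementation: the names `ALLOWED_TOPICS`, `MAX_TURNS`, `MAX_WORDS_PER_TURN`, `VoiceSession`, and `admit_turn` are hypothetical, the thresholds are arbitrary, and the `topic` label is assumed to come from some upstream classifier applied to the speech transcript.

```python
from dataclasses import dataclass

# Hypothetical policy values chosen for illustration; not taken from the advisory.
ALLOWED_TOPICS = {"weather", "calendar", "music"}
MAX_TURNS = 20            # cap on accepted turns per voice session
MAX_WORDS_PER_TURN = 120  # cap on transcript length per turn


@dataclass
class VoiceSession:
    """Tracks how many turns have been accepted in one session."""
    turns: int = 0


def admit_turn(session: VoiceSession, transcript: str, topic: str):
    """Return (allowed, reason) for a single transcribed voice turn.

    `topic` is assumed to be produced by an upstream classifier.
    """
    if topic not in ALLOWED_TOPICS:
        return False, "topic_not_permitted"
    if len(transcript.split()) > MAX_WORDS_PER_TURN:
        return False, "turn_too_long"
    if session.turns >= MAX_TURNS:
        return False, "session_too_long"
    session.turns += 1
    return True, "ok"
```

A check like this only narrows the attack surface; it does not by itself recognize narrative-style manipulation, which the other mitigation items address.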
CVSS v4
Base Score: 4.6
Attack Vector: NETWORK
Attack Complexity: HIGH
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: ACTIVE
Vulnerable System Confidentiality: LOW
Vulnerable System Integrity: LOW
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: HIGH
Subsequent System Availability: NONE
CVSS v3
Base Score: 4.7
Attack Vector: NETWORK
Attack Complexity: HIGH
Privileges Required: NONE
User Interaction: REQUIRED
Scope: CHANGED
Confidentiality: LOW
Integrity: LOW
Availability: NONE
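As a cross-check, the CVSS v3 base score can be reproduced from the metric values listed above. The sketch below assumes the CVSS v3.1 base-score equations and numeric weights for the Scope: CHANGED case. The advisory only says "CVSS v3", so the exact minor version is an assumption, and the function names are illustrative.

```python
import math

# CVSS v3.1 numeric weights for the metric values listed in this advisory.
AV_NETWORK = 0.85
AC_HIGH = 0.44
PR_NONE = 0.85
UI_REQUIRED = 0.62
IMPACT_LOW = 0.22
IMPACT_NONE = 0.0


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_changed(av, ac, pr, ui, c, i, a) -> float:
    """CVSS v3.1 base score for the Scope: CHANGED case only."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(1.08 * (impact + exploitability), 10))


score = base_score_scope_changed(
    AV_NETWORK, AC_HIGH, PR_NONE, UI_REQUIRED,
    IMPACT_LOW, IMPACT_LOW, IMPACT_NONE,
)
print(score)  # 4.7
```

Under these assumptions the impact sub-score is about 2.73 and the exploitability sub-score about 1.62, and 1.08 × (2.73 + 1.62) ≈ 4.69 rounds up to the listed 4.7.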
AIVSS
Base Score: 3.2