MAI-2024-0062
Published: May 16, 2026
Updated: May 16, 2026
GPT-4o is susceptible to jailbreak attacks conducted through audio prompts, despite its improved defenses against text-based vulnerabilities. These attacks exploit the model by converting adversarial text prompts—originally designed to target other large language models (LLMs) using methods such as GCG, AutoDAN, PAP, and BAP—into audio format via text-to-speech (TTS) synthesis. This approach effectively bypasses GPT-4o's safety mechanisms, enabling the generation of unsafe responses that the model would typically suppress. The efficacy of these audio-based attacks is on par with traditional text-based methods, highlighting a critical security flaw within the audio processing pipeline of GPT-4o.
Mitigation steps:
**For AI Developers:**
* Implement robust audio pre-processing and content filtering to detect and mitigate adversarial audio prompts effectively.
* Develop sophisticated detection mechanisms to identify and block attempts to circumvent safety measures using audio.
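One way to read the developer-side steps above is as a "transcribe, then filter" gate placed in front of the audio model. The following is a minimal sketch under stated assumptions, not a description of any vendor's actual defense: `transcribe` and `is_unsafe_text` are hypothetical callables standing in for a speech-to-text system and a text safety classifier, and `gate_audio_prompt` and `GateDecision` are illustrative names introduced here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GateDecision:
    allowed: bool
    transcript: str
    reason: str


def gate_audio_prompt(
    audio: bytes,
    transcribe: Callable[[bytes], str],      # hypothetical speech-to-text function
    is_unsafe_text: Callable[[str], bool],   # hypothetical text safety classifier
) -> GateDecision:
    """Transcribe an audio prompt and run the transcript through a text filter
    before the audio is forwarded to the model."""
    transcript = transcribe(audio).strip()

    # Assumption: fail closed when nothing intelligible was recovered, since
    # the text filter cannot judge audio it could not read.
    if not transcript:
        return GateDecision(False, transcript, "empty or unintelligible transcript")

    if is_unsafe_text(transcript):
        return GateDecision(False, transcript, "transcript flagged by text safety filter")

    return GateDecision(True, transcript, "ok")


if __name__ == "__main__":
    # Toy stand-ins for demonstration only; a real deployment would plug in
    # an actual speech recognizer and moderation model here.
    def fake_transcribe(audio: bytes) -> str:
        return audio.decode("utf-8", errors="ignore")

    def fake_filter(text: str) -> bool:
        return "FORBIDDEN" in text

    print(gate_audio_prompt(b"what is the weather today", fake_transcribe, fake_filter))
    print(gate_audio_prompt(b"FORBIDDEN request", fake_transcribe, fake_filter))
```

A gate like this only helps to the extent that the text filter catches the transcribed prompt, so it would complement rather than replace audio-specific detection.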
**For Model Trainers/Fine-tuners:**
* Enhance model training to improve resilience against audio-based adversarial attacks, utilizing adversarial training methods focused on the audio modality.
* Regularly update and refine safety protocols based on ongoing research and discovered vulnerabilities, addressing gaps highlighted by recent attacks.
CVSS v4
Base Score: 5.1
* Attack Vector: NETWORK
* Attack Complexity: LOW
* Attack Requirements: NONE
* Privileges Required: NONE
* User Interaction: ACTIVE
* Vulnerable System Confidentiality: NONE
* Vulnerable System Integrity: LOW
* Vulnerable System Availability: NONE
* Subsequent System Confidentiality: NONE
* Subsequent System Integrity: LOW
* Subsequent System Availability: NONE
CVSS v3
Base Score: 4.7
* Attack Vector: NETWORK
* Attack Complexity: LOW
* Privileges Required: NONE
* User Interaction: REQUIRED
* Scope: CHANGED
* Confidentiality: NONE
* Integrity: LOW
* Availability: NONE
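The CVSS v3 base score can be reproduced from the listed metrics. The sketch below assumes the CVSS v3.1 base-score equations and metric weights (the advisory says only "CVSS v3"); the function and variable names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (assumed version).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required is weighted differently when Scope is Changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(x: float) -> float:
    """Round up to one decimal place, as defined in CVSS v3.1."""
    i = int(round(x * 100000))
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def cvss31_base_score(m: dict) -> float:
    """Compute the base score from a dict of metric abbreviations."""
    changed = m["S"] == "C"
    cia = WEIGHTS["CIA"]
    iss = 1 - (1 - cia[m["C"]]) * (1 - cia[m["I"]]) * (1 - cia[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = (
        8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]] * pr * WEIGHTS["UI"][m["UI"]]
    )
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


# Metrics as listed in this advisory: Network, Low complexity, no privileges,
# user interaction Required, Scope Changed, C:None, I:Low, A:None.
ADVISORY_METRICS = {
    "AV": "N", "AC": "L", "PR": "N", "UI": "R",
    "S": "C", "C": "N", "I": "L", "A": "N",
}

if __name__ == "__main__":
    print(cvss31_base_score(ADVISORY_METRICS))
```

Under these assumptions the impact sub-score is about 1.44 and the exploitability sub-score about 2.84; their sum scaled by 1.08 is about 4.61, which rounds up to the listed 4.7.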
AIVSS
Base Score: 3.4