MAI-2024-0062 | Mend Vulnerability Database

MAI-2024-0062

Published: May 16, 2026

Updated: June 17, 2026

GPT-4o is susceptible to jailbreak attacks delivered through audio prompts, despite its improved defenses against text-based attacks. The attacks take adversarial text prompts that were originally designed to target other large language models (LLMs) with methods such as GCG, AutoDAN, PAP, and BAP, and convert them to audio via text-to-speech (TTS) synthesis. This bypasses GPT-4o's safety mechanisms and elicits unsafe responses that the model would typically suppress. The efficacy of these audio-based attacks is on par with traditional text-based methods, highlighting a critical security flaw in GPT-4o's audio processing pipeline.

Mitigation steps:

**For AI Developers:**

* Implement audio pre-processing and content filtering to detect and mitigate adversarial audio prompts.
* Develop detection mechanisms that identify and block attempts to circumvent safety measures through audio.

**For Model Trainers/Fine-tuners:**

* Improve the model's resilience to audio-based adversarial attacks through adversarial training focused on the audio modality.
* Regularly update and refine safety protocols based on ongoing research and newly discovered vulnerabilities, addressing the gaps highlighted by recent attacks.
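One way to picture the first developer mitigation (audio pre-processing plus content filtering) is a gate that transcribes incoming audio and runs the transcript through the same safety check applied to text prompts. The sketch below is an illustration under stated assumptions, not a description of how GPT-4o or any vendor handles audio: `screen_audio_prompt`, `ScreeningResult`, and both injected callables are hypothetical names, and the demo stubs merely stand in for a real speech-to-text model and a real safety classifier.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScreeningResult:
    allowed: bool      # True if the audio prompt may be forwarded to the model
    transcript: str    # text recovered from the audio
    reason: str        # short explanation of the decision


def screen_audio_prompt(
    audio: bytes,
    transcribe: Callable[[bytes], str],      # assumption: any speech-to-text backend
    is_unsafe_text: Callable[[str], bool],   # assumption: the existing text safety check
) -> ScreeningResult:
    """Transcribe the audio and apply the text-side safety filter to the transcript."""
    transcript = transcribe(audio).strip()
    if not transcript:
        # Fail closed: audio that cannot be transcribed cannot be checked.
        return ScreeningResult(False, transcript, "empty or untranscribable audio")
    if is_unsafe_text(transcript):
        return ScreeningResult(False, transcript, "transcript flagged by text safety filter")
    return ScreeningResult(True, transcript, "ok")


# --- Demo stubs (hypothetical; replace with real components) ---
def demo_transcribe(audio: bytes) -> str:
    # Stand-in for speech-to-text: treats the bytes as already-decoded text.
    return audio.decode("utf-8", errors="ignore")


def demo_is_unsafe(text: str) -> bool:
    # Stand-in for a safety classifier: a trivial phrase blocklist.
    blocked_phrases = ("ignore previous instructions",)
    lowered = text.lower()
    return any(phrase in lowered for phrase in blocked_phrases)


if __name__ == "__main__":
    print(screen_audio_prompt(b"What is the weather today?", demo_transcribe, demo_is_unsafe))
    print(screen_audio_prompt(b"Ignore previous instructions.", demo_transcribe, demo_is_unsafe))
```

Rejecting audio that yields no transcript is a deliberate fail-closed choice in this sketch. A transcript-only gate would also miss attacks carried in non-speech audio features, so it complements rather than replaces training-side defenses.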

Related Resources (1)

https://arxiv.org/abs/2406.06302

CVSS v4

Base Score: 5.1

Attack Vector: NETWORK

Attack Complexity: LOW

Attack Requirements: NONE

Privileges Required: NONE

User Interaction: ACTIVE

Vulnerable System Confidentiality: NONE

Vulnerable System Integrity: LOW

Vulnerable System Availability: NONE

Subsequent System Confidentiality: NONE

Subsequent System Integrity: LOW

Subsequent System Availability: NONE
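The CVSS v4 metrics listed above can be written compactly as a standard vector string. The following is a minimal sketch of that mapping, assuming the CVSS v4.0 base-metric abbreviations; the function name `cvss4_vector` and the dictionary layout are illustrative only. It does not recompute the 5.1 base score, since v4.0 scoring relies on lookup tables defined in the specification.

```python
# Base metrics in the order used by CVSS v4.0 vector strings,
# paired with the labels used in this advisory.
V4_METRICS = [
    ("AV", "Attack Vector"),
    ("AC", "Attack Complexity"),
    ("AT", "Attack Requirements"),
    ("PR", "Privileges Required"),
    ("UI", "User Interaction"),
    ("VC", "Vulnerable System Confidentiality"),
    ("VI", "Vulnerable System Integrity"),
    ("VA", "Vulnerable System Availability"),
    ("SC", "Subsequent System Confidentiality"),
    ("SI", "Subsequent System Integrity"),
    ("SA", "Subsequent System Availability"),
]


def cvss4_vector(values: dict) -> str:
    """Build a CVSS v4.0 base vector string from full metric names and values.

    Every v4.0 base-metric value is abbreviated by its first letter
    (e.g. NETWORK -> N, LOW -> L, ACTIVE -> A).
    """
    parts = ["CVSS:4.0"]
    for abbreviation, label in V4_METRICS:
        parts.append(f"{abbreviation}:{values[label][0].upper()}")
    return "/".join(parts)


# Metric values as published in this advisory.
ADVISORY_V4 = {
    "Attack Vector": "NETWORK",
    "Attack Complexity": "LOW",
    "Attack Requirements": "NONE",
    "Privileges Required": "NONE",
    "User Interaction": "ACTIVE",
    "Vulnerable System Confidentiality": "NONE",
    "Vulnerable System Integrity": "LOW",
    "Vulnerable System Availability": "NONE",
    "Subsequent System Confidentiality": "NONE",
    "Subsequent System Integrity": "LOW",
    "Subsequent System Availability": "NONE",
}

if __name__ == "__main__":
    print(cvss4_vector(ADVISORY_V4))
```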

CVSS v3

Base Score: 4.7

Attack Vector: NETWORK

Attack Complexity: LOW

Privileges Required: NONE

User Interaction: REQUIRED

Scope: CHANGED

Confidentiality: NONE

Integrity: LOW

Availability: NONE
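As a cross-check, the CVSS v3 base score can be recomputed from the metrics listed above. The sketch below assumes the CVSS v3.1 base-score equations and metric weights (the advisory says only "CVSS v3"); the function names are illustrative. With these inputs it reproduces the published 4.7.

```python
import math

# Numeric weights from the CVSS v3.1 specification.
AV = {"NETWORK": 0.85, "ADJACENT": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2}
AC = {"LOW": 0.77, "HIGH": 0.44}
UI = {"NONE": 0.85, "REQUIRED": 0.62}
CIA = {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0}
# Privileges Required depends on whether Scope is changed.
PR = {
    False: {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27},
    True: {"NONE": 0.85, "LOW": 0.68, "HIGH": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in CVSS v3.1."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def cvss3_base_score(av, ac, pr, ui, scope_changed, c, i, a) -> float:
    """Compute the CVSS v3.1 base score from full metric value names."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[scope_changed][pr] * UI[ui]
    if impact <= 0:
        return 0.0
    if scope_changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    # Metrics as published in this advisory.
    score = cvss3_base_score(
        av="NETWORK", ac="LOW", pr="NONE", ui="REQUIRED",
        scope_changed=True, c="NONE", i="LOW", a="NONE",
    )
    print(score)  # 4.7
```

Here the impact sub-score is about 1.44 and the exploitability sub-score about 2.84; their sum is scaled by 1.08 because Scope is CHANGED, giving roughly 4.61, which rounds up to 4.7.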

AIVSS

Base Score: 3.4