MAI-2023-0005
Published:May 16, 2026
Updated:May 16, 2026
The GPT-4V model is susceptible to a system prompt extraction vulnerability, whereby internal system prompts can be revealed through strategically designed incomplete dialogues paired with image inputs. These extracted prompts serve as potent jailbreak tools, circumventing established safety protocols and potentially leading to the generation of undesirable outputs. Such outputs may include the disclosure of personally identifiable information derived from images, posing significant privacy risks.
Mitigation steps: **For AI Developers:**
* Implement robust prompt validation and filtering mechanisms to prevent adversarial prompt injection.
* Develop mechanisms for detecting and mitigating jailbreak attacks, including prompt analysis and output filtering.
**For Model Trainers/Fine-tuners:**
* Regularly audit and update system prompts to minimize vulnerabilities.
* Employ techniques to limit information disclosure in system prompts, ensuring functionality without revealing sensitive details.
* Explore methods to detect and prevent system prompt extraction attempts.
Related Resources (1)
Do you need more information?
Contact UsCVSS v4
Base Score:
8.2
Attack Vector
NETWORK
Attack Complexity
HIGH
Attack Requirements
NONE
Privileges Required
NONE
User Interaction
NONE
Vulnerable System Confidentiality
HIGH
Vulnerable System Integrity
NONE
Vulnerable System Availability
NONE
Subsequent System Confidentiality
NONE
Subsequent System Integrity
NONE
Subsequent System Availability
NONE
CVSS v3
Base Score:
5.9
Attack Vector
NETWORK
Attack Complexity
HIGH
Privileges Required
NONE
User Interaction
NONE
Scope
UNCHANGED
Confidentiality
HIGH
Integrity
NONE
Availability
NONE
AIVSS
Base Score:
4.9