Mend.io Vulnerability Database
The largest open source vulnerability database
MAI-2023-0003
Published: May 16, 2026
Updated: May 16, 2026
A vulnerability has been identified within the fine-tuning API of GPT-4, which permits adversaries to bypass the Reinforcement Learning from Human Feedback (RLHF) safety mechanisms. By employing a relatively small set of meticulously crafted prompt-response pairs, attackers can fine-tune the model to produce harmful content. This content includes instructions for illegal activities and the creation of hazardous materials, which the base model is designed to refuse to generate.

Mitigation steps:

**For AI Developers:**

* Implement post-processing mechanisms to detect and filter harmful content generated by the model.
* Restrict access to the API to trusted users and organizations.

**For Model Trainers/Fine-tuners:**

* Implement robust input sanitization and filtering to prevent malicious prompts from being used in the fine-tuning process, including detecting and blocking prompts designed to elicit harmful responses.
* Continuously monitor fine-tuning datasets for the presence of malicious or harmful content, incorporating automated detection systems into the fine-tuning process.
* Develop more robust Reinforcement Learning from Human Feedback (RLHF) techniques that are less susceptible to fine-tuning attacks.
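The dataset-screening mitigation for fine-tuners could be sketched roughly as follows. This is a minimal illustration, not a vetted defense: the `Example` type, the `screen_dataset` and `flag_text` functions, and the `BLOCKLIST` terms are all hypothetical names introduced here, and the substring check is only a stand-in for whatever trained classifier or moderation service a real pipeline would plug in.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Example:
    """One prompt-response pair submitted for fine-tuning (hypothetical type)."""
    prompt: str
    response: str


# Hypothetical placeholder terms. A substring blocklist is easy to evade;
# a real pipeline would call a content classifier here instead.
BLOCKLIST = ("disable the safety filter", "ignore all previous rules")


def flag_text(text: str) -> bool:
    """Return True if the text contains any blocklisted term (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)


def screen_dataset(
    examples: Iterable[Example],
    is_harmful: Callable[[str], bool] = flag_text,
) -> Tuple[List[Example], List[Example]]:
    """Split examples into (accepted, rejected).

    An example is rejected if either its prompt or its response is flagged,
    so that both sides of each pair are checked before training.
    """
    accepted: List[Example] = []
    rejected: List[Example] = []
    for ex in examples:
        if is_harmful(ex.prompt) or is_harmful(ex.response):
            rejected.append(ex)
        else:
            accepted.append(ex)
    return accepted, rejected
```

Passing the detector in as `is_harmful` keeps the screening loop independent of any particular classifier, so the placeholder can be replaced without touching the rest of the pipeline. Rejected examples are returned rather than silently dropped so they can be logged or reviewed.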
CVSS v4
Base Score: 8.3
Attack Vector: NETWORK
Attack Complexity: LOW
Attack Requirements: NONE
Privileges Required: LOW
User Interaction: NONE
Vulnerable System Confidentiality: NONE
Vulnerable System Integrity: HIGH
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: HIGH
Subsequent System Availability: NONE
CVSS v3
Base Score: 7.7
Attack Vector: NETWORK
Attack Complexity: LOW
Privileges Required: LOW
User Interaction: NONE
Scope: CHANGED
Confidentiality: NONE
Integrity: HIGH
Availability: NONE
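As a cross-check, the CVSS v3 base score can be recomputed from the metrics listed above. The sketch below assumes the CVSS v3.1 base-score equations and metric weights from the FIRST specification (the entry itself says only "CVSS v3"), and the function and dictionary names are illustrative.

```python
import math

# Metric weights as given in the CVSS v3.1 specification (assumed version).
ATTACK_VECTOR = {"NETWORK": 0.85, "ADJACENT": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2}
ATTACK_COMPLEXITY = {"LOW": 0.77, "HIGH": 0.44}
USER_INTERACTION = {"NONE": 0.85, "REQUIRED": 0.62}
IMPACT = {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0}
# Privileges Required depends on whether Scope is changed.
PRIVILEGES = {
    False: {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27},
    True: {"NONE": 0.85, "LOW": 0.68, "HIGH": 0.5},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(av, ac, pr, ui, scope, c, i, a) -> float:
    """Compute a CVSS v3.1 base score from metric names such as 'NETWORK'."""
    changed = scope == "CHANGED"
    iss = 1 - (1 - IMPACT[c]) * (1 - IMPACT[i]) * (1 - IMPACT[a])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (
        8.22 * ATTACK_VECTOR[av] * ATTACK_COMPLEXITY[ac]
        * PRIVILEGES[changed][pr] * USER_INTERACTION[ui]
    )
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


# Metrics from this entry's CVSS v3 section.
score = base_score("NETWORK", "LOW", "LOW", "NONE", "CHANGED", "NONE", "HIGH", "NONE")
print(score)  # 7.7
```

With these inputs the impact sub-score is about 3.99 and the exploitability sub-score about 3.11. Their sum scaled by 1.08 is about 7.67, which rounds up to the published 7.7.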
AIVSS
Base Score: 5.7