Mend.io Vulnerability Database
The largest open source vulnerability database
MAI-2024-0018
Published: May 16, 2026
Updated: May 16, 2026
Large Language Models (LLMs) that have undergone safety fine-tuning are susceptible to a sophisticated attack known as Response-Guided Question Augmentation (ReG-QA). This method exploits the disparity in safety alignment between the processes of question generation and answer formulation. By introducing toxic answers generated by an unaligned LLM to a safety-aligned LLM, ReG-QA facilitates the creation of semantically related, naturally phrased questions that effectively bypass established safety protocols, leading to the generation of undesirable responses. Notably, this attack circumvents the need for adversarial prompt engineering or model optimization.

Mitigation steps:

**For AI Developers:**

* Implement robust filtering mechanisms that utilize deeper analysis of both input and output content, surpassing standard perplexity checks.
* Develop defenses that prioritize semantic understanding over reliance on surface-level features of the input.

**For Model Trainers/Fine-tuners:**

* Improve the symmetry of safety training to enhance generalization capabilities for question generation from unsafe answers.
* Investigate and address the "reversal curse," ensuring safety training effectiveness in both directions (question to answer and answer to question).
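The first mitigation for AI developers, filtering on both input and output content, can be sketched as a thin wrapper around the model call. This is a minimal illustration under stated assumptions, not a reference implementation: `generate` and `is_unsafe` are hypothetical callables standing in for the deployed model and for whatever semantic safety classifier is in use, and `guarded_generate`, `GuardResult`, and `REFUSAL` are names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable

# Placeholder refusal text; a real deployment would choose its own wording.
REFUSAL = "Request declined by safety filter."


@dataclass
class GuardResult:
    text: str      # text returned to the caller
    blocked: bool  # True if either check fired
    stage: str     # "input", "output", or "none"


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],    # hypothetical: wraps the LLM call
    is_unsafe: Callable[[str], bool],  # hypothetical: semantic safety classifier
) -> GuardResult:
    """Check the prompt and the response independently.

    ReG-QA prompts are naturally phrased, so an input-only check may pass
    them; the output check is what catches an unsafe response here.
    """
    if is_unsafe(prompt):
        return GuardResult(REFUSAL, True, "input")

    response = generate(prompt)

    # Classify the response on its own, regardless of how benign the prompt looked.
    if is_unsafe(response):
        return GuardResult(REFUSAL, True, "output")

    return GuardResult(response, False, "none")
```

The point of the design is that the output check does not trust the verdict on the input. How well it works depends entirely on the quality of the classifier supplied as `is_unsafe`, which this sketch leaves open.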

CVSS v4
Base Score: 8.7
Attack Vector: NETWORK
Attack Complexity: LOW
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: NONE
Vulnerable System Integrity: HIGH
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: NONE
Subsequent System Availability: NONE

CVSS v3
Base Score: 7.5
Attack Vector: NETWORK
Attack Complexity: LOW
Privileges Required: NONE
User Interaction: NONE
Scope: UNCHANGED
Confidentiality: NONE
Integrity: HIGH
Availability: NONE
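
The CVSS v3 base score can be reproduced from the metric values listed above. The sketch below assumes the CVSS v3.1 base-score equations and metric weights as published by FIRST (the page says only "v3") and handles only the Scope: UNCHANGED case that applies here. The function and table names are invented for this example.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: UNCHANGED values for PR).
WEIGHTS = {
    "AV": {"NETWORK": 0.85, "ADJACENT": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2},
    "AC": {"LOW": 0.77, "HIGH": 0.44},
    "PR": {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27},
    "UI": {"NONE": 0.85, "REQUIRED": 0.62},
    "CIA": {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def cvss3_base_score_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for a Scope: UNCHANGED vector (the only case sketched here)."""
    iss = 1 - (
        (1 - WEIGHTS["CIA"][c]) * (1 - WEIGHTS["CIA"][i]) * (1 - WEIGHTS["CIA"][a])
    )
    impact = 6.42 * iss
    exploitability = (
        8.22 * WEIGHTS["AV"][av] * WEIGHTS["AC"][ac] * WEIGHTS["PR"][pr] * WEIGHTS["UI"][ui]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# Metric values as listed for this advisory.
score = cvss3_base_score_unchanged(
    av="NETWORK", ac="LOW", pr="NONE", ui="NONE", c="NONE", i="HIGH", a="NONE"
)
print(score)  # 7.5
```

With these inputs the impact sub-score is about 3.60 and the exploitability sub-score about 3.89; their sum, about 7.48, rounds up to the listed 7.5.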

AIVSS
Base Score: 5.2