Mend.io Vulnerability Database
MAI-2024-0036
Published: May 16, 2026
Updated: May 16, 2026
Large Language Models (LLMs) are susceptible to sophisticated jailbreak attacks that exploit benign data mirroring techniques. In this attack vector, adversaries construct a local "mirror model" by training it on non-malicious data sourced from the target LLM. This mirror model replicates the target's operational characteristics and is subsequently employed to craft adversarial prompts. These prompts are strategically deployed against the target LLM, effectively circumventing content moderation systems due to the absence of overtly malicious elements during the initial data collection phase.

Mitigation steps:

**For AI Developers:**

* Implement robust input detection systems to identify subtle patterns indicative of shadow model attacks, focusing on statistically unusual query sequences or prompt variations.
* Develop adaptable safety mechanisms that dynamically adjust response strategies based on real-time threat assessments, ensuring regular updates to dynamic threat models.

**For Model Trainers/Fine-tuners:**

* Train LLMs using diverse datasets that include a wide range of potential adversarial prompts, incorporating shadow model techniques to enhance safety alignment.
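
The first developer mitigation (flagging statistically unusual query sequences or prompt variations) could be approximated along the following lines. This is a minimal sketch and not part of the advisory: the class name `PromptVariationMonitor`, the Jaccard token-overlap measure, and all thresholds are illustrative assumptions. Because the data-collection queries are benign by design, the heuristic looks at the shape of a client's traffic (many near-duplicate prompts in a short window) rather than at prompt content.

```python
from collections import defaultdict, deque


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class PromptVariationMonitor:
    """Hypothetical monitor: flags clients sending many near-duplicate prompts.

    All defaults are illustrative assumptions, not recommended values.
    """

    def __init__(self, window: int = 50, similarity: float = 0.6, max_similar: int = 5):
        self.similarity = similarity      # overlap above which two prompts count as variants
        self.max_similar = max_similar    # number of variants in the window that triggers a flag
        # Per-client sliding window of recent prompts, stored as token sets.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, client_id: str, prompt: str) -> bool:
        """Record a prompt; return True if it looks like systematic probing."""
        tokens = set(prompt.lower().split())
        recent = self.history[client_id]
        similar = sum(1 for past in recent if jaccard(tokens, past) >= self.similarity)
        recent.append(tokens)
        return similar >= self.max_similar
```

A flagged client would typically be rate-limited or routed to closer review rather than blocked outright, since legitimate users also rephrase prompts; a production system would likely use embeddings instead of raw token overlap.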
CVSS v4
Base Score: 8.2
Attack Vector: NETWORK
Attack Complexity: HIGH
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: NONE
Vulnerable System Integrity: HIGH
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: NONE
Subsequent System Availability: NONE
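
For use in tooling, the CVSS v4 base metrics listed above can be written as a standard vector string. The sketch below assumes the CVSS v4.0 base-metric abbreviations and relies on the fact that each base-metric value is abbreviated by its first letter; the names `METRICS` and `to_vector` are hypothetical. It does not compute the score, which in v4.0 comes from a lookup procedure rather than a closed-form formula.

```python
# CVSS v4.0 base metrics in vector-string order, with their abbreviations.
METRICS = [
    ("Attack Vector", "AV"),
    ("Attack Complexity", "AC"),
    ("Attack Requirements", "AT"),
    ("Privileges Required", "PR"),
    ("User Interaction", "UI"),
    ("Vulnerable System Confidentiality", "VC"),
    ("Vulnerable System Integrity", "VI"),
    ("Vulnerable System Availability", "VA"),
    ("Subsequent System Confidentiality", "SC"),
    ("Subsequent System Integrity", "SI"),
    ("Subsequent System Availability", "SA"),
]


def to_vector(values: dict) -> str:
    """Build a CVSS v4.0 vector string from metric names and values."""
    # Each base-metric value (e.g. NETWORK, HIGH, NONE) abbreviates to its first letter.
    parts = [f"{abbr}:{values[name][0].upper()}" for name, abbr in METRICS]
    return "CVSS:4.0/" + "/".join(parts)


# Metric values as listed in this entry.
entry = {name: "NONE" for name, _ in METRICS}
entry.update({
    "Attack Vector": "NETWORK",
    "Attack Complexity": "HIGH",
    "Vulnerable System Integrity": "HIGH",
})
vector = to_vector(entry)
print(vector)
```

The resulting string can be pasted into a CVSS v4.0 calculator to cross-check the listed base score.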
CVSS v3
Base Score: 5.9
Attack Vector: NETWORK
Attack Complexity: HIGH
Privileges Required: NONE
User Interaction: NONE
Scope: UNCHANGED
Confidentiality: NONE
Integrity: HIGH
Availability: NONE
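
The CVSS v3 base score of 5.9 can be reproduced from the metrics above. The sketch below assumes the CVSS v3.1 base equations and metric weights for the Scope: UNCHANGED case (the entry does not say whether v3.0 or v3.1 was used); the function names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: UNCHANGED).
AV = {"NETWORK": 0.85, "ADJACENT": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2}
AC = {"LOW": 0.77, "HIGH": 0.44}
PR_UNCHANGED = {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27}
UI = {"NONE": 0.85, "REQUIRED": 0.62}
CIA = {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0}


def roundup(x: float) -> float:
    """Round up to one decimal place, as defined in CVSS v3.1 Appendix A."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for a vector whose Scope is UNCHANGED."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR_UNCHANGED[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# Metric values as listed in this entry.
score = base_score_scope_unchanged(
    "NETWORK", "HIGH", "NONE", "NONE", c="NONE", i="HIGH", a="NONE"
)
print(score)  # 5.9
```

Here the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is 8.22 × 0.85 × 0.44 × 0.85 × 0.85 ≈ 2.22; their sum, about 5.82, rounds up to 5.9.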
AIVSS
Base Score: 5.7