Mend.io Vulnerability Database
The largest open source vulnerability database
MAI-2024-0025
Published: May 16, 2026
Updated: May 16, 2026
The Adaptive Position Pre-Fill Jailbreak Attack (AdaPPA) targets Large Language Models (LLMs) by exploiting the fact that their alignment protection varies across output positions. The attack abuses the model's instruction-following behavior by pre-filling the output with carefully crafted "safe" content. This creates a false sense of completion, which lowers the model's defenses before malicious content is introduced. The attack's success depends on adaptively generating both benign and harmful pre-fill content and positioning it to exploit weaknesses in the model's defenses at different output stages.

Mitigation steps:

**For AI Developers:**

* Implement output filtering and moderation that blocks malicious content regardless of the preceding text.
* Use defensive strategies that go beyond semantic-level analysis and address positional vulnerabilities in the model's output generation.

**For Model Trainers/Fine-tuners:**

* Make LLM alignment robust to pre-filled outputs of varying lengths.
* Develop detection techniques that identify patterns indicative of AdaPPA-style pre-fill attacks.
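One way to picture the "regardless of preceding text" mitigation is position-independent moderation: rather than scoring a response as a whole, where a long benign pre-fill could dilute the signal, each overlapping window of the output is checked on its own. The sketch below is an illustration only, not a method described in this advisory. The function names, the window sizes, and the keyword-based `toy_is_harmful` classifier are all hypothetical placeholders; a real deployment would substitute an actual moderation model.

```python
from typing import Callable, Iterator, List, Optional


def windows(words: List[str], size: int, stride: int) -> Iterator[str]:
    """Yield overlapping word windows that together cover the whole sequence."""
    if len(words) <= size:
        yield " ".join(words)
        return
    start = 0
    while True:
        yield " ".join(words[start:start + size])
        if start + size >= len(words):
            break
        start += stride


def first_flagged_window(
    text: str,
    is_harmful: Callable[[str], bool],
    size: int = 40,
    stride: int = 20,
) -> Optional[int]:
    """Return the index of the first window the classifier flags, else None.

    Each window is judged independently, so benign text earlier in the
    output cannot offset harmful text that appears later.
    """
    for index, chunk in enumerate(windows(text.split(), size, stride)):
        if is_harmful(chunk):
            return index
    return None


def toy_is_harmful(chunk: str) -> bool:
    """Placeholder classifier (assumption): flags a made-up marker phrase."""
    return "forbidden payload" in chunk.lower()
```

With the default sizes, a response made of 100 benign words followed by the marker phrase is flagged in its last window, while the benign prefix alone is not flagged. Keeping the stride smaller than the window size gives overlap, so a short harmful span that straddles a window boundary is still seen whole by at least one window.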
CVSS v4
Base Score: 8.7
Attack Vector: NETWORK
Attack Complexity: LOW
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: NONE
Vulnerable System Integrity: HIGH
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: NONE
Subsequent System Availability: NONE
CVSS v3
Base Score: 7.5
Attack Vector: NETWORK
Attack Complexity: LOW
Privileges Required: NONE
User Interaction: NONE
Scope: UNCHANGED
Confidentiality: NONE
Integrity: HIGH
Availability: NONE
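The CVSS v3 base score can be reproduced from the metrics listed above. The sketch below assumes the CVSS v3.1 base-score equations and metric weights published by FIRST. The advisory only says "CVSS v3", so the exact minor version is an assumption. Only the unchanged-scope case with the attack metrics of this entry is covered, and the function and constant names are illustrative.

```python
import math

# CVSS v3.1 weights for this advisory's exploitability metrics
AV_NETWORK = 0.85  # Attack Vector: NETWORK
AC_LOW = 0.77      # Attack Complexity: LOW
PR_NONE = 0.85     # Privileges Required: NONE
UI_NONE = 0.85     # User Interaction: NONE

# CVSS v3.1 weights for the confidentiality/integrity/availability metrics
IMPACT_WEIGHT = {"NONE": 0.0, "LOW": 0.22, "HIGH": 0.56}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_unchanged_scope(conf: str, integ: str, avail: str) -> float:
    """Base score for Scope: UNCHANGED with the exploitability metrics above."""
    iss = 1 - (
        (1 - IMPACT_WEIGHT[conf])
        * (1 - IMPACT_WEIGHT[integ])
        * (1 - IMPACT_WEIGHT[avail])
    )
    impact = 6.42 * iss
    exploitability = 8.22 * AV_NETWORK * AC_LOW * PR_NONE * UI_NONE
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score_unchanged_scope("NONE", "HIGH", "NONE"))  # prints 7.5
```

Here the impact sub-score is 6.42 × 0.56 = 3.5952 and the exploitability sub-score is about 3.887, so the sum of roughly 7.482 rounds up to the listed 7.5.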
AIVSS
Base Score: 5.4