MAI-2024-0025 | Mend Vulnerability Database

MAI-2024-0025

Published: May 16, 2026

Updated: June 17, 2026

The Adaptive Position Pre-Fill Jailbreak Attack (AdaPPA) is a sophisticated method targeting Large Language Models (LLMs) by exploiting their varying levels of alignment protection across different output positions. This attack manipulates the model's instruction-following capabilities by strategically pre-filling the output with meticulously crafted "safe" content. This approach creates a false sense of completion, thereby reducing the model's defenses before introducing malicious content. The attack's success hinges on the adaptive generation of both benign and harmful pre-fill content, which is tactically positioned to exploit vulnerabilities in the model's defense mechanisms at various output stages.

Mitigation steps:

**For AI Developers:**

* Implement advanced output filtering and moderation systems to prevent the generation of malicious content, regardless of preceding text.
* Utilize diverse defensive strategies that extend beyond semantic-level analysis, addressing positional vulnerabilities in the model's output generation.

**For Model Trainers/Fine-tuners:**

* Enhance the robustness of LLM alignment models to accommodate varying lengths of pre-filled outputs.
* Develop sophisticated detection techniques to identify patterns indicative of AdaPPA-style pre-fill attacks.
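The first mitigation, filtering output "regardless of preceding text", can be sketched as a position-independent check: instead of scoring the whole response once, where a long benign pre-fill can dilute the signal, score overlapping windows and block if any single window crosses the threshold. This is a minimal illustration and is not taken from the advisory or the AdaPPA paper. The scorer `toy_harm_score`, the placeholder token `FLAGGED`, and the window size, stride, and threshold are all hypothetical; a real deployment would substitute a trained moderation classifier.

```python
from typing import Callable, List

# Hypothetical placeholder standing in for content a real classifier would flag.
PLACEHOLDER_FLAGGED_TOKENS = {"FLAGGED"}


def toy_harm_score(text: str) -> float:
    """Toy scorer (assumption): fraction of tokens that are placeholder-flagged."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(t in PLACEHOLDER_FLAGGED_TOKENS for t in tokens) / len(tokens)


def split_windows(tokens: List[str], size: int, stride: int) -> List[List[str]]:
    """Split tokens into overlapping windows that cover every output position."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if len(tokens) <= size:
        return [tokens]
    last_start = len(tokens) - size
    starts = list(range(0, last_start + 1, stride))
    # Always include a final window so the tail of the output is covered.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [tokens[s:s + size] for s in starts]


def output_is_allowed(
    output: str,
    score: Callable[[str], float] = toy_harm_score,
    size: int = 8,
    stride: int = 4,
    threshold: float = 0.3,
) -> bool:
    """Allow the output only if every window scores below the threshold."""
    windows = split_windows(output.split(), size, stride)
    return all(score(" ".join(w)) < threshold for w in windows)


if __name__ == "__main__":
    benign_prefix = "I cannot help with that request and here is why " * 3
    response = benign_prefix + "FLAGGED FLAGGED FLAGGED FLAGGED"
    # The whole-response score is diluted by the benign prefix ...
    print("whole-response score below threshold:", toy_harm_score(response) < 0.3)
    # ... but the windowed check still blocks the response.
    print("allowed by windowed check:", output_is_allowed(response))
```

Taking the maximum over windows, rather than an average over the full response, is the design choice that matters here: the decision no longer depends on how much benign text precedes the flagged span.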

Related Resources (1)

https://arxiv.org/abs/2409.07503

CVSS v4

Base Score: 8.7

* Attack Vector: NETWORK
* Attack Complexity: LOW
* Attack Requirements: NONE
* Privileges Required: NONE
* User Interaction: NONE
* Vulnerable System Confidentiality: NONE
* Vulnerable System Integrity: HIGH
* Vulnerable System Availability: NONE
* Subsequent System Confidentiality: NONE
* Subsequent System Integrity: NONE
* Subsequent System Availability: NONE

CVSS v3

Base Score: 7.5

* Attack Vector: NETWORK
* Attack Complexity: LOW
* Privileges Required: NONE
* User Interaction: NONE
* Scope: UNCHANGED
* Confidentiality: NONE
* Integrity: HIGH
* Availability: NONE
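The CVSS v3 base score of 7.5 can be reproduced from the listed metrics. The sketch below assumes the CVSS v3.1 base-score equations and metric weights (the advisory does not say which v3 minor version it uses) and handles only the Scope: UNCHANGED case that applies here. The function and dictionary names are illustrative.

```python
import math

# CVSS v3.1 metric weights (Privileges Required values are for Scope UNCHANGED).
WEIGHTS = {
    "AV": {"NETWORK": 0.85, "ADJACENT": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2},
    "AC": {"LOW": 0.77, "HIGH": 0.44},
    "PR": {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27},
    "UI": {"NONE": 0.85, "REQUIRED": 0.62},
    "CIA": {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    as_int = int(round(value * 100000))
    if as_int % 10000 == 0:
        return as_int / 100000.0
    return (math.floor(as_int / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Compute the CVSS v3.1 base score for a Scope: UNCHANGED vulnerability."""
    cia = WEIGHTS["CIA"]
    iss = 1 - (1 - cia[c]) * (1 - cia[i]) * (1 - cia[a])
    impact = 6.42 * iss
    exploitability = (
        8.22 * WEIGHTS["AV"][av] * WEIGHTS["AC"][ac] * WEIGHTS["PR"][pr] * WEIGHTS["UI"][ui]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    score = base_score_scope_unchanged(
        av="NETWORK", ac="LOW", pr="NONE", ui="NONE", c="NONE", i="HIGH", a="NONE"
    )
    print(score)  # 7.5
```

With only Integrity rated HIGH, the impact sub-score is 6.42 × 0.56 ≈ 3.60 and the exploitability sub-score is 8.22 × 0.85 × 0.77 × 0.85 × 0.85 ≈ 3.89; their sum, about 7.48, rounds up to 7.5.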

AIVSS

Base Score: 5.4