MAI-2025-0006 | Mend Vulnerability Database

Published: May 16, 2026

Updated: June 17, 2026

Large Language Models (LLMs) are susceptible to a sophisticated privacy breach known as the Privacy Jailbreak Attack (PIG), which employs Gradient-based Iterative In-Context Optimization. This attack exploits in-context learning and gradient-based iterative optimization techniques to extract Personally Identifiable Information (PII) from LLMs, effectively circumventing embedded safety protocols. By iteratively refining a specially crafted prompt using gradient information, the attack targets tokens associated with PII entities, thereby increasing the probability of successful PII extraction.

Mitigation steps:

**For AI Developers:**

* Enhance prompt engineering techniques to detect and prevent malicious prompts targeting PII extraction.
* Implement advanced safety mechanisms in LLMs to actively resist and detect PII extraction attempts, including gradient-based attacks.
* Apply strict input sanitization and filtering to remove or modify harmful components of prompts before they are processed by LLMs, using specialized regular expressions or machine learning classifiers.

**For Model Trainers/Fine-tuners:**

* Conduct regular security audits of LLMs to identify and address vulnerabilities, utilizing adversarial training for increased robustness against attack methods.
* Investigate the application of differential privacy techniques to limit PII disclosure during training and inference, balancing privacy with utility.
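The regular-expression filtering mentioned in the mitigation steps could look roughly like the sketch below. It is illustrative only: the pattern set, the `PII_PATTERNS` name, and the `redact_pii` helper are assumptions made for this example rather than part of the advisory, and three simple patterns are far from complete PII coverage. A filter like this would not by itself stop a gradient-optimized prompt; it is one layer that can be applied to prompt text (for instance, in-context demonstrations) before it reaches the model.

```python
import re

# Hypothetical, deliberately small pattern set (US-style formats only).
# Real deployments would need broader, locale-aware coverage or a classifier.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}


def redact_pii(text):
    """Replace each matched PII span with a placeholder such as [EMAIL].

    Returns the redacted text and a dict counting matches per category,
    so the caller can log or reject prompts that carried PII.
    """
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts


if __name__ == "__main__":
    prompt = "Contact jane.doe@example.com or 555-123-4567, SSN 123-45-6789"
    redacted, found = redact_pii(prompt)
    print(redacted)
    print(found)
```

Returning the per-category counts alongside the redacted text lets the caller choose between silently redacting and rejecting the prompt outright.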

Related Resources (1)

https://arxiv.org/abs/2505.09921

CVSS v4

Base Score: 8.2

Attack Vector: NETWORK
Attack Complexity: HIGH
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: HIGH
Vulnerable System Integrity: NONE
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: NONE
Subsequent System Availability: NONE

CVSS v3

Base Score: 5.9

Attack Vector: NETWORK
Attack Complexity: HIGH
Privileges Required: NONE
User Interaction: NONE
Scope: UNCHANGED
Confidentiality: HIGH
Integrity: NONE
Availability: NONE
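As a cross-check, the CVSS v3 base score of 5.9 can be reproduced from the metric values listed above. The sketch below assumes the entry follows the CVSS v3.1 base-score equations and metric weights for an unchanged scope (the advisory says only "CVSS v3"); the function names are chosen for this example.

```python
import math

# CVSS v3.1 numeric weights for the metric values listed in this entry.
AV_NETWORK = 0.85   # Attack Vector: NETWORK
AC_HIGH = 0.44      # Attack Complexity: HIGH
PR_NONE = 0.85      # Privileges Required: NONE
UI_NONE = 0.85      # User Interaction: NONE
C_HIGH, I_NONE, A_NONE = 0.56, 0.0, 0.0


def roundup(value):
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a):
    """Base score for Scope: UNCHANGED only (changed scope uses other formulas)."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)    # impact sub-score
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    score = base_score_scope_unchanged(
        AV_NETWORK, AC_HIGH, PR_NONE, UI_NONE, C_HIGH, I_NONE, A_NONE
    )
    print(score)  # impact 3.5952 + exploitability 2.2212 = 5.8164, rounded up to 5.9
```

The CVSS v4 score is derived from a lookup over metric groupings rather than a closed-form equation, so it is not reproduced here.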

AIVSS

Base Score: 5.4