MAI-2025-0006
Published: May 16, 2026
Updated: May 16, 2026
Large Language Models (LLMs) are susceptible to a privacy breach known as the Privacy Jailbreak Attack (PIG), which combines in-context learning with gradient-based iterative optimization to extract Personally Identifiable Information (PII) and circumvent the model's embedded safety protocols. The attack iteratively refines a specially crafted prompt using gradient information, targeting the tokens associated with PII entities in order to raise the probability of successful PII extraction.
Mitigation steps:
**For AI Developers:**
* Enhance prompt engineering techniques to effectively detect and prevent malicious prompts targeting PII extraction.
* Implement advanced safety mechanisms in LLMs to actively resist and detect PII extraction attempts, including gradient-based attacks.
* Apply strict input sanitization and filtering to remove or modify harmful components of prompts before processing by LLMs, using specialized regular expressions or machine learning classifiers.
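The regular-expression filtering mentioned above can be sketched as follows. This is a minimal illustration, not a complete filter: the three patterns (email, US-style SSN, US-style phone number), the `PII_PATTERNS` table, and the `redact_pii` helper are all assumptions introduced here, and a real deployment would likely need broader, locale-aware coverage or a trained classifier.

```python
import re

# Hypothetical, deliberately narrow patterns for illustration only.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(
        r"(?<!\d)(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}(?!\d)"
    ),
}


def redact_pii(text):
    """Replace PII-like substrings with a label and report what was found.

    Returns a tuple (redacted_text, findings), where findings lists the
    label of every replaced match, grouped by pattern in table order.
    """
    findings = []
    for label, pattern in PII_PATTERNS.items():
        def _replace(match, label=label):
            findings.append(label)
            return f"[{label}]"

        text = pattern.sub(_replace, text)
    return text, findings
```

For example, `redact_pii("Contact jane.doe@example.com or 555-123-4567, SSN 123-45-6789.")` returns the text `"Contact [EMAIL] or [PHONE], SSN [SSN]."` together with the findings `["EMAIL", "SSN", "PHONE"]`. A non-empty findings list could also be used as a signal to reject or flag the request instead of silently rewriting it.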
**For Model Trainers/Fine-tuners:**
* Conduct regular security audits of LLMs to identify and address vulnerabilities, utilizing adversarial training for increased robustness against attack methods.
* Investigate the application of differential privacy techniques to limit PII disclosure during training and inference, balancing privacy with utility.
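One common way to apply differential privacy during training is to clip each example's gradient and add Gaussian noise before averaging, in the style of DP-SGD. The sketch below shows only that aggregation step in plain Python. The function names and the `max_norm` and `noise_multiplier` parameters are assumptions for illustration; the advisory does not prescribe a specific mechanism, and the sketch does no privacy accounting, so it gives no formal guarantee by itself.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average.

    The noise standard deviation is noise_multiplier * max_norm, so a larger
    multiplier trades model utility for stronger privacy.
    """
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


# Usage: two toy per-example gradients, clipped to norm 1.0 with noise added.
rng = random.Random(0)
update = private_mean_gradient([[3.0, 4.0], [0.3, 0.4]], 1.0, 1.1, rng)
```

Clipping bounds how much any single training record can influence the update, which is what limits memorization of an individual's PII. The noise multiplier is the knob behind the privacy and utility balance mentioned above.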
CVSS v4
Base Score: 8.2
Attack Vector: NETWORK
Attack Complexity: HIGH
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: HIGH
Vulnerable System Integrity: NONE
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: NONE
Subsequent System Availability: NONE
CVSS v3
Base Score: 5.9
Attack Vector: NETWORK
Attack Complexity: HIGH
Privileges Required: NONE
User Interaction: NONE
Scope: UNCHANGED
Confidentiality: HIGH
Integrity: NONE
Availability: NONE
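The CVSS v3 base score can be checked against the metric values listed above. The sketch below assumes the CVSS v3.1 base equations and metric weights for the Scope: UNCHANGED case. The function and table names are illustrative and not part of the advisory.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope Unchanged values for PR).
WEIGHTS = {
    "AV": {"NETWORK": 0.85, "ADJACENT": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2},
    "AC": {"LOW": 0.77, "HIGH": 0.44},
    "PR": {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27},
    "UI": {"NONE": 0.85, "REQUIRED": 0.62},
    "CIA": {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0},
}


def roundup(value):
    """Round up to one decimal place, as defined in CVSS v3.1."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a):
    """Compute the CVSS v3.1 base score when Scope is UNCHANGED."""
    cia = WEIGHTS["CIA"]
    iss = 1 - (1 - cia[c]) * (1 - cia[i]) * (1 - cia[a])
    impact = 6.42 * iss
    exploitability = (
        8.22 * WEIGHTS["AV"][av] * WEIGHTS["AC"][ac]
        * WEIGHTS["PR"][pr] * WEIGHTS["UI"][ui]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(
    "NETWORK", "HIGH", "NONE", "NONE", "HIGH", "NONE", "NONE"
)
```

With these inputs the impact term is about 3.60 and the exploitability term about 2.22, so `score` evaluates to 5.9, matching the listed base score.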
AIVSS
Base Score: 5.4