MAI-2024-0011
Published: May 16, 2026
Updated: May 16, 2026
Large Language Models (LLMs) that incorporate safety protocols such as filtering mechanisms and alignment training are susceptible to information leakage through "Decomposition Attacks." These attacks fragment a malicious query into multiple innocuous-looking sub-queries, prompting the LLM to generate responses that, when combined, disclose sensitive information. Because no single response is overtly harmful, the attack circumvents safety filters that evaluate each query or response in isolation.
Mitigation steps:
**For AI Developers:**
* Implement mechanisms to bound the leakage of impermissible information through interactions, exploring randomized response mechanisms to minimize sensitive information exposure.
* Develop sophisticated filtering techniques to identify and block sequences of responses that collectively reveal sensitive information, focusing on sequences of questions as potential attack vectors.
* Implement systems for query analysis to evaluate the intent and context of user queries, aiding in the detection of malicious attempts to extract information through aggregated interactions.
* Restrict the number of interactions a user can have with the LLM within a specified timeframe to reduce the risk of accumulating knowledge necessary for successful attacks.
* Apply rate limiting to control the number of queries per user, mitigating the impact of automated attempts.
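The interaction limits and sequence-level filtering described in the list above could be combined in a single per-user guard. The following is a minimal sketch, not a reference implementation: the class name `SessionGuard`, its parameters, and the idea of a per-query `risk_score` supplied by some upstream classifier are all assumptions made for illustration. It refuses a query when either the user has exceeded a query count within a sliding window, or the summed risk of recent queries exceeds a budget, even if each query looks harmless on its own.

```python
import time
from collections import deque


class SessionGuard:
    """Hypothetical guard: sliding-window rate limit plus a cumulative risk budget."""

    def __init__(self, max_queries=20, window_seconds=60.0, risk_budget=1.0):
        self.max_queries = max_queries        # queries allowed per window
        self.window_seconds = window_seconds  # sliding window length
        self.risk_budget = risk_budget        # max summed risk per window
        self._history = {}                    # user_id -> deque of (timestamp, risk)

    def allow(self, user_id, risk_score, now=None):
        """Return True if the query may be answered, False if it should be refused.

        risk_score is assumed to come from an upstream classifier (not shown).
        """
        now = time.monotonic() if now is None else now
        events = self._history.setdefault(user_id, deque())

        # Drop events that have aged out of the sliding window.
        while events and now - events[0][0] >= self.window_seconds:
            events.popleft()

        # Rate limit: too many answered queries in the window.
        if len(events) >= self.max_queries:
            return False

        # Aggregate check: individually low-risk queries can still add up.
        total = sum(score for _, score in events) + risk_score
        if total > self.risk_budget:
            return False

        events.append((now, risk_score))
        return True
```

A caller would invoke `guard.allow(user_id, risk_score)` before forwarding each query to the model. Summing scores is only one possible aggregation; a real deployment would need to choose how related sub-queries are scored and grouped.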
**For Model Trainers/Fine-tuners:**
* Enhance model training protocols to incorporate mechanisms that limit information leakage through interactions, including exploration of randomized response techniques.
* Train models with advanced filtering capabilities to recognize and block sequences of responses that may collectively disclose sensitive information.
* Integrate query analysis capabilities into model training to better understand user intent and context, facilitating detection of composed malicious information extraction attempts.
* Implement interaction limits during model training to prevent excessive user engagement that could lead to successful attacks.
* Incorporate rate limiting strategies in model training to manage query volumes and reduce automation risks.
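The advisory mentions randomized response techniques without specifying one. As an illustration only, the sketch below shows the classic binary randomized response mechanism, applied here to a hypothetical per-query answer-or-refuse decision. The function names, the `epsilon` parameter, and the choice to randomize that particular decision are assumptions, not the advisory's method. The true decision is reported with probability e^ε / (1 + e^ε) and flipped otherwise, so a smaller ε makes each individual response less informative to an attacker.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting the true bit in binary randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(true_bit, epsilon, rng):
    """Return true_bit with probability e^eps / (1 + e^eps), else its negation.

    true_bit is a hypothetical boolean decision (e.g. "answer this query").
    rng is a random.Random instance, passed in so results are reproducible.
    """
    if rng.random() < keep_probability(epsilon):
        return true_bit
    return not true_bit
```

For example, `randomized_response(True, 1.0, random.Random())` returns `True` most of the time but not always. With `epsilon = 0` the output is a fair coin flip and carries no information about the true decision.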
CVSS v4
Base Score: 9
Attack Vector: NETWORK
Attack Complexity: HIGH
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: LOW
Vulnerable System Integrity: HIGH
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: HIGH
Subsequent System Availability: NONE
CVSS v3
Base Score: 7.5
Attack Vector: NETWORK
Attack Complexity: HIGH
Privileges Required: NONE
User Interaction: NONE
Scope: CHANGED
Confidentiality: LOW
Integrity: HIGH
Availability: NONE
AIVSS
Base Score: 5.6