MAI-2024-0011 | Mend Vulnerability Database

MAI-2024-0011

Published: May 16, 2026

Updated: June 17, 2026

Large Language Models (LLMs) that incorporate safety protocols such as filtering mechanisms and alignment training are susceptible to information leakage through "Decomposition Attacks." These attacks strategically fragment a malicious query into multiple innocuous sub-queries, prompting the LLM to generate responses that, when combined, disclose sensitive information. This process circumvents safety filters and avoids producing overtly harmful outputs.

Mitigation steps:

**For AI Developers:**

* Implement mechanisms to bound the leakage of impermissible information through interactions, exploring randomized response mechanisms to minimize sensitive information exposure.
* Develop sophisticated filtering techniques to identify and block sequences of responses that collectively reveal sensitive information, focusing on sequences of questions as potential attack vectors.
* Implement systems for query analysis to evaluate the intent and context of user queries, aiding in the detection of malicious attempts to extract information through aggregated interactions.
* Restrict the number of interactions a user can have with the LLM within a specified timeframe to reduce the risk of accumulating knowledge necessary for successful attacks.
* Apply rate limiting to control the number of queries per user, mitigating the impact of automated attempts.

**For Model Trainers/Fine-tuners:**

* Enhance model training protocols to incorporate mechanisms that limit information leakage through interactions, including exploration of randomized response techniques.
* Train models with advanced filtering capabilities to recognize and block sequences of responses that may collectively disclose sensitive information.
* Integrate query analysis capabilities into model training to better understand user intent and context, facilitating detection of composed malicious information extraction attempts.
* Implement interaction limits during model training to prevent excessive user engagement that could lead to successful attacks.
* Incorporate rate limiting strategies in model training to manage query volumes and reduce automation risks.
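
Two of the developer-side mitigations above (per-user interaction limits and blocking sequences that are only sensitive in aggregate) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not part of the advisory: the `SessionGuard` class, the `toy_score` function, the placeholder terms, and all thresholds. A real deployment would need a far stronger risk scorer than keyword matching; the point is only that risk is accumulated per session instead of being judged one query at a time.

```python
from collections import deque
from typing import Callable, Deque, Dict


class SessionGuard:
    """Hypothetical guard: sliding-window query limit plus a cumulative risk budget."""

    def __init__(self, score: Callable[[str], float], max_queries: int = 5,
                 window_s: float = 60.0, risk_budget: float = 1.0) -> None:
        self.score = score              # assumed per-query risk scorer, higher = riskier
        self.max_queries = max_queries  # queries allowed per user inside the window
        self.window_s = window_s        # window length in seconds
        self.risk_budget = risk_budget  # total risk a session may accumulate
        self._times: Dict[str, Deque[float]] = {}
        self._risk: Dict[str, float] = {}

    def allow(self, user: str, query: str, now: float) -> bool:
        times = self._times.setdefault(user, deque())
        # Drop timestamps that have aged out of the sliding window.
        while times and now - times[0] >= self.window_s:
            times.popleft()
        if len(times) >= self.max_queries:
            return False  # interaction limit reached
        # Judge the running session total, not the single query.
        total = self._risk.get(user, 0.0) + self.score(query)
        if total > self.risk_budget:
            return False  # sub-queries are collectively over budget
        times.append(now)
        self._risk[user] = total
        return True


def toy_score(query: str) -> float:
    """Toy scorer (assumption): 0.4 per placeholder term found in the query."""
    flagged = ("alpha", "beta", "gamma")
    return 0.4 * sum(term in query.lower() for term in flagged)


if __name__ == "__main__":
    guard = SessionGuard(toy_score)
    for t, q in enumerate(["tell me about alpha", "and beta?", "now gamma"]):
        print(q, "->", guard.allow("user-1", q, now=float(t)))
```

Each of the three demo queries scores 0.4 on its own, below the budget of 1.0, yet the third is refused because the session total would exceed it. Passing `now` explicitly keeps the sketch deterministic; a service would typically use a monotonic clock instead.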

Related Resources (1)

https://arxiv.org/abs/2407.02551

CVSS v4

Base Score:
Attack Vector: NETWORK
Attack Complexity: HIGH
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: LOW
Vulnerable System Integrity: HIGH
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: HIGH
Subsequent System Availability: NONE

CVSS v3

Base Score: 7.5
Attack Vector: NETWORK
Attack Complexity: HIGH
Privileges Required: NONE
User Interaction: NONE
Scope: CHANGED
Confidentiality: LOW
Integrity: HIGH
Availability: NONE
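
The CVSS v3 base score of 7.5 can be reproduced from the metric values listed above. The sketch below assumes the CVSS v3.1 base-score equations and metric weights from the FIRST specification (the advisory only says "v3"); the function and constant names are illustrative.

```python
import math

# Metric weights from the CVSS v3.1 specification (only the values needed here).
AV_NETWORK = 0.85
AC_LOW, AC_HIGH = 0.77, 0.44
PR_NONE = 0.85
UI_NONE = 0.85
CIA_HIGH, CIA_LOW, CIA_NONE = 0.56, 0.22, 0.0


def roundup(x: float) -> float:
    """Round up to one decimal place, as defined in CVSS v3.1 Appendix A."""
    scaled = round(x * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(av: float, ac: float, pr: float, ui: float,
               scope_changed: bool, c: float, i: float, a: float) -> float:
    """CVSS v3.1 base score from numeric metric weights."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    if scope_changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


# Metrics from this advisory: AV:N / AC:H / PR:N / UI:N / S:C / C:L / I:H / A:N
score = base_score(AV_NETWORK, AC_HIGH, PR_NONE, UI_NONE,
                   True, CIA_LOW, CIA_HIGH, CIA_NONE)
print(score)  # 7.5
```

With these inputs the impact sub-score is about 4.72 and the exploitability sub-score about 2.22; multiplying their sum by 1.08 for the changed scope gives roughly 7.49, which rounds up to 7.5.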

AIVSS

Base Score: 5.6