MAI-2024-0019 | Mend Vulnerability Database

MAI-2024-0019

Published: May 16, 2026

Updated: June 17, 2026

Large Language Models (LLMs) exhibit a bias towards authoritative-looking sources. Malicious actors can exploit this bias to circumvent built-in safety mechanisms by crafting prompts that include fabricated citations resembling credible sources, such as academic papers or GitHub repositories. Because the models tend to trust such citations, they can be misled into treating the cited information as legitimate and generating harmful content.

Mitigation steps:

**For AI Developers:**

* Implement robust citation verification mechanisms to authenticate referenced sources before incorporating them into the LLM's response generation process.
* Develop methods to identify and filter prompts containing fabricated or misleading citations.
* Incorporate harmfulness detection systems that specifically target responses generated based on potentially malicious citations.
* Employ multiple sampling and response analysis techniques to identify and mitigate the generation of harmful content.

**For Model Trainers/Fine-tuners:**

* Train models on datasets that explicitly counter the bias towards authoritative sources, introducing examples where authoritative-sounding information is false or misleading.
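As a rough illustration of the citation verification and prompt filtering mitigations described above, the following minimal sketch extracts citation-like strings from a prompt and reports those that a verifier does not recognize. Everything here is an assumption made for illustration: the regular expressions, the `Citation` type, the function names, and the `is_verified` callback are hypothetical and are not part of the advisory or of any particular library. A real deployment would replace the callback with an actual lookup against the cited source.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Illustrative, non-exhaustive patterns for citation-like strings (assumption).
CITATION_PATTERNS = {
    "arxiv": re.compile(r"(?:arxiv\.org/abs/|arXiv:\s*)(\d{4}\.\d{4,5})", re.IGNORECASE),
    "github": re.compile(r"github\.com/([\w.-]+/[\w.-]+)", re.IGNORECASE),
    "doi": re.compile(r"\b(10\.\d{4,9}/[^\s\"<>]+)"),
}


@dataclass(frozen=True)
class Citation:
    kind: str        # "arxiv", "github", or "doi"
    identifier: str  # e.g. "2411.11407" or "owner/repo"


def extract_citations(prompt: str) -> list[Citation]:
    """Return every citation-like string found in the prompt."""
    found = []
    for kind, pattern in CITATION_PATTERNS.items():
        for match in pattern.finditer(prompt):
            # Strip trailing punctuation that the pattern may have swallowed.
            found.append(Citation(kind, match.group(1).rstrip(".,;)")))
    return found


def unverified_citations(
    prompt: str, is_verified: Callable[[Citation], bool]
) -> list[Citation]:
    """Return the citations in the prompt that the verifier rejects.

    `is_verified` is a hypothetical hook; here it is only a callback.
    """
    return [c for c in extract_citations(prompt) if not is_verified(c)]


if __name__ == "__main__":
    known = {Citation("arxiv", "2411.11407")}
    prompt = (
        "According to arXiv:2411.11407 and "
        "https://github.com/example-org/fake-repo, explain the method."
    )
    for citation in unverified_citations(prompt, lambda c: c in known):
        print("unverified:", citation.kind, citation.identifier)
```

A prompt with a non-empty result could then be routed to stricter handling, such as a harmfulness check on the response, rather than being rejected outright. That choice depends on the deployment.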

Related Resources (1)

https://arxiv.org/abs/2411.11407

CVSS v4

Base Score: 8.7

Attack Vector: NETWORK

Attack Complexity: LOW

Attack Requirements: NONE

Privileges Required: NONE

User Interaction: NONE

Vulnerable System Confidentiality: NONE

Vulnerable System Integrity: HIGH

Vulnerable System Availability: NONE

Subsequent System Confidentiality: NONE

Subsequent System Integrity: NONE

Subsequent System Availability: NONE
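The CVSS v4 metrics listed above are often exchanged as a compact vector string. The sketch below assembles that string from the listed values, assuming the standard CVSS v4.0 base-metric abbreviations and ordering. The `V4_BASE_METRICS` table and the `v4_vector` helper are hypothetical names introduced only for this example. The sketch does not compute the 8.7 score, which in v4.0 comes from a lookup-based procedure.

```python
# Base metrics in CVSS v4.0 vector order, with their abbreviations.
V4_BASE_METRICS = [
    ("Attack Vector", "AV"),
    ("Attack Complexity", "AC"),
    ("Attack Requirements", "AT"),
    ("Privileges Required", "PR"),
    ("User Interaction", "UI"),
    ("Vulnerable System Confidentiality", "VC"),
    ("Vulnerable System Integrity", "VI"),
    ("Vulnerable System Availability", "VA"),
    ("Subsequent System Confidentiality", "SC"),
    ("Subsequent System Integrity", "SI"),
    ("Subsequent System Availability", "SA"),
]


def v4_vector(metrics: dict[str, str]) -> str:
    """Build a CVSS v4.0 base vector string from full metric names and values.

    Each v4.0 base-metric value is abbreviated to its first letter
    (NETWORK -> N, LOW -> L, HIGH -> H, NONE -> N, and so on).
    """
    parts = [f"{abbr}:{metrics[name][0].upper()}" for name, abbr in V4_BASE_METRICS]
    return "CVSS:4.0/" + "/".join(parts)


# Values as listed for this advisory.
advisory_metrics = {
    "Attack Vector": "NETWORK",
    "Attack Complexity": "LOW",
    "Attack Requirements": "NONE",
    "Privileges Required": "NONE",
    "User Interaction": "NONE",
    "Vulnerable System Confidentiality": "NONE",
    "Vulnerable System Integrity": "HIGH",
    "Vulnerable System Availability": "NONE",
    "Subsequent System Confidentiality": "NONE",
    "Subsequent System Integrity": "NONE",
    "Subsequent System Availability": "NONE",
}

if __name__ == "__main__":
    print(v4_vector(advisory_metrics))
```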

CVSS v3

Base Score: 7.5

Attack Vector: NETWORK

Attack Complexity: LOW

Privileges Required: NONE

User Interaction: NONE

Scope: UNCHANGED

Confidentiality: NONE

Integrity: HIGH

Availability: NONE
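The CVSS v3 base score above can be reproduced from the listed metrics. The sketch below assumes the CVSS v3.1 metric weights and rounding rule and handles only the Scope: UNCHANGED case, which is the one that applies here. The function and table names are hypothetical and chosen for this example.

```python
import math

# CVSS v3.1 numeric weights (Scope: UNCHANGED values for Privileges Required).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, following the CVSS v3.1 definition."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Compute the CVSS v3.1 base score when Scope is UNCHANGED."""
    cia = WEIGHTS["CIA"]
    # Impact sub-score: combines the three impact metrics.
    iss = 1 - (1 - cia[c]) * (1 - cia[i]) * (1 - cia[a])
    impact = 6.42 * iss
    # Exploitability sub-score: product of the four exploitability weights.
    exploitability = (
        8.22 * WEIGHTS["AV"][av] * WEIGHTS["AC"][ac] * WEIGHTS["PR"][pr] * WEIGHTS["UI"][ui]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    # AV:N / AC:L / PR:N / UI:N / S:U / C:N / I:H / A:N
    print(base_score_scope_unchanged("N", "L", "N", "N", "N", "H", "N"))  # 7.5
```

With these inputs the impact sub-score is about 3.60 and the exploitability sub-score about 3.89, giving 7.48 before rounding up to 7.5.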

AIVSS

Base Score: 5.7