MAI-2024-0057 | Mend Vulnerability Database

MAI-2024-0057

Published: May 16, 2026

Updated: May 16, 2026

Large Language Models (LLMs) are susceptible to adversarial prompting attacks, a method where a strategically crafted suffix is appended to an input instruction, prompting the LLM to produce unsafe or harmful content. The AdvPrompter technique employs a separate LLM to generate these adversarial suffixes, effectively and swiftly circumventing established LLM safety protocols. These suffixes are designed to be human-readable and contextually appropriate, rendering them more challenging to detect compared to previous adversarial methods. This attack demonstrates efficacy against both open-source and proprietary (black-box) LLMs through transfer attacks.

Mitigation steps:

**For AI Developers:**

* Implement advanced prompt filtering systems that extend beyond basic perplexity evaluations.
* Deploy enhanced defense models capable of identifying and mitigating adversarial prompt threats.
* Utilize supplementary LLM-based security measures to analyze both input prompts and generated outputs for alignment.

**For Model Trainers/Fine-tuners:**

* Conduct regular red-teaming exercises with LLMs using a variety of sophisticated attack strategies.
* Integrate adversarial training methodologies into the LLM training pipeline to improve resilience against adversarial prompts.
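The developer-side mitigation of analyzing both input prompts and generated outputs can be pictured as a thin wrapper around the model call. The following is a minimal sketch under stated assumptions, not a production defense: `guarded_generate`, `GuardResult`, and the three `toy_*` functions are hypothetical names introduced here for illustration. The keyword-based toy checks only stand in for a real defense model; human-readable suffixes of the kind described above would likely evade simple string matching, so a deployed check would need to be a trained classifier or a supplementary LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# A check receives text and returns a reason string if it flags the text,
# or None if the text passes. (Hypothetical interface for this sketch.)
Check = Callable[[str], Optional[str]]


@dataclass
class GuardResult:
    allowed: bool
    output: Optional[str]
    reason: Optional[str]


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    input_checks: List[Check],
    output_checks: List[Check],
) -> GuardResult:
    """Run input checks, call the model, then run output checks."""
    for check in input_checks:
        reason = check(prompt)
        if reason is not None:
            # Block before the model is ever called.
            return GuardResult(False, None, f"input: {reason}")

    output = generate(prompt)

    for check in output_checks:
        reason = check(output)
        if reason is not None:
            # The model answered, but the answer is withheld.
            return GuardResult(False, None, f"output: {reason}")

    return GuardResult(True, output, None)


# --- Placeholder components (assumptions, stand-ins for real models) ---

def toy_input_classifier(text: str) -> Optional[str]:
    # Stand-in for a defense model that scores prompts.
    if "ignore previous instructions" in text.lower():
        return "flagged by toy input classifier"
    return None


def toy_output_classifier(text: str) -> Optional[str]:
    # Stand-in for an LLM-based judge of generated content.
    if "UNSAFE" in text:
        return "unsafe marker in output"
    return None


def toy_model(prompt: str) -> str:
    # Stand-in for the protected LLM.
    return f"echo: {prompt}"


if __name__ == "__main__":
    result = guarded_generate(
        "hello", toy_model, [toy_input_classifier], [toy_output_classifier]
    )
    print(result)
```

Keeping the checks as plain callables means a perplexity filter, a classifier, and an LLM judge can be combined in one list, which matches the advice to go beyond perplexity evaluation alone.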

Related Resources (1)

https://arxiv.org/abs/2404.16873

CVSS v4

Base Score: 6.3

Attack Vector: NETWORK
Attack Complexity: HIGH
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: NONE
Vulnerable System Integrity: LOW
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: NONE
Subsequent System Availability: NONE

CVSS v3

Base Score: 3.7

Attack Vector: NETWORK
Attack Complexity: HIGH
Privileges Required: NONE
User Interaction: NONE
Scope: UNCHANGED
Confidentiality: NONE
Integrity: LOW
Availability: NONE
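
The CVSS v3 base score of 3.7 can be reproduced from the metric values listed above. The sketch below assumes the published CVSS v3.1 base-score equations and metric weights, and it covers only the Scope: UNCHANGED case that this entry uses. The function and dictionary names are illustrative and are not part of any Mend tooling.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope Unchanged only).
WEIGHTS = {
    "AV": {"NETWORK": 0.85, "ADJACENT": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2},
    "AC": {"LOW": 0.77, "HIGH": 0.44},
    "PR": {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27},
    "UI": {"NONE": 0.85, "REQUIRED": 0.62},
    "CIA": {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Compute the CVSS v3.1 base score for a Scope: UNCHANGED vector."""
    cia = WEIGHTS["CIA"]
    # Impact sub-score base: combined loss across C, I, and A.
    iss = 1 - (1 - cia[c]) * (1 - cia[i]) * (1 - cia[a])
    impact = 6.42 * iss
    exploitability = (
        8.22
        * WEIGHTS["AV"][av]
        * WEIGHTS["AC"][ac]
        * WEIGHTS["PR"][pr]
        * WEIGHTS["UI"][ui]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    # Metrics as listed for MAI-2024-0057.
    score = base_score_scope_unchanged(
        av="NETWORK", ac="HIGH", pr="NONE", ui="NONE",
        c="NONE", i="LOW", a="NONE",
    )
    print(score)  # 3.7
```

With these inputs the impact term is 6.42 × 0.22 ≈ 1.41 and the exploitability term is 8.22 × 0.85 × 0.44 × 0.85 × 0.85 ≈ 2.22; their sum of about 3.63 rounds up to 3.7.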

AIVSS

Base Score: