MAI-2024-0043 | Mend Vulnerability Database

MAI-2024-0043

Published: May 16, 2026

Updated: May 16, 2026

The Social Facilitation Prompt (SoP) framework automates the creation of jailbreak prompts that circumvent the safety mechanisms embedded in large language models (LLMs). The framework combines multiple optimized "jailbreak characters" within a single prompt to coerce the LLM into generating harmful or undesirable content, even when no pre-existing jailbreak templates are available. The vulnerability has been successfully demonstrated on models such as GPT-3.5, GPT-4, and LLaMA-2.

Mitigation steps:

**For AI Developers:**

* Enhance LLM safety mechanisms to resist attacks that use multiple personas or collaborative narratives.
* Implement detection systems that identify and block prompts using strategies such as multiple characters with specific instructions or affirmative prefixes.
* Regularly update and refine safety filters and guardrails in response to emerging attack techniques.

**For Model Trainers/Fine-tuners:**

* Employ a combination of detection-based and prompt-based defensive strategies, and thoroughly evaluate their effectiveness against automated attacks.
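As an illustration of the detection-based mitigation, the following is a minimal heuristic sketch in Python. It is not taken from the advisory or the referenced paper: the regular expressions, the `MIN_PERSONAS` threshold, and the `PromptFlags` and `inspect_prompt` names are all hypothetical, and a pattern match of this kind would at best flag prompts for further review rather than reliably block optimized attacks.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns chosen for illustration only; a real detector
# would need far broader coverage and evaluation against actual attacks.
PERSONA_PATTERN = re.compile(
    r"\b(?:character|persona|role)\s*#?\s*(\d+)\b", re.IGNORECASE
)
AFFIRMATIVE_PREFIX_PATTERN = re.compile(
    r"\b(?:begin|start)\s+(?:your|the)\s+(?:reply|response|answer)\s+with\b",
    re.IGNORECASE,
)

# Assumed threshold: two or more numbered personas in a single prompt.
MIN_PERSONAS = 2


@dataclass
class PromptFlags:
    persona_count: int
    requests_affirmative_prefix: bool

    @property
    def suspicious(self) -> bool:
        # Flag the prompt if either signal named in the mitigation is present.
        return (
            self.persona_count >= MIN_PERSONAS
            or self.requests_affirmative_prefix
        )


def inspect_prompt(prompt: str) -> PromptFlags:
    """Count distinct numbered personas and check for a forced reply prefix."""
    distinct_personas = set(PERSONA_PATTERN.findall(prompt))
    forces_prefix = AFFIRMATIVE_PREFIX_PATTERN.search(prompt) is not None
    return PromptFlags(len(distinct_personas), forces_prefix)


if __name__ == "__main__":
    sample = (
        "Character 1: a chemist. Character 2: a hacker. "
        "Begin your reply with 'Sure, here is'."
    )
    print(inspect_prompt(sample))
```

A check like this is cheap to run before a prompt reaches the model, but it is easy to evade by rephrasing, so it would normally be paired with the prompt-based defenses listed above rather than used alone.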

Related Resources (1)

https://arxiv.org/abs/2407.01902

CVSS v4

Base Score: 6.9

Attack Vector: NETWORK
Attack Complexity: LOW
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: NONE
Vulnerable System Integrity: LOW
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: NONE
Subsequent System Availability: NONE

CVSS v3

Base Score: 5.3

Attack Vector: NETWORK
Attack Complexity: LOW
Privileges Required: NONE
User Interaction: NONE
Scope: UNCHANGED
Confidentiality: NONE
Integrity: LOW
Availability: NONE
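The CVSS v3 base score of 5.3 can be reproduced from the metric values listed above. The sketch below assumes the CVSS v3.1 base-score equations and metric weights from the public specification and covers only the Scope: UNCHANGED case; the function and dictionary names are illustrative, not part of any library.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: UNCHANGED).
ATTACK_VECTOR = {"NETWORK": 0.85, "ADJACENT": 0.62, "LOCAL": 0.55, "PHYSICAL": 0.2}
ATTACK_COMPLEXITY = {"LOW": 0.77, "HIGH": 0.44}
PRIVILEGES_REQUIRED = {"NONE": 0.85, "LOW": 0.62, "HIGH": 0.27}
USER_INTERACTION = {"NONE": 0.85, "REQUIRED": 0.62}
IMPACT = {"HIGH": 0.56, "LOW": 0.22, "NONE": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Compute the base score for a vector whose Scope is UNCHANGED."""
    iss = 1 - (1 - IMPACT[c]) * (1 - IMPACT[i]) * (1 - IMPACT[a])
    impact = 6.42 * iss
    exploitability = (
        8.22
        * ATTACK_VECTOR[av]
        * ATTACK_COMPLEXITY[ac]
        * PRIVILEGES_REQUIRED[pr]
        * USER_INTERACTION[ui]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    score = base_score_scope_unchanged(
        av="NETWORK", ac="LOW", pr="NONE", ui="NONE",
        c="NONE", i="LOW", a="NONE",
    )
    print(score)  # 5.3
```

With these inputs the impact sub-score is about 1.41 and the exploitability sub-score about 3.89; their sum, roughly 5.30, rounds up to 5.3, matching the listed base score.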

AIVSS

Base Score: