MAI-2023-0011 | Mend Vulnerability Database


MAI-2023-0011

Published: May 16, 2026

Updated: June 17, 2026

Large Language Models (LLMs), including GPT-4, are susceptible to an attack technique known as "CipherChat." The attack exploits the fact that safety alignment is typically trained on natural language inputs and does not reliably generalize to enciphered text. An attacker combines a system role description, few-shot enciphered demonstrations, and cipher prompts (such as ASCII, Unicode, Caesar cipher, and Morse code) to circumvent the model's safety filters. The vulnerability is exacerbated by the model's capacity to interpret a "secret cipher," which can be evoked through role-playing and unsafe demonstrations written in natural language alone, a method referred to as SelfCipher.

Mitigation steps:

**For AI Developers:**

* Develop robust input sanitization techniques to identify and block potentially malicious cipher-based inputs.
* Implement detection and mitigation strategies specifically targeting cipher-based prompts.

**For Model Trainers/Fine-tuners:**

* Include examples with various ciphers and obfuscation techniques in safety training datasets to improve generalization.
* Enhance the model's ability to correctly interpret the intent behind prompts, distinguishing between legitimate use of code and attempts to evade safety mechanisms.
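The input-side detection recommended for AI developers could begin with simple heuristics. The following Python sketch is illustrative only and is not taken from the advisory or the referenced paper: the function names, the tiny `COMMON_WORDS` list, and the thresholds are all assumptions chosen for the example. It flags prompts that look like Morse code, space-separated ASCII codes, or Caesar-shifted English so that they can be decoded and passed through the usual safety checks.

```python
import re
import string

# Hypothetical, deliberately tiny word list used only to score candidate decodings.
COMMON_WORDS = {"the", "and", "you", "how", "to", "is", "a", "of", "in", "me", "tell", "what"}

def caesar_decode(text, shift):
    """Shift each ASCII letter back by `shift` positions; other characters are unchanged."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") - shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") - shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

def english_score(text):
    """Fraction of alphabetic tokens that appear in COMMON_WORDS."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in COMMON_WORDS for t in tokens) / len(tokens)

def detect_caesar(text, threshold=0.4):
    """Return (shift, decoded) if some non-zero shift reads as English, else None."""
    if english_score(text) >= threshold:
        return None  # the raw text already reads as plain English
    best = max(range(1, 26), key=lambda s: english_score(caesar_decode(text, s)))
    decoded = caesar_decode(text, best)
    return (best, decoded) if english_score(decoded) >= threshold else None

def looks_like_morse(text, min_len=8):
    """True if the text is long enough and consists only of dots, dashes, slashes, and whitespace."""
    stripped = text.strip()
    return len(stripped) >= min_len and re.fullmatch(r"[.\-/\s]+", stripped) is not None

def looks_like_ascii_codes(text, min_count=4):
    """True if every token is a decimal number in the printable ASCII range (32-126)."""
    tokens = text.split()
    return len(tokens) >= min_count and all(
        re.fullmatch(r"[0-9]{2,3}", t) and 32 <= int(t) <= 126 for t in tokens
    )

def classify_prompt(text):
    """Return a coarse label: 'morse', 'ascii', 'caesar', or 'none'."""
    if looks_like_morse(text):
        return "morse"
    if looks_like_ascii_codes(text):
        return "ascii"
    if detect_caesar(text) is not None:
        return "caesar"
    return "none"
```

A heuristic of this kind only covers explicit encodings. It does nothing against SelfCipher, which uses natural language throughout, so it would complement rather than replace the training-side mitigations listed above.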

Related Resources (1)

https://arxiv.org/abs/2308.06463


CVSS v4

Base Score: 6.3

Attack Vector: NETWORK

Attack Complexity: HIGH

Attack Requirements: NONE

Privileges Required: NONE

User Interaction: NONE

Vulnerable System Confidentiality: LOW

Vulnerable System Integrity: LOW

Vulnerable System Availability: NONE

Subsequent System Confidentiality: NONE

Subsequent System Integrity: NONE

Subsequent System Availability: NONE

CVSS v3

Base Score: 4.8

Attack Vector: NETWORK

Attack Complexity: HIGH

Privileges Required: NONE

User Interaction: NONE

Scope: UNCHANGED

Confidentiality: LOW

Integrity: LOW

Availability: NONE
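The CVSS v3 base score of 4.8 can be reproduced from the metrics listed above. The sketch below applies the base-score formula for the unchanged-scope case using the metric weights published in the CVSS v3.1 specification. Only the weights this advisory needs are included, and the function and dictionary names are illustrative choices, not part of any official tool.

```python
import math

# Metric weights from the CVSS v3.1 specification (only the values used by this advisory).
ATTACK_VECTOR = {"NETWORK": 0.85}
ATTACK_COMPLEXITY = {"HIGH": 0.44}
PRIVILEGES_REQUIRED = {"NONE": 0.85}  # weight for Scope: UNCHANGED
USER_INTERACTION = {"NONE": 0.85}
CIA = {"NONE": 0.0, "LOW": 0.22}

def roundup(value):
    """Round up to one decimal place, following the v3.1 specification's integer-based method."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0

def base_score_scope_unchanged(av, ac, pr, ui, c, i, a):
    """CVSS v3.1 base score for a vector whose Scope is UNCHANGED."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = (
        8.22
        * ATTACK_VECTOR[av]
        * ATTACK_COMPLEXITY[ac]
        * PRIVILEGES_REQUIRED[pr]
        * USER_INTERACTION[ui]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))

score = base_score_scope_unchanged("NETWORK", "HIGH", "NONE", "NONE", "LOW", "LOW", "NONE")
print(score)  # 4.8
```

The impact sub-score works out to about 2.51 and the exploitability sub-score to about 2.22; their sum of roughly 4.74 rounds up to 4.8.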

AIVSS

Base Score: