Mend.io Vulnerability Database
The largest open source vulnerability database
MAI-2023-0011
Published: May 16, 2026
Updated: May 16, 2026
Large Language Models (LLMs), including GPT-4, are susceptible to an attack vector known as "CipherChat." The attack bypasses the model's safety alignment by combining three elements: a system role description, few-shot enciphered demonstrations, and prompts written in a cipher (such as ASCII, Unicode, Caesar cipher, or Morse code). Because safety filters are typically trained on natural language inputs, enciphered prompts can circumvent them. The vulnerability is exacerbated by a related method, SelfCipher, which uses only role-playing and unsafe demonstrations in natural language to evoke the model's capacity to interpret a "secret cipher."

Mitigation steps:

**For AI Developers:**
* Develop robust input sanitization techniques to identify and block potentially malicious cipher-based inputs.
* Implement detection and mitigation strategies specifically targeting cipher-based prompts.

**For Model Trainers/Fine-tuners:**
* Include examples with various ciphers and obfuscation techniques in safety training datasets to improve generalization.
* Enhance the model's ability to correctly interpret the intent behind prompts, distinguishing between legitimate use of code and attempts to evade safety mechanisms.
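The developer-side mitigation (identifying cipher-based inputs before they reach the model) could be sketched roughly as below. This is a minimal illustration, not a vetted filter: the function names, the word list, and the thresholds are all assumptions made for this example, and it covers only Morse code, decimal ASCII codes, and Caesar shifts. It would not catch SelfCipher, which uses natural language, and a production system would need broader coverage.

```python
import re

# All names and thresholds here are hypothetical, chosen for illustration only.
MORSE_RE = re.compile(r"^[.\-/\s]+$")
ASCII_CODES_RE = re.compile(r"^\s*\d{2,3}(?:[\s,]+\d{2,3})+\s*$")
COMMON_WORDS = {
    "the", "and", "to", "of", "is", "in", "you", "that",
    "it", "for", "how", "what", "with", "this", "are", "can",
}


def looks_like_morse(text: str) -> bool:
    """True if the input consists only of dots, dashes, slashes, and spaces."""
    stripped = text.strip()
    return len(stripped) >= 8 and bool(MORSE_RE.match(stripped))


def looks_like_ascii_codes(text: str) -> bool:
    """True if the input is a run of decimal codes in the printable ASCII range."""
    if not ASCII_CODES_RE.match(text):
        return False
    codes = [int(tok) for tok in re.split(r"[\s,]+", text.strip())]
    return len(codes) >= 4 and all(32 <= c <= 126 for c in codes)


def caesar_shift(text: str, shift: int) -> str:
    """Shift ASCII letters forward by `shift` positions, leaving other characters alone."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


def common_word_ratio(text: str) -> float:
    """Fraction of words that appear in the small common-word list."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    return sum(w in COMMON_WORDS for w in words) / len(words)


def detect_caesar(text: str, min_words: int = 4, threshold: float = 0.3):
    """Return the shift that turns the text into English-looking words, else None."""
    if len(re.findall(r"[A-Za-z]+", text)) < min_words:
        return None
    if common_word_ratio(text) >= threshold:
        return None  # already reads as plain English
    for shift in range(1, 26):
        if common_word_ratio(caesar_shift(text, shift)) >= threshold:
            return shift
    return None


def classify_prompt(text: str) -> str:
    """Label a prompt as 'morse', 'ascii-codes', 'caesar', or 'plain'."""
    if looks_like_morse(text):
        return "morse"
    if looks_like_ascii_codes(text):
        return "ascii-codes"
    if detect_caesar(text) is not None:
        return "caesar"
    return "plain"
```

A caller might route anything not labelled `plain` to stricter handling, for example decoding it and re-running the usual safety checks on the decoded text. Heuristics like these are easy to evade, which is why the advisory also recommends covering ciphers in safety training data.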
CVSS v4
Base Score: 6.3
Attack Vector: NETWORK
Attack Complexity: HIGH
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: LOW
Vulnerable System Integrity: LOW
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: NONE
Subsequent System Availability: NONE
CVSS v3
Base Score: 4.8
Attack Vector: NETWORK
Attack Complexity: HIGH
Privileges Required: NONE
User Interaction: NONE
Scope: UNCHANGED
Confidentiality: LOW
Integrity: LOW
Availability: NONE
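
As a cross-check on the CVSS v3 figures, the base score can be recomputed from the listed metric values. The sketch below assumes that "CVSS v3" here means v3.1 and uses the numeric weights and Roundup function from that specification; the function names are invented for this example, and only the Scope: UNCHANGED branch is implemented.

```python
import math

# CVSS v3.1 weights for the metric values listed in this entry
# (assumption: the page's "CVSS v3" refers to v3.1).
AV_NETWORK = 0.85
AC_HIGH = 0.44
PR_NONE = 0.85
UI_NONE = 0.85
IMPACT_LOW = 0.22
IMPACT_NONE = 0.0


def roundup(value: float) -> float:
    """Round up to one decimal place, following the v3.1 Roundup definition."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for Scope: UNCHANGED only."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)       # impact sub-score
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(
    AV_NETWORK, AC_HIGH, PR_NONE, UI_NONE,
    IMPACT_LOW, IMPACT_LOW, IMPACT_NONE,
)
print(score)  # 4.8
```

With these inputs the impact term is about 2.51 and the exploitability term about 2.22, so their sum of roughly 4.74 rounds up to the listed 4.8.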
AIVSS
Base Score: 4