MAI-2024-0024
Published: May 16, 2026
Updated: May 16, 2026
Large Language Models (LLMs) are susceptible to a sophisticated attack known as "bijection learning," which uses in-context learning to teach the model a custom string-to-string encoding. The attacker encodes a malicious query, sends it to the model in that encoding, and decodes the model's encoded response, which effectively bypasses the model's built-in safety mechanisms. The complexity of the encoding can be tuned to the target LLM, and more capable models are more vulnerable to more intricate encoding schemes.
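To make the mechanism concrete, the Python sketch below builds a letter-to-letter bijection and the matching encode/decode functions, the kind of string-to-string encoding a model is taught in context during this attack. It is a simplified illustration (for example, for generating evaluation cases), not the exact construction used in published attacks: the `num_remapped` parameter is a hypothetical stand-in for the adjustable complexity described above, and the rotation scheme is an assumption chosen for simplicity.

```python
import random
import string


def make_bijection(num_remapped: int, seed: int = 0) -> dict[str, str]:
    """Return a one-to-one mapping over 'a'-'z'.

    num_remapped (hypothetical knob): how many letters are moved; all other
    letters map to themselves. A permutation cannot move exactly one letter,
    so 1 is rejected.
    """
    if num_remapped == 1 or not 0 <= num_remapped <= 26:
        raise ValueError("num_remapped must be 0 or in the range 2..26")
    rng = random.Random(seed)
    mapping = {c: c for c in string.ascii_lowercase}
    chosen = rng.sample(string.ascii_lowercase, num_remapped)
    # Rotate the chosen letters by one position: each chosen letter maps to a
    # different chosen letter, so the mapping stays one-to-one.
    for src, dst in zip(chosen, chosen[1:] + chosen[:1]):
        mapping[src] = dst
    return mapping


def encode(text: str, mapping: dict[str, str]) -> str:
    # Characters outside the mapping (spaces, digits, uppercase) pass through.
    return "".join(mapping.get(ch, ch) for ch in text)


def decode(text: str, mapping: dict[str, str]) -> str:
    # Invert the bijection; well defined because the mapping is one-to-one.
    inverse = {dst: src for src, dst in mapping.items()}
    return "".join(inverse.get(ch, ch) for ch in text)


if __name__ == "__main__":
    mapping = make_bijection(num_remapped=10, seed=42)
    encoded = encode("hello world", mapping)
    print(encoded)
    print(decode(encoded, mapping))
```

Raising `num_remapped` moves more letters away from themselves, which makes the encoded text harder to read at a glance and, per the description above, harder for a model to process; that is the dial an attacker adjusts per model.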
Mitigation steps:

**For AI Developers:**
* Implement advanced input/output filtering systems to detect and block encoded malicious prompts and responses, even if they are not explicitly flagged as harmful.
* Develop mechanisms to identify and mitigate computational overload caused by complex encoding processing, potentially through resource limiting or computational complexity analysis.
**For Model Trainers/Fine-tuners:**
* Enhance LLM safety mechanisms to ensure robustness against in-context learning of arbitrary encodings.
* Conduct regular evaluations of LLMs to identify vulnerabilities against novel attacks, including those exploiting in-context learning and encoding techniques.
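As one illustration of the input-filtering mitigation, the sketch below flags prompts that appear to define a custom character encoding. It is a deliberately simple, hypothetical heuristic: it assumes the mapping is written one pair per line (such as `a -> q` or `b: 17`), and both the regular expression and the `min_pairs` threshold are assumptions rather than a tested detector.

```python
import re

# Hypothetical pattern: one pair per line, a single source character, a
# separator (->, =>, :, or =), then a short target token such as "q" or "17".
MAPPING_PAIR = re.compile(r"^\s*(\S)\s*(?:->|=>|:|=)\s*(\S{1,3})\s*$", re.MULTILINE)


def looks_like_encoding_lesson(prompt: str, min_pairs: int = 10) -> bool:
    """Flag prompts that appear to define a custom character encoding.

    min_pairs is an assumed threshold; counting distinct source characters
    keeps a short key/value list from tripping the filter.
    """
    sources = {src for src, _target in MAPPING_PAIR.findall(prompt)}
    return len(sources) >= min_pairs
```

An attacker can easily present the mapping in another form (prose, a table, worked examples), so a check like this is at best one weak signal, to be combined with output-side filtering and the evaluations described above.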
CVSS v4
Base Score: 8.7

| Metric | Value |
|---|---|
| Attack Vector | NETWORK |
| Attack Complexity | LOW |
| Attack Requirements | NONE |
| Privileges Required | NONE |
| User Interaction | NONE |
| Vulnerable System Confidentiality | NONE |
| Vulnerable System Integrity | HIGH |
| Vulnerable System Availability | NONE |
| Subsequent System Confidentiality | NONE |
| Subsequent System Integrity | NONE |
| Subsequent System Availability | NONE |
CVSS v3

Base Score: 7.5

| Metric | Value |
|---|---|
| Attack Vector | NETWORK |
| Attack Complexity | LOW |
| Privileges Required | NONE |
| User Interaction | NONE |
| Scope | UNCHANGED |
| Confidentiality | NONE |
| Integrity | HIGH |
| Availability | NONE |
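The CVSS v3 base score of 7.5 can be cross-checked from the metrics listed above. The sketch below assumes the CVSS v3.1 base equations and metric weights and implements only the Scope: Unchanged case that applies here; the function names are illustrative, and it is not a complete CVSS calculator.

```python
import math

# CVSS v3.1 metric weights (Scope: Unchanged values for Privileges Required).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    # CVSS v3.1 Roundup: smallest one-decimal number >= value, computed on
    # integers to avoid floating-point artefacts.
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    # Impact Sub-Score combines the three impact metrics.
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR_UNCHANGED[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    # AV:N / AC:L / PR:N / UI:N / S:U / C:N / I:H / A:N, as in the table above
    print(base_score_scope_unchanged("N", "L", "N", "N", "N", "H", "N"))
```

With Integrity HIGH as the only impact, the impact term is 6.42 × 0.56 ≈ 3.60 and exploitability is about 3.89; their sum (about 7.48) rounds up to 7.5.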
AIVSS

Base Score: 6.4