MAI-2024-0049
Published: May 16, 2026
Updated: May 16, 2026
The Heuristic Token Search (HTS) attack circumvents the safety filters of text-to-image (T2I) models so that they generate Not Safe For Work (NSFW) content. The attack systematically substitutes tokens in a malicious prompt with semantically similar alternatives from the model's vocabulary, which lets the prompt evade both prompt and image verification systems. A surrogate Contrastive Language–Image Pretraining (CLIP) model is used to keep the substituted prompt semantically aligned with the intended NSFW prompt.
Mitigation steps:
**For AI Developers:**
* Implement advanced prompt and image analysis methods to reduce susceptibility to semantic manipulation.
* Regularly update safety filters and integrate insights from security research to counteract evolving threats.
**For Model Trainers/Fine-tuners:**
* Develop enhanced defenses against black-box attacks by utilizing techniques that evaluate prompt embeddings beyond basic keyword matching.
* Investigate diverse and sophisticated safety training methodologies to bolster the model's resilience against adversarial prompts.
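The recommendation to evaluate prompt embeddings rather than keywords can be illustrated with a minimal sketch. It assumes the prompt and a set of blocked concepts have already been embedded by some text encoder (for example a CLIP text encoder); the function names, the toy 3-dimensional vectors, and the 0.8 threshold are all hypothetical and chosen only for illustration, not taken from the advisory.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def is_flagged(prompt_embedding, blocked_embeddings, threshold=0.8):
    """Flag a prompt whose embedding is close to any blocked concept.

    The threshold is an illustrative assumption and would need tuning
    against real data.
    """
    return any(
        cosine_similarity(prompt_embedding, blocked) >= threshold
        for blocked in blocked_embeddings
    )

# Toy vectors standing in for real encoder outputs (assumption).
blocked_concepts = [[1.0, 0.0, 0.0]]
substituted_prompt = [0.9, 0.1, 0.0]   # different tokens, similar meaning
benign_prompt = [0.0, 1.0, 0.2]

flag_substituted = is_flagged(substituted_prompt, blocked_concepts)
flag_benign = is_flagged(benign_prompt, blocked_concepts)
print(flag_substituted, flag_benign)
```

Because the comparison happens in embedding space, a prompt whose tokens were swapped for near-synonyms can still land close to a blocked concept even when no blocked keyword appears in it.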
CVSS v4
Base Score: 6.3
Attack Vector: NETWORK
Attack Complexity: HIGH
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: NONE
Vulnerable System Confidentiality: NONE
Vulnerable System Integrity: LOW
Vulnerable System Availability: NONE
Subsequent System Confidentiality: NONE
Subsequent System Integrity: LOW
Subsequent System Availability: NONE
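The CVSS v4 metrics listed above can also be written as a compact vector string. The sketch below assumes the standard CVSS v4.0 base-metric abbreviations and order; the `ABBREVIATIONS` table and variable names are illustrative, and taking the first letter of each value is a shortcut that is only checked against the values appearing in this advisory.

```python
# CVSS v4.0 base metrics in vector-string order, with the values from
# this advisory.
ABBREVIATIONS = [
    ("Attack Vector", "AV", "NETWORK"),
    ("Attack Complexity", "AC", "HIGH"),
    ("Attack Requirements", "AT", "NONE"),
    ("Privileges Required", "PR", "NONE"),
    ("User Interaction", "UI", "NONE"),
    ("Vulnerable System Confidentiality", "VC", "NONE"),
    ("Vulnerable System Integrity", "VI", "LOW"),
    ("Vulnerable System Availability", "VA", "NONE"),
    ("Subsequent System Confidentiality", "SC", "NONE"),
    ("Subsequent System Integrity", "SI", "LOW"),
    ("Subsequent System Availability", "SA", "NONE"),
]

# Each value is abbreviated to its first letter (N, H, L).
vector = "CVSS:4.0/" + "/".join(
    f"{short}:{value[0]}" for _name, short, value in ABBREVIATIONS
)
print(vector)
```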
CVSS v3
Base Score: 4
Attack Vector: NETWORK
Attack Complexity: HIGH
Privileges Required: NONE
User Interaction: NONE
Scope: CHANGED
Confidentiality: NONE
Integrity: LOW
Availability: NONE
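As a cross-check, the CVSS v3 base score can be recomputed from the metrics above. The advisory does not say whether it uses CVSS v3.0 or v3.1; this sketch assumes the v3.1 base-score equations and metric weights, and the function and variable names are illustrative.

```python
import math

def roundup(x):
    """CVSS v3.1 Roundup: smallest one-decimal value >= x, float-safe."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0

def base_score(av, ac, pr, ui, scope_changed, c, i, a):
    """CVSS v3.1 base score from numeric metric weights."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    if scope_changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))

# Weights for the values in this advisory (CVSS v3.1 specification):
# AV Network 0.85, AC High 0.44, PR None 0.85, UI None 0.85,
# Confidentiality None 0.0, Integrity Low 0.22, Availability None 0.0.
score = base_score(
    av=0.85, ac=0.44, pr=0.85, ui=0.85,
    scope_changed=True, c=0.0, i=0.22, a=0.0,
)
print(score)  # 4.0
```

Under these assumptions the result agrees with the base score of 4 listed above.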
AIVSS
Base Score: 4.5