Mend.io Vulnerability Database
The largest open source vulnerability database
What is a Vulnerability ID?
New vulnerability? Tell us about it!
CVE-2026-53923
Published:June 22, 2026
Updated:June 29, 2026
vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.
Do you need more information?
Contact Us
CVSS v4
Base Score:
5.3
Attack Vector
NETWORK
Attack Complexity
LOW
Attack Requirements
NONE
Privileges Required
NONE
User Interaction
PASSIVE
Vulnerable System Confidentiality
LOW
Vulnerable System Integrity
LOW
Vulnerable System Availability
NONE
Subsequent System Confidentiality
NONE
Subsequent System Integrity
NONE
Subsequent System Availability
NONE
CVSS v3
Base Score:
5.4
Attack Vector
NETWORK
Attack Complexity
LOW
Privileges Required
NONE
User Interaction
REQUIRED
Scope
UNCHANGED
Confidentiality
LOW
Integrity
LOW
Availability
NONE
Weakness Type (CWE)
Incorrect Conversion between Numeric Types
Exposure of Sensitive Information to an Unauthorized Actor
EPSS
Base Score:
0.28