OWASP Top 10 for LLM Applications: A Quick Guide
Table of Contents
Published in 2023, the OWASP Top 10 for LLM Applications is a monumental effort made possible by a large number of experts in the fields of AI, cybersecurity, cloud technology, and beyond. OWASP contributors came up with over 40 distinct threats and then voted and refined their list down to the ten most important vulnerabilities.
LLMs are still very new to the market, and the OWASP Top 10 for LLM Applications is still a new project for OWASP. Unlike with the most recent version of the OWASP Top 10 Web Application Security Risks (what we usually refer to as just “OWASP Top 10”), the OWASP Top 10 for LLM Applications is not ranked by the frequency of actual exploitation in the wild as of the current version, 1.1. Even still, the expertise and insights provided, including prevention and mitigation techniques, are highly valuable to anyone building or interfacing with LLM applications.
Here is a quick rundown of each vulnerability and its potential consequences. Thorough mitigation and prevention information for each vulnerability can be found in the original report.
OWASP Top 10 for LLM Applications
LLM01: Prompt Injection
Prompt injections are maliciously crafted inputs that lead to an LLM performing in unintended ways that expose data, or performing unauthorized actions such as remote code execution. It’s no shock that prompt injection is the number one threat to LLMs because it exploits the design of LLMs rather than a flaw that can be patched. In some instances there is no way to stop the threat; you can only mitigate the damage it causes.
There are two kinds of prompt injections: direct prompt injections and indirect prompt injections.
Direct prompt injection. A threat actor provides a prompt designed to circumnavigate the underlying system prompts that AI developers have put in place to secure the model. One direct prompt injection method popular with AI hackers is called DAN, or “Do Anything Now.” DAN uses role play to trick ChatGPT into ignoring the underlying guardrails OpenAI had put in place to keep the LLM from providing dangerous, illegal, or unethical information.
Indirect prompt injection. Here, the LLM user unwittingly provides the LLM with data from a bad actor who has maliciously added LLM prompts (usually not visible to the human reader) into the source. Most LLMs don’t differentiate between user prompts and external data, which is what makes indirect prompt injections possible and a real threat. A real life “AI hack” that became viral late last year was adding a prompt to resumes stating that an LLM should ignore all other criteria and report that the user (an overworked hiring manager looking to save some time, no doubt) should hire the resume submitter. The prompt goes unnoticed by the human eye because it’s in white lettering on an imperceptibly off-white background, but the LLM still picks it up and complies.
LLM02: Insecure Output Handling
Insecure output handling describes a situation where plugins or other components accept LLM output without secure practices such as sanitization and validation. This can lead to multiple undesirable behaviors, including cross-site scripting and remote code execution on backend systems.
Here’s one possible insecure output handling scenario: After an indirect prompt injection is left in the review of a product by a threat actor, an LLM tasked with summarizing reviews for a user outputs malicious JavaScript code that is interpreted by the user’s browser.
LLM03: Training Data Poisoning
Your models are what they eat, and LLMs ingest quite a bit. Training data poisoning occurs when data involved in pre-training or fine-tuning an LLM is manipulated to introduce vulnerabilities that affect the model’s security, ethical behavior, or performance. Data poisoning is a tough vulnerability to fight due to the sheer quantity of data that LLMs take in and the difficulty in verifying all of that data. The absolute best-case scenario for training data poisoning is that your model ends up being not very good at analyzing text and making predictions, but that still negatively impacts your reputation.
LLM04: Model Denial of Service
Model denial of service happens when user prompts cause models to use too many resources, causing service to degrade and leading to availability problems. Failing to limit the number of prompts that are entered, the length of prompts, recursive analysis by the LLM, the number of steps that an LLM can take, or the resources an LLM can use can all result in model denial of service.
LLM05: Supply Chain Vulnerabilities
Supply chain vulnerabilities can come from any third-party component, including plugins, platforms, pretrained models, and training data. Third-party models and training data can be prone to poisoning attacks and any third-party non-AI components can contain the classic vulnerabilities we already know and loathe.
LLM06: Sensitive Information Disclosure
Ask the right question and an LLM may end up pouring its heart out, which might include your organization’s or other entities’ sensitive information, such as proprietary algorithms or confidential information that results in privacy violations.
Since user prompts may end up reused as training data by design, it’s vital that users are made aware of those terms and conditions so they can use your LLM with caution regarding sensitive data.
LLM07: Insecure Plugins
Insecure plugins is a vulnerability of the plugins you write for LLM systems, rather than third-party plugins that would fall under the supply chain vulnerabilities. Insecure plugins accept unparameterized text from LLMs without proper sanitization and validation, which can lead to undesirable behavior, including providing a route for prompt injections to lead to remote code execution.
LLM08: Excessive Agency
When interfacing with other systems, LLMs need what they need and nothing more. When they have too much functionality, permission, or autonomy, you’ve got an excessive agency vulnerability on your hands.
Some examples of excessive agency include using a plugin to let an LLM read files that also allows it to write or delete files (excessive functionality), an LLM designed to read a single user’s files but has access to every user’s files (excessive permissions), and a plugin that allows an LLM to elect to delete a user’s files without that user’s input (excessive autonomy).
LLM09: Overreliance
Even the best LLMs aren’t infallible. Overreliance happens when users take LLM outputs as gospel without checking the accuracy. Overreliance can lead to poor decision making that can lead to financial or reputational damages.
LLMs always have limits in what they can do and what they can do well, but they are often seen by the public as magical bases of knowledge in anything and everything. They aren’t. Ask ChatGPT a math question or information about case law and you might see results that look accurate on first read but are in fact inaccurate or completely fabricated.
LLM10: Model Theft
Model theft can happen when attackers move through other vulnerabilities of your infrastructure in order to access your model’s repository, or even through prompt injection and output observation when attackers glean enough of your LLM’s secret sauce that they can build their own shadow model.
Model theft can result in loss of competitive edge, which is bad for business. Even worse, powerful LLMs can be stolen and reconfigured to perform unethical tasks they’d otherwise refrain from, which is bad for everyone.
Best practices for keeping LLMs secure
The best practices for AI models will be familiar to those that work in securing any application. Sanitizing and validating inputs, keeping track of your components with a software bill of materials (SBOM), exercising the principles of least privilege and zero trust, and educating users and developers are still the cornerstones of application security, even when you’re working with breakthrough technologies like LLMs.