Quick Guide to Popular AI Licenses

Only about 35 percent of the models on Hugging Face bear any license at all. Of those that do, roughly 60 percent fall under traditional open source licenses. But while the majority of licensed AI models may be open source, some very large projects, including Midjourney, BLOOM, and LLaMA, fall under the remaining 40 percent. So let’s take a look at some of the top AI model licenses on Hugging Face, including the most popular open source and not-so-open source licenses.
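As a rough sanity check, those figures can be restated as shares of all models on Hugging Face. This is a minimal sketch that takes the rounded percentages above at face value; the variable names are ours and the results are approximate.

```python
# Rounded figures quoted above; treat every result as approximate.
licensed_share = 0.35           # models that carry any license at all
open_source_of_licensed = 0.60  # of licensed models, share under open source licenses

# Convert the "share of licensed models" into a share of all models.
open_source_overall = licensed_share * open_source_of_licensed
other_licensed_overall = licensed_share * (1 - open_source_of_licensed)
unlicensed_overall = 1 - licensed_share

print(f"Open source licensed: {open_source_overall:.0%}")
print(f"Licensed, not open source: {other_licensed_overall:.0%}")
print(f"No license at all: {unlicensed_overall:.0%}")
```

In other words, only about one model in five carries a traditional open source license, and nearly two in three carry no license at all.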

Open source licenses

Ahhh… refreshing, timeless open source licenses. We know ’em, we usually love ’em, but it’s hard to say how well these licenses carry over from traditional applications, which are distributed as binaries, source code, or both, to AI models, which are usually shared as the model binary plus trained weights that essentially act as configuration files.

What makes existing open source licenses tricky for AI is that they define their terms around source code, and people who want to modify an AI model are generally interested in its weights rather than its source code. For more information on the legal questions surrounding whether or not common open source licenses work for AI, check out this OSI webinar with Mary Hardy, corporate counsel at Microsoft.

Permissive

Apache 2.0

The Apache 2.0 license is a permissive license that also includes patent grants, which can be important for AI models.

MIT

The MIT license is about as permissive as you can get, but it doesn’t include patent grants. On Hugging Face, this is the most popular license for data sets, and the second most popular for models.

Academic Free License 3.0 (AFL 3.0)

You don’t hear as much about the AFL because many consider it redundant with the more popular Apache 2.0 license, but that hasn’t stopped a large number of projects on Hugging Face from using it.

Copyleft

GNU General Public License 3 (GPL3)

GPL3 does include patent grants, which makes it good for AI. Of course, the reciprocity terms of the very strongly copyleft GPL may not be good for your organization’s particular use case, AI or otherwise.

Not actually open source licenses

If the source code is available and you’re free to modify and redistribute it, it’s open source, right? Not according to the Open Source Initiative (OSI). In order to be considered “open source”, a license must meet all ten of the criteria outlined in their Open Source Definition. What Richard Stallman calls “Freedom 0”, the right for anyone to use open source software however they please, the OSI lists as their 6th criterion: “No Discrimination Against Fields of Endeavor”.

The following licenses restrict usage and are therefore not considered open source licenses, even if the projects under them might be free to use, modify, or distribute under most circumstances. Not being truly open source doesn’t mean these licenses are bad or worth avoiding, but it does mean you may need to work with your legal department to set policies on which licenses are acceptable in your products.

Use Restrictors

Llama2

Meta’s Llama2 license was created to get people active on Meta’s Llama projects while also keeping Meta’s IP from giving any competitors a boost. This license includes the use restriction that you cannot use the model or any output from it to build, train, or otherwise enhance any other LLM (other than Llama2 itself or its derivatives).

Additionally, if the products or services offered by you or your affiliates have more than 700 million monthly active users, you have to request a separate license from Meta, and you have no rights under the Llama2 license unless Meta chooses to grant it. Sure, 700 million is more than twice the population of the United States, but it’s a restriction nonetheless.

RAIL Family

Responsible AI Licenses (RAIL) are created and used by people who are openly in disagreement with Stallman’s Freedom 0 and OSI’s criterion 6. They believe that because AI is such a powerful technology, AI licenses must ensure that models and data sets are built, trained, and used responsibly. What exactly defines “responsible use” is up to the license writer, and you can read some specific examples here.

Generally, RAIL licenses aim to stop things like harassment and discrimination based on legally protected characteristics like race and gender. But, depending on which license is used, they can also prohibit the development of AI applications to be used in medicine or law enforcement.

Public Domain

Creative Commons

Creative Commons (CC) licenses let authors keep their copyright while allowing others to use or create derivatives from the work without the need to pay royalties; only the CC0 dedication actually places a work in the public domain. There are many versions of the CC license, including those with non-commercial (NC) use restrictions and copyleft “sharealike” (SA) reciprocity terms. Free? Yes. Open? Usually. Source? Software? Not so much.

CC licenses are best used for documents, images, music files, and other similar artifacts. The OSI does not consider CC licenses to be open source licenses. Even ignoring the CC-NC licenses that violate Stallman’s “Freedom 0” or OSI criterion 6, these licenses are not recommended for software because they are not written with software distribution in mind, making it legally unclear if they cover the source code (OSI criterion 2), or just the completed binary.

Top 12 Licenses on Hugging Face

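| License | Number of HF models* | OSI recognized? | Permissive copyright? | Commercial use? | Use restriction? | Patent grant? |
| --- | --- | --- | --- | --- | --- | --- |
| Apache 2.0 | 97,421 | Yes | Yes | Yes | No | Yes |
| MIT | 42,831 | Yes | Yes | Yes | No | No |
| Open RAIL family | 27,919 | No | Yes | Yes | Yes | Undefined** |
| CreativeML Open RAIL | 18,631 | No | Yes | Yes | Yes | Yes |
| CC-BY-NC 4.0 | 7,081 | No | Yes | No | No | No |
| Llama2 | 5,375 | No | Yes | Yes, with exceptions | Yes | No |
| CC-BY 4.0 | 3,840 | No | Yes | Yes | No | No |
| OpenRAIL++ | 2,379 | No | Yes | Yes | Yes | Yes |
| AFL 3.0 | 2,377 | Yes | Yes | Yes | No | Yes |
| CC-BY-NC-SA 4.0 | 2,108 | No | No | No | No | No |
| CC-BY-SA 4.0 | 1,547 | No | No | Yes | No | No |
| GPL3 | 1,483 | Yes | No | Yes | No | Yes |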
*As of 5/6/2024
**Hugging Face allows tagging a model under a “family” of licenses instead of one specific license. Not all Open RAIL family licenses include patent grants but some (including OpenRAIL-M) do.
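
If you want to turn the table into a first-pass screening list, one option is to encode each row as data and flag the attributes your organization cares about. The sketch below is only an illustration: the `LicenseRow` type, the `flag_for_review` function, and the rules inside it are hypothetical choices of ours, not a legal standard, and the values are copied from the table above as-is.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LicenseRow:
    name: str
    hf_models: int                # model count from the table above
    osi_recognized: bool
    permissive: bool
    commercial_use: bool          # True also covers "Yes, with exceptions"
    use_restriction: bool
    patent_grant: Optional[bool]  # None = "Undefined" (varies within the family)


LICENSES = [
    LicenseRow("Apache 2.0", 97_421, True, True, True, False, True),
    LicenseRow("MIT", 42_831, True, True, True, False, False),
    LicenseRow("Open RAIL family", 27_919, False, True, True, True, None),
    LicenseRow("CreativeML Open RAIL", 18_631, False, True, True, True, True),
    LicenseRow("CC-BY-NC 4.0", 7_081, False, True, False, False, False),
    LicenseRow("Llama2", 5_375, False, True, True, True, False),
    LicenseRow("CC-BY 4.0", 3_840, False, True, True, False, False),
    LicenseRow("OpenRAIL++", 2_379, False, True, True, True, True),
    LicenseRow("AFL 3.0", 2_377, True, True, True, False, True),
    LicenseRow("CC-BY-NC-SA 4.0", 2_108, False, False, False, False, False),
    LicenseRow("CC-BY-SA 4.0", 1_547, False, False, True, False, False),
    LicenseRow("GPL3", 1_483, True, False, True, False, True),
]


def flag_for_review(row: LicenseRow) -> List[str]:
    """Return reasons a license might deserve a closer look.

    The rules here are a hypothetical example policy, not legal advice.
    """
    reasons = []
    if not row.osi_recognized:
        reasons.append("not OSI recognized")
    if not row.permissive:
        reasons.append("reciprocity (copyleft) terms")
    if not row.commercial_use:
        reasons.append("no commercial use")
    if row.use_restriction:
        reasons.append("use restrictions")
    if row.patent_grant is not True:
        reasons.append("no confirmed patent grant")
    return reasons


# Licenses that raise no flags under this example policy.
no_flags = [row.name for row in LICENSES if not flag_for_review(row)]

# Total models in the table published under a use-restricted license.
use_restricted_models = sum(r.hf_models for r in LICENSES if r.use_restriction)

for row in LICENSES:
    print(f"{row.name}: {', '.join(flag_for_review(row)) or 'no flags'}")
```

A flag here is a prompt for a conversation with your legal team, not a verdict; a stricter or looser policy would simply change the conditions inside `flag_for_review`.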

In sum

With AI, the license battle isn’t just “copyleft” vs “permissive”. You now need to consider not only how you wish to distribute (or not distribute) your software, but also how it is intended to be used. We’re not lawyers, so we can’t give you any particular advice on which licenses will be OK for your organization to work with, but we hope this guide was helpful nonetheless.
