Quick Guide to Popular AI Licenses
Only about 35 percent of the models on Hugging Face bear any license at all. Of those that do, roughly 60 percent fall under traditional open source licenses. But while the majority of licensed AI models may be open source, some very large projects, including Midjourney, BLOOM, and LLaMA, fall under the remaining 40 percent. So let’s take a look at some of the top AI model licenses on Hugging Face, including the most popular open source and not-so-open source licenses.
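To put those two percentages on the same footing, here is a minimal Python sketch that converts them into shares of all models on Hugging Face. The variable and function names are illustrative, and the inputs are the rounded figures quoted above rather than fresh counts.

```python
# Rounded figures quoted in the paragraph above (approximate, not fresh counts).
licensed_pct = 35      # percent of Hugging Face models that carry any license
open_source_pct = 60   # percent of the licensed models under open source licenses


def share_of_all_models(licensed: int, within_licensed: int) -> float:
    """Convert a percentage of licensed models into a percentage of all models."""
    return licensed * within_licensed / 100


open_source_overall = share_of_all_models(licensed_pct, open_source_pct)
other_overall = share_of_all_models(licensed_pct, 100 - open_source_pct)
unlicensed_overall = 100 - licensed_pct

print(f"Open source licensed: {open_source_overall:.0f}% of all models")
print(f"Other licenses:       {other_overall:.0f}% of all models")
print(f"No license at all:    {unlicensed_overall}% of all models")
```

With these inputs, roughly 21 percent of all models carry an open source license, 14 percent carry some other license, and 65 percent carry none.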
Open source licenses
Ahhh… refreshing, timeless open source licenses. We know ’em, we usually love ’em, but it’s actually hard to say how well these licenses transition from covering traditional applications—which are distributed as binaries, source code, or both—to covering AI models, which are usually shared as the model binary plus training weights that essentially act as configuration files.
The thing that makes existing open source licenses tricky for AI is that they define their terms around source code, and it is not the source code that people are generally interested in when it comes to modifying AI models. For more information on the legal questions surrounding whether or not common open source licenses work for AI, check out this OSI webinar with Mary Hardy, corporate counsel at Microsoft.
Permissive
Apache 2.0
The Apache 2.0 license is a permissive license that also includes patent grants, which can be important for AI models.
MIT
The MIT license is about as permissive as you can get, but it doesn’t include patent grants. On Hugging Face, this is the most popular license for data sets, and the second most popular for models.
Academic Free License 3.0 (AFL 3.0)
You don’t hear as much about the AFL because many consider it redundant with the more popular Apache 2.0 license, but that hasn’t stopped a large number of projects on Hugging Face from using it.
Copyleft
GNU General Public License 3 (GPL3)
GPL3 does include patent grants, which makes it good for AI. Of course, the reciprocity terms of the very strongly copyleft GPL may not be good for your organization’s particular use case, AI or otherwise.
Not actually open source licenses
If the source code is available and you’re free to modify and redistribute it, it’s open source, right? Not according to the Open Source Initiative (OSI). In order to be considered “open source”, a license must meet all ten of the criteria outlined in their Open Source Definition. What Richard Stallman calls “Freedom 0”, the freedom for anyone to run the software however they please and for any purpose, the OSI lists as its sixth criterion: “No Discrimination Against Fields of Endeavor”.
The following licenses restrict usage and are therefore not considered open source licenses, even if the projects under them might be free to use, modify, or distribute under most circumstances. Not being truly open source doesn’t mean these licenses are bad or worth avoiding, but it does mean you may need to work with your legal department to set policies on which licenses are acceptable in your products.
Use Restrictors
Llama2
Meta’s Llama2 license was created to get people active on Meta’s Llama projects while also keeping Meta’s IP from giving any competitors a boost. This license includes the use restriction that you cannot use the model or any output from it to build, train, or otherwise enhance any other LLM.
Additionally, if the products or services built on your Llama2 licensed project have more than 700 million monthly active users, you have to request a separate license from Meta, and unless Meta grants it, you have no rights under the Llama2 license. Sure, 700 million is more than twice the population of the United States, but it’s a restriction nonetheless.
RAIL Family
Responsible AI Licenses (RAIL) are created and used by people who are openly in disagreement with Stallman’s Freedom 0 and OSI’s criterion 6. They believe that because AI is such a powerful technology, AI licenses must ensure that models and data sets are built, trained, and used responsibly. What exactly defines “responsible use” is up to the license writer, and you can read some specific examples here.
Generally, RAIL licenses aim to stop things like harassment and discrimination against legally protected characteristics like race and gender. But—and the specifics here depend on which license is used—they can also prohibit the development of AI applications to be used in medicine or law enforcement.
Public Domain
Creative Commons
Creative Commons (CC) licenses let creators share a work so others can use or create derivatives from it without the need to pay royalties. Strictly speaking, only CC0 dedicates a work to the public domain; the other CC licenses leave the copyright with the author and attach conditions. There are many versions of the CC license, including those with non-commercial (NC) use restrictions and copyleft “sharealike” (SA) reciprocity terms. Free? Yes. Open? Usually. Source? Software? Not so much.
CC licenses are best used for documents, images, music files, and other similar artifacts. The OSI does not consider CC licenses to be open source licenses. Even ignoring the CC-NC licenses that violate Stallman’s “Freedom 0” or OSI criterion 6, these licenses are not recommended for software because they are not written with software distribution in mind, making it legally unclear if they cover the source code (OSI criterion 2), or just the completed binary.
Top 12 Licenses on Hugging Face
License | Number of HF models* | OSI recognized? | Permissive copyright? | Commercial use? | Use restriction? | Patent grant? |
---|---|---|---|---|---|---|
Apache 2.0 | 97,421 | Yes | Yes | Yes | No | Yes |
MIT | 42,831 | Yes | Yes | Yes | No | No |
Open Rail Family | 27,919 | No | Yes | Yes | Yes | Undefined** |
CreativeML – Open Rail | 18,631 | No | Yes | Yes | Yes | Yes |
CC-BY-NC 4.0 | 7,081 | No | Yes | No | No | No |
Llama2 | 5,375 | No | Yes | Yes, with exceptions | Yes | No |
CC-BY-4.0 | 3,840 | No | Yes | Yes | No | No |
OpenRAIL ++ | 2,379 | No | Yes | Yes | Yes | Yes |
AFL 3.0 | 2,377 | Yes | Yes | Yes | No | Yes |
CC-BY-NC-SA 4.0 | 2,108 | No | No | No | No | No |
CC-BY-SA 4.0 | 1,547 | No | No | Yes | No | No |
GPL3 | 1,483 | Yes | No | Yes | No | Yes |
**Hugging Face allows tagging a model under a “family” of licenses instead of one specific license. Not all Open RAIL family licenses include patent grants but some (including OpenRAIL-M) do.
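One way to put the table to work is to treat it as data and filter it against your own requirements. The sketch below is a minimal illustration in Python: `LicenseRow` and `shortlist` are hypothetical names, the rows are transcribed from the table above (model counts omitted), and treating an “Undefined” patent grant as a miss is an assumption made to stay on the conservative side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseRow:
    """One row of the table above (hypothetical structure, counts omitted)."""
    name: str
    osi_recognized: bool
    permissive: bool
    commercial_use: str    # "Yes", "No", or "Yes, with exceptions"
    use_restriction: bool
    patent_grant: str      # "Yes", "No", or "Undefined"


LICENSES = [
    LicenseRow("Apache 2.0", True, True, "Yes", False, "Yes"),
    LicenseRow("MIT", True, True, "Yes", False, "No"),
    LicenseRow("Open Rail Family", False, True, "Yes", True, "Undefined"),
    LicenseRow("CreativeML – Open Rail", False, True, "Yes", True, "Yes"),
    LicenseRow("CC-BY-NC 4.0", False, True, "No", False, "No"),
    LicenseRow("Llama2", False, True, "Yes, with exceptions", True, "No"),
    LicenseRow("CC-BY-4.0", False, True, "Yes", False, "No"),
    LicenseRow("OpenRAIL ++", False, True, "Yes", True, "Yes"),
    LicenseRow("AFL 3.0", True, True, "Yes", False, "Yes"),
    LicenseRow("CC-BY-NC-SA 4.0", False, False, "No", False, "No"),
    LicenseRow("CC-BY-SA 4.0", False, False, "Yes", False, "No"),
    LicenseRow("GPL3", True, False, "Yes", False, "Yes"),
]


def shortlist(rows, require_osi=True, require_patent_grant=True,
              allow_copyleft=False):
    """Return the names of licenses that pass the filters, in table order."""
    picked = []
    for row in rows:
        if require_osi and not row.osi_recognized:
            continue
        # Assumption: "Undefined" counts as no patent grant (conservative).
        if require_patent_grant and row.patent_grant != "Yes":
            continue
        if not allow_copyleft and not row.permissive:
            continue
        picked.append(row.name)
    return picked


print(shortlist(LICENSES))
print(shortlist(LICENSES, allow_copyleft=True))
```

With the default filters (OSI recognized, patent grant, permissive) the shortlist is Apache 2.0 and AFL 3.0; allowing copyleft adds GPL3. The filters only mirror the table's columns, so the actual license text still decides what applies to a given project.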
In sum
With AI, the license battle isn’t just “copyleft” vs “permissive”. You now need to consider not only how you wish to distribute (or not distribute) your software, but also how it is intended to be used. We’re not lawyers, so we can’t give you any particular advice on which licenses will be OK for your organization to work with, but we hope this guide was helpful nonetheless.