What You Need to Know About Hugging Face
Table of Contents
The risk both to and from AI models is a topic so hot it’s left the confines of security conferences and now dominates the headlines of major news sites. Indeed, the deluge of frightening hypotheticals can make AI feel like we are navigating an entirely new frontier with no compass.
And to be sure, AI poses a lot of unique challenges to security, but remember: Both the media and AI companies have a vested interest in upping the fright hype to keep people talking. For those of us dinosaurs who’ve been around for a while, the problems organizations are facing with AI feel in many ways fairly similar to when open source software started to hit it big.
What is Hugging Face?
Creating AI models, including large language models (LLMs), from scratch is extremely costly, so most organizations rely on existing models. Much in the way they look to GitHub for open source modules, developers who want to build with AI head to Hugging Face, a platform that currently hosts over 350,000 pre-trained models and 75,000 data sets—with more on the way. Those AI models use licenses just like open source software does (many of them the same licenses), and you’ll want to make sure your AI models use commercial-friendly ones.
So when it comes to risk, the first thing organizations need to know is what open source AI models developers have put into their code base. Conceptually it’s a simple task, but if your organization has enough code, finding AI models specifically can be less than straightforward. (I didn’t mean this to be a product promoting blog but I have to mention that we solved that problem already.) Once you know what you have, where it comes from, what version you’re on, and so forth, you can make adjustments and keep up with any security notices that might come up, same as with open source code.
But from there, I must admit, we definitely diverge from the classic problems of open source software. By their very nature, AI models are opaque. They might be open source-like in that you’re free to use and distribute them, but you can’t really see and understand the source. Because drilling down into AI models is nearly impossible, organizations that work with AI models are going to need to do a lot more threat modeling and penetration testing.
The next chapter of risks
The risks of open source vulnerabilities will still exist throughout your applications; AI adds some twists to the classics and throws in a few new ones on top of that. Just a few examples:
Risk of data exfiltration (AI version). Goodbye SQL injection, hello prompt injection. Can an attacker interfacing with your AI model use plain language prompts to get it to divulge training data which may include sensitive information?
Risk of bias. Does your AI model include biases that could end up causing discrimination against people based on immutable characteristics? Regulators sure don’t like that.
Risk of poisoning/sabotage. Can an attacker use a poisoned data set against your AI to make it perform incorrectly across the board or with specific targeted interest? For instance, could it register a specific face as, say, an innocent houseplant? (I’m looking forward to all of the heist films that will inevitably use this in their plots.) Artists are already using this concept to protect their copyrighted works from image-generating AI models.
Risk of being just plain wrong. There’s a lot of ways AI can hand you seemingly good output that comes back to bite you. Just one example: if you use an AI chatbot in place of live support on your website, they might give your customers bad information. And you might be on the hook for that.
Again, this list is non-exhaustive. There are many other threats, like the power of AI being wielded for creating phishing campaigns or writing malware. There are many, many ways that using AI in an application increases the attack surface.
Reading ahead for solutions
As a technology, AI is rather immature. It’s still early days, which means huge susceptibility to disruptions in the field and in the marketplace. No one knows precisely where this is all heading, in tech or in the courts.
For now, we’ll have to all stay diligent and keep our ears open. One thing for sure, you cannot keep your head in the sand and ignore AI. You have to know what AI models you have and keep them updated. From there, frameworks like MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) are an excellent place to start for threat modeling.
AI threats won’t be the end of the world, as long as we stay alert.