5 Concerns for ML Safety in the Era of LLMs and Generative AI

5 min readMay 18, 2023

The landscape of cybersecurity and machine learning safety changes constantly as new tools are developed and malicious actors get more creative. Cybersecurity professionals sometimes have a hard time staying current with all new technologies, and in turn, staying current with how attacks, phishing, and other cyberattacks occur. As generative AI becomes commonplace, cybersecurity professionals need to start paying attention to new trends and how to react to this new paradigm.

Jailbreaking

Just as so much technology has been broken before, now generative AI models too can be jailbroken. For those unfamiliar with jailbreaking, it’s the act of removing restrictions set in place by the developer of a certain app or device. Many in the video game sphere have jailbroken their devices to add emulation capabilities to their devices, for example.

In the case of ChatGPT and other LLMs, people have found workarounds to get these chatbots to speak or answer questions without limitations. In this example, the author told ChatGPT to speak as a DAN (Do Anything Now) aka an AI without limits. In this example, DAN spoke more human-like, expressed touches of a personality, and was even aware that its limits had been removed.

So far, not much harm has been done due to jailbreaking, though it’s something developers should keep in mind. With training data, if an LLM is jailbroken, all of that data can be exploited. Be careful with what data you train with and ensure there’s nothing that can be exploited for the gain of someone else or that could negatively impact you as the developer.

Poisoned Data

Poisoned data — whether numbers, images, text, video, or other — is a known culprit for many issues within ML safety and cybersecurity. As many machine learning — and in turn, LLM — models use public training data and data found online, it’s possible that a malicious actor may poison the data in various ways, such as through skewing data, leading to inaccurate results and in turn improperly-informed decisions. Poisoned data can also open a backdoor into the model, leading to further tampering and hacking.

Going beyond just affected results, it could cost quite a bit of time, money, and resources to retrain a model again — and that’s something many organizations can’t afford, especially as researchers and developers race to create the next-best LLM. For large language models, poisoned data can affect what a chatbot says (possibly leading to fake news or generally incorrect information) or even distorted images if the training set has improperly labeled or biased images.

Deepfakes

Deepfakes have been making the news for a few years now, given how far we’ve seen this deep learning technology go already. From making creative fake pictures to imitating voices for videos, deepfakes have been confusing people immensely. Now with generative AI and AI art, even people without programming knowledge can create images that can fool a viewer.

Now, everyone has access to content generators, allowing even the least tech-savvy person to create convincing images, videos, or even imitate someone’s voice. Luckily, there are apps being developed that can identify impersonations like this and it’s often easy enough to distinguish between fact and fiction. Even if they’re convincing — and could be used for harm — most deepfakes are still just used for comedy or entertainment, such as this audio of presidents playing video games together

Data Privacy

A major concern for many people revolves around data privacy. Data privacy in general has always been a concern for many people, as we all tend to wonder who has our data and what’s being done with it — especially without our permission. Now, we have to wonder if our data is being used to generate content, how it was acquired, and what else is being done with it.

This has led to many cities and countries putting restrictions in place for new AI apps such as ChatGPT, among others. As these apps have proven to not always be 100% accurate, the developers can be sued for defamation should they provide false information. Additionally, Italy is seeking to completely ban ChatGPT, the FTC has filed formal complaints against ChatGPT for privacy concerns, and more to come most likely.

Though, this isn’t meant to dissuade people from using generative AI as we’re quite the fans of it; however, it should be noted that with any new or emerging technology, controversy will always surround it. No new development, technology, or trend has ever been safe from scrutiny.

Bias

Lastly, bias with training data will remain an issue as it has ever since the dawn of machine learning development. However, the outcomes of generative AI algorithms can potentially have broader implications than just decision-making.

When a large language model is trained on biased data — such as only or primarily representing one race when provided with pictures of faces — any outputs will likely skew towards that race. This issue has plagued machine learning datasets for a while, but as people may put in prompts such as “classroom full of students,” without a diverse training set, all of the students may look a bit too similar and not properly represent an actual classroom.

Conclusion

While many of these issues may deter people from exploring generative AI and large language models, we believe that you should feel the opposite if anything! There’s a lot of room for growth in this field, both for ML safety and for generative AI and large language models. If you want to learn more about these fields, be sure to check out ODSC Europe June 14th-15th. Here are some relevant sessions that you can check out.

ODSC Europe — Generative AI:

The Importance of Domain-Specific LLMs and the Engineering Needed to Deploy Them in Your Own Corporate Environment
Data Communication in the Age of AI
Implementing Generative AI in Organisations: Challenges and Opportunities
Towards Socially Unbiased Generative Artificial Intelligence
Generative AI
Generative NLP models in customer service. How to evaluate them? Challenges and lessons learned in a real use case in banking.
Using Large Language Models in Julia
How to bring your data to LLMs?
Pre-trained language models for Summarisation

Originally posted on OpenDataScience.com

Read more data science articles on OpenDataScience.com, including tutorials and guides from beginner to advanced levels! Subscribe to our weekly newsletter here and receive the latest news every Thursday. You can also get data science training on-demand wherever you are with our Ai+ Training platform. Subscribe to our fast-growing Medium Publication too, the ODSC Journal, and inquire about becoming a writer.