Google slapped with a lawsuit for 'secretly stealing' data to train Bard

From the same law firm that's also suing OpenAI for data theft.
By Cecily Mauran
Another lawsuit involving AI and data theft. Credit: Getty Images

A California law firm has filed a class-action lawsuit against Google for "secretly stealing" vast amounts of data from the web to train its AI technologies.

Clarkson Law Firm is suing the tech giant for negligence, invasion of privacy, larceny, copyright infringement, and profiting from personal data that was illegally obtained. "Google has taken all our personal and professional information, our creative and copywritten works, our photographs, and even our emails—virtually the entirety of our digital footprint—and is using it to build commercial Artificial Intelligence ('AI') Products like 'Bard,'" said the complaint, which was filed on July 11 in the Northern District of California.

The lawsuit comes on the heels of Google quietly updating its privacy policy last week to state that any public information can be used to train its AI products like Bard. Google is essentially saying anything published on the web is fair game, but the law firm argues that scraping data without compensation or consent, for the express purpose of training AI models, is a massive invasion of privacy. The lawsuit alleges that Google, a multi-billion dollar company with over a billion users worldwide, is putting users in an "untenable" position: "either use the internet and surrender all your personal and copyrighted information to Google’s insatiable AI models — or avoid the internet entirely."

In a statement to Reuters, Google general counsel Halimah DeLaine Prado called the claims "baseless," saying, "we use data from public sources — like information published to the open web and public datasets – to train the AI models behind services like Google Translate, responsibly and in line with our AI Principles."

Recently, Clarkson filed a similar class-action lawsuit against OpenAI, the company that created ChatGPT, alleging "theft and misappropriation of personal data" through the same kind of data-scraping operation. The large language models behind AI chatbots need huge amounts of training data to become conversational and intelligent. Both Bard and ChatGPT rely on large language models to work, which has raised concerns about the use of private data as well as copyright infringement.

The most recent lawsuit says Google has misappropriated datasets like the one from Common Crawl, a non-profit that makes its data free for research and education purposes, as well as data from sites like Medium and Kickstarter. Google also uses its own data from Gmail and Google Search to feed its models. Other scraped data includes copyrighted works, like e-books from digital libraries and even from piracy websites, that the company is using without compensating artists and authors.

The key to Clarkson's lawsuit is the issue of public domain. But "'publicly available' has never meant free to use for any purpose," the complaint said. Yes, some data is available to purchase, but whether it can be used depends on the context of its use and on user consent. Yes, users consent to privacy policies when they publish content on the web, but they have a right to know if it's being used somewhere else. In other words, Clarkson says, "Google must understand, once and for all: it does not own the internet."

Cecily Mauran

Cecily is a tech reporter at Mashable who covers AI, Apple, and emerging tech trends. Before getting her master's degree at Columbia Journalism School, she spent several years working with startups and social impact businesses for Unreasonable Group and B Lab. Before that, she co-founded a startup consulting business for emerging entrepreneurial hubs in South America, Europe, and Asia. You can find her on Twitter at @cecily_mauran.

