Microsoft AI Research Division Accidentally Leaks 38TB of Internal Data

Microsoft’s artificial intelligence research division accidentally exposed 38TB of sensitive internal data via GitHub, according to a cybersecurity startup. The exposed data includes passwords, access keys, computer backups, and thousands of Microsoft Teams messages.

Wiz, a New York City-based cloud security company, spotted the exposed information back in June. Wiz regularly researches how companies accidentally expose their cloud-hosted data, and during one of those dives, it located a GitHub repository belonging to Microsoft. Called “robust-models-transfer,” the repository provides AI researchers with open-source code and AI models for image recognition. Anyone who visits the repository is instructed to download models using an Azure Storage link—but at the time of Wiz’s research, the URL was configured to provide access to Microsoft’s entire Azure Storage account.

In a blog post published Monday, Wiz says it found 38TB of internal data using Microsoft’s misconfigured link. This included “passwords to Microsoft services, secret keys, and over 30,000 internal Microsoft Teams messages from 359 Microsoft employees,” as well as backups of two employees’ personal computers. Worse, the SAS token (a string generated within Azure to provide access to storage resources) was configured to grant visitors full control permissions instead of read-only. As a result, anyone who used the link could have deleted or overwritten existing Microsoft files as they wished.

How a SAS token is configured. Credit: Wiz
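To make the two misconfigurations concrete, here is a minimal Python sketch that inspects the query string of a SAS URL and flags a token that grants more than read/list access or that is scoped to a whole storage account. It relies on the documented SAS fields sp (permissions), srt (account-level resource types), and se (expiry). The function name, the one-year expiry threshold, and the example URL are assumptions made for illustration; none of them come from Wiz or Microsoft.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs

# Permission letters treated as "read-only" in this sketch: read and list.
READ_ONLY = {"r", "l"}

def audit_sas_url(url, now=None):
    """Return a list of human-readable findings for one SAS URL (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    params = parse_qs(urlsplit(url).query)
    findings = []

    # sp = signed permissions, one letter per permission (r, w, d, l, ...).
    perms = set(params.get("sp", [""])[0])
    extra = perms - READ_ONLY
    if extra:
        findings.append("grants more than read/list: " + "".join(sorted(extra)))

    # srt only appears on account-level SAS tokens, which are not
    # limited to a single container or blob.
    if "srt" in params:
        findings.append("account-level token, not scoped to one container or blob")

    # se = signed expiry, an ISO 8601 UTC timestamp or date.
    expiry_raw = params.get("se", [None])[0]
    if expiry_raw is None:
        findings.append("no expiry field found")
    else:
        expiry = datetime.fromisoformat(expiry_raw.replace("Z", "+00:00"))
        if expiry.tzinfo is None:
            expiry = expiry.replace(tzinfo=timezone.utc)
        # The one-year threshold is an arbitrary choice for this sketch.
        if (expiry - now).days > 365:
            findings.append("expires more than a year from now")

    return findings

if __name__ == "__main__":
    # Made-up URL; the account name and signature are placeholders.
    example = ("https://example.blob.core.windows.net/models"
               "?sv=2020-08-04&ss=b&srt=sco&sp=rwdlacup"
               "&se=2051-10-06T07:09:59Z&sig=REDACTED")
    for finding in audit_sas_url(example):
        print(finding)
```

A check like this only reads what the token declares about itself. It cannot tell whether the token has been revoked or who has used it.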

Wiz believes the SAS token had been misconfigured since July 2020, when it was first committed to GitHub. On June 22, 2023, Wiz submitted its findings to Microsoft, which invalidated the SAS token two days later. Microsoft replaced the token on July 7, though it took until mid-August to complete its investigation into the leak’s potential impact.

“No customer data was exposed, and no other internal services were put at risk because of this issue,” Microsoft’s Security Response Center wrote in a statement published Monday. “The root cause issue for this has been fixed, and the system is now confirmed to be detecting and properly reporting on all over-provisioned SAS tokens.” No customer action is necessary following the exposure.

Microsoft has reportedly increased its usage of GitHub’s secret scanning service, which regularly looks for sensitive data that might have been committed to the site accidentally. Should the service locate a Microsoft secret, it will inform the company immediately.
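As a rough illustration of the idea behind secret scanning, the Python sketch below searches text for Azure Blob Storage URLs that carry a SAS signature (a sig= query parameter). It is a toy pattern match, not a description of how GitHub's service works; the regular expression and the function name are assumptions chosen for this example.

```python
import re

# Matches a blob storage URL that includes a sig= parameter. The match stops
# at the end of the signature value, so later parameters are not captured.
SAS_URL = re.compile(
    r"https://[a-z0-9]+\.blob\.core\.windows\.net/\S*?[?&]sig=[^&\s\"']+",
    re.IGNORECASE,
)

def find_sas_urls(text):
    """Return (line_number, matched_url) pairs for SAS-looking URLs (hypothetical helper)."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), 1):
        for match in SAS_URL.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits

if __name__ == "__main__":
    # Made-up sample text; the URL and signature are placeholders.
    sample = (
        "Download the models here:\n"
        'url = "https://example.blob.core.windows.net/models?sp=rl&sig=abc123"\n'
    )
    for lineno, url in find_sas_urls(sample):
        print(lineno, url)
```

Run before each commit, a scan like this could catch a token pasted into a README, though a real scanner would need many more patterns and some way to handle false positives.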
