Making Data Useful

What is synthetic data?

A field guide to the various species of fake data: Part 1

Cassie Kozyrkov
5 min read · Jun 30, 2023

Synthetic data is, to put it bluntly, fake data. As in, data that’s not actually from the population you’re interested in. (Population is a technical term in data science, which I explain here.) It’s data that you’re planning to treat as if it came from the place/group you wish it came from. (It didn’t.)


Chief Decision Scientist, Google. ❤️ Stats, ML/AI, data, puns, art, theatre, decision science. All views are my own. twitter.com/quaesita