‘Don’t use AI detectors for anything important,’ says the author of the definitive ‘AI Weirdness’ blog. Her own book failed the test

Janelle Shane of AI Weirdness has been an authoritative voice on A.I. for nearly a decade.
Phil Walter/Getty Images for NZTE

Years before the rest of us heard about A.I., Janelle Shane was on the case. A scientist with a doctorate in engineering, she works for a research company that supplies complex, custom light-control systems to organizations like NASA, including applications on the International Space Station, and for nearly a decade she has had a side hustle as an authoritative voice on A.I. In 2019 she released a popular book, You Look Like a Thing and I Love You, and gave a widely watched TED talk about the hype and realities of A.I. She has also been keeping a blog, AI Weirdness, since 2016, and her latest post argues that A.I. detection tools shouldn’t be used for anything important, full stop.

In a blog post dated June 30, she highlighted an April study from Stanford University showing that A.I.-detection products, which are supposed to spot text written by a language model like ChatGPT, often misidentify A.I.-generated content as human-written. They are also strongly biased against writers who are non-native English speakers.

The finding, in brief, is that A.I. detection tools fail to spot content created by generative A.I., while disproportionately flagging, and effectively punishing, content written by non-native English speakers. In other words, A.I. thinks other A.I. is human, and it thinks humans writing in their second language are A.I.

“What does this mean?” Shane writes. “Assuming they know of the existence of GPT detectors, a student who uses AI to write or reword their essay is LESS likely to be flagged as a cheater than a student who never used AI at all.” 

Shane’s conclusion was stated in the headline of her blog post: “Don’t use AI detectors for anything important.”

“If you spell it out kind of clearly, it becomes so obvious that these tools have problems,” Shane told Fortune in an interview.

If these problems are intentionally or unintentionally being used to single out individuals, then maybe this isn’t a tool that should exist, she added.

Her post discussed the study’s finding that tools such as Originality.ai, Quill.org, and Sapling GPT, which are designed to detect text written by an A.I. language model, were “misclassifying writing by non-native English speakers” as A.I.-generated between 48% and 76% of the time, compared with 0% to 12% for native English speakers.

In simpler terms, the detectors often label non-native English speakers’ writing as A.I.-written, according to the study. The study also found it is easy to trick the detectors into rating content as human-written by having ChatGPT rework existing text, for example with the prompt “elevate the provided text by employing literary language.”

“We strongly caution against the use of GPT detectors in evaluative or educational settings, particularly when assessing the work of non-native English speakers. The high rate of false positives for non-native English writing samples identified in our study highlights the potential for unjust consequences and the risk of exacerbating existing biases against these individuals,” the authors of the study write.

“In some cases, AI is being used to decide really important things that really strongly affect people’s future, who gets a loan or who gets parole, and when these algorithms are wrong, or when they’re biased and can really hurt people, it can be a reason to not use AI in those applications,” Shane said.

Shane puts the detectors to the test

Shane wrote that she fed a portion of her own book into two detectors. One rated her writing as “very likely to be AI-written”; the other, which was used in the Stanford study with a supposedly 0% false positive rate for native speakers, rated it as “moderately likely” to have been written by A.I. The sentence judged “most likely to be written by a human” described an allegorical sandwich’s ingredients of “jam, ice cubes, and old socks,” Shane writes in the post.

When she prompted ChatGPT to “elevate the following text by employing literary language,” it spat out a bizarre, wordy rendition full of phrases like “interlocutor,” “forsaken hosiery,” and “sojourn.”

The A.I. detection tools gave that kind of writing a “likely written entirely by human” rating.

Shane then asked ChatGPT to rewrite her original text as a Dr. Seuss poem and in Old English. How did the detection tools rate those passages? As “more likely human-written” than the untouched text from her published book.

The responsibility, Shane told Fortune, is on the people marketing these detectors to honestly investigate how often they misidentify text and what the consequences of those errors are.

It’s already an issue in schools, many of which are likely using tools like these to deter cheating. In one known case, A.I. detection tools led to a student at the University of California being punished, and others have likely faced similar undeserved repercussions.

Shane said that when she posted a version of the entry on Tumblr, it drew a strong response, with more than 5,500 users weighing in on how problematic and frustrating these tools can be. “It was sort of disheartening to see, once I posted a version of that on Tumblr, I got all these re-blogs and comments from people who had their papers flagged and they knew they hadn’t cheated, just to see how widespread the use seems to be and how it’s affecting the students right now,” she said.

Whether or not using ChatGPT counts as plagiarism, it’s certainly a new area of concern for linguists, professors, and writers alike. And even with the study’s limitations, both Shane and the Stanford researchers are calling for action.

“Ongoing research into alternative, more sophisticated detection methods, less vulnerable to circumvention strategies, is essential to ensure accurate content identification and fair evaluation of non-native English authors’ contributions to broader discourse,” the authors of the study write.

[This article has been updated to include comments from an interview with Janelle Shane.]
