The Turing Test for AI Is Far Beyond Obsolete

For more than 70 years, the Turing Test has been a popular benchmark for analyzing the intelligence of computers.
For nearly a decade, programmers have created AI that reportedly beats the Turing Test, while experts argue that the test is an imperfect benchmark of "true" intelligence.
Many tests and benchmarks have been proposed as replacements. The latest, called the AI Classification Framework, aims to measure many different types of intelligence beyond language and mathematics.

First proposed in 1950, the “Turing Test,” named after renowned British computer scientist Alan Turing, is a hypothetical framework to test the intelligence of an AI system. In this “imitation game,” as Turing originally described it, a human participant blindly asks questions to both a human and a computer. If the computer successfully tricks the questioner into thinking it's a human, then it has passed the Turing Test.
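The setup described above can be sketched in a few lines of Python. This is only an illustration of the protocol, not anything Turing specified in code: the function name `run_imitation_game`, the `Responder` and `Judge` types, and the labels "A" and "B" are all assumptions made for the example.

```python
import random
from typing import Callable, Dict, List

# Hypothetical types for this sketch: a responder answers one question,
# and a judge returns the label ("A" or "B") it believes is the machine.
Responder = Callable[[str], str]
Judge = Callable[[List[str], Dict[str, List[str]]], str]


def run_imitation_game(
    questions: List[str],
    human: Responder,
    machine: Responder,
    judge: Judge,
    rng: random.Random,
) -> bool:
    """Return True if the machine passes, i.e. the judge picks the wrong label."""
    # Hide the identities behind randomly assigned labels.
    labels = ["A", "B"]
    rng.shuffle(labels)
    hidden = {labels[0]: human, labels[1]: machine}
    machine_label = labels[1]

    # The judge only ever sees the questions and the labeled answers.
    transcript = {
        label: [hidden[label](q) for q in questions] for label in ("A", "B")
    }
    guess = judge(questions, transcript)
    return guess != machine_label
```

A single round like this says little on its own. Any serious evaluation would repeat it across many judges and conversations.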

The first attempt at passing the test came in the mid-1960s, when MIT computer scientist Joseph Weizenbaum designed a chatbot named Eliza to mimic a psychotherapist. In 2014, the first AI to reportedly pass the test (this is debated) was Eugene Goostman, a program designed to simulate the responses of a 13-year-old Ukrainian boy.

In the decade since, many more programs have purported to pass the Turing Test. Most recently, Google’s AI LaMDA reportedly passed the test and, controversially, even convinced a Google engineer that it was “sentient.”

However, some argue that the test is far from perfect. Using language as a test for a neural network’s “intelligence” makes sense to some degree, as it is one of the hardest things for an AI system to imitate. But the main criticism is that the test ignores several other facets of “intelligence” that are just as critical as a human’s language ability, and that many chatbots have been designed specifically to fool people into thinking they're human. Eugene Goostman, for example, was designed so that English was the chatbot's second language, effectively hiding some of its awkward responses.

Proposals for amending or even replacing the Turing Test with something that more accurately captures true intelligence have been around for years. Just this week, a new test called the AI Classification Framework was proposed. It essentially makes the Turing Test and its language-testing ability only one part of an 8-part evaluation of an AI’s general intelligence.

Chris Saad, former head of product development at Uber and designer of the framework, took inspiration from 1983’s The Theory of Multiple Intelligences, an idea by psychologist Howard Gardner that intelligence isn’t just a monolithic construction but a tapestry of 8 separate intelligences.

These include logical-mathematical, linguistic-verbal, visual-spatial, musical-rhythmic, bodily-kinesthetic, interpersonal, intrapersonal, and existential. From there, AI is rated in each intelligence category on a 1-to-5 scale, with "1" being essentially non-existent or infant-like and "5" being super intelligence.
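To make the rating scheme concrete, here is a minimal Python sketch of how such a scorecard might be represented. It is an illustration only, not Saad's implementation: the class name `IntelligenceProfile`, its methods, and the choice to store an unrated category as `None` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The eight categories named in the article.
CATEGORIES = (
    "logical-mathematical",
    "linguistic-verbal",
    "visual-spatial",
    "musical-rhythmic",
    "bodily-kinesthetic",
    "interpersonal",
    "intrapersonal",
    "existential",
)


@dataclass
class IntelligenceProfile:
    """Hypothetical scorecard: one rating from 1 to 5 per category, or None if unrated."""

    name: str
    ratings: Dict[str, Optional[int]] = field(
        default_factory=lambda: {category: None for category in CATEGORIES}
    )

    def rate(self, category: str, score: int) -> None:
        # Reject anything outside the eight categories or the 1-to-5 scale.
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        if not 1 <= score <= 5:
            raise ValueError("score must be between 1 and 5")
        self.ratings[category] = score

    def unrated(self) -> List[str]:
        # Categories that have no rating yet.
        return [c for c in CATEGORIES if self.ratings[c] is None]
```

Keeping the categories separate, rather than averaging them into one number, reflects the framework's premise that intelligence is not a single quantity.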

“The theory challenged the traditional view of intelligence as a singular, fixed entity and opened up new avenues for exploring the diversity of human cognition,” Saad writes on TechCrunch. “While the theory of multiple intelligences has been subject to some criticism and debate over the years, it has had a significant impact on the field of psychology and education.”

Applied to the hit chatbot of the moment, the framework shows that ChatGPT clearly displays average human intelligence when it comes to logical-mathematical and linguistic-verbal intelligence, but essentially scores an N/A on everything else. While some have already argued that the chatbot has passed the Turing Test, under this new framework, ChatGPT has a long way to go before being considered truly “intelligent.”

It’s becoming increasingly clear that the AI of today is outgrowing a test designed in an era when the power and sophistication of today’s computers were completely unimaginable. Maybe today’s AI truly can pass the Turing Test, but it has a lot of studying to do if it wants to pass the final exam called human consciousness.

Darren Orf

Darren lives in Portland, has a cat, and writes/edits about sci-fi and how our world works. You can find his previous stuff at Gizmodo and Paste if you look hard enough.