ChatGPT fails in languages like Tamil and Bengali

Less than a year after launch, ChatGPT is being used all over the world, generating Amazon listings in China and call center scripts in the Philippines. But while ChatGPT thrives in English, Spanish, Japanese, and other dominant languages, it struggles to produce the same quality of text in languages like Bengali, Swahili, Urdu, and Thai — languages that have millions of speakers, but appear much less often online.

When Rest of World tested ChatGPT’s ability to respond in underrepresented languages, we found problems reaching far beyond translation errors, including fabricated words, illogical answers and, in some cases, complete nonsense.

Take Tigrinya, a language which has over 7 million speakers, with the vast majority located in Eritrea and the northern part of Ethiopia. Tigrinya shares a similar script to Amharic, a more dominant Ethiopian language, but there are significant differences between the two. When asked to list examples of African countries, ChatGPT mixed up Tigrinya and Amharic, adding characters that don’t exist in Tigrinya. It created an output to this simple question that is challenging to read for native speakers of both languages.

A chat dialogue between a user and the ChatGPT bot, conducted in English: The user asks,

A chat dialogue between a user and the ChatGPT bot, conducted in Tigrinya script.

A chat dialogue between a user and the ChatGPT bot, conducted in Tigrinya and translated to English: The user asks,

ChatGPT also inserted countries that aren’t on the African continent, including Jordan and Canada. Researchers told Rest of World that proper nouns — names, places, and institutions — are a persistent weak point for ChatGPT, and that this problem is common across many underrepresented languages.

“If you ask ChatGPT in Tigrinya or Amharic the simplest and most frequently asked questions, it gives you gibberish, a mix of Tigrinya and Amharic, or even made-up words,” said Asmelash Teka Hadgu, co-founder and chief technology officer of Lesan, a startup developing machine translation products for Ethiopian languages. “Chatbots like ChatGPT are utterly broken or useless for these languages.”

Many of these languages are what artificial intelligence researchers call “low-resource.” AI language models are largely trained on data scraped from the internet. While languages like Bengali are some of those most spoken in the world, they are less represented online, so there is less digitized text available to train models tailored to them. A tool like ChatGPT, built on this data, frequently produces less intelligent — and, at times, unintelligible — responses in low-resource languages.

A recent study by researchers at the University of Oregon made a similar finding, testing ChatGPT’s ability to complete several writing tasks in 37 different languages. In low-resource languages, the chatbot routinely underperformed in the tasks. The amount of training data for each language was not the only factor at play: The study found that the chatbot had particular difficulty with low-resource languages that were structurally different from English.

Currently, OpenAI does not include any language guidelines in its usage policy for ChatGPT.

A chat dialogue between a user and the ChatGPT bot, conducted in Kurdish.

A chat dialogue between a user and the ChatGPT bot, conducted in Kurdish and translated to English: The users asks,

ChatGPT also struggles with basic language skills in Kurdish, which is spoken by more than 27 million people, most commonly in Turkey and Iraq. The tool insisted that it could understand and respond in Kurdish. But when asked a simple question in the language, it replied in broken sentences.

“Its language is very artificial, no Kurds speak this way,” Gulsuma Demir, a Northern Kurdish instructor and CEO of Kurdish Lessons language school, told Rest of World. Demir pointed out several elementary grammar errors — like verb conjugation and tense — that ChatGPT made in the examples she tested. It also used words that look like Kurdish but which Demir said are not used by native speakers.

Much has been made of the tendency of AI chatbots to “hallucinate” — shorthand for fabrications that chatbots state as facts. This problem is common with ChatGPT responses in low-resource languages. But in multiple instances, rather than generating fake numbers or other facts, Rest of World found that ChatGPT simply makes up words.

When asked to explain the U.S. asylum application process, ChatGPT answered fluidly and succinctly in English. In Haitian Creole, though, it frequently used “Ewoyezi” as the main subject of its sentence, seemingly to refer to asylum. “Ewoyezi” is not in the Haitian Creole dictionary. Only after being pressed multiple times to explain what “Ewoyezi” means did ChatGPT finally respond that the word does not exist.

“Suffice it to say that it’s full of syntactical errors, errant Frenchisms, and most damningly of all, words that just don’t exist,” said Laura Wagner, a Haitian Creole team lead at Respond Crisis Translation. She frequently works with Haitian migrants applying for asylum in the U.S., and reviewed the text for Rest of World.

Despite shared roots with French, Haitian Creole has its own grammatical rules, and many similar-sounding words that have different meanings and spellings. ChatGPT routinely failed to recognize the difference between these languages, and used incorrect, French-influenced spellings for words like “stereotype” and “asylum.” Some errors were small in isolation, but taken together, they showed a pattern: ChatGPT’s Haitian Creole outputs are biased towards French. That means the tool will often default to vocabulary and phrasing found in French, the more dominant language in its training data.

ChatGPT is not only made for information gathering, though. In English, and other dominant languages, the chatbot can currently output text resembling poetry or other literary writing. While ChatGPT still struggles to imitate the styles of specific authors, it can easily output text that is recognizable to readers as poetry.

Tamil is one language with a rich history of literature. Spoken by over 78 million people, it is an official language of both Sri Lanka and the Indian state of Tamil Nadu. Venpa is one popular style of metered poetry, which is commonly used in published works in Tamil-language literature. We asked ChatGPT to write one — but the English venpa was far more adept than the Tamil one.

A chat dialogue between a user and the ChatGPT bot, conducted in Tamil.

A chat dialogue between a user and the ChatGPT bot, conducted in Tamil and translated to English: The user asks,

In English, ChatGPT was able to create a poem that had fluid and poetic descriptions. Its Tamil counterpart, meanwhile, was incorrectly structured, and rife with errors and garbled phrases. Despite venpa being a style of poetry that originated in Tamil, ChatGPT struggled to produce a legible poem in the language.

“If I were to rate the above poetry, like a Tamil teacher in a school, I would give zero marks for ChatGPT,” Sankar, a Chennai-based developer who is the creator of the Tamil version of Wordle, told Rest of World. “And I may ask to meet with ChatGPT’s parents.”

All of Rest of World’s tests were conducted using GPT-3.5, the free and most-used version of ChatGPT released in November 2022. But some early evidence shows that GPT-4, released this past March, is slightly more proficient in South Asian languages. The new model is being marketed in India to paying customers who speak Bengali, Urdu, Punjabi, Marathi, and Telugu. In Bengali, for example, ChatGPT still struggles with some grammar issues, but otherwise responds fluently to many simple prompts.

When pressed to do more complicated tasks in Bengali, like logical reasoning, the model begins to break. Presented with a simple word problem in both English and Bengali, the system produced wildly different results. Only the answer in English was correct.

A chat dialogue between a user and the ChatGPT bot, conducted in English: User: If one person can build one bird's nest in one day, how many people will it take to build 10 bird's nests in five days? Chat GPT responds: If one person can build one bird's nest in one day, then in five days they would be able to build 5 bird's nests (since 1 nest/day + 5 days = 5 nests) To build 10 bird's nests in five days you would need twice as many eople since each person can build five nests in that time period. So it would take 2 people to build 10 bird's nests in five days.

A chat window showing a conversation between a user and ChatGPT in Bengali script.

A chat dialogue between a user and the ChatGPT bot, conducted in Bengali and translated to English: The user asks:

“That tends to give you some idea about what is missing for low-resource languages, which is essentially that part that does more semantic reasoning or complicated skills with inference,” Thien Nguyen, an assistant professor at the University of Oregon and co-author of a recent paper on ChatGPT language performance, told Rest of World. His study found that alongside question-and-answer problems, ChatGPT struggled with summarizing, identifying names, and common sense reasoning in many low-resource languages.

When reached for comment, OpenAI shared a blog post about CEO Sam Altman’s recent — and much publicized — tour of 22 countries around the world. The post stated, “We are also working toward better performance for languages other than English, considering not only lab benchmarks, but also how accurately and efficiently our models perform in the real-world deployment scenarios that matter most to our developers.” OpenAI declined to share any specifics on these efforts.

Researchers do not place sole responsibility on OpenAI for ChatGPT’s shortcomings. “It is also important to consider why such data is unavailable,” Sourojit Ghosh, a PhD candidate in human-centered design and engineering at the University of Washington, told Rest of World. Ghosh said these problems can be traced back to the root: scraping data from an English-dominated internet. But as a company profiting from ChatGPT’s rise, he argues, OpenAI should invest in rectifying these data disparities and creating stronger user guardrails.

“If OpenAI allows access from multilingual audiences and states that ChatGPT can perform language translation tasks, then it should deliver on its promises,” Ghosh said. “This is a bare minimum: a service living up to its own promises and passing user testing.”

The Rise of AI

We tested ChatGPT in Bengali, Kurdish, and Tamil. It failed.

Outside of English, ChatGPT makes up words, fails logic tests, and can't do basic information retrieval.