A Gentle Introduction to GPTs

You don’t need a PhD to understand the billion-parameter language model

Harsha S
4 min read · Feb 19, 2023

GPT (Generative Pre-trained Transformer) is a general-purpose natural language processing model that revolutionized the landscape of AI. It is one of the most significant milestones in the journey towards complex AI systems.

What is GPT-3?

GPT-3 is an autoregressive language model created by OpenAI and released in 2020. It is a statistical model that generates fluent, human-like text based on a probability distribution over sequences of words. It is intuitive to think of GPT-3 as a model that, given some initial text as input, predicts the next word based on the context.

OpenAI’s research paper on GPT-3, “Language Models are Few-Shot Learners”, was released in May 2020, and it showed that text generated by GPT-3 is nearly indistinguishable from text written by humans. Along with text generation, the model can also be used for text classification and text summarization.
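The “few-shot” in the paper’s title refers to showing the model a handful of worked examples directly in the input, rather than retraining it for each task. An illustrative prompt (the reviews below are made up for the example) might look like this, with the model expected to fill in the last label:

```
Review: "The movie was a waste of time." Sentiment: Negative
Review: "Absolutely loved the soundtrack." Sentiment: Positive
Review: "The plot kept me hooked till the end." Sentiment:
```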

The 175-billion-parameter model is capable of some incredible real-world applications; we’ll get to those soon. But first, let’s briefly touch upon Natural Language Processing and how language models work.

Natural Language Processing (NLP)

NLP is a subset of Artificial Intelligence concerned with helping machines understand human language. It combines techniques from computational linguistics, probabilistic modeling, and deep learning to make computers intelligent enough to grasp the context and intent of language.

NLP is a broad category that encompasses many different types of language processing tasks, including content generation, sentiment analysis, speech recognition, machine translation, text summarization, and question answering, to name a few.

As language is part of every interaction humans have with each other and with computers, NLP has seen some of the most exciting AI discoveries and implementations of the past decade.

Language Models

Language models come in different types and are a key component of NLP applications. Language modeling is simply the task of assigning a probability to a sequence of words in a specific language.

Simple statistical language models, e.g., n-grams, can look at a word and predict the next word most likely to follow it, based on statistical analysis of existing text sequences.

Different n-grams, or word combinations, can be used to build a simple probabilistic language model. This is done by counting the number of times each word combination occurs and dividing it by the number of times the previous word occurs.

The auto-complete feature on your smartphone is based on this principle. When you type “how”, the auto-complete will suggest words like “to” or “are”.
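To make the counting idea concrete, here is a minimal bigram sketch in Python. The tiny corpus and the query word “how” are purely illustrative; a real autocomplete model is estimated from vastly more text.

```python
from collections import Counter, defaultdict

# Toy corpus; a real model would be estimated from far more text.
corpus = "how to cook rice . how are you . how to learn python . how are things".split()

# Count how often each word follows the previous one (bigram counts).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_word_probs(word):
    """P(next | word) = count(word, next) / count(word)."""
    counts = following[word]
    total = sum(counts.values())
    return {nxt: c / total for nxt, c in counts.items()}

print(next_word_probs("how"))
# {'to': 0.5, 'are': 0.5} -- "to" and "are" are the likeliest continuations
```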

However, to predict the likelihood of longer sequences of words, a model needs to be trained on large amounts of data. It should be robust enough to deal with large vocabularies and able to capture broader context. Neural language models such as RNNs and Transformer networks are better suited here, as they can capture more complex language structure.
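As a rough illustration of a neural language model in practice, the openly released GPT-2 (a smaller predecessor of GPT-3) can be run in a few lines with Hugging Face’s transformers library. This is only a sketch with an arbitrary prompt, and GPT-2 stands in here because GPT-3 itself is not openly downloadable.

```python
from transformers import pipeline, set_seed

# GPT-2 is an openly released, smaller predecessor of GPT-3;
# it illustrates how a Transformer language model continues a prompt.
set_seed(42)
generator = pipeline("text-generation", model="gpt2")

result = generator("Language models are", max_new_tokens=20, num_return_sequences=1)
print(result[0]["generated_text"])
```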

GPT-3 as a Large Language Model (LLM)

Earlier language models were designed to perform one specific NLP task, such as text generation, summarization, or classification. GPT-3 changed this paradigm, as it was built to be a single system that could successfully handle any NLP task.

The objective of GPT-3 is to be a multi-purpose system that can provide state-of-the-art (SOTA) performance on any NLP task. Its architecture can handle an array of natural language processing tasks, but it gained more prominence for being a model that generates human-like text from a given input.
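As a sketch of how this multi-purpose behaviour is used in practice, here is how one could query GPT-3 through OpenAI’s Python library as it existed around the time of writing. The model name, prompt, and parameters below are illustrative; only the prompt needs to change to switch between tasks such as summarization, classification, or free-form generation.

```python
import openai  # pip install openai; requires an OpenAI API key

openai.api_key = "YOUR_API_KEY"  # placeholder, not a real key

# The same completion endpoint handles many NLP tasks; only the prompt changes.
prompt = "Summarize in one sentence:\n\nGPT-3 is an autoregressive language model released by OpenAI in 2020."

response = openai.Completion.create(
    model="text-davinci-003",  # a GPT-3 family model available at the time of writing
    prompt=prompt,
    max_tokens=60,
    temperature=0.7,
)
print(response["choices"][0]["text"].strip())
```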

GPT-3 is a successor to the earlier GPT-2 (released in February 2019) and GPT-1 (released in June 2018) models. As explained earlier, a better and more robust model has to be trained on a larger dataset. That is precisely the main difference between GPT-3 and its predecessors.

GPT-3 was trained on a massive corpus of text drawn from five datasets: Common Crawl, WebText2, Books1, Books2, and Wikipedia. That amounts to roughly 57 billion words of training text, and the resulting model has 175 billion parameters.

Along with English, GPT-3 can also generate text in European languages such as German, Spanish, and French, as well as Asian languages such as Japanese.

Applications of GPT-3

An average human might read, write, speak, and hear somewhere close to a billion words in an entire lifetime. GPT-3 was trained on roughly 57 times the number of words most humans will ever process (57 billion versus about 1 billion). These mind-boggling numbers demonstrate the scale of the GPT-3 model.

GPT-3 powers a range of amazing applications, where the model can be fine-tuned or prompted to serve the purpose of the application, from generating content such as blogs and articles to helping programmers with code generation.

This was a brief introduction to GPT-3 and its workings. Hope you liked it!
