3 LLM Architectures

Abhinav Kimothi
3 min readJul 24, 2023

Transformers form the backbone of today's revolutionary Large Language Models (LLMs)

While LLMs like GPT-4, Llama 2 & Falcon seem to do an excellent job across a variety of tasks, an LLM's performance on a particular task is a direct result of its underlying architecture.

There are three variations of the transformer architecture that power different LLMs.

1️⃣ Autoencoders — In autoencoders, the decoder part of the transformer is discarded after pre-training and only the encoder is used to generate the output. The widely popular BERT and RoBERTa models are based on this architecture and perform well on sentiment analysis and text classification. These models are trained using a process called MLM, or masked language modeling.
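As a minimal sketch of an encoder-only model in action, the snippet below uses the Hugging Face transformers library (an assumption; the article does not name a specific toolkit) and the bert-base-uncased checkpoint to fill in a masked token, mirroring the MLM objective these models are pre-trained on.

```python
# Assumes the Hugging Face transformers library is installed (pip install transformers).
from transformers import pipeline

# BERT is an encoder-only (autoencoder-style) model pre-trained with masked language modeling.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Ask the encoder to predict the hidden token; [MASK] is BERT's mask placeholder.
for prediction in fill_mask("The movie was absolutely [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```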

2️⃣ Autoregressors — Modern LLMs like the GPT series and BLOOM are autoregressors. In this architecture, the decoder part is retained and the encoder part is discarded after pre-training. While text generation is the most natural use case for autoregressors, they perform exceptionally well on a wide variety of tasks, and most modern LLMs follow this design. These models are trained using a process called causal language modeling.
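A quick illustrative sketch of causal language modeling in practice: the example below uses the transformers text-generation pipeline with gpt2 (chosen here only as a small, openly available decoder-only checkpoint, not one prescribed by the article) to continue a prompt token by token.

```python
from transformers import pipeline

# GPT-2 is a decoder-only (autoregressive) model trained with causal language modeling:
# it predicts the next token given all the previous tokens.
generator = pipeline("text-generation", model="gpt2")

output = generator(
    "Transformers form the backbone of",
    max_new_tokens=20,   # limit how much new text is generated
    do_sample=True,      # sample rather than always picking the most likely token
)
print(output[0]["generated_text"])
```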

3️⃣ Sequence-to-Sequence — The genesis of the transformer lies in sequence-to-sequence models. These models retain both the encoder and the decoder and can be trained in multiple ways; one method is span corruption and reconstruction. They are best suited for language translation. The T5 and BART families of models are sequence-to-sequence models.
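To round out the three variants, here is a small sketch of an encoder-decoder model used for translation, again via the transformers pipeline API with t5-small (an illustrative checkpoint choice under the same assumptions as the earlier snippets).

```python
from transformers import pipeline

# T5 is an encoder-decoder (sequence-to-sequence) model; translation is a natural fit,
# since the encoder reads the source sentence and the decoder generates the target.
translator = pipeline("translation_en_to_de", model="t5-small")

result = translator("Transformers power the current generation of language models.")
print(result[0]["translation_text"])
```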

