The Generative AI List of Lists: 5000 Models, Tools, Technologies, Applications, & Prompts

Maximilian Vogel
16 min read · Jan 16, 2024

A curated list of resources on generative AI. Updated April 20, 2024 — new models, new resources

This huge beast called Generative AI

A gentle whisper for the model, a booming wake-up call for humanity: the very first response from the freshly released ChatGPT on November 30, 2022, made one thing clear to everyone: generative AI is here, and it will change everything.

Let us dive into the wild world of genAI. Each section of this story comprises a discussion of the topic plus a curated list of resources, sometimes containing sites with more lists of resources:
20+: What is Generative AI?
95x: Generative AI history
600+: Key Technological Concepts
2,350+: Models & Mediums — Text, Image, Video, Sound, Code, etc.
350x: Application Areas, Companies, Startups
3,000+: Prompts, Prompt Engineering, & Prompt Lists
250+: Hardware, Frameworks, Approaches, Tools, & Data
300+: Achievements, Impacts on Society, AI Regulation, & Outlook

20+: What is Generative AI?

Let’s play the comparison game. If classic AI is the wise owl, generative AI is the wiser owl with a paintbrush and a knack for writing. Traditional AI can recognize, classify, and cluster, but it cannot generate new data of the kind it was trained on. Plus, classic AI models are usually focused on a single task. Their generative sisters, on the other hand, are pre-trained on giant amounts of data from a wide range of domains. They can build up general knowledge and use it to generate almost any output in their specific medium (text, image, sound, or other).

Traditional computing vs. traditional AI vs. generative AI

95x Generative AI history

Generative AI has a rich history, with early theories emerging from Leibniz, Pascal, Babbage, and Lovelace. This was preceded by the development of so-called automatons (robots and calculating machines) of all sorts (Yan Shi, Ctesibius, Heron of Alexandria, the Banū Mūsā brothers, Ismail al-Jazari).

The mathematical groundwork was laid in the 1940s and 1950s (Shannon, Turing, von Neumann, Wiener). The foundations for today’s generative language models were elaborated in the 1990s (LSTM, Hochreiter, Schmidhuber), and the whole field took off around 2018 (Radford, Devlin, et al.). Major milestones of the last few years include BERT (Google, 2018), GPT-3 (OpenAI, 2020), DALL-E (OpenAI, 2021), Stable Diffusion (Stability AI, LMU Munich, 2022), ChatGPT (OpenAI, 2022), and Mixtral (Mistral, 2023), a Mixture-of-Experts LLM.

1x: Evolution of Generative AI
1x: Timeline of Generative AI
1x: Exciting, amazing, and sometimes a little bit spooky: Early precursors of LLMs
2x: AI timeline, and some striking data visualizations
90+: Current and past notable AI projects

Evolutionary Tree of Large Language Models: From Word2Vec to GPT-4. Image credit: Yang, Jingfeng et al.

600+: Key technology concepts of generative AI

300+: Deep Learning — the core of any generative AI model:

Deep learning is a central concept of traditional AI that has been adopted and further developed in generative AI. Complex ML problems can only be solved in neural networks with many layers. Incidentally, this also applies to cognitive processes and the brains of mammals (yes, that means us).

Deep learning neural network. Image credit (CC): BrunelloN

In an artificial neural network, a node represents a neuron, and a connection between nodes is a synapse, which transports information in one direction. Generative AI models usually have millions of neurons and billions of synapses (aka "parameters"). Current models do not use neurons built from silicon, but work with traditional computing algorithms on more or less traditional hardware (sometimes CPUs, usually GPUs/TPUs). In the code, the complete deep learning network is represented as matrices of weights. And — yes, I am trying to finally demystify generative AI — both learning and answer generation in all the magic models like ChatGPT can ultimately be broken down to matrix multiplication: good old high school algebra. Just much, much more of it, executed at lightning speed.

Matrix multiplication
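To make the "it's all matrix multiplication" point concrete, here is a minimal sketch of a single neural-network layer in plain Python: an input vector multiplied by a weight matrix, followed by a non-linearity. The numbers are invented for illustration; real frameworks do the same math on GPU-accelerated tensors.

```python
def matmul(a, b):
    """Multiply matrix a (m x n) by matrix b (n x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def relu(m):
    """A common non-linearity: clip negative values to zero."""
    return [[max(0.0, v) for v in row] for row in m]

# One input vector (1 x 3) flowing through a layer with a 3 x 2 weight matrix:
x = [[1.0, 2.0, 0.5]]
w = [[0.2, -0.4], [0.1, 0.3], [-0.5, 0.8]]
print(relu(matmul(x, w)))  # close to [[0.15, 0.6]]
```

A model like GPT-4 is, in essence, a very long chain of such layers with billions of learned weights.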

More on deep learning and neural network technology:
7x: Best machine learning courses
300+: Deep learning resources: Books, courses, lectures, papers, datasets, conferences, frameworks, tools by Christos Christofidis

200+: Foundation models, pre-training, fine-tuning & prompting

Generative AI is based on foundation models. Foundation models are huge models (billions of parameters), pre-trained on giant datasets (gigabytes or terabytes of data), and capable of performing a virtually unlimited number of tasks in their domain (text or image generation). Datasets for pre-training usually comprise all genres of data in the domain. For text: scientific papers, haikus, spreadsheets, encyclopedic contents, dialogs, laws, manuals, invoices, screenplays, textbooks, or novels. The pre-trained model is comparable to a super smart and knowledgeable high school graduate with a lot of basic knowledge and the ability to understand many languages, but with no specific qualification for a job. To prepare a model for a specific job, like answering questions in a support hotline for a specific product, you may use fine-tuning: additional training with a small dataset of contents for the specific task. More often than not, you just use the prompt to specify the job, provide data for the job, and format the response.

Pre-training, fine-tuning and prompting of large language models. Fine-tuning can be skipped for many tasks; pre-training and prompting is a must. Image Credit: Maximilian Vogel, Brain icon by Freepik

1x: Empowering Language Models: Pre-training, Fine-Tuning, and In-Context Learning
1x: Great video intro to GPT (Generative Pre-trained Transformer) technology by Grant Sanderson of 3Blue1Brown
1x: Deep dive into pre-training LLMs by Yash Bhaskar
1x: Tutorial: Fine-tune a large language model with code examples
200+: A curated list on fine-tuning resources
(see resources on prompting later in the story)

120+: Tokens, embeddings & vectors

ChatGPT is letter blind

Oh, that is not correct; it contains 89 characters including spaces and punctuation — not 83. Why does the smartest bot on earth fail on this simple counting task? A seven-year-old could do better!

ChatGPT, like any other language model, does not understand language, text, or characters. The ChatGPT model does not even get to see my prompt:

The prompt is first split up ("tokenized") into these 19 tokens:

Tokenization: Each coloured rectangle is one token.

Common English words are not split; they are single tokens. Less common words ("ChatGPT" was not common in the training material before the release of ChatGPT) and misspelled words ("inlcuding") are split into two or more tokens.
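The splitting idea can be sketched with a toy greedy tokenizer: at each position, take the longest piece found in a fixed vocabulary. Real tokenizers use algorithms like byte-pair encoding, and this tiny vocabulary is invented purely for illustration:

```python
# Hypothetical mini-vocabulary; real models use tens of thousands of tokens.
vocab = {"chat", "g", "p", "t", " is", " great"}

def tokenize(text):
    tokens, i = [], 0
    while i < len(text):
        # Try the longest possible substring starting at i first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character: fall back to one char
            i += 1
    return tokens

print(tokenize("chatgpt is great"))
# -> ['chat', 'g', 'p', 't', ' is', ' great']
```

Note how the unfamiliar "chatgpt" falls apart into several small pieces, while the common words survive as single tokens, just as described above.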

Each model uses a fixed vocabulary of tokens. Each token is then transformed into an embedding, a high-dimensional vector (often more than 1,000 dimensions), before the model gets to see it. Embeddings represent the semantic value of a token. For semantically similar tokens like king, queen, and prince, the vectors should be close together. Similarly spelled tokens like prince, price, and prance are not close if they have no semantic similarity. The embeddings are machine-generated based on the word or token neighbours in texts, not by a human meta-explanation of what a word means. So "king" could be close both to "throne" and to "checkmate", based on these two contexts in English texts. After the prompt is transformed into a sequence of embeddings — high-dimensional vectors representing tokens — these embeddings are fed to the language model and can then be processed.

Words, vectors, and embeddings. Here is a visualization with low-dimensional vectors. In reality, vectors in LLMs comprise hundreds of dimensions. Something hard to imagine for mere mortals. Source (CC): https://doi.org/10.1371/journal.pone.0231189.g008
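"Close together" has a precise meaning: cosine similarity between vectors. A minimal sketch with invented 3-dimensional embeddings (real LLM embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, invented for illustration only.
embeddings = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.8, 0.9, 0.1],
    "prance": [0.1, 0.0, 0.9],
}

# Semantically related words end up close together ...
print(cosine_similarity(embeddings["king"], embeddings["queen"]))   # high, near 1
# ... while similarly *spelled* but unrelated words do not.
print(cosine_similarity(embeddings["king"], embeddings["prance"]))  # low
```

In a real model, these vectors are learned from billions of token neighbourhoods rather than written by hand.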

The model doesn’t generate a full answer from these embeddings. No, no, no! It just generates (in ML lingo "predicts") the next token. After that, it takes the embeddings of the prompt and the first predicted token and predicts the second token of its answer … and so on.

Large language models generate a response token by token. For the model, its previously generated tokens are no different from the initial user input; they are all just input for the next generation step.

In the process of the generation of token after token, the models usually don’t know (and don’t need to know) where their own contribution to the ongoing flow of text really started. In my view, this is one of the weirdest features of the LLM technology.
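The token-by-token loop described above can be sketched in a few lines. The "model" here is a toy lookup table standing in for a real next-token predictor:

```python
# Hypothetical next-token table; a real LLM computes a probability
# distribution over its whole vocabulary at every step.
transitions = {
    "the": "cat",
    "cat": "sat",
    "sat": "down",
}

def predict_next(tokens):
    """Predict the next token from the sequence so far."""
    return transitions.get(tokens[-1], "<eos>")

tokens = ["the"]              # the user prompt, already tokenized
while True:
    nxt = predict_next(tokens)
    if nxt == "<eos>":        # end-of-sequence: stop generating
        break
    tokens.append(nxt)        # the model's own output becomes input

print(" ".join(tokens))       # -> the cat sat down
```

Note that the loop never distinguishes prompt tokens from generated tokens: everything in `tokens` is equally just input for the next step, which is exactly the weird feature described above.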

More on tokens, shmokens and embeddings:
1x: Word embedding tutorial
1x: Tokenization and token production explained by Bea Stollnitz
1x: Deep dive into the LLM architecture with a particular focus on tokens and embeddings by Vijayarajan Alagumalai
1x: Dense, sparse vectors & embeddings with code examples explained by James Briggs
120+: Embeddings in comparison, MTEB

10+: The transformer architecture

Almost all relevant language models are based on a technology called the transformer architecture. It would have been a great honour and even greater pleasure for me to discuss it here. Unfortunately, any attempt to describe it would exceed the scope of this introduction to gen AI.

1x: The key differentiator in transformer models: the attention mechanism. Super tricky, super well explained by Grant Sanderson of 3Blue1Brown
10x: I recommend the beautifully illustrated introduction to language generation concepts (from RNN to LSTM to all the concepts in the transformer architecture) by Giuliano Giacaglia to anyone who is not afraid of a well-dosed sip of complexity.
1x: Here’s the original paper from the Google team introducing the transformer concept — read it with awe
50x: Resources to study transformers
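For the impatient, the core computation behind the attention mechanism fits in a few lines: softmax(Q·Kᵀ/√d)·V. A toy sketch with a single query over three keys/values, all numbers invented for illustration:

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

d = 2                                    # embedding dimension (toy size)
q = [1.0, 0.0]                           # query vector for one token
keys   = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Score each key against the query, scaled by sqrt(d).
scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
weights = softmax(scores)                # how much attention each token gets
# The output is a weighted mix of the value vectors.
output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)]
print(weights, output)
```

Real transformers run this in parallel for every token, with many attention heads and learned projection matrices producing Q, K, and V.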

10x: Image Generation Technology: Latent Diffusion Models / Stable Diffusion

Latent diffusion models (LDMs) like Stable Diffusion work differently from large language models. The difference starts with training: while LLMs are trained on unlabelled data, LDMs are trained on text/image pairs. This is what allows image generation models to be prompted with text.

LDMs don’t process data directly in the vast image space but first compress the images into a much smaller but perceptually equivalent space, making the model faster and more efficient.

Latent diffusion pipeline architecture. Image credit: Rombach et al.

The image-generation process is counterintuitive. The model is not really drawing a visual, but removing noise from a random pixel distribution it uses as a starting point. The process is like that of a sculptor — removing all the unnecessary marble to reveal the David statue.

Denoising process to carve out the image. Image credit (CC): Benlisquare
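The sculptor analogy can be sketched as a loop: start from pure noise and repeatedly subtract the noise a predictor estimates. This is a heavily simplified stand-in for a diffusion model; the "noise predictor" below simply points toward an invented target instead of being a trained neural network:

```python
import random

random.seed(0)
target = [0.2, 0.8, 0.5]                  # the "image" we want (3 toy pixels)
x = [random.random() for _ in target]     # start from random noise

def predict_noise(x):
    # A real model is a trained neural network; this toy version just
    # returns the difference between the current state and the target.
    return [xi - ti for xi, ti in zip(x, target)]

for step in range(50):                    # iterative denoising
    noise = predict_noise(x)
    x = [xi - 0.2 * ni for xi, ni in zip(x, noise)]

print([round(v, 3) for v in x])           # close to the target
```

Each pass removes a little more noise, which is why diffusion models generate images over dozens of steps rather than in one shot.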

2350+: Models — Text, Image, Video, Sound, Code and Much More

1200+: Text — Large Language Models

Without a doubt, language is the most important application area for generative AI. And while it is raining dollars in any domain of AI, here the dollars are bigger. These are the most important LLMs:
3x OpenAI: GPT-4-turbo, GPT-3.5-turbo, ChatGPT — the models of the mother of invention, still the best in many aspects.
1x Mistral: Mixtral 8x7B — A high performing small model with Mixture-of-Experts architecture. From Paris with love.
3x: Anthropic: The Claude 3 model family — one of the best model families to date
2x Meta: Llama 2, Llama — Not very large (as measured in parameters), but high performing and open source.
1x Stanford University: Alpaca — another member of the Camelidae family and based on Llama. Surprisingly small (7B params).
2x: Google: Gemini, Palm 2
1x: TII (Abu Dhabi): Falcon 180B
Bloom: Bloomz, Bloom-Lora
1x: Aleph Alpha: Luminous supreme
1x: Baidu: Ernie Bot — China’s answer to ChatGPT with more than 100m registered users.
5x: Amazon: Titan models

More, more and still more LLMs:
100+: A list of major open source LLMs by Hannibal046
100+: Stanford’s HELM model list
1000+: A graphical overview of thousands of current and historic LLMs

120+: Image Generation Models and Tools

Perhaps not the most important domain of generative AI, but certainly the most enchanting.

Prompt on Midjourney: An impressive panda with a white hat and big eyes working as a magic librarian and sitting at a huge desk with card boxes in front of him. Stacks of books and scrolls next to him and behind him. Warm orange light through big windows. Butterflies in the air. Photorealistic, cinematic lights, exquisite details. In the style of Heavy Metal fantasy comics. Ultrahd, 32k

5x: CompVis / Stability.ai: Stable Diffusion 1, Stable Diffusion 2.1 — the top open source model
1x: Midjourney — love it!
1x: OpenAI: DALL-E 3 — love it too!
14x: Curated list of image creation models tested with the same prompt by Vinnie Wong
100+: List of image creation models and tools

15x: Code Generation Models and Tools

Code Generation tools support developers in writing, debugging, and documenting code and can be integrated into IDEs or other development tools.
1x: GitHub: Copilot — the most widely adopted code generation tool
1x: OpenAI: Codex, the model behind CoPilot
1x: Tabnine — open source AI code generation
1x: Salesforce: CodeT5 — open source, and read here how to fine-tune it
1x: Meta: Code Llama based on Llama 2
1x: Google: Codey — generation, completion & code chat
10x: AI code generation models by Tracy Phillips

17+: Speech Recognition (STT / ASR), Speech Generation (TTS) Models

There are now models for both transformation processes: Speech to text and text to speech.
1x: OpenAI: Whisper — one of the first huge foundation models in ASR
1x: RevAI ASR — the most accurate ASR
1x: Google is in the game now with Chirp ASR
3x: Top open source speech recognition models in comparison
1x: Meta: Voicebox voice generator (open source)
10x: Best AI voice generators

15x: Music Generation Models, Tools

Because I can

It is real fun to create a song with just a ten-word prompt.
1x: Harmonai — Community-driven and OS production tool
1x: Mubert — A royalty-free music ecosystem
1x: MusicLM — A model by Google Research for generating high-fidelity music from text descriptions.
1x: Aiva — Generate songs in 250 styles.
1x: Suno — Took me about 50 seconds to register, write a prompt and create my first shining masterpiece of elevator music
10x: Best AI music generators

18x: Video Generation (Text to Video Models)

Similar to image generation, video generation is often based on diffusion / latent diffusion models:
1x: OpenAI: Sora — many of the first reviewers got a mild form of exophthalmos when experiencing the capabilities of this model
1x: Google: Imagen video generation from text
1x: Synthesia — Generate a video in seconds
1x: DeepBrain AI: Creates video and even the scripts to create the videos
5x: Comparison of video creation AI by Artturi Jalli
10x: And still some more models

7x: Other Generative AI Models

Generative AI can be used in completely different domains, as long as there are similarly structured content formats (such as images and texts) and a gigantic database that can be used for pre-training.
1x: Robotics control. Google: RT-2 repository
2x: Molecule fold prediction: AlphaFold. Super interesting: here the foundation model and generative AI approach is used in a completely different domain, one with almost no touchpoints to media content like language or images. A startup with an application in drug creation: Absci
1x: Genomics: Building genome-scale language models (GenSLMs) by adapting large language models (LLMs) for genomic data
1x: Llemma — an open language model for mathematics
1x: AstroLLaMA — a foundation model for astronomy
1x: Antibiotics: Generative AI for designing and validating easily synthesizable and structurally novel antibiotics

1000+: GPT Store:

The GPT Store is OpenAI’s equivalent to an app store. It hosts thousands of custom GPTs based on GPT-4 and DALL-E: from personal prompt engineering tools to daily schedule assistants, presentation and logo design, task management, step-by-step tech troubleshooting, website creation and hosting, AI insight generation, explanations of board and card games, digital visionary painting, text-based adventure games, and more.
Access to the GPT Store is limited to ChatGPT Plus users (around $20 per month).
You can create your own GPT and offer it to other users.

OpenAI GPT Store

10+: Autonomous Agent AIs

Agent AIs are usually not models of their own but platforms that orchestrate different models (language, image generation, etc.) to perform complex, multimodal tasks. Usually, they employ large language models to plan the task execution and break it down into simple steps.
1x: AgentGPT
1x: AutoGPT
10x: Intro to agent AI and overview of agents

350x: Application Areas, Companies, Startups

Generative AI start-ups are mushrooming, and many established companies are building tools and applications in this area. An XXXL-sized thank you to everyone who has made the effort to map this area.

150+: Sequoia’s market map by target group & application area:

Image Credit: Sequoia Capital

8x: Generative AI market maps, landscapes, comparisons & timelines
100x: Top generative AI startup list by YCombinator
100x: Generative AI application areas from audit reporting to writing product descriptions

3000+: Prompts, Prompt Engineering & Prompt Lists

The prompt serves as the tool to control a model’s behaviour. Users can provide a description of the desired output to prompt most models, including those generating images, videos, or music.

Prompt (You = me) and response created by inference (ChatGPT).

Prompts can be so much more than just an instruction or question. They can comprise

  • few-shot examples (showing the model how to generate the output),
  • data (which the model should use to generate the output),
  • a conversation history (for multi-turn conversations),
  • an exact definition of an output format,
  • and much more.
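The components above can be assembled into a single prompt string. A minimal sketch in Python; the instruction, examples, and review text are invented for illustration:

```python
# Instruction: tells the model what the job is.
instruction = "Classify the sentiment of the review as POSITIVE or NEGATIVE."

# Few-shot examples: show the model how to generate the output.
examples = [
    ("Great battery life, love it!", "POSITIVE"),
    ("Broke after two days.", "NEGATIVE"),
]

# Data: the actual input the model should work on.
review = "The screen is gorgeous and setup took a minute."

prompt = instruction + "\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {review}\nSentiment:"   # output format: a single label

print(prompt)
```

Ending the prompt with "Sentiment:" nudges the model to complete the pattern with just the label, which is the output-format trick in its simplest form.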

Prompt engineering is the art of generating safe, exact, successful, efficient, & robust prompts.

The Automat Framework in the prompt engineering cheat sheet

250+: Hardware, Frameworks, Approaches, Tools & Data

Generative AI models are huge (they require a lot of memory) and need a lot of processor resources (an incredible number of FLOPs executed for training and still many for a single inference). So hardware is a game-changer in gen AI:
1x: Hardware: Generative AI hardware intro
15x: Overview on deep learning hardware with links to other resources
100x: Resources on processing units — CPU, GPU, APU, TPU, VPU, FPGA, QPU
1x: For some people, language processing units (LPUs) have become the latest craze in AI hardware: 10 times faster than GPUs/TPUs in LLM token prediction. Read about Groq’s LPU inference engine and test it.

3x: Generative AI frameworks facilitate the development of applications with language and other models: LangChain, LlamaIndex, and a comparison of LangChain and LlamaIndex

The LangChain ecosystem. Image credit: langchain docs

1x: RAG — Retrieval augmented generation is the key approach to let LLMs run with your data: Intro

10+: Vector databases store your data in gen AI applications and make them retrievable: Intro to vector DBs and top 6 DBs, & a few more
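The RAG idea from the two entries above fits in a few lines: find the document most similar to the question, then prepend it to the prompt. The documents and the word-overlap "similarity" below are invented stand-ins; real systems compare embedding vectors in a vector database:

```python
documents = [
    "Our support hotline is open Monday to Friday, 9am to 5pm.",
    "The warranty covers manufacturing defects for two years.",
    "Returns are accepted within 30 days with the original receipt.",
]

def similarity(a, b):
    # Jaccard word overlap as a crude stand-in for cosine similarity
    # of embeddings.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

question = "How long does the warranty cover defects?"

# Retrieval step: pick the most relevant document.
best = max(documents, key=lambda d: similarity(question, d))

# Augmentation step: stuff the retrieved context into the prompt.
prompt = f"Answer using only this context:\n{best}\n\nQuestion: {question}"
print(prompt)
```

The generation step would then send `prompt` to an LLM, which answers from the retrieved context instead of from its pre-training data alone.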

5x: Platforms providing models, resources to use and operate them: HuggingFace, Haystack, Azure AI, Google, Amazon Bedrock

150+: More resources on generative AI tools, frameworks and other contents

300+: Generative AI Achievements, Security & Privacy, Impacts on Society, AI Regulation, and an Outlook

40+: Achievements

Generative AI models — and here mostly OpenAI’s models — took the bar exam, the medical licensing exam, a verbal intelligence test (scoring an IQ of 147), the SAT college readiness test, and many more tests and exams.
30x: List of ChatGPT / GPT-4 achievements
10x: Here are some more tests gen AI passed, and also some where it failed

200+: AI security, privacy, AI TRiSM, explainability, hallucination control

AI TRiSM stands for Trust, Risk, and Security Management and comprises these fields:

More resources:
1x: OWASP AI Security and Privacy Guide
5+: AI security guidelines by Jiadong Chen
200+: More resources on AI security

25+: Impact on Society

Generative AI will have a deep impact on our society on different levels and at different time scales. Usually, we are prone to overestimate the short-term and underestimate the long-term impacts of new technologies.
1x: Macro-, meso- and micro-level impacts of generative AI
1x: Comprehensive paper on fields of impact of generative AI on systems on society
1x: The ILO on how it might affect quality and quantity of jobs
6x: Long-term impacts of AI on humanity
15x: Catastrophic AI risks
2x: Superintelligence, and why we should and how we can make sure that future AIs are aligned with humanity’s goals

Superintelligence

50+: AI Regulation

AI regulation will be necessary if only to define what is permitted in what form in the new fields of application, who is allowed to profit from which intellectual property and how, and who is liable for errors and damage. With its AI Act draft, the EU has started the competition for the toughest AI regulation with a bang. Many insiders hope that other jurisdictions will take a more measured approach and keep pace with current technologies (generative AI). In principle, the EU has issued a regulation that essentially addresses the capabilities of pre-generative models.

20+: The new EU AI Act and more resources
1x: US, EU & UK regulation approaches in comparison
30+: A list of the evolving AI regulation approaches around the world

1x: Outlook & the End:

As almost nobody (maybe not even the folks at OpenAI) had predicted how generative AI would take off in 2023, it is really hard to forecast how it will evolve in 2024 and the coming years. ZDNet’s Vala Afshar did a great job here. The best outlook for a journey into the unknown is a compilation of outlooks: an exciting overview of what the leading tech fortune tellers like IDC, Gartner, Forrester & Co. expect. Its half-life? A year? A few months? Or just weeks until a groundbreaking development sets us on a new trajectory again.

I am delighted to be on this journey with you! I hope that you were able to take something away from my story. I wish you many, many, many more insights and an insane amount of success in AI!

This is the end

Special thanks to Ellen John for supporting me with this story.


Maximilian Vogel

Machine learning, generative AI aficionado and speaker. Co-founder BIG PICTURE.