An Exhaustive List of Open-source Generative AI Models in 2023

Haziqa · Published in Heartbeat · 8 min read · Aug 10, 2023


Photo by Milad Fakurian on Unsplash

Introduction

With advanced models like Generative Pre-trained Transformer 3 (GPT-3) providing human-like responses to user queries, AI is progressing toward generative tools that create realistic content, including text, video, images, and audio.

And with open source becoming the norm, many AI models are now available for public use, research, and experimentation. As such, we explore the most recent open-source generative AI models that demonstrate the ever-expanding applications of AI.

1. LLaMA 2

Image Source: Meta

Within the text generation domain, Large Language Model Meta AI (LLaMA) is a breakthrough that, according to Meta, rivals OpenAI's ChatGPT in safety and quality. LLaMA 2 comes in four sizes, with 7, 13, 34, and 70 billion parameters (the 34-billion-parameter variant was not publicly released). Although these parameter counts are smaller than those of the more recent GPT-4, which reportedly combines eight 220-billion-parameter expert models, LLaMA 2 was trained on a much larger dataset than its predecessor and offers a 4,096-token context window.

And like other Large Language Models (LLMs), LLaMA 2's most significant use case is building superior chatbots that provide relevant answers to a wide range of user prompts. Enterprises can download the weights directly onto their servers to build customer-centric applications that help visitors engage with their businesses more effectively.

But the real game-changer is how LLaMA 2 manages the trade-off between safety and helpfulness. A chatbot tuned purely for helpfulness will answer almost any question, including dangerous ones, such as requests for instructions to harm someone.

LLaMA 2 changes the landscape by incorporating two reward models to control its responses optimally. One model rewards LLaMA 2 based on how helpful a response is, while the other rewards it based on how safe it is. This is part of the Reinforcement Learning from Human Feedback (RLHF) approach, where the reward models stand in for humans assessing the quality of LLaMA 2's responses. In effect, the model learns to maximize the reward and improve its output.

If LLaMA 2 assesses a prompt as dangerous, it switches to the safety reward model and generates an appropriate response. For other prompts, it uses the helpfulness reward model. As such, LLaMA 2's architecture is revolutionary and paves the way for AI to interact more safely with the real world.
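To make the idea concrete, here is a minimal Python sketch of how a two-reward-model scoring step might look during RLHF. The function names and the safety check are illustrative assumptions for exposition, not Meta's actual training code.

# Illustrative sketch of routing between two reward models during RLHF.
# `helpfulness_rm`, `safety_rm`, and `is_safety_prompt` are hypothetical
# stand-ins; Llama 2's real training pipeline is not public in this form.
def score_response(prompt, response, helpfulness_rm, safety_rm, is_safety_prompt):
    """Return the scalar reward used to update the policy model."""
    if is_safety_prompt(prompt):
        # Safety-sensitive prompts are scored by the safety reward model.
        return safety_rm(prompt, response)
    # All other prompts are scored by the helpfulness reward model.
    return helpfulness_rm(prompt, response)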


2. BLOOM

Image Source: Bloom

Yet another innovation in the text generation space, the BigScience Large Open-science Open-access Multilingual Language Model (BLOOM) is a multilingual language model built by the BigScience research workshop, coordinated by Hugging Face, that can also tackle mathematical and programming problems.

With 176 billion parameters, BLOOM supports 46 natural languages and 13 programming languages. However, running BLOOM on a local machine is impractical for most users, since the full model requires hundreds of gigabytes of memory.

Its architecture is similar to GPT-3: a decoder-only transformer that predicts the next token using a stack of 70 transformer blocks. Each block combines a self-attention layer with a multi-layer perceptron, and the input text enters the stack in the form of word embeddings.

The model has several use cases: it can solve arithmetic problems, translate between languages, generate code, and produce general content per user requirements. Users can also deploy it in production through the Hugging Face Accelerate library, which simplifies distributed training and inference.
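As a minimal sketch of what this looks like in practice, the snippet below loads a small BLOOM checkpoint through the Transformers library and generates a continuation. The 560-million-parameter variant is used here only because the full 176-billion-parameter model is far too large for a typical machine.

# Text generation with a small BLOOM checkpoint via Hugging Face Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

inputs = tokenizer("Translate to French: Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))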

As one of the few open models with more than 100 billion parameters, BLOOM extends AI's boundaries, providing accurate and relevant responses within an easy-to-implement framework. And being open-source, users can fine-tune the model through the Hugging Face Transformers library and expand its applications in various fields, such as education, eCommerce, and research.

3. MPT-30B

Image Source: MosaicML

MosaicML recently launched its MosaicML Pretrained Transformer (MPT) 30B language model, which outperforms several other LLMs, such as the original GPT-3, StableLM-7B, and LLaMA-7B. It's an open-source, decoder-only transformer model that improves upon the previous version, MPT-7B.

As the name suggests, the model consists of 30 billion parameters, with a context window of 8,000 tokens, meaning it can process fairly long word sequences and still generate appropriate responses.

It also uses the Attention with Linear Biases (ALiBi) technique, which replaces positional embeddings with a distance-based penalty on attention scores, enabling the model to extrapolate to sequences longer than the 8,000 tokens seen in training. This feature makes MPT-30B highly valuable in the legal domain, where experts may use it to analyze long contracts with complex legal diction.
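For intuition, here is a minimal sketch of how ALiBi biases attention scores. The head count and sequence length are arbitrary illustrative values.

# ALiBi sketch: instead of positional embeddings, each attention head adds
# a linearly growing penalty to attention logits based on query-key distance.
import torch

def alibi_bias(num_heads, seq_len):
    # Head-specific slopes form a geometric sequence (e.g., 1/2 ... 1/256 for 8 heads).
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    positions = torch.arange(seq_len)
    distance = positions.view(1, -1) - positions.view(-1, 1)  # (j - i), negative for past keys
    # Bias shape: (num_heads, seq_len, seq_len); added to logits before softmax.
    return slopes.view(-1, 1, 1) * distance.view(1, seq_len, seq_len)

bias = alibi_bias(num_heads=8, seq_len=16)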

In addition, MPT-30B-Instruct is a purpose-built variant of MPT-30B that effectively follows user instructions given as input prompts. This variant is useful where users want the model to precisely follow a set of instructions.

In contrast, MPT-30B-Chat is a conversational variant that generates relevant, human-like responses. It also performs well when generating code in several programming languages and reportedly beats other code generators, such as StarCoder-GPTeacher, on the HumanEval benchmark. However, MPT-30B-Chat is not licensed for commercial use.

A defining development of MPT-30B is that, according to MosaicML, it's the first publicly known LLM trained partly on NVIDIA's H100 GPUs, which increased per-GPU throughput by 2.44 times.

4. DALL-E Mini

Image Source: Craiyon

With the increasing popularity of AI text generators, several text-to-image models are also emerging, with advanced architectures producing realistic visuals.

DALL-E by OpenAI is a text-to-image model built as a 12-billion-parameter offshoot of GPT-3. The company unveiled the original version on January 5, 2021, and followed it with DALL-E 2, which opened to the public on September 28, 2022 and offers better speed and image resolution.

While DALL-E 2 itself is closed-source, DALL-E Mini, now known as Craiyon, is an independent, open-source reimplementation of the idea that generates simple images from textual prompts read through a bidirectional encoder.

The model features a transformer neural network built around the attention mechanism, which lets the network weigh the most significant parts of a given sequence. Attention helps the model produce more accurate results and draw better connections between abstract elements, yielding unique images.
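For readers unfamiliar with the mechanism, here is a minimal sketch of scaled dot-product attention, the core operation inside each transformer block. The tensor shapes are illustrative.

# Scaled dot-product attention: each position's output is a weighted average
# of the value vectors, weighted by how strongly its query matches each key.
import math
import torch

def attention(q, k, v):
    # q, k, v: (sequence_length, head_dim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

out = attention(torch.randn(16, 64), torch.randn(16, 64), torch.randn(16, 64))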

Craiyon and the more advanced DALL-E versions are invaluable in industries like fashion, where companies showcase many outfits and products. With image-generating technology, they can conveniently produce relevant photos without hiring expensive models and other professional staff.

With the ability to create entirely new images of animals, humans, nature, and even fantastical creatures, the DALL-E line of image generators can comprehend abstract textual descriptions and produce several variations by combining distinct concepts, extending human creativity to new levels.

5. Stable Diffusion

Image Source: Stability.ai

Boasting much faster speed and more realistic images, Stable Diffusion is a more sophisticated model that uses textual prompts to produce high-quality visual art.

As the name suggests, Stable Diffusion uses a diffusion model to create images. A diffusion process has two directions: forward diffusion, where the model gradually adds random noise to an image, and reverse diffusion, where it learns to remove that noise step by step to recover the image.

The architecture features a noise predictor that takes a text prompt and random noise in latent space, a low-dimensional space holding compressed representations of an image.

Next, the model subtracts the predicted noise from the latent image and repeats this step several times. Finally, the decoder of a variational autoencoder (VAE) maps the denoised latent back into an actual image.

While DALL-E 2 is also built on a diffusion model, it is slower than Stable Diffusion. Also, Stable Diffusion is open-source and lets users tweak several options through Stability AI's DreamStudio app, providing more control over how an image is generated.

For example, you can increase the number of steps for subtracting noise, provide different seeds and control the prompt strength.
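The open-source diffusers library exposes the same controls in code. In this sketch, the checkpoint name, step count, guidance scale, and seed are illustrative choices.

# Stable Diffusion via Hugging Face's diffusers library.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

generator = torch.Generator("cuda").manual_seed(42)  # fixed seed for reproducibility
image = pipe(
    "a photograph of a mountain lake at sunrise",
    num_inference_steps=50,  # number of denoising steps
    guidance_scale=7.5,      # prompt strength (classifier-free guidance)
    generator=generator,
).images[0]
image.save("lake.png")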

Stable Diffusion is suitable for users who want images that relate more to the real world. It is perfect for generating photographs, portraits, 3D-style images, and more.

6. AudioCraft

Image Source: Meta AI

Although still in its early stages, generative AI is extending into the audio domain, with technologies such as OpenAI's Jukebox and the Stability AI-backed Harmonai project for music generation. But the most recent advancement is AudioCraft by Meta, an open-source framework for text-to-audio generation.

AudioCraft can effectively take textual prompts, such as “rock music with electronic sounds,” and generate high-fidelity soundtracks without background noise. This is an impressive leap in generative AI, as many earlier models needed audio inputs to create short clips that were often low quality.

The technology comprises three models: MusicGen, AudioGen, and EnCodec. MusicGen is an autoregressive transformer model that creates music clips from textual prompts. AudioGen, in contrast, generates environmental sounds, such as a dog barking, a child crying, or whistling, from textual prompts.

But the real game-changer is Meta's neural audio compression codec, EnCodec, which lets the model learn discrete audio tokens (similar to word tokens in large language models) and build a vocabulary for music.

The audio tokens then feed into autoregressive language models that generate new tokens. Finally, the EnCodec decoder maps the generated tokens back into the audio space to produce realistic musical clips.

With EnCodec's compressed token representation, AI can finally model music tractably, despite its long sequences spanning many frequencies. To give some perspective, a song lasting a few minutes contains millions of raw audio timesteps, whereas the text sequences used to train LLMs run to mere thousands of tokens.
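As a quick sketch of the workflow, Meta's open-source audiocraft library exposes MusicGen directly. The checkpoint size, clip duration, and prompt below are illustrative choices.

# Text-to-music sketch with Meta's audiocraft library.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=8)  # seconds of audio per clip

wav = model.generate(["rock music with electronic sounds"])  # (batch, channels, samples)
audio_write("rock_clip", wav[0].cpu(), model.sample_rate, strategy="loudness")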

AudioCraft can help artists and other creative professionals conveniently generate unique soundtracks for videos, podcasts, and other forms of media, while experimenting with different melodies to speed up the production process.

Conclusion

With generative AI entering the scene, the technology landscape is changing rapidly, allowing businesses to find more cost-effective ways to run their operations. Of course, this means organizations must adopt AI quickly to stay ahead of the competition.

And the Comet Machine Learning (ML) platform will help you get up to speed by letting you quickly train, test, and manage ML models in production. The tool allows businesses to easily leverage ML’s power and boost productivity through its experiment-tracking features, interactive visualizations, and monitoring capabilities.

So, create a free account now to benefit from Comet’s full feature stack.

Editor’s Note: Heartbeat is a contributor-driven online publication and community dedicated to providing premier educational resources for data science, machine learning, and deep learning practitioners. We’re committed to supporting and inspiring developers and engineers from all walks of life.

Editorially independent, Heartbeat is sponsored and published by Comet, an MLOps platform that enables data scientists & ML teams to track, compare, explain, & optimize their experiments. We pay our contributors, and we don’t sell ads.

If you’d like to contribute, head on over to our call for contributors. You can also sign up to receive our weekly newsletter (Deep Learning Weekly), check out the Comet blog, join us on Slack, and follow Comet on Twitter and LinkedIn for resources, events, and much more that will help you build better ML models, faster.
