Understanding the Power of Large Language Models

Introduction to LLMs

Pragati Baheti
Heartbeat


LLM in the sphere of AI

Large language models (often abbreviated as LLMs) are a type of artificial intelligence (AI) model, typically based on the deep learning architecture known as the transformer, and trained on massive amounts of text data. The end goal of such a model is to understand and generate human-like text.

Large language models such as GPT-3 (Generative Pre-trained Transformer 3), BERT, XLNet, and Transformer-XL are trained on various sources of text publicly available on the internet. During this training process, these language models learn to predict the next word in a given context (or, for models like BERT, masked words), building a comprehensive knowledge of grammar, language structure, and contextual and semantic relationships.

Large language models can be used for a wide range of language-related tasks: powering chatbots that hold contextual conversations with users, summarizing documents by extracting the meaningful information from massive bodies of text, driving translation engines, and more.

Large language models have gained considerable attention and popularity due to their impressive capabilities and potential applications, even more so since the launch of ChatGPT, an advanced language model developed by OpenAI.

General Working of Language Models

The typical architecture for language models is based on the transformer model, which has proven to be highly effective in capturing contextual relationships in text. The foundation of many cutting-edge language models stems from the transformer architecture, first introduced in the influential paper “Attention Is All You Need” by Vaswani et al. in 2017.

This architecture comprises two primary components: the encoder and the decoder. Which component a model uses depends on its goal: encoder-only models such as BERT use just the encoder to derive the context of language, while generative models such as GPT use only the decoder.

Transformer model used for Language-based tasks [Source]

The encoder consists of a stack of identical layers, each containing two sub-layers: a multi-head self-attention mechanism and a position-wise feed-forward neural network. The self-attention mechanism allows the model to weigh the importance of the various words in the input sequence based on their relevance to each other, which helps the model capture contextual associations effectively.
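To make the self-attention step concrete, here is a minimal sketch of scaled dot-product self-attention in plain NumPy. It is a simplified single-head version with random placeholder weights: real transformer layers use multiple heads over slices of the model dimension and learn the projection matrices end-to-end.

```python
# A single-head, scaled dot-product self-attention sketch (random weights,
# no training). Real multi-head attention runs several of these in parallel
# and concatenates the results.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """X: (seq_len, d_model); W_q, W_k, W_v: learned projections."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    # Each token scores every other token for relevance, scaled by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = softmax(scores)        # each row sums to 1
    return weights @ V               # context-weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                            # 4 tokens, d_model=8
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)          # (4, 8)
```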

The position-wise feed-forward neural network processes the information from the self-attention layer, applying a non-linear transformation to each position independently. It enables the model to capture more complex patterns and relationships within the sequence.

In addition to the encoder layers, the transformer architecture includes positional encoding, which provides information about the order or position of words in the input sequence. The positional encoding allows the model to understand the sequential nature of language.
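The sinusoidal positional encoding from "Attention Is All You Need" can be computed in a few lines: even embedding dimensions receive a sine wave and odd dimensions a cosine, with wavelengths that grow geometrically across the embedding.

```python
# Sinusoidal positional encoding from "Attention Is All You Need":
#   PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
#   PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]              # token positions
    i = np.arange(d_model)[None, :]                # embedding dimensions
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])          # even dims: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])          # odd dims: cosine
    return pe                                      # added to word embeddings

print(positional_encoding(4, 8).round(2))
```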

Overall, transformer architecture has revolutionized the field of natural language processing by enabling efficient and effective modeling of long-range dependencies and capturing complex contextual relationships in text. It has become the backbone of many successful language models, like GPT-3, BERT, and their variants.

Benefits of Using Language Models

1. Natural Language Understanding: Language models inherit strong natural language understanding from the underlying transformer architecture. Tools built on them, such as ChatGPT, have clearly shown they can understand and interpret natural language input and respond logically, drawing on the extensive knowledge gained from a diverse training corpus to enable more intuitive, human-like user interactions.

2. Scalability: Language models can handle many conversations in parallel, making them suitable for applications with high user-interaction demands, such as customer support or chat-based services. Applications like ChatGPT, developed by OpenAI, handle millions of requests spanning various conversations and geographic boundaries.

3. 24/7 Availability and Performance: Language models can operate around the clock, providing continuous support and interaction without human intervention. This makes them valuable for applications where real-time assistance is crucial and resources and budget are limited: once trained, these models produce near-instantaneous responses with minimal additional overhead.

4. Multilingual Support: Many language models can process, understand, and generate text in multiple languages, facilitating global interactions and expanding accessibility for users worldwide.

Challenges or Limitations of Using Language Models

1. Generating Convincing and Contextually Appropriate Responses: Language models can sometimes produce responses that are grammatically correct but lack coherence or fail to capture the intended meaning. Ensuring high-quality, contextually plausible answers remains a challenge, as Sobieszek et al. also highlight in this paper.

2. Biases and Inaccuracies: Language models can inadvertently generate biased or inaccurate information, far from the ground truth, reflecting biases in the training data they were exposed to. Addressing and mitigating these biases is an ongoing challenge.

3. Lack of Real-time Knowledge: Language models have a knowledge cutoff date and do not possess real-time or up-to-date information; ChatGPT, for example, knows nothing beyond its September 2021 cutoff. This can limit their ability to provide the latest and most accurate information, especially in rapidly evolving domains or for current events.

4. Ethical Considerations: There are ethical concerns around using language models for conversations: the potential for malicious use, the responsibility of handling harmful, inappropriate, or sensitive content (credit card information, questions about terrorism, malicious content such as cyberattacks and plagiarism), and the impact on human-to-human interaction in certain prohibited domains.

Real-World Use Cases

Adaptations of large language models to serve multiple use cases [Source]

1. Customer Support: Large language models can be employed in customer support systems, such as chatbots, to handle frequently asked questions, provide troubleshooting assistance, and guide users through various processes automatically.

2. Virtual Assistants: Language models can serve as the foundation for virtual assistants, enabling them to understand and respond to user queries, perform tasks, and provide personalized recommendations. Virtual assistants are extensively used by financial institutions, government portals, travel booking sites, etc., to serve customized content that best suits your needs.

3. Content Generation: Language models can assist in generating content for various purposes, such as writing articles, producing creative pieces, or suggesting social media posts. LLMs are a one-stop solution for summarizing posts, generating headlines for news channels, translating content into different languages, and more.

4. Language Tutoring: Language models can simulate conversations and aid in language learning by providing feedback, answering questions, and engaging in interactive language practice.

5. Interactive Chatbots: Language models can power chatbots used in messaging platforms, websites, or applications, allowing users to have interactive and conversational experiences, from general queries to specific tasks.

These are just a few examples of the benefits, challenges, and real-world applications of using language models for interactive conversations. The field continues to evolve, and advancements are being made to address the challenges and maximize the potential of these models in real-world scenarios.

Transfer Learning — Train ChatGPT with Custom Data

One can fine-tune a language model like ChatGPT on specific datasets using transfer learning. Customizing involves re-training the model on domain-specific data while retaining the knowledge it gained during pre-training, expanding its capabilities even further. This approach is effective for adapting a language model to a specific domain or task.

Pre-training vs. Fine-tuning [Source]

Approach I — Using OpenAI API

With OpenAI’s ChatGPT, training your own chatbot with your custom data is easier than ever. So, let’s start with how it’s done.

Prerequisites

  1. Install Python
  2. Upgrade pip
  3. Install the libraries needed for training the ChatGPT model using pip: openai (the OpenAI library), gpt_index (an LLM framework to connect to our data and train further on it), gradio (an interactive UI for ChatGPT)
  4. Retrieve the API key from OpenAI
Snapshot of OpenAI site to get secret API keys [Source: Author]

5. Collect the data you want the model to train on. Here is a catch: the more data you use for training, the more tokens will be exhausted, and a free account has a one-time credit of $18 worth of tokens.

6. Create a Python script that reads the custom domain-specific text. Refer to this gist: train_chatgpt_v1.py (github.com)

7. Now run the script with python3 app.py (or whatever filename you saved it under), which will start training our custom chatbot; a minimal version of such a script is sketched below. The training duration can vary depending on the amount of data you provide. Once it completes, the console will print a link where you can test the fine-tuned model. A quick reminder: the questions and answers you exchange with the custom model also consume tokens from your OpenAI account.
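For reference, here is a minimal sketch of what such a script can look like. It follows the gpt_index (since renamed llama_index) and gradio APIs as they existed at the time of writing; the folder name docs and the index filename are placeholders, and class names may differ in newer library versions.

```python
# pip install openai gpt_index gradio
# A minimal version of the training script from step 6 (gpt_index API at the
# time of writing; the library has since been renamed llama_index).
import os
import gradio as gr
from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader

os.environ["OPENAI_API_KEY"] = "sk-..."  # the secret key from step 4

def construct_index(directory_path):
    # Read every file in the folder and build a vector index over it.
    documents = SimpleDirectoryReader(directory_path).load_data()
    index = GPTSimpleVectorIndex(documents)
    index.save_to_disk("index.json")

def chatbot(input_text):
    # Each query here also consumes tokens from your OpenAI account.
    index = GPTSimpleVectorIndex.load_from_disk("index.json")
    return index.query(input_text).response

construct_index("docs")  # "docs" is a placeholder folder of your custom data
gr.Interface(fn=chatbot, inputs="text", outputs="text",
             title="Custom-trained ChatGPT").launch(share=True)
```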

Approach II — Leveraging a Database for Prompt Engineering

Complete architecture of re-training ChatGPT using Prompt engineering [Source]

An effective alternative strategy is to leverage a specialized database to store and retrieve task-specific information and combine it with the knowledge of ChatGPT. A few techniques commonly used in prompt engineering are:

  1. Role prompting: we specify our role and the domain in which we expect the answer. For example, you can specify “I am looking for a scientific reason behind...” or “I am working as a data scientist...”
  2. Few-shot prompting: we provide some examples so that GPT learns the pattern from the prompts and answers in a similar style (see the sketch after this list).
  3. Chain-of-thought prompting: GPT can also yield better results if we ask it for the reasoning behind its answers, making it think along a particular chain of thought to avoid invalid answers.
  4. Generated knowledge: in a prior step, GPT is asked to generate potential background information about a prompt before it produces the final response. For example, first asking for the languages commonly used in the US, and then asking it to translate a sentence into all of those languages using chain-of-thought prompting.
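As a concrete illustration of role prompting and few-shot prompting together, here is a small sketch using the openai Python library’s chat API as it existed at the time of writing (pre-1.0); the example questions and answers are made up for demonstration.

```python
# Role prompting (system message) plus a few-shot example that shows the
# answer style we want GPT to imitate. Uses the pre-1.0 openai library.
import openai

openai.api_key = "sk-..."

messages = [
    {"role": "system",
     "content": "You are a data scientist. Give short, scientific reasons."},
    # Few-shot example (hypothetical, for demonstration):
    {"role": "user", "content": "Why does gradient descent use a learning rate?"},
    {"role": "assistant",
     "content": "It scales each update so training converges instead of overshooting."},
    # The real question, answered in the same pattern:
    {"role": "user", "content": "Why do we normalize input features?"},
]

response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
print(response.choices[0].message.content)
```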

SingleStoreDB is an example of such a specialized database: it can process substantial volumes of data and is an optimal choice for real-time data querying.

This approach is constrained by the model’s context window: at the time of writing, prompts sent to the GPT APIs could contain only about 8,000 tokens, or roughly 32,000 characters.

The architecture of this method introduces an intermediary layer, like middleware, between the user’s query and the OpenAI APIs. Consequently, utmost attention must be given to querying the custom database precisely and efficiently, and to engineering the prompt, before forwarding the modified prompt to GPT-4. I was inspired by the method Madhukar Kumar highlighted in this article on adapting ChatGPT to answer questions specific to a company.

Steps to Utilize Custom Data with ChatGPT:

1. Create a database as per your choice to store the domain-specific data.

2. Create a table in your database with the desired schema. For instance, you can create a sample table named “embeddings” containing a text column to be indexed.

3. Use OpenAI’s embeddings API to generate an embedding for each entry and store it in a separate column of the table.

4. Once the embeddings are generated, use the database’s built-in, highly parallelized DOT_PRODUCT vector function to retrieve the content from the custom database most relevant to the user’s prompt.

5. After obtaining the matching text from the custom database, incorporate it into the prompt before dispatching it to OpenAI, as shown in the sketch below. This results in a response tailored to your specific dataset.
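Here is a rough end-to-end sketch of steps 3 through 5. It assumes a SingleStoreDB table named embeddings with a text column and a vector column, reachable through the standard pymysql MySQL client, and uses the pre-1.0 openai library; JSON_ARRAY_PACK and DOT_PRODUCT are SingleStoreDB functions for packing a JSON array into vector format and scoring similarity. The connection details and the sample question are placeholders.

```python
# Sketch of steps 3-5: embed entries with OpenAI, retrieve the best matches
# with SingleStoreDB's DOT_PRODUCT, and fold them into the final prompt.
# Assumes a table `embeddings(text TEXT, vector BLOB)` and the pre-1.0
# openai library; connection details below are placeholders.
import json
import openai
import pymysql

openai.api_key = "sk-..."
conn = pymysql.connect(host="localhost", user="admin",
                       password="...", database="mydb")

def embed(text):
    out = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return out["data"][0]["embedding"]

def most_relevant(question, k=3):
    vec = json.dumps(embed(question))
    with conn.cursor() as cur:
        # JSON_ARRAY_PACK converts the JSON array into SingleStore's packed
        # vector format so DOT_PRODUCT can score every stored row against it.
        cur.execute(
            "SELECT text, DOT_PRODUCT(vector, JSON_ARRAY_PACK(%s)) AS score "
            "FROM embeddings ORDER BY score DESC LIMIT %s", (vec, k))
        return [row[0] for row in cur.fetchall()]

question = "What is our refund policy?"       # placeholder user prompt
context = "\n".join(most_relevant(question))
answer = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Answer using only this context:\n" + context},
        {"role": "user", "content": question},
    ])
print(answer.choices[0].message.content)
```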

Kudos!! We have come far 😄 We started with what large language models are and how they are trained using the attention mechanism and positional encoding of words in a sequence, then moved on to the benefits and challenges that come with them. Furthermore, we can now easily re-train effective language models like OpenAI’s ChatGPT to adapt them to custom needs using the two techniques discussed above!!

Editor’s Note: Heartbeat is a contributor-driven online publication and community dedicated to providing premier educational resources for data science, machine learning, and deep learning practitioners. We’re committed to supporting and inspiring developers and engineers from all walks of life.

Editorially independent, Heartbeat is sponsored and published by Comet, an MLOps platform that enables data scientists & ML teams to track, compare, explain, & optimize their experiments. We pay our contributors, and we don’t sell ads.

If you’d like to contribute, head on over to our call for contributors. You can also sign up to receive our weekly newsletter (Deep Learning Weekly), check out the Comet blog, join us on Slack, and follow Comet on Twitter and LinkedIn for resources, events, and much more that will help you build better ML models, faster.
