Conversing with Documents: Unleashing the Power of LLMs and LangChain

Abhinav Kimothi
8 min read · Jul 7, 2023

Over the past few months, I’ve been captivated by the flood of apps claiming to be the ultimate “ChatGPT for your documents” on Product Hunt. The question that lingered in my mind was, “How do these apps actually work?” Curiosity led me down an exciting path of discovery, and I stumbled upon a framework that I think is revolutionizing the world of app development in the context of Large Language Models. It’s called LangChain.

As I delved deeper into the workings of LangChain, I discovered that creating such an app is not as daunting as it seems. In fact, it’s surprisingly achievable by combining three key workflows with the incredible power of the OpenAI API.

That being said, I humbly acknowledge that creating a software application is a very complex process, and since I am not a software developer, I have only explored this at a surface level; many nuances of software development remain unexplored. Let me begin with the context of the technique and then introduce the application.

Decoding the technique

  • Document Embeddings — First things first, we need to convert our documents into something called “embeddings”. Think of it as a fancy way of representing our documents in a language that the computer can understand. We use the document loaders provided by LangChain to load the documents and then, via embedding models, convert them to vector embeddings. Once we have our embeddings, we store them in a vector store for future searches.

Below is an example of extracting YouTube video transcripts and storing them in a FAISS vector store.

Note: If the document is short, instead of creating embeddings we can pass the text directly in the prompt. The entire document text then serves as the context.

#### extract_YT: extract the transcript text from a YouTube link
#### uses YoutubeLoader (youtube-transcript-api must be installed first)
#### parameters: "link" is the YouTube link
#### returns: words -> number of words, 0 -> placeholder for number of embeddings,
####          text -> extracted transcript, tokens -> number of tokens from tiktoken
def extract_YT(link):
    address = link                                     #### store the YouTube link
    loader = YoutubeLoader.from_youtube_url(address, add_video_info=True)
    document = loader.load()                           #### extract the transcript
    text = str(document[0].page_content)               #### convert the extracted text to a string
    words = len(text.split())                          #### count the words in the transcript
    tokens = num_tokens_from_string(text, encoding_name="cl100k_base")  #### count the tokens
    return words, 0, text, tokens                      #### words, embeddings placeholder,
                                                       #### extracted text and token count
#### create_embeddings: create embeddings from text using OpenAIEmbeddings and FAISS
#### parameters: "text" is the text to be embedded
#### returns: db -> FAISS vector store with the embeddings, num_emb -> number of chunks
#### embeddings are created once per input and only if the input text
#### is longer than 2500 tokens
@st.cache_data  #### cache the result to avoid re-embedding the same text
def create_embeddings(text):
    with open('temp.txt', 'w') as f:                   #### write the text to a temporary file
        f.write(text)
    loader = TextLoader('temp.txt')                    #### load the temporary file
    document = loader.load()
    #### split the text into overlapping chunks of up to 10,000 characters
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000,
                                                   chunk_overlap=2000)
    docs = text_splitter.split_documents(document)
    num_emb = len(docs)                                #### number of chunks to embed
    embeddings = OpenAIEmbeddings()                    #### OpenAI embedding model
    db = FAISS.from_documents(docs, embeddings)        #### build the FAISS vector store
    return db, num_emb                                 #### vector store and chunk count
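The helper num_tokens_from_string used above isn’t shown in the post. Below is a minimal sketch of how it could be written with tiktoken, followed by the short-vs-long decision described in the note earlier; the 2500-token threshold and the yt_link value are illustrative assumptions.

import tiktoken

def num_tokens_from_string(string, encoding_name="cl100k_base"):
    #### count tokens with the tiktoken encoding used by recent OpenAI models
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(string))

#### Putting the two functions together: embed only when the transcript is long
yt_link = "https://www.youtube.com/watch?v=XXXX"       #### any public video with captions
words, num, text, tokens = extract_YT(yt_link)
if tokens > 2500:
    db, num = create_embeddings(text)                  #### long input: build a FAISS index
else:
    db = None                                          #### short input: pass the raw text as context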
  • Context Establishment — Now, let’s talk about context. When a user asks a question, we need to understand the context and find the most relevant documents to provide accurate answers. We convert the user’s question into vector representations (again using LangChain’s embedding service). Then, using these vectors, we search through our library of embeddings and retrieve the most relevant documents.
#### search_context: search the vector store for the section most relevant
#### to the user's question
#### inputs: db -> the vector store with embeddings, query -> the question to be answered
#### returns: the page content of the most relevant section
def search_context(db, query):
    #### FAISS similarity_search returns chunks ordered by descending relevance
    defin = db.similarity_search(query)
    return defin[0].page_content                       #### most relevant section
  • Language Models at Work — Here comes the exciting part! We unleash the power of large language models. These super-smart models take the user’s question and the context we established earlier and generate precise answers. It’s like having a real conversation with our documents! These language models analyze the question, consider the context, and deliver the best response.
#### q_response: generate an answer to a question
#### inputs: query -> the question, doc -> the document/context to answer from,
####         models -> the model to be used for generating text
#### returns: text_final -> the generated answer
def q_response(query, doc, models):
    #### prompt asking the model to answer only from the provided context and to
    #### respond with '100' if the answer is not in the context
    prompt = (f"Answer the question below only from the context provided. "
              f"Answer in detail and in a friendly, enthusiastic tone. "
              f"If not in the context, respond with '100'\n"
              f"context:{doc}.\nquestion:{query}.\nanswer:")
    text, t1, t2, t3 = open_ai_call(models, prompt)    #### call the OpenAI wrapper
    try:
        if int(text.strip()) == 100:
            #### '100' means the answer is not in the context: answer without
            #### context and add a disclaimer
            text2, tx, ty, tz = open_ai_call(models, query)
            text_final = ("I am sorry, I couldn't find the information "
                          "in the documents provided.\nHere's the information "
                          "I have from the data I was pre-trained on-\n" + text2)
        else:
            text_final = text
    except ValueError:
        text_final = text                              #### the answer was found in the context
    return text_final                                  #### return the final answer
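The open_ai_call wrapper used in q_response isn’t shown in the post. Here is a minimal sketch of what it might look like, assuming the legacy Completion endpoint with the text-davinci model mentioned in the limitations below; the temperature and max_tokens values are illustrative.

def open_ai_call(models, prompt):
    #### assumed wrapper around the (pre-v1) openai Completion endpoint;
    #### returns the generated text plus prompt/completion/total token counts
    response = openai.Completion.create(
        model=models,                                  #### e.g. "text-davinci-003"
        prompt=prompt,
        temperature=0,
        max_tokens=512,
    )
    text = response["choices"][0]["text"].strip()
    usage = response["usage"]
    return text, usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"]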
A diagrammatic representation of the three workflows

Exploring the Development of an Application

With this grasp of the technique behind document-based conversations, let’s take a closer look at VIDIA.I — the app that brings this magic to life.

Overview

VIDIA.I integrates Streamlit, OpenAI API, and LangChain loaders and embeddings to deliver a seamless user experience. You will need an OpenAI API key to get started.
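As one illustration (not VIDIA’s exact code), the key could be collected in the Streamlit sidebar and passed to both the openai client and LangChain via the environment:

import os
import openai
import streamlit as st

api_key = st.sidebar.text_input("OpenAI API key", type="password")  #### ask the user for the key
if api_key:
    openai.api_key = api_key                           #### used by direct openai calls
    os.environ["OPENAI_API_KEY"] = api_key             #### picked up by OpenAIEmbeddings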

Asset Upload and Processing

VIDIA.I is designed to handle various asset types with ease. Whether it’s PDFs, web links, audio files, or more, you can upload them all to VIDIA.I for processing. For longer documents, embeddings are created.

Once uploaded, the app analyzes and processes these assets, enabling Q&A interactions, summarization, and extraction of valuable information. It’s like having your own document-savvy assistant right at your fingertips!

from langchain.document_loaders import YoutubeLoader
#### Import YoutubeLoader to extract transcripts from YouTube links
from langchain.document_loaders import TextLoader
#### Import TextLoader to extract text from text files
from langchain.document_loaders.image import UnstructuredImageLoader
#### Import UnstructuredImageLoader to extract text from image files
import pdfplumber
#### Import pdfplumber to extract text from PDF files
import pathlib
#### Import pathlib to get the file extension
import requests
#### Import requests to fetch the contents of a web link
from bs4 import BeautifulSoup
#### Import BeautifulSoup to parse the response from a web link
import openai
#### Import openai to transcribe audio files

'''Libraries for Embeddings'''
from langchain.text_splitter import RecursiveCharacterTextSplitter
#### Import RecursiveCharacterTextSplitter to split text into overlapping chunks
from langchain.embeddings.openai import OpenAIEmbeddings
#### Import OpenAIEmbeddings to create embeddings
from langchain.vectorstores import FAISS
#### Import FAISS as the vector store for the embeddings
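Building on these imports, here is a rough sketch of how an uploaded asset might be routed to the right loader based on its type. The extract_text function and its branches are illustrative assumptions, not VIDIA’s actual code.

def extract_text(source):
    #### route the asset to the appropriate extractor based on its type
    suffix = pathlib.Path(source).suffix.lower()
    if source.startswith("http") and "youtube.com" in source:
        return extract_YT(source)[2]                   #### YouTube transcript
    elif source.startswith("http"):
        soup = BeautifulSoup(requests.get(source).text, "html.parser")
        return soup.get_text(separator="\n")           #### web page text
    elif suffix == ".pdf":
        with pdfplumber.open(source) as pdf:
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    elif suffix in (".png", ".jpg", ".jpeg"):
        return UnstructuredImageLoader(source).load()[0].page_content
    elif suffix in (".mp3", ".wav", ".m4a"):
        with open(source, "rb") as audio:
            return openai.Audio.transcribe("whisper-1", audio)["text"]
    else:
        return TextLoader(source).load()[0].page_content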

Chat Away

Once the documents are uploaded and embeddings created (if needed), the chat window is enabled. You can ask questions related to the document.
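Conceptually, each question goes through retrieval and then generation. A minimal sketch of that step, assuming db and text come from the processing above and an illustrative model name:

question = st.text_input("Ask a question about the document")
if question:
    if db is not None:
        context = search_context(db, question)         #### retrieve the most relevant chunk
    else:
        context = text                                 #### short document: use the full text
    answer = q_response(question, context, "text-davinci-003")
    st.write(answer)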

Here, I have provided a link to a blog post on the OpenAI website. Let’s ask a few questions.

Let’s now ask a few general questions that aren’t part of the blog. VIDIA still answers them but adds a disclaimer saying that the information is not provided in the document. We can choose to take any action we like once we recognize that the question is out of context.

Document Summary, Questions, Talking Points

VIDIA also provides other information about the document that may be of interest, such as a summary, suggested questions, and talking points.
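The post doesn’t show the prompts behind these sections; here is a minimal sketch of how they might be generated with the same open_ai_call wrapper. The doc_insights name and the prompts themselves are assumptions.

def doc_insights(doc, models="text-davinci-003"):
    prompts = {
        "summary": f"Summarise the following document in a few sentences.\n{doc}",
        "questions": f"List five questions this document can answer.\n{doc}",
        "talking_points": f"List the key talking points of this document.\n{doc}",
    }
    #### return a dict with one generated section per prompt
    return {name: open_ai_call(models, p)[0] for name, p in prompts.items()}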

Try it out!

You can try out the app and play around with it here — https://abhinav-kimothi-vidia-i-srcmain-w0iybq.streamlit.app/

If you’re interested, check out the source code on GitHub.

Cautions

  1. The success of a system like this hinges not on the generative LLM itself but on how well the curated context is passed to it.
  2. Storing the data and metadata correctly is critical.
  3. Using the right embedding model makes a world of difference.
  4. Using the right vector database to reduce the search space and filter efficiently can improve the system.
  5. Like everywhere else, the system can hallucinate. Beware!

Limitations of VIDIA

  1. VIDIA uses the GPT-3 text-davinci model and can be upgraded.
  2. It is not yet a chat system and does not remember previous responses. That can be built using the GPT-3.5/4 chat endpoints.
  3. It can’t handle multiple documents or entire websites, yet.
  4. CSV/Excel/spreadsheets are out of scope for now.

Hope you find this useful. Do let me know what other techniques and applications you have come across for this use case.

