How to use Hugging Face Pretrained Models and Streamlit to develop a Medical Diagnostic Assistant Chatbot

Alexander Roman
8 min read · Sep 1, 2023

Learn how to develop a Medical Diagnostic Assistant Chatbot using Hugging Face pre-trained models, Python and Streamlit.

1. INTRODUCTION

Throughout life, people face health problems, but many don’t have enough time or medical knowledge to recognize important warning signs. These problems often lead to unhealthy habits, especially when busy people don’t visit the doctor promptly. Waiting can make it harder to find out what’s wrong.

This article is about using new technology to tackle these issues and encourage better health choices. Chatbots are helpful computer programs that can talk to people and answer their questions anytime, anywhere. This article therefore focuses on building a medical diagnostic assistant chatbot using pre-trained models from Hugging Face. The chatbot provides information about diseases when people need it, aiming to address the health issues mentioned above, and it is deployed with Streamlit.

The article is organized into four sections:

  • Section 1: A brief description that serves as the motivating foundation of this article.
  • Section 2: An explanation of the project’s pipeline diagram.
  • Section 3: The technical section of the project, covering the development and deployment stages.
  • Section 4: Project findings, emphasizing the importance of the medical diagnostic assistant and suggesting future areas of study.

You can find the full code of the project on my GitHub: AlexRoman938/medical_diagnostic (github.com)

2. PIPELINE DIAGRAM

Figure 1 illustrates the project’s pipeline, which starts with symptoms as input, passes through a voice recognition model and then a disease prediction model, and ultimately produces a disease as output.

Figure 1: Medical Diagnostic Assistant’s Pipeline
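
Conceptually, the whole pipeline chains two model calls together, as in the sketch below (recognize_speech and diagnostic_medic are the functions implemented later, in Section 3.2):

# Conceptual sketch of Figure 1: audio in, predicted disease out.
def diagnose(audio_path):
    symptoms_text = recognize_speech(audio_path)  # voice recognition model
    disease = diagnostic_medic(symptoms_text)     # disease prediction model
    return disease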

2.1. Symptoms

Initially, patients communicate their symptoms to the medical diagnostic assistant by recording their voice. For instance, they might say, “I have a fever” or “I have a headache”.

2.2. Voice Recognition Model

Once patients have communicated their symptoms, the recording is processed by the voice recognition model, specifically the whisper-tiny.en model, sourced from Hugging Face. This model converts the patients’ spoken words into text.

The model was selected for its small size and lightweight nature, making it easily deployable and user-friendly. Additionally, the project exclusively focuses on English speech recognition.

Figure 2: whisper-tiny by OpenAI

The openai/whisper-tiny.en pretrained model is a speech recognition model developed by OpenAI. It was trained on 680,000 hours of English-only data collected from the web, and it is a Transformer-based encoder-decoder model.
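
To get a feel for the model outside the app, here is a minimal local sketch that runs whisper-tiny.en through the transformers pipeline API (the deployed app instead calls the hosted Inference API, as shown in Section 3.2; audio decoding requires ffmpeg to be installed):

# Minimal local sketch: transcribe an audio file with whisper-tiny.en.
from transformers import pipeline

transcriber = pipeline("automatic-speech-recognition", model="openai/whisper-tiny.en")
result = transcriber("audio.wav")  # path to a local recording
print(result["text"])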

2.3. Disease Prediction Model

The output of the voice recognition model is then used by the Disease Prediction Model, specifically the symptom-2-disease-net model, which is also sourced from Hugging Face. This model predicts the disease based on the patient’s symptoms.

Figure 3: Disease Prediction Model’s training results

The model was selected based on Figure 3, which shows a high validation accuracy of 0.9583 after 5 epochs. Furthermore, other candidate models lack training results in their descriptions.

Figure 4: symptom-2-disease-net by abhirajeshbhai

The abhirajeshbhai/symptom-2-disease-net pretrained model is a symptom-to-disease classifier developed by Abhirajesh Bhai. It is trained on a dataset of 100,000 symptom-disease pairs. The model is a Transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model.
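
Likewise, a minimal local sketch of querying the model with the transformers pipeline API could look like the following, assuming the model exposes a standard text-classification head (the label/score output format used by the Inference API in Section 3.2 suggests it does):

# Minimal local sketch: predict a disease label from a symptom description.
from transformers import pipeline

classifier = pipeline("text-classification", model="abhirajeshbhai/symptom-2-disease-net")
print(classifier("I have a fever and a headache"))
# Expected shape: [{'label': '<disease>', 'score': ...}]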

2.4. Disease

Lastly, patients learn their most likely disease from the output of the Disease Prediction Model, as shown in Figure 5.

Figure 5: Medical Diagnostic Assistant in production

3. CODING STAGE

In this section, we will write the code in Python 3.9 using an Anaconda environment.

3.1. Setting up Anaconda Environment

First, run the following command in the Anaconda Prompt:

conda create -n medic_assistant python=3.9

Here, “medic_assistant” is the name of our environment (you can change it).

Next, activate the project environment:

conda activate medic_assistant

Finally, download “packages.txt” and “requirements.txt” from the project’s GitHub, and install them in our Conda environment:

conda install --file packages.txt
pip install -r requirements.txt

Remember to run these commands from your preferred path on your local machine.
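
Based on the imports used later in Section 3.3, requirements.txt presumably contains at least the pip packages sketched below (the exact pinned versions live in the project’s GitHub, so treat this listing as an assumption), while packages.txt lists system-level dependencies; ffmpeg is a likely candidate, since audio recording typically depends on it:

# requirements.txt (assumed minimal contents; check the repo for versions)
streamlit
streamlit-chat
streamlit-audiorecorder
requests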

3.2. Code for the Voice Recognition Model and Disease Prediction Model

After setting up our Anaconda environment, we will call the APIs of the voice recognition model and the disease prediction model, wrapping each call in a function.

import time
import json
import requests

token_hugging_face = "Your token access"

headers = {"Authorization": f"Bearer {token_hugging_face}"}  # Hugging Face token
API_URL_RECOGNITION = "https://api-inference.huggingface.co/models/openai/whisper-tiny.en"
API_URL_DIAGNOSTIC = "https://api-inference.huggingface.co/models/abhirajeshbhai/symptom-2-disease-net"


# Voice recognition model
def recognize_speech(audio_file):
    """
    INPUT: PATIENT'S SYMPTOMS BY VOICE

    OUTPUT: PATIENT'S SYMPTOMS IN TEXT FORMAT
    """
    with open(audio_file, "rb") as f:
        data = f.read()

    time.sleep(1)

    # Retry until the hosted model returns a transcription; while the
    # model is still loading, the response has no 'text' key.
    while True:
        try:
            response = requests.request("POST", API_URL_RECOGNITION, headers=headers, data=data)
            output = json.loads(response.content.decode("utf-8"))
            final_output = output["text"]
            break
        except KeyError:
            continue

    return final_output


# Disease prediction model
def diagnostic_medic(voice_text):
    """
    INPUT: PATIENT'S SYMPTOMS IN TEXT FORMAT

    OUTPUT: PATIENT'S DISEASE
    """
    symptoms = {"inputs": voice_text}
    data = json.dumps(symptoms)

    time.sleep(1)

    # Retry until the hosted model returns a prediction; while the
    # model is still loading, the response is an error dict rather
    # than a list of label scores.
    while True:
        try:
            response = requests.request("POST", API_URL_DIAGNOSTIC, headers=headers, data=data)
            output = json.loads(response.content.decode("utf-8"))
            final_output = output[0][0]["label"]
            break
        except KeyError:
            continue

    return final_output

For more information on how to obtain your token_hugging_face (your Hugging Face API token), see my recent article: https://medium.com/@aroman11/how-to-use-hugging-face-api-token-in-python-for-ai-application-step-by-step-be0ed00d315c
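
By the way, rather than hardcoding your token in the source file, a safer pattern is to read it from an environment variable. A minimal sketch, assuming a hypothetical variable name HF_TOKEN:

import os

# Hypothetical: read the Hugging Face token from the HF_TOKEN
# environment variable instead of pasting it into the code.
token_hugging_face = os.environ["HF_TOKEN"]
headers = {"Authorization": f"Bearer {token_hugging_face}"}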

3.3. Integrating everything into Streamlit

Once we have created the model functions, we need to integrate them with the Streamlit library.

import streamlit as st
from streamlit_chat import message as st_message
from audiorecorder import audiorecorder
import time
import json
import requests


token_hugging_face = "Your token access"

headers = {"Authorization": f"Bearer {token_hugging_face}"}  # Hugging Face token
API_URL_RECOGNITION = "https://api-inference.huggingface.co/models/openai/whisper-tiny.en"
API_URL_DIAGNOSTIC = "https://api-inference.huggingface.co/models/abhirajeshbhai/symptom-2-disease-net"


# Voice recognition model (see Section 3.2)
def recognize_speech(audio_file):
    with open(audio_file, "rb") as f:
        data = f.read()

    time.sleep(1)

    # Retry until the hosted model returns a transcription
    while True:
        try:
            response = requests.request("POST", API_URL_RECOGNITION, headers=headers, data=data)
            output = json.loads(response.content.decode("utf-8"))
            final_output = output["text"]
            break
        except KeyError:
            continue

    return final_output


# Disease prediction model (see Section 3.2)
def diagnostic_medic(voice_text):
    symptoms = {"inputs": voice_text}
    data = json.dumps(symptoms)

    time.sleep(1)

    # Retry until the hosted model returns a prediction
    while True:
        try:
            response = requests.request("POST", API_URL_DIAGNOSTIC, headers=headers, data=data)
            output = json.loads(response.content.decode("utf-8"))
            final_output = output[0][0]["label"]
            break
        except KeyError:
            continue

    return final_output


def generate_answer(audio):
    """
    INPUT: PATIENT'S SYMPTOMS BY VOICE RECORDING

    OUTPUT: MEDICAL CONSULTATION
    """
    with st.spinner("Consultation in progress..."):
        # Save the recording to a file
        with open("audio.wav", "wb") as wav_file:
            wav_file.write(audio.tobytes())

        # Voice recognition model
        text = recognize_speech("./audio.wav")

        # Disease prediction model
        diagnostic = diagnostic_medic(text)

        # Save the conversation
        st.session_state.history.append({"message": text, "is_user": True})
        st.session_state.history.append({"message": f" Your disease would be {diagnostic}", "is_user": False})

    st.success("Medical consultation done")


if __name__ == "__main__":

    # Remove the hamburger menu in the upper right-hand corner and the
    # "Made with Streamlit" footer
    hide_menu_style = """
        <style>
        #MainMenu {visibility: hidden;}
        footer {visibility: hidden;}
        </style>
    """
    st.markdown(hide_menu_style, unsafe_allow_html=True)

    # Center the logo
    col1, col2, col3 = st.columns(3)

    with col1:
        st.write(' ')

    with col2:
        st.image("./logo_.png", width=200)

    with col3:
        st.write(' ')

    # Initialize the conversation history
    if "history" not in st.session_state:
        st.session_state.history = []

    st.title("Medical Diagnostic Assistant")

    # Show the recording widget
    audio = audiorecorder("Start recording", "Recording in progress...")

    if len(audio) > 0:
        generate_answer(audio)

        # Show the historical consultation
        for i, chat in enumerate(st.session_state.history):
            st_message(**chat, key=str(i))
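
Before deploying, you can test the app locally with Streamlit’s CLI (assuming you saved the script as app.py; replace it with your own file name):

streamlit run app.py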

3.4. Deploying with Streamlit

In this section, we will share our Streamlit app, which requires a GitHub account (if you don’t have one, please create it).

Step 1: Create a new repository called “medical_diagnostic” and check “Add a README file”, as shown in Figure 6.

Figure 6: Create a new repository in GitHub

Step 2: Upload the files we downloaded before, requirements.txt and packages.txt, along with the Python code and the logo image, and commit the changes.

The image, called “logo_.png”, is in the project’s GitHub: AlexRoman938/medical_diagnostic (github.com)

Figure 7: medical_diagnostic repository

The repository should look like Figure 8. In other words, it must contain these files:

  • logo_.png
  • python_file
  • packages.txt
  • requirements.txt
Figure 8: medical_diagnostic repository with all files

Step 3: Create an account on Streamlit Share. Once it is created, click on “New app”.

It’s recommended to sign up with your GitHub account.

Figure 9: Streamlit Share profile

Step 4: Fill in the following fields, as shown in Figure 10, and deploy… 🚀

  • Repository: the repository where the project lives.
  • Branch: main (no need to edit).
  • Main file path: the name of the Python file.
  • App URL: the URL name you want for the app.
Figure 10: Streamlit Share’s deploy menu

Step 5: Just wait until the deployment is ready…

Figure 11: Streamlit Share’s waiting app

3.5. Medical Diagnostic’s Functionality

Phase 1: Click on “Start recording”

Figure 12: Phase 1

Phase 2: While “Recording in progress…” is displayed, speak 🔈. When you finish talking, click the “Recording in progress…” button to stop.

Figure 13: Phase 2

Phase 3: Just wait for the diagnostic 👁️⌛…

Figure 14: Phase 3

Phase 4: Medical consultation done! 👌

Figure 15: Phase 4

4. CONCLUSION AND RECOMMENDATION

The Medical Diagnostic Assistant chatbot is limited; however, it serves as a support tool for understanding the most likely illness. On the other hand, the information it provides can at times be incorrect, so it cannot replace a consultation with a doctor.

As a direction for future work, we suggest improving data quality and fine-tuning the Disease Prediction Model to enhance the precision and accuracy of the results.

Finally, thank you for reading this article. This project was made in the AI Group of my university.

Members:

  • Alexander Roman (me)
  • Walter Diaz
  • Sofia Pinaya
  • Valeria Quispe
  • Claudia Tiburcio

REFERENCES

  • Hugging Face — The AI community building the future: https://huggingface.co
  • Streamlit — A faster way to build and share data apps: https://streamlit.io

