BUD-E: An Open-Source Voice Assistant by LAION that Runs on a Laptop

ODSC - Open Data Science
Feb 22, 2024

LAION has introduced BUD-E, an open-source voice assistant that can run on a gaming laptop and does not require an active internet connection. If the approach proves scalable, voice assistant technology could see a major boost in the near future.

In LAION’s blog, the organization explains that BUD-E came into being in collaboration with the ELLIS Institute Tübingen, Collabora, and the Tübingen AI Center. The team hopes to help AI assistant technology close the gap in the nuanced understanding and emotional intelligence that are inherent to human dialogue.

And that was how BUD-E (Buddy for Understanding and Digital Empathy) was born. The goal of the project is to help create voice assistants that not only respond in real time but do so with a depth of empathy and understanding previously unseen.

According to the blog, BUD-E chains advanced Speech-to-Text, Large Language, and Text-to-Speech models, and aims to minimize both the latency and the mechanical nature of responses so that conversation flows seamlessly.
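To make the three-stage design concrete, here is a minimal sketch of one conversational turn passing through speech-to-text, a language model, and text-to-speech, with the time spent in each stage recorded. The stage functions and the `run_turn` helper are hypothetical stand-ins written for this illustration; they are not BUD-E's code, and a real system would call actual models in their place.

```python
import time
from typing import Any, Dict, Tuple

# Hypothetical stand-ins for the three model stages. A real assistant
# would call actual speech-to-text, language, and text-to-speech models.
def speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio is already text

def language_model(prompt: str) -> str:
    return f"You said: {prompt}"

def text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")  # pretend these bytes are synthesized audio

def run_turn(audio: bytes) -> Tuple[bytes, Dict[str, float]]:
    """Run one turn through all three stages and time each stage in ms."""
    stages = (
        ("stt", speech_to_text),
        ("llm", language_model),
        ("tts", text_to_speech),
    )
    timings: Dict[str, float] = {}
    value: Any = audio
    for name, stage in stages:
        start = time.perf_counter()
        value = stage(value)
        timings[name] = (time.perf_counter() - start) * 1000.0
    # The user-perceived delay in this sequential design is the sum of stages.
    timings["total"] = sum(timings.values())
    return value, timings

reply_audio, timings = run_turn(b"hello")
print(reply_audio, timings)
```

Because the stages run one after another here, the total delay is simply their sum. That is why each stage's speed matters for the overall response time.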

As of January 2024, the project has achieved latencies between 300 and 500 milliseconds using the Phi 2 model, with aspirations to reduce this further even with larger models like Llama 2 30B. The team is aware that the road toward an empathic and naturally conversational AI is filled with challenges requiring new solutions.

Key among these is the reduction of system latency and requirements through sophisticated quantization techniques and the fine-tuning of streaming text-to-speech (TTS) and speech-to-text (STT) models. Enhancing the naturalness of speech and responses involves building a dataset of natural human dialogues and developing a reliable speaker-diarization system.
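As a rough illustration of what quantization means, the sketch below maps floating-point weights to 8-bit integers with a single shared scale and then restores approximate values. This is a generic symmetric int8 scheme chosen for clarity; the article does not say which quantization techniques the BUD-E team uses, and the function names are made up for this example.

```python
from typing import List, Tuple

def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map a non-empty list of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale so the largest magnitude lands on +/-127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.0, 0.9]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, max_error)
```

Each weight now needs one byte instead of the four bytes of a 32-bit float, at the cost of a rounding error of at most half the scale per weight.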

Beyond this, BUD-E aims to maintain continuity over conversations spanning days, months, or even years, leveraging Retrieval Augmented Generation (RAG) to recall relevant context. And with the project’s open-source foundation, researchers and developers hope to take on the challenges associated with understanding emotional context.

But what exactly does this mean in the medium to long term? First, there’s a growing interest in helping the models that power voice assistants understand contextual language. This, of course, includes a level of emotional language.

This is because, as AI becomes more prevalent in people’s daily lives, its ability to interact in a more natural, recognizably human way will become important. If you’re interested in the BUD-E project, you can check out LAION’s GitHub page.

Originally posted on OpenDataScience.com
