
Creating a Smart Home AI Assistant

Last Updated on May 12, 2024 by Editorial Team

Author(s): Michael K

Originally published on Towards AI.

Source: Image generated by the author (using Adobe Generative AI)

The hardware AI assistants released recently have been making a splash in the news, which gave me a lot of inspiration around the concept of an ‘action model’ and how powerful one could be. It also made me curious about how hard it would be to give a large language model access to my smart home API — because coding an entire assistant is totally easier than just opening a tab to my dashboard.

In this article, using Python and a few open-source tools, we’ll create an assistant that can perform almost any action we desire. We’ll also explore how this works under the hood, and how we can use some extra tools to make debugging these agents a cakewalk.

Wrestling LLM Responses

I’ve previously written an article about prompt engineering, which remains our most powerful technique as end-users of these models. Tool use is a supercharged version of prompt engineering, giving the models a way to do more than just generate text.

For example, we could give the model the ability to search Wikipedia, look up customer information for a support request, or send an email — the sky is truly the limit, other than your programming ability, of course. Combined with tool use, we can also get LLMs to generate structured output, letting the model reliably return a formatted response.

Without these tools, the model’s response can vary wildly, or be heavily influenced by the context provided. This often distracts the model from the requested format, or, depending on the context, can produce erroneous results. The random seed the model uses, as well as its temperature (willingness to generate more varied responses), can be controlled; however, this is far from perfect.

Creating the Solution

To manage the dependencies for the project, I’ll be using Poetry, which we can initialize like so:
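A minimal sketch of that step; the project name below is just a placeholder:

```bash
poetry new smart-home-assistant
cd smart-home-assistant
```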

Poetry will create all of the boilerplate we need to get started, so the next step is to define any additional dependencies we have. Let’s go ahead and add those now:
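At minimum, we need the two libraries used throughout this article; assuming Phidata and the Ollama Python client:

```bash
poetry add phidata ollama
```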

Ollama

I’ll be using Ollama to handle communication with the model; however, Phidata supports numerous LLM integrations, so you could swap Ollama out for whichever works best for you. Getting Ollama set up only takes a few steps:
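As a sketch (the script below is Ollama’s Linux installer; builds for other platforms are on ollama.com), we install the runtime, pull a model, and give it a quick test:

```bash
# Install Ollama (Linux; installers for other platforms are on ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# Download the model we'll be using and confirm it responds
ollama pull llama3
ollama run llama3 "Hello!"
```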

Besides Meta’s Llama 3, I’ve had great success with Mistral’s 7B model and Microsoft’s WizardLM 2 when using tools. As newer models are released, tool use will likely become even better supported.

Creating the Assistant

Phidata lets us structure and format the LLM’s response using Pydantic objects, giving us a reliable way to extract information from the response programmatically. For example, say we wanted to create an assistant that only answered math questions:
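Here’s a minimal sketch of what that could look like. The MathResult schema is my own illustration; the key Phidata pieces are the Ollama LLM class and the output_model parameter, which tells the Assistant to parse the model’s reply into a Pydantic object:

```python
from typing import Optional

from phi.assistant import Assistant
from phi.llm.ollama import Ollama
from pydantic import BaseModel, Field

# Illustrative schema; the Optional answer field matters, as we'll see shortly
class MathResult(BaseModel):
    answer: Optional[float] = Field(None, description="The numeric answer, if the question has one")
    explanation: str = Field(..., description="A brief explanation of how the answer was found")

assistant = Assistant(
    llm=Ollama(model="llama3"),
    description="You are a math assistant. Only answer math questions.",
    output_model=MathResult,
)

result = assistant.run("What is 6 * 7?")
print(result)
# Illustrative output: answer=42.0 explanation='6 multiplied by 7 equals 42.'
```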

This is incredibly useful when you have complex responses from the model. If we take a look at the prompt Phidata generated, we can see how it gets the model to play nice:
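The easiest way to see that prompt yourself is Phidata’s debug logging, which prints the messages as they are sent, including the JSON schema derived from our Pydantic model (this snippet reuses the imports and MathResult from above):

```python
assistant = Assistant(
    llm=Ollama(model="llama3"),
    description="You are a math assistant. Only answer math questions.",
    output_model=MathResult,
    debug_mode=True,  # log the system prompt and messages as they are sent
)
assistant.run("What is 6 * 7?")
```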

Through prompt engineering, it massages the model’s response into exactly the shape we need, whether or not every field we defined can be filled in. For example, if we ask a question without an apparent answer:
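A sketch of that case, with an illustrative (not verbatim) response:

```python
result = assistant.run("What color is the number seven?")
print(result)
# Illustrative output: answer=None explanation='Numbers do not have colors, so there is no numeric answer.'
```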

Based on my previous experience with Phidata in a few projects, it’s vital to give the model an escape hatch for every case, or it can trigger an error. In the math example above, if you did not tell Pydantic that the answer field could also be None, the model would cram a verbose answer into the response anyway, rather than simply returning None:
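For illustration, here’s the stricter schema and the failure mode I’m describing (paraphrased, not verbatim output):

```python
# Same imports as before; only the schema changes
class StrictMathResult(BaseModel):
    answer: float  # no Optional and no default: the model must always supply a number
    explanation: str

# Against this schema, "What color is the number seven?" tends to come back
# as a verbose answer crammed into the fields (or a value that fails Pydantic
# validation) instead of a clean answer=None.
```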

Assistant Tool Use

Much as with people, giving the LLM tools to perform actions makes it more efficient, accurate, and useful in the long run. Phidata comes with a bunch of awesome tools built in, but we can also create our own, giving the model access to databases, APIs, or even local binaries if we desire.

Let’s give the Assistant access to the internal API for my house, so it can tell us the temperature in a few locations around the house:
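Here’s a sketch of such a tool. The endpoint, environment variables, and mock flag are placeholders standing in for my internal API; the Phidata part is the tools parameter, which accepts plain Python functions with type hints and docstrings and exposes them to the model:

```python
import os

import httpx
from phi.assistant import Assistant
from phi.llm.ollama import Ollama

# Placeholder endpoint and mock switch; adjust for your own setup
HOME_API_URL = os.getenv("HOME_API_URL", "http://homeassistant.local:8123")
MOCK_API = os.getenv("MOCK_API", "1") == "1"

def get_temperature(location: str) -> str:
    """Get the current temperature for a location in the house.

    Args:
        location: The room to check, e.g. "living room" or "bedroom".
    """
    if MOCK_API:
        # Mock mode so the tool can be tried without a real smart home API
        return f"The temperature in the {location} is 21.5 C"
    response = httpx.get(f"{HOME_API_URL}/temperature", params={"location": location})
    response.raise_for_status()
    return response.text

assistant = Assistant(
    llm=Ollama(model="llama3"),
    tools=[get_temperature],
    show_tool_calls=True,  # echo each tool invocation alongside the reply
)

assistant.print_response("How warm is the living room right now?")
```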

Phidata does all of the heavy lifting for us by parsing the response from the model, calling the correct function, and finally returning the result. I’ve included a mock feature so you can test it out without having an API of your own.

API Creation

To interact with our assistant, we’ll use FastAPI to create a light REST API that handles incoming requests and runs the assistant code for us. Another option would be a queue system; however, for our low-traffic use case, a REST API should work fine.

First, let's install the dependencies we’ll need for the API:
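Assuming the FastAPI CLI extra (which provides the fastapi command used later) and Logfire’s FastAPI integration:

```bash
poetry add "fastapi[standard]" "logfire[fastapi]"
```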

Then, we can define our base application:
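A minimal sketch of the application. The module layout and the /prompt route are my own placeholders, and it assumes the Assistant from the previous section is importable from assistant.py:

```python
import logfire
from fastapi import FastAPI
from pydantic import BaseModel

from assistant import assistant  # the Phidata Assistant built earlier

app = FastAPI()

# Optional Logfire setup: configure() picks up the token from the
# environment, and instrument_fastapi() traces every request for us
logfire.configure()
logfire.instrument_fastapi(app)

class PromptRequest(BaseModel):
    prompt: str

@app.post("/prompt")
def run_prompt(request: PromptRequest) -> dict:
    """Run the assistant against the incoming prompt and return its reply."""
    reply = assistant.run(request.prompt, stream=False)
    return {"response": reply}
```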

I’m setting up Logfire here, which is optional, but it greatly increases our visibility, so we don’t have to spelunk through a mountain of logs. Most of the libraries used in this project already have Logfire integrations, letting us extract as much information as possible in the fewest lines of code.

Testing

To run the server, we can use the fastapi utility that gets linked after we install the library:
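Assuming the application above lives in main.py:

```bash
fastapi dev main.py
```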

By default, FastAPI listens on port 8000, so we’ll use that to send a test prompt:
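Using the /prompt route from the sketch above:

```bash
curl -X POST http://localhost:8000/prompt \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is the temperature in the living room?"}'
```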

Logfire

If you enabled Logfire, you can follow the chain of actions and see the arguments and values for each step:

Source: Image by the author

The timing chart on the right is also great for understanding where a request might be getting stuck, so it can be investigated further. And since I plan to eventually try this with a physical device, being able to go back and investigate a weird response is a lifesaver.

Next Steps

The only part missing now is the actual hardware — so my next project is to take an extra ESP32 I have lying around, and see how much work it’ll be to do speech-to-text conversion as well as give our helpful assistant a voice.

If you would like the code in its finished form, check out the repository linked below for the full example.

Resources

