How to run a GenAI LLM Chatbot locally.

GenAI chatbots can be helpful assistants. While they sometimes make up facts and answers, they answer most questions correctly, though you still need to cross-check their answers. Most of these bots, like ChatGPT from OpenAI, operate over an internet connection, and the free tiers will likely use your questions as training data. You need to be careful not to divulge important personal information, as the bots may send that information out for the whole world to see.

There is good support for running capable chatbots locally on your own computer. Through quantization, it is possible to run smaller versions of the LLMs offline on your desktop. By running locally, your questions stay private. The drawback is that the size of the models you can run is limited by your computer’s memory capacity and compute. For example, the latest 16 GB Apple Mac mini M4 does a good job with models up to 16B parameters, while a 70B parameter model is too much for most home computers.

Here’s how to get started.

Step 1 Install Ollama

Ollama runs the AI models. It uses techniques from Docker images to store the different layers of an AI model. Download the installer from https://ollama.com
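
After installation completes, you can verify the install from a command prompt. This should print the Ollama version number:

ollama --version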

Step 2 Choose one or more AI models

Ollama has a repository of AI models that can be run locally. As of this writing, the phi4:14b model from Microsoft performs well. https://ollama.com/library/phi4

From a command prompt, ollama pull <model name>:<size> will download a model. Example:

ollama pull phi4:14b
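
Once the download finishes, you can try the model right in the terminal; type /bye to exit the chat:

ollama run phi4:14b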

If a model does not perform well, you can remove it with:

ollama rm <model>:<size>
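
To see which models are currently installed, along with their sizes on disk:

ollama list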

Step 3 Download Msty Desktop

Msty is a chatbot GUI that lets you use Ollama as the backend. It can also work with OpenAI. It is free for personal use, though a few features are restricted in the free app. Download the Msty Desktop app for a user-friendly GUI: https://msty.app

Step 4 Optional: Rancher Desktop and Open WebUI extension

Download Rancher Desktop at https://www.rancher.com/products/rancher-desktop

After installation completes, start Rancher Desktop. Select Extensions from the right panel. In the list of extensions, on the right, select Open WebUI. The Open WebUI extension provides a GUI frontend to Ollama. After installation, you can run it by selecting Open WebUI in the left panel. Rancher Desktop is a powerful Kubernetes development environment used by application developers, but this extension works great as a frontend for Ollama. It is developed by SUSE, a German open source company that sells an enterprise Linux distribution. The one downside to using Rancher Desktop is that on macOS it runs a virtual machine, which uses about 5 GB of memory. Msty uses a lot less, at about 170 MB.
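
Both Msty and Open WebUI talk to Ollama’s local HTTP API, which listens on port 11434 by default. If a frontend cannot see your models, a quick sanity check from a command prompt is to list them through the API; this returns the installed models as JSON:

curl http://localhost:11434/api/tags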

The AI models are trained on web data up to a certain cutoff date. One way for these models to process newer information is through a process called RAG (Retrieval-Augmented Generation). With Open WebUI, this is a simple process. If you need to ask questions about one or more web URLs, start the chat with # followed by the URL. A popup box appears slightly above the prompt entry box; click on it and the tool will fetch and process the page. You can then ask questions related to the URL(s). It helps to begin the question with “Based on the provided context, …”, which directs the chatbot to use the provided URLs or documents. Avoid using words like “current” and “now” in the question, since the chatbots are trained to avoid answering questions about events that happen after their data cutoff date.
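
For example, a RAG chat in Open WebUI might start like this (the URL here is just a placeholder):

# https://example.com/2025-annual-report
Based on the provided context, what are the key findings of the report?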

Since most AI models are not perfect, it helps to get answers from multiple models; by cross-checking, there is a better chance of getting a complete answer to your question. In Open WebUI, there is a drop-down in the top corner that selects the AI model. Once a model is selected, next to the drop-down’s “v” there is a “+”. Click the “+” to add another model to the same chat.