GenAI chatbots can be helpful assistants. Although they sometimes make up facts, they answer most questions correctly; you still need to cross-check their answers. Most of these bots, like ChatGPT from OpenAI, require an internet connection, and OpenAI's free tier may use your questions as training data. Be careful not to divulge important personal information to these online services, as the bots may release that information for the whole world to see. To avoid those risks, you can host the chatbot locally, using your own computer to generate the responses. Through quantization, it is also possible to run smaller versions of the LLMs offline on your desktop computer with modest memory and without a GPU. By running locally, your questions stay private. The drawback is that the size of the models you can run is limited by your computer's memory capacity and compute. For example, the latest 16 GB Apple Mac mini M4 does a good job with models up to 16B parameters, while a 70B parameter model is too much for most home computers.
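As a rough rule of thumb (an estimate, not an exact requirement), a model quantized to 4 bits needs about half a byte per parameter, plus extra memory for the context window:

14B parameters × 0.5 bytes/parameter ≈ 7 GB
70B parameters × 0.5 bytes/parameter ≈ 35 GB

This is why a 14B model fits comfortably in 16 GB of RAM while a 70B model does not.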
Here’s how to get started.
Step 1 Install Ollama
Ollama runs the AI models. It uses techniques from Docker images to store the different layers of an AI model. Download the installer from https://ollama.com
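Once the installer finishes, you can confirm Ollama is working from the command prompt; it should print the installed version:

ollama --version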
Step 2 Choose one or more AI models
Ollama hosts a library of AI models that can be run locally. As of this writing, the phi4:14b model from Microsoft performs well: https://ollama.com/library/phi4
From the command prompt, run the command ollama pull <model name>:<size>. This will download an AI model. The example below downloads the phi4:14b model.
ollama pull phi4:14b
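After the download completes, you can confirm the model is on disk and chat with it directly in the terminal (type /bye to exit the chat):

ollama list
ollama run phi4:14b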
If a model does not perform well, you can remove it using the command:
ollama rm <model>:<size>
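For example, to remove the phi4 model downloaded above:

ollama rm phi4:14b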
Step 3 Download the Msty Desktop Application
Msty is a chatbot GUI that lets us interact with Ollama through a graphical user interface. Besides supporting Ollama, it can also be used with OpenAI through its API. The Msty app is free for personal use, though a few features are restricted in the free version. Download from: https://msty.app
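Msty and other frontends talk to Ollama's local HTTP API, which by default listens on port 11434. If a frontend cannot see your models, a quick sanity check (assuming you pulled phi4:14b in Step 2) is to query the API directly:

curl http://localhost:11434/api/generate -d '{"model": "phi4:14b", "prompt": "Why is the sky blue?", "stream": false}'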
Step 4 Optional: Rancher Desktop and Open WebUI extension
Download Rancher Desktop from https://www.rancher.com/products/rancher-desktop
After installation completes, start Rancher Desktop. Select Extensions from the right panel, then, in the list of extensions on the right, select Open WebUI. The Open WebUI extension provides a GUI front end to Ollama. After installation, you can run it by selecting Open WebUI in the left panel. Rancher Desktop is a powerful Kubernetes development environment used by application developers, but this extension works great as a front end for Ollama. It is developed by SUSE, a German open source company that sells an enterprise Linux distribution. The one downside to using Rancher Desktop is that on macOS it runs a virtual machine, which uses about 5 GB of memory; Msty uses a lot less, at about 170 MB.
AI models are trained on web data up to a certain cutoff date. One way for these models to process new information is through a process called retrieval-augmented generation (RAG). With Open WebUI, this is a simple process. If you need to ask questions about one or more web URLs, start the chat with # followed by the URL. A popup box appears slightly above the prompt entry box; click on it and the tool will process the page. You can then ask questions related to the URL(s). It helps to begin your question with "Based on the provided context, ...", which directs the chatbot to use the provided URLs or documents. Avoid using words like "current" and "now" in the question, since AI chatbots are trained to avoid answering questions about events that happen after their data cutoff date.
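For example, a RAG chat in Open WebUI might look like this (the URL is just a placeholder; substitute the page you want to ask about):

#https://example.com/some-article
Based on the provided context, what are the main points of this article?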
Since most AI models are not perfect, it helps to get answers from multiple models; by cross-checking, there is a better chance of getting a complete answer to your question. In Open WebUI, there is a drop-down in the top corner that selects the AI model. Once a model is selected, next to the drop-down's "v" there is a "+". Click on the "+" to add another AI model.
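You can do the same cross-checking from the command line. The sketch below assumes both models have already been pulled; llama3.1:8b is just an example of a second model:

for model in phi4:14b llama3.1:8b; do
  echo "=== $model ==="
  ollama run "$model" "Explain what quantization does to an AI model."
done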