Dev Log @2026.5.29

It has been a while.

If I am being honest, Utsuwa began early this year as a small dev project for me. I wanted to experiment with VRM, LLMs, local-first storage, and the shape of an AI companion that felt like something you owned rather than something rented from a closed platform.

A lot has changed in the entire industry since then. LLM development has moved incredibly fast, local models have become more capable, multimodal systems are becoming normal, and many of the other projects I am involved in had taken precedence for a bit.

To my surprise, when I checked back in on Utsuwa, over a thousand of you were visiting in a month, and roughly half of those visitors were actually using it. I have also gotten several messages on social media about the project: cool experiments, bug reports, questions, and ideas for where this could go.

That changed how I think about the project.

Utsuwa is moving beyond just an experiment. It is going to be in active development moving forward, and there is a lot of work to do.

What just changed

The latest round of work focused on one of the most important pieces of the project: local LLMs.

Local models are essential to what Utsuwa is trying to be. If this is an open-source AI companion that respects ownership and privacy, then connecting to tools like Ollama and LM Studio cannot feel fragile or confusing.

The app now discovers installed local models directly from Ollama and LM Studio instead of asking you to manually type a model name. That should make setup clearer, especially for people who are new to local AI and just want to know which models are available on their machine.

There is also better troubleshooting around Ollama browser access. If you use Utsuwa from a hosted site, your browser still has to be allowed by Ollama through OLLAMA_ORIGINS. That is how Ollama works, and the app now tries to explain it more clearly instead of leaving you with a vague failed request.

The roadmap

Here is the current development roadmap I am focusing on:

File, image, and video uploads - Many LLMs are multimodal now, and Utsuwa should support richer context than plain text. This also opens the door for providers and tools that can work with files, images, video, and web-aware workflows.
OpenAI-compatible models - A lot of services and local servers expose OpenAI-compatible APIs. Utsuwa should make it easier to point at those endpoints without requiring every compatible provider to be hardcoded.
Multi-provider STT - Voice input currently supports Groq and the browser Web Speech API. I want to add more speech-to-text options so users have more flexibility across platforms and providers.
Live2D support - VRM is still core to the project, but Live2D would open the door for 2D animated companions as an alternative style.
Windows and Linux desktop apps - The desktop app is currently macOS-focused. Windows and Linux builds are important for making the project more accessible.

Where this is going

The north star has not changed: Utsuwa should be an open, local-first AI companion that you can shape, inspect, and own.

What has changed is that it is no longer just a weekend experiment sitting on a shelf. People are using it, people are asking for it to improve, and that makes the work feel worth continuing in a more serious way.

Thanks to everyone who has tried it, reported bugs, sent messages, or just poked around. More updates soon.