1.2.0 Release
This is a substantial update, packed with major features including Letta-based long-term memory, MCP support, Live2D Cubism 5 support, Chinese support for the frontend, an improved update system, a Bilibili Danmaku client, and numerous bug fixes.
First, we'd like to apologize for the extended release cycle. We will do our best to avoid such long intervals between updates in the future.
Additionally, please note a licensing change for the project's frontend (the Open-LLM-VTuber-Web repository, which powers the built-in web and Electron clients). Effective with this release (v1.2.0), the frontend transitions from an unspecified license (all rights reserved) to the Open-LLM-VTuber License 1.0.
The backend remains under the MIT License for v1.2.0 but is expected to be unified under the Open-LLM-VTuber License 1.0 around v1.3 or v1.4. We are still discussing the specifics and will provide a clear announcement in the GitHub Release when the change occurs. Please be aware that Live2D models have their own licenses, which you should check separately.
⚠️ Notice: Potential Breaking Changes
In this version, we have refactored the Live2D implementation to add support for Live2D 5.0 models and fix display issues with many existing models. As part of this change, support for Live2D 2.1 models has been removed. While this should increase compatibility with modern models, if you encounter any issues with your Live2D model not displaying after updating, please let us know and consider rolling back to the previous version.
✨ Highlights
- (MCP) The AI can now call tools that support the Model Context Protocol (MCP). Built-in support is included for time and ddg-search. The frontend now displays the status of tool calls. (See the Appendix for a demo).
- (MCP) Added support for BrowserBase's Browser Use MCP with a Live View in the frontend.
- (Live2D) The frontend Live2D SDK has been migrated from `pixi-live2d-display-lipsync` to the official Live2D Web SDK. This adds support for Cubism 5 but removes support for Cubism 2. Models now have improved feedback on click interactions.
- The default Live2D model has been changed to `mao_pro`, as the expressions for the `shizuku` model were removed by the official creators in the Live2D 5 version.
- (Frontend) Added Chinese language support.
- Implemented an interface for live streaming platforms and added a client for receiving Bilibili Danmaku (live comments).
- (Memory) Implemented Letta-based long-term memory.
- (LLM) Added support for LM Studio.
- (TTS) Added support for OpenAI-Compatible TTS, SparkTTS, and SiliconFlow TTS.
- Added a `requirements.txt` file for users who are not familiar with `pip` commands or prefer not to use `uv`.
- Numerous bug fixes.
- Updated the documentation, which now includes an "Ask AI" feature.
Detailed Changes Since v1.1.0:
Backend:
- Changed some preset options in the configuration file: `llm_provider` -> `ollama_llm`.
- Set `project_id` and `organization_id` in `conf.yaml` to `null` by default to prevent API errors.
- Azure ASR: Added a list for detected languages and fixed several bugs.
- Fixed bugs related to configuration file updates (2bc0c1b5f75ea79f563935b03a2267e6584d9bc @ylxmf2005).
- To allow Windows users to confidently use backslashes in file paths, all double quotes in the configuration file have been changed to single quotes (758d0b304bfa9d2c561987e9d3edac74857309c7); a short sketch of the quoting difference follows this list.
- Fixed Claude's vision capabilities. It seems this was never working correctly—did no one notice until now?
- Information about Live2D models can now be fetched from the `GET /live2d-models/info` route (a small fetch sketch also follows this list).
- When using the update script, the frontend (linked via git submodule) will now be updated as well.
- Fixed #150: The `temperature` parameter was not passed during the initialization of OpenAI-Compatible LLMs.
- Fixed #141: A dependency issue on Intel Macs.
- Implemented a live streaming platform interface and a Bilibili Danmaku client based on blivedm (fea16ace015851656e6c044961758c69247ce69e), #142 @Fluchw, @ylxmf2005.
- Merged #161, adding the `StatelessLLMWithTemplate` class. Thanks, @aaronchantrill!
- Added OpenAI-Compatible TTS #178. Thanks, @fastfading!
- Implemented Letta-based long-term memory #179. Thanks, @rayburstray! See the Letta Agent docs.
- Added LM Studio LLM support (b971867b231dac5f3e9e14a28e6c4124fa592a72).
- Added `requirements.txt` and documentation for installing with `pip` and `conda` in the Quick Start guide (044e5ba9aaab9de8fae440f54e6667c63ab89b85).
- Added Spark TTS #182 (@Because66666), SiliconFlow TTS #208 (@endtower), and MiniMax TTS #214 (@Y0oMu).
- Fixed an issue where FunASR could not run offline (Issue #7, fixed in #214 by @Y0oMu).
- Added prompt configuration for whisper, faster-whisper, and whisper.cpp #214 @Y0oMu.
- Fixed #159: Resolved an error caused by empty chunks returned from third-party OpenAI-compatible APIs #184 @872226263.
- ✨ Feature Enhancement: Implemented MCP Plus #185 @Stewitch @ylxmf2005.
- Fixed bugs related to AI group chat (4da3c82e6388604dc0817927a7f07796ef524785 @ylxmf2005).
- Fixed a bug that could cause garbled text in `conf.yaml` when merging configurations (67e1622891e264cc71b6da71533a3be188a09692).
- Added a DuckDuckGo-based web search MCP tool (3904419fb9f0b67e5f22027e183741cc0f1719dc @ylxmf2005).
- Fixed a bug where auto language detection could not be selected for faster-whisper (#188).
- Added a configurable prompt in `conf.yaml` for the AI's proactive speech (Issue #190 @ylxmf2005).
- Added a status bar for MCP function calls (51adb61895f1e5040e238fa1c97acdeefe9e2690 @ylxmf2005).
- Added an optional prompt for speaking style (0a76ac69b04d288c102ec52423d927a4ab9a246d @ylxmf2005).
- Implemented browser control capabilities via Stagehand: The AI can now operate a web browser (1dc2055d74d342202d4a54ea96109d3cfaa7bee7 @ylxmf2005).
- Implemented a backend check that automatically pulls the `frontend` submodule, preventing issues where the frontend code is missing (a rough sketch of the idea also follows this list).
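The single-vs-double-quote change above comes down to standard YAML escaping rules. Below is a minimal sketch of the difference; it uses PyYAML purely for illustration (an assumption, the project may parse its config differently), and the path is made up.

```python
# Minimal sketch of the YAML quoting behavior behind the conf.yaml change.
# Assumes PyYAML is installed; the example path is invented.
import yaml

# Double quotes: the "\n" and "\f" in the path are parsed as escape sequences,
# so the Windows path is silently corrupted into control characters.
broken = yaml.safe_load(r'model_path: "C:\new_models\foo.onnx"')["model_path"]

# Single quotes: backslashes are kept exactly as written.
intact = yaml.safe_load(r"model_path: 'C:\new_models\foo.onnx'")["model_path"]

print(repr(broken))  # contains a real newline and form feed in place of "\n" and "\f"
print(repr(intact))  # 'C:\\new_models\\foo.onnx'
```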
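For the new Live2D info route, here is a minimal sketch of querying it with the standard library. The local address (`http://localhost:12393`) and the assumption that the route returns JSON reflect a typical default setup and may differ in your configuration.

```python
# Minimal sketch: fetch Live2D model info from a locally running backend.
# Host/port and the JSON response shape are assumptions; adjust to your setup.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:12393/live2d-models/info") as resp:
    info = json.load(resp)

print(json.dumps(info, indent=2, ensure_ascii=False))
```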
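The frontend-submodule check could look roughly like the following. This is an illustrative sketch under simple assumptions (an empty or missing `frontend` directory means the submodule was never initialized), not the project's actual implementation.

```python
# Illustrative sketch (not the actual implementation): ensure the `frontend`
# git submodule is present before starting the server, pulling it if missing.
import subprocess
from pathlib import Path


def ensure_frontend_submodule(repo_root: Path = Path(".")) -> None:
    frontend = repo_root / "frontend"
    # A missing or empty directory usually means the submodule was never initialized.
    if not frontend.is_dir() or not any(frontend.iterdir()):
        subprocess.run(
            ["git", "submodule", "update", "--init", "--recursive"],
            cwd=repo_root,
            check=True,
        )


if __name__ == "__main__":
    ensure_frontend_submodule()
```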
Frontend (@ylxmf2005):
- Adopted a mode-management system and added a button in the Window mode UI to switch modes directly.
- Added support for playing "Talk" motion groups when the model is speaking (to create a swaying effect). A guide on how to use this will be available in v1.3.
- Migrated the Live2D SDK from `pixi-live2d-display-lipsync` to the official Live2D Web SDK (supports Cubism 5, drops Cubism 2). Note: This was developed using a beta version of the SDK from another project and does not yet support motionsync.
- Added i18n support for Chinese.
- Refactored VAD dependency static files from being loaded via CDN to being referenced from the local build output (#5 @East333, #7 @charliedcc).
- Fixed a "404 Not Found" bug for an invalid CSS link (#2 @East333).
- Added a "Click-through" toggle switch for Desktop Pet mode.
- Removed the "follow mouse" feature. The "Pointer Interactive" setting now only controls whether clicks trigger actions, which must be configured in `model_dict`.
- Fixed a bug where the fallback avatar failed to display in the history area.
- Fixed a bug with abnormal mouse click-through behavior in Desktop Pet mode.
- Added a status display for when the AI is using tools.
- Added a Browser Live View Panel based on BrowserBase.
- Fixed a conflict between expression display and blinking (Issue #105).
- Updated VAD to the latest version and used the new `onSpeechRealStart` event to prevent misfires that could interrupt the AI's response (Commit 445dc86).
- Added settings to limit the size and dimensions of images sent to the backend (Issue #209).
⚠️ There are far too many pull requests and contributions to list individually; if I have missed anyone, please let me know.
What's Next: A Look at v1.3-v1.4
- Streaming TTS: We plan to add streaming support for major TTS models, which will significantly reduce response latency.
- Hume AI Changes: The Hume AI Agent will be removed and replaced with an option for Hume AI API TTS (the official TTS API was released recently). Hume AI's emotion control and naturalness are the best I've seen (though it's also the priciest at $200/1M characters vs. Fish Audio at $15/1M).
- Natural Motion: We will provide examples and tutorials for achieving natural, neuro-sama-like idle swaying motions.
- `motionMap` Feature: Similar to `emotionMap`, this will allow the model to perform actions while speaking.
- One-Click Character Import.
- MCP Bridge Support: We'll add a demo for a decoupled MCP setup, where the MCP Server & Client run on the user's machine. The main server will provide a ready-to-use bridge to push MCP commands via WebSocket and receive results (an illustrative sketch follows this list).
- Character Status Panel: A new UI area to display and manage the character's state (e.g., mood, affinity, current thoughts, what they are doing). This highly customizable state will influence the character's behavior. The "thinking" tag will likely be moved here (planned for v1.4).
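To make the MCP Bridge idea a bit more concrete, here is a purely illustrative sketch of what a local bridge client could look like. The endpoint path, message format, and field names are all hypothetical placeholders, not a published API.

```python
# Purely illustrative sketch of the proposed MCP bridge concept: a client on the
# user's machine keeps a WebSocket open to the main server, receives tool-call
# requests, and sends results back. All endpoints and message fields are made up.
import asyncio
import json

import websockets  # third-party package, assumed to be installed


async def run_bridge(url: str = "ws://localhost:12393/mcp-bridge") -> None:
    async with websockets.connect(url) as ws:
        async for raw in ws:
            message = json.loads(raw)
            if message.get("type") == "tool_call":
                # A real bridge would forward this to a local MCP server and
                # return the actual tool output instead of this placeholder.
                reply = {
                    "type": "tool_result",
                    "id": message.get("id"),
                    "result": "not implemented in this sketch",
                }
                await ws.send(json.dumps(reply))


if __name__ == "__main__":
    asyncio.run(run_bridge())
```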
Upcoming License Change Notice (v1.3-v1.4)
As the project grows, we plan to adjust our licensing model to better support its long-term sustainability.
Starting from a future version (the exact version will be clearly announced, likely around v1.3.0), the Open-LLM-VTuber project will adopt a modified Apache 2.0 license with the following terms:
- Unified License: The entire project (both frontend and backend) will be under a single, modified Apache 2.0 license.
- Clear Usage Scope: The new license will clarify permitted uses and commercial activities that require a separate license.
How does this affect you?
For most users, including streamers, educators, and researchers, there is no impact.
Under the planned license, the software will be available under Apache 2.0 with the following additional terms:
✅ Uses that DO NOT require a separate license:
- All non-commercial purposes (e.g., personal projects, education, academic research, non-profit activities).
- Using the software for VTuber streaming and video creation (e.g., on YouTube, Twitch, Bilibili).
❌ Uses that DO require a commercial license:
- Providing paid access, subscriptions, or hosting services (including offering the software as a SaaS, paid download, or online service).
- Redistributing, reselling, rebranding, or repackaging the software for commercial purposes.
- Integrating the software into a commercial product that is sold or licensed for a fee (including both software and hardware).
For full details, please refer to the LICENSE file in the frontend repository and the specific release notes when the backend license is updated.
Why are we planning this change?
The primary reasons for this adjustment are:
- Our frontend previously lacked a specific license, which led to instances of our software being repackaged, rebranded, and deployed commercially without attribution.
- We may develop a SaaS offering in the future. We want to protect the software we've invested significant effort in from being directly copied into a competing product.
Please note: Even if we launch a SaaS, we have no plans to close-source the core Open-LLM-VTuber project, nor do we intend to change its ability to run completely offline and locally. We deeply value the trust we have built within the open-source community. Even if we were to close-source it one day, you would still be able to use older, open-licensed versions.
An open-source license is an agreement that binds both users and developers. I can guarantee we will not delete the repository, barring unforeseen circumstances (and even then, GitHub's fork mechanism makes deletion largely symbolic).
The core purpose of exploring a SaaS model is to make the project sustainable and to better realize our vision for AI companionship. There may come a day when we, the core developers, no longer have the time and energy to maintain Open-LLM-VTuber. Or perhaps a better, more advanced solution will emerge from the community, and our project will be consigned to the annals of history. But I hope that day is far off.
Regarding project sustainability, I've considered two paths. One is the SaaS model mentioned above. The other is to better enable contributors to participate in our development, improving our efficiency and developer retention. I will be making progress on this front after the v1.2 release.
The decision to change the license stems from observing multiple incidents of open-source misuse and license violations across the community. After seeing these events, we've come to feel that the MIT license may not align with our ideal expectations. A license should reflect the core developers' intent for how their code is used, serving as a protection and a set of boundaries for both developers and users. Due to my own oversight in the beginning, I chose a license without fully considering the project's potential scale (I just picked MIT without much thought, never imagining the project would get this big). To ensure our contributors can continue to code without worry and that our users understand the intended boundaries of use, we have decided to amend our license.
In fact, our React-based frontend (since v1.0.0) has never had a specified license. According to GitHub's documentation, if a repository has no license, we retain full copyright (which is effectively closed-source). We want to clarify our licensing terms moving forward.
During this process, we considered various options and looked at the approaches taken by other open-source projects like Dify and LobeHub, striving to avoid negative impacts on our regular users and open-source contributors.
Which files should I get?
For Existing Open-LLM-VTuber Users (v1.0.0 or newer)
- Run `uv run upgrade.py` to update to the latest version.
- Download the new Electron app from the assets below.
For New Users or Versions Below v1.0.0
Please refer to the new deployment documentation for installation instructions.
Download Files
If you are here because you read the documentation, download both of the following files:
- The Electron app for your OS.
- The language-specific ZIP file:
  - English: `Open-LLM-VTuber-v1.2.0-en.zip`
  - Chinese: `Open-LLM-VTuber-v1.2.0-zh.zip`
Note: The ZIP files are identical except for the language of the configuration file. Both packages include the SenseVoiceSmall model file to ensure accessibility for users in Mainland China.