1.0.1 Release
This release marks a significant milestone for Open-LLM-VTuber, featuring a complete rewrite of the backend and frontend with more than 240 new commits, along with numerous enhancements and new features. If you were using an earlier version, v1.0.0 is essentially a new app.
⚠️ Direct upgrades from older versions are impossible due to architectural changes. Please refer to our new documentation site for installation.
(v1.0.0 had a bug that surfaced after the release, so please skip it and use v1.0.1 instead.)
✨ Highlights
- Vision Capability: Video chat with the AI.
- Desktop Pet Mode: A new Desktop Pet Mode lets you have your VTuber companion directly on your desktop.
- Brand New Frontend: A completely redesigned frontend built with React, Chakra UI, and Vite offers a modern user experience. Available as web and desktop apps, located in the Open-LLM-VTuber-Web repository.
- Chat History Management: Implemented a system to store and retrieve conversation history, enabling persistent interactions with your AI.
- New LLM support: Many new (stateless) LLM providers are now supported (and refactored), including Ollama, OpenAI, Gemini, Claude, Mistral, DeepSeek, Zhipu, and llama.cpp.
- DeepSeek R1 Reasoning model support: The reasoning chain will be displayed but not spoken. See your waifu's inner thoughts!
- Major Backend Rewrite: The core of Open-LLM-VTuber has been rebuilt from the ground up, focusing on asynchronous operations, improved memory management, and a more modular architecture.
- Refactored Configuration: The `conf.yaml` file was restructured, and `config_alts` has been renamed to `characters`.
- TTS Preprocessor: Text inside `asterisks`, `brackets`, `parentheses`, and `angle brackets` will no longer be spoken by the TTS.
- Dependency management: Switched to `uv` for dependency management, removed unused dependencies such as `rich`, `playsound3`, and `sounddevice`.
- Documentation Site: A comprehensive documentation site is now live at https://open-llm-vtuber.github.io/.
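To illustrate the TTS Preprocessor highlight, here is a minimal sketch of how delimited text could be stripped before synthesis. It is not the project's actual implementation: the function name `strip_unspoken` and the regular expressions are assumptions made for illustration, and nested delimiters are not handled.

```python
import re

# Hypothetical patterns, one per delimiter type named in the release notes.
# The real preprocessor may use different rules.
_UNSPOKEN_PATTERNS = {
    "asterisks": re.compile(r"\*[^*]*\*"),
    "brackets": re.compile(r"\[[^\[\]]*\]"),
    "parentheses": re.compile(r"\([^()]*\)"),
    "angle_brackets": re.compile(r"<[^<>]*>"),
}


def strip_unspoken(text: str) -> str:
    """Remove delimited spans so only speakable text reaches the TTS engine."""
    for pattern in _UNSPOKEN_PATTERNS.values():
        text = pattern.sub("", text)
    # Collapse the whitespace left behind by the removed spans.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(strip_unspoken("*waves* Hello [smile] there (quietly) <sigh> friend"))
```

Note that this sketch removes only the delimited span itself, so text between an opening and a closing tag such as `<think>` and `</think>` would still be spoken.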
📋 Detailed Changes
🧮 Backend
- Architecture:
  - The project structure has been reorganized to use the `src/` directory.
  - The backend is now fully asynchronous, improving responsiveness.
  - CLI mode (`main.py`) has been removed.
  - The "exit word" has been removed.
  - Models are initialized and managed using `ServiceContext`, offering better memory management, particularly when switching characters.
  - Refactored LLMs into `agent` and `stateless_llm`, supporting a wider range of LLMs with a new agent interface: `basic_memory_agent` and `hume_ai_agent`.
- LLM (Language Model) Enhancements:
  - New (and old but refactored) providers: Ollama, OpenAI (and any OpenAI Compatible API), Gemini, Claude, Mistral, DeepSeek, Zhipu, llama.cpp.
  - `temperature` parameter added.
  - No more tokens will be generated after interruption, improving the responsiveness of voice interruption.
  - Ollama models are preloaded at startup, kept in memory for the server's duration, and unloaded at exit.
  - Added a `hf_mirror` flag to specify whether to use the Hugging Face mirror source.
- TTS (Text-to-Speech) Enhancements:
  - TTS now generates multiple audio segments concurrently and sends them sequentially, reducing latency.
  - New interruption logic for smoother transitions.
  - Added filters (`asterisks`, `brackets`, `parentheses`) to prevent unwanted text from being spoken.
  - Implemented `faster_first_response` feature to prioritize the synthesis and playback of the first sentence fragment, minimizing latency.
- ASR (Automatic Speech Recognition) Enhancements:
  - Made Sherpa-onnx ASR with the SenseVoiceSmall int8 model the default for both English and Chinese presets, with automatic model download.
  - Added a `provider` option for sherpa-onnx-asr.
- Other Improvements:
  - Chat log persistence is used to maintain conversation history.
  - All `print` statements are replaced with `loguru` for structured logging.
  - Added a Chinese configuration preset: `conf.CN.yaml`.
  - Basic AI proactive speaking (experimental).
  - Added some checks in the CI/CD process.
  - Added input/output type system to agents.
  - Added Tencent Translate in https://github.com/Open-LLM-VTuber/Open-LLM-VTuber/pull/107
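The TTS change described above (segments generated concurrently but sent sequentially) can be sketched with `asyncio`. This is only an illustration of the general pattern, not the project's code: `synthesize` is a hypothetical stand-in for a real TTS call, and `speak` is an assumed name.

```python
import asyncio


async def synthesize(sentence: str) -> str:
    """Hypothetical stand-in for a TTS call; longer sentences take longer."""
    await asyncio.sleep(0.01 * len(sentence))
    return f"audio({sentence})"


async def speak(sentences: list[str]) -> list[str]:
    # Start every synthesis job right away so they run concurrently.
    tasks = [asyncio.create_task(synthesize(s)) for s in sentences]
    delivered = []
    # Await the tasks in their original order, so segments are delivered
    # in sentence order even when a later segment finishes first.
    for task in tasks:
        delivered.append(await task)
    return delivered


if __name__ == "__main__":
    print(asyncio.run(speak(["A much longer first sentence.", "Short."])))
```

In this sketch the total wait is roughly the duration of the slowest segment rather than the sum of all segments, while the listener still hears the sentences in order.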
🖥️ Frontend
- New frontend built with Electron, React, Chakra UI, and Vite.
- Multi-Mode in Single Codebase:
  - Web Mode: Browser interface
  - Window Mode: Desktop window
  - Pet Mode: Transparent desktop companion
  - Seamless context sharing between Window and Pet modes, allowing for the preservation of settings, history, connections, and model states.
- Enhanced UI Features:
  - Responsive layout with collapsible sidebar and footer
  - Customizable Live2D model interactions: mouse tracking for eye movement, click-triggered animations, and drag & resize capabilities.
  - Persistent local storage for user preference settings, including background, VAD configuration, Live2D size and interactions, and agent behavior.
  - Supports viewing, loading, and deleting conversation history with streaming subtitles.
  - (Electron pet mode) A transparent, always-on-top desktop companion with click-through on non-interactive areas, draggable and hideable Live2D and UI, and right-click menu controls.
  - Camera and screen capturing panel
  - Switch characters easily
📖 Documentation
- Rewritten README file.
- New comprehensive documentation with a dedicated website.
🧹 Cleanup
- Removed unused and legacy code, including `TaskQueue.py`, `scripts/install_piper_tts.py`, `model_manager_old.py`, `service_context_old.py`, `main.py`, `asr_with_vad`, `vad`, `start_cli`, `fake_llm`, `MemGPT`, the `pywhispercpp` submodule, and the CoreML script.
- Removed unused dependencies: `rich`, `playsound3`, `sounddevice`, among others.
- Removed configuration options that are no longer relevant: `VOICE_INPUT_ON`, `MIC_IN_BROWSER`, `LIVE2D`, `EXTRA_SYSTEM_PROMPT_RAG`, `AI_NAME`, `USER_NAME`, `SAVE_CHAT_HISTORY`, `CHAT_HISTORY_DIR`, `RAG_ON`, `LLMASSIST_RAG_ON`, `SAY_SENTENCE_SEPARATELY`, `MEMORY_SNAPSHOT`, `PRELOAD_MODELS`, `tts_on`.
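When carrying settings over from an old configuration by hand, it may help to see which of the removed options it still contains. The sketch below is not part of the project: `find_removed_options` is a hypothetical helper, and it assumes the old configuration has already been loaded into a plain dictionary with these options as top-level keys.

```python
# Option names copied from the list of removed configuration options above.
REMOVED_OPTIONS = {
    "VOICE_INPUT_ON", "MIC_IN_BROWSER", "LIVE2D", "EXTRA_SYSTEM_PROMPT_RAG",
    "AI_NAME", "USER_NAME", "SAVE_CHAT_HISTORY", "CHAT_HISTORY_DIR",
    "RAG_ON", "LLMASSIST_RAG_ON", "SAY_SENTENCE_SEPARATELY",
    "MEMORY_SNAPSHOT", "PRELOAD_MODELS", "tts_on",
}


def find_removed_options(old_config: dict) -> list[str]:
    """Return the removed option names found as top-level keys, sorted."""
    return sorted(key for key in old_config if key in REMOVED_OPTIONS)


if __name__ == "__main__":
    # "SOME_OTHER_KEY" is a made-up key used only for this example.
    sample = {"AI_NAME": "Mili", "SOME_OTHER_KEY": 1, "tts_on": True}
    print(find_removed_options(sample))
```

Any option reported by this check has no counterpart in the new configuration and can be dropped rather than migrated.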
⚠️⚠️⚠️ Critical Upgrade Notice
- No Direct Upgrades - Previous installations are incompatible
- Fresh Install Required - Follow new documentation
- Config Changes - Back up existing configurations before migration
Why the Hassle? 💡
- The `uv` dependency manager replaces legacy systems
- Complete configuration schema overhaul
Please check out the new documentation to install Open-LLM-VTuber again. Fortunately, thanks to uv, there should be fewer headaches during installation.
🎉 Contributors
- @t41372, that's me
- @ylxmf2005, the creator of the new frontend, who implemented LLM vision capability, chat history management, TTS concurrency, the Hume AI agent, better sentence division, a better Live2D configuration, countless bug fixes, and more. He also wrote the majority of the documentation and provided countless insights. Version `v1.0.0` was a close collaboration with him and wouldn't have existed without his tremendous contribution.
- @Stewitch, who added the `hf_mirror` option and is currently working on a launcher for this project to streamline the installation and configuration process. It's still a work in progress but will be completed very soon. https://github.com/Stewitch/LiZhen
- @Fluchw, who added the Tencent translator and helped us fix the translator bug.
And all the other contributors who worked on this project in previous versions.
Full Changelog: https://github.com/Open-LLM-VTuber/Open-LLM-VTuber/compare/v0.5.2...v1.0.0
(Relatively) faster download links for users in mainland China
Open-LLM-VTuber-v1.0.3.zip (includes the SenseVoice model for sherpa-onnx ASR, so it does not need to be pulled from GitHub)
open-llm-vtuber-electron-1.0.0-frontend.exe (desktop frontend, Windows)
open-llm-vtuber-electron-1.0.0-frontend.dmg (desktop frontend, macOS)