1.0.1 Release

February 4, 2025 · 6 min read

yey

qwq

This release marks a significant milestone for Open-LLM-VTuber, featuring a complete rewrite of the backend and frontend with more than 240 new commits, along with numerous enhancements and new features. If you were using an earlier version, v1.0.0 is essentially a new app.

⚠️ Direct upgrades from older versions are impossible due to architectural changes. Please refer to our new documentation site for installation.

(v1.0.0 turned out to have a bug right after release, so let's just ignore it and go with v1.0.1.)

✨ Highlights

Vision Capability: Video chat with the AI.
Desktop Pet Mode: A new Desktop Pet Mode lets you have your VTuber companion directly on your desktop.
Brand New Frontend: A completely redesigned frontend built with React, Chakra UI, and Vite offers a modern user experience. It is available as both a web app and a desktop app, and lives in the Open-LLM-VTuber-Web repository.
Chat History Management: Implemented a system to store and retrieve conversation history, enabling persistent interactions with your AI.
New LLM support: Many new (stateless) LLM providers are now supported (and refactored), including Ollama, OpenAI, Gemini, Claude, Mistral, DeepSeek, Zhipu, and llama.cpp.
DeepSeek R1 Reasoning model support: The reasoning chain will be displayed but not spoken. See your waifu's inner thoughts!
Major Backend Rewrite: The core of Open-LLM-VTuber has been rebuilt from the ground up, focusing on asynchronous operations, improved memory management, and a more modular architecture.
Refactored Configuration: The conf.yaml file was restructured, and config_alts has been renamed to characters.
TTS Preprocessor: Text inside asterisks, brackets, parentheses, and angle brackets will no longer be spoken by the TTS.
Dependency management: Switched to uv for dependency management and removed unused dependencies such as rich, playsound3, and sounddevice.
Documentation Site: A comprehensive documentation site is now live at https://open-llm-vtuber.github.io/.
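
As a rough illustration of the TTS Preprocessor highlight above, here is a minimal sketch of how text inside asterisks, brackets, parentheses, and angle brackets could be stripped before it reaches the TTS engine. The function name `strip_unspoken` and the regular expressions are assumptions made for this example and are not taken from the project's code; nested delimiters are not handled.

```python
import re

# Hypothetical patterns for illustration; the project's real preprocessor may differ.
_IGNORED_SPANS = [
    re.compile(r"\*[^*]*\*"),     # *actions*
    re.compile(r"\[[^\[\]]*\]"),  # [stage directions]
    re.compile(r"\([^()]*\)"),    # (asides)
    re.compile(r"<[^<>]*>"),      # <tags>
]


def strip_unspoken(text: str) -> str:
    """Remove delimited spans that should be shown but not spoken."""
    for pattern in _IGNORED_SPANS:
        text = pattern.sub("", text)
    # Collapse the whitespace left behind by removed spans.
    return re.sub(r"\s+", " ", text).strip()
```

The original text can still be sent to the frontend for display; only the stripped version would be handed to the speech synthesizer.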

📋 Detailed Changes

🧮 Backend

Architecture:
- The project structure has been reorganized to use the src/ directory.
- The backend is now fully asynchronous, improving responsiveness.
- CLI mode (main.py) has been removed.
- The "exit word" has been removed.
- Models are initialized and managed using ServiceContext, offering better memory management, particularly when switching characters.
- Refactored LLMs into agent and stateless_llm, supporting a wider range of LLMs with a new agent interface: basic_memory_agent and hume_ai_agent.
LLM (Language Model) Enhancements:
- New (and old but refactored) providers: Ollama, OpenAI (and any OpenAI Compatible API), Gemini, Claude, Mistral, DeepSeek, Zhipu, llama.cpp.
- temperature parameter added.
- No more tokens will be generated after interruption, improving the responsiveness of voice interruption.
- Ollama models are preloaded at startup, kept in memory for the server's duration, and unloaded at exit.
- Added a hf_mirror flag to specify whether to use the Hugging Face mirror source.
TTS (Text-to-Speech) Enhancements:
- TTS now generates multiple audio segments concurrently and sends them sequentially, reducing latency.
- New interruption logic for smoother transitions.
- Added filters (asterisks, brackets, parentheses) to prevent unwanted text from being spoken.
- Implemented faster_first_response feature to prioritize the synthesis and playback of the first sentence fragment, minimizing latency.
ASR (Automatic Speech Recognition) Enhancements:
- Made Sherpa-onnx ASR with the SenseVoiceSmall int8 model the default for both English and Chinese presets, with automatic model download.
- Added a provider option for sherpa-onnx-asr.
Other Improvements:
- Chat log persistence is used to maintain conversation history.
- All print statements are replaced with loguru for structured logging.
- Added a Chinese configuration preset: conf.CN.yaml.
- Basic AI proactive speaking (experimental).
- Added some checks to the CI/CD process.
- Added an input/output type system to agents.
- Added Tencent Translate in https://github.com/Open-LLM-VTuber/Open-LLM-VTuber/pull/107
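
The TTS bullet above (segments generated concurrently, sent sequentially) can be sketched with asyncio. This is an assumed, simplified pattern and not the project's actual implementation: `fake_synthesize`, `speak_in_order`, and `demo` are hypothetical names, and the synthesis call is a stand-in that just sleeps.

```python
import asyncio


async def fake_synthesize(sentence: str) -> bytes:
    # Stand-in for a real TTS call (hypothetical). Longer sentences finish
    # sooner here, so completion order differs from input order.
    await asyncio.sleep(0.05 / max(len(sentence), 1))
    return sentence.encode("utf-8")


async def speak_in_order(sentences, synthesize, send):
    # Start every synthesis task immediately so they run concurrently.
    tasks = [asyncio.create_task(synthesize(s)) for s in sentences]
    # Await the tasks in input order so audio is delivered sequentially.
    for task in tasks:
        await send(await task)


async def demo():
    sent = []

    async def send(audio: bytes) -> None:
        sent.append(audio)

    await speak_in_order(
        ["One.", "Two two.", "Three three three."], fake_synthesize, send
    )
    return sent
```

Because all tasks start up front, total latency is close to that of the slowest segment instead of the sum of all segments, while the ordered `await` keeps playback in the original sentence order.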

🖥️ Frontend

New frontend built with Electron, React, Chakra UI, and Vite.
Multi-Mode in Single Codebase:
- Web Mode: Browser interface
- Window Mode: Desktop window
- Pet Mode: Transparent desktop companion
- Seamless context sharing between Window and Pet modes, allowing for the preservation of settings, history, connections, and model states.
Enhanced UI Features:
- Responsive layout with collapsible sidebar and footer
- Customizable Live2D model interactions: mouse tracking for eye movement, click-triggered animations, and drag and resize capabilities.
- Persistent local storage for user preference settings, including background, VAD configuration, Live2D size and interactions, and agent behavior.
- Supports viewing, loading, and deleting conversation history with streaming subtitles.
- (Electron Pet Mode) A transparent, always-on-top desktop companion with click-through on non-interactive areas, a draggable and hideable Live2D model and UI, and right-click menu controls.
- Camera and screen capturing panel
- Switch characters easily

📖 Documentation

Rewritten README file.
New comprehensive documentation with a dedicated website.

🧹 Cleanup

Removed unused and legacy code, including TaskQueue.py, scripts/install_piper_tts.py, model_manager_old.py, service_context_old.py, main.py, asr_with_vad, vad, start_cli, fake_llm, MemGPT, the pywhispercpp submodule, and CoreML script.
Removed unused dependencies: rich, playsound3, sounddevice, among others.
Removed configuration options that are no longer relevant: VOICE_INPUT_ON, MIC_IN_BROWSER, LIVE2D, EXTRA_SYSTEM_PROMPT_RAG, AI_NAME, USER_NAME, SAVE_CHAT_HISTORY, CHAT_HISTORY_DIR, RAG_ON, LLMASSIST_RAG_ON, SAY_SENTENCE_SEPARATELY, MEMORY_SNAPSHOT, PRELOAD_MODELS, tts_on.

⚠️⚠️⚠️ Critical Upgrade Notice

No Direct Upgrades - Previous installations are incompatible
Fresh Install Required - Follow new documentation
Config Changes - Back up existing configurations before migration

Why the Hassle? 💡

The uv dependency manager replaces the legacy setup
Complete configuration schema overhaul

Please check out the new documentation to install Open-LLM-VTuber again. Fortunately, thanks to uv, there should be fewer headaches during installation.

🎉 Contributors

@t41372, who is me
@ylxmf2005, the creator of the new frontend, who also implemented LLM vision capability, chat history management, TTS concurrency, the Hume AI agent, better sentence division, better Live2D configuration, countless bug fixes, and more. He also wrote the majority of the documentation and provided countless insights. Version v1.0.0 was a close collaboration with him and wouldn't have existed without his tremendous contribution.
@Stewitch, who added the hf_mirror option and is currently working on a launcher for this project to streamline the installation and configuration process. It's still a work in progress but will be completed very soon. https://github.com/Stewitch/LiZhen
@Fluchw, who added the Tencent translator and helped us fix the translator bug.

And all the other contributors who worked on this project in previous versions.

Full Changelog: https://github.com/Open-LLM-VTuber/Open-LLM-VTuber/compare/v0.5.2...v1.0.0

(Relatively) faster download links for users in mainland China

Open-LLM-VTuber-v1.0.3.zip (includes the SenseVoice model for sherpa-onnx ASR, so it does not need to be pulled from GitHub)

https://pub-17317087be374bc68161ac63de2022a5.r2.dev/v1.0.3/Open-LLM-VTuber-v1.0.3.zip

open-llm-vtuber-electron-1.0.0-frontend.exe (desktop frontend, Windows)

https://pub-17317087be374bc68161ac63de2022a5.r2.dev/v1.0.3/open-llm-vtuber-electron-1.0.0-setup.exe

open-llm-vtuber-electron-1.0.0-frontend.dmg (desktop frontend, macOS)

https://pub-17317087be374bc68161ac63de2022a5.r2.dev/v1.0.3/open-llm-vtuber-electron-1.0.0.dmg
