
Release v1.0.1


This release marks a significant milestone for Open-LLM-VTuber, featuring a complete rewrite of the backend and frontend across more than 240 new commits, along with numerous enhancements and new features. If you were using an earlier version, v1.0.0 is essentially a new app.

⚠️ Direct upgrades from older versions are not possible due to architectural changes. Please refer to our new documentation site for installation instructions.

(v1.0.0 shipped with a bug right after release, so let's skip it and go straight to v1.0.1.)


✨ Highlights

  • Vision Capability: Video chat with the AI.
  • Desktop Pet Mode: A new Desktop Pet Mode lets you have your VTuber companion directly on your desktop.
  • Brand New Frontend: A completely redesigned frontend built with React, Chakra UI, and Vite offers a modern user experience. It is available as both a web app and a desktop app, located in the Open-LLM-VTuber-Web repository.
  • Chat History Management: Implemented a system to store and retrieve conversation history, enabling persistent interactions with your AI.
  • New LLM support: Many new (stateless) LLM providers are now supported (and refactored), including Ollama, OpenAI, Gemini, Claude, Mistral, DeepSeek, Zhipu, and llama.cpp.
  • DeepSeek R1 reasoning model support: The reasoning chain is displayed but not spoken. See your waifu's inner thoughts!
  • Major Backend Rewrite: The core of Open-LLM-VTuber has been rebuilt from the ground up, focusing on asynchronous operations, improved memory management, and a more modular architecture.
  • Refactored Configuration: The conf.yaml file was restructured, and config_alts has been renamed to characters.
  • TTS Preprocessor: Text inside asterisks, brackets, parentheses, and angle brackets will no longer be spoken by the TTS.
  • Dependency management: Switched to uv for dependency management, removed unused dependencies such as rich, playsound3, and sounddevice.
  • Documentation Site: A comprehensive documentation site is now live at https://open-llm-vtuber.github.io/.
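The TTS preprocessor mentioned above can be sketched as a small regex filter. The function name and exact patterns below are illustrative, not the project's actual implementation:

```python
import re

# Sketch of a TTS preprocessor that strips text the TTS should not speak.
# The patterns and names are illustrative; the real filter may differ.
_FILTERS = [
    re.compile(r"\*[^*]*\*"),    # *actions or emphasis*
    re.compile(r"\[[^\]]*\]"),   # [bracketed stage directions]
    re.compile(r"\([^)]*\)"),    # (parenthetical asides)
    re.compile(r"<[^>]*>"),      # <angle-bracketed tags>
]

def filter_for_tts(text: str) -> str:
    """Remove asterisk-, bracket-, paren-, and angle-bracket-wrapped text."""
    for pattern in _FILTERS:
        text = pattern.sub("", text)
    # Collapse the double spaces left behind by the removals.
    return re.sub(r"\s{2,}", " ", text).strip()
```

Running the filter on a line like `Hello *waves* there (softly) friend` yields `Hello there friend`, so stage directions never reach the speech synthesizer.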

📋 Detailed Changes

🧮 Backend

  • Architecture:
    • The project structure has been reorganized to use the src/ directory.
    • The backend is now fully asynchronous, improving responsiveness.
    • CLI mode (main.py) has been removed.
    • The "exit word" has been removed.
    • Models are initialized and managed using ServiceContext, offering better memory management, particularly when switching characters.
    • Refactored LLMs into agent and stateless_llm, supporting a wider range of LLMs with a new agent interface: basic_memory_agent and hume_ai_agent.
  • LLM (Language Model) Enhancements:
    • New (and old but refactored) providers: Ollama, OpenAI (and any OpenAI Compatible API), Gemini, Claude, Mistral, DeepSeek, Zhipu, llama.cpp.
    • temperature parameter added.
    • No more tokens will be generated after interruption, improving the responsiveness of voice interruption.
    • Ollama models are preloaded at startup, kept in memory for the server's duration, and unloaded at exit.
    • Added a hf_mirror flag to specify whether to use the Hugging Face mirror source.
  • TTS (Text-to-Speech) Enhancements:
    • TTS now generates multiple audio segments concurrently and sends them sequentially, reducing latency.
    • New interruption logic for smoother transitions.
    • Added filters (asterisks, brackets, parentheses) to prevent unwanted text from being spoken.
    • Implemented faster_first_response feature to prioritize the synthesis and playback of the first sentence fragment, minimizing latency.
  • ASR (Automatic Speech Recognition) Enhancements:
    • Made Sherpa-onnx ASR with the SenseVoiceSmall int8 model the default for both English and Chinese presets, with automatic model download.
    • Added a provider option for sherpa-onnx-asr.
  • Other Improvements:
    • Chat log persistence is used to maintain conversation history.
    • All print statements are replaced with loguru for structured logging.
    • Added a Chinese configuration preset: conf.CN.yaml.
    • Basic AI proactive speaking (experimental).
    • Added some checks to the CI/CD process.
    • Added an input/output type system to agents.
    • Added Tencent Translate in https://github.com/Open-LLM-VTuber/Open-LLM-VTuber/pull/107

🖥️ Frontend

  • New frontend built with Electron, React, Chakra UI, and Vite.
  • Multi-Mode in Single Codebase:
    • Web Mode: Browser interface
    • Window Mode: Desktop window
    • Pet Mode: Transparent desktop companion
    • Seamless context sharing between Window and Pet modes, allowing for the preservation of settings, history, connections, and model states.
  • Enhanced UI Features:
    • Responsive layout with collapsible sidebar and footer
    • Customizable Live2D model interactions: Mouse tracking for eye movement, Click-triggered animations, Drag & resize capabilities.
    • Persistent local storage for user preference settings, including background, VAD configuration, Live2D size and interactions, and agent behavior.
    • Supports viewing, loading, and deleting conversation history with streaming subtitles.
    • (Electron pet-mode) A transparent, always-on-top desktop companion with click-through, non-interactive areas featuring draggable and hideable Live2D and UI, right-click menu controls.
    • Camera and screen capturing panel
    • Switch characters easily

📖 Documentation

  • Rewritten README file.
  • New comprehensive documentation with a dedicated website.

🧹 Cleanup

  • Removed unused and legacy code, including TaskQueue.py, scripts/install_piper_tts.py, model_manager_old.py, service_context_old.py, main.py, asr_with_vad, vad, start_cli, fake_llm, MemGPT, the pywhispercpp submodule, and CoreML script.
  • Removed unused dependencies: rich, playsound3, sounddevice, among others.
  • Removed configuration options that are no longer relevant: VOICE_INPUT_ON, MIC_IN_BROWSER, LIVE2D, EXTRA_SYSTEM_PROMPT_RAG, AI_NAME, USER_NAME, SAVE_CHAT_HISTORY, CHAT_HISTORY_DIR, RAG_ON, LLMASSIST_RAG_ON, SAY_SENTENCE_SEPARATELY, MEMORY_SNAPSHOT, PRELOAD_MODELS, tts_on.

⚠️⚠️⚠️ Critical Upgrade Notice

  1. No Direct Upgrades - Previous installations are incompatible

  2. Fresh Install Required - Follow new documentation

  3. Config Changes - Back up existing configurations before migration

Why the Hassle? 💡

  1. UV dependency manager replaces legacy systems
  2. Complete configuration schema overhaul

Please check out the new documentation to install Open-LLM-VTuber again. Fortunately, thanks to uv, there should be fewer headaches during installation.

🎉 Contributors

  • @t41372 (that's me)
  • @ylxmf2005, the creator of the new frontend, who implemented LLM vision capability, chat history management, TTS concurrency, the Hume AI agent, better sentence division, a better Live2D configuration, countless bug fixes, and more. He also wrote the majority of the documentation and provided countless insights. Version v1.0.0 was a close collaboration with him and wouldn't have existed without his tremendous contribution.
  • @Stewitch, who added the hf_mirror option and is currently working on a launcher for this project to streamline the installation and configuration process. It's still a work in progress but will be completed very soon. https://github.com/Stewitch/LiZhen
  • @Fluchw, who added the Tencent translator and helped us fix the translator bug.

And all the other contributors who worked on this project in previous versions.

Full Changelog: https://github.com/Open-LLM-VTuber/Open-LLM-VTuber/compare/v0.5.2...v1.0.0

Open-LLM-VTuber-v1.0.3.zip (includes the SenseVoice model for sherpa-onnx ASR, so you don't need to fetch it from GitHub again)

open-llm-vtuber-electron-1.0.0-frontend.exe (desktop frontend, Windows)

open-llm-vtuber-electron-1.0.0-frontend.dmg (desktop frontend, macOS)