Backend Architecture Overview
Core Component Interaction Flow
Configuration file
This is the basic structure of the configuration file as of v1.0.0.
Speech Recognition (ASR)
Speech Recognition (ASR, Automatic Speech Recognition) converts user speech to text. This project supports multiple speech recognition model implementations.
Language Models (LLM)
This project supports multiple large language model backends and models.
Agent
An Agent is an LLM system that includes memory, tools, and personality. The default option in the current version is basic_memory_agent.
Speech Synthesis (TTS)
After installing the required dependencies and configuring conf.yaml, enable the corresponding speech synthesis engine by modifying the TTS_MODEL option in conf.yaml.
Translation
Translation Feature
Docker Deployment
Due to significant refactoring, the Docker image has not yet been updated to v1.0.0, but it will be updated soon.
Character Settings & Prompts
The Open-LLM-VTuber project allows you to modify the character's personality prompt and supports adding multiple different characters that can be switched in the frontend.
Remote Deployment, Non-Local Deployment, Mobile
If you are: