TTS
How to Add New TTS Support?
"TTS installation is always the most difficult step in adding TTS support. The second most difficult is getting others to install this TTS as well."
The steps to add TTS are relatively fixed and do not require a deep understanding of the Open-LLM-VTuber project code, so it is relatively suitable for newcomers to familiarize themselves with the project (Pull Requests are also easier to merge).
0. Overview
0.1 Requirements
- Latest version of Open-LLM-VTuber. Please switch to dev branch. (The target of your pull request should also be the dev branch)
- Ability to write Python (though you'll know if you can as you follow the steps below. You can learn as you go)
- Basic understanding of GitHub open source workflow. Specifically, understanding the concepts of fork and Pull Request.
- Understanding of how to use uv to manage Python project dependencies. (If you don't know uv, learn here)
- A little knowledge of Pydantic (or... if the AI knows it...)
0.2 Step Overview
This project directory follows the src structure, with all backend code stored in the src/open_llm_vtuber
directory. The tts
and config_manager
mentioned later are both in the src/open_llm_vtuber
directory.
Frontend code references, configuration files, a very small amount of code and other non-code files are stored in the project root directory.
Development
- Add new TTS implementation to the
tts
directory - Add code to initialize your implemented tts class in the
tts_factory
factory class. - Add TTS configuration items in both
conf.default.yaml
andconf.ZH.default.yaml
files - Add the configuration related to your tts to the configuration file data validation code in the
config_manager/tts.py
file.
Testing
- Switch to your tts implementation and test it, ensuring it works normally with empty input and strange punctuation.
- Organize the code and submit a pull request
Documentation
- Please go to User Guide/Backend User Guide/Text-to-Speech (TTS), edit the source code of the TTS documentation and add installation guide for the TTS you implemented.
- If you encounter difficulties at this step, you can contact me directly, and I can help you add the documentation to the documentation website (but you need to write the installation guide yourself!)
1. Development:
2.1 Add new TTS implementation to the tts
directory
Go to the src/open_llm_vtuber/tts
directory.
There are several important files in the directory:
tts_interface.py
- This is the interface class for the tts class. All tts must inherit from this interface.
tts_factory.py
- The factory class for tts. The backend will use configuration items to call this factory class to initialize tts and get tts instances.
The remaining files are implementations of tts supported by the project. I recommend referring to the implementation of edge_tts.py
, which is also the default tts for the project.
Please refer to tts_interface.py
and other tts implementations to add new tts.
You can choose to implement either async_generate_audio
or generate_audio
. The project backend is mainly asynchronous code. In tts_interface.py
, if you choose to implement the synchronous generate_audio
function, the async_generate_audio
function will wrap the generate_audio
function into an asynchronous function.
The generate_audio
function will not be called externally, so if your tts supports asynchronous operations, you can only implement the async_generate_audio
function.
Input Values
You will most likely want to get some parameters from the configuration file when initializing this tts class to customize some options like voice and language. Please write these configurable parameters as parameters of the constructor.
2.2 Add code to initialize your implemented tts class in the tts_factory
factory class.
Go to the tts_factory.py
file and add initialization logic for the new tts in the get_tts_engine
function.
The get_tts_engine
function will be called by external logic and return an instance of tts.
engine_type
indicates what the user wrote in thetts_model
option inconf.yaml
, which is the choice of tts.kwargs
is a dictionary containing the configuration undertts_config
for thistts_model
inconf.yaml
.
For example:
tts_config:
tts_model: "edge_tts"
# text to speech model options:
# "azure_tts", "pyttsx3_tts", "edge_tts", "bark_tts",
# "cosyvoice_tts", "melo_tts", "coqui_tts",
# "fish_api_tts", "x_tts", "gpt_sovits_tts", "sherpa_onnx_tts"
azure_tts:
api_key: "azure-api-key"
region: "eastus"
voice: "en-US-AshleyNeural"
pitch: "26" # percentage of the pitch adjustment
rate: "1" # rate of speak
bark_tts:
voice: "v2/en_speaker_1"
edge_tts:
# Check out doc at https://github.com/rany2/edge-tts
# Use `edge-tts --list-voices` to list all available voices
voice: "en-US-AvaMultilingualNeural" # "en-US-AvaMultilingualNeural" #"zh-CN-XiaoxiaoNeural" # "ja-JP-NanamiNeural"
When tts_model
is set to edge_tts
:
- The input value of
engine_type
will beedge_tts
. kwargs
will be a dictionary like this:
{"voice": "en-US-AvaMultilingualNeural"}
2.3 Add TTS Configuration Items in conf.default.yaml
and conf.ZH.default.yaml
Files
After version v1.1.0
, the project's configuration file (conf.yaml
) is generated from the preset configuration files conf.default.yaml
and conf.ZH.default.yaml
. To add configuration items, we need to edit the conf.default.yaml
file and also add them to the conf.ZH.yaml
file.
If you don't speak Chinese, leave the conf.ZH.default.yaml
to me.
conf.ZH.default.yaml
is a configuration file provided for Chinese users, containing Chinese comments and some preset options. If the user's system language is Chinese, the project's configuration file initialization and updates will be sourced fromconf.ZH.default.yaml
instead ofconf.default.yaml
.
You can add new TTS-required configuration items in the tts_config
section of both preset configuration files.
2.4 Add the configuration related to your tts to the configuration file data validation code in the config_manager/tts.py
file.
Since version v1.0.0
, the project has introduced Pydantic-based configuration file data validation, so if we want to add configurable options, we need to modify the corresponding Pydantic model.
Go to the src/open_llm_vtuber/config_manager/tts.py
file. Refer to other tts implementations, add configuration classes in tts.py, and add tts to TTSConfig
at the bottom.
3. Testing and Submitting Code
Test the tts you added, confirm that the project can run normally, and all configuration items can be used normally.
Then use the uv run ruff format
command in the command line to format the code.
After that, you can submit a pull request.
Please put default options in the conf.default.yaml
and conf.ZH.yaml
files. Please do not change options that are irrelevant to the thing you tried to do in your pull request in the configuration files.
Please do not push your own API key to the branch. Please check the code you are about to submit before submitting. Don't push weird stuff.
4. Supplementary Documentation
Please go to User Guide/Backend User Guide/Text-to-Speech (TTS), edit the source code of the TTS documentation and add installation guide for the TTS you implemented.
Note that our documentation is available in both Chinese and English. If possible, you can complete one language first and then use AI to translate it into the other language.