# LiveTalking
**Repository Path**: felord/LiveTalking
## Basic Information
- **Project Name**: LiveTalking
- **Description**: Real-time streaming digital human based on NeRF
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: https://livetalking-doc.readthedocs.io/
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 53
- **Created**: 2025-12-26
- **Last Updated**: 2025-12-26
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# English | [中文版](./README.md)
A real-time interactive streaming digital human system that enables synchronized audio-video conversation and largely meets the requirements of commercial applications.
[wav2lip Demo](https://www.bilibili.com/video/BV1scwBeyELA/) | [ernerf Demo](https://www.bilibili.com/video/BV1G1421z73r/) | [musetalk Demo](https://www.bilibili.com/video/BV1gm421N7vQ/)
Domestic Mirror Repository:
## News
- Dec 8, 2024: Enhanced multi-concurrency support; GPU memory no longer increases with the number of concurrent streams.
- Dec 21, 2024: Added model preheating for wav2lip and musetalk to resolve stuttering during first inference. Thanks to [@heimaojinzhangyz](https://github.com/heimaojinzhangyz).
- Dec 28, 2024: Integrated the digital human model "Ultralight-Digital-Human". Thanks to [@lijihua2017](https://github.com/lijihua2017).
- Feb 7, 2025: Added fish-speech Text-to-Speech (TTS) functionality.
- Feb 21, 2025: Added the open-source wav2lip256 model. Thanks to @Buchunbuchun.
- Mar 2, 2025: Added Tencent Cloud Text-to-Speech service.
- Mar 16, 2025: Supported GPU inference on macOS. Thanks to [@GcsSloop](https://github.com/GcsSloop).
- May 1, 2025: Simplified runtime parameters; moved the ernerf model to the Git branch "ernerf-rtmp".
- Jun 7, 2025: Added virtual camera output.
- Jul 5, 2025: Added Doubao Text-to-Speech. Thanks to [@ELK-milu](https://github.com/ELK-milu).
- Jul 26, 2025: Supported musetalk v1.5.
## Features
1. Supports multiple digital human models: ernerf, musetalk, wav2lip, Ultralight-Digital-Human.
2. Supports voice cloning.
3. Supports interrupting the digital human while it is speaking.
4. Supports full-body video stitching.
5. Supports WebRTC and virtual camera output.
6. Supports motion choreography: plays custom videos when the digital human is not speaking.
7. Supports multi-concurrency.
## 1. Installation
Tested on Ubuntu 24.04, Python 3.10, PyTorch 2.5.0, and CUDA 12.4.
### 1.1 Install Dependencies
```bash
conda create -n nerfstream python=3.10
conda activate nerfstream
# If your CUDA version is not 12.4 (check with "nvidia-smi"), install the PyTorch build that matches your CUDA version instead (see the PyTorch previous-versions page)
conda install pytorch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 pytorch-cuda=12.4 -c pytorch -c nvidia
pip install -r requirements.txt
```
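After the dependencies are installed, an optional sanity check (not part of the project) confirms that PyTorch was built against CUDA 12.4 and can see the GPU:

```python
# Optional sanity check: confirm PyTorch and CUDA are set up correctly.
import torch

print("PyTorch version:", torch.__version__)         # expect 2.5.0
print("CUDA build:", torch.version.cuda)              # expect 12.4
print("CUDA available:", torch.cuda.is_available())   # expect True on a GPU machine
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```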
For common installation issues, refer to the [FAQ](https://livetalking-doc.readthedocs.io/en/latest/faq.html).
For CUDA environment setup on Linux, refer to this article:
Troubleshooting for video connection issues:
## 2. Quick Start
- Download Models
Quark Cloud Drive:
Google Drive:
1. Copy `wav2lip256.pth` to the `models` directory of this project and rename it to `wav2lip.pth`.
2. Extract the `wav2lip256_avatar1.tar.gz` archive and copy the entire extracted folder to `data/avatars` of this project.
- Run the Project
Execute: `python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar1`
The server must open the following ports: TCP 8010; UDP 1-65535.
You can access the client in two ways:
(1) Open `http://serverip:8010/webrtcapi.html` in a browser. Click "start" to begin playing the digital human video, then enter any text in the input box and submit it; the digital human will speak the submitted text (see the programmatic sketch after this list).
(2) Use the desktop client (download link: ).
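If you prefer to drive the avatar from a script instead of the web page, the text submission can also be done over HTTP. The endpoint path and JSON fields below are only assumptions for illustration; inspect the request that `webrtcapi.html` actually sends and adjust accordingly:

```python
# Hypothetical sketch: submit text to the running LiveTalking server over HTTP.
# NOTE: the "/human" path and the payload fields are assumptions -- verify them
# against the request issued by webrtcapi.html before relying on this.
import requests

SERVER = "http://serverip:8010"  # replace with your server address

payload = {
    "text": "Hello, this is a test sentence.",  # text for the digital human to speak
    "sessionid": 0,                              # assumed session identifier
}
resp = requests.post(f"{SERVER}/human", json=payload, timeout=10)
print(resp.status_code, resp.text)
```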
- Quick Experience
Create an instance from this image and the project will run out of the box.
If you cannot access Hugging Face, run the following command before starting the project:
```bash
export HF_ENDPOINT=https://hf-mirror.com
```
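If the project is launched from a Python wrapper rather than a shell, the same mirror can be set in-process (equivalent to the export above), as long as this happens before any Hugging Face download is triggered:

```python
# Equivalent to the shell export above; must run before any Hugging Face
# download is triggered in the same process.
import os

os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
```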
## 3. More Usage
For detailed usage instructions:
## 4. Docker Run
No prior installation is required; run directly with Docker:
```bash
docker run --gpus all -it --network=host --rm registry.cn-zhangjiakou.aliyuncs.com/codewithgpu3/lipku-livetalking:toza2irpHZ
```
The code is located in `/root/livetalking`. First run `git pull` to fetch the latest code, then execute commands as described in Sections 2 and 3.
The following images are available:
- AutoDL Image:
[AutoDL Tutorial](https://livetalking-doc.readthedocs.io/en/latest/autodl/README.html)
- UCloud Image:
Supports opening any port; no additional SRS service deployment is required.
[UCloud Tutorial](https://livetalking-doc.readthedocs.io/en/latest/ucloud/ucloud.html)
## 5. Performance
- Performance depends mainly on the CPU and GPU: encoding each video stream consumes CPU, and CPU load grows with video resolution; each lip-sync inference runs on the GPU.
- The number of concurrent streams when the digital human is not speaking depends on CPU performance; the number of concurrent streams when multiple digital humans are speaking simultaneously depends on GPU performance.
- In the backend logs, `inferfps` is the GPU inference frame rate and `finalfps` is the final streaming frame rate. Both must stay above 25 fps for real-time playback. If `inferfps` is above 25 but `finalfps` is below 25, the CPU is the bottleneck (see the diagnostic sketch after the table below).
- Real-Time Inference Performance
| Model | GPU Model | FPS |
| :---------- | :--------- | :--- |
| wav2lip256 | RTX 3060 | 60 |
| wav2lip256 | RTX 3080Ti | 120 |
| musetalk | RTX 3080Ti | 42 |
| musetalk | RTX 3090 | 45 |
| musetalk | RTX 4090 | 72 |
A GPU of RTX 3060 or higher is sufficient for wav2lip256, while musetalk requires an RTX 3080Ti or higher.
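The bottleneck rule described above can be expressed as a small diagnostic helper. The function below is purely illustrative and not part of the project; feed it the `inferfps` and `finalfps` values read from the backend logs:

```python
# Illustrative only: interpret the inferfps / finalfps values from the backend
# logs according to the rule described in this section.
REALTIME_FPS = 25  # both rates must stay above this for real-time playback

def diagnose(inferfps: float, finalfps: float) -> str:
    if inferfps >= REALTIME_FPS and finalfps >= REALTIME_FPS:
        return "Real-time: both GPU inference and streaming keep up."
    if inferfps < REALTIME_FPS:
        return "GPU-bound: inference is below 25 fps; use a faster GPU or a lighter model."
    return "CPU-bound: inference keeps up, but encoding/streaming is below 25 fps."

print(diagnose(inferfps=30, finalfps=22))  # -> CPU-bound
```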
## 6. Commercial Version
The following extended features are available for users who are familiar with the open-source project and need to expand product capabilities:
1. High-definition wav2lip model.
2. Full voice interaction: supports interrupting the digital human’s response via a wake word or button to ask a new question.
3. Real-time synchronized subtitles: provides the frontend with events for the start and end of each sentence spoken by the digital human.
4. Each connection can specify a corresponding avatar and voice; accelerated avatar image loading.
5. Supports avatars (digital human images) with unlimited duration.
6. Provides a real-time audio stream input interface.
7. Transparent background for the digital human, supporting dynamic background overlay.
8. Real-time avatar switching.
9. Python client.
For more details:
## 7. Statement
Videos developed based on this project and published on platforms such as Bilibili, WeChat Channels, and Douyin must include the LiveTalking watermark and logo.
---
If this project is helpful to you, please give it a "Star". Contributions from developers interested in improving this project are also welcome.
* Knowledge Planet (for high-quality FAQs, best practices, and Q&A): https://t.zsxq.com/7NMyO
* WeChat Official Account: 数字人技术 (Digital Human Technology)
