# ASR_API_V2

**Repository Path**: svo/ASR_API_V2

## Basic Information

- **Project Name**: ASR_API_V2
- **Description**: https://github.com/MotorBottle/ASR_API_V2
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-10-01
- **Last Updated**: 2025-10-01

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# ASR API Server V2

A FastAPI-based server for audio/video transcription services, extracted and optimized from the Private-ASR project.

## Features

- **Audio/Video Transcription**: Support for various audio and video formats
- **Multiple Output Formats**: Text and SRT subtitle formats
- **Speaker Diarization**: Identify and separate different speakers
- **Hotwords Support**: Improve recognition accuracy with custom hotwords
- **Speaker Naming**: Replace speaker IDs with custom names
- **Multi-language Support**: Chinese (zh) and English (en)
- **Docker Deployment**: Both CPU and GPU support
- **REST API**: Clean HTTP endpoints for easy integration

## Supported Formats

### Input Formats
- **Audio**: `.wav`, `.mp3`, `.aac`, `.m4a`, `.flac`
- **Video**: `.mp4`, `.avi`, `.mkv`, `.mov`, `.webm`

### Output Formats
- **Text**: Plain text transcription
- **SRT**: Timestamped subtitle format with speaker labels

## Quick Start

### Using Docker (Recommended)

#### CPU Deployment
```bash
# Build and run CPU version
docker-compose --profile cpu up -d

# Or build manually
docker build -t asr-api-v2:cpu .
docker run -p 7869:7869 -e DEVICE=cpu asr-api-v2:cpu
```

#### GPU Deployment
```bash
# Build and run GPU version (requires NVIDIA Docker)
docker-compose --profile gpu up -d

# Or build manually
docker build -f Dockerfile.gpu -t asr-api-v2:gpu .
docker run --gpus all -p 7869:7869 -e DEVICE=cuda:0 asr-api-v2:gpu
```

### Local Development

1. Install dependencies:
```bash
pip install -r requirements.txt
```

2. Set environment variables:
```bash
cp .env.example .env
# Edit .env as needed
```

3. Run the server:
```bash
python main.py
```

## API Endpoints

### Health Check
```http
GET /health
```

### Get Model Information
```http
GET /models
```

### Transcribe File
```http
POST /transcribe
Content-Type: multipart/form-data

file: (audio/video file)
output_format: text|srt|both (default: text)
language: zh|en (default: zh)
enable_speaker_diarization: true|false (default: false)
hotwords: (optional, one per line with format "word weight")
```

### Transcribe URL
```http
POST /transcribe_url
Content-Type: application/json

{
  "url": "https://example.com/audio.wav",
  "output_format": "both",
  "language": "zh",
  "enable_speaker_diarization": false,
  "hotwords": {"测试": 20, "语音识别": 30}
}
```

## API Response Format

### Standard Response (text/srt)
```json
{
  "success": true,
  "transcription": "transcribed text or SRT content",
  "format": "text",
  "language": "zh",
  "speaker_diarization": false,
  "duration": 123.45,
  "speakers": ["spk0", "spk1"]
}
```

### Both Format Response
When `output_format: "both"`, you get both text and SRT:
```json
{
  "success": true,
  "transcription": "plain text transcription",
  "transcription_srt": "1  [spk0]\n00:00:01,300 --> 00:00:03,460\nhello world\n\n",
  "format": "both",
  "language": "zh",
  "speaker_diarization": true,
  "duration": 123.45,
  "speakers": ["spk0", "spk1"]
}
```

## Configuration

### Environment Variables

- `HOST`: Server host (default: 0.0.0.0)
- `PORT`: Server port (default: 7869)
- `DEVICE`: Processing device (cpu or cuda:0)
- `LANGUAGE`: Default language (zh or en)
- `LOG_LEVEL`: Logging level (default: INFO)

### Hotwords Format

Hotwords can be provided to improve recognition accuracy:

**Form data format:**
```
测试 20
语音识别 30
重要词汇 40
```

**JSON format:**
```json
{
  "测试": 20,
  "语音识别": 30,
  "重要词汇": 40
}
```

### Speaker Names

Replace speaker IDs with custom names:

**Form data format:**
```
spk0:张三,spk1:李四,spk2:王五
```

**JSON format:**
```json
{
  "spk0": "张三",
  "spk1": "李四",
  "spk2": "王五"
}
```

## Testing

### Using the Test Script

1. Place test files in the project directory:
   - `test.wav` - Audio file for testing
   - `test.mp4` - Video file for testing

2. Run the test script:
```bash
python test_api.py
```

### Manual Testing with cURL

#### Health Check
```bash
curl -X GET http://localhost:7869/health
```

#### Transcribe Audio File
```bash
curl -X POST \
  -F "file=@test.wav" \
  -F "output_format=text" \
  -F "language=zh" \
  -F "enable_speaker_diarization=false" \
  -F "hotwords=测试 20" \
  http://localhost:7869/transcribe
```

#### Transcribe with Speaker Diarization
```bash
curl -X POST \
  -F "file=@test.mp4" \
  -F "output_format=srt" \
  -F "language=zh" \
  -F "enable_speaker_diarization=true" \
  http://localhost:7869/transcribe
```

## API Documentation

Once the server is running, you can access:

- **Swagger UI**: http://localhost:7869/docs
- **ReDoc**: http://localhost:7869/redoc

## Docker Deployment

### CPU Version
```bash
docker-compose --profile cpu up -d
```

### GPU Version (requires NVIDIA Docker)
```bash
docker-compose --profile gpu up -d
```

### Build Custom Image
```bash
# CPU version
docker build -t asr-api-v2:cpu .

# GPU version
docker build -f Dockerfile.gpu -t asr-api-v2:gpu .
```

### Volume Configuration

The Docker setup includes a persistent volume for ModelScope cache:

- **`modelscope_cache`**: Stores downloaded FunASR models to avoid re-downloading
- **Location**: `/root/.cache/modelscope` inside container
- **Benefits**: Faster startup times after initial model download

To manually manage the cache volume:
```bash
# View volume info
docker volume inspect asr_api_v2_modelscope_cache

# Remove volume to force model re-download
docker volume rm asr_api_v2_modelscope_cache
```

## Performance Considerations

- **GPU Acceleration**: Use GPU deployment for better performance
- **Memory Usage**: Large models require significant RAM
- **File Size Limits**: Adjust based on your use case
- **Concurrent Requests**: Limited by hardware resources
- **Model Caching**: Models are cached in Docker volumes to avoid re-downloading

## Troubleshooting

### Common Issues

1. **Model Download**: First run may take time to download models (subsequent runs use cached models)
2. **CUDA Errors**: Ensure NVIDIA Docker is properly installed
3. **Memory Issues**: Reduce concurrent requests or use CPU mode
4. **File Format**: Verify input files are in supported formats

### Logs

Check application logs for detailed error information:
```bash
docker logs asr-api-v2-cpu
# or
docker logs asr-api-v2-gpu
```

## Development

### Project Structure
```
ASR_API_V2/
├── main.py              # FastAPI application
├── asr_processor.py     # Core ASR processing logic
├── models.py            # Pydantic models
├── utils/               # Utility functions
├── test_api.py          # Test script
├── requirements.txt     # Python dependencies
├── Dockerfile           # CPU Docker image
├── Dockerfile.gpu       # GPU Docker image
├── docker-compose.yml   # Docker Compose configuration
└── README.md           # This file
```

## License

This project is based on Private-ASR and FunASR, following the MIT License.

## Credits

- Based on [Private-ASR](https://github.com/MotorBottle/Audio-Processor)
- Powered by [FunASR](https://github.com/alibaba-damo-academy/FunASR)
- Uses [FastAPI](https://fastapi.tiangolo.com/) for the web framework