<div align="center">   
    <img src="https://storage.googleapis.com/hume-public-logos/hume/hume-banner.png">   
    <h1>Hume AI | Expressive TTS Arena</h1>   
    <p>
        <strong>An interactive platform for comparing and evaluating the expressiveness of different text-to-speech engines</strong>   
    </p> 
</div>

## Overview
Expressive TTS Arena is an open-source web application that enables users to compare text-to-speech outputs with a focus on expressiveness rather than just audio quality. Built with [Gradio](https://www.gradio.app/), it provides a seamless interface for generating and comparing speech synthesis from different providers, including Hume AI and ElevenLabs.

## Features
- Text generation using Claude AI for creating expressive content.
- Direct text input or AI-assisted text generation.
- Comparative analysis of different TTS engines.
- Simple voting mechanism for preferred outputs.
- Random voice selection from multiple providers.
- Real-time speech synthesis comparison.

## Prerequisites

- Python >=3.11.11
- Virtual environment capability
- API keys for Hume AI, Anthropic, and ElevenLabs
- For a complete list of dependencies, see `requirements.txt`.

## Project Structure
```
Expressive TTS Arena/
├── src/
│   ├── integrations/
│   │   ├── __init__.py         # Makes integrations a package; exposes API clients
│   │   ├── anthropic_api.py    # Anthropic API integration
│   │   ├── elevenlabs_api.py   # ElevenLabs API integration
│   │   └── hume_api.py         # Hume API integration
│   ├── __init__.py             # Makes src a package; exposes key functionality
│   ├── app.py                  # Entry file
│   ├── config.py               # Global config and logger setup
│   ├── constants.py            # Global constants
│   ├── theme.py                # Custom Gradio Theme
│   └── utils.py                # Utility functions
├── .env.example
├── .gitignore
├── .pre-commit-config.yaml
└── requirements.txt
```

## Installation

1. Create and activate the virtual environment:
    ```sh
    python -m venv gradio-env
    source gradio-env/bin/activate  # On Windows, use: gradio-env\Scripts\activate
    ```

2. Install dependencies:
    ```sh
    pip install -r requirements.txt
    ```

3. Install pre-commit hook for automatic file formatting:
    ```sh
    pre-commit install
    ```

4. Configure environment variables:
    - Create a `.env` file based on `.env.example`
    - Add your API keys:

    ```sh
    HUME_API_KEY=YOUR_HUME_API_KEY
    ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY
    ELEVENLABS_API_KEY=YOUR_ELEVENLABS_API_KEY
    ```

5. Run the application:
    ```sh
    watchfiles "python -m src.app"
    ```
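
Before launching, it can help to confirm that the three keys from step 4 are actually set. The following is a minimal sketch, assuming the values from `.env` have already been loaded into the process environment (how the project loads them is not shown here). The helper name `missing_api_keys` is hypothetical and is not part of `src/config.py`.

```python
import os

# Variable names taken from the .env example in step 4.
REQUIRED_KEYS = ("HUME_API_KEY", "ANTHROPIC_API_KEY", "ELEVENLABS_API_KEY")


def missing_api_keys(env=None):
    """Return the required key names that are unset or blank in `env`.

    Hypothetical helper for illustration; defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
```

Calling `missing_api_keys()` with no arguments checks the current environment; an empty list means all three keys are present.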

## User Flow

1. **Enter or Generate Text:** Type directly in the Text box, or optionally enter a Prompt, click "Generate text", and edit if needed.
2. **Synthesize Speech:** Click "Synthesize speech" to generate two audio outputs.
3. **Listen & Compare:** Play back both options (A & B) to hear the differences.
4. **Vote for Your Favorite:** Click "Vote for option A" or "Vote for option B" to choose your favorite.
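
The compare-and-vote steps above can be sketched roughly as follows. This is an illustration only, not the code in `src/app.py`: the function names, the provider labels, the random assignment of providers to options A and B, and the in-memory tally are all assumptions.

```python
import random
from collections import Counter

# Providers named in the Overview; the labels themselves are illustrative.
PROVIDERS = ("Hume AI", "ElevenLabs")


def assign_options(rng=random):
    """Randomly map the two providers to options A and B (assumed behavior)."""
    shuffled = rng.sample(PROVIDERS, k=len(PROVIDERS))
    return {"A": shuffled[0], "B": shuffled[1]}


def record_vote(tally, options, choice):
    """Credit the provider behind the chosen option ("A" or "B")."""
    if choice not in options:
        raise ValueError(f"choice must be one of {sorted(options)}")
    tally[options[choice]] += 1
    return tally


# Example round: assign options, then record a vote for option A.
tally = Counter()
options = assign_options()
record_vote(tally, options, "A")
```

Keeping the option-to-provider mapping separate from the tally means a vote is always credited to a provider rather than to a letter, so results stay comparable across rounds even when the assignment changes.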

## License
This project is licensed under the MIT License - see the [LICENSE.txt](LICENSE.txt) file for details.