<div align="center">
<img src="https://storage.googleapis.com/hume-public-logos/hume/hume-banner.png">
<h1>Hume AI | Expressive TTS Arena</h1>
<p>
<strong>An interactive platform for comparing and evaluating the expressiveness of different text-to-speech engines</strong>
</p>
</div>
## Overview
Expressive TTS Arena is an open-source web application that enables users to compare text-to-speech outputs with a focus on expressiveness rather than just audio quality. Built with [Gradio](https://www.gradio.app/), it provides a seamless interface for generating and comparing speech synthesis from different providers, including Hume AI and ElevenLabs.
## Features
- Text generation using Claude AI for creating expressive content.
- Direct text input or AI-assisted text generation.
- Comparative analysis of different TTS engines.
- Simple voting mechanism for preferred outputs.
- Random voice selection from multiple providers.
- Real-time speech synthesis comparison.
## Prerequisites
- Python >=3.11.11
- Virtual environment capability
- API keys for Hume AI, Anthropic, and ElevenLabs
- For a complete list of dependencies, see `requirements.txt`.
## Project Structure
```
Expressive TTS Arena/
├── src/
│   ├── integrations/
│   │   ├── __init__.py         # Makes integrations a package; exposes API clients
│   │   ├── anthropic_api.py    # Anthropic API integration
│   │   ├── elevenlabs_api.py   # ElevenLabs API integration
│   │   └── hume_api.py         # Hume API integration
│   ├── __init__.py             # Makes src a package; exposes key functionality
│   ├── app.py                  # Entry file
│   ├── config.py               # Global config and logger setup
│   ├── constants.py            # Global constants
│   ├── theme.py                # Custom Gradio theme
│   └── utils.py                # Utility functions
├── .env.example
├── .gitignore
├── .pre-commit-config.yaml
└── requirements.txt
```
## Installation
1. Create and activate the virtual environment:
```sh
python -m venv gradio-env
source gradio-env/bin/activate # On Windows, use: gradio-env\Scripts\activate
```
2. Install dependencies:
```sh
pip install -r requirements.txt
```
3. Install the pre-commit hook for automatic file formatting:
```sh
pre-commit install
```
4. Configure environment variables:
- Create a `.env` file based on `.env.example`
- Add your API keys:
```sh
HUME_API_KEY=YOUR_HUME_API_KEY
ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY
ELEVENLABS_API_KEY=YOUR_ELEVENLABS_API_KEY
```
5. Run the application:
```sh
watchfiles "python -m src.app"
```
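Before starting the app, it can help to confirm that the three keys from step 4 are actually visible to the process. The following is a minimal sketch, not part of the project: `missing_api_keys` is a hypothetical helper, and it assumes the values from `.env` have already been loaded into the environment (for example by the app's own config code or by exporting them in your shell).

```python
import os

# Key names taken from the .env example in step 4.
REQUIRED_KEYS = ("HUME_API_KEY", "ANTHROPIC_API_KEY", "ELEVENLABS_API_KEY")


def missing_api_keys(env=None):
    """Return the required key names that are unset or blank.

    Hypothetical helper: `env` defaults to the process environment,
    but any mapping can be passed in for testing.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


# Usage with an explicit mapping instead of the real environment.
example_env = {"HUME_API_KEY": "abc", "ANTHROPIC_API_KEY": ""}
print(missing_api_keys(example_env))
# → ['ANTHROPIC_API_KEY', 'ELEVENLABS_API_KEY']
```

Calling `missing_api_keys()` with no argument checks the real environment; an empty list means all three keys are set.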
## User Flow
1. **Enter or Generate Text:** Type directly in the Text box, or optionally enter a Prompt, click "Generate text", and edit if needed.
2. **Synthesize Speech:** Click "Synthesize speech" to generate two audio outputs.
3. **Listen & Compare:** Play back both options (A & B) to hear the differences.
4. **Vote for Your Favorite:** Click "Vote for option A" or "Vote for option B" to choose your favorite.
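
The flow above amounts to a blind A/B comparison. As a rough illustration only, one round could be modeled as below. The function names and tally structure are hypothetical and do not come from `src/app.py`, and the sketch assumes each round pits one Hume AI output against one ElevenLabs output in a random order.

```python
import random
from collections import Counter

# Providers named in the Overview; the one-vs-one pairing is an assumption.
PROVIDERS = ("Hume AI", "ElevenLabs")


def assign_options(rng=random):
    """Randomly map the two providers to options A and B (hypothetical)."""
    shuffled = list(PROVIDERS)
    rng.shuffle(shuffled)
    return {"A": shuffled[0], "B": shuffled[1]}


def record_vote(tally, options, choice):
    """Credit the provider behind the chosen option and return the tally."""
    if choice not in options:
        raise ValueError(f"choice must be one of {sorted(options)}")
    tally[options[choice]] += 1
    return tally


# One round: shuffle the providers, then record a vote for option A.
tally = Counter()
options = assign_options()
record_vote(tally, options, "A")
print(options, dict(tally))
```

Shuffling which provider sits behind each option keeps the vote about the audio rather than the brand, which is the usual motivation for this kind of arena setup.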
## License
This project is licensed under the MIT License - see the [LICENSE.txt](LICENSE.txt) file for details.