Spaces: Rcarvalo/speech-to-speech

Rcarvalo committed on 7 days ago

Commit 23d6a6e (verified) · 1 Parent(s): 30dbba5

Upload README.md with huggingface_hub

Files changed (1)

README.md +49 -35

@@ -1,62 +1,76 @@
 ---
-title: LFM2-Audio Speech-to-Speech
-emoji: 🎤
-colorFrom: blue
-colorTo: purple
 sdk: docker
 app_port: 7860
 pinned: false
 license: other
 ---
-# LFM2-Audio Speech-to-Speech Chat
-This is a demo of LFM2-Audio-1.5B, Liquid AI's first end-to-end audio foundation model. Built with low-latency in mind, the lightweight LFM2 backbone enables real-time speech-to-speech conversations without sacrificing quality.
-## Features
-- **Real-time speech-to-speech**: Talk to the model and get audio responses
-- **Multi-turn conversations**: Maintain context across multiple exchanges
-- **Interleaved text and audio**: See the text transcription while hearing the audio
-## How to Use
-1. **Record your voice**: Click the microphone button and speak your message
-2. **Adjust parameters** (optional):
-   - Temperature: Controls randomness (higher = more creative)
-   - Top-k: Limits sampling to top k tokens
-3. **Generate Response**: Click the button to get the model's response
-4. **Listen & Read**: Hear the audio response and read the text transcription
-## Parameters
-- **Temperature**:
-  - 0 = Greedy decoding (most deterministic)
-  - 1.0 = Default (balanced)
-  - 2.0 = Very creative (more random)
-- **Top-k**:
-  - 0 = No filtering
-  - 4 = Default (conservative)
-  - Higher values = more diversity
-## Technical Details
-- Model: LFM2-Audio-1.5B
-- Audio Codec: Mimi (24kHz)
-- Mode: Interleaved generation (optimal for real-time conversations)
-## Requirements
-- GPU recommended for real-time performance
-- Microphone access in your browser
-## Links
 - [Liquid AI Website](https://www.liquid.ai/)
 - [GitHub Repository](https://github.com/Liquid4All/liquid-audio/)
 - [Model on Hugging Face](https://huggingface.co/LiquidAI/LFM2-Audio-1.5B)
-## License
 Licensed under the LFM Open License v1.0

 ---
+title: LFM2-Audio Real-time Speech-to-Speech
+emoji: 🎙️
+colorFrom: purple
+colorTo: pink
 sdk: docker
 app_port: 7860
 pinned: false
 license: other
 ---
+# LFM2-Audio Real-time Speech-to-Speech Chat
+Real-time WebRTC streaming demo of LFM2-Audio-1.5B, Liquid AI's first end-to-end audio foundation model.
+## ✨ Features
+- **🔴 Real-time WebRTC streaming** - Instant response with minimal latency
+- **🎙️ Continuous listening** - Natural conversation flow with automatic pause detection
+- **💬 Interleaved output** - Simultaneous text and audio generation
+- **🔄 Multi-turn memory** - Context-aware conversations
+- **⚡ Low latency** - Optimized for real-time interaction
+## 🚀 How to Use
+1. **Grant microphone access** when prompted by your browser
+2. **Start speaking** - The model listens continuously
+3. **Pause briefly** - The model detects pauses and responds automatically
+4. **Continue conversation** - Build multi-turn dialogues naturally
+## 🎛️ Parameters
+### Temperature
+- **0**: Greedy decoding (most deterministic)
+- **1.0**: Default (balanced creativity and coherence)
+- **2.0**: Very creative (more diverse, less predictable outputs)
+### Top-k
+- **0**: No filtering (full vocabulary)
+- **4**: Default (conservative, high quality)
+- **Higher values**: More diverse but potentially less coherent
+## 🏗️ Technical Details
+- **Model**: LFM2-Audio-1.5B
+- **Generation Mode**: Interleaved (optimized for real-time)
+- **Audio Codec**: Mimi (24kHz)
+- **Streaming**: WebRTC via fastrtc
+- **Backend**: PyTorch with CUDA acceleration
+## 🔧 Differences from Standard Demo
+This demo uses **fastrtc** for WebRTC streaming, enabling:
+- Continuous audio streaming without manual recording
+- Automatic voice activity detection (VAD)
+- Lower latency through chunked processing
+- More natural conversation flow
+## 📚 Resources
 - [Liquid AI Website](https://www.liquid.ai/)
 - [GitHub Repository](https://github.com/Liquid4All/liquid-audio/)
 - [Model on Hugging Face](https://huggingface.co/LiquidAI/LFM2-Audio-1.5B)
+- [fastrtc Documentation](https://github.com/freddyaboulton/fastrtc)
+## 📝 License
 Licensed under the LFM Open License v1.0
+## 💡 Tips
+- Speak clearly and pause briefly between thoughts
+- Use a good quality microphone for best results
+- Adjust temperature for different creativity levels
+- Lower non-zero top-k values produce more consistent responses (0 disables filtering)
+- GPU acceleration is recommended for real-time performance
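
The Temperature and Top-k settings described in the new README can be illustrated with a minimal, generic sampling sketch. This commit does not show the Space's actual sampling code, so the function below is an assumption about how such parameters are commonly applied: `sample_token` and its arguments are hypothetical names, and only the conventions "temperature 0 = greedy" and "top-k 0 = no filtering" come from the README itself.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=4, rng=None):
    """Pick a token index from raw logits (illustrative, not the Space's code)."""
    rng = rng or random.Random()
    # Temperature 0 is treated as greedy decoding: always take the argmax.
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Rank candidates by score; top_k > 0 keeps only the k best, 0 keeps all.
    candidates = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k > 0:
        candidates = candidates[:top_k]
    # Dividing by temperature flattens (>1) or sharpens (<1) the distribution.
    scaled = [logits[i] / temperature for i in candidates]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With the README defaults (temperature 1.0, top-k 4), only the four highest-scoring candidates can ever be chosen, which is why small non-zero top-k values give more consistent output, while raising the temperature spreads probability more evenly across those candidates.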