	---
	title: LFM2-Audio Real-time Speech-to-Speech
	emoji: 🎙️
	colorFrom: purple
	colorTo: pink
	sdk: docker
	app_port: 7860
	pinned: false
	license: other
	---

	# LFM2-Audio Real-time Speech-to-Speech Chat

	Real-time WebRTC streaming demo of LFM2-Audio-1.5B, Liquid AI's first end-to-end audio foundation model.

	## ✨ Features

	- 🔴 Real-time WebRTC streaming - Audio streams continuously, with no manual recording step
	- 🎙️ Continuous listening - Natural conversation flow with automatic pause detection
	- 💬 Interleaved output - Text and audio are generated together rather than one after the other
	- 🔄 Multi-turn memory - Earlier turns are kept as context for later ones
	- ⚡ Low latency - Optimized for real-time interaction
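To make "interleaved output" more concrete, here is a minimal sketch of how a client could separate a mixed stream into a transcript and audio frames. The tagged-tuple stream format and the `split_interleaved` helper are assumptions made for illustration; they are not the actual output format of LFM2-Audio or this demo.

```python
from typing import Iterable, List, Tuple, Union

# Hypothetical item format: ("text", str) or ("audio", list of codec token ids).
Item = Tuple[str, Union[str, List[int]]]


def split_interleaved(stream: Iterable[Item]) -> Tuple[str, List[List[int]]]:
    """Separate an interleaved stream into a transcript and a list of audio frames."""
    text_parts: List[str] = []
    audio_frames: List[List[int]] = []
    for kind, payload in stream:
        if kind == "text":
            text_parts.append(payload)      # text pieces arrive in reading order
        elif kind == "audio":
            audio_frames.append(payload)    # audio frames arrive in playback order
        else:
            raise ValueError(f"unknown item kind: {kind!r}")
    return "".join(text_parts), audio_frames


# Toy stream: text and audio items alternate as they would during generation.
demo_stream = [("text", "Hi"), ("audio", [1, 2]), ("text", " there"), ("audio", [3, 4])]
transcript, frames = split_interleaved(demo_stream)
```

Because both kinds of item share one stream, a client can show text and start audio playback before generation has finished.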

	## 🚀 How to Use

	1. Grant microphone access when prompted by your browser
	2. Start speaking - The model listens continuously
	3. Pause briefly - The model detects pauses and responds automatically
	4. Continue the conversation - Build multi-turn dialogues naturally
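The listen, pause, respond cycle in the steps above can be sketched as a simple energy-based turn splitter. This is only an illustration: the threshold, the frame layout, and the `split_on_pauses` helper are assumptions, and the demo's real pause detection is handled by fastrtc and may work differently.

```python
import math
from typing import Iterable, Iterator, List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in the range [-1.0, 1.0]."""
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def split_on_pauses(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,   # hypothetical loudness threshold
    pause_frames: int = 3,     # hypothetical number of quiet frames that ends a turn
) -> Iterator[List[Sequence[float]]]:
    """Yield one list of loud frames per utterance.

    An utterance ends once `pause_frames` consecutive quiet frames follow speech.
    Quiet frames themselves are dropped, including short gaps inside an utterance.
    """
    utterance: List[Sequence[float]] = []
    quiet = 0
    for frame in frames:
        if rms(frame) >= threshold:
            utterance.append(frame)
            quiet = 0
        elif utterance:
            quiet += 1
            if quiet >= pause_frames:
                yield utterance
                utterance, quiet = [], 0
    if utterance:
        yield utterance  # flush speech that was still open when the input ended


loud, silent = [0.5] * 4, [0.0] * 4
turns = list(split_on_pauses([loud, loud, silent, silent, silent, loud, silent]))
```

In this toy input, the first two loud frames form one turn and the later loud frame forms a second, so a model would be asked to respond twice.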

	## 🎛️ Parameters

	### Temperature
	- 0: Greedy decoding (most deterministic)
	- 1.0: Default (balanced creativity and coherence)
	- 2.0: Maximum creativity (more diverse outputs)

	### Top-k
	- 0: No filtering (samples from the full vocabulary)
	- 4: Default (samples only from the 4 most likely tokens; conservative, high quality)
	- Higher values: More diverse but potentially less coherent
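As a rough illustration of how these two parameters interact, the sketch below applies top-k filtering and temperature scaling to a list of logits before sampling. It is a generic textbook version written with the standard library only; the `sample_token` helper is hypothetical and is not the demo's actual sampling code.

```python
import math
import random
from typing import List, Optional


def sample_token(
    logits: List[float],
    temperature: float = 1.0,
    top_k: int = 4,
    rng: Optional[random.Random] = None,
) -> int:
    """Return the index of one sampled token."""
    if rng is None:
        rng = random.Random()
    if temperature <= 0:
        # Temperature 0: greedy decoding, always the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Rank token indices from most to least likely.
    indices = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k > 0:
        indices = indices[:top_k]  # top-k 0 means no filtering
    # Higher temperature flattens the distribution; lower sharpens it.
    scaled = [logits[i] / temperature for i in indices]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # unnormalized softmax
    return rng.choices(indices, weights=weights, k=1)[0]


logits = [0.1, 2.0, -1.0, 0.5, 1.5]
greedy = sample_token(logits, temperature=0)
top1 = sample_token(logits, temperature=1.0, top_k=1)
rng = random.Random(0)
top2_draws = {sample_token(logits, temperature=2.0, top_k=2, rng=rng) for _ in range(200)}
```

With `top_k=2`, only the two highest-scoring tokens (indices 1 and 4 here) can ever be drawn, regardless of temperature.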

	## 🏗️ Technical Details

	- Model: LFM2-Audio-1.5B
	- Generation Mode: Interleaved (optimized for real-time)
	- Audio Codec: Mimi (24 kHz)
	- Streaming: WebRTC via fastrtc
	- Backend: PyTorch with CUDA acceleration
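For a sense of the audio granularity involved, the arithmetic below relates the 24 kHz sample rate to codec frames. The 12.5 Hz frame rate is an assumption taken from Mimi's published design, not from this README, so treat the resulting numbers as indicative only.

```python
import math

SAMPLE_RATE_HZ = 24_000  # from the Technical Details list
FRAME_RATE_HZ = 12.5     # assumption: Mimi's published frame rate, not stated in this README


def samples_per_frame(sample_rate_hz: float = SAMPLE_RATE_HZ,
                      frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Number of PCM samples covered by one codec frame."""
    return round(sample_rate_hz / frame_rate_hz)


def frame_duration_ms(frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Duration of one codec frame in milliseconds."""
    return 1000.0 / frame_rate_hz


def frames_for_seconds(seconds: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Codec frames needed to cover `seconds` of audio, rounded up."""
    return math.ceil(seconds * frame_rate_hz)
```

Under that assumption, each frame spans 1920 samples (80 ms), and two seconds of speech corresponds to 25 frames.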

	## 🔧 Differences from Standard Demo

	This demo uses fastrtc for WebRTC streaming, enabling:
	- Continuous audio streaming without manual recording
	- Automatic voice activity detection (VAD)
	- Lower latency through chunked processing
	- More natural conversation flow
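The "chunked processing" point can be illustrated with a small re-chunking generator: incoming audio packets of arbitrary size are buffered and handed on in fixed-size chunks, so downstream work can start before the speaker has finished. The `rechunk` helper and the chunk size are hypothetical; fastrtc's internal buffering is not shown here.

```python
from typing import Iterable, Iterator, List


def rechunk(packets: Iterable[List[float]], chunk_size: int) -> Iterator[List[float]]:
    """Regroup variable-size packets of samples into chunks of `chunk_size` samples.

    The final chunk may be shorter if the input does not divide evenly.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    buffer: List[float] = []
    for packet in packets:
        buffer.extend(packet)
        # Emit every complete chunk as soon as enough samples have arrived.
        while len(buffer) >= chunk_size:
            yield buffer[:chunk_size]
            buffer = buffer[chunk_size:]
    if buffer:
        yield buffer  # leftover partial chunk at end of stream


packets = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
chunks = list(rechunk(packets, chunk_size=4))
```

Each chunk is available as soon as its last sample arrives, which is where the latency benefit over record-then-upload comes from.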

	## 📚 Resources

	- [Liquid AI Website](https://www.liquid.ai/)
	- [GitHub Repository](https://github.com/Liquid4All/liquid-audio/)
	- [Model on Hugging Face](https://huggingface.co/LiquidAI/LFM2-Audio-1.5B)
	- [fastrtc Documentation](https://github.com/freddyaboulton/fastrtc)

	## 📝 License

	Licensed under the LFM Open License v1.0.

	## 💡 Tips

	- Speak clearly and pause briefly between thoughts
	- Use a good quality microphone for best results
	- Lower the temperature for more predictable responses; raise it for more varied ones
	- Lower top-k values produce more consistent responses
	- GPU acceleration is recommended for real-time performance