Rcarvalo committed on
Commit 23d6a6e · verified · 1 Parent(s): 30dbba5

Upload README.md with huggingface_hub
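For context on the commit message above: a README upload like this one is typically pushed from Python with `HfApi.upload_file` from the `huggingface_hub` library. The sketch below is illustrative only; the Space id is a placeholder, not taken from this page.

```python
# Illustrative sketch of an "Upload README.md with huggingface_hub" commit.
# The repo_id is a placeholder; it is not the actual Space behind this page.
from huggingface_hub import HfApi

api = HfApi()  # picks up the token from `huggingface-cli login` or HF_TOKEN
api.upload_file(
    path_or_fileobj="README.md",         # local file to upload
    path_in_repo="README.md",            # destination path inside the repo
    repo_id="your-username/your-space",  # placeholder Space id
    repo_type="space",                   # a Space repo, not a model or dataset
    commit_message="Upload README.md with huggingface_hub",
)
```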

Files changed (1)
  1. README.md +49 -35
README.md CHANGED
@@ -1,62 +1,76 @@
  ---
- title: LFM2-Audio Speech-to-Speech
- emoji: 🎀
- colorFrom: blue
- colorTo: purple
  sdk: docker
  app_port: 7860
  pinned: false
  license: other
  ---

- # LFM2-Audio Speech-to-Speech Chat

- This is a demo of LFM2-Audio-1.5B, Liquid AI's first end-to-end audio foundation model. Built with low latency in mind, the lightweight LFM2 backbone enables real-time speech-to-speech conversations without sacrificing quality.

- ## Features

- - **Real-time speech-to-speech**: Talk to the model and get audio responses
- - **Multi-turn conversations**: Maintain context across multiple exchanges
- - **Interleaved text and audio**: See the text transcription while hearing the audio

- ## How to Use

- 1. **Record your voice**: Click the microphone button and speak your message
- 2. **Adjust parameters** (optional):
-    - Temperature: Controls randomness (higher = more creative)
-    - Top-k: Limits sampling to the top k tokens
- 3. **Generate Response**: Click the button to get the model's response
- 4. **Listen & Read**: Hear the audio response and read the text transcription

- ## Parameters

- - **Temperature**:
-   - 0 = Greedy decoding (most deterministic)
-   - 1.0 = Default (balanced)
-   - 2.0 = Very creative (more random)

- - **Top-k**:
-   - 0 = No filtering
-   - 4 = Default (conservative)
-   - Higher values = more diversity

- ## Technical Details

- - Model: LFM2-Audio-1.5B
- - Audio Codec: Mimi (24kHz)
- - Mode: Interleaved generation (optimal for real-time conversations)

- ## Requirements

- - GPU recommended for real-time performance
- - Microphone access in your browser

- ## Links

  - [Liquid AI Website](https://www.liquid.ai/)
  - [GitHub Repository](https://github.com/Liquid4All/liquid-audio/)
  - [Model on Hugging Face](https://huggingface.co/LiquidAI/LFM2-Audio-1.5B)

- ## License

  Licensed under the LFM Open License v1.0
  ---
+ title: LFM2-Audio Real-time Speech-to-Speech
+ emoji: 🎙️
+ colorFrom: purple
+ colorTo: pink
  sdk: docker
  app_port: 7860
  pinned: false
  license: other
  ---

+ # LFM2-Audio Real-time Speech-to-Speech Chat

+ Real-time WebRTC streaming demo of LFM2-Audio-1.5B, Liquid AI's first end-to-end audio foundation model.

+ ## ✨ Features

+ - **🔴 Real-time WebRTC streaming** - Instant response with minimal latency
+ - **🎙️ Continuous listening** - Natural conversation flow with automatic pause detection
+ - **💬 Interleaved output** - Simultaneous text and audio generation
+ - **🔄 Multi-turn memory** - Context-aware conversations
+ - **⚡ Low latency** - Optimized for real-time interaction

+ ## 🚀 How to Use

+ 1. **Grant microphone access** when prompted by your browser
+ 2. **Start speaking** - The model listens continuously
+ 3. **Pause briefly** - The model detects pauses and responds automatically
+ 4. **Continue conversation** - Build multi-turn dialogues naturally

+ ## 🎛️ Parameters

+ ### Temperature
+ - **0**: Greedy decoding (most deterministic)
+ - **1.0**: Default (balanced creativity and coherence)
+ - **2.0**: Maximum creativity (more diverse outputs)

+ ### Top-k
+ - **0**: No filtering (full vocabulary)
+ - **4**: Default (conservative, high quality)
+ - **Higher values**: More diverse but potentially less coherent
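To make the Temperature and Top-k settings above concrete, here is a generic sketch of one sampling step with both knobs applied, written in plain PyTorch. It illustrates the mechanics only and is not the demo's own decoding code.

```python
# Generic temperature + top-k sampling for a single next-token step.
# Purely illustrative of the parameters described above.
import torch


def sample_next_token(logits: torch.Tensor, temperature: float = 1.0, top_k: int = 4) -> int:
    """Pick one token id from a 1-D tensor of logits."""
    if temperature == 0:
        # Temperature 0 = greedy decoding: always take the most likely token.
        return int(torch.argmax(logits))

    scaled = logits / temperature  # >1.0 flattens the distribution, <1.0 sharpens it

    if top_k > 0:
        # Keep only the k highest-scoring tokens; everything else is masked out.
        kth_best = torch.topk(scaled, k=min(top_k, scaled.numel())).values[-1]
        scaled = scaled.masked_fill(scaled < kth_best, float("-inf"))

    probs = torch.softmax(scaled, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))


# Example with a toy 8-token vocabulary.
print(sample_next_token(torch.randn(8), temperature=1.0, top_k=4))
```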
 
+ ## 🏗️ Technical Details

+ - **Model**: LFM2-Audio-1.5B
+ - **Generation Mode**: Interleaved (optimized for real-time)
+ - **Audio Codec**: Mimi (24kHz)
+ - **Streaming**: WebRTC via fastrtc
+ - **Backend**: PyTorch with CUDA acceleration
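As a concrete illustration of the "Audio Codec: Mimi (24kHz)" entry above, the snippet below round-trips a waveform through the Mimi codec using its `transformers` implementation (`kyutai/mimi`). This is a stand-alone sketch of the codec itself, not how this Space invokes it internally.

```python
# Stand-alone sketch: encode audio to Mimi's discrete tokens and decode it back.
# Illustrates the 24 kHz Mimi codec; not the Space's own audio pipeline.
import numpy as np
from transformers import AutoFeatureExtractor, MimiModel

model = MimiModel.from_pretrained("kyutai/mimi")
feature_extractor = AutoFeatureExtractor.from_pretrained("kyutai/mimi")

# One second of silence at the codec's native 24 kHz sampling rate.
waveform = np.zeros(24_000, dtype=np.float32)
inputs = feature_extractor(
    raw_audio=waveform,
    sampling_rate=feature_extractor.sampling_rate,  # 24000 Hz for Mimi
    return_tensors="pt",
)

# Encode to discrete audio codes (the tokens an audio LM can interleave with text),
# then reconstruct a waveform from those codes.
codes = model.encode(inputs["input_values"]).audio_codes
reconstructed = model.decode(codes).audio_values
print(codes.shape, reconstructed.shape)
```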
 
+ ## 🔧 Differences from Standard Demo

+ This demo uses **fastrtc** for WebRTC streaming, enabling:
+ - Continuous audio streaming without manual recording
+ - Automatic voice activity detection (VAD)
+ - Lower latency through chunked processing
+ - More natural conversation flow
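For readers curious how a fastrtc loop of this kind is wired up, below is a minimal sketch using fastrtc's `Stream` and `ReplyOnPause` pattern. The handler body is a placeholder that simply echoes audio back in chunks; the real Space would run LFM2-Audio inference there and yield generated speech instead.

```python
# Minimal fastrtc sketch: stream microphone audio in, reply when the speaker pauses.
# The handler below is a placeholder (an echo), not the Space's actual inference code.
import numpy as np
from fastrtc import ReplyOnPause, Stream


def respond(audio: tuple[int, np.ndarray]):
    sample_rate, frames = audio
    # Placeholder "model": echo the captured audio back in ~0.5 s chunks,
    # mimicking how a streaming generator yields audio as it is produced.
    step = sample_rate // 2
    for start in range(0, frames.shape[-1], step):
        yield (sample_rate, frames[..., start:start + step])


# ReplyOnPause adds voice activity detection, so `respond` only runs
# once the user stops speaking.
stream = Stream(handler=ReplyOnPause(respond), modality="audio", mode="send-receive")

if __name__ == "__main__":
    stream.ui.launch()  # built-in UI, handy for local testing
```

ReplyOnPause is what provides the "automatic pause detection" behavior listed in the Features section above.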
 
+ ## 📚 Resources

  - [Liquid AI Website](https://www.liquid.ai/)
  - [GitHub Repository](https://github.com/Liquid4All/liquid-audio/)
  - [Model on Hugging Face](https://huggingface.co/LiquidAI/LFM2-Audio-1.5B)
+ - [fastrtc Documentation](https://github.com/freddyaboulton/fastrtc)

+ ## 📝 License

  Licensed under the LFM Open License v1.0
+
+ ## 💡 Tips
+
+ - Speak clearly and pause briefly between thoughts
+ - Use a good quality microphone for best results
+ - Adjust temperature for different creativity levels
+ - Lower top-k values produce more consistent responses
+ - GPU acceleration is recommended for real-time performance