Joseph Pollack committed on
Commit 0dbfeed · unverified · 1 Parent(s): 77f56a9

implements final interface fixes

Files changed (3)
  1. AUDIO_INPUT_FIX.md +90 -0
  2. README.md +39 -12
  3. src/app.py +6 -3
AUDIO_INPUT_FIX.md ADDED
@@ -0,0 +1,90 @@
+ # Audio Input Display Fix
+
+ ## Issue
+ The audio input (microphone button) was not displaying in the ChatInterface multimodal textbox.
+
+ ## Root Cause
+ When `multimodal=True` is set on `gr.ChatInterface`, it should automatically show image and audio buttons. However:
+ 1. The buttons might be hidden in a dropdown menu
+ 2. Browser permissions might be blocking microphone access
+ 3. The `file_types` parameter might not have been explicitly set
+
+ ## Fix Applied
+
+ ### 1. Added `file_types` Parameter
+ Explicitly specified which file types are accepted to ensure audio is enabled:
+
+ ```python
+ gr.ChatInterface(
+     fn=research_agent,
+     multimodal=True,
+     file_types=["image", "audio", "video"],  # Explicitly enable image, audio, and video
+     ...
+ )
+ ```
+
+ **File:** `src/app.py` (line 929)
+
+ ### 2. Enhanced UI Description
+ Updated the description to make it clearer where to find the audio input:
+
+ - Added explicit instructions about clicking the 📷 and 🎤 icons
+ - Added a tip about looking for icons in the text input box
+ - Clarified drag & drop functionality
+
+ **File:** `src/app.py` (lines 942-948)
+
+ ## How It Works Now
+
+ 1. **Audio Recording Button**: The 🎤 microphone icon should appear in the textbox toolbar when `multimodal=True` is set
+ 2. **File Upload**: Users can drag & drop audio files or click to upload
+ 3. **Browser Permissions**: Browser will prompt for microphone access when user clicks the audio button
+
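For context on what the handler sees once these inputs work: with `multimodal=True`, Gradio passes the user message to `fn` as a dict with `text` and `files` keys rather than a plain string. The sketch below shows one way a handler might separate audio and image attachments before processing them. `split_multimodal_message` is a hypothetical helper, not code from `src/app.py`; it assumes each entry in `files` is a path string (or a dict with a `path` key) and classifies files by extension only.

```python
import mimetypes
from typing import Any


def split_multimodal_message(message: dict[str, Any]) -> dict[str, Any]:
    """Group a multimodal chat message's attachments by media type."""
    result: dict[str, Any] = {
        "text": (message.get("text") or "").strip(),
        "image": [],
        "audio": [],
        "video": [],
        "other": [],
    }
    for item in message.get("files") or []:
        # Assumption: each entry is a path string, or a dict carrying a "path" key.
        path = item["path"] if isinstance(item, dict) else str(item)
        mime, _ = mimetypes.guess_type(path)
        kind = mime.split("/", 1)[0] if mime else "other"
        bucket = kind if kind in ("image", "audio", "video") else "other"
        result[bucket].append(path)
    return result


example = split_multimodal_message(
    {"text": " Summarize this recording ", "files": ["note.wav", "scan.png", "data.bin"]}
)
print(example)
```

A handler along these lines could send the `audio` entries to speech-to-text and the `image` entries to OCR, then combine the results with the typed text.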
+ ## Testing
+
+ To verify the fix:
+ 1. Look for the 🎤 microphone icon in the text input box
+ 2. Click it to start recording (browser will ask for microphone permission)
+ 3. Alternatively, drag & drop an audio file into the textbox
+ 4. Check browser console for any permission errors
+
+ ## Browser Requirements
+
+ - **Chrome/Edge**: Should work with microphone permissions
+ - **Firefox**: Should work with microphone permissions
+ - **Safari**: May require additional configuration
+ - **HTTPS Required**: Microphone access typically requires HTTPS (or localhost)
+
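The HTTPS requirement comes from the browser: microphone capture through `getUserMedia` is only offered in secure contexts, which covers `https://` pages and loopback hosts such as `localhost`. A rough way to sanity-check a deployment URL is sketched below. `allows_microphone` is a hypothetical helper and it simplifies the browser rules (for example, it ignores iframes and permission policies).

```python
from urllib.parse import urlparse

# Hosts that browsers generally treat as secure even over plain HTTP.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def allows_microphone(url: str) -> bool:
    """Approximate whether a page at `url` runs in a secure context."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme == "https":
        return True
    return parsed.scheme == "http" and (
        host in LOOPBACK_HOSTS or host.endswith(".localhost")
    )


for candidate in ("http://localhost:7860", "http://192.168.1.20:7860"):
    print(candidate, allows_microphone(candidate))
```

Under this approximation, an app opened through a LAN IP address over plain HTTP would not be expected to offer recording, even though the same app works on `localhost`.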
+ ## Troubleshooting
+
+ If audio input still doesn't appear:
+
+ 1. **Check Browser Permissions**:
+    - Open browser settings
+    - Check microphone permissions for the site
+    - Ensure microphone is not blocked
+
+ 2. **Check Browser Console**:
+    - Open Developer Tools (F12)
+    - Look for permission errors or warnings
+    - Check for any JavaScript errors
+
+ 3. **Try Different Browser**:
+    - Some browsers have stricter permission policies
+    - Try Chrome or Firefox if Safari doesn't work
+
+ 4. **Check Gradio Version**:
+    - Ensure `gradio>=6.0.0` is installed
+    - Update if needed: `pip install --upgrade gradio`
+
+ 5. **HTTPS Requirement**:
+    - Microphone access requires HTTPS (or localhost)
+    - If deploying, ensure SSL is configured
+
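The version check in step 4 can also be done from Python. This is a minimal sketch using only the standard library. `parse_version`, `meets_minimum`, and `installed_gradio_version` are hypothetical helper names, the parser handles plain numeric versions only (pre-release suffixes are dropped), and the `6.0.0` floor is simply the value stated in the list above.

```python
import re
from importlib import metadata
from typing import Optional


def parse_version(version: str) -> tuple[int, ...]:
    """Return the leading numeric components, padded to three parts."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    parts = [int(p) for p in match.group().split(".")] if match else []
    parts += [0] * (3 - len(parts))  # "6.0" is treated as 6.0.0
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "6.0.0") -> bool:
    """Compare two version strings numerically, component by component."""
    return parse_version(installed) >= parse_version(minimum)


def installed_gradio_version() -> Optional[str]:
    """Look up the installed gradio version, or None if it is missing."""
    try:
        return metadata.version("gradio")
    except metadata.PackageNotFoundError:
        return None


version = installed_gradio_version()
if version is None:
    print("gradio is not installed")
else:
    status = "OK" if meets_minimum(version) else "upgrade needed"
    print(f"gradio {version}: {status}")
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where `"10.0.0"` sorts before `"6.0.0"`.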
+ ## Additional Notes
+
+ - The audio button is part of the MultimodalTextbox component
+ - It should appear as an icon in the textbox toolbar
+ - If it's still not visible, it might be in a dropdown menu (click the "+" or "..." button)
+ - The `file_types` parameter ensures audio files are accepted for upload
+
README.md CHANGED
@@ -21,6 +21,14 @@ tags:
21
  - pydantic-ai
22
  - llamaindex
23
  - modal
24
  ---
25
 
26
  > [!IMPORTANT]
@@ -58,11 +66,21 @@ The DETERMINATOR is a powerful generalist deep research agent system that stops
58
 
59
  For this hackathon we're proposing a simple yet powerful Deep Research Agent that iteratively looks for the answer until it finds it using general purpose websearch and special purpose retrievers for technical retrievers.
60
 
61
  ## Deep Critical In the Media
62
 
63
  - Social Media Posts about Deep Critical:
64
- -
65
- -
66
  -
67
  -
68
  -
@@ -100,24 +118,33 @@ For this hackathon we're proposing a simple yet powerful Deep Research Agent tha
100
  - [x] **Specialized Research Teams of Agents**:
101
 
102
  ### Team
103
 
104
- - ZJ
105
- - MarioAderman
106
- - Josephrp
107
 
108
  ## Acknowledgements
109
 
110
- - McSwaggins
111
- - Magentic
112
- - Huggingface
113
- - Gradio
114
- - DeepCritical
115
- - Sponsors
116
  - Microsoft
117
  - Pydantic
118
  - Llama-index
119
  - Anthropic/MCP
120
- - List of Tools Makers
121
 
122
 
123
  ## Links
 
21
  - pydantic-ai
22
  - llamaindex
23
  - modal
24
+ - building-mcp-track-enterprise
25
+ - building-mcp-track-consumer
26
+ - mcp-in-action-track-enterprise
27
+ - mcp-in-action-track-consumer
28
+ - building-mcp-track-modal
29
+ - building-mcp-track-blaxel
30
+ - building-mcp-track-llama-index
31
+ - building-mcp-track-HUGGINGFACE
32
  ---
33
 
34
  > [!IMPORTANT]
 
66
 
67
  For this hackathon we're proposing a simple yet powerful Deep Research Agent that iteratively looks for the answer until it finds it using general purpose websearch and special purpose retrievers for technical retrievers.
68
 
69
+
70
+ > [!IMPORTANT]
71
+ > **IF YOU ARE A JUDGE**
72
+ >
73
+ > This project was produced with passion by a group of volunteers. Please check out our documentation and READMEs, and keep reading below for our story.
74
+ >
75
+ > - 📚 **Documentation**: See our [technical documentation](deepcritical.github.io/GradioDemo/) for detailed information
76
+ > - 📖 **Complete README**: Check out the [full README](.github/README.md) for setup, configuration, and contribution guidelines
77
+ > - 🏆 **Hackathon Submission**: Keep reading below for more information about our MCP Hackathon submission
78
+
79
  ## Deep Critical In the Media
80
 
81
  - Social Media Posts about Deep Critical:
82
+ - [X](https://x.com/marioaderman/status/1995247432444133471)
83
+ - [LinkedIn](https://www.linkedin.com/feed/update/urn:li:activity:7400984658496081920/)
84
  -
85
  -
86
  -
 
118
  - [x] **Specialized Research Teams of Agents**:
119
 
120
  ### Team
121
+ - **ZJ**
122
+   - 🤗 [HuggingFace](https://huggingface.co/Tonic)
123
+   - 💼 [LinkedIn](https://www.linkedin.com/in/josephpollack/)
124
+   - 𝕏 [X](https://x.com/josephpollack)
125
+ - **Mario Aderman**
126
+   - 🤗 [HuggingFace](https://huggingface.co/SeasonalFall84)
127
+   - 💼 [LinkedIn](https://www.linkedin.com/in/mario-aderman/)
128
+   - 𝕏 [X](https://x.com/marioaderman)
129
+ - **Joseph Pollack**
130
+   - 🤗 [HuggingFace](https://huggingface.co/Tonic)
131
+   - 💼 [LinkedIn](https://www.linkedin.com/in/josephpollack/)
132
+   - 𝕏 [X](https://x.com/josephpollack)
133
 
134
 
135
  ## Acknowledgements
136
 
137
+ - [DeepBoner](https://hf.co/spaces/mcp-1st-birthday/deepboner)
138
+ - Magentic Paper
139
+ - [Huggingface](https://hf.co)
140
+ - [Gradio](https://gradio.app)
141
+ - [DeepCritical](https://github.com/DeepCritical)
142
+ - [Modal](https://modal.com)
143
  - Microsoft
144
  - Pydantic
145
  - Llama-index
146
  - Anthropic/MCP
147
+ - All our Tool Providers
148
 
149
 
150
  ## Links
src/app.py CHANGED
@@ -925,6 +925,7 @@ def create_demo() -> gr.Blocks:
925
  gr.ChatInterface(
926
  fn=research_agent,
927
  multimodal=True, # Enable multimodal input (text + images + audio)
928
  title="🔬 The DETERMINATOR",
929
  description=(
930
  "*Generalist Deep Research Agent β€” stops at nothing until finding precise answers to complex questions*\n\n"
@@ -939,9 +940,11 @@ def create_demo() -> gr.Blocks:
939
  "- 📊 Evidence synthesis with citations\n\n"
940
  "**MCP Server Active**: Connect Claude Desktop to `/gradio_api/mcp/`\n\n"
941
  "**📷🎤 Multimodal Input Support**:\n"
942
- "- **Images**: Upload images to extract text using OCR\n"
943
- "- **Audio**: Record audio or upload audio files for speech-to-text transcription\n"
944
- "- **Text**: Type your research questions directly\n"
945
  "Configure multimodal inputs in the sidebar settings.\n\n"
946
  "**⚠️ Authentication Required**: Please **sign in with HuggingFace** above before using this application."
947
  ),
 
925
  gr.ChatInterface(
926
  fn=research_agent,
927
  multimodal=True, # Enable multimodal input (text + images + audio)
928
+ file_types=["image", "audio", "video"], # Explicitly enable image, audio, and video file types
929
  title="πŸ”¬ The DETERMINATOR",
930
  description=(
931
  "*Generalist Deep Research Agent β€” stops at nothing until finding precise answers to complex questions*\n\n"
 
940
  "- 📊 Evidence synthesis with citations\n\n"
941
  "**MCP Server Active**: Connect Claude Desktop to `/gradio_api/mcp/`\n\n"
942
  "**📷🎤 Multimodal Input Support**:\n"
943
+ "- **Images**: Click the 📷 image icon in the textbox to upload images (OCR)\n"
944
+ "- **Audio**: Click the 🎤 microphone icon in the textbox to record audio (STT)\n"
945
+ "- **Files**: Drag & drop or click to upload image/audio files\n"
946
+ "- **Text**: Type your research questions directly\n\n"
947
+ "💡 **Tip**: Look for the 📷 and 🎤 icons in the text input box below!\n\n"
948
  "Configure multimodal inputs in the sidebar settings.\n\n"
949
  "**⚠️ Authentication Required**: Please **sign in with HuggingFace** above before using this application."
950
  ),