Space: MCP-1st-Birthday/DETERMINATOR

Joseph Pollack committed 13 days ago

Commit 0dbfeed (unverified) · 1 parent: 77f56a9

implements final interface fixes

Files changed (3)

AUDIO_INPUT_FIX.md +90 -0
README.md +39 -12
src/app.py +6 -3

AUDIO_INPUT_FIX.md ADDED

	@@ -0,0 +1,90 @@

+# Audio Input Display Fix
+## Issue
+The audio input (microphone button) was not displaying in the ChatInterface multimodal textbox.
+## Root Cause
+When `multimodal=True` is set on `gr.ChatInterface`, it should automatically show image and audio buttons. However:
+1. The buttons might be hidden in a dropdown menu
+2. Browser permissions might be blocking microphone access
+3. The `file_types` parameter might not have been explicitly set
+## Fix Applied
+### 1. Added `file_types` Parameter
+Explicitly specified which file types are accepted to ensure audio is enabled:
+```python
+gr.ChatInterface(
+    fn=research_agent,
+    multimodal=True,
+    file_types=["image", "audio", "video"],  # Explicitly enable image, audio, and video
+    ...
+)
+```
+**File:** `src/app.py` (line 928)
+### 2. Enhanced UI Description
+Updated the description to make it clearer where to find the audio input:
+- Added explicit instructions about clicking the 📷 and 🎤 icons
+- Added a tip about looking for icons in the text input box
+- Clarified drag & drop functionality
+**File:** `src/app.py` (lines 942-948)
+## How It Works Now
+1. **Audio Recording Button**: The 🎤 microphone icon should appear in the textbox toolbar when `multimodal=True` is set
+2. **File Upload**: Users can drag & drop audio files or click to upload
+3. **Browser Permissions**: Browser will prompt for microphone access when user clicks the audio button
+## Testing
+To verify the fix:
+1. Look for the 🎤 microphone icon in the text input box
+2. Click it to start recording (browser will ask for microphone permission)
+3. Alternatively, drag & drop an audio file into the textbox
+4. Check browser console for any permission errors
+## Browser Requirements
+- **Chrome/Edge**: Should work with microphone permissions
+- **Firefox**: Should work with microphone permissions
+- **Safari**: May require additional configuration
+- **HTTPS Required**: Microphone access typically requires HTTPS (or localhost)
+## Troubleshooting
+If audio input still doesn't appear:
+1. **Check Browser Permissions**:
+   - Open browser settings
+   - Check microphone permissions for the site
+   - Ensure microphone is not blocked
+2. **Check Browser Console**:
+   - Open Developer Tools (F12)
+   - Look for permission errors or warnings
+   - Check for any JavaScript errors
+3. **Try Different Browser**:
+   - Some browsers have stricter permission policies
+   - Try Chrome or Firefox if Safari doesn't work
+4. **Check Gradio Version**:
+   - Ensure `gradio>=6.0.0` is installed
+   - Update if needed: `pip install --upgrade gradio`
+5. **HTTPS Requirement**:
+   - Microphone access requires HTTPS (or localhost)
+   - If deploying, ensure SSL is configured
+## Additional Notes
+- The audio button is part of the MultimodalTextbox component
+- It should appear as an icon in the textbox toolbar
+- If it's still not visible, it might be in a dropdown menu (click the "+" or "..." button)
+- The `file_types` parameter ensures audio files are accepted for upload
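Troubleshooting steps 4 and 5 in the added document can be checked programmatically. The following is a minimal sketch, not part of the commit: the helper names (`version_tuple`, `meets_minimum`, `installed_gradio_ok`, `microphone_allowed_origin`) are hypothetical, the version comparison looks only at the leading numeric components, and the origin check is a simplification of the browser's secure-context rules (HTTPS, or a loopback host such as `localhost`).

```python
import re
from importlib import metadata
from urllib.parse import urlparse


def version_tuple(version: str) -> tuple[int, ...]:
    """Return the leading numeric release components, e.g. '6.0.1' -> (6, 0, 1).

    Pre-release suffixes such as 'rc1' are ignored in this sketch.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"not a recognisable version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str = "6.0.0") -> bool:
    """Compare two version strings after padding them to equal length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_gradio_ok(minimum: str = "6.0.0") -> bool:
    """True if gradio is installed and its version is at least `minimum`."""
    try:
        return meets_minimum(metadata.version("gradio"), minimum)
    except metadata.PackageNotFoundError:
        return False


def microphone_allowed_origin(url: str) -> bool:
    """Rough check: microphone access needs HTTPS or a loopback host."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return parsed.scheme == "https" or host in {"localhost", "127.0.0.1", "::1"}
```

Calling `installed_gradio_ok()` in the deployment environment and `microphone_allowed_origin()` on the URL the app is served from narrows down whether a missing microphone button is a version issue or a transport issue before looking at browser permissions.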

README.md CHANGED

@@ -21,6 +21,14 @@ tags:
   - pydantic-ai
   - llamaindex
   - modal
 ---
 > [!IMPORTANT]
@@ -58,11 +66,21 @@ The DETERMINATOR is a powerful generalist deep research agent system that stops
 For this hackathon we're proposing a simple yet powerful Deep Research Agent that iteratively looks for the answer until it finds it using general purpose websearch and special purpose retrievers for technical retrievers.
 ## Deep Critical In the Media
 - Social Media Posts about Deep Critical:
-  -
-  -
   -
   -
   -
@@ -100,24 +118,33 @@ For this hackathon we're proposing a simple yet powerful Deep Research Agent tha
 - [x] **Specialized Research Teams of Agents**:
 ### Team
-- ZJ
-- MarioAderman
-- Josephrp
 ## Acknowledgements
-- McSwaggins
-- Magentic
-- Huggingface
-- Gradio
-- DeepCritical
-- Sponsors
 - Microsoft
 - Pydantic
 - Llama-index
 - Anthropic/MCP
-- List of Tools Makers
 ## Links

   - pydantic-ai
   - llamaindex
   - modal
+  - building-mcp-track-enterprise
+  - building-mcp-track-consumer
+  - mcp-in-action-track-enterprise
+  - mcp-in-action-track-consumer
+  - building-mcp-track-modal
+  - building-mcp-track-blaxel
+  - building-mcp-track-llama-index
+  - building-mcp-track-HUGGINGFACE
 ---
 > [!IMPORTANT]
 For this hackathon we're proposing a simple yet powerful Deep Research Agent that iteratively looks for the answer until it finds it using general purpose websearch and special purpose retrievers for technical retrievers.
+> [!IMPORTANT]
+> **IF YOU ARE A JUDGE**
+>
+> This project was produced with passion by a group of volunteers. Please check out our documentation and READMEs, and keep reading below for our story.
+>
+> - 📚 **Documentation**: See our [technical documentation](https://deepcritical.github.io/GradioDemo/) for detailed information
+> - 📖 **Complete README**: Check out the [full README](.github/README.md) for setup, configuration, and contribution guidelines
+> - 🏆 **Hackathon Submission**: Keep reading below for more information about our MCP Hackathon submission
 ## Deep Critical In the Media
 - Social Media Posts about Deep Critical:
+  - [X](https://x.com/marioaderman/status/1995247432444133471)
+  - [LinkedIn](https://www.linkedin.com/feed/update/urn:li:activity:7400984658496081920/)
   -
   -
   -
 - [x] **Specialized Research Teams of Agents**:
 ### Team
+- **ZJ**
+    - 🤗 [HuggingFace](https://huggingface.co/Tonic)
+    - 💼 [LinkedIn](https://www.linkedin.com/in/josephpollack/)
+    - 𝕏 [X](https://x.com/josephpollack)
+- **Mario Aderman**
+    - 🤗 [HuggingFace](https://huggingface.co/SeasonalFall84)
+    - 💼 [LinkedIn](https://www.linkedin.com/in/mario-aderman/)
+    - 𝕏 [X](https://x.com/marioaderman)
+- **Joseph Pollack**
+    - 🤗 [HuggingFace](https://huggingface.co/Tonic)
+    - 💼 [LinkedIn](https://www.linkedin.com/in/josephpollack/)
+    - 𝕏 [X](https://x.com/josephpollack)
 ## Acknowledgements
+- [DeepBoner](https://hf.co/spaces/mcp-1st-birthday/deepboner)
+- Magentic Paper
+- [Huggingface](https://hf.co)
+- [Gradio](https://gradio.app)
+- [DeepCritical](https://github.com/DeepCritical)
+- [Modal](https://modal.com)
 - Microsoft
 - Pydantic
 - Llama-index
 - Anthropic/MCP
+- All our Tool Providers
 ## Links

src/app.py CHANGED

@@ -925,6 +925,7 @@ def create_demo() -> gr.Blocks:
         gr.ChatInterface(
             fn=research_agent,
             multimodal=True,  # Enable multimodal input (text + images + audio)
             title="🔬 The DETERMINATOR",
             description=(
                 "*Generalist Deep Research Agent — stops at nothing until finding precise answers to complex questions*\n\n"
@@ -939,9 +940,11 @@ def create_demo() -> gr.Blocks:
                 "- 📊 Evidence synthesis with citations\n\n"
                 "**MCP Server Active**: Connect Claude Desktop to `/gradio_api/mcp/`\n\n"
                 "**📷🎤 Multimodal Input Support**:\n"
-                "- **Images**: Upload images to extract text using OCR\n"
-                "- **Audio**: Record audio or upload audio files for speech-to-text transcription\n"
-                "- **Text**: Type your research questions directly\n"
                 "Configure multimodal inputs in the sidebar settings.\n\n"
                 "**⚠️ Authentication Required**: Please **sign in with HuggingFace** above before using this application."
             ),

         gr.ChatInterface(
             fn=research_agent,
             multimodal=True,  # Enable multimodal input (text + images + audio)
+            file_types=["image", "audio", "video"],  # Explicitly enable image, audio, and video file types
             title="🔬 The DETERMINATOR",
             description=(
                 "*Generalist Deep Research Agent — stops at nothing until finding precise answers to complex questions*\n\n"
                 "- 📊 Evidence synthesis with citations\n\n"
                 "**MCP Server Active**: Connect Claude Desktop to `/gradio_api/mcp/`\n\n"
                 "**📷🎤 Multimodal Input Support**:\n"
+                "- **Images**: Click the 📷 image icon in the textbox to upload images (OCR)\n"
+                "- **Audio**: Click the 🎤 microphone icon in the textbox to record audio (STT)\n"
+                "- **Files**: Drag & drop or click to upload image/audio files\n"
+                "- **Text**: Type your research questions directly\n\n"
+                "💡 **Tip**: Look for the 📷 and 🎤 icons in the text input box below!\n\n"
                 "Configure multimodal inputs in the sidebar settings.\n\n"
                 "**⚠️ Authentication Required**: Please **sign in with HuggingFace** above before using this application."
             ),
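Review note: the change above relies on `file_types` to surface the microphone button. An alternative worth considering is to configure the textbox explicitly, since `gr.MultimodalTextbox` has a `sources` parameter that controls whether a microphone input is offered alongside file upload. The sketch below is not taken from the commit. The helper names `multimodal_textbox_kwargs` and `build_chat_interface` are hypothetical, and whether `gr.ChatInterface` accepts `file_types` directly should be verified against the installed Gradio version.

```python
from typing import Any, Callable


def multimodal_textbox_kwargs() -> dict[str, Any]:
    """Keyword arguments for gr.MultimodalTextbox (illustrative values only)."""
    return {
        # "upload" gives the file button, "microphone" gives the record button.
        "sources": ["upload", "microphone"],
        # Same file types the commit passes on the ChatInterface call.
        "file_types": ["image", "audio", "video"],
    }


def build_chat_interface(fn: Callable[..., Any]):
    """Build a ChatInterface whose textbox explicitly enables the microphone."""
    import gradio as gr  # imported here so the helper above has no dependency

    textbox = gr.MultimodalTextbox(**multimodal_textbox_kwargs())
    return gr.ChatInterface(fn=fn, multimodal=True, textbox=textbox)
```

In `create_demo()`, this would amount to passing `textbox=gr.MultimodalTextbox(...)` alongside the existing `title` and `description` arguments. Keeping the keyword arguments in a plain dictionary makes the intended configuration easy to inspect in a unit test without launching the UI.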