Joseph Pollack committed on
Commit 77f56a9 · unverified · 1 Parent(s): 8095967

implements fixes

ERROR_FIXES_SUMMARY.md ADDED
@@ -0,0 +1,152 @@
# Error Fixes Summary

## Issues Identified and Fixed

### 1. ✅ `'Settings' object has no attribute 'ocr_api_url'`

**Error Location:** `src/services/image_ocr.py:33`

**Root Cause:**
The code accessed `settings.ocr_api_url`, which doesn't exist in older versions of the config: when running a previous version of the app, `ocr_api_url` had not yet been added to the `Settings` class.

**Fix Applied:**
- Added defensive coding using `getattr()` with a fallback default URL
- Default URL: `"https://prithivmlmods-multimodal-ocr3.hf.space"`

**Code Change:**
```python
# Before:
self.api_url = api_url or settings.ocr_api_url

# After:
default_url = getattr(settings, "ocr_api_url", None) or "https://prithivmlmods-multimodal-ocr3.hf.space"
self.api_url = api_url or default_url
```

**File:** `src/services/image_ocr.py`

---

### 2. ✅ `Expected code to be unreachable, but got: ('research_complete', False)`

**Error Location:** `src/orchestrator/graph_orchestrator.py` (decision node execution)

**Root Cause:**
When Pydantic AI encounters a validation error or returns an unexpected format, it may return a tuple like `('research_complete', False)` instead of the expected `KnowledgeGapOutput` object. The decision function then tried to access `result.research_complete` on a tuple, causing the error.

**Fix Applied:**
1. **Enhanced decision function** in `graph_builder.py` to handle tuple formats
2. **Improved tuple handling** in `graph_orchestrator.py` decision node execution
3. **Better reconstruction** of `KnowledgeGapOutput` from validation error tuples

**Code Changes** (see the sketch below):

**File: `src/agent_factory/graph_builder.py`**
- Replaced the lambda with a named function `_decision_function()` that handles tuples
- Added logic to extract `research_complete` from various tuple formats
- Handles: `('research_complete', False)`, dicts in tuples, boolean values in tuples

**File: `src/orchestrator/graph_orchestrator.py`**
- Enhanced tuple detection and reconstruction in `_execute_decision_node()`
- Added specific handling for the `('research_complete', False)` format
- Improved fallback logic for unexpected tuple formats
- Better error messages and logging

**File: `src/orchestrator/graph_orchestrator.py` (agent node execution)**
- Improved handling of tuple outputs in `_execute_agent_node()`
- Better reconstruction of `KnowledgeGapOutput` from validation errors
- More graceful fallback for non-knowledge_gap nodes

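In sketch form, the normalization these changes describe looks roughly like this (illustrative, not the exact project code; it assumes `KnowledgeGapOutput` exposes a `research_complete: bool` field):

```python
from typing import Any

from pydantic import BaseModel


class KnowledgeGapOutput(BaseModel):
    """Stand-in for the project's real model."""

    research_complete: bool = False


def normalize_knowledge_gap(result: Any) -> KnowledgeGapOutput:
    """Coerce agent output into KnowledgeGapOutput, tolerating tuple/dict forms."""
    if isinstance(result, KnowledgeGapOutput):
        return result
    if isinstance(result, tuple):
        # Validation-error form: ('research_complete', False)
        if len(result) == 2 and result[0] == "research_complete":
            return KnowledgeGapOutput(research_complete=bool(result[1]))
        # Tuple wrapping a dict or a bare boolean
        for item in result:
            if isinstance(item, dict) and "research_complete" in item:
                return KnowledgeGapOutput(research_complete=bool(item["research_complete"]))
            if isinstance(item, bool):
                return KnowledgeGapOutput(research_complete=item)
    if isinstance(result, dict) and "research_complete" in result:
        return KnowledgeGapOutput(research_complete=bool(result["research_complete"]))
    # Fallback: treat research as incomplete so the loop continues safely
    return KnowledgeGapOutput(research_complete=False)
```
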
---

### 3. ⚠️ `Local state is not initialized - app is not locally available` (Modal TTS)

**Error Location:** Modal TTS service

**Root Cause:**
This is expected behavior when Modal credentials are not configured or the app is not running in a Modal environment. It's not a critical error: TTS will simply be unavailable.

**Status:**
- This is **not an error**; it's expected when Modal isn't configured
- The app gracefully degrades and continues without TTS
- Users can still use the app, just without audio output

**No Fix Needed:** This is working as designed with graceful degradation.

---

### 4. ⚠️ `Invalid file descriptor: -1` (Asyncio cleanup)

**Error Location:** Python asyncio event loop cleanup

**Root Cause:**
This is a Python asyncio cleanup warning that occurs during shutdown. It's not critical and doesn't affect functionality.

**Status:**
- This is a **warning**, not an error
- Occurs during application shutdown
- Doesn't affect runtime functionality
- Common in Python 3.11+ with certain asyncio patterns

**No Fix Needed:** This is a known Python asyncio quirk.

---

### 5. ⚠️ MCP Server Warning: `gr.State input will not be updated between tool calls`

**Error Location:** Gradio MCP server setup

**Root Cause:**
Some MCP tools use `gr.State` inputs, which Gradio warns won't update between tool calls. This is a limitation of how MCP tools interact with Gradio state.

**Status:**
- This is a **warning**, not an error
- MCP tools will still work, but state won't persist between calls
- This is a known Gradio MCP limitation

**No Fix Needed:** This is a Gradio limitation, not a bug in our code.

---

## Summary of Fixes

### Critical Fixes (Applied):
1. ✅ **OCR API URL Attribute Error** - Fixed with defensive coding
2. ✅ **Graph Orchestrator Tuple Handling** - Fixed with enhanced tuple detection and reconstruction

### Non-Critical (Expected Behavior):
3. ⚠️ **Modal TTS Error** - Expected when Modal is not configured (graceful degradation)
4. ⚠️ **Asyncio Cleanup Warning** - Python asyncio quirk (non-critical)
5. ⚠️ **MCP State Warning** - Gradio limitation (non-critical)

## Testing Recommendations

1. **Test OCR functionality:**
   - Upload an image with text
   - Verify OCR processing works
   - Check logs for any remaining errors

2. **Test graph execution:**
   - Run a research query
   - Verify knowledge gap evaluation works
   - Check that decision nodes route correctly
   - Monitor logs for tuple handling warnings

3. **Test with/without Modal:**
   - Verify the app works without Modal credentials
   - Test TTS if Modal is configured
   - Verify graceful degradation

## Files Modified

1. `src/services/image_ocr.py` - Added defensive `ocr_api_url` access
2. `src/orchestrator/graph_orchestrator.py` - Enhanced tuple handling in decision and agent nodes
3. `src/agent_factory/graph_builder.py` - Improved decision function to handle tuples

## Next Steps

1. Test the fixes with the reported error scenarios
2. Monitor logs for any remaining issues
3. Consider adding unit tests for tuple handling edge cases
4. Document the tuple format handling for future reference

FILE_OUTPUT_VERIFICATION.md ADDED
@@ -0,0 +1,220 @@
# File Output Implementation Verification

## Status: ✅ ALL CHANGES RETAINED

All file output functionality has been successfully implemented and retained in the codebase.

---

## Verification Checklist

### ✅ PROJECT 1: File Writing Service
- **File**: `src/services/report_file_service.py`
- **Status**: ✅ EXISTS
- **Key Components** (typical usage is sketched below):
  - `ReportFileService` class
  - `save_report()` method
  - `save_report_multiple_formats()` method
  - `_generate_filename()` helper
  - `_sanitize_filename()` helper
  - `cleanup_old_files()` method
  - `get_report_file_service()` singleton function

+ ### ✅ PROJECT 2: Configuration Updates
24
+ - **File**: `src/utils/config.py`
25
+ - **Status**: ✅ ALL SETTINGS PRESENT
26
+ - **Settings Added** (lines 181-195):
27
+ - ✅ `save_reports_to_file: bool = True`
28
+ - ✅ `report_output_directory: str | None = None`
29
+ - ✅ `report_file_format: Literal["md", "md_html", "md_pdf"] = "md"`
30
+ - ✅ `report_filename_template: str = "report_{timestamp}_{query_hash}.md"`
31
+
32
+ ### ✅ PROJECT 3: Graph Orchestrator Integration
33
+ - **File**: `src/orchestrator/graph_orchestrator.py`
34
+ - **Status**: ✅ FULLY INTEGRATED
35
+
36
+ #### Imports (Line 35)
37
+ ```python
38
+ from src.services.report_file_service import ReportFileService, get_report_file_service
39
+ ```
40
+ ✅ Present
41
+
42
+ #### File Service Initialization (Line 152)
43
+ ```python
44
+ self._file_service: ReportFileService | None = None
45
+ ```
46
+ ✅ Present
47
+
48
+ #### Helper Method (Lines 162-175)
49
+ ```python
50
+ def _get_file_service(self) -> ReportFileService | None:
51
+ """Get file service instance (lazy initialization)."""
52
+ ...
53
+ ```
54
+ ✅ Present
55
+
56
+ #### Synthesizer Node File Saving (Lines 673-694)
57
+ - ✅ Saves report after `long_writer_agent.write_report()`
58
+ - ✅ Returns dict with `{"message": report, "file": file_path}` if file saved
59
+ - ✅ Returns string if file saving fails (backward compatible)
60
+ - ✅ Error handling with logging
61
+
62
+ #### Writer Node File Saving (Lines 729-748)
63
+ - ✅ Saves report after `writer_agent.write_report()`
64
+ - ✅ Returns dict with `{"message": report, "file": file_path}` if file saved
65
+ - ✅ Returns string if file saving fails (backward compatible)
66
+ - ✅ Error handling with logging
67
+
68
+ #### Final Event Handling (Lines 558-585)
69
+ - ✅ Extracts file path from final result dict
70
+ - ✅ Adds file path to `event_data["file"]` or `event_data["files"]`
71
+ - ✅ Handles both single file and multiple files
72
+ - ✅ Sets appropriate message
73
+
74
+ ### ✅ PROJECT 4: Research Flow Integration
75
+ - **File**: `src/orchestrator/research_flow.py`
76
+ - **Status**: ✅ FULLY INTEGRATED
77
+
78
+ #### Imports (Line 28)
79
+ ```python
80
+ from src.services.report_file_service import ReportFileService, get_report_file_service
81
+ ```
82
+ ✅ Present
83
+
84
+ #### IterativeResearchFlow
85
+ - **File Service Initialization** (Line 117): ✅ Present
86
+ - **Helper Method** (Lines 119-132): ✅ Present
87
+ - **File Saving in `_create_final_report()`** (Lines 683-692): ✅ Present
88
+ - Saves after `writer_agent.write_report()`
89
+ - Logs file path
90
+ - Error handling with logging
91
+
92
+ #### DeepResearchFlow
93
+ - **File Service Initialization** (Line 761): ✅ Present
94
+ - **Helper Method** (Lines 763-776): ✅ Present
95
+ - **File Saving in `_create_final_report()`** (Lines 1055-1064): ✅ Present
96
+ - Saves after `long_writer_agent.write_report()` or `proofreader_agent.proofread()`
97
+ - Logs file path
98
+ - Error handling with logging
99
+
100
+ ### ✅ PROJECT 5: Gradio Integration
101
+ - **File**: `src/app.py`
102
+ - **Status**: ✅ ALREADY IMPLEMENTED (from previous work)
103
+ - **Function**: `event_to_chat_message()` (Lines 209-350)
104
+ - **Features**:
105
+ - ✅ Detects file paths in `event.data["file"]` or `event.data["files"]`
106
+ - ✅ Formats files as markdown download links
107
+ - ✅ Handles both single and multiple files
108
+ - ✅ Validates file paths with `_is_file_path()` helper
109
+
110
+ ---
111
+
112
+ ## Implementation Summary
113
+
114
+ ### File Saving Locations
115
+
116
+ 1. **Graph Orchestrator - Synthesizer Node** (Deep Research)
117
+ - Location: `src/orchestrator/graph_orchestrator.py:673-694`
118
+ - Trigger: After `long_writer_agent.write_report()`
119
+ - Returns: Dict with file path or string (backward compatible)
120
+
121
+ 2. **Graph Orchestrator - Writer Node** (Iterative Research)
122
+ - Location: `src/orchestrator/graph_orchestrator.py:729-748`
123
+ - Trigger: After `writer_agent.write_report()`
124
+ - Returns: Dict with file path or string (backward compatible)
125
+
126
+ 3. **IterativeResearchFlow**
127
+ - Location: `src/orchestrator/research_flow.py:683-692`
128
+ - Trigger: After `writer_agent.write_report()` in `_create_final_report()`
129
+ - Returns: String (file path logged, not returned)
130
+
131
+ 4. **DeepResearchFlow**
132
+ - Location: `src/orchestrator/research_flow.py:1055-1064`
133
+ - Trigger: After `long_writer_agent.write_report()` or `proofreader_agent.proofread()`
134
+ - Returns: String (file path logged, not returned)
135
+
136
+ ### File Path Flow
137
+
138
+ ```
139
+ Report Generation
140
+
141
+ ReportFileService.save_report()
142
+
143
+ File saved to disk (temp directory or configured directory)
144
+
145
+ File path returned to orchestrator
146
+
147
+ File path included in result dict: {"message": report, "file": file_path}
148
+
149
+ Result dict stored in GraphExecutionContext
150
+
151
+ Final event extraction (lines 558-585)
152
+
153
+ File path added to AgentEvent.data["file"]
154
+
155
+ event_to_chat_message() (src/app.py)
156
+
157
+ File path formatted as markdown download link
158
+
159
+ Gradio ChatInterface displays download link
160
+ ```
161
+
162
+ ---
163
+
164
+ ## Testing Recommendations
165
+
166
+ ### Unit Tests
167
+ - [ ] Test `ReportFileService.save_report()` with various inputs
168
+ - [ ] Test filename generation with templates
169
+ - [ ] Test file sanitization
170
+ - [ ] Test error handling (permission errors, disk full, etc.)
171
+
172
+ ### Integration Tests
173
+ - [ ] Test graph orchestrator file saving for synthesizer node
174
+ - [ ] Test graph orchestrator file saving for writer node
175
+ - [ ] Test file path inclusion in AgentEvent
176
+ - [ ] Test Gradio message conversion with file paths
177
+ - [ ] Test file download in Gradio UI
178
+
179
+ ### Manual Testing
180
+ - [ ] Run iterative research flow and verify file is created
181
+ - [ ] Run deep research flow and verify file is created
182
+ - [ ] Verify file appears as download link in Gradio ChatInterface
183
+ - [ ] Test with file saving disabled (`save_reports_to_file=False`)
184
+ - [ ] Test with custom output directory
185
+
186
+ ---
187
+
188
+ ## Configuration Options
189
+
190
+ All settings are in `src/utils/config.py`:
191
+
192
+ ```python
193
+ # Enable/disable file saving
194
+ save_reports_to_file: bool = True
195
+
196
+ # Custom output directory (None = use temp directory)
197
+ report_output_directory: str | None = None
198
+
199
+ # File format (currently only "md" is fully implemented)
200
+ report_file_format: Literal["md", "md_html", "md_pdf"] = "md"
201
+
202
+ # Filename template with placeholders
203
+ report_filename_template: str = "report_{timestamp}_{query_hash}.md"
204
+ ```
205
+
206
+ ---
207
+
208
+ ## Conclusion
209
+
210
+ ✅ **All file output functionality has been successfully implemented and retained.**
211
+
212
+ The implementation is:
213
+ - ✅ Complete (all planned features implemented)
214
+ - ✅ Backward compatible (existing code continues to work)
215
+ - ✅ Error resilient (file saving failures don't crash workflows)
216
+ - ✅ Configurable (can be enabled/disabled via settings)
217
+ - ✅ Integrated with Gradio (file paths appear as download links)
218
+
219
+ No reimplementation needed. All changes are present and correct.
220
+
MULTIMODAL_SETTINGS_IMPLEMENTATION_PLAN.md ADDED
@@ -0,0 +1,382 @@
# Multimodal Settings & File Rendering - Implementation Plan

## Executive Summary

This document provides a comprehensive analysis of the current settings implementation, multimodal input handling, and file rendering in `src/app.py`, along with a detailed implementation plan to improve the user experience.

## 1. Current Settings Analysis

### 1.1 Settings Structure in `src/app.py`

**Current Implementation (Lines 741-887):**

1. **Sidebar Structure:**
   - Authentication section (lines 745-750)
   - About section (lines 752-764)
   - Settings section (lines 767-850):
     - Research Configuration Accordion (lines 771-796):
       - `mode_radio`: Orchestrator mode selector
       - `graph_mode_radio`: Graph research mode selector
       - `use_graph_checkbox`: Graph execution toggle
     - Audio Output Accordion (lines 798-850):
       - `enable_audio_output_checkbox`: TTS enable/disable
       - `tts_voice_dropdown`: Voice selection
       - `tts_speed_slider`: Speech speed control
       - `tts_gpu_dropdown`: GPU type (non-interactive, visible only if Modal available)

2. **Hidden Components (Lines 852-865):**
   - `hf_model_dropdown`: Hidden Textbox for model selection
   - `hf_provider_dropdown`: Hidden Textbox for provider selection

3. **Main Area Components (Lines 867-887):**
   - `audio_output`: Audio output component (visible based on `settings.enable_audio_output`)
   - Visibility update function for TTS components

### 1.2 Settings Flow

**Settings → Function Parameters:**
- Settings from the sidebar accordions are passed via `additional_inputs` to the `research_agent()` function
- Hidden textboxes are also passed but use empty strings (converted to None)
- OAuth token/profile are automatically passed by Gradio

**Function Signature (Lines 535-546):**
```python
async def research_agent(
    message: str | MultimodalPostprocess,
    history: list[dict[str, Any]],
    mode: str = "simple",
    hf_model: str | None = None,
    hf_provider: str | None = None,
    graph_mode: str = "auto",
    use_graph: bool = True,
    tts_voice: str = "af_heart",
    tts_speed: float = 1.0,
    oauth_token: gr.OAuthToken | None = None,
    oauth_profile: gr.OAuthProfile | None = None,
)
```

### 1.3 Issues Identified

1. **Settings Organization:**
   - The audio output component is in the main area, not the sidebar
   - Hidden components (hf_model, hf_provider) should be made visible or removed
   - No image input enable/disable setting (only audio input has this)

2. **Visibility:**
   - Audio output visibility is controlled by a checkbox, but the component placement is suboptimal
   - TTS settings visibility is controlled by the checkbox change event

3. **Configuration Gaps:**
   - No `enable_image_input` setting in the config (only `enable_audio_input` exists)
   - Image processing always happens if files are present (the comment at line 626 says "not just when enable_image_input is True", but the setting doesn't exist)

## 2. Multimodal Input Analysis

### 2.1 Current Implementation

**ChatInterface Configuration (Lines 892-958):**
- `multimodal=True`: Enables the MultimodalTextbox component
- MultimodalTextbox automatically provides:
  - Text input
  - Image upload button
  - Audio recording button
  - File upload support

**Input Processing (Lines 613-642):**
- Message can be `str` or `MultimodalPostprocess` (dict format)
- MultimodalPostprocess format: `{"text": str, "files": list[FileData], "audio": tuple | None}`
- Processing happens in the `research_agent()` function (see the sketch below):
  - Extracts text, files, and audio from the message
  - Calls `multimodal_service.process_multimodal_input()`
  - Condition: `if files or (audio_input_data is not None and settings.enable_audio_input)`

**Multimodal Service (src/services/multimodal_processing.py):**
- Processes audio input if `settings.enable_audio_input` is True
- Processes image files (no enable/disable check - always processes if files are present)
- Extracts text from images using the OCR service
- Transcribes audio using the STT service

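Unpacking that dict in `research_agent()` comes down to something like this sketch (field handling inferred from the format above):

```python
from typing import Any


def unpack_multimodal(message: str | dict[str, Any]) -> tuple[str, list[Any], Any]:
    """Split a ChatInterface message into (text, files, audio)."""
    if isinstance(message, str):
        return message, [], None
    return (
        message.get("text", ""),
        message.get("files", []),
        message.get("audio"),  # (sample_rate, ndarray) tuple or None
    )
```
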
### 2.2 Gradio Documentation Findings

**MultimodalTextbox (ChatInterface with multimodal=True):**
- Automatically provides image and audio input capabilities
- Inputs are always visible when ChatInterface is rendered
- No explicit visibility control needed - it's part of the textbox component
- Files are handled via the `files` array in MultimodalPostprocess
- Audio recording is handled via the `audio` tuple in MultimodalPostprocess

**Reference Implementation Pattern:**
```python
gr.ChatInterface(
    fn=chat_function,
    multimodal=True,  # Enables image/audio inputs
    # ... other parameters
)
```

### 2.3 Issues Identified

1. **Visibility:**
   - Multimodal inputs ARE always visible (they're part of MultimodalTextbox)
   - No explicit control needed - this is working correctly
   - However, users may not realize image/audio inputs are available

2. **Configuration:**
   - No `enable_image_input` setting to disable image processing
   - Image processing always happens if files are present
   - Audio processing respects `settings.enable_audio_input`

3. **User Experience:**
   - No visual indication that multimodal inputs are available
   - Description mentions "🎤 Multimodal Support" but could be more prominent

## 3. File Rendering Analysis

### 3.1 Current Implementation

**File Detection (Lines 168-195):**
- `_is_file_path()`: Checks if text looks like a file path (sketched below)
- Checks for file extensions and path separators

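A plausible shape for that heuristic (illustrative only; the extension set and limits are assumptions, the real helper lives at lines 168-195):

```python
import os

# Assumed extension whitelist; the real helper may differ.
KNOWN_EXTENSIONS = {".md", ".pdf", ".html", ".txt", ".png", ".jpg"}


def is_file_path(text: str) -> bool:
    """Heuristic: a short, single-line string with a known extension and a path separator."""
    if not text or "\n" in text or len(text) > 500:
        return False
    _, ext = os.path.splitext(text)
    return ext.lower() in KNOWN_EXTENSIONS and ("/" in text or os.sep in text)
```
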
**File Rendering in Events (Lines 242-298):**
- For "complete" events, checks `event.data` for "files" or "file" keys
- Validates that files exist using `os.path.exists()`
- Formats files as markdown download links: `📎 [Download: filename](filepath)`
- Stores files in metadata for potential future use

**File Links Format:**
```python
file_links = "\n\n".join([
    f"📎 [Download: {_get_file_name(f)}]({f})"
    for f in valid_files
])
result["content"] = f"{content}\n\n{file_links}"
```

### 3.2 Issues Identified

1. **Rendering Method:**
   - Uses markdown links in the content string
   - May not work reliably in all Gradio versions
   - Better approach: use Gradio's native File component

2. **File Validation:**
   - Only checks whether the file exists
   - Doesn't validate file type or size
   - No error handling for inaccessible files

3. **User Experience:**
   - Files appear as text links, not as proper file components
   - No preview for images/PDFs
   - No file size information

## 4. Implementation Plan

### Activity 1: Settings Reorganization

**Goal:** Move all settings to the sidebar with better organization

**File:** `src/app.py`

**Tasks:**

1. **Move Audio Output Component to Sidebar (Lines 867-887)**
   - Move the `audio_output` component into the sidebar
   - Place it in the Audio Output accordion or create a separate section
   - Update the visibility logic to work within the sidebar

2. **Add Image Input Settings (New)**
   - Add an `enable_image_input` checkbox to the sidebar
   - Create an "Image Input" accordion or add to an existing "Multimodal Input" accordion
   - Update the config to include an `enable_image_input` setting

3. **Organize Settings Accordions**
   - Research Configuration (existing)
   - Multimodal Input (new - combine image and audio input settings)
   - Audio Output (existing - move component here)
   - Model Configuration (new - for hf_model, hf_provider if we make them visible)

**Subtasks:**
- [ ] Lines 867-871: Move the `audio_output` component definition into the sidebar
- [ ] Lines 873-887: Update the visibility update function to work with the sidebar placement
- [ ] Lines 798-850: Reorganize the Audio Output accordion to include the audio_output component
- [ ] Lines 767-796: Keep Research Configuration as-is
- [ ] After line 796: Add a new "Multimodal Input" accordion with enable_image_input and enable_audio_input checkboxes
- [ ] Lines 852-865: Consider making hf_model and hf_provider visible or remove them

### Activity 2: Multimodal Input Visibility

**Goal:** Ensure multimodal inputs are always visible and well-documented

**File:** `src/app.py`

**Tasks:**

1. **Verify Multimodal Inputs Are Visible**
   - Confirm `multimodal=True` in ChatInterface (already done - line 894)
   - Add visual indicators in the description
   - Add tooltips or help text

2. **Add Image Input Configuration**
   - Add `enable_image_input` to the config (src/utils/config.py)
   - Update multimodal processing to respect this setting
   - Add a UI control in the sidebar

**Subtasks:**
- [ ] Line 894: Verify `multimodal=True` is set (already correct)
- [ ] Line 908: Enhance the description to highlight multimodal capabilities
- [ ] src/utils/config.py: Add `enable_image_input: bool = Field(default=True, ...)`
- [ ] src/services/multimodal_processing.py: Add a check for `settings.enable_image_input` before processing images
- [ ] src/app.py: Add an enable_image_input checkbox to the sidebar

### Activity 3: File Rendering Improvements

**Goal:** Improve file rendering using proper Gradio components

**File:** `src/app.py`

**Tasks:**

1. **Improve File Rendering Method**
   - Use the Gradio File component or proper file handling
   - Add file previews for images
   - Show file size and type information

2. **Enhance File Validation**
   - Validate file types
   - Check file accessibility
   - Handle errors gracefully

**Subtasks:**
- [ ] Lines 280-296: Replace the markdown link approach with proper file component rendering
- [ ] Lines 168-195: Enhance `_is_file_path()` to validate file types
- [ ] Lines 242-298: Update `event_to_chat_message()` to use Gradio File components
- [ ] Add file preview functionality for images
- [ ] Add error handling for inaccessible files

### Activity 4: Configuration Updates

**Goal:** Add missing configuration settings

**File:** `src/utils/config.py`

**Tasks:**

1. **Add Image Input Setting**
   - Add an `enable_image_input` field
   - Add an `ocr_api_url` field if missing
   - Add property methods for availability checks

**Subtasks:**
- [ ] After line 147: Add `enable_image_input: bool = Field(default=True, description="Enable image input (OCR) in multimodal interface")`
- [ ] Check whether `ocr_api_url` exists (it should be in the config)
- [ ] Add an `image_ocr_available` property if missing

### Activity 5: Multimodal Service Updates

**Goal:** Respect the image input enable/disable setting

**File:** `src/services/multimodal_processing.py`

**Tasks:**

1. **Add Image Input Check**
   - Check `settings.enable_image_input` before processing images
   - Log when image processing is skipped due to the setting

**Subtasks (see the sketch below):**
- [ ] Lines 66-77: Add a check for `settings.enable_image_input` before processing image files
- [ ] Add logging when image processing is skipped

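The intended guard, mirroring the existing audio check, would look roughly like this (a sketch; `ocr_service` and its method name are assumptions):

```python
import logging

logger = logging.getLogger(__name__)


def process_images(files: list[str], settings, ocr_service) -> list[str]:
    """Run OCR only when image input is enabled; log when skipping."""
    if files and settings.enable_image_input:
        return [ocr_service.extract_text(path) for path in files]  # method name assumed
    if files:
        logger.info("Image files present but enable_image_input is False; skipping OCR")
    return []
```
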
## 5. Detailed File-Level Tasks

### File: `src/app.py`

**Line-Level Subtasks:**

1. **Lines 741-850: Sidebar Reorganization**
   - [ ] 741-765: Keep authentication and about sections
   - [ ] 767-796: Keep Research Configuration accordion
   - [ ] 797: Add new "Multimodal Input" accordion after Research Configuration
   - [ ] 798-850: Reorganize Audio Output accordion, move audio_output component here
   - [ ] 852-865: Review hidden components - make visible or remove

2. **Lines 867-887: Audio Output Component**
   - [ ] 867-871: Move `audio_output` definition into sidebar (Audio Output accordion)
   - [ ] 873-887: Update visibility function to work with sidebar placement

3. **Lines 892-958: ChatInterface Configuration**
   - [ ] 894: Verify `multimodal=True` (already correct)
   - [ ] 908: Enhance description with multimodal capabilities
   - [ ] 946-956: Review `additional_inputs` - ensure all settings are included

4. **Lines 242-298: File Rendering**
   - [ ] 280-296: Replace markdown links with proper file component rendering
   - [ ] Add file preview for images
   - [ ] Add file size/type information

5. **Lines 613-642: Multimodal Input Processing**
   - [ ] 626: Update condition to check `settings.enable_image_input` for files
   - [ ] Add logging for when image processing is skipped

### File: `src/utils/config.py`

**Line-Level Subtasks:**

1. **Lines 143-180: Audio/Image Configuration**
   - [ ] 144-147: `enable_audio_input` exists (keep as-is)
   - [ ] After 147: Add `enable_image_input: bool = Field(default=True, description="Enable image input (OCR) in multimodal interface")`
   - [ ] Check if `ocr_api_url` exists (add if missing)
   - [ ] Add `image_ocr_available` property method

### File: `src/services/multimodal_processing.py`

**Line-Level Subtasks:**

1. **Lines 65-77: Image Processing**
   - [ ] 66: Add check: `if files and settings.enable_image_input:`
   - [ ] 71-77: Keep image processing logic inside the new condition
   - [ ] Add logging when image processing is skipped

## 6. Testing Checklist

- [ ] Verify all settings are in the sidebar
- [ ] Test multimodal inputs (image upload, audio recording)
- [ ] Test file rendering (markdown, PDF, images)
- [ ] Test enable/disable toggles for image and audio inputs
- [ ] Test audio output generation and display
- [ ] Test file download links
- [ ] Verify settings persist across chat sessions
- [ ] Test on different screen sizes (responsive design)

## 7. Implementation Order

1. **Phase 1: Configuration** (Foundation)
   - Add `enable_image_input` to the config
   - Update the multimodal service to respect the setting

2. **Phase 2: Settings Reorganization** (UI)
   - Move audio output to the sidebar
   - Add image input settings to the sidebar
   - Organize accordions

3. **Phase 3: File Rendering** (Enhancement)
   - Improve the file rendering method
   - Add file previews
   - Enhance validation

4. **Phase 4: Testing & Refinement** (Quality)
   - Test all functionality
   - Fix any issues
   - Refine UI/UX

## 8. Success Criteria

- ✅ All settings are in the sidebar
- ✅ Multimodal inputs are always visible and functional
- ✅ Files are rendered properly with previews
- ✅ Image and audio input can be enabled/disabled
- ✅ Settings are well-organized and intuitive
- ✅ No regressions in existing functionality

MULTIMODAL_SETTINGS_IMPLEMENTATION_SUMMARY.md ADDED
@@ -0,0 +1,153 @@
# Multimodal Settings & File Rendering - Implementation Summary

## ✅ Completed Implementation

### 1. Configuration Updates (`src/utils/config.py`)

**Added Settings:**
- ✅ `enable_image_input: bool = Field(default=True, ...)` - Enable/disable image OCR processing
- ✅ `ocr_api_url: str | None = Field(default="https://prithivmlmods-multimodal-ocr3.hf.space", ...)` - OCR service URL

**Location:** Lines 148-156 (after `enable_audio_output`)

### 2. Multimodal Service Updates (`src/services/multimodal_processing.py`)

**Changes:**
- ✅ Added a check for `settings.enable_image_input` before processing image files
- ✅ Image processing now respects the enable/disable setting (similar to audio input)

**Location:** Line 66 - Added condition: `if files and settings.enable_image_input:`

### 3. Sidebar Reorganization (`src/app.py`)

**New Accordion: "📷 Multimodal Input"**
- ✅ Added `enable_image_input_checkbox` - Controls image OCR processing
- ✅ Added `enable_audio_input_checkbox` - Controls audio STT processing
- ✅ Located after the "Research Configuration" accordion

**Updated Accordion: "🔊 Audio Output"**
- ✅ Moved the `audio_output` component into this accordion (was in the main area)
- ✅ Component now appears in the sidebar with other audio settings
- ✅ Visibility controlled by `enable_audio_output_checkbox`

**Settings Organization:**
1. 🔬 Research Configuration (existing)
2. 📷 Multimodal Input (NEW)
3. 🔊 Audio Output (updated - now includes the audio_output component)

**Location:** Lines 770-850

### 4. Function Signature Updates (`src/app.py`)

**Updated `research_agent()` function:**
- ✅ Added `enable_image_input: bool = True` parameter
- ✅ Added `enable_audio_input: bool = True` parameter
- ✅ The function now accepts UI settings directly from the sidebar checkboxes

**Location:** Lines 535-547

### 5. Multimodal Input Processing (`src/app.py`)

**Updates:**
- ✅ Uses function parameters (`enable_image_input`, `enable_audio_input`) instead of only config settings
- ✅ Filters files and audio based on UI settings before processing
- ✅ More responsive to user changes (no need to restart the app)

**Location:** Lines 624-636

### 6. File Rendering Improvements (`src/app.py`)

**Enhancements:**
- ✅ Added file size display in download links
- ✅ Better error handling for file size retrieval
- ✅ Improved formatting with file size information (B, KB, MB; sketched below)

**Location:** Lines 286-300

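The size formatting presumably follows the usual bytes/KB/MB ladder; a sketch (not the repo's exact code):

```python
import os


def format_file_size(path: str) -> str:
    """Human-readable size for the download-link label."""
    try:
        size = os.path.getsize(path)
    except OSError:
        return "size unknown"
    if size < 1024:
        return f"{size} B"
    if size < 1024 * 1024:
        return f"{size / 1024:.1f} KB"
    return f"{size / (1024 * 1024):.1f} MB"


# Example label: 📎 [Download: report.md (12.3 KB)](/tmp/report.md)
```
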
### 7. UI Description Updates (`src/app.py`)

**Enhanced Description:**
- ✅ Better explanation of multimodal capabilities
- ✅ Clear list of supported input types (Images, Audio, Text)
- ✅ Reference to sidebar settings for configuration

**Location:** Lines 907-912

## 📋 Current Settings Structure

### Sidebar Layout:

```
🔐 Authentication
  - Login button
  - About section

⚙️ Settings
├─ 🔬 Research Configuration
│  ├─ Orchestrator Mode
│  ├─ Graph Research Mode
│  └─ Use Graph Execution

├─ 📷 Multimodal Input (NEW)
│  ├─ Enable Image Input (OCR)
│  └─ Enable Audio Input (STT)

└─ 🔊 Audio Output
   ├─ Enable Audio Output
   ├─ TTS Voice
   ├─ TTS Speech Speed
   ├─ TTS GPU Type (if Modal available)
   └─ 🔊 Audio Response (moved from main area)
```

## 🔍 Key Features

### Multimodal Inputs (Always Visible)
- **Image Upload**: Available in the ChatInterface textbox (multimodal=True)
- **Audio Recording**: Available in the ChatInterface textbox (multimodal=True)
- **File Upload**: Supported via MultimodalTextbox
- **Visibility**: Always visible - part of the ChatInterface component
- **Control**: Can be enabled/disabled via sidebar settings

### File Rendering
- **Method**: Markdown download links in chat content
- **Format**: `📎 [Download: filename (size)](filepath)`
- **Validation**: Checks file existence before rendering
- **Metadata**: Files stored in message metadata for future use

### Settings Flow
1. User changes settings in the sidebar checkboxes
2. Settings are passed to `research_agent()` via `additional_inputs` (see the sketch below)
3. The function uses the UI settings (with config defaults as fallback)
4. Multimodal processing respects the enable/disable flags
5. Settings persist during the chat session

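Step 2 is standard ChatInterface wiring; in sketch form (component names match the summary above, the handler is stubbed):

```python
import gradio as gr


async def research_agent(message, history, enable_image_input=True, enable_audio_input=True):
    """Stub standing in for the app's real handler."""
    return "..."


with gr.Blocks() as demo:
    with gr.Sidebar():
        enable_image_input_checkbox = gr.Checkbox(value=True, label="Enable Image Input (OCR)")
        enable_audio_input_checkbox = gr.Checkbox(value=True, label="Enable Audio Input (STT)")

    gr.ChatInterface(
        fn=research_agent,  # receives the checkbox values as extra arguments
        multimodal=True,
        additional_inputs=[enable_image_input_checkbox, enable_audio_input_checkbox],
    )
```
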
## 🧪 Testing Checklist

- [ ] Verify all settings are in the sidebar
- [ ] Test image upload with OCR enabled/disabled
- [ ] Test audio recording with STT enabled/disabled
- [ ] Test file rendering (markdown, PDF, images)
- [ ] Test audio output generation and display in the sidebar
- [ ] Test file download links
- [ ] Verify settings work without requiring an app restart
- [ ] Test on different screen sizes (responsive design)

## 📝 Notes

1. **Multimodal Inputs Visibility**: The inputs are always visible because they're part of the `MultimodalTextbox` component when `multimodal=True` is set in ChatInterface. No additional visibility control is needed.

2. **Settings Persistence**: Settings are passed via function parameters, so they persist during the chat session but reset when the app restarts. For persistent settings across sessions, consider using Gradio's state management or session storage.

3. **File Rendering**: Gradio ChatInterface automatically handles markdown file links. The current implementation with file size information should work well. For more advanced file previews, consider using Gradio's File component in a custom Blocks layout.

4. **Hidden Components**: The `hf_model_dropdown` and `hf_provider_dropdown` are still hidden. Consider making them visible in a "Model Configuration" accordion if needed, or remove them if unused.

## 🚀 Next Steps (Optional Enhancements)

1. **Model Configuration Accordion**: Make hf_model and hf_provider visible in the sidebar
2. **File Previews**: Add image previews for uploaded images in chat
3. **Settings Persistence**: Implement session-based settings storage
4. **Advanced File Rendering**: Use the Gradio File component for better file handling
5. **Error Handling**: Add better error messages for failed file operations

docs/api/orchestrators.md CHANGED
@@ -178,7 +178,7 @@ Runs Magentic orchestration.
  ## See Also

  - [Architecture - Orchestrators](../architecture/orchestrators.md) - Architecture overview
- - [Graph Orchestration](../architecture/graph-orchestration.md) - Graph execution details
+ - [Graph Orchestration](../architecture/graph_orchestration.md) - Graph execution details

docs/architecture/graph-orchestration.md DELETED
@@ -1,152 +0,0 @@
# Graph Orchestration Architecture

## Overview

Phase 4 implements a graph-based orchestration system for research workflows using Pydantic AI agents as nodes. This enables better parallel execution, conditional routing, and state management compared to simple agent chains.

## Graph Structure

### Nodes

Graph nodes represent different stages in the research workflow:

1. **Agent Nodes**: Execute Pydantic AI agents
   - Input: Prompt/query
   - Output: Structured or unstructured response
   - Examples: `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`

2. **State Nodes**: Update or read workflow state
   - Input: Current state
   - Output: Updated state
   - Examples: Update evidence, update conversation history

3. **Decision Nodes**: Make routing decisions based on conditions
   - Input: Current state/results
   - Output: Next node ID
   - Examples: Continue research vs. complete research

4. **Parallel Nodes**: Execute multiple nodes concurrently
   - Input: List of node IDs
   - Output: Aggregated results
   - Examples: Parallel iterative research loops

### Edges

Edges define transitions between nodes:

1. **Sequential Edges**: Always traversed (no condition)
   - From: Source node
   - To: Target node
   - Condition: None (always True)

2. **Conditional Edges**: Traversed based on condition
   - From: Source node
   - To: Target node
   - Condition: Callable that returns bool
   - Example: If research complete → go to writer, else → continue loop

3. **Parallel Edges**: Used for parallel execution branches
   - From: Parallel node
   - To: Multiple target nodes
   - Execution: All targets run concurrently

## Graph Patterns

### Iterative Research Graph

```
[Input] → [Thinking] → [Knowledge Gap] → [Decision: Complete?]
                                           ↓ No         ↓ Yes
                                     [Tool Selector]  [Writer]
                                           ↓
                                     [Execute Tools] → [Loop Back]
```

### Deep Research Graph

```
[Input] → [Planner] → [Parallel Iterative Loops] → [Synthesizer]
                         ↓        ↓        ↓
                      [Loop1]  [Loop2]  [Loop3]
```

## State Management

State is managed via `WorkflowState` using `ContextVar` for thread-safe isolation:

- **Evidence**: Collected evidence from searches
- **Conversation**: Iteration history (gaps, tool calls, findings, thoughts)
- **Embedding Service**: For semantic search

State transitions occur at state nodes, which update the global workflow state.

## Execution Flow

1. **Graph Construction**: Build graph from nodes and edges
2. **Graph Validation**: Ensure graph is valid (no cycles, all nodes reachable)
3. **Graph Execution**: Traverse graph from entry node
4. **Node Execution**: Execute each node based on type
5. **Edge Evaluation**: Determine next node(s) based on edges
6. **Parallel Execution**: Use `asyncio.gather()` for parallel nodes
7. **State Updates**: Update state at state nodes
8. **Event Streaming**: Yield events during execution for UI

## Conditional Routing

Decision nodes evaluate conditions and return next node IDs:

- **Knowledge Gap Decision**: If `research_complete` → writer, else → tool selector
- **Budget Decision**: If budget exceeded → exit, else → continue
- **Iteration Decision**: If max iterations → exit, else → continue

## Parallel Execution

Parallel nodes execute multiple nodes concurrently:

- Each parallel branch runs independently
- Results are aggregated after all branches complete
- State is synchronized after parallel execution
- Errors in one branch don't stop other branches

## Budget Enforcement

Budget constraints are enforced at decision nodes:

- **Token Budget**: Track LLM token usage
- **Time Budget**: Track elapsed time
- **Iteration Budget**: Track iteration count

If any budget is exceeded, execution routes to the exit node.

## Error Handling

Errors are handled at multiple levels:

1. **Node Level**: Catch errors in individual node execution
2. **Graph Level**: Handle errors during graph traversal
3. **State Level**: Roll back state changes on error

Errors are logged and yield error events for the UI.

## Backward Compatibility

Graph execution is optional via a feature flag:

- `USE_GRAPH_EXECUTION=true`: Use graph-based execution
- `USE_GRAPH_EXECUTION=false`: Use agent chain execution (existing)

This allows gradual migration and fallback if needed.

docs/architecture/graph_orchestration.md CHANGED
@@ -216,7 +216,6 @@ This allows gradual migration and fallback if needed.
  ## See Also

  - [Orchestrators](orchestrators.md) - Overview of all orchestrator patterns
- - [Workflows](workflows.md) - Workflow diagrams and patterns
  - [Workflow Diagrams](workflow-diagrams.md) - Detailed workflow diagrams
  - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation

docs/architecture/orchestrators.md CHANGED
@@ -190,9 +190,7 @@ class AgentEvent:
  ## See Also

- - [Graph Orchestration](graph-orchestration.md) - Graph-based execution details
- - [Graph Orchestration (Detailed)](graph_orchestration.md) - Detailed graph architecture
- - [Workflows](workflows.md) - Workflow diagrams and patterns
+ - [Graph Orchestration](graph_orchestration.md) - Graph-based execution details
  - [Workflow Diagrams](workflow-diagrams.md) - Detailed workflow diagrams
  - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation

docs/architecture/workflow-diagrams.md CHANGED
@@ -664,7 +664,5 @@ No separate Judge Agent needed - manager does it all!
  ## See Also

  - [Orchestrators](orchestrators.md) - Overview of all orchestrator patterns
- - [Graph Orchestration](graph-orchestration.md) - Graph-based execution overview
- - [Graph Orchestration (Detailed)](graph_orchestration.md) - Detailed graph architecture
- - [Workflows](workflows.md) - Workflow patterns summary
+ - [Graph Orchestration](graph_orchestration.md) - Graph-based execution overview
  - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
docs/architecture/workflows.md DELETED
@@ -1,662 +0,0 @@
# DeepCritical Workflow - Simplified Magentic Architecture

> **Architecture Pattern**: Microsoft Magentic Orchestration
> **Design Philosophy**: Simple, dynamic, manager-driven coordination
> **Key Innovation**: Intelligent manager replaces rigid sequential phases

---

## 1. High-Level Magentic Workflow

```mermaid
flowchart TD
    Start([User Query]) --> Manager[Magentic Manager<br/>Plan • Select • Assess • Adapt]

    Manager -->|Plans| Task1[Task Decomposition]
    Task1 --> Manager

    Manager -->|Selects & Executes| HypAgent[Hypothesis Agent]
    Manager -->|Selects & Executes| SearchAgent[Search Agent]
    Manager -->|Selects & Executes| AnalysisAgent[Analysis Agent]
    Manager -->|Selects & Executes| ReportAgent[Report Agent]

    HypAgent -->|Results| Manager
    SearchAgent -->|Results| Manager
    AnalysisAgent -->|Results| Manager
    ReportAgent -->|Results| Manager

    Manager -->|Assesses Quality| Decision{Good Enough?}
    Decision -->|No - Refine| Manager
    Decision -->|No - Different Agent| Manager
    Decision -->|No - Stalled| Replan[Reset Plan]
    Replan --> Manager

    Decision -->|Yes| Synthesis[Synthesize Final Result]
    Synthesis --> Output([Research Report])

    style Start fill:#e1f5e1
    style Manager fill:#ffe6e6
    style HypAgent fill:#fff4e6
    style SearchAgent fill:#fff4e6
    style AnalysisAgent fill:#fff4e6
    style ReportAgent fill:#fff4e6
    style Decision fill:#ffd6d6
    style Synthesis fill:#d4edda
    style Output fill:#e1f5e1
```

## 2. Magentic Manager: The 6-Phase Cycle

```mermaid
flowchart LR
    P1[1. Planning<br/>Analyze task<br/>Create strategy] --> P2[2. Agent Selection<br/>Pick best agent<br/>for subtask]
    P2 --> P3[3. Execution<br/>Run selected<br/>agent with tools]
    P3 --> P4[4. Assessment<br/>Evaluate quality<br/>Check progress]
    P4 --> Decision{Quality OK?<br/>Progress made?}
    Decision -->|Yes| P6[6. Synthesis<br/>Combine results<br/>Generate report]
    Decision -->|No| P5[5. Iteration<br/>Adjust plan<br/>Try again]
    P5 --> P2
    P6 --> Done([Complete])

    style P1 fill:#fff4e6
    style P2 fill:#ffe6e6
    style P3 fill:#e6f3ff
    style P4 fill:#ffd6d6
    style P5 fill:#fff3cd
    style P6 fill:#d4edda
    style Done fill:#e1f5e1
```

## 3. Simplified Agent Architecture

```mermaid
graph TB
    subgraph "Orchestration Layer"
        Manager[Magentic Manager<br/>• Plans workflow<br/>• Selects agents<br/>• Assesses quality<br/>• Adapts strategy]
        SharedContext[(Shared Context<br/>• Hypotheses<br/>• Search Results<br/>• Analysis<br/>• Progress)]
        Manager <--> SharedContext
    end

    subgraph "Specialist Agents"
        HypAgent[Hypothesis Agent<br/>• Domain understanding<br/>• Hypothesis generation<br/>• Testability refinement]
        SearchAgent[Search Agent<br/>• Multi-source search<br/>• RAG retrieval<br/>• Result ranking]
        AnalysisAgent[Analysis Agent<br/>• Evidence extraction<br/>• Statistical analysis<br/>• Code execution]
        ReportAgent[Report Agent<br/>• Report assembly<br/>• Visualization<br/>• Citation formatting]
    end

    subgraph "MCP Tools"
        WebSearch[Web Search<br/>PubMed • arXiv • bioRxiv]
        CodeExec[Code Execution<br/>Sandboxed Python]
        RAG[RAG Retrieval<br/>Vector DB • Embeddings]
        Viz[Visualization<br/>Charts • Graphs]
    end

    Manager -->|Selects & Directs| HypAgent
    Manager -->|Selects & Directs| SearchAgent
    Manager -->|Selects & Directs| AnalysisAgent
    Manager -->|Selects & Directs| ReportAgent

    HypAgent --> SharedContext
    SearchAgent --> SharedContext
    AnalysisAgent --> SharedContext
    ReportAgent --> SharedContext

    SearchAgent --> WebSearch
    SearchAgent --> RAG
    AnalysisAgent --> CodeExec
    ReportAgent --> CodeExec
    ReportAgent --> Viz

    style Manager fill:#ffe6e6
    style SharedContext fill:#ffe6f0
    style HypAgent fill:#fff4e6
    style SearchAgent fill:#fff4e6
    style AnalysisAgent fill:#fff4e6
    style ReportAgent fill:#fff4e6
    style WebSearch fill:#e6f3ff
    style CodeExec fill:#e6f3ff
    style RAG fill:#e6f3ff
    style Viz fill:#e6f3ff
```

## 4. Dynamic Workflow Example

```mermaid
sequenceDiagram
    participant User
    participant Manager
    participant HypAgent
    participant SearchAgent
    participant AnalysisAgent
    participant ReportAgent

    User->>Manager: "Research protein folding in Alzheimer's"

    Note over Manager: PLAN: Generate hypotheses → Search → Analyze → Report

    Manager->>HypAgent: Generate 3 hypotheses
    HypAgent-->>Manager: Returns 3 hypotheses
    Note over Manager: ASSESS: Good quality, proceed

    Manager->>SearchAgent: Search literature for hypothesis 1
    SearchAgent-->>Manager: Returns 15 papers
    Note over Manager: ASSESS: Good results, continue

    Manager->>SearchAgent: Search for hypothesis 2
    SearchAgent-->>Manager: Only 2 papers found
    Note over Manager: ASSESS: Insufficient, refine search

    Manager->>SearchAgent: Refined query for hypothesis 2
    SearchAgent-->>Manager: Returns 12 papers
    Note over Manager: ASSESS: Better, proceed

    Manager->>AnalysisAgent: Analyze evidence for all hypotheses
    AnalysisAgent-->>Manager: Returns analysis with code
    Note over Manager: ASSESS: Complete, generate report

    Manager->>ReportAgent: Create comprehensive report
    ReportAgent-->>Manager: Returns formatted report
    Note over Manager: SYNTHESIZE: Combine all results

    Manager->>User: Final Research Report
```

## 5. Manager Decision Logic

```mermaid
flowchart TD
    Start([Manager Receives Task]) --> Plan[Create Initial Plan]

    Plan --> Select[Select Agent for Next Subtask]
    Select --> Execute[Execute Agent]
    Execute --> Collect[Collect Results]

    Collect --> Assess[Assess Quality & Progress]

    Assess --> Q1{Quality Sufficient?}
    Q1 -->|No| Q2{Same Agent Can Fix?}
    Q2 -->|Yes| Feedback[Provide Specific Feedback]
    Feedback --> Execute
    Q2 -->|No| Different[Try Different Agent]
    Different --> Select

    Q1 -->|Yes| Q3{Task Complete?}
    Q3 -->|No| Q4{Making Progress?}
    Q4 -->|Yes| Select
    Q4 -->|No - Stalled| Replan[Reset Plan & Approach]
    Replan --> Plan

    Q3 -->|Yes| Synth[Synthesize Final Result]
    Synth --> Done([Return Report])

    style Start fill:#e1f5e1
    style Plan fill:#fff4e6
    style Select fill:#ffe6e6
    style Execute fill:#e6f3ff
    style Assess fill:#ffd6d6
    style Q1 fill:#ffe6e6
    style Q2 fill:#ffe6e6
    style Q3 fill:#ffe6e6
    style Q4 fill:#ffe6e6
    style Synth fill:#d4edda
    style Done fill:#e1f5e1
```

## 6. Hypothesis Agent Workflow

```mermaid
flowchart LR
    Input[Research Query] --> Domain[Identify Domain<br/>& Key Concepts]
    Domain --> Context[Retrieve Background<br/>Knowledge]
    Context --> Generate[Generate 3-5<br/>Initial Hypotheses]
    Generate --> Refine[Refine for<br/>Testability]
    Refine --> Rank[Rank by<br/>Quality Score]
    Rank --> Output[Return Top<br/>Hypotheses]

    Output --> Struct[Hypothesis Structure:<br/>• Statement<br/>• Rationale<br/>• Testability Score<br/>• Data Requirements<br/>• Expected Outcomes]

    style Input fill:#e1f5e1
    style Output fill:#fff4e6
    style Struct fill:#e6f3ff
```

## 7. Search Agent Workflow

```mermaid
flowchart TD
    Input[Hypotheses] --> Strategy[Formulate Search<br/>Strategy per Hypothesis]

    Strategy --> Multi[Multi-Source Search]

    Multi --> PubMed[PubMed Search<br/>via MCP]
    Multi --> ArXiv[arXiv Search<br/>via MCP]
    Multi --> BioRxiv[bioRxiv Search<br/>via MCP]

    PubMed --> Aggregate[Aggregate Results]
    ArXiv --> Aggregate
    BioRxiv --> Aggregate

    Aggregate --> Filter[Filter & Rank<br/>by Relevance]
    Filter --> Dedup[Deduplicate<br/>Cross-Reference]
    Dedup --> Embed[Embed Documents<br/>via MCP]
    Embed --> Vector[(Vector DB)]
    Vector --> RAGRetrieval[RAG Retrieval<br/>Top-K per Hypothesis]
    RAGRetrieval --> Output[Return Contextualized<br/>Search Results]

    style Input fill:#fff4e6
    style Multi fill:#ffe6e6
    style Vector fill:#ffe6f0
    style Output fill:#e6f3ff
```

## 8. Analysis Agent Workflow

```mermaid
flowchart TD
    Input1[Hypotheses] --> Extract
    Input2[Search Results] --> Extract[Extract Evidence<br/>per Hypothesis]

    Extract --> Methods[Determine Analysis<br/>Methods Needed]

    Methods --> Branch{Requires<br/>Computation?}
    Branch -->|Yes| GenCode[Generate Python<br/>Analysis Code]
    Branch -->|No| Qual[Qualitative<br/>Synthesis]

    GenCode --> Execute[Execute Code<br/>via MCP Sandbox]
    Execute --> Interpret1[Interpret<br/>Results]
    Qual --> Interpret2[Interpret<br/>Findings]

    Interpret1 --> Synthesize[Synthesize Evidence<br/>Across Sources]
    Interpret2 --> Synthesize

    Synthesize --> Verdict[Determine Verdict<br/>per Hypothesis]
    Verdict --> Support[• Supported<br/>• Refuted<br/>• Inconclusive]
    Support --> Gaps[Identify Knowledge<br/>Gaps & Limitations]
    Gaps --> Output[Return Analysis<br/>Report]

    style Input1 fill:#fff4e6
    style Input2 fill:#e6f3ff
    style Execute fill:#ffe6e6
    style Output fill:#e6ffe6
```

## 9. Report Agent Workflow

```mermaid
flowchart TD
    Input1[Query] --> Assemble
    Input2[Hypotheses] --> Assemble
    Input3[Search Results] --> Assemble
    Input4[Analysis] --> Assemble[Assemble Report<br/>Sections]

    Assemble --> Exec[Executive Summary]
    Assemble --> Intro[Introduction]
    Assemble --> Methods[Methods]
    Assemble --> Results[Results per<br/>Hypothesis]
    Assemble --> Discussion[Discussion]
    Assemble --> Future[Future Directions]
    Assemble --> Refs[References]

    Results --> VizCheck{Needs<br/>Visualization?}
    VizCheck -->|Yes| GenViz[Generate Viz Code]
    GenViz --> ExecViz[Execute via MCP<br/>Create Charts]
    ExecViz --> Combine
    VizCheck -->|No| Combine[Combine All<br/>Sections]

    Exec --> Combine
    Intro --> Combine
    Methods --> Combine
    Discussion --> Combine
    Future --> Combine
    Refs --> Combine

    Combine --> Format[Format Output]
    Format --> MD[Markdown]
    Format --> PDF[PDF]
    Format --> JSON[JSON]

    MD --> Output[Return Final<br/>Report]
    PDF --> Output
    JSON --> Output

    style Input1 fill:#e1f5e1
    style Input2 fill:#fff4e6
    style Input3 fill:#e6f3ff
    style Input4 fill:#e6ffe6
    style Output fill:#d4edda
```

## 10. Data Flow & Event Streaming

```mermaid
flowchart TD
    User[👤 User] -->|Research Query| UI[Gradio UI]
    UI -->|Submit| Manager[Magentic Manager]

    Manager -->|Event: Planning| UI
    Manager -->|Select Agent| HypAgent[Hypothesis Agent]
    HypAgent -->|Event: Delta/Message| UI
    HypAgent -->|Hypotheses| Context[(Shared Context)]

    Context -->|Retrieved by| Manager
    Manager -->|Select Agent| SearchAgent[Search Agent]
    SearchAgent -->|MCP Request| WebSearch[Web Search Tool]
    WebSearch -->|Results| SearchAgent
    SearchAgent -->|Event: Delta/Message| UI
    SearchAgent -->|Documents| Context
    SearchAgent -->|Embeddings| VectorDB[(Vector DB)]

    Context -->|Retrieved by| Manager
    Manager -->|Select Agent| AnalysisAgent[Analysis Agent]
    AnalysisAgent -->|MCP Request| CodeExec[Code Execution Tool]
    CodeExec -->|Results| AnalysisAgent
    AnalysisAgent -->|Event: Delta/Message| UI
    AnalysisAgent -->|Analysis| Context

    Context -->|Retrieved by| Manager
    Manager -->|Select Agent| ReportAgent[Report Agent]
    ReportAgent -->|MCP Request| CodeExec
    ReportAgent -->|Event: Delta/Message| UI
    ReportAgent -->|Report| Context

    Manager -->|Event: Final Result| UI
    UI -->|Display| User

    style User fill:#e1f5e1
    style UI fill:#e6f3ff
    style Manager fill:#ffe6e6
    style Context fill:#ffe6f0
    style VectorDB fill:#ffe6f0
    style WebSearch fill:#f0f0f0
    style CodeExec fill:#f0f0f0
```

## 11. MCP Tool Architecture

```mermaid
graph TB
    subgraph "Agent Layer"
        Manager[Magentic Manager]
        HypAgent[Hypothesis Agent]
        SearchAgent[Search Agent]
        AnalysisAgent[Analysis Agent]
        ReportAgent[Report Agent]
    end

    subgraph "MCP Protocol Layer"
        Registry[MCP Tool Registry<br/>• Discovers tools<br/>• Routes requests<br/>• Manages connections]
    end

    subgraph "MCP Servers"
        Server1[Web Search Server<br/>localhost:8001<br/>• PubMed<br/>• arXiv<br/>• bioRxiv]
        Server2[Code Execution Server<br/>localhost:8002<br/>• Sandboxed Python<br/>• Package management
393
- Server3[RAG Server<br/>localhost:8003<br/>• Vector embeddings<br/>• Similarity search]
394
- Server4[Visualization Server<br/>localhost:8004<br/>• Chart generation<br/>• Plot rendering]
395
- end
396
-
397
- subgraph "External Services"
398
- PubMed[PubMed API]
399
- ArXiv[arXiv API]
400
- BioRxiv[bioRxiv API]
401
- Modal[Modal Sandbox]
402
- ChromaDB[(ChromaDB)]
403
- end
404
-
405
- SearchAgent -->|Request| Registry
406
- AnalysisAgent -->|Request| Registry
407
- ReportAgent -->|Request| Registry
408
-
409
- Registry --> Server1
410
- Registry --> Server2
411
- Registry --> Server3
412
- Registry --> Server4
413
-
414
- Server1 --> PubMed
415
- Server1 --> ArXiv
416
- Server1 --> BioRxiv
417
- Server2 --> Modal
418
- Server3 --> ChromaDB
419
-
420
- style Manager fill:#ffe6e6
421
- style Registry fill:#fff4e6
422
- style Server1 fill:#e6f3ff
423
- style Server2 fill:#e6f3ff
424
- style Server3 fill:#e6f3ff
425
- style Server4 fill:#e6f3ff
426
- ```
427
-
428
- ## 12. Progress Tracking & Stall Detection
429
-
430
- ```mermaid
431
- stateDiagram-v2
432
- [*] --> Initialization: User Query
433
-
434
- Initialization --> Planning: Manager starts
435
-
436
- Planning --> AgentExecution: Select agent
437
-
438
- AgentExecution --> Assessment: Collect results
439
-
440
- Assessment --> QualityCheck: Evaluate output
441
-
442
- QualityCheck --> AgentExecution: Poor quality<br/>(retry < max_rounds)
443
- QualityCheck --> Planning: Poor quality<br/>(try different agent)
444
- QualityCheck --> NextAgent: Good quality<br/>(task incomplete)
445
- QualityCheck --> Synthesis: Good quality<br/>(task complete)
446
-
447
- NextAgent --> AgentExecution: Select next agent
448
-
449
- state StallDetection <<choice>>
450
- Assessment --> StallDetection: Check progress
451
- StallDetection --> Planning: No progress<br/>(stall count < max)
452
- StallDetection --> ErrorRecovery: No progress<br/>(max stalls reached)
453
-
454
- ErrorRecovery --> PartialReport: Generate partial results
455
- PartialReport --> [*]
456
-
457
- Synthesis --> FinalReport: Combine all outputs
458
- FinalReport --> [*]
459
-
460
- note right of QualityCheck
461
- Manager assesses:
462
- • Output completeness
463
- • Quality metrics
464
- • Progress made
465
- end note
466
-
467
- note right of StallDetection
468
- Stall = no new progress
469
- after agent execution
470
- Triggers plan reset
471
- end note
472
- ```
473
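The stall rule described in the note is simple enough to state directly in code. A toy sketch with illustrative names; the real logic is built into the Magentic manager:

```python
# Toy sketch of the stall rule: a round with no new progress increments the
# stall counter; any progress resets it; hitting the cap triggers recovery.
MAX_STALLS = 3


class StallTracker:
    def __init__(self) -> None:
        self.stall_count = 0

    def record_round(self, made_progress: bool) -> str:
        if made_progress:
            self.stall_count = 0
            return "continue"
        self.stall_count += 1
        if self.stall_count >= MAX_STALLS:
            return "error_recovery"  # generate partial results
        return "replan"              # reset plan & approach


tracker = StallTracker()
assert tracker.record_round(False) == "replan"
assert tracker.record_round(True) == "continue"
```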
-
474
- ## 13. Gradio UI Integration
475
-
476
- ```mermaid
477
- graph TD
478
- App[Gradio App<br/>DeepCritical Research Agent]
479
-
480
- App --> Input[Input Section]
481
- App --> Status[Status Section]
482
- App --> Output[Output Section]
483
-
484
- Input --> Query[Research Question<br/>Text Area]
485
- Input --> Controls[Controls]
486
- Controls --> MaxHyp[Max Hypotheses: 1-10]
487
- Controls --> MaxRounds[Max Rounds: 5-20]
488
- Controls --> Submit[Start Research Button]
489
-
490
- Status --> Log[Real-time Event Log<br/>• Manager planning<br/>• Agent selection<br/>• Execution updates<br/>• Quality assessment]
491
- Status --> Progress[Progress Tracker<br/>• Current agent<br/>• Round count<br/>• Stall count]
492
-
493
- Output --> Tabs[Tabbed Results]
494
- Tabs --> Tab1[Hypotheses Tab<br/>Generated hypotheses with scores]
495
- Tabs --> Tab2[Search Results Tab<br/>Papers & sources found]
496
- Tabs --> Tab3[Analysis Tab<br/>Evidence & verdicts]
497
- Tabs --> Tab4[Report Tab<br/>Final research report]
498
- Tab4 --> Download[Download Report<br/>MD / PDF / JSON]
499
-
500
- Submit -.->|Triggers| Workflow[Magentic Workflow]
501
- Workflow -.->|MagenticOrchestratorMessageEvent| Log
502
- Workflow -.->|MagenticAgentDeltaEvent| Log
503
- Workflow -.->|MagenticAgentMessageEvent| Log
504
- Workflow -.->|MagenticFinalResultEvent| Tab4
505
-
506
- style App fill:#e1f5e1
507
- style Input fill:#fff4e6
508
- style Status fill:#e6f3ff
509
- style Output fill:#e6ffe6
510
- style Workflow fill:#ffe6e6
511
- ```
512
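The dotted edges above show which UI element each framework event updates. A minimal dispatch sketch, with stand-in event classes named after the diagram rather than the real Magentic types:

```python
# Stand-in event classes named after the diagram; the real types come from
# the Magentic framework and carry richer payloads.
class MagenticOrchestratorMessageEvent: ...
class MagenticAgentDeltaEvent: ...
class MagenticAgentMessageEvent: ...
class MagenticFinalResultEvent: ...


def route_event(event: object) -> str:
    """Return the UI target a given event should update."""
    if isinstance(event, MagenticFinalResultEvent):
        return "report_tab"  # final report lands in Tab 4
    return "event_log"       # everything else streams to the log


assert route_event(MagenticAgentDeltaEvent()) == "event_log"
assert route_event(MagenticFinalResultEvent()) == "report_tab"
```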
-
513
- ## 14. Complete System Context
514
-
515
- ```mermaid
516
- graph LR
517
- User[👤 Researcher<br/>Asks research questions] -->|Submits query| DC[DeepCritical<br/>Magentic Workflow]
518
-
519
- DC -->|Literature search| PubMed[PubMed API<br/>Medical papers]
520
- DC -->|Preprint search| ArXiv[arXiv API<br/>Scientific preprints]
521
- DC -->|Biology search| BioRxiv[bioRxiv API<br/>Biology preprints]
522
- DC -->|Agent reasoning| Claude[Claude API<br/>Sonnet 4 / Opus]
523
- DC -->|Code execution| Modal[Modal Sandbox<br/>Safe Python env]
524
- DC -->|Vector storage| Chroma[ChromaDB<br/>Embeddings & RAG]
525
-
526
- DC -->|Deployed on| HF[HuggingFace Spaces<br/>Gradio 6.0]
527
-
528
- PubMed -->|Results| DC
529
- ArXiv -->|Results| DC
530
- BioRxiv -->|Results| DC
531
- Claude -->|Responses| DC
532
- Modal -->|Output| DC
533
- Chroma -->|Context| DC
534
-
535
- DC -->|Research report| User
536
-
537
- style User fill:#e1f5e1
538
- style DC fill:#ffe6e6
539
- style PubMed fill:#e6f3ff
540
- style ArXiv fill:#e6f3ff
541
- style BioRxiv fill:#e6f3ff
542
- style Claude fill:#ffd6d6
543
- style Modal fill:#f0f0f0
544
- style Chroma fill:#ffe6f0
545
- style HF fill:#d4edda
546
- ```
547
-
548
- ## 15. Workflow Timeline (Simplified)
549
-
550
- ```mermaid
551
- gantt
552
- title DeepCritical Magentic Workflow - Typical Execution
553
- dateFormat mm:ss
554
- axisFormat %M:%S
555
-
556
- section Manager Planning
557
- Initial planning :p1, 00:00, 10s
558
-
559
- section Hypothesis Agent
560
- Generate hypotheses :h1, after p1, 30s
561
- Manager assessment :h2, after h1, 5s
562
-
563
- section Search Agent
564
- Search hypothesis 1 :s1, after h2, 20s
565
- Search hypothesis 2 :s2, after s1, 20s
566
- Search hypothesis 3 :s3, after s2, 20s
567
- RAG processing :s4, after s3, 15s
568
- Manager assessment :s5, after s4, 5s
569
-
570
- section Analysis Agent
571
- Evidence extraction :a1, after s5, 15s
572
- Code generation :a2, after a1, 20s
573
- Code execution :a3, after a2, 25s
574
- Synthesis :a4, after a3, 20s
575
- Manager assessment :a5, after a4, 5s
576
-
577
- section Report Agent
578
- Report assembly :r1, after a5, 30s
579
- Visualization :r2, after r1, 15s
580
- Formatting :r3, after r2, 10s
581
-
582
- section Manager Synthesis
583
- Final synthesis :f1, after r3, 10s
584
- ```
585
-
586
- ---
587
-
588
- ## Key Differences from Original Design
589
-
590
- | Aspect | Original (Judge-in-Loop) | New (Magentic) |
591
- |--------|-------------------------|----------------|
592
- | **Control Flow** | Fixed sequential phases | Dynamic agent selection |
593
- | **Quality Control** | Separate Judge Agent | Manager assessment built-in |
594
- | **Retry Logic** | Phase-level with feedback | Agent-level with adaptation |
595
- | **Flexibility** | Rigid 4-phase pipeline | Adaptive workflow |
596
- | **Complexity** | 5 agents (including Judge) | 4 agents (no Judge) |
597
- | **Progress Tracking** | Manual state management | Built-in round/stall detection |
598
- | **Agent Coordination** | Sequential handoff | Manager-driven dynamic selection |
599
- | **Error Recovery** | Retry same phase | Try different agent or replan |
600
-
601
- ---
602
-
603
- ## Simplified Design Principles
604
-
605
- 1. **Manager is Intelligent**: LLM-powered manager handles planning, selection, and quality assessment
606
- 2. **No Separate Judge**: Manager's assessment phase replaces dedicated Judge Agent
607
- 3. **Dynamic Workflow**: Agents can be called multiple times in any order based on need
608
- 4. **Built-in Safety**: max_round_count (15) and max_stall_count (3) prevent infinite loops
609
- 5. **Event-Driven UI**: Real-time streaming updates to Gradio interface
610
- 6. **MCP-Powered Tools**: All external capabilities via Model Context Protocol
611
- 7. **Shared Context**: Centralized state accessible to all agents
612
- 8. **Progress Awareness**: Manager tracks what's been done and what's needed
613
-
614
- ---
615
-
616
- ## Legend
617
-
618
- - 🔴 **Red/Pink**: Manager, orchestration, decision-making
619
- - 🟡 **Yellow/Orange**: Specialist agents, processing
620
- - 🔵 **Blue**: Data, tools, MCP services
621
- - 🟣 **Purple/Pink**: Storage, databases, state
622
- - 🟢 **Green**: User interactions, final outputs
623
- - ⚪ **Gray**: External services, APIs
624
-
625
- ---
626
-
627
- ## Implementation Highlights
628
-
629
- **Simple 4-Agent Setup:**
630
- ```python
631
- workflow = (
632
- MagenticBuilder()
633
- .participants(
634
- hypothesis=HypothesisAgent(tools=[background_tool]),
635
- search=SearchAgent(tools=[web_search, rag_tool]),
636
- analysis=AnalysisAgent(tools=[code_execution]),
637
- report=ReportAgent(tools=[code_execution, visualization])
638
- )
639
- .with_standard_manager(
640
- chat_client=AnthropicClient(model="claude-sonnet-4"),
641
- max_round_count=15, # Prevent infinite loops
642
- max_stall_count=3 # Detect stuck workflows
643
- )
644
- .build()
645
- )
646
- ```
647
-
648
- **Manager handles quality assessment in its instructions:**
649
- - Checks hypothesis quality (testable, novel, clear)
650
- - Validates search results (relevant, authoritative, recent)
651
- - Assesses analysis soundness (methodology, evidence, conclusions)
652
- - Ensures report completeness (all sections, proper citations)
653
-
654
- No separate Judge Agent is needed; the manager's assessment phase covers it.
655
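As a sketch, those criteria could be embedded in the standard manager's instructions as plain text. The checklist below is illustrative, not the prompt actually shipped with the system:

```python
# Illustrative only: one way to phrase the assessment criteria above as
# manager instructions. The real prompt text may differ.
QUALITY_CHECKLIST = """
Before selecting the next agent, assess the latest output:
- Hypotheses: testable, novel, clearly stated
- Search results: relevant, authoritative, recent
- Analysis: sound methodology, evidence-backed conclusions
- Report: all sections present, citations complete
If an output fails a check, give the agent specific feedback and retry,
or select a different agent.
"""
```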
-
656
- ---
657
-
658
- **Document Version**: 2.0 (Magentic Simplified)
659
- **Last Updated**: 2025-11-24
660
- **Architecture**: Microsoft Magentic Orchestration Pattern
661
- **Agents**: 4 (Hypothesis, Search, Analysis, Report) + 1 Manager
662
- **License**: MIT

docs/getting-started/examples.md CHANGED
@@ -191,7 +191,7 @@ USE_GRAPH_EXECUTION=true
191
  ## Next Steps
192
 
193
  - Read the [Configuration Guide](../configuration/index.md) for all options
194
- - Explore the [Architecture Documentation](../architecture/graph-orchestration.md)
195
  - Check out the [API Reference](../api/agents.md) for programmatic usage
196
 
197
 
 
191
  ## Next Steps
192
 
193
  - Read the [Configuration Guide](../configuration/index.md) for all options
194
+ - Explore the [Architecture Documentation](../architecture/graph_orchestration.md)
195
  - Check out the [API Reference](../api/agents.md) for programmatic usage
196
 
197
 
docs/getting-started/mcp-integration.md CHANGED
@@ -198,7 +198,7 @@ You can configure multiple DeepCritical instances:
198
 
199
  - Learn about [Configuration](../configuration/index.md) for advanced settings
200
  - Explore [Examples](examples.md) for use cases
201
- - Read the [Architecture Documentation](../architecture/graph-orchestration.md)
202
 
203
 
204
 
 
198
 
199
  - Learn about [Configuration](../configuration/index.md) for advanced settings
200
  - Explore [Examples](examples.md) for use cases
201
+ - Read the [Architecture Documentation](../architecture/graph_orchestration.md)
202
 
203
 
204
 
docs/getting-started/quick-start.md CHANGED
@@ -138,7 +138,7 @@ What are the active clinical trials investigating Alzheimer's disease treatments
138
  - Learn about [MCP Integration](mcp-integration.md) to use The DETERMINATOR from Claude Desktop
139
  - Explore [Examples](examples.md) for more use cases
140
  - Read the [Configuration Guide](../configuration/index.md) for advanced settings
141
- - Check out the [Architecture Documentation](../architecture/graph-orchestration.md) to understand how it works
142
 
143
 
144
 
 
138
  - Learn about [MCP Integration](mcp-integration.md) to use The DETERMINATOR from Claude Desktop
139
  - Explore [Examples](examples.md) for more use cases
140
  - Read the [Configuration Guide](../configuration/index.md) for advanced settings
141
+ - Check out the [Architecture Documentation](../architecture/graph_orchestration.md) to understand how it works
142
 
143
 
144
 
docs/overview/quick-start.md CHANGED
@@ -77,6 +77,6 @@ Connect DeepCritical to Claude Desktop:
77
 
78
  - Read the [Installation Guide](../getting-started/installation.md) for detailed setup
79
  - Learn about [Configuration](../configuration/index.md)
80
- - Explore the [Architecture](../architecture/graph-orchestration.md)
81
  - Check out [Examples](../getting-started/examples.md)
82
 
 
77
 
78
  - Read the [Installation Guide](../getting-started/installation.md) for detailed setup
79
  - Learn about [Configuration](../configuration/index.md)
80
+ - Explore the [Architecture](../architecture/graph_orchestration.md)
81
  - Check out [Examples](../getting-started/examples.md)
82
 
mkdocs.yml CHANGED
@@ -91,7 +91,6 @@ nav:
91
  - configuration/CONFIGURATION.md
92
  - Architecture:
93
  - "Graph Orchestration": architecture/graph_orchestration.md
94
- - "Workflows": architecture/workflows.md
95
  - "Workflow Diagrams": architecture/workflow-diagrams.md
96
  - "Agents": architecture/agents.md
97
  - "Orchestrators": architecture/orchestrators.md
 
91
  - configuration/CONFIGURATION.md
92
  - Architecture:
93
  - "Graph Orchestration": architecture/graph_orchestration.md
 
94
  - "Workflow Diagrams": architecture/workflow-diagrams.md
95
  - "Agents": architecture/agents.md
96
  - "Orchestrators": architecture/orchestrators.md
src/agent_factory/graph_builder.py CHANGED
@@ -487,11 +487,37 @@ def create_iterative_graph(
487
  # Add nodes
488
  builder.add_agent_node("thinking", thinking_agent, "Generate observations")
489
  builder.add_agent_node("knowledge_gap", knowledge_gap_agent, "Evaluate knowledge gaps")
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
490
  builder.add_decision_node(
491
  "continue_decision",
492
- decision_function=lambda result: "writer"
493
- if getattr(result, "research_complete", False)
494
- else "tool_selector",
495
  options=["tool_selector", "writer"],
496
  description="Decide whether to continue research or write report",
497
  )
 
487
  # Add nodes
488
  builder.add_agent_node("thinking", thinking_agent, "Generate observations")
489
  builder.add_agent_node("knowledge_gap", knowledge_gap_agent, "Evaluate knowledge gaps")
490
+ def _decision_function(result: Any) -> str:
491
+ """Decision function for continue_decision node.
492
+
493
+ Args:
494
+ result: Result from knowledge_gap node (KnowledgeGapOutput or tuple)
495
+
496
+ Returns:
497
+ Next node ID: "writer" if research complete, "tool_selector" otherwise
498
+ """
499
+ # Handle case where result might be a tuple (validation error)
500
+ if isinstance(result, tuple):
501
+ # Try to extract research_complete from tuple
502
+ if len(result) == 2 and isinstance(result[0], str) and result[0] == "research_complete":
503
+ # Format: ('research_complete', False)
504
+ return "writer" if result[1] else "tool_selector"
505
+ # Try to find boolean value in tuple
506
+ for item in result:
507
+ if isinstance(item, bool):
508
+ return "writer" if item else "tool_selector"
509
+ elif isinstance(item, dict) and "research_complete" in item:
510
+ return "writer" if item["research_complete"] else "tool_selector"
511
+ # Default to continuing research if we can't determine
512
+ return "tool_selector"
513
+
514
+ # Normal case: result is KnowledgeGapOutput object
515
+ research_complete = getattr(result, "research_complete", False)
516
+ return "writer" if research_complete else "tool_selector"
517
+
518
  builder.add_decision_node(
519
  "continue_decision",
520
+ decision_function=_decision_function,
 
 
521
  options=["tool_selector", "writer"],
522
  description="Decide whether to continue research or write report",
523
  )
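The tuple-handling branches of the new `_decision_function` can be exercised standalone. A sketch with a stand-in `KnowledgeGapOutput` (the real class lives in `src/utils/models.py`):

```python
# Standalone re-statement of _decision_function for testing outside the repo;
# KnowledgeGapOutput here is a stand-in for src.utils.models.KnowledgeGapOutput.
from dataclasses import dataclass, field
from typing import Any


@dataclass
class KnowledgeGapOutput:
    research_complete: bool = False
    outstanding_gaps: list[str] = field(default_factory=list)


def decision_function(result: Any) -> str:
    if isinstance(result, tuple):
        if len(result) == 2 and result[0] == "research_complete":
            return "writer" if result[1] else "tool_selector"
        for item in result:
            if isinstance(item, bool):
                return "writer" if item else "tool_selector"
            if isinstance(item, dict) and "research_complete" in item:
                return "writer" if item["research_complete"] else "tool_selector"
        return "tool_selector"  # default: keep researching
    return "writer" if getattr(result, "research_complete", False) else "tool_selector"


# The three formats the fix is designed to survive:
assert decision_function(("research_complete", False)) == "tool_selector"
assert decision_function(({"research_complete": True},)) == "writer"
assert decision_function(KnowledgeGapOutput(research_complete=True)) == "writer"
```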
src/app.py CHANGED
@@ -284,11 +284,26 @@ def event_to_chat_message(event: AgentEvent) -> dict[str, Any]:
284
 
285
  if valid_files:
286
  # Format files for Gradio: include as markdown download links
287
- file_links = "\n\n".join([
288
- f"📎 [Download: {_get_file_name(f)}]({f})"
289
- for f in valid_files
290
- ])
291
- result["content"] = f"{content}\n\n{file_links}"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
292
 
293
  # Also store in metadata for potential future use
294
  if "metadata" not in result:
@@ -540,6 +555,8 @@ async def research_agent(
540
  hf_provider: str | None = None,
541
  graph_mode: str = "auto",
542
  use_graph: bool = True,
 
 
543
  tts_voice: str = "af_heart",
544
  tts_speed: float = 1.0,
545
  oauth_token: gr.OAuthToken | None = None,
@@ -622,15 +639,17 @@ async def research_agent(
622
  audio_input_data = message.get("audio") or None
623
 
624
  # Process multimodal input (images, audio files, audio input)
625
- # Always process if we have files or audio input, not just when enable_image_input is True
626
- if files or (audio_input_data is not None and settings.enable_audio_input):
 
627
  try:
628
  multimodal_service = get_multimodal_service()
629
  # Prepend audio/image text to original text (prepend_multimodal=True)
 
630
  processed_text = await multimodal_service.process_multimodal_input(
631
  processed_text,
632
- files=files,
633
- audio_input=audio_input_data,
634
  hf_token=token_value,
635
  prepend_multimodal=True, # Prepend audio/image text to text input
636
  )
@@ -795,6 +814,20 @@ def create_demo() -> gr.Blocks:
795
  info="Enable graph-based workflow execution",
796
  )
797
798
  # Audio/TTS Configuration Accordion
799
  with gr.Accordion("🔊 Audio Output", open=False):
800
  enable_audio_output_checkbox = gr.Checkbox(
@@ -848,6 +881,12 @@ def create_demo() -> gr.Blocks:
848
  visible=settings.modal_available,
849
  interactive=False, # GPU type set at function definition time, requires restart
850
  )
 
852
  # Hidden text components for model/provider (not dropdowns to avoid value mismatch)
853
  # These will be empty by default and use defaults in configure_orchestrator
@@ -863,12 +902,6 @@ def create_demo() -> gr.Blocks:
863
  label="⚡ Inference Provider",
864
  visible=False, # Hidden from UI
865
  )
866
-
867
- # Audio output component (for TTS response)
868
- audio_output = gr.Audio(
869
- label="🔊 Audio Response",
870
- visible=settings.enable_audio_output,
871
- )
872
 
873
  # Update TTS component visibility based on enable_audio_output_checkbox
874
  # This must be after audio_output is defined
@@ -905,7 +938,11 @@ def create_demo() -> gr.Blocks:
905
  "- ⏹️ Stops only at configured limits (budget, time, iterations)\n"
906
  "- 📊 Evidence synthesis with citations\n\n"
907
  "**MCP Server Active**: Connect Claude Desktop to `/gradio_api/mcp/`\n\n"
908
- "**🎤 Multimodal Support**: Upload images (OCR), record audio (STT), or type text.\n\n"
 
 
 
 
909
  "**⚠️ Authentication Required**: Please **sign in with HuggingFace** above before using this application."
910
  ),
911
  examples=[
@@ -949,6 +986,8 @@ def create_demo() -> gr.Blocks:
949
  hf_provider_dropdown,
950
  graph_mode_radio,
951
  use_graph_checkbox,
 
 
952
  tts_voice_dropdown,
953
  tts_speed_slider,
954
  # Note: gr.OAuthToken and gr.OAuthProfile are automatically passed as function parameters
 
284
 
285
  if valid_files:
286
  # Format files for Gradio: include as markdown download links
287
+ # Gradio ChatInterface automatically renders file links as downloadable files
288
+ import os
289
+ file_links = []
290
+ for f in valid_files:
291
+ file_name = _get_file_name(f)
292
+ try:
293
+ file_size = os.path.getsize(f)
294
+ # Format file size (bytes to KB/MB)
295
+ if file_size < 1024:
296
+ size_str = f"{file_size} B"
297
+ elif file_size < 1024 * 1024:
298
+ size_str = f"{file_size / 1024:.1f} KB"
299
+ else:
300
+ size_str = f"{file_size / (1024 * 1024):.1f} MB"
301
+ file_links.append(f"📎 [Download: {file_name} ({size_str})]({f})")
302
+ except OSError:
303
+ # If we can't get file size, just show the name
304
+ file_links.append(f"📎 [Download: {file_name}]({f})")
305
+
306
+ result["content"] = f"{content}\n\n" + "\n\n".join(file_links)
307
 
308
  # Also store in metadata for potential future use
309
  if "metadata" not in result:
 
555
  hf_provider: str | None = None,
556
  graph_mode: str = "auto",
557
  use_graph: bool = True,
558
+ enable_image_input: bool = True,
559
+ enable_audio_input: bool = True,
560
  tts_voice: str = "af_heart",
561
  tts_speed: float = 1.0,
562
  oauth_token: gr.OAuthToken | None = None,
 
639
  audio_input_data = message.get("audio") or None
640
 
641
  # Process multimodal input (images, audio files, audio input)
642
+ # Process if we have files (and image input enabled) or audio input (and audio input enabled)
643
+ # Use UI settings from function parameters
644
+ if (files and enable_image_input) or (audio_input_data is not None and enable_audio_input):
645
  try:
646
  multimodal_service = get_multimodal_service()
647
  # Prepend audio/image text to original text (prepend_multimodal=True)
648
+ # Filter files and audio based on UI settings
649
  processed_text = await multimodal_service.process_multimodal_input(
650
  processed_text,
651
+ files=files if enable_image_input else [],
652
+ audio_input=audio_input_data if enable_audio_input else None,
653
  hf_token=token_value,
654
  prepend_multimodal=True, # Prepend audio/image text to text input
655
  )
 
814
  info="Enable graph-based workflow execution",
815
  )
816
 
817
+ # Multimodal Input Configuration Accordion
818
+ with gr.Accordion("📷 Multimodal Input", open=False):
819
+ enable_image_input_checkbox = gr.Checkbox(
820
+ value=settings.enable_image_input,
821
+ label="Enable Image Input (OCR)",
822
+ info="Extract text from uploaded images using OCR",
823
+ )
824
+
825
+ enable_audio_input_checkbox = gr.Checkbox(
826
+ value=settings.enable_audio_input,
827
+ label="Enable Audio Input (STT)",
828
+ info="Transcribe audio recordings using speech-to-text",
829
+ )
830
+
831
  # Audio/TTS Configuration Accordion
832
  with gr.Accordion("🔊 Audio Output", open=False):
833
  enable_audio_output_checkbox = gr.Checkbox(
 
881
  visible=settings.modal_available,
882
  interactive=False, # GPU type set at function definition time, requires restart
883
  )
884
+
885
+ # Audio output component (for TTS response) - moved to sidebar
886
+ audio_output = gr.Audio(
887
+ label="🔊 Audio Response",
888
+ visible=settings.enable_audio_output,
889
+ )
890
 
891
  # Hidden text components for model/provider (not dropdowns to avoid value mismatch)
892
  # These will be empty by default and use defaults in configure_orchestrator
 
902
  label="⚡ Inference Provider",
903
  visible=False, # Hidden from UI
904
  )
 
906
  # Update TTS component visibility based on enable_audio_output_checkbox
907
  # This must be after audio_output is defined
 
938
  "- ⏹️ Stops only at configured limits (budget, time, iterations)\n"
939
  "- 📊 Evidence synthesis with citations\n\n"
940
  "**MCP Server Active**: Connect Claude Desktop to `/gradio_api/mcp/`\n\n"
941
+ "**📷🎤 Multimodal Input Support**:\n"
942
+ "- **Images**: Upload images to extract text using OCR\n"
943
+ "- **Audio**: Record audio or upload audio files for speech-to-text transcription\n"
944
+ "- **Text**: Type your research questions directly\n"
945
+ "Configure multimodal inputs in the sidebar settings.\n\n"
946
  "**⚠️ Authentication Required**: Please **sign in with HuggingFace** above before using this application."
947
  ),
948
  examples=[
 
986
  hf_provider_dropdown,
987
  graph_mode_radio,
988
  use_graph_checkbox,
989
+ enable_image_input_checkbox,
990
+ enable_audio_input_checkbox,
991
  tts_voice_dropdown,
992
  tts_speed_slider,
993
  # Note: gr.OAuthToken and gr.OAuthProfile are automatically passed as function parameters
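The size-formatting branch added to `event_to_chat_message` is self-contained enough to factor into a helper, which also makes the thresholds testable. A sketch; the app currently inlines this logic:

```python
# Sketch: the same bytes -> human-readable conversion as the inlined code.
def format_file_size(size: int) -> str:
    if size < 1024:
        return f"{size} B"
    if size < 1024 * 1024:
        return f"{size / 1024:.1f} KB"
    return f"{size / (1024 * 1024):.1f} MB"


assert format_file_size(512) == "512 B"
assert format_file_size(2048) == "2.0 KB"
assert format_file_size(3 * 1024 * 1024) == "3.0 MB"
```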
src/orchestrator/graph_orchestrator.py CHANGED
@@ -823,18 +823,34 @@ class GraphOrchestrator:
823
  from src.utils.models import KnowledgeGapOutput
824
 
825
  if node.node_id == "knowledge_gap":
 
826
  output = KnowledgeGapOutput(
827
  research_complete=output[1] if len(output) > 1 else False,
828
  outstanding_gaps=[],
829
  )
 
 
 
 
 
830
  else:
831
- # For other nodes, log error and use fallback
832
- self.logger.error(
833
- "Cannot reconstruct output from tuple",
834
  node_id=node.node_id,
835
  tuple_value=output,
836
  )
837
- raise ValueError(f"Cannot extract output from tuple: {output}")
838
 
839
  if node.output_transformer:
840
  output = node.output_transformer(output)
@@ -1010,7 +1026,7 @@ class GraphOrchestrator:
1010
  else:
1011
  prev_result = context.get_node_result(context.current_node)
1012
 
1013
- # Handle case where result might be a tuple (from pydantic-graph)
1014
  # Extract the actual result object if it's a tuple
1015
  if isinstance(prev_result, tuple) and len(prev_result) > 0:
1016
  # Check if first element is a KnowledgeGapOutput-like object
@@ -1018,14 +1034,46 @@ class GraphOrchestrator:
1018
  prev_result = prev_result[0]
1019
  elif len(prev_result) > 1 and hasattr(prev_result[1], "research_complete"):
1020
  prev_result = prev_result[1]
1021
  else:
1022
- # If tuple doesn't contain the object, log warning and use first element
1023
  self.logger.warning(
1024
- "Decision node received tuple result, extracting first element",
1025
  node_id=node.node_id,
1026
  tuple_length=len(prev_result),
 
1027
  )
1028
- prev_result = prev_result[0]
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1029
 
1030
  # Make decision
1031
  try:
 
823
  from src.utils.models import KnowledgeGapOutput
824
 
825
  if node.node_id == "knowledge_gap":
826
+ # Reconstruct KnowledgeGapOutput from validation error tuple
827
  output = KnowledgeGapOutput(
828
  research_complete=output[1] if len(output) > 1 else False,
829
  outstanding_gaps=[],
830
  )
831
+ self.logger.info(
832
+ "Reconstructed KnowledgeGapOutput from validation error tuple",
833
+ node_id=node.node_id,
834
+ research_complete=output.research_complete,
835
+ )
836
  else:
837
+ # For other nodes, try to extract meaningful output or use fallback
838
+ self.logger.warning(
839
+ "Agent node output is tuple format, attempting extraction",
840
  node_id=node.node_id,
841
  tuple_value=output,
842
  )
843
+ # Try to extract first meaningful element
844
+ if len(output) > 0:
845
+ # If first element is a string or dict, might be the actual output
846
+ if isinstance(output[0], (str, dict)):
847
+ output = output[0]
848
+ else:
849
+ # Last resort: use first element
850
+ output = output[0]
851
+ else:
852
+ # Empty tuple - use None and let downstream handle it
853
+ output = None
854
 
855
  if node.output_transformer:
856
  output = node.output_transformer(output)
 
1026
  else:
1027
  prev_result = context.get_node_result(context.current_node)
1028
 
1029
+ # Handle case where result might be a tuple (from pydantic-ai validation errors)
1030
  # Extract the actual result object if it's a tuple
1031
  if isinstance(prev_result, tuple) and len(prev_result) > 0:
1032
  # Check if first element is a KnowledgeGapOutput-like object
 
1034
  prev_result = prev_result[0]
1035
  elif len(prev_result) > 1 and hasattr(prev_result[1], "research_complete"):
1036
  prev_result = prev_result[1]
1037
+ elif len(prev_result) == 2 and isinstance(prev_result[0], str) and prev_result[0] == "research_complete":
1038
+ # Handle validation error format: ('research_complete', False)
1039
+ # Reconstruct KnowledgeGapOutput from tuple
1040
+ from src.utils.models import KnowledgeGapOutput
1041
+ self.logger.warning(
1042
+ "Decision node received validation error tuple, reconstructing KnowledgeGapOutput",
1043
+ node_id=node.node_id,
1044
+ tuple_value=prev_result,
1045
+ )
1046
+ prev_result = KnowledgeGapOutput(
1047
+ research_complete=prev_result[1] if len(prev_result) > 1 else False,
1048
+ outstanding_gaps=[],
1049
+ )
1050
  else:
1051
+ # If tuple doesn't contain the object, try to reconstruct or use fallback
1052
  self.logger.warning(
1053
+ "Decision node received unexpected tuple format, attempting reconstruction",
1054
  node_id=node.node_id,
1055
  tuple_length=len(prev_result),
1056
+ tuple_types=[type(x).__name__ for x in prev_result],
1057
  )
1058
+ # Try to reconstruct KnowledgeGapOutput if this is from knowledge_gap node
1059
+ if prev_node_id == "knowledge_gap":
1060
+ from src.utils.models import KnowledgeGapOutput
1061
+ # Try to extract research_complete from tuple
1062
+ research_complete = False
1063
+ for item in prev_result:
1064
+ if isinstance(item, bool):
1065
+ research_complete = item
1066
+ break
1067
+ elif isinstance(item, dict) and "research_complete" in item:
1068
+ research_complete = item["research_complete"]
1069
+ break
1070
+ prev_result = KnowledgeGapOutput(
1071
+ research_complete=research_complete,
1072
+ outstanding_gaps=[],
1073
+ )
1074
+ else:
1075
+ # For other nodes, use first element as fallback
1076
+ prev_result = prev_result[0]
1077
 
1078
  # Make decision
1079
  try:
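One plausible source of the `('research_complete', False)` tuple, offered as an assumption rather than a confirmed root cause: iterating a pydantic `BaseModel` yields `(field_name, value)` pairs, so unpacking or list-ifying a `KnowledgeGapOutput` anywhere upstream would produce exactly this shape:

```python
# Demonstration (assumption, not a confirmed root cause): iterating a
# pydantic BaseModel yields (field_name, value) tuples, matching the
# ('research_complete', False) shape seen in the error.
from pydantic import BaseModel


class KnowledgeGapOutput(BaseModel):  # stand-in mirroring src/utils/models.py
    research_complete: bool = False
    outstanding_gaps: list[str] = []


first, *_rest = KnowledgeGapOutput()  # unpacking iterates the model
print(first)  # ('research_complete', False)
```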
src/services/image_ocr.py CHANGED
@@ -30,7 +30,9 @@ class ImageOCRService:
30
  Raises:
31
  ConfigurationError: If API URL not configured
32
  """
33
- self.api_url = api_url or settings.ocr_api_url
 
 
34
  if not self.api_url:
35
  raise ConfigurationError("OCR API URL not configured")
36
  self.hf_token = hf_token
 
30
  Raises:
31
  ConfigurationError: If API URL not configured
32
  """
33
+ # Defensively access ocr_api_url - may not exist in older config versions
34
+ default_url = getattr(settings, "ocr_api_url", None) or "https://prithivmlmods-multimodal-ocr3.hf.space"
35
+ self.api_url = api_url or default_url
36
  if not self.api_url:
37
  raise ConfigurationError("OCR API URL not configured")
38
  self.hf_token = hf_token
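The `getattr(..., None) or default` idiom used in the fix covers two failure modes at once: the attribute missing entirely (an older `Settings` class) and the attribute present but falsy (`None` or an empty string). In isolation:

```python
# The defensive-access idiom in isolation. OldSettings and NewSettings are
# stand-ins for config classes before and after the ocr_api_url field.
DEFAULT_OCR_URL = "https://prithivmlmods-multimodal-ocr3.hf.space"


class OldSettings:
    pass  # no ocr_api_url attribute at all


class NewSettings:
    ocr_api_url = None  # field exists but is unset


for settings in (OldSettings(), NewSettings()):
    api_url = getattr(settings, "ocr_api_url", None) or DEFAULT_OCR_URL
    assert api_url == DEFAULT_OCR_URL
```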
src/services/multimodal_processing.py CHANGED
@@ -63,7 +63,7 @@ class MultimodalService:
63
  logger.warning("audio_processing_failed", error=str(e))
64
 
65
  # Process uploaded files (images and audio files)
66
- if files:
67
  for file_data in files:
68
  file_path = file_data.path if isinstance(file_data, FileData) else str(file_data)
69
 
 
63
  logger.warning("audio_processing_failed", error=str(e))
64
 
65
  # Process uploaded files (images and audio files)
66
+ if files and settings.enable_image_input:
67
  for file_data in files:
68
  file_path = file_data.path if isinstance(file_data, FileData) else str(file_data)
69
 
src/utils/config.py CHANGED
@@ -149,6 +149,10 @@ class Settings(BaseSettings):
149
  default=True,
150
  description="Enable audio output (text-to-speech) for responses",
151
  )
 
 
 
 
152
  tts_voice: str = Field(
153
  default="af_heart",
154
  description="TTS voice ID for Kokoro TTS (e.g., af_heart, am_michael)",
@@ -178,6 +182,12 @@ class Settings(BaseSettings):
178
  description="Target language for STT (full name like 'English', 'Spanish', etc.)",
179
  )
180
181
  # Report File Output Configuration
182
  save_reports_to_file: bool = Field(
183
  default=True,
 
149
  default=True,
150
  description="Enable audio output (text-to-speech) for responses",
151
  )
152
+ enable_image_input: bool = Field(
153
+ default=True,
154
+ description="Enable image input (OCR) in multimodal interface",
155
+ )
156
  tts_voice: str = Field(
157
  default="af_heart",
158
  description="TTS voice ID for Kokoro TTS (e.g., af_heart, am_michael)",
 
182
  description="Target language for STT (full name like 'English', 'Spanish', etc.)",
183
  )
184
 
185
+ # Image OCR Configuration
186
+ ocr_api_url: str | None = Field(
187
+ default="https://prithivmlmods-multimodal-ocr3.hf.space",
188
+ description="Gradio Space URL for OCR service (default: prithivMLmods/Multimodal-OCR3)",
189
+ )
190
+
191
  # Report File Output Configuration
192
  save_reports_to_file: bool = Field(
193
  default=True,