bhaveshgoel07 committed on
Commit 0805c5b · 0 Parent(s)

Complete NeuroAnim HF Spaces deployment - all source files

.gitignore ADDED
@@ -0,0 +1,43 @@
+ # Python-generated files
+ __pycache__/
+ *.py[oc]
+ build/
+ dist/
+ wheels/
+ *.egg-info
+
+ # Virtual environments
+ .venv
+ venv/
+ env/
+
+ # Environment variables (IMPORTANT: never commit API keys!)
+ .env
+ .env.*
+
+ # Backup files
+ *.bak
+
+ # Sandbox deployment artifacts
+ .docker/
+
+ # Output files
+ outputs/
+ animations/
+ test_output/
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Logs
+ *.log
README.md ADDED
@@ -0,0 +1,101 @@
+ ---
+ title: NeuroAnim - STEM Animation Generator
+ emoji: 🧠
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ sdk_version: 6.0.1
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+
+ # 🧠 NeuroAnim - AI-Powered Educational Animation Generator
+
+ NeuroAnim is an AI-powered system that automatically generates educational STEM animations with narration and quiz questions. Simply enter a topic, and watch as AI creates a complete animated video!
+
+ ## 🎯 Features
+
+ - **🎨 Automatic Animation Generation**: Creates professional Manim animations from topic descriptions
+ - **🗣️ AI Narration**: Generates educational narration scripts tailored to your audience
+ - **🔊 Text-to-Speech**: Converts narration to high-quality audio
+ - **📹 Video Production**: Renders and merges video with synchronized audio
+ - **❓ Quiz Generation**: Creates assessment questions to test understanding
+ - **🎓 Multi-Level Support**: Content appropriate for elementary through PhD levels
+
+ ## 🚀 How to Use
+
+ 1. **Enter a Topic**: Type any STEM concept (e.g., "Pythagorean Theorem", "Photosynthesis", "Newton's Laws")
+ 2. **Select Audience**: Choose the appropriate education level
+ 3. **Set Duration**: Pick the animation length (0.5-10 minutes)
+ 4. **Choose Quality**: Select the video quality (higher = slower but better)
+ 5. **Generate**: Click the button and wait for your animation!
+
+ ## 💡 Example Topics
+
+ - **Mathematics**: Pythagorean Theorem, Quadratic Formula, Circle Area Derivation
+ - **Physics**: Newton's Laws of Motion, Wave Properties
+ - **Biology**: Photosynthesis, Cell Division, DNA Structure
+ - **Computer Science**: Binary Numbers, Sorting Algorithms, Data Structures
+
+ ## 🔧 Technology Stack
+
+ - **Manim Community Edition**: Mathematical animation engine
+ - **Hugging Face Models**: AI-powered content generation
+ - **ElevenLabs**: High-quality text-to-speech synthesis
+ - **Blaxel**: Cloud-based secure rendering
+ - **Gradio**: Interactive web interface
+
+ ## 🔑 Setup Requirements
+
+ To run this Space, you need:
+
+ 1. **Hugging Face API Key**: For AI content generation (required)
+ 2. **ElevenLabs API Key**: For high-quality TTS (optional; falls back to HF TTS)
+ 3. **Blaxel API Key**: For cloud rendering (optional; local rendering is used otherwise)
+
+ Set these as **Secrets** in your Hugging Face Space settings (a sketch of how the app reads them follows the list):
+ - `HUGGINGFACE_API_KEY`
+ - `ELEVENLABS_API_KEY` (optional)
+ - `BLAXEL_API_KEY` (optional)
+ - `MANIM_SANDBOX_IMAGE` (optional, for Blaxel cloud rendering)
+
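+ For reference, a minimal sketch of how the app can read these secrets at startup (Spaces expose secrets as environment variables; the fallbacks in the comments mirror the notes above):
+
+ ```python
+ import os
+
+ # Required: AI content generation fails without this key.
+ hf_key = os.getenv("HUGGINGFACE_API_KEY")
+ if not hf_key:
+     raise RuntimeError("HUGGINGFACE_API_KEY is not set")
+
+ # Optional keys; the app falls back when they are absent.
+ elevenlabs_key = os.getenv("ELEVENLABS_API_KEY")  # fallback: HF TTS
+ blaxel_key = os.getenv("BLAXEL_API_KEY")          # fallback: local rendering
+ sandbox_image = os.getenv("MANIM_SANDBOX_IMAGE")  # only used with Blaxel
+ ```
+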
+ ## 📝 Tips for Best Results
+
+ - **Be Specific**: Instead of "math", try "solving linear equations" or "area of a circle"
+ - **Choose the Right Audience**: Match the complexity level to your target viewers
+ - **Optimal Duration**: 1.5-3 minutes works best for most concepts
+ - **Review Generated Content**: Check the narration and code tabs to see what was created
+
+ ## 🎬 How It Works
+
+ The pipeline runs the following steps in order (a code sketch follows the list):
+
+ 1. **Concept Planning**: AI analyzes your topic and creates an educational plan
+ 2. **Script Writing**: Generates age-appropriate narration aligned with learning objectives
+ 3. **Code Generation**: Creates Manim Python code for the visual representation
+ 4. **Rendering**: Executes Manim to produce the base animation
+ 5. **Audio Synthesis**: Converts narration to speech using TTS
+ 6. **Final Production**: Merges video and audio into the complete animation
+ 7. **Assessment**: Generates quiz questions for the content
+
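+ A minimal sketch of that pipeline, assuming one async call per step (the step functions are illustrative names, not the actual orchestrator API):
+
+ ```python
+ # Hypothetical step functions; the real orchestrator API may differ.
+ async def generate(topic: str, audience: str, minutes: float) -> dict:
+     plan = await plan_concept(topic, audience, minutes)    # 1. plan
+     script = await generate_narration(plan, audience)      # 2. script
+     code = await generate_manim_code(plan)                 # 3. Manim code
+     video = await render_manim_animation(code)             # 4. render
+     audio = await generate_speech(script)                  # 5. TTS
+     final = await merge_video_audio(video, audio)          # 6. merge
+     quiz = await generate_quiz(topic)                      # 7. assessment
+     return {"video": final, "quiz": quiz}
+ ```
+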
+ ## 📚 Use Cases
+
+ - **Teachers**: Create engaging lesson materials
+ - **Students**: Visualize complex concepts for better understanding
+ - **Content Creators**: Produce educational YouTube/social media content
+ - **Tutors**: Generate custom explanations for specific topics
+ - **Course Developers**: Build comprehensive educational video libraries
+
+ ## 🤝 Contributing
+
+ NeuroAnim is open source! Visit the [GitHub repository](https://github.com/yourusername/manim-agent) to:
+ - Report bugs or suggest features
+ - Submit pull requests with improvements
+ - Share your generated animations
+
+ ## 📄 License
+
+ MIT License - free to use for educational and commercial purposes
+
+ ---
+
+ Made with ❤️ for educational content creation
README_HF.md ADDED
@@ -0,0 +1,101 @@
+ ---
+ title: NeuroAnim - STEM Animation Generator
+ emoji: 🧠
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ sdk_version: 6.0.1
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+
+ # 🧠 NeuroAnim - AI-Powered Educational Animation Generator
+
+ NeuroAnim is an AI-powered system that automatically generates educational STEM animations with narration and quiz questions. Simply enter a topic, and watch as AI creates a complete animated video!
+
+ ## 🎯 Features
+
+ - **🎨 Automatic Animation Generation**: Creates professional Manim animations from topic descriptions
+ - **🗣️ AI Narration**: Generates educational narration scripts tailored to your audience
+ - **🔊 Text-to-Speech**: Converts narration to high-quality audio
+ - **📹 Video Production**: Renders and merges video with synchronized audio
+ - **❓ Quiz Generation**: Creates assessment questions to test understanding
+ - **🎓 Multi-Level Support**: Content appropriate for elementary through PhD levels
+
+ ## 🚀 How to Use
+
+ 1. **Enter a Topic**: Type any STEM concept (e.g., "Pythagorean Theorem", "Photosynthesis", "Newton's Laws")
+ 2. **Select Audience**: Choose the appropriate education level
+ 3. **Set Duration**: Pick the animation length (0.5-10 minutes)
+ 4. **Choose Quality**: Select the video quality (higher = slower but better)
+ 5. **Generate**: Click the button and wait for your animation!
+
+ ## 💡 Example Topics
+
+ - **Mathematics**: Pythagorean Theorem, Quadratic Formula, Circle Area Derivation
+ - **Physics**: Newton's Laws of Motion, Wave Properties
+ - **Biology**: Photosynthesis, Cell Division, DNA Structure
+ - **Computer Science**: Binary Numbers, Sorting Algorithms, Data Structures
+
+ ## 🔧 Technology Stack
+
+ - **Manim Community Edition**: Mathematical animation engine
+ - **Hugging Face Models**: AI-powered content generation
+ - **ElevenLabs**: High-quality text-to-speech synthesis
+ - **Blaxel**: Cloud-based secure rendering
+ - **Gradio**: Interactive web interface
+
+ ## 🔑 Setup Requirements
+
+ To run this Space, you need:
+
+ 1. **Hugging Face API Key**: For AI content generation (required)
+ 2. **ElevenLabs API Key**: For high-quality TTS (optional; falls back to HF TTS)
+ 3. **Blaxel API Key**: For cloud rendering (optional; local rendering is used otherwise)
+
+ Set these as **Secrets** in your Hugging Face Space settings:
+ - `HUGGINGFACE_API_KEY`
+ - `ELEVENLABS_API_KEY` (optional)
+ - `BLAXEL_API_KEY` (optional)
+ - `MANIM_SANDBOX_IMAGE` (optional, for Blaxel cloud rendering)
+
+ ## 📝 Tips for Best Results
+
+ - **Be Specific**: Instead of "math", try "solving linear equations" or "area of a circle"
+ - **Choose the Right Audience**: Match the complexity level to your target viewers
+ - **Optimal Duration**: 1.5-3 minutes works best for most concepts
+ - **Review Generated Content**: Check the narration and code tabs to see what was created
+
+ ## 🎬 How It Works
+
+ 1. **Concept Planning**: AI analyzes your topic and creates an educational plan
+ 2. **Script Writing**: Generates age-appropriate narration aligned with learning objectives
+ 3. **Code Generation**: Creates Manim Python code for the visual representation
+ 4. **Rendering**: Executes Manim to produce the base animation
+ 5. **Audio Synthesis**: Converts narration to speech using TTS
+ 6. **Final Production**: Merges video and audio into the complete animation
+ 7. **Assessment**: Generates quiz questions for the content
+
+ ## 📚 Use Cases
+
+ - **Teachers**: Create engaging lesson materials
+ - **Students**: Visualize complex concepts for better understanding
+ - **Content Creators**: Produce educational YouTube/social media content
+ - **Tutors**: Generate custom explanations for specific topics
+ - **Course Developers**: Build comprehensive educational video libraries
+
+ ## 🤝 Contributing
+
+ NeuroAnim is open source! Visit the [GitHub repository](https://github.com/yourusername/manim-agent) to:
+ - Report bugs or suggest features
+ - Submit pull requests with improvements
+ - Share your generated animations
+
+ ## 📄 License
+
+ MIT License - free to use for educational and commercial purposes
+
+ ---
+
+ Made with ❤️ for educational content creation
app.py ADDED
@@ -0,0 +1,661 @@
+ #!/usr/bin/env python3
+ """
+ NeuroAnim Gradio Web Interface
+
+ A comprehensive web UI for generating educational STEM animations with:
+ - Topic input and configuration
+ - Real-time progress tracking
+ - Video preview and download
+ - Generated content display (narration, code, quiz)
+ - Error handling and logging
+ """
+
+ import asyncio
+ import logging
+ import os
+ import time
+ from datetime import datetime
+ from pathlib import Path
+ from typing import Any, Dict, Optional, Tuple
+
+ import gradio as gr
+ from dotenv import load_dotenv
+
+ from orchestrator import NeuroAnimOrchestrator
+
+ load_dotenv()
+
+ # Set up logging
+ logging.basicConfig(
+     level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
+ )
+ logger = logging.getLogger(__name__)
+
+
+ def format_quiz_markdown(quiz_text: str) -> str:
+     """Format quiz text into a readable markdown display."""
+     if not quiz_text or quiz_text == "Not available":
+         return "❓ No quiz generated yet."
+
+     # Default: return the raw text under a heading if no structure is detected
+     formatted = f"## 📝 Assessment Questions\n\n{quiz_text}"
+
+     # Try to add some structure if it's plain text
+     lines = quiz_text.split("\n")
+     formatted_lines = []
+     question_num = 0
+
+     for line in lines:
+         line = line.strip()
+         if not line:
+             formatted_lines.append("")
+             continue
+
+         # Detect question patterns
+         if line.lower().startswith(("q:", "question", "q.", f"{question_num + 1}.")):
+             question_num += 1
+             formatted_lines.append(f"\n### Question {question_num}")
+             # Remove the question prefix
+             clean_line = line.split(":", 1)[-1].strip() if ":" in line else line
+             formatted_lines.append(f"**{clean_line}**\n")
+         elif line.lower().startswith(("a)", "b)", "c)", "d)", "a.", "b.", "c.", "d.")):
+             # Format multiple-choice options
+             formatted_lines.append(f"- {line}")
+         elif line.lower().startswith(("answer:", "a:", "correct:")):
+             # Format answers
+             formatted_lines.append(f"\n> ✅ {line}\n")
+         else:
+             formatted_lines.append(line)
+
+     # If we detected structure, use the formatted version
+     if question_num > 0:
+         return "## 📝 Assessment Questions\n\n" + "\n".join(formatted_lines)
+
+     # Otherwise return with basic formatting
+     return formatted
+
+
+ class NeuroAnimApp:
+     """Main application class for the Gradio interface."""
+
+     def __init__(self):
+         self.orchestrator: Optional[NeuroAnimOrchestrator] = None
+         self.current_task: Optional[asyncio.Task] = None
+         self.is_generating = False
+         self.event_loop: Optional[asyncio.AbstractEventLoop] = None
+         self.current_progress = None  # Store progress callback for dynamic updates
+
+     async def initialize_orchestrator(self):
+         """Initialize the orchestrator if not already done."""
+         if self.orchestrator is None:
+             self.orchestrator = NeuroAnimOrchestrator()
+             await self.orchestrator.initialize()
+             logger.info("Orchestrator initialized successfully")
+
+     async def cleanup_orchestrator(self):
+         """Clean up orchestrator resources."""
+         if self.orchestrator is not None:
+             await self.orchestrator.cleanup()
+             self.orchestrator = None
+             logger.info("Orchestrator cleaned up")
+
+     def cleanup_event_loop(self):
+         """Clean up the event loop on application shutdown."""
+         if self.event_loop is not None and not self.event_loop.is_closed():
+             self.event_loop.close()
+             self.event_loop = None
+             logger.info("Event loop closed")
+
+     async def generate_animation_async(
+         self, topic: str, audience: str, duration: float, quality: str, progress=gr.Progress()
+     ) -> Dict[str, Any]:
+         """
+         Generate an animation with progress tracking.
+
+         Args:
+             topic: STEM topic to animate
+             audience: Target audience level
+             duration: Animation duration in minutes
+             quality: Video quality (low, medium, high, production_quality)
+             progress: Gradio progress tracker
+
+         Returns:
+             Results dictionary with generated content
+         """
+         try:
+             self.is_generating = True
+
+             # Validate inputs
+             if not topic or len(topic.strip()) < 3:
+                 return {
+                     "success": False,
+                     "error": "Please provide a valid topic (at least 3 characters)",
+                 }
+
+             if duration < 0.5 or duration > 10:
+                 return {
+                     "success": False,
+                     "error": "Duration must be between 0.5 and 10 minutes",
+                 }
+
+             # Initialize the orchestrator
+             progress(0.05, desc="Initializing system...")
+             await self.initialize_orchestrator()
+
+             # Generate a unique filename
+             timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+             safe_topic = "".join(c if c.isalnum() else "_" for c in topic)[:30]
+             output_filename = f"{safe_topic}_{timestamp}.mp4"
+
+             # Map quality from the UI to the orchestrator format
+             quality_map = {
+                 "Low (480p, faster)": "low",
+                 "Medium (720p, balanced)": "medium",
+                 "High (1080p, slower)": "high",
+                 "Production (4K, slowest)": "production_quality",
+             }
+             quality_param = quality_map.get(quality, "medium")
+
+             # Map audience from the UI to the orchestrator format
+             audience_map = {
+                 "elementary": "elementary",
+                 "middle_school": "middle_school",
+                 "high_school": "high_school",
+                 "undergraduate": "college",  # Map to 'college' for LLM compatibility
+                 "phd": "graduate",  # Map to 'graduate' for LLM compatibility
+                 "general": "general",
+             }
+             audience_param = audience_map.get(audience, audience)
+
+             # Dynamic progress tracking with step-based updates
+             step_times = {}  # Track step start times
+             step_index = [0]  # Current step index
+
+             steps = [
+                 (0.1, "Planning concept"),
+                 (0.25, "Generating narration script"),
+                 (0.40, "Creating Manim animation code"),
+                 (0.55, "Rendering animation video"),
+                 (0.75, "Generating audio narration"),
+                 (0.90, "Merging video and audio"),
+                 (0.95, "Creating quiz questions"),
+             ]
+
+             def progress_callback(step_name: str, step_progress: float):
+                 """Callback for the orchestrator to report progress."""
+                 # Find the matching step
+                 for idx, (prog, desc) in enumerate(steps):
+                     if desc.lower() in step_name.lower():
+                         step_index[0] = idx
+
+                         # Track timing
+                         current_time = time.time()
+                         if step_name not in step_times:
+                             step_times[step_name] = current_time
+                         elapsed = current_time - step_times[step_name]
+
+                         # Add timing info for long steps
+                         if elapsed > 30:  # Show a message if a step takes more than 30s
+                             desc_with_time = f"{desc} (taking longer than usual, please wait...)"
+                         else:
+                             desc_with_time = f"{desc}..."
+
+                         progress(prog, desc=desc_with_time)
+                         return
+
+                 # If no step matches, use the provided progress directly
+                 progress(step_progress, desc=f"{step_name}...")
+
+             # Start generation with dynamic progress
+             result = await self.orchestrator.generate_animation(
+                 topic=topic,
+                 target_audience=audience_param,
+                 animation_length_minutes=duration,
+                 output_filename=output_filename,
+                 quality=quality_param,
+                 progress_callback=progress_callback,
+             )
+
+             progress(1.0, desc="Complete!")
+             logger.info("Async generation completed, returning result")
+
+             return result
+
+         except Exception as e:
+             logger.error(f"Generation failed: {e}", exc_info=True)
+             return {"success": False, "error": str(e)}
+         finally:
+             self.is_generating = False
+
+     def generate_animation_sync(
+         self, topic: str, audience: str, duration: float, quality: str, progress=gr.Progress()
+     ) -> Tuple[Optional[str], Optional[str], str, str, str, str, str]:
+         """
+         Synchronous wrapper for the Gradio interface.
+
+         Returns:
+             Tuple of (video_path, download_path, status, narration, code, quiz, concept_plan)
+         """
+         try:
+             # Reuse the existing event loop or create a persistent one
+             if self.event_loop is None or self.event_loop.is_closed():
+                 self.event_loop = asyncio.new_event_loop()
+                 asyncio.set_event_loop(self.event_loop)
+                 logger.info("Created new persistent event loop")
+             else:
+                 asyncio.set_event_loop(self.event_loop)
+                 logger.info("Reusing existing event loop")
+
+             logger.info("Starting event loop execution...")
+             result = self.event_loop.run_until_complete(
+                 self.generate_animation_async(topic, audience, duration, quality, progress)
+             )
+             logger.info("Event loop execution completed")
+             # Do NOT close the loop - keep it for subsequent generations
+
+             if result["success"]:
+                 logger.info("Processing successful result...")
+                 video_path = result["output_file"]
+                 status = f"✅ **Animation Generated Successfully!**\n\n**Topic:** {result['topic']}\n**Audience:** {result['target_audience']}\n**Output:** {os.path.basename(video_path)}"
+                 narration = result.get("narration", "Not available")
+                 code = result.get("manim_code", "Not available")
+                 quiz_raw = result.get("quiz", "Not available")
+                 quiz = format_quiz_markdown(quiz_raw)
+                 concept = result.get("concept_plan", "Not available")
+
+                 logger.info(f"Returning result to Gradio: {video_path}")
+                 return video_path, video_path, status, narration, code, quiz, concept
+             else:
+                 error_msg = result.get("error", "Unknown error")
+                 status = f"❌ **Generation Failed**\n\n{error_msg}"
+                 return None, None, status, "", "", "", ""
+
+         except Exception as e:
+             logger.error(f"Sync wrapper error: {e}", exc_info=True)
+             status = f"💥 **Unexpected Error**\n\n{str(e)}"
+             return None, None, status, "", "", "", ""
+
+
+ def create_interface() -> gr.Blocks:
+     """Create the Gradio interface."""
+
+     app = NeuroAnimApp()
+
+     # Custom CSS for better styling
+     custom_css = """
+     .main-title {
+         text-align: center;
+         color: #2563eb;
+         font-size: 2.5em;
+         font-weight: bold;
+         margin-bottom: 0.5em;
+     }
+     .subtitle {
+         text-align: center;
+         color: #64748b;
+         font-size: 1.2em;
+         margin-bottom: 2em;
+     }
+     .status-box {
+         padding: 1em;
+         border-radius: 8px;
+         margin: 1em 0;
+     }
+     .gradio-container {
+         max-width: 1400px !important;
+     }
+     /* Video player styling */
+     video {
+         border-radius: 8px;
+         box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
+     }
+     /* Quiz and content styling */
+     .markdown-text h2 {
+         color: #1e40af;
+         border-bottom: 2px solid #3b82f6;
+         padding-bottom: 0.5em;
+         margin-top: 1em;
+     }
+     .markdown-text h3 {
+         color: #1e293b;
+         margin-top: 1em;
+     }
+     .markdown-text blockquote {
+         background-color: #f0fdf4;
+         border-left: 4px solid #22c55e;
+         padding: 0.5em 1em;
+         margin: 1em 0;
+     }
+     /* Button styling */
+     .primary {
+         background: linear-gradient(135deg, #2563eb 0%, #1d4ed8 100%);
+     }
+     /* Code block styling */
+     .code-container {
+         border-radius: 8px;
+         margin: 1em 0;
+     }
+     """
+
+     # Pass the CSS to the constructor so it is actually applied
+     with gr.Blocks(title="NeuroAnim - STEM Animation Generator", css=custom_css) as interface:
+         # Header
+         gr.HTML("""
+         <div class="main-title">🧠 NeuroAnim</div>
+         <div class="subtitle">AI-Powered Educational Animation Generator</div>
+         """)
+
+         with gr.Tabs() as tabs:
+             # Main Generation Tab
+             with gr.TabItem("🎬 Generate Animation", id=0):
+                 gr.Markdown("""
+                 ### Create Your Educational Animation
+                 Enter a mathematical or scientific concept, and NeuroAnim will generate a complete animated video with narration and quiz questions.
+                 """)
+
+                 with gr.Row():
+                     with gr.Column(scale=1):
+                         # Input Section
+                         gr.Markdown("#### 📝 Animation Configuration")
+
+                         topic_input = gr.Textbox(
+                             label="Topic / Concept",
+                             placeholder="e.g., Pythagorean Theorem, Photosynthesis, Newton's Laws, etc.",
+                             lines=2,
+                             info="Enter the STEM concept you want to explain",
+                         )
+
+                         with gr.Row():
+                             audience_input = gr.Dropdown(
+                                 label="Target Audience",
+                                 choices=[
+                                     "elementary",
+                                     "middle_school",
+                                     "high_school",
+                                     "undergraduate",
+                                     "phd",
+                                     "general",
+                                 ],
+                                 value="high_school",
+                                 info="Select the appropriate education level",
+                             )
+
+                         duration_input = gr.Slider(
+                             label="Duration (minutes)",
+                             minimum=0.5,
+                             maximum=10,
+                             value=2.0,
+                             step=0.5,
+                             info="Animation length",
+                         )
+
+                         quality_input = gr.Dropdown(
+                             label="Video Quality",
+                             choices=[
+                                 "Low (480p, faster)",
+                                 "Medium (720p, balanced)",
+                                 "High (1080p, slower)",
+                                 "Production (4K, slowest)",
+                             ],
+                             value="Medium (720p, balanced)",
+                             info="Higher quality takes longer to render",
+                         )
+
+                         generate_btn = gr.Button(
+                             "🚀 Generate Animation", variant="primary", size="lg"
+                         )
+
+                         status_output = gr.Markdown(
+                             label="Status",
+                             value="Ready to generate...",
+                             elem_classes=["status-box"],
+                         )
+
+                         # Example inputs
+                         gr.Markdown("#### 💡 Example Topics")
+                         gr.Examples(
+                             examples=[
+                                 ["Pythagorean Theorem", "high_school", 2.0, "Medium (720p, balanced)"],
+                                 ["Laws of Motion", "middle_school", 2.5, "Low (480p, faster)"],
+                                 ["Binary Numbers", "middle_school", 1.5, "Medium (720p, balanced)"],
+                                 ["Photosynthesis Process", "elementary", 2.0, "Low (480p, faster)"],
+                                 ["Quadratic Formula", "high_school", 3.0, "Medium (720p, balanced)"],
+                                 ["Circle Area Derivation", "undergraduate", 2.5, "High (1080p, slower)"],
+                             ],
+                             inputs=[topic_input, audience_input, duration_input, quality_input],
+                         )
+
+                     with gr.Column(scale=1):
+                         # Output Section
+                         gr.Markdown("#### 🎥 Generated Animation")
+
+                         video_output = gr.Video(
+                             label="Animation Video", height=400, interactive=False
+                         )
+
+                         # Download button
+                         download_file = gr.File(
+                             label="📥 Download Animation",
+                             interactive=False,
+                             visible=True,
+                         )
+
+                         gr.Markdown(
+                             "**Tip:** Click the download button above or use the ⋮ menu on the video player"
+                         )
+
+                         # Additional outputs in an accordion
+                         with gr.Accordion("📄 View Generated Content", open=True):
+                             with gr.Tabs():
+                                 with gr.TabItem("📖 Narration Script"):
+                                     narration_output = gr.Textbox(
+                                         label="Narration Text",
+                                         lines=8,
+                                         interactive=False,
+                                     )
+
+                                 with gr.TabItem("💻 Manim Code"):
+                                     code_output = gr.Code(
+                                         label="Generated Python Code",
+                                         language="python",
+                                         interactive=False,
+                                         lines=15,
+                                     )
+
+                                 with gr.TabItem("❓ Quiz Questions"):
+                                     quiz_output = gr.Markdown(
+                                         label="Assessment Questions",
+                                         value="Quiz will appear here after generation...",
+                                     )
+
+                                 with gr.TabItem("📋 Concept Plan"):
+                                     concept_output = gr.Textbox(
+                                         label="Educational Plan",
+                                         lines=10,
+                                         interactive=False,
+                                     )
+
+                 # Connect the generate button
+                 generate_btn.click(
+                     fn=app.generate_animation_sync,
+                     inputs=[topic_input, audience_input, duration_input, quality_input],
+                     outputs=[
+                         video_output,
+                         download_file,
+                         status_output,
+                         narration_output,
+                         code_output,
+                         quiz_output,
+                         concept_output,
+                     ],
+                     api_name="generate",
+                 )
+
+             # About Tab
+             with gr.TabItem("ℹ️ About", id=1):
+                 gr.Markdown("""
+                 # About NeuroAnim
+
+                 NeuroAnim is an AI-powered educational animation generator that creates engaging STEM content automatically.
+
+                 ## 🎯 Features
+
+                 - **🎨 Automatic Animation Generation**: Creates professional Manim animations from topic descriptions
+                 - **🗣️ AI Narration**: Generates educational narration scripts tailored to your audience
+                 - **🔊 Text-to-Speech**: Converts narration to high-quality audio with ElevenLabs or Hugging Face
+                 - **📹 Video Production**: Renders and merges video with synchronized audio
+                 - **❓ Quiz Generation**: Creates assessment questions to test understanding
+                 - **🎓 Multi-Level Support**: Content appropriate for elementary through PhD levels
+
+                 ## 🔧 Technology Stack
+
+                 - **Manim Community Edition**: Mathematical animation engine
+                 - **Hugging Face Models**: AI-powered content generation
+                 - **ElevenLabs**: High-quality text-to-speech synthesis
+                 - **MCP (Model Context Protocol)**: Modular server architecture
+                 - **Gradio**: Interactive web interface
+
+                 ## 🚀 How It Works
+
+                 1. **Concept Planning**: AI analyzes your topic and creates an educational plan
+                 2. **Script Writing**: Generates age-appropriate narration aligned with learning objectives
+                 3. **Code Generation**: Creates Manim Python code for the visual representation
+                 4. **Rendering**: Executes Manim to produce the base animation
+                 5. **Audio Synthesis**: Converts narration to speech using TTS
+                 6. **Final Production**: Merges video and audio into the complete animation
+                 7. **Assessment**: Generates quiz questions for the content
+
+                 ## 📝 Tips for Best Results
+
+                 - **Be Specific**: Instead of "math", try "solving linear equations" or "area of a circle"
+                 - **Choose the Right Audience**: Match the complexity level to your target viewers
+                 - **Optimal Duration**: 1.5-3 minutes works best for most concepts
+                 - **Review Generated Content**: Check the narration and code tabs to see what was created
+                 - **Iterate**: If results aren't perfect, try rewording your topic or adjusting parameters
+
+                 ## 🔑 Setup Requirements
+
+                 To use NeuroAnim, you need:
+                 - **Hugging Face API Key**: For AI content generation (required)
+                 - **ElevenLabs API Key**: For high-quality TTS (optional; falls back to HF TTS)
+
+                 Set these in your `.env` file:
+                 ```bash
+                 HUGGINGFACE_API_KEY=your_key_here
+                 ELEVENLABS_API_KEY=your_key_here  # Optional
+                 ```
+
+                 ## 📚 Example Use Cases
+
+                 - **Teachers**: Create engaging lesson materials
+                 - **Students**: Visualize complex concepts for better understanding
+                 - **Content Creators**: Produce educational YouTube/social media content
+                 - **Tutors**: Generate custom explanations for specific topics
+                 - **Course Developers**: Build comprehensive educational video libraries
+
+                 ## 🤝 Contributing
+
+                 NeuroAnim is open source! Contributions are welcome:
+                 - Report bugs or suggest features via GitHub Issues
+                 - Submit pull requests with improvements
+                 - Share your generated animations with the community
+
+                 ## 📄 License
+
+                 MIT License - free to use for educational and commercial purposes
+
+                 ---
+
+                 Made with ❤️ for educational content creation
+                 """)
+
+             # Settings Tab
+             with gr.TabItem("⚙️ Settings", id=2):
+                 gr.Markdown("""
+                 # System Configuration
+
+                 Configure API keys and system settings here.
+                 """)
+
+                 with gr.Group():
+                     gr.Markdown("### 🔑 API Keys")
+
+                     hf_key_status = gr.Textbox(
+                         label="Hugging Face API Key Status",
+                         value="✅ Configured"
+                         if os.getenv("HUGGINGFACE_API_KEY")
+                         else "❌ Not Set",
+                         interactive=False,
+                     )
+
+                     eleven_key_status = gr.Textbox(
+                         label="ElevenLabs API Key Status",
+                         value="✅ Configured"
+                         if os.getenv("ELEVENLABS_API_KEY")
+                         else "⚠️ Not Set (will use fallback TTS)",
+                         interactive=False,
+                     )
+
+                     gr.Markdown("""
+                     **To configure API keys:**
+                     1. Create a `.env` file in the project root
+                     2. Add your keys:
+                     ```
+                     HUGGINGFACE_API_KEY=your_hf_key
+                     ELEVENLABS_API_KEY=your_elevenlabs_key
+                     ```
+                     3. Restart the application
+                     """)
+
+                 with gr.Group():
+                     gr.Markdown("### 📊 System Info")
+
+                     system_info = gr.Textbox(
+                         label="System Status",
+                         value=f"""
+ Output Directory: {Path("outputs").absolute()}
+ Working Directory: Temporary (auto-created)
+ Manim Version: Community Edition
+ Default Quality: Medium (720p, 30fps)
+ """.strip(),
+                         interactive=False,
+                         lines=6,
+                     )
+
+     return interface
+
+
+ def main():
+     """Launch the Gradio application."""
+
+     # Check for API keys
+     if not os.getenv("HUGGINGFACE_API_KEY"):
+         logger.warning("HUGGINGFACE_API_KEY not set! Generation will fail.")
+         print("\n⚠️ WARNING: HUGGINGFACE_API_KEY environment variable not set!")
+         print("Please set it in your .env file or environment.\n")
+
+     if not os.getenv("ELEVENLABS_API_KEY"):
+         logger.info("ELEVENLABS_API_KEY not set, will use fallback TTS")
+         print(
+             "\nℹ️ Note: ELEVENLABS_API_KEY not set. Using fallback TTS (may have lower quality).\n"
+         )
+
+     # Create the outputs directory
+     Path("outputs").mkdir(exist_ok=True)
+
+     # Build and launch the interface
+     interface = create_interface()
+
+     logger.info("Launching Gradio interface...")
+
+     interface.launch(
+         server_name="0.0.0.0",
+         server_port=7860,
+         share=False,
+     )
+
+
+ if __name__ == "__main__":
+     main()
manim_mcp/README.md ADDED
@@ -0,0 +1,337 @@
+ # Manim MCP Server
+
+ A comprehensive Model Context Protocol (MCP) server for creating educational STEM animations using Manim. This server combines AI-powered creative tools with rendering and video processing capabilities to streamline the animation creation workflow.
+
+ ## Features
+
+ ### 🎨 Creative Tools
+ - **Concept Planning**: AI-powered STEM concept planning with learning objectives and scene flow
+ - **Code Generation**: Intelligent Manim code generation with syntax validation
+ - **Code Refinement**: Automatic code improvement based on errors and feedback
+ - **Narration Generation**: Educational script writing tailored to target audiences
+ - **Quiz Generation**: Automated assessment question creation
+
+ ### 🎬 Rendering & Processing
+ - **Manim Rendering**: Full Manim animation rendering with quality controls
+ - **Video Processing**: FFmpeg-based video manipulation and conversion
+ - **Audio/Video Merging**: Seamless integration of narration with animations
+ - **File Management**: Comprehensive file system operations
+
+ ### 🤖 AI Integration
+ - **Vision Analysis**: Frame-by-frame quality assessment using vision models
+ - **Text-to-Speech**: Natural voice synthesis for narration
+ - **Multi-Model Support**: Flexible model selection for different tasks
+
+ ## Installation
+
+ ### Prerequisites
+
+ - Python 3.12+
+ - Manim Community Edition (`manim>=0.18.1`)
+ - FFmpeg (for video processing)
+ - HuggingFace API key (for AI features)
+
+ ### Setup
+
+ 1. Install the package and dependencies:
+
+ ```bash
+ pip install mcp huggingface_hub manim pydantic aiohttp httpx numpy Pillow
+ ```
+
+ 2. Set up your environment variables:
+
+ ```bash
+ export HUGGINGFACE_API_KEY="your_api_key_here"
+ ```
+
+ 3. Run the MCP server:
+
+ ```bash
+ python manim_mcp/server.py
+ ```
+
+ ## Usage
+
+ ### As an MCP Server
+
+ The server can be integrated into any MCP-compatible client (such as Claude Desktop):
+
+ ```json
+ {
+   "mcpServers": {
+     "manim": {
+       "command": "python",
+       "args": ["path/to/manim_mcp/server.py"],
+       "env": {
+         "HUGGINGFACE_API_KEY": "your_key"
+       }
+     }
+   }
+ }
+ ```
+
+ ### Programmatic Usage
+
+ ```python
+ from mcp import ClientSession, StdioServerParameters
+ from mcp.client.stdio import stdio_client
+
+ # Initialize the MCP client
+ params = StdioServerParameters(
+     command="python",
+     args=["manim_mcp/server.py"],
+     env={"HUGGINGFACE_API_KEY": "your_key"},
+ )
+
+ async with stdio_client(params) as (read, write):
+     async with ClientSession(read, write) as session:
+         await session.initialize()
+
+         # Plan a concept
+         result = await session.call_tool("plan_concept", {
+             "topic": "Pythagorean Theorem",
+             "target_audience": "high_school",
+             "animation_length_minutes": 2.0
+         })
+ ```
+
+ ## Available Tools
+
+ ### Planning & Creative
+
+ #### `plan_concept`
+ Plan a STEM concept for animation with learning objectives and scene flow.
+
+ **Parameters:**
+ - `topic` (string, required): The STEM topic to animate
+ - `target_audience` (enum, required): elementary | middle_school | high_school | college | general
+ - `animation_length_minutes` (number, optional): Desired length in minutes
+ - `model` (string, optional): HuggingFace model to use
+
+ #### `generate_manim_code`
+ Generate complete, runnable Manim Python code.
+
+ **Parameters:**
+ - `concept` (string, required): Animation concept
+ - `scene_description` (string, required): Detailed scene description
+ - `visual_elements` (array, optional): List of visual elements to include
+ - `previous_code` (string, optional): For retry attempts
+ - `error_message` (string, optional): Error from the previous attempt
+
+ #### `refine_animation`
+ Refine existing Manim code based on feedback.
+
+ **Parameters:**
+ - `original_code` (string, required): Code to refine
+ - `feedback` (string, required): Feedback or error message
+ - `improvement_goals` (array, optional): Specific improvements to make
+
+ #### `generate_narration`
+ Generate educational narration scripts.
+
+ **Parameters:**
+ - `concept` (string, required): Animation concept
+ - `scene_description` (string, required): Scene details
+ - `target_audience` (string, required): Target audience level
+ - `duration_seconds` (integer, optional): Script duration
+
+ #### `generate_quiz`
+ Generate educational quiz questions.
+
+ **Parameters:**
+ - `concept` (string, required): STEM concept
+ - `difficulty` (enum, required): easy | medium | hard
+ - `num_questions` (integer, required): Number of questions
+ - `question_types` (array, optional): Types of questions
+
+ ### Rendering & Processing
+
+ #### `write_manim_file`
+ Write Manim code to a file.
+
+ **Parameters:**
+ - `filepath` (string, required): Destination path
+ - `code` (string, required): Manim code to write
+
+ #### `render_manim_animation`
+ Render a Manim animation from a Python file.
+
+ **Parameters:**
+ - `scene_name` (string, required): Scene class name
+ - `file_path` (string, required): Path to the Python file
+ - `output_dir` (string, required): Output directory
+ - `quality` (enum, optional): low | medium | high | production_quality
+ - `format` (enum, optional): mp4 | gif | png
+ - `frame_rate` (integer, optional): Frame rate (default: 30)
+
+ #### `merge_video_audio`
+ Merge video and audio files.
+
+ **Parameters:**
+ - `video_file` (string, required): Path to the video
+ - `audio_file` (string, required): Path to the audio
+ - `output_file` (string, required): Output path
+
+ #### `process_video_with_ffmpeg`
+ Process videos with custom FFmpeg arguments.
+
+ **Parameters:**
+ - `input_files` (array, required): Input file paths
+ - `output_file` (string, required): Output path
+ - `ffmpeg_args` (array, optional): Additional FFmpeg arguments
+
+ #### `check_file_exists`
+ Check file existence and get metadata.
+
+ **Parameters:**
+ - `filepath` (string, required): File path to check
+
+ ### Analysis
+
+ #### `analyze_frame`
+ Analyze animation frames using vision models.
+
+ **Parameters:**
+ - `image_path` (string, required): Path to the image
+ - `analysis_type` (string, required): Type of analysis
+ - `context` (string, optional): Additional context
+ - `model` (string, optional): Vision model to use
+
+ #### `generate_speech`
+ Convert text to speech audio.
+
+ **Parameters:**
+ - `text` (string, required): Text to convert
+ - `output_path` (string, required): Audio output path
+ - `voice` (string, optional): Voice to use
+ - `model` (string, optional): TTS model to use
+
+ ## Complete Workflow Example
+
+ Here's a typical animation generation workflow:
+
+ 1. **Plan** the concept
+ 2. **Generate** the narration script
+ 3. **Generate** Manim code
+ 4. **Write** the code to a file
+ 5. **Render** the animation
+ 6. **Generate** speech audio
+ 7. **Merge** video and audio
+ 8. **Generate** quiz questions
+
+ ```python
+ # 1. Plan concept
+ plan = await session.call_tool("plan_concept", {
+     "topic": "Newton's Laws of Motion",
+     "target_audience": "high_school"
+ })
+
+ # 2. Generate narration
+ narration = await session.call_tool("generate_narration", {
+     "concept": "Newton's Laws",
+     "scene_description": plan["text"],
+     "target_audience": "high_school",
+     "duration_seconds": 120
+ })
+
+ # 3. Generate code
+ code = await session.call_tool("generate_manim_code", {
+     "concept": "Newton's Laws",
+     "scene_description": plan["text"],
+     "visual_elements": ["text", "shapes", "arrows"]
+ })
+
+ # 4-7. Continue workflow...
+ ```
+
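+ A sketch of the remaining steps, using the tools documented above (file paths, the scene class name, and the `["text"]` result access are illustrative, following the pattern of the example):
+
+ ```python
+ # 4. Write the generated code to a file (paths are illustrative)
+ await session.call_tool("write_manim_file", {
+     "filepath": "scenes/newtons_laws.py",
+     "code": code["text"],
+ })
+
+ # 5. Render the animation
+ await session.call_tool("render_manim_animation", {
+     "scene_name": "NewtonsLawsScene",  # illustrative scene class name
+     "file_path": "scenes/newtons_laws.py",
+     "output_dir": "outputs",
+     "quality": "medium",
+ })
+
+ # 6. Generate speech audio from the narration
+ await session.call_tool("generate_speech", {
+     "text": narration["text"],
+     "output_path": "outputs/narration.mp3",
+ })
+
+ # 7. Merge video and audio
+ await session.call_tool("merge_video_audio", {
+     "video_file": "outputs/NewtonsLawsScene.mp4",  # illustrative render output
+     "audio_file": "outputs/narration.mp3",
+     "output_file": "outputs/final.mp4",
+ })
+
+ # 8. Generate quiz questions
+ quiz = await session.call_tool("generate_quiz", {
+     "concept": "Newton's Laws",
+     "difficulty": "medium",
+     "num_questions": 5,
+ })
+ ```
+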
+ ## Configuration
+
+ ### Environment Variables
+
+ - `HUGGINGFACE_API_KEY`: Required for AI-powered tools
+ - `ELEVENLABS_API_KEY`: Optional, for premium TTS (falls back to free alternatives)
+
+ ### Model Selection
+
+ By default, the server uses sensible model defaults, but you can specify custom models:
+
+ ```python
+ await session.call_tool("generate_manim_code", {
+     "concept": "topic",
+     "scene_description": "description",
+     "model": "Qwen/Qwen2.5-Coder-32B-Instruct"  # Custom model
+ })
+ ```
+
+ ## Quality Settings
+
+ Rendering quality options (an example call follows the list):
+ - **low**: 480p15 - fast, good for testing
+ - **medium**: 720p30 - balanced quality/speed (default)
+ - **high**: 1080p60 - high quality, slower
+ - **production_quality**: 2160p60 - 4K, very slow
+
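+ For example, a quick low-quality pass while iterating, followed by a high-quality final render (the scene name and paths are illustrative):
+
+ ```python
+ for quality in ("low", "high"):
+     await session.call_tool("render_manim_animation", {
+         "scene_name": "MyScene",            # illustrative
+         "file_path": "scenes/my_scene.py",  # illustrative
+         "output_dir": f"outputs/{quality}",
+         "quality": quality,
+     })
+ ```
+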
+ ## Error Handling
+
+ The server includes comprehensive error handling:
+ - Syntax validation for generated code
+ - Retry logic for code generation failures (sketched below)
+ - Graceful fallbacks for AI services
+ - Detailed error messages for debugging
+
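+ A minimal sketch of the retry pattern this enables, feeding a failed attempt back through the `previous_code` and `error_message` parameters documented above (`try_render` is a hypothetical render-and-check helper):
+
+ ```python
+ code = await session.call_tool("generate_manim_code", {
+     "concept": concept,
+     "scene_description": description,
+ })
+
+ for attempt in range(3):
+     ok, error = await try_render(code)  # hypothetical helper
+     if ok:
+         break
+     # Feed the failure back so the model can correct its own code.
+     code = await session.call_tool("generate_manim_code", {
+         "concept": concept,
+         "scene_description": description,
+         "previous_code": code["text"],
+         "error_message": error,
+     })
+ ```
+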
+ ## Architecture
+
+ The server is organized into modular tool categories:
+
+ ```
+ manim_mcp/
+ ├── server.py               # Main MCP server
+ ├── tools/
+ │   ├── planning.py         # Concept planning
+ │   ├── code_generation.py  # Code generation & refinement
+ │   ├── rendering.py        # Manim rendering
+ │   ├── vision.py           # Frame analysis
+ │   ├── audio.py            # TTS & narration
+ │   ├── video.py            # Video processing
+ │   └── quiz.py             # Quiz generation
+ ```
+
+ ## Requirements
+
+ - Python >= 3.12
+ - mcp >= 1.0.0
+ - huggingface_hub >= 0.25.0
+ - manim >= 0.18.1
+ - pydantic >= 2.0.0
+ - aiohttp >= 3.8.0
+ - FFmpeg (system dependency)
+
+ ## Contributing
+
+ Contributions are welcome! Areas for improvement:
+ - Additional AI model integrations
+ - More video processing tools
+ - Enhanced error recovery
+ - Performance optimizations
+
+ ## License
+
+ MIT License - see the LICENSE file for details
+
+ ## Support
+
+ For issues, questions, or feature requests, please open an issue on the repository.
+
+ ## Credits
+
+ Built with:
+ - [Manim Community Edition](https://www.manim.community/) - mathematical animation engine
+ - [Model Context Protocol](https://modelcontextprotocol.io/) - AI integration framework
+ - [HuggingFace](https://huggingface.co/) - AI model hosting and inference
+
+ ---
+
+ **Version**: 0.1.0
+ **Author**: NeuroAnim Team
+ **Status**: Beta - under active development
manim_mcp/__init__.py ADDED
@@ -0,0 +1,19 @@
+ """
+ Manim MCP - Model Context Protocol Server for Manim Animations
+
+ A unified MCP server providing comprehensive tools for STEM animation creation:
+ - Planning and ideation
+ - AI-powered code generation
+ - Manim rendering
+ - Vision-based analysis
+ - Audio narration and TTS
+ - Video processing
+
+ This package can be used standalone as an MCP server or integrated into
+ larger animation pipelines.
+ """
+
+ from .server import main, server
+
+ __version__ = "0.1.0"
+ __all__ = ["server", "main"]
manim_mcp/server.py ADDED
@@ -0,0 +1,480 @@
+ """
+ Manim MCP Server
+
+ A unified MCP server providing tools for STEM animation creation with Manim.
+ Combines creative AI tools (planning, code generation, narration) with
+ rendering and video processing capabilities.
+
+ This server is designed to be used standalone or integrated into larger
+ animation generation pipelines.
+ """
+
+ import asyncio
+ import logging
+ import os
+ import sys
+ from pathlib import Path
+ from typing import Any, Dict, Optional
+
+ # Ensure the project root is on sys.path
+ PROJECT_ROOT = Path(__file__).resolve().parent.parent
+ if str(PROJECT_ROOT) not in sys.path:
+     sys.path.insert(0, str(PROJECT_ROOT))
+
+ from mcp.server import NotificationOptions, Server
+ from mcp.server.models import InitializationOptions
+ from mcp.server.stdio import stdio_server
+ from mcp.types import CallToolResult, ListToolsResult, TextContent, Tool
+
+ from manim_mcp.tools import (
+     analyze_frame,
+     check_file_exists,
+     generate_manim_code,
+     generate_narration,
+     generate_quiz,
+     generate_speech,
+     merge_video_audio,
+     plan_concept,
+     process_video_with_ffmpeg,
+     refine_animation,
+     render_manim_animation,
+     write_manim_file,
+ )
+ from utils.hf_wrapper import HFInferenceWrapper, get_hf_wrapper
+
+ # Set up logging
+ logging.basicConfig(
+     level=logging.INFO,
+     format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
+ )
+ logger = logging.getLogger(__name__)
+
+ # Create the MCP server
+ server = Server("manim-mcp")
+
+ # Global HF wrapper instance
+ hf_wrapper: Optional[HFInferenceWrapper] = None
+
+
+ def get_hf_wrapper_instance() -> HFInferenceWrapper:
+     """Get or create the HuggingFace wrapper instance."""
+     global hf_wrapper
+     if hf_wrapper is None:
+         api_key = os.getenv("HUGGINGFACE_API_KEY")
+         hf_wrapper = get_hf_wrapper(api_key=api_key)
+         logger.info("Initialized HuggingFace wrapper")
+     return hf_wrapper
+
+
+ @server.list_tools()
+ async def list_tools() -> ListToolsResult:
+     """List all available tools in the Manim MCP server."""
+     tools = [
+         # Planning Tools
+         Tool(
+             name="plan_concept",
+             description="Plan a STEM concept for animation. Creates a structured plan with learning objectives, visual metaphors, scene flow, and educational value assessment.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "topic": {
+                         "type": "string",
+                         "description": "The STEM topic to create an animation for",
+                     },
+                     "target_audience": {
+                         "type": "string",
+                         "enum": [
+                             "elementary",
+                             "middle_school",
+                             "high_school",
+                             "college",
+                             "general",
+                         ],
+                         "description": "Target audience level",
+                     },
+                     "animation_length_minutes": {
+                         "type": "number",
+                         "description": "Desired animation length in minutes (default: 2.0)",
+                     },
+                     "model": {
+                         "type": "string",
+                         "description": "Hugging Face model to use (optional)",
+                     },
+                 },
+                 "required": ["topic", "target_audience"],
+             },
+         ),
+         # Code Generation Tools
+         Tool(
+             name="generate_manim_code",
+             description="Generate Manim Python code for an animation concept. Produces complete, runnable code with proper syntax and Manim best practices.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "concept": {
+                         "type": "string",
+                         "description": "The animation concept",
+                     },
+                     "scene_description": {
+                         "type": "string",
+                         "description": "Detailed scene description",
+                     },
+                     "visual_elements": {
+                         "type": "array",
+                         "items": {"type": "string"},
+                         "description": "List of visual elements to include",
+                     },
+                     "model": {
+                         "type": "string",
+                         "description": "Hugging Face code model to use (optional)",
+                     },
+                     "previous_code": {
+                         "type": "string",
+                         "description": "Previous code attempt (for retries)",
+                     },
+                     "error_message": {
+                         "type": "string",
+                         "description": "Error from the previous attempt (for retries)",
+                     },
+                 },
+                 "required": ["concept", "scene_description"],
+             },
+         ),
+         Tool(
+             name="refine_animation",
+             description="Refine and improve existing Manim code based on feedback or errors. Outputs complete corrected code.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "original_code": {
+                         "type": "string",
+                         "description": "The original Manim code to refine",
+                     },
+                     "feedback": {
+                         "type": "string",
+                         "description": "Feedback or error message about the code",
+                     },
+                     "improvement_goals": {
+                         "type": "array",
+                         "items": {"type": "string"},
+                         "description": "List of specific improvement goals",
+                     },
+                     "model": {
+                         "type": "string",
+                         "description": "Hugging Face code model to use (optional)",
+                     },
+                 },
+                 "required": ["original_code", "feedback"],
+             },
+         ),
+         # Rendering Tools
+         Tool(
+             name="write_manim_file",
+             description="Write Manim Python code to a file on the filesystem.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "filepath": {
+                         "type": "string",
+                         "description": "Path where to write the Manim file",
+                     },
+                     "code": {
+                         "type": "string",
+                         "description": "Manim Python code to write",
+                     },
+                 },
+                 "required": ["filepath", "code"],
+             },
+         ),
+         Tool(
+             name="render_manim_animation",
+             description="Render a Manim animation from a Python file. Uses the local Manim installation with quality and format options.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "scene_name": {
+                         "type": "string",
+                         "description": "Name of the Manim scene class to render",
+                     },
+                     "file_path": {
+                         "type": "string",
+                         "description": "Path to the Manim Python file",
+                     },
+                     "output_dir": {
+                         "type": "string",
+                         "description": "Directory to save the output animation",
+                     },
+                     "quality": {
+                         "type": "string",
+                         "enum": ["low", "medium", "high", "production_quality"],
+                         "description": "Rendering quality (default: medium)",
+                     },
+                     "format": {
+                         "type": "string",
+                         "enum": ["mp4", "gif", "png"],
+                         "description": "Output format (default: mp4)",
+                     },
+                     "frame_rate": {
+                         "type": "integer",
+                         "description": "Frame rate (default: 30)",
+                     },
+                 },
+                 "required": ["scene_name", "file_path", "output_dir"],
+             },
+         ),
+         # Vision Tools
+         Tool(
+             name="analyze_frame",
+             description="Analyze an animation frame using vision models. Provides feedback on visual quality, clarity, and educational effectiveness.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "image_path": {
+                         "type": "string",
+                         "description": "Path to the image file to analyze",
+                     },
+                     "analysis_type": {
+                         "type": "string",
+                         "description": "Type of analysis (e.g., quality, educational_value, clarity)",
+                     },
+                     "context": {
+                         "type": "string",
+                         "description": "Additional context about the animation",
+                     },
+                     "model": {
+                         "type": "string",
+                         "description": "Hugging Face vision model to use (optional)",
+                     },
+                 },
+                 "required": ["image_path", "analysis_type"],
+             },
+         ),
+         # Audio Tools
+         Tool(
+             name="generate_narration",
+             description="Generate an educational narration script for an animation. Creates age-appropriate, engaging content aligned with learning objectives.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "concept": {
+                         "type": "string",
+                         "description": "The animation concept",
+                     },
+                     "scene_description": {
+                         "type": "string",
+                         "description": "Description of the scene/animation",
+                     },
+                     "target_audience": {
+                         "type": "string",
+                         "description": "Target audience level",
+                     },
+                     "duration_seconds": {
+                         "type": "integer",
+                         "description": "Duration in seconds (default: 30)",
+                     },
+                     "model": {
+                         "type": "string",
+                         "description": "Hugging Face model to use (optional)",
+                     },
+                 },
+                 "required": ["concept", "scene_description", "target_audience"],
+             },
+         ),
+         Tool(
+             name="generate_speech",
+             description="Convert text to a speech audio file using TTS models.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "text": {
+                         "type": "string",
+                         "description": "Text to convert to speech",
+                     },
+                     "output_path": {
+                         "type": "string",
+                         "description": "Path where to save the audio file",
+                     },
+                     "voice": {
+                         "type": "string",
+                         "description": "Voice to use for TTS (optional)",
+                     },
+                     "model": {
+                         "type": "string",
+                         "description": "Hugging Face TTS model to use (optional)",
+                     },
+                 },
+                 "required": ["text", "output_path"],
+             },
+         ),
+         # Video Processing Tools
+         Tool(
+             name="process_video_with_ffmpeg",
+             description="Process video files using FFmpeg with custom arguments for conversion, filtering, and combining.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "input_files": {
+                         "type": "array",
+                         "items": {"type": "string"},
+                         "description": "List of input video/audio file paths",
+                     },
+                     "output_file": {
+                         "type": "string",
+                         "description": "Output file path",
+                     },
+                     "ffmpeg_args": {
+                         "type": "array",
+                         "items": {"type": "string"},
+                         "description": "Additional FFmpeg command-line arguments",
+                     },
+                 },
+                 "required": ["input_files", "output_file"],
+             },
+         ),
+         Tool(
+             name="merge_video_audio",
+             description="Merge a video file and an audio file into a single output file using FFmpeg.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "video_file": {
+                         "type": "string",
+                         "description": "Path to the input video file",
+                     },
+                     "audio_file": {
+                         "type": "string",
+                         "description": "Path to the input audio file",
+                     },
+                     "output_file": {
+                         "type": "string",
+                         "description": "Path to the output merged file",
+                     },
+                 },
+                 "required": ["video_file", "audio_file", "output_file"],
+             },
+         ),
+         Tool(
+             name="check_file_exists",
+             description="Check if a file exists and return its metadata (size, timestamps, type).",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "filepath": {
+                         "type": "string",
+                         "description": "Path to the file to check",
+                     }
+                 },
+                 "required": ["filepath"],
+             },
+         ),
+         # Quiz Tools
+         Tool(
+             name="generate_quiz",
+             description="Generate educational quiz questions based on a STEM concept. Creates questions with answers and explanations.",
+             inputSchema={
+                 "type": "object",
+                 "properties": {
+                     "concept": {
+                         "type": "string",
+                         "description": "The STEM concept to create quiz questions for",
+                     },
+                     "difficulty": {
+                         "type": "string",
+                         "enum": ["easy", "medium", "hard"],
+                         "description": "Difficulty level",
+                     },
+                     "num_questions": {
+                         "type": "integer",
+                         "description": "Number of questions to generate",
+                     },
+                     "question_types": {
+                         "type": "array",
+                         "items": {"type": "string"},
+                         "description": "Types of questions (e.g., multiple_choice, true_false)",
+                     },
+                     "model": {
+                         "type": "string",
+                         "description": "Hugging Face model to use (optional)",
+                     },
+                 },
+                 "required": ["concept", "difficulty", "num_questions"],
+             },
+         ),
+     ]
+
+     return ListToolsResult(tools=tools)
+
+
+ @server.call_tool()
+ async def call_tool(tool_name: str, arguments: Dict[str, Any]) -> CallToolResult:
+     """
+     Dispatch tool calls to the appropriate handler functions.
+
+     Routes requests to the correct tool implementation based on the tool name.
+     Handles errors gracefully and returns appropriate error responses.
+     """
+     try:
+         # Get the HF wrapper for AI-powered tools
+         wrapper = get_hf_wrapper_instance()
+
+         # Route to the appropriate tool handler
+         if tool_name == "plan_concept":
+             return await plan_concept(wrapper, arguments)
+         elif tool_name == "generate_manim_code":
+             return await generate_manim_code(wrapper, arguments)
+         elif tool_name == "refine_animation":
+             return await refine_animation(wrapper, arguments)
+         elif tool_name == "write_manim_file":
+             return await write_manim_file(arguments)
+         elif tool_name == "render_manim_animation":
+             return await render_manim_animation(arguments)
+         elif tool_name == "analyze_frame":
+             return await analyze_frame(wrapper, arguments)
+         elif tool_name == "generate_narration":
+             return await generate_narration(wrapper, arguments)
+         elif tool_name == "generate_speech":
+             return await generate_speech(wrapper, arguments)
+         elif tool_name == "process_video_with_ffmpeg":
+             return await process_video_with_ffmpeg(arguments)
+         elif tool_name == "merge_video_audio":
+             return await merge_video_audio(arguments)
+         elif tool_name == "check_file_exists":
+             return await check_file_exists(arguments)
+         elif tool_name == "generate_quiz":
+             return await generate_quiz(wrapper, arguments)
+         else:
+             logger.error(f"Unknown tool requested: {tool_name}")
+             return CallToolResult(
+                 content=[TextContent(type="text", text=f"Unknown tool: {tool_name}")],
+                 isError=True,
450
+ )
451
+
452
+ except Exception as e:
453
+ logger.error(f"Error in tool {tool_name}: {e}", exc_info=True)
454
+ return CallToolResult(
455
+ content=[TextContent(type="text", text=f"Error: {str(e)}")],
456
+ isError=True,
457
+ )
458
+
459
+
460
+ async def main():
461
+ """Main entry point for the Manim MCP server."""
462
+ logger.info("Starting Manim MCP Server...")
463
+
464
+ async with stdio_server() as (read_stream, write_stream):
465
+ await server.run(
466
+ read_stream,
467
+ write_stream,
468
+ InitializationOptions(
469
+ server_name="manim-mcp",
470
+ server_version="0.1.0",
471
+ capabilities=server.get_capabilities(
472
+ notification_options=NotificationOptions(),
473
+ experimental_capabilities={},
474
+ ),
475
+ ),
476
+ )
477
+
478
+
479
+ if __name__ == "__main__":
480
+ asyncio.run(main())
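For reference, this is how the stdio server above can be exercised from the client side. A minimal sketch, assuming the official `mcp` Python SDK client API; the server script path is hypothetical:

```python
# Minimal client sketch (assumes the official `mcp` Python SDK;
# the "manim_mcp/server.py" entry point below is hypothetical).
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def demo() -> None:
    params = StdioServerParameters(command="python", args=["manim_mcp/server.py"])
    async with stdio_client(params) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            # List the tools registered via @server.list_tools()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])
            # Invoke one of them through the call_tool dispatcher
            result = await session.call_tool(
                "check_file_exists", arguments={"filepath": "outputs/GenScene.mp4"}
            )
            print(result.content[0].text)


asyncio.run(demo())
```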
manim_mcp/tools/__init__.py ADDED
@@ -0,0 +1,42 @@
1
+ """
2
+ Manim MCP Tools
3
+
4
+ This package contains all the tools for the Manim MCP server.
5
+ Tools are organized into logical modules:
6
+ - planning: Concept planning and ideation
7
+ - code_generation: Manim code generation and refinement
8
+ - rendering: Manim animation rendering
9
+ - vision: Frame analysis and visual feedback
10
+ - audio: Text-to-speech and narration
11
+ - video: Video processing and merging
12
+ """
13
+
14
+ from .audio import generate_narration, generate_speech
15
+ from .code_generation import generate_manim_code, refine_animation
16
+ from .planning import plan_concept
17
+ from .quiz import generate_quiz
18
+ from .rendering import render_manim_animation, write_manim_file
19
+ from .video import check_file_exists, merge_video_audio, process_video_with_ffmpeg
20
+ from .vision import analyze_frame
21
+
22
+ __all__ = [
23
+ # Planning
24
+ "plan_concept",
25
+ # Code Generation
26
+ "generate_manim_code",
27
+ "refine_animation",
28
+ # Rendering
29
+ "write_manim_file",
30
+ "render_manim_animation",
31
+ # Vision
32
+ "analyze_frame",
33
+ # Audio
34
+ "generate_narration",
35
+ "generate_speech",
36
+ # Video
37
+ "process_video_with_ffmpeg",
38
+ "merge_video_audio",
39
+ "check_file_exists",
40
+ # Quiz
41
+ "generate_quiz",
42
+ ]
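The flat re-exports above mean callers never need to know the submodule layout; for example:

```python
# Thanks to the re-exports, handlers import flat rather than per-module:
from manim_mcp.tools import generate_quiz, merge_video_audio  # not .quiz / .video
```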
manim_mcp/tools/audio.py ADDED
@@ -0,0 +1,165 @@
1
+ """
2
+ Audio Tools for Manim MCP Server
3
+
4
+ This module provides tools for generating narration scripts and speech audio.
5
+ """
6
+
7
+ import json
8
+ import logging
9
+ from typing import Any, Dict, Optional
10
+
11
+ from mcp.types import CallToolResult, TextContent
12
+
13
+ from utils.hf_wrapper import HFInferenceWrapper, ModelConfig
14
+
15
+ logger = logging.getLogger(__name__)
16
+
17
+
18
+ async def generate_narration(
19
+ hf_wrapper: HFInferenceWrapper, arguments: Dict[str, Any]
20
+ ) -> CallToolResult:
21
+ """
22
+ Generate a narration script for an educational animation.
23
+
24
+ Uses a text LLM to create an engaging, age-appropriate narration script
25
+ that aligns with the animation concept and scene description.
26
+
27
+ Args:
28
+ hf_wrapper: HuggingFace inference wrapper instance
29
+ arguments: Dictionary containing:
30
+ - concept (str): The animation concept
31
+ - scene_description (str): Description of the scene/animation
32
+ - target_audience (str): Target audience level
33
+ - duration_seconds (int, optional): Duration in seconds (default: 30)
34
+ - model (str, optional): Hugging Face model to use
35
+
36
+ Returns:
37
+ CallToolResult with the narration script
38
+ """
39
+ concept = arguments["concept"]
40
+ scene_description = arguments["scene_description"]
41
+ target_audience = arguments["target_audience"]
42
+ duration = arguments.get("duration_seconds", 30)
43
+ model = arguments.get("model")
44
+
45
+ try:
46
+ model_config = ModelConfig()
47
+ selected_model = model or model_config.text_models[0]
48
+
49
+ prompt = f"""
50
+ Generate a narration script for an educational animation:
51
+
52
+ Concept: {concept}
53
+ Scene: {scene_description}
54
+ Target Audience: {target_audience}
55
+ Duration: {duration} seconds
56
+
57
+ Requirements:
58
+ 1. Clear, engaging, and age-appropriate language
59
+ 2. Educational value aligned with learning objectives
60
+ 3. Natural speaking pace (approximately {int(duration * 2.5)} words for {duration} seconds, at ~150 words per minute)
61
+ 4. Include pauses and emphasis markers where appropriate
62
+ 5. Make it interesting and memorable
63
+
64
+ Format as a clean script ready for text-to-speech.
65
+ """
66
+
67
+ response = await hf_wrapper.text_generation(
68
+ model=selected_model,
69
+ prompt=prompt,
70
+ max_new_tokens=512,
71
+ temperature=0.6,
72
+ )
73
+
74
+ logger.info(f"Successfully generated narration for concept: {concept}")
75
+
76
+ return CallToolResult(
77
+ content=[
78
+ TextContent(
79
+ type="text",
80
+ text=f"Narration Script:\n\n{response}",
81
+ )
82
+ ]
83
+ )
84
+
85
+ except Exception as e:
86
+ logger.error(f"Narration generation failed: {str(e)}")
87
+ return CallToolResult(
88
+ content=[
89
+ TextContent(
90
+ type="text",
91
+ text=f"Narration generation failed: {str(e)}",
92
+ )
93
+ ],
94
+ isError=True,
95
+ )
96
+
97
+
98
+ async def generate_speech(
99
+ hf_wrapper: HFInferenceWrapper, arguments: Dict[str, Any]
100
+ ) -> CallToolResult:
101
+ """
102
+ Convert text to speech audio file.
103
+
104
+ Uses a TTS model to generate speech audio from text and saves it to a file.
105
+
106
+ Args:
107
+ hf_wrapper: HuggingFace inference wrapper instance
108
+ arguments: Dictionary containing:
109
+ - text (str): Text to convert to speech
110
+ - output_path (str): Path where the audio file should be saved
111
+ - voice (str, optional): Voice to use for TTS
112
+ - model (str, optional): Hugging Face TTS model to use
113
+
114
+ Returns:
115
+ CallToolResult with audio generation info
116
+ """
117
+ text = arguments["text"]
118
+ output_path = arguments["output_path"]
119
+ voice = arguments.get("voice")
120
+ model = arguments.get("model")
121
+
122
+ try:
123
+ model_config = ModelConfig()
124
+ selected_model = model or model_config.tts_models[0]
125
+
126
+ # Generate audio
127
+ audio_bytes = await hf_wrapper.text_to_speech(
128
+ model=selected_model,
129
+ text=text,
130
+ voice=voice,
131
+ )
132
+
133
+ # Save to file
134
+ success = await hf_wrapper.save_audio_to_file(audio_bytes, output_path)
135
+
136
+ if not success:
137
+ raise Exception("Failed to save audio file")
138
+
139
+ # Return audio info
140
+ audio_info = {
141
+ "output_path": output_path,
142
+ "text_length": len(text),
143
+ "estimated_duration": len(text) / 150, # Rough estimate
144
+ "model_used": selected_model,
145
+ }
146
+
147
+ logger.info(f"Successfully generated speech audio at: {output_path}")
148
+
149
+ return CallToolResult(
150
+ content=[
151
+ TextContent(
152
+ type="text",
153
+ text=f"Speech generated successfully!\n\n{json.dumps(audio_info, indent=2)}",
154
+ )
155
+ ]
156
+ )
157
+
158
+ except Exception as e:
159
+ logger.error(f"Speech generation failed: {str(e)}")
160
+ return CallToolResult(
161
+ content=[
162
+ TextContent(type="text", text=f"Speech generation failed: {str(e)}")
163
+ ],
164
+ isError=True,
165
+ )
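A sample `generate_narration` payload, with the word budget from the prompt worked out. This is a sketch against the schema above; the topic values are illustrative:

```python
# Example arguments for generate_narration. At a natural pace of ~150 words
# per minute, a 30-second clip allows 30 * 150 / 60 = 75 words of narration.
arguments = {
    "concept": "Pythagorean Theorem",
    "scene_description": "A right triangle with squares drawn on each side",
    "target_audience": "middle_school",
    "duration_seconds": 30,  # word budget: ~75 words
}
```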
manim_mcp/tools/code_generation.py ADDED
@@ -0,0 +1,239 @@
1
+ """
2
+ Code Generation Tools for Manim MCP Server
3
+
4
+ This module provides tools for generating and refining Manim animation code.
5
+ """
6
+
7
+ import logging
8
+ from typing import Any, Dict, Optional
9
+
10
+ from mcp.types import CallToolResult, TextContent
11
+
12
+ from utils.hf_wrapper import HFInferenceWrapper, ModelConfig
13
+
14
+ logger = logging.getLogger(__name__)
15
+
16
+
17
+ async def generate_manim_code(
18
+ hf_wrapper: HFInferenceWrapper, arguments: Dict[str, Any]
19
+ ) -> CallToolResult:
20
+ """
21
+ Generate Manim Python code for an animation concept.
22
+
23
+ Uses a code LLM to generate complete, runnable Manim code based on:
24
+ - A concept description
25
+ - Scene details
26
+ - Desired visual elements
27
+ - Optional error feedback for retries
28
+
29
+ Args:
30
+ hf_wrapper: HuggingFace inference wrapper instance
31
+ arguments: Dictionary containing:
32
+ - concept (str): The animation concept
33
+ - scene_description (str): Detailed scene description
34
+ - visual_elements (list, optional): List of visual elements to include
35
+ - model (str, optional): Hugging Face model to use
36
+ - previous_code (str, optional): Previous code attempt (for retries)
37
+ - error_message (str, optional): Error from previous attempt (for retries)
38
+
39
+ Returns:
40
+ CallToolResult with the generated Manim code
41
+ """
42
+ concept = arguments["concept"]
43
+ scene_description = arguments["scene_description"]
44
+ visual_elements = arguments.get("visual_elements", [])
45
+ model = arguments.get("model")
46
+ previous_code = arguments.get("previous_code")
47
+ error_message = arguments.get("error_message")
48
+
49
+ try:
50
+ model_config = ModelConfig()
51
+ selected_model = model or model_config.code_models[0]
52
+
53
+ # Build prompt based on whether this is a retry
54
+ if previous_code and error_message:
55
+ prompt = f"""
56
+ You are an expert animation engineer using Manim Community Edition (v0.18.0+).
57
+
58
+ The previous code attempt had an error. Your task is to FIX the code.
59
+
60
+ PREVIOUS CODE:
61
+ ```python
62
+ {previous_code}
63
+ ```
64
+
65
+ ERROR ENCOUNTERED:
66
+ {error_message}
67
+
68
+ TASK: Fix the error in the code above. Pay special attention to:
69
+ - Closing all parentheses, brackets, and braces
70
+ - Completing all function calls
71
+ - Proper indentation
72
+ - Valid Python syntax
73
+
74
+ Concept: {concept}
75
+ Scene Description: {scene_description}
76
+ Visual Elements: {", ".join(visual_elements)}
77
+
78
+ STRICT CODE REQUIREMENTS:
79
+ 1. Header: MUST start with `from manim import *`
80
+ 2. Class Structure: Define a class inheriting from `MovingCameraScene` (use this instead of `Scene` to enable camera zoom/pan with `self.camera.frame`)
81
+ 3. Method: All logic must be inside the `def construct(self):` method
82
+ 4. SYNTAX: Ensure ALL parentheses, brackets, and function calls are properly closed
83
+ 5. Colors: Use ONLY valid Manim colors (WHITE, BLACK, RED, GREEN, BLUE, YELLOW, ORANGE, PINK, PURPLE, TEAL, GOLD, etc.)
84
+ 6. Text: Use `Text()` objects for strings
85
+ 7. Positioning: Use `.next_to()`, `.move_to()`, or `.shift()`
86
+ 8. Animations: Use Write(), Create(), FadeIn(), FadeOut(), Transform(), Flash(), Indicate() - capitalize properly!
87
+ 9. Pacing: Include `self.wait(1)` between animations
88
+
89
+ OUTPUT FORMAT:
90
+ Provide ONLY the complete, corrected Python code. No markdown blocks. No explanations.
91
+ """
92
+ else:
93
+ prompt = f"""
94
+ You are an expert animation engineer using Manim Community Edition (v0.18.0+).
95
+ Generate a complete, runnable Python script for the following request.
96
+
97
+ Concept: {concept}
98
+ Scene Description: {scene_description}
99
+ Visual Elements: {", ".join(visual_elements)}
100
+
101
+ STRICT CODE REQUIREMENTS:
102
+ 1. Header: MUST start with `from manim import *`
103
+ 2. Class Structure: Define a class inheriting from `MovingCameraScene` (e.g., `class GenScene(MovingCameraScene):`) - this enables camera operations like zoom/pan via `self.camera.frame`
104
+ 3. Method: All logic must be inside the `def construct(self):` method
105
+ 4. SYNTAX: Ensure ALL parentheses, brackets, and function calls are properly closed
106
+ 5. Colors: Use ONLY these valid Manim color constants:
107
+ - Basic: WHITE, BLACK, GRAY, GREY, LIGHT_GRAY, DARK_GRAY
108
+ - Primary: RED, GREEN, BLUE, YELLOW, ORANGE, PINK, PURPLE, TEAL, GOLD, MAROON
109
+ - Variants: RED_A, RED_B, RED_C, RED_D, RED_E, GREEN_A, GREEN_B, GREEN_C, GREEN_D, GREEN_E,
110
+ BLUE_A, BLUE_B, BLUE_C, BLUE_D, BLUE_E, YELLOW_A, YELLOW_B, YELLOW_C, YELLOW_D, YELLOW_E
111
+ - NEVER use: DARK_GREEN, LIGHT_GREEN, DARK_BLUE, LIGHT_BLUE, DARK_RED, LIGHT_RED (these don't exist!)
112
+ 6. Text: Use `Text()` objects for strings. Avoid `Tex()` or `MathTex()` unless necessary
113
+ 7. Positioning: Use `.next_to()`, `.move_to()`, or `.shift()` to arrange elements
114
+ 8. Animations: Use ONLY these valid animations:
115
+ - Write(), Create(), FadeIn(), FadeOut(), GrowFromCenter(), ShrinkToCenter()
116
+ - Transform(), ReplacementTransform(), MoveToTarget(), ApplyMethod()
117
+ - Rotate(), Indicate(), Flash() - DO NOT use lowercase like 'flash', and do NOT use ShowCreation() (removed in Manim CE; use Create() instead)
118
+ - For custom effects use .animate.method() (e.g., obj.animate.scale(2), obj.animate.shift(UP))
119
+ 9. Pacing: Include `self.wait(1)` between major animation groups
120
+
121
+ OUTPUT FORMAT:
122
+ Provide ONLY the raw Python code. Do not wrap in markdown blocks (no ```python). Do not include conversational text.
123
+ """
124
+
125
+ response = await hf_wrapper.text_generation(
126
+ model=selected_model,
127
+ prompt=prompt,
128
+ max_new_tokens=2048,
129
+ temperature=0.3,
130
+ )
131
+
132
+ logger.info(f"Successfully generated Manim code for concept: {concept}")
133
+
134
+ return CallToolResult(
135
+ content=[
136
+ TextContent(
137
+ type="text",
138
+ text=f"Generated Manim Code:\n\n```python\n{response}\n```",
139
+ )
140
+ ]
141
+ )
142
+
143
+ except Exception as e:
144
+ logger.error(f"Code generation failed: {str(e)}")
145
+ return CallToolResult(
146
+ content=[
147
+ TextContent(type="text", text=f"Code generation failed: {str(e)}")
148
+ ],
149
+ isError=True,
150
+ )
151
+
152
+
153
+ async def refine_animation(
154
+ hf_wrapper: HFInferenceWrapper, arguments: Dict[str, Any]
155
+ ) -> CallToolResult:
156
+ """
157
+ Refine animation code based on feedback.
158
+
159
+ Uses a code LLM to improve existing Manim code based on:
160
+ - User feedback or error messages
161
+ - Specific improvement goals
162
+ - Visual or educational quality issues
163
+
164
+ Args:
165
+ hf_wrapper: HuggingFace inference wrapper instance
166
+ arguments: Dictionary containing:
167
+ - original_code (str): The original Manim code to refine
168
+ - feedback (str): Feedback or error message about the code
169
+ - improvement_goals (list, optional): List of specific improvement goals
170
+ - model (str, optional): Hugging Face model to use
171
+
172
+ Returns:
173
+ CallToolResult with the refined Manim code
174
+ """
175
+ original_code = arguments["original_code"]
176
+ feedback = arguments["feedback"]
177
+ improvement_goals = arguments.get("improvement_goals", [])
178
+ model = arguments.get("model")
179
+
180
+ try:
181
+ model_config = ModelConfig()
182
+ selected_model = model or model_config.code_models[0]
183
+
184
+ prompt = f"""
185
+ You are a Manim Code Repair Agent. Your task is to rewrite the FULL Python script to fix issues or apply improvements.
186
+
187
+ Previous Code:
188
+ {original_code}
189
+
190
+ User Feedback/Error:
191
+ {feedback}
192
+
193
+ Improvement Goals:
194
+ {", ".join(improvement_goals)}
195
+
196
+ INSTRUCTIONS:
197
+ 1. Output the COMPLETE corrected script, including `from manim import *`.
198
+ 2. Do not output diffs or partial snippets.
199
+ 3. Ensure the class inherits from `MovingCameraScene` and uses `def construct(self):`.
200
+ 4. Fix logic errors based on the feedback.
201
+ 5. Animations: Use ONLY valid animations like Write(), FadeIn(), FadeOut(), Create(), Flash(), Transform() - NEVER lowercase!
202
+ 6. Colors: Use ONLY these valid Manim color constants:
203
+ - Basic: WHITE, BLACK, GRAY, GREY, LIGHT_GRAY, DARK_GRAY
204
+ - Primary: RED, GREEN, BLUE, YELLOW, ORANGE, PINK, PURPLE, TEAL, GOLD, MAROON
205
+ - Variants: RED_A, RED_B, RED_C, RED_D, RED_E, GREEN_A, GREEN_B, GREEN_C, GREEN_D, GREEN_E,
206
+ BLUE_A, BLUE_B, BLUE_C, BLUE_D, BLUE_E, YELLOW_A, YELLOW_B, YELLOW_C, YELLOW_D, YELLOW_E
207
+ - NEVER use: DARK_GREEN, LIGHT_GREEN, DARK_BLUE, LIGHT_BLUE, DARK_RED, LIGHT_RED (these don't exist!)
208
+ - For darker/lighter variants, use the letter suffixes (e.g., GREEN_E for dark green, GREEN_A for light green).
209
+
210
+ OUTPUT:
211
+ Return ONLY the raw Python code. No markdown backticks. No explanation.
212
+ """
213
+
214
+ response = await hf_wrapper.text_generation(
215
+ model=selected_model,
216
+ prompt=prompt,
217
+ max_new_tokens=2048,
218
+ temperature=0.3,
219
+ )
220
+
221
+ logger.info("Successfully refined animation code")
222
+
223
+ return CallToolResult(
224
+ content=[
225
+ TextContent(
226
+ type="text",
227
+ text=f"Refined Manim Code:\n\n```python\n{response}\n```",
228
+ )
229
+ ]
230
+ )
231
+
232
+ except Exception as e:
233
+ logger.error(f"Code refinement failed: {str(e)}")
234
+ return CallToolResult(
235
+ content=[
236
+ TextContent(type="text", text=f"Code refinement failed: {str(e)}")
237
+ ],
238
+ isError=True,
239
+ )
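For context, here is a minimal scene that satisfies every item in the STRICT CODE REQUIREMENTS list. A hand-written sketch of the target output, not actual model output:

```python
from manim import *  # requirement 1: wildcard import


class GenScene(MovingCameraScene):  # requirement 2: MovingCameraScene subclass
    def construct(self):  # requirement 3: all logic in construct()
        title = Text("Pythagorean Theorem", color=BLUE)  # valid color constant
        square = Square(color=GREEN_E).next_to(title, DOWN)  # letter-suffix variant
        self.play(Write(title))
        self.wait(1)  # requirement 9: pacing between animation groups
        self.play(Create(square))
        self.wait(1)
        # MovingCameraScene exposes self.camera.frame for zoom/pan
        self.play(self.camera.frame.animate.scale(0.8))
        self.wait(1)
```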
manim_mcp/tools/planning.py ADDED
@@ -0,0 +1,102 @@
1
+ """
2
+ Planning Tools for Manim MCP Server
3
+
4
+ This module provides tools for concept planning and ideation for STEM animations.
5
+ """
6
+
7
+ import json
8
+ import logging
9
+ from typing import Any, Dict, Optional
10
+
11
+ from mcp.types import CallToolResult, TextContent
12
+
13
+ from utils.hf_wrapper import HFInferenceWrapper, ModelConfig
14
+
15
+ logger = logging.getLogger(__name__)
16
+
17
+
18
+ async def plan_concept(
19
+ hf_wrapper: HFInferenceWrapper, arguments: Dict[str, Any]
20
+ ) -> CallToolResult:
21
+ """
22
+ Plan a STEM concept for animation.
23
+
24
+ Uses a text LLM to create a structured animation plan including:
25
+ - Learning objectives
26
+ - Visual metaphors
27
+ - Scene flow with timestamps
28
+ - Educational value assessment
29
+
30
+ Args:
31
+ hf_wrapper: HuggingFace inference wrapper instance
32
+ arguments: Dictionary containing:
33
+ - topic (str): The STEM topic to create an animation for
34
+ - target_audience (str): Target audience level (elementary, middle_school, high_school, college, general)
35
+ - animation_length_minutes (float, optional): Desired animation length in minutes
36
+ - model (str, optional): Hugging Face model to use
37
+
38
+ Returns:
39
+ CallToolResult with the structured animation plan
40
+ """
41
+ topic = arguments["topic"]
42
+ target_audience = arguments["target_audience"]
43
+ animation_length = arguments.get("animation_length_minutes", 2.0)
44
+ model = arguments.get("model")
45
+
46
+ try:
47
+ model_config = ModelConfig()
48
+ selected_model = model or model_config.text_models[0]
49
+
50
+ prompt = f"""
51
+ You are a STEM Curriculum Designer. Create a structured animation plan.
52
+
53
+ Topic: {topic}
54
+ Audience: {target_audience}
55
+ Length: {animation_length} min
56
+
57
+ Return a valid JSON object with exactly these keys:
58
+ {{
59
+ "learning_objectives": ["string", "string"],
60
+ "visual_metaphors": ["string", "string"],
61
+ "scene_flow": [
62
+ {{
63
+ "timestamp": "0:00-0:30",
64
+ "action": "description of visual action",
65
+ "voiceover": "key narration points"
66
+ }}
67
+ ],
68
+ "estimated_educational_value": "string"
69
+ }}
70
+
71
+ Do not include markdown formatting like ```json. Return raw JSON only.
72
+ """
73
+
74
+ response = await hf_wrapper.text_generation(
75
+ model=selected_model,
76
+ prompt=prompt,
77
+ max_new_tokens=1024,
78
+ temperature=0.7,
79
+ )
80
+
81
+ logger.info(f"Successfully planned concept for topic: {topic}")
82
+
83
+ return CallToolResult(
84
+ content=[
85
+ TextContent(
86
+ type="text",
87
+ text=f"Animation Concept Plan:\n\n{response}",
88
+ )
89
+ ]
90
+ )
91
+
92
+ except Exception as e:
93
+ logger.error(f"Concept planning failed: {str(e)}")
94
+ return CallToolResult(
95
+ content=[
96
+ TextContent(
97
+ type="text",
98
+ text=f"Concept planning failed: {str(e)}",
99
+ )
100
+ ],
101
+ isError=True,
102
+ )
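Downstream steps consume the plan as structured data, and models sometimes wrap JSON in fences despite the instruction, so a defensive parse is useful. A sketch, assuming a hypothetical `parse_plan` helper on the caller's side:

```python
import json


def parse_plan(response: str) -> dict:
    """Parse the JSON plan, stripping markdown fences the model may add anyway."""
    cleaned = response.strip()
    if cleaned.startswith("```"):
        # Drop the opening fence (with optional "json" tag) and the closing fence
        cleaned = cleaned.split("```")[1].removeprefix("json").strip()
    plan = json.loads(cleaned)
    missing = {"learning_objectives", "visual_metaphors", "scene_flow"} - plan.keys()
    if missing:
        raise ValueError(f"Plan is missing keys: {missing}")
    return plan
```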
manim_mcp/tools/quiz.py ADDED
@@ -0,0 +1,102 @@
1
+ """
2
+ Quiz Tools for Manim MCP Server
3
+
4
+ This module provides tools for generating educational quiz questions based on STEM concepts.
5
+ """
6
+
7
+ import logging
8
+ from typing import Any, Dict, Optional
9
+
10
+ from mcp.types import CallToolResult, TextContent
11
+
12
+ from utils.hf_wrapper import HFInferenceWrapper, ModelConfig
13
+
14
+ logger = logging.getLogger(__name__)
15
+
16
+
17
+ async def generate_quiz(
18
+ hf_wrapper: HFInferenceWrapper, arguments: Dict[str, Any]
19
+ ) -> CallToolResult:
20
+ """
21
+ Generate quiz questions for a STEM concept.
22
+
23
+ Uses a text LLM to create educational quiz questions that assess
24
+ understanding of the animation concept. Questions can be multiple choice,
25
+ true/false, or short answer format.
26
+
27
+ Args:
28
+ hf_wrapper: HuggingFace inference wrapper instance
29
+ arguments: Dictionary containing:
30
+ - concept (str): The STEM concept to create quiz questions for
31
+ - difficulty (str): Difficulty level (easy, medium, hard)
32
+ - num_questions (int): Number of questions to generate
33
+ - question_types (list, optional): Types of questions (default: ["multiple_choice"])
34
+ - model (str, optional): Hugging Face model to use
35
+
36
+ Returns:
37
+ CallToolResult with the generated quiz questions in JSON format
38
+ """
39
+ concept = arguments["concept"]
40
+ difficulty = arguments["difficulty"]
41
+ num_questions = arguments["num_questions"]
42
+ question_types = arguments.get("question_types", ["multiple_choice"])
43
+ model = arguments.get("model")
44
+
45
+ try:
46
+ model_config = ModelConfig()
47
+ selected_model = model or model_config.text_models[0]
48
+
49
+ prompt = f"""
50
+ Generate {num_questions} quiz questions for the following STEM concept:
51
+
52
+ Concept: {concept}
53
+ Difficulty: {difficulty}
54
+ Question Types: {", ".join(question_types)}
55
+
56
+ For each question provide:
57
+ 1. The question
58
+ 2. Possible answers (for multiple choice)
59
+ 3. Correct answer
60
+ 4. Brief explanation
61
+
62
+ Format as JSON array of question objects with this structure:
63
+ [
64
+ {{
65
+ "question": "question text",
66
+ "options": ["A", "B", "C", "D"],
67
+ "correct_answer": "A",
68
+ "explanation": "why this is correct"
69
+ }}
70
+ ]
71
+
72
+ Return only valid JSON without markdown formatting.
73
+ """
74
+
75
+ response = await hf_wrapper.text_generation(
76
+ model=selected_model,
77
+ prompt=prompt,
78
+ max_new_tokens=1024,
79
+ temperature=0.5,
80
+ )
81
+
82
+ logger.info(
83
+ f"Successfully generated {num_questions} quiz questions for concept: {concept}"
84
+ )
85
+
86
+ return CallToolResult(
87
+ content=[
88
+ TextContent(
89
+ type="text",
90
+ text=f"Generated Quiz Questions:\n\n{response}",
91
+ )
92
+ ]
93
+ )
94
+
95
+ except Exception as e:
96
+ logger.error(f"Quiz generation failed: {str(e)}")
97
+ return CallToolResult(
98
+ content=[
99
+ TextContent(type="text", text=f"Quiz generation failed: {str(e)}")
100
+ ],
101
+ isError=True,
102
+ )
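The prompt pins the output shape; for reference, shown as a Python literal (question content is illustrative):

```python
# The structure generate_quiz asks the model to return, as a Python literal:
expected_quiz = [
    {
        "question": "In a right triangle, a^2 + b^2 equals what?",
        "options": ["c^2", "2c", "a*b", "c"],
        "correct_answer": "c^2",
        "explanation": "The Pythagorean Theorem relates the legs to the hypotenuse.",
    }
]
```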
manim_mcp/tools/rendering.py ADDED
@@ -0,0 +1,237 @@
1
+ """
2
+ Rendering Tools for Manim MCP Server
3
+
4
+ This module provides tools for writing and rendering Manim animations.
5
+ """
6
+
7
+ import asyncio
8
+ import glob
9
+ import json
10
+ import logging
11
+ import os
12
+ import shutil
13
+ from pathlib import Path
14
+ from typing import Any, Dict
15
+
16
+ from mcp.types import CallToolResult, TextContent
17
+
18
+ logger = logging.getLogger(__name__)
19
+
20
+
21
+ async def write_manim_file(arguments: Dict[str, Any]) -> CallToolResult:
22
+ """
23
+ Write a Manim Python file to the filesystem.
24
+
25
+ Takes Manim code and writes it to a specified file path, creating
26
+ directories as needed.
27
+
28
+ Args:
29
+ arguments: Dictionary containing:
30
+ - filepath (str): Path where the Manim file should be written
31
+ - code (str): Manim Python code to write
32
+
33
+ Returns:
34
+ CallToolResult indicating success or failure
35
+ """
36
+ filepath = arguments["filepath"]
37
+ code = arguments["code"]
38
+
39
+ try:
40
+ # Ensure directory exists
41
+ Path(filepath).parent.mkdir(parents=True, exist_ok=True)
42
+
43
+ # Write the file
44
+ with open(filepath, "w") as f:
45
+ f.write(code)
46
+
47
+ logger.info(f"Successfully wrote Manim file to: {filepath}")
48
+
49
+ return CallToolResult(
50
+ content=[
51
+ TextContent(
52
+ type="text", text=f"Successfully wrote Manim file to {filepath}"
53
+ )
54
+ ]
55
+ )
56
+
57
+ except Exception as e:
58
+ logger.error(f"Failed to write file: {str(e)}")
59
+ return CallToolResult(
60
+ content=[TextContent(type="text", text=f"Failed to write file: {str(e)}")],
61
+ isError=True,
62
+ )
63
+
64
+
65
+ async def render_manim_animation(arguments: Dict[str, Any]) -> CallToolResult:
66
+ """
67
+ Render a Manim animation using local Manim installation.
68
+
69
+ Executes the Manim CLI to render an animation scene from a Python file.
70
+ Uses the project's .venv if available, otherwise falls back to system Manim.
71
+
72
+ Args:
73
+ arguments: Dictionary containing:
74
+ - scene_name (str): Name of the Manim scene class to render
75
+ - file_path (str): Path to the Manim Python file
76
+ - output_dir (str): Directory to save the output animation
77
+ - quality (str, optional): Rendering quality (low, medium, high, production_quality)
78
+ - format (str, optional): Output format (mp4, gif, png)
79
+ - frame_rate (int, optional): Frame rate (default: 30)
80
+
81
+ Returns:
82
+ CallToolResult with rendering status and output file location
83
+ """
84
+ scene_name = arguments["scene_name"]
85
+ file_path = arguments["file_path"]
86
+ output_dir = arguments["output_dir"]
87
+ quality = arguments.get("quality", "medium")
88
+ format_type = arguments.get("format", "mp4")
89
+ frame_rate = arguments.get("frame_rate", 30)
90
+
91
+ try:
92
+ # Ensure output directory exists
93
+ Path(output_dir).mkdir(parents=True, exist_ok=True)
94
+
95
+ # Map quality to manim flags
96
+ quality_flags = {
97
+ "low": "-ql",
98
+ "medium": "-qm",
99
+ "high": "-qh",
100
+ "production_quality": "-qp",
101
+ }
102
+ quality_flag = quality_flags.get(quality, "-qm")
103
+
104
+ # Find the project root and .venv
105
+ project_root = Path(__file__).resolve().parent.parent.parent
106
+ venv_python = project_root / ".venv" / "bin" / "python"
107
+ venv_manim = project_root / ".venv" / "bin" / "manim"
108
+
109
+ # Use venv manim if it exists, otherwise fall back to system manim
110
+ if venv_manim.exists():
111
+ manim_cmd = str(venv_manim)
112
+ logger.info(f"Using .venv manim at: {manim_cmd}")
113
+ else:
114
+ manim_cmd = "manim"
115
+ logger.warning(f".venv manim not found at {venv_manim}, using system manim")
116
+
117
+ # Build the manim command
118
+ cmd = [
119
+ manim_cmd,
120
+ quality_flag,
121
+ "--fps",
122
+ str(frame_rate),
123
+ "-o",
124
+ f"{scene_name}.{format_type}",
125
+ file_path,
126
+ scene_name,
127
+ ]
128
+
129
+ logger.info(f"Running Manim command: {' '.join(cmd)}")
130
+
131
+ # Execute the command with .venv in PATH
132
+ env = os.environ.copy()
133
+ if venv_manim.exists():
134
+ venv_bin = project_root / ".venv" / "bin"
135
+ env["PATH"] = f"{venv_bin}:{env.get('PATH', '')}"
136
+ env["VIRTUAL_ENV"] = str(project_root / ".venv")
137
+
138
+ # Execute the command
139
+ process = await asyncio.create_subprocess_exec(
140
+ *cmd,
141
+ stdout=asyncio.subprocess.PIPE,
142
+ stderr=asyncio.subprocess.PIPE,
143
+ cwd=output_dir,
144
+ env=env,
145
+ )
146
+
147
+ stdout, stderr = await process.communicate()
148
+
149
+ if process.returncode != 0:
150
+ error_msg = f"Manim rendering failed:\nSTDOUT: {stdout.decode()}\nSTDERR: {stderr.decode()}"
151
+ logger.error(error_msg)
152
+ return CallToolResult(
153
+ content=[TextContent(type="text", text=error_msg)], isError=True
154
+ )
155
+
156
+ # Log output for debugging
157
+ logger.info(f"Manim stdout: {stdout.decode()}")
158
+ if stderr:
159
+ logger.info(f"Manim stderr: {stderr.decode()}")
160
+
161
+ # Find the output file
162
+ # Manim outputs to paths like: media/videos/{filename}/{resolution}/SceneName.mp4
163
+ quality_to_resolution = {
164
+ "low": ["480p15", "854x480", "480p"],
165
+ "medium": ["720p30", "1280x720", "720p"],
166
+ "high": ["1080p60", "1920x1080", "1080p"],
167
+ "production_quality": ["2160p60", "3840x2160", "2160p"],
168
+ }
169
+
170
+ resolutions = quality_to_resolution.get(quality, ["720p30"])
171
+
172
+ # Build search patterns
173
+ output_patterns = []
174
+ for res in resolutions:
175
+ output_patterns.extend(
176
+ [
177
+ f"{output_dir}/media/videos/*/{res}/{scene_name}.{format_type}",
178
+ f"{output_dir}/media/videos/**/{res}/{scene_name}.{format_type}",
179
+ ]
180
+ )
181
+
182
+ # Fallback patterns
183
+ output_patterns.extend(
184
+ [
185
+ f"{output_dir}/media/videos/*/*/{scene_name}.{format_type}",
186
+ f"{output_dir}/media/videos/**/{scene_name}.{format_type}",
187
+ f"{output_dir}/**/{scene_name}.{format_type}",
188
+ f"{output_dir}/{scene_name}.{format_type}",
189
+ ]
190
+ )
191
+
192
+ # Search for output file
193
+ output_files = []
194
+ for pattern in output_patterns:
195
+ matches = glob.glob(pattern, recursive=True)
196
+ if matches:
197
+ logger.info(f"Found output files: {matches}")
198
+ output_files.extend(matches)
199
+ break
200
+
201
+ if not output_files:
202
+ error_msg = f"Could not find rendered output file.\nSearched in: {output_dir}\nStdout: {stdout.decode()}"
203
+ logger.error(error_msg)
204
+ return CallToolResult(
205
+ content=[TextContent(type="text", text=error_msg)], isError=True
206
+ )
207
+
208
+ # Move output to expected location
209
+ output_file = output_files[0]
210
+ final_output = Path(output_dir) / f"{scene_name}.{format_type}"
211
+
212
+ shutil.move(output_file, final_output)
213
+
214
+ # Build success message
215
+ file_size = final_output.stat().st_size if final_output.exists() else 0
216
+ result_msg = (
217
+ f"Successfully rendered animation!\n"
218
+ f"Scene: {scene_name}\n"
219
+ f"Output: {final_output}\n"
220
+ f"Quality: {quality}\n"
221
+ f"Format: {format_type}\n"
222
+ f"Size: {file_size} bytes"
223
+ )
224
+
225
+ logger.info(result_msg)
226
+
227
+ return CallToolResult(content=[TextContent(type="text", text=result_msg)])
228
+
229
+ except Exception as e:
230
+ import traceback
231
+
232
+ error_details = traceback.format_exc()
233
+ error_msg = f"Error during rendering: {str(e)}\nDetails: {error_details}"
234
+ logger.error(error_msg)
235
+ return CallToolResult(
236
+ content=[TextContent(type="text", text=error_msg)], isError=True
237
+ )
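For `quality="medium"` at 30 fps, the command assembled above reduces to the following sketch, which also explains the glob patterns: Manim CE writes output under `media/videos/<script stem>/<resolution>/` relative to the working directory.

```python
# Equivalent CLI invocation for quality="medium", frame_rate=30:
#   manim -qm --fps 30 -o GenScene.mp4 scene.py GenScene
# Manim CE then writes media/videos/scene/720p30/GenScene.mp4 under the cwd,
# which is why the search patterns above look in media/videos/*/720p30/ first.
cmd = ["manim", "-qm", "--fps", "30", "-o", "GenScene.mp4", "scene.py", "GenScene"]
```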
manim_mcp/tools/video.py ADDED
@@ -0,0 +1,221 @@
1
+ """
2
+ Video Processing Tools for Manim MCP Server
3
+
4
+ This module provides tools for video processing, merging, and file management using FFmpeg.
5
+ """
6
+
7
+ import asyncio
8
+ import json
9
+ import logging
10
+ from pathlib import Path
11
+ from typing import Any, Dict
12
+
13
+ from mcp.types import CallToolResult, TextContent
14
+
15
+ logger = logging.getLogger(__name__)
16
+
17
+
18
+ async def process_video_with_ffmpeg(arguments: Dict[str, Any]) -> CallToolResult:
19
+ """
20
+ Process video files using FFmpeg.
21
+
22
+ Provides flexible video processing capabilities including conversion,
23
+ filtering, and combining multiple inputs.
24
+
25
+ Args:
26
+ arguments: Dictionary containing:
27
+ - input_files (list): List of input video/audio file paths
28
+ - output_file (str): Output file path
29
+ - ffmpeg_args (list, optional): Additional FFmpeg command-line arguments
30
+
31
+ Returns:
32
+ CallToolResult indicating success or failure
33
+ """
34
+ input_files = arguments["input_files"]
35
+ output_file = arguments["output_file"]
36
+ ffmpeg_args = arguments.get("ffmpeg_args", [])
37
+
38
+ try:
39
+ # Ensure output directory exists
40
+ Path(output_file).parent.mkdir(parents=True, exist_ok=True)
41
+
42
+ # Build FFmpeg command
43
+ cmd = ["ffmpeg"]
44
+
45
+ # Add input files
46
+ for input_file in input_files:
47
+ cmd.extend(["-i", input_file])
48
+
49
+ # Add additional arguments
50
+ cmd.extend(ffmpeg_args)
51
+
52
+ # Add output file
53
+ cmd.append(output_file)
54
+
55
+ logger.info(f"Running FFmpeg command: {' '.join(cmd)}")
56
+
57
+ # Execute FFmpeg
58
+ process = await asyncio.create_subprocess_exec(
59
+ *cmd,
60
+ stdout=asyncio.subprocess.PIPE,
61
+ stderr=asyncio.subprocess.PIPE,
62
+ )
63
+
64
+ stdout, stderr = await process.communicate()
65
+
66
+ if process.returncode != 0:
67
+ error_msg = f"FFmpeg processing failed:\n{stderr.decode()}"
68
+ logger.error(error_msg)
69
+ return CallToolResult(
70
+ content=[TextContent(type="text", text=error_msg)],
71
+ isError=True,
72
+ )
73
+
74
+ result_msg = f"Successfully processed video with FFmpeg: {output_file}"
75
+ logger.info(result_msg)
76
+
77
+ return CallToolResult(content=[TextContent(type="text", text=result_msg)])
78
+
79
+ except Exception as e:
80
+ error_msg = f"Error during FFmpeg processing: {str(e)}"
81
+ logger.error(error_msg)
82
+ return CallToolResult(
83
+ content=[TextContent(type="text", text=error_msg)],
84
+ isError=True,
85
+ )
86
+
87
+
88
+ async def merge_video_audio(arguments: Dict[str, Any]) -> CallToolResult:
89
+ """
90
+ Merge video and audio files into a single output file.
91
+
92
+ Combines a video file with an audio file using FFmpeg. The video stream
93
+ is copied without re-encoding, while the audio is encoded to AAC.
94
+ The output duration matches the shorter of the two inputs.
95
+
96
+ Args:
97
+ arguments: Dictionary containing:
98
+ - video_file (str): Path to the input video file
99
+ - audio_file (str): Path to the input audio file
100
+ - output_file (str): Path to the output merged file
101
+
102
+ Returns:
103
+ CallToolResult indicating success or failure
104
+ """
105
+ video_file = arguments["video_file"]
106
+ audio_file = arguments["audio_file"]
107
+ output_file = arguments["output_file"]
108
+
109
+ try:
110
+ # Ensure output directory exists
111
+ Path(output_file).parent.mkdir(parents=True, exist_ok=True)
112
+
113
+ # Build FFmpeg merge command
114
+ cmd = [
115
+ "ffmpeg",
116
+ "-i",
117
+ video_file,
118
+ "-i",
119
+ audio_file,
120
+ "-c:v",
121
+ "copy", # Copy video stream without re-encoding
122
+ "-c:a",
123
+ "aac", # Encode audio to AAC
124
+ "-shortest", # Match duration of shortest input
125
+ "-y", # Overwrite output file if it exists
126
+ output_file,
127
+ ]
128
+
129
+ logger.info(f"Merging video and audio: {' '.join(cmd)}")
130
+
131
+ # Execute FFmpeg
132
+ process = await asyncio.create_subprocess_exec(
133
+ *cmd,
134
+ stdout=asyncio.subprocess.PIPE,
135
+ stderr=asyncio.subprocess.PIPE,
136
+ )
137
+
138
+ stdout, stderr = await process.communicate()
139
+
140
+ if process.returncode != 0:
141
+ error_msg = f"Video/audio merge failed:\n{stderr.decode()}"
142
+ logger.error(error_msg)
143
+ return CallToolResult(
144
+ content=[TextContent(type="text", text=error_msg)],
145
+ isError=True,
146
+ )
147
+
148
+ result_msg = f"Successfully merged video and audio: {output_file}"
149
+ logger.info(result_msg)
150
+
151
+ return CallToolResult(content=[TextContent(type="text", text=result_msg)])
152
+
153
+ except Exception as e:
154
+ error_msg = f"Error during video/audio merge: {str(e)}"
155
+ logger.error(error_msg)
156
+ return CallToolResult(
157
+ content=[TextContent(type="text", text=error_msg)],
158
+ isError=True,
159
+ )
160
+
161
+
162
+ async def check_file_exists(arguments: Dict[str, Any]) -> CallToolResult:
163
+ """
164
+ Check if a file exists and return its metadata.
165
+
166
+ Provides information about file existence, type, size, and timestamps.
167
+ Useful for verifying outputs before processing or debugging file issues.
168
+
169
+ Args:
170
+ arguments: Dictionary containing:
171
+ - filepath (str): Path to the file to check
172
+
173
+ Returns:
174
+ CallToolResult with file metadata or error if file doesn't exist
175
+ """
176
+ filepath = arguments["filepath"]
177
+
178
+ try:
179
+ path = Path(filepath)
180
+
181
+ if not path.exists():
182
+ return CallToolResult(
183
+ content=[
184
+ TextContent(
185
+ type="text",
186
+ text=f"File does not exist: {filepath}",
187
+ )
188
+ ],
189
+ isError=True,
190
+ )
191
+
192
+ stat = path.stat()
193
+
194
+ metadata = {
195
+ "filepath": str(path.absolute()),
196
+ "exists": True,
197
+ "is_file": path.is_file(),
198
+ "is_directory": path.is_dir(),
199
+ "size_bytes": stat.st_size,
200
+ "created": stat.st_ctime,
201
+ "modified": stat.st_mtime,
202
+ }
203
+
204
+ logger.info(f"File exists: {filepath} ({stat.st_size} bytes)")
205
+
206
+ return CallToolResult(
207
+ content=[
208
+ TextContent(
209
+ type="text",
210
+ text=f"File metadata:\n{json.dumps(metadata, indent=2)}",
211
+ )
212
+ ]
213
+ )
214
+
215
+ except Exception as e:
216
+ error_msg = f"Error checking file: {str(e)}"
217
+ logger.error(error_msg)
218
+ return CallToolResult(
219
+ content=[TextContent(type="text", text=error_msg)],
220
+ isError=True,
221
+ )
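`merge_video_audio` builds the equivalent of a single FFmpeg invocation; a sketch with illustrative file names:

```python
# Equivalent shell command built by merge_video_audio:
#   ffmpeg -i GenScene.mp4 -i narration.mp3 -c:v copy -c:a aac -shortest -y final.mp4
# -c:v copy skips re-encoding the Manim render; -shortest trims the output so a
# long narration does not leave the last video frame frozen on screen.
arguments = {
    "video_file": "outputs/GenScene.mp4",
    "audio_file": "outputs/narration.mp3",
    "output_file": "outputs/final.mp4",
}
```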
manim_mcp/tools/vision.py ADDED
@@ -0,0 +1,88 @@
1
+ """
2
+ Vision Tools for Manim MCP Server
3
+
4
+ This module provides tools for analyzing animation frames using vision models.
5
+ """
6
+
7
+ import logging
8
+ from typing import Any, Dict, Optional
9
+
10
+ from mcp.types import CallToolResult, TextContent
11
+
12
+ from utils.hf_wrapper import HFInferenceWrapper, ModelConfig
13
+
14
+ logger = logging.getLogger(__name__)
15
+
16
+
17
+ async def analyze_frame(
18
+ hf_wrapper: HFInferenceWrapper, arguments: Dict[str, Any]
19
+ ) -> CallToolResult:
20
+ """
21
+ Analyze an animation frame using vision-language models.
22
+
23
+ Uses a vision model to provide feedback on:
24
+ - Visual clarity and composition
25
+ - Educational effectiveness
26
+ - Technical quality
27
+ - Suggestions for improvement
28
+
29
+ Args:
30
+ hf_wrapper: HuggingFace inference wrapper instance
31
+ arguments: Dictionary containing:
32
+ - image_path (str): Path to the image file to analyze
33
+ - analysis_type (str): Type of analysis (e.g., "quality", "educational_value", "clarity")
34
+ - context (str, optional): Additional context about the animation
35
+ - model (str, optional): Hugging Face vision model to use
36
+
37
+ Returns:
38
+ CallToolResult with the frame analysis feedback
39
+ """
40
+ image_path = arguments["image_path"]
41
+ analysis_type = arguments["analysis_type"]
42
+ context = arguments.get("context", "")
43
+ model = arguments.get("model")
44
+
45
+ try:
46
+ model_config = ModelConfig()
47
+ selected_model = model or model_config.vision_models[0]
48
+
49
+ # Read the image file
50
+ with open(image_path, "rb") as f:
51
+ image_bytes = f.read()
52
+
53
+ # Build analysis prompt
54
+ prompt = f"""
55
+ Analyze this {analysis_type} for an educational animation frame.
56
+ Context: {context}
57
+
58
+ Provide specific feedback on:
59
+ - {analysis_type.replace("_", " ").title()} assessment
60
+ - Educational effectiveness
61
+ - Visual clarity
62
+ - Suggestions for improvement
63
+ """
64
+
65
+ # Call vision model
66
+ response = await hf_wrapper.vision_analysis(
67
+ model=selected_model,
68
+ image=image_bytes,
69
+ text=prompt,
70
+ )
71
+
72
+ logger.info(f"Successfully analyzed frame: {image_path} ({analysis_type})")
73
+
74
+ return CallToolResult(
75
+ content=[
76
+ TextContent(
77
+ type="text",
78
+ text=f"Frame Analysis ({analysis_type}):\n\n{response}",
79
+ )
80
+ ]
81
+ )
82
+
83
+ except Exception as e:
84
+ logger.error(f"Frame analysis failed: {str(e)}")
85
+ return CallToolResult(
86
+ content=[TextContent(type="text", text=f"Frame analysis failed: {str(e)}")],
87
+ isError=True,
88
+ )
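A sample `analyze_frame` payload (the frame path is hypothetical):

```python
# Example arguments for analyze_frame; the tool reads the image bytes and the
# generated prompt, then returns free-form feedback from the vision model.
arguments = {
    "image_path": "outputs/frames/GenScene_0042.png",  # hypothetical path
    "analysis_type": "educational_value",
    "context": "Frame from a middle-school animation of the Pythagorean Theorem",
}
```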
mcp_servers/__init__.py ADDED
@@ -0,0 +1,12 @@
1
+ """
2
+ MCP Servers for NeuroAnim.
3
+
4
+ This package contains the MCP servers that provide different capabilities:
5
+ - renderer.py: Animation rendering using Manim and FFmpeg
6
+ - creative.py: Creative tasks using Hugging Face models
7
+ """
8
+
9
+ from . import renderer
10
+ from . import creative
11
+
12
+ __all__ = ["renderer", "creative"]
mcp_servers/creative.py ADDED
@@ -0,0 +1,803 @@
1
+ """
2
+ Creative MCP Server
3
+
4
+ This MCP server provides tools for creative tasks using Hugging Face models:
5
+ - Concept Planning (Text LLM)
6
+ - Code Generation (Coder LLM)
7
+ - Vision Analysis (Vision-Language LLM)
8
+ - Text-to-Speech (Audio model)
9
+ """
10
+
11
+ import asyncio
12
+ import base64
13
+ import json
14
+ import logging
15
+ import os
16
+ import sys
17
+ import tempfile
18
+ from pathlib import Path
19
+ from typing import Any, Dict, List, Optional
20
+
21
+ # Ensure project root (which contains the `utils` package) is on sys.path
22
+ PROJECT_ROOT = Path(__file__).resolve().parent.parent
23
+ if str(PROJECT_ROOT) not in sys.path:
24
+ sys.path.insert(0, str(PROJECT_ROOT))
25
+
26
+ from mcp.server import NotificationOptions, Server
27
+ from mcp.server.models import InitializationOptions
28
+ from mcp.server.stdio import stdio_server
29
+ from mcp.types import (
30
+ CallToolResult,
31
+ ListToolsResult,
32
+ TextContent,
33
+ Tool,
34
+ )
35
+
36
+ from utils.hf_wrapper import HFInferenceWrapper, ModelConfig, get_hf_wrapper
37
+
38
+ logger = logging.getLogger(__name__)
39
+
40
+ # Create MCP server
41
+ server = Server("neuroanim-creative")
42
+
43
+ # Global HF wrapper instance
44
+ hf_wrapper: Optional[HFInferenceWrapper] = None
45
+
46
+
47
+ class CreativeTool:
48
+ """Base class for creative tools."""
49
+
50
+ @staticmethod
51
+ def get_hf_wrapper() -> HFInferenceWrapper:
52
+ """Get or create the HF wrapper instance."""
53
+ global hf_wrapper
54
+ if hf_wrapper is None:
55
+ api_key = os.getenv("HUGGINGFACE_API_KEY")
56
+ hf_wrapper = get_hf_wrapper(api_key=api_key)
57
+ return hf_wrapper
58
+
59
+
60
+ @server.list_tools()
61
+ async def list_tools() -> ListToolsResult:
62
+ """List available creative tools."""
63
+ tools = [
64
+ Tool(
65
+ name="plan_concept",
66
+ description="Plan a STEM concept for animation using text LLM",
67
+ inputSchema={
68
+ "type": "object",
69
+ "properties": {
70
+ "topic": {
71
+ "type": "string",
72
+ "description": "The STEM topic to create an animation for",
73
+ },
74
+ "target_audience": {
75
+ "type": "string",
76
+ "enum": [
77
+ "elementary",
78
+ "middle_school",
79
+ "high_school",
80
+ "college",
81
+ "general",
82
+ ],
83
+ "description": "Target audience level",
84
+ },
85
+ "animation_length_minutes": {
86
+ "type": "number",
87
+ "description": "Desired animation length in minutes",
88
+ },
89
+ "model": {
90
+ "type": "string",
91
+ "description": "Hugging Face model to use (optional, will use default if not provided)",
92
+ },
93
+ },
94
+ "required": ["topic", "target_audience"],
95
+ },
96
+ ),
97
+ Tool(
98
+ name="generate_manim_code",
99
+ description="Generate Manim Python code for an animation concept",
100
+ inputSchema={
101
+ "type": "object",
102
+ "properties": {
103
+ "concept": {
104
+ "type": "string",
105
+ "description": "The animation concept description",
106
+ },
107
+ "scene_description": {
108
+ "type": "string",
109
+ "description": "Detailed description of what should happen in the scene",
110
+ },
111
+ "visual_elements": {
112
+ "type": "array",
113
+ "items": {"type": "string"},
114
+ "description": "List of visual elements to include",
115
+ },
116
+ "model": {
117
+ "type": "string",
118
+ "description": "Code model to use (optional, will use default if not provided)",
119
+ },
120
+ },
121
+ "required": ["concept", "scene_description"],
122
+ },
123
+ ),
124
+ Tool(
125
+ name="analyze_frame",
126
+ description="Analyze an animation frame using vision model for quality assessment",
127
+ inputSchema={
128
+ "type": "object",
129
+ "properties": {
130
+ "image_path": {
131
+ "type": "string",
132
+ "description": "Path to the image file to analyze",
133
+ },
134
+ "analysis_type": {
135
+ "type": "string",
136
+ "enum": [
137
+ "quality",
138
+ "content",
139
+ "educational_value",
140
+ "clarity",
141
+ ],
142
+ "description": "Type of analysis to perform",
143
+ },
144
+ "context": {
145
+ "type": "string",
146
+ "description": "Context about what should be in the image",
147
+ },
148
+ "model": {
149
+ "type": "string",
150
+ "description": "Vision model to use (optional, will use default if not provided)",
151
+ },
152
+ },
153
+ "required": ["image_path", "analysis_type"],
154
+ },
155
+ ),
156
+ Tool(
157
+ name="generate_narration",
158
+ description="Generate narration script for an animation",
159
+ inputSchema={
160
+ "type": "object",
161
+ "properties": {
162
+ "concept": {
163
+ "type": "string",
164
+ "description": "The animation concept",
165
+ },
166
+ "scene_description": {
167
+ "type": "string",
168
+ "description": "Description of the scene to narrate",
169
+ },
170
+ "target_audience": {
171
+ "type": "string",
172
+ "enum": [
173
+ "elementary",
174
+ "middle_school",
175
+ "high_school",
176
+ "college",
177
+ "general",
178
+ ],
179
+ "description": "Target audience",
180
+ },
181
+ "duration_seconds": {
182
+ "type": "number",
183
+ "description": "Desired narration duration in seconds",
184
+ },
185
+ "model": {
186
+ "type": "string",
187
+ "description": "Text model to use (optional, will use default if not provided)",
188
+ },
189
+ },
190
+ "required": ["concept", "scene_description", "target_audience"],
191
+ },
192
+ ),
193
+ Tool(
194
+ name="generate_speech",
195
+ description="Convert text narration to speech audio",
196
+ inputSchema={
197
+ "type": "object",
198
+ "properties": {
199
+ "text": {
200
+ "type": "string",
201
+ "description": "Text to convert to speech",
202
+ },
203
+ "voice": {
204
+ "type": "string",
205
+ "description": "Voice preference (optional)",
206
+ },
207
+ "output_path": {
208
+ "type": "string",
209
+ "description": "Path to save the audio file",
210
+ },
211
+ "model": {
212
+ "type": "string",
213
+ "description": "TTS model to use (optional, will use default if not provided)",
214
+ },
215
+ },
216
+ "required": ["text", "output_path"],
217
+ },
218
+ ),
219
+ Tool(
220
+ name="refine_animation",
221
+ description="Refine and improve animation based on feedback",
222
+ inputSchema={
223
+ "type": "object",
224
+ "properties": {
225
+ "original_code": {
226
+ "type": "string",
227
+ "description": "Original Manim code",
228
+ },
229
+ "feedback": {
230
+ "type": "string",
231
+ "description": "Feedback or issues to address",
232
+ },
233
+ "improvement_goals": {
234
+ "type": "array",
235
+ "items": {"type": "string"},
236
+ "description": "List of improvement goals",
237
+ },
238
+ "model": {
239
+ "type": "string",
240
+ "description": "Code model to use (optional, will use default if not provided)",
241
+ },
242
+ },
243
+ "required": ["original_code", "feedback"],
244
+ },
245
+ ),
246
+ Tool(
247
+ name="generate_quiz",
248
+ description="Generate quiz questions based on animation content",
249
+ inputSchema={
250
+ "type": "object",
251
+ "properties": {
252
+ "concept": {
253
+ "type": "string",
254
+ "description": "The STEM concept covered in the animation",
255
+ },
256
+ "difficulty": {
257
+ "type": "string",
258
+ "enum": ["easy", "medium", "hard"],
259
+ "description": "Quiz difficulty level",
260
+ },
261
+ "num_questions": {
262
+ "type": "number",
263
+ "description": "Number of questions to generate",
264
+ },
265
+ "question_types": {
266
+ "type": "array",
267
+ "items": {
268
+ "type": "string",
269
+ "enum": ["multiple_choice", "true_false", "short_answer"],
270
+ },
271
+ "description": "Types of questions to include",
272
+ },
273
+ "model": {
274
+ "type": "string",
275
+ "description": "Text model to use (optional, will use default if not provided)",
276
+ },
277
+ },
278
+ "required": ["concept", "difficulty", "num_questions"],
279
+ },
280
+ ),
281
+ ]
282
+
283
+ return ListToolsResult(tools=tools)
284
+
285
+
286
+ @server.call_tool()
287
+ async def call_tool(tool_name: str, arguments: Dict[str, Any]) -> CallToolResult:
288
+ """Dispatch creative tool calls.
289
+
290
+ The low-level MCP server passes `(tool_name, arguments)` into this
291
+ handler, so we accept two positional arguments rather than a
292
+ `CallToolRequest` instance.
293
+ """
294
+
295
+ try:
296
+ if tool_name == "plan_concept":
297
+ return await plan_concept(arguments)
298
+ elif tool_name == "generate_manim_code":
299
+ return await generate_manim_code(arguments)
300
+ elif tool_name == "analyze_frame":
301
+ return await analyze_frame(arguments)
302
+ elif tool_name == "generate_narration":
303
+ return await generate_narration(arguments)
304
+ elif tool_name == "generate_speech":
305
+ return await generate_speech(arguments)
306
+ elif tool_name == "refine_animation":
307
+ return await refine_animation(arguments)
308
+ elif tool_name == "generate_quiz":
309
+ return await generate_quiz(arguments)
310
+ else:
311
+ return CallToolResult(
312
+ content=[TextContent(type="text", text=f"Unknown tool: {tool_name}")],
313
+ isError=True,
314
+ )
315
+ except Exception as e:
316
+ logger.error(f"Error in tool {tool_name}: {e}")
317
+ return CallToolResult(
318
+ content=[TextContent(type="text", text=f"Error: {str(e)}")],
319
+ isError=True,
320
+ )
321
+
322
+
323
+ async def plan_concept(arguments: Dict[str, Any]) -> CallToolResult:
324
+ """Plan a STEM concept for animation."""
325
+ topic = arguments["topic"]
326
+ target_audience = arguments["target_audience"]
327
+ animation_length = arguments.get("animation_length_minutes", 2.0)
328
+ model = arguments.get("model")
329
+
330
+ try:
331
+ wrapper = CreativeTool.get_hf_wrapper()
332
+ model_config = ModelConfig()
333
+ selected_model = model or model_config.text_models[0]
334
+
335
+ prompt = f"""
336
+ You are a STEM Curriculum Designer. Create a structured animation plan.
337
+
338
+ Topic: {topic}
339
+ Audience: {target_audience}
340
+ Length: {animation_length} min
341
+
342
+ Return a valid JSON object with exactly these keys:
343
+ {{
344
+ "learning_objectives": ["string", "string"],
345
+ "visual_metaphors": ["string", "string"],
346
+ "scene_flow": [
347
+ {{
348
+ "timestamp": "0:00-0:30",
349
+ "action": "description of visual action",
350
+ "voiceover": "key narration points"
351
+ }}
352
+ ],
353
+ "estimated_educational_value": "string"
354
+ }}
355
+
356
+ Do not include markdown formatting like ```json. Return raw JSON only.
357
+ """
358
+
359
+ response = await wrapper.text_generation(
360
+ model=selected_model,
361
+ prompt=prompt,
362
+ max_new_tokens=1024,
363
+ temperature=0.7,
364
+ )
365
+
366
+ return CallToolResult(
367
+ content=[
368
+ TextContent(
369
+ type="text",
370
+ text=f"Animation Concept Plan:\n\n{response}",
371
+ )
372
+ ]
373
+ )
374
+
375
+ except Exception as e:
376
+ return CallToolResult(
377
+ content=[
378
+ TextContent(
379
+ type="text",
380
+ text=f"Concept planning failed: {str(e)}",
381
+ )
382
+ ],
383
+ isError=True,
384
+ )
385
+
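+ # Parsing sketch for callers (an assumption, not part of this server): models
+ # sometimes add code fences despite the instruction, so strip them first:
+ # plan = json.loads(response.strip().removeprefix("```json").removesuffix("```"))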
386
+
387
+ async def generate_manim_code(arguments: Dict[str, Any]) -> CallToolResult:
388
+ """Generate Manim Python code."""
389
+ concept = arguments["concept"]
390
+ scene_description = arguments["scene_description"]
391
+ visual_elements = arguments.get("visual_elements", [])
392
+ model = arguments.get("model")
393
+ previous_code = arguments.get("previous_code")
394
+ error_message = arguments.get("error_message")
395
+
396
+ try:
397
+ wrapper = CreativeTool.get_hf_wrapper()
398
+ model_config = ModelConfig()
399
+ selected_model = model or model_config.code_models[0]
400
+
401
+ # Build base prompt
402
+ if previous_code and error_message:
403
+ # This is a retry - include error feedback
404
+ prompt = f"""
405
+ You are an expert animation engineer using Manim Community Edition (v0.18.0+).
406
+
407
+ The previous code attempt had an error. Your task is to FIX the code.
408
+
409
+ PREVIOUS CODE:
410
+ ```python
411
+ {previous_code}
412
+ ```
413
+
414
+ ERROR ENCOUNTERED:
415
+ {error_message}
416
+
417
+ TASK: Fix the error in the code above. Pay special attention to:
418
+ - Closing all parentheses, brackets, and braces
419
+ - Completing all function calls
420
+ - Proper indentation
421
+ - Valid Python syntax
422
+
423
+ Concept: {concept}
424
+ Scene Description: {scene_description}
425
+ Visual Elements: {", ".join(visual_elements)}
426
+
427
+ STRICT CODE REQUIREMENTS:
428
+ 1. Header: MUST start with `from manim import *`
429
+ 2. Class Structure: Define a class inheriting from `MovingCameraScene` (use this instead of `Scene` to enable camera zoom/pan with `self.camera.frame`)
430
+ 3. Method: All logic must be inside the `def construct(self):` method
431
+ 4. SYNTAX: Ensure ALL parentheses, brackets, and function calls are properly closed
432
+ 5. Colors: Use ONLY valid Manim colors (WHITE, BLACK, RED, GREEN, BLUE, YELLOW, ORANGE, PINK, PURPLE, TEAL, GOLD, etc.)
433
+ 6. Text: Use `Text()` objects for strings
434
+ 7. Positioning: Use `.next_to()`, `.move_to()`, or `.shift()`
435
+ 8. Animations: Use Write(), Create(), FadeIn(), FadeOut(), Transform(), Flash(), Indicate() - capitalize properly!
436
+ 9. Pacing: Include `self.wait(1)` between animations
437
+
438
+ OUTPUT FORMAT:
439
+ Provide ONLY the complete, corrected Python code. No markdown blocks. No explanations.
440
+ """
441
+ else:
442
+ # First attempt - generate fresh code
443
+ prompt = f"""
444
+ You are an expert animation engineer using Manim Community Edition (v0.18.0+).
445
+ Generate a complete, runnable Python script for the following request.
446
+
447
+ Concept: {concept}
448
+ Scene Description: {scene_description}
449
+ Visual Elements: {", ".join(visual_elements)}
450
+
451
+ STRICT CODE REQUIREMENTS:
452
+ 1. Header: MUST start with `from manim import *`
453
+ 2. Class Structure: Define a class inheriting from `MovingCameraScene` (e.g., `class GenScene(MovingCameraScene):`) - this enables camera operations like zoom/pan via `self.camera.frame`
454
+ 3. Method: All logic must be inside the `def construct(self):` method
455
+ 4. SYNTAX: Ensure ALL parentheses, brackets, and function calls are properly closed
456
+ 5. Colors: Use ONLY these valid Manim color constants:
457
+ - Basic: WHITE, BLACK, GRAY, GREY, LIGHT_GRAY, DARK_GRAY
458
+ - Primary: RED, GREEN, BLUE, YELLOW, ORANGE, PINK, PURPLE, TEAL, GOLD, MAROON
459
+ - Variants: RED_A, RED_B, RED_C, RED_D, RED_E, GREEN_A, GREEN_B, GREEN_C, GREEN_D, GREEN_E,
460
+ BLUE_A, BLUE_B, BLUE_C, BLUE_D, BLUE_E, YELLOW_A, YELLOW_B, YELLOW_C, YELLOW_D, YELLOW_E
461
+ - NEVER use: DARK_GREEN, LIGHT_GREEN, DARK_BLUE, LIGHT_BLUE, DARK_RED, LIGHT_RED (these don't exist!)
462
+ 6. Text: Use `Text()` objects for strings. Avoid `Tex()` or `MathTex()` unless necessary
463
+ 7. Positioning: Use `.next_to()`, `.move_to()`, or `.shift()` to arrange elements
464
+ 8. Animations: Use ONLY these valid animations:
465
+ - Write(), Create(), FadeIn(), FadeOut(), GrowFromCenter(), ShrinkToCenter()
466
+ - Transform(), ReplacementTransform(), MoveToTarget(), ApplyMethod()
467
+ - Rotate(), Indicate(), Flash() - DO NOT use lowercase like 'flash', and do NOT use ShowCreation() (removed in Manim CE; use Create() instead)
468
+ - For custom effects use .animate.method() (e.g., obj.animate.scale(2), obj.animate.shift(UP))
469
+ 9. Pacing: Include `self.wait(1)` between major animation groups
470
+
471
+ OUTPUT FORMAT:
472
+ Provide ONLY the raw Python code. Do not wrap in markdown blocks (no ```python). Do not include conversational text.
473
+ """
474
+
475
+ response = await wrapper.text_generation(
476
+ model=selected_model,
477
+ prompt=prompt,
478
+ max_new_tokens=2048,
479
+ temperature=0.3,
480
+ )
481
+
482
+ return CallToolResult(
483
+ content=[
484
+ TextContent(
485
+ type="text",
486
+ text=f"Generated Manim Code:\n\n```python\n{response}\n```",
487
+ )
488
+ ]
489
+ )
490
+
491
+ except Exception as e:
492
+ return CallToolResult(
493
+ content=[
494
+ TextContent(type="text", text=f"Code generation failed: {str(e)}")
495
+ ],
496
+ isError=True,
497
+ )
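+ # Note: the success path above wraps the returned code in ```python fences for
+ # display, even though the model is told not to emit fences itself; downstream
+ # consumers must strip that wrapper before writing the .py file.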
498
+
499
+
500
+ async def analyze_frame(arguments: Dict[str, Any]) -> CallToolResult:
501
+ """Analyze an animation frame."""
502
+ image_path = arguments["image_path"]
503
+ analysis_type = arguments["analysis_type"]
504
+ context = arguments.get("context", "")
505
+ model = arguments.get("model")
506
+
507
+ try:
508
+ wrapper = CreativeTool.get_hf_wrapper()
509
+ model_config = ModelConfig()
510
+ selected_model = model or model_config.vision_models[0]
511
+
512
+ with open(image_path, "rb") as f:
513
+ image_bytes = f.read()
514
+
515
+ prompt = f"""
516
+ Analyze this educational animation frame, focusing on {analysis_type.replace("_", " ")}.
517
+ Context: {context}
518
+
519
+ Provide specific feedback on:
520
+ - {analysis_type.replace("_", " ").title()} assessment
521
+ - Educational effectiveness
522
+ - Visual clarity
523
+ - Suggestions for improvement
524
+ """
525
+
526
+ response = await wrapper.vision_analysis(
527
+ model=selected_model,
528
+ image=image_bytes,
529
+ text=prompt,
530
+ )
531
+
532
+ return CallToolResult(
533
+ content=[
534
+ TextContent(
535
+ type="text",
536
+ text=f"Frame Analysis ({analysis_type}):\n\n{response}",
537
+ )
538
+ ]
539
+ )
540
+
541
+ except Exception as e:
542
+ return CallToolResult(
543
+ content=[TextContent(type="text", text=f"Frame analysis failed: {str(e)}")],
544
+ isError=True,
545
+ )
546
+
547
+
548
+ async def generate_narration(arguments: Dict[str, Any]) -> CallToolResult:
549
+ """Generate narration script."""
550
+ concept = arguments["concept"]
551
+ scene_description = arguments["scene_description"]
552
+ target_audience = arguments["target_audience"]
553
+ duration = arguments.get("duration_seconds", 30)
554
+ model = arguments.get("model")
555
+
556
+ try:
557
+ wrapper = CreativeTool.get_hf_wrapper()
558
+ model_config = ModelConfig()
559
+ selected_model = model or model_config.text_models[0]
560
+
561
+ prompt = f"""
562
+ Generate a narration script for an educational animation:
563
+
564
+ Concept: {concept}
565
+ Scene: {scene_description}
566
+ Target Audience: {target_audience}
567
+ Duration: {duration} seconds
568
+
569
+ Requirements:
570
+ 1. Clear, engaging, and age-appropriate language
571
+ 2. Educational value aligned with learning objectives
572
+ 3. Natural speaking pace (approximately {int(duration * 2.5)} words for {duration} seconds, assuming ~150 words per minute)
573
+ 4. Include pauses and emphasis markers where appropriate
574
+ 5. Make it interesting and memorable
575
+
576
+ Format as a clean script ready for text-to-speech.
577
+ """
578
+
579
+ response = await wrapper.text_generation(
580
+ model=selected_model,
581
+ prompt=prompt,
582
+ max_new_tokens=512,
583
+ temperature=0.6,
584
+ )
585
+
586
+ return CallToolResult(
587
+ content=[
588
+ TextContent(
589
+ type="text",
590
+ text=f"Narration Script:\n\n{response}",
591
+ )
592
+ ]
593
+ )
594
+
595
+ except Exception as e:
596
+ return CallToolResult(
597
+ content=[
598
+ TextContent(
599
+ type="text",
600
+ text=f"Narration generation failed: {str(e)}",
601
+ )
602
+ ],
603
+ isError=True,
604
+ )
605
+
606
+
607
+ async def generate_speech(arguments: Dict[str, Any]) -> CallToolResult:
608
+ """Convert text to speech."""
609
+ text = arguments["text"]
610
+ voice = arguments.get("voice")
611
+ output_path = arguments["output_path"]
612
+ model = arguments.get("model")
613
+
614
+ try:
615
+ wrapper = CreativeTool.get_hf_wrapper()
616
+ model_config = ModelConfig()
617
+ selected_model = model or model_config.tts_models[0]
618
+
619
+ # Generate audio
620
+ audio_bytes = await wrapper.text_to_speech(
621
+ model=selected_model,
622
+ text=text,
623
+ voice=voice,
624
+ )
625
+
626
+ # Save to file
627
+ success = await wrapper.save_audio_to_file(audio_bytes, output_path)
628
+
629
+ if not success:
630
+ raise Exception("Failed to save audio file")
631
+
632
+ # Return audio info
633
+ audio_info = {
634
+ "output_path": output_path,
635
+ "text_length": len(text),
636
+ "estimated_duration": len(text) / 150, # Rough estimate
637
+ "model_used": selected_model,
638
+ }
639
+
640
+ return CallToolResult(
641
+ content=[
642
+ TextContent(
643
+ type="text",
644
+ text=f"Speech generated successfully!\n\n{json.dumps(audio_info, indent=2)}",
645
+ )
646
+ ]
647
+ )
648
+
649
+ except Exception as e:
650
+ return CallToolResult(
651
+ content=[
652
+ TextContent(type="text", text=f"Speech generation failed: {str(e)}")
653
+ ],
654
+ isError=True,
655
+ )
656
+
657
+
658
+ async def refine_animation(arguments: Dict[str, Any]) -> CallToolResult:
659
+ """Refine animation code based on feedback."""
660
+ original_code = arguments["original_code"]
661
+ feedback = arguments["feedback"]
662
+ improvement_goals = arguments.get("improvement_goals", [])
663
+ model = arguments.get("model")
664
+
665
+ try:
666
+ wrapper = CreativeTool.get_hf_wrapper()
667
+ model_config = ModelConfig()
668
+ selected_model = model or model_config.code_models[0]
669
+
670
+ prompt = f"""
671
+ You are a Manim Code Repair Agent. Your task is to rewrite the FULL Python script to fix issues or apply improvements.
672
+
673
+ Previous Code:
674
+ {original_code}
675
+
676
+ User Feedback/Error:
677
+ {feedback}
678
+
679
+ Improvement Goals:
680
+ {", ".join(improvement_goals)}
681
+
682
+ INSTRUCTIONS:
683
+ 1. Output the COMPLETE corrected script, including `from manim import *`.
684
+ 2. Do not output diffs or partial snippets.
685
+ 3. Ensure the class inherits from `MovingCameraScene` and uses `def construct(self):`.
686
+ 4. Fix logic errors based on the feedback.
687
+ 5. Animations: Use ONLY valid animations like Write(), FadeIn(), FadeOut(), Create(), Flash(), Transform() - NEVER lowercase!
688
+ 6. Colors: Use ONLY these valid Manim color constants:
689
+ - Basic: WHITE, BLACK, GRAY, GREY, LIGHT_GRAY, DARK_GRAY
690
+ - Primary: RED, GREEN, BLUE, YELLOW, ORANGE, PINK, PURPLE, TEAL, GOLD, MAROON
691
+ - Variants: RED_A, RED_B, RED_C, RED_D, RED_E, GREEN_A, GREEN_B, GREEN_C, GREEN_D, GREEN_E,
692
+ BLUE_A, BLUE_B, BLUE_C, BLUE_D, BLUE_E, YELLOW_A, YELLOW_B, YELLOW_C, YELLOW_D, YELLOW_E
693
+ - NEVER use: DARK_GREEN, LIGHT_GREEN, DARK_BLUE, LIGHT_BLUE, DARK_RED, LIGHT_RED (these don't exist!)
694
+ - For darker/lighter variants, use the letter suffixes (e.g., GREEN_E for dark green, GREEN_A for light green).
695
+
696
+ OUTPUT:
697
+ Return ONLY the raw Python code. No markdown backticks. No explanation.
698
+ """
699
+
700
+ response = await wrapper.text_generation(
701
+ model=selected_model,
702
+ prompt=prompt,
703
+ max_new_tokens=2048,
704
+ temperature=0.3,
705
+ )
706
+
707
+ return CallToolResult(
708
+ content=[
709
+ TextContent(
710
+ type="text",
711
+ text=f"Refined Manim Code:\n\n```python\n{response}\n```",
712
+ )
713
+ ]
714
+ )
715
+
716
+ except Exception as e:
717
+ return CallToolResult(
718
+ content=[
719
+ TextContent(type="text", text=f"Code refinement failed: {str(e)}")
720
+ ],
721
+ isError=True,
722
+ )
723
+
724
+
725
+ async def generate_quiz(arguments: Dict[str, Any]) -> CallToolResult:
726
+ """Generate quiz questions."""
727
+ concept = arguments["concept"]
728
+ difficulty = arguments["difficulty"]
729
+ num_questions = arguments["num_questions"]
730
+ question_types = arguments.get("question_types", ["multiple_choice"])
731
+ model = arguments.get("model")
732
+
733
+ try:
734
+ wrapper = CreativeTool.get_hf_wrapper()
735
+ model_config = ModelConfig()
736
+ selected_model = model or model_config.text_models[0]
737
+
738
+ prompt = f"""
739
+ Generate {num_questions} quiz questions for the following STEM concept:
740
+
741
+ Concept: {concept}
742
+ Difficulty: {difficulty}
743
+ Question Types: {", ".join(question_types)}
744
+
745
+ For each question provide:
746
+ 1. The question
747
+ 2. Possible answers (for multiple choice)
748
+ 3. Correct answer
749
+ 4. Brief explanation
750
+
751
+ Format as JSON array of question objects.
752
+ """
753
+
754
+ response = await wrapper.text_generation(
755
+ model=selected_model,
756
+ prompt=prompt,
757
+ max_new_tokens=1024,
758
+ temperature=0.5,
759
+ )
760
+
761
+ return CallToolResult(
762
+ content=[
763
+ TextContent(
764
+ type="text",
765
+ text=f"Generated Quiz Questions:\n\n{response}",
766
+ )
767
+ ]
768
+ )
769
+
770
+ except Exception as e:
771
+ return CallToolResult(
772
+ content=[
773
+ TextContent(type="text", text=f"Quiz generation failed: {str(e)}")
774
+ ],
775
+ isError=True,
776
+ )
777
+
778
+
779
+ async def main():
780
+ """Main entry point for the creative MCP server."""
781
+ # Set up logging
782
+ logging.basicConfig(
783
+ level=logging.INFO,
784
+ format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
785
+ )
786
+
787
+ async with stdio_server() as (read_stream, write_stream):
788
+ await server.run(
789
+ read_stream,
790
+ write_stream,
791
+ InitializationOptions(
792
+ server_name="neuroanim-creative",
793
+ server_version="0.1.0",
794
+ capabilities=server.get_capabilities(
795
+ notification_options=NotificationOptions(),
796
+ experimental_capabilities={},
797
+ ),
798
+ ),
799
+ )
800
+
801
+
802
+ if __name__ == "__main__":
803
+ asyncio.run(main())
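+ # Launch note: running this module starts the "neuroanim-creative" stdio MCP
+ # server; a client connects by spawning the process and exchanging JSON-RPC
+ # messages over its stdin/stdout.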
mcp_servers/renderer.py ADDED
@@ -0,0 +1,1464 @@
1
+ """
2
+ Renderer MCP Server
3
+
4
+ This MCP server provides tools for rendering animations using Manim and
5
+ processing videos with FFmpeg.
6
+ """
7
+
8
+ import asyncio
9
+ import base64
10
+ import json
11
+ import logging
12
+ import os
13
+ import subprocess
14
+ import tempfile
15
+ from pathlib import Path
16
+ from typing import Any, Dict, List, Optional
17
+
18
+ from blaxel.core.sandbox import SandboxInstance
19
+ from mcp.server import NotificationOptions, Server
20
+ from mcp.server.models import InitializationOptions
21
+ from mcp.server.stdio import stdio_server
22
+ from mcp.types import (
23
+ CallToolRequest,
24
+ CallToolResult,
25
+ ListToolsRequest,
26
+ ListToolsResult,
27
+ TextContent,
28
+ Tool,
29
+ )
30
+ from pydantic import BaseModel
31
+
32
+ logger = logging.getLogger(__name__)
33
+
34
+ # Create MCP server
35
+ server = Server("neuroanim-renderer")
36
+
37
+
38
+ class AnimationConfig(BaseModel):
39
+ """Configuration for Manim animations."""
40
+
41
+ scene_name: str
42
+ code: str
43
+ output_file: Optional[str] = None
44
+ quality: str = "medium" # low, medium, high, production_quality
45
+ format: str = "mp4" # mp4, gif, png
46
+ resolution: Optional[str] = None
47
+ frame_rate: int = 30
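+ # Illustrative usage (hypothetical values):
+ # AnimationConfig(scene_name="GenScene", code="from manim import *", quality="high")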
48
+
49
+
50
+ class RendererTool:
51
+ """Base class for renderer tools."""
52
+
53
+ @staticmethod
54
+ def create_temp_dir() -> Path:
55
+ """Create a temporary directory for rendering."""
56
+ return Path(tempfile.mkdtemp(prefix="neuroanim_"))
57
+
58
+ @staticmethod
59
+ def cleanup_temp_dir(temp_dir: Path):
60
+ """Clean up temporary directory."""
61
+ import shutil
62
+
63
+ shutil.rmtree(temp_dir, ignore_errors=True)
64
+
65
+ @staticmethod
66
+ async def execute_sandbox_process(
67
+ sandbox, process_config: dict, logger, operation_name: str
68
+ ):
69
+ """Execute a sandbox process with retry logic for connection timeouts and process conflicts."""
70
+ import uuid
71
+
72
+ # Store original name for retries
73
+ original_name = process_config.get("name", "unnamed-process")
74
+
75
+ # Check for existing processes with the same name and clean them up
76
+ try:
77
+ existing_processes = await sandbox.process.list()
78
+ for proc in existing_processes:
79
+ if proc.name == original_name:
80
+ logger.warning(
81
+ f"Found existing process '{original_name}' (status: {proc.status}), terminating it..."
82
+ )
83
+ try:
84
+ await sandbox.process.kill(original_name)
85
+ logger.info(
86
+ f"Successfully terminated process '{original_name}'"
87
+ )
88
+ # Wait a moment for cleanup
89
+ await asyncio.sleep(1)
90
+ except Exception as kill_error:
91
+ logger.warning(
92
+ f"Failed to kill process '{original_name}': {kill_error}"
93
+ )
94
+ # Continue anyway, might still work
95
+ except Exception as list_error:
96
+ logger.debug(f"Could not list existing processes: {list_error}")
97
+
98
+ # For the first attempt, use the original name
99
+ try:
100
+ result = await sandbox.process.exec(process_config)
101
+ return result
102
+ except Exception as exec_error:
103
+ error_str = str(exec_error).lower()
104
+ # Check if it's a duplicate process error
105
+ if "already exists" in error_str or "already running" in error_str:
106
+ logger.warning(
107
+ f"Process {original_name} already exists, creating unique variant..."
108
+ )
109
+ # Create a unique name and retry
110
+ unique_name = f"{original_name}-{uuid.uuid4().hex[:8]}"
111
+ process_config["name"] = unique_name
112
+ try:
113
+ result = await sandbox.process.exec(process_config)
114
+ return result
115
+ except Exception as unique_error:
116
+ logger.error(
117
+ f"Unique process {unique_name} also failed: {unique_error}"
118
+ )
119
+ raise exec_error # Raise original error
120
+ elif "timeout" in error_str or "connecttimeout" in error_str:
121
+ logger.warning(
122
+ f"{operation_name} connection timed out, retrying after delay..."
123
+ )
124
+ await asyncio.sleep(3) # Wait before retry
125
+ # For timeout, also try with a unique name to avoid conflicts
126
+ unique_name = f"{original_name}-{uuid.uuid4().hex[:8]}-retry"
127
+ process_config["name"] = unique_name
128
+ try:
129
+ result = await sandbox.process.exec(process_config)
130
+ logger.info(f"Retry successful for {operation_name}")
131
+ return result
132
+ except Exception as retry_error:
133
+ logger.error(f"Retry failed for {operation_name}: {retry_error}")
134
+ raise exec_error # Raise original error
135
+ else:
136
+ logger.error(
137
+ f"{operation_name} failed with non-timeout error: {exec_error}"
138
+ )
139
+ raise exec_error
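+ # Retry policy, summarized: a duplicate-name conflict retries once under a
+ # uuid-suffixed name; a connection timeout retries once after a 3s delay
+ # (also under a fresh name); any other error propagates immediately.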
140
+
141
+ @staticmethod
142
+ async def read_sandbox_file(sandbox, file_path: str, logger):
143
+ """Read a file from sandbox with retry logic for connection timeouts."""
144
+ try:
145
+ content = await sandbox.fs.read(file_path)
146
+ return content
147
+ except Exception as read_error:
148
+ error_str = str(read_error).lower()
149
+ if "timeout" in error_str or "connecttimeout" in error_str:
150
+ logger.warning(
151
+ f"File read from sandbox timed out, retrying after delay..."
152
+ )
153
+ await asyncio.sleep(3) # Wait before retry
154
+ try:
155
+ content = await sandbox.fs.read(file_path)
156
+ logger.info(f"Retry successful for file read: {file_path}")
157
+ return content
158
+ except Exception as retry_error:
159
+ logger.error(f"Retry failed for file read: {retry_error}")
160
+ raise read_error # Raise original error
161
+ else:
162
+ logger.error(f"File read failed with non-timeout error: {read_error}")
163
+ raise read_error
164
+
165
+ @staticmethod
166
+ async def write_sandbox_file(sandbox, file_path: str, content: str, logger):
167
+ """Write a file to sandbox with retry logic for connection timeouts."""
168
+ try:
169
+ await sandbox.fs.write(file_path, content)
170
+ return
171
+ except Exception as write_error:
172
+ error_str = str(write_error).lower()
173
+ if "timeout" in error_str or "connecttimeout" in error_str:
174
+ logger.warning(
175
+ f"File write to sandbox timed out, retrying after delay..."
176
+ )
177
+ await asyncio.sleep(3) # Wait before retry
178
+ try:
179
+ await sandbox.fs.write(file_path, content)
180
+ logger.info(f"Retry successful for file write: {file_path}")
181
+ return
182
+ except Exception as retry_error:
183
+ logger.error(f"Retry failed for file write: {retry_error}")
184
+ raise write_error # Raise original error
185
+ else:
186
+ logger.error(f"File write failed with non-timeout error: {write_error}")
187
+ raise write_error
188
+
189
+
190
+ @server.list_tools()
191
+ async def list_tools() -> ListToolsResult:
192
+ """List available renderer tools."""
193
+ tools = [
194
+ Tool(
195
+ name="write_manim_file",
196
+ description="Write a Manim Python file to the filesystem",
197
+ inputSchema={
198
+ "type": "object",
199
+ "properties": {
200
+ "filepath": {
201
+ "type": "string",
202
+ "description": "Path where to write the Manim file",
203
+ },
204
+ "code": {
205
+ "type": "string",
206
+ "description": "Manim Python code to write",
207
+ },
208
+ },
209
+ "required": ["filepath", "code"],
210
+ },
211
+ ),
212
+ Tool(
213
+ name="render_manim_animation",
214
+ description="Render a Manim animation using subprocess",
215
+ inputSchema={
216
+ "type": "object",
217
+ "properties": {
218
+ "scene_name": {
219
+ "type": "string",
220
+ "description": "Name of the Manim scene to render",
221
+ },
222
+ "file_path": {
223
+ "type": "string",
224
+ "description": "Path to the Manim Python file",
225
+ },
226
+ "output_dir": {
227
+ "type": "string",
228
+ "description": "Directory to save the output animation",
229
+ },
230
+ "quality": {
231
+ "type": "string",
232
+ "enum": ["low", "medium", "high", "production_quality"],
233
+ "description": "Rendering quality (default: medium)",
234
+ },
235
+ "format": {
236
+ "type": "string",
237
+ "enum": ["mp4", "gif", "png"],
238
+ "description": "Output format (default: mp4)",
239
+ },
240
+ "frame_rate": {
241
+ "type": "integer",
242
+ "description": "Frame rate (default: 30)",
243
+ },
244
+ },
245
+ "required": ["scene_name", "file_path", "output_dir"],
246
+ },
247
+ ),
248
+ Tool(
249
+ name="process_video_with_ffmpeg",
250
+ description="Process video using FFmpeg for merging, conversion, etc.",
251
+ inputSchema={
252
+ "type": "object",
253
+ "properties": {
254
+ "input_files": {
255
+ "type": "array",
256
+ "items": {"type": "string"},
257
+ "description": "List of input video/audio files",
258
+ },
259
+ "output_file": {
260
+ "type": "string",
261
+ "description": "Output file path",
262
+ },
263
+ "ffmpeg_args": {
264
+ "type": "array",
265
+ "items": {"type": "string"},
266
+ "description": "Additional FFmpeg arguments",
267
+ },
268
+ },
269
+ "required": ["input_files", "output_file"],
270
+ },
271
+ ),
272
+ Tool(
273
+ name="merge_video_audio",
274
+ description="Merge video and audio files using FFmpeg",
275
+ inputSchema={
276
+ "type": "object",
277
+ "properties": {
278
+ "video_file": {
279
+ "type": "string",
280
+ "description": "Path to the video file",
281
+ },
282
+ "audio_file": {
283
+ "type": "string",
284
+ "description": "Path to the audio file",
285
+ },
286
+ "output_file": {
287
+ "type": "string",
288
+ "description": "Path to the output merged file",
289
+ },
290
+ },
291
+ "required": ["video_file", "audio_file", "output_file"],
292
+ },
293
+ ),
294
+ Tool(
295
+ name="check_file_exists",
296
+ description="Check if a file exists and return its metadata",
297
+ inputSchema={
298
+ "type": "object",
299
+ "properties": {
300
+ "filepath": {
301
+ "type": "string",
302
+ "description": "Path to the file to check",
303
+ }
304
+ },
305
+ "required": ["filepath"],
306
+ },
307
+ ),
308
+ ]
309
+
310
+ return ListToolsResult(tools=tools)
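+ # For reference, merge_video_audio typically corresponds to an FFmpeg call of
+ # the form (illustrative): ffmpeg -i scene.mp4 -i narration.mp3 -c:v copy
+ # -c:a aac -shortest merged.mp4; the concrete command is assembled in the
+ # merge_video_audio handler.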
311
+
312
+
313
+ @server.call_tool()
314
+ async def call_tool(tool_name: str, arguments: Dict[str, Any]) -> CallToolResult:
315
+ """Dispatch renderer tool calls.
316
+
317
+ As with the creative server, the low-level MCP server passes
318
+ `(tool_name, arguments)` into this handler.
319
+ """
320
+
321
+ try:
322
+ if tool_name == "write_manim_file":
323
+ return await write_manim_file(arguments)
324
+ elif tool_name == "render_manim_animation":
325
+ return await render_manim_animation(arguments)
326
+ elif tool_name == "process_video_with_ffmpeg":
327
+ return await process_video_with_ffmpeg(arguments)
328
+ elif tool_name == "merge_video_audio":
329
+ return await merge_video_audio(arguments)
330
+ elif tool_name == "check_file_exists":
331
+ return await check_file_exists(arguments)
332
+ else:
333
+ return CallToolResult(
334
+ content=[TextContent(type="text", text=f"Unknown tool: {tool_name}")],
335
+ isError=True,
336
+ )
337
+ except Exception as e:
338
+ logger.error(f"Error in tool {tool_name}: {e}")
339
+ return CallToolResult(
340
+ content=[TextContent(type="text", text=f"Error: {str(e)}")],
341
+ isError=True,
342
+ )
343
+
344
+
345
+ async def write_manim_file(arguments: Dict[str, Any]) -> CallToolResult:
346
+ """Write a Manim Python file."""
347
+ filepath = arguments["filepath"]
348
+ code = arguments["code"]
349
+
350
+ try:
351
+ # Ensure directory exists
352
+ Path(filepath).parent.mkdir(parents=True, exist_ok=True)
353
+
354
+ # Write the file
355
+ with open(filepath, "w") as f:
356
+ f.write(code)
357
+
358
+ logger.info(f"Manim file written to: {filepath}")
359
+
360
+ return CallToolResult(
361
+ content=[
362
+ TextContent(
363
+ type="text", text=f"Successfully wrote Manim file to {filepath}"
364
+ )
365
+ ]
366
+ )
367
+ except Exception as e:
368
+ return CallToolResult(
369
+ content=[TextContent(type="text", text=f"Failed to write file: {str(e)}")],
370
+ isError=True,
371
+ )
372
+
373
+
374
+ async def render_manim_animation(arguments: Dict[str, Any]) -> CallToolResult:
375
+ """Render a Manim animation using Blaxel sandbox execution with local fallback."""
376
+ scene_name = arguments["scene_name"]
377
+ file_path = arguments["file_path"]
378
+ output_dir = arguments["output_dir"]
379
+ quality = arguments.get("quality", "medium")
380
+ format_type = arguments.get("format", "mp4")
381
+ frame_rate = arguments.get("frame_rate", 30)
382
+
383
+ # Skip sandbox rendering and use local rendering directly with .venv
384
+ logger.info("Using local Manim rendering with .venv environment...")
385
+
386
+ local_result = await _render_manim_locally(
387
+ scene_name, file_path, output_dir, quality, format_type, frame_rate
388
+ )
389
+
390
+ return CallToolResult(
391
+ content=[TextContent(type="text", text=local_result["text"])],
392
+ isError=local_result.get("isError", False),
393
+ )
394
+
395
+
396
+ async def _render_manim_with_sandbox(
397
+ scene_name: str,
398
+ file_path: str,
399
+ output_dir: str,
400
+ quality: str,
401
+ format_type: str,
402
+ frame_rate: int,
403
+ ) -> Dict[str, Any]:
404
+ """Render a Manim animation using Blaxel sandbox execution."""
405
+ # Map quality to manim flags
406
+ quality_flags = {
407
+ "low": "-ql",
408
+ "medium": "-qm",
409
+ "high": "-qh",
410
+ "production_quality": "-qp",
411
+ }
412
+ quality_flag = quality_flags.get(quality, "-qm")
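+ # These are Manim CE's standard shorthands: -ql/-qm/-qh/-qp render at
+ # 480p15 / 720p30 / 1080p60 / 1440p60 respectively (-qk would give 4K).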
413
+
414
+ try:
415
+ # Ensure output directory exists
416
+ Path(output_dir).mkdir(parents=True, exist_ok=True)
417
+
418
+ # Read the Manim code file
419
+ with open(file_path, "r") as f:
420
+ manim_code = f.read()
421
+
422
+ logger.info(f"Creating Blaxel sandbox for scene: {scene_name}")
423
+
424
+ # Sanitize scene name for valid sandbox name
425
+ sanitized_scene_name = scene_name.lower().replace(" ", "-").replace("_", "-")
426
+ # Ensure name is not too long and only contains valid characters
427
+ import re
428
+
429
+ sanitized_scene_name = re.sub(r"[^a-z0-9\-]", "", sanitized_scene_name)[:20]
430
+ if not sanitized_scene_name:
431
+ sanitized_scene_name = "default"
432
+
433
+ try:
434
+ # Create or get sandbox using Blaxel SDK
435
+ # Uses BL_WORKSPACE and BL_API_KEY from environment or .env file
436
+ logger.info(f"Creating Blaxel sandbox: manim-render-{sanitized_scene_name}")
437
+ try:
438
+ # Create sandbox with proper virtual environment
439
+ sandbox = await SandboxInstance.create(
440
+ {
441
+ "name": f"manim-render-{sanitized_scene_name}",
442
+ "image": "blaxel/py-app:latest",
443
+ "memory": 4096,
444
+ # Use virtual environment instead of system
445
+ "virtual": True,
446
+ }
447
+ )
448
+ logger.info(f"Successfully created sandbox: {sandbox.metadata.name}")
449
+
450
+ # Wait a moment for sandbox to fully initialize
451
+ logger.info("Waiting for sandbox to initialize...")
452
+ await asyncio.sleep(2)
453
+
454
+ except Exception as create_error:
455
+ # Handle connection timeouts by retrying
456
+ error_str = str(create_error).lower()
457
+ if "timeout" in error_str or "connecttimeout" in error_str:
458
+ logger.warning(
459
+ "Sandbox creation connection timed out, retrying after delay..."
460
+ )
461
+ await asyncio.sleep(5) # Wait longer before retry
462
+ try:
463
+ # Retry once
464
+ sandbox = await SandboxInstance.create(
465
+ {
466
+ "name": f"manim-render-{sanitized_scene_name}",
467
+ "image": "blaxel/py-app:latest",
468
+ "memory": 4096,
469
+ }
470
+ )
471
+ logger.info(
472
+ f"Retry successful: Created sandbox: {sandbox.metadata.name}"
473
+ )
474
+
475
+ # Wait for sandbox to initialize
476
+ logger.info("Waiting for sandbox to initialize after retry...")
477
+ await asyncio.sleep(3)
478
+
479
+ except Exception as retry_error:
480
+ logger.error(f"Retry failed: {retry_error}")
481
+ raise create_error # Raise original error
482
+ else:
483
+ logger.error(
484
+ f"Sandbox creation failed with non-timeout error: {create_error}"
485
+ )
486
+ raise create_error
487
+ except Exception as sandbox_error:
488
+ error_msg = f"Failed to create Blaxel sandbox: {str(sandbox_error)}"
489
+ logger.error(error_msg)
490
+ return {"text": error_msg, "isError": True}
491
+
492
+ try:
493
+ # Write the Manim code to the sandbox
494
+ sandbox_file_path = f"/tmp/{scene_name}.py"
495
+ logger.info(f"Writing Manim code to sandbox: {sandbox_file_path}")
496
+ await RendererTool.write_sandbox_file(
497
+ sandbox, sandbox_file_path, manim_code, logger
498
+ )
499
+ logger.info(
500
+ f"Successfully wrote Manim code to sandbox: {sandbox_file_path}"
501
+ )
502
+
503
+ # Initialize flag for Manim installation check
504
+ manim_already_installed = False
505
+
506
+ # Test what's available in the sandbox
507
+ logger.info("Testing sandbox environment...")
508
+ try:
509
+ test_result = await RendererTool.execute_sandbox_process(
510
+ sandbox,
511
+ {
512
+ "name": "test-environment",
513
+ "command": "which python3 && python3 --version && which pip && pip --version",
514
+ "wait_for_completion": True,
515
+ },
516
+ logger,
517
+ "Environment test",
518
+ )
519
+ logger.info(f"Environment test result: {test_result}")
520
+
521
+ # Get test logs
522
+ try:
523
+ test_logs = await sandbox.process.logs("test-environment", "all")
524
+ logger.info(f"Environment test logs: {test_logs}")
525
+ except Exception as log_error:
526
+ logger.warning(f"Could not retrieve test logs: {log_error}")
527
+ except Exception as test_error:
528
+ logger.warning(f"Environment test failed: {test_error}")
529
+
530
+ # Test if apt-get is available
531
+ logger.info("Testing if apt-get is available...")
532
+ try:
533
+ apt_test_result = await RendererTool.execute_sandbox_process(
534
+ sandbox,
535
+ {
536
+ "name": "test-apt",
537
+ "command": "which apt-get || echo 'apt-get not found'",
538
+ "wait_for_completion": True,
539
+ },
540
+ logger,
541
+ "Apt availability test",
542
+ )
543
+ logger.info(f"Apt test result: {apt_test_result}")
544
+
545
+ # Get apt test logs
546
+ try:
547
+ apt_test_logs = await sandbox.process.logs("test-apt", "all")
548
+ logger.info(f"Apt test logs: {apt_test_logs}")
549
+ except Exception as log_error:
550
+ logger.warning(f"Could not retrieve apt test logs: {log_error}")
551
+ except Exception as apt_test_error:
552
+ logger.warning(f"Apt test failed: {apt_test_error}")
553
+
554
+ # Try a simple pip install first to see if it works
555
+ logger.info("Testing pip install...")
556
+ try:
557
+ pip_test_result = await RendererTool.execute_sandbox_process(
558
+ sandbox,
559
+ {
560
+ "name": "test-pip",
561
+ "command": "pip install --dry-run manim",
562
+ "wait_for_completion": True,
563
+ },
564
+ logger,
565
+ "Pip test",
566
+ )
567
+ logger.info(f"Pip test result: {pip_test_result}")
568
+
569
+ # Get pip test logs
570
+ try:
571
+ pip_test_logs = await sandbox.process.logs("test-pip", "all")
572
+ logger.info(f"Pip test logs: {pip_test_logs}")
573
+ except Exception as log_error:
574
+ logger.warning(f"Could not retrieve pip test logs: {log_error}")
575
+ except Exception as pip_test_error:
576
+ logger.warning(f"Pip test failed: {pip_test_error}")
577
+
578
+ # Check if Manim is already installed
579
+ logger.info("Checking if Manim is already installed...")
580
+ try:
581
+ manim_check_result = await RendererTool.execute_sandbox_process(
582
+ sandbox,
583
+ {
584
+ "name": "check-manim",
585
+ "command": "python3 -c \"import manim; print('Manim version:', manim.__version__)\" || echo 'Manim not found'",
586
+ "wait_for_completion": True,
587
+ },
588
+ logger,
589
+ "Manim check",
590
+ )
591
+ logger.info(f"Manim check result: {manim_check_result}")
592
+
593
+ # Get manim check logs
594
+ try:
595
+ manim_check_logs = await sandbox.process.logs("check-manim", "all")
596
+ logger.info(f"Manim check logs: {manim_check_logs}")
597
+
598
+ # Check if Manim is installed
599
+ if "Manim version:" in str(manim_check_logs):
600
+ logger.info("Manim is already installed, skipping installation")
601
+ manim_already_installed = True
602
+ else:
603
+ logger.info(
604
+ "Manim is not installed, proceeding with installation"
605
+ )
606
+ manim_already_installed = False
607
+ except Exception as log_error:
608
+ logger.warning(f"Could not retrieve manim check logs: {log_error}")
609
+ manim_already_installed = False
610
+ except Exception as manim_check_error:
611
+ logger.warning(f"Manim check failed: {manim_check_error}")
612
+ manim_already_installed = False
613
+
614
+ # Install manim and its dependencies in the sandbox
615
+ logger.info("Installing manim and dependencies in the sandbox...")
616
+
617
+ # Check if ffmpeg is already available
618
+ logger.info("Checking if ffmpeg is available...")
619
+ try:
620
+ ffmpeg_check_result = await RendererTool.execute_sandbox_process(
621
+ sandbox,
622
+ {
623
+ "name": "check-ffmpeg",
624
+ "command": "which ffmpeg || echo 'ffmpeg not found'",
625
+ "wait_for_completion": True,
626
+ },
627
+ logger,
628
+ "FFmpeg availability check",
629
+ )
630
+ logger.info(f"Ffmpeg check result: {ffmpeg_check_result}")
631
+
632
+ # Get ffmpeg check logs
633
+ try:
634
+ ffmpeg_check_logs = await sandbox.process.logs(
635
+ "check-ffmpeg", "all"
636
+ )
637
+ logger.info(f"Ffmpeg check logs: {ffmpeg_check_logs}")
638
+ except Exception as log_error:
639
+ logger.warning(f"Could not retrieve ffmpeg check logs: {log_error}")
640
+ except Exception as ffmpeg_check_error:
641
+ logger.warning(f"Ffmpeg check failed: {ffmpeg_check_error}")
642
+
643
+ # Skip installation if Manim is already installed
644
+ if manim_already_installed:
645
+ logger.info(
646
+ "Skipping dependencies installation as Manim is already installed"
647
+ )
648
+ manim_installed = True
649
+ else:
650
+ # Try to install system dependencies step-by-step for better reliability
651
+ logger.info("Installing system dependencies step-by-step...")
652
+
653
+ # First update package lists
654
+ try:
655
+ logger.info("Updating package lists...")
656
+ update_result = await RendererTool.execute_sandbox_process(
657
+ sandbox,
658
+ {
659
+ "name": "apt-update",
660
+ "command": "apt-get update",
661
+ "wait_for_completion": True,
662
+ "timeout": 120,
663
+ },
664
+ logger,
665
+ "Package list update",
666
+ )
667
+ logger.info(f"Package update result: {update_result}")
668
+
669
+ if update_result.status != "exited" or (
670
+ hasattr(update_result, "exit_code")
671
+ and update_result.exit_code != 0
672
+ ):
673
+ logger.warning("Package update failed, but continuing...")
674
+ except Exception as update_error:
675
+ logger.warning(
676
+ f"Package update failed: {update_error}, continuing..."
677
+ )
678
+
679
+ # Install ffmpeg
680
+ try:
681
+ logger.info("Installing ffmpeg...")
682
+ ffmpeg_result = await RendererTool.execute_sandbox_process(
683
+ sandbox,
684
+ {
685
+ "name": "install-ffmpeg",
686
+ "command": "apt-get install -y ffmpeg",
687
+ "wait_for_completion": True,
688
+ "timeout": 180,
689
+ },
690
+ logger,
691
+ "FFmpeg installation",
692
+ )
693
+ logger.info(f"FFmpeg installation result: {ffmpeg_result}")
694
+
695
+ if ffmpeg_result.status != "exited" or (
696
+ hasattr(ffmpeg_result, "exit_code")
697
+ and ffmpeg_result.exit_code != 0
698
+ ):
699
+ logger.warning("FFmpeg installation failed, but continuing...")
700
+ except Exception as ffmpeg_error:
701
+ logger.warning(
702
+ f"FFmpeg installation failed: {ffmpeg_error}, continuing..."
703
+ )
704
+
705
+ # Install libcairo2-dev
706
+ try:
707
+ logger.info("Installing libcairo2-dev...")
708
+ cairo_result = await RendererTool.execute_sandbox_process(
709
+ sandbox,
710
+ {
711
+ "name": "install-cairo",
712
+ "command": "apt-get install -y libcairo2-dev",
713
+ "wait_for_completion": True,
714
+ "timeout": 180,
715
+ },
716
+ logger,
717
+ "Cairo installation",
718
+ )
719
+ logger.info(f"Cairo installation result: {cairo_result}")
720
+
721
+ if cairo_result.status != "exited" or (
722
+ hasattr(cairo_result, "exit_code")
723
+ and cairo_result.exit_code != 0
724
+ ):
725
+ logger.warning("Cairo installation failed, but continuing...")
726
+ except Exception as cairo_error:
727
+ logger.warning(
728
+ f"Cairo installation failed: {cairo_error}, continuing..."
729
+ )
730
+
731
+ # Install Python dependencies - try lighter alternatives first
732
+ logger.info("Installing Python dependencies...")
733
+ manim_installed = False
734
+
735
+ # Try pip install strategies in order, preferring Manim Community Edition (note: manimlib is 3b1b's separate library, not Manim CE)
736
+ install_commands = [
737
+ ("pip install manimlib", "manimlib installation"),
738
+ ("pip install manim", "full manim installation"),
739
+ (
740
+ "pip install --no-deps manim && pip install numpy scipy matplotlib",
741
+ "minimal manim with deps",
742
+ ),
743
+ ]
744
+
745
+ for install_cmd, description in install_commands:
746
+ if manim_installed:
747
+ break
748
+
749
+ try:
750
+ logger.info(f"Trying {description}: {install_cmd}")
751
+ install_result = await RendererTool.execute_sandbox_process(
752
+ sandbox,
753
+ {
754
+ "name": "install-manim-attempt",
755
+ "command": install_cmd,
756
+ "wait_for_completion": True,
757
+ "timeout": 600, # 10 minute timeout
758
+ },
759
+ logger,
760
+ description,
761
+ )
762
+ logger.info(f"{description} result: {install_result}")
763
+
764
+ if install_result.status == "exited" and (
765
+ not hasattr(install_result, "exit_code")
766
+ or install_result.exit_code == 0
767
+ ):
768
+ logger.info(f"Successfully installed with: {install_cmd}")
769
+ manim_installed = True
770
+
771
+ # Verify installation
772
+ try:
773
+ verify_result = await RendererTool.execute_sandbox_process(
774
+ sandbox,
775
+ {
776
+ "name": "verify-manim",
777
+ "command": "python3 -c \"import manim; print('Manim version:', getattr(manim, '__version__', 'unknown'))\"",
778
+ "wait_for_completion": True,
779
+ "timeout": 30,
780
+ },
781
+ logger,
782
+ "Manim verification",
783
+ )
784
+ logger.info(
785
+ f"Manim verification result: {verify_result}"
786
+ )
787
+ except Exception as verify_error:
788
+ logger.warning(
789
+ f"Manim verification failed: {verify_error}"
790
+ )
791
+ else:
792
+ logger.warning(
793
+ f"{description} failed, trying next option..."
794
+ )
795
+
796
+ # Get installation logs for debugging (for the last attempt)
797
+ try:
798
+ install_logs = await sandbox.process.logs(
799
+ "install-manim-attempt", "all"
800
+ )
801
+ logger.info(f"Manim installation logs: {install_logs}")
802
+ except Exception as log_error:
803
+ logger.warning(
804
+ f"Could not retrieve installation logs: {log_error}"
805
+ )
806
+
807
+ # Check if the last installation attempt was successful
808
+ if install_result.status != "exited" or (
809
+ hasattr(install_result, "exit_code")
810
+ and install_result.exit_code != 0
811
+ ):
812
+ error_msg = f"Manim installation failed with status: {install_result.status}"
813
+ if hasattr(install_result, "exit_code"):
814
+ error_msg += f", exit code: {install_result.exit_code}"
815
+
816
+ # Try to get more detailed logs
817
+ try:
818
+ install_logs = await sandbox.process.logs(
819
+ "install-manim-attempt", "all"
820
+ )
821
+ error_msg += f"\nLogs: {install_logs}"
822
+ except Exception as log_error:
823
+ error_msg += f"\nCould not retrieve logs: {log_error}"
824
+
825
+ logger.error(error_msg)
826
+ # Don't return error here, continue to check if any installation worked
827
+
828
+ except Exception as install_error:
829
+ # Handle timeout specifically
830
+ error_str = str(install_error).lower()
831
+ if "timeout" in error_str or "readtimeout" in error_str:
832
+ logger.warning(
833
+ "Pip install manim timed out - this might be OK if packages were already installed or partially installed"
834
+ )
835
+ # Try to check if manim was actually installed despite timeout
836
+ try:
837
+ manim_check_after = await RendererTool.execute_sandbox_process(
838
+ sandbox,
839
+ {
840
+ "name": "check-manim-after-install",
841
+ "command": "python3 -c \"import manim; print('Manim available after install timeout')\" || echo 'Manim not available after install timeout'",
842
+ "wait_for_completion": True,
843
+ },
844
+ logger,
845
+ "Post-install Manim check",
846
+ )
847
+ logger.info(
848
+ f"Post-install Manim check result: {manim_check_after}"
849
+ )
850
+
851
+ # Get logs
852
+ try:
853
+ check_logs = await sandbox.process.logs(
854
+ "check-manim-after-install", "all"
855
+ )
856
+ logger.info(
857
+ f"Post-install check logs: {check_logs}"
858
+ )
859
+
860
+ if "manim available" in str(check_logs).lower():
861
+ logger.info(
862
+ "Manim appears to be installed despite timeout, continuing..."
863
+ )
864
+ manim_installed = True
865
+ else:
866
+ logger.warning(
867
+ "Manim not available after install timeout, may cause render failure"
868
+ )
869
+ except Exception as log_error:
870
+ logger.warning(
871
+ f"Could not check post-install logs: {log_error}"
872
+ )
873
+ except Exception as check_error:
874
+ logger.warning(
875
+ f"Could not verify Manim installation after timeout: {check_error}"
876
+ )
877
+ else:
878
+ import traceback
879
+
880
+ error_details = traceback.format_exc()
881
+ error_msg = f"Error during pip install manim: {str(install_error)}\nDetails: {error_details}"
882
+
883
+ # Try to get installation logs for debugging
884
+ try:
885
+ install_logs = await sandbox.process.logs(
886
+ "install-manim-attempt", "all"
887
+ )
888
+ error_msg += f"\nInstallation logs: {install_logs}"
889
+ except Exception as log_error:
890
+ error_msg += f"\nCould not retrieve installation logs: {log_error}"
891
+
892
+ logger.error(error_msg)
893
+ # Don't return error here, continue to try other installation methods
894
+
895
+ logger.warning(f"{description} failed: {install_error}")
896
+ continue
897
+
898
+ # Final check: ensure Manim is actually installed before proceeding to render
899
+ if not manim_already_installed and not manim_installed:
900
+ logger.warning(
901
+ "Manim installation appears to have failed, attempting final verification..."
902
+ )
903
+
904
+ # Final verification attempt
905
+ try:
906
+ final_check = await RendererTool.execute_sandbox_process(
907
+ sandbox,
908
+ {
909
+ "name": "final-manim-check",
910
+ "command": "python3 -c \"import manim; print('SUCCESS: Manim is available')\" || echo 'FAILED: Manim not available'",
911
+ "wait_for_completion": True,
912
+ "timeout": 30,
913
+ },
914
+ logger,
915
+ "Final Manim availability check",
916
+ )
917
+
918
+ # Get logs to check result
919
+ try:
920
+ check_logs = await sandbox.process.logs(
921
+ "final-manim-check", "all"
922
+ )
923
+ if "SUCCESS" in str(check_logs):
924
+ logger.info(
925
+ "Final check confirms Manim is available, proceeding with render"
926
+ )
927
+ manim_installed = True
928
+ else:
929
+ error_msg = f"Final verification shows Manim is not available. Installation appears to have failed.\nCheck logs: {check_logs}"
930
+ logger.error(error_msg)
931
+ return {"text": error_msg, "isError": True}
932
+ except Exception as log_error:
933
+ logger.warning(
934
+ f"Could not retrieve final check logs: {log_error}"
935
+ )
936
+
937
+ except Exception as final_check_error:
938
+ error_msg = f"Cannot verify Manim installation: {final_check_error}"
939
+ logger.error(error_msg)
940
+ return {"text": error_msg, "isError": True}
941
+
942
+ # Run the Manim render command - try different possible commands
943
+ render_commands = [
944
+ f"manim {quality_flag} --fps {frame_rate} -o {scene_name}.{format_type} {sandbox_file_path} {scene_name}",
945
+ f"python3 -m manim {quality_flag} --fps {frame_rate} -o {scene_name}.{format_type} {sandbox_file_path} {scene_name}",
946
+ f"manimce {quality_flag} --fps {frame_rate} -o {scene_name}.{format_type} {sandbox_file_path} {scene_name}",
947
+ ]
948
+
949
+ render_success = False
950
+ render_result = None
951
+
952
+ for cmd in render_commands:
953
+ if render_success:
954
+ break
955
+
956
+ logger.info(f"Trying render command: {cmd}")
957
+ try:
958
+ render_result = await RendererTool.execute_sandbox_process(
959
+ sandbox,
960
+ {
961
+ "name": "render-manim",
962
+ "command": cmd,
963
+ "wait_for_completion": True,
964
+ "timeout": 600, # 10 minute timeout for rendering
965
+ },
966
+ logger,
967
+ f"Manim rendering with '{cmd}'",
968
+ )
969
+ logger.info(f"Render result: {render_result}")
970
+
971
+ if render_result.status == "exited" and (
972
+ not hasattr(render_result, "exit_code")
973
+ or render_result.exit_code == 0
974
+ ):
975
+ logger.info(f"Successfully rendered with: {cmd}")
976
+ render_success = True
977
+ else:
978
+ logger.warning(
979
+ f"Render failed with command '{cmd}', trying next option..."
980
+ )
981
+
982
+ except Exception as render_error:
983
+ logger.warning(f"Render failed with '{cmd}': {render_error}")
984
+ continue
985
+ # Check if rendering was successful
986
+ if not render_success:
987
+ error_msg = "All render command attempts failed."
988
+
989
+ # Try to get logs for debugging from the last attempt
990
+ try:
991
+ logs = await sandbox.process.logs("render-manim", "all")
992
+ error_msg += f"\nLast render logs: {logs}"
993
+ except Exception as log_error:
994
+ error_msg += f"\nCould not retrieve logs: {log_error}"
995
+
996
+ logger.error(error_msg)
997
+ return {"text": error_msg, "isError": True}
1003
+
1004
+ except Exception as render_error:
1005
+ # Handle timeout specifically
1006
+ error_str = str(render_error).lower()
1007
+ if "timeout" in error_str or "readtimeout" in error_str:
1008
+ logger.warning(
1009
+ "Manim render timed out - this indicates a long-running render process"
1010
+ )
1011
+ # Try to continue and check if output was generated
1012
+ else:
1013
+ error_msg = f"Error during manim rendering: {str(render_error)}"
1014
+ logger.error(error_msg)
1015
+ return {"text": error_msg, "isError": True}
1016
+
1017
+ # Find the output file in the sandbox
1018
+ # Manim typically outputs to media/videos/{scene_name}/{quality}/
1019
+ possible_paths = [
1020
+ f"/tmp/media/videos/{scene_name}/{quality}/{scene_name}.{format_type}",
1021
+ f"/tmp/media/videos/{scene_name.lower()}/{quality}/{scene_name}.{format_type}",
1022
+ f"/tmp/{scene_name}.{format_type}",
1023
+ f"/root/media/videos/{scene_name}/{quality}/{scene_name}.{format_type}",
1024
+ ]
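+ # Note: Manim CE names the quality directory after the resolution (e.g.
+ # "720p30"), not the flag name ("medium"), so these direct guesses can miss;
+ # in that case the find below only aids debugging before the error is returned.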
1025
+
1026
+ output_content = None
1027
+ found_path = None
1028
+
1029
+ for sandbox_path in possible_paths:
1030
+ try:
1031
+ output_content = await RendererTool.read_sandbox_file(
1032
+ sandbox, sandbox_path, logger
1033
+ )
1034
+ found_path = sandbox_path
1035
+ logger.info(f"Found output at: {sandbox_path}")
1036
+ break
1037
+ except Exception:
1038
+ continue
1039
+
1040
+ if not output_content:
1041
+ # List files to debug
1042
+ try:
1043
+ ls_result = await RendererTool.execute_sandbox_process(
1044
+ sandbox,
1045
+ {
1046
+ "name": "find-output",
1047
+ "command": "find /tmp -name '*.mp4' -o -name '*.gif' 2>/dev/null || true",
1048
+ "wait_for_completion": True,
1049
+ },
1050
+ logger,
1051
+ "Find output files",
1052
+ )
1053
+ find_logs = await sandbox.process.logs("find-output", "stdout")
1054
+ logger.info(f"Found video files: {find_logs}")
1055
+ except Exception:
1056
+ pass
1057
+
1058
+ error_msg = f"Could not find rendered output file. Searched paths: {possible_paths}"
1059
+ logger.error(error_msg)
1060
+ return {"text": error_msg, "isError": True}
1061
+
1062
+ # Write the output to local filesystem
1063
+ output_path = Path(output_dir) / f"{scene_name}.{format_type}"
1064
+
1065
+ # Handle the content - it may be base64 encoded or bytes
1066
+ if isinstance(output_content, str):
1067
+ try:
1068
+ decoded_content = base64.b64decode(output_content)
1069
+ with open(output_path, "wb") as f:
1070
+ f.write(decoded_content)
1071
+ except Exception:
1072
+ with open(output_path, "w") as f:
1073
+ f.write(output_content)
1074
+ elif isinstance(output_content, (bytes, bytearray)):
1075
+ with open(output_path, "wb") as f:
1076
+ f.write(output_content)
1077
+ else:
1078
+ with open(output_path, "wb") as f:
1079
+ f.write(output_content)
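+ # Caveat: base64.b64decode() defaults to validate=False, which silently drops
+ # non-alphabet characters, so plain text can "decode" into garbage instead of
+ # raising; passing validate=True would make the text fallback branch reliable.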
1080
+
1081
+ result_msg = (
1082
+ f"Successfully rendered animation using Blaxel sandbox!\n"
1083
+ f"Scene: {scene_name}\n"
1084
+ f"Output file: {output_path}\n"
1085
+ f"Quality: {quality}\n"
1086
+ f"Format: {format_type}\n"
1087
+ f"File size: {output_path.stat().st_size if output_path.exists() else 'Unknown'} bytes"
1088
+ )
1089
+
1090
+ logger.info(result_msg)
1091
+ return {"text": result_msg, "isError": False}
1092
+
1093
+ finally:
1094
+ # Clean up sandbox
1095
+ try:
1096
+ await SandboxInstance.delete(sandbox.metadata.name)
1097
+ logger.info(f"Deleted sandbox: {sandbox.metadata.name}")
1098
+ except Exception as cleanup_error:
1099
+ logger.warning(f"Failed to delete sandbox: {cleanup_error}")
1100
+
1101
+ except asyncio.TimeoutError:
1102
+ error_msg = "Blaxel sandbox execution timed out"
1103
+ logger.error(error_msg)
1104
+ return {"text": error_msg, "isError": True}
1105
+ except Exception as e:
1106
+ # Get detailed exception information
1107
+ import traceback
1108
+
1109
+ error_details = traceback.format_exc()
1110
+ error_msg = (
1111
+ f"Error during Blaxel sandbox rendering: {str(e)}\nDetails: {error_details}"
1112
+ )
1113
+ logger.error(error_msg)
1114
+ return {"text": error_msg, "isError": True}
1115
+
1116
+
1117
+ async def _render_manim_locally(
1118
+ scene_name: str,
1119
+ file_path: str,
1120
+ output_dir: str,
1121
+ quality: str,
1122
+ format_type: str,
1123
+ frame_rate: int,
1124
+ ) -> Dict[str, Any]:
1125
+ """Render a Manim animation using local Manim installation."""
1126
+ try:
1127
+ # Ensure output directory exists
1128
+ Path(output_dir).mkdir(parents=True, exist_ok=True)
1129
+
1130
+ # Map quality to manim flags
1131
+ quality_flags = {
1132
+ "low": "-ql",
1133
+ "medium": "-qm",
1134
+ "high": "-qh",
1135
+ "production_quality": "-qp",
1136
+ }
1137
+ quality_flag = quality_flags.get(quality, "-qm")
1138
+
1139
+ # Find the project root and .venv
1140
+ # Assume the project root contains .venv directory
1141
+ project_root = Path(__file__).resolve().parent.parent
1142
+ venv_python = project_root / ".venv" / "bin" / "python"
1143
+ venv_manim = project_root / ".venv" / "bin" / "manim"
1144
+
1145
+ # Use venv manim if it exists, otherwise fall back to system manim
1146
+ if venv_manim.exists():
1147
+ manim_cmd = str(venv_manim)
1148
+ logger.info(f"Using .venv manim at: {manim_cmd}")
1149
+ else:
1150
+ manim_cmd = "manim"
1151
+ logger.warning(f".venv manim not found at {venv_manim}, using system manim")
1152
+
1153
+ # Build the manim command
1154
+ cmd = [
1155
+ manim_cmd,
1156
+ quality_flag,
1157
+ "--fps",
1158
+ str(frame_rate),
1159
+ "-o",
1160
+ f"{scene_name}.{format_type}",
1161
+ file_path,
1162
+ scene_name,
1163
+ ]
1164
+
1165
+ logger.info(f"Running local Manim command: {' '.join(cmd)}")
1166
+
1167
+ # Execute the command with .venv in PATH
1168
+ env = os.environ.copy()
1169
+ if venv_manim.exists():
1170
+ venv_bin = project_root / ".venv" / "bin"
1171
+ env["PATH"] = f"{venv_bin}:{env.get('PATH', '')}"
1172
+ env["VIRTUAL_ENV"] = str(project_root / ".venv")
1173
+
1174
+ # Execute the command
1175
+ process = await asyncio.create_subprocess_exec(
1176
+ *cmd,
1177
+ stdout=asyncio.subprocess.PIPE,
1178
+ stderr=asyncio.subprocess.PIPE,
1179
+ cwd=output_dir, # Run in output directory
1180
+ env=env,
1181
+ )
1182
+
1183
+ stdout, stderr = await process.communicate()
1184
+
1185
+ if process.returncode != 0:
1186
+ error_msg = f"Local Manim rendering failed:\nSTDOUT: {stdout.decode()}\nSTDERR: {stderr.decode()}"
1187
+ logger.error(error_msg)
1188
+ return {"text": error_msg, "isError": True}
1189
+
1190
+ # Log the stdout for debugging
1191
+ logger.info(f"Manim stdout: {stdout.decode()}")
1192
+ logger.info(f"Manim stderr: {stderr.decode()}")
1193
+
1194
+ # Find the output file
1195
+ # Manim typically outputs to media/videos/{filename}/{quality}/
1196
+ import glob
1197
+
1198
+ # First, let's see what files actually exist in the output directory
1199
+ logger.info(f"Listing all files in output directory: {output_dir}")
1200
+ try:
1201
+ all_files = list(Path(output_dir).rglob("*"))
1202
+ logger.info(f"Found {len(all_files)} files/dirs:")
1203
+ for f in all_files[:50]: # Log first 50 to avoid spam
1204
+ logger.info(f" - {f}")
1205
+ except Exception as list_error:
1206
+ logger.warning(f"Could not list files: {list_error}")
1207
+
1208
+ # Manim outputs to paths like: media/videos/{filename}/{resolution}/SceneName.mp4
1209
+ # where resolution is like: 720p30, 480p15, 1080p60, 2160p60
1210
+ # Quality flags map to resolutions:
1211
+ # -ql (low): 480p15
1212
+ # -qm (medium): 720p30
1213
+ # -qh (high): 1080p60
1214
+ # -qp (production): 2160p60
1215
+
1216
+ # Map quality to likely resolution folder names
1217
+ quality_to_resolution = {
1218
+ "low": ["480p15", "854x480", "480p"],
1219
+ "medium": ["720p30", "1280x720", "720p"],
1220
+ "high": ["1080p60", "1920x1080", "1080p"],
1221
+ "production_quality": ["2160p60", "3840x2160", "2160p"],
1222
+ }
1223
+
1224
+ resolutions = quality_to_resolution.get(quality, ["720p30"])
1225
+
1226
+ output_patterns = []
1227
+
1228
+ # Search with specific resolutions
1229
+ for res in resolutions:
1230
+ output_patterns.extend(
1231
+ [
1232
+ f"{output_dir}/media/videos/*/{res}/{scene_name}.{format_type}",
1233
+ f"{output_dir}/media/videos/**/{res}/{scene_name}.{format_type}",
1234
+ ]
1235
+ )
1236
+
1237
+ # Fallback: search all resolution patterns
1238
+ output_patterns.extend(
1239
+ [
1240
+ f"{output_dir}/media/videos/*/*/{scene_name}.{format_type}",
1241
+ f"{output_dir}/media/videos/**/{scene_name}.{format_type}",
1242
+ f"{output_dir}/videos/*/*/{scene_name}.{format_type}",
1243
+ f"{output_dir}/**/{scene_name}.{format_type}",
1244
+ f"{output_dir}/{scene_name}.{format_type}",
1245
+ ]
1246
+ )
1247
+
1248
+ output_files = []
1249
+ for pattern in output_patterns:
1250
+ logger.info(f"Trying pattern: {pattern}")
1251
+ matches = glob.glob(pattern, recursive=True)
1252
+ if matches:
1253
+ logger.info(f" Found matches: {matches}")
1254
+ output_files.extend(matches)
1255
+ break
1256
+
1257
+ if not output_files:
1258
+ error_msg = f"Could not find rendered output file.\nSearched patterns: {output_patterns}\nStdout: {stdout.decode()}\nStderr: {stderr.decode()}"
1259
+ logger.error(error_msg)
1260
+ return {"text": error_msg, "isError": True}
1261
+
1262
+ output_file = output_files[0] # Take the first match
1263
+ final_output = Path(output_dir) / f"{scene_name}.{format_type}"
1264
+
1265
+ # Move the output file to the expected location
1266
+ import shutil
1267
+
1268
+ shutil.move(output_file, final_output)
1269
+
1270
+ result_msg = (
1271
+ f"Successfully rendered animation locally!\n"
1272
+ f"Scene: {scene_name}\n"
1273
+ f"Output file: {final_output}\n"
1274
+ f"Quality: {quality}\n"
1275
+ f"Format: {format_type}\n"
1276
+ f"File size: {final_output.stat().st_size if final_output.exists() else 'Unknown'} bytes"
1277
+ )
1278
+
1279
+ logger.info(result_msg)
1280
+ return {"text": result_msg, "isError": False}
1281
+
1282
+ except Exception as e:
1283
+ import traceback
1284
+
1285
+ error_details = traceback.format_exc()
1286
+ error_msg = (
1287
+ f"Error during local Manim rendering: {str(e)}\nDetails: {error_details}"
1288
+ )
1289
+ logger.error(error_msg)
1290
+ return {"text": error_msg, "isError": True}
1291
+
1292
+
1293
+ async def process_video_with_ffmpeg(arguments: Dict[str, Any]) -> CallToolResult:
1294
+ """Process video using FFmpeg."""
1295
+ input_files = arguments["input_files"]
1296
+ output_file = arguments["output_file"]
1297
+ ffmpeg_args = arguments.get("ffmpeg_args", [])
1298
+
1299
+ try:
1300
+ # Ensure output directory exists
1301
+ Path(output_file).parent.mkdir(parents=True, exist_ok=True)
1302
+
1303
+ # Build FFmpeg command
1304
+ cmd = ["ffmpeg"]
1305
+
1306
+ # Add input files
1307
+ for input_file in input_files:
1308
+ cmd.extend(["-i", input_file])
1309
+
1310
+ # Add additional arguments
1311
+ cmd.extend(ffmpeg_args)
1312
+
1313
+ # Add output file
1314
+ cmd.append(output_file)
1315
+
1316
+ logger.info(f"Running FFmpeg command: {' '.join(cmd)}")
1317
+
1318
+ process = await asyncio.create_subprocess_exec(
1319
+ *cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
1320
+ )
1321
+
1322
+ stdout, stderr = await process.communicate()
1323
+
1324
+ if process.returncode != 0:
1325
+ error_msg = f"FFmpeg processing failed: {stderr.decode()}"
1326
+ logger.error(error_msg)
1327
+ return CallToolResult(
1328
+ content=[TextContent(type="text", text=error_msg)], isError=True
1329
+ )
1330
+
1331
+ result_msg = f"Successfully processed video with FFmpeg: {output_file}"
1332
+ logger.info(result_msg)
1333
+
1334
+ return CallToolResult(content=[TextContent(type="text", text=result_msg)])
1335
+
1336
+ except Exception as e:
1337
+ error_msg = f"Error during FFmpeg processing: {str(e)}"
1338
+ logger.error(error_msg)
1339
+ return CallToolResult(
1340
+ content=[TextContent(type="text", text=error_msg)], isError=True
1341
+ )
1342
+
1343
+
1344
+ async def merge_video_audio(arguments: Dict[str, Any]) -> CallToolResult:
1345
+ """Merge video and audio files."""
1346
+ video_file = arguments["video_file"]
1347
+ audio_file = arguments["audio_file"]
1348
+ output_file = arguments["output_file"]
1349
+
1350
+ try:
1351
+ # Ensure output directory exists
1352
+ Path(output_file).parent.mkdir(parents=True, exist_ok=True)
1353
+
1354
+ # Build FFmpeg merge command
1355
+ cmd = [
1356
+ "ffmpeg",
1357
+ "-i",
1358
+ video_file,
1359
+ "-i",
1360
+ audio_file,
1361
+ "-c:v",
1362
+ "copy",
1363
+ "-c:a",
1364
+ "aac",
1365
+ "-shortest",
1366
+ "-y", # Overwrite output file
1367
+ output_file,
1368
+ ]
1369
+
1370
+ logger.info(f"Merging video and audio: {' '.join(cmd)}")
1371
+
1372
+ process = await asyncio.create_subprocess_exec(
1373
+ *cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
1374
+ )
1375
+
1376
+ stdout, stderr = await process.communicate()
1377
+
1378
+ if process.returncode != 0:
1379
+ error_msg = f"Video/audio merge failed: {stderr.decode()}"
1380
+ logger.error(error_msg)
1381
+ return CallToolResult(
1382
+ content=[TextContent(type="text", text=error_msg)], isError=True
1383
+ )
1384
+
1385
+ result_msg = f"Successfully merged video and audio: {output_file}"
1386
+ logger.info(result_msg)
1387
+
1388
+ return CallToolResult(content=[TextContent(type="text", text=result_msg)])
1389
+
1390
+ except Exception as e:
1391
+ error_msg = f"Error during video/audio merge: {str(e)}"
1392
+ logger.error(error_msg)
1393
+ return CallToolResult(
1394
+ content=[TextContent(type="text", text=error_msg)], isError=True
1395
+ )
1396
+
1397
+
1398
+ async def check_file_exists(arguments: Dict[str, Any]) -> CallToolResult:
1399
+ """Check if a file exists and return its metadata."""
1400
+ filepath = arguments["filepath"]
1401
+
1402
+ try:
1403
+ path = Path(filepath)
1404
+
1405
+ if not path.exists():
1406
+ return CallToolResult(
1407
+ content=[
1408
+ TextContent(type="text", text=f"File does not exist: {filepath}")
1409
+ ],
1410
+ isError=True,
1411
+ )
1412
+
1413
+ stat = path.stat()
1414
+
1415
+ metadata = {
1416
+ "filepath": str(path.absolute()),
1417
+ "exists": True,
1418
+ "is_file": path.is_file(),
1419
+ "is_directory": path.is_dir(),
1420
+ "size_bytes": stat.st_size,
1421
+ "created": stat.st_ctime,
1422
+ "modified": stat.st_mtime,
1423
+ }
1424
+
1425
+ return CallToolResult(
1426
+ content=[
1427
+ TextContent(
1428
+ type="text", text=f"File metadata: {json.dumps(metadata, indent=2)}"
1429
+ )
1430
+ ]
1431
+ )
1432
+
1433
+ except Exception as e:
1434
+ return CallToolResult(
1435
+ content=[TextContent(type="text", text=f"Error checking file: {str(e)}")],
1436
+ isError=True,
1437
+ )
1438
+
1439
+
1440
+ async def main():
1441
+ """Main entry point for the renderer MCP server."""
1442
+ # Set up logging
1443
+ logging.basicConfig(
1444
+ level=logging.INFO,
1445
+ format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
1446
+ )
1447
+
1448
+ async with stdio_server() as (read_stream, write_stream):
1449
+ await server.run(
1450
+ read_stream,
1451
+ write_stream,
1452
+ InitializationOptions(
1453
+ server_name="neuroanim-renderer",
1454
+ server_version="0.1.0",
1455
+ capabilities=server.get_capabilities(
1456
+ notification_options=NotificationOptions(),
1457
+ experimental_capabilities={},
1458
+ ),
1459
+ ),
1460
+ )
1461
+
1462
+
1463
+ if __name__ == "__main__":
1464
+ asyncio.run(main())
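For reference, the merge performed by `merge_video_audio` above corresponds to a single FFmpeg invocation; a minimal stand-alone sketch, with placeholder file names:

```python
import subprocess

# Stand-alone equivalent of the command built by merge_video_audio;
# the file names here are illustrative placeholders.
subprocess.run(
    [
        "ffmpeg",
        "-i", "animation.mp4",  # video input (stream copied, not re-encoded)
        "-i", "narration.mp3",  # audio input (re-encoded to AAC)
        "-c:v", "copy",
        "-c:a", "aac",
        "-shortest",            # end at the shorter of the two inputs
        "-y",                   # overwrite the output file
        "final.mp4",
    ],
    check=True,
)
```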
neuroanim/__init__.py ADDED
@@ -0,0 +1,19 @@
1
+ """
2
+ NeuroAnim - LangGraph-based Animation Pipeline
3
+
4
+ This package provides a modular, graph-based workflow for generating
5
+ educational STEM animations using Manim, AI models, and TTS.
6
+
7
+ The pipeline uses LangGraph to coordinate multiple agent nodes that handle:
8
+ - Concept planning
9
+ - Code generation
10
+ - Rendering
11
+ - Audio generation
12
+ - Video processing
13
+ """
14
+
15
+ from neuroanim.graph.state import AnimationState, create_initial_state
16
+ from neuroanim.graph.workflow import run_animation_pipeline
17
+
18
+ __version__ = "0.1.0"
19
+ __all__ = ["run_animation_pipeline", "create_initial_state", "AnimationState"]
neuroanim/agents/__init__.py ADDED
@@ -0,0 +1,10 @@
1
+ """
2
+ NeuroAnim Agents Module
3
+
4
+ This module contains agent node implementations for the LangGraph workflow.
5
+ Each agent node handles a specific step in the animation generation pipeline.
6
+ """
7
+
8
+ from neuroanim.agents.nodes import AnimationNodes
9
+
10
+ __all__ = ["AnimationNodes"]
neuroanim/agents/nodes.py ADDED
@@ -0,0 +1,574 @@
1
+ """
2
+ LangGraph Agent Nodes for NeuroAnim Pipeline
3
+
4
+ This module contains all the node functions used in the LangGraph workflow.
5
+ Each node represents a step in the animation generation pipeline and
6
+ communicates with the Manim MCP server to perform its task.
7
+ """
8
+
9
+ import ast
10
+ import json
11
+ import logging
12
+ import re
13
+ import tempfile
14
+ import time
15
+ from pathlib import Path
16
+ from typing import Any, Dict
17
+
18
+ from mcp import ClientSession
19
+
20
+ from neuroanim.graph.state import AnimationState
21
+ from utils.tts import TTSGenerator
22
+
23
+ logger = logging.getLogger(__name__)
24
+
25
+
26
+ class AnimationNodes:
27
+ """Container for all animation pipeline nodes."""
28
+
29
+ def __init__(
30
+ self,
31
+ mcp_session: ClientSession,
32
+ tts_generator: TTSGenerator,
33
+ work_dir: Path,
34
+ output_dir: Path,
35
+ ):
36
+ """
37
+ Initialize the animation nodes.
38
+
39
+ Args:
40
+ mcp_session: MCP client session for tool calls
41
+ tts_generator: TTS generator instance
42
+ work_dir: Working directory for temporary files
43
+ output_dir: Output directory for final files
44
+ """
45
+ self.mcp_session = mcp_session
46
+ self.tts_generator = tts_generator
47
+ self.work_dir = work_dir
48
+ self.output_dir = output_dir
49
+
50
+ async def call_mcp_tool(
51
+ self, tool_name: str, arguments: Dict[str, Any]
52
+ ) -> Dict[str, Any]:
53
+ """
54
+ Call an MCP tool and return the result.
55
+
56
+ Args:
57
+ tool_name: Name of the tool to call
58
+ arguments: Arguments to pass to the tool
59
+
60
+ Returns:
61
+ Dictionary with 'text' and 'isError' keys
62
+ """
63
+ try:
64
+ result = await self.mcp_session.call_tool(tool_name, arguments)
65
+
66
+ if hasattr(result, "content") and result.content:
67
+ content = result.content[0]
68
+ if hasattr(content, "text"):
69
+ return {
70
+ "text": content.text,
71
+ "isError": getattr(result, "isError", False),
72
+ }
73
+
74
+ return {"text": str(result), "isError": False}
75
+
76
+ except Exception as e:
77
+ logger.error(f"MCP tool call failed for {tool_name}: {e}")
78
+ return {"text": f"Tool call failed: {str(e)}", "isError": True}
79
+
80
+ async def initialize_node(self, state: AnimationState) -> AnimationState:
81
+ """
82
+ Initialize the pipeline state and working directories.
83
+
84
+ Args:
85
+ state: Current animation state
86
+
87
+ Returns:
88
+ Updated state with initialized paths and metadata
89
+ """
90
+ logger.info("🚀 Initializing animation pipeline")
91
+
92
+ state["start_time"] = time.time()
93
+ state["work_dir"] = str(self.work_dir)
94
+ state["output_dir"] = str(self.output_dir)
95
+ state["current_step"] = "initialization"
96
+ state["completed_steps"].append("initialization")
97
+
98
+ logger.info(f"Working directory: {self.work_dir}")
99
+ logger.info(f"Output directory: {self.output_dir}")
100
+
101
+ return state
102
+
103
+ async def plan_concept_node(self, state: AnimationState) -> AnimationState:
104
+ """
105
+ Plan the animation concept using AI.
106
+
107
+ Args:
108
+ state: Current animation state
109
+
110
+ Returns:
111
+ Updated state with concept plan
112
+ """
113
+ logger.info("📋 Planning concept...")
114
+ state["current_step"] = "concept_planning"
115
+
116
+ try:
117
+ result = await self.call_mcp_tool(
118
+ "plan_concept",
119
+ {
120
+ "topic": state["topic"],
121
+ "target_audience": state["target_audience"],
122
+ "animation_length_minutes": state["animation_length_minutes"],
123
+ },
124
+ )
125
+
126
+ if result["isError"]:
127
+ state["errors"].append(f"Concept planning failed: {result['text']}")
128
+ return state
129
+
130
+ concept_plan = result["text"]
131
+ state["concept_plan"] = concept_plan
132
+
133
+ # Try to parse JSON from the concept plan
134
+ try:
135
+ # Extract JSON if it's embedded in the response
136
+ json_match = re.search(r"\{.*\}", concept_plan, re.DOTALL)
137
+ if json_match:
138
+ plan_data = json.loads(json_match.group())
139
+ state["learning_objectives"] = plan_data.get(
140
+ "learning_objectives", []
141
+ )
142
+ state["visual_metaphors"] = plan_data.get("visual_metaphors", [])
143
+ state["scene_flow"] = plan_data.get("scene_flow", [])
144
+ except json.JSONDecodeError:
145
+ logger.warning("Could not parse concept plan as JSON")
146
+
147
+ state["completed_steps"].append("concept_planning")
148
+ logger.info("✅ Concept planning completed")
149
+
150
+ except Exception as e:
151
+ logger.error(f"Concept planning failed: {e}")
152
+ state["errors"].append(f"Concept planning error: {str(e)}")
153
+
154
+ return state
155
+
156
+ async def generate_narration_node(self, state: AnimationState) -> AnimationState:
157
+ """
158
+ Generate narration script for the animation.
159
+
160
+ Args:
161
+ state: Current animation state
162
+
163
+ Returns:
164
+ Updated state with narration text
165
+ """
166
+ logger.info("🎙️ Generating narration...")
167
+ state["current_step"] = "narration_generation"
168
+
169
+ try:
170
+ duration_seconds = int(state["animation_length_minutes"] * 60)
171
+
172
+ result = await self.call_mcp_tool(
173
+ "generate_narration",
174
+ {
175
+ "concept": state["topic"],
176
+ "scene_description": state.get("concept_plan", ""),
177
+ "target_audience": state["target_audience"],
178
+ "duration_seconds": duration_seconds,
179
+ },
180
+ )
181
+
182
+ if result["isError"]:
183
+ state["errors"].append(f"Narration generation failed: {result['text']}")
184
+ return state
185
+
186
+ # Extract narration text from response
187
+ narration_text = result["text"]
188
+ if "Narration Script:" in narration_text:
189
+ narration_text = narration_text.split("Narration Script:")[-1].strip()
190
+
191
+ state["narration_text"] = narration_text
192
+ state["narration_duration"] = duration_seconds
193
+ state["completed_steps"].append("narration_generation")
194
+
195
+ logger.info("✅ Narration generation completed")
196
+
197
+ except Exception as e:
198
+ logger.error(f"Narration generation failed: {e}")
199
+ state["errors"].append(f"Narration generation error: {str(e)}")
200
+
201
+ return state
202
+
203
+ async def generate_code_node(self, state: AnimationState) -> AnimationState:
204
+ """
205
+ Generate Manim code for the animation.
206
+
207
+ Args:
208
+ state: Current animation state
209
+
210
+ Returns:
211
+ Updated state with generated code
212
+ """
213
+ logger.info("💻 Generating Manim code...")
214
+ state["current_step"] = "code_generation"
215
+
216
+ try:
217
+ # Check if this is a retry
218
+ previous_code = None
219
+ error_message = None
220
+ if state["code_generation_attempts"] > 0:
221
+ previous_code = state.get("manim_code")
222
+ if state.get("previous_code_errors"):
223
+ error_message = state["previous_code_errors"][-1]
224
+
225
+ state["code_generation_attempts"] += 1
226
+
227
+ arguments = {
228
+ "concept": state["topic"],
229
+ "scene_description": state.get("concept_plan", ""),
230
+ "visual_elements": ["text", "shapes", "animations"],
231
+ }
232
+
233
+ if previous_code and error_message:
234
+ arguments["previous_code"] = previous_code
235
+ arguments["error_message"] = error_message
236
+ logger.info(
237
+ f"Retrying code generation (attempt {state['code_generation_attempts']})"
238
+ )
239
+
240
+ result = await self.call_mcp_tool("generate_manim_code", arguments)
241
+
242
+ if result["isError"]:
243
+ state["errors"].append(f"Code generation failed: {result['text']}")
244
+ return state
245
+
246
+ # Extract Python code from response
247
+ code_text = result["text"]
248
+ manim_code = self._extract_python_code(code_text)
249
+
250
+ # Validate syntax
251
+ validation_error = self._validate_python_syntax(manim_code)
252
+ if validation_error:
253
+ logger.warning(f"Code validation failed: {validation_error}")
254
+ if not state.get("previous_code_errors"):
255
+ state["previous_code_errors"] = []
256
+ state["previous_code_errors"].append(validation_error)
257
+ state["warnings"].append(f"Code validation issue: {validation_error}")
258
+ else:
+ logger.info("✅ Code validation passed")
+ # Clear stale errors so the retry router can proceed to write_file
+ state["previous_code_errors"] = []
260
+
261
+ state["manim_code"] = manim_code
262
+ state["scene_name"] = self._extract_scene_name(manim_code)
263
+ state["completed_steps"].append("code_generation")
264
+
265
+ logger.info(f"✅ Code generation completed (scene: {state['scene_name']})")
266
+
267
+ except Exception as e:
268
+ logger.error(f"Code generation failed: {e}")
269
+ state["errors"].append(f"Code generation error: {str(e)}")
270
+
271
+ return state
272
+
273
+ async def write_file_node(self, state: AnimationState) -> AnimationState:
274
+ """
275
+ Write the Manim code to a file.
276
+
277
+ Args:
278
+ state: Current animation state
279
+
280
+ Returns:
281
+ Updated state with file path
282
+ """
283
+ logger.info("📝 Writing Manim file...")
284
+ state["current_step"] = "file_writing"
285
+
286
+ try:
287
+ manim_file = Path(state["work_dir"]) / "animation.py"
288
+ state["manim_file_path"] = str(manim_file)
289
+
290
+ result = await self.call_mcp_tool(
291
+ "write_manim_file",
292
+ {"filepath": str(manim_file), "code": state["manim_code"]},
293
+ )
294
+
295
+ if result["isError"]:
296
+ state["errors"].append(f"File writing failed: {result['text']}")
297
+ return state
298
+
299
+ state["completed_steps"].append("file_writing")
300
+ logger.info(f"✅ Manim file written to {manim_file}")
301
+
302
+ except Exception as e:
303
+ logger.error(f"File writing failed: {e}")
304
+ state["errors"].append(f"File writing error: {str(e)}")
305
+
306
+ return state
307
+
308
+ async def render_animation_node(self, state: AnimationState) -> AnimationState:
309
+ """
310
+ Render the Manim animation.
311
+
312
+ Args:
313
+ state: Current animation state
314
+
315
+ Returns:
316
+ Updated state with rendered video path
317
+ """
318
+ logger.info("🎬 Rendering animation...")
319
+ state["current_step"] = "rendering"
320
+
321
+ try:
322
+ result = await self.call_mcp_tool(
323
+ "render_manim_animation",
324
+ {
325
+ "scene_name": state["scene_name"],
326
+ "file_path": state["manim_file_path"],
327
+ "output_dir": state["work_dir"],
328
+ "quality": state["rendering_quality"],
329
+ "format": state["rendering_format"],
330
+ "frame_rate": state["frame_rate"],
331
+ },
332
+ )
333
+
334
+ if result["isError"]:
335
+ state["errors"].append(f"Rendering failed: {result['text']}")
336
+ return state
337
+
338
+ # Find the rendered video file
339
+ video_file = (
340
+ Path(state["work_dir"])
341
+ / f"{state['scene_name']}.{state['rendering_format']}"
342
+ )
343
+ if not video_file.exists():
344
+ state["errors"].append(f"Rendered video not found at {video_file}")
345
+ return state
346
+
347
+ state["video_file_path"] = str(video_file)
348
+ state["completed_steps"].append("rendering")
349
+
350
+ logger.info(f"✅ Animation rendered: {video_file}")
351
+
352
+ except Exception as e:
353
+ logger.error(f"Rendering failed: {e}")
354
+ state["errors"].append(f"Rendering error: {str(e)}")
355
+
356
+ return state
357
+
358
+ async def generate_audio_node(self, state: AnimationState) -> AnimationState:
359
+ """
360
+ Generate speech audio from narration text.
361
+
362
+ Args:
363
+ state: Current animation state
364
+
365
+ Returns:
366
+ Updated state with audio file path
367
+ """
368
+ logger.info("🔊 Generating speech audio...")
369
+ state["current_step"] = "audio_generation"
370
+
371
+ try:
372
+ audio_file = Path(state["work_dir"]) / "narration.mp3"
373
+ state["audio_file_path"] = str(audio_file)
374
+
375
+ # Use TTS generator with automatic fallback
376
+ tts_result = await self.tts_generator.generate_speech(
377
+ text=state["narration_text"], output_path=audio_file, voice="rachel"
378
+ )
379
+
380
+ logger.info(f"Audio generated with {tts_result['provider']}")
381
+
382
+ # Validate audio file
383
+ validation = self.tts_generator.validate_audio_file(audio_file)
384
+ if not validation["valid"]:
385
+ state["warnings"].append(
386
+ f"Audio validation warning: {validation.get('error', 'Unknown issue')}"
387
+ )
388
+ else:
389
+ logger.info(
390
+ f"Audio validated: {validation.get('duration', 'N/A')}s, {validation.get('size', 0)} bytes"
391
+ )
392
+
393
+ state["completed_steps"].append("audio_generation")
394
+ logger.info(f"✅ Audio generated: {audio_file}")
395
+
396
+ except Exception as e:
397
+ logger.error(f"Audio generation failed: {e}")
398
+ state["errors"].append(f"Audio generation error: {str(e)}")
399
+
400
+ return state
401
+
402
+ async def merge_video_audio_node(self, state: AnimationState) -> AnimationState:
403
+ """
404
+ Merge video and audio into final output.
405
+
406
+ Args:
407
+ state: Current animation state
408
+
409
+ Returns:
410
+ Updated state with final output path
411
+ """
412
+ logger.info("🎞️ Merging video and audio...")
413
+ state["current_step"] = "video_audio_merge"
414
+
415
+ try:
416
+ final_output = Path(state["output_dir"]) / state["output_filename"]
417
+ state["final_output_path"] = str(final_output)
418
+
419
+ result = await self.call_mcp_tool(
420
+ "merge_video_audio",
421
+ {
422
+ "video_file": state["video_file_path"],
423
+ "audio_file": state["audio_file_path"],
424
+ "output_file": str(final_output),
425
+ },
426
+ )
427
+
428
+ if result["isError"]:
429
+ state["errors"].append(f"Video/audio merge failed: {result['text']}")
430
+ return state
431
+
432
+ state["completed_steps"].append("video_audio_merge")
433
+ logger.info(f"✅ Video and audio merged: {final_output}")
434
+
435
+ except Exception as e:
436
+ logger.error(f"Video/audio merge failed: {e}")
437
+ state["errors"].append(f"Merge error: {str(e)}")
438
+
439
+ return state
440
+
441
+ async def generate_quiz_node(self, state: AnimationState) -> AnimationState:
442
+ """
443
+ Generate quiz questions for the topic.
444
+
445
+ Args:
446
+ state: Current animation state
447
+
448
+ Returns:
449
+ Updated state with quiz content
450
+ """
451
+ logger.info("❓ Generating quiz...")
452
+ state["current_step"] = "quiz_generation"
453
+
454
+ try:
455
+ result = await self.call_mcp_tool(
456
+ "generate_quiz",
457
+ {
458
+ "concept": state["topic"],
459
+ "difficulty": "medium",
460
+ "num_questions": 3,
461
+ "question_types": ["multiple_choice"],
462
+ },
463
+ )
464
+
465
+ if result["isError"]:
466
+ state["warnings"].append(f"Quiz generation failed: {result['text']}")
467
+ state["quiz_content"] = "Quiz generation failed"
468
+ else:
469
+ state["quiz_content"] = result["text"]
470
+ # Try to parse quiz questions
471
+ try:
472
+ json_match = re.search(r"\[.*\]", result["text"], re.DOTALL)
473
+ if json_match:
474
+ state["quiz_questions"] = json.loads(json_match.group())
475
+ except json.JSONDecodeError:
476
+ logger.warning("Could not parse quiz as JSON")
477
+
478
+ state["completed_steps"].append("quiz_generation")
479
+ logger.info("✅ Quiz generation completed")
480
+
481
+ except Exception as e:
482
+ logger.error(f"Quiz generation failed: {e}")
483
+ state["warnings"].append(f"Quiz generation error: {str(e)}")
484
+ state["quiz_content"] = "Quiz generation failed"
485
+
486
+ return state
487
+
488
+ async def finalize_node(self, state: AnimationState) -> AnimationState:
489
+ """
490
+ Finalize the pipeline and compute metadata.
491
+
492
+ Args:
493
+ state: Current animation state
494
+
495
+ Returns:
496
+ Final state with metadata
497
+ """
498
+ logger.info("🏁 Finalizing pipeline...")
499
+ state["current_step"] = "finalization"
500
+
501
+ state["end_time"] = time.time()
502
+ state["total_duration"] = state["end_time"] - state["start_time"]
503
+
504
+ # Check if pipeline succeeded
505
+ if not state["errors"] and state.get("final_output_path"):
506
+ state["success"] = True
507
+ logger.info(
508
+ f"✅ Pipeline completed successfully in {state['total_duration']:.2f}s"
509
+ )
510
+ else:
511
+ state["success"] = False
512
+ logger.error(f"❌ Pipeline failed with {len(state['errors'])} error(s)")
513
+
514
+ state["completed_steps"].append("finalization")
515
+
516
+ return state
517
+
518
+ # Helper methods
519
+
520
+ def _extract_python_code(self, response_text: str) -> str:
521
+ """Extract Python code from markdown response."""
522
+ if "```python" in response_text:
523
+ start = response_text.find("```python") + 9
524
+ end = response_text.find("```", start)
525
+ if end == -1:
526
+ end = len(response_text)
527
+ return response_text[start:end].strip()
528
+ elif "```" in response_text:
529
+ start = response_text.find("```") + 3
530
+ end = response_text.find("```", start)
531
+ if end == -1:
532
+ end = len(response_text)
533
+ return response_text[start:end].strip()
534
+ else:
535
+ return response_text.strip()
536
+
537
+ def _extract_scene_name(self, code: str) -> str:
538
+ """Extract the scene class name from Manim code."""
539
+ try:
540
+ tree = ast.parse(code)
541
+ for node in ast.walk(tree):
542
+ if isinstance(node, ast.ClassDef):
543
+ # Check if it inherits from Scene or MovingCameraScene
544
+ for base in node.bases:
545
+ if isinstance(base, ast.Name) and base.id in [
546
+ "Scene",
547
+ "MovingCameraScene",
548
+ "ThreeDScene",
549
+ ]:
550
+ return node.name
551
+ except SyntaxError:
552
+ pass
553
+
554
+ # Fallback: use regex
555
+ match = re.search(r"class\s+(\w+)\s*\(.*Scene.*\):", code)
556
+ if match:
557
+ return match.group(1)
558
+
559
+ return "GenScene"
560
+
561
+ def _validate_python_syntax(self, code: str) -> str | None:
562
+ """
563
+ Validate Python code syntax.
564
+
565
+ Returns:
566
+ Error message if validation fails, None if valid
567
+ """
568
+ try:
569
+ ast.parse(code)
570
+ return None
571
+ except SyntaxError as e:
572
+ return f"Syntax error at line {e.lineno}: {e.msg}"
573
+ except Exception as e:
574
+ return f"Validation error: {str(e)}"
neuroanim/graph/__init__.py ADDED
@@ -0,0 +1,16 @@
1
+ """
2
+ NeuroAnim Graph Module
3
+
4
+ This module contains the LangGraph workflow definition and state management
5
+ for the animation generation pipeline.
6
+ """
7
+
8
+ from neuroanim.graph.state import AnimationState, create_initial_state
9
+ from neuroanim.graph.workflow import create_animation_workflow, run_animation_pipeline
10
+
11
+ __all__ = [
12
+ "AnimationState",
13
+ "create_initial_state",
14
+ "create_animation_workflow",
15
+ "run_animation_pipeline",
16
+ ]
neuroanim/graph/state.py ADDED
@@ -0,0 +1,157 @@
1
+ """
2
+ LangGraph State Definition for NeuroAnim Pipeline
3
+
4
+ This module defines the state structure that flows through the animation
5
+ generation workflow. The state is updated by each node in the graph.
6
+ """
7
+
8
+ from typing import Any, Dict, List, Optional, TypedDict
9
+
10
+
11
+ class AnimationState(TypedDict, total=False):
12
+ """
13
+ State for the animation generation pipeline.
14
+
15
+ This state is passed through all nodes in the LangGraph workflow.
16
+ Each node reads from and writes to this state to coordinate the
17
+ animation generation process.
18
+ """
19
+
20
+ # Input Parameters
21
+ topic: str
22
+ target_audience: str
23
+ animation_length_minutes: float
24
+ output_filename: str
25
+
26
+ # Concept Planning
27
+ concept_plan: Optional[str]
28
+ learning_objectives: Optional[List[str]]
29
+ visual_metaphors: Optional[List[str]]
30
+ scene_flow: Optional[List[Dict[str, str]]]
31
+
32
+ # Narration
33
+ narration_text: Optional[str]
34
+ narration_duration: Optional[float]
35
+
36
+ # Code Generation
37
+ manim_code: Optional[str]
38
+ scene_name: Optional[str]
39
+ code_generation_attempts: int
40
+ previous_code_errors: Optional[List[str]]
41
+
42
+ # File Paths
43
+ work_dir: Optional[str]
44
+ output_dir: Optional[str]
45
+ manim_file_path: Optional[str]
46
+ video_file_path: Optional[str]
47
+ audio_file_path: Optional[str]
48
+ final_output_path: Optional[str]
49
+
50
+ # Rendering
51
+ rendering_quality: str
52
+ rendering_format: str
53
+ frame_rate: int
54
+
55
+ # Analysis & Feedback
56
+ frame_analysis: Optional[str]
57
+ visual_quality_score: Optional[float]
58
+ needs_refinement: bool
59
+ refinement_feedback: Optional[str]
60
+
61
+ # Quiz
62
+ quiz_content: Optional[str]
63
+ quiz_questions: Optional[List[Dict[str, Any]]]
64
+
65
+ # Error Handling
66
+ errors: List[str]
67
+ warnings: List[str]
68
+ current_step: str
69
+ retry_count: Dict[str, int]
70
+ max_retries: int
71
+
72
+ # Status
73
+ success: bool
74
+ completed_steps: List[str]
75
+
76
+ # Metadata
77
+ start_time: Optional[float]
78
+ end_time: Optional[float]
79
+ total_duration: Optional[float]
80
+
81
+
82
+ def create_initial_state(
83
+ topic: str,
84
+ target_audience: str = "general",
85
+ animation_length_minutes: float = 2.0,
86
+ output_filename: str = "animation.mp4",
87
+ rendering_quality: str = "medium",
88
+ rendering_format: str = "mp4",
89
+ frame_rate: int = 30,
90
+ max_retries: int = 3,
91
+ ) -> AnimationState:
92
+ """
93
+ Create the initial state for the animation pipeline.
94
+
95
+ Args:
96
+ topic: The STEM topic to animate
97
+ target_audience: Target audience level
98
+ animation_length_minutes: Desired animation length
99
+ output_filename: Name for the final output file
100
+ rendering_quality: Manim rendering quality
101
+ rendering_format: Output video format
102
+ frame_rate: Video frame rate
103
+ max_retries: Maximum retry attempts per step
104
+
105
+ Returns:
106
+ Initial AnimationState with default values
107
+ """
108
+ return AnimationState(
109
+ # Input parameters
110
+ topic=topic,
111
+ target_audience=target_audience,
112
+ animation_length_minutes=animation_length_minutes,
113
+ output_filename=output_filename,
114
+ # Initialize optional fields
115
+ concept_plan=None,
116
+ learning_objectives=None,
117
+ visual_metaphors=None,
118
+ scene_flow=None,
119
+ narration_text=None,
120
+ narration_duration=None,
121
+ manim_code=None,
122
+ scene_name=None,
123
+ code_generation_attempts=0,
124
+ previous_code_errors=None,
125
+ # File paths
126
+ work_dir=None,
127
+ output_dir=None,
128
+ manim_file_path=None,
129
+ video_file_path=None,
130
+ audio_file_path=None,
131
+ final_output_path=None,
132
+ # Rendering config
133
+ rendering_quality=rendering_quality,
134
+ rendering_format=rendering_format,
135
+ frame_rate=frame_rate,
136
+ # Analysis
137
+ frame_analysis=None,
138
+ visual_quality_score=None,
139
+ needs_refinement=False,
140
+ refinement_feedback=None,
141
+ # Quiz
142
+ quiz_content=None,
143
+ quiz_questions=None,
144
+ # Error handling
145
+ errors=[],
146
+ warnings=[],
147
+ current_step="initialization",
148
+ retry_count={},
149
+ max_retries=max_retries,
150
+ # Status
151
+ success=False,
152
+ completed_steps=[],
153
+ # Metadata
154
+ start_time=None,
155
+ end_time=None,
156
+ total_duration=None,
157
+ )
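A quick sketch of the defaults `create_initial_state` fills in for a typical request:

```python
from neuroanim.graph.state import create_initial_state

state = create_initial_state(
    topic="Photosynthesis",
    target_audience="middle_school",
    animation_length_minutes=1.5,
)

# Defaults from the factory above
assert state["errors"] == [] and state["success"] is False
print(state["rendering_quality"], state["frame_rate"])  # medium 30
```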
neuroanim/graph/workflow.py ADDED
@@ -0,0 +1,265 @@
1
+ """
2
+ LangGraph Workflow Definition for NeuroAnim Pipeline
3
+
4
+ This module defines the complete animation generation workflow using LangGraph.
5
+ The workflow coordinates multiple agent nodes to transform a STEM topic into
6
+ an educational animation with narration.
7
+ """
8
+
9
+ import logging
10
+ import tempfile
11
+ from pathlib import Path
12
+ from typing import Any, Dict
13
+
14
+ from langgraph.graph import END, StateGraph
15
+
16
+ from neuroanim.agents.nodes import AnimationNodes
17
+ from neuroanim.graph.state import AnimationState, create_initial_state
18
+
19
+ logger = logging.getLogger(__name__)
20
+
21
+
22
+ def should_retry_code_generation(state: AnimationState) -> str:
23
+ """
24
+ Determine if code generation should be retried.
25
+
26
+ Args:
27
+ state: Current animation state
28
+
29
+ Returns:
30
+ Next node name: "generate_code" for retry, "write_file" to proceed
31
+ """
32
+ if (
33
+ state.get("previous_code_errors")
34
+ and state["code_generation_attempts"] < state["max_retries"]
35
+ ):
36
+ logger.info(
37
+ f"Code has errors, retrying (attempt {state['code_generation_attempts']}/{state['max_retries']})"
38
+ )
39
+ return "generate_code"
40
+ return "write_file"
41
+
42
+
43
+ def should_continue_after_error(state: AnimationState) -> str:
44
+ """
45
+ Determine if pipeline should continue after errors.
46
+
47
+ Args:
48
+ state: Current animation state
49
+
50
+ Returns:
51
+ Next node name or END
52
+ """
53
+ if state["errors"]:
54
+ logger.error(f"Pipeline encountered {len(state['errors'])} error(s), stopping")
55
+ return "finalize"
56
+ return "next"
57
+
58
+
59
+ def create_animation_workflow(nodes: AnimationNodes) -> StateGraph:
60
+ """
61
+ Create the LangGraph workflow for animation generation.
62
+
63
+ The workflow follows this sequence:
64
+ 1. Initialize - Set up directories and state
65
+ 2. Plan Concept - Generate animation concept plan
66
+ 3. Generate Narration - Create narration script
67
+ 4. Generate Code - Create Manim code (with retry logic)
68
+ 5. Write File - Save code to file
69
+ 6. Render Animation - Execute Manim rendering
70
+ 7. Generate Audio - Create speech audio
71
+ 8. Merge Video/Audio - Combine into final output
72
+ 9. Generate Quiz - Create assessment questions
73
+ 10. Finalize - Compute metadata and complete
74
+
75
+ Args:
76
+ nodes: AnimationNodes instance with all node functions
77
+
78
+ Returns:
79
+ Compiled StateGraph ready for execution
80
+ """
81
+ # Create the graph
82
+ workflow = StateGraph(AnimationState)
83
+
84
+ # Add all nodes
85
+ workflow.add_node("initialize", nodes.initialize_node)
86
+ workflow.add_node("plan_concept", nodes.plan_concept_node)
87
+ workflow.add_node("generate_narration", nodes.generate_narration_node)
88
+ workflow.add_node("generate_code", nodes.generate_code_node)
89
+ workflow.add_node("write_file", nodes.write_file_node)
90
+ workflow.add_node("render_animation", nodes.render_animation_node)
91
+ workflow.add_node("generate_audio", nodes.generate_audio_node)
92
+ workflow.add_node("merge_video_audio", nodes.merge_video_audio_node)
93
+ workflow.add_node("generate_quiz", nodes.generate_quiz_node)
94
+ workflow.add_node("finalize", nodes.finalize_node)
95
+
96
+ # Set entry point
97
+ workflow.set_entry_point("initialize")
98
+
99
+ # Define the workflow edges (sequential flow with error checking)
100
+
101
+ # Initialize -> Plan Concept
102
+ workflow.add_edge("initialize", "plan_concept")
103
+
104
+ # Plan Concept -> Check for errors -> Generate Narration
105
+ workflow.add_conditional_edges(
106
+ "plan_concept",
107
+ lambda state: "generate_narration" if not state["errors"] else "finalize",
108
+ )
109
+
110
+ # Generate Narration -> Check for errors -> Generate Code
111
+ workflow.add_conditional_edges(
112
+ "generate_narration",
113
+ lambda state: "generate_code" if not state["errors"] else "finalize",
114
+ )
115
+
116
+ # Generate Code -> Check syntax -> Retry or Write File
117
+ workflow.add_conditional_edges(
118
+ "generate_code",
119
+ should_retry_code_generation,
120
+ )
121
+
122
+ # Write File -> Check for errors -> Render
123
+ workflow.add_conditional_edges(
124
+ "write_file",
125
+ lambda state: "render_animation" if not state["errors"] else "finalize",
126
+ )
127
+
128
+ # Render -> Check for errors -> Generate Audio
129
+ workflow.add_conditional_edges(
130
+ "render_animation",
131
+ lambda state: "generate_audio" if not state["errors"] else "finalize",
132
+ )
133
+
134
+ # Generate Audio -> Check for errors -> Merge
135
+ workflow.add_conditional_edges(
136
+ "generate_audio",
137
+ lambda state: "merge_video_audio" if not state["errors"] else "finalize",
138
+ )
139
+
140
+ # Merge -> Check for errors -> Generate Quiz
141
+ workflow.add_conditional_edges(
142
+ "merge_video_audio",
143
+ lambda state: "generate_quiz" if not state["errors"] else "finalize",
144
+ )
145
+
146
+ # Generate Quiz -> Finalize (quiz errors are non-critical)
147
+ workflow.add_edge("generate_quiz", "finalize")
148
+
149
+ # Finalize -> END
150
+ workflow.add_edge("finalize", END)
151
+
152
+ # Compile the graph
153
+ return workflow.compile()
154
+
155
+
156
+ async def run_animation_pipeline(
157
+ mcp_session: Any,
158
+ tts_generator: Any,
159
+ topic: str,
160
+ target_audience: str = "general",
161
+ animation_length_minutes: float = 2.0,
162
+ output_filename: str = "animation.mp4",
163
+ rendering_quality: str = "medium",
164
+ max_retries: int = 3,
165
+ ) -> Dict[str, Any]:
166
+ """
167
+ Run the complete animation generation pipeline.
168
+
169
+ This is the main entry point for generating animations. It creates
170
+ the workflow, initializes the state, and executes all steps.
171
+
172
+ Args:
173
+ mcp_session: MCP client session
174
+ tts_generator: TTS generator instance
175
+ topic: STEM topic to animate
176
+ target_audience: Target audience level
177
+ animation_length_minutes: Desired animation length
178
+ output_filename: Name for output file
179
+ rendering_quality: Manim rendering quality
180
+ max_retries: Maximum retry attempts
181
+
182
+ Returns:
183
+ Dictionary with pipeline results including:
184
+ - success: Whether pipeline completed successfully
185
+ - final_output_path: Path to final video
186
+ - errors: List of errors encountered
187
+ - warnings: List of warnings
188
+ - completed_steps: List of completed steps
189
+ - metadata: Timing and other metadata
190
+ """
191
+ # Create working directories
192
+ work_dir = Path(tempfile.mkdtemp(prefix="neuroanim_work_"))
193
+ output_dir = Path("outputs")
194
+ output_dir.mkdir(exist_ok=True)
195
+
196
+ logger.info(f"📁 Working directory: {work_dir}")
197
+ logger.info(f"📁 Output directory: {output_dir}")
198
+
199
+ # Initialize nodes
200
+ nodes = AnimationNodes(
201
+ mcp_session=mcp_session,
202
+ tts_generator=tts_generator,
203
+ work_dir=work_dir,
204
+ output_dir=output_dir,
205
+ )
206
+
207
+ # Create workflow
208
+ workflow = create_animation_workflow(nodes)
209
+
210
+ # Create initial state
211
+ initial_state = create_initial_state(
212
+ topic=topic,
213
+ target_audience=target_audience,
214
+ animation_length_minutes=animation_length_minutes,
215
+ output_filename=output_filename,
216
+ rendering_quality=rendering_quality,
217
+ max_retries=max_retries,
218
+ )
219
+
220
+ logger.info(f"🎬 Starting animation pipeline for topic: '{topic}'")
221
+
222
+ try:
223
+ # Run the workflow
224
+ final_state = await workflow.ainvoke(initial_state)
225
+
226
+ # Build result summary
227
+ result = {
228
+ "success": final_state.get("success", False),
229
+ "topic": final_state["topic"],
230
+ "target_audience": final_state["target_audience"],
231
+ "final_output_path": final_state.get("final_output_path"),
232
+ "concept_plan": final_state.get("concept_plan"),
233
+ "narration": final_state.get("narration_text"),
234
+ "manim_code": final_state.get("manim_code"),
235
+ "quiz": final_state.get("quiz_content"),
236
+ "errors": final_state.get("errors", []),
237
+ "warnings": final_state.get("warnings", []),
238
+ "completed_steps": final_state.get("completed_steps", []),
239
+ "total_duration": final_state.get("total_duration"),
240
+ "work_dir": str(work_dir),
241
+ "output_dir": str(output_dir),
242
+ }
243
+
244
+ if result["success"]:
245
+ logger.info(f"✅ Animation pipeline completed successfully!")
246
+ logger.info(f"📹 Output file: {result['final_output_path']}")
247
+ logger.info(f"⏱️ Total time: {result['total_duration']:.2f}s")
248
+ else:
249
+ logger.error(f"❌ Animation pipeline failed")
250
+ logger.error(f"Errors: {result['errors']}")
251
+
252
+ return result
253
+
254
+ except Exception as e:
255
+ logger.error(f"Pipeline execution failed: {e}", exc_info=True)
256
+ return {
257
+ "success": False,
258
+ "error": str(e),
259
+ "work_dir": str(work_dir),
260
+ "output_dir": str(output_dir),
261
+ }
262
+
263
+ finally:
264
+ # Note: We don't clean up work_dir here so users can inspect artifacts
265
+ logger.info(f"Work directory preserved at: {work_dir}")
orchestrator.py ADDED
@@ -0,0 +1,785 @@
1
+ """
2
+ NeuroAnim Orchestrator
3
+
4
+ This script coordinates the entire STEM animation generation pipeline:
5
+ 1. Concept Planning
6
+ 2. Code Generation
7
+ 3. Rendering
8
+ 4. Vision-based Analysis
9
+ 5. Audio Generation
10
+ 6. Final Merging
11
+
12
+ It uses the MCP servers (renderer and creative) to accomplish these tasks.
13
+ """
14
+
15
+ import ast
16
+ import asyncio
17
+ import json
18
+ import logging
19
+ import os
20
+ import tempfile
21
+ from pathlib import Path
22
+ from typing import Any, Dict, List, Optional
23
+
24
+ import aiofiles
25
+ from dotenv import load_dotenv
26
+ from mcp import ClientSession, StdioServerParameters
27
+ from mcp.client.stdio import stdio_client
28
+
29
+ from utils.tts import TTSGenerator
30
+
31
+ load_dotenv()
32
+ # Set up logging
33
+ logging.basicConfig(
34
+ level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
35
+ )
36
+ logger = logging.getLogger(__name__)
37
+
38
+
39
+ class NeuroAnimOrchestrator:
40
+ """Main orchestrator for NeuroAnim pipeline."""
41
+
42
+ def __init__(
43
+ self, hf_api_key: Optional[str] = None, elevenlabs_api_key: Optional[str] = None
44
+ ):
45
+ self.hf_api_key = hf_api_key or os.getenv("HUGGINGFACE_API_KEY")
46
+ self.elevenlabs_api_key = elevenlabs_api_key or os.getenv("ELEVENLABS_API_KEY")
47
+ self.renderer_session: Optional[ClientSession] = None
48
+ self.creative_session: Optional[ClientSession] = None
49
+
50
+ # Initialize TTS generator
51
+ self.tts_generator = TTSGenerator(
52
+ elevenlabs_api_key=self.elevenlabs_api_key,
53
+ hf_api_key=self.hf_api_key,
54
+ fallback_enabled=True,
55
+ )
56
+
57
+ # Context managers for MCP client connections
58
+ self._renderer_cm = None
59
+ self._creative_cm = None
60
+ self._renderer_streams = None
61
+ self._creative_streams = None
62
+
63
+ # Working directories
64
+ self.work_dir: Optional[Path] = None
65
+ self.output_dir: Optional[Path] = None
66
+
67
+ async def initialize(self):
68
+ """Initialize MCP server connections."""
69
+ # Set up working directories
70
+ self.work_dir = Path(tempfile.mkdtemp(prefix="neuroanim_work_"))
71
+ self.output_dir = Path("outputs")
72
+ self.output_dir.mkdir(exist_ok=True)
73
+
74
+ logger.info(f"Working directory: {self.work_dir}")
75
+ logger.info(f"Output directory: {self.output_dir}")
76
+
77
+ # Initialize renderer server
78
+ # stdio_client is an async context manager, must use async with
79
+ renderer_params = StdioServerParameters(
80
+ command="python", args=["mcp_servers/renderer.py"]
81
+ )
82
+
83
+ self._renderer_cm = stdio_client(renderer_params)
84
+ self._renderer_streams = await self._renderer_cm.__aenter__()
85
+ read_stream, write_stream = self._renderer_streams
86
+ self.renderer_session = ClientSession(read_stream, write_stream)
87
+ # Start background receive loop for the client session
88
+ await self.renderer_session.__aenter__()
89
+ await self.renderer_session.initialize()
90
+ logger.info("Renderer MCP server connected")
91
+
92
+ # Initialize creative server
93
+ creative_params = StdioServerParameters(
94
+ command="python",
95
+ args=["mcp_servers/creative.py"],
96
+ env={"HUGGINGFACE_API_KEY": self.hf_api_key} if self.hf_api_key else None,
97
+ )
98
+
99
+ self._creative_cm = stdio_client(creative_params)
100
+ self._creative_streams = await self._creative_cm.__aenter__()
101
+ read_stream, write_stream = self._creative_streams
102
+ self.creative_session = ClientSession(read_stream, write_stream)
103
+ # Start background receive loop for the client session
104
+ await self.creative_session.__aenter__()
105
+ await self.creative_session.initialize()
106
+ logger.info("Creative MCP server connected")
107
+
108
+ async def cleanup(self):
109
+ """Clean up resources."""
110
+ import shutil
111
+
112
+ # Close sessions first
113
+ if self.renderer_session:
114
+ try:
115
+ await self.renderer_session.__aexit__(None, None, None)
116
+ except (Exception, asyncio.CancelledError) as e:
117
+ logger.debug(f"Error closing renderer session: {e}")
118
+
119
+ if self.creative_session:
120
+ try:
121
+ await self.creative_session.__aexit__(None, None, None)
122
+ except (Exception, asyncio.CancelledError) as e:
123
+ logger.debug(f"Error closing creative session: {e}")
124
+
125
+ # Then close the stdio_client context managers with timeout
126
+ if self._renderer_cm:
127
+ try:
128
+ async with asyncio.timeout(2): # 2 second timeout
129
+ await self._renderer_cm.__aexit__(None, None, None)
130
+ except (Exception, asyncio.CancelledError, TimeoutError) as e:
131
+ logger.debug(f"Error closing renderer context manager: {e}")
132
+
133
+ if self._creative_cm:
134
+ try:
135
+ async with asyncio.timeout(2): # 2 second timeout
136
+ await self._creative_cm.__aexit__(None, None, None)
137
+ except (Exception, asyncio.CancelledError, TimeoutError) as e:
138
+ logger.debug(f"Error closing creative context manager: {e}")
139
+
140
+ # Clean up working directory
141
+ if self.work_dir and self.work_dir.exists():
142
+ try:
143
+ shutil.rmtree(self.work_dir)
144
+ logger.info(f"Cleaned up working directory: {self.work_dir}")
145
+ except Exception as e:
146
+ logger.warning(f"Failed to clean up working directory: {e}")
147
+
148
+ async def call_tool(
149
+ self, session: ClientSession, tool_name: str, arguments: Dict[str, Any]
150
+ ) -> Dict[str, Any]:
151
+ """Call a tool on an MCP server."""
152
+ result = await session.call_tool(tool_name, arguments)
153
+
154
+ if hasattr(result, "content") and result.content:
155
+ content = result.content[0]
156
+ if hasattr(content, "text"):
157
+ return {
158
+ "text": content.text,
159
+ "isError": getattr(result, "isError", False),
160
+ }
161
+
162
+ return {"text": str(result), "isError": False}
163
+
164
+ async def generate_animation(
165
+ self,
166
+ topic: str,
167
+ target_audience: str = "general",
168
+ animation_length_minutes: float = 2.0,
169
+ output_filename: str = "animation.mp4",
170
+ quality: str = "medium",
171
+ progress_callback: Optional[callable] = None,
172
+ ) -> Dict[str, Any]:
173
+ """Complete animation generation pipeline."""
174
+
175
+ try:
176
+ logger.info(f"Starting animation generation for: {topic}")
177
+
178
+ # Step 1: Concept Planning
179
+ logger.info("Step 1: Planning concept...")
180
+ if progress_callback:
181
+ progress_callback("Planning concept", 0.1)
182
+ concept_result = await self.call_tool(
183
+ self.creative_session,
184
+ "plan_concept",
185
+ {
186
+ "topic": topic,
187
+ "target_audience": target_audience,
188
+ "animation_length_minutes": animation_length_minutes,
189
+ },
190
+ )
191
+
192
+ if concept_result["isError"]:
193
+ raise Exception(f"Concept planning failed: {concept_result['text']}")
194
+
195
+ concept_plan = concept_result["text"]
196
+ logger.info("Concept planning completed")
197
+
198
+ # Step 2: Generate Narration
199
+ logger.info("Step 2: Generating narration...")
200
+ if progress_callback:
201
+ progress_callback("Generating narration script", 0.25)
202
+ narration_result = await self.call_tool(
203
+ self.creative_session,
204
+ "generate_narration",
205
+ {
206
+ "concept": topic,
207
+ "scene_description": concept_plan,
208
+ "target_audience": target_audience,
209
+ "duration_seconds": int(animation_length_minutes * 60),
210
+ },
211
+ )
212
+
213
+ if narration_result["isError"]:
214
+ raise Exception(
215
+ f"Narration generation failed: {narration_result['text']}"
216
+ )
217
+
218
+ # Clean narration text - remove title/prefix before TTS
219
+ narration_text = self._clean_narration_text(narration_result["text"])
220
+ logger.info("Narration generation completed")
221
+ logger.info(f"Narration preview: {narration_text[:100]}...")
222
+
223
+ # Step 3: Generate Manim Code with retry logic
224
+ logger.info("Step 3: Generating Manim code...")
225
+ if progress_callback:
226
+ progress_callback("Creating Manim animation code", 0.40)
227
+ target_duration_seconds = int(animation_length_minutes * 60)
228
+ manim_code = await self._generate_and_validate_code(
229
+ topic=topic,
230
+ concept_plan=concept_plan,
231
+ duration_seconds=target_duration_seconds,
232
+ max_retries=3,
233
+ )
234
+ logger.info("Manim code generation completed and validated")
235
+
236
+ # Step 4: Write Manim File
237
+ logger.info("Step 4: Writing Manim file...")
238
+ manim_file = self.work_dir / "animation.py"
239
+ write_result = await self.call_tool(
240
+ self.renderer_session,
241
+ "write_manim_file",
242
+ {"filepath": str(manim_file), "code": manim_code},
243
+ )
244
+
245
+ if write_result["isError"]:
246
+ raise Exception(f"File writing failed: {write_result['text']}")
247
+
248
+ # Extract scene name from code
249
+ scene_name = self._extract_scene_name(manim_code)
250
+ logger.info(f"Scene name detected: {scene_name}")
251
+
252
+ # Step 5: Render Animation with retry on runtime errors
253
+ logger.info("Step 5: Rendering animation...")
254
+ if progress_callback:
255
+ progress_callback("Rendering animation video", 0.55)
256
+ max_render_retries = 5
257
+ video_file = None
258
+
259
+ for render_attempt in range(max_render_retries):
260
+ render_result = await self.call_tool(
261
+ self.renderer_session,
262
+ "render_manim_animation",
263
+ {
264
+ "scene_name": scene_name,
265
+ "file_path": str(manim_file),
266
+ "output_dir": str(self.work_dir),
267
+ "quality": quality, # Use the quality parameter
268
+ "format": "mp4",
269
+ "frame_rate": 30,
270
+ },
271
+ )
272
+
273
+ if not render_result["isError"]:
274
+ # Success! Find the rendered file
275
+ video_file = self._find_output_file(self.work_dir, scene_name, "mp4")
276
+ if video_file:
277
+ # Check video duration
278
+ try:
279
+ actual_duration = self._get_video_duration(video_file)
280
+ logger.info(f"Rendered video duration: {actual_duration:.2f}s (Target: {target_duration_seconds}s)")
281
+
282
+ if actual_duration < target_duration_seconds * 0.5:
283
+ logger.warning(f"Video is too short ({actual_duration:.2f}s < {target_duration_seconds * 0.5}s). Forcing retry...")
284
+ error_text = (
285
+ f"The generated animation was TOO SHORT ({actual_duration:.1f}s). "
286
+ f"The target duration is {target_duration_seconds}s. "
287
+ "You MUST make the animation longer by adding more `self.wait()` calls "
288
+ "and ensuring animations play slower (use run_time parameter)."
289
+ )
290
+ # Fall through to error handling logic below
291
+ else:
292
+ break
293
+ except Exception as e:
294
+ logger.warning(f"Could not verify video duration: {e}")
295
+ break
296
+ else:
297
+ logger.warning("Render succeeded but could not find output file")
298
+ if render_attempt < max_render_retries - 1:
299
+ continue
300
+
301
+ # Rendering failed or produced an unusable clip - check whether we can
+ # fix it by regenerating the code. Only overwrite error_text with the
+ # render output when the render itself errored, so the too-short
+ # feedback set above is preserved (and the no-output case gets a message).
+ if render_result["isError"]:
+ error_text = render_result["text"]
+ elif video_file is None:
+ error_text = "Render succeeded but no output file was found."
303
+ logger.warning(f"Render attempt {render_attempt + 1} failed: {error_text[:200]}...")
304
+
305
+ # Check if this is a fixable Manim runtime error, or the TOO SHORT
+ # duration feedback set above (as opposed to an unrecoverable "no scene" error)
+ if render_attempt < max_render_retries - 1 and (
+ "TypeError" in error_text
+ or "AttributeError" in error_text
+ or "ValueError" in error_text
+ or "KeyError" in error_text
+ or "TOO SHORT" in error_text
+ ):
312
+ logger.info(f"Detected runtime error in Manim code. Regenerating code (attempt {render_attempt + 2}/{max_render_retries})...")
313
+
314
+ # Regenerate code with error feedback
315
+ runtime_error_msg = f"Runtime Error during Manim rendering:\n{error_text}\n\nPlease fix the code to be compatible with Manim version 0.19.0."
316
+ manim_code = await self._generate_and_validate_code(
317
+ topic=topic,
318
+ concept_plan=concept_plan,
319
+ duration_seconds=target_duration_seconds,
320
+ max_retries=3, # Allow retries for syntax errors during fix
321
+ previous_error=runtime_error_msg,
322
+ previous_code=manim_code,
323
+ )
324
+
325
+ # Write the new code
326
+ write_result = await self.call_tool(
327
+ self.renderer_session,
328
+ "write_manim_file",
329
+ {"filepath": str(manim_file), "code": manim_code},
330
+ )
331
+
332
+ if write_result["isError"]:
333
+ raise Exception(f"File writing failed: {write_result['text']}")
334
+
335
+ # Extract scene name from new code
336
+ scene_name = self._extract_scene_name(manim_code)
337
+ logger.info(f"Regenerated code with scene: {scene_name}")
338
+
339
+ # Loop will retry rendering with new code
340
+ continue
341
+ else:
342
+ # Not a runtime error or out of retries
343
+ raise Exception(f"Rendering failed: {error_text}")
344
+
345
+ if not video_file:
346
+ raise Exception("Could not find rendered video file after all attempts")
347
+
348
+ logger.info(f"Animation rendered: {video_file}")
349
+
350
+ # Step 6: Generate Speech Audio
351
+ logger.info("Step 6: Generating speech audio...")
352
+ if progress_callback:
353
+ progress_callback("Generating audio narration", 0.75)
354
+ audio_file = self.work_dir / "narration.mp3"
355
+
356
+ # Use TTS generator with automatic fallback
357
+ try:
358
+ tts_result = await self.tts_generator.generate_speech(
359
+ text=narration_text, output_path=audio_file, voice="rachel"
360
+ )
361
+ logger.info(
362
+ f"Audio generated with {tts_result['provider']}: {audio_file}"
363
+ )
364
+
365
+ # Validate audio file
366
+ validation = self.tts_generator.validate_audio_file(audio_file)
367
+ if not validation["valid"]:
368
+ logger.warning(
369
+ f"Audio validation warning: {validation.get('error', 'Unknown issue')}"
370
+ )
371
+ logger.info("Audio file may have issues but continuing...")
372
+ else:
373
+ logger.info(
374
+ f"Audio validated: {validation.get('duration', 'N/A')}s, {validation.get('size', 0)} bytes"
375
+ )
376
+
377
+ except Exception as e:
378
+ logger.error(f"TTS generation failed: {e}")
379
+ raise Exception(f"Speech generation failed: {str(e)}")
380
+
381
+ # Step 7: Merge Video and Audio
382
+ logger.info("Step 7: Merging video and audio...")
383
+ if progress_callback:
384
+ progress_callback("Merging video and audio", 0.90)
385
+ final_output = self.output_dir / output_filename
386
+ merge_result = await self.call_tool(
387
+ self.renderer_session,
388
+ "merge_video_audio",
389
+ {
390
+ "video_file": str(video_file),
391
+ "audio_file": str(audio_file),
392
+ "output_file": str(final_output),
393
+ },
394
+ )
395
+
396
+ if merge_result["isError"]:
397
+ raise Exception(f"Merging failed: {merge_result['text']}")
398
+
399
+ # Step 8: Generate Quiz
400
+ logger.info("Step 8: Generating quiz...")
401
+ if progress_callback:
402
+ progress_callback("Creating quiz questions", 0.95)
403
+ quiz_result = await self.call_tool(
404
+ self.creative_session,
405
+ "generate_quiz",
406
+ {
407
+ "concept": topic,
408
+ "difficulty": "medium",
409
+ "num_questions": 3,
410
+ "question_types": ["multiple_choice"],
411
+ },
412
+ )
413
+
414
+ quiz_content = (
415
+ quiz_result["text"]
416
+ if not quiz_result["isError"]
417
+ else "Quiz generation failed"
418
+ )
419
+
420
+ # Return results
421
+ results = {
422
+ "success": True,
423
+ "topic": topic,
424
+ "target_audience": target_audience,
425
+ "concept_plan": concept_plan,
426
+ "narration": narration_text,
427
+ "manim_code": manim_code,
428
+ "output_file": str(final_output),
429
+ "quiz": quiz_content,
430
+ "work_dir": str(self.work_dir),
431
+ }
432
+
433
+ logger.info(f"Animation generation completed successfully: {final_output}")
434
+ return results
435
+
436
+ except Exception as e:
437
+ logger.error(f"Animation generation failed: {str(e)}")
438
+ return {
439
+ "success": False,
440
+ "error": str(e),
441
+ "work_dir": str(self.work_dir) if self.work_dir else None,
442
+ }
443
+
444
+ def _clean_narration_text(self, text: str) -> str:
445
+ """
446
+ Clean narration text by removing title prefixes and formatting artifacts.
447
+
448
+ The creative server returns text with prefixes like "Narration Script:\n\n"
449
+ which should not be sent to TTS.
450
+ """
451
+ # Remove common prefixes
452
+ prefixes_to_remove = [
453
+ "Narration Script:",
454
+ "Script:",
455
+ "Narration:",
456
+ "Text:",
457
+ ]
458
+
459
+ cleaned = text.strip()
460
+
461
+ # Remove any of the prefixes (case-insensitive)
462
+ for prefix in prefixes_to_remove:
463
+ if cleaned.lower().startswith(prefix.lower()):
464
+ cleaned = cleaned[len(prefix) :].strip()
465
+ break
466
+
467
+ # Remove leading newlines and whitespace
468
+ cleaned = cleaned.lstrip("\n").strip()
469
+
470
+ # Remove any markdown code block markers
471
+ if cleaned.startswith("```"):
472
+ lines = cleaned.split("\n")
473
+ # Remove first line (opening ```)
474
+ if len(lines) > 1:
475
+ lines = lines[1:]
476
+ # Remove last line if it's closing ```
477
+ if lines and lines[-1].strip() == "```":
478
+ lines = lines[:-1]
479
+ cleaned = "\n".join(lines).strip()
480
+
481
+ return cleaned
482
+
483
+ def _extract_python_code(self, text: str) -> str:
484
+ """Extract Python code from markdown response."""
485
+ # Look for code blocks
486
+ if "```python" in text:
487
+ start = text.find("```python") + 9
488
+ end = text.find("```", start)
489
+ if end == -1:
490
+ end = len(text)
491
+ return text[start:end].strip()
492
+ elif "```" in text:
493
+ start = text.find("```") + 3
494
+ end = text.find("```", start)
495
+ if end == -1:
496
+ end = len(text)
497
+ return text[start:end].strip()
498
+ else:
499
+ return text.strip()
500
+
501
+ async def _generate_and_validate_code(
502
+ self,
503
+ topic: str,
504
+ concept_plan: str,
505
+ duration_seconds: int = 60,
506
+ max_retries: int = 3,
507
+ previous_error: Optional[str] = None,
508
+ previous_code: Optional[str] = None,
509
+ ) -> str:
510
+ """Generate Manim code with retry logic for syntax errors."""
511
+ for attempt in range(max_retries):
512
+ try:
513
+ logger.info(f"Code generation attempt {attempt + 1}/{max_retries}")
514
+
515
+ # Build arguments for code generation
516
+ arguments = {
517
+ "concept": topic,
518
+ "scene_description": concept_plan,
519
+ "visual_elements": ["text", "shapes", "animations"],
520
+ "duration_seconds": duration_seconds,
521
+ }
522
+
523
+ # If this is a retry, include error feedback
524
+ if previous_error:
525
+ if previous_code:
526
+ arguments["previous_code"] = previous_code
527
+ arguments["error_message"] = previous_error
528
+ logger.info(
529
+ f"Retrying with error feedback: {previous_error[:100]}..."
530
+ )
531
+
532
+ # Generate code
533
+ code_result = await self.call_tool(
534
+ self.creative_session, "generate_manim_code", arguments
535
+ )
536
+
537
+ if code_result["isError"]:
538
+ if attempt < max_retries - 1:
539
+ logger.warning(
540
+ f"Code generation failed, retrying: {code_result['text']}"
541
+ )
542
+ previous_error = code_result["text"]
543
+ # Keep previous_code if we had it, for better context in retry
544
+ continue
545
+ else:
546
+ raise Exception(
547
+ f"Code generation failed: {code_result['text']}"
548
+ )
549
+
550
+ # Extract Python code from response
551
+ manim_code = self._extract_python_code(code_result["text"])
552
+
553
+ # Validate Python syntax
554
+ syntax_errors = self._validate_python_syntax(manim_code)
555
+ if syntax_errors:
556
+ if attempt < max_retries - 1:
557
+ logger.warning(
558
+ f"Syntax error detected, retrying: {syntax_errors}"
559
+ )
560
+ previous_error = f"Syntax Error:\n{syntax_errors}"
561
+ previous_code = manim_code
562
+ continue
563
+ else:
564
+ raise Exception(
565
+ f"Generated code has syntax errors after {max_retries} attempts:\n{syntax_errors}"
566
+ )
567
+
568
+ # Validate that code contains a Scene class
569
+ has_scene = self._validate_has_scene_class(manim_code)
570
+ if not has_scene:
571
+ if attempt < max_retries - 1:
572
+ logger.warning(
573
+ "No Scene class found in generated code, retrying..."
574
+ )
575
+ previous_error = (
576
+ "Error: The generated code does not contain any Scene class. "
577
+ "Please ensure you create a class that inherits from manim.Scene, "
578
+ "manim.MovingCameraScene, or manim.ThreeDScene."
579
+ )
580
+ previous_code = manim_code
581
+ continue
582
+ else:
583
+ raise Exception(
584
+ f"Generated code does not contain a Scene class after {max_retries} attempts"
585
+ )
586
+
587
+ # Success!
588
+ logger.info(f"Valid code generated on attempt {attempt + 1}")
589
+ return manim_code
590
+
591
+ except Exception as e:
592
+ if attempt < max_retries - 1:
593
+ logger.warning(f"Attempt {attempt + 1} failed: {str(e)}")
594
+ previous_error = str(e)
595
+ continue
596
+ else:
597
+ raise
598
+
599
+ raise Exception("Failed to generate valid code after all retries")
600
+
601
+ def _validate_python_syntax(self, code: str) -> Optional[str]:
602
+ """Validate Python code syntax. Returns error message if invalid, None if valid."""
603
+ try:
604
+ ast.parse(code)
605
+ return None
606
+ except SyntaxError as e:
607
+ # Build detailed error message with context
608
+ error_msg = f"Line {e.lineno}: {e.msg}"
609
+
610
+ # Show surrounding context (3 lines before, 2 lines after)
611
+ if e.lineno is not None:
612
+ code_lines = code.split("\n")
613
+ start_line = max(0, e.lineno - 4) # 3 lines before
614
+ end_line = min(len(code_lines), e.lineno + 2) # 2 lines after
615
+
616
+ error_msg += "\n\nContext:"
617
+ for i in range(start_line, end_line):
618
+ line_num = i + 1
619
+ prefix = ">>> " if line_num == e.lineno else " "
620
+ error_msg += f"\n{prefix}{line_num:3d} | {code_lines[i]}"
621
+
622
+ # Add pointer for error line
623
+ if line_num == e.lineno and e.offset:
624
+ error_msg += f"\n {' ' * 4}{' ' * (e.offset - 1)}^"
625
+
626
+ return error_msg
627
+ except Exception as e:
628
+ return f"Unexpected error during syntax validation: {str(e)}"
629
+
630
+ def _validate_has_scene_class(self, code: str) -> bool:
631
+ """Check if code contains at least one Scene class."""
632
+ import re
633
+
634
+ # Check for Scene class inheritance
635
+ scene_patterns = [
636
+ r"class\s+\w+\s*\(\s*Scene\s*\)",
637
+ r"class\s+\w+\s*\(\s*MovingCameraScene\s*\)",
638
+ r"class\s+\w+\s*\(\s*ThreeDScene\s*\)",
639
+ r"class\s+\w+\s*\(\s*\w*Scene\s*\)",
640
+ ]
641
+
642
+ for pattern in scene_patterns:
643
+ if re.search(pattern, code):
644
+ return True
645
+
646
+ # Also check using AST parsing as a backup
647
+ try:
648
+ tree = ast.parse(code)
649
+ for node in ast.walk(tree):
650
+ if isinstance(node, ast.ClassDef):
651
+ # Check if any base class contains "Scene"
652
+ for base in node.bases:
653
+ if isinstance(base, ast.Name) and "Scene" in base.id:
654
+ return True
655
+ except Exception:
656
+ pass
657
+
658
+ return False
659
+
660
+ def _extract_scene_name(self, code: str) -> str:
661
+ """Extract scene class name from Manim code."""
662
+ import re
663
+
664
+ # Try multiple patterns to find Scene class
665
+ patterns = [
666
+ r"class\s+(\w+)\s*\(\s*Scene\s*\)", # class Name(Scene)
667
+ r"class\s+(\w+)\s*\(\s*MovingCameraScene\s*\)", # class Name(MovingCameraScene)
668
+ r"class\s+(\w+)\s*\(\s*ThreeDScene\s*\)", # class Name(ThreeDScene)
669
+ r"class\s+(\w+)\s*\(\s*\w*Scene\s*\)", # class Name(AnyScene)
670
+ ]
671
+
672
+ for pattern in patterns:
673
+ match = re.search(pattern, code)
674
+ if match:
675
+ scene_name = match.group(1)
676
+ logger.info(f"Found scene class: {scene_name}")
677
+ return scene_name
678
+
679
+ # If no scene found, look for any class definition and warn
680
+ any_class = re.search(r"class\s+(\w+)\s*\(", code)
681
+ if any_class:
682
+ class_name = any_class.group(1)
683
+ logger.warning(
684
+ f"Could not find Scene class, using first class found: {class_name}"
685
+ )
686
+ return class_name
687
+
688
+ # Last resort - parse the AST to find classes
689
+ try:
690
+ tree = ast.parse(code)
691
+ for node in ast.walk(tree):
692
+ if isinstance(node, ast.ClassDef):
693
+ logger.warning(
694
+ f"Using first class from AST parsing: {node.name}"
695
+ )
696
+ return node.name
697
+ except Exception as e:
698
+ logger.error(f"Failed to parse code AST: {e}")
699
+
700
+ # Absolute fallback
701
+ logger.error("No scene class found in code! This will likely cause rendering to fail.")
702
+ return "Scene" # fallback
703
+
704
+ def _find_output_file(
705
+ self, directory: Path, scene_name: str, extension: str
706
+ ) -> Optional[Path]:
707
+ """Find output file with given scene name and extension."""
708
+ for file in directory.glob(f"{scene_name}*.{extension}"):
709
+ return file
710
+ return None
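
One gap worth flagging in the orchestrator above: the duration check in Step 5 calls `self._get_video_duration`, which is not among the helper methods in this hunk. If it is not defined elsewhere in the class, a minimal sketch using `ffprobe` (assumed to be on PATH; any failure is already caught by the `try`/`except` around the duration check) could look like this:

```python
import json
import subprocess
from pathlib import Path

def _get_video_duration(self, video_file: Path) -> float:
    """Return the duration of a video file in seconds via ffprobe."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "quiet", "-print_format", "json",
            "-show_format", str(video_file),
        ],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError, caught by the caller
    )
    # ffprobe reports the duration as a string under the "format" key
    return float(json.loads(result.stdout)["format"]["duration"])
```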
711
+
712
+
713
+ async def main():
714
+ """Main function for running the orchestrator."""
715
+ import argparse
716
+
717
+ parser = argparse.ArgumentParser(description="NeuroAnim STEM Animation Generator")
718
+ parser.add_argument("topic", help="STEM topic for the animation")
719
+ parser.add_argument(
720
+ "--audience",
721
+ choices=["elementary", "middle_school", "high_school", "college", "general"],
722
+ default="general",
723
+ help="Target audience",
724
+ )
725
+ parser.add_argument(
726
+ "--duration", type=float, default=2.0, help="Animation duration in minutes"
727
+ )
728
+ parser.add_argument("--output", default="animation.mp4", help="Output filename")
729
+ parser.add_argument(
730
+ "--api-key", help="Hugging Face API key (or set HUGGINGFACE_API_KEY env var)"
731
+ )
732
+ parser.add_argument(
733
+ "--elevenlabs-key",
734
+ help="ElevenLabs API key (or set ELEVENLABS_API_KEY env var)",
735
+ )
736
+
737
+ args = parser.parse_args()
738
+
739
+ # Initialize and run orchestrator
740
+ orchestrator = NeuroAnimOrchestrator(
741
+ hf_api_key=args.api_key, elevenlabs_api_key=args.elevenlabs_key
742
+ )
743
+
744
+ try:
745
+ await orchestrator.initialize()
746
+
747
+ results = await orchestrator.generate_animation(
748
+ topic=args.topic,
749
+ target_audience=args.audience,
750
+ animation_length_minutes=args.duration,
751
+ output_filename=args.output,
752
+ )
753
+
754
+ if results["success"]:
755
+ print("\n🎉 Animation Generated Successfully!")
756
+ print(f"📹 Output file: {results['output_file']}")
757
+ print(f"🎯 Topic: {results['topic']}")
758
+ print(f"👥 Audience: {results['target_audience']}")
759
+ print(f"\n📝 Concept Plan:")
760
+ print(
761
+ results["concept_plan"][:500] + "..."
762
+ if len(results["concept_plan"]) > 500
763
+ else results["concept_plan"]
764
+ )
765
+ print(f"\n🎭 Narration:")
766
+ print(
767
+ results["narration"][:300] + "..."
768
+ if len(results["narration"]) > 300
769
+ else results["narration"]
770
+ )
771
+ print(f"\n📚 Quiz Questions:")
772
+ print(results["quiz"])
773
+ else:
774
+ print(f"\n❌ Animation Generation Failed: {results['error']}")
775
+
776
+ except KeyboardInterrupt:
777
+ print("\n⚠️ Process interrupted by user")
778
+ except Exception as e:
779
+ print(f"\n💥 Unexpected error: {str(e)}")
780
+ finally:
781
+ await orchestrator.cleanup()
782
+
783
+
784
+ if __name__ == "__main__":
785
+ asyncio.run(main())
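
For reference, a minimal programmatic usage sketch of the orchestrator, mirroring `main()` above. The module name `orchestrator` is an assumption; `NeuroAnimOrchestrator`, `initialize`, and `cleanup` are defined earlier in this file, and per the CLI help the API keys fall back to the `HUGGINGFACE_API_KEY` / `ELEVENLABS_API_KEY` environment variables:

```python
import asyncio

from orchestrator import NeuroAnimOrchestrator  # hypothetical module name

async def demo():
    # With None, keys are read from the environment variables noted above
    orch = NeuroAnimOrchestrator(hf_api_key=None, elevenlabs_api_key=None)
    try:
        await orch.initialize()
        results = await orch.generate_animation(
            topic="Pythagorean Theorem",
            target_audience="high_school",
            animation_length_minutes=2.0,
            output_filename="pythagoras.mp4",
        )
        print(results["output_file"] if results["success"] else results["error"])
    finally:
        await orch.cleanup()

asyncio.run(demo())
```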
pyproject.toml ADDED
@@ -0,0 +1,35 @@
1
+ [project]
2
+ name = "neuroanim"
3
+ version = "0.1.0"
4
+ description = "Modular STEM animation generator using MCP and Hugging Face"
5
+ requires-python = ">=3.12"
6
+ dependencies = [
7
+ "mcp>=1.0.0",
8
+ "langgraph>=0.0.26",
9
+ "langchain-core>=0.1.0",
10
+ "huggingface_hub>=0.25.0",
11
+ "manim>=0.18.1",
12
+ "pydantic>=2.0.0",
13
+ "aiohttp>=3.8.0",
14
+ "httpx>=0.24.0",
15
+ "numpy>=1.24.0",
16
+ "Pillow>=10.0.0",
17
+ "gtts>=2.3.0",
18
+ "pydub>=0.25.0",
19
+ "python-dotenv>=1.0.0",
20
+ "elevenlabs>=0.2.0",
21
+ "blaxel>=0.1.0",
22
+ "gradio>=6.0.0",  # match requirements.txt and the Space's sdk_version
23
+ "textstat>=0.7.0",
24
+ ]
25
+
26
+ [build-system]
27
+ requires = ["hatchling"]
28
+ build-backend = "hatchling.build"
29
+
30
+ [tool.black]
31
+ line-length = 88
32
+ target-version = ['py312']
33
+
34
+ [tool.isort]
35
+ profile = "black"
requirements.txt ADDED
@@ -0,0 +1,33 @@
1
+ # Core dependencies for Hugging Face Spaces
2
+ gradio>=6.0.0
3
+ python-dotenv>=1.0.0
4
+
5
+ # AI and LLM
6
+ mcp>=1.0.0
7
+ langgraph>=0.0.26
8
+ langchain-core>=0.1.0
9
+ huggingface-hub>=0.25.0
10
+
11
+ # Animation and rendering
12
+ manim>=0.18.1
13
+ Pillow>=10.0.0
14
+ numpy>=1.24.0
15
+
16
+ # Audio processing
17
+ gtts>=2.3.0
18
+ pydub>=0.25.0
19
+ elevenlabs>=0.2.0
20
+
21
+ # Cloud rendering
22
+ blaxel>=0.1.0
23
+
24
+ # Utilities
25
+ pydantic>=2.0.0
26
+ aiohttp>=3.8.0
27
+ httpx>=0.24.0
28
+ textstat>=0.7.0
29
+ requests>=2.32.0
30
+
31
+ # Additional dependencies
32
+ beautifulsoup4>=4.14.0
33
+ tqdm>=4.67.0
utils/__init__.py ADDED
@@ -0,0 +1,9 @@
1
+ """
2
+ Utilities for NeuroAnim.
3
+
4
+ This package contains utility modules for the NeuroAnim project.
5
+ """
6
+
7
+ from .hf_wrapper import HFInferenceWrapper, ModelConfig, get_hf_wrapper
8
+
9
+ __all__ = ["HFInferenceWrapper", "ModelConfig", "get_hf_wrapper"]
utils/hf_wrapper.py ADDED
@@ -0,0 +1,369 @@
1
+ """
2
+ Hugging Face Inference API Wrapper
3
+
4
+ This module provides a robust wrapper around the Hugging Face Inference API
5
+ with rate limiting, error handling, and support for various model types.
6
+ """
7
+
8
+ import asyncio
9
+ import base64
10
+ import io
11
+ import logging
12
+ import time
13
+ from typing import Any, BinaryIO, Dict, List, Optional, Union
14
+
15
+ import aiohttp
16
+ from huggingface_hub import AsyncInferenceClient, InferenceClient
17
+ from pydantic import BaseModel, Field
18
+
19
+ logger = logging.getLogger(__name__)
20
+
21
+
22
+ class RateLimiter:
23
+ """Simple rate limiter for API calls."""
24
+
25
+ def __init__(self, max_calls: int = 60, time_window: int = 60):
26
+ self.max_calls = max_calls
27
+ self.time_window = time_window
28
+ self.calls = []
29
+
30
+ async def acquire(self):
31
+ """Wait if rate limit would be exceeded."""
32
+ now = time.time()
33
+ # Remove calls outside the time window
34
+ self.calls = [
35
+ call_time for call_time in self.calls if now - call_time < self.time_window
36
+ ]
37
+
38
+ if len(self.calls) >= self.max_calls:
39
+ # Calculate wait time
40
+ oldest_call = min(self.calls)
41
+ wait_time = self.time_window - (now - oldest_call)
42
+ if wait_time > 0:
43
+ logger.info(f"Rate limit reached, waiting {wait_time:.2f} seconds")
44
+ await asyncio.sleep(wait_time)
45
+
46
+ self.calls.append(now)
47
+
48
+
49
+ class HFInferenceWrapper:
50
+ """
51
+ Wrapper for Hugging Face Inference API with rate limiting and error handling.
52
+ """
53
+
54
+ def __init__(self, api_key: Optional[str] = None, max_calls_per_minute: int = 60):
55
+ self.client = AsyncInferenceClient(token=api_key)
56
+ self.rate_limiter = RateLimiter(max_calls=max_calls_per_minute, time_window=60)
57
+
58
+ async def text_generation(
59
+ self,
60
+ model: str,
61
+ prompt: str,
62
+ max_new_tokens: int = 512,
63
+ temperature: float = 0.7,
64
+ **kwargs,
65
+ ) -> str:
66
+ """Generate text using a language model.
67
+
68
+ Notes:
69
+ - Uses AsyncInferenceClient by default.
70
+ - Works around a known issue where `AsyncInferenceClient.text_generation`
71
+ may raise `StopIteration` ("coroutine raised StopIteration") by
72
+ falling back to the synchronous `InferenceClient` inside a thread.
73
+ - Automatically detects if a model supports conversational tasks and
74
+ uses chat_completion instead of text_generation.
75
+ - Always normalizes the result to a plain string, extracting
76
+ `generated_text` when the client returns a `TextGenerationOutput`
77
+ object.
78
+ """
79
+ await self.rate_limiter.acquire()
80
+
81
+ try:
82
+ # Check if this is a conversational model that doesn't support text_generation
83
+ if self._is_conversational_model(model):
84
+ logger.info(f"Using chat_completion for conversational model: {model}")
85
+ return await self._chat_completion_fallback(
86
+ model, prompt, max_new_tokens, temperature, **kwargs
87
+ )
88
+
89
+ # Primary path: async client with text_generation
90
+ response = await self.client.text_generation(
91
+ prompt=prompt,
92
+ model=model,
93
+ max_new_tokens=max_new_tokens,
94
+ temperature=temperature,
95
+ **kwargs,
96
+ )
97
+ except Exception as e:
98
+ # Check if this is a model capability issue
99
+ if "not supported for task text-generation" in str(e):
100
+ logger.info(f"Falling back to chat_completion for model: {model}")
101
+ return await self._chat_completion_fallback(
102
+ model, prompt, max_new_tokens, temperature, **kwargs
103
+ )
104
+
105
+ # Newer versions of `huggingface_hub` sometimes surface a
106
+ # `RuntimeError` with message "coroutine raised StopIteration" from
107
+ # the async client. Detect that pattern (or a raw StopIteration)
108
+ # and fall back to the sync client in a background thread.
109
+ is_stop_iteration_like = isinstance(
110
+ e, StopIteration
111
+ ) or "StopIteration" in str(e)
112
+
113
+ if is_stop_iteration_like: # pragma: no cover - defensive against HF bug
114
+ logger.warning(
115
+ "Async text_generation raised/contained StopIteration for "
116
+ "model %s; falling back to sync InferenceClient: %s",
117
+ model,
118
+ e,
119
+ )
120
+
121
+ def _call_sync() -> str:
122
+ """Synchronous text-generation call for asyncio.to_thread."""
123
+ sync_client = InferenceClient(token=self.client.token)
124
+ # Check if this is a conversational model
125
+ if self._is_conversational_model(model):
126
+ messages = [{"role": "user", "content": prompt}]
127
+ chat_response = sync_client.chat.completions.create(
128
+ model=model,
129
+ messages=messages,
130
+ max_tokens=max_new_tokens,
131
+ temperature=temperature,
132
+ **kwargs,
133
+ )
134
+ return chat_response.choices[0].message.content
135
+ else:
136
+ return sync_client.text_generation(
137
+ prompt=prompt,
138
+ model=model,
139
+ max_new_tokens=max_new_tokens,
140
+ temperature=temperature,
141
+ **kwargs,
142
+ )
143
+
144
+ response = await asyncio.to_thread(_call_sync)
145
+ else:
146
+ logger.error(f"Text generation failed with model {model}: {e}")
147
+ raise
148
+
149
+ # Normalize various possible return types to a plain string
150
+ try:
151
+ from huggingface_hub.inference._generated.types.text_generation import (
152
+ TextGenerationOutput,
153
+ )
154
+ except Exception: # pragma: no cover - type import fallback
155
+ TextGenerationOutput = None # type: ignore
156
+
157
+ if TextGenerationOutput is not None and isinstance(
158
+ response, TextGenerationOutput
159
+ ):
160
+ return response.generated_text
161
+
162
+ if isinstance(response, str):
163
+ return response
164
+
165
+ # Fallback: best-effort stringification
166
+ return str(response)
167
+
168
+ def _is_conversational_model(self, model: str) -> bool:
169
+ """Check if a model is primarily conversational (doesn't support text_generation)."""
170
+ conversational_models = [
171
+ "zai-org/GLM-4.6",
172
+ # Add other known conversational-only models here
173
+ ]
174
+ return model in conversational_models
175
+
176
+ async def _chat_completion_fallback(
177
+ self,
178
+ model: str,
179
+ prompt: str,
180
+ max_new_tokens: int = 512,
181
+ temperature: float = 0.7,
182
+ **kwargs,
183
+ ) -> str:
184
+ """Fallback method using chat.completions for conversational models."""
185
+ messages = [{"role": "user", "content": prompt}]
186
+
187
+ try:
188
+ # Try async first
189
+ response = await self.client.chat.completions.create(
190
+ model=model,
191
+ messages=messages,
192
+ max_tokens=max_new_tokens,
193
+ temperature=temperature,
194
+ **kwargs,
195
+ )
196
+ return response.choices[0].message.content
197
+ except Exception as e:
198
+ logger.warning(f"Async chat_completion failed, falling back to sync: {e}")
199
+
200
+ # Fall back to sync if async fails
201
+ def _sync_chat_completion():
202
+ sync_client = InferenceClient(token=self.client.token)
203
+ response = sync_client.chat.completions.create(
204
+ model=model,
205
+ messages=messages,
206
+ max_tokens=max_new_tokens,
207
+ temperature=temperature,
208
+ **kwargs,
209
+ )
210
+ return response.choices[0].message.content
211
+
212
+ return await asyncio.to_thread(_sync_chat_completion)
213
+
214
+ async def conversation(
215
+ self,
216
+ model: str,
217
+ messages: List[Dict[str, str]],
218
+ max_tokens: int = 512,
219
+ temperature: float = 0.7,
220
+ **kwargs,
221
+ ) -> str:
222
+ """Generate response in a conversation format."""
223
+ await self.rate_limiter.acquire()
224
+
225
+ try:
226
+ response = await self.client.chat.completions.create(
227
+ model=model,
228
+ messages=messages,
229
+ max_tokens=max_tokens,
230
+ temperature=temperature,
231
+ **kwargs,
232
+ )
233
+ return response.choices[0].message.content
234
+ except Exception as e:
235
+ logger.error(f"Conversation failed with model {model}: {e}")
236
+ raise
237
+
238
+ async def image_generation(
239
+ self,
240
+ model: str,
241
+ prompt: str,
242
+ negative_prompt: Optional[str] = None,
243
+ width: int = 1024,
244
+ height: int = 1024,
245
+ **kwargs,
246
+ ) -> bytes:
247
+ """Generate an image and return as bytes."""
248
+ await self.rate_limiter.acquire()
249
+
250
+ try:
251
+ # text_to_image returns a PIL.Image.Image, so serialize it to PNG bytes
+ image = await self.client.text_to_image(
+ model=model,
+ prompt=prompt,
+ negative_prompt=negative_prompt,
+ width=width,
+ height=height,
+ **kwargs,
+ )
+ buffer = io.BytesIO()
+ image.save(buffer, format="PNG")
+ return buffer.getvalue()
260
+ except Exception as e:
261
+ logger.error(f"Image generation failed with model {model}: {e}")
262
+ raise
263
+
264
+ async def text_to_speech(
265
+ self, model: str, text: str, voice: Optional[str] = None, **kwargs
266
+ ) -> bytes:
267
+ """Convert text to speech and return audio bytes.
268
+
269
+ Note: The voice parameter is kept for backwards compatibility but is not used
270
+ as the HuggingFace API doesn't support it.
271
+ """
272
+ await self.rate_limiter.acquire()
273
+
274
+ try:
275
+ # HuggingFace text_to_speech API: text as first arg, model as kwarg
276
+ audio_bytes = await self.client.text_to_speech(text, model=model)
277
+ return audio_bytes
278
+ except Exception as e:
279
+ logger.error(f"TTS failed with model {model}: {e}")
280
+ raise
281
+
282
+ async def vision_analysis(
283
+ self, model: str, image: Union[bytes, BinaryIO], text: str, **kwargs
284
+ ) -> str:
285
+ """Analyze an image with a vision model."""
286
+ await self.rate_limiter.acquire()
287
+
288
+ try:
289
+ response = await self.client.image_to_text(
290
+ model=model, image=image, text=text, **kwargs
291
+ )
292
+ return response
293
+ except Exception as e:
294
+ logger.error(f"Vision analysis failed with model {model}: {e}")
295
+ raise
296
+
297
+ async def save_audio_to_file(self, audio_bytes: bytes, output_path: str) -> bool:
298
+ """Save audio bytes to a file."""
299
+ try:
300
+ with open(output_path, "wb") as f:
301
+ f.write(audio_bytes)
302
+ logger.info(f"Audio saved to {output_path}")
303
+ return True
304
+ except Exception as e:
305
+ logger.error(f"Failed to save audio to {output_path}: {e}")
306
+ return False
307
+
308
+ def audio_bytes_to_base64(self, audio_bytes: bytes) -> str:
309
+ """Convert audio bytes to base64 string for transmission."""
310
+ return base64.b64encode(audio_bytes).decode("utf-8")
311
+
312
+ def base64_to_audio_bytes(self, base64_str: str) -> bytes:
313
+ """Convert base64 string back to audio bytes."""
314
+ return base64.b64decode(base64_str.encode("utf-8"))
315
+
316
+
317
+ class ModelConfig(BaseModel):
318
+ """Configuration for different model types."""
319
+
320
+ text_models: List[str] = Field(
321
+ default_factory=lambda: [
322
+ # Primary general/text models
323
+ "zai-org/GLM-4.6",
324
+ "mistralai/Mistral-Nemo-Instruct-2407",
325
+ "Qwen/Qwen2.5-7B-Instruct",
326
+ "meta-llama/Llama-3.1-8B-Instruct",
327
+ ]
328
+ )
329
+
330
+ code_models: List[str] = Field(
331
+ default_factory=lambda: [
332
+ # Primary code-capable models
333
+ "zai-org/GLM-4.6",
334
+ "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
335
+ "meta-llama/CodeLlama-70b-Instruct-hf",
336
+ # Kept last because it has caused auth issues in practice
337
+ "ZhipuAI/glm-4-9b-chat",
338
+ ]
339
+ )
340
+
341
+ vision_models: List[str] = Field(
342
+ default_factory=lambda: [
343
+ "llava-hf/llava-v1.6-mistral-7b-hf",
344
+ "Salesforce/blip2-flan-t5-xxl",
345
+ "google/paligemma-3b-mix-448",
346
+ ]
347
+ )
348
+
349
+ tts_models: List[str] = Field(
350
+ default_factory=lambda: [
351
+ "ResembleAI/chatterbox",
352
+ "suno/bark",
353
+ "facebook/mms-tts-all",
354
+ ]
355
+ )
356
+
357
+ image_models: List[str] = Field(
358
+ default_factory=lambda: [
359
+ "stabilityai/stable-diffusion-3-medium",
360
+ "black-forest-labs/FLUX.1-dev",
361
+ "prompthero/openjourney",
362
+ ]
363
+ )
364
+
365
+
366
+ # Global instance factory
367
+ def get_hf_wrapper(api_key: Optional[str] = None) -> HFInferenceWrapper:
368
+ """Get a configured HFInferenceWrapper instance."""
369
+ return HFInferenceWrapper(api_key=api_key)
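
A short usage sketch for the wrapper above (a minimal example, assuming a valid `HUGGINGFACE_API_KEY` is set; the model is just the first entry of `ModelConfig` and could be any Inference API text model):

```python
import asyncio
import os

from utils.hf_wrapper import ModelConfig, get_hf_wrapper

async def demo():
    wrapper = get_hf_wrapper(api_key=os.getenv("HUGGINGFACE_API_KEY"))
    model = ModelConfig().text_models[0]
    # The call is rate-limited, conversational-only models are routed
    # through chat_completion, and the response is normalized to a string
    reply = await wrapper.text_generation(
        model=model,
        prompt="Explain the Pythagorean theorem in one sentence.",
        max_new_tokens=64,
    )
    print(reply)

asyncio.run(demo())
```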
utils/tts.py ADDED
@@ -0,0 +1,440 @@
1
+ """
2
+ Text-to-Speech (TTS) Utility Module
3
+
4
+ Supports multiple TTS providers:
5
+ - ElevenLabs (primary, high quality)
6
+ - Hugging Face (fallback)
7
+ - Google TTS (optional fallback)
8
+ """
9
+
10
+ import asyncio
11
+ import logging
12
+ import os
13
+ from enum import Enum
14
+ from pathlib import Path
15
+ from typing import Any, Dict, Optional
16
+
17
+ import httpx
18
+ from dotenv import load_dotenv
19
+
20
+ # Try to import ElevenLabs SDK
21
+ try:
22
+ from elevenlabs.client import ElevenLabs
23
+
24
+ ELEVENLABS_SDK_AVAILABLE = True
25
+ except ImportError:
26
+ ELEVENLABS_SDK_AVAILABLE = False
27
+ ElevenLabs = None
28
+
29
+ load_dotenv()
30
+
31
+ logger = logging.getLogger(__name__)
32
+
33
+
34
+ class TTSProvider(Enum):
35
+ """Available TTS providers."""
36
+
37
+ ELEVENLABS = "elevenlabs"
38
+ HUGGINGFACE = "huggingface"
39
+ GTTS = "gtts"
40
+
41
+
42
+ class TTSConfig:
43
+ """Configuration for TTS generation."""
44
+
45
+ # ElevenLabs voices
46
+ ELEVENLABS_VOICES = {
47
+ "rachel": "21m00Tcm4TlvDq8ikWAM", # Clear, neutral female
48
+ "adam": "pNInz6obpgDQGcFmaJgB", # Deep, confident male
49
+ "antoni": "ErXwobaYiN019PkySvjV", # Well-rounded male
50
+ "arnold": "VR6AewLTigWG4xSOukaG", # Crisp, articulate male
51
+ "bella": "EXAVITQu4vr4xnSDxMaL", # Soft, gentle female
52
+ "domi": "AZnzlk1XvdvUeBnXmlld", # Strong female
53
+ "elli": "MF3mGyEYCl7XYWbV9V6O", # Emotional, expressive female
54
+ "josh": "TxGEqnHWrfWFTfGW9XjX", # Young, energetic male
55
+ "sam": "yoZ06aMxZJJ28mfd3POQ", # Raspy male
56
+ }
57
+
58
+ # Default settings
59
+ ELEVENLABS_MODEL = "eleven_turbo_v2_5"
60
+ ELEVENLABS_STABILITY = 0.5
61
+ ELEVENLABS_SIMILARITY_BOOST = 0.75
62
+ ELEVENLABS_STYLE = 0.0
63
+ ELEVENLABS_USE_SPEAKER_BOOST = True
64
+
65
+ # Hugging Face models
66
+ HF_TTS_MODELS = [
67
+ "facebook/mms-tts-eng",
68
+ "microsoft/speecht5_tts",
69
+ "suno/bark",
70
+ ]
71
+
72
+ # Timeouts
73
+ ELEVENLABS_TIMEOUT = 60.0
74
+ HF_TIMEOUT = 120.0
75
+
76
+
77
+ class TTSGenerator:
78
+ """Main TTS generation class with multi-provider support."""
79
+
80
+ def __init__(
81
+ self,
82
+ elevenlabs_api_key: Optional[str] = None,
83
+ hf_api_key: Optional[str] = None,
84
+ default_voice: str = "rachel",
85
+ fallback_enabled: bool = True,
86
+ ):
87
+ """
88
+ Initialize TTS generator.
89
+
90
+ Args:
91
+ elevenlabs_api_key: ElevenLabs API key
92
+ hf_api_key: Hugging Face API key
93
+ default_voice: Default voice to use
94
+ fallback_enabled: Whether to fall back to other providers on failure
95
+ """
96
+ self.elevenlabs_api_key = elevenlabs_api_key or os.getenv("ELEVENLABS_API_KEY")
97
+ self.hf_api_key = hf_api_key or os.getenv("HUGGINGFACE_API_KEY")
98
+ self.default_voice = default_voice
99
+ self.fallback_enabled = fallback_enabled
100
+
101
+ async def generate_speech(
102
+ self,
103
+ text: str,
104
+ output_path: Path,
105
+ voice: Optional[str] = None,
106
+ provider: Optional[TTSProvider] = None,
107
+ **kwargs,
108
+ ) -> Dict[str, Any]:
109
+ """
110
+ Generate speech from text and save to file.
111
+
112
+ Args:
113
+ text: Text to convert to speech
114
+ output_path: Path to save audio file
115
+ voice: Voice ID or name
116
+ provider: Specific provider to use (if None, auto-select)
117
+ **kwargs: Provider-specific options
118
+
119
+ Returns:
120
+ Dict with generation info (provider, duration, etc.)
121
+ """
122
+ voice = voice or self.default_voice
123
+
124
+ # Auto-select provider if not specified
125
+ if provider is None:
126
+ if self.elevenlabs_api_key:
127
+ provider = TTSProvider.ELEVENLABS
128
+ elif self.hf_api_key:
129
+ provider = TTSProvider.HUGGINGFACE
130
+ else:
131
+ provider = TTSProvider.GTTS
132
+
133
+ # Try primary provider
134
+ try:
135
+ logger.info(f"Generating speech with {provider.value}...")
136
+
137
+ if provider == TTSProvider.ELEVENLABS:
138
+ result = await self._generate_elevenlabs(
139
+ text, output_path, voice, **kwargs
140
+ )
141
+ elif provider == TTSProvider.HUGGINGFACE:
142
+ result = await self._generate_huggingface(text, output_path, **kwargs)
143
+ else:
144
+ result = await self._generate_gtts(text, output_path, **kwargs)
145
+
146
+ logger.info(f"Successfully generated speech with {provider.value}")
147
+ return result
148
+
149
+ except Exception as e:
150
+ logger.error(f"{provider.value} TTS failed: {e}")
151
+
152
+ # Try fallback if enabled
153
+ if self.fallback_enabled:
154
+ return await self._fallback_generation(
155
+ text, output_path, provider, voice, **kwargs
156
+ )
157
+ else:
158
+ raise
159
+
160
+ async def _fallback_generation(
161
+ self,
162
+ text: str,
163
+ output_path: Path,
164
+ failed_provider: TTSProvider,
165
+ voice: str,
166
+ **kwargs,
167
+ ) -> Dict[str, Any]:
168
+ """Try alternative providers as fallback."""
169
+ logger.warning(f"Attempting fallback from {failed_provider.value}...")
170
+
171
+ # Define fallback order
172
+ if failed_provider == TTSProvider.ELEVENLABS:
173
+ fallback_order = [TTSProvider.HUGGINGFACE, TTSProvider.GTTS]
174
+ elif failed_provider == TTSProvider.HUGGINGFACE:
175
+ fallback_order = [TTSProvider.GTTS]
176
+ else:
177
+ raise Exception("All TTS providers failed")
178
+
179
+ for provider in fallback_order:
180
+ try:
181
+ logger.info(f"Trying fallback provider: {provider.value}")
182
+
183
+ if provider == TTSProvider.HUGGINGFACE and self.hf_api_key:
184
+ return await self._generate_huggingface(text, output_path, **kwargs)
185
+ elif provider == TTSProvider.GTTS:
186
+ return await self._generate_gtts(text, output_path, **kwargs)
187
+
188
+ except Exception as e:
189
+ logger.error(f"Fallback {provider.value} failed: {e}")
190
+ continue
191
+
192
+ raise Exception("All TTS providers failed")
193
+
194
+ async def _generate_elevenlabs(
195
+ self, text: str, output_path: Path, voice: str, **kwargs
196
+ ) -> Dict[str, Any]:
197
+ """Generate speech using ElevenLabs API."""
198
+ if not self.elevenlabs_api_key:
199
+ raise ValueError("ElevenLabs API key not provided")
200
+
201
+ if not ELEVENLABS_SDK_AVAILABLE:
202
+ raise ImportError(
203
+ "elevenlabs SDK not installed. Run: pip install elevenlabs"
204
+ )
205
+
206
+ # Get voice ID
207
+ voice_id = TTSConfig.ELEVENLABS_VOICES.get(voice.lower(), voice)
208
+
209
+ # Create client
210
+ client = ElevenLabs(api_key=self.elevenlabs_api_key)
211
+
212
+ # Generate audio using new SDK
213
+ def _generate():
214
+ return client.text_to_speech.convert(
215
+ text=text,
216
+ voice_id=voice_id,
217
+ model_id=kwargs.get("model_id", TTSConfig.ELEVENLABS_MODEL),
218
+ output_format="mp3_44100_128",
219
+ )
220
+
221
+ # Run in thread pool since SDK is synchronous
222
+ loop = asyncio.get_event_loop()
223
+ audio_generator = await loop.run_in_executor(None, _generate)
224
+
225
+ # Save audio
226
+ output_path.parent.mkdir(parents=True, exist_ok=True)
227
+ audio_bytes = b"".join(audio_generator)
228
+
229
+ with open(output_path, "wb") as f:
230
+ f.write(audio_bytes)
231
+
232
+ # Get audio info
233
+ file_size = len(audio_bytes)
234
+
235
+ return {
236
+ "provider": "elevenlabs",
237
+ "voice": voice,
238
+ "voice_id": voice_id,
239
+ "output_path": str(output_path),
240
+ "file_size_bytes": file_size,
241
+ "text_length": len(text),
242
+ }
243
+
244
+ async def _generate_huggingface(
245
+ self, text: str, output_path: Path, **kwargs
246
+ ) -> Dict[str, Any]:
247
+ """Generate speech using Hugging Face API."""
248
+ if not self.hf_api_key:
249
+ raise ValueError("Hugging Face API key not provided")
250
+
251
+ # Import HF wrapper
252
+ from utils.hf_wrapper import HuggingFaceWrapper
253
+
254
+ wrapper = HuggingFaceWrapper(api_key=self.hf_api_key)
255
+ model = kwargs.get("model", TTSConfig.HF_TTS_MODELS[0])
256
+
257
+ # Generate speech
258
+ result = await wrapper.text_to_speech(
259
+ text=text, model=model, output_path=str(output_path)
260
+ )
261
+
262
+ return {
263
+ "provider": "huggingface",
264
+ "model": model,
265
+ "output_path": str(output_path),
266
+ "text_length": len(text),
267
+ }
268
+
269
+ async def _generate_gtts(
270
+ self, text: str, output_path: Path, **kwargs
271
+ ) -> Dict[str, Any]:
272
+ """Generate speech using gTTS (Google Text-to-Speech) as last resort."""
273
+ try:
274
+ from gtts import gTTS
275
+ except ImportError:
276
+ raise ImportError("gTTS not installed. Run: pip install gtts")
277
+
278
+ # Generate speech
279
+ tts = gTTS(
280
+ text=text, lang=kwargs.get("lang", "en"), slow=kwargs.get("slow", False)
281
+ )
282
+
283
+ output_path.parent.mkdir(parents=True, exist_ok=True)
284
+ tts.save(str(output_path))
285
+
286
+ return {
287
+ "provider": "gtts",
288
+ "output_path": str(output_path),
289
+ "text_length": len(text),
290
+ }
291
+
292
+ async def get_available_voices(
293
+ self, provider: TTSProvider = TTSProvider.ELEVENLABS
294
+ ) -> Dict[str, str]:
295
+ """
296
+ Get list of available voices for a provider.
297
+
298
+ Args:
299
+ provider: TTS provider
300
+
301
+ Returns:
302
+ Dict mapping voice names to IDs
303
+ """
304
+ if provider == TTSProvider.ELEVENLABS:
305
+ if not self.elevenlabs_api_key:
306
+ return TTSConfig.ELEVENLABS_VOICES
307
+
308
+ # Fetch from API for custom voices
309
+ try:
310
+ async with httpx.AsyncClient(timeout=10.0) as client:
311
+ response = await client.get(
312
+ "https://api.elevenlabs.io/v1/voices",
313
+ headers={"xi-api-key": self.elevenlabs_api_key},
314
+ )
315
+ response.raise_for_status()
316
+ voices_data = response.json()
317
+
318
+ voices = {}
319
+ for voice in voices_data.get("voices", []):
320
+ voices[voice["name"].lower()] = voice["voice_id"]
321
+
322
+ return voices
323
+ except Exception as e:
324
+ logger.warning(f"Failed to fetch ElevenLabs voices: {e}")
325
+ return TTSConfig.ELEVENLABS_VOICES
326
+
327
+ return {}
328
+
329
+ def validate_audio_file(self, audio_path: Path) -> Dict[str, Any]:
330
+ """
331
+ Validate that audio file was generated correctly.
332
+
333
+ Args:
334
+ audio_path: Path to audio file
335
+
336
+ Returns:
337
+ Dict with validation results
338
+ """
339
+ if not audio_path.exists():
340
+ return {"valid": False, "error": "File does not exist"}
341
+
342
+ file_size = audio_path.stat().st_size
343
+
344
+ if file_size == 0:
345
+ return {"valid": False, "error": "File is empty"}
346
+
347
+ if file_size < 1000: # Less than 1KB is suspicious
348
+ return {
349
+ "valid": False,
350
+ "error": "File suspiciously small",
351
+ "size": file_size,
352
+ }
353
+
354
+ # Try to check if it's valid audio (optional, requires pydub)
355
+ try:
356
+ from pydub import AudioSegment
357
+
358
+ audio = AudioSegment.from_file(str(audio_path))
359
+ duration = len(audio) / 1000.0 # Convert to seconds
360
+
361
+ if duration < 0.1:
362
+ return {
363
+ "valid": False,
364
+ "error": "Audio duration too short",
365
+ "duration": duration,
366
+ }
367
+
368
+ return {
369
+ "valid": True,
370
+ "size": file_size,
371
+ "duration": duration,
372
+ "format": audio_path.suffix,
373
+ }
374
+ except ImportError:
375
+ # pydub not available, just check size
376
+ return {"valid": True, "size": file_size, "format": audio_path.suffix}
377
+ except Exception as e:
378
+ return {"valid": False, "error": f"Audio validation failed: {e}"}
379
+
380
+
381
+ # Convenience functions
382
+ async def generate_speech_elevenlabs(
383
+ text: str,
384
+ output_path: Path,
385
+ api_key: Optional[str] = None,
386
+ voice: str = "rachel",
387
+ **kwargs,
388
+ ) -> Dict[str, Any]:
389
+ """
390
+ Quick function to generate speech with ElevenLabs.
391
+
392
+ Args:
393
+ text: Text to convert
394
+ output_path: Output file path
395
+ api_key: ElevenLabs API key
396
+ voice: Voice name or ID
397
+ **kwargs: Additional options
398
+
399
+ Returns:
400
+ Generation info dict
401
+ """
402
+ generator = TTSGenerator(elevenlabs_api_key=api_key, fallback_enabled=False)
403
+ return await generator.generate_speech(
404
+ text=text,
405
+ output_path=output_path,
406
+ voice=voice,
407
+ provider=TTSProvider.ELEVENLABS,
408
+ **kwargs,
409
+ )
410
+
411
+
412
+ async def generate_speech_auto(
413
+ text: str,
414
+ output_path: Path,
415
+ elevenlabs_key: Optional[str] = None,
416
+ hf_key: Optional[str] = None,
417
+ voice: str = "rachel",
418
+ **kwargs,
419
+ ) -> Dict[str, Any]:
420
+ """
421
+ Auto-select best available TTS provider.
422
+
423
+ Args:
424
+ text: Text to convert
425
+ output_path: Output file path
426
+ elevenlabs_key: ElevenLabs API key
427
+ hf_key: Hugging Face API key
428
+ voice: Voice name
429
+ **kwargs: Additional options
430
+
431
+ Returns:
432
+ Generation info dict
433
+ """
434
+ generator = TTSGenerator(
435
+ elevenlabs_api_key=elevenlabs_key,
436
+ hf_api_key=hf_key,
437
+ default_voice=voice,
438
+ fallback_enabled=True,
439
+ )
440
+ return await generator.generate_speech(text=text, output_path=output_path, **kwargs)
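
And a closing usage sketch for the TTS utilities (a minimal example; with no API keys set this falls all the way back to gTTS, which needs no key):

```python
import asyncio
from pathlib import Path

from utils.tts import generate_speech_auto

async def demo():
    # Provider is auto-selected: ElevenLabs -> Hugging Face -> gTTS,
    # depending on which API keys are available in the environment
    info = await generate_speech_auto(
        text="The square of the hypotenuse equals the sum of the squares of the legs.",
        output_path=Path("outputs/narration.mp3"),
        voice="rachel",
    )
    print(info["provider"], info["output_path"])

asyncio.run(demo())
```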