Spaces:

MCP-1st-Birthday
/

manim-mcp

Running

App Files Files Community

manim-mcp / IMPROVEMENTS.md

bhaveshgoel07

Deploy code fixes (clean history)

fff13d1 11 days ago

preview code

raw

history blame contribute delete

18.3 kB

A newer version of the Gradio SDK is available: 6.1.0

Upgrade

NeuroAnim Improvements Guide

This document outlines improvements made and further recommendations for enhancing the NeuroAnim system's code generation, script writing, and overall quality.

✅ Issues Fixed

1. Audio Generation Problem - RESOLVED

Problem: Narration text contained prefixes like "Narration Script:\n\n" which were being sent to TTS, causing poor audio quality or failures.

Solution Implemented:

Added _clean_narration_text() method in orchestrator.py that strips prefixes and formatting artifacts
Updated generate_narration() in mcp_servers/creative.py to return clean text without prefixes
Improved prompt to explicitly instruct the model not to add labels

Location:

orchestrator.py lines 353-389 (new method)
mcp_servers/creative.py lines 558-608 (improved prompt and cleaning)

🎯 Recommendations for Further Improvements

2. Manim Code Generation Quality

Current Issues:

Syntax errors (unclosed parentheses, brackets)
Invalid color names (DARK_GREEN, LIGHT_BLUE don't exist in Manim)
Incorrect animation method names (using lowercase instead of capitalized)
Missing imports or incomplete code blocks
Using deprecated Manim classes or methods

Improvements Made:

✅ Enhanced prompts in mcp_servers/creative.py with explicit requirements:

List of valid color constants
Correct animation method names with capitalization
Use of MovingCameraScene for better flexibility
Syntax validation requirements

Additional Recommendations:

A. Add Code Post-Processing Pipeline

Create utils/code_validator.py:

import ast
import re
from typing import Dict, List, Optional

class ManimCodeValidator:
    """Validate and fix common Manim code issues."""
    
    VALID_COLORS = {
        'WHITE', 'BLACK', 'GRAY', 'GREY', 'LIGHT_GRAY', 'DARK_GRAY',
        'RED', 'GREEN', 'BLUE', 'YELLOW', 'ORANGE', 'PINK', 'PURPLE',
        'TEAL', 'GOLD', 'MAROON', 'RED_A', 'RED_B', 'RED_C', 'RED_D',
        'RED_E', 'GREEN_A', 'GREEN_B', 'GREEN_C', 'GREEN_D', 'GREEN_E',
        'BLUE_A', 'BLUE_B', 'BLUE_C', 'BLUE_D', 'BLUE_E'
    }
    
    INVALID_COLOR_REPLACEMENTS = {
        'DARK_GREEN': 'GREEN_D',
        'LIGHT_GREEN': 'GREEN_A',
        'DARK_BLUE': 'BLUE_D',
        'LIGHT_BLUE': 'BLUE_A',
        'DARK_RED': 'RED_D',
        'LIGHT_RED': 'RED_A',
    }
    
    @staticmethod
    def validate_syntax(code: str) -> Dict[str, any]:
        """Check if code has valid Python syntax."""
        try:
            ast.parse(code)
            return {"valid": True, "errors": []}
        except SyntaxError as e:
            return {
                "valid": False,
                "errors": [f"Syntax error at line {e.lineno}: {e.msg}"]
            }
    
    @staticmethod
    def fix_colors(code: str) -> str:
        """Replace invalid color names with valid ones."""
        for invalid, valid in ManimCodeValidator.INVALID_COLOR_REPLACEMENTS.items():
            code = re.sub(rf'\b{invalid}\b', valid, code)
        return code
    
    @staticmethod
    def ensure_imports(code: str) -> str:
        """Ensure proper Manim imports exist."""
        if 'from manim import' not in code and 'import manim' not in code:
            code = 'from manim import *\n\n' + code
        return code
    
    @staticmethod
    def fix_common_issues(code: str) -> str:
        """Apply common fixes to generated code."""
        # Fix colors
        code = ManimCodeValidator.fix_colors(code)
        
        # Ensure imports
        code = ManimCodeValidator.ensure_imports(code)
        
        # Fix common typos in animation methods
        typo_fixes = {
            r'\.fadein\(': '.FadeIn(',
            r'\.fadeout\(': '.FadeOut(',
            r'\.write\(': '.Write(',
            r'\.create\(': '.Create(',
            r'self\.play\(flash\(': 'self.play(Flash(',
            r'self\.play\(indicate\(': 'self.play(Indicate(',
        }
        
        for pattern, replacement in typo_fixes.items():
            code = re.sub(pattern, replacement, code, flags=re.IGNORECASE)
        
        return code

B. Implement Multi-Stage Validation

In orchestrator.py, enhance _generate_and_validate_code():

async def _generate_and_validate_code(
    self, topic: str, concept_plan: str, max_retries: int = 3
) -> str:
    """Generate and validate Manim code with multiple checks."""
    
    from utils.code_validator import ManimCodeValidator
    validator = ManimCodeValidator()
    
    for attempt in range(max_retries):
        # Generate code
        code_result = await self.call_tool(...)
        raw_code = self._extract_python_code(code_result["text"])
        
        # Stage 1: Fix common issues
        fixed_code = validator.fix_common_issues(raw_code)
        
        # Stage 2: Syntax validation
        syntax_check = validator.validate_syntax(fixed_code)
        if not syntax_check["valid"]:
            logger.warning(f"Syntax error in attempt {attempt + 1}")
            # Retry with error feedback
            continue
        
        # Stage 3: Test import (optional, quick check)
        try:
            compile(fixed_code, '<string>', 'exec')
        except Exception as e:
            logger.warning(f"Compilation error: {e}")
            continue
        
        return fixed_code
    
    raise Exception("Failed to generate valid code after retries")

C. Use Few-Shot Examples in Prompts

Add working examples to the code generation prompt:

EXAMPLE_CODE = '''
from manim import *

class ExampleScene(MovingCameraScene):
    def construct(self):
        # Title
        title = Text("Example Animation", font_size=48)
        title.to_edge(UP)
        self.play(Write(title))
        self.wait(1)
        
        # Create objects
        circle = Circle(radius=1, color=BLUE)
        square = Square(side_length=2, color=RED)
        square.next_to(circle, RIGHT, buff=1)
        
        # Animate
        self.play(Create(circle), Create(square))
        self.wait(1)
        self.play(circle.animate.shift(RIGHT * 2))
        self.wait(1)
'''

# Include in prompt:
prompt = f"""
Here's an example of proper Manim code structure:

{EXAMPLE_CODE}

Now generate similar code for: {concept}
...
"""

3. Script Writing (Narration) Quality

Current Issues:

Sometimes too technical or too simple for the audience
Inconsistent pacing
May include unnecessary conversational elements
Duration mismatch with actual content

Improvements Made:

✅ Completely rewritten prompt in mcp_servers/creative.py:

Clear instruction to output only spoken text
Word count guidance based on duration
Explicit formatting requirements
Post-processing to remove prefixes

Additional Recommendations:

A. Add Narration Quality Scoring

Create utils/narration_analyzer.py:

class NarrationAnalyzer:
    """Analyze and score narration quality."""
    
    @staticmethod
    def estimate_duration(text: str, wpm: int = 150) -> float:
        """Estimate speaking duration in seconds."""
        word_count = len(text.split())
        return (word_count / wpm) * 60
    
    @staticmethod
    def check_reading_level(text: str) -> Dict:
        """Analyze text complexity."""
        # Could use textstat library
        import textstat
        
        return {
            "flesch_reading_ease": textstat.flesch_reading_ease(text),
            "grade_level": textstat.flesch_kincaid_grade(text),
            "syllable_count": textstat.syllable_count(text),
        }
    
    @staticmethod
    def validate_audience_match(text: str, audience: str) -> bool:
        """Check if text matches target audience."""
        grade_map = {
            "elementary": (3, 5),
            "middle_school": (6, 8),
            "high_school": (9, 12),
            "undergraduate": (13, 16),
        }
        
        if audience not in grade_map:
            return True
        
        min_grade, max_grade = grade_map[audience]
        actual_grade = textstat.flesch_kincaid_grade(text)
        
        return min_grade <= actual_grade <= max_grade + 2

B. Implement Iterative Refinement

async def generate_refined_narration(self, topic, audience, duration, max_attempts=2):
    """Generate narration with quality checks and refinement."""
    
    analyzer = NarrationAnalyzer()
    
    for attempt in range(max_attempts):
        # Generate narration
        narration = await self.generate_narration(...)
        
        # Check duration match
        estimated_duration = analyzer.estimate_duration(narration)
        target_duration = duration * 60
        
        if abs(estimated_duration - target_duration) > 15:  # 15 sec tolerance
            feedback = f"Duration mismatch: got {estimated_duration}s, need {target_duration}s"
            # Regenerate with feedback
            continue
        
        # Check audience match
        if not analyzer.validate_audience_match(narration, audience):
            feedback = f"Complexity doesn't match {audience} level"
            continue
        
        return narration
    
    # Return best attempt even if not perfect
    return narration

C. Use Structured Output Format

Modify prompt to request JSON structure:

prompt = f"""
Generate narration in JSON format:

{{
    "narration": "The actual spoken text...",
    "key_points": ["point 1", "point 2"],
    "transitions": ["0:00 - Introduction", "0:30 - Main concept"],
    "emphasis_words": ["important", "theorem", "result"]
}}

Topic: {concept}
Audience: {target_audience}
Duration: {duration} seconds
"""

# Parse and extract just the narration part
result = json.loads(response)
narration_text = result["narration"]

4. Overall System Improvements

A. Add Caching Layer

Save generated components to avoid regeneration:

import hashlib
import json
from pathlib import Path

class GenerationCache:
    """Cache generated content."""
    
    def __init__(self, cache_dir: Path = Path("cache")):
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(exist_ok=True)
    
    def _get_hash(self, topic: str, params: Dict) -> str:
        """Generate cache key."""
        key = f"{topic}_{json.dumps(params, sort_keys=True)}"
        return hashlib.md5(key.encode()).hexdigest()
    
    def get_narration(self, topic: str, audience: str) -> Optional[str]:
        """Retrieve cached narration."""
        key = self._get_hash(topic, {"audience": audience, "type": "narration"})
        cache_file = self.cache_dir / f"{key}.txt"
        
        if cache_file.exists():
            return cache_file.read_text()
        return None
    
    def save_narration(self, topic: str, audience: str, content: str):
        """Save narration to cache."""
        key = self._get_hash(topic, {"audience": audience, "type": "narration"})
        cache_file = self.cache_dir / f"{key}.txt"
        cache_file.write_text(content)

B. Implement Quality Metrics Dashboard

Track generation success rates, error types, average durations:

class MetricsCollector:
    """Collect and report system metrics."""
    
    def __init__(self):
        self.metrics = {
            "total_generations": 0,
            "successful_generations": 0,
            "failed_generations": 0,
            "errors": {},
            "average_duration": 0,
        }
    
    def record_success(self, duration: float):
        self.metrics["total_generations"] += 1
        self.metrics["successful_generations"] += 1
        self._update_average_duration(duration)
    
    def record_failure(self, error_type: str):
        self.metrics["total_generations"] += 1
        self.metrics["failed_generations"] += 1
        self.metrics["errors"][error_type] = self.metrics["errors"].get(error_type, 0) + 1
    
    def get_report(self) -> Dict:
        """Get metrics report."""
        success_rate = (
            self.metrics["successful_generations"] / self.metrics["total_generations"]
            if self.metrics["total_generations"] > 0
            else 0
        )
        
        return {
            **self.metrics,
            "success_rate": success_rate,
        }

C. Add Preview Mode

Generate low-quality preview before full render:

async def generate_preview(self, topic: str, audience: str) -> Dict:
    """Generate quick preview without full rendering."""
    
    # Generate only concept plan and narration
    concept = await self.generate_concept(topic, audience)
    narration = await self.generate_narration(topic, concept, audience)
    
    # Generate code but don't render
    code = await self.generate_code(topic, concept)
    
    return {
        "concept": concept,
        "narration": narration,
        "code": code,
        "estimated_duration": len(narration.split()) / 150 * 60,
    }

D. Error Recovery Strategies

Implement better fallback mechanisms:

class GenerationStrategy:
    """Handle generation with multiple fallback strategies."""
    
    async def generate_with_fallback(self, primary_fn, fallback_fn, *args):
        """Try primary method, fall back if it fails."""
        try:
            return await primary_fn(*args)
        except Exception as e:
            logger.warning(f"Primary method failed: {e}, trying fallback")
            return await fallback_fn(*args)
    
    async def generate_code_resilient(self, topic: str, concept: str):
        """Generate code with multiple strategies."""
        
        strategies = [
            ("Complex with camera", lambda: self.generate_with_camera_scene(topic)),
            ("Simple Scene", lambda: self.generate_simple_scene(topic)),
            ("Template-based", lambda: self.use_code_template(topic)),
        ]
        
        for strategy_name, strategy_fn in strategies:
            try:
                logger.info(f"Trying strategy: {strategy_name}")
                return await strategy_fn()
            except Exception as e:
                logger.warning(f"Strategy {strategy_name} failed: {e}")
                continue
        
        raise Exception("All code generation strategies failed")

📋 Implementation Priority

High Priority (Immediate)

✅ Fix audio generation (DONE)
✅ Improve narration prompts (DONE)
✅ Add Gradio frontend (DONE)
🔲 Implement code validator with post-processing
🔲 Add syntax validation before rendering

Medium Priority (Next Sprint)

🔲 Add narration quality analyzer
🔲 Implement caching layer
🔲 Add preview mode
🔲 Enhance error recovery

Low Priority (Future)

🔲 Metrics dashboard
🔲 Advanced code templates
🔲 Multi-model ensemble for better quality
🔲 User feedback loop for iterative improvement

🧪 Testing Recommendations

Unit Tests

def test_narration_cleaning():
    """Test narration text cleaning."""
    dirty = "Narration Script:\n\nThis is the actual text"
    clean = orchestrator._clean_narration_text(dirty)
    assert clean == "This is the actual text"

def test_code_validation():
    """Test Manim code validation."""
    invalid_code = "circle = Circle(color=DARK_GREEN)"
    fixed = validator.fix_colors(invalid_code)
    assert "GREEN_D" in fixed

def test_duration_estimation():
    """Test narration duration estimation."""
    text = "This is a test " * 150  # 150 words
    duration = analyzer.estimate_duration(text, wpm=150)
    assert 59 <= duration <= 61  # Should be ~60 seconds

Integration Tests

async def test_full_pipeline():
    """Test complete generation pipeline."""
    orchestrator = NeuroAnimOrchestrator()
    await orchestrator.initialize()
    
    result = await orchestrator.generate_animation(
        topic="Test Topic",
        target_audience="high_school",
        animation_length_minutes=1.0
    )
    
    assert result["success"]
    assert Path(result["output_file"]).exists()
    assert len(result["narration"]) > 50
    assert "from manim import" in result["manim_code"]

📊 Success Metrics

Track these to measure improvement:

Code Generation Success Rate: % of generated code that renders without errors
Audio Quality Score: User ratings or automated speech quality metrics
Narration Accuracy: Duration match, audience level match
End-to-End Success: % of complete generations without manual intervention
User Satisfaction: Feedback scores from Gradio interface

Target Goals:

Code success rate: >85%
Audio quality: >4/5
Duration accuracy: ±10 seconds
End-to-end success: >75%

🔧 Configuration Best Practices

Create config.yaml for easy tuning:

generation:
  max_retries: 3
  timeout_seconds: 300
  
narration:
  words_per_minute: 150
  min_words: 50
  max_words: 1000
  
code_generation:
  temperature: 0.3
  max_tokens: 2048
  default_scene_class: "MovingCameraScene"
  
rendering:
  quality: "medium"
  frame_rate: 30
  format: "mp4"
  
audio:
  primary_provider: "elevenlabs"
  fallback_providers: ["huggingface", "gtts"]
  default_voice: "rachel"

🎓 Educational Content Guidelines

To maximize educational value:

Clear Learning Objectives: Start narration with "In this video, you'll learn..."
Progressive Complexity: Build from simple to complex
Visual-Audio Sync: Time narration with visual reveals
Repetition: Reinforce key concepts 2-3 times
Real-World Connections: Include practical applications
Assessment: Quiz questions that test understanding, not memorization

📝 Future Enhancements

Multi-Language Support: Generate narration in multiple languages
Custom Voice Cloning: Use teacher's voice with ElevenLabs
Interactive Elements: Clickable annotations in video
Series Generation: Create multi-video curriculum
Adaptive Learning: Adjust complexity based on quiz results
Collaborative Editing: Allow teachers to refine generated content

Document Version: 1.0
Last Updated: 2024
Status: Living document - update as improvements are implemented