---
tags:
- moe
- minimax
- bfloat16
- sglang
- mlx
license: mit
datasets:
- nick007x/github-code-2025
- tatsu-lab/alpaca
base_model:
- MiniMaxAI/MiniMax-M2
---

![Screenshot](https://huggingface.co/VibeStudio/MiniMax-M2-THRIFT/resolve/main/vibe_processed_by_imagy.png)

# VibeStudio/MiniMax-M2-THRIFT-55-v1

**Targeted Reduction for Inference and Fine-Tuning (~55% Expert Pruned)**

A lean, efficiency-first variant of MiniMax-M2 designed to cut **latency and VRAM usage** while raising **throughput** for local, on-prem, and edge deployments.

## TL;DR

* **What:** ~55% expert-pruned MoE built with staged pruning + knowledge distillation.
* **Why:** Push the efficiency frontier for compact, responsive deployments.
* **Now:** Ready for experimentation, with solid coverage across core evals and more on the way.

---

## Why it’s useful

* **Lower latency:** Fast, responsive interactions for interactive apps and tools.
* **Smaller memory footprint:** Fits tighter VRAM budgets and increases node density.
* **Higher throughput:** Serve more concurrent users on the same hardware.
* **Deployment-friendly:** Smooth drop-in via SGLang with an OpenAI-compatible API (see the serving sketch at the end of this card).
* **Adaptable:** Plays well with light fine-tuning to match domain and style.

## Intended use

* Local/air-gapped assistants and dev tools
* Cost-sensitive batch workloads and real-time services
* Edge and on-prem deployments prioritizing efficiency

---

## How Our Approach Works

> **Active research in progress:** we continue to iterate and expand ablations.

* **Teacher–student setup:** Start with **MiniMax-M2** as the teacher and a copy of it as the student.
* **Gradual expert pruning:** Remove **≈5% of experts per stage** over **~11 stages** (≈**55% total**), guided by importance scores with a lightweight **Leave-One-Expert-Out** check to retain rare-but-important experts.
* **Distill after each prune:** Retrain the student to imitate the teacher on
  * **Outputs** (token probability distributions),
  * **Hidden states**, and
  * **Router behavior** over the **surviving experts**.

Illustrative sketches of the pruning check and the distillation objective appear in the appendix below.

---

**Run AI Coding Agents Fully Locally (Mac Studio, DGX Spark, AMD AI Max)**
https://github.com/latent-variable/minimax-agent-guide
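
---

## Quick start: serving with SGLang (sketch)

The card advertises drop-in serving via SGLang with an OpenAI-compatible API. The sketch below shows one way to query such a server from Python; the launch flags, port, tensor-parallel size, and the use of the repo id as the model name are illustrative assumptions, not a tested recipe for this checkpoint.

```python
# A minimal client-side sketch, assuming the model is already being served with
# SGLang's OpenAI-compatible server, e.g. (port and --tp value are illustrative,
# adjust to your hardware):
#
#   python -m sglang.launch_server \
#       --model-path VibeStudio/MiniMax-M2-THRIFT-55-v1 \
#       --trust-remote-code --tp 2 --port 30000

from openai import OpenAI

# Point the standard OpenAI client at the local SGLang endpoint.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="VibeStudio/MiniMax-M2-THRIFT-55-v1",  # SGLang also accepts "default"
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a string."},
    ],
    temperature=0.7,
    max_tokens=256,
)

print(response.choices[0].message.content)
```

Because SGLang exposes the standard `/v1/chat/completions` route, any OpenAI-compatible client or agent framework should be able to talk to the server the same way.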
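
## Appendix: illustrative sketches of the recipe

### Expert importance with a Leave-One-Expert-Out check

The recipe above prunes a small fraction of experts per stage, guided by importance scores plus a lightweight Leave-One-Expert-Out (LOEO) check. This is not the THRIFT code: it is a toy sketch of the idea on a small stand-in MoE layer, where experts are first ranked by routed probability mass and a low-scoring candidate is still retained if removing it shifts the layer output beyond a tolerance. The layer structure, the drift metric, and the threshold are all assumptions.

```python
"""Toy sketch only: a minimal top-2 MoE layer plus a leave-one-expert-out check.
The real MiniMax-M2 router internals and the THRIFT importance metric are not
reproduced here; every module, metric, and threshold below is a stand-in."""

import torch
import torch.nn as nn


class ToyMoE(nn.Module):
    """Minimal top-2 MoE feed-forward block (stand-in for a real MoE layer)."""

    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k
        self.masked = set()  # experts whose router logits are forced to -inf ("pruned")

    def forward(self, x):
        logits = self.router(x)                          # [tokens, n_experts]
        for e in self.masked:
            logits[..., e] = float("-inf")
        probs = logits.softmax(dim=-1)
        top_p, top_i = probs.topk(self.top_k, dim=-1)    # route each token to its top-k experts
        top_p = top_p / top_p.sum(dim=-1, keepdim=True)  # renormalise over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                hit = top_i[..., slot] == e
                if hit.any():
                    out[hit] += top_p[..., slot][hit].unsqueeze(-1) * self.experts[e](x[hit])
        return out


@torch.no_grad()
def loeo_check(layer, calib_x, candidates, tol=0.05):
    """Keep a pruning candidate if removing it degrades the layer output too much.

    Importance here is the relative L2 drift of the layer output on a calibration
    batch; a production pipeline would more likely use task loss. `tol` is arbitrary.
    """
    base = layer(calib_x)
    keep, prune = [], []
    for e in candidates:
        layer.masked.add(e)
        drift = (layer(calib_x) - base).norm() / base.norm()
        layer.masked.discard(e)
        (keep if drift > tol else prune).append(e)
    return keep, prune


if __name__ == "__main__":
    torch.manual_seed(0)
    layer = ToyMoE()
    calib = torch.randn(512, 64)  # stand-in for calibration activations
    with torch.no_grad():
        # Cheap importance score: probability mass the router assigns to each expert.
        usage = layer.router(calib).softmax(dim=-1).mean(dim=0)
    # A few least-used experts as candidates (the real pipeline removes ~5% per stage).
    candidates = usage.argsort()[:4].tolist()
    keep, prune = loeo_check(layer, calib, candidates)
    print("retained despite low usage:", keep)
    print("pruned:", prune)
```

The point of the LOEO pass is that raw usage alone can undervalue experts that fire rarely but matter a lot when they do; measuring the effect of actually removing each candidate catches those cases.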
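
### Distillation objective over surviving experts

The distillation step matches the student to the teacher on output distributions, hidden states, and router behavior over the surviving experts. Below is one plausible shape for such an objective, not the published THRIFT recipe: the loss weights, the temperature, and the way the teacher's routing distribution is restricted and renormalized to the kept experts are all assumptions.

```python
"""Illustrative sketch only: one plausible combined distillation loss."""

import torch
import torch.nn.functional as F


def distill_loss(student_logits, teacher_logits,
                 student_hidden, teacher_hidden,
                 student_router_logits, teacher_router_logits, kept_experts,
                 T=2.0, w_kd=1.0, w_hid=0.5, w_router=0.1):
    """Combine output, hidden-state, and router distillation terms.

    student_logits / teacher_logits:   [tokens, vocab]
    student_hidden / teacher_hidden:   [tokens, d_model] from a matched layer
    *_router_logits:                   [tokens, n_experts] for the respective model
    kept_experts: indices of surviving experts along the teacher's expert axis
    """
    # 1) Output distillation: KL between temperature-softened token distributions.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T

    # 2) Hidden-state matching on the chosen layer.
    hid = F.mse_loss(student_hidden, teacher_hidden)

    # 3) Router behaviour: compare the student's routing distribution with the
    #    teacher's distribution restricted (and renormalised via softmax) to the
    #    surviving experts.
    teacher_kept = teacher_router_logits[..., kept_experts]
    router = F.kl_div(
        F.log_softmax(student_router_logits, dim=-1),
        F.softmax(teacher_kept, dim=-1),
        reduction="batchmean",
    )

    return w_kd * kd + w_hid * hid + w_router * router


if __name__ == "__main__":
    torch.manual_seed(0)
    tokens, vocab, d, n_teacher = 16, 100, 32, 8
    kept = list(range(6))  # pretend 6 of 8 experts survived this stage
    loss = distill_loss(
        torch.randn(tokens, vocab), torch.randn(tokens, vocab),
        torch.randn(tokens, d), torch.randn(tokens, d),
        torch.randn(tokens, len(kept)), torch.randn(tokens, n_teacher), kept,
    )
    print(float(loss))
```

Restricting and renormalizing the teacher's routing distribution to the surviving experts keeps the router target well defined after each pruning stage.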