Add model-index with benchmark evaluations

#4

Added structured evaluation results, transcribed from the benchmark image:

  • SimpleQA: 8.90
  • MUSR: 63.49
  • MMLU (Zero Shot): 84.95
  • Math-500: 92.10
  • GPQA-Diamond: 58.55
  • BFCL V3: 59.67

This lets the model appear on leaderboards and makes it easier to compare it with other models.
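The `model-index` metadata sits in the YAML front matter of the model card. The minimal sketch below shows one way the six scores listed above might map onto that structure, with one result entry per benchmark. Several values are placeholders because the PR does not state them: the task type `text-generation`, the metric type `accuracy`, the slugified dataset `type` values, and the model name `example-model`. The helper `build_model_index` is also hypothetical. The output is printed as JSON, which is valid YAML 1.2 flow syntax.

```python
import json
import re

# Scores copied from the list in this PR description.
SCORES = {
    "SimpleQA": 8.90,
    "MUSR": 63.49,
    "MMLU (Zero Shot)": 84.95,
    "Math-500": 92.10,
    "GPQA-Diamond": 58.55,
    "BFCL V3": 59.67,
}


def slugify(name: str) -> str:
    """Hypothetical dataset id: lowercase, with non-alphanumeric runs turned into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def build_model_index(model_name: str, scores: dict) -> list:
    """Return a model-index list with one result entry per benchmark.

    Task type and metric type are assumed placeholders, not values taken from the PR.
    """
    results = []
    for benchmark, value in scores.items():
        results.append({
            "task": {"type": "text-generation"},                     # assumed task type
            "dataset": {"name": benchmark, "type": slugify(benchmark)},  # hypothetical dataset id
            "metrics": [{"type": "accuracy", "value": value}],        # assumed metric type
        })
    return [{"name": model_name, "results": results}]


if __name__ == "__main__":
    index = build_model_index("example-model", SCORES)  # hypothetical model name
    # JSON is valid YAML 1.2, so this block could sit under the card's front matter.
    print(json.dumps({"model-index": index}, indent=2))
```

In a real card you would more likely emit block-style YAML, or let a model-card library generate the metadata. The dictionary shape stays the same either way.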
