harlley committed on
Commit
0e477bf
·
1 Parent(s): f0bc0bb

play around with finetuning

finetuning-functiongemma/README.md ADDED
@@ -0,0 +1,89 @@
1
+ # Fine-tuning FunctionGemma for Square Color Control
2
+
3
+ This directory contains everything needed to fine-tune FunctionGemma to recognize square color control commands.
4
+
5
+ ## 📋 Overview
6
+
7
+ FunctionGemma is a base model that requires fine-tuning to work well with custom functions. This project demonstrates:
8
+
9
+ 1. **Dataset creation** for function calling
10
+ 2. **Fine-tuning** with Hugging Face TRL's `SFTTrainer`
11
+ 3. **Export to ONNX** for browser use
12
+ 4. **Deploy to Hugging Face Hub**
13
+
14
+ ## 🚀 Quick Start
15
+
16
+ ### Option 1: Google Colab (Recommended)
17
+ 1. Upload the entire `finetuning-functiongemma/` folder to [Google Colab](https://colab.research.google.com)
18
+ 2. Open the notebook `finetune_functiongemma.ipynb`
19
+ 3. Select GPU runtime (T4 is sufficient)
20
+ 4. Run all cells
21
+
22
+ > **Note:** The notebook loads the dataset from `dataset/square_color_dataset.json`, so make sure to keep the folder structure intact.
23
+
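+ Before running all cells, you can confirm that a GPU runtime is actually attached:
+
+ ```python
+ import torch
+
+ # Quick sanity check in Colab: True means a CUDA GPU (e.g. a T4) is available.
+ print(torch.cuda.is_available())
+ print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU only")
+ ```
+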
24
+ ### Option 2: Hugging Face Spaces
25
+ 1. Create a new Space with the Gradio template
26
+ 2. Configure GPU hardware for the Space (a GPU is required for fine-tuning)
27
+ 3. Run the notebook inside the Space
28
+
29
+ ## 📁 Structure
30
+
31
+ ```
32
+ finetuning-functiongemma/
33
+ ├── README.md # This file
34
+ ├── finetune_functiongemma.ipynb # Main notebook
35
+ ├── dataset/
36
+ │   └── square_color_dataset.json # Training dataset
37
+ └── export_to_onnx.py # Script to convert to ONNX
38
+ ```
39
+
40
+ ## 🎯 Target Functions
41
+
42
+ The model will be trained to recognize two functions (both sketched in code below):
43
+
44
+ ### `set_square_color`
45
+ Changes the square color to a new color.
46
+
47
+ **Example inputs:**
48
+ - "Change the color to blue"
49
+ - "Make it red"
50
+ - "Set the square to green"
51
+
52
+ ### `get_square_color`
53
+ Returns the current color of the square.
54
+
55
+ **Example inputs:**
56
+ - "What color is the square?"
57
+ - "Tell me the current color"
58
+ - "Which color is it?"
59
+
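+ Both functions are defined in the notebook as plain Python functions; their docstrings are converted into tool schemas with `transformers.utils.get_json_schema` and passed to the chat template. A condensed sketch of that cell:
+
+ ```python
+ from transformers.utils import get_json_schema
+
+ def set_square_color(color: str) -> str:
+     """
+     Sets the color of the square displayed on the screen.
+
+     Args:
+         color: The color to set, e.g. red, blue, green
+     """
+     return f"Color set to {color}"
+
+ def get_square_color() -> str:
+     """
+     Returns the current color of the square.
+     Use this when the user asks about the current color.
+     """
+     return "Current color"
+
+ # These schemas describe the available tools to the model.
+ TOOLS = [get_json_schema(set_square_color), get_json_schema(get_square_color)]
+ ```
+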
60
+ ## 📊 Dataset
61
+
62
+ The dataset contains varied examples in English (a sample entry is shown after this list), including:
63
+ - Direct commands ("set to red")
64
+ - Indirect commands ("I want it blue")
65
+ - Questions ("what color?")
66
+ - Natural language variations
67
+
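+ Each entry is a flat record pairing a user utterance with the expected tool call; note that `tool_arguments` is itself a JSON-encoded string, which the notebook parses with `json.loads` when building the chat-format conversations. A quick way to inspect the data:
+
+ ```python
+ import json
+
+ # Load the raw training examples (path relative to this directory).
+ with open("dataset/square_color_dataset.json") as f:
+     examples = json.load(f)
+
+ print(len(examples))   # total number of examples
+ print(examples[0])
+ # {'user_content': 'Change the color to blue',
+ #  'tool_name': 'set_square_color',
+ #  'tool_arguments': '{"color": "blue"}'}
+ ```
+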
68
+ ## 🔧 Requirements
69
+
70
+ ```bash
71
+ pip install torch transformers datasets trl accelerate
72
+ pip install "optimum[onnxruntime]"  # For ONNX export
73
+ ```
74
+
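+ The `export_to_onnx.py` script referenced in the structure above is not included in this commit. As a rough sketch, the export could be done with Optimum's `ORTModelForCausalLM` (the model id below is a placeholder for the fine-tuned checkpoint pushed to the Hub, and Gemma-family ONNX support in Optimum is assumed):
+
+ ```python
+ # Hypothetical export sketch -- adjust the model id and output directory.
+ from optimum.onnxruntime import ORTModelForCausalLM
+ from transformers import AutoTokenizer
+
+ model_id = "your-username/functiongemma-square-color"  # placeholder Hub id
+
+ # export=True converts the PyTorch checkpoint to ONNX while loading it.
+ model = ORTModelForCausalLM.from_pretrained(model_id, export=True)
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ model.save_pretrained("onnx/")
+ tokenizer.save_pretrained("onnx/")
+ ```
+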
75
+ ## 📝 Important Notes
76
+
77
+ 1. **GPU Required**: Fine-tuning requires a GPU (a T4 is sufficient)
78
+ 2. **Time**: ~10-15 minutes with this dataset (~160 examples) and 8 epochs
79
+ 3. **Format**: The model uses special `<escape>` tokens for strings
80
+
81
+ ## 🔗 Useful Links
82
+
83
+ - [FunctionGemma Docs](https://ai.google.dev/gemma/docs/functiongemma)
84
+ - [Official Fine-tuning Tutorial](https://ai.google.dev/gemma/docs/functiongemma/finetuning-with-functiongemma)
85
+ - [HuggingFace TRL](https://huggingface.co/docs/trl)
86
+
87
+ ## Author
88
+
89
+ Created as an AI Engineering portfolio project.
finetuning-functiongemma/dataset/square_color_dataset.json ADDED
@@ -0,0 +1,969 @@
1
+ [
2
+ {
3
+ "user_content": "Change the color to blue",
4
+ "tool_name": "set_square_color",
5
+ "tool_arguments": "{\"color\": \"blue\"}"
6
+ },
7
+ {
8
+ "user_content": "What color is the square?",
9
+ "tool_name": "get_square_color",
10
+ "tool_arguments": "{}"
11
+ },
12
+ {
13
+ "user_content": "Make it red",
14
+ "tool_name": "set_square_color",
15
+ "tool_arguments": "{\"color\": \"red\"}"
16
+ },
17
+ {
18
+ "user_content": "Tell me the current color",
19
+ "tool_name": "get_square_color",
20
+ "tool_arguments": "{}"
21
+ },
22
+ {
23
+ "user_content": "Set the square to green",
24
+ "tool_name": "set_square_color",
25
+ "tool_arguments": "{\"color\": \"green\"}"
26
+ },
27
+ {
28
+ "user_content": "Which color is it?",
29
+ "tool_name": "get_square_color",
30
+ "tool_arguments": "{}"
31
+ },
32
+ {
33
+ "user_content": "I want the square to be purple",
34
+ "tool_name": "set_square_color",
35
+ "tool_arguments": "{\"color\": \"purple\"}"
36
+ },
37
+ {
38
+ "user_content": "What's the color right now?",
39
+ "tool_name": "get_square_color",
40
+ "tool_arguments": "{}"
41
+ },
42
+ {
43
+ "user_content": "Turn it yellow",
44
+ "tool_name": "set_square_color",
45
+ "tool_arguments": "{\"color\": \"yellow\"}"
46
+ },
47
+ {
48
+ "user_content": "Can you tell me what color the square is?",
49
+ "tool_name": "get_square_color",
50
+ "tool_arguments": "{}"
51
+ },
52
+ {
53
+ "user_content": "Paint it orange",
54
+ "tool_name": "set_square_color",
55
+ "tool_arguments": "{\"color\": \"orange\"}"
56
+ },
57
+ {
58
+ "user_content": "I'd like to know the current color",
59
+ "tool_name": "get_square_color",
60
+ "tool_arguments": "{}"
61
+ },
62
+ {
63
+ "user_content": "Switch to pink",
64
+ "tool_name": "set_square_color",
65
+ "tool_arguments": "{\"color\": \"pink\"}"
66
+ },
67
+ {
68
+ "user_content": "What is the square's color?",
69
+ "tool_name": "get_square_color",
70
+ "tool_arguments": "{}"
71
+ },
72
+ {
73
+ "user_content": "Make the square cyan",
74
+ "tool_name": "set_square_color",
75
+ "tool_arguments": "{\"color\": \"cyan\"}"
76
+ },
77
+ {
78
+ "user_content": "Show me the color",
79
+ "tool_name": "get_square_color",
80
+ "tool_arguments": "{}"
81
+ },
82
+ {
83
+ "user_content": "I want it to be white",
84
+ "tool_name": "set_square_color",
85
+ "tool_arguments": "{\"color\": \"white\"}"
86
+ },
87
+ {
88
+ "user_content": "What color is it set to?",
89
+ "tool_name": "get_square_color",
90
+ "tool_arguments": "{}"
91
+ },
92
+ {
93
+ "user_content": "Change to black",
94
+ "tool_name": "set_square_color",
95
+ "tool_arguments": "{\"color\": \"black\"}"
96
+ },
97
+ {
98
+ "user_content": "Tell me the color of the square",
99
+ "tool_name": "get_square_color",
100
+ "tool_arguments": "{}"
101
+ },
102
+ {
103
+ "user_content": "Set it to teal",
104
+ "tool_name": "set_square_color",
105
+ "tool_arguments": "{\"color\": \"teal\"}"
106
+ },
107
+ {
108
+ "user_content": "Query the current color",
109
+ "tool_name": "get_square_color",
110
+ "tool_arguments": "{}"
111
+ },
112
+ {
113
+ "user_content": "Make it magenta",
114
+ "tool_name": "set_square_color",
115
+ "tool_arguments": "{\"color\": \"magenta\"}"
116
+ },
117
+ {
118
+ "user_content": "Get the color",
119
+ "tool_name": "get_square_color",
120
+ "tool_arguments": "{}"
121
+ },
122
+ {
123
+ "user_content": "I'd like the square to be lime",
124
+ "tool_name": "set_square_color",
125
+ "tool_arguments": "{\"color\": \"lime\"}"
126
+ },
127
+ {
128
+ "user_content": "Read the current color",
129
+ "tool_name": "get_square_color",
130
+ "tool_arguments": "{}"
131
+ },
132
+ {
133
+ "user_content": "Update the color to navy",
134
+ "tool_name": "set_square_color",
135
+ "tool_arguments": "{\"color\": \"navy\"}"
136
+ },
137
+ {
138
+ "user_content": "Check the square color",
139
+ "tool_name": "get_square_color",
140
+ "tool_arguments": "{}"
141
+ },
142
+ {
143
+ "user_content": "Set color to coral",
144
+ "tool_name": "set_square_color",
145
+ "tool_arguments": "{\"color\": \"coral\"}"
146
+ },
147
+ {
148
+ "user_content": "What color do we have?",
149
+ "tool_name": "get_square_color",
150
+ "tool_arguments": "{}"
151
+ },
152
+ {
153
+ "user_content": "Put it in violet",
154
+ "tool_name": "set_square_color",
155
+ "tool_arguments": "{\"color\": \"violet\"}"
156
+ },
157
+ {
158
+ "user_content": "Display the color",
159
+ "tool_name": "get_square_color",
160
+ "tool_arguments": "{}"
161
+ },
162
+ {
163
+ "user_content": "Color it gold",
164
+ "tool_name": "set_square_color",
165
+ "tool_arguments": "{\"color\": \"gold\"}"
166
+ },
167
+ {
168
+ "user_content": "Fetch the current color",
169
+ "tool_name": "get_square_color",
170
+ "tool_arguments": "{}"
171
+ },
172
+ {
173
+ "user_content": "Apply salmon color",
174
+ "tool_name": "set_square_color",
175
+ "tool_arguments": "{\"color\": \"salmon\"}"
176
+ },
177
+ {
178
+ "user_content": "Return the color value",
179
+ "tool_name": "get_square_color",
180
+ "tool_arguments": "{}"
181
+ },
182
+ {
183
+ "user_content": "Use turquoise",
184
+ "tool_name": "set_square_color",
185
+ "tool_arguments": "{\"color\": \"turquoise\"}"
186
+ },
187
+ {
188
+ "user_content": "What's the current state of the color?",
189
+ "tool_name": "get_square_color",
190
+ "tool_arguments": "{}"
191
+ },
192
+ {
193
+ "user_content": "Modify the square to crimson",
194
+ "tool_name": "set_square_color",
195
+ "tool_arguments": "{\"color\": \"crimson\"}"
196
+ },
197
+ {
198
+ "user_content": "Retrieve the square's color",
199
+ "tool_name": "get_square_color",
200
+ "tool_arguments": "{}"
201
+ },
202
+ {
203
+ "user_content": "Please change to blue",
204
+ "tool_name": "set_square_color",
205
+ "tool_arguments": "{\"color\": \"blue\"}"
206
+ },
207
+ {
208
+ "user_content": "Could you tell me the color?",
209
+ "tool_name": "get_square_color",
210
+ "tool_arguments": "{}"
211
+ },
212
+ {
213
+ "user_content": "I need it red",
214
+ "tool_name": "set_square_color",
215
+ "tool_arguments": "{\"color\": \"red\"}"
216
+ },
217
+ {
218
+ "user_content": "What is the current color?",
219
+ "tool_name": "get_square_color",
220
+ "tool_arguments": "{}"
221
+ },
222
+ {
223
+ "user_content": "Let's make it green",
224
+ "tool_name": "set_square_color",
225
+ "tool_arguments": "{\"color\": \"green\"}"
226
+ },
227
+ {
228
+ "user_content": "Can you get the color?",
229
+ "tool_name": "get_square_color",
230
+ "tool_arguments": "{}"
231
+ },
232
+ {
233
+ "user_content": "Go with purple",
234
+ "tool_name": "set_square_color",
235
+ "tool_arguments": "{\"color\": \"purple\"}"
236
+ },
237
+ {
238
+ "user_content": "I want to know the color",
239
+ "tool_name": "get_square_color",
240
+ "tool_arguments": "{}"
241
+ },
242
+ {
243
+ "user_content": "How about yellow?",
244
+ "tool_name": "set_square_color",
245
+ "tool_arguments": "{\"color\": \"yellow\"}"
246
+ },
247
+ {
248
+ "user_content": "Give me the color info",
249
+ "tool_name": "get_square_color",
250
+ "tool_arguments": "{}"
251
+ },
252
+ {
253
+ "user_content": "Try orange",
254
+ "tool_name": "set_square_color",
255
+ "tool_arguments": "{\"color\": \"orange\"}"
256
+ },
257
+ {
258
+ "user_content": "Report the current color",
259
+ "tool_name": "get_square_color",
260
+ "tool_arguments": "{}"
261
+ },
262
+ {
263
+ "user_content": "Let's go with pink",
264
+ "tool_name": "set_square_color",
265
+ "tool_arguments": "{\"color\": \"pink\"}"
266
+ },
267
+ {
268
+ "user_content": "What's the square showing?",
269
+ "tool_name": "get_square_color",
270
+ "tool_arguments": "{}"
271
+ },
272
+ {
273
+ "user_content": "Change it to brown",
274
+ "tool_name": "set_square_color",
275
+ "tool_arguments": "{\"color\": \"brown\"}"
276
+ },
277
+ {
278
+ "user_content": "Tell me what color it is",
279
+ "tool_name": "get_square_color",
280
+ "tool_arguments": "{}"
281
+ },
282
+ {
283
+ "user_content": "Set to silver",
284
+ "tool_name": "set_square_color",
285
+ "tool_arguments": "{\"color\": \"silver\"}"
286
+ },
287
+ {
288
+ "user_content": "Check what color the square is",
289
+ "tool_name": "get_square_color",
290
+ "tool_arguments": "{}"
291
+ },
292
+ {
293
+ "user_content": "Make the color maroon",
294
+ "tool_name": "set_square_color",
295
+ "tool_arguments": "{\"color\": \"maroon\"}"
296
+ },
297
+ {
298
+ "user_content": "Show current color",
299
+ "tool_name": "get_square_color",
300
+ "tool_arguments": "{}"
301
+ },
302
+ {
303
+ "user_content": "blue",
304
+ "tool_name": "set_square_color",
305
+ "tool_arguments": "{\"color\": \"blue\"}"
306
+ },
307
+ {
308
+ "user_content": "red please",
309
+ "tool_name": "set_square_color",
310
+ "tool_arguments": "{\"color\": \"red\"}"
311
+ },
312
+ {
313
+ "user_content": "green!",
314
+ "tool_name": "set_square_color",
315
+ "tool_arguments": "{\"color\": \"green\"}"
316
+ },
317
+ {
318
+ "user_content": "color?",
319
+ "tool_name": "get_square_color",
320
+ "tool_arguments": "{}"
321
+ },
322
+ {
323
+ "user_content": "what color",
324
+ "tool_name": "get_square_color",
325
+ "tool_arguments": "{}",
326
+ "tool_arguments": "{}"
327
+ },
328
+ {
329
+ "user_content": "current color?",
330
+ "tool_name": "get_square_color",
331
+ "tool_arguments": "{}"
332
+ },
333
+ {
334
+ "user_content": "Set the square color to indigo",
335
+ "tool_name": "set_square_color",
336
+ "tool_arguments": "{\"color\": \"indigo\"}"
337
+ },
338
+ {
339
+ "user_content": "I want indigo",
340
+ "tool_name": "set_square_color",
341
+ "tool_arguments": "{\"color\": \"indigo\"}"
342
+ },
343
+ {
344
+ "user_content": "Make it olive",
345
+ "tool_name": "set_square_color",
346
+ "tool_arguments": "{\"color\": \"olive\"}"
347
+ },
348
+ {
349
+ "user_content": "Switch the color to beige",
350
+ "tool_name": "set_square_color",
351
+ "tool_arguments": "{\"color\": \"beige\"}"
352
+ },
353
+ {
354
+ "user_content": "Can you change it to lavender?",
355
+ "tool_name": "set_square_color",
356
+ "tool_arguments": "{\"color\": \"lavender\"}"
357
+ },
358
+ {
359
+ "user_content": "What's the color of the square right now?",
360
+ "tool_name": "get_square_color",
361
+ "tool_arguments": "{}"
362
+ },
363
+ {
364
+ "user_content": "I'm curious about the current color",
365
+ "tool_name": "get_square_color",
366
+ "tool_arguments": "{}"
367
+ },
368
+ {
369
+ "user_content": "Tell me what the square looks like",
370
+ "tool_name": "get_square_color",
371
+ "tool_arguments": "{}"
372
+ },
373
+ {
374
+ "user_content": "Please set it to aqua",
375
+ "tool_name": "set_square_color",
376
+ "tool_arguments": "{\"color\": \"aqua\"}"
377
+ },
378
+ {
379
+ "user_content": "Could you make it peach?",
380
+ "tool_name": "set_square_color",
381
+ "tool_arguments": "{\"color\": \"peach\"}"
382
+ },
383
+ {
384
+ "user_content": "Would you change the color to mint?",
385
+ "tool_name": "set_square_color",
386
+ "tool_arguments": "{\"color\": \"mint\"}"
387
+ },
388
+ {
389
+ "user_content": "I'd appreciate it if you set it to ruby",
390
+ "tool_name": "set_square_color",
391
+ "tool_arguments": "{\"color\": \"ruby\"}"
392
+ },
393
+ {
394
+ "user_content": "Can I get the color please?",
395
+ "tool_name": "get_square_color",
396
+ "tool_arguments": "{}"
397
+ },
398
+ {
399
+ "user_content": "Would you mind telling me the color?",
400
+ "tool_name": "get_square_color",
401
+ "tool_arguments": "{}"
402
+ },
403
+ {
404
+ "user_content": "I need to know the current color",
405
+ "tool_name": "get_square_color",
406
+ "tool_arguments": "{}"
407
+ },
408
+ {
409
+ "user_content": "Let me know the square's color",
410
+ "tool_name": "get_square_color",
411
+ "tool_arguments": "{}"
412
+ },
413
+ {
414
+ "user_content": "sky blue",
415
+ "tool_name": "set_square_color",
416
+ "tool_arguments": "{\"color\": \"sky blue\"}"
417
+ },
418
+ {
419
+ "user_content": "dark green",
420
+ "tool_name": "set_square_color",
421
+ "tool_arguments": "{\"color\": \"dark green\"}"
422
+ },
423
+ {
424
+ "user_content": "light blue",
425
+ "tool_name": "set_square_color",
426
+ "tool_arguments": "{\"color\": \"light blue\"}"
427
+ },
428
+ {
429
+ "user_content": "dark red",
430
+ "tool_name": "set_square_color",
431
+ "tool_arguments": "{\"color\": \"dark red\"}"
432
+ },
433
+ {
434
+ "user_content": "bright yellow",
435
+ "tool_name": "set_square_color",
436
+ "tool_arguments": "{\"color\": \"bright yellow\"}"
437
+ },
438
+ {
439
+ "user_content": "pale pink",
440
+ "tool_name": "set_square_color",
441
+ "tool_arguments": "{\"color\": \"pale pink\"}"
442
+ },
443
+ {
444
+ "user_content": "forest green",
445
+ "tool_name": "set_square_color",
446
+ "tool_arguments": "{\"color\": \"forest green\"}"
447
+ },
448
+ {
449
+ "user_content": "ocean blue",
450
+ "tool_name": "set_square_color",
451
+ "tool_arguments": "{\"color\": \"ocean blue\"}"
452
+ },
453
+ {
454
+ "user_content": "Set square to red",
455
+ "tool_name": "set_square_color",
456
+ "tool_arguments": "{\"color\": \"red\"}"
457
+ },
458
+ {
459
+ "user_content": "Square color = blue",
460
+ "tool_name": "set_square_color",
461
+ "tool_arguments": "{\"color\": \"blue\"}"
462
+ },
463
+ {
464
+ "user_content": "color: green",
465
+ "tool_name": "set_square_color",
466
+ "tool_arguments": "{\"color\": \"green\"}"
467
+ },
468
+ {
469
+ "user_content": "get color",
470
+ "tool_name": "get_square_color",
471
+ "tool_arguments": "{}"
472
+ },
473
+ {
474
+ "user_content": "show color",
475
+ "tool_name": "get_square_color",
476
+ "tool_arguments": "{}"
477
+ },
478
+ {
479
+ "user_content": "read color",
480
+ "tool_name": "get_square_color",
481
+ "tool_arguments": "{}"
482
+ },
483
+ {
484
+ "user_content": "Yo make it blue",
485
+ "tool_name": "set_square_color",
486
+ "tool_arguments": "{\"color\": \"blue\"}"
487
+ },
488
+ {
489
+ "user_content": "Hey change to red",
490
+ "tool_name": "set_square_color",
491
+ "tool_arguments": "{\"color\": \"red\"}"
492
+ },
493
+ {
494
+ "user_content": "Sup whats the color",
495
+ "tool_name": "get_square_color",
496
+ "tool_arguments": "{}"
497
+ },
498
+ {
499
+ "user_content": "yo color?",
500
+ "tool_name": "get_square_color",
501
+ "tool_arguments": "{}"
502
+ },
503
+ {
504
+ "user_content": "gimme yellow",
505
+ "tool_name": "set_square_color",
506
+ "tool_arguments": "{\"color\": \"yellow\"}"
507
+ },
508
+ {
509
+ "user_content": "hit me with that purple",
510
+ "tool_name": "set_square_color",
511
+ "tool_arguments": "{\"color\": \"purple\"}"
512
+ },
513
+ {
514
+ "user_content": "gonna need orange on that",
515
+ "tool_name": "set_square_color",
516
+ "tool_arguments": "{\"color\": \"orange\"}"
517
+ },
518
+ {
519
+ "user_content": "Just tell me the color already",
520
+ "tool_name": "get_square_color",
521
+ "tool_arguments": "{}"
522
+ },
523
+ {
524
+ "user_content": "Give me green now",
525
+ "tool_name": "set_square_color",
526
+ "tool_arguments": "{\"color\": \"green\"}"
527
+ },
528
+ {
529
+ "user_content": "What color are we looking at?",
530
+ "tool_name": "get_square_color",
531
+ "tool_arguments": "{}"
532
+ },
533
+ {
534
+ "user_content": "I would like to request the square be changed to azure",
535
+ "tool_name": "set_square_color",
536
+ "tool_arguments": "{\"color\": \"azure\"}"
537
+ },
538
+ {
539
+ "user_content": "Please kindly update the color to burgundy",
540
+ "tool_name": "set_square_color",
541
+ "tool_arguments": "{\"color\": \"burgundy\"}"
542
+ },
543
+ {
544
+ "user_content": "If you could, please inform me of the current color",
545
+ "tool_name": "get_square_color",
546
+ "tool_arguments": "{}"
547
+ },
548
+ {
549
+ "user_content": "I would appreciate knowing what color the square is",
550
+ "tool_name": "get_square_color",
551
+ "tool_arguments": "{}"
552
+ },
553
+ {
554
+ "user_content": "May I request that you change it to periwinkle?",
555
+ "tool_name": "set_square_color",
556
+ "tool_arguments": "{\"color\": \"periwinkle\"}"
557
+ },
558
+ {
559
+ "user_content": "Could you kindly set the color to chartreuse?",
560
+ "tool_name": "set_square_color",
561
+ "tool_arguments": "{\"color\": \"chartreuse\"}"
562
+ },
563
+ {
564
+ "user_content": "plz blue",
565
+ "tool_name": "set_square_color",
566
+ "tool_arguments": "{\"color\": \"blue\"}"
567
+ },
568
+ {
569
+ "user_content": "pls red",
570
+ "tool_name": "set_square_color",
571
+ "tool_arguments": "{\"color\": \"red\"}"
572
+ },
573
+ {
574
+ "user_content": "thx color?",
575
+ "tool_name": "get_square_color",
576
+ "tool_arguments": "{}"
577
+ },
578
+ {
579
+ "user_content": "ty what color",
580
+ "tool_name": "get_square_color",
581
+ "tool_arguments": "{}"
582
+ },
583
+ {
584
+ "user_content": "k make it green",
585
+ "tool_name": "set_square_color",
586
+ "tool_arguments": "{\"color\": \"green\"}"
587
+ },
588
+ {
589
+ "user_content": "Set the display color to amber",
590
+ "tool_name": "set_square_color",
591
+ "tool_arguments": "{\"color\": \"amber\"}"
592
+ },
593
+ {
594
+ "user_content": "Update display to scarlet",
595
+ "tool_name": "set_square_color",
596
+ "tool_arguments": "{\"color\": \"scarlet\"}"
597
+ },
598
+ {
599
+ "user_content": "Change display color to emerald",
600
+ "tool_name": "set_square_color",
601
+ "tool_arguments": "{\"color\": \"emerald\"}"
602
+ },
603
+ {
604
+ "user_content": "What is the display showing?",
605
+ "tool_name": "get_square_color",
606
+ "tool_arguments": "{}"
607
+ },
608
+ {
609
+ "user_content": "Get display color",
610
+ "tool_name": "get_square_color",
611
+ "tool_arguments": "{}"
612
+ },
613
+ {
614
+ "user_content": "Show display color",
615
+ "tool_name": "get_square_color",
616
+ "tool_arguments": "{}",
617
+ "tool_arguments": "{}"
618
+ },
619
+ {
620
+ "user_content": "Alright, set it to cerulean",
621
+ "tool_name": "set_square_color",
622
+ "tool_arguments": "{\"color\": \"cerulean\"}"
623
+ },
624
+ {
625
+ "user_content": "OK so make it tangerine",
626
+ "tool_name": "set_square_color",
627
+ "tool_arguments": "{\"color\": \"tangerine\"}"
628
+ },
629
+ {
630
+ "user_content": "Fine, change to mauve",
631
+ "tool_name": "set_square_color",
632
+ "tool_arguments": "{\"color\": \"mauve\"}"
633
+ },
634
+ {
635
+ "user_content": "Sure, what's the color?",
636
+ "tool_name": "get_square_color",
637
+ "tool_arguments": "{}"
638
+ },
639
+ {
640
+ "user_content": "Yeah tell me the color",
641
+ "tool_name": "get_square_color",
642
+ "tool_arguments": "{}"
643
+ },
644
+ {
645
+ "user_content": "So what color is it?",
646
+ "tool_name": "get_square_color",
647
+ "tool_arguments": "{}"
648
+ },
649
+ {
650
+ "user_content": "And the color is?",
651
+ "tool_name": "get_square_color",
652
+ "tool_arguments": "{}"
653
+ },
654
+ {
655
+ "user_content": "Make it #FF0000",
656
+ "tool_name": "set_square_color",
657
+ "tool_arguments": "{\"color\": \"#FF0000\"}"
658
+ },
659
+ {
660
+ "user_content": "Set to #00FF00",
661
+ "tool_name": "set_square_color",
662
+ "tool_arguments": "{\"color\": \"#00FF00\"}"
663
+ },
664
+ {
665
+ "user_content": "Change to #0000FF",
666
+ "tool_name": "set_square_color",
667
+ "tool_arguments": "{\"color\": \"#0000FF\"}"
668
+ },
669
+ {
670
+ "user_content": "Use hex #FFFF00",
671
+ "tool_name": "set_square_color",
672
+ "tool_arguments": "{\"color\": \"#FFFF00\"}"
673
+ },
674
+ {
675
+ "user_content": "Apply #FF00FF",
676
+ "tool_name": "set_square_color",
677
+ "tool_arguments": "{\"color\": \"#FF00FF\"}"
678
+ },
679
+ {
680
+ "user_content": "Set to rgb red",
681
+ "tool_name": "set_square_color",
682
+ "tool_arguments": "{\"color\": \"red\"}"
683
+ },
684
+ {
685
+ "user_content": "First, tell me what color it is",
686
+ "tool_name": "get_square_color",
687
+ "tool_arguments": "{}"
688
+ },
689
+ {
690
+ "user_content": "Before anything, what's the color?",
691
+ "tool_name": "get_square_color",
692
+ "tool_arguments": "{}"
693
+ },
694
+ {
695
+ "user_content": "To start, show me the current color",
696
+ "tool_name": "get_square_color",
697
+ "tool_arguments": "{}"
698
+ },
699
+ {
700
+ "user_content": "Now change it to slate",
701
+ "tool_name": "set_square_color",
702
+ "tool_arguments": "{\"color\": \"slate\"}"
703
+ },
704
+ {
705
+ "user_content": "Then make it ivory",
706
+ "tool_name": "set_square_color",
707
+ "tool_arguments": "{\"color\": \"ivory\"}"
708
+ },
709
+ {
710
+ "user_content": "After that set it to khaki",
711
+ "tool_name": "set_square_color",
712
+ "tool_arguments": "{\"color\": \"khaki\"}"
713
+ },
714
+ {
715
+ "user_content": "Can you check the color for me?",
716
+ "tool_name": "get_square_color",
717
+ "tool_arguments": "{}"
718
+ },
719
+ {
720
+ "user_content": "Just checking - what color is it?",
721
+ "tool_name": "get_square_color",
722
+ "tool_arguments": "{}"
723
+ },
724
+ {
725
+ "user_content": "Quick question - the color?",
726
+ "tool_name": "get_square_color",
727
+ "tool_arguments": "{}"
728
+ },
729
+ {
730
+ "user_content": "One thing - change to plum",
731
+ "tool_name": "set_square_color",
732
+ "tool_arguments": "{\"color\": \"plum\"}"
733
+ },
734
+ {
735
+ "user_content": "Real quick - make it rust",
736
+ "tool_name": "set_square_color",
737
+ "tool_arguments": "{\"color\": \"rust\"}"
738
+ },
739
+ {
740
+ "user_content": "BTW set it to jade",
741
+ "tool_name": "set_square_color",
742
+ "tool_arguments": "{\"color\": \"jade\"}"
743
+ },
744
+ {
745
+ "user_content": "FYI the color should be sapphire",
746
+ "tool_name": "set_square_color",
747
+ "tool_arguments": "{\"color\": \"sapphire\"}"
748
+ },
749
+ {
750
+ "user_content": "lmk the color",
751
+ "tool_name": "get_square_color",
752
+ "tool_arguments": "{}"
753
+ },
754
+ {
755
+ "user_content": "hmu with the color info",
756
+ "tool_name": "get_square_color",
757
+ "tool_arguments": "{}"
758
+ },
759
+ {
760
+ "user_content": "need the color asap",
761
+ "tool_name": "get_square_color",
762
+ "tool_arguments": "{}"
763
+ },
764
+ {
765
+ "user_content": "color pls",
766
+ "tool_name": "get_square_color",
767
+ "tool_arguments": "{}"
768
+ },
769
+ {
770
+ "user_content": "Hmm make it rose",
771
+ "tool_name": "set_square_color",
772
+ "tool_arguments": "{\"color\": \"rose\"}"
773
+ },
774
+ {
775
+ "user_content": "Ugh just set it to tan",
776
+ "tool_name": "set_square_color",
777
+ "tool_arguments": "{\"color\": \"tan\"}"
778
+ },
779
+ {
780
+ "user_content": "Wow change to electric blue",
781
+ "tool_name": "set_square_color",
782
+ "tool_arguments": "{\"color\": \"electric blue\"}"
783
+ },
784
+ {
785
+ "user_content": "Ooh make it neon green",
786
+ "tool_name": "set_square_color",
787
+ "tool_arguments": "{\"color\": \"neon green\"}"
788
+ },
789
+ {
790
+ "user_content": "Nice! What color?",
791
+ "tool_name": "get_square_color",
792
+ "tool_arguments": "{}"
793
+ },
794
+ {
795
+ "user_content": "Cool, show me the color",
796
+ "tool_name": "get_square_color",
797
+ "tool_arguments": "{}"
798
+ },
799
+ {
800
+ "user_content": "Awesome, what's the color now?",
801
+ "tool_name": "get_square_color",
802
+ "tool_arguments": "{}"
803
+ },
804
+ {
805
+ "user_content": "Great, tell me the color",
806
+ "tool_name": "get_square_color",
807
+ "tool_arguments": "{}"
808
+ },
809
+ {
810
+ "user_content": "I command you to set it to fuchsia",
811
+ "tool_name": "set_square_color",
812
+ "tool_arguments": "{\"color\": \"fuchsia\"}"
813
+ },
814
+ {
815
+ "user_content": "You must change it to cobalt",
816
+ "tool_name": "set_square_color",
817
+ "tool_arguments": "{\"color\": \"cobalt\"}"
818
+ },
819
+ {
820
+ "user_content": "I order you to make it bronze",
821
+ "tool_name": "set_square_color",
822
+ "tool_arguments": "{\"color\": \"bronze\"}"
823
+ },
824
+ {
825
+ "user_content": "You are required to tell me the color",
826
+ "tool_name": "get_square_color",
827
+ "tool_arguments": "{}"
828
+ },
829
+ {
830
+ "user_content": "You shall inform me of the current color",
831
+ "tool_name": "get_square_color",
832
+ "tool_arguments": "{}"
833
+ },
834
+ {
835
+ "user_content": "I hereby request the color information",
836
+ "tool_name": "get_square_color",
837
+ "tool_arguments": "{}"
838
+ },
839
+ {
840
+ "user_content": "change blue",
841
+ "tool_name": "set_square_color",
842
+ "tool_arguments": "{\"color\": \"blue\"}"
843
+ },
844
+ {
845
+ "user_content": "set red",
846
+ "tool_name": "set_square_color",
847
+ "tool_arguments": "{\"color\": \"red\"}"
848
+ },
849
+ {
850
+ "user_content": "make green",
851
+ "tool_name": "set_square_color",
852
+ "tool_arguments": "{\"color\": \"green\"}"
853
+ },
854
+ {
855
+ "user_content": "do yellow",
856
+ "tool_name": "set_square_color",
857
+ "tool_arguments": "{\"color\": \"yellow\"}"
858
+ },
859
+ {
860
+ "user_content": "try purple",
861
+ "tool_name": "set_square_color",
862
+ "tool_arguments": "{\"color\": \"purple\"}"
863
+ },
864
+ {
865
+ "user_content": "use orange",
866
+ "tool_name": "set_square_color",
867
+ "tool_arguments": "{\"color\": \"orange\"}"
868
+ },
869
+ {
870
+ "user_content": "whats the color",
871
+ "tool_name": "get_square_color",
872
+ "tool_arguments": "{}"
873
+ },
874
+ {
875
+ "user_content": "tell color",
876
+ "tool_name": "get_square_color",
877
+ "tool_arguments": "{}"
878
+ },
879
+ {
880
+ "user_content": "get current color",
881
+ "tool_name": "get_square_color",
882
+ "tool_arguments": "{}"
883
+ },
884
+ {
885
+ "user_content": "show what color",
886
+ "tool_name": "get_square_color",
887
+ "tool_arguments": "{}"
888
+ },
889
+ {
890
+ "user_content": "Want blue on the square",
891
+ "tool_name": "set_square_color",
892
+ "tool_arguments": "{\"color\": \"blue\"}"
893
+ },
894
+ {
895
+ "user_content": "Need the square red",
896
+ "tool_name": "set_square_color",
897
+ "tool_arguments": "{\"color\": \"red\"}"
898
+ },
899
+ {
900
+ "user_content": "Gotta have green",
901
+ "tool_name": "set_square_color",
902
+ "tool_arguments": "{\"color\": \"green\"}"
903
+ },
904
+ {
905
+ "user_content": "Wanna see yellow",
906
+ "tool_name": "set_square_color",
907
+ "tool_arguments": "{\"color\": \"yellow\"}"
908
+ },
909
+ {
910
+ "user_content": "Curious what color is showing",
911
+ "tool_name": "get_square_color",
912
+ "tool_arguments": "{}"
913
+ },
914
+ {
915
+ "user_content": "Wondering about the color",
916
+ "tool_name": "get_square_color",
917
+ "tool_arguments": "{}"
918
+ },
919
+ {
920
+ "user_content": "Interested in the current color",
921
+ "tool_name": "get_square_color",
922
+ "tool_arguments": "{}"
923
+ },
924
+ {
925
+ "user_content": "Looking to know the color",
926
+ "tool_name": "get_square_color",
927
+ "tool_arguments": "{}"
928
+ },
929
+ {
930
+ "user_content": "The square needs to be steel blue",
931
+ "tool_name": "set_square_color",
932
+ "tool_arguments": "{\"color\": \"steel blue\"}"
933
+ },
934
+ {
935
+ "user_content": "I think hot pink would be nice",
936
+ "tool_name": "set_square_color",
937
+ "tool_arguments": "{\"color\": \"hot pink\"}"
938
+ },
939
+ {
940
+ "user_content": "How about sea green?",
941
+ "tool_name": "set_square_color",
942
+ "tool_arguments": "{\"color\": \"sea green\"}"
943
+ },
944
+ {
945
+ "user_content": "Maybe midnight blue?",
946
+ "tool_name": "set_square_color",
947
+ "tool_arguments": "{\"color\": \"midnight blue\"}"
948
+ },
949
+ {
950
+ "user_content": "Thinking about the color, what is it?",
951
+ "tool_name": "get_square_color",
952
+ "tool_arguments": "{}"
953
+ },
954
+ {
955
+ "user_content": "Speaking of colors, which one is active?",
956
+ "tool_name": "get_square_color",
957
+ "tool_arguments": "{}"
958
+ },
959
+ {
960
+ "user_content": "On the topic of the square, what color?",
961
+ "tool_name": "get_square_color",
962
+ "tool_arguments": "{}"
963
+ },
964
+ {
965
+ "user_content": "Regarding the display, what's the color?",
966
+ "tool_name": "get_square_color",
967
+ "tool_arguments": "{}"
968
+ }
969
+ ]
finetuning-functiongemma/finetune_functiongemma.ipynb ADDED
@@ -0,0 +1,690 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# 🎨 Fine-tuning FunctionGemma for Square Color Control\n",
8
+ "\n",
9
+ "This notebook demonstrates how to fine-tune FunctionGemma to recognize color control commands.\n",
10
+ "\n",
11
+ "**Author:** [Your Name]\n",
12
+ "**Portfolio:** AI Engineering\n",
13
+ "\n",
14
+ "## Objectives\n",
15
+ "1. Train the model to call `set_square_color` when the user wants to change the color\n",
16
+ "2. Train the model to call `get_square_color` when the user asks about the current color\n",
17
+ "3. Support various natural language command styles"
18
+ ]
19
+ },
20
+ {
21
+ "cell_type": "markdown",
22
+ "metadata": {},
23
+ "source": [
24
+ "## πŸ“¦ 1. Setup and Installation"
25
+ ]
26
+ },
27
+ {
28
+ "cell_type": "code",
29
+ "execution_count": null,
30
+ "metadata": {},
31
+ "outputs": [],
32
+ "source": [
33
+ "# Install dependencies\n",
34
+ "%pip install -q torch tensorboard\n",
35
+ "%pip install -q transformers datasets accelerate evaluate trl protobuf sentencepiece\n",
36
+ "\n",
37
+ "# If running on Ampere+ GPU (A100, L4), uncomment:\n",
38
+ "# %pip install -q flash-attn"
39
+ ]
40
+ },
41
+ {
42
+ "cell_type": "code",
43
+ "execution_count": null,
44
+ "metadata": {},
45
+ "outputs": [],
46
+ "source": [
47
+ "# Login to Hugging Face Hub\n",
48
+ "from huggingface_hub import login\n",
49
+ "\n",
50
+ "# If using Colab secrets:\n",
51
+ "# from google.colab import userdata\n",
52
+ "# login(token=userdata.get('HF_TOKEN'))\n",
53
+ "\n",
54
+ "# Or interactive login:\n",
55
+ "login()"
56
+ ]
57
+ },
58
+ {
59
+ "cell_type": "code",
60
+ "execution_count": null,
61
+ "metadata": {},
62
+ "outputs": [],
63
+ "source": [
64
+ "# Configuration\n",
65
+ "BASE_MODEL = \"google/functiongemma-270m-it\"\n",
66
+ "OUTPUT_DIR = \"functiongemma-square-color\" # Model name on your HF Hub\n",
67
+ "LEARNING_RATE = 5e-5\n",
68
+ "NUM_EPOCHS = 8\n",
69
+ "BATCH_SIZE = 4"
70
+ ]
71
+ },
72
+ {
73
+ "cell_type": "markdown",
74
+ "metadata": {},
75
+ "source": [
76
+ "## πŸ“Š 2. Prepare Dataset"
77
+ ]
78
+ },
79
+ {
80
+ "cell_type": "code",
81
+ "execution_count": null,
82
+ "metadata": {},
83
+ "outputs": [],
84
+ "source": [
85
+ "import json\n",
86
+ "from datasets import Dataset\n",
87
+ "from transformers.utils import get_json_schema\n",
88
+ "\n",
89
+ "# Tool definitions\n",
90
+ "def set_square_color(color: str) -> str:\n",
91
+ " \"\"\"\n",
92
+ " Sets the color of the square displayed on the screen.\n",
93
+ " \n",
94
+ " Args:\n",
95
+ " color: The color to set, e.g. red, blue, green\n",
96
+ " \"\"\"\n",
97
+ " return f\"Color set to {color}\"\n",
98
+ "\n",
99
+ "def get_square_color() -> str:\n",
100
+ " \"\"\"\n",
101
+ " Returns the current color of the square.\n",
102
+ " Use this when the user asks about the current color.\n",
103
+ " \"\"\"\n",
104
+ " return \"Current color\"\n",
105
+ "\n",
106
+ "# Generate schemas automatically\n",
107
+ "TOOLS = [\n",
108
+ " get_json_schema(set_square_color),\n",
109
+ " get_json_schema(get_square_color)\n",
110
+ "]\n",
111
+ "\n",
112
+ "print(\"Tool schemas:\")\n",
113
+ "print(json.dumps(TOOLS, indent=2))"
114
+ ]
115
+ },
116
+ {
117
+ "cell_type": "code",
118
+ "execution_count": null,
119
+ "metadata": {},
120
+ "outputs": [],
121
+ "source": [
122
+ "# Load training dataset from file\n",
123
+ "with open(\"dataset/square_color_dataset.json\", \"r\") as f:\n",
124
+ " square_color_dataset = json.load(f)\n",
125
+ "\n",
126
+ "print(f\"Total examples: {len(square_color_dataset)}\")\n",
127
+ "print(f\" - SET: {len([x for x in square_color_dataset if x['tool_name'] == 'set_square_color'])}\")\n",
128
+ "print(f\" - GET: {len([x for x in square_color_dataset if x['tool_name'] == 'get_square_color'])}\")\n",
129
+ "\n",
130
+ "# Preview first few examples\n",
131
+ "print(\"\\nFirst 3 examples:\")\n",
132
+ "for i, sample in enumerate(square_color_dataset[:3]):\n",
133
+ " print(f\" {i+1}. \\\"{sample['user_content']}\\\" β†’ {sample['tool_name']}\")"
134
+ ]
135
+ },
136
+ {
137
+ "cell_type": "code",
138
+ "execution_count": null,
139
+ "metadata": {},
140
+ "outputs": [],
141
+ "source": [
142
+ "# Convert to conversation format\n",
143
+ "SYSTEM_PROMPT = \"You are a model that can do function calling with the following functions\"\n",
144
+ "\n",
145
+ "def create_conversation(sample):\n",
146
+ " return {\n",
147
+ " \"messages\": [\n",
148
+ " {\"role\": \"developer\", \"content\": SYSTEM_PROMPT},\n",
149
+ " {\"role\": \"user\", \"content\": sample[\"user_content\"]},\n",
150
+ " {\n",
151
+ " \"role\": \"assistant\",\n",
152
+ " \"tool_calls\": [{\n",
153
+ " \"type\": \"function\",\n",
154
+ " \"function\": {\n",
155
+ " \"name\": sample[\"tool_name\"],\n",
156
+ " \"arguments\": json.loads(sample[\"tool_arguments\"])\n",
157
+ " }\n",
158
+ " }]\n",
159
+ " },\n",
160
+ " ],\n",
161
+ " \"tools\": TOOLS\n",
162
+ " }\n",
163
+ "\n",
164
+ "# Create dataset\n",
165
+ "dataset = Dataset.from_list(square_color_dataset)\n",
166
+ "dataset = dataset.map(create_conversation, remove_columns=dataset.features, batched=False)\n",
167
+ "\n",
168
+ "# Split 80/20\n",
169
+ "dataset = dataset.train_test_split(test_size=0.2, shuffle=True, seed=42)\n",
170
+ "\n",
171
+ "print(f\"Train: {len(dataset['train'])} examples\")\n",
172
+ "print(f\"Test: {len(dataset['test'])} examples\")"
173
+ ]
174
+ },
175
+ {
176
+ "cell_type": "code",
177
+ "execution_count": null,
178
+ "metadata": {},
179
+ "outputs": [],
180
+ "source": [
181
+ "# Visualize an example\n",
182
+ "print(\"Formatted conversation example:\")\n",
183
+ "print(json.dumps(dataset[\"train\"][0], indent=2))"
184
+ ]
185
+ },
186
+ {
187
+ "cell_type": "markdown",
188
+ "metadata": {},
189
+ "source": [
190
+ "## πŸ€– 3. Load Model"
191
+ ]
192
+ },
193
+ {
194
+ "cell_type": "code",
195
+ "execution_count": null,
196
+ "metadata": {},
197
+ "outputs": [],
198
+ "source": [
199
+ "import torch\n",
200
+ "from transformers import AutoTokenizer, AutoModelForCausalLM\n",
201
+ "\n",
202
+ "quantization_config = BitsAndBytesConfig(load_in_4bit=True)\n",
203
+ "\n",
204
+ "# Load model and tokenizer\n",
205
+ "model = AutoModelForCausalLM.from_pretrained(\n",
206
+ " BASE_MODEL,\n",
207
+ " torch_dtype=\"auto\",\n",
208
+ " device_map=\"auto\",\n",
209
+ " quantization_config=quantization_config, \n",
210
+ " attn_implementation=\"eager\"\n",
211
+ " \n",
212
+ ")\n",
213
+ "\n",
214
+ "tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)\n",
215
+ "\n",
216
+ "print(f\"Device: {model.device}\")\n",
217
+ "print(f\"DType: {model.dtype}\")\n",
218
+ "print(f\"Parameters: {model.num_parameters():,}\")"
219
+ ]
220
+ },
221
+ {
222
+ "cell_type": "code",
223
+ "execution_count": null,
224
+ "metadata": {},
225
+ "outputs": [],
226
+ "source": [
227
+ "# Visualize how the tokenizer formats the prompt\n",
228
+ "debug_msg = tokenizer.apply_chat_template(\n",
229
+ " dataset[\"train\"][0][\"messages\"],\n",
230
+ " tools=dataset[\"train\"][0][\"tools\"],\n",
231
+ " add_generation_prompt=False,\n",
232
+ " tokenize=False\n",
233
+ ")\n",
234
+ "\n",
235
+ "print(\"=== Formatted prompt ===\")\n",
236
+ "print(debug_msg)"
237
+ ]
238
+ },
239
+ {
240
+ "cell_type": "markdown",
241
+ "metadata": {},
242
+ "source": [
243
+ "## πŸ§ͺ 3.5. Pre-Training Evaluation (Baseline)\n",
244
+ "\n",
245
+ "Before fine-tuning, let's evaluate the base model to establish a baseline. This helps us measure the actual improvement from fine-tuning."
246
+ ]
247
+ },
248
+ {
249
+ "cell_type": "code",
250
+ "execution_count": null,
251
+ "metadata": {},
252
+ "outputs": [],
253
+ "source": [
254
+ "def evaluate_model(model, tokenizer, test_samples, tools, system_prompt, verbose=True):\n",
255
+ " \"\"\"\n",
256
+ " Evaluate model on a set of test samples.\n",
257
+ " Returns accuracy metrics and detailed results.\n",
258
+ " \"\"\"\n",
259
+ " results = {\n",
260
+ " \"total\": len(test_samples),\n",
261
+ " \"correct\": 0,\n",
262
+ " \"correct_tool\": 0,\n",
263
+ " \"correct_args\": 0,\n",
264
+ " \"details\": []\n",
265
+ " }\n",
266
+ " \n",
267
+ " for sample in test_samples:\n",
268
+ " messages = [\n",
269
+ " {\"role\": \"developer\", \"content\": system_prompt},\n",
270
+ " {\"role\": \"user\", \"content\": sample[\"user_content\"]},\n",
271
+ " ]\n",
272
+ " \n",
273
+ " inputs = tokenizer.apply_chat_template(\n",
274
+ " messages,\n",
275
+ " tools=tools,\n",
276
+ " tokenize=True,\n",
277
+ " add_generation_prompt=True,\n",
278
+ " return_dict=True,\n",
279
+ " return_tensors=\"pt\"\n",
280
+ " ).to(model.device)\n",
281
+ " \n",
282
+ " with torch.no_grad():\n",
283
+ " output = model.generate(\n",
284
+ " **inputs,\n",
285
+ " max_new_tokens=128,\n",
286
+ " do_sample=False,\n",
287
+ " )\n",
288
+ " \n",
289
+ " input_length = inputs['input_ids'].shape[1]\n",
290
+ " response = tokenizer.decode(output[0][input_length:], skip_special_tokens=False)\n",
291
+ " \n",
292
+ " # Check if correct tool was called\n",
293
+ " tool_correct = sample[\"tool_name\"] in response\n",
294
+ " \n",
295
+ " # Check if arguments are correct (for set_square_color)\n",
296
+ " args_correct = False\n",
297
+ " if tool_correct and sample[\"tool_name\"] == \"set_square_color\":\n",
298
+ " expected_args = json.loads(sample[\"tool_arguments\"])\n",
299
+ " args_correct = expected_args.get(\"color\", \"\") in response\n",
300
+ " elif tool_correct and sample[\"tool_name\"] == \"get_square_color\":\n",
301
+ " args_correct = True # No args needed\n",
302
+ " \n",
303
+ " if tool_correct:\n",
304
+ " results[\"correct_tool\"] += 1\n",
305
+ " if tool_correct and args_correct:\n",
306
+ " results[\"correct\"] += 1\n",
307
+ " results[\"correct_args\"] += 1\n",
308
+ " \n",
309
+ " results[\"details\"].append({\n",
310
+ " \"input\": sample[\"user_content\"],\n",
311
+ " \"expected_tool\": sample[\"tool_name\"],\n",
312
+ " \"expected_args\": sample[\"tool_arguments\"],\n",
313
+ " \"response\": response,\n",
314
+ " \"tool_correct\": tool_correct,\n",
315
+ " \"args_correct\": args_correct\n",
316
+ " })\n",
317
+ " \n",
318
+ " results[\"tool_accuracy\"] = results[\"correct_tool\"] / results[\"total\"] * 100\n",
319
+ " results[\"full_accuracy\"] = results[\"correct\"] / results[\"total\"] * 100\n",
320
+ " \n",
321
+ " if verbose:\n",
322
+ " print(f\"Tool Accuracy: {results['correct_tool']}/{results['total']} ({results['tool_accuracy']:.1f}%)\")\n",
323
+ " print(f\"Full Accuracy (tool + args): {results['correct']}/{results['total']} ({results['full_accuracy']:.1f}%)\")\n",
324
+ " \n",
325
+ " return results"
326
+ ]
327
+ },
328
+ {
329
+ "cell_type": "code",
330
+ "execution_count": null,
331
+ "metadata": {},
332
+ "outputs": [],
333
+ "source": [
334
+ "# Create evaluation test set from the dataset (sample 5 SET + 5 GET)\n",
335
+ "import random\n",
336
+ "\n",
337
+ "random.seed(42) # For reproducibility\n",
338
+ "\n",
339
+ "set_samples = [s for s in square_color_dataset if s[\"tool_name\"] == \"set_square_color\"]\n",
340
+ "get_samples = [s for s in square_color_dataset if s[\"tool_name\"] == \"get_square_color\"]\n",
341
+ "\n",
342
+ "eval_test_cases = random.sample(set_samples, min(5, len(set_samples))) + \\\n",
343
+ " random.sample(get_samples, min(5, len(get_samples)))\n",
344
+ "\n",
345
+ "print(\"=\" * 50)\n",
346
+ "print(\"PRE-TRAINING EVALUATION (Baseline)\")\n",
347
+ "print(\"=\" * 50)\n",
348
+ "print(f\"\\nEvaluating base model on {len(eval_test_cases)} test cases...\\n\")\n",
349
+ "\n",
350
+ "baseline_results = evaluate_model(\n",
351
+ " model=model,\n",
352
+ " tokenizer=tokenizer,\n",
353
+ " test_samples=eval_test_cases,\n",
354
+ " tools=TOOLS,\n",
355
+ " system_prompt=SYSTEM_PROMPT\n",
356
+ ")\n",
357
+ "\n",
358
+ "# Show some example outputs\n",
359
+ "print(\"\\n--- Sample Outputs (Base Model) ---\")\n",
360
+ "for i, detail in enumerate(baseline_results[\"details\"][:4]):\n",
361
+ " status = \"βœ…\" if detail[\"tool_correct\"] else \"❌\"\n",
362
+ " print(f\"\\n{status} Input: {detail['input']}\")\n",
363
+ " print(f\" Expected: {detail['expected_tool']}\")\n",
364
+ " print(f\" Output: {detail['response'][:200]}...\")"
365
+ ]
366
+ },
367
+ {
368
+ "cell_type": "markdown",
369
+ "metadata": {},
370
+ "source": [
371
+ "## πŸ”₯ 4. Fine-tuning"
372
+ ]
373
+ },
374
+ {
375
+ "cell_type": "code",
376
+ "execution_count": null,
377
+ "metadata": {},
378
+ "outputs": [],
379
+ "source": [
380
+ "import torch\n",
381
+ "from transformers import AutoTokenizer, AutoModelForCausalLM\n",
382
+ "\n",
383
+ "print(\"Reloading model for fine-tuning (without quantization)...\")\n",
384
+ "\n",
385
+ "del model\n",
386
+ "torch.cuda.empty_cache()\n",
387
+ "\n",
388
+ "model = AutoModelForCausalLM.from_pretrained(\n",
389
+ " BASE_MODEL,\n",
390
+ " torch_dtype=torch.bfloat16,\n",
391
+ " device_map=\"auto\",\n",
392
+ " attn_implementation=\"eager\"\n",
393
+ ")\n",
394
+ "\n",
395
+ "tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)\n",
396
+ "\n",
397
+ "print(f\"Device: {model.device}\")\n",
398
+ "print(f\"DType: {model.dtype}\")\n",
399
+ "print(f\"Parameters: {model.num_parameters():,}\")\n",
400
+ "print(\"Ready for fine-tuning!\")"
401
+ ]
402
+ },
403
+ {
404
+ "cell_type": "code",
405
+ "execution_count": null,
406
+ "metadata": {},
407
+ "outputs": [],
408
+ "source": [
409
+ "from trl import SFTConfig, SFTTrainer\n",
410
+ "\n",
411
+ "torch_dtype = model.dtype\n",
412
+ "\n",
413
+ "# Training configuration\n",
414
+ "args = SFTConfig(\n",
415
+ " output_dir=OUTPUT_DIR,\n",
416
+ " max_length=512,\n",
417
+ " packing=False,\n",
418
+ " num_train_epochs=NUM_EPOCHS,\n",
419
+ " per_device_train_batch_size=BATCH_SIZE,\n",
420
+ " gradient_checkpointing=False,\n",
421
+ " optim=\"adamw_torch_fused\",\n",
422
+ " logging_steps=1,\n",
423
+ " eval_strategy=\"epoch\",\n",
424
+ " save_strategy=\"epoch\",\n",
425
+ " learning_rate=LEARNING_RATE,\n",
426
+ " fp16=True if torch_dtype == torch.float16 else False,\n",
427
+ " bf16=True if torch_dtype == torch.bfloat16 else False,\n",
428
+ " lr_scheduler_type=\"constant\",\n",
429
+ " push_to_hub=True,\n",
430
+ " report_to=\"tensorboard\",\n",
431
+ " load_best_model_at_end=True,\n",
432
+ " metric_for_best_model=\"eval_loss\",\n",
433
+ ")\n",
434
+ "\n",
435
+ "# Create trainer\n",
436
+ "trainer = SFTTrainer(\n",
437
+ " model=model,\n",
438
+ " args=args,\n",
439
+ " train_dataset=dataset['train'],\n",
440
+ " eval_dataset=dataset['test'],\n",
441
+ " processing_class=tokenizer,\n",
442
+ ")\n",
443
+ "\n",
444
+ "print(\"Trainer created successfully!\")"
445
+ ]
446
+ },
447
+ {
448
+ "cell_type": "code",
449
+ "execution_count": null,
450
+ "metadata": {},
451
+ "outputs": [],
452
+ "source": [
453
+ "# πŸš€ Start training!\n",
454
+ "print(\"Starting fine-tuning...\")\n",
455
+ "trainer.train()\n",
456
+ "\n",
457
+ "print(\"\\nβœ… Training complete!\")"
458
+ ]
459
+ },
460
+ {
461
+ "cell_type": "code",
462
+ "execution_count": null,
463
+ "metadata": {},
464
+ "outputs": [],
465
+ "source": [
466
+ "# Save final model\n",
467
+ "trainer.save_model()\n",
468
+ "print(f\"Model saved to: {OUTPUT_DIR}\")"
469
+ ]
470
+ },
471
+ {
472
+ "cell_type": "markdown",
473
+ "metadata": {},
474
+ "source": [
475
+ "## πŸ“ˆ 5. Visualize Results"
476
+ ]
477
+ },
478
+ {
479
+ "cell_type": "code",
480
+ "execution_count": null,
481
+ "metadata": {},
482
+ "outputs": [],
483
+ "source": [
484
+ "import matplotlib.pyplot as plt\n",
485
+ "\n",
486
+ "# Extract loss history\n",
487
+ "log_history = trainer.state.log_history\n",
488
+ "\n",
489
+ "train_losses = [log[\"loss\"] for log in log_history if \"loss\" in log]\n",
490
+ "epoch_train = [log[\"epoch\"] for log in log_history if \"loss\" in log]\n",
491
+ "eval_losses = [log[\"eval_loss\"] for log in log_history if \"eval_loss\" in log]\n",
492
+ "epoch_eval = [log[\"epoch\"] for log in log_history if \"eval_loss\" in log]\n",
493
+ "\n",
494
+ "# Plot\n",
495
+ "plt.figure(figsize=(10, 6))\n",
496
+ "plt.plot(epoch_train, train_losses, label=\"Training Loss\", alpha=0.7)\n",
497
+ "plt.plot(epoch_eval, eval_losses, label=\"Validation Loss\", marker='o')\n",
498
+ "plt.xlabel(\"Epoch\")\n",
499
+ "plt.ylabel(\"Loss\")\n",
500
+ "plt.title(\"Training and Validation Loss\")\n",
501
+ "plt.legend()\n",
502
+ "plt.grid(True)\n",
503
+ "plt.show()"
504
+ ]
505
+ },
506
+ {
507
+ "cell_type": "markdown",
508
+ "metadata": {},
509
+ "source": [
510
+ "## πŸ§ͺ 6. Post-Training Evaluation\n",
511
+ "\n",
512
+ "Now let's evaluate the fine-tuned model and compare it with the baseline to measure the improvement."
513
+ ]
514
+ },
515
+ {
516
+ "cell_type": "code",
517
+ "execution_count": null,
518
+ "metadata": {},
519
+ "outputs": [],
520
+ "source": [
521
+ "print(\"=\" * 50)\n",
522
+ "print(\"POST-TRAINING EVALUATION (Fine-tuned)\")\n",
523
+ "print(\"=\" * 50)\n",
524
+ "print(f\"\\nEvaluating fine-tuned model on {len(eval_test_cases)} test cases...\\n\")\n",
525
+ "\n",
526
+ "finetuned_results = evaluate_model(\n",
527
+ " model=model,\n",
528
+ " tokenizer=tokenizer,\n",
529
+ " test_samples=eval_test_cases,\n",
530
+ " tools=TOOLS,\n",
531
+ " system_prompt=SYSTEM_PROMPT\n",
532
+ ")\n",
533
+ "\n",
534
+ "# Show some example outputs\n",
535
+ "print(\"\\n--- Sample Outputs (Fine-tuned Model) ---\")\n",
536
+ "for i, detail in enumerate(finetuned_results[\"details\"][:4]):\n",
537
+ " status = \"βœ…\" if detail[\"tool_correct\"] else \"❌\"\n",
538
+ " print(f\"\\n{status} Input: {detail['input']}\")\n",
539
+ " print(f\" Expected: {detail['expected_tool']}\")\n",
540
+ " print(f\" Output: {detail['response'][:200]}...\")"
541
+ ]
542
+ },
543
+ {
544
+ "cell_type": "code",
545
+ "execution_count": null,
546
+ "metadata": {},
547
+ "outputs": [],
548
+ "source": [
549
+ "# Compare baseline vs fine-tuned results\n",
550
+ "print(\"=\" * 60)\n",
551
+ "print(\"πŸ“Š COMPARISON: Baseline vs Fine-tuned\")\n",
552
+ "print(\"=\" * 60)\n",
553
+ "\n",
554
+ "print(f\"\\n{'Metric':<30} {'Baseline':>12} {'Fine-tuned':>12} {'Improvement':>12}\")\n",
555
+ "print(\"-\" * 66)\n",
556
+ "\n",
557
+ "# Tool accuracy comparison\n",
558
+ "tool_improvement = finetuned_results[\"tool_accuracy\"] - baseline_results[\"tool_accuracy\"]\n",
559
+ "print(f\"{'Tool Accuracy':<30} {baseline_results['tool_accuracy']:>11.1f}% {finetuned_results['tool_accuracy']:>11.1f}% {tool_improvement:>+11.1f}%\")\n",
560
+ "\n",
561
+ "# Full accuracy comparison\n",
562
+ "full_improvement = finetuned_results[\"full_accuracy\"] - baseline_results[\"full_accuracy\"]\n",
563
+ "print(f\"{'Full Accuracy (tool + args)':<30} {baseline_results['full_accuracy']:>11.1f}% {finetuned_results['full_accuracy']:>11.1f}% {full_improvement:>+11.1f}%\")\n",
564
+ "\n",
565
+ "print(\"-\" * 66)\n",
566
+ "\n",
567
+ "# Summary\n",
568
+ "if full_improvement > 0:\n",
569
+ " print(f\"\\nβœ… Fine-tuning improved accuracy by {full_improvement:.1f} percentage points!\")\n",
570
+ "elif full_improvement == 0:\n",
571
+ " print(f\"\\n⚠️ No change in accuracy. Consider adjusting training parameters.\")\n",
572
+ "else:\n",
573
+ " print(f\"\\n❌ Accuracy decreased. Check for overfitting or data issues.\")"
574
+ ]
575
+ },
576
+ {
577
+ "cell_type": "code",
578
+ "execution_count": null,
579
+ "metadata": {},
580
+ "outputs": [],
581
+ "source": [
582
+ "# Visualization: Baseline vs Fine-tuned comparison\n",
583
+ "import matplotlib.pyplot as plt\n",
584
+ "import numpy as np\n",
585
+ "\n",
586
+ "fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n",
587
+ "\n",
588
+ "# Chart 1: Bar chart comparison\n",
589
+ "metrics = ['Tool\\nAccuracy', 'Full\\nAccuracy']\n",
590
+ "baseline_vals = [baseline_results[\"tool_accuracy\"], baseline_results[\"full_accuracy\"]]\n",
591
+ "finetuned_vals = [finetuned_results[\"tool_accuracy\"], finetuned_results[\"full_accuracy\"]]\n",
592
+ "\n",
593
+ "x = np.arange(len(metrics))\n",
594
+ "width = 0.35\n",
595
+ "\n",
596
+ "bars1 = axes[0].bar(x - width/2, baseline_vals, width, label='Baseline', color='#ff6b6b', alpha=0.8)\n",
597
+ "bars2 = axes[0].bar(x + width/2, finetuned_vals, width, label='Fine-tuned', color='#4ecdc4', alpha=0.8)\n",
598
+ "\n",
599
+ "axes[0].set_ylabel('Accuracy (%)')\n",
600
+ "axes[0].set_title('Model Performance: Baseline vs Fine-tuned')\n",
601
+ "axes[0].set_xticks(x)\n",
602
+ "axes[0].set_xticklabels(metrics)\n",
603
+ "axes[0].legend()\n",
604
+ "axes[0].set_ylim(0, 110)\n",
605
+ "axes[0].axhline(y=100, color='gray', linestyle='--', alpha=0.3)\n",
606
+ "\n",
607
+ "# Add value labels on bars\n",
608
+ "for bar in bars1:\n",
609
+ " height = bar.get_height()\n",
610
+ " axes[0].annotate(f'{height:.1f}%', xy=(bar.get_x() + bar.get_width() / 2, height),\n",
611
+ " xytext=(0, 3), textcoords=\"offset points\", ha='center', va='bottom', fontsize=10)\n",
612
+ "for bar in bars2:\n",
613
+ " height = bar.get_height()\n",
614
+ " axes[0].annotate(f'{height:.1f}%', xy=(bar.get_x() + bar.get_width() / 2, height),\n",
615
+ " xytext=(0, 3), textcoords=\"offset points\", ha='center', va='bottom', fontsize=10)\n",
616
+ "\n",
617
+ "# Chart 2: Per-sample comparison\n",
618
+ "sample_labels = [d[\"input\"][:20] + \"...\" for d in baseline_results[\"details\"]]\n",
619
+ "baseline_correct = [1 if d[\"tool_correct\"] else 0 for d in baseline_results[\"details\"]]\n",
620
+ "finetuned_correct = [1 if d[\"tool_correct\"] else 0 for d in finetuned_results[\"details\"]]\n",
621
+ "\n",
622
+ "x2 = np.arange(len(sample_labels))\n",
623
+ "width2 = 0.35\n",
624
+ "\n",
625
+ "axes[1].barh(x2 - width2/2, baseline_correct, width2, label='Baseline', color='#ff6b6b', alpha=0.8)\n",
626
+ "axes[1].barh(x2 + width2/2, finetuned_correct, width2, label='Fine-tuned', color='#4ecdc4', alpha=0.8)\n",
627
+ "\n",
628
+ "axes[1].set_xlabel('Correct (1) / Incorrect (0)')\n",
629
+ "axes[1].set_title('Per-Sample Results')\n",
630
+ "axes[1].set_yticks(x2)\n",
631
+ "axes[1].set_yticklabels(sample_labels, fontsize=8)\n",
632
+ "axes[1].legend(loc='lower right')\n",
633
+ "axes[1].set_xlim(-0.1, 1.5)\n",
634
+ "\n",
635
+ "plt.tight_layout()\n",
636
+ "plt.show()\n",
637
+ "\n",
638
+ "# Print detailed per-sample comparison\n",
639
+ "print(\"\\nπŸ“‹ Detailed Per-Sample Comparison:\")\n",
640
+ "print(\"-\" * 80)\n",
641
+ "for i, (b, f) in enumerate(zip(baseline_results[\"details\"], finetuned_results[\"details\"])):\n",
642
+ " b_status = \"βœ…\" if b[\"tool_correct\"] else \"❌\"\n",
643
+ " f_status = \"βœ…\" if f[\"tool_correct\"] else \"❌\"\n",
644
+ " change = \"\"\n",
645
+ " if not b[\"tool_correct\"] and f[\"tool_correct\"]:\n",
646
+ " change = \" πŸŽ‰ FIXED!\"\n",
647
+ " elif b[\"tool_correct\"] and not f[\"tool_correct\"]:\n",
648
+ " change = \" ⚠️ REGRESSED\"\n",
649
+ " print(f\"{b['input'][:40]:<42} Base: {b_status} Fine-tuned: {f_status}{change}\")"
650
+ ]
651
+ },
652
+ {
653
+ "cell_type": "markdown",
654
+ "metadata": {},
655
+ "source": [
656
+ "## πŸ“€ 7. Push to Hugging Face Hub"
657
+ ]
658
+ },
659
+ {
660
+ "cell_type": "code",
661
+ "execution_count": null,
662
+ "metadata": {},
663
+ "outputs": [],
664
+ "source": [
665
+ "# Push to Hub\n",
666
+ "trainer.push_to_hub()\n",
667
+ "\n",
668
+ "print(f\"\\nβœ… Model pushed to: https://huggingface.co/{trainer.hub_model_id}\")"
669
+ ]
670
+ }
671
+ ],
672
+ "metadata": {
673
+ "accelerator": "GPU",
674
+ "colab": {
675
+ "gpuType": "T4",
676
+ "provenance": []
677
+ },
678
+ "kernelspec": {
679
+ "display_name": "Python 3",
680
+ "language": "python",
681
+ "name": "python3"
682
+ },
683
+ "language_info": {
684
+ "name": "python",
685
+ "version": "3.10.0"
686
+ }
687
+ },
688
+ "nbformat": 4,
689
+ "nbformat_minor": 4
690
+ }
finetuning-functiongemma/finetune_functiongemma_v2.ipynb ADDED
@@ -0,0 +1,635 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# 🎨 Fine-tuning FunctionGemma for Square Color Control\n",
8
+ "\n",
9
+ "This notebook demonstrates how to fine-tune FunctionGemma to recognize color control commands.\n",
10
+ "\n",
11
+ "**Author:** Harlley Oliveira\n",
12
+ "**Portfolio:** AI Engineering\n",
13
+ "\n",
14
+ "## Objectives\n",
15
+ "1. Train the model to call `set_square_color` when the user wants to change the color\n",
16
+ "2. Train the model to call `get_square_color` when the user asks about the current color\n",
17
+ "3. Support various natural language command styles"
18
+ ]
19
+ },
20
+ {
21
+ "cell_type": "markdown",
22
+ "metadata": {},
23
+ "source": [
24
+ "## πŸ“¦ 1. Setup and Installation"
25
+ ]
26
+ },
27
+ {
28
+ "cell_type": "code",
29
+ "execution_count": null,
30
+ "metadata": {},
31
+ "outputs": [],
32
+ "source": [
33
+ "# Install dependencies\n",
34
+ "%pip install -q torch tensorboard\n",
35
+ "%pip install -q transformers datasets accelerate evaluate trl protobuf sentencepiece"
36
+ ]
37
+ },
38
+ {
39
+ "cell_type": "code",
40
+ "execution_count": null,
41
+ "metadata": {},
42
+ "outputs": [],
43
+ "source": [
44
+ "# Login to Hugging Face Hub\n",
45
+ "from huggingface_hub import login\n",
46
+ "login()"
47
+ ]
48
+ },
49
+ {
50
+ "cell_type": "code",
51
+ "execution_count": null,
52
+ "metadata": {},
53
+ "outputs": [],
54
+ "source": [
55
+ "# Configuration\n",
56
+ "BASE_MODEL = \"google/functiongemma-270m-it\"\n",
57
+ "OUTPUT_DIR = \"functiongemma-square-color\"\n",
58
+ "LEARNING_RATE = 5e-5\n",
59
+ "NUM_EPOCHS = 8\n",
60
+ "BATCH_SIZE = 4"
61
+ ]
62
+ },
63
+ {
64
+ "cell_type": "markdown",
65
+ "metadata": {},
66
+ "source": [
67
+ "## πŸ“Š 2. Prepare Dataset with Correct Format"
68
+ ]
69
+ },
70
+ {
71
+ "cell_type": "code",
72
+ "execution_count": null,
73
+ "metadata": {},
74
+ "outputs": [],
75
+ "source": [
76
+ "import json\n",
77
+ "from datasets import Dataset\n",
78
+ "\n",
79
+ "# Tool definitions (same as before)\n",
80
+ "def set_square_color(color: str) -> str:\n",
81
+ " \"\"\"\n",
82
+ " Sets the color of the square displayed on the screen.\n",
83
+ " \n",
84
+ " Args:\n",
85
+ " color: The color to set, e.g. red, blue, green\n",
86
+ " \"\"\"\n",
87
+ " return f\"Color set to {color}\"\n",
88
+ "\n",
89
+ "def get_square_color() -> str:\n",
90
+ " \"\"\"\n",
91
+ " Returns the current color of the square.\n",
92
+ " Use this when the user asks about the current color.\n",
93
+ " \"\"\"\n",
94
+ " return \"Current color\"\n",
95
+ "\n",
96
+ "# Get JSON schemas\n",
97
+ "from transformers.utils import get_json_schema\n",
98
+ "TOOLS = [\n",
99
+ " get_json_schema(set_square_color),\n",
100
+ " get_json_schema(get_square_color)\n",
101
+ "]\n",
102
+ "\n",
103
+ "print(\"Tool schemas:\")\n",
104
+ "print(json.dumps(TOOLS, indent=2))"
105
+ ]
106
+ },
107
+ {
108
+ "cell_type": "code",
109
+ "execution_count": null,
110
+ "metadata": {},
111
+ "outputs": [],
112
+ "source": [
113
+ "# Load training dataset\n",
114
+ "with open(\"dataset/square_color_dataset.json\", \"r\") as f:\n",
115
+ " square_color_dataset = json.load(f)\n",
116
+ "\n",
117
+ "print(f\"Total examples: {len(square_color_dataset)}\")\n",
118
+ "print(f\" - SET: {len([x for x in square_color_dataset if x['tool_name'] == 'set_square_color'])}\")\n",
119
+ "print(f\" - GET: {len([x for x in square_color_dataset if x['tool_name'] == 'get_square_color'])}\")\n",
120
+ "\n",
121
+ "# Preview first few examples\n",
122
+ "print(\"\\nFirst 3 examples:\")\n",
123
+ "for i, sample in enumerate(square_color_dataset[:3]):\n",
124
+ " print(f\" {i+1}. \\\"{sample['user_content']}\\\" β†’ {sample['tool_name']}\")"
125
+ ]
126
+ },
127
+ {
128
+ "cell_type": "code",
129
+ "execution_count": null,
130
+ "metadata": {},
131
+ "outputs": [],
132
+ "source": [
133
+ "# CRITICAL: FunctionGemma's expected output format\n",
134
+ "# The model should output: <start_function_call>call:func{args}<end_function_call>\n",
135
+ "\n",
136
+ "SYSTEM_PROMPT = \"You are a model that can do function calling with the following functions\"\n",
137
+ "\n",
138
+ "def format_function_call_output(tool_name: str, tool_arguments: dict) -> str:\n",
139
+ " \"\"\"\n",
140
+ " Format the expected output in FunctionGemma's native format.\n",
141
+ " \n",
142
+ " FunctionGemma outputs: <start_function_call>call:func_name{arg:<escape>value<escape>}<end_function_call>\n",
143
+ " \"\"\"\n",
144
+ " if not tool_arguments:\n",
145
+ " # For functions with no arguments\n",
146
+ " return f\"<start_function_call>call:{tool_name}{{}}<end_function_call>\"\n",
147
+ " \n",
148
+ " # Format arguments with <escape> tokens for string values\n",
149
+ " args_parts = []\n",
150
+ " for key, value in tool_arguments.items():\n",
151
+ " if isinstance(value, str):\n",
152
+ " args_parts.append(f\"{key}:<escape>{value}<escape>\")\n",
153
+ " else:\n",
154
+ " args_parts.append(f\"{key}:{value}\")\n",
155
+ " \n",
156
+ " args_str = \",\".join(args_parts)\n",
157
+ " return f\"<start_function_call>call:{tool_name}{{{args_str}}}<end_function_call>\"\n",
158
+ "\n",
159
+ "# Test the format\n",
160
+ "print(\"Example outputs:\")\n",
161
+ "print(format_function_call_output(\"set_square_color\", {\"color\": \"blue\"}))\n",
162
+ "print(format_function_call_output(\"get_square_color\", {}))"
163
+ ]
164
+ },
165
+ {
166
+ "cell_type": "code",
167
+ "execution_count": null,
168
+ "metadata": {},
169
+ "outputs": [],
170
+ "source": [
171
+ "from transformers import AutoTokenizer\n",
172
+ "\n",
173
+ "# Load tokenizer first to use apply_chat_template\n",
174
+ "tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)\n",
175
+ "\n",
176
+ "def create_training_text(sample):\n",
177
+ " \"\"\"\n",
178
+ " Create the full training text using FunctionGemma's chat template.\n",
179
+ " \n",
180
+ " The key is that we format the assistant's response in FunctionGemma's\n",
181
+ " native function call format.\n",
182
+ " \"\"\"\n",
183
+ " tool_args = json.loads(sample[\"tool_arguments\"])\n",
184
+ " expected_output = format_function_call_output(sample[\"tool_name\"], tool_args)\n",
185
+ " \n",
186
+ " # Create messages - note: assistant content is the raw function call format\n",
187
+ " messages = [\n",
188
+ " {\"role\": \"developer\", \"content\": SYSTEM_PROMPT},\n",
189
+ " {\"role\": \"user\", \"content\": sample[\"user_content\"]},\n",
190
+ " {\"role\": \"assistant\", \"content\": expected_output},\n",
191
+ " ]\n",
192
+ " \n",
193
+ " # Apply chat template WITH tools to get proper function declarations\n",
194
+ " text = tokenizer.apply_chat_template(\n",
195
+ " messages,\n",
196
+ " tools=TOOLS,\n",
197
+ " tokenize=False,\n",
198
+ " add_generation_prompt=False\n",
199
+ " )\n",
200
+ " \n",
201
+ " return {\"text\": text}\n",
202
+ "\n",
203
+ "# Create dataset\n",
204
+ "dataset = Dataset.from_list(square_color_dataset)\n",
205
+ "dataset = dataset.map(create_training_text, remove_columns=dataset.features, batched=False)\n",
206
+ "\n",
207
+ "# Split 80/20\n",
208
+ "dataset = dataset.train_test_split(test_size=0.2, shuffle=True, seed=42)\n",
209
+ "\n",
210
+ "print(f\"Train: {len(dataset['train'])} examples\")\n",
211
+ "print(f\"Test: {len(dataset['test'])} examples\")"
212
+ ]
213
+ },
214
+ {
215
+ "cell_type": "code",
216
+ "execution_count": null,
217
+ "metadata": {},
218
+ "outputs": [],
219
+ "source": [
220
+ "# Visualize a formatted example\n",
221
+ "print(\"=\" * 60)\n",
222
+ "print(\"FORMATTED TRAINING EXAMPLE\")\n",
223
+ "print(\"=\" * 60)\n",
224
+ "print(dataset[\"train\"][0][\"text\"])\n",
225
+ "print(\"=\" * 60)"
226
+ ]
227
+ },
228
+ {
229
+ "cell_type": "markdown",
230
+ "metadata": {},
231
+ "source": [
232
+ "## πŸ€– 3. Load Model"
233
+ ]
234
+ },
235
+ {
236
+ "cell_type": "code",
237
+ "execution_count": null,
238
+ "metadata": {},
239
+ "outputs": [],
240
+ "source": [
241
+ "import torch\n",
242
+ "from transformers import AutoModelForCausalLM\n",
243
+ "\n",
244
+ "print(\"Loading model for fine-tuning...\")\n",
245
+ "\n",
246
+ "model = AutoModelForCausalLM.from_pretrained(\n",
247
+ " BASE_MODEL,\n",
248
+ " dtype=torch.bfloat16,\n",
249
+ " device_map=\"auto\",\n",
250
+ " attn_implementation=\"eager\"\n",
251
+ ")\n",
252
+ "\n",
253
+ "print(f\"Device: {model.device}\")\n",
254
+ "print(f\"DType: {model.dtype}\")\n",
255
+ "print(f\"Parameters: {model.num_parameters():,}\")"
256
+ ]
257
+ },
258
+ {
259
+ "cell_type": "markdown",
260
+ "metadata": {},
261
+ "source": [
262
+ "## πŸ§ͺ 3.5. Pre-Training Evaluation (Baseline)"
263
+ ]
264
+ },
265
+ {
266
+ "cell_type": "code",
267
+ "execution_count": null,
268
+ "metadata": {},
269
+ "outputs": [],
270
+ "source": [
271
+ "import re\n",
272
+ "\n",
273
+ "def extract_function_call(text):\n",
274
+ " \"\"\"\n",
275
+ " Extract function call from FunctionGemma's output format.\n",
276
+ " Returns (function_name, arguments_dict) or (None, None) if not found.\n",
277
+ " \"\"\"\n",
278
+ " pattern = r\"<start_function_call>call:(\\w+)\\{(.*)\\}<end_function_call>\"\n",
279
+ " match = re.search(pattern, text, re.DOTALL)\n",
280
+ " \n",
281
+ " if not match:\n",
282
+ " return None, None\n",
283
+ " \n",
284
+ " func_name = match.group(1)\n",
285
+ " args_str = match.group(2)\n",
286
+ " \n",
287
+ " # Parse arguments\n",
288
+ " args = {}\n",
289
+ " if args_str.strip():\n",
290
+ " # Match key:<escape>value<escape> or key:value patterns\n",
291
+ " arg_pattern = r\"(\\w+):(?:<escape>(.*?)<escape>|([^,}]*))\"\n",
292
+ " for m in re.finditer(arg_pattern, args_str):\n",
293
+ " key = m.group(1)\n",
294
+ " value = m.group(2) if m.group(2) else m.group(3)\n",
295
+ " args[key] = value.strip() if value else \"\"\n",
296
+ " \n",
297
+ " return func_name, args\n",
298
+ "\n",
299
+ "def evaluate_model(model, tokenizer, test_samples, tools, system_prompt, verbose=True):\n",
300
+ " \"\"\"\n",
301
+ " Evaluate model on test samples using FunctionGemma's format.\n",
302
+ " \"\"\"\n",
303
+ " results = {\n",
304
+ " \"total\": len(test_samples),\n",
305
+ " \"correct\": 0,\n",
306
+ " \"correct_tool\": 0,\n",
307
+ " \"correct_args\": 0,\n",
308
+ " \"details\": []\n",
309
+ " }\n",
310
+ " \n",
311
+ " for sample in test_samples:\n",
312
+ " messages = [\n",
313
+ " {\"role\": \"developer\", \"content\": system_prompt},\n",
314
+ " {\"role\": \"user\", \"content\": sample[\"user_content\"]},\n",
315
+ " ]\n",
316
+ " \n",
317
+ " inputs = tokenizer.apply_chat_template(\n",
318
+ " messages,\n",
319
+ " tools=tools,\n",
320
+ " tokenize=True,\n",
321
+ " add_generation_prompt=True,\n",
322
+ " return_dict=True,\n",
323
+ " return_tensors=\"pt\"\n",
324
+ " ).to(model.device)\n",
325
+ " \n",
326
+ " with torch.no_grad():\n",
327
+ " output = model.generate(\n",
328
+ " **inputs,\n",
329
+ " max_new_tokens=128,\n",
330
+ " do_sample=False,\n",
331
+ " )\n",
332
+ " \n",
333
+ " input_length = inputs['input_ids'].shape[1]\n",
334
+ " response = tokenizer.decode(output[0][input_length:], skip_special_tokens=False)\n",
335
+ " \n",
336
+ " # Parse the function call from response\n",
337
+ " called_func, called_args = extract_function_call(response)\n",
338
+ " \n",
339
+ " # Check if correct tool was called\n",
340
+ " tool_correct = called_func == sample[\"tool_name\"]\n",
341
+ " \n",
342
+ " # Check arguments\n",
343
+ " args_correct = False\n",
344
+ " expected_args = json.loads(sample[\"tool_arguments\"])\n",
345
+ " \n",
346
+ " if tool_correct:\n",
347
+ " if sample[\"tool_name\"] == \"get_square_color\":\n",
348
+ " args_correct = True # No args needed\n",
349
+ " elif called_args and \"color\" in called_args:\n",
350
+ " args_correct = called_args.get(\"color\", \"\").lower() == expected_args.get(\"color\", \"\").lower()\n",
351
+ " \n",
352
+ " if tool_correct:\n",
353
+ " results[\"correct_tool\"] += 1\n",
354
+ " if tool_correct and args_correct:\n",
355
+ " results[\"correct\"] += 1\n",
356
+ " results[\"correct_args\"] += 1\n",
357
+ " \n",
358
+ " results[\"details\"].append({\n",
359
+ " \"input\": sample[\"user_content\"],\n",
360
+ " \"expected_tool\": sample[\"tool_name\"],\n",
361
+ " \"expected_args\": sample[\"tool_arguments\"],\n",
362
+ " \"called_func\": called_func,\n",
363
+ " \"called_args\": called_args,\n",
364
+ " \"response\": response,\n",
365
+ " \"tool_correct\": tool_correct,\n",
366
+ " \"args_correct\": args_correct\n",
367
+ " })\n",
368
+ " \n",
369
+ " results[\"tool_accuracy\"] = results[\"correct_tool\"] / results[\"total\"] * 100\n",
370
+ " results[\"full_accuracy\"] = results[\"correct\"] / results[\"total\"] * 100\n",
371
+ " \n",
372
+ " if verbose:\n",
373
+ " print(f\"Tool Accuracy: {results['correct_tool']}/{results['total']} ({results['tool_accuracy']:.1f}%)\")\n",
374
+ " print(f\"Full Accuracy (tool + args): {results['correct']}/{results['total']} ({results['full_accuracy']:.1f}%)\")\n",
375
+ " \n",
376
+ " return results"
377
+ ]
378
+ },
379
+ {
380
+ "cell_type": "code",
381
+ "execution_count": null,
382
+ "metadata": {},
383
+ "outputs": [],
384
+ "source": [
385
+ "# Create evaluation test set\n",
386
+ "import random\n",
387
+ "\n",
388
+ "random.seed(42)\n",
389
+ "\n",
390
+ "set_samples = [s for s in square_color_dataset if s[\"tool_name\"] == \"set_square_color\"]\n",
391
+ "get_samples = [s for s in square_color_dataset if s[\"tool_name\"] == \"get_square_color\"]\n",
392
+ "\n",
393
+ "test_cases = 25\n",
394
+ "eval_test_cases = random.sample(set_samples, min(test_cases, len(set_samples))) + \\\n",
395
+ " random.sample(get_samples, min(test_cases, len(get_samples)))\n",
396
+ "\n",
397
+ "print(\"=\" * 50)\n",
398
+ "print(\"PRE-TRAINING EVALUATION (Baseline)\")\n",
399
+ "print(\"=\" * 50)\n",
400
+ "print(f\"\\nEvaluating base model on {len(eval_test_cases)} test cases...\\n\")\n",
401
+ "\n",
402
+ "baseline_results = evaluate_model(\n",
403
+ " model=model,\n",
404
+ " tokenizer=tokenizer,\n",
405
+ " test_samples=eval_test_cases,\n",
406
+ " tools=TOOLS,\n",
407
+ " system_prompt=SYSTEM_PROMPT\n",
408
+ ")\n",
409
+ "\n",
410
+ "# Show sample outputs\n",
411
+ "print(\"\\n--- Sample Outputs (Base Model) ---\")\n",
412
+ "for i, detail in enumerate(baseline_results[\"details\"][:4]):\n",
413
+ " status = \"βœ…\" if detail[\"tool_correct\"] else \"❌\"\n",
414
+ " print(f\"\\n{status} Input: {detail['input']}\")\n",
415
+ " print(f\" Expected: {detail['expected_tool']}\")\n",
416
+ " print(f\" Got: {detail['called_func']} with args {detail['called_args']}\")"
417
+ ]
418
+ },
419
+ {
420
+ "cell_type": "markdown",
421
+ "metadata": {},
422
+ "source": [
423
+ "## πŸ”₯ 4. Fine-tuning"
424
+ ]
425
+ },
426
+ {
427
+ "cell_type": "code",
428
+ "execution_count": null,
429
+ "metadata": {},
430
+ "outputs": [],
431
+ "source": [
432
+ "from trl import SFTConfig, SFTTrainer\n",
433
+ "\n",
434
+ "torch_dtype = model.dtype\n",
435
+ "\n",
436
+ "# Training configuration\n",
437
+ "args = SFTConfig(\n",
438
+ " output_dir=OUTPUT_DIR,\n",
439
+ " max_length=512,\n",
440
+ " packing=False,\n",
441
+ " num_train_epochs=NUM_EPOCHS,\n",
442
+ " per_device_train_batch_size=BATCH_SIZE,\n",
443
+ " gradient_checkpointing=False,\n",
444
+ " optim=\"adamw_torch_fused\",\n",
445
+ " logging_steps=1,\n",
446
+ " eval_strategy=\"epoch\",\n",
447
+ " save_strategy=\"epoch\",\n",
448
+ " learning_rate=LEARNING_RATE,\n",
449
+ " fp16=True if torch_dtype == torch.float16 else False,\n",
450
+ " bf16=True if torch_dtype == torch.bfloat16 else False,\n",
451
+ " lr_scheduler_type=\"constant\",\n",
452
+ " push_to_hub=True,\n",
453
+ " report_to=\"tensorboard\",\n",
454
+ " load_best_model_at_end=True,\n",
455
+ " metric_for_best_model=\"eval_loss\",\n",
456
+ " dataset_text_field=\"text\", # IMPORTANT: specify the text field\n",
457
+ ")\n",
458
+ "\n",
459
+ "# Create trainer\n",
460
+ "trainer = SFTTrainer(\n",
461
+ " model=model,\n",
462
+ " args=args,\n",
463
+ " train_dataset=dataset['train'],\n",
464
+ " eval_dataset=dataset['test'],\n",
465
+ " processing_class=tokenizer,\n",
466
+ ")\n",
467
+ "\n",
468
+ "print(\"Trainer created successfully!\")"
469
+ ]
470
+ },
471
+ {
472
+ "cell_type": "code",
473
+ "execution_count": null,
474
+ "metadata": {},
475
+ "outputs": [],
476
+ "source": [
477
+ "# πŸš€ Start training!\n",
478
+ "print(\"Starting fine-tuning...\")\n",
479
+ "trainer.train()\n",
480
+ "\n",
481
+ "print(\"\\nβœ… Training complete!\")"
482
+ ]
483
+ },
484
+ {
485
+ "cell_type": "code",
486
+ "execution_count": null,
487
+ "metadata": {},
488
+ "outputs": [],
489
+ "source": [
490
+ "# Save final model in the original dtype (BF16)\n",
491
+ "# This prevents the model from being saved as FP32 (which doubles the size)\n",
492
+ "model.save_pretrained(OUTPUT_DIR, safe_serialization=True)\n",
493
+ "tokenizer.save_pretrained(OUTPUT_DIR)\n",
494
+ "print(f\"Model saved to: {OUTPUT_DIR}\")"
495
+ ]
496
+ },
497
+ {
498
+ "cell_type": "markdown",
499
+ "metadata": {},
500
+ "source": [
501
+ "## πŸ“ˆ 5. Visualize Results"
502
+ ]
503
+ },
504
+ {
505
+ "cell_type": "code",
506
+ "execution_count": null,
507
+ "metadata": {},
508
+ "outputs": [],
509
+ "source": [
510
+ "import matplotlib.pyplot as plt\n",
511
+ "\n",
512
+ "# Extract loss history\n",
513
+ "log_history = trainer.state.log_history\n",
514
+ "\n",
515
+ "train_losses = [log[\"loss\"] for log in log_history if \"loss\" in log]\n",
516
+ "epoch_train = [log[\"epoch\"] for log in log_history if \"loss\" in log]\n",
517
+ "eval_losses = [log[\"eval_loss\"] for log in log_history if \"eval_loss\" in log]\n",
518
+ "epoch_eval = [log[\"epoch\"] for log in log_history if \"eval_loss\" in log]\n",
519
+ "\n",
520
+ "# Plot\n",
521
+ "plt.figure(figsize=(10, 6))\n",
522
+ "plt.plot(epoch_train, train_losses, label=\"Training Loss\", alpha=0.7)\n",
523
+ "plt.plot(epoch_eval, eval_losses, label=\"Validation Loss\", marker='o')\n",
524
+ "plt.xlabel(\"Epoch\")\n",
525
+ "plt.ylabel(\"Loss\")\n",
526
+ "plt.title(\"Training and Validation Loss\")\n",
527
+ "plt.legend()\n",
528
+ "plt.grid(True)\n",
529
+ "plt.show()"
530
+ ]
531
+ },
532
+ {
533
+ "cell_type": "markdown",
534
+ "metadata": {},
535
+ "source": [
536
+ "## πŸ§ͺ 6. Post-Training Evaluation"
537
+ ]
538
+ },
539
+ {
540
+ "cell_type": "code",
541
+ "execution_count": null,
542
+ "metadata": {},
543
+ "outputs": [],
544
+ "source": [
545
+ "print(\"=\" * 50)\n",
546
+ "print(\"POST-TRAINING EVALUATION (Fine-tuned)\")\n",
547
+ "print(\"=\" * 50)\n",
548
+ "print(f\"\\nEvaluating fine-tuned model on {len(eval_test_cases)} test cases...\\n\")\n",
549
+ "\n",
550
+ "finetuned_results = evaluate_model(\n",
551
+ " model=model,\n",
552
+ " tokenizer=tokenizer,\n",
553
+ " test_samples=eval_test_cases,\n",
554
+ " tools=TOOLS,\n",
555
+ " system_prompt=SYSTEM_PROMPT\n",
556
+ ")\n",
557
+ "\n",
558
+ "# Show sample outputs\n",
559
+ "print(\"\\n--- Sample Outputs (Fine-tuned Model) ---\")\n",
560
+ "for i, detail in enumerate(finetuned_results[\"details\"][:4]):\n",
561
+ " status = \"βœ…\" if detail[\"tool_correct\"] else \"❌\"\n",
562
+ " print(f\"\\n{status} Input: {detail['input']}\")\n",
563
+ " print(f\" Expected: {detail['expected_tool']}\")\n",
564
+ " print(f\" Got: {detail['called_func']} with args {detail['called_args']}\")"
565
+ ]
566
+ },
567
+ {
568
+ "cell_type": "code",
569
+ "execution_count": null,
570
+ "metadata": {},
571
+ "outputs": [],
572
+ "source": [
573
+ "# Compare baseline vs fine-tuned\n",
574
+ "print(\"=\" * 60)\n",
575
+ "print(\"πŸ“Š COMPARISON: Baseline vs Fine-tuned\")\n",
576
+ "print(\"=\" * 60)\n",
577
+ "\n",
578
+ "print(f\"\\n{'Metric':<30} {'Baseline':>12} {'Fine-tuned':>12} {'Improvement':>12}\")\n",
579
+ "print(\"-\" * 66)\n",
580
+ "\n",
581
+ "tool_improvement = finetuned_results[\"tool_accuracy\"] - baseline_results[\"tool_accuracy\"]\n",
582
+ "print(f\"{'Tool Accuracy':<30} {baseline_results['tool_accuracy']:>11.1f}% {finetuned_results['tool_accuracy']:>11.1f}% {tool_improvement:>+11.1f}%\")\n",
583
+ "\n",
584
+ "full_improvement = finetuned_results[\"full_accuracy\"] - baseline_results[\"full_accuracy\"]\n",
585
+ "print(f\"{'Full Accuracy (tool + args)':<30} {baseline_results['full_accuracy']:>11.1f}% {finetuned_results['full_accuracy']:>11.1f}% {full_improvement:>+11.1f}%\")\n",
586
+ "\n",
587
+ "print(\"-\" * 66)\n",
588
+ "\n",
589
+ "if full_improvement > 0:\n",
590
+ " print(f\"\\nβœ… Fine-tuning improved accuracy by {full_improvement:.1f} percentage points!\")\n",
591
+ "elif full_improvement == 0:\n",
592
+ " print(f\"\\n⚠️ No change in accuracy.\")\n",
593
+ "else:\n",
594
+ " print(f\"\\n❌ Accuracy decreased. Check for overfitting or data issues.\")"
595
+ ]
596
+ },
597
+ {
598
+ "cell_type": "markdown",
599
+ "metadata": {},
600
+ "source": [
601
+ "## πŸ“€ 7. Push to Hugging Face Hub"
602
+ ]
603
+ },
604
+ {
605
+ "cell_type": "code",
606
+ "execution_count": null,
607
+ "metadata": {},
608
+ "outputs": [],
609
+ "source": [
610
+ "# Push to Hub\n",
611
+ "trainer.push_to_hub()\n",
612
+ "\n",
613
+ "print(f\"\\nβœ… Model pushed to: https://huggingface.co/{trainer.hub_model_id}\")"
614
+ ]
615
+ }
616
+ ],
617
+ "metadata": {
618
+ "accelerator": "GPU",
619
+ "colab": {
620
+ "gpuType": "T4",
621
+ "provenance": []
622
+ },
623
+ "kernelspec": {
624
+ "display_name": "Python 3",
625
+ "language": "python",
626
+ "name": "python3"
627
+ },
628
+ "language_info": {
629
+ "name": "python",
630
+ "version": "3.10.0"
631
+ }
632
+ },
633
+ "nbformat": 4,
634
+ "nbformat_minor": 4
635
+ }
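
The fine-tuned checkpoint can be smoke-tested outside the notebook before wiring it into the app. The sketch below is a minimal example under stated assumptions: it reuses the notebook's `OUTPUT_DIR` name (`functiongemma-square-color`) as the checkpoint path (swap in your Hub repo id after `push_to_hub()`), re-declares the two tools, and parses the output with the same `<start_function_call>` pattern used in the evaluation cells; it is a sketch, not part of the files added above.

```python
# Minimal smoke test for the fine-tuned checkpoint (sketch; the path is an assumption).
import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.utils import get_json_schema

CHECKPOINT = "functiongemma-square-color"  # local OUTPUT_DIR, or your Hub repo id after push_to_hub()
SYSTEM_PROMPT = "You are a model that can do function calling with the following functions"

def set_square_color(color: str) -> str:
    """Sets the color of the square displayed on the screen.

    Args:
        color: The color to set, e.g. red, blue, green
    """
    return f"Color set to {color}"

def get_square_color() -> str:
    """Returns the current color of the square."""
    return "Current color"

TOOLS = [get_json_schema(set_square_color), get_json_schema(get_square_color)]

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForCausalLM.from_pretrained(CHECKPOINT, dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "developer", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Make the square yellow"},
]
inputs = tokenizer.apply_chat_template(
    messages, tools=TOOLS, add_generation_prompt=True,
    tokenize=True, return_dict=True, return_tensors="pt",
).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens and look for a function call.
response = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False)
match = re.search(r"<start_function_call>call:(\w+)\{(.*)\}<end_function_call>", response, re.DOTALL)
print(match.groups() if match else f"No function call found: {response!r}")
# Expected if training worked: ('set_square_color', 'color:<escape>yellow<escape>')
```

If the parse fails, re-check that the assistant turns in the training data use the exact `<start_function_call>` format, since the same template must be reproduced at inference time.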
src/worker.ts CHANGED
@@ -31,7 +31,7 @@ function getModel(onProgress?: ProgressCallback) {
31
  progress_callback: onProgress,
32
  }),
33
  AutoModelForCausalLM.from_pretrained(MODEL_ID, {
34
- dtype: "q4",
35
  device: "webgpu",
36
  progress_callback: onProgress,
37
  }),
 
31
  progress_callback: onProgress,
32
  }),
33
  AutoModelForCausalLM.from_pretrained(MODEL_ID, {
34
+ dtype: "fp16",
35
  device: "webgpu",
36
  progress_callback: onProgress,
37
  }),