---
language:
- en
base_model:
- google/functiongemma-270m-it
---
## FunctionGemma-270M-IT RAG
This is a fine-tuned derivative of `google/functiongemma-270m-it`, optimized for **lightweight Retrieval-Augmented Generation (RAG)** on **mobile / edge / low-power devices**. The fine-tune specializes the model to **consistently emit a tool call to `vector_search`**—with a well-formed, high-recall search query—when the user asks a natural-language question that should be answered from a document store.
It’s intended to be used as the **“retrieval controller”** in a local-first RAG pipeline:
**User question → model generates `vector_search(query=…)` → system retrieves passages → (optional) downstream answer model composes final response**.
### Base model
- **Base:** `google/functiongemma-270m-it` (Gemma 3 270M family), a small model tuned specifically for function calling. ([Google AI for Developers](https://ai.google.dev/gemma/docs/functiongemma "FunctionGemma model overview | Google AI for Developers"))
- **Interface & formatting:** Uses FunctionGemma’s special control tokens for tool use (e.g., `<start_function_call>…<end_function_call>`) and the `<escape>` delimiter for string fields. ([Google AI for Developers](https://ai.google.dev/gemma/docs/functiongemma/formatting-and-best-practices "FunctionGemma formatting and best practices | Google AI for Developers"))
- **Context length (base):** 32K total input context (and up to 32K output context per request, budget permitting). ([Hugging Face](https://huggingface.co/google/functiongemma-270m-it "google/functiongemma-270m-it · Hugging Face"))
### What’s new in this fine-tune
**Primary behavioral change:** When asked questions in natural language, the model reliably chooses to call:
- `vector_search`
- with a **single string argument**: a retrieval query designed to maximize recall and relevance for downstream passage ranking.
**Example behavior (from the eval set):**
- **Prompt:** “Can you compare the political systems of the Roman Republic and the Aztec Empire… succession and social mobility?”
**Output:** `<start_function_call>call:vector_search{query:<escape>Roman Republic vs Aztec Empire political systems succession social mobility ...<escape>}<end_function_call>` ✅
(Additional examples include VAR vs VAR review, journalism ethics across platforms, intrinsic vs extrinsic motivation, bench vs jury trial, Rodin image sources.)
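For pipelines that consume this output directly, a small parser can pull the retrieval query out of the call block. The sketch below assumes the exact token layout shown in the example above (`<start_function_call>`, `call:vector_search{…}`, `<escape>` delimiters); the regex and the `extract_query` helper are illustrative, not part of the model:

```python
import re

# Matches a vector_search call in the format shown above. Adjust the
# pattern if your decoding renders the control tokens differently.
CALL_RE = re.compile(
    r"<start_function_call>call:vector_search"
    r"\{query:<escape>(?P<query>.*?)<escape>\}"
    r"<end_function_call>",
    re.DOTALL,
)

def extract_query(model_output: str) -> str | None:
    """Return the retrieval query if the model emitted a vector_search call."""
    match = CALL_RE.search(model_output)
    return match.group("query") if match else None
```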
### Intended use
**Designed for:**
- On-device or constrained deployments (mobile apps, embedded, low-cost CPU boxes) that need **fast, local routing to retrieval**. FunctionGemma is explicitly positioned as a lightweight base for local-first agents and edge workflows. ([Google AI for Developers](https://ai.google.dev/gemma/docs/functiongemma "FunctionGemma model overview | Google AI for Developers"))
- RAG systems where **the most important skill is producing the right search query**, not writing the final answer.
**Not designed for:**
- Being the sole “answer model” for complex, high-stakes, or deeply reasoned tasks (it’s small; use it to retrieve, then answer with a stronger model if needed).
- Multi-step tool plans out of the box (FunctionGemma’s training is strongest for single-turn / parallel calls; multi-step chaining isn’t its primary trained workflow). ([Google AI for Developers](https://ai.google.dev/gemma/docs/functiongemma/formatting-and-best-practices "FunctionGemma formatting and best practices | Google AI for Developers"))
### Tool contract
This fine-tune assumes a tool with the following conceptual signature:
- **Tool name:** `vector_search`
- **Arguments:**
- `query` (string): a search query describing the user’s information need
- **Returns:** passages/snippets (top-k) with metadata (titles/urls/ids), which are then fed into a downstream step.
**Important formatting note:** String values in tool blocks must be wrapped in `<escape>…<escape>` to avoid parsing ambiguity. ([Google AI for Developers](https://ai.google.dev/gemma/docs/functiongemma/formatting-and-best-practices "FunctionGemma formatting and best practices | Google AI for Developers"))
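For reference, here is one way the contract above could be expressed as a tool declaration in the JSON-schema style that Hugging Face chat templates accept via `tools=[...]`. The description strings are illustrative assumptions, not text the model was trained on:

```python
# Conceptual declaration of the vector_search tool described above.
vector_search_tool = {
    "type": "function",
    "function": {
        "name": "vector_search",
        "description": "Search the document store for passages relevant to a query.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "A high-recall search query describing the user's information need.",
                }
            },
            "required": ["query"],
        },
    },
}
```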
### How to use (recommended pattern)
1. **Run the model** on the user question.
2. If the output contains a `vector_search` call, execute retrieval.
3. Feed retrieved passages to:
- either the same model (if you accept lower-quality synthesis), or
- a larger model for final answer generation.
If you are using Hugging Face tooling, FunctionGemma models are typically used via chat templates that support tool definitions and function-call decoding. ([Hugging Face](https://huggingface.co/google/functiongemma-270m-it "google/functiongemma-270m-it · Hugging Face"))
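Putting the pieces together, a minimal sketch of the recommended pattern with `transformers` might look like the following. The repo id and the `my_vector_store` retriever are hypothetical placeholders, and `vector_search_tool` / `extract_query` are the illustrative helpers defined in the sections above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-org/functiongemma-270m-rag"  # hypothetical: substitute this model's repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

messages = [{"role": "user", "content": "How did Rodin source images for his sculptures?"}]

# Render the chat template with the tool schema so the model sees the
# vector_search declaration (vector_search_tool from "Tool contract" above).
input_ids = tokenizer.apply_chat_template(
    messages,
    tools=[vector_search_tool],
    add_generation_prompt=True,
    return_tensors="pt",
)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Keep special tokens in the decoded text: the function-call control
# tokens are exactly what the parser looks for.
completion = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=False)

query = extract_query(completion)  # parser sketch from "Example behavior" above
if query is not None:
    passages = my_vector_store.search(query, top_k=5)  # your retriever here (hypothetical)
    # ...feed `passages` to the same model, or to a larger answer model.
```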