---
language:
- en
base_model:
- google/functiongemma-270m-it
---
|
|
## FunctionGemma-270M-IT RAG

This is a fine-tuned derivative of `google/functiongemma-270m-it`, optimized for **lightweight Retrieval-Augmented Generation (RAG)** on **mobile / edge / low-power devices**. The fine-tune specializes the model to **consistently emit a tool call to `vector_search`**, with a well-formed, high-recall search query, whenever the user asks a natural-language question that should be answered from a document store.

It’s intended to be used as the **“retrieval controller”** in a local-first RAG pipeline:

**User question → model generates `vector_search(query=…)` → system retrieves passages → (optional) downstream answer model composes final response**.
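
To make that division of labor concrete, here is a minimal sketch of the control flow. Every name below is an illustrative placeholder, not an API shipped with this model; concrete versions of the parser and the retrieval backend appear later in this card.

```python
# Minimal sketch of the retrieval-controller pipeline. All function names
# are illustrative placeholders for components you wire in yourself.

def run_retrieval_controller(question: str) -> str:
    """Step 1: run this model on the user question (see the usage example
    at the end of this card); returns the raw generated text."""
    ...

def parse_vector_search_call(raw_output: str) -> str | None:
    """Step 2: extract the query string from the emitted tool call
    (a concrete regex-based version appears under "What's new")."""
    ...

def vector_search(query: str, k: int = 5) -> list[dict]:
    """Step 3: your retrieval backend; returns top-k passages with
    metadata (a cosine-similarity sketch appears under "Tool contract")."""
    ...

def compose_answer(question: str, passages: list[dict]) -> str:
    """Step 4 (optional): a downstream answer model composes the response."""
    ...

def answer(question: str) -> str:
    raw = run_retrieval_controller(question)
    query = parse_vector_search_call(raw)
    passages = vector_search(query) if query else []
    return compose_answer(question, passages)
```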
|
### Base model

- **Base:** `google/functiongemma-270m-it` (Gemma 3 270M family), a small model tuned specifically for function calling. ([Google AI for Developers](https://ai.google.dev/gemma/docs/functiongemma "FunctionGemma model overview | Google AI for Developers"))
- **Interface & formatting:** Uses FunctionGemma’s special control tokens for tool use (e.g., `<start_function_call>…<end_function_call>`) and the `<escape>` delimiter for string fields. ([Google AI for Developers](https://ai.google.dev/gemma/docs/functiongemma/formatting-and-best-practices "FunctionGemma formatting and best practices | Google AI for Developers"))
- **Context length (base):** 32K tokens of input context, with up to 32K output tokens per request, budget permitting. ([Hugging Face](https://huggingface.co/google/functiongemma-270m-it "google/functiongemma-270m-it · Hugging Face"))
|
### What’s new in this fine-tune

**Primary behavioral change:** When asked questions in natural language, the model reliably chooses to call:

- `vector_search`
- with a **single string argument**: a retrieval query designed to maximize recall and relevance for downstream passage ranking.

**Example behavior (from the eval set):**

- **Prompt:** “Can you compare the political systems of the Roman Republic and the Aztec Empire… succession and social mobility?”
- **Output:** `<start_function_call>call:vector_search{query:<escape>Roman Republic vs Aztec Empire political systems succession social mobility ...<escape>}<end_function_call>` ✅
|
(Additional examples include VAR vs. VAR review, journalism ethics across platforms, intrinsic vs. extrinsic motivation, bench vs. jury trial, and Rodin image sources.)
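
If you decode with special tokens preserved, the call can be recovered with a short parser. The regex below is written against the exact output format shown above; treat it as a sketch and adjust it if your serving stack renders FunctionGemma’s control tokens differently.

```python
import re

# Parser for the call format shown above:
# <start_function_call>call:NAME{query:<escape>...<escape>}<end_function_call>
CALL_RE = re.compile(
    r"<start_function_call>\s*call:(?P<name>\w+)"
    r"\{query:<escape>(?P<query>.*?)<escape>\}\s*<end_function_call>",
    re.DOTALL,
)

def extract_vector_search_query(raw_output: str) -> str | None:
    """Return the query string if the model emitted a vector_search call."""
    m = CALL_RE.search(raw_output)
    if m and m.group("name") == "vector_search":
        return m.group("query").strip()
    return None

example = (
    "<start_function_call>call:vector_search{query:<escape>Roman Republic vs "
    "Aztec Empire political systems succession social mobility<escape>}"
    "<end_function_call>"
)
print(extract_vector_search_query(example))
# -> Roman Republic vs Aztec Empire political systems succession social mobility
```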
|
### Intended use

**Designed for:**

- On-device or constrained deployments (mobile apps, embedded, low-cost CPU boxes) that need **fast, local routing to retrieval**. FunctionGemma is explicitly positioned as a lightweight base for local-first agents and edge workflows. ([Google AI for Developers](https://ai.google.dev/gemma/docs/functiongemma "FunctionGemma model overview | Google AI for Developers"))
- RAG systems where **the most important skill is producing the right search query**, not writing the final answer.

**Not designed for:**

- Being the sole “answer model” for complex, high-stakes, or deeply reasoned tasks (it’s small; use it to retrieve, then answer with a stronger model if needed).
- Multi-step tool plans out of the box (FunctionGemma’s training is strongest for single-turn / parallel calls; multi-step chaining isn’t its primary trained workflow). ([Google AI for Developers](https://ai.google.dev/gemma/docs/functiongemma/formatting-and-best-practices "FunctionGemma formatting and best practices | Google AI for Developers"))
|
### Tool contract

This fine-tune assumes a tool with the following conceptual signature:

- **Tool name:** `vector_search`
- **Arguments:**
  - `query` (string): a search query describing the user’s information need
- **Returns:** top-k passages/snippets with metadata (titles/urls/ids), which are then fed into a downstream step.
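
One way to satisfy this contract on-device is a small in-memory index. The sketch below uses sentence-transformers with cosine similarity; the embedding model, the toy corpus, and the return shape are illustrative assumptions rather than requirements of this fine-tune, and any vector store honoring the contract will do.

```python
# Illustrative in-memory backend for the vector_search contract above.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

corpus = [  # stand-in documents; in practice, load your own passage store
    {"id": "doc-1", "title": "Roman Republic", "text": "The Roman Republic was governed by annually elected magistrates..."},
    {"id": "doc-2", "title": "Aztec Empire", "text": "The Aztec Empire was ruled by the huey tlatoani, chosen by a council..."},
]
corpus_emb = embedder.encode([d["text"] for d in corpus], normalize_embeddings=True)

def vector_search(query: str, k: int = 5) -> list[dict]:
    """Return the top-k passages with metadata, per the contract above."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = corpus_emb @ q  # cosine similarity (embeddings are normalized)
    top = np.argsort(-scores)[:k]
    return [{**corpus[i], "score": float(scores[i])} for i in top]
```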
|
**Important formatting note:** String values in tool blocks must be wrapped in `<escape>…<escape>` to avoid parsing ambiguity. ([Google AI for Developers](https://ai.google.dev/gemma/docs/functiongemma/formatting-and-best-practices "FunctionGemma formatting and best practices | Google AI for Developers")) |
|
### How to use (recommended pattern)

1. **Run the model** on the user question.
2. If the output contains a `vector_search` call, execute retrieval.
3. Feed retrieved passages to:
   - either the same model (if you accept lower-quality synthesis), or
   - a larger model for final answer generation.
|
If you are using the Hugging Face tooling, FunctionGemma models are typically used via chat templates that support tool definitions and function-call decoding. ([Hugging Face](https://huggingface.co/google/functiongemma-270m-it "google/functiongemma-270m-it · Hugging Face")) |
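
A minimal end-to-end sketch of that flow is below, assuming the standard transformers tool-use chat template. The repo id is a placeholder, and the `vector_search` stub exists only so the chat template can serialize its schema from the signature and docstring; consult the base model card for the authoritative FunctionGemma snippet.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "YOUR_USERNAME/functiongemma-270m-it-rag"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

def vector_search(query: str):
    """Search the document store for passages relevant to the query.

    Args:
        query: a search query describing the user's information need
    """
    ...  # the chat template only needs the signature and docstring

messages = [{"role": "user", "content": "Can you compare the political "
             "systems of the Roman Republic and the Aztec Empire?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[vector_search],      # tool definitions go through the chat template
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

out = model.generate(inputs, max_new_tokens=128)
# Keep special tokens so the <start_function_call>...<end_function_call>
# block survives for the parser shown earlier in this card.
raw = tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=False)
print(raw)
```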