SmallEvals — QA Generation Model (Qwen3-0.6B)
Repository (HF): mburaksayici/golden_generate_qwen_0.6b_v2
Repository (GGUF): mburaksayici/golden_generate_qwen_0.6b_v2_gguf
Base Model: Qwen3-0.6B
Format: FP16 + GGUF quantizations
Primary Use: QA generation for RAG evaluation
This model is part of SmallEvals — an open-source framework for evaluating retrieval-augmented generation systems by generating high-quality golden QA datasets.
Overview
This model is fine-tuned to extract a single atomic question-answer pair from a passage.
Designed for:
- Golden dataset generation
- RAG benchmarking
- Chunk validation
- Evaluation corpus bootstrapping
- Retriever quality testing
Training Data
The model was trained on:
- TriviaQA
- SQuAD 2.0
- Hand-curated synthetic data generated using Qwen-70B
Training focused on:
- Grounded questions
- Single-fact answers
- JSON-only outputs
- Minimal verbosity
Prompt Template
The model expects the following instruction format:
```
Given the passage below, extract ONE question/answer pair grounded strictly in a single atomic fact.

PASSAGE:
"<passage>"

Return ONLY a JSON object.
```
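For batch generation it can help to fill this template programmatically. The helper below is an illustrative sketch, not part of the released tooling; `build_prompt` is a hypothetical name:

```python
PROMPT_TEMPLATE = (
    "Given the passage below, extract ONE question/answer pair "
    "grounded strictly in a single atomic fact.\n\n"
    'PASSAGE:\n"{passage}"\n\n'
    "Return ONLY a JSON object."
)

def build_prompt(passage: str) -> str:
    # Hypothetical helper: fills the instruction template with one passage.
    return PROMPT_TEMPLATE.format(passage=passage.strip())
```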
Example Output
```json
{
  "question": "When was the Eiffel Tower completed?",
  "answer": "1889"
}
```
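Because the model is trained for JSON-only output, completions can be parsed directly. A minimal validation sketch (the `parse_qa` helper is hypothetical):

```python
import json

def parse_qa(completion: str) -> dict:
    # Hypothetical parser: expects a bare JSON object with exactly
    # the "question" and "answer" keys shown above.
    pair = json.loads(completion.strip())
    assert set(pair) == {"question", "answer"}, f"unexpected keys: {set(pair)}"
    return pair
```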
Inference
llama.cpp
```bash
llama-cli \
  -hf mburaksayici/golden_generate_qwen_0.6b_v2_gguf \
  -p "Given the passage below, extract ONE question/answer pair grounded strictly in a single atomic fact.\n\nPASSAGE:\n\"The Eiffel Tower was completed in 1889.\"\n\nReturn ONLY a JSON object."
```
Ollama
This repository includes an Ollama Modelfile; the GGUF can also be pulled directly from Hugging Face:

```bash
ollama run hf.co/mburaksayici/golden_generate_qwen_0.6b_v2_gguf
```
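If the official `ollama` Python package is installed, the same model can be called from code. A minimal sketch, again reusing the hypothetical `build_prompt` helper:

```python
import ollama

resp = ollama.generate(
    model="hf.co/mburaksayici/golden_generate_qwen_0.6b_v2_gguf",
    prompt=build_prompt("The Eiffel Tower was completed in 1889."),
)
print(resp["response"])
```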
Hugging Face Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mburaksayici/golden_generate_qwen_0.6b_v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = """Given the passage below, extract ONE question/answer pair grounded strictly in a single atomic fact.

PASSAGE:
"The Great Wall of China was built over several centuries."

Return ONLY a JSON object.
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens so the echoed prompt
# does not end up in the parsed output.
completion = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(completion)
```
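Putting the pieces together, a golden dataset can be bootstrapped by running each corpus chunk through the model and keeping only well-formed pairs. The loop below is a sketch that reuses `tokenizer` and `model` from above plus the hypothetical `build_prompt` and `parse_qa` helpers:

```python
def generate_golden_pairs(chunks):
    # Illustrative loop: one QA pair per chunk; chunks that yield
    # malformed JSON are skipped rather than retried.
    pairs = []
    for chunk in chunks:
        inputs = tokenizer(build_prompt(chunk), return_tensors="pt").to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=128)
        completion = tokenizer.decode(
            outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
        )
        try:
            pairs.append(parse_qa(completion))
        except (ValueError, AssertionError):
            continue  # json.JSONDecodeError subclasses ValueError
    return pairs
```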
Available Files (GGUF)
| File | Description |
|---|---|
| Qwen3-0.6B.F16.gguf | Full precision |
| Qwen3-0.6B.Q8_0.gguf | Best quality quant |
| Qwen3-0.6B.Q5_K_M.gguf | Balanced |
| Qwen3-0.6B.Q4_K_M.gguf | Fast + compact |
Intended Use
✅ RAG evaluation
✅ QA dataset generation
✅ Retriever testing
✅ Chunk quality scoring
✅ Benchmark creation
Not Intended For
❌ Chatbots
❌ Creative writing
❌ Long-form summarization
❌ General instruction following
❌ Multi-hop reasoning
License
Apache-2.0
(Base model license applies)
Related Projects
- SmallEvals
- EvalVD
- ChunkTuner
- Golden-QAG
- RAG-Boilerplate