SmallEvals — QA Generation Model (Qwen3-0.6B)
Repository (HF): mburaksayici/golden_generate_qwen_0.6b_v2
Repository (GGUF): mburaksayici/golden_generate_qwen_0.6b_v2_gguf
Base Model: Qwen3-0.6B
Format: FP16 + GGUF quantizations
Primary Use: QA generation for RAG evaluation
This model is part of SmallEvals — an open-source framework for evaluating retrieval-augmented generation systems by generating high-quality golden QA datasets.
Overview
This model is fine-tuned to extract a single atomic question-answer pair from a passage.
Designed for:
- Golden dataset generation
- RAG benchmarking
- Chunk validation
- Evaluation corpus bootstrapping
- Retriever quality testing
Training Data
The model was trained on:
- TriviaQA
- SQuAD 2.0
- Hand-curated synthetic data generated using Qwen-70B
Training focused on:
- Grounded questions
- Single-fact answers
- JSON-only outputs
- Minimal verbosity
Prompt Template
The model expects the following instruction format:
```
Given the passage below, extract ONE question/answer pair grounded strictly in a single atomic fact.

PASSAGE:
"<passage>"

Return ONLY a JSON object.
```
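For batch generation it can help to fill this template programmatically. The helper below is an illustrative sketch, not part of the released tooling; `build_prompt` is a hypothetical name:

```python
PROMPT_TEMPLATE = (
    "Given the passage below, extract ONE question/answer pair "
    "grounded strictly in a single atomic fact.\n\n"
    'PASSAGE:\n"{passage}"\n\n'
    "Return ONLY a JSON object."
)

def build_prompt(passage: str) -> str:
    # Hypothetical helper: fills the instruction template with one passage.
    return PROMPT_TEMPLATE.format(passage=passage.strip())
```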
Example Output
```json
{
  "question": "When was the Eiffel Tower completed?",
  "answer": "1889"
}
```
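Because the model is trained for JSON-only output, completions can be parsed directly. A minimal validation sketch (the `parse_qa` helper is hypothetical):

```python
import json

def parse_qa(completion: str) -> dict:
    # Hypothetical parser: expects a bare JSON object with exactly
    # the "question" and "answer" keys shown above.
    pair = json.loads(completion.strip())
    assert set(pair) == {"question", "answer"}, f"unexpected keys: {set(pair)}"
    return pair
```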
Inference
llama.cpp
```bash
llama-cli \
  -hf mburaksayici/golden_generate_qwen_0.6b_v2_gguf \
  -p "Given the passage below, extract ONE question/answer pair grounded strictly in a single atomic fact.\n\nPASSAGE:\n\"The Eiffel Tower was completed in 1889.\"\n\nReturn ONLY a JSON object."
```
Ollama
This repository includes an Ollama Modelfile; the GGUF can also be pulled directly from Hugging Face:

```bash
ollama run hf.co/mburaksayici/golden_generate_qwen_0.6b_v2_gguf
```
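If the official `ollama` Python package is installed, the same model can be called from code. A minimal sketch, again reusing the hypothetical `build_prompt` helper:

```python
import ollama

resp = ollama.generate(
    model="hf.co/mburaksayici/golden_generate_qwen_0.6b_v2_gguf",
    prompt=build_prompt("The Eiffel Tower was completed in 1889."),
)
print(resp["response"])
```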
Hugging Face Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mburaksayici/golden_generate_qwen_0.6b_v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = """Given the passage below, extract ONE question/answer pair grounded strictly in a single atomic fact.

PASSAGE:
"The Great Wall of China was built over several centuries."

Return ONLY a JSON object.
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens so the echoed prompt
# does not end up in the parsed output.
completion = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(completion)
```
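Putting the pieces together, a golden dataset can be bootstrapped by running each corpus chunk through the model and keeping only well-formed pairs. The loop below is a sketch that reuses `tokenizer` and `model` from above plus the hypothetical `build_prompt` and `parse_qa` helpers:

```python
def generate_golden_pairs(chunks):
    # Illustrative loop: one QA pair per chunk; chunks that yield
    # malformed JSON are skipped rather than retried.
    pairs = []
    for chunk in chunks:
        inputs = tokenizer(build_prompt(chunk), return_tensors="pt").to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=128)
        completion = tokenizer.decode(
            outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
        )
        try:
            pairs.append(parse_qa(completion))
        except (ValueError, AssertionError):
            continue  # json.JSONDecodeError subclasses ValueError
    return pairs
```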
Available Files (GGUF)
| File | Description |
|---|---|
| Qwen3-0.6B.F16.gguf | Full precision |
| Qwen3-0.6B.Q8_0.gguf | Best quality quant |
| Qwen3-0.6B.Q5_K_M.gguf | Balanced |
| Qwen3-0.6B.Q4_K_M.gguf | Fast + compact |
Intended Use
✅ RAG evaluation
✅ QA dataset generation
✅ Retriever testing
✅ Chunk quality scoring
✅ Benchmark creation
Not Intended For
❌ Chatbots
❌ Creative writing
❌ Long-form summarization
❌ General instruction following
❌ Multi-hop reasoning
License
Apache-2.0
(Base model license applies)
Related Projects
- SmallEvals
- EvalVD
- ChunkTuner
- Golden-QAG
- RAG-Boilerplate