---
license: mit
tags:
- causal-lm
- instruction-following
- lora
- qlora
- quantized
language: en
library_name: transformers
base_model: microsoft/phi-2
---

# Phi-2 QLoRA Fine-Tuned Model

**Model:** `mishrabp/phi2-custom-response-qlora-adapter`

**Base Model:** [`microsoft/phi-2`](https://huggingface.co/microsoft/phi-2)

**Fine-Tuning Method:** QLoRA (4-bit quantized LoRA)

**Task:** Instruction-following / Customer Support Responses

---

## Model Description

This repository contains a **Phi-2 language model fine-tuned using QLoRA** on a synthetic dataset of customer support instructions and responses. The fine-tuning uses **4-bit quantized LoRA adapters** for memory-efficient training and can run on GPU or CPU (slower on CPU).

The model is designed for **instruction-following tasks** such as customer support, FAQs, and other dialogue generation.
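
For GPU inference you can also load the base model in 4-bit before attaching the adapter. A minimal sketch (assuming `bitsandbytes` is installed; this is not the card's canonical loading code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization keeps memory usage low (requires a CUDA GPU + bitsandbytes)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2", quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("mishrabp/phi2-custom-response-qlora-adapter")
model = PeftModel.from_pretrained(base, "mishrabp/phi2-custom-response-qlora-adapter")
```

On CPU, skip the quantization config and load the base model in full precision, as in the usage example further below.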

---

## Training Data

The fine-tuning dataset is synthetic, consisting of 3,000 instruction-response pairs.

**Example:**

```text
Instruction: "Customer asks about refund window #1"
Response: "Our refund window is 30 days from delivery."
```

The dataset used for fine-tuning is available here:

https://huggingface.co/datasets/mishrabp/customer-support-responses/resolve/main/train.csv

You can replace this dataset with your own CSV/JSON file to train on real-world data.
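
To fine-tune on your own data, the CSV can be loaded with the `datasets` library. A small sketch (the column names `instruction` and `response` are assumed):

```python
from datasets import load_dataset

# Load the hosted CSV directly, or point data_files at your own CSV/JSON file
dataset = load_dataset(
    "csv",
    data_files="https://huggingface.co/datasets/mishrabp/customer-support-responses/resolve/main/train.csv",
)["train"]

print(dataset[0])  # expected keys: 'instruction', 'response' (column names assumed)
```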

---

## Intended Use

* Generate responses to instructions in customer support scenarios.
* Small-scale instruction-following experiments.
* Educational or research purposes.

---

## How to Use

### Load the Fine-Tuned Model

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# -----------------------------
# Load fine-tuned model from HF
# -----------------------------
model_name = "mishrabp/phi2-custom-response-qlora-adapter"
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
model = PeftModel.from_pretrained(base_model, model_name)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

# -----------------------------
# Sample evaluation dataset
# -----------------------------
eval_data = [
    {"instruction": "Customer asks about refund window", "reference": "Our refund window is 30 days from delivery."},
    {"instruction": "Order arrived late", "reference": "Sorry for the delay. A delivery credit has been applied."},
    {"instruction": "Wrong item received", "reference": "We’ll ship the correct item and provide a return label."},
]

# -----------------------------
# Evaluation loop
# -----------------------------
for i, example in enumerate(eval_data, 1):
    prompt = f"### Instruction:\n{example['instruction']}\n\n### Response:"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=50)
    generated = tokenizer.decode(output_ids[0], skip_special_tokens=True)

    print(f"Example {i}")
    print("Instruction:", example["instruction"])
    print("Generated Response:", generated.split("### Response:")[-1].strip())
    print("Reference Response:", example["reference"])
    print("-" * 50)

# -----------------------------
# Optional: compute simple token-level accuracy or BLEU
# -----------------------------
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction  # requires: pip install nltk

bleu_scores = []
for example in eval_data:
    prompt = f"### Instruction:\n{example['instruction']}\n\n### Response:"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=50)
    generated = tokenizer.decode(output_ids[0], skip_special_tokens=True).split("### Response:")[-1].strip()

    reference_tokens = example["reference"].split()
    generated_tokens = generated.split()
    # Smoothing avoids near-zero scores on short, single-sentence references
    bleu = sentence_bleu([reference_tokens], generated_tokens, smoothing_function=SmoothingFunction().method1)
    bleu_scores.append(bleu)

print("Average BLEU score:", sum(bleu_scores) / len(bleu_scores))
```

---

## Training Script

The training script performs the following steps (a minimal sketch follows the list):

1. Loads the **Phi-2 base model**.
2. Creates a **synthetic dataset** of instruction-response pairs.
3. Tokenizes and formats the dataset for causal language modeling.
4. Applies a **LoRA adapter**.
5. Trains using **QLoRA** if a GPU is available, otherwise full-precision LoRA on CPU.
6. Saves the adapter and tokenizer to `./phi2-qlora`.
7. Pushes the adapter and tokenizer to the Hugging Face Hub.
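
The sketch below is illustrative rather than the exact script used: the batch size, epoch count, sequence length, and dataset column names (`instruction`, `response`) are assumptions, and the 4-bit path requires a CUDA GPU with `bitsandbytes` installed.

```python
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_name = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(base_name)
tokenizer.pad_token = tokenizer.eos_token

# 1. Load Phi-2 in 4-bit (QLoRA) when a GPU is available, full precision otherwise
if torch.cuda.is_available():
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    model = AutoModelForCausalLM.from_pretrained(base_name, quantization_config=bnb_config, device_map="auto")
    model = prepare_model_for_kbit_training(model)
else:
    model = AutoModelForCausalLM.from_pretrained(base_name)

# 2-3. Load the instruction/response pairs and format them for causal LM
dataset = load_dataset("mishrabp/customer-support-responses", split="train")

def tokenize(example):
    text = f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['response']}"
    return tokenizer(text, truncation=True, max_length=256)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

# 4. Attach the LoRA adapter (values from the Parameters section below)
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                         target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

# 5. Train
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./phi2-qlora",
        per_device_train_batch_size=4,  # assumed value, not stated in this card
        num_train_epochs=3,             # assumed value
        learning_rate=2e-4,
        logging_steps=10,
        fp16=torch.cuda.is_available(),
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# 6-7. Save locally and push to the Hub
model.save_pretrained("./phi2-qlora")
tokenizer.save_pretrained("./phi2-qlora")
model.push_to_hub("mishrabp/phi2-custom-response-qlora-adapter")
tokenizer.push_to_hub("mishrabp/phi2-custom-response-qlora-adapter")
```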

### Requirements

```bash
pip install torch transformers peft datasets huggingface_hub python-dotenv
```

For 4-bit (QLoRA) training on a GPU you will also need `bitsandbytes`, and `nltk` if you run the optional BLEU evaluation above.

---

## Parameters

* `r=8`, `lora_alpha=16`, `lora_dropout=0.05`
* `target_modules=["q_proj","v_proj"]` (adjust for different base models)
* Learning rate: `2e-4`
* Batch size: as set in the training script.
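
If you swap in a different base model, the attention projection names may differ. One quick way to list candidate `target_modules` (a sketch, not part of the original card):

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

# List the linear-layer names in a base model to choose LoRA target_modules
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
linear_names = {name.split(".")[-1] for name, module in model.named_modules()
                if isinstance(module, nn.Linear)}
print(sorted(linear_names))  # for phi-2 this includes "q_proj" and "v_proj"
```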