---
license: mit
tags:
- causal-lm
- instruction-following
- lora
- qlora
- quantized
language: en
library_name: transformers
base_model: microsoft/phi-2
---
# Phi-2 QLoRA Fine-Tuned Model
**Model:** `mishrabp/phi2-custom-response-qlora-adapter`
**Base Model:** [`microsoft/phi-2`](https://huggingface.co/microsoft/phi-2)
**Fine-Tuning Method:** QLoRA (4-bit quantized LoRA)
**Task:** Instruction-following / Customer Support Responses
---
## Model Description
This repository contains a **Phi-2 LoRA adapter fine-tuned with QLoRA** on a synthetic dataset of customer support instructions and responses. QLoRA trains **LoRA adapters on top of a 4-bit quantized base model**, which keeps memory requirements low during training; inference can run on a GPU or a CPU (slower on CPU).
The model is designed for **instruction-following tasks** like customer support, FAQs, or other dialog generation tasks.
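For memory-efficient inference, the base model can be loaded in 4-bit before attaching the adapter. Below is a minimal sketch, assuming a CUDA GPU and the `bitsandbytes` package (on CPU, load the base model in full precision instead, as shown in the How to Use section):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization (requires a CUDA GPU and the bitsandbytes package)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# Attach the LoRA adapter from this repository
model = PeftModel.from_pretrained(base_model, "mishrabp/phi2-custom-response-qlora-adapter")
```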
---
## Training Data
The fine-tuning dataset is synthetic, consisting of 3000 instruction-response pairs:
**Example:**
```text
Instruction: "Customer asks about refund window #1"
Response: "Our refund window is 30 days from delivery."
```
Here is the dataset that was used for fine-tuning:
https://huggingface.co/datasets/mishrabp/customer-support-responses/resolve/main/train.csv
You can replace the dataset with your own CSV/JSON file to train on real-world data.
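For example, a custom file can be loaded with the `datasets` library. The sketch below assumes the CSV has `instruction` and `response` columns, matching the example above:

```python
from datasets import load_dataset

# Point data_files at the hosted CSV or at your own CSV/JSON file
dataset = load_dataset(
    "csv",
    data_files="https://huggingface.co/datasets/mishrabp/customer-support-responses/resolve/main/train.csv",
)["train"]

print(dataset[0])  # expected keys (assumed): "instruction", "response"
```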
---
## Intended Use
* Generate responses to instructions in customer support scenarios.
* Small-scale instruction-following experiments.
* Educational or research purposes.
---
## How to Use
### Load the Fine-Tuned Model
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
# -----------------------------
# Load fine-tuned model from HF
# -----------------------------
model_name = "mishrabp/phi2-custom-response-qlora-adapter"
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
model = PeftModel.from_pretrained(base_model, model_name)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
# -----------------------------
# Sample evaluation dataset
# -----------------------------
eval_data = [
    {"instruction": "Customer asks about refund window", "reference": "Our refund window is 30 days from delivery."},
    {"instruction": "Order arrived late", "reference": "Sorry for the delay. A delivery credit has been applied."},
    {"instruction": "Wrong item received", "reference": "We’ll ship the correct item and provide a return label."},
]

# -----------------------------
# Evaluation loop
# -----------------------------
for i, example in enumerate(eval_data, 1):
    prompt = f"### Instruction:\n{example['instruction']}\n\n### Response:"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=50)
    generated = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    print(f"Example {i}")
    print("Instruction:", example["instruction"])
    print("Generated Response:", generated.split("### Response:")[-1].strip())
    print("Reference Response:", example["reference"])
    print("-" * 50)

# -----------------------------
# Optional: compute a simple BLEU score
# (requires: pip install nltk)
# -----------------------------
from nltk.translate.bleu_score import sentence_bleu

bleu_scores = []
for example in eval_data:
    prompt = f"### Instruction:\n{example['instruction']}\n\n### Response:"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=50)
    generated = tokenizer.decode(output_ids[0], skip_special_tokens=True).split("### Response:")[-1].strip()
    reference_tokens = example["reference"].split()
    generated_tokens = generated.split()
    bleu = sentence_bleu([reference_tokens], generated_tokens)
    bleu_scores.append(bleu)

print("Average BLEU score:", sum(bleu_scores) / len(bleu_scores))
```
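If you prefer a standalone checkpoint without a PEFT dependency at inference time, the adapter can be merged into the base weights. A minimal sketch (merging requires the base model to be loaded in full precision, as above, and produces a full-size Phi-2 checkpoint):

```python
# Merge the LoRA adapter into the base weights and save a standalone model
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./phi2-merged")
tokenizer.save_pretrained("./phi2-merged")
```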
---
## Training Script
The training script performs the following steps (illustrative sketches of the core configuration and the Hub upload follow below):
1. Loads the **Phi-2 base model**.
2. Creates a **synthetic dataset** of instruction-response pairs.
3. Tokenizes and formats the dataset for causal language modeling.
4. Applies a **LoRA adapter**.
5. Trains using **QLoRA** if GPU is available, otherwise full-precision LoRA on CPU.
6. Saves the adapter and tokenizer to `./phi2-qlora`.
7. Pushes the adapter and tokenizer to Hugging Face Hub.
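The core configuration roughly corresponds to the sketch below. The LoRA hyperparameters and learning rate come from the Parameters section; the batch size, epoch count, and dataset wiring are illustrative assumptions:

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments, Trainer)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

use_gpu = torch.cuda.is_available()

# Step 5: 4-bit quantization only when a GPU is available
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
) if use_gpu else None

# Step 1: base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=bnb_config,
    device_map="auto" if use_gpu else None,
)
if use_gpu:
    base_model = prepare_model_for_kbit_training(base_model)

# Step 4: LoRA adapter (values from the Parameters section)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)

# Steps 5-6: train and save (batch size / epochs are illustrative values)
training_args = TrainingArguments(
    output_dir="./phi2-qlora",
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    num_train_epochs=1,
)
# trainer = Trainer(model=model, args=training_args, train_dataset=tokenized_dataset)
# trainer.train()
# model.save_pretrained("./phi2-qlora"); tokenizer.save_pretrained("./phi2-qlora")
```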
### Requirements
```bash
pip install torch transformers peft datasets bitsandbytes accelerate huggingface_hub python-dotenv
```
`bitsandbytes` and `accelerate` are required for 4-bit (QLoRA) loading on GPU; install `nltk` as well if you want to run the optional BLEU evaluation above.
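Step 7 (pushing to the Hub) typically reads an access token from a `.env` file via `python-dotenv`. A minimal sketch, continuing from the training sketch above and assuming an environment variable named `HF_TOKEN` (the variable name is an assumption):

```python
import os
from dotenv import load_dotenv
from huggingface_hub import login

load_dotenv()                        # reads .env from the working directory
login(token=os.getenv("HF_TOKEN"))   # HF_TOKEN is an assumed variable name

# Push the trained adapter and tokenizer to the Hub
model.push_to_hub("mishrabp/phi2-custom-response-qlora-adapter")
tokenizer.push_to_hub("mishrabp/phi2-custom-response-qlora-adapter")
```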
---
## Parameters
* `r=8`, `lora_alpha=16`, `lora_dropout=0.05`
* `target_modules=["q_proj","v_proj"]` (adjust for different base models)
* Learning rate: `2e-4`
* Batch si