Phi-2 QLoRA Fine-Tuned Model

Model: mishrabp/phi2-custom-response-qlora-adapter

Base Model: microsoft/phi-2

Fine-Tuning Method: QLoRA (LoRA adapters trained on a 4-bit quantized base model)

Task: Instruction-following / Customer Support Responses


Model Description

This repository contains a LoRA adapter for the Phi-2 language model, fine-tuned with QLoRA on a synthetic dataset of customer support instructions and responses. QLoRA keeps the base model weights in 4-bit quantized form and trains small LoRA adapters on top of them, which makes training memory-efficient. Inference works on GPU or CPU, though it is slower on CPU.
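
To make the memory argument concrete, the following back-of-the-envelope sketch compares the weight storage of Phi-2 (about 2.7 billion parameters) at different precisions. It counts weights only; quantization constants, activations, and optimizer state are ignored, and the helper name is illustrative.

```python
# Rough weight-memory estimate for a 2.7B-parameter model such as Phi-2.
# Assumption: weights only; quantization constants, activations, and
# optimizer state are not counted.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

PHI2_PARAMS = 2.7e9  # approximate parameter count of microsoft/phi-2

for label, bits in [("fp32", 32), ("fp16", 16), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gb(PHI2_PARAMS, bits):.2f} GB")
```

Under these assumptions the 4-bit weights take about 1.35 GB, versus roughly 5.4 GB at fp16.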

The model is designed for instruction-following tasks such as customer support replies, FAQ answers, and similar short dialog generation.


Training Data

The fine-tuning dataset is synthetic and consists of 3000 instruction-response pairs.

Example:

Instruction: "Customer asks about refund window #1"
Response: "Our refund window is 30 days from delivery."

The dataset used for fine-tuning is available at: https://huggingface.co/datasets/mishrabp/customer-support-responses/resolve/main/train.csv

You can replace the dataset with your own CSV/JSON file to train on real-world data.
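
As a minimal sketch of swapping in your own data, the snippet below reads instruction-response pairs from CSV text and formats each row in the "### Instruction / ### Response" layout used by the prompts on this page. The column names `instruction` and `response`, the exact spacing of the training template, and the helper names are assumptions; adjust them to match your file and training script.

```python
import csv
import io

# Assumed training layout; the inference prompt on this page ends at "### Response:".
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response: {response}"

def load_pairs(csv_file):
    """Read rows with 'instruction' and 'response' columns (assumed names)."""
    reader = csv.DictReader(csv_file)
    return [
        {"instruction": row["instruction"].strip(), "response": row["response"].strip()}
        for row in reader
    ]

def format_example(pair):
    """Build one training string from an instruction-response pair."""
    return PROMPT_TEMPLATE.format(**pair)

# In-memory stand-in for a real file; use open("train.csv", newline="") instead.
sample_csv = io.StringIO(
    "instruction,response\n"
    "Customer asks about refund window,Our refund window is 30 days from delivery.\n"
)

pairs = load_pairs(sample_csv)
texts = [format_example(p) for p in pairs]
print(texts[0])
```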


Intended Use

  • Generate responses to instructions in customer support scenarios.
  • Small-scale instruction-following experiments.
  • Educational or research purposes.

How to Use

Load the Fine-Tuned Model

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# -----------------------------
# Load fine-tuned model from HF
# -----------------------------
model_name = "mishrabp/phi2-custom-response-qlora-adapter"
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
model = PeftModel.from_pretrained(base_model, model_name)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# -----------------------------
# Sample evaluation dataset
# -----------------------------
eval_data = [
    {"instruction": "Customer asks about refund window", "reference": "Our refund window is 30 days from delivery."},
    {"instruction": "Order arrived late", "reference": "Sorry for the delay. A delivery credit has been applied."},
    {"instruction": "Wrong item received", "reference": "We’ll ship the correct item and provide a return label."},
]

# -----------------------------
# Evaluation loop
# -----------------------------
for i, example in enumerate(eval_data, 1):
    prompt = f"### Instruction:\n{example['instruction']}\n\n### Response:"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
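    # Pass pad_token_id explicitly in case the tokenizer defines no pad token.
    output_ids = model.generate(**inputs, max_new_tokens=50, pad_token_id=tokenizer.eos_token_id)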
    generated = tokenizer.decode(output_ids[0], skip_special_tokens=True)

    print(f"Example {i}")
    print("Instruction:", example["instruction"])
    print("Generated Response:", generated.split("### Response:")[-1].strip())
    print("Reference Response:", example["reference"])
    print("-" * 50)

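# -----------------------------
# Optional: compute sentence-level BLEU (requires: pip install nltk)
# -----------------------------
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

# Smoothing avoids near-zero scores when a short response shares no 4-grams
# with the reference.
smoothing = SmoothingFunction().method1

bleu_scores = []
for example in eval_data:
    prompt = f"### Instruction:\n{example['instruction']}\n\n### Response:"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    # Pass pad_token_id explicitly in case the tokenizer defines no pad token.
    output_ids = model.generate(**inputs, max_new_tokens=50, pad_token_id=tokenizer.eos_token_id)
    generated = tokenizer.decode(output_ids[0], skip_special_tokens=True).split("### Response:")[-1].strip()

    # Whitespace tokenization keeps the example dependency-free.
    reference_tokens = example["reference"].split()
    generated_tokens = generated.split()
    bleu = sentence_bleu([reference_tokens], generated_tokens, smoothing_function=smoothing)
    bleu_scores.append(bleu)

print("Average BLEU score:", sum(bleu_scores) / len(bleu_scores))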
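
BLEU can be harsh on short responses. As a complementary check, a token-overlap F1 between the generated and reference text can be computed with the standard library alone. The sketch below is illustrative: the function name and the lowercase whitespace tokenization are choices made here, not part of the original evaluation.

```python
from collections import Counter

def token_f1(generated: str, reference: str) -> float:
    """Token-overlap F1 using lowercase whitespace tokens."""
    gen_tokens = generated.lower().split()
    ref_tokens = reference.lower().split()
    if not gen_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection: a shared token counts at most as often as it
    # appears in both strings.
    overlap = sum((Counter(gen_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

score = token_f1(
    "Our refund window is 30 days.",
    "Our refund window is 30 days from delivery.",
)
print(f"Token F1: {score:.3f}")
```

Because punctuation stays attached to tokens, "days." and "days" do not match here; strip punctuation first if that is too strict for your data.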



Training Script

The training script performs the following steps:

  1. Loads the Phi-2 base model.
  2. Creates a synthetic dataset of instruction-response pairs.
  3. Tokenizes and formats the dataset for causal language modeling.
  4. Applies a LoRA adapter.
  5. Trains using QLoRA if GPU is available, otherwise full-precision LoRA on CPU.
  6. Saves the adapter and tokenizer to ./phi2-qlora.
  7. Pushes the adapter and tokenizer to Hugging Face Hub.
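
Step 2 can be sketched as follows. This is not the author's script: it cycles through a few hand-written templates, taken from the examples on this page, and appends a running index to each instruction. That matches the `#1` suffix in the training example shown earlier but is otherwise an assumption, and the function and variable names are illustrative.

```python
import itertools

# Seed templates borrowed from the examples in this model card.
TEMPLATES = [
    ("Customer asks about refund window", "Our refund window is 30 days from delivery."),
    ("Order arrived late", "Sorry for the delay. A delivery credit has been applied."),
    ("Wrong item received", "We'll ship the correct item and provide a return label."),
]

def make_synthetic_pairs(n: int):
    """Return n instruction-response dicts by cycling through TEMPLATES."""
    pairs = []
    # zip stops when the index range is exhausted, so exactly n pairs are built.
    for i, (instruction, response) in zip(range(1, n + 1), itertools.cycle(TEMPLATES)):
        pairs.append({"instruction": f"{instruction} #{i}", "response": response})
    return pairs

dataset = make_synthetic_pairs(3000)
print(len(dataset))
print(dataset[0])
```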

Requirements

pip install torch transformers peft datasets huggingface_hub python-dotenv

Note: 4-bit QLoRA training on GPU typically also needs bitsandbytes and accelerate.

Parameters

  • r=8, lora_alpha=16, lora_dropout=0.05
  • target_modules=["q_proj","v_proj"] (adjust for different base models)
  • Learning rate: 2e-4
  • Batch si
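
To give a sense of scale for the LoRA settings above, the sketch below counts the trainable parameters they imply. A LoRA adapter on a linear layer with input size d_in and output size d_out adds two matrices of shapes r x d_in and d_out x r, and scales their product by lora_alpha / r. The hidden size (2560) and layer count (32) are taken to be the Phi-2 configuration values; treat them as assumptions and verify them against the model's config.json.

```python
# Trainable-parameter count for LoRA with r=8 on q_proj and v_proj.
# Assumptions: hidden size 2560 and 32 decoder layers (verify in config.json),
# and square q_proj/v_proj projections (hidden -> hidden).

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r

R = 8
LORA_ALPHA = 16
HIDDEN_SIZE = 2560
NUM_LAYERS = 32
TARGET_MODULES = ["q_proj", "v_proj"]

per_module = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, R)
total = per_module * len(TARGET_MODULES) * NUM_LAYERS
scaling = LORA_ALPHA / R

print(f"per module: {per_module:,}")
print(f"total trainable: {total:,}")
print(f"scaling factor: {scaling}")
```

Under these assumptions the adapter has about 2.6 million trainable parameters, roughly 0.1% of the 2.7B-parameter base model.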