---
license: mit
tags:
  - causal-lm
  - instruction-following
  - lora
  - qlora
  - quantized
language: en
library_name: transformers
base_model: microsoft/phi-2
---

# Phi-2 QLoRA Fine-Tuned Model


**Model:** `mishrabp/phi2-custom-response-qlora-adapter`

**Base Model:** [`microsoft/phi-2`](https://huggingface.co/microsoft/phi-2)

**Fine-Tuning Method:** QLoRA (4-bit quantized LoRA)

**Task:** Instruction-following / Customer Support Responses

---

## Model Description

This repository contains a **Phi-2 language model fine-tuned using QLoRA** on a synthetic dataset of customer support instructions and responses. QLoRA quantizes the base model to 4-bit and trains lightweight **LoRA adapters** on top, making fine-tuning memory-efficient; inference can run on GPU or CPU (slower on CPU).

The model is designed for **instruction-following tasks** such as customer support, FAQ answering, and other dialogue-generation tasks.
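At inference time, prompts should follow the same instruction template used during fine-tuning (the template is inferred from the usage code in this card):

```text
### Instruction:
Customer asks about refund window

### Response:
```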

---

## Training Data

The fine-tuning dataset is synthetic, consisting of 3000 instruction-response pairs:

**Example:**

```text
Instruction: "Customer asks about refund window #1"
Response: "Our refund window is 30 days from delivery."
```

The dataset used for fine-tuning is available here:
https://huggingface.co/datasets/mishrabp/customer-support-responses/resolve/main/train.csv

You can replace the dataset with your own CSV/JSON file to train on real-world data.
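For example, a custom CSV with the same layout can be formatted into training prompts as sketched below. This is illustrative only: the column names `instruction` and `response` are assumed to match the linked dataset, and the in-memory CSV stands in for your own file.

```python
from io import StringIO

import pandas as pd

# In-memory stand-in for a custom CSV (assumed columns: instruction, response)
csv_text = """instruction,response
Customer asks about refund window,Our refund window is 30 days from delivery.
Order arrived late,Sorry for the delay. A delivery credit has been applied.
"""

def format_example(instruction: str, response: str) -> str:
    # Same "### Instruction / ### Response" template used at inference time
    return f"### Instruction:\n{instruction}\n\n### Response:\n{response}"

df = pd.read_csv(StringIO(csv_text))
texts = [format_example(row.instruction, row.response) for row in df.itertuples()]
print(texts[0])
```

Each formatted string can then be tokenized for causal language modeling exactly as the synthetic dataset is in the training script.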

---

## Intended Use

* Generate responses to instructions in customer support scenarios.
* Small-scale instruction-following experiments.
* Educational or research purposes.

---

## How to Use

### Load the Fine-Tuned Model

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# -----------------------------
# Load fine-tuned model from HF
# -----------------------------
model_name = "mishrabp/phi2-custom-response-qlora-adapter"
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
model = PeftModel.from_pretrained(base_model, model_name)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()  # inference mode: disables dropout

# -----------------------------
# Sample evaluation dataset
# -----------------------------
eval_data = [
    {"instruction": "Customer asks about refund window", "reference": "Our refund window is 30 days from delivery."},
    {"instruction": "Order arrived late", "reference": "Sorry for the delay. A delivery credit has been applied."},
    {"instruction": "Wrong item received", "reference": "We’ll ship the correct item and provide a return label."},
]

# -----------------------------
# Evaluation loop
# -----------------------------
for i, example in enumerate(eval_data, 1):
    prompt = f"### Instruction:\n{example['instruction']}\n\n### Response:"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=50, pad_token_id=tokenizer.eos_token_id)
    generated = tokenizer.decode(output_ids[0], skip_special_tokens=True)

    print(f"Example {i}")
    print("Instruction:", example["instruction"])
    print("Generated Response:", generated.split("### Response:")[-1].strip())
    print("Reference Response:", example["reference"])
    print("-" * 50)

# -----------------------------
# Optional: compute a simple BLEU score (requires: pip install nltk)
# -----------------------------
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

smoother = SmoothingFunction().method1  # avoids zero scores on short responses

bleu_scores = []
for example in eval_data:
    prompt = f"### Instruction:\n{example['instruction']}\n\n### Response:"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=50, pad_token_id=tokenizer.eos_token_id)
    generated = tokenizer.decode(output_ids[0], skip_special_tokens=True).split("### Response:")[-1].strip()

    reference_tokens = example["reference"].split()
    generated_tokens = generated.split()
    bleu_scores.append(sentence_bleu([reference_tokens], generated_tokens, smoothing_function=smoother))

print("Average BLEU score:", sum(bleu_scores) / len(bleu_scores))
```

---

## Training Script

The training script performs the following steps:

1. Loads the **Phi-2 base model**.
2. Creates a **synthetic dataset** of instruction-response pairs.
3. Tokenizes and formats the dataset for causal language modeling.
4. Applies a **LoRA adapter**.
5. Trains using **QLoRA** if GPU is available, otherwise full-precision LoRA on CPU.
6. Saves the adapter and tokenizer to `./phi2-qlora`.
7. Pushes the adapter and tokenizer to Hugging Face Hub.

### Requirements

```bash
pip install torch transformers peft datasets bitsandbytes huggingface_hub python-dotenv
```

---

## Parameters

* `r=8`, `lora_alpha=16`, `lora_dropout=0.05`
* `target_modules=["q_proj","v_proj"]` (adjust for different base models)
* Learning rate: `2e-4`
* Batch si