---
license: mit
tags:
- causal-lm
- instruction-following
- lora
- qlora
- quantized
language: en
library_name: transformers
base_model: microsoft/phi-2
---
# Phi-2 QLoRA Fine-Tuned Model
**Model:** `mishrabp/phi2-custom-response-qlora-adapter`
**Base Model:** [`microsoft/phi-2`](https://huggingface.co/microsoft/phi-2)
**Fine-Tuning Method:** QLoRA (4-bit quantized LoRA)
**Task:** Instruction-following / Customer Support Responses
---
## Model Description
This repository contains a **Phi-2 LoRA adapter fine-tuned with QLoRA** on a synthetic dataset of customer support instructions and responses. QLoRA trains **LoRA adapters on top of a 4-bit quantized base model**, which keeps memory requirements low during training; inference can run on a GPU or a CPU (slower on CPU).
The model is designed for **instruction-following tasks** like customer support, FAQs, or other dialog generation tasks.
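For memory-efficient inference, the base model can be loaded in 4-bit before attaching the adapter. Below is a minimal sketch, assuming a CUDA GPU and the `bitsandbytes` package (on CPU, load the base model in full precision instead, as shown in the How to Use section):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization (requires a CUDA GPU and the bitsandbytes package)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# Attach the LoRA adapter from this repository
model = PeftModel.from_pretrained(base_model, "mishrabp/phi2-custom-response-qlora-adapter")
```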
---
## Training Data
The fine-tuning dataset is synthetic, consisting of 3000 instruction-response pairs:
**Example:**
```text
Instruction: "Customer asks about refund window #1"
Response: "Our refund window is 30 days from delivery."
```
Here is the dataset that was used for fine-tuning:
https://huggingface.co/datasets/mishrabp/customer-support-responses/resolve/main/train.csv
You can replace the dataset with your own CSV/JSON file to train on real-world data.
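For example, a custom file can be loaded with the `datasets` library. The sketch below assumes the CSV has `instruction` and `response` columns, matching the example above:

```python
from datasets import load_dataset

# Point data_files at the hosted CSV or at your own CSV/JSON file
dataset = load_dataset(
    "csv",
    data_files="https://huggingface.co/datasets/mishrabp/customer-support-responses/resolve/main/train.csv",
)["train"]

print(dataset[0])  # expected keys (assumed): "instruction", "response"
```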
---
## Intended Use
* Generate responses to instructions in customer support scenarios.
* Small-scale instruction-following experiments.
* Educational or research purposes.
---
## How to Use
### Load the Fine-Tuned Model
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
# -----------------------------
# Load fine-tuned model from HF
# -----------------------------
model_name = "mishrabp/phi2-custom-response-qlora-adapter"
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
model = PeftModel.from_pretrained(base_model, model_name)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
# -----------------------------
# Sample evaluation dataset
# -----------------------------
eval_data = [
    {"instruction": "Customer asks about refund window", "reference": "Our refund window is 30 days from delivery."},
    {"instruction": "Order arrived late", "reference": "Sorry for the delay. A delivery credit has been applied."},
    {"instruction": "Wrong item received", "reference": "We’ll ship the correct item and provide a return label."},
]

# -----------------------------
# Evaluation loop
# -----------------------------
for i, example in enumerate(eval_data, 1):
    prompt = f"### Instruction:\n{example['instruction']}\n\n### Response:"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=50)
    generated = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    print(f"Example {i}")
    print("Instruction:", example["instruction"])
    print("Generated Response:", generated.split("### Response:")[-1].strip())
    print("Reference Response:", example["reference"])
    print("-" * 50)

# -----------------------------
# Optional: compute a simple BLEU score
# (requires: pip install nltk)
# -----------------------------
from nltk.translate.bleu_score import sentence_bleu

bleu_scores = []
for example in eval_data:
    prompt = f"### Instruction:\n{example['instruction']}\n\n### Response:"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=50)
    generated = tokenizer.decode(output_ids[0], skip_special_tokens=True).split("### Response:")[-1].strip()
    reference_tokens = example["reference"].split()
    generated_tokens = generated.split()
    bleu = sentence_bleu([reference_tokens], generated_tokens)
    bleu_scores.append(bleu)

print("Average BLEU score:", sum(bleu_scores) / len(bleu_scores))
```
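If you prefer a standalone checkpoint without a PEFT dependency at inference time, the adapter can be merged into the base weights. A minimal sketch (merging requires the base model to be loaded in full precision, as above, and produces a full-size Phi-2 checkpoint):

```python
# Merge the LoRA adapter into the base weights and save a standalone model
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./phi2-merged")
tokenizer.save_pretrained("./phi2-merged")
```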
---
## Training Script
The training script performs the following steps (illustrative sketches of the core configuration and the Hub upload follow below):
1. Loads the **Phi-2 base model**.
2. Creates a **synthetic dataset** of instruction-response pairs.
3. Tokenizes and formats the dataset for causal language modeling.
4. Applies a **LoRA adapter**.
5. Trains using **QLoRA** if GPU is available, otherwise full-precision LoRA on CPU.
6. Saves the adapter and tokenizer to `./phi2-qlora`.
7. Pushes the adapter and tokenizer to Hugging Face Hub.
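The core configuration roughly corresponds to the sketch below. The LoRA hyperparameters and learning rate come from the Parameters section; the batch size, epoch count, and dataset wiring are illustrative assumptions:

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments, Trainer)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

use_gpu = torch.cuda.is_available()

# Step 5: 4-bit quantization only when a GPU is available
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
) if use_gpu else None

# Step 1: base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=bnb_config,
    device_map="auto" if use_gpu else None,
)
if use_gpu:
    base_model = prepare_model_for_kbit_training(base_model)

# Step 4: LoRA adapter (values from the Parameters section)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)

# Steps 5-6: train and save (batch size / epochs are illustrative values)
training_args = TrainingArguments(
    output_dir="./phi2-qlora",
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    num_train_epochs=1,
)
# trainer = Trainer(model=model, args=training_args, train_dataset=tokenized_dataset)
# trainer.train()
# model.save_pretrained("./phi2-qlora"); tokenizer.save_pretrained("./phi2-qlora")
```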
### Requirements
```bash
pip install torch transformers peft datasets bitsandbytes accelerate huggingface_hub python-dotenv
```
`bitsandbytes` and `accelerate` are required for 4-bit (QLoRA) loading on GPU; install `nltk` as well if you want to run the optional BLEU evaluation above.
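Step 7 (pushing to the Hub) typically reads an access token from a `.env` file via `python-dotenv`. A minimal sketch, continuing from the training sketch above and assuming an environment variable named `HF_TOKEN` (the variable name is an assumption):

```python
import os
from dotenv import load_dotenv
from huggingface_hub import login

load_dotenv()                        # reads .env from the working directory
login(token=os.getenv("HF_TOKEN"))   # HF_TOKEN is an assumed variable name

# Push the trained adapter and tokenizer to the Hub
model.push_to_hub("mishrabp/phi2-custom-response-qlora-adapter")
tokenizer.push_to_hub("mishrabp/phi2-custom-response-qlora-adapter")
```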
---
## Parameters
* `r=8`, `lora_alpha=16`, `lora_dropout=0.05`
* `target_modules=["q_proj","v_proj"]` (adjust for different base models)
* Learning rate: `2e-4`
* Batch si