# KDLLM Student Model: Distilled Lightweight BERT
This repository hosts the Student Model distilled under the KDLLM framework proposed in:
KDLLM: Knowledge Distillation for Compressed and Copyright-Safe Large Language Model Sharing
Shiva Shrestha et al.
Tsinghua Science and Technology, 2025
Manuscript ID: TST-2025-0253
## Overview
The KDLLM student model is a compact distilled version of the fine-tuned teacher model `sh7vashrestha/BertBaseUncased-SenetimentAnalysis` (based on `bert-base-uncased`). It preserves most of the teacher's performance while significantly reducing model size and computational requirements. The student was trained with behavioral knowledge distillation, combining:
- soft-label distillation against the teacher's output distribution via Kullback-Leibler divergence (KLD), and
- ground-truth supervised learning via cross-entropy loss (see the sketch below).
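Conceptually, the combined objective looks like the following minimal sketch; the `temperature` and `alpha` values here are illustrative placeholders, not the settings reported in the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Combine soft-label KLD distillation with hard-label cross-entropy.

    `temperature` and `alpha` are hypothetical hyperparameters for illustration.
    """
    # Soft targets: KL divergence between temperature-scaled distributions
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kld = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2

    # Hard targets: standard cross-entropy against the ground-truth labels
    ce = F.cross_entropy(student_logits, labels)

    return alpha * kld + (1 - alpha) * ce
```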
## Model Architecture
| Component | Configuration |
|---|---|
| Layers | 4 |
| Hidden Size | 312 |
| Attention Heads | 6 |
| Feedforward Size | 1024 |
| Dropout | 0.2 |
| Parameters | ~13.9M |
| File Size | ~54MB |
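For reference, the table above can be reconstructed with the Transformers `BertConfig` API as in the sketch below. This is only an assumption that the student follows the standard BERT layout; the released checkpoint ships with its own authoritative `config.json`.

```python
from transformers import BertConfig, BertForSequenceClassification

# Illustrative student configuration matching the table above (assumption:
# standard BERT layout; the checkpoint's config.json is authoritative).
student_config = BertConfig(
    num_hidden_layers=4,
    hidden_size=312,
    num_attention_heads=6,
    intermediate_size=1024,
    hidden_dropout_prob=0.2,
    attention_probs_dropout_prob=0.2,
    num_labels=2,
)
student = BertForSequenceClassification(student_config)
print(f"{sum(p.numel() for p in student.parameters()) / 1e6:.1f}M parameters")  # ≈ 13.9M
```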
## Performance Summary
| Distillation Loss | Accuracy | F1 Score |
|---|---|---|
| KLD | 86.30% | 86.23% |
- Dataset: IMDb sentiment classification (binary, 50k samples)
- Compression ratio: ~7.74× smaller than the teacher (see the parameter-count check below)
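The compression ratio can be sanity-checked by counting parameters of both checkpoints, as in the sketch below (model IDs as listed on this card; both downloads are required).

```python
from transformers import AutoModelForSequenceClassification

# Rough sanity check of the reported compression ratio via parameter counts.
teacher = AutoModelForSequenceClassification.from_pretrained(
    "sh7vashrestha/BertBaseUncased-SenetimentAnalysis")
student = AutoModelForSequenceClassification.from_pretrained(
    "sh7vashrestha/Bert_Small_13M_SentimentalAnalysis")

n_teacher = sum(p.numel() for p in teacher.parameters())
n_student = sum(p.numel() for p in student.parameters())
print(f"compression: {n_teacher / n_student:.2f}x")  # ~7.7x per the summary above
```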
## Inference Example
```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

model_id = "sh7vashrestha/Bert_Small_13M_SentimentalAnalysis"
tokenizer = BertTokenizer.from_pretrained(model_id)
model = BertForSequenceClassification.from_pretrained(model_id)
model.eval()

# Tokenize a single review and run a forward pass without gradients
inputs = tokenizer("I absolutely loved the movie!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring class and map it to a human-readable label
prediction = outputs.logits.argmax(dim=-1)
label_mapping = {0: "negative", 1: "positive"}
print(label_mapping[prediction.item()])  # expected: "positive"
```
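Alternatively, the same checkpoint can be used through the `text-classification` pipeline, as sketched below. Note that the displayed label names depend on the checkpoint's config and may appear as `LABEL_0` / `LABEL_1`.

```python
from transformers import pipeline

# Load the student checkpoint through the high-level pipeline API
classifier = pipeline(
    "text-classification",
    model="sh7vashrestha/Bert_Small_13M_SentimentalAnalysis",
)
print(classifier(["I absolutely loved the movie!", "It was a waste of time."]))
```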