# KDLLM Student Model: Distilled Lightweight BERT
This repository hosts the Student Model distilled under the KDLLM framework proposed in:
KDLLM: Knowledge Distillation for Compressed and Copyright-Safe Large Language Model Sharing
Shiva Shrestha et al.
Tsinghua Science and Technology, 2025
Manuscript ID: TST-2025-0253
## Overview
The KDLLM student model is a compact distilled version of the fine-tuned teacher model `sh7vashrestha/BertBaseUncased-SenetimentAnalysis` (based on `bert-base-uncased`). It preserves most of the teacher's performance while significantly reducing model size and computational requirements. The student was trained with behavioral knowledge distillation, combining:
- soft-label distillation against the teacher's output distribution via Kullback-Leibler divergence (KLD), and
- ground-truth supervised learning via cross-entropy loss (see the sketch below).
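Conceptually, the combined objective looks like the following minimal sketch; the `temperature` and `alpha` values here are illustrative placeholders, not the settings reported in the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Combine soft-label KLD distillation with hard-label cross-entropy.

    `temperature` and `alpha` are hypothetical hyperparameters for illustration.
    """
    # Soft targets: KL divergence between temperature-scaled distributions
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kld = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2

    # Hard targets: standard cross-entropy against the ground-truth labels
    ce = F.cross_entropy(student_logits, labels)

    return alpha * kld + (1 - alpha) * ce
```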
## Model Architecture
| Component | Configuration |
|---|---|
| Layers | 4 |
| Hidden Size | 312 |
| Attention Heads | 6 |
| Feedforward Size | 1024 |
| Dropout | 0.2 |
| Parameters | ~13.9M |
| File Size | ~54MB |
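For reference, the table above can be reconstructed with the Transformers `BertConfig` API as in the sketch below. This is only an assumption that the student follows the standard BERT layout; the released checkpoint ships with its own authoritative `config.json`.

```python
from transformers import BertConfig, BertForSequenceClassification

# Illustrative student configuration matching the table above (assumption:
# standard BERT layout; the checkpoint's config.json is authoritative).
student_config = BertConfig(
    num_hidden_layers=4,
    hidden_size=312,
    num_attention_heads=6,
    intermediate_size=1024,
    hidden_dropout_prob=0.2,
    attention_probs_dropout_prob=0.2,
    num_labels=2,
)
student = BertForSequenceClassification(student_config)
print(f"{sum(p.numel() for p in student.parameters()) / 1e6:.1f}M parameters")  # ≈ 13.9M
```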
## Performance Summary
| Distillation Loss | Accuracy | F1 Score |
|---|---|---|
| KLD | 86.30% | 86.23% |
- Dataset: IMDb sentiment classification (binary, 50k samples)
- Compression ratio: ~7.74× smaller than the teacher (see the parameter-count check below)
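The compression ratio can be sanity-checked by counting parameters of both checkpoints, as in the sketch below (model IDs as listed on this card; both downloads are required).

```python
from transformers import AutoModelForSequenceClassification

# Rough sanity check of the reported compression ratio via parameter counts.
teacher = AutoModelForSequenceClassification.from_pretrained(
    "sh7vashrestha/BertBaseUncased-SenetimentAnalysis")
student = AutoModelForSequenceClassification.from_pretrained(
    "sh7vashrestha/Bert_Small_13M_SentimentalAnalysis")

n_teacher = sum(p.numel() for p in teacher.parameters())
n_student = sum(p.numel() for p in student.parameters())
print(f"compression: {n_teacher / n_student:.2f}x")  # ~7.7x per the summary above
```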
## Inference Example
```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

model_id = "sh7vashrestha/Bert_Small_13M_SentimentalAnalysis"
tokenizer = BertTokenizer.from_pretrained(model_id)
model = BertForSequenceClassification.from_pretrained(model_id)
model.eval()

# Tokenize a single review and run a forward pass without gradients
inputs = tokenizer("I absolutely loved the movie!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring class and map it to a human-readable label
prediction = outputs.logits.argmax(dim=-1)
label_mapping = {0: "negative", 1: "positive"}
print(label_mapping[prediction.item()])  # expected: "positive"
```
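Alternatively, the same checkpoint can be used through the `text-classification` pipeline, as sketched below. Note that the displayed label names depend on the checkpoint's config and may appear as `LABEL_0` / `LABEL_1`.

```python
from transformers import pipeline

# Load the student checkpoint through the high-level pipeline API
classifier = pipeline(
    "text-classification",
    model="sh7vashrestha/Bert_Small_13M_SentimentalAnalysis",
)
print(classifier(["I absolutely loved the movie!", "It was a waste of time."]))
```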