KDLLM Student Model: Distilled Lightweight BERT

This repository hosts the Student Model distilled under the KDLLM framework proposed in:

KDLLM: Knowledge Distillation for Compressed and Copyright-Safe Large Language Model Sharing
Shiva Shrestha et al.
Tsinghua Science and Technology, 2025
Manuscript ID: TST-2025-0253


πŸ“„ Overview

The KDLLM Student model is a compact, distilled version of the fine-tuned teacher model bert-base-uncased (sh7vashrestha/BertBaseUncased-SenetimentAnalysis). It preserves most of the teacher's performance while significantly reducing model size and computational requirements. The student was trained with behavioral knowledge distillation, combining:

  • Soft-label distillation against the teacher's output distribution using the Kullback-Leibler divergence (KLD)
  • Ground-truth supervision with the standard cross-entropy loss (a combined-loss sketch follows this list)
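
For illustration, here is a minimal sketch of how these two objectives are commonly combined in knowledge distillation. The temperature and weighting values are assumptions for the example, not the settings used in the paper.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Combine soft-label KLD distillation with hard-label cross-entropy.

    `temperature` and `alpha` are illustrative hyperparameters, not the
    values reported in the KDLLM paper.
    """
    # Soft targets: KLD between temperature-scaled teacher and student distributions
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kld = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature**2

    # Hard targets: standard cross-entropy against the ground-truth labels
    ce = F.cross_entropy(student_logits, labels)

    # Weighted sum of the two objectives
    return alpha * kld + (1.0 - alpha) * ce
```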

πŸ”§ Model Architecture

| Component        | Configuration |
|------------------|---------------|
| Layers           | 4             |
| Hidden Size      | 312           |
| Attention Heads  | 6             |
| Feedforward Size | 1024          |
| Dropout          | 0.2           |
| Parameters       | ~13.9M        |
| File Size        | ~54 MB        |
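
The same dimensions can be expressed as a Hugging Face BertConfig. The sketch below is illustrative only and assumes library defaults for any setting not listed in the table.

```python
from transformers import BertConfig, BertForSequenceClassification

# Student configuration mirroring the table above (unlisted settings left at defaults)
config = BertConfig(
    num_hidden_layers=4,        # Layers
    hidden_size=312,            # Hidden Size
    num_attention_heads=6,      # Attention Heads
    intermediate_size=1024,     # Feedforward Size
    hidden_dropout_prob=0.2,    # Dropout
    num_labels=2,               # binary sentiment classification
)

model = BertForSequenceClassification(config)
print(sum(p.numel() for p in model.parameters()))  # roughly 13-14M parameters
```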

πŸ“Š Performance Summary

| Distillation Loss | Accuracy | F1 Score |
|-------------------|----------|----------|
| KLD               | 86.30%   | 86.23%   |

  • Dataset: IMDb sentiment classification (binary, 50k samples)
  • Compression ratio: ~7.74× smaller than the teacher
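
As a rough guide, the sketch below shows one way such metrics could be computed on the IMDb test split. The split choice, batch size, and maximum sequence length are assumptions, not the paper's exact evaluation protocol.

```python
import torch
from datasets import load_dataset
from sklearn.metrics import accuracy_score, f1_score
from transformers import BertForSequenceClassification, BertTokenizer

model_id = "sh7vashrestha/Bert_Small_13M_SentimentalAnalysis"
tokenizer = BertTokenizer.from_pretrained(model_id)
model = BertForSequenceClassification.from_pretrained(model_id).eval()

dataset = load_dataset("imdb", split="test")  # assumed evaluation split

predictions = []
with torch.no_grad():
    for i in range(0, len(dataset), 32):
        batch = dataset[i : i + 32]
        inputs = tokenizer(batch["text"], padding=True, truncation=True,
                           max_length=256, return_tensors="pt")
        logits = model(**inputs).logits
        predictions.extend(logits.argmax(dim=-1).tolist())

print("Accuracy:", accuracy_score(dataset["label"], predictions))
print("F1:", f1_score(dataset["label"], predictions))
```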

πŸš€ Inference Example

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Load the distilled student model and its tokenizer
tokenizer = BertTokenizer.from_pretrained("sh7vashrestha/Bert_Small_13M_SentimentalAnalysis")
model = BertForSequenceClassification.from_pretrained("sh7vashrestha/Bert_Small_13M_SentimentalAnalysis")

# Tokenize a sample review and run a forward pass
inputs = tokenizer("I absolutely loved the movie!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Map the predicted class index to its sentiment label
prediction = outputs.logits.argmax(dim=-1)
label_mapping = {0: "negative", 1: "positive"}
print(label_mapping[prediction.item()])
```