DeepSeek PHI De-identification Adapter

This repository hosts a LoRA adapter fine-tuned for safe detection and redaction of Protected Health Information (PHI) in clinical text.

The model is trained on a large synthetic and de-identified corpus derived from MIMIC-III-style clinical notes and is designed to operate as part of a configurable, explainable medical text de-identification pipeline.
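As a rough illustration of the redaction step in such a pipeline, the sketch below replaces detected PHI spans with category placeholders. The `PhiSpan` type, the `redact` function, and the label names are hypothetical and are not part of this repository. The sketch assumes the detection step yields non-overlapping character offsets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhiSpan:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # hypothetical category name, e.g. "NAME" or "DATE"


def redact(text: str, spans: list[PhiSpan]) -> str:
    """Replace each detected span with a [LABEL] placeholder.

    Assumes spans do not overlap.
    """
    out = text
    # Apply replacements right to left so earlier offsets stay valid.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        out = out[:span.start] + f"[{span.label}]" + out[span.end:]
    return out


note = "John Doe was admitted on 2019-03-14."
spans = [PhiSpan(0, 8, "NAME"), PhiSpan(25, 35, "DATE")]
print(redact(note, spans))  # [NAME] was admitted on [DATE].
```

Keeping the label in the placeholder is one way to make the output auditable, since a reviewer can see which category triggered each redaction.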


Model Details

  • Developed by: Iftakhar Khandokar (Marquette University)
  • Funded by: Academic research (EECE Department, Marquette University)
  • Shared by: Iftakhar Khandokar
  • Model type: LoRA adapter (PEFT)
  • Base model: deepseek-ai/deepseek-llm-7b-base
  • Language: English (clinical / biomedical NLP)
  • License: Apache 2.0

Intended Use

This adapter is intended for:

✅ Research on medical data de-identification
✅ Benchmarking privacy-preserving NLP pipelines
✅ Safety and explainability evaluation for clinical LLM workflows


Not Intended For

❌ Automated medical diagnosis
❌ Direct patient care deployment without regulatory review
❌ Generating synthetic patient records for real-world use


Loading the Model

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model that the adapter was trained on.
base = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-llm-7b-base", trust_remote_code=True
)

# Attach the LoRA adapter weights to the base model.
model = PeftModel.from_pretrained(
    base,
    "Iftakhar/deepseek-phi-adapter"
)
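
Once the adapter is loaded, inference could look roughly like the sketch below. The prompt template in `build_prompt` is an assumption made for illustration. This card does not document the prompt format used during fine-tuning, so adjust it to match the adapter's training data. The helper names `build_prompt` and `deidentify` are likewise hypothetical.

```python
def build_prompt(note: str) -> str:
    # Hypothetical instruction template; the adapter's actual training
    # prompt is not documented in this card.
    return (
        "Redact all Protected Health Information (PHI) "
        "in the clinical note below.\n\n"
        f"Note:\n{note}\n\nRedacted:\n"
    )


def deidentify(model, tokenizer, note: str, max_new_tokens: int = 256) -> str:
    """Generate a redacted version of `note` with greedy decoding."""
    import torch  # imported lazily so build_prompt works without torch

    prompt = build_prompt(note)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


# Usage, assuming `model` is the PeftModel loaded above:
#   from transformers import AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-llm-7b-base")
#   print(deidentify(model, tokenizer, "Pt seen by Dr. Smith on 03/14/2019."))
```

Greedy decoding (`do_sample=False`) is used here so that repeated runs on the same note give the same output, which makes benchmarking easier.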