haeylee committed · verified
Commit d225510 · 1 Parent(s): 0ff7f54

Update README.md

Files changed (1): README.md (+75 -1)

README.md CHANGED
@@ -16,6 +16,80 @@ tags:
  - self-supervised-learning
  - pronunciation-assessment
  - speech
+ - wav2vec2
+ - hubert
+ - wavlm
+ - ctc
+ - regression
+ - feature-extraction
+ datasets:
+ - openslr/speechocean762
  metrics:
  - pearsonr
- ---
+ ---
+
+ # SSL-FT-PRON: Fine-tuned SSL Models for Automatic Pronunciation Assessment (APA)
+
+ A collection of fine-tuned **Self-Supervised Learning (SSL)** speech checkpoints (Wav2Vec2.0, HuBERT, WavLM) for **Automatic Pronunciation Assessment (APA)**.
+ Three strategies are provided per backbone:
+
+ - **CTC**: ASR-style head trained with CTC
+ - **Freeze**: CNN feature extractor frozen; the rest is fine-tuned
+ - **General**: no CTC head; a lightweight regression head predicts four APA scores (Accuracy, Fluency, Prosody, Total); see the sketch below
+
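+ As a rough illustration of the **General** setup (not necessarily the exact head stored in these checkpoints), the sketch below mean-pools an SSL encoder's hidden states and maps them to the four scores with a single linear layer; the pooling choice and the `facebook/wav2vec2-large-960h` backbone are placeholder assumptions.
+
+ ```python
+ import torch
+ from transformers import AutoModel
+
+ class APARegressionSketch(torch.nn.Module):
+     """Illustrative head: mean-pooled SSL hidden states -> 4 APA scores."""
+     def __init__(self, encoder_name="facebook/wav2vec2-large-960h"):
+         super().__init__()
+         self.encoder = AutoModel.from_pretrained(encoder_name)  # SSL backbone (example)
+         self.scorer = torch.nn.Linear(self.encoder.config.hidden_size, 4)
+
+     def forward(self, input_values):
+         hidden = self.encoder(input_values).last_hidden_state  # (batch, frames, dim)
+         pooled = hidden.mean(dim=1)                            # utterance-level vector
+         return self.scorer(pooled)                             # Accuracy, Fluency, Prosody, Total
+ ```
+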
+ > **Important:** This Hub repository is a *collection*. Each model lives in a **subdirectory**.
+ > Load with the full sub-path, e.g. `haeylee/ssl_ft_pron/wav2vec2/general/02_wav2vec2-large-960h` (see also the `subfolder` sketch below).
+
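+ If your `transformers` version rejects the concatenated path, a possible alternative (untested against this repository) is to pass the collection repo id and the subdirectory separately through the `subfolder` argument:
+
+ ```python
+ from transformers import AutoModel
+
+ # Assumption: `subfolder` points at one sub-checkpoint of the collection.
+ model = AutoModel.from_pretrained(
+     "haeylee/ssl_ft_pron",
+     subfolder="wav2vec2/general/02_wav2vec2-large-960h",
+ )
+ ```
+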
+ ---
+
+ ## Model Details
+
+ - **Developed by:** Haeyoung Lee (haeylee)
+ - **Affiliation (paper):** Seoul National University, SNU Spoken Language Processing Lab
+ - **Model type:** SSL speech encoders fine-tuned for APA (CTC / General / Freeze variants)
+ - **Language(s):** English (evaluated on Speechocean762)
+ - **License:** *TBD by author*
+ - **Finetuned from:** See `base_model` list above
+
+ ### Model Sources
+ - **Code:** https://github.com/hy310/ssl_finetuning
+ - **Paper:** *Analysis of Various Self-Supervised Learning Models for Automatic Pronunciation Assessment (APSIPA ASC 2024)*
+
+ ---
+
+ ## Uses
+
+ ### Direct Use
+ - Research and prototyping for **pronunciation scoring** and **feature analysis** on read English speech.
+ - As encoders for downstream APA tasks, analytics, or visualization (e.g., PCA of hidden states; see the sketch below).
+
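+ For the hidden-state visualization mentioned above, a minimal sketch follows; it assumes a locally downloaded sub-checkpoint (placeholder path) and uses scikit-learn's PCA, which is not part of this repository:
+
+ ```python
+ import numpy as np
+ import torch
+ from sklearn.decomposition import PCA
+ from transformers import AutoFeatureExtractor, AutoModel
+
+ ckpt = "./02_wav2vec2-large-960h"  # placeholder: a locally downloaded sub-checkpoint
+ extractor = AutoFeatureExtractor.from_pretrained(ckpt)
+ model = AutoModel.from_pretrained(ckpt).eval()
+
+ # Replace the dummy waveform with real 16 kHz mono speech.
+ waveform = np.random.randn(16000).astype("float32")
+ inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")
+
+ with torch.no_grad():
+     hidden = model(**inputs).last_hidden_state.squeeze(0)  # (frames, dim)
+
+ # Project frame-level features to 2-D for plotting.
+ coords = PCA(n_components=2).fit_transform(hidden.numpy())
+ ```
+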
+ ### Downstream Use
+ - Integrate APA scores into CALL (Computer-Assisted Language Learning) or assessment tools.
+ - Use CTC variants for ASR-aligned pipelines; use General/Freeze variants for score regression.
+
+ ### Out-of-Scope Use
+ - Non-English targets without adaptation.
+ - High-stakes assessment without proper validation, calibration, and fairness checks.
+
+ ---
+
+ ## Bias, Risks, and Limitations
+
+ - Trained/evaluated on **Speechocean762** (read English speech by L2 speakers). May not generalize to spontaneous speech, other accents/languages, or noisy conditions.
+ - APA involves subjective human judgments; ensure careful calibration and validation on your domain.
+ - Consider privacy/consent when handling speech data.
+
+ **Recommendation:** Validate on in-domain data and monitor subgroup performance.
+
+ ---
+
+ ## How to Get Started
+
+ ### A) CTC models (with CTC head)
+ ```python
+ from transformers import AutoModelForCTC, AutoProcessor
+
+ ckpt = "haeylee/ssl_ft_pron/wav2vec2/ctc/01_wav2vec2-large"  # pick your subdir
+ model = AutoModelForCTC.from_pretrained(ckpt)
+ processor = AutoProcessor.from_pretrained(ckpt)
+