haeylee committed · verified
Commit d225510 · 1 Parent(s): 0ff7f54

Update README.md

Files changed (1): README.md (+75 -1)

README.md CHANGED
@@ -16,6 +16,80 @@ tags:
  - self-supervised-learning
  - pronunciation-assessment
  - speech
+ - wav2vec2
+ - hubert
+ - wavlm
+ - ctc
+ - regression
+ - feature-extraction
+ datasets:
+ - openslr/speechocean762
  metrics:
  - pearsonr
- ---
+ ---
+
+ # SSL-FT-PRON: Fine-tuned SSL Models for Automatic Pronunciation Assessment (APA)
+
+ A collection of fine-tuned **Self-Supervised Learning (SSL)** speech checkpoints (Wav2Vec2.0, HuBERT, WavLM) for **Automatic Pronunciation Assessment (APA)**.
+ Three strategies are provided per backbone:
+
+ - **CTC**: ASR-style head trained with CTC
+ - **Freeze**: CNN feature extractor frozen; the rest is fine-tuned
+ - **General**: no CTC head; a lightweight regression head predicts four APA scores (Accuracy, Fluency, Prosody, Total); see the sketch below
+
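+ As a rough illustration of the **General** setup (not necessarily the exact head stored in these checkpoints), the sketch below mean-pools an SSL encoder's hidden states and maps them to the four scores with a single linear layer; the pooling choice and the `facebook/wav2vec2-large-960h` backbone are placeholder assumptions.
+
+ ```python
+ import torch
+ from transformers import AutoModel
+
+ class APARegressionSketch(torch.nn.Module):
+     """Illustrative head: mean-pooled SSL hidden states -> 4 APA scores."""
+     def __init__(self, encoder_name="facebook/wav2vec2-large-960h"):
+         super().__init__()
+         self.encoder = AutoModel.from_pretrained(encoder_name)  # SSL backbone (example)
+         self.scorer = torch.nn.Linear(self.encoder.config.hidden_size, 4)
+
+     def forward(self, input_values):
+         hidden = self.encoder(input_values).last_hidden_state  # (batch, frames, dim)
+         pooled = hidden.mean(dim=1)                            # utterance-level vector
+         return self.scorer(pooled)                             # Accuracy, Fluency, Prosody, Total
+ ```
+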
+ > **Important:** This Hub repository is a *collection*. Each model lives in a **subdirectory**.
+ > Load with the full sub-path, e.g. `haeylee/ssl_ft_pron/wav2vec2/general/02_wav2vec2-large-960h` (see also the `subfolder` sketch below).
+
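+ If your `transformers` version rejects the concatenated path, a possible alternative (untested against this repository) is to pass the collection repo id and the subdirectory separately through the `subfolder` argument:
+
+ ```python
+ from transformers import AutoModel
+
+ # Assumption: `subfolder` points at one sub-checkpoint of the collection.
+ model = AutoModel.from_pretrained(
+     "haeylee/ssl_ft_pron",
+     subfolder="wav2vec2/general/02_wav2vec2-large-960h",
+ )
+ ```
+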
+ ---
+
+ ## Model Details
+
+ - **Developed by:** Haeyoung Lee (haeylee)
+ - **Affiliation (paper):** Seoul National University, SNU Spoken Language Processing Lab
+ - **Model type:** SSL speech encoders fine-tuned for APA (CTC / General / Freeze variants)
+ - **Language(s):** English (evaluated on Speechocean762)
+ - **License:** *TBD by author*
+ - **Finetuned from:** See `base_model` list above
+
+ ### Model Sources
+ - **Code:** https://github.com/hy310/ssl_finetuning
+ - **Paper:** *Analysis of Various Self-Supervised Learning Models for Automatic Pronunciation Assessment (APSIPA ASC 2024)*
+
+ ---
+
+ ## Uses
+
+ ### Direct Use
+ - Research and prototyping for **pronunciation scoring** and **feature analysis** on read English speech.
+ - As encoders for downstream APA tasks, analytics, or visualization (e.g., PCA of hidden states; see the sketch below).
+
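+ For the hidden-state visualization mentioned above, a minimal sketch follows; it assumes a locally downloaded sub-checkpoint (placeholder path) and uses scikit-learn's PCA, which is not part of this repository:
+
+ ```python
+ import numpy as np
+ import torch
+ from sklearn.decomposition import PCA
+ from transformers import AutoFeatureExtractor, AutoModel
+
+ ckpt = "./02_wav2vec2-large-960h"  # placeholder: a locally downloaded sub-checkpoint
+ extractor = AutoFeatureExtractor.from_pretrained(ckpt)
+ model = AutoModel.from_pretrained(ckpt).eval()
+
+ # Replace the dummy waveform with real 16 kHz mono speech.
+ waveform = np.random.randn(16000).astype("float32")
+ inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")
+
+ with torch.no_grad():
+     hidden = model(**inputs).last_hidden_state.squeeze(0)  # (frames, dim)
+
+ # Project frame-level features to 2-D for plotting.
+ coords = PCA(n_components=2).fit_transform(hidden.numpy())
+ ```
+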
+ ### Downstream Use
+ - Integrate APA scores into CALL (Computer-Assisted Language Learning) or assessment tools.
+ - Use CTC variants for ASR-aligned pipelines; use General/Freeze variants for score regression.
+
+ ### Out-of-Scope Use
+ - Non-English targets without adaptation.
+ - High-stakes assessment without proper validation, calibration, and fairness checks.
+
+ ---
+
+ ## Bias, Risks, and Limitations
+
+ - Trained/evaluated on **Speechocean762** (read English speech by L2 speakers). May not generalize to spontaneous speech, other accents/languages, or noisy conditions.
+ - APA involves subjective human judgments; ensure careful calibration and validation on your domain.
+ - Consider privacy/consent when handling speech data.
+
+ **Recommendation:** Validate on in-domain data and monitor subgroup performance.
+
+ ---
+
+ ## How to Get Started
+
+ ### A) CTC models (with CTC head)
+ ```python
+ from transformers import AutoModelForCTC, AutoProcessor
+
+ ckpt = "haeylee/ssl_ft_pron/wav2vec2/ctc/01_wav2vec2-large"  # pick your subdir
+ model = AutoModelForCTC.from_pretrained(ckpt)
+ processor = AutoProcessor.from_pretrained(ckpt)
+