Taejin committed
Commit 0d9aba6 · 1 Parent(s): 47c1dbf

Updated DER values in the model-index metadata at the beginning of the README


Signed-off-by: taejinp <[email protected]>

Files changed (1)
  1. private_README.md +107 -15
private_README.md CHANGED
@@ -34,7 +34,7 @@ widget:
  - example_title: Librispeech sample 2
  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
  model-index:
- - name: diar_streaming_sortformer_4spk-v2
  results:
  - task:
  name: Speaker Diarization
@@ -48,7 +48,7 @@ model-index:
  metrics:
  - name: Test DER
  type: der
- value: 13.24
  - task:
  name: Speaker Diarization
  type: speaker-diarization-with-post-processing
@@ -61,7 +61,7 @@ model-index:
  metrics:
  - name: Test DER
  type: der
- value: 42.56
  - task:
  name: Speaker Diarization
  type: speaker-diarization-with-post-processing
@@ -74,7 +74,7 @@ model-index:
  metrics:
  - name: Test DER
  type: der
- value: 18.91
  - task:
  name: Speaker Diarization
  type: speaker-diarization-with-post-processing
@@ -87,7 +87,7 @@ model-index:
  metrics:
  - name: Test DER
  type: der
- value: 6.57
  - task:
  name: Speaker Diarization
  type: speaker-diarization-with-post-processing
@@ -100,7 +100,7 @@ model-index:
  metrics:
  - name: Test DER
  type: der
- value: 10.05
  - task:
  name: Speaker Diarization
  type: speaker-diarization-with-post-processing
@@ -113,7 +113,7 @@ model-index:
  metrics:
  - name: Test DER
  type: der
- value: 12.44
  - task:
  name: Speaker Diarization
  type: speaker-diarization-with-post-processing
@@ -126,7 +126,7 @@ model-index:
  metrics:
  - name: Test DER
  type: der
- value: 21.68
  - task:
  name: Speaker Diarization
  type: speaker-diarization-with-post-processing
@@ -139,7 +139,7 @@ model-index:
  metrics:
  - name: Test DER
  type: der
- value: 28.74
  - task:
  name: Speaker Diarization
  type: speaker-diarization-with-post-processing
@@ -152,7 +152,7 @@ model-index:
  metrics:
  - name: Test DER
  type: der
- value: 10.70
  - task:
  name: Speaker Diarization
  type: speaker-diarization-with-post-processing
@@ -165,7 +165,98 @@ model-index:
  metrics:
  - name: Test DER
  type: der
- value: 4.88
  metrics:
  - der
  pipeline_tag: audio-classification
@@ -352,8 +443,8 @@ Sortformer diarizer models can be performed with post-processing algorithms usin

  ## Datasets

- Sortformer was trained on a combination of ???? hours of real conversations and 5150 hours or simulated audio mixtures generated by [NeMo speech data simulator](https://arxiv.org/abs/2310.12371)[7].
- All the datasets listed above are based on the same labeling method via [RTTM](https://web.archive.org/web/20100606092041if_/http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf) format. A subset of RTTM files used for model training are processed for the speaker diarization model training purposes.
  Data collection methods vary across individual datasets. For example, the above datasets include phone calls, interviews, web videos, and audiobook recordings. Please refer to the [Linguistic Data Consortium (LDC) website](https://www.ldc.upenn.edu/) or dataset webpage for detailed data collection methods.
@@ -405,7 +496,7 @@ Data collection methods vary across individual datasets. For example, the above
  * [Forced alignment based ground-truth RTTMs](https://github.com/nttcslab-sp/diar-forced-alignment)[8] are used for AMI and AliMeeting.


- ### Evaluation Results

  | **Model** | **Latency** | **DIHARD III Eval <=4spk** | **DIHARD III Eval >=5spk** | **DIHARD III Eval full** | **CALLHOME-part2 2spk** | **CALLHOME-part2 3spk** | **CALLHOME-part2 4spk** | **CALLHOME-part2 5spk** | **CALLHOME-part2 6spk** | **CALLHOME-part2 full** | **CH109** |
  |-----------------------------------------|-------------|----------------------------|----------------------------|--------------------------|-------------------------|-------------------------|-------------------------|-------------------------|-------------------------|-------------------------|-----------|
@@ -414,7 +505,7 @@ Data collection methods vary across individual datasets. For example, the above
  | diar_streaming_sortformer_4spk-v2 | 1.04s | 14.49 | 42.22 | 19.85 | 7.51 | 11.45 | 13.75 | 23.22 | 29.22 | 11.89 | 5.37 |
  | **diar_streaming_sortformer_4spk-v2.1** | 1.04s | 15.09 | 41.42 | 20.21 | 6.65 | 11.25 | 13.35 | 22.12 | 24.51 | 11.19 | 5.09 |

- ### Evaluation Results (Meeting Datasets)

  | **Model** | **Latency** | **AliMeeting Test near** | **AliMeeting Test far** | **AMI Test IHM** | **AMI Test SDM** | **NOTSOFAR1 Eval SC <=4spk** | **NOTSOFAR1 Eval SC >=5spk** | **NOTSOFAR1 Eval full** |
  |-----------------------------------------|-------------|--------------------------|-------------------------|------------------|------------------|------------------------------|------------------------------|-------------------------|
@@ -443,3 +534,4 @@ Data collection methods vary across individual datasets. For example, the above

  ## Licence
 
 
 
  - example_title: Librispeech sample 2
  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
  model-index:
+ - name: diar_streaming_sortformer_4spk-v2.1
  results:
  - task:
  name: Speaker Diarization

  metrics:
  - name: Test DER
  type: der
+ value: 15.09
  - task:
  name: Speaker Diarization
  type: speaker-diarization-with-post-processing

  metrics:
  - name: Test DER
  type: der
+ value: 41.42
  - task:
  name: Speaker Diarization
  type: speaker-diarization-with-post-processing

  metrics:
  - name: Test DER
  type: der
+ value: 20.21
  - task:
  name: Speaker Diarization
  type: speaker-diarization-with-post-processing

  metrics:
  - name: Test DER
  type: der
+ value: 6.65
  - task:
  name: Speaker Diarization
  type: speaker-diarization-with-post-processing

  metrics:
  - name: Test DER
  type: der
+ value: 11.25
  - task:
  name: Speaker Diarization
  type: speaker-diarization-with-post-processing

  metrics:
  - name: Test DER
  type: der
+ value: 13.35
  - task:
  name: Speaker Diarization
  type: speaker-diarization-with-post-processing

  metrics:
  - name: Test DER
  type: der
+ value: 22.12
  - task:
  name: Speaker Diarization
  type: speaker-diarization-with-post-processing

  metrics:
  - name: Test DER
  type: der
+ value: 24.51
  - task:
  name: Speaker Diarization
  type: speaker-diarization-with-post-processing

  metrics:
  - name: Test DER
  type: der
+ value: 11.19
  - task:
  name: Speaker Diarization
  type: speaker-diarization-with-post-processing

  metrics:
  - name: Test DER
  type: der
+ value: 5.09
+ - task:
+ name: Speaker Diarization
+ type: speaker-diarization-with-post-processing
+ dataset:
+ name: AliMeeting Test near
+ type: alimeeting-test-near
+ config: with_overlap_collar_0.0s
+ input_buffer_length: 1.04s
+ split: test-near
+ metrics:
+ - name: Test DER
+ type: der
+ value: 12.60
+ - task:
+ name: Speaker Diarization
+ type: speaker-diarization-with-post-processing
+ dataset:
+ name: AliMeeting Test far
+ type: alimeeting-test-far
+ config: with_overlap_collar_0.0s
+ input_buffer_length: 1.04s
+ split: test-far
+ metrics:
+ - name: Test DER
+ type: der
+ value: 15.60
+ - task:
+ name: Speaker Diarization
+ type: speaker-diarization-with-post-processing
+ dataset:
+ name: AMI Test IHM
+ type: ami-test-ihm
+ config: with_overlap_collar_0.0s
+ input_buffer_length: 1.04s
+ split: test-ihm
+ metrics:
+ - name: Test DER
+ type: der
+ value: 16.67
+ - task:
+ name: Speaker Diarization
+ type: speaker-diarization-with-post-processing
+ dataset:
+ name: AMI Test SDM
+ type: ami-test-sdm
+ config: with_overlap_collar_0.0s
+ input_buffer_length: 1.04s
+ split: test-sdm
+ metrics:
+ - name: Test DER
+ type: der
+ value: 20.57
+ - task:
+ name: Speaker Diarization
+ type: speaker-diarization-with-post-processing
+ dataset:
+ name: NOTSOFAR1 Eval SC (<=4 spk)
+ type: notsofar1-eval-sc-1to4spks
+ config: with_overlap_collar_0.0s
+ input_buffer_length: 1.04s
+ split: eval-sc-1to4spks
+ metrics:
+ - name: Test DER
+ type: der
+ value: 17.26
+ - task:
+ name: Speaker Diarization
+ type: speaker-diarization-with-post-processing
+ dataset:
+ name: NOTSOFAR1 Eval SC (>=5 spk)
+ type: notsofar1-eval-sc-5to7spks
+ config: with_overlap_collar_0.0s
+ input_buffer_length: 1.04s
+ split: eval-sc-5to7spks
+ metrics:
+ - name: Test DER
+ type: der
+ value: 36.76
+ - task:
+ name: Speaker Diarization
+ type: speaker-diarization-with-post-processing
+ dataset:
+ name: NOTSOFAR1 Eval SC (full)
+ type: notsofar1-eval-sc
+ config: with_overlap_collar_0.0s
+ input_buffer_length: 1.04s
+ split: eval-sc
+ metrics:
+ - name: Test DER
+ type: der
+ value: 28.75
  metrics:
  - der
  pipeline_tag: audio-classification
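This commit brings the DER values in the `model-index` metadata in line with the `diar_streaming_sortformer_4spk-v2.1` row of the evaluation table further down in the README. A consistency check along the following lines could catch that kind of drift automatically. It is a minimal, standard-library-only sketch: the table strings are copied from the README, the ten numbers are the `+ value:` lines added in the first part of this diff, and it assumes those entries follow the same dataset order as the table columns (the diff elides their `dataset:` blocks, so the mapping is inferred from the values). The helper names `split_row`, `table_der`, and `mismatches` are hypothetical and not part of NeMo or the Hugging Face Hub tooling.

```python
from typing import Dict, List

# Header and v2.1 row of the evaluation table, copied verbatim from the README.
TABLE_HEADER = (
    "| **Model** | **Latency** | **DIHARD III Eval <=4spk** | **DIHARD III Eval >=5spk** "
    "| **DIHARD III Eval full** | **CALLHOME-part2 2spk** | **CALLHOME-part2 3spk** "
    "| **CALLHOME-part2 4spk** | **CALLHOME-part2 5spk** | **CALLHOME-part2 6spk** "
    "| **CALLHOME-part2 full** | **CH109** |"
)
V21_ROW = (
    "| **diar_streaming_sortformer_4spk-v2.1** | 1.04s | 15.09 | 41.42 | 20.21 "
    "| 6.65 | 11.25 | 13.35 | 22.12 | 24.51 | 11.19 | 5.09 |"
)

# The ten `value:` lines this commit writes into model-index, in file order.
# Assumption: they follow the same dataset order as the table columns.
MODEL_INDEX_DER = [15.09, 41.42, 20.21, 6.65, 11.25, 13.35, 22.12, 24.51, 11.19, 5.09]


def split_row(row: str) -> List[str]:
    """Split a Markdown table row into cell texts, dropping pipes and bold markers."""
    return [cell.strip().strip("*") for cell in row.strip().strip("|").split("|")]


def table_der(header: str, row: str) -> Dict[str, float]:
    """Map each dataset column to its DER, skipping the Model and Latency columns."""
    names = split_row(header)[2:]
    values = [float(cell) for cell in split_row(row)[2:]]
    if len(names) != len(values):
        raise ValueError("header and row have different numbers of columns")
    return dict(zip(names, values))


def mismatches(table: Dict[str, float], index_values: List[float]) -> List[str]:
    """Return the columns whose table DER differs from the model-index DER."""
    if len(table) != len(index_values):
        raise ValueError("expected one model-index value per table column")
    return [
        name
        for (name, expected), actual in zip(table.items(), index_values)
        if abs(expected - actual) > 1e-9
    ]


if __name__ == "__main__":
    print(mismatches(table_der(TABLE_HEADER, V21_ROW), MODEL_INDEX_DER))  # prints []
```

With the values added here the check reports no mismatches; fed the ten values this commit removes (13.24, 42.56, and so on), it would flag every column. Parsing the strings directly keeps the sketch dependency-free; a real check would more likely read the front matter with a YAML parser.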
 
  ## Datasets

+ Sortformer was trained on approximately 5,000 hours of audio, combining real conversations and simulated audio mixtures generated using the [NeMo speech data simulator](https://arxiv.org/abs/2310.12371)[7].
+ All datasets used in training follow the [RTTM](https://web.archive.org/web/20100606092041if_/http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf) labeling format. A subset of the RTTM files was processed specifically for speaker diarization model training.
  Data collection methods vary across individual datasets. For example, the above datasets include phone calls, interviews, web videos, and audiobook recordings. Please refer to the [Linguistic Data Consortium (LDC) website](https://www.ldc.upenn.edu/) or dataset webpage for detailed data collection methods.
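The RTTM labels mentioned in the Datasets section store one speaker turn per `SPEAKER` line as whitespace-separated fields. The sketch below assumes the field layout from the NIST RT-09 evaluation plan linked there (type, file ID, channel, onset in seconds, duration in seconds, two `<NA>` placeholders, speaker name, two more `<NA>` fields) and totals the speaking time per speaker. The example lines and the helpers `parse_rttm_line` and `speaker_durations` are hypothetical illustrations, not the loader NeMo uses for training.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Segment:
    """One speaker turn taken from a SPEAKER line of an RTTM file."""
    file_id: str
    speaker: str
    onset: float     # turn start, in seconds
    duration: float  # turn length, in seconds

    @property
    def offset(self) -> float:
        """Turn end time, in seconds."""
        return self.onset + self.duration


def parse_rttm_line(line: str) -> Optional[Segment]:
    """Parse a SPEAKER line; return None for blank lines and other record types."""
    fields = line.split()
    if not fields or fields[0] != "SPEAKER":
        return None
    if len(fields) < 8:
        raise ValueError(f"malformed RTTM line: {line!r}")
    # Assumed layout: fields[3] = onset, fields[4] = duration, fields[7] = speaker name.
    return Segment(
        file_id=fields[1],
        speaker=fields[7],
        onset=float(fields[3]),
        duration=float(fields[4]),
    )


def speaker_durations(lines: Iterable[str]) -> Dict[str, float]:
    """Sum the turn durations of each speaker across all SPEAKER lines."""
    totals: Dict[str, float] = {}
    for line in lines:
        segment = parse_rttm_line(line)
        if segment is not None:
            totals[segment.speaker] = totals.get(segment.speaker, 0.0) + segment.duration
    return totals


# Hypothetical example: two speakers overlapping between 2.10 s and 2.50 s.
EXAMPLE_RTTM = """\
SPEAKER meeting1 1 0.00 2.50 <NA> <NA> spk0 <NA> <NA>
SPEAKER meeting1 1 2.10 1.40 <NA> <NA> spk1 <NA> <NA>
SPEAKER meeting1 1 3.50 1.00 <NA> <NA> spk0 <NA> <NA>
"""

if __name__ == "__main__":
    print(speaker_durations(EXAMPLE_RTTM.splitlines()))  # prints {'spk0': 3.5, 'spk1': 1.4}
```

Because each turn carries its own onset and duration, overlapping speech (as between `spk0` and `spk1` here) is represented simply by turns whose time spans intersect, which is the situation a multi-speaker diarizer such as Sortformer has to handle.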
 
 
  * [Forced alignment based ground-truth RTTMs](https://github.com/nttcslab-sp/diar-forced-alignment)[8] are used for AMI and AliMeeting.


+ ### Evaluation Results (Telephonic and General-Purpose Speech Corpora)

  | **Model** | **Latency** | **DIHARD III Eval <=4spk** | **DIHARD III Eval >=5spk** | **DIHARD III Eval full** | **CALLHOME-part2 2spk** | **CALLHOME-part2 3spk** | **CALLHOME-part2 4spk** | **CALLHOME-part2 5spk** | **CALLHOME-part2 6spk** | **CALLHOME-part2 full** | **CH109** |
  |-----------------------------------------|-------------|----------------------------|----------------------------|--------------------------|-------------------------|-------------------------|-------------------------|-------------------------|-------------------------|-------------------------|-----------|
 
  | diar_streaming_sortformer_4spk-v2 | 1.04s | 14.49 | 42.22 | 19.85 | 7.51 | 11.45 | 13.75 | 23.22 | 29.22 | 11.89 | 5.37 |
  | **diar_streaming_sortformer_4spk-v2.1** | 1.04s | 15.09 | 41.42 | 20.21 | 6.65 | 11.25 | 13.35 | 22.12 | 24.51 | 11.19 | 5.09 |

+ ### Evaluation Results (Meeting Speech Corpora)

  | **Model** | **Latency** | **AliMeeting Test near** | **AliMeeting Test far** | **AMI Test IHM** | **AMI Test SDM** | **NOTSOFAR1 Eval SC <=4spk** | **NOTSOFAR1 Eval SC >=5spk** | **NOTSOFAR1 Eval full** |
  |-----------------------------------------|-------------|--------------------------|-------------------------|------------------|------------------|------------------------------|------------------------------|-------------------------|
 
  ## Licence

+ Use of this model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).