---
tags:
- audio-to-audio
language: en
datasets:
- VCTK_DEMAND
license: cc-by-4.0
---

### `wyz/vctk_bsrnn_medium_noncausal`

This model was trained by wyz based on the universal_se_v1 recipe in [espnet](https://github.com/espnet/espnet/).

### Demo: How to use in ESPnet2

Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
if you haven't done that already.

To use the model in the Python interface, you could use the following code:

```python
import soundfile as sf
from espnet2.bin.enh_inference import SeparateSpeech

# For model downloading + loading
model = SeparateSpeech.from_pretrained(
    model_tag="wyz/vctk_bsrnn_medium_noncausal",
    normalize_output_wav=True,
    device="cuda",
)
# For loading a downloaded model
# model = SeparateSpeech(
#     train_config="exp_vctk/enh_train_enh_bsrnn_medium_noncausal_raw/config.yaml",
#     model_file="exp_vctk/enh_train_enh_bsrnn_medium_noncausal_raw/xxxx.pth",
#     normalize_output_wav=True,
#     device="cuda",
# )

audio, fs = sf.read("/path/to/noisy/utt1.flac")
enhanced = model(audio[None, :], fs=fs)[0]
```
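The demo above passes `audio[None, :]` because `SeparateSpeech` expects a batched waveform of shape `(batch, num_samples)`; the call returns a list of enhanced streams, and the output is assumed here to keep that batch axis. A minimal NumPy sketch of the shape handling (the zero array is a stand-in for real audio, and the identity "model" is a placeholder, not ESPnet code):

```python
import numpy as np

# Stand-in for a mono waveform loaded via soundfile (shape: (num_samples,))
audio = np.zeros(16000, dtype=np.float32)

# SeparateSpeech expects batched input of shape (batch, num_samples),
# hence the audio[None, :] indexing in the demo above.
batched = audio[None, :]
assert batched.shape == (1, 16000)

# Placeholder for model(batched, fs=fs)[0]: the first (and, for
# single-output enhancement, only) stream, batch axis assumed kept.
enhanced = batched
# Squeeze the batch axis before writing the result with sf.write(...).
mono = enhanced.squeeze(0)
assert mono.shape == (16000,)
```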

<!-- Generated by ./scripts/utils/show_enh_score.sh -->

# RESULTS

## Environments
- date: `Wed Feb 28 01:40:25 EST 2024`
- python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
- espnet version: `espnet 202304`
- pytorch version: `pytorch 2.0.1+cu118`
- Git hash: `443028662106472c60fe8bd892cb277e5b488651`
- Commit date: `Thu May 11 03:32:59 2023 +0000`

## enhanced_test_16k

|dataset|PESQ_WB|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
|---|---|---|---|---|---|---|---|---|---|---|
|chime4_et05_real_isolated_6ch_track|1.11|48.88|-3.44|-3.44|0.00|-31.29|2.78|3.09|3.79|3.47|
|chime4_et05_simu_isolated_6ch_track|1.22|69.37|5.61|5.61|0.00|-0.58|2.60|2.89|3.80|3.13|
|dns20_tt_synthetic_no_reverb|2.34|93.88|14.32|14.32|0.00|13.42|3.27|3.56|4.02|3.91|
|reverb_et_real_8ch_multich|1.06|37.83|0.59|0.59|0.00|-4.19|2.19|2.51|3.50|2.88|
|reverb_et_simu_8ch_multich|1.55|77.68|7.89|7.89|0.00|-11.68|2.91|3.24|3.84|3.58|
|whamr_tt_mix_single_reverb_max_16k|1.28|73.13|4.95|4.95|0.00|0.52|2.66|2.95|3.83|3.38|

## enhanced_test_48k

|dataset|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
|---|---|---|---|---|---|---|---|---|---|
|vctk_noisy_tt_2spk|95.08|20.39|20.39|0.00|19.28|3.17|3.46|3.99|3.53|

## ENH config