Bartelds committed
Commit 497df9b · 1 Parent(s): 07c5d0e

Upload checkpoint, sanitized config, and transcripts for ctc-baseline_xlsr_set_4

Files changed (5)
  1. README.md +41 -0
  2. config.yaml +342 -0
  3. hyp.trn +0 -0
  4. ref.trn +0 -0
  5. valid.loss.best.pth +3 -0
README.md ADDED
@@ -0,0 +1,41 @@
+ ---
+ title: "CTC-DRO XLSR-based ASR model - set 4"
+ language: multilingual
+ tags:
+ - asr
+ - ctc-dro
+ - XLSR
+ license: cc-by-nc-4.0
+ ---
+
+ # CTC-Baseline XLSR-based ASR model - set 4
+
+ This repository contains a CTC-Baseline XLSR-based automatic speech recognition (ASR) model trained with ESPnet.
+ The model was trained on balanced training data from set 4.
+
+ ## Intended Use
+
+ This model is intended for ASR. Users can run inference using the provided checkpoint (`valid.loss.best.pth`) and configuration file (`config.yaml`):
+ ```python
+ import soundfile as sf
+ from espnet2.bin.asr_inference import Speech2Text
+
+ asr_train_config = "ctc-baseline_xlsr_set_4/config.yaml"
+ asr_model_file = "ctc-baseline_xlsr_set_4/valid.loss.best.pth"
+
+ model = Speech2Text.from_pretrained(
+     asr_train_config=asr_train_config,
+     asr_model_file=asr_model_file
+ )
+
+ speech, _ = sf.read("input.wav")
+ text, *_ = model(speech)[0]
+
+ print("Recognized text:", text)
+ ```
+
+ ## How to Use
+
+ 1. Clone this repository.
+ 2. Use ESPnet’s inference scripts with the provided `config.yaml` and checkpoint file.
+ 3. Ensure any external resources referenced in `config.yaml` are available at the indicated relative paths.
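Step 3 of the README can be checked mechanically. The sketch below is not part of the commit. It walks a configuration mapping and reports string values beginning with `./` that do not resolve under a base directory. The helper names `iter_relative_paths` and `missing_paths` are hypothetical, the `./` prefix is only a heuristic for spotting paths, and the sample values are copied from the `config.yaml` added in this commit. In practice the mapping would come from loading `config.yaml` with a YAML parser.

```python
from pathlib import Path
from typing import Any, Iterator, List, Tuple


def iter_relative_paths(node: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_key, value) for every string value that starts with './'."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            yield from iter_relative_paths(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_relative_paths(value, f"{prefix}[{index}]")
    elif isinstance(node, str) and node.startswith("./"):
        yield prefix, node


def missing_paths(config: dict, base_dir: str = ".") -> List[Tuple[str, str]]:
    """Return the path-like entries that do not exist relative to base_dir."""
    base = Path(base_dir)
    return [
        (key, value)
        for key, value in iter_relative_paths(config)
        if not (base / value).exists()
    ]


# Values copied from config.yaml in this commit; the nesting mirrors that file.
EXAMPLE_CONFIG = {
    "non_linguistic_symbols": "./nlsyms.txt",
    "output_dir": "./inference_results",
    "frontend_conf": {"download_dir": "./hub"},
}

if __name__ == "__main__":
    # Assumption: the repository was cloned into ./ctc-baseline_xlsr_set_4
    for key, value in missing_paths(EXAMPLE_CONFIG, base_dir="ctc-baseline_xlsr_set_4"):
        print(f"missing: {key} -> {value}")
```

Directories such as `output_dir` or `download_dir` may be created on demand by the tools that use them, so a reported entry is a prompt to check rather than proof of a problem.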
config.yaml ADDED
@@ -0,0 +1,342 @@
+ accum_grad: 16
+ adapter: lora
+ adapter_conf: {}
+ allow_multi_rates: false
+ allow_variable_data_keys: false
+ aux_ctc_tasks: []
+ batch_bins: 1000000
+ batch_size: 4
+ batch_type: duration_language
+ best_model_criterion:
+ - - valid
+   - loss
+   - min
+ bpemodel: null
+ chunk_default_fs: null
+ chunk_excluded_key_prefixes: []
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ cleaner: null
+ collect_stats: false
+ create_graph_in_tensorboard: false
+ ctc_conf:
+   ctc_type: builtin
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ cudnn_enabled: true
+ decoder: null
+ decoder_conf: {}
+ detect_anomaly: false
+ distributed: false
+ drop_last_iter: false
+ dry_run: false
+ duration_batch_length: -1
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ encoder: transformer
+ encoder_conf:
+   attention_dropout_rate: 0.1
+   attention_heads: 8
+   dropout_rate: 0.1
+   input_layer: conv2d2
+   linear_units: 1024
+   normalize_before: true
+   num_blocks: 2
+   output_size: 256
+   positional_dropout_rate: 0.1
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ fold_length:
+ - 80000
+ - 150
+ freeze_param: []
+ frontend: s3prl
+ frontend_conf:
+   download_dir: ./hub
+   frontend_conf:
+     upstream: xls_r_300m
+   fs: 16k
+   multilayer_feature: true
+ g2p: null
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ ignore_init_mismatch: false
+ init: xavier_uniform
+ init_param: []
+ input_size: null
+ iterator_type: sequence
+ joint_net_conf: null
+ keep_nbest_models: 3
+ log_interval: null
+ log_level: INFO
+ max_cache_fd: 32
+ max_cache_size: 0.0
+ max_epoch: 40
+ model: espnet
+ model_conf:
+   ctc_weight: 1.0
+ multiple_iterator: false
+ multiprocessing_distributed: false
+ nbest_averaging_interval: 0
+ ngpu: 1
+ no_forward_run: false
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ noise_scp: null
+ non_linguistic_symbols: ./nlsyms.txt
+ normalize: utterance_mvn
+ normalize_conf: {}
+ num_att_plot: 3
+ num_cache_chunks: 1024
+ num_iters_per_epoch: 1200
+ num_workers: 4
+ optim: adam
+ optim_conf:
+   lr: 0.0001
+   weight_decay: 1.0e-06
+ output_dir: ./inference_results
+ patience: null
+ postencoder: null
+ postencoder_conf: {}
+ preencoder: linear
+ preencoder_conf:
+   input_size: 1024
+   output_size: 80
+ preprocessor: default
+ preprocessor_conf: {}
+ pretrain_path: null
+ print_config: false
+ required:
+ - output_dir
+ - token_list
+ resume: true
+ rir_apply_prob: 1.0
+ rir_scp: null
+ save_strategy: all
+ scheduler: null
+ scheduler_conf: {}
+ seed: 0
+ sharded_ddp: false
+ short_noise_thres: 0.5
+ shuffle_within_batch: false
+ sort_batch: descending
+ sort_in_batch: descending
+ specaug: specaug
+ specaug_conf:
+   apply_freq_mask: true
+   apply_time_mask: true
+   apply_time_warp: true
+   freq_mask_width_range:
+   - 0
+   - 27
+   num_freq_mask: 2
+   num_time_mask: 10
+   time_mask_width_ratio_range:
+   - 0.0
+   - 0.05
+   time_warp_mode: bicubic
+   time_warp_window: 5
+ speech_volume_normalize: null
+ token_list:
+ - <blank>
+ - <unk>
+ - <space>
+ - E
+ - A
+ - O
+ - N
+ - S
+ - I
+ - ا
+ - L
+ - T
+ - R
+ - و
+ - D
+ - ن
+ - ر
+ - ی
+ - ي
+ - M
+ - U
+ - H
+ - P
+ - ک
+ - م
+ - C
+ - А
+ - Ӹ
+ - Н
+ - B
+ - ت
+ - س
+ - ل
+ - J
+ - K
+ - ہ
+ - Т
+ - ے
+ - G
+ - Ш
+ - К
+ - Е
+ - Л
+ - Ы
+ - V
+ - М
+ - ج
+ - Ӓ
+ - ه
+ - ب
+ - د
+ - О
+ - Y
+ - '[slv]'
+ - Р
+ - ڪ
+ - پ
+ - Z
+ - '[mrj]'
+ - F
+ - گ
+ - И
+ - В
+ - ئ
+ - Д
+ - '[sot]'
+ - ں
+ - '[spa]'
+ - W
+ - Q
+ - П
+ - Г
+ - ف
+ - ق
+ - С
+ - ع
+ - ش
+ - Ж
+ - ز
+ - ھ
+ - آ
+ - Č
+ - Í
+ - У
+ - ح
+ - '[urd]'
+ - Š
+ - ٹ
+ - چ
+ - Ь
+ - ٽ
+ - '[snd]'
+ - ڻ
+ - Й
+ - ط
+ - ص
+ - ٿ
+ - Ц
+ - خ
+ - Ó
+ - Я
+ - Á
+ - É
+ - Ч
+ - ۾
+ - '0'
+ - Ž
+ - З
+ - '1'
+ - ۽
+ - –
+ - ڏ
+ - Э
+ - ڊ
+ - —
+ - ڈ
+ - ء
+ - Ñ
+ - ڙ
+ - ِ
+ - '2'
+ - ٻ
+ - Х
+ - Ӱ
+ - ظ
+ - ض
+ - ث
+ - ڳ
+ - ،
+ - X
+ - ¡
+ - غ
+ - ڑ
+ - Ӧ
+ - ذ
+ - ¿
+ - '5'
+ - ڌ
+ - '3'
+ - ڀ
+ - ُ
+ - '9'
+ - Ú
+ - '4'
+ - '8'
+ - ۔
+ - '6'
+ - ٺ
+ - Ю
+ - »
+ - Б
+ - «
+ - ڇ
+ - ً
+ - ڃ
+ - '7'
+ - ڄ
+ - ؤ
+ - ڍ
+ - Ф
+ - َ
+ - ٰ
+ - ّ
+ - ڱ
+ - ”
+ - ژ
+ - ڦ
+ - Ё
+ - ؛
+ - ٍ
+ - Щ
+ - ؟
+ - ’
+ - ‘
+ - °
+ - ۃ
+ - إ
+ - Ć
+ - <sos/eos>
+ token_type: char
+ train_dtype: float32
+ unused_parameters: true
+ use_adapter: false
+ use_amp: false
+ use_lang_prompt: false
+ use_matplotlib: true
+ use_nlp_prompt: false
+ use_preprocessor: true
+ use_tensorboard: true
+ val_scheduler_criterion:
+ - valid
+ - loss
+ valid_batch_bins: null
+ valid_batch_size: null
+ valid_batch_type: null
+ valid_iterator_type: null
+ valid_max_cache_size: null
+ version: '202402'
+ write_collected_feats: false
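With `token_type: char`, each transcript character is one output unit, and the position of a token in `token_list` is its integer ID. The following is an illustrative sketch of that mapping, not ESPnet's tokenizer. It assumes that the bracketed language tags are matched as whole symbols, that spaces map to `<space>`, and that characters outside the list fall back to `<unk>`. `TOKEN_EXCERPT` and both function names are hypothetical. The excerpt keeps only the first nine entries of the real list plus one tag, so the ID shown for `[spa]` is not its real ID.

```python
from typing import List, Sequence

# First nine entries of token_list from config.yaml, plus one language tag.
# Assumption: in this shortened excerpt "[spa]" gets ID 9, unlike the full list.
TOKEN_EXCERPT = ["<blank>", "<unk>", "<space>", "E", "A", "O", "N", "S", "I", "[spa]"]

# Bracketed tags that appear in token_list; treated here as indivisible symbols.
LANG_TAGS = ("[slv]", "[mrj]", "[sot]", "[spa]", "[urd]", "[snd]")


def char_tokenize(text: str, symbols: Sequence[str] = LANG_TAGS) -> List[str]:
    """Split text into characters, keeping listed symbols whole."""
    tokens: List[str] = []
    pos = 0
    while pos < len(text):
        for symbol in symbols:
            if text.startswith(symbol, pos):
                tokens.append(symbol)
                pos += len(symbol)
                break
        else:
            char = text[pos]
            tokens.append("<space>" if char == " " else char)
            pos += 1
    return tokens


def tokens_to_ids(tokens: Sequence[str], token_list: Sequence[str]) -> List[int]:
    """Map tokens to list positions, using the <unk> position for unknown tokens."""
    table = {token: index for index, token in enumerate(token_list)}
    unk_id = table["<unk>"]
    return [table.get(token, unk_id) for token in tokens]


tokens = char_tokenize("[spa] ESO ES")
ids = tokens_to_ids(tokens, TOKEN_EXCERPT)
print(tokens)
print(ids)
```

Passing the full `token_list` from `config.yaml` in place of the excerpt would yield the IDs that correspond to this checkpoint's output layer.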
hyp.trn ADDED
The diff for this file is too large to render. See raw diff
 
ref.trn ADDED
The diff for this file is too large to render. See raw diff
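Going by the commit message, `hyp.trn` and `ref.trn` hold hypothesis and reference transcripts. Their contents are not rendered here, so the sketch below assumes the common sclite-style layout: one utterance per line, with the utterance ID in trailing parentheses. Under that assumption it computes a character error rate over the utterances present in the reference. The function names are hypothetical and the sample lines are invented for illustration. If the files store space-separated character tokens, the text would need to be detokenized first.

```python
import re
from typing import Dict, Iterable

# Assumed line layout: "<transcript> (<utterance-id>)"
TRN_LINE = re.compile(r"^(?P<text>.*?)\s*\((?P<utt>[^()]*)\)\s*$")


def parse_trn(lines: Iterable[str]) -> Dict[str, str]:
    """Map utterance IDs to transcripts, skipping lines that do not match."""
    parsed: Dict[str, str] = {}
    for line in lines:
        match = TRN_LINE.match(line)
        if match:
            parsed[match.group("utt")] = match.group("text")
    return parsed


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(hyp) + 1))
    for i, ref_char in enumerate(ref, start=1):
        current = [i]
        for j, hyp_char in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def cer(ref_lines: Iterable[str], hyp_lines: Iterable[str]) -> float:
    """Total character edits divided by total reference characters."""
    refs = parse_trn(ref_lines)
    hyps = parse_trn(hyp_lines)
    # A reference utterance with no hypothesis counts as fully deleted.
    errors = sum(edit_distance(text, hyps.get(utt, "")) for utt, text in refs.items())
    total = sum(len(text) for text in refs.values())
    return errors / total if total else 0.0


# Invented sample lines, not taken from the repository.
REF_LINES = ["HOLA MUNDO (utt1)"]
HYP_LINES = ["HOLA MUNDA (utt1)"]
print(f"CER: {cer(REF_LINES, HYP_LINES):.3f}")
```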
 
valid.loss.best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6ea23bca7b78e6588073f319b3b5fe03d7560f607fa29eddd13369f1b032fe13
+ size 1288666400
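The three added lines are a Git LFS pointer, not the checkpoint itself: they record the SHA-256 digest and byte size of the real file. A minimal sketch for checking a downloaded `valid.loss.best.pth` against those values follows. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local file path in the usage lines is an assumption.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union

# Pointer contents copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:6ea23bca7b78e6588073f319b3b5fe03d7560f607fa29eddd13369f1b032fe13
size 1288666400
"""


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Split 'key value' lines and break the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Union[str, Path], pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file has the pointer's size and digest."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap size check before hashing a large file
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)

if __name__ == "__main__":
    # Assumption: the checkpoint was downloaded to this relative path.
    checkpoint = Path("ctc-baseline_xlsr_set_4/valid.loss.best.pth")
    if checkpoint.exists():
        print("checkpoint matches pointer:", matches_pointer(checkpoint, pointer))
```

A file that is only a few hundred bytes long usually means the pointer was downloaded instead of the checkpoint, which the size check catches before any hashing.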