252e444b575eab5e6b6920703707115f

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [de-es] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.7048
  • Data Size: 1.0
  • Epoch Runtime: 107.9386
  • BLEU: 4.1646
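
As a quick sanity check, the fine-tuned checkpoint can be loaded with the standard transformers seq2seq classes and used for German-to-Spanish translation. The sketch below makes two assumptions worth verifying: that the repository id matches this model page, and that no task prefix was added to the source text during fine-tuning.

```python
# Minimal inference sketch. Assumes the public repository id below and
# that fine-tuning used the raw source sentence with no task prefix.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/252e444b575eab5e6b6920703707115f"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Das Buch liegt auf dem Tisch.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```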

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
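
That said, the dataset named in the summary can be loaded as sketched below. The 90/10 train/eval split is an illustrative assumption: Helsinki-NLP/opus_books ships only a single train split, so some held-out portion must have been carved out for the reported evaluation numbers.

```python
# Sketch for loading the Helsinki-NLP/opus_books de-es pairs.
# The 10% eval split is an assumption; the card does not document
# how the evaluation set was derived.
from datasets import load_dataset

raw = load_dataset("Helsinki-NLP/opus_books", "de-es")
splits = raw["train"].train_test_split(test_size=0.1, seed=42)

pair = splits["train"][0]["translation"]
print(pair["de"], "->", pair["es"])
```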

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
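
For reference, these settings map onto transformers' Seq2SeqTrainingArguments roughly as sketched below. This is a reconstruction, not the original training script; the output directory name is hypothetical, and the total batch size of 32 follows from running the per-device batch size of 8 across the 4 GPUs listed above.

```python
# Rough reconstruction of the reported hyperparameters. Launched on
# 4 GPUs (e.g. via torchrun), per_device_train_batch_size=8 yields
# the reported total train batch size of 32.
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-de-es",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```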

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | BLEU |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:------:|
| No log | 0 | 0 | 16.8185 | 0 | 9.7091 | 0.1638 |
| No log | 1 | 688 | 16.4396 | 0.0078 | 10.6734 | 0.2205 |
| No log | 2 | 1376 | 15.4148 | 0.0156 | 11.9081 | 0.1916 |
| No log | 3 | 2064 | 12.7371 | 0.0312 | 13.9575 | 0.2713 |
| 0.5767 | 4 | 2752 | 10.0688 | 0.0625 | 16.8220 | 0.2518 |
| 0.8794 | 5 | 3440 | 6.8944 | 0.125 | 22.8812 | 0.2819 |
| 6.4365 | 6 | 4128 | 4.6648 | 0.25 | 35.9062 | 1.0805 |
| 5.2 | 7 | 4816 | 3.9887 | 0.5 | 60.2414 | 0.9737 |
| 4.4643 | 8 | 5504 | 3.4851 | 1.0 | 111.0302 | 1.4972 |
| 4.2029 | 9 | 6192 | 3.3275 | 1.0 | 111.2567 | 1.8706 |
| 4.011 | 10 | 6880 | 3.2396 | 1.0 | 109.4068 | 2.1290 |
| 3.8989 | 11 | 7568 | 3.1762 | 1.0 | 110.6279 | 2.2698 |
| 3.7675 | 12 | 8256 | 3.1212 | 1.0 | 110.7316 | 2.4218 |
| 3.703 | 13 | 8944 | 3.0880 | 1.0 | 110.9528 | 2.5715 |
| 3.666 | 14 | 9632 | 3.0493 | 1.0 | 111.2139 | 2.6786 |
| 3.5767 | 15 | 10320 | 3.0264 | 1.0 | 110.0062 | 2.7794 |
| 3.5211 | 16 | 11008 | 2.9992 | 1.0 | 109.6109 | 2.8721 |
| 3.4903 | 17 | 11696 | 2.9718 | 1.0 | 111.9554 | 2.9183 |
| 3.4116 | 18 | 12384 | 2.9549 | 1.0 | 110.8191 | 2.9965 |
| 3.4001 | 19 | 13072 | 2.9316 | 1.0 | 109.8554 | 3.0525 |
| 3.383 | 20 | 13760 | 2.9168 | 1.0 | 110.4508 | 3.1468 |
| 3.3254 | 21 | 14448 | 2.9005 | 1.0 | 111.2983 | 3.2101 |
| 3.3037 | 22 | 15136 | 2.8937 | 1.0 | 111.4072 | 3.2799 |
| 3.2432 | 23 | 15824 | 2.8767 | 1.0 | 111.1446 | 3.3293 |
| 3.2361 | 24 | 16512 | 2.8614 | 1.0 | 112.3162 | 3.3471 |
| 3.2076 | 25 | 17200 | 2.8521 | 1.0 | 111.1131 | 3.3737 |
| 3.1398 | 26 | 17888 | 2.8393 | 1.0 | 112.0510 | 3.4616 |
| 3.1293 | 27 | 18576 | 2.8340 | 1.0 | 110.8477 | 3.4917 |
| 3.1118 | 28 | 19264 | 2.8205 | 1.0 | 111.7321 | 3.5628 |
| 3.1101 | 29 | 19952 | 2.8069 | 1.0 | 111.8174 | 3.5736 |
| 3.0583 | 30 | 20640 | 2.8023 | 1.0 | 111.9836 | 3.6254 |
| 3.0587 | 31 | 21328 | 2.7971 | 1.0 | 110.5331 | 3.6616 |
| 3.0171 | 32 | 22016 | 2.7849 | 1.0 | 110.4607 | 3.6894 |
| 3.0193 | 33 | 22704 | 2.7772 | 1.0 | 109.0230 | 3.7525 |
| 2.9894 | 34 | 23392 | 2.7695 | 1.0 | 107.6458 | 3.7653 |
| 2.9708 | 35 | 24080 | 2.7636 | 1.0 | 107.4148 | 3.7776 |
| 2.9556 | 36 | 24768 | 2.7605 | 1.0 | 108.0904 | 3.8465 |
| 2.9331 | 37 | 25456 | 2.7578 | 1.0 | 107.4862 | 3.8456 |
| 2.9557 | 38 | 26144 | 2.7503 | 1.0 | 107.4639 | 3.9086 |
| 2.8623 | 39 | 26832 | 2.7475 | 1.0 | 107.7396 | 3.9209 |
| 2.876 | 40 | 27520 | 2.7413 | 1.0 | 108.5188 | 3.9477 |
| 2.8569 | 41 | 28208 | 2.7377 | 1.0 | 107.9827 | 3.9703 |
| 2.838 | 42 | 28896 | 2.7346 | 1.0 | 107.2617 | 4.0063 |
| 2.8481 | 43 | 29584 | 2.7263 | 1.0 | 108.0689 | 4.0092 |
| 2.793 | 44 | 30272 | 2.7234 | 1.0 | 107.9407 | 4.0551 |
| 2.7863 | 45 | 30960 | 2.7221 | 1.0 | 108.2426 | 4.0762 |
| 2.7853 | 46 | 31648 | 2.7182 | 1.0 | 108.6926 | 4.0772 |
| 2.7543 | 47 | 32336 | 2.7112 | 1.0 | 107.8927 | 4.1124 |
| 2.7247 | 48 | 33024 | 2.7077 | 1.0 | 107.5613 | 4.1342 |
| 2.7427 | 49 | 33712 | 2.7111 | 1.0 | 107.9450 | 4.1599 |
| 2.7071 | 50 | 34400 | 2.7048 | 1.0 | 107.9386 | 4.1646 |
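
The Data Size column shows that training started on a small fraction of the corpus and roughly doubled that fraction each epoch until the full dataset was reached at epoch 8, after which BLEU climbs steadily. The BLEU scores themselves are presumably corpus-level sacreBLEU; a sketch of that computation with the evaluate library follows (the decoding settings used for the card are not documented).

```python
# BLEU computation sketch using the evaluate library's sacrebleu
# wrapper. The example strings are illustrative, not card outputs.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["El libro está sobre la mesa."]        # decoded model outputs
references = [["El libro está encima de la mesa."]]   # one reference list per prediction
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))  # corpus-level BLEU, comparable to the table
```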

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1