---
language:
- en
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:67190
- loss:AdaptiveLayerLoss
- loss:MultipleNegativesRankingLoss
base_model: microsoft/deberta-v3-small
datasets:
- stanfordnlp/snli
metrics:
- cosine_accuracy
- cosine_accuracy_threshold
- cosine_f1
- cosine_f1_threshold
- cosine_precision
- cosine_recall
- cosine_ap
- dot_accuracy
- dot_accuracy_threshold
- dot_f1
- dot_f1_threshold
- dot_precision
- dot_recall
- dot_ap
- manhattan_accuracy
- manhattan_accuracy_threshold
- manhattan_f1
- manhattan_f1_threshold
- manhattan_precision
- manhattan_recall
- manhattan_ap
- euclidean_accuracy
- euclidean_accuracy_threshold
- euclidean_f1
- euclidean_f1_threshold
- euclidean_precision
- euclidean_recall
- euclidean_ap
- max_accuracy
- max_accuracy_threshold
- max_f1
- max_f1_threshold
- max_precision
- max_recall
- max_ap
widget:
- source_sentence: A man is walking past a large sign that says E.S.E. Electronics.
  sentences:
  - a child opens a present on his birthday
  - The man works at E.S.E Electronics.
  - The soccer team in blue plays soccer.
- source_sentence: This child is on the library steps.
  sentences:
  - A mother dog checking up on her baby puppy.
  - A guy bites into a freshly opened marshmallow chick
  - The child is on the steps inside the library.
- source_sentence: Two men are standing in a boat.
  sentences:
  - People are watching the flowers blossom
  - The couple is married.
  - A few men are fishing on a boat.
- source_sentence: >-
    Four men playing drums in very orange lighting while one of them is also
    drinking something out of a bottle.
  sentences:
  - four men play drums
  - The man puts something on the other mans head.
  - The dogs are in the backyard.
- source_sentence: >-
    First Lady Laura Bush at podium, in front of seated audience, at the White
    House Conference on Global Literacy.
  sentences:
  - Some people are exercising outside.
  - The former First Lady is at the podium for a conference.
  - This person is going to the waterfall
pipeline_tag: sentence-similarity
model-index:
- name: SentenceTransformer based on microsoft/deberta-v3-small
  results:
  - task:
      type: binary-classification
      name: Binary Classification
    dataset:
      name: Unknown
      type: unknown
    metrics:
    - type: cosine_accuracy
      value: 0.6651071536371869
      name: Cosine Accuracy
    - type: cosine_accuracy_threshold
      value: 0.687929630279541
      name: Cosine Accuracy Threshold
    - type: cosine_f1
      value: 0.7077349458301839
      name: Cosine F1
    - type: cosine_f1_threshold
      value: 0.6304811239242554
      name: Cosine F1 Threshold
    - type: cosine_precision
      value: 0.6222862206468763
      name: Cosine Precision
    - type: cosine_recall
      value: 0.8203855140186916
      name: Cosine Recall
    - type: cosine_ap
      value: 0.7058220689813709
      name: Cosine Ap
    - type: dot_accuracy
      value: 0.6313009357078176
      name: Dot Accuracy
    - type: dot_accuracy_threshold
      value: 135.98495483398438
      name: Dot Accuracy Threshold
    - type: dot_f1
      value: 0.6997334569475027
      name: Dot F1
    - type: dot_f1_threshold
      value: 115.54609680175781
      name: Dot F1 Threshold
    - type: dot_precision
      value: 0.5800192122958694
      name: Dot Precision
    - type: dot_recall
      value: 0.8817172897196262
      name: Dot Recall
    - type: dot_ap
      value: 0.6554755795160082
      name: Dot Ap
    - type: manhattan_accuracy
      value: 0.6708421370359191
      name: Manhattan Accuracy
    - type: manhattan_accuracy_threshold
      value: 219.32388305664062
      name: Manhattan Accuracy Threshold
    - type: manhattan_f1
      value: 0.7119951778179626
      name: Manhattan F1
    - type: manhattan_f1_threshold
      value: 262.314697265625
      name: Manhattan F1 Threshold
    - type: manhattan_precision
      value: 0.6062410182714022
      name: Manhattan Precision
    - type: manhattan_recall
      value: 0.8624415887850467
      name: Manhattan Recall
    - type: manhattan_ap
      value: 0.7135236162968746
      name: Manhattan Ap
    - type: euclidean_accuracy
      value: 0.6652580742529429
      name: Euclidean Accuracy
    - type: euclidean_accuracy_threshold
      value: 11.506816864013672
      name: Euclidean Accuracy Threshold
    - type: euclidean_f1
      value: 0.7080090384132564
      name: Euclidean F1
    - type: euclidean_f1_threshold
      value: 12.478536605834961
      name: Euclidean F1 Threshold
    - type: euclidean_precision
      value: 0.6208718626155878
      name: Euclidean Precision
    - type: euclidean_recall
      value: 0.8235981308411215
      name: Euclidean Recall
    - type: euclidean_ap
      value: 0.7090362803652147
      name: Euclidean Ap
    - type: max_accuracy
      value: 0.6708421370359191
      name: Max Accuracy
    - type: max_accuracy_threshold
      value: 219.32388305664062
      name: Max Accuracy Threshold
    - type: max_f1
      value: 0.7119951778179626
      name: Max F1
    - type: max_f1_threshold
      value: 262.314697265625
      name: Max F1 Threshold
    - type: max_precision
      value: 0.6222862206468763
      name: Max Precision
    - type: max_recall
      value: 0.8817172897196262
      name: Max Recall
    - type: max_ap
      value: 0.7135236162968746
      name: Max Ap
---
SentenceTransformer based on microsoft/deberta-v3-small
This is a sentence-transformers model finetuned from microsoft/deberta-v3-small on the stanfordnlp/snli dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: microsoft/deberta-v3-small
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
  - stanfordnlp/snli
- Language: en
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
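The Pooling module above produces one 768-dimensional vector per input by mean pooling, i.e. averaging the token embeddings while ignoring padding. For illustration only, here is a minimal sketch of that computation with plain Hugging Face Transformers; it assumes the repository's Transformer weights load via AutoModel, as is typical for Sentence Transformers checkpoints:

import torch
from transformers import AutoModel, AutoTokenizer

repo = "bobox/DeBERTaV3-small-ST-AdaptiveLayer-Norm-ep2"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModel.from_pretrained(repo)

batch = tokenizer(["Two men are standing in a boat."], padding=True,
                  truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state       # (batch, seq_len, 768)

# Mean pooling: average the token embeddings, masking out padding positions.
mask = batch["attention_mask"].unsqueeze(-1).float()          # (batch, seq_len, 1)
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)                               # torch.Size([1, 768])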
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTaV3-small-ST-AdaptiveLayer-Norm-ep2")
# Run inference
sentences = [
'First Lady Laura Bush at podium, in front of seated audience, at the White House Conference on Global Literacy.',
'The former First Lady is at the podium for a conference.',
'This person is going to the waterfall',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
Evaluation
Metrics
Binary Classification
- Evaluated with BinaryClassificationEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy | 0.6651 |
| cosine_accuracy_threshold | 0.6879 |
| cosine_f1 | 0.7077 |
| cosine_f1_threshold | 0.6305 |
| cosine_precision | 0.6223 |
| cosine_recall | 0.8204 |
| cosine_ap | 0.7058 |
| dot_accuracy | 0.6313 |
| dot_accuracy_threshold | 135.985 |
| dot_f1 | 0.6997 |
| dot_f1_threshold | 115.5461 |
| dot_precision | 0.58 |
| dot_recall | 0.8817 |
| dot_ap | 0.6555 |
| manhattan_accuracy | 0.6708 |
| manhattan_accuracy_threshold | 219.3239 |
| manhattan_f1 | 0.712 |
| manhattan_f1_threshold | 262.3147 |
| manhattan_precision | 0.6062 |
| manhattan_recall | 0.8624 |
| manhattan_ap | 0.7135 |
| euclidean_accuracy | 0.6653 |
| euclidean_accuracy_threshold | 11.5068 |
| euclidean_f1 | 0.708 |
| euclidean_f1_threshold | 12.4785 |
| euclidean_precision | 0.6209 |
| euclidean_recall | 0.8236 |
| euclidean_ap | 0.709 |
| max_accuracy | 0.6708 |
| max_accuracy_threshold | 219.3239 |
| max_f1 | 0.712 |
| max_f1_threshold | 262.3147 |
| max_precision | 0.6223 |
| max_recall | 0.8817 |
| max_ap | 0.7135 |
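The *_threshold rows are the decision cutoffs that BinaryClassificationEvaluator selected for each similarity function on the evaluation pairs. As a sketch of how the cosine cutoff from the table (about 0.688) could be applied to classify a sentence pair; the threshold should be re-tuned for new data:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bobox/DeBERTaV3-small-ST-AdaptiveLayer-Norm-ep2")

COSINE_THRESHOLD = 0.6879  # cosine_accuracy_threshold from the table above

embeddings = model.encode([
    "Two men are standing in a boat.",
    "A few men are fishing on a boat.",
])
# model.similarity uses cosine similarity, this model's configured function.
score = model.similarity(embeddings[0:1], embeddings[1:2]).item()
print("similar" if score >= COSINE_THRESHOLD else "not similar")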
Training Details
Training Dataset
stanfordnlp/snli
- Dataset: stanfordnlp/snli at cdb5c3d
- Size: 67,190 training samples
- Columns: sentence1, sentence2, and label
- Approximate statistics based on the first 1000 samples:
  |         | sentence1                                          | sentence2                                         | label      |
  |:--------|:---------------------------------------------------|:--------------------------------------------------|:-----------|
  | type    | string                                             | string                                            | int        |
  | details | min: 4 tokens, mean: 21.19 tokens, max: 133 tokens | min: 4 tokens, mean: 11.77 tokens, max: 49 tokens | 0: 100.00% |
- Samples:
  | sentence1 | sentence2 | label |
  |:----------|:----------|:------|
  | Without a placebo group, we still won't know if any of the treatments are better than nothing and therefore worth giving. | It is necessary to use a controlled method to ensure the treatments are worthwhile. | 0 |
  | It was conducted in silence. | It was done silently. | 0 |
  | oh Lewisville any decent food in your cafeteria up there | Is there any decent food in your cafeteria up there in Lewisville? | 0 |
- Loss: AdaptiveLayerLoss with these parameters:
  {
      "loss": "MultipleNegativesRankingLoss",
      "n_layers_per_step": 1,
      "last_layer_weight": 1,
      "prior_layers_weight": 0.05,
      "kl_div_weight": 2,
      "kl_temperature": 0.9
  }
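For reference, a minimal sketch of constructing a loss with exactly these parameters in Sentence Transformers; the base model below stands in for whichever model is being trained:

from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("microsoft/deberta-v3-small")
# Wrap MultipleNegativesRankingLoss so earlier transformer layers are also trained.
inner_loss = losses.MultipleNegativesRankingLoss(model)
train_loss = losses.AdaptiveLayerLoss(
    model=model,
    loss=inner_loss,
    n_layers_per_step=1,
    last_layer_weight=1,
    prior_layers_weight=0.05,
    kl_div_weight=2,
    kl_temperature=0.9,
)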
Evaluation Dataset
stanfordnlp/snli
- Dataset: stanfordnlp/snli at cdb5c3d
- Size: 6,626 evaluation samples
- Columns: premise, hypothesis, and label
- Approximate statistics based on the first 1000 samples:
  |         | premise                                           | hypothesis                                        | label                  |
  |:--------|:--------------------------------------------------|:--------------------------------------------------|:-----------------------|
  | type    | string                                            | string                                            | int                    |
  | details | min: 6 tokens, mean: 17.28 tokens, max: 59 tokens | min: 4 tokens, mean: 10.53 tokens, max: 32 tokens | 0: ~48.70%, 1: ~51.30% |
- Samples:
  | premise | hypothesis | label |
  |:--------|:-----------|:------|
  | This church choir sings to the masses as they sing joyous songs from the book at a church. | The church has cracks in the ceiling. | 0 |
  | This church choir sings to the masses as they sing joyous songs from the book at a church. | The church is filled with song. | 1 |
  | A woman with a green headscarf, blue shirt and a very big grin. | The woman is young. | 0 |
- Loss: AdaptiveLayerLoss with these parameters:
  {
      "loss": "MultipleNegativesRankingLoss",
      "n_layers_per_step": 1,
      "last_layer_weight": 1,
      "prior_layers_weight": 0.05,
      "kl_div_weight": 2,
      "kl_temperature": 0.9
  }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 45
- per_device_eval_batch_size: 22
- learning_rate: 3e-06
- weight_decay: 1e-09
- num_train_epochs: 2
- lr_scheduler_type: cosine
- warmup_ratio: 0.5
- save_safetensors: False
- fp16: True
- push_to_hub: True
- hub_model_id: bobox/DeBERTaV3-small-ST-AdaptiveLayer-Norm-ep2-checkpoints
- hub_strategy: checkpoint
- batch_sampler: no_duplicates
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 45
- per_device_eval_batch_size: 22
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- learning_rate: 3e-06
- weight_decay: 1e-09
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 2
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.5
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: False
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: True
- resume_from_checkpoint: None
- hub_model_id: bobox/DeBERTaV3-small-ST-AdaptiveLayer-Norm-ep2-checkpoints
- hub_strategy: checkpoint
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
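As a hedged sketch, the non-default values above map onto SentenceTransformerTrainingArguments (Sentence Transformers v3) roughly as follows; output_dir is an illustrative placeholder and not taken from this card:

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # illustrative placeholder
    eval_strategy="steps",
    per_device_train_batch_size=45,
    per_device_eval_batch_size=22,
    learning_rate=3e-6,
    weight_decay=1e-9,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.5,
    save_safetensors=False,
    fp16=True,
    push_to_hub=True,
    hub_model_id="bobox/DeBERTaV3-small-ST-AdaptiveLayer-Norm-ep2-checkpoints",
    hub_strategy="checkpoint",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoid duplicate pairs per batch
)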
Training Logs
| Epoch | Step | Training Loss | Validation Loss | max_ap |
|---|---|---|---|---|
| 0.1004 | 150 | 4.5827 | - | - |
| 0.2001 | 299 | - | 3.5735 | 0.6133 |
| 0.2008 | 300 | 3.5451 | - | - |
| 0.3012 | 450 | 2.9066 | - | - |
| 0.4003 | 598 | - | 2.8785 | 0.6561 |
| 0.4016 | 600 | 2.5141 | - | - |
| 0.5020 | 750 | 2.0248 | - | - |
| 0.6004 | 897 | - | 2.1300 | 0.6917 |
| 0.6024 | 900 | 1.6782 | - | - |
| 0.7028 | 1050 | 1.4187 | - | - |
| 0.8005 | 1196 | - | 1.7111 | 0.7051 |
| 0.8032 | 1200 | 1.2446 | - | - |
| 0.9036 | 1350 | 1.1078 | - | - |
| 1.0007 | 1495 | - | 1.4859 | 0.7108 |
| 1.0040 | 1500 | 0.9827 | - | - |
| 1.1044 | 1650 | 0.9335 | - | - |
| 1.2008 | 1794 | - | 1.3516 | 0.7121 |
| 1.2048 | 1800 | 0.8595 | - | - |
| 1.3052 | 1950 | 0.8362 | - | - |
| 1.4009 | 2093 | - | 1.2659 | 0.7147 |
| 1.4056 | 2100 | 0.8167 | - | - |
| 1.5060 | 2250 | 0.7695 | - | - |
| 1.6011 | 2392 | - | 1.2218 | 0.7135 |
| 1.6064 | 2400 | 0.7544 | - | - |
| 1.7068 | 2550 | 0.7625 | - | - |
| 1.8012 | 2691 | - | 1.2073 | 0.7135 |
| 1.8072 | 2700 | 0.7366 | - | - |
| 1.9076 | 2850 | 0.7348 | - | - |
Framework Versions
- Python: 3.10.13
- Sentence Transformers: 3.0.1
- Transformers: 4.41.2
- PyTorch: 2.1.2
- Accelerate: 0.30.1
- Datasets: 2.19.2
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
AdaptiveLayerLoss
@misc{li20242d,
title={2D Matryoshka Sentence Embeddings},
author={Xianming Li and Zongxi Li and Jing Li and Haoran Xie and Qing Li},
year={2024},
eprint={2402.14776},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}