DigitalAsocial committed · verified
Commit 0b26c33 · 1 Parent(s): c81af54

Update README.md

Files changed (1)
  1. README.md +502 -525
README.md CHANGED
@@ -1,526 +1,503 @@
1
- ---
2
- tags:
3
- - sentence-transformers
4
- - sentence-similarity
5
- - feature-extraction
6
- - dense
7
- - generated_from_trainer
8
- - dataset_size:67416
9
- - loss:MultipleNegativesRankingLoss
10
- widget:
11
- - source_sentence: ', k on their diagonal and zero elsewhere.'
12
- sentences:
13
- - 'The sample ACF and PACF of the data for that time period in Figure 6.17
14
-
15
-
16
- 466
17
-
18
- TRANSFER FUNCTIONS AND INTERVENTION MODELS
19
-
20
- 100
21
-
22
- 80
23
-
24
- 60
25
-
26
- Week
27
-
28
- Week 88
29
-
30
- 40
31
-
32
- 20
33
-
34
- 0
35
-
36
- 100,000
37
-
38
- 160,000
39
-
40
- 140,000
41
-
42
- Sales
43
-
44
- 200,000
45
-
46
- 180,000
47
-
48
- 220,000
49
-
50
- 120,000
51
-
52
- FIGURE 6.16
53
-
54
- Time series plot of the weekly sales data.'
55
- - 'Just as in equation 6.10, we can write
56
-
57
- X = u1a1vT
58
-
59
- 1 + u2a2vT
60
-
61
- 2 + · · · + ukakvT
62
-
63
- k
64
-
65
- (6.29)
66
-
67
- We can ignore the corresponding ui, vi of very small, though nonzero,
68
-
69
- ai and can still reconstruct X without too much error.'
70
- - '13.8
71
-
72
- Multiple Kernel Learning
73
-
74
- It is possible to construct new kernels by combining simpler kernels.'
75
- - source_sentence: 'The main difference is that a node appears at most once as a neighbor
76
- of an-
77
-
78
- other node, whereas a word might appear more than once in the context of another
79
- word.'
80
- sentences:
81
- - '3.2.6
82
-
83
- A Decoupled View of Vector-Centric Backpropagation
84
-
85
- In the previous discussion, two equivalent ways of computing the updates based
86
- on Equa-
87
-
88
- tions 3.12 and 3.18 were provided.'
89
- - '(4.59),
90
-
91
- we obtain
92
-
93
- eT −(1 −𝜆)eT−1 = (yT −̂yT−1) −(1 −𝜆)(yT−1 −̂yT−2)
94
-
95
- = yT −yT−1 −̂yT−1 + 𝜆yT−1 + (1 −𝜆)̂yT−2
96
-
97
- ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟
98
-
99
- =̂yT−1
100
-
101
- = yT −yT−1 −̂yT−1 + ̂yT−1
102
-
103
- = yT −yT−1.'
104
- - 8This fact is not evident in the toy example of Figure 2.17.
105
- - source_sentence: 'This influence is specified by the conditional probability
106
-
107
- P(Y|X).'
108
- sentences:
109
- - 'Seasonality
110
-
111
- is the component of time series behavior that repeats on a regular basis,
112
-
113
- such as each year.'
114
- - 'Note that one of the
115
-
116
- classes is defined by strongly non-zero values in the first and third dimensions,
117
- whereas the
118
-
119
- second class is defined by strongly non-zero values in the second and fourth dimensions.'
120
- - 'The nodes and the arcs between the nodes define the struc-
121
-
122
- ture of the network, and the conditional probabilities are the parameters
123
-
124
- given the structure.'
125
- - source_sentence: '238
126
-
127
- 9
128
-
129
- Decision Trees
130
-
131
- Rokach, L., and O. Maimon.'
132
- sentences:
133
- - '“Top-Down Induction of Decision Trees
134
-
135
- Classifiers—A Survey.” IEEE Transactions on Systems, Man, and Cybernetics–
136
-
137
- Part C 35:476–487.'
138
- - 'The only feedback is at the
139
-
140
- end of the game when we win or lose the game.'
141
- - 'Subsequently, this computation is propagated
142
-
143
- in the backwards direction with dynamic programming updates (similar to Equation
144
- 3.8).'
145
- - source_sentence: 'Therefore, one can use L1-regularization
146
-
147
- to estimate which features are predictive to the application at hand.'
148
- sentences:
149
- - Blumer, A., A. Ehrenfeucht, D. Haussler, and M. K. Warmuth.
150
- - What about the connections in the hidden layers whose weights are set to 0?
151
- - 'In cases where computational complexity is important, such
152
-
153
- as in a production setting where thousands of models are being fit, it may not
154
- be
155
-
156
- worth the extra computational effort.'
157
- pipeline_tag: sentence-similarity
158
- library_name: sentence-transformers
159
- metrics:
160
- - pearson_cosine
161
- - spearman_cosine
162
- model-index:
163
- - name: SentenceTransformer
164
- results:
165
- - task:
166
- type: semantic-similarity
167
- name: Semantic Similarity
168
- dataset:
169
- name: val
170
- type: val
171
- metrics:
172
- - type: pearson_cosine
173
- value: .nan
174
- name: Pearson Cosine
175
- - type: spearman_cosine
176
- value: .nan
177
- name: Spearman Cosine
178
- ---
179
-
180
- # SentenceTransformer
181
-
182
- This is a [sentence-transformers](https://www.SBERT.net) model. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
183
-
184
- ## Model Details
185
-
186
- ### Model Description
187
- - **Model Type:** Sentence Transformer
188
- <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
189
- - **Maximum Sequence Length:** 384 tokens
190
- - **Output Dimensionality:** 768 dimensions
191
- - **Similarity Function:** Cosine Similarity
192
- <!-- - **Training Dataset:** Unknown -->
193
- <!-- - **Language:** Unknown -->
194
- <!-- - **License:** Unknown -->
195
-
196
- ### Model Sources
197
-
198
- - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
199
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
200
- - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
201
-
202
- ### Full Model Architecture
203
-
204
- ```
205
- SentenceTransformer(
206
- (0): Transformer({'max_seq_length': 384, 'do_lower_case': False, 'architecture': 'MPNetModel'})
207
- (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
208
- (2): Normalize()
209
- )
210
- ```
211
-
212
- ## Usage
213
-
214
- ### Direct Usage (Sentence Transformers)
215
-
216
- First install the Sentence Transformers library:
217
-
218
- ```bash
219
- pip install -U sentence-transformers
220
- ```
221
-
222
- Then you can load this model and run inference.
223
- ```python
224
- from sentence_transformers import SentenceTransformer
225
-
226
- # Download from the 🤗 Hub
227
- model = SentenceTransformer("sentence_transformers_model_id")
228
- # Run inference
229
- sentences = [
230
- 'Therefore, one can use L1-regularization\nto estimate which features are predictive to the application at hand.',
231
- 'What about the connections in the hidden layers whose weights are set to 0?',
232
- 'In cases where computational complexity is important, such\nas in a production setting where thousands of models are being fit, it may not be\nworth the extra computational effort.',
233
- ]
234
- embeddings = model.encode(sentences)
235
- print(embeddings.shape)
236
- # [3, 768]
237
-
238
- # Get the similarity scores for the embeddings
239
- similarities = model.similarity(embeddings, embeddings)
240
- print(similarities)
241
- # tensor([[ 1.0000, 0.4198, 0.2089],
242
- # [ 0.4198, 1.0000, -0.0369],
243
- # [ 0.2089, -0.0369, 1.0000]])
244
- ```
245
-
246
- <!--
247
- ### Direct Usage (Transformers)
248
-
249
- <details><summary>Click to see the direct usage in Transformers</summary>
250
-
251
- </details>
252
- -->
253
-
254
- <!--
255
- ### Downstream Usage (Sentence Transformers)
256
-
257
- You can finetune this model on your own dataset.
258
-
259
- <details><summary>Click to expand</summary>
260
-
261
- </details>
262
- -->
263
-
264
- <!--
265
- ### Out-of-Scope Use
266
-
267
- *List how the model may foreseeably be misused and address what users ought not to do with the model.*
268
- -->
269
-
270
- ## Evaluation
271
-
272
- ### Metrics
273
-
274
- #### Semantic Similarity
275
-
276
- * Dataset: `val`
277
- * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
278
-
279
- | Metric | Value |
280
- |:--------------------|:--------|
281
- | pearson_cosine | nan |
282
- | **spearman_cosine** | **nan** |
283
-
284
- <!--
285
- ## Bias, Risks and Limitations
286
-
287
- *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
288
- -->
289
-
290
- <!--
291
- ### Recommendations
292
-
293
- *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
294
- -->
295
-
296
- ## Training Details
297
-
298
- ### Training Dataset
299
-
300
- #### Unnamed Dataset
301
-
302
- * Size: 67,416 training samples
303
- * Columns: <code>sentence_0</code> and <code>sentence_1</code>
304
- * Approximate statistics based on the first 1000 samples:
305
- | | sentence_0 | sentence_1 |
306
- |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
307
- | type | string | string |
308
- | details | <ul><li>min: 7 tokens</li><li>mean: 39.68 tokens</li><li>max: 384 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 39.93 tokens</li><li>max: 384 tokens</li></ul> |
309
- * Samples:
310
- | sentence_0 | sentence_1 |
311
- |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
312
- | <code>Leveraging Redundancies in Weights<br>It was shown in [94] that the vast majority of the weights in a neural network are redundant.</code> | <code>Furthermore, it is assumed that k ≪min{m1, m2}.</code> |
313
- | <code>Aran, O., O. T. Yıldız, and E. Alpaydın.</code> | <code>“An Incremental Framework Based<br>on Cross-Validation for Estimating the Architecture of a Multilayer Percep-<br>tron.” International Journal of Pattern Recognition and Artificial Intelligence<br>23:159–190.</code> |
314
- | <code>(a)<br>(d)<br>(e)<br>(f)<br><br>29<br>Code is life<br>input_decoder = Input(shape=(latent_dim,), name="decoder_input") <br>decoder_h = Dense(intermediate_dim, activation='relu', <br>name="decoder_h")(input_decoder)<br>x_decoded = Dense(original_dim, activation='sigmoid', <br>name="flat_decoded")(decoder_h) <br>decoder = Model(input_decoder, x_decoded, name="decoder") <br>We can now combine the encoder and the decoder into a single VAE model.</code> | <code>output_combined = decoder(encoder(x)[2]) <br>vae = Model(x, output_combined) <br>vae.summary() <br>Next, we get to the more familiar parts of machine learning: defining a loss function<br>so our autoencoder can train.</code> |
315
- * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
316
- ```json
317
- {
318
- "scale": 20.0,
319
- "similarity_fct": "cos_sim",
320
- "gather_across_devices": false
321
- }
322
- ```
323
-
324
- ### Training Hyperparameters
325
- #### Non-Default Hyperparameters
326
-
327
- - `per_device_train_batch_size`: 16
328
- - `per_device_eval_batch_size`: 16
329
- - `num_train_epochs`: 6
330
- - `fp16`: True
331
- - `multi_dataset_batch_sampler`: round_robin
332
-
333
- #### All Hyperparameters
334
- <details><summary>Click to expand</summary>
335
-
336
- - `overwrite_output_dir`: False
337
- - `do_predict`: False
338
- - `eval_strategy`: no
339
- - `prediction_loss_only`: True
340
- - `per_device_train_batch_size`: 16
341
- - `per_device_eval_batch_size`: 16
342
- - `per_gpu_train_batch_size`: None
343
- - `per_gpu_eval_batch_size`: None
344
- - `gradient_accumulation_steps`: 1
345
- - `eval_accumulation_steps`: None
346
- - `torch_empty_cache_steps`: None
347
- - `learning_rate`: 5e-05
348
- - `weight_decay`: 0.0
349
- - `adam_beta1`: 0.9
350
- - `adam_beta2`: 0.999
351
- - `adam_epsilon`: 1e-08
352
- - `max_grad_norm`: 1
353
- - `num_train_epochs`: 6
354
- - `max_steps`: -1
355
- - `lr_scheduler_type`: linear
356
- - `lr_scheduler_kwargs`: {}
357
- - `warmup_ratio`: 0.0
358
- - `warmup_steps`: 0
359
- - `log_level`: passive
360
- - `log_level_replica`: warning
361
- - `log_on_each_node`: True
362
- - `logging_nan_inf_filter`: True
363
- - `save_safetensors`: True
364
- - `save_on_each_node`: False
365
- - `save_only_model`: False
366
- - `restore_callback_states_from_checkpoint`: False
367
- - `no_cuda`: False
368
- - `use_cpu`: False
369
- - `use_mps_device`: False
370
- - `seed`: 42
371
- - `data_seed`: None
372
- - `jit_mode_eval`: False
373
- - `bf16`: False
374
- - `fp16`: True
375
- - `fp16_opt_level`: O1
376
- - `half_precision_backend`: auto
377
- - `bf16_full_eval`: False
378
- - `fp16_full_eval`: False
379
- - `tf32`: None
380
- - `local_rank`: 0
381
- - `ddp_backend`: None
382
- - `tpu_num_cores`: None
383
- - `tpu_metrics_debug`: False
384
- - `debug`: []
385
- - `dataloader_drop_last`: False
386
- - `dataloader_num_workers`: 0
387
- - `dataloader_prefetch_factor`: None
388
- - `past_index`: -1
389
- - `disable_tqdm`: False
390
- - `remove_unused_columns`: True
391
- - `label_names`: None
392
- - `load_best_model_at_end`: False
393
- - `ignore_data_skip`: False
394
- - `fsdp`: []
395
- - `fsdp_min_num_params`: 0
396
- - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
397
- - `fsdp_transformer_layer_cls_to_wrap`: None
398
- - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
399
- - `parallelism_config`: None
400
- - `deepspeed`: None
401
- - `label_smoothing_factor`: 0.0
402
- - `optim`: adamw_torch
403
- - `optim_args`: None
404
- - `adafactor`: False
405
- - `group_by_length`: False
406
- - `length_column_name`: length
407
- - `project`: huggingface
408
- - `trackio_space_id`: trackio
409
- - `ddp_find_unused_parameters`: None
410
- - `ddp_bucket_cap_mb`: None
411
- - `ddp_broadcast_buffers`: False
412
- - `dataloader_pin_memory`: True
413
- - `dataloader_persistent_workers`: False
414
- - `skip_memory_metrics`: True
415
- - `use_legacy_prediction_loop`: False
416
- - `push_to_hub`: False
417
- - `resume_from_checkpoint`: None
418
- - `hub_model_id`: None
419
- - `hub_strategy`: every_save
420
- - `hub_private_repo`: None
421
- - `hub_always_push`: False
422
- - `hub_revision`: None
423
- - `gradient_checkpointing`: False
424
- - `gradient_checkpointing_kwargs`: None
425
- - `include_inputs_for_metrics`: False
426
- - `include_for_metrics`: []
427
- - `eval_do_concat_batches`: True
428
- - `fp16_backend`: auto
429
- - `push_to_hub_model_id`: None
430
- - `push_to_hub_organization`: None
431
- - `mp_parameters`:
432
- - `auto_find_batch_size`: False
433
- - `full_determinism`: False
434
- - `torchdynamo`: None
435
- - `ray_scope`: last
436
- - `ddp_timeout`: 1800
437
- - `torch_compile`: False
438
- - `torch_compile_backend`: None
439
- - `torch_compile_mode`: None
440
- - `include_tokens_per_second`: False
441
- - `include_num_input_tokens_seen`: no
442
- - `neftune_noise_alpha`: None
443
- - `optim_target_modules`: None
444
- - `batch_eval_metrics`: False
445
- - `eval_on_start`: False
446
- - `use_liger_kernel`: False
447
- - `liger_kernel_config`: None
448
- - `eval_use_gather_object`: False
449
- - `average_tokens_across_devices`: True
450
- - `prompts`: None
451
- - `batch_sampler`: batch_sampler
452
- - `multi_dataset_batch_sampler`: round_robin
453
- - `router_mapping`: {}
454
- - `learning_rate_mapping`: {}
455
-
456
- </details>
457
-
458
- ### Training Logs
459
- | Epoch | Step | Training Loss | val_spearman_cosine |
460
- |:------:|:----:|:-------------:|:-------------------:|
461
- | 0.1187 | 500 | 1.5671 | - |
462
- | 0.2373 | 1000 | 1.2804 | - |
463
- | 0.3560 | 1500 | 1.1256 | - |
464
- | 0.4746 | 2000 | 0.9789 | - |
465
- | 0.5933 | 2500 | 0.8839 | - |
466
- | 0.7119 | 3000 | 0.7748 | - |
467
- | 0.8306 | 3500 | 0.73 | - |
468
- | 0.9492 | 4000 | 0.698 | - |
469
- | 1.0 | 4214 | - | nan |
470
-
471
-
472
- ### Framework Versions
473
- - Python: 3.11.7
474
- - Sentence Transformers: 5.1.1
475
- - Transformers: 4.57.0
476
- - PyTorch: 2.5.1+cu121
477
- - Accelerate: 1.12.0
478
- - Datasets: 4.4.1
479
- - Tokenizers: 0.22.1
480
-
481
- ## Citation
482
-
483
- ### BibTeX
484
-
485
- #### Sentence Transformers
486
- ```bibtex
487
- @inproceedings{reimers-2019-sentence-bert,
488
- title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
489
- author = "Reimers, Nils and Gurevych, Iryna",
490
- booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
491
- month = "11",
492
- year = "2019",
493
- publisher = "Association for Computational Linguistics",
494
- url = "https://arxiv.org/abs/1908.10084",
495
- }
496
- ```
497
-
498
- #### MultipleNegativesRankingLoss
499
- ```bibtex
500
- @misc{henderson2017efficient,
501
- title={Efficient Natural Language Response Suggestion for Smart Reply},
502
- author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
503
- year={2017},
504
- eprint={1705.00652},
505
- archivePrefix={arXiv},
506
- primaryClass={cs.CL}
507
- }
508
- ```
509
-
510
- <!--
511
- ## Glossary
512
-
513
- *Clearly define terms in order to be accessible across audiences.*
514
- -->
515
-
516
- <!--
517
- ## Model Card Authors
518
-
519
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
520
- -->
521
-
522
- <!--
523
- ## Model Card Contact
524
-
525
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
526
  -->
 
1
+ ---
2
+ tags:
3
+ - Data-Science
4
+ - Machine-Learning
5
+ - sentence-transformers
6
+ - sentence-similarity
7
+ - feature-extraction
8
+ - dense
9
+ - generated_from_trainer
10
+ - dataset_size:67416
11
+ - loss:MultipleNegativesRankingLoss
12
+ widget:
13
+ - source_sentence: ', k on their diagonal and zero elsewhere.'
14
+ sentences:
15
+ - |-
16
+ The sample ACF and PACF of the data for that time period in Figure 6.17
17
+
18
+ 466
19
+ TRANSFER FUNCTIONS AND INTERVENTION MODELS
20
+ 100
21
+ 80
22
+ 60
23
+ Week
24
+ Week 88
25
+ 40
26
+ 20
27
+ 0
28
+ 100,000
29
+ 160,000
30
+ 140,000
31
+ Sales
32
+ 200,000
33
+ 180,000
34
+ 220,000
35
+ 120,000
36
+ FIGURE 6.16
37
+ Time series plot of the weekly sales data.
38
+ - |-
39
+ Just as in equation 6.10, we can write
40
+ X = u1a1vT
41
+ 1 + u2a2vT
42
+ 2 + · · · + ukakvT
43
+ k
44
+ (6.29)
45
+ We can ignore the corresponding ui, vi of very small, though nonzero,
46
+ ai and can still reconstruct X without too much error.
47
+ - |-
48
+ 13.8
49
+ Multiple Kernel Learning
50
+ It is possible to construct new kernels by combining simpler kernels.
51
+ - source_sentence: >-
52
+ The main difference is that a node appears at most once as a neighbor of an-
53
+
54
+ other node, whereas a word might appear more than once in the context of
55
+ another word.
56
+ sentences:
57
+ - >-
58
+ 3.2.6
59
+
60
+ A Decoupled View of Vector-Centric Backpropagation
61
+
62
+ In the previous discussion, two equivalent ways of computing the updates
63
+ based on Equa-
64
+
65
+ tions 3.12 and 3.18 were provided.
66
+ - |-
67
+ (4.59),
68
+ we obtain
69
+ eT −(1 −𝜆)eT−1 = (yT −̂yT−1) −(1 −𝜆)(yT−1 −̂yT−2)
70
+ = yT −yT−1 −̂yT−1 + 𝜆yT−1 + (1 −𝜆)̂yT−2
71
+ ⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟
72
+ =̂yT−1
73
+ = yT −yT−1 −̂yT−1 + ̂yT−1
74
+ = yT −yT−1.
75
+ - 8This fact is not evident in the toy example of Figure 2.17.
76
+ - source_sentence: |-
77
+ This influence is specified by the conditional probability
78
+ P(Y|X).
79
+ sentences:
80
+ - |-
81
+ Seasonality
82
+ is the component of time series behavior that repeats on a regular basis,
83
+ such as each year.
84
+ - >-
85
+ Note that one of the
86
+
87
+ classes is defined by strongly non-zero values in the first and third
88
+ dimensions, whereas the
89
+
90
+ second class is defined by strongly non-zero values in the second and fourth
91
+ dimensions.
92
+ - |-
93
+ The nodes and the arcs between the nodes define the struc-
94
+ ture of the network, and the conditional probabilities are the parameters
95
+ given the structure.
96
+ - source_sentence: |-
97
+ 238
98
+ 9
99
+ Decision Trees
100
+ Rokach, L., and O. Maimon.
101
+ sentences:
102
+ - |-
103
+ “Top-Down Induction of Decision Trees
104
+ Classifiers—A Survey.” IEEE Transactions on Systems, Man, and Cybernetics–
105
+ Part C 35:476–487.
106
+ - |-
107
+ The only feedback is at the
108
+ end of the game when we win or lose the game.
109
+ - >-
110
+ Subsequently, this computation is propagated
111
+
112
+ in the backwards direction with dynamic programming updates (similar to
113
+ Equation 3.8).
114
+ - source_sentence: |-
115
+ Therefore, one can use L1-regularization
116
+ to estimate which features are predictive to the application at hand.
117
+ sentences:
118
+ - Blumer, A., A. Ehrenfeucht, D. Haussler, and M. K. Warmuth.
119
+ - What about the connections in the hidden layers whose weights are set to 0?
120
+ - >-
121
+ In cases where computational complexity is important, such
122
+
123
+ as in a production setting where thousands of models are being fit, it may
124
+ not be
125
+
126
+ worth the extra computational effort.
127
+ pipeline_tag: sentence-similarity
128
+ library_name: sentence-transformers
129
+ metrics:
130
+ - pearson_cosine
131
+ - spearman_cosine
132
+ model-index:
133
+ - name: SentenceTransformer
134
+ results:
135
+ - task:
136
+ type: semantic-similarity
137
+ name: Semantic Similarity
138
+ dataset:
139
+ name: val
140
+ type: val
141
+ metrics:
142
+ - type: pearson_cosine
143
+ value: null
144
+ name: Pearson Cosine
145
+ - type: spearman_cosine
146
+ value: null
147
+ name: Spearman Cosine
148
+ license: apache-2.0
149
+ language:
150
+ - en
151
+ base_model:
152
+ - sentence-transformers/all-mpnet-base-v2
153
+ datasets:
154
+ - DigitalAsocial/ds-tb-5-raw
155
+ ---
156
+
157
+ # SentenceTransformer
158
+
159
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) on the [DigitalAsocial/ds-tb-5-raw](https://huggingface.co/datasets/DigitalAsocial/ds-tb-5-raw) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
160
+
161
+ ## Model Details
162
+
163
+ ### Model Description
164
+ - **Model Type:** Sentence Transformer
165
+ - **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
166
+ - **Maximum Sequence Length:** 384 tokens
167
+ - **Output Dimensionality:** 768 dimensions
168
+ - **Similarity Function:** Cosine Similarity
169
+ - **Training Dataset:** [DigitalAsocial/ds-tb-5-raw](https://huggingface.co/datasets/DigitalAsocial/ds-tb-5-raw)
170
+ - **Language:** en
171
+ - **License:** apache-2.0
172
+
173
+ ### Model Sources
174
+
175
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
176
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
177
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
178
+
179
+ ### Full Model Architecture
180
+
181
+ ```
182
+ SentenceTransformer(
183
+ (0): Transformer({'max_seq_length': 384, 'do_lower_case': False, 'architecture': 'MPNetModel'})
184
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
185
+ (2): Normalize()
186
+ )
187
+ ```
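+ 
+ The module listing above is a standard mean-pooled MPNet encoder followed by L2 normalization. As a minimal sketch (not the exact script used to build this repository), an equivalent module stack could be assembled by hand, taking the base checkpoint name from the `base_model` entry in the metadata:
+ 
+ ```python
+ from sentence_transformers import SentenceTransformer, models
+ 
+ # Token-level MPNet encoder, truncating inputs at 384 tokens
+ word_embedding = models.Transformer("sentence-transformers/all-mpnet-base-v2", max_seq_length=384)
+ # Mean pooling over token embeddings -> one 768-dimensional vector per input
+ pooling = models.Pooling(word_embedding.get_word_embedding_dimension(), pooling_mode="mean")
+ # L2-normalize so that dot product and cosine similarity coincide
+ normalize = models.Normalize()
+ 
+ model = SentenceTransformer(modules=[word_embedding, pooling, normalize])
+ print(model)
+ ```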
188
+
189
+ ## Usage
190
+
191
+ ### Direct Usage (Sentence Transformers)
192
+
193
+ First install the Sentence Transformers library:
194
+
195
+ ```bash
196
+ pip install -U sentence-transformers
197
+ ```
198
+
199
+ Then you can load this model and run inference.
200
+ ```python
201
+ from sentence_transformers import SentenceTransformer
202
+
203
+ # Download from the 🤗 Hub
204
+ model = SentenceTransformer("sentence_transformers_model_id")
205
+ # Run inference
206
+ sentences = [
207
+ 'Therefore, one can use L1-regularization\nto estimate which features are predictive to the application at hand.',
208
+ 'What about the connections in the hidden layers whose weights are set to 0?',
209
+ 'In cases where computational complexity is important, such\nas in a production setting where thousands of models are being fit, it may not be\nworth the extra computational effort.',
210
+ ]
211
+ embeddings = model.encode(sentences)
212
+ print(embeddings.shape)
213
+ # [3, 768]
214
+
215
+ # Get the similarity scores for the embeddings
216
+ similarities = model.similarity(embeddings, embeddings)
217
+ print(similarities)
218
+ # tensor([[ 1.0000, 0.4198, 0.2089],
219
+ # [ 0.4198, 1.0000, -0.0369],
220
+ # [ 0.2089, -0.0369, 1.0000]])
221
+ ```
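+ 
+ Because the `Normalize()` module L2-normalizes the embeddings, cosine similarity is the natural way to rank candidates. A small illustrative sketch of semantic search on top of the snippet above (the query string is made up, not taken from the training data):
+ 
+ ```python
+ # Rank the three sentences above against a query by cosine similarity
+ query_embedding = model.encode(["How does L1-regularization help with feature selection?"])
+ corpus_embeddings = model.encode(sentences)
+ 
+ scores = model.similarity(query_embedding, corpus_embeddings)  # shape [1, 3]
+ best = scores.argmax().item()
+ print(sentences[best], scores[0, best].item())
+ ```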
222
+
223
+ <!--
224
+ ### Direct Usage (Transformers)
225
+
226
+ <details><summary>Click to see the direct usage in Transformers</summary>
227
+
228
+ </details>
229
+ -->
230
+
231
+ <!--
232
+ ### Downstream Usage (Sentence Transformers)
233
+
234
+ You can finetune this model on your own dataset.
235
+
236
+ <details><summary>Click to expand</summary>
237
+
238
+ </details>
239
+ -->
240
+
241
+ <!--
242
+ ### Out-of-Scope Use
243
+
244
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
245
+ -->
246
+
247
+ ## Evaluation
248
+
249
+ ### Metrics
250
+
251
+ #### Semantic Similarity
252
+
253
+ * Dataset: `val`
254
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
255
+
256
+ | Metric | Value |
257
+ |:--------------------|:--------|
258
+ | pearson_cosine | nan |
259
+ | **spearman_cosine** | **nan** |
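+ 
+ Pearson and Spearman correlations come out as `nan` when the gold similarity scores are constant or unavailable, which is the most likely explanation for the values above. The evaluation can be reproduced with the evaluator linked above; the sentence pairs and scores below are placeholders, not the actual `val` split:
+ 
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator
+ 
+ model = SentenceTransformer("sentence_transformers_model_id")
+ 
+ # Placeholder pairs with gold similarity scores in [0, 1]
+ evaluator = EmbeddingSimilarityEvaluator(
+     sentences1=["A decision tree splits on one feature at a time.",
+                 "Seasonality repeats on a regular basis."],
+     sentences2=["Tree classifiers partition the input space axis by axis.",
+                 "The weekly sales data show a clear trend."],
+     scores=[0.9, 0.2],
+     name="val",
+ )
+ print(evaluator(model))  # e.g. {'val_pearson_cosine': ..., 'val_spearman_cosine': ...}
+ ```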
260
+
261
+ <!--
262
+ ## Bias, Risks and Limitations
263
+
264
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
265
+ -->
266
+
267
+ <!--
268
+ ### Recommendations
269
+
270
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
271
+ -->
272
+
273
+ ## Training Details
274
+
275
+ ### Training Dataset
276
+
277
+ #### Unnamed Dataset
278
+
279
+ * Size: 67,416 training samples
280
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
281
+ * Approximate statistics based on the first 1000 samples:
282
+ | | sentence_0 | sentence_1 |
283
+ |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
284
+ | type | string | string |
285
+ | details | <ul><li>min: 7 tokens</li><li>mean: 39.68 tokens</li><li>max: 384 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 39.93 tokens</li><li>max: 384 tokens</li></ul> |
286
+ * Samples:
287
+ | sentence_0 | sentence_1 |
288
+ |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
289
+ | <code>Leveraging Redundancies in Weights<br>It was shown in [94] that the vast majority of the weights in a neural network are redundant.</code> | <code>Furthermore, it is assumed that k ≪min{m1, m2}.</code> |
290
+ | <code>Aran, O., O. T. Yıldız, and E. Alpaydın.</code> | <code>“An Incremental Framework Based<br>on Cross-Validation for Estimating the Architecture of a Multilayer Percep-<br>tron.” International Journal of Pattern Recognition and Artificial Intelligence<br>23:159–190.</code> |
291
+ | <code>(a)<br>(d)<br>(e)<br>(f)<br><br>29<br>Code is life<br>input_decoder = Input(shape=(latent_dim,), name="decoder_input") <br>decoder_h = Dense(intermediate_dim, activation='relu', <br>name="decoder_h")(input_decoder)<br>x_decoded = Dense(original_dim, activation='sigmoid', <br>name="flat_decoded")(decoder_h) <br>decoder = Model(input_decoder, x_decoded, name="decoder") <br>We can now combine the encoder and the decoder into a single VAE model.</code> | <code>output_combined = decoder(encoder(x)[2]) <br>vae = Model(x, output_combined) <br>vae.summary() <br>Next, we get to the more familiar parts of machine learning: defining a loss function<br>so our autoencoder can train.</code> |
292
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
293
+ ```json
294
+ {
295
+ "scale": 20.0,
296
+ "similarity_fct": "cos_sim",
297
+ "gather_across_devices": false
298
+ }
299
+ ```
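+ 
+ For reference, a minimal training sketch that combines this loss with the non-default hyperparameters listed in the next section (the two training pairs are illustrative placeholders, not samples from the actual dataset):
+ 
+ ```python
+ from datasets import Dataset
+ from sentence_transformers import (
+     SentenceTransformer,
+     SentenceTransformerTrainer,
+     SentenceTransformerTrainingArguments,
+     losses,
+ )
+ 
+ model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
+ 
+ # Positive (sentence_0, sentence_1) pairs; the other pairs in a batch act as negatives
+ train_dataset = Dataset.from_dict({
+     "sentence_0": ["A decision tree splits on one feature at a time.",
+                    "Seasonality repeats on a regular basis."],
+     "sentence_1": ["Tree classifiers partition the input space axis by axis.",
+                    "Seasonal behavior recurs, for example every year."],
+ })
+ 
+ loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)
+ 
+ args = SentenceTransformerTrainingArguments(
+     output_dir="outputs",
+     num_train_epochs=6,
+     per_device_train_batch_size=16,
+     fp16=True,
+ )
+ 
+ trainer = SentenceTransformerTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss)
+ trainer.train()
+ ```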
300
+
301
+ ### Training Hyperparameters
302
+ #### Non-Default Hyperparameters
303
+
304
+ - `per_device_train_batch_size`: 16
305
+ - `per_device_eval_batch_size`: 16
306
+ - `num_train_epochs`: 6
307
+ - `fp16`: True
308
+ - `multi_dataset_batch_sampler`: round_robin
309
+
310
+ #### All Hyperparameters
311
+ <details><summary>Click to expand</summary>
312
+
313
+ - `overwrite_output_dir`: False
314
+ - `do_predict`: False
315
+ - `eval_strategy`: no
316
+ - `prediction_loss_only`: True
317
+ - `per_device_train_batch_size`: 16
318
+ - `per_device_eval_batch_size`: 16
319
+ - `per_gpu_train_batch_size`: None
320
+ - `per_gpu_eval_batch_size`: None
321
+ - `gradient_accumulation_steps`: 1
322
+ - `eval_accumulation_steps`: None
323
+ - `torch_empty_cache_steps`: None
324
+ - `learning_rate`: 5e-05
325
+ - `weight_decay`: 0.0
326
+ - `adam_beta1`: 0.9
327
+ - `adam_beta2`: 0.999
328
+ - `adam_epsilon`: 1e-08
329
+ - `max_grad_norm`: 1
330
+ - `num_train_epochs`: 6
331
+ - `max_steps`: -1
332
+ - `lr_scheduler_type`: linear
333
+ - `lr_scheduler_kwargs`: {}
334
+ - `warmup_ratio`: 0.0
335
+ - `warmup_steps`: 0
336
+ - `log_level`: passive
337
+ - `log_level_replica`: warning
338
+ - `log_on_each_node`: True
339
+ - `logging_nan_inf_filter`: True
340
+ - `save_safetensors`: True
341
+ - `save_on_each_node`: False
342
+ - `save_only_model`: False
343
+ - `restore_callback_states_from_checkpoint`: False
344
+ - `no_cuda`: False
345
+ - `use_cpu`: False
346
+ - `use_mps_device`: False
347
+ - `seed`: 42
348
+ - `data_seed`: None
349
+ - `jit_mode_eval`: False
350
+ - `bf16`: False
351
+ - `fp16`: True
352
+ - `fp16_opt_level`: O1
353
+ - `half_precision_backend`: auto
354
+ - `bf16_full_eval`: False
355
+ - `fp16_full_eval`: False
356
+ - `tf32`: None
357
+ - `local_rank`: 0
358
+ - `ddp_backend`: None
359
+ - `tpu_num_cores`: None
360
+ - `tpu_metrics_debug`: False
361
+ - `debug`: []
362
+ - `dataloader_drop_last`: False
363
+ - `dataloader_num_workers`: 0
364
+ - `dataloader_prefetch_factor`: None
365
+ - `past_index`: -1
366
+ - `disable_tqdm`: False
367
+ - `remove_unused_columns`: True
368
+ - `label_names`: None
369
+ - `load_best_model_at_end`: False
370
+ - `ignore_data_skip`: False
371
+ - `fsdp`: []
372
+ - `fsdp_min_num_params`: 0
373
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
374
+ - `fsdp_transformer_layer_cls_to_wrap`: None
375
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
376
+ - `parallelism_config`: None
377
+ - `deepspeed`: None
378
+ - `label_smoothing_factor`: 0.0
379
+ - `optim`: adamw_torch
380
+ - `optim_args`: None
381
+ - `adafactor`: False
382
+ - `group_by_length`: False
383
+ - `length_column_name`: length
384
+ - `project`: huggingface
385
+ - `trackio_space_id`: trackio
386
+ - `ddp_find_unused_parameters`: None
387
+ - `ddp_bucket_cap_mb`: None
388
+ - `ddp_broadcast_buffers`: False
389
+ - `dataloader_pin_memory`: True
390
+ - `dataloader_persistent_workers`: False
391
+ - `skip_memory_metrics`: True
392
+ - `use_legacy_prediction_loop`: False
393
+ - `push_to_hub`: False
394
+ - `resume_from_checkpoint`: None
395
+ - `hub_model_id`: None
396
+ - `hub_strategy`: every_save
397
+ - `hub_private_repo`: None
398
+ - `hub_always_push`: False
399
+ - `hub_revision`: None
400
+ - `gradient_checkpointing`: False
401
+ - `gradient_checkpointing_kwargs`: None
402
+ - `include_inputs_for_metrics`: False
403
+ - `include_for_metrics`: []
404
+ - `eval_do_concat_batches`: True
405
+ - `fp16_backend`: auto
406
+ - `push_to_hub_model_id`: None
407
+ - `push_to_hub_organization`: None
408
+ - `mp_parameters`:
409
+ - `auto_find_batch_size`: False
410
+ - `full_determinism`: False
411
+ - `torchdynamo`: None
412
+ - `ray_scope`: last
413
+ - `ddp_timeout`: 1800
414
+ - `torch_compile`: False
415
+ - `torch_compile_backend`: None
416
+ - `torch_compile_mode`: None
417
+ - `include_tokens_per_second`: False
418
+ - `include_num_input_tokens_seen`: no
419
+ - `neftune_noise_alpha`: None
420
+ - `optim_target_modules`: None
421
+ - `batch_eval_metrics`: False
422
+ - `eval_on_start`: False
423
+ - `use_liger_kernel`: False
424
+ - `liger_kernel_config`: None
425
+ - `eval_use_gather_object`: False
426
+ - `average_tokens_across_devices`: True
427
+ - `prompts`: None
428
+ - `batch_sampler`: batch_sampler
429
+ - `multi_dataset_batch_sampler`: round_robin
430
+ - `router_mapping`: {}
431
+ - `learning_rate_mapping`: {}
432
+
433
+ </details>
434
+
435
+ ### Training Logs
436
+ | Epoch | Step | Training Loss | val_spearman_cosine |
437
+ |:------:|:----:|:-------------:|:-------------------:|
438
+ | 0.1187 | 500 | 1.5671 | - |
439
+ | 0.2373 | 1000 | 1.2804 | - |
440
+ | 0.3560 | 1500 | 1.1256 | - |
441
+ | 0.4746 | 2000 | 0.9789 | - |
442
+ | 0.5933 | 2500 | 0.8839 | - |
443
+ | 0.7119 | 3000 | 0.7748 | - |
444
+ | 0.8306 | 3500 | 0.73 | - |
445
+ | 0.9492 | 4000 | 0.698 | - |
446
+ | 1.0 | 4214 | - | nan |
447
+
448
+
449
+ ### Framework Versions
450
+ - Python: 3.11.7
451
+ - Sentence Transformers: 5.1.1
452
+ - Transformers: 4.57.0
453
+ - PyTorch: 2.5.1+cu121
454
+ - Accelerate: 1.12.0
455
+ - Datasets: 4.4.1
456
+ - Tokenizers: 0.22.1
457
+
458
+ ## Citation
459
+
460
+ ### BibTeX
461
+
462
+ #### Sentence Transformers
463
+ ```bibtex
464
+ @inproceedings{reimers-2019-sentence-bert,
465
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
466
+ author = "Reimers, Nils and Gurevych, Iryna",
467
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
468
+ month = "11",
469
+ year = "2019",
470
+ publisher = "Association for Computational Linguistics",
471
+ url = "https://arxiv.org/abs/1908.10084",
472
+ }
473
+ ```
474
+
475
+ #### MultipleNegativesRankingLoss
476
+ ```bibtex
477
+ @misc{henderson2017efficient,
478
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
479
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
480
+ year={2017},
481
+ eprint={1705.00652},
482
+ archivePrefix={arXiv},
483
+ primaryClass={cs.CL}
484
+ }
485
+ ```
486
+
487
+ <!--
488
+ ## Glossary
489
+
490
+ *Clearly define terms in order to be accessible across audiences.*
491
+ -->
492
+
493
+ <!--
494
+ ## Model Card Authors
495
+
496
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
497
+ -->
498
+
499
+ <!--
500
+ ## Model Card Contact
501
+
502
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
503
  -->