test
- README.md +5 -0
- parapluie.py +7 -4
README.md
CHANGED
@@ -24,6 +24,7 @@ ParaPLUIE uses the perplexity of an LLM to compute a confidence score.
 It has shown the highest correlation with human judgement on paraphrase classification while keeping the computational cost low, roughly equal to the cost of generating a single token.
 
 ## How to Use
+# TODO
 This metric requires a source sentence and its hypothetical paraphrase.
 
 ```python
@@ -34,12 +35,14 @@ This metric requires a source sentence and its hypothetical paraphrase.
 ```
 
 ### Inputs
+# TODO
 - **predictions** (`list` of `int`): Predicted labels.
 - **references** (`list` of `int`): Ground truth labels.
 - **normalize** (`boolean`): If set to False, returns the number of correctly classified samples. Otherwise, returns the fraction of correctly classified samples. Defaults to True.
 - **sample_weight** (`list` of `float`): Sample weights. Defaults to None.
 
 ### Output Values
+# TODO
 - **accuracy** (`float` or `int`): Accuracy score. Minimum possible value is 0. Maximum possible value is 1.0, or the number of examples input if `normalize` is set to `False`. A higher score means higher accuracy.
 Output Example(s):
 ```python
@@ -48,9 +51,11 @@ Output Example(s):
 This metric outputs a dictionary containing the accuracy score.
 
 #### Values from Papers
+# TODO
 ParaPLUIE has been compared to other state-of-the-art metrics in [ImageNet](https://paperswithcode.com/sota/image-classification-on-imagenet) and showed a high correlation with human judgement while being less computationally intensive than LLM-as-a-judge methods.
 
 ### Examples
+# TODO
 Example 1 - A simple example
 ```python
 >>> accuracy_metric = evaluate.load("accuracy")
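The "How to Use" section above still carries the accuracy template text, so here is a minimal, hypothetical sketch of how the metric might be called once published. It assumes the module is loaded under a `parapluie` id and keeps the `source`/`hypothese` input names declared in parapluie.py below; the module id and the shape of the output are assumptions, not something this commit confirms.

```python
import evaluate

# Hypothetical usage sketch. Only the 'source' / 'hypothese' input names come
# from the features declared in parapluie.py in this commit; the module id
# ("parapluie") and the exact output format are assumptions.
parapluie = evaluate.load("parapluie")

results = parapluie.compute(
    source=["The cat sat on the mat."],
    hypothese=["A cat was sitting on the mat."],
)
print(results)  # expected: a dict holding the confidence score(s) described in the README
```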
parapluie.py
CHANGED
@@ -70,14 +70,17 @@ Examples:
     {'accuracy': 1.0}
 """
 
-
-BAD_WORDS_URL = "http://url/to/external/resource/bad_words.txt"
+
+# BAD_WORDS_URL = "http://url/to/external/resource/bad_words.txt"
 
 
 @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
 class Parapluie(evaluate.Metric):
     """TODO: Short description of my evaluation module."""
 
+    def printnimp(self):
+        print("nimp")
+
     def _info(self):
         # TODO: Specifies the evaluate.EvaluationModuleInfo object
         return evaluate.MetricInfo(
@@ -88,8 +91,8 @@ class Parapluie(evaluate.Metric):
             inputs_description=_KWARGS_DESCRIPTION,
             # This defines the format of each prediction and reference
             features=datasets.Features({
-                '
-                '
+                'source': datasets.Value("string"),
+                'hypothese': datasets.Value("string"),
             }),
             # Homepage of the module for documentation
             # homepage="http://module.homepage",
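For context on the `features` change above: `evaluate` passes the declared feature columns to `_compute` as keyword arguments, so the `source`/`hypothese` names need to be mirrored in that method's signature. Below is a minimal sketch of such a `_compute`, assuming it sits inside the `Parapluie` class; the scoring body and the output key are placeholders, since the actual perplexity-based computation is not part of this diff.

```python
    # Sketch only: the keyword arguments must match the Features dict declared
    # in _info above. The dummy scores and the 'parapluie' output key are
    # placeholders; the LLM-perplexity confidence score described in the
    # README is not shown in this commit.
    def _compute(self, source, hypothese):
        if len(source) != len(hypothese):
            raise ValueError("source and hypothese must have the same length")
        # TODO: replace with the perplexity-based confidence score per pair
        scores = [0.0 for _ in zip(source, hypothese)]
        return {"parapluie": scores}
```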