qlemesle committed
Commit faab91a · 1 Parent(s): 9093803
Files changed (2):
  1. README.md +5 -0
  2. parapluie.py +7 -4
README.md CHANGED
@@ -24,6 +24,7 @@ ParaPLUIE uses the perplexity of an LLM to compute a confidence score.
 It has shown the highest correlation with human judgement on paraphrase classification while keeping the computational cost low, roughly equal to the cost of generating one token.
 
 ## How to Use
+# TODO
 This metric requires a source sentence and its hypothetical paraphrase.
 
 ```python
@@ -34,12 +35,14 @@ This metric requires a source sentence and its hypothetical paraphrase.
 ```
 
 ### Inputs
+# TODO
 - **predictions** (`list` of `int`): Predicted labels.
 - **references** (`list` of `int`): Ground truth labels.
 - **normalize** (`boolean`): If set to False, returns the number of correctly classified samples. Otherwise, returns the fraction of correctly classified samples. Defaults to True.
 - **sample_weight** (`list` of `float`): Sample weights. Defaults to None.
 
 ### Output Values
+# TODO
 - **accuracy** (`float` or `int`): Accuracy score. Minimum possible value is 0. Maximum possible value is 1.0, or the number of examples input if `normalize` is set to `False`. A higher score means higher accuracy.
 Output Example(s):
 ```python
@@ -48,9 +51,11 @@ Output Example(s):
 This metric outputs a dictionary containing the accuracy score.
 
 #### Values from Papers
+# TODO
 ParaPLUIE has been compared to other state-of-the-art metrics in [ImageNet](https://paperswithcode.com/sota/image-classification-on-imagenet) and showed a high correlation with human judgement while being less computationally intensive than LLM-as-a-judge methods.
 
 ### Examples
+# TODO
 Example 1 - A simple example
 ```python
 >>> accuracy_metric = evaluate.load("accuracy")
 
parapluie.py CHANGED
@@ -70,14 +70,17 @@ Examples:
     {'accuracy': 1.0}
 """
 
-# TODO: Define external resources urls if needed
-BAD_WORDS_URL = "http://url/to/external/resource/bad_words.txt"
+
+# BAD_WORDS_URL = "http://url/to/external/resource/bad_words.txt"
 
 
 @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
 class Parapluie(evaluate.Metric):
     """TODO: Short description of my evaluation module."""
 
+    def printnimp(self):
+        print("nimp")
+
     def _info(self):
         # TODO: Specifies the evaluate.EvaluationModuleInfo object
         return evaluate.MetricInfo(
@@ -88,8 +91,8 @@ class Parapluie(evaluate.Metric):
             inputs_description=_KWARGS_DESCRIPTION,
             # This defines the format of each prediction and reference
             features=datasets.Features({
-                'Source': datasets.Value("string"),
-                'Hypothese': datasets.Value("string"),
+                'source': datasets.Value("string"),
+                'hypothese': datasets.Value("string"),
             }),
             # Homepage of the module for documentation
             # homepage="http://module.homepage",
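The commit renames the features to lower-case `source`/`hypothese` but leaves `_compute` out of the diff, so how they are consumed is not visible here. A minimal sketch of the plumbing, assuming each pair is scored independently; the method body, the output key, and the `paraphrase_score` helper (sketched after this) are all hypothetical.

```python
# Hypothetical _compute wiring -- not part of this commit.
def _compute(self, source, hypothese):
    # Score each (source, hypothese) pair independently with an
    # LLM-based scorer such as the paraphrase_score() sketch below.
    scores = [paraphrase_score(s, h) for s, h in zip(source, hypothese)]
    return {"parapluie": scores}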
 
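The README's claim that the score costs roughly one token generation can be made concrete: scoring reduces to a single forward pass over a prompt, reading next-token log-probabilities instead of sampling. Below is a minimal sketch with `transformers`; the model choice, prompt wording, and Yes/No log-odds rule are illustrative assumptions, not ParaPLUIE's actual recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model; the LLM ParaPLUIE actually uses is not shown in this commit.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def paraphrase_score(source: str, hypothese: str) -> float:
    """Score one pair with a single forward pass (about the cost of one generated token)."""
    prompt = (
        f'Sentence 1: "{source}"\n'
        f'Sentence 2: "{hypothese}"\n'
        "Do these two sentences mean the same thing? Answer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token only
    log_probs = torch.log_softmax(logits, dim=-1)
    yes_id = tokenizer.encode(" Yes")[0]
    no_id = tokenizer.encode(" No")[0]
    # Log-odds of answering "Yes" over "No": higher = more paraphrase-like.
    return (log_probs[yes_id] - log_probs[no_id]).item()

print(paraphrase_score("The weather was terrible yesterday.",
                       "Yesterday, the weather was awful."))
```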