test
- README.md +5 -0
- parapluie.py +7 -4
README.md
CHANGED
@@ -24,6 +24,7 @@ ParaPLUIE uses the perplexity of an LLM to compute a confidence score.
 It has shown the highest correlation with human judgement on paraphrase classification while keeping the computational cost low, roughly equal to the cost of generating a single token.
 
 ## How to Use
+# TODO
 This metric requires a source sentence and its hypothetical paraphrase.
 
 ```python
@@ -34,12 +35,14 @@ This metric requires a source sentence and its hypothetical paraphrase.
 ```
 
 ### Inputs
+# TODO
 - **predictions** (`list` of `int`): Predicted labels.
 - **references** (`list` of `int`): Ground truth labels.
 - **normalize** (`boolean`): If set to False, returns the number of correctly classified samples. Otherwise, returns the fraction of correctly classified samples. Defaults to True.
 - **sample_weight** (`list` of `float`): Sample weights. Defaults to None.
 
 ### Output Values
+# TODO
 - **accuracy** (`float` or `int`): Accuracy score. Minimum possible value is 0. Maximum possible value is 1.0, or the number of examples input if `normalize` is set to `False`. A higher score means higher accuracy.
 Output Example(s):
 ```python
@@ -48,9 +51,11 @@ Output Example(s):
 This metric outputs a dictionary containing the accuracy score.
 
 #### Values from Papers
+# TODO
 ParaPLUIE has been compared to other state-of-the-art metrics in [ImageNet](https://paperswithcode.com/sota/image-classification-on-imagenet) and showed a high correlation with human judgement while being less computationally intensive than LLM-as-a-judge methods.
 
 ### Examples
+# TODO
 Example 1 - A simple example
 ```python
 >>> accuracy_metric = evaluate.load("accuracy")
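The "How to Use" section above still carries the accuracy template text, so here is a minimal, hypothetical sketch of how the metric might be called once published. It assumes the module is loaded under a `parapluie` id and keeps the `source`/`hypothese` input names declared in parapluie.py below; the module id and the shape of the output are assumptions, not something this commit confirms.

```python
import evaluate

# Hypothetical usage sketch. Only the 'source' / 'hypothese' input names come
# from the features declared in parapluie.py in this commit; the module id
# ("parapluie") and the exact output format are assumptions.
parapluie = evaluate.load("parapluie")

results = parapluie.compute(
    source=["The cat sat on the mat."],
    hypothese=["A cat was sitting on the mat."],
)
print(results)  # expected: a dict holding the confidence score(s) described in the README
```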
parapluie.py
CHANGED
@@ -70,14 +70,17 @@ Examples:
     {'accuracy': 1.0}
 """
 
-
-BAD_WORDS_URL = "http://url/to/external/resource/bad_words.txt"
+
+# BAD_WORDS_URL = "http://url/to/external/resource/bad_words.txt"
 
 
 @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
 class Parapluie(evaluate.Metric):
     """TODO: Short description of my evaluation module."""
 
+    def printnimp(self):
+        print("nimp")
+
     def _info(self):
         # TODO: Specifies the evaluate.EvaluationModuleInfo object
         return evaluate.MetricInfo(
@@ -88,8 +91,8 @@ class Parapluie(evaluate.Metric):
             inputs_description=_KWARGS_DESCRIPTION,
             # This defines the format of each prediction and reference
             features=datasets.Features({
-                '
-                '
+                'source': datasets.Value("string"),
+                'hypothese': datasets.Value("string"),
             }),
             # Homepage of the module for documentation
             # homepage="http://module.homepage",
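For context on the `features` change above: `evaluate` passes the declared feature columns to `_compute` as keyword arguments, so the `source`/`hypothese` names need to be mirrored in that method's signature. Below is a minimal sketch of such a `_compute`, assuming it sits inside the `Parapluie` class; the scoring body and the output key are placeholders, since the actual perplexity-based computation is not part of this diff.

```python
    # Sketch only: the keyword arguments must match the Features dict declared
    # in _info above. The dummy scores and the 'parapluie' output key are
    # placeholders; the LLM-perplexity confidence score described in the
    # README is not shown in this commit.
    def _compute(self, source, hypothese):
        if len(source) != len(hypothese):
            raise ValueError("source and hypothese must have the same length")
        # TODO: replace with the perplexity-based confidence score per pair
        scores = [0.0 for _ in zip(source, hypothese)]
        return {"parapluie": scores}
```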