zli12321/answer_equivalence_bert

@@ -14,10 +14,10 @@ pipeline_tag: text-classification
 [![PyPI version qa-metrics](https://img.shields.io/pypi/v/qa-metrics.svg)](https://pypi.org/project/qa-metrics/)
 [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Ke23KIeHFdPWad0BModmcWKZ6jSbF5nI?usp=sharing)
-> Check out the main [Repo](https://github.com/zli12321/qa_metrics)
 > A fast and lightweight Python package for evaluating question-answering models and prompting of black-box and open-source large language models.
+> `pip install qa-metrics` is all you need!
 ## 🎉 Latest Updates
 - **Version 0.2.19 Released!**
@@ -30,6 +30,13 @@ pipeline_tag: text-classification
 ## 🚀 Quick Start
+## Table of Contents
+* 1. [Normalized Exact Match](#em)
+* 2. [Token F1 Score](#f1)
+* 3. [PEDANTS](#pedants)
+* 4. [Finetuned Neural Matching](#neural)
+* 5. [Prompting LLM](#llm)
 ### Prerequisites
 - Python >= 3.6
 - openai >= 1.0
@@ -51,9 +58,11 @@ Our package offers six QA evaluation methods with varying strengths:
 | [Open Source LLM Evaluation](https://huggingface.co/zli12321/prometheus2-2B) | All QA types | Free | High |
 | Black-box LLM Evaluation | All QA types | Paid | Highest |
 ## 📖 Documentation
-### 1. Normalized Exact Match
+### 1. <a name='em'></a>Normalized Exact Match
 #### Method: `em_match`
 **Parameters**
@@ -71,7 +80,7 @@ candidate_answer = "The movie \"The Princess and the Frog\" is loosely based off
 match_result = em_match(reference_answer, candidate_answer)
 ```
-### 2. F1 Score
+### 2. <a name='f1'></a>F1 Score
 #### Method: `f1_score_with_precision_recall`
 **Parameters**
@@ -97,7 +106,7 @@ f1_stats = f1_score_with_precision_recall(reference_answer[0], candidate_answer)
 match_result = f1_match(reference_answer, candidate_answer, threshold=0.5)
 ```
-### 3. PEDANTS
+### 3. <a name='pedants'></a>PEDANTS
 #### Method: `get_score`
 **Parameters**
@@ -160,7 +169,7 @@ scores = pedant.get_scores(reference_answer, candidate_answer, question)
 match_result = pedant.evaluate(reference_answer, candidate_answer, question)
 ```
-### 4. Transformer Neural Evaluation
+### 4. <a name='neural'></a>Transformer Neural Evaluation
 #### Method: `get_score`
 **Parameters**
@@ -206,7 +215,7 @@ tm = TransformerMatcher("zli12321/answer_equivalence_tiny_bert")
 match_result = tm.transformer_match(reference_answer, candidate_answer, question)
 ```
-### 5. LLM Integration
+### 5. <a name='llm'></a>LLM Integration
 #### Method: `prompt_gpt`
 **Parameters**
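
For readers skimming this diff, the first two methods it touches (normalized exact match and token-level F1 with a threshold) can be illustrated in plain Python. The following is a minimal sketch under stated assumptions, not the `qa-metrics` implementation: the helper names (`normalize_text`, `exact_match_sketch`, `token_f1_sketch`, `f1_match_sketch`) are hypothetical, and the normalization steps (lowercasing, stripping punctuation and English articles) follow a common SQuAD-style convention that the package may or may not share.

```python
import re
import string
from collections import Counter


def normalize_text(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; qa-metrics may normalize differently.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_sketch(references, candidate):
    """Return True if the normalized candidate equals any normalized reference."""
    cand = normalize_text(candidate)
    return any(normalize_text(ref) == cand for ref in references)


def token_f1_sketch(reference, candidate):
    """Return precision, recall, and F1 over normalized whitespace tokens."""
    ref_tokens = normalize_text(reference).split()
    cand_tokens = normalize_text(candidate).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


def f1_match_sketch(references, candidate, threshold=0.5):
    """Return True if the candidate's F1 against any reference exceeds threshold."""
    return any(
        token_f1_sketch(ref, candidate)["f1"] > threshold for ref in references
    )


references = ["The Frog Prince", "The Princess and the Frog"]
print(exact_match_sketch(references, "the frog prince!"))
print(token_f1_sketch(references[0], "a frog"))
print(f1_match_sketch(references, "a frog", threshold=0.5))
```

Whether the real `f1_match` compares with `>` or `>=` at the threshold is not shown in this diff, so the strict comparison here is only one plausible choice.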