Introduction
As global food production becomes increasingly industrialized, it is growing harder to identify which products are free from artificial additives and contain clean, healthy ingredients. With FDA oversight allowing the use of processed ingredients, finding food products with clean ingredients is more difficult than ever. This project explores different food products and attempts to provide information on their nutritional value and ingredient cleanliness. Although current LLMs offer powerful capabilities for information retrieval and reasoning, they struggle with this task: they are trained on broad internet text that is filled with misinformation, which causes them to hallucinate. This project builds a food product reviewer LLM using a retrieval-augmented generation (RAG) approach. Using the large Open Food Facts dataset, the RAG pipeline creates a LangChain document for each food product and a FAISS vector store from which the LLM retrieves relevant information about each product. A small LLM (Gemma-3-270M) was used together with a MiniLM embedding model. The resulting system hallucinated less on food nutrition questions and was able to retrieve accurate values for nutrition questions. The LLM's reasoning on general food questions improved slightly and could be improved further with more fine-tuning.
Data
The data for this project is the Open Food Facts dataset, the foundational dataset that provides detailed product-level information such as brand, product name, and declared ingredients. This project uses the food split only, restricted to the columns that provide the most information about a product's nutrition facts: "additives_n", "additives_tags", "allergens_tags", "brands", "categories", "ingredients_analysis_tags", "ingredients_n", "ingredients_original_tags", "nutrient_levels_tags", "nutriments", "nutriscore_grade", "nutriscore_score", "product_name", "vitamins_tags", and "nova_groups_tags". The information in these columns was merged and formatted into a string and stored in a dataframe with a text column and a source column: the text column contains the nutrition information, and the source column contains the product name and brand name. To evaluate the RAG model, I generated five Q/A pairs on a test split of the product dataframe, covering nutrition facts, ingredients, nutrition rating, allergen tags, and brand name.
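To illustrate the formatting step, here is a rough sketch of how each product row can be merged into a text/source pair. The missing-value handling, separator, and helper names are illustrative assumptions, not necessarily what the repository code does:

```python
import pandas as pd

# Columns kept from the Open Food Facts food split
COLUMNS = [
    "additives_n", "additives_tags", "allergens_tags", "brands", "categories",
    "ingredients_analysis_tags", "ingredients_n", "ingredients_original_tags",
    "nutrient_levels_tags", "nutriments", "nutriscore_grade", "nutriscore_score",
    "product_name", "vitamins_tags", "nova_groups_tags",
]

def row_to_text(row: pd.Series) -> str:
    """Merge the selected columns into a single 'column: value' string."""
    parts = []
    for col in COLUMNS:
        value = row.get(col)
        if value is None or (isinstance(value, float) and pd.isna(value)):
            value = "N/A"
        parts.append(f"{col}: {value}")
    return " | ".join(parts)

def build_documents(df: pd.DataFrame) -> pd.DataFrame:
    """Return a dataframe with a text column (nutrition info) and a source column (product - brand)."""
    docs = pd.DataFrame()
    docs["text"] = df.apply(row_to_text, axis=1)
    docs["source"] = df["product_name"].fillna("Unknown") + " - " + df["brands"].fillna("Unknown")
    return docs
```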
Sample Q/A pair:
```json
{
  "question": "What is the sugar content in Guanabana (Soursop Pulp)?",
  "answer": "11.8100004196167g",
  "category": "nutrition_facts",
  "source": "Guanabana (Soursop Pulp) - La Fe"
}
```
Three other benchmarks were used to evaluate the RAG model: NutriBench, HellaSwag, and SQuAD v2. NutriBench consists of about 11,857 real-world meal descriptions annotated with macronutrient values (calories, protein, fat, carbs). This benchmark tests the LLM's ability to extract nutrient content from natural-language descriptions. Since my RAG model retrieves nutrient information for food products, this benchmark helps gauge whether the LLM knows what keywords to look for in the documents. HellaSwag and SQuAD v2 test the effect RAG has on the LLM's reasoning ability on general questions. Since this architecture uses RAG, testing reading-comprehension tasks is important for understanding how well the model handles vague or general nutrition prompts and whether it avoids hallucinating on them.
Methodology
For this task, I implemented a RAG pipeline using embedding-based retrieval with cosine similarity over a dense vector store. Because the dataset is so large, the pipeline uses only 45% of it; building the vector store on that fraction already takes over 13 hours. I chose this approach because, in prior homework experiments, generative models struggled with factual accuracy and hallucinated when answering domain-specific questions, and answering questions about food products is very domain specific. The primary drawback of this approach is the additional computational overhead of embedding all documents. The code includes the various embedding transformers that were tested for this task. After testing different embedding models for the RAG pipeline, I found that all-MiniLM-L6-v2 performed best, with an exact match rate of 0.6. Expect some hallucination for food products that are not in the vector store.
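A minimal sketch of this retrieve-then-generate flow is shown below. It assumes the LangChain FAISS and HuggingFaceEmbeddings wrappers plus the Hugging Face transformers pipeline; the input file name, prompt template, and k value are illustrative and may differ from rag_system.py:

```python
import pandas as pd
from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from transformers import pipeline

# text/source dataframe built as described in the Data section (hypothetical file name)
documents_df = pd.read_csv("food_documents.csv")

# One LangChain Document per food product
docs = [
    Document(page_content=row.text, metadata={"source": row.source})
    for row in documents_df.itertuples()
]

# Embed every document; normalized embeddings make FAISS's L2 ranking equivalent to cosine similarity
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    encode_kwargs={"normalize_embeddings": True},
)
vector_store = FAISS.from_documents(docs, embeddings)

# Generator: the small Gemma-3-270M model
generator = pipeline("text-generation", model="google/gemma-3-270m")

def ask(question: str, k: int = 3) -> str:
    """Retrieve the k most similar product documents and answer from that context."""
    retrieved = vector_store.similarity_search(question, k=k)
    context = "\n\n".join(doc.page_content for doc in retrieved)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    output = generator(prompt, max_new_tokens=64, return_full_text=False)
    return output[0]["generated_text"].strip()
```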
Evaluation
Results on the test split of the product dataframe:

| Model | Exact match rate | Partial match rate | Keyword match rate |
|---|---|---|---|
| Base Gemma-3-270M | 0% | 5% | 5% |
| RAG Gemma-3-270M with all-MiniLM-L6-v2 | 60% | 35% | 35% |
| RAG Gemma-3-270M with all-mpnet-base-v2 | 35% | 32.5% | 32.5% |
| RAG Gemma-3-270M with multi-qa-mpnet-base-dot-v1 | 52.5% | 32.5% | 32.5% |
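The exact, partial, and keyword match rates in the table above are string-matching heuristics over the generated answers. Below is a minimal sketch of how such metrics can be computed; the normalization and keyword rules are assumptions and may differ from the repository's evaluation code:

```python
import re

def normalize(text: str) -> str:
    """Lowercase and strip punctuation/extra whitespace before comparing."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s.]", "", text.lower())).strip()

def exact_match(prediction: str, answer: str) -> bool:
    return normalize(prediction) == normalize(answer)

def partial_match(prediction: str, answer: str) -> bool:
    """The full ground-truth answer appears somewhere in the prediction."""
    return normalize(answer) in normalize(prediction)

def keyword_match(prediction: str, answer: str) -> bool:
    """At least one ground-truth token (e.g. a number or tag) appears in the prediction."""
    return len(set(normalize(prediction).split()) & set(normalize(answer).split())) > 0

def match_rates(pairs):
    """pairs: iterable of (prediction, answer); returns the three rates as fractions."""
    pairs = list(pairs)
    n = len(pairs)
    return {
        "exact": sum(exact_match(p, a) for p, a in pairs) / n,
        "partial": sum(partial_match(p, a) for p, a in pairs) / n,
        "keyword": sum(keyword_match(p, a) for p, a in pairs) / n,
    }
```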
When experimenting with different RAG configurations, I wanted to test how different embedding models would perform in the pipeline: multi-qa-mpnet-base-dot-v1 is tuned for question-answer retrieval, all-mpnet-base-v2 is a larger general-purpose embedding model, and all-MiniLM-L6-v2 is a smaller, simpler embedding model. I tested these models specifically to see whether the size of the embedding model mattered for my task. Using multi-qa-mpnet-base-dot-v1 also helped me understand whether the Open Food Facts data should be formatted as Q/A pairs or left less structured: for that model, I created additional Q/A pairs and formatted the data as Q/A documents for retrieval. The results showed that the small embedding model did much better than the others.
| Model | NutriBench (accuracy) | HellaSwag | SQuAD v2 |
|---|---|---|---|
| Base Gemma-3-270M | carb: 0, energy: 0.4, fat: 0.4, protein: 0.2 | 44% | 0% |
| RAG Gemma-3-270M with all-MiniLM-L6-v2 | carb: 0.004, energy: 0.8, fat: 0.804, protein: 0.34 | 46% | 0% |
Based on this table, the reading-comprehension (SQuAD v2) score did not change at all with the new model, while general reasoning (HellaSwag) increased slightly and the NutriBench score improved. This is a good sign: it shows the new model can recognize nutrition keywords and find the values for nutrition facts in sentences. I initially expected the reading-comprehension score to increase with the RAG pipeline, since the LLM retrieves text documents to answer questions, so it is interesting that the score did not go up. HellaSwag was chosen to test whether the RAG model improves general reasoning; as hypothesized, with RAG and more nutrition knowledge, the HellaSwag score went up slightly. NutriBench was the ideal benchmark for this task because it tests more general nutrition prompts and whether the model understands nutrition-related words in paragraphs.
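The per-nutrient NutriBench accuracies above reflect whether the model's predicted value is close enough to the annotated macronutrient value. Below is a sketch of one way such a score can be computed; the numeric-extraction rule and the tolerance are assumptions, not NutriBench's official scoring:

```python
import re
from typing import Optional

def extract_number(answer: str) -> Optional[float]:
    """Pull the first numeric value out of the model's free-text answer."""
    match = re.search(r"[-+]?\d*\.?\d+", answer)
    return float(match.group()) if match else None

def nutrient_accuracy(predictions, targets, rel_tolerance=0.1) -> float:
    """Fraction of answers whose extracted value falls within rel_tolerance of the annotated value."""
    correct = 0
    for answer, target in zip(predictions, targets):
        value = extract_number(answer)
        if value is not None and abs(value - target) <= rel_tolerance * abs(target):
            correct += 1
    return correct / len(targets)
```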
Here is an example of a response from the base Gemma model:
Question: What is the sugar content in Guanabana (Soursop Pulp)?
Expected Answer: 11.8100004196167g
Model Answer: Sugar content in Guanabana (Soursop Pulp) is 100% sugar.
Here is an example of a response from RAG Gemma:
Question: What is the sugar content in Guanabana (Soursop Pulp)?
Expected Answer: 11.8100004196167g
RAG Answer: 11.8100004196167g
Retrieved Documents: 3
Context Preview: Question: What is the sugar content in Guanabana (Soursop Pulp)?
Answer: 11.8100004196167g
The RAG pipeline was able to retrieve the answer from the relevant documents.
Usage and Intended Uses
Installation:
pip install -r requirements.txt
Starter code:
```python
from rag_system import RAGSystem, load_csv_data

# Initialize the RAG system
rag = RAGSystem(model_name="google/gemma-3-270m")

# Load the data in the repo
documents = load_csv_data("your_data.csv")

# Create knowledge base
rag.create_knowledge_base(documents, chunk_size=512)

# Ask questions
result = rag.ask_rag("What is the sugar content in bananas?")
print(f"Answer: {result['answer']}")
```
In this repository you will find rag_system.py, which contains the implementation of the RAG Gemma-3-270M model with all-MiniLM-L6-v2 embeddings. Using the code above, you can run the model and ask it questions yourself. The checkin4 code contains all the testing done during the experimentation stage; its results are saved to JSON files. The main use case for this model is to provide nutrition facts to users looking up specific food products.
Prompt Format and Output
For the best results, phrase your prompts as questions about specific food products. For example: "What is the sugar count for Guanabana (Soursop Pulp)?", "Are there any additives in Guanabana (Soursop Pulp)?", "What is the nutriscore for La Fe Guanabana?"
Asking about a specific food product will give more accurate results, but you can also ask about foods in general, such as yogurt or red beans.
Example run:
Question: what is the sugar count for sainsbury's red kidney beans
Thinking...
Answer: The sugar count for sainsbury's red kidney beans is 1.0 g per 100g.
Retrieved 4 relevant documents:
1. Source: Red Kidney Beans - Sainsbury's
Preview: product_name: Red Kidney Beans | brands: Sainsbury's | categories: en:red-beans | additives_n: 1.0 | additives_tags: e509 | allergens_tags: N/A | ingr...
2. Source: Red Kidney Beans - Weis Quality
Preview: product_name: Red Kidney Beans | brands: Weis Quality | categories: undefined | additives_n: 0.0 | additives_tags: N/A | allergens_tags: N/A | ingredi...
3. Source: Red Dark Kidney Beans - California Garden
Preview: product_name: Red Dark Kidney Beans | brands: California Garden | categories: Plant-based foods and beverages, Plant-based foods, Legumes and their ...
4. Source: Red Kidney Beans - Luigi Vitelli
Preview: product_name: Red Kidney Beans | brands: Luigi Vitelli | categories: undefined | additives_n: 0.0 | additives_tags: N/A | allergens_tags: N/A | ingred...
Limitations
Since the dataset does not contain every food product in the world, the model may struggle to answer questions about products it has no documents for. However, to prevent hallucination, the model responds with N/A to any question for which it cannot retrieve the information; a sketch of this fallback logic appears after the example below.
Question: Does Guanabana (Soursop Pulp) contain any allergens?
Expected Answer: N/A
RAG Answer: N/A
Retrieved Documents: 3
Context Preview: Question: Does Guanabana (Soursop Pulp) contain any allergens?
Answer: N/A
Question: Does Guanabana Pulp contain any allergens?
Answer: N/A
Question: Does Fruta Guanabana Pulp contain any allergens?...
Evaluation: Skipped (ground truth was 'Not specified')
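Below is a minimal sketch of the kind of fallback logic described above. It assumes retrieval returns relevance scores and that a fixed threshold decides when to answer N/A; the threshold value and scoring are assumptions, not necessarily what rag_system.py does:

```python
def ask_with_fallback(vector_store, generate_answer, question: str,
                      k: int = 3, min_score: float = 0.3) -> str:
    """Answer from retrieved context, or return "N/A" when nothing relevant is found.

    vector_store: a LangChain FAISS store (scores in [0, 1])
    generate_answer: callable taking (question, context) and returning a string
    """
    # Returns (Document, relevance_score) pairs for the k nearest documents
    results = vector_store.similarity_search_with_relevance_scores(question, k=k)
    relevant = [doc for doc, score in results if score >= min_score]
    if not relevant:
        return "N/A"
    context = "\n\n".join(doc.page_content for doc in relevant)
    return generate_answer(question, context)
```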