Calculate the accuracy of all the LLMs using the gold dataset

aravindmc

2 months ago

Once you create the gold dataset, you can do the following:

calculate the accuracy of each Gemini LLM used for the majority vote (because sometimes they differ from the majority value)
calculate the accuracy of other LLMs you wish to compare
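
To illustrate why an individual model can disagree with the majority value, here is a minimal sketch on made-up data. The report IDs, dates, and model names (`model_a`, `model_b`, `model_c`) are hypothetical, and using `Counter.most_common` is only one way to pick a majority, not necessarily how the majority vote file was built.

```python
from collections import Counter

# Hypothetical outputs from three models for two reports (made-up values).
outputs = {
    '1001': {'model_a': '2021-01-05', 'model_b': '2021-01-05', 'model_c': '2021-01-06'},
    '1002': {'model_a': '2021-02-10', 'model_b': '2021-02-10', 'model_c': '2021-02-10'},
}

majority = {}        # report ID -> most common value across models
disagreements = []   # (report ID, model name) pairs that differ from the majority
for vaers_id, by_model in outputs.items():
    winner, _count = Counter(by_model.values()).most_common(1)[0]
    majority[vaers_id] = winner
    for model_name, value in by_model.items():
        if value != winner:
            disagreements.append((vaers_id, model_name))

print(majority)
print(disagreements)
```

In this toy case `model_c` differs from the majority on one report, so its accuracy against the gold dataset may end up lower than that of the other two models.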

The gold dataset is simply a copy of the majority vote CSV file that includes all of your corrections.

Now you can use this gold dataset to calculate the accuracy of all the LLMs for the given task.

import json
import pandas as pd

df: pd.DataFrame = pd.read_csv('../csv/gold/onset_date_gold.csv')
experiment = 'onset_date'
key_name = 'onset_date'
# Map each VAERS ID to its corrected (gold) value, both stored as strings
gold_values = {}
for index, row in df.iterrows():
    vaers_id = str(row['vaers_id'])
    correct_val = str(row['val_correct'])
    gold_values[vaers_id] = correct_val

model_list = [
    'gemini-1.5-flash',
    'gemini-1.5-pro',
    'gemini-2.0-flash-lite',
    'gemini-2.5-flash-preview-05-20'
]

data = []
for model_name in model_list:
    file_name = f'../json/{experiment}/{experiment}_{model_name}.json'
    with open(file_name, 'r') as f:
        file_obj = json.load(f)
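    num_correct = 0
    for key, value in gold_values.items():
        curr_obj = file_obj[key]
        # Gold values were cast to str above, so cast the model output too
        curr_val = str(curr_obj['parsed'][key_name])
        if value == curr_val:
            num_correct += 1
    # Accuracy is the fraction of gold rows the model matched, not the raw count
    accuracy = num_correct / len(gold_values)
    data.append({
        "model_name": model_name,
        "accuracy": accuracy
    })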

df1: pd.DataFrame = pd.DataFrame(data)
df1.to_csv(f'../csv/accuracy/{experiment}_accuracy.csv', index=False)
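
The comparison at the heart of the script can also be pulled out into a small helper, which makes it easy to check on toy data before running it on the real files. This is a sketch under assumptions: the function name `accuracy`, the sample IDs, and the sample values are made up, and counting a report that is missing from the model output as incorrect is a choice made here, not something the script above does (it would raise a `KeyError` instead).

```python
def accuracy(gold: dict, predicted: dict) -> float:
    """Return the fraction of gold entries whose predicted value matches exactly."""
    if not gold:
        return 0.0
    num_correct = sum(
        1
        for key, value in gold.items()
        # A key missing from the predictions counts as a wrong answer
        if key in predicted and str(predicted[key]) == str(value)
    )
    return num_correct / len(gold)


# Hypothetical gold values and model outputs (made-up data).
gold = {
    '1001': '2021-01-05',
    '1002': '2021-02-10',
    '1003': '2021-03-15',
    '1004': 'unknown',
}
predicted = {
    '1001': '2021-01-05',  # match
    '1002': '2021-02-11',  # off by one day
    '1003': '2021-03-15',  # match
    # '1004' is missing from the model output
}

print(accuracy(gold, predicted))
```

Here two of the four gold entries match, so the helper returns 0.5.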

Here are the results for the ONSET_DATE data extraction task.