
Calculate the accuracy of all the LLMs using the gold dataset

The gold dataset is a copy of the majority vote CSV file that includes all of your corrections.

Once you have it, you can use it to calculate the accuracy of each LLM on the given task.

import json
import pandas as pd

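# Load the gold dataset, which holds the corrected value for each vaers_id
df: pd.DataFrame = pd.read_csv('../csv/gold/onset_date_gold.csv')
experiment = 'onset_date'
key_name = 'onset_date'
# Map each vaers_id (as a string, to match the JSON keys) to its correct value
gold_values = {}
for index, row in df.iterrows():
    vaers_id = str(row['vaers_id'])
    correct_val = str(row['val_correct'])
    gold_values[vaers_id] = correct_val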

model_list = [
    'gemini-1.5-flash',
    'gemini-1.5-pro',
    'gemini-2.0-flash-lite',
    'gemini-2.5-flash-preview-05-20'
]

data = []
for model_name in model_list:
    file_name = f'../json/{experiment}/{experiment}_{model_name}.json'
    with open(file_name, 'r') as f:
        file_obj = json.load(f)
    num_correct = 0
    for key, value in gold_values.items():
        curr_obj = file_obj[key]
        curr_val = curr_obj['parsed'][key_name]
        if value == curr_val:
            num_correct += 1
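    # Accuracy is the fraction of gold records the model matched exactly
    accuracy = num_correct / len(gold_values) if gold_values else 0.0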
    data.append({
        "model_name": model_name,
        "accuracy": accuracy
    })

df1: pd.DataFrame = pd.DataFrame(data)
df1.to_csv(f'../csv/accuracy/{experiment}_accuracy.csv', index=False)
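
To sanity-check the comparison logic without touching any files, here is a minimal sketch that runs the same exact-match check on in-memory data. It expresses accuracy as the fraction of gold records a model got right. The function name `compute_accuracy` and the toy IDs and dates are hypothetical, and unlike the script above it assumes that a record the model skipped should count as wrong rather than raise a `KeyError`.

```python
from typing import Dict


def compute_accuracy(gold: Dict[str, str], predictions: Dict[str, dict], key_name: str) -> float:
    """Return the fraction of gold records whose predicted value matches exactly."""
    if not gold:
        return 0.0
    num_correct = 0
    for record_id, gold_value in gold.items():
        # Assumption: a missing record or missing field counts as incorrect.
        parsed = predictions.get(record_id, {}).get('parsed') or {}
        if str(parsed.get(key_name)) == gold_value:
            num_correct += 1
    return num_correct / len(gold)


# Toy data (hypothetical IDs and dates, not taken from the real dataset).
gold = {
    '1001': '2021-01-05',
    '1002': '2021-02-10',
    '1003': '2021-03-15',
    '1004': '2021-04-20',
}
predictions = {
    '1001': {'parsed': {'onset_date': '2021-01-05'}},  # matches gold
    '1002': {'parsed': {'onset_date': '2021-02-11'}},  # wrong date
    '1003': {'parsed': {'onset_date': '2021-03-15'}},  # matches gold
    # '1004' is missing from the model output
}

accuracy = compute_accuracy(gold, predictions, 'onset_date')
print(f'{accuracy:.2f}')
```

Two of the four toy records match, so the sketch reports an accuracy of 0.50. Whether a missing record should be penalized or excluded from the denominator is a design choice that depends on your task.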

Here are the results for the ONSET_DATE data extraction task.
