How to extract AGE from a VAERS report using Google Gemini

We want to extract the patient's age from the free-text writeup (the SYMPTOM_TEXT field) of a VAERS report. The result must be an integer number of years. Since the writeup states the age unambiguously, we expect this to be an easy task for the current generation of LLMs. Here is a sample writeup:

Feeling cold; Chills and shivering; high sweating; This is a spontaneous report from a contactable physician, received from the Regulatory Authority, regulatory authority report number of v20100815 and v20100816, and also received information via System. A 47-year-old female received first dose of bnt162b2 (COMIRNATY, lot number: EP2163, expiration date: 31May2021) via intramuscular at a single dose on 19Feb2021 at 14:05 at left arm for Covid-19 immunisation. Medical history included the histopathological diagnosis of neurofibromatosis, type 1 (von Recklinghausen's disease) from an unknown date, and the past history of excision of neurofibromatosis, no history of allergy, no relevant family history. The patient's concomitant medications were not reported. Body temperature before the vaccination was 35.9 degrees Centigrade on 19Feb2021. On 19Feb2021 at 14:30 (the day of vaccinations), the patient experienced feeling cold, chills and shivering. Patient also had high sweating on unspecified date in Feb2021. The clinical course was provided as follows: On 19Feb2021, the patient was asymptomatic 15 minutes after the inoculation, but suddenly feeling cold, chills and shivering appeared 30 minutes after just before going back to work. There was no decrease in blood pressure, impaired consciousness, nor bradycardia. Due to the high degree of coldness, the saline infusion was started, and heating with an electric blanket was started, and then the symptoms improved shortly. The patient was hospitalized for observation purposes. On 20Feb2021 (1 day after the vaccination), she was discharged because her symptoms disappeared without any significant changes in vital signs. The outcome of all events was recovered on 20Feb2021. The patient was not pregnant at the time of vaccination. Prior to vaccination, the patient was not diagnosed with COVID-19. Since the vaccination, the patient had not been tested for COVID-19.
The physician classified seriousness criteria for the events as serious (seriousness criterion: hospitalization) and assessed the causality was unable to determine for bnt162b2, unknown for Recklinghausen's disease. The patient was hospitalized for the events from 19Feb2021 to 20Feb2021. Reporter comment: Although it was not a typical anaphylactic symptom, it was reported because of generalized cold sensation and high sweating. Upon the blood collection, there were no abnormal findings such as hypoglycemia. The causal relationship with von Recklinghausen's disease was also unknown.; Reporter's Comments: Although it was not a typical anaphylactic symptom, it was reported because of generalized cold sensation and high sweating. Upon the blood collection, there were no abnormal findings such as hypoglycemia. The causal relationship with von Recklinghausen's disease was also unknown.
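
For this sample, the target output would presumably look like the record sketched below. The field names age and age_explanation match the schema used in the extraction code in this post. The citation sentence is copied from the report above. This is an illustration of the intended result, not a recorded model response, and the model may well cite a different sentence.

```python
import json

# Sentence copied from the sample report above; it is the one that states the age.
citation = (
    "A 47-year-old female received first dose of bnt162b2 (COMIRNATY, "
    "lot number: EP2163, expiration date: 31May2021) via intramuscular at a "
    "single dose on 19Feb2021 at 14:05 at left arm for Covid-19 immunisation."
)

# Intended shape of the structured output (illustrative, not an actual response).
expected = {"age": 47, "age_explanation": citation}

print(json.dumps(expected, indent=2))
```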

This is the Python code for extracting the age:

import json
import os
import time

import pandas as pd
from dotenv import load_dotenv
from google import genai
from google.genai import types
from pydantic import BaseModel, Field

# Read GEMINI_API_KEY from a local .env file (or the environment)
load_dotenv()
gemini_api_key = os.getenv('GEMINI_API_KEY')


class AgeExtraction(BaseModel):
    """Model for extracting age information from VAERS data."""
    age: int = Field(description="The age of the patient in years. Must be a positive integer.")
    age_explanation: str = Field(
        description="Verbatim sentence from the symptom_text which provides a citation for the age")


client = genai.Client(api_key=gemini_api_key)

df: pd.DataFrame = pd.read_csv('../csv/llm/japan_100.csv')
df.columns = [x.upper() for x in df.columns]

model_name = 'gemini-1.5-pro'
experiment = 'age'
full_json = {}
num_rows = 100
for index, row in df.head(num_rows).iterrows():
    symptom_text = row['SYMPTOM_TEXT']
    # Cast to a plain Python int so the ID is always a valid JSON key
    vaers_id = int(row['VAERS_ID'])
    print(f'Processing {index} = {vaers_id}')
    start_time = time.time()
    prompt = symptom_text
    response = client.models.generate_content(
        model=model_name,
        contents=prompt,
        config=types.GenerateContentConfig(
            system_instruction="You are a VAERS expert, and your goal is to read the symptom_text and provide the output in the specified schema",
            response_mime_type='application/json',
            response_schema=AgeExtraction,
        )
    )
    elapsed = time.time() - start_time
    response_json = json.loads(response.model_dump_json())
    full_json[vaers_id] = {
        "response": response_json,
        "parsed": response_json['parsed'],
        "duration": elapsed,
        "prompt": prompt
    }

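file_name = f'../json/{experiment}/{experiment}_{model_name}.json'
# Create the output directory if it does not exist yet
os.makedirs(os.path.dirname(file_name), exist_ok=True)
with open(file_name, 'w') as f:
    json.dump(full_json, f, indent=2)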
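
Once the results are saved, it may be worth checking that each entry satisfies the two constraints stated in the schema: the age is a positive integer, and age_explanation is a verbatim excerpt of the prompt. The sketch below is one possible way to do that. The helper check_entry, the placeholder ID, the shortened prompt and the duration value are all hypothetical and used only for illustration; the entry layout mirrors the values stored in full_json.

```python
def check_entry(entry):
    """Return a list of problems found in one saved result entry."""
    parsed = entry.get("parsed")
    if not parsed:
        return ["no parsed output"]

    problems = []
    age = parsed.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(age, int) or isinstance(age, bool) or age <= 0:
        problems.append(f"age is not a positive integer: {age!r}")

    citation = parsed.get("age_explanation", "")
    if not citation or citation not in entry.get("prompt", ""):
        problems.append("age_explanation is not a verbatim excerpt of the prompt")
    return problems


# Hypothetical sample shaped like full_json: placeholder ID, shortened prompt,
# made-up duration.
sample_results = {
    "placeholder_id": {
        "parsed": {
            "age": 47,
            "age_explanation": "A 47-year-old female received first dose of bnt162b2",
        },
        "duration": 1.0,
        "prompt": "Feeling cold; Chills and shivering; "
                  "A 47-year-old female received first dose of bnt162b2",
    }
}

# Keep only the entries that have at least one problem
failures = {}
for vaers_id, entry in sample_results.items():
    problems = check_entry(entry)
    if problems:
        failures[vaers_id] = problems

print(f"{len(failures)} of {len(sample_results)} entries failed the checks")
```

To run the same checks on real output, sample_results could be replaced by the dictionary loaded with json.load from the file written above.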
