How to extract the number of vaccine doses from a VAERS report using Google Gemini

The task is to extract the number of COVID-19 vaccine doses a patient has taken from the free-text SYMPTOM_TEXT field of a VAERS report. Here is a sample report:

Feeling cold; Chills and shivering; high sweating; This is a spontaneous report from a contactable physician, received from the Regulatory Authority, regulatory authority report number of v20100815 and v20100816, and also received information via System. A 47-year-old female received first dose of bnt162b2 (COMIRNATY, lot number: EP2163, expiration date: 31May2021) via intramuscular at a single dose on 19Feb2021 at 14:05 at left arm for Covid-19 immunisation. Medical history included the histopathological diagnosis of neurofibromatosis, type 1 (von Recklinghausen's disease) from an unknown date, and the past history of excision of neurofibromatosis, no history of allergy, no relevant family history. The patient's concomitant medications were not reported. Body temperature before the vaccination was 35.9 degrees Centigrade on 19Feb2021. On 19Feb2021 at 14:30 (the day of vaccinations), the patient experienced feeling cold, chills and shivering. Patient also had high sweating on unspecified date in Feb2021. The clinical course was provided as follows: On 19Feb2021, the patient was asymptomatic 15 minutes after the inoculation, but suddenly feeling cold, chills and shivering appeared 30 minutes after just before going back to work. There was no decrease in blood pressure, impaired consciousness, nor bradycardia. Due to the high degree of coldness, the saline infusion was started, and heating with an electric blanket was started, and then the symptoms improved shortly. The patient was hospitalized for observation purposes. On 20Feb2021 (1 day after the vaccination), she was discharged because her symptoms disappeared without any significant changes in vital signs. The outcome of all events was recovered on 20Feb2021. The patient was not pregnant at the time of vaccination. Prior to vaccination, the patient was not diagnosed with COVID-19. Since the vaccination, the patient had not been tested for COVID-19. 
The physician classified seriousness criteria for the events as serious (seriousness criterion: hospitalization) and assessed the causality was unable to determine for bnt162b2, unknown for Recklinghausen's disease. The patient was hospitalized for the events from 19Feb2021 to 20Feb2021.  Reporter comment: Although it was not a typical anaphylactic symptom, it was reported because of generalized cold sensation and high sweating. Upon the blood collection, there were no abnormal findings such as hypoglycemia. The causal relationship with von Recklinghausen's disease was also unknown.; Reporter's Comments: Although it was not a typical anaphylactic symptom, it was reported because of generalized cold sensation and high sweating. Upon the blood collection, there were no abnormal findings such as hypoglycemia. The causal relationship with von Recklinghausen's disease was also unknown.
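
For the sample report above, the phrase "received first dose of bnt162b2" suggests the answer is 1. As a rough illustration, the structured output we are aiming for might look like the snippet below. The field names follow the schema in the Python code further down; the values are filled in by hand from reading the report and are not actual model output.

```python
import json

# Hand-written illustration of the target output for the sample report.
# These values come from reading the report, not from a Gemini response.
expected = {
    "num_vax_doses": 1,
    "num_vax_doses_explanation": (
        "A 47-year-old female received first dose of bnt162b2"
    ),
}

print(json.dumps(expected, indent=2))
```

When the report does not state how many doses were given, the schema asks for -1 instead of a guess.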

The Python code below sends the SYMPTOM_TEXT of each report to Gemini and saves the structured responses to a JSON file:

import json
import os
import time

import pandas as pd
from dotenv import load_dotenv
from google import genai
from google.genai import types
from pydantic import BaseModel, Field

load_dotenv()
gemini_api_key = os.getenv('GEMINI_API_KEY')

class VaccineInfo(BaseModel):
    """Model for extracting vaccine dose information from VAERS data."""
    num_vax_doses: int = Field(description="Number of COVID19 vaccine doses that the patient took. Use -1 if the information is not provided")
    num_vax_doses_explanation: str = Field(
        description="Either an explanation for the inference, or a verbatim sentence from the symptom_text which provides a citation")


client = genai.Client(api_key=gemini_api_key)

df: pd.DataFrame = pd.read_csv('../csv/llm/japan_100.csv')
# Upper-case the column names so SYMPTOM_TEXT and VAERS_ID can be looked up below
df.columns = [x.upper() for x in df.columns]

model_name = 'gemini-2.0-flash'
experiment = 'num_vax_doses'
file_name = f'../json/{experiment}/{experiment}_{model_name}.json'
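# Make sure the output folder exists before anything is written to it
os.makedirs(os.path.dirname(file_name), exist_ok=True)

# Load results from a previous run, if any, so processed reports can be skipped
curr_json = {}
try:
    with open(file_name, 'r') as f:
        curr_json = json.load(f)
except FileNotFoundError:
    print(f'No previous results at {file_name}, starting fresh')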

processed_ids = curr_json.keys()
print(processed_ids)
num_rows = 100
for index, row in df.head(num_rows).iterrows():
    symptom_text = row['SYMPTOM_TEXT']
    vaers_id = str(row['VAERS_ID'])
    if vaers_id in processed_ids:
        print(f'Skipping {vaers_id}')
        continue
    print(f'Processing {index} = {vaers_id}')
    start_time = time.time()
    prompt = symptom_text
    response = client.models.generate_content(
        model=model_name,
        contents=prompt,
        config=types.GenerateContentConfig(
            system_instruction="You are a VAERS expert, and your goal is to read the symptom_text and provide the output in the specified schema",
            response_mime_type='application/json',
            response_schema=VaccineInfo,
        )
    )
    elapsed = time.time() - start_time
    response_json = json.loads(response.model_dump_json())
    curr_json[vaers_id] = {
        "response": response_json,
        "parsed": response_json['parsed'],
        "duration": elapsed,
        "prompt": prompt
    }
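    # Save after every report so an interrupted run can resume where it stopped
    with open(file_name, 'w') as f:
        json.dump(curr_json, f, indent=2)

# Final save (also covers the case where every report was skipped)
with open(file_name, 'w') as f:
    json.dump(curr_json, f, indent=2)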
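
Once the run finishes, the saved JSON can be inspected offline. The following is a minimal sketch, assuming the file has the shape written by the script above (one entry per VAERS_ID with "parsed" and "prompt" keys). The helper names `summarize_doses` and `is_verbatim` and the sample IDs are made up for illustration. A missing "parsed" value is handled defensively, on the assumption that it can be empty when a response does not match the schema.

```python
from collections import Counter


def summarize_doses(results: dict) -> Counter:
    """Count how many reports were assigned each dose number."""
    counts = Counter()
    for entry in results.values():
        parsed = entry.get("parsed")
        if parsed is None:
            # Assumption: "parsed" may be null if the response did not fit the schema
            counts["unparsed"] += 1
        else:
            counts[parsed["num_vax_doses"]] += 1
    return counts


def is_verbatim(entry: dict) -> bool:
    """True if the explanation appears word for word in the prompt text."""
    parsed = entry.get("parsed") or {}
    explanation = parsed.get("num_vax_doses_explanation", "")
    return bool(explanation) and explanation in entry.get("prompt", "")


# Tiny hand-made sample in the same shape as the saved file (hypothetical IDs)
sample = {
    "1000001": {
        "parsed": {
            "num_vax_doses": 1,
            "num_vax_doses_explanation": "received first dose of bnt162b2",
        },
        "prompt": "A 47-year-old female received first dose of bnt162b2 on 19Feb2021.",
    },
    "1000002": {
        "parsed": {
            "num_vax_doses": -1,
            "num_vax_doses_explanation": "The report does not mention a dose number.",
        },
        "prompt": "Patient experienced headache.",
    },
    "1000003": {"parsed": None, "prompt": "Patient experienced fever."},
}

print(summarize_doses(sample))
print({vaers_id: is_verbatim(entry) for vaers_id, entry in sample.items()})
```

The verbatim check is only a spot check: the schema allows the explanation to be either a citation or a free-form inference, so a False result is a prompt to read the report rather than proof of an error.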
