Why VAERS is well suited for this course

  • Data is in the public domain
  • many different data types
    • boolean, enum, date, free text, list of strings, etc.
  • many structured fields are already filled out
    • can help with verification of LLM outputs
  • reports are fairly information-dense
    • this is a good test for LLM information extraction, as you will see later in the course
  • quite technical, but also easily accessible
    • contrast: the LAB_DATA field mentions many diagnostic tests, which makes it both technical and not easily accessible
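To make the points about mixed data types and verification concrete, here is a minimal Python sketch. It assumes the column names VAERS_ID, SEX, DIED, VAX_DATE and SYMPTOM_TEXT from the public VAERS data CSV, with SEX coded as F/M/U, DIED as "Y" or blank, and dates written as MM/DD/YYYY; check these against the official VAERS data-use guide before relying on them. The symptom list is simply passed in (it is assumed to come from a separate symptoms file). VaersRecord, parse_record and check_extraction are hypothetical helpers for illustration, not part of any VAERS tooling.

```python
from dataclasses import dataclass
from datetime import date, datetime
from enum import Enum
from typing import Optional


class Sex(Enum):
    # Enum field; the F/M/U coding is an assumption about the VAERS CSV.
    FEMALE = "F"
    MALE = "M"
    UNKNOWN = "U"


@dataclass
class VaersRecord:
    # Hypothetical typed view of one report, one field per data type.
    vaers_id: int               # integer identifier
    sex: Sex                    # enum
    died: bool                  # boolean ("Y" or blank in the CSV, assumed)
    vax_date: Optional[date]    # date (MM/DD/YYYY assumed), may be missing
    symptom_text: str           # free text
    symptoms: list[str]         # list of strings


def parse_date(value: Optional[str]) -> Optional[date]:
    """Parse an MM/DD/YYYY string; blank or missing values become None."""
    if value is None or not value.strip():
        return None
    return datetime.strptime(value.strip(), "%m/%d/%Y").date()


def parse_record(row: dict[str, str], symptoms: list[str]) -> VaersRecord:
    """Convert one raw CSV row (all strings) into typed fields."""
    return VaersRecord(
        vaers_id=int(row["VAERS_ID"]),
        sex=Sex(row["SEX"].strip() or "U"),  # treat blank as unknown
        died=row["DIED"].strip().upper() == "Y",
        vax_date=parse_date(row["VAX_DATE"]),
        symptom_text=row["SYMPTOM_TEXT"],
        symptoms=symptoms,
    )


def check_extraction(record: VaersRecord, llm_output: dict[str, object]) -> dict[str, bool]:
    """Compare values an LLM extracted from the free text with the
    already-filled structured fields. Only keys the LLM returned are checked."""
    checks: dict[str, bool] = {}
    if "died" in llm_output:
        checks["died"] = llm_output["died"] == record.died
    if "sex" in llm_output:
        checks["sex"] = llm_output["sex"] == record.sex.value
    if "vax_date" in llm_output:
        checks["vax_date"] = parse_date(llm_output["vax_date"]) == record.vax_date
    return checks


# Made-up example row for illustration (not real VAERS data).
row = {"VAERS_ID": "123", "SEX": "F", "DIED": "",
       "VAX_DATE": "01/15/2021", "SYMPTOM_TEXT": "Headache the next day."}
rec = parse_record(row, ["Headache"])
print(check_extraction(rec, {"died": False, "vax_date": "01/15/2021", "sex": "M"}))
```

Because check_extraction only scores keys the LLM actually returned, a field the model skipped is not counted as a mismatch; a real evaluation would likely track missing fields separately.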