Why VAERS (the Vaccine Adverse Event Reporting System) is well suited for this course
- Data is in the public domain
- Many different data types
  - Boolean, enum, date, free text, list of strings, etc.
- Much of the information is already filled out in structured fields
  - These fields can help verify LLM outputs
- Fairly information-dense
  - This makes it a good test of LLM information extraction, as you will see later in the course
- Quite technical, yet mostly accessible
  - Contrast: the LAB_DATA field mentions many diagnostic tests that are both technical and not easily accessible
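A minimal Python sketch follows. It shows how one report can combine the data types listed above, and how pre-filled fields could serve as a check on what an LLM extracts from the free-text narrative. It rests on several assumptions about the VAERS data files: a data-file row is read as a dict of strings (e.g. with `csv.DictReader`), and it has columns `VAERS_ID`, `RECVDATE` (MM/DD/YYYY), `SEX` (F/M/U), `DIED` ("Y" or blank) and `SYMPTOM_TEXT`. Symptom terms are assumed to come from the separate symptoms file. Check these column names and formats against the files you actually download. `Report`, `parse_report`, `check_extraction`, the shape of the LLM output dict, and the example row are all hypothetical and are included only for illustration.

```python
from dataclasses import dataclass
from datetime import date, datetime
from enum import Enum


class Sex(Enum):
    # Enum field; the F/M/U codes are an assumption about the VAERS encoding.
    FEMALE = "F"
    MALE = "M"
    UNKNOWN = "U"


@dataclass
class Report:
    vaers_id: int
    recvdate: date        # date
    sex: Sex              # enum
    died: bool            # boolean
    symptom_text: str     # free text
    symptoms: list[str]   # list of strings


def parse_report(row: dict[str, str], symptom_terms: list[str]) -> Report:
    """Convert one raw CSV row (all values are strings) into typed fields."""
    return Report(
        vaers_id=int(row["VAERS_ID"]),
        recvdate=datetime.strptime(row["RECVDATE"], "%m/%d/%Y").date(),
        sex=Sex(row["SEX"]),
        died=row.get("DIED", "").strip() == "Y",  # assumed: "Y" or blank
        symptom_text=row["SYMPTOM_TEXT"],
        symptoms=[t for t in symptom_terms if t],  # drop empty symptom slots
    )


def check_extraction(report: Report, llm_output: dict) -> dict[str, bool]:
    """Compare values an LLM extracted from SYMPTOM_TEXT with the filled-out fields."""
    return {
        "died": llm_output.get("died") == report.died,
        "sex": llm_output.get("sex") == report.sex.value,
    }


# Hypothetical example row (made up, not real VAERS data).
row = {
    "VAERS_ID": "123",
    "RECVDATE": "01/15/2021",
    "SEX": "F",
    "DIED": "",
    "SYMPTOM_TEXT": "Patient reported headache and fatigue.",
}
report = parse_report(row, ["Headache", "Fatigue", ""])
print(check_extraction(report, {"died": False, "sex": "F"}))
# → {'died': True, 'sex': True}
```

A structured field agreeing with the LLM's answer does not prove the whole extraction is correct. A disagreement, however, is a cheap signal that the output deserves a closer look.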