Measuring schema compliance using Structured Output Percentage Stats

Given a Pydantic class and a set of input prompts, measure how the model responses break down:
- How many were pure JSON?
- How many contained valid JSON?
  - Claude (which does not support structured outputs) often produces high-quality responses that contain valid JSON but are not pure JSON
- How many of the valid JSON responses also matched the schema (adhered to the Pydantic class)?
- Structured Output Percentage Stats (see image)
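The three checks above can be sketched roughly as follows. This is a minimal illustration, not the implementation used in the demo: the `Person` model, the `extract_json` helper, and the `classify` function are hypothetical names, and "pure JSON" is assumed to mean that the whole response parses with `json.loads`.

```python
import json

from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    # Hypothetical schema, used only for illustration.
    name: str
    age: int


def extract_json(text):
    """Return the first JSON object or array embedded in text, or None."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch in "{[":
            try:
                value, _end = decoder.raw_decode(text, i)
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON from this position; keep scanning
    return None


def classify(text, model):
    """Classify one response against the three checks."""
    try:
        parsed = json.loads(text)  # the whole response is JSON
        is_pure = True
    except json.JSONDecodeError:
        parsed = extract_json(text)  # JSON wrapped in prose or code fences
        is_pure = False

    contains_valid = is_pure or parsed is not None

    valid_schema = False
    if contains_valid and isinstance(parsed, dict):
        try:
            model(**parsed)  # raises ValidationError if the schema is violated
            valid_schema = True
        except ValidationError:
            pass

    return {
        "is_pure_json": is_pure,
        "contains_valid_json": contains_valid,
        "json_is_valid_schema": valid_schema,
    }


wrapped = 'Here is the result:\n```json\n{"name": "Ada", "age": 36}\n```'
print(classify(wrapped, Person))
```

A response wrapped in prose or a code fence, as in the `wrapped` example, counts as containing valid JSON without being pure JSON.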
Parse the JSON results into a dataframe to verify, for each response:
- is pure json
- contains valid json
- json is valid schema
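One way to turn these three flags into percentage stats is sketched below, assuming pandas as the dataframe library. The column names are derived from the list above and the four rows are made-up illustrative values, not measured results.

```python
import pandas as pd

# Illustrative flags only; real rows would come from classifying model responses.
results = pd.DataFrame(
    {
        "is_pure_json": [True, False, False, True],
        "contains_valid_json": [True, True, False, True],
        "json_is_valid_schema": [True, True, False, False],
    }
)

# The mean of a boolean column is the fraction of True values.
valid_json_rows = results[results["contains_valid_json"]]
stats = {
    "pure_json_pct": float(100 * results["is_pure_json"].mean()),
    "contains_valid_json_pct": float(100 * results["contains_valid_json"].mean()),
    # Schema compliance is measured only over responses that had valid JSON.
    "valid_schema_pct_of_valid_json": float(
        100 * valid_json_rows["json_is_valid_schema"].mean()
    ),
}

for name, value in stats.items():
    print(f"{name}: {value:.1f}%")
```

If no response contains valid JSON, the last value is NaN because the mean of an empty column is undefined, so that case may need separate handling.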
Save results to CSV file from inside Marimo
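A minimal sketch of the export step, assuming the results are held in a pandas dataframe and the code runs in an ordinary notebook cell. The file name and column names are placeholders, and no Marimo-specific API is used here.

```python
from pathlib import Path

import pandas as pd

# Placeholder results; in practice this would be the dataframe of per-response flags.
results = pd.DataFrame(
    {
        "is_pure_json": [True, False],
        "contains_valid_json": [True, True],
        "json_is_valid_schema": [True, False],
    }
)

# Hypothetical output path; index=False keeps the row index out of the file.
out_path = Path("structured_output_results.csv")
results.to_csv(out_path, index=False)
```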
Marimo Demo

What the stats look like inside Marimo