Measuring schema compliance using Structured Output Percentage Stats
- Given a Pydantic class and a set of input prompts:
  - How many responses were pure JSON?
  - How many contained valid JSON?
  - Claude (which did not support native structured outputs at the time of writing) often produces high-quality responses that contain valid JSON but are not pure JSON, for example JSON wrapped in explanatory text
  - How many of the responses containing valid JSON also matched the schema (i.e. validated against the Pydantic class)?
- Structured Output Percentage Stats (see image)
- Parse the JSON into a dataframe to verify each check:
  - is pure JSON
  - contains valid JSON
  - JSON is valid schema
- Save the results to a CSV file from inside Marimo
- Marimo Demo
What the stats look like inside Marimo
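The checks above can be sketched in Python. This is a minimal sketch assuming Pydantic v2 (`model_validate`) and pandas. The `Person` model, the helper functions, the column names, the sample responses, and the CSV filename are all hypothetical. "Contains valid JSON" is approximated here by decoding the first JSON object found anywhere in the text, which is one simple heuristic among several.

```python
import json

import pandas as pd
from pydantic import BaseModel, ValidationError


# Hypothetical schema for illustration; substitute your own Pydantic class.
class Person(BaseModel):
    name: str
    age: int


def extract_first_json_object(text: str):
    """Return the first decodable JSON object embedded in `text`, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char == "{":
            try:
                obj, _end = decoder.raw_decode(text, start)
                return obj
            except json.JSONDecodeError:
                continue  # not a valid object at this position; keep scanning
    return None


def classify_response(text: str, schema: type[BaseModel]) -> dict:
    """Apply the three checks to a single model response."""
    try:
        # Pure JSON: the whole response (ignoring surrounding whitespace) parses.
        obj = json.loads(text)
        is_pure = True
    except json.JSONDecodeError:
        # Otherwise look for a JSON object embedded in surrounding prose.
        obj = extract_first_json_object(text)
        is_pure = False
    contains_json = is_pure or obj is not None

    schema_ok = False
    if contains_json:
        try:
            schema.model_validate(obj)  # Pydantic v2 validation
            schema_ok = True
        except ValidationError:
            pass
    return {
        "is_pure_json": is_pure,
        "contains_valid_json": contains_json,
        "json_is_valid_schema": schema_ok,
    }


def structured_output_stats(responses: list[str], schema: type[BaseModel]):
    """Return a per-response DataFrame of checks and a Series of percentages."""
    df = pd.DataFrame([classify_response(r, schema) for r in responses])
    # The mean of a boolean column is the fraction of True values.
    stats = df.mean() * 100
    # Schema compliance among only the responses that contained valid JSON.
    has_json = df["contains_valid_json"]
    stats["schema_valid_given_json"] = (
        df.loc[has_json, "json_is_valid_schema"].mean() * 100 if has_json.any() else 0.0
    )
    return df, stats.round(1)


responses = [
    '{"name": "Ada", "age": 36}',  # pure JSON that matches the schema
    'Sure! Here is the JSON: {"name": "Alan", "age": 41}. Anything else?',  # embedded JSON
    '{"name": "Grace"}',  # pure JSON, but "age" is missing
    "Sorry, I can't help with that.",  # no JSON at all
]
df, stats = structured_output_stats(responses, Person)
print(stats)
df.to_csv("structured_output_stats.csv", index=False)  # hypothetical filename
```

Reporting schema compliance both over all responses and over only the JSON-containing responses separates "did the model emit JSON at all" from "did the JSON it emitted follow the schema", which is the distinction that matters for models like Claude. In a Marimo notebook, ending a cell with `df` or `stats` displays the result.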
