Measuring schema compliance using Structured Output Percentage Stats

  • Given a Pydantic class and a set of input prompts, check the model's responses:
    • How many were pure JSON (the whole response parses as JSON)?
    • How many contained valid JSON?
      • Claude (which does not support structured outputs) often produces high-quality responses that contain valid JSON but are not pure JSON
    • How many of the valid-JSON responses also matched the schema (adhered to the Pydantic class)?
    • Structured Output Percentage Stats (see image)
  • Parse JSON into a dataframe to verify
    • is pure JSON
    • contains valid JSON
    • JSON is valid schema
  • Save results to CSV file from inside Marimo
  • Marimo Demo
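
The checks in the outline above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the code from the demo: the `Person` class, the helper names `extract_json` and `classify`, the sample responses, and the CSV filename are all hypothetical. "Contains valid JSON" is taken here to mean that a JSON object or array can be decoded starting at some bracket in the text.

```python
import json
from typing import Any, Optional

import pandas as pd
from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    """Hypothetical target schema; substitute your own Pydantic class."""
    name: str
    age: int


_DECODER = json.JSONDecoder()


def extract_json(text: str) -> Optional[Any]:
    """Return the first JSON object or array embedded in text, or None."""
    for start, char in enumerate(text):
        if char in "{[":
            try:
                value, _end = _DECODER.raw_decode(text, start)
            except json.JSONDecodeError:
                continue  # not valid JSON from here; try the next bracket
            return value
    return None


def classify(text: str) -> dict:
    """Compute the three flags for one model response."""
    try:
        payload = json.loads(text)  # succeeds only if the whole response is JSON
        is_pure = True
    except json.JSONDecodeError:
        payload = extract_json(text)
        is_pure = False
    contains = is_pure or payload is not None
    valid_schema = False
    if contains and isinstance(payload, dict):
        try:
            Person(**payload)  # constructor validation; works in Pydantic v1 and v2
            valid_schema = True
        except (ValidationError, TypeError):
            pass
    return {
        "is_pure_json": is_pure,
        "contains_valid_json": contains,
        "json_is_valid_schema": valid_schema,
    }


# Made-up responses for illustration only (not real model output).
responses = [
    '{"name": "Ada", "age": 36}',
    'Here is the JSON:\n{"name": "Alan", "age": 41}\nHope that helps!',
    '{"name": "Grace"}',
    "Sorry, I cannot help with that.",
]

df = pd.DataFrame([{"response": r, **classify(r)} for r in responses])
flags = ["is_pure_json", "contains_valid_json", "json_is_valid_schema"]
stats = df[flags].mean() * 100  # percentage of all responses per flag

# Share of valid-JSON responses that also match the schema.
schema_given_json = (
    df.loc[df["contains_valid_json"], "json_is_valid_schema"].mean() * 100
)

df.to_csv("structured_output_results.csv", index=False)  # hypothetical filename
print(stats.round(1))
```

The percentages in `stats` are taken over all responses, while `schema_given_json` follows the outline's wording and counts schema matches only among responses that contained valid JSON. Inside a Marimo notebook the same code could live in ordinary cells, with `df` and `stats` displayed as cell outputs.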

What the stats look like inside Marimo