Measuring LLM Accuracy for Structured Outputs

This course provides a step-by-step process for measuring the accuracy of any LLM you use to extract structured outputs from unstructured data.

Lesson Notes are the notes I use to present the videos in the course.

Introduction

Why learn about Structured Outputs

Pros and Cons of using OpenRouter

OpenRouter Response Schema vs Structured Outputs

The benchmark task

Empty values in NUMDAYS field

Calculating an upper limit for NUMDAYS

Why the empty NUMDAYS value is a good test case

Calculating the accuracy

Send 100 requests to an LLM using OpenRouter

Measuring schema compliance using Structured Output Percentage Stats

Run the same experiment using four LLMs

Consolidate multiple results into a single CSV file

Use DataBlist to generate the gold dataset

Use the gold dataset to calculate accuracy for all LLMs

Calculating the accuracy for complex Pydantic schemas

Send 100 requests to OpenRouter

Compare accuracy for each field

Gemini 2.5 Pro vs GPT-5 (Full) JSON metrics
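
To give a rough idea of what "calculating the accuracy" against a gold dataset can look like, here is a minimal sketch in Python. It is not the course's own code. The function name field_accuracy, the SYMPTOM field, and the sample rows are hypothetical and made up for illustration. Only the NUMDAYS field name comes from the lesson list above, and representing an empty NUMDAYS as None is an assumption.

```python
from typing import Any


def field_accuracy(
    predictions: list[dict[str, Any]],
    gold: list[dict[str, Any]],
) -> dict[str, float]:
    """Return, for each gold field, the fraction of rows where the
    predicted value matches the gold value exactly."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same number of rows")
    if not gold:
        return {}

    scores: dict[str, float] = {}
    for field in gold[0]:
        # A field missing from the prediction counts as wrong, even when
        # the gold value is None (an empty value is not a missing key).
        correct = sum(
            1
            for pred, ref in zip(predictions, gold)
            if field in pred and pred[field] == ref[field]
        )
        scores[field] = correct / len(gold)
    return scores


# Hypothetical rows: the second gold row has an empty NUMDAYS value.
gold = [
    {"NUMDAYS": 3, "SYMPTOM": "fever"},
    {"NUMDAYS": None, "SYMPTOM": "rash"},
]
predictions = [
    {"NUMDAYS": 3, "SYMPTOM": "fever"},
    {"NUMDAYS": 0, "SYMPTOM": "rash"},  # model filled in 0 instead of leaving it empty
]

scores = field_accuracy(predictions, gold)
print(scores)
```

Scoring each field separately, rather than the whole record at once, makes it easier to see which fields a model gets wrong, for example when it invents a value where the gold dataset has none.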