
GPT-5 vs Gemini Pro 2.5 for Pydantic structured output using OpenRouter

I sent 100 requests each to GPT-5 and Gemini Pro 2.5 (via OpenRouter) to extract information from clinical narratives into a very complex Pydantic schema.

I used the following metrics to measure how well the models adhered to the Pydantic schema:

is_pure_json: the entire response is valid JSON, with no surrounding prose or markdown fences

contains_valid_json: valid JSON appears somewhere in the response, even if wrapped in other text

is_valid_schema: the extracted JSON validates against the Pydantic schema
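
A simplified sketch of how these checks can be implemented (the brace-matching heuristic for extraction is illustrative, not necessarily the exact one I used):

```python
import json
from pydantic import BaseModel, ValidationError

def is_pure_json(text: str) -> bool:
    """The entire response parses as JSON, with nothing else around it."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

def extract_json(text: str) -> str | None:
    """Return the outermost {...} span if it parses as JSON, else None."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    candidate = text[start:end + 1]
    try:
        json.loads(candidate)
        return candidate
    except json.JSONDecodeError:
        return None

def contains_valid_json(text: str) -> bool:
    """Valid JSON appears somewhere in the response, even amid other text."""
    return extract_json(text) is not None

def is_valid_schema(text: str, schema: type[BaseModel]) -> bool:
    """The extracted JSON validates against the Pydantic schema."""
    candidate = extract_json(text)
    if candidate is None:
        return False
    try:
        schema.model_validate_json(candidate)
        return True
    except ValidationError:
        return False
```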

Sometimes an LLM will not even produce valid JSON as a substring of the response; in those cases I retried that particular clinical narrative until I eventually got contains_valid_json = True for all 100 responses.
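
A minimal version of that retry loop, assuming OpenRouter's OpenAI-compatible endpoint (PROMPT_TEMPLATE below is a stand-in for the actual extraction prompt):

```python
import os
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible API, so the standard SDK works.
client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

# Stand-in prompt; the real one spells out the schema and instructions.
PROMPT_TEMPLATE = "Extract the following clinical narrative into JSON:\n\n{narrative}"

def extract_with_retries(model: str, narrative: str,
                         max_attempts: int = 10) -> tuple[str, int]:
    """Re-ask the model until the response contains valid JSON.

    Returns (response_text, number_of_retries)."""
    for attempt in range(max_attempts):
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user",
                       "content": PROMPT_TEMPLATE.format(narrative=narrative)}],
        )
        text = response.choices[0].message.content or ""
        if contains_valid_json(text):  # from the metric sketch above
            return text, attempt
        # otherwise, retry the same narrative
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts")
```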

These are the results:

| Metric | GPT-5 | Gemini Pro 2.5 |
|---|---|---|
| Number of retries | 24 | 4 |
| is_pure_json | 100 | 0 |
| contains_valid_json | 100 | 100 |
| is_valid_schema | 99 | 28 |

I discuss the workflow I used to measure all these values in my Prompt Engineering for Structured Outputs course.

Which one should you use?

There are actually a few more tradeoffs to consider.

GPT-5 needs more retries

GPT-5 needed a total of 24 retries, compared to just 4 for Gemini Pro 2.5 (on OpenRouter), to get valid JSON. My guess is that GPT-5 works hard behind the scenes to produce output that perfectly matches the schema, which becomes quite difficult for a sufficiently complex schema.

And you can of course make your Pydantic schema as complex as you want, bounded only by what you (and your future self) can still understand.
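
To make "complex" concrete, here is a toy schema (nothing like the actual one from this experiment, which is far bigger): even two levels of nesting plus constrained fields is enough to push models into near-miss JSON.

```python
from typing import Literal
from pydantic import BaseModel, Field

# Toy illustration only: nesting plus constrained fields is the kind of
# thing that makes strict schema adherence hard for an LLM.
class Medication(BaseModel):
    name: str
    dose_mg: float = Field(gt=0)
    route: Literal["oral", "iv", "topical"]

class Diagnosis(BaseModel):
    icd10_code: str = Field(pattern=r"^[A-Z]\d{2}(\.\d{1,2})?$")
    explanation: str  # free-text justification for the code

class ClinicalExtraction(BaseModel):
    patient_age: int | None = Field(default=None, ge=0, le=130)
    diagnoses: list[Diagnosis]
    medications: list[Medication]
```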

This is why I think these LLMs still have a long way to go before they become truly impressive (in my book).

Gemini Pro 2.5 is faster

The first thing to note is that Gemini Pro 2.5 is quite a bit faster than GPT-5 when it comes to extracting structured data.

Some stats for the 100 requests:

| Metric | GPT-5 | Gemini Pro 2.5 |
|---|---|---|
| Median Elapsed Time (s) | 288 | 79 |
| Mean Elapsed Time (s) | 284 | 83 |
| Minimum Elapsed Time (s) | 165 | 55 |
| Maximum Elapsed Time (s) | 419 | 141 |

This means that if you are very concerned about response latency for structured outputs, Gemini Pro 2.5 is the better choice.
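
For reference, the timings above are simple wall-clock measurements around each request; a sketch, reusing `client` and `PROMPT_TEMPLATE` from the retry example (retries not included here):

```python
import statistics
import time

def latency_stats(model: str, narratives: list[str]) -> dict[str, float]:
    """Wall-clock seconds per request, one call per narrative."""
    elapsed: list[float] = []
    for narrative in narratives:
        start = time.monotonic()
        client.chat.completions.create(
            model=model,
            messages=[{"role": "user",
                       "content": PROMPT_TEMPLATE.format(narrative=narrative)}],
        )
        elapsed.append(time.monotonic() - start)
    return {
        "median": statistics.median(elapsed),
        "mean": statistics.mean(elapsed),
        "min": min(elapsed),
        "max": max(elapsed),
    }
```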

Gemini Pro 2.5 is less verbose

Although you do want the LLM response to be clear (especially the Explanation field), you do not want the model to be too wasteful in terms of output tokens.

| Metric | Gemini Pro 2.5 | GPT-5 |
|---|---|---|
| Completion Tokens Median | 9229 | 13928 |
| Completion Tokens Mean | 9800 | 13626 |
| Completion Tokens Min | 5882 | 7382 |
| Completion Tokens Max | 17017 | 18078 |
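
If you want to collect these numbers yourself, the token counts come from the usage block of the OpenAI-compatible responses that OpenRouter returns; a minimal sketch (model IDs as listed on OpenRouter):

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

response = client.chat.completions.create(
    model="google/gemini-2.5-pro",  # swap in "openai/gpt-5" to compare
    messages=[{"role": "user", "content": "..."}],  # your extraction prompt here
)
print(response.usage.completion_tokens)  # output tokens for this response
```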
