How I compare the Pydantic schema adherence of multiple LLMs on OpenRouter
This is the process I use to compare the Pydantic schema adherence of multiple LLMs on OpenRouter.
I first send 100 requests to the LLM. The requests are asynchronous, with 5 in flight at a time.
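One way to cap the number of concurrent requests at 5 is an asyncio semaphore. The sketch below is illustrative only: `send_batch` and `fake_send_request` are hypothetical names, and `fake_send_request` stands in for whatever call is actually made to OpenRouter.

```python
import asyncio


async def send_batch(request_ids, send_request, concurrency=5):
    """Run send_request for every id, with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def limited(request_id):
        # Waits here while `concurrency` requests are already running.
        async with semaphore:
            return await send_request(request_id)

    # gather returns results in the same order as request_ids.
    return await asyncio.gather(*(limited(i) for i in request_ids))


async def fake_send_request(request_id):
    """Hypothetical stand-in for the real OpenRouter call."""
    await asyncio.sleep(0.001)
    return f"response {request_id}"


responses = asyncio.run(send_batch(range(100), fake_send_request))
```

Swapping `fake_send_request` for a real client call should leave the concurrency logic unchanged, since the semaphore only wraps the awaitable.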
Most of the responses contain valid JSON, but some of them don’t.
Some of the requests also throw an api_error, in which case contains_valid_json is automatically False (in other words, the response is treated as not containing valid JSON).
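The bookkeeping for a single response could look roughly like this. It is a sketch under several assumptions: Pydantic v2 (`model_validate_json`), "valid JSON" meaning that the response text both parses and validates against the schema, and a made-up `Person` schema. The `Result` dataclass and `classify` function are hypothetical names; only `contains_valid_json` and `api_error` come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    """Hypothetical target schema, used only for illustration."""
    name: str
    age: int


@dataclass
class Result:
    request_id: int
    contains_valid_json: bool = False
    api_error: Optional[str] = None


def classify(request_id, raw_text=None, api_error=None):
    """Turn one raw response (or one API error) into a Result."""
    if api_error is not None or raw_text is None:
        # An API error means there is no usable response at all.
        return Result(request_id, contains_valid_json=False, api_error=api_error)
    try:
        # Raises ValidationError for malformed JSON and for schema mismatches.
        Person.model_validate_json(raw_text)
    except ValidationError:
        return Result(request_id, contains_valid_json=False)
    return Result(request_id, contains_valid_json=True)


ok = classify(0, '{"name": "Ada", "age": 36}')
not_json = classify(1, "Sure! Here is the JSON you asked for")
wrong_shape = classify(2, '{"name": "Ada"}')
errored = classify(3, api_error="timeout")
```

Here only `ok` ends up with `contains_valid_json` set to True; the other three would all be picked up by a retry.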
I retry all requests which don’t have valid JSON – this automatically also includes those which threw an API error during the last request batch.
In every retry batch, most of the responses end up with valid JSON, so each batch is smaller than the one before.
I keep retrying until all responses have valid JSON.
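The retry loop as a whole can be sketched as follows. All names here are hypothetical. `fake_attempt` is a deterministic stand-in for "send the request and check the response", and `max_rounds` is a safety cap added for the sketch (the process described above simply continues until everything is valid).

```python
import asyncio
from collections import Counter


async def retry_until_valid(request_ids, attempt, concurrency=5, max_rounds=20):
    """Re-send every request that lacks valid JSON until none are left."""
    semaphore = asyncio.Semaphore(concurrency)

    async def limited(request_id):
        async with semaphore:
            try:
                ok = await attempt(request_id)
            except Exception:
                ok = False  # an API error counts as "no valid JSON"
            return request_id, ok

    pending = list(request_ids)
    batch_sizes = []
    while pending and len(batch_sizes) < max_rounds:
        batch_sizes.append(len(pending))
        outcomes = await asyncio.gather(*(limited(i) for i in pending))
        # Only the failures go into the next, smaller batch.
        pending = [request_id for request_id, ok in outcomes if not ok]
    return pending, batch_sizes


attempts_made = Counter()


async def fake_attempt(request_id):
    """Hypothetical stand-in: request n fails (n % 3) times, then succeeds."""
    await asyncio.sleep(0)
    attempts_made[request_id] += 1
    return attempts_made[request_id] > request_id % 3


still_failing, batch_sizes = asyncio.run(retry_until_valid(range(100), fake_attempt))
print(batch_sizes)  # [100, 66, 33]
```

With a real model the batch sizes will vary from run to run; the point is only that each round re-sends the failures from the round before, so the list shrinks until it is empty.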
