Gemini Pro 2.5 vs GPT 5 (Full) JSON metrics
A comparison Demo
All lesson notes associated with the LLM Accuracy Comparisons for Structured Outputs course
Use the system defined in the previous chapter for a single field.
Evaluate and compare the accuracy for each field that is comparable.
Use Citations and Explanation for the adjudication.
We use four metrics related to the JSON responses coming back from LLMs:
api_error: If response_full.json() throws an error, that is considered an api_error.
is_pure_json: If json.loads(inner_response_text) does not throw any error, then is_pure_json is true.
contains_valid_json: Sometimes you have inner_response_text which looks like this: This would be perfectly valid JSON once you remove the…
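As a rough illustration of the first three checks, here is a minimal sketch; it is not the course's own code. The names json_metrics, contains_json_object and extract_text are hypothetical, and response_full is assumed to be any object with a .json() method. Because the description of contains_valid_json is cut off, the sketch assumes, purely for illustration, that the extra material surrounds a single JSON object.

```python
import json


def contains_json_object(text):
    """Return True if the span from the first '{' to the last '}' parses as JSON.

    Assumption for illustration only: the extra material surrounds one JSON object.
    """
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return False
    try:
        json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return False
    return True


def json_metrics(response_full, extract_text):
    """Compute api_error, is_pure_json and contains_valid_json for one response.

    response_full: any object with a .json() method (for example an HTTP response).
    extract_text:  hypothetical callable that pulls inner_response_text out of the
                   parsed body; the real path depends on the provider.
    """
    metrics = {"api_error": False, "is_pure_json": False, "contains_valid_json": False}

    try:
        body = response_full.json()
    except Exception:
        # The outer API response itself could not be decoded.
        metrics["api_error"] = True
        return metrics

    inner_response_text = extract_text(body)
    if not isinstance(inner_response_text, str):
        # Nothing to parse, so both JSON flags stay False.
        return metrics

    try:
        json.loads(inner_response_text)
        metrics["is_pure_json"] = True
    except json.JSONDecodeError:
        pass

    metrics["contains_valid_json"] = contains_json_object(inner_response_text)
    return metrics
```

In this sketch a response can fail is_pure_json and still pass contains_valid_json, which matches the case described above where the text becomes valid JSON only after something is removed.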
The method used in the code to collect these metrics is called 100 times. Later in the code, I will be saving all this information into a JSON file. Make note of the different variables used in that method, because we will be revisiting them over the next few lessons.
What the stats look like inside Marimo