Gemini Pro 2.5 vs GPT 5 (Full) JSON metrics
A comparison Demo
Use the system defined in the previous chapter for a single field. Evaluate and compare the accuracy for each field (which is comparable). Use Citations and Explanation for the adjudication.
We use four metrics related to the JSON responses coming back from LLMs:
api_error: If response_full.json() throws an error, that is considered an api_error.
is_pure_json: If json.loads(inner_response_text) does not throw any error, then is_pure_json is true.
contains_valid_json: Sometimes you have inner_response_text which looks like this: This would be perfectly valid JSON once you remove the…
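The two text-level checks can be sketched in a few lines of Python. This is a minimal sketch, not the course's code. The function names mirror the metric names, but the brace-slicing heuristic in contains_valid_json is an assumption about how surrounding non-JSON text might be stripped. api_error is left out because it needs a live HTTP response object.

```python
import json


def is_pure_json(inner_response_text: str) -> bool:
    """True when the whole string parses as JSON with nothing around it."""
    try:
        json.loads(inner_response_text)
        return True
    except (json.JSONDecodeError, TypeError):
        return False


def contains_valid_json(inner_response_text: str) -> bool:
    """True when some JSON object can be recovered from inside the string.

    Assumption: we keep the span from the first "{" to the last "}" and
    try to parse that. Other stripping strategies are equally plausible.
    """
    start = inner_response_text.find("{")
    end = inner_response_text.rfind("}")
    if start == -1 or end <= start:
        return False
    return is_pure_json(inner_response_text[start:end + 1])
```

A response that is pure JSON also passes contains_valid_json, so the interesting cases are those where the first check fails and the second succeeds.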
This is the method used in the code (which is called 100 times). Later in the code, I will be saving all this information into a JSON file. Make note of the different variables used in this code snippet because we will be revisiting them over the next few lessons.
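As a rough illustration of calling a method repeatedly and saving the results to a JSON file, here is a minimal sketch. The names run_and_save, call_llm, and stub_llm, as well as the record layout, are hypothetical and are not taken from the course code. The stub stands in for a real API call so the example runs offline.

```python
import json
import tempfile
from pathlib import Path


def run_and_save(call_llm, inputs, out_path):
    """Call the model once per input and write one record per call to a JSON file."""
    records = []
    for index, input_text in enumerate(inputs):
        inner_response_text = call_llm(input_text)
        try:
            json.loads(inner_response_text)
            pure = True
        except json.JSONDecodeError:
            pure = False
        records.append({
            "index": index,
            "inner_response_text": inner_response_text,
            "is_pure_json": pure,
        })
    Path(out_path).write_text(json.dumps(records, indent=2), encoding="utf-8")
    return records


def stub_llm(input_text):
    # Hypothetical stand-in for a real LLM call.
    if "good" in input_text:
        return '{"ok": true}'
    return "Sure! Here is the JSON you asked for."


out_file = Path(tempfile.mkdtemp()) / "results.json"
run_and_save(stub_llm, ["good report", "bad report"], out_file)
saved = json.loads(out_file.read_text(encoding="utf-8"))
```

With a real client in place of stub_llm, the same loop could be run over 100 inputs.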
Using LLMs to extract structured data will allow you to easily separate AI hype from genuine progress. It allows you to use a systematic process (like the one I explain in this course) to get an intuition for how well AI is able to do certain tasks. When you see how often AI can fail…
Some people refer to agents as “models using tools in a loop”. Understanding how structured data extraction works will be an important part of learning about agentic AI, since it is often the extracted structured data that is sent to the tool.
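To make the "tools in a loop" idea concrete, here is a toy sketch in which the structured data produced by the model is exactly what gets passed to the tool. Everything in it is hypothetical: the JSON shape with "tool", "args", and "final" keys, the TOOLS registry, and stub_model, which replaces a real LLM so the loop runs offline.

```python
import json


def add(a, b):
    return a + b


# Hypothetical tool registry: tool name -> Python callable.
TOOLS = {"add": add}


def run_agent(model, user_message, max_steps=5):
    """Ask the model for JSON, run the requested tool, feed the result back."""
    history = [user_message]
    for _ in range(max_steps):
        reply = json.loads(model(history))
        if "final" in reply:
            return reply["final"]
        # The extracted structured data becomes the tool's arguments.
        result = TOOLS[reply["tool"]](**reply["args"])
        history.append(f"tool result: {result}")
    raise RuntimeError("no final answer within max_steps")


def stub_model(history):
    # First turn: request a tool call. Later turns: echo the last tool result.
    if len(history) == 1:
        return '{"tool": "add", "args": {"a": 2, "b": 3}}'
    return json.dumps({"final": history[-1]})


answer = run_agent(stub_model, "What is 2 + 3?")
```

If the model's JSON is malformed or names a missing argument, the tool call fails, which is why extraction quality matters for agents.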
You can run this script for each LLM on OpenRouter and get a good idea of how things are evolving in terms of LLM reasoning. I use this approach in this course to evaluate the ability of many different LLMs to extract structured data (so you can just get this course if you don’t want…
Keep a list of inputs and a predefined schema, ask the LLM to extract the structured data, and measure how well it does. For a sufficiently complex schema (which you will use in this course) and reasonably long text, you will be surprised at how often even the best LLMs fail to produce complete…
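One simple way to "measure how well it does" is per-field accuracy against hand-labelled expected values. The sketch below assumes each extraction is a flat dict and that a failed extraction is recorded as None. The function name field_accuracy and the sample fields are hypothetical.

```python
from collections import defaultdict


def field_accuracy(expected, extracted):
    """Return, per field, the fraction of inputs where the extracted value matches.

    expected:  list of dicts with the hand-labelled values
    extracted: list of dicts (or None for a failed extraction), same order
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for exp, got in zip(expected, extracted):
        for field, value in exp.items():
            total[field] += 1
            if got is not None and got.get(field) == value:
                correct[field] += 1
    return {field: correct[field] / total[field] for field in total}


expected = [{"age": 34, "sex": "F"}, {"age": 60, "sex": "M"}]
extracted = [{"age": 34, "sex": "F"}, {"age": 61, "sex": "M"}]
scores = field_accuracy(expected, extracted)
```

Running the same inputs through two models and comparing their scores field by field gives a like-for-like comparison.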
Here is a simple code example to get started: System instruction, Input text, Pydantic Schema.
This is the Pydantic schema used in the course. As you can see, this is a very complex schema which is used to extract structured data from a VAERS report. The complexity of this schema acts as a very good test for the quality of an LLM.
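The course schema itself is not reproduced here. As a much smaller illustration of the same idea, the sketch below defines a nested Pydantic model and validates an LLM response against it. The model names, the field names, and parse_extract are all hypothetical and far simpler than a real VAERS schema. The code assumes the pydantic package is installed.

```python
import json
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class Symptom(BaseModel):
    name: str
    onset_day: Optional[int] = None


class ReportExtract(BaseModel):
    # Hypothetical fields, chosen only to show required, optional, and nested values.
    patient_age: Optional[int] = None
    vaccine_name: str
    symptoms: List[Symptom]


def parse_extract(inner_response_text: str) -> Optional[ReportExtract]:
    """Return a validated model, or None if the text is not JSON or misses the schema."""
    try:
        return ReportExtract(**json.loads(inner_response_text))
    except (json.JSONDecodeError, ValidationError, TypeError):
        return None
```

A response can be perfectly valid JSON and still fail here, for example when a required field is missing, which is part of why a complex schema is a demanding test.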
This is the system instruction class. This provides one file where you can modify your system instructions to see which instruction gives you the best results.
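A class like this might look roughly as follows. This is a hypothetical sketch, not the course's class: the name SystemInstructions, the variant names, and the instruction wording are all invented for illustration.

```python
class SystemInstructions:
    """Hypothetical single place to keep and switch between instruction variants."""

    BASELINE = (
        "Extract the fields defined in the schema from the report. "
        "Respond with JSON only."
    )
    WITH_CITATIONS = (
        BASELINE
        + " For every field, cite the sentence numbers that support the value."
    )

    @classmethod
    def get(cls, name: str) -> str:
        """Look up a variant by name so experiments can swap it with one argument."""
        variants = {
            "baseline": cls.BASELINE,
            "with_citations": cls.WITH_CITATIONS,
        }
        if name not in variants:
            raise ValueError(f"unknown instruction variant: {name}")
        return variants[name]
```

Keeping the variants together means a run can record which variant name it used alongside its results.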
This is the input class. In the get_input_text method, the input text is formatted to a) split it into sentences and b) prepend a sentence number to each sentence. This formatting matters quite a lot when you are extracting structured output, since the sentence numbers can be used to…
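The two formatting steps can be sketched as below. Only the method name get_input_text comes from the text above. The class name InputText, the "[n]" prefix format, and the naive regex sentence splitter are assumptions made for this sketch.

```python
import re


class InputText:
    """Hypothetical input wrapper that numbers sentences before prompting."""

    def __init__(self, raw_text: str):
        self.raw_text = raw_text

    def get_input_text(self) -> str:
        # Naive splitter: break on whitespace that follows ".", "!" or "?".
        # Abbreviations such as "Dr. Smith" would be split incorrectly.
        parts = re.split(r"(?<=[.!?])\s+", self.raw_text)
        sentences = [part.strip() for part in parts if part.strip()]
        # Prepend a 1-based sentence number to each sentence, one per line.
        return "\n".join(
            f"[{number}] {sentence}"
            for number, sentence in enumerate(sentences, start=1)
        )


numbered = InputText("Patient had fever. Rash appeared on day 2! Recovered?").get_input_text()
```

For real reports, a proper sentence tokenizer would likely be a better choice than this regex.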