Sundar Pichai mentioned a few interesting things in a recent interview on the All-In Podcast:
- People are automatically asking longer queries because of AI-generated answers
- Gemini Flash has become a real workhorse of the industry
- Gemini LLMs are now at the “Pareto frontier” of performance and cost
About a year ago, Google Gemini was simply not very good compared to ChatGPT. But competition among LLMs has intensified since then, and Gemini has improved tremendously.
And this statement isn’t just based on intuition.
You can actually measure this improvement.
And one of the best places to observe this improvement is Structured Data Extraction from unstructured text.
Simon Willison refers to this as “possibly the most economically valuable application of LLMs right now: using them for data entry from unstructured or messy sources”.
Measuring the accuracy of Structured Data Extraction
I have developed a system for measuring the accuracy of Structured Data Extraction.
The system is fairly simple:
- get four different LLMs to do the same data extraction
- if there is a majority vote (at least three agree), use that value as the correct answer
- if there is no majority vote, inspect the differences using my LLM accuracy evaluation tool and select or add the correct answer
- save the correct answers and use the saved information to evaluate the accuracy of other LLMs, including the ones which generated the majority vote
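The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the actual evaluation tool: the function names (`majority_vote`, `build_ground_truth`, `accuracy`), the model names, and the invoice fields are all hypothetical, and it assumes each LLM returns a flat dictionary of field names to string values.

```python
from collections import Counter


def majority_vote(values, min_agree=3):
    """Return the value that at least `min_agree` models agree on, else None."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count >= min_agree else None


def build_ground_truth(extractions, min_agree=3):
    """Split fields into those settled by majority vote and those needing review.

    `extractions` maps a (hypothetical) model name to its {field: value} output.
    A field that most models left empty (None) is also sent to review.
    """
    fields = sorted({f for result in extractions.values() for f in result})
    agreed, needs_review = {}, {}
    for field in fields:
        values = [result.get(field) for result in extractions.values()]
        winner = majority_vote(values, min_agree)
        if winner is not None:
            agreed[field] = winner
        else:
            needs_review[field] = values  # inspect these by hand
    return agreed, needs_review


def accuracy(prediction, ground_truth):
    """Fraction of ground-truth fields that a model extracted exactly right."""
    if not ground_truth:
        return 0.0
    correct = sum(1 for f, v in ground_truth.items() if prediction.get(f) == v)
    return correct / len(ground_truth)


# Made-up outputs from four models for the same document.
extractions = {
    "model_a": {"invoice_number": "INV-001", "total": "120.50"},
    "model_b": {"invoice_number": "INV-001", "total": "120.50"},
    "model_c": {"invoice_number": "INV-001", "total": "125.50"},
    "model_d": {"invoice_number": "INV-001", "total": "120.00"},
}

agreed, needs_review = build_ground_truth(extractions)

# Manual step: after inspecting the disagreement, the reviewer picks the answer.
ground_truth = {**agreed, "total": "120.50"}

# Score every model, including the ones that took part in the vote.
scores = {name: accuracy(result, ground_truth) for name, result in extractions.items()}
```

In this made-up example all four models agree on `invoice_number`, so it is accepted automatically, while `total` has only two matching answers and ends up in `needs_review`. Once the reviewer fills in the correct total, the saved `ground_truth` can be reused to score any other model on the same document.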
As you can see, the system is straightforward. The benefit of my particular approach is that I also explain how you can build some tooling to speed up the process.
I explain this system in more detail in my Udemy course.
LLM Comparisons
Here are some accuracy comparisons I have done based on my system:
Gemini 1.5 Flash vs Gemini 2.5 Flash Accuracy for Structured Data Extraction
Gemini 1.5 Pro vs Gemini 2.5 Flash Preview accuracy for structured data extraction
Gemini 2 Flash Lite vs Gemini 2.5 Flash Preview accuracy for structured data extraction
Gemini 2 Flash vs Gemini 2.5 Flash Preview accuracy for structured data extraction