How much has Google Gemini improved over the last year?

aravindmc

9 months ago

Sundar Pichai made a few interesting points in a recent interview on the All In Podcast:

People are naturally asking longer queries because of AI-generated answers
Gemini Flash has become a real workhorse of the industry
Gemini LLMs are now at the “Pareto frontier” of performance and cost

About a year ago, Google Gemini was simply not very good compared to ChatGPT. Competition among LLMs has since intensified, and Gemini has improved tremendously over that year.

And this statement isn’t just based on intuition.

You can actually measure this improvement.

And one of the best places to observe this improvement is Structured Data Extraction from unstructured text.

Simon Willison refers to this as “possibly the most economically valuable application of LLMs right now: using them for data entry from unstructured or messy sources”

https://simonwillison.net/2025/May/15/building-on-llms/

Measuring the accuracy of Structured Data Extraction

I have developed a system for measuring the accuracy of Structured Data Extraction.

The system is fairly simple:

get four different LLMs to do the same data extraction
if there is a majority vote (at least three agree), use that value as the correct answer
if there is no majority vote, inspect the differences using my LLM accuracy evaluation tool and select or add the correct answer
save the correct answers and use the saved information to evaluate the accuracy of other LLMs, including the ones which generated the majority vote
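
The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the tooling from the course: the function names, the model names, the sample invoice fields, and the use of exact string matching to decide whether two answers "agree" are all assumptions made for the example.

```python
from collections import Counter
from typing import Optional


def majority_value(values: list[str], min_votes: int = 3) -> Optional[str]:
    """Return the value that at least `min_votes` models agree on, else None."""
    value, count = Counter(values).most_common(1)[0]
    return value if count >= min_votes else None


def build_ground_truth(
    extractions: dict[str, dict[str, str]],
) -> tuple[dict[str, str], list[str]]:
    """Split fields into those settled by majority vote and those needing review.

    `extractions` maps a (hypothetical) model name to its {field: value} output.
    A field a model did not return counts as an empty-string vote.
    """
    fields = sorted({f for result in extractions.values() for f in result})
    resolved: dict[str, str] = {}
    needs_review: list[str] = []
    for field in fields:
        votes = [result.get(field, "") for result in extractions.values()]
        winner = majority_value(votes)
        if winner is None:
            needs_review.append(field)  # no majority: inspect by hand
        else:
            resolved[field] = winner
    return resolved, needs_review


def accuracy(prediction: dict[str, str], truth: dict[str, str]) -> float:
    """Fraction of ground-truth fields that a model extracted exactly."""
    if not truth:
        return 0.0
    correct = sum(1 for f, v in truth.items() if prediction.get(f) == v)
    return correct / len(truth)


# Made-up outputs from four models for one invoice.
extractions = {
    "model_a": {"invoice_number": "INV-001", "total": "120.50", "vendor": "Acme Ltd"},
    "model_b": {"invoice_number": "INV-001", "total": "120.50", "vendor": "Acme"},
    "model_c": {"invoice_number": "INV-001", "total": "120.50", "vendor": "ACME Ltd."},
    "model_d": {"invoice_number": "INV-001", "total": "125.50", "vendor": "Acme Ltd"},
}

resolved, needs_review = build_ground_truth(extractions)

# The manual step: a person inspects the disputed fields and picks the answer.
truth = {**resolved, "vendor": "Acme Ltd"}

scores = {name: accuracy(result, truth) for name, result in extractions.items()}
```

Here the invoice number and total are settled by the vote, while the vendor name has no three-way agreement and goes to manual review. Once the answers are saved, every model, including the four voters, is scored against the same ground truth. In practice you would probably normalize values (whitespace, case, number formats) before comparing them, since exact matching is strict.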

As you can see, the system is straightforward. The benefit of my particular approach is that I also explain how you can build some tooling to speed up the process.

I explain this system in more detail in my Udemy course.

LLM Comparisons

Here are some accuracy comparisons I have done based on my system:

Gemini 1.5 Flash vs Gemini 2.5 Flash Accuracy for Structured Data Extraction

Gemini 1.5 Pro vs Gemini 2.5 Flash Preview accuracy for structured data extraction

Gemini 2 Flash Lite vs Gemini 2.5 Flash Preview accuracy for structured data extraction

Gemini 2 Flash vs Gemini 2.5 Flash Preview for structured data extraction