Why Google Gemini is the best choice for LLM evals (LLM as a judge)

In a recent article, Simon Willison wrote this

In my opinion, Google Gemini models are a great choice for LLM evals:

  • there are about 7 different Gemini LLMs of fairly decent quality
    • you only need 4 different LLMs for the system I recommend
  • most of them are very cost efficient
    • on Tier 1 (which means you have provided your credit card information), the rate limits are very generous
    • you will usually be able to generate the entire evaluation set for free
  • they are now fairly accurate
    • there has been a major improvement in recent Gemini models
  • Gemini supports structured data extraction based on Pydantic BaseModel classes
    • the schema validation is done server side
    • reduces code complexity
      • otherwise you might need to use the instructor library
    • reduces cost
      • no need for client-side retries (which instructor might do)
    • reduces latency
      • for the same reason: no need for client-side retries
    • Note: sometimes Gemini can fail to do the extraction and a retry might be needed, but it will happen on the server side
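
To make the structured extraction point concrete, here is a minimal sketch of a judge call using the google-genai Python SDK with a Pydantic schema. The `JudgeVerdict` fields, the 1 to 5 scale, the prompt wording, the `build_prompt` and `judge` helpers, and the default model name are all assumptions for illustration, not part of any particular eval system. The client is assumed to pick up the API key from the environment.

```python
from pydantic import BaseModel


class JudgeVerdict(BaseModel):
    """Hypothetical schema for one LLM-as-a-judge result."""
    score: int       # assumed scale: 1 (poor) to 5 (excellent)
    reasoning: str   # short justification written by the judge


def build_prompt(question: str, answer: str) -> str:
    """Build an illustrative grading prompt (wording is an assumption)."""
    return (
        "You are grading an answer. Give a score from 1 to 5 "
        "and a short reasoning.\n"
        f"Question: {question}\n"
        f"Answer: {answer}"
    )


def judge(question: str, answer: str,
          model: str = "gemini-2.5-flash") -> JudgeVerdict:
    """Ask a Gemini model to grade an answer and return a typed verdict."""
    # Imported lazily so the schema above is usable without the SDK installed.
    from google import genai

    client = genai.Client()  # assumes the API key is set in the environment
    response = client.models.generate_content(
        model=model,  # assumed model name; swap in whichever judge you use
        contents=build_prompt(question, answer),
        config={
            # Ask for JSON that conforms to the Pydantic schema.
            "response_mime_type": "application/json",
            "response_schema": JudgeVerdict,
        },
    )
    # Parse the returned JSON text into the Pydantic model.
    return JudgeVerdict.model_validate_json(response.text)
```

Calling `judge("What is 2 + 2?", "4")` would return a `JudgeVerdict` instance, so downstream code can read `verdict.score` directly instead of parsing free text.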
