How do you measure your Dialogflow bot’s accuracy?

The confusion matrix

You may be familiar with the term error matrix or confusion matrix. If not, don’t worry! It is a way to measure how well a classification technique works, and it fits this case because, under the hood, Dialogflow takes the user’s input and classifies it to the nearest matching intent.
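As a rough illustration, here is a minimal Python sketch of a confusion matrix for intent classification. It assumes you have already gone through a set of user messages by hand and written down, for each one, the intent you expected and the intent Dialogflow actually matched. The intent names and the `labelled` sample are made up for the example, and the helper functions are hypothetical, not part of any Dialogflow library.

```python
from collections import Counter


def confusion_matrix(pairs):
    """Count how often each (expected intent, detected intent) pair occurs."""
    return Counter(pairs)


def accuracy(matrix):
    """Fraction of messages where the detected intent equals the expected one."""
    total = sum(matrix.values())
    correct = sum(
        count for (expected, detected), count in matrix.items()
        if expected == detected
    )
    return correct / total if total else 0.0


# Hypothetical hand-labelled sample: (expected intent, detected intent)
labelled = [
    ("order.status", "order.status"),
    ("order.status", "order.cancel"),
    ("order.cancel", "order.cancel"),
    ("greeting", "greeting"),
    ("greeting", "Default Fallback Intent"),
]

matrix = confusion_matrix(labelled)
for (expected, detected), count in sorted(matrix.items()):
    print(f"{expected:>14} -> {detected:<24} {count}")
print(f"accuracy: {accuracy(matrix):.2f}")
```

Each row where the expected and detected intents differ is a misclassification, and the rows tell you which intents are being confused with which.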

Sample size

Consider the last 100 user messages to your bot. If you don’t have that many, get a few beta testers to try out your bot for a few minutes.
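If you keep a log of messages, the sampling step could look something like the sketch below. It assumes a hypothetical log format: a list of dictionaries in chronological order, each holding the intent you expected (`expected`) and the intent that was matched (`detected`). Both field names are invented for this example.

```python
def sample_accuracy(log, sample_size=100):
    """Return (accuracy, messages_used) over the most recent messages.

    `log` is assumed to be in chronological order, oldest first.
    """
    recent = log[-sample_size:]
    if not recent:
        return 0.0, 0
    correct = sum(1 for row in recent if row["expected"] == row["detected"])
    return correct / len(recent), len(recent)


# Hypothetical log: 50 older misses, then 90 hits and 10 misses
miss = {"expected": "order.status", "detected": "Default Fallback Intent"}
hit = {"expected": "order.status", "detected": "order.status"}
log = [miss] * 50 + [hit] * 90 + [miss] * 10

score, used = sample_accuracy(log)
print(f"accuracy over the last {used} messages: {score:.2f}")
```

The function also reports how many messages it actually used, so a bot with fewer than 100 logged messages does not silently produce a number based on a tiny sample.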

Chatbase UMM

A while back, I wrote a post that linked to Chatbase’s UMM method, which provides a way to reason about your chatbot’s accuracy. While it is a good idea and I do derive some ideas from it, it is not particularly useful on its own, because the UMM method does not give you a way to actually measure the accuracy.

About this website

BotFlo[1] was created by Aravind Mohanoor as a website which provided training and tools for non-programmers who were[2] building Dialogflow chatbots.

This website has now expanded into other topics in Natural Language Processing, including the recent Large Language Models (GPT etc.) with a special focus on helping non-programmers identify and use the right tool for their specific NLP task.

[1] BotFlo was previously called MiningBusinessData. That is why you see that name in many videos.

[2] And still are building Dialogflow chatbots. Dialogflow ES first evolved into Dialogflow CX, and Dialogflow CX itself evolved to add Generative AI features in mid-2023.

  • When I take a look at the Intent Detection Confidence, I see a score of 0.83768564. I suppose there is no way to know which intent accounts for the remaining 0.16231436 out of the total of 1, as Dialogflow doesn’t display such an intent…

    • Yes, that is correct. I wish Dialogflow had implemented a top N intents feature. It is probably Dialogflow’s biggest shortcoming when compared to the other bot frameworks.