How do you measure your Dialogflow bot’s accuracy?

The confusion matrix

You may be familiar with the term error matrix or confusion matrix. If not, don’t worry! It is a way to measure how well a classification technique works, and it is appropriate here because, under the hood, Dialogflow takes the user’s input and classifies it to the nearest matching intent.
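To make the idea concrete, here is a minimal sketch of a confusion matrix built from hand-labelled messages. It assumes you have reviewed each user message yourself and written down the intent you expected next to the intent Dialogflow actually matched. The intent names and the `confusion_matrix` helper are made up for illustration; they are not part of Dialogflow.

```python
from collections import Counter


def confusion_matrix(pairs):
    """Count how often each (expected_intent, matched_intent) pair occurs."""
    return Counter(pairs)


# Hypothetical hand-labelled data: (intent you expected, intent Dialogflow matched).
labelled = [
    ("order.status", "order.status"),
    ("order.status", "order.cancel"),
    ("order.cancel", "order.cancel"),
    ("greeting", "greeting"),
    ("greeting", "Default Fallback Intent"),
]

matrix = confusion_matrix(labelled)

# Print one row per expected intent and one column per matched intent.
intents = sorted({name for pair in labelled for name in pair})
for expected in intents:
    row = [matrix[(expected, matched)] for matched in intents]
    print(expected, row)
```

Counts on the diagonal (expected equals matched) are correct classifications; everything off the diagonal is a message that landed in the wrong intent.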

Sample size

Consider the last 100 user messages to your bot. If you don’t have that many, get a few beta testers to try out your bot for a few minutes.
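As a rough sketch of what scoring that sample could look like, the snippet below takes the most recent labelled messages and reports the fraction that matched the intent you expected. The `accuracy` function, its `sample_size` parameter, and the sample data are assumptions for illustration only.

```python
def accuracy(labelled, sample_size=100):
    """Fraction of the most recent `sample_size` messages matched to the expected intent."""
    recent = labelled[-sample_size:]
    if not recent:
        raise ValueError("no labelled messages to score")
    correct = sum(1 for expected, matched in recent if expected == matched)
    return correct / len(recent)


# Hypothetical sample: three correct matches and one wrong one.
sample = [("greeting", "greeting")] * 3 + [("order.status", "order.cancel")]
print(accuracy(sample))
```

With fewer than 100 messages the function simply scores whatever is available, so the number becomes more trustworthy as the sample grows.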

Chatbase UMM

A while back, I wrote a post that linked to Chatbase’s UMM method, which provides a way to reason about your chatbot’s accuracy. It is a good idea and I do borrow some ideas from it, but it is not particularly useful here because the UMM method does not give you a way to actually measure the accuracy.
  • When I take a look at the Intent Detection Confidence, I see a score of 0.83768564. I suppose there is no way to know which intent gets fired with the 0.16231436 score from the total of 1… as Dialogflow doesn’t display such an intent…

    • Yes, that is correct. I wish Dialogflow had implemented a top N intents feature. It is probably Dialogflow’s biggest shortcoming when compared to the other bot frameworks.