
How do you measure your Dialogflow bot’s accuracy

First published: July 2018 | Last updated: August 2021

This article provides a simple way to measure your Dialogflow bot’s accuracy.

This technique is somewhat tedious, but it will be very useful if you need a metric to evaluate the quality of your chatbot.

The U-M-M method

A while back, I wrote a post linking to Chatbase’s UMM method, which provides a way to reason about your chatbot’s accuracy. It is a good idea, and I borrow some ideas from it, but it is not particularly useful in practice because the UMM method does not give you a way to actually measure the accuracy.

What I am proposing here is much simpler[1].

The confusion matrix

You may be familiar with the term error matrix or confusion matrix. If not, don’t worry!

It is a way to measure how well a classifier works, and it is quite appropriate in this case because under the hood, Dialogflow takes the user’s input and classifies it into the nearest matching intent.

So let us define the following terms:

Regular intent = An intent which is not a fallback intent

Correct mapping = the user’s phrase was mapped to the expected, appropriate intent. By the way, if you find the idea of “correct mapping” subjective rather than objective, you probably need to improve the way you are defining your intents.

True Positive (TP) = A user phrase is mapped to a regular intent correctly

True Negative (TN) = A user phrase is mapped to a fallback intent correctly (that is, we haven’t yet defined an intent to handle the user’s phrase)

False Positive (FP) = A regular intent is triggered wrongly. The phrase should have been mapped either to a different regular intent, or to the fallback intent because we don’t yet handle the phrase.

False Negative (FN) = A fallback intent is triggered, but we have already defined a regular intent which the user’s phrase should have been mapped to. An excellent example of this is when the user types a message which is nearly identical to a training phrase except for a small typo.

Here is a little flowchart you can use as a reference:
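The same decision logic can also be written as a small function. This is only a sketch of the four definitions above, not anything Dialogflow provides: the function name, the `expected`/`actual` parameters and the sample intent names are made up for illustration, and the fallback intent is assumed to be called "Default Fallback Intent".

```python
from typing import Optional

# Assumption: the fallback intent uses this display name in your agent.
FALLBACK = "Default Fallback Intent"


def classify_mapping(expected: Optional[str], actual: str,
                     fallback: str = FALLBACK) -> str:
    """Label one user phrase as TP, TN, FP or FN.

    expected: the regular intent that should handle the phrase,
              or None if no intent handles it yet.
    actual:   the intent that was actually triggered.
    """
    if actual == fallback:
        # Fallback fired: correct only if nothing handles the phrase yet.
        return "TN" if expected is None else "FN"
    # A regular intent fired: correct only if it is the expected one.
    return "TP" if actual == expected else "FP"


# Hypothetical intent names, for illustration only.
print(classify_mapping("order.pizza", "order.pizza"))   # TP
print(classify_mapping("order.pizza", FALLBACK))        # FN
print(classify_mapping(None, "order.pizza"))            # FP
```

You still have to decide the expected intent for each phrase by hand; the function only makes the labelling consistent.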

Sample size

Consider the last 100 user messages to your bot. If you don’t have that many, get some 10 or 15 beta testers to try out your bot for a few minutes.

Let TP be the number of messages (out of the 100) which were true positive mappings.

Similarly, TN = number of true negative mappings

FP = number of false positive mappings

FN = number of false negative mappings

Let Correct Mapping (CM) = TP + TN

Let Incorrect Mapping (IM) = FP + FN

Accuracy = CM / (CM + IM)

Since CM + IM = 100 (if you used the suggested sample size), CM is itself the accuracy expressed as a percentage.
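As a rough illustration, the calculation could be scripted once every message has been labelled. The `accuracy` helper and the counts in `sample` are invented for this example and are not measurements from a real bot.

```python
from collections import Counter


def accuracy(labels):
    """Return CM / (CM + IM) for a list of 'TP'/'TN'/'FP'/'FN' labels."""
    counts = Counter(labels)
    cm = counts["TP"] + counts["TN"]   # correct mappings
    im = counts["FP"] + counts["FN"]   # incorrect mappings
    total = cm + im
    if total == 0:
        raise ValueError("no labelled messages")
    return cm / total


# Made-up labels for 100 messages, purely for illustration.
sample = ["TP"] * 62 + ["TN"] * 18 + ["FP"] * 12 + ["FN"] * 8
print(f"Accuracy: {accuracy(sample):.0%}")  # Accuracy: 80%
```

Dividing by the total rather than by a fixed 100 means the same helper still works if your sample is smaller or larger than 100 messages.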

Tips to improve your bot’s accuracy

In my Improving Dialogflow ES accuracy course, I provide a set of tips which allow you to improve your Dialogflow ES bot’s accuracy.

[1] However, it works best with my existing recommendations such as avoiding slot filling, using a context lifespan of 1 etc. That makes the bot much easier to analyze.

  • When I take a look at the Intent Detection Confidence, I see a score of 0.83768564. I suppose there is no way to know which intent would have fired with the remaining 0.16231436 out of the total of 1, as Dialogflow doesn’t display such an intent…

    • Yes, that is correct. I wish Dialogflow had implemented a top N intents feature. It is probably Dialogflow’s biggest shortcoming when compared to the other bot frameworks.