I get some variant of this question quite often from readers or coaching clients:
What type of machine learning is happening within the black box? Any ideas?
While the short answer is “No, I don’t, since Dialogflow isn’t open source as of this writing”, that doesn’t mean you cannot reverse engineer a fair amount of understanding using a tool which already exists inside Dialogflow: the score coming back in the JSON response.
Here are a few things which can help you understand what is going on. I have also added the relevant thumbnail from my Dialogflow Conversation Design course for those who might be interested in pursuing this further.
The candidate list
Imagine this. You have designed a chatbot. And you are giving a demo of your chatbot to a friend.
They have tried a couple of messages, and both work just fine. And then they try a third message. You are anticipating a certain response from the chatbot based on what you think should happen. Instead, a different response comes back.
And you are wondering: why did that response get selected by Dialogflow? To answer this question, you should first understand that there is such a thing as a list of intents which are all “viable candidates” at that point in the conversation.
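To build some intuition about that candidate list, here is a toy sketch of how a framework might score each candidate intent against the user's message and pick a winner. This uses simple word overlap, which is certainly not Dialogflow's actual (unpublished) algorithm; the intent names and phrases are made up for illustration:

```python
def score(message, training_phrase):
    """Toy similarity: fraction of training-phrase words found in the message."""
    msg_words = set(message.lower().split())
    phrase_words = set(training_phrase.lower().split())
    return len(msg_words & phrase_words) / len(phrase_words)


def best_candidate(message, intents):
    """Score every candidate intent and return the highest-scoring one."""
    candidates = {
        name: max(score(message, p) for p in phrases)
        for name, phrases in intents.items()
    }
    return max(candidates, key=candidates.get), candidates


intents = {
    "check_balance": ["what is my balance", "show my account balance"],
    "transfer_money": ["transfer money to savings", "send money to a friend"],
}
winner, scores = best_candidate("can you show my balance", intents)
print(winner)  # check_balance
```

The point of the sketch: every viable candidate gets a score, and the response you see is simply the one that scored highest at that point in the conversation.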
The score coming back in the JSON response
A second thing you should understand is that there is a field in the JSON response called score (renamed Intent Detection Confidence in API v2) which comes back each time you try out a message in the Dialogflow simulator.
As it turns out, we can use the intent detection confidence value to do a bunch of testing and understand what is going on under the hood.
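For example, here is how you would pull that value out of an API v2 response. The JSON below is a trimmed, hand-written sample in the v2 shape (real responses contain many more fields):

```python
import json

# A trimmed sample of a Dialogflow API v2 detectIntent response (abridged).
raw = """
{
  "queryResult": {
    "queryText": "I want to book a room",
    "intent": {"displayName": "book.room"},
    "intentDetectionConfidence": 0.87
  }
}
"""

result = json.loads(raw)["queryResult"]
print(result["intent"]["displayName"], result["intentDetectionConfidence"])
```

By sending variations of the same message and watching how this number moves, you can start mapping out what helps and hurts the match.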
Scoring the user’s message
Words which share a common root, such as intent, intend, intended, and intention, are treated by Dialogflow (as well as other NLU bot frameworks) as similar or even identical from the viewpoint of the algorithm which processes them. This root is called the “stem” of the word, and stemming is what helps Dialogflow manage multiple variants of the same basic “word concept” so it can do better intent matching.
In the video below, I show how stemming can impact Dialogflow’s intent mapping.
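To make the idea concrete, here is a deliberately naive stemmer. Real stemmers (such as the Porter stemmer) use far more careful rules, and Dialogflow's internal stemming is not public; this just shows how suffix stripping collapses word variants toward a shared root:

```python
def toy_stem(word):
    """Very naive suffix stripping, purely to illustrate stemming.
    Strips one common suffix if the remaining stem is long enough."""
    for suffix in ("ions", "ion", "ed", "ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


words = ["intent", "intend", "intended", "intention", "intentions"]
print([toy_stem(w) for w in words])
# ['intent', 'intend', 'intend', 'intent', 'intent']
```

Notice how "intention" and "intentions" both collapse to "intent": once stemmed, they look the same to the matching algorithm.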
There are some words in the English language which are not high in information value. Words such as “a” and “the” are so common that they are generally not very useful when doing intent mapping – they are considered somewhat superfluous (these are often called “stop words”).
I show an example of that in my video.
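A quick sketch of the idea: filter out the low-information words before matching. The stop word list below is a small made-up sample; real NLU systems use much larger, language-specific lists:

```python
# A small illustrative stop word list; real systems use larger ones.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in"}


def content_words(message):
    """Drop low-information words, keeping only the words that matter for matching."""
    return [w for w in message.lower().split() if w not in STOP_WORDS]


print(content_words("what is the balance of my account"))
# ['what', 'balance', 'my', 'account']
```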
So you already know that tweaking the value of the ML Threshold changes how close a user’s phrase must be to your existing training phrases before it can match the intent. But how does it work in practice?
I show an example below.
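The gating behavior itself is simple enough to sketch. This is an assumption about the mechanics (the exact internals aren't documented), but conceptually: if the best candidate's confidence falls below the threshold, the fallback intent fires instead:

```python
FALLBACK = "Default Fallback Intent"


def apply_threshold(best_intent, confidence, ml_threshold):
    """If the best score is below the ML Classification Threshold,
    fall back instead of matching the intent (conceptual sketch)."""
    return best_intent if confidence >= ml_threshold else FALLBACK


print(apply_threshold("book.room", 0.62, 0.3))  # book.room
print(apply_threshold("book.room", 0.62, 0.7))  # Default Fallback Intent
```

Raising the threshold makes the bot stricter (fewer, but more confident, matches); lowering it makes the bot more permissive.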
You can tell Dialogflow to give higher weight to certain words and phrases by repeating them in multiple training phrases. This is why the Dialogflow team encourages bot creators to use about 10-15 training phrases per intent.
The repetition tells Dialogflow which are the most important words that you really would like to pattern match on. The words which don’t get repeated as much in the training phrases are given less weight (and thus less importance) as the matching happens.
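One simple way to model that effect: count what fraction of training phrases each word appears in. Again, this is a toy model of the weighting, not Dialogflow's actual implementation, and the phrases are made up:

```python
from collections import Counter


def term_weights(training_phrases):
    """Toy model: words repeated across many training phrases get higher weight."""
    counts = Counter()
    for phrase in training_phrases:
        counts.update(set(phrase.lower().split()))
    total = len(training_phrases)
    return {word: count / total for word, count in counts.items()}


phrases = [
    "book a room for tonight",
    "book a hotel room",
    "i want to book a room",
]
weights = term_weights(phrases)
print(weights["book"], weights["room"])  # 1.0 1.0 -- appear in every phrase
print(round(weights["tonight"], 2))      # 0.33    -- appears only once
```

Under this model, "book" and "room" dominate the match, while "tonight" barely moves the needle – which is exactly the behavior you want repetition to produce.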
Other ways to get insight into what’s going on
In addition to the ideas I have described above, you can play around with the JSON score (intent detection confidence) in a few more ways to understand what really happens under the hood:
- use different entity values and see if/how it affects the score
- introduce a typo into the entity value
- introduce a typo into a non-entity word (that is, a word which you have declared in the training phrase)
- use close synonyms instead of the words already in your training phrases
- create contexts with lifespan more than 1 (which I don’t usually recommend) and see if higher lifespan contexts produce higher scores for the same training phrase
- see if putting everything within a followup intent tree affects the score
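A small harness makes these experiments systematic. The sketch below assumes a `get_confidence` callable that you would implement yourself with real detectIntent API calls; here it is stubbed with a fake scorer so the structure is clear:

```python
def run_experiments(variants, get_confidence):
    """Send each message variant through the bot and tabulate the scores,
    so you can see how typos, synonyms, etc. move the confidence."""
    for label, message in variants:
        print(f"{label:<22} {message:<35} {get_confidence(message):.2f}")


# Stub for illustration only; in practice this would call the detectIntent
# API and return the intentDetectionConfidence field from the response.
def fake_confidence(message):
    return 0.9 if "balance" in message else 0.4


variants = [
    ("exact phrase", "what is my balance"),
    ("typo in key word", "what is my balanse"),
    ("close synonym", "what are my funds"),
]
run_experiments(variants, fake_confidence)
```

Swap the stub for a real API call, add a row per experiment from the list above, and you have a repeatable way to compare scores across runs.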
At the end of the day, while all these ideas can help, you will still be doing a good amount of testing and trial and error if you want your chatbot to be as accurate as possible in handling the user’s messages. But I hope this article serves as a good starting point for you to go and explore how things work.