Automatically extract Dialogflow intents from chat logs
In this article, I provide some ideas on how you can automatically extract Dialogflow intents from chat logs.
An example dialog dataset
I used an example dataset where people called in to customer support at a telecommunications company (personal and other identifying information has already been redacted). It is a very good dataset for this purpose because it a) is clearly realistic and b) provides a lot of insight into this topic as a whole, as you will see later.
Here is a sample list of sentences from the dataset.
ID | Text |
1 | I’m away from home and cannot verify my MAC address due to no access to digital receiver |
2 | I changed my user name why can’t i move on then i am very irritated it said everything was completed an i received a email |
3 | i paid my bill on 10/21 but it does not appear that the money was taken out of my bank account. my cname– account is showing that I made the payment, but i think something is wrong. |
4 | I am a subscriber to cname– interntet and phone, and am unable to play EPIX online content. I am told to “Authenticate your EPIX Login with your Television Provider.” |
5 | i GO TO PUT MY ZIPCODE INTO MY SIGN IN ARE AND IT TELLS ME ITS AND INCORRECT ZIPCODE |
6 | Every username I try to enter does not work. It says that the username is not available, and this is impossible. Please advise |
7 | I am currently only getting 4.9 mbs download speed. Is there some sort of local issue effecting my service? |
8 | how do I know time and channel of show I want to record to watch latter |
9 | is there a way to rewire my tv to a diffent room |
10 | Thanks, by wasting my time I may go to Hulu and other services and dumping you. |
11 | cname– instant video like utube or hbo go is giving me an error page |
12 | I need my wireless network name and it’s not written on the bottom of our router. |
13 | I was just sent a contract and was locked out of my e-mal until I accepted the contract. Is this the way your contracts are typically signed??? |
14 | Why wont cname– COME OUT TO WHERE IM LIVING and set up service? |
15 | Im head of household, but your shit software won’t let me pay my bill, so that must mean I don’t have to pay it. |
16 | I need the set up code to sinc my onkyo receiver to my cname- remote |
17 | how can i cancel my TV service? im not happy with my package, cost or service, im still with in my 3odays |
18 | tv screen says one moment please, this channel should be available shortly. Ref Code ——- |
19 | Can you send me an updated card with my channell guide stations? |
20 | Need to remove silver package TV channels and continue with the basic package. |
21 | My Wi-Fi is down due to a ssl being unsecured. How do I fix that? |
22 | This web page is not working it will not let me go to the page it keeps saying session is expired |
23 | Why does our cable always say we’re not subscribed when I know we are. I’m really getting tired of this. |
24 | The service I have is for a summer residence. I want to suspend service until next spring, how do I do this? |
25 | good morning, since I updated my user name as it asked I can no longer access my bill to pay |
26 | I would like to know when my payment is due and the amount due please? |
27 | why do some of my channels say they will be available shortly |
28 | My phone does not have dial tone. Checked to boxes that connect tv, phone, internet. |
29 | Hi My Cable wont turn on and their are 4 blue dashses where the time is |
30 | why can’t cname– have a normal email system, cname-s really is bad for email! When you click on an email you can’t read it normally! Your system not worth the cost. |
Results from using the tool
Once you upload this dataset to the Autotrain tool, you get a CSV file that extends the 4 column CSV format, which you can use to define and organize your Dialogflow intents.
Here is a screenshot of the resulting CSV file based on the list of messages above (I used a total of 500 messages). Note that the tool is able to group multiple phrases based on their intents (e.g. pay bill) and also outputs the source identifier (SentID) from the original document for each row.
What you can infer from the output of Autotrain
We can make some interesting observations based on the output file. These observations will in fact make it easier to clean up the output and improve your Dialogflow bot’s training process.
The user message can contain multiple sentences
The conventional approach is to specify a single sentence in a single Dialogflow training phrase. As you can see, that isn’t ideal.
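To make the point concrete, here is a minimal sketch of splitting a user message into sentences before turning it into training phrases. It uses a naive regular expression rather than a proper NLP sentence tokenizer, and the function name `split_sentences` is my own choice for illustration, not part of the Autotrain tool.

```python
import re

# Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(message):
    """Split a user message into rough sentences (hypothetical helper)."""
    parts = _SENTENCE_END.split(message.strip())
    return [part.strip() for part in parts if part.strip()]


# Message 6 from the sample list above.
message = (
    "Every username I try to enter does not work. "
    "It says that the username is not available, and this is impossible. "
    "Please advise"
)

for sentence in split_sentences(message):
    print(sentence)
```

A rule this simple will not split messages that have no punctuation at all (several of the sample messages are like that), so treat it as a starting point only.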
A single sentence can contain multiple topics
This is even more challenging, because splitting up the user input into sentences isn’t always going to be sufficient.
A lot of user messages span multiple intents
If you take the output produced by the Autotrain tool and group the results by sentence ID, you will notice that a single user message often spans multiple intents.
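The grouping can be sketched in a few lines of Python. This is only an illustration under assumptions: the `SentID` column is mentioned above, but the `Intent` and `Phrase` column names, the intent labels, and the sample rows are made up for this example and may not match the real output file.

```python
import csv
import io
from collections import defaultdict


def intents_by_message(rows):
    """Map each SentID to the set of intent names found in its rows."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row["SentID"]].add(row["Intent"])
    return dict(grouped)


def multi_intent_messages(rows):
    """Keep only the messages that were assigned more than one intent."""
    return {
        sent_id: intents
        for sent_id, intents in intents_by_message(rows).items()
        if len(intents) > 1
    }


# Made-up sample rows; column names and intent labels are assumptions.
SAMPLE = """SentID,Intent,Phrase
3,pay bill,i paid my bill on 10/21
3,check payment status,it does not appear that the money was taken out
26,payment due date,I would like to know when my payment is due
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for sent_id, intents in multi_intent_messages(rows).items():
    print(sent_id, sorted(intents))
```

For a real file, you would replace the `io.StringIO(SAMPLE)` object with a file opened using `open(path, newline="")`.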
The multiple intents may need multiple handlers
That is, if you do have multiple intents inside a single user message, you might need to process them as separate intents because they might require completely different actions. For example, take a look at this user message:
To handle this user’s message, we have to understand two separate intents: 1. The customer has paid their past due amount, and hence 2. The service needs to be restored.
Contrast that with this other user message.
While the second message is also talking about restoring service, it is due to an entirely different reason (outage).
How to create your Dialogflow agent from the output CSV
As I mentioned earlier, the format used in the output CSV file is based on the 4 Column CSV file used in the FAQ Bot generator tool.
So you need to delete everything except the first 4 columns, and you will have the 4 column CSV file you need for generating the Dialogflow agent ZIP file with a single click. You can of course modify the CSV file and add a response for each intent etc. before you convert it to a Dialogflow agent ZIP file.
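If you would rather not delete the extra columns by hand, a short script can do it. The following is a minimal sketch using Python's standard `csv` module. The column names in the sample (`col1` to `col4`, `SentID`) are placeholders, since the real header names depend on the output file, and `keep_first_columns` is a hypothetical helper name.

```python
import csv
import io


def keep_first_columns(reader, writer, count=4):
    """Copy every row, keeping only the first `count` columns."""
    for row in reader:
        writer.writerow(row[:count])


# Placeholder data standing in for the Autotrain output file.
SAMPLE = "col1,col2,col3,col4,SentID\n1,2,3,4,17\n"

out = io.StringIO()
keep_first_columns(
    csv.reader(io.StringIO(SAMPLE)),
    csv.writer(out, lineterminator="\n"),
)
print(out.getvalue())
```

With real files, open both the input and the output with `newline=""`, as the `csv` module documentation recommends, and pass the file objects to `csv.reader` and `csv.writer` instead of the in-memory buffers used here.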
You might have noticed that only about 20% of the phrases in the dataset are categorized into non-unique intents (that is, they have the same pattern as another phrase). In the follow-up article, I will explain how you can cluster more of your phrases into existing intents.
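You can check this share on your own output file with a quick count. The sketch below assumes that "non-unique" means the intent label is shared with at least one other phrase. The labels in the sample list are invented for illustration, and `shared_intent_share` is a hypothetical helper name.

```python
from collections import Counter


def shared_intent_share(intent_labels):
    """Fraction of phrases whose intent label also appears on another phrase."""
    if not intent_labels:
        return 0.0
    counts = Counter(intent_labels)
    shared = sum(n for n in counts.values() if n > 1)
    return shared / len(intent_labels)


# One invented intent label per phrase.
labels = ["pay bill", "pay bill", "cancel service", "slow internet", "reset password"]
print(shared_intent_share(labels))  # 2 of the 5 phrases share a label
```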