Automatically extract Dialogflow intents from chat logs

Managing Large Dialogflow ES Bots

In this article I provide some ideas on how you can automatically extract Dialogflow intents from chat logs.

An example dialog dataset

I used an example dataset of messages that customers sent to support at a telecommunications company (personal and other identifying information has already been redacted from it). It is a very good dataset because a) it is clearly realistic and b) it provides a lot of insight into this topic as a whole, as you will see later.

Here is a sample list of sentences from the dataset.

ID | Text
1 | I’m away from home and cannot verify my MAC address due to no access to digital receiver
2 | I changed my user name why can’t i move on then i am very irritated it said everything was completed an i received a email
3 | i paid my bill on 10/21 but it does not appear that the money was taken out of my bank account. my cname– account is showing that I made the payment, but i think something is wrong.
4 | I am a subscriber to cname– interntet and phone, and am unable to play EPIX online content. I am told to “Authenticate your EPIX Login with your Television Provider.”
5 | i GO TO PUT MY ZIPCODE INTO MY SIGN IN ARE AND IT TELLS ME ITS AND INCORRECT ZIPCODE
6 | Every username I try to enter does not work. It says that the username is not available, and this is impossible. Please advise
7 | I am currently only getting 4.9 mbs download speed. Is there some sort of local issue effecting my service?
8 | how do I know time and channel of show I want to record to watch latter
9 | is there a way to rewire my tv to a diffent room
10 | Thanks, by wasting my time I may go to Hulu and other services and dumping you.
11 | cname– instant video like utube or hbo go is giving me an error page
12 | I need my wireless network name and it’s not written on the bottom of our router.
13 | I was just sent a contract and was locked out of my e-mal until I accepted the contract. Is this the way your contracts are typically signed???
14 | Why wont cname– COME OUT TO WHERE IM LIVING and set up service?
15 | Im head of household, but your shit software won’t let me pay my bill, so that must mean I don’t have to pay it.
16 | I need the set up code to sinc my onkyo receiver to my cname- remote
17 | how can i cancel my TV service? im not happy with my package, cost or service, im still with in my 3odays
18 | tv screen says one moment please, this channel should be available shortly. Ref Code ——-
19 | Can you send me an updated card with my channell guide stations?
20 | Need to remove silver package TV channels and continue with the basic package.
21 | My Wi-Fi is down due to a ssl being unsecured. How do I fix that?
22 | This web page is not working it will not let me go to the page it keeps saying session is expired
23 | Why does our cable always say we’re not subscribed when I know we are. I’m really getting tired of this.
24 | The service I have is for a summer residence. I want to suspend service until next spring, how do I do this?
25 | good morning, since I updated my user name as it asked I can no longer access my bill to pay
26 | I would like to know when my payment is due and the amount due please?
27 | why do some of my channels say they will be available shortly
28 | My phone does not have dial tone. Checked to boxes that connect tv, phone, internet.
29 | Hi My Cable wont turn on and their are 4 blue dashses where the time is
30 | why can’t cname– have a normal email system, cname-s really is bad for email! When you click on an email you can’t read it normally! Your system not worth the cost.
First 30 messages from the dataset

Results from using the tool

Once you upload this dataset to the Autotrain tool, you get a CSV file. Its format is an extension of the 4 column CSV format, which you can use to define and organize your Dialogflow intents.

Here is a screenshot of the resulting CSV file based on the list of messages above (I used a total of 500 messages). Note that the tool is able to group multiple phrases based on their intents (e.g. pay bill) and also outputs the source identifier (SentID) from the original document for each row.

Screenshot of CSV output from my Autotrain tool

What you can infer from the output of Autotrain

We can make some interesting observations based on the output file. These observations will in fact make it easier to clean up the output and improve your Dialogflow bot’s training process.

The user message can contain multiple sentences

The conventional approach is to specify a single sentence in a single Dialogflow training phrase. As you can see, that isn’t ideal, because real user messages often contain several sentences.

Single user message, multiple sentences
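
One way to deal with such messages is to split each one into sentences before treating the pieces as training phrase candidates. Here is a minimal sketch, assuming a simple punctuation-based rule. This is not how Autotrain does it, and the helper name split_sentences is hypothetical.

```python
import re


def split_sentences(message: str) -> list[str]:
    """Split a chat message into rough sentences."""
    # Naive rule: break after ., ! or ? when followed by whitespace.
    # Real chat logs (missing punctuation, abbreviations) will need
    # something more robust than this.
    parts = re.split(r"(?<=[.!?])\s+", message.strip())
    return [p for p in parts if p]


# Message 6 from the sample list above.
message = (
    "Every username I try to enter does not work. "
    "It says that the username is not available, and this is impossible. "
    "Please advise"
)
for sentence in split_sentences(message):
    print(sentence)
```

Note that a rule like this does nothing for messages written without punctuation, such as message 2 in the list, so it is only a starting point.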

A single sentence can contain multiple topics

This is even more challenging, because splitting the user input into sentences isn’t always going to be sufficient.

Multiple topics per sentence

A lot of user messages span multiple intents

If you take the output produced by the Autotrain tool and group the results by sentence ID, you will notice that a single user message often spans multiple intents.
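
The grouping itself can be sketched in a few lines of Python. The SentID column comes from the Autotrain output, but the "Intent" and "Phrase" header names and the sample rows below are assumptions made up for illustration, so adjust them to match your own file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for the Autotrain output. The real file has more
# columns, and the "Intent" and "Phrase" header names are assumptions.
sample_csv = """Intent,Phrase,SentID
pay_bill,i paid my bill on 10/21,3
payment_not_shown,money was not taken out of my bank account,3
cancel_service,how can i cancel my TV service,17
"""


def intents_by_message(csv_file) -> dict[str, set[str]]:
    """Map each source message ID to the set of intents found in it."""
    grouped = defaultdict(set)
    for row in csv.DictReader(csv_file):
        grouped[row["SentID"]].add(row["Intent"])
    return dict(grouped)


grouped = intents_by_message(io.StringIO(sample_csv))
# Keep only the messages that were assigned more than one intent.
multi_intent = {sid: names for sid, names in grouped.items() if len(names) > 1}
print(multi_intent)
```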

The multiple intents may need multiple handlers

That is, if you do have multiple intents inside a single user message, you might need to process them as separate intents because they might require completely different actions. For example, take a look at this user message:

Single user message spans multiple intents

To handle this user’s message, we have to understand two separate intents: 1. The customer has paid their past due amount, and hence 2. The service needs to be restored.

Contrast that to this other user message.

A different message with a different intent

While the second message is also talking about restoring service, it is due to an entirely different reason (outage).
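
To make the idea of separate handlers concrete, here is a small sketch that runs one handler per intent found in a message. Everything in it is hypothetical: the intent names, the handler functions, and the example message are made up for illustration and are not part of the Autotrain output or of Dialogflow itself.

```python
from typing import Callable


# Hypothetical handlers; the intent names are illustrative only.
def handle_payment_made(message: str) -> str:
    return "Confirm that the past due payment was received."


def handle_restore_service(message: str) -> str:
    return "Start the service restoration process."


HANDLERS: dict[str, Callable[[str], str]] = {
    "payment_made": handle_payment_made,
    "restore_service": handle_restore_service,
}


def process_message(message: str, intents: list[str]) -> list[str]:
    """Run one handler per detected intent, in order; skip unknown intents."""
    return [HANDLERS[name](message) for name in intents if name in HANDLERS]


# Made-up message with two intents already detected for it.
actions = process_message(
    "I paid my past due amount, please restore my service",
    ["payment_made", "restore_service"],
)
for action in actions:
    print(action)
```

A message about an outage would map to a different intent and therefore to a different handler, even though it also asks for service to be restored.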

How to create your Dialogflow agent from the output CSV

As I mentioned earlier, the format used in the output CSV file is based on the 4 column CSV file used in the FAQ Bot generator tool.

So you need to delete everything except the first 4 columns, and you will have the 4 column CSV file you need for generating the Dialogflow agent ZIP file with a single click. You can of course modify the CSV file and add a response for each intent etc. before you convert it to a Dialogflow agent ZIP file.
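
If you prefer not to delete the extra columns by hand in a spreadsheet, the trimming step can be sketched in Python as follows. The header names in the sample are placeholders, since the actual column names are not listed here; only the "keep the first 4 columns" rule comes from the text above.

```python
import csv
import io


def keep_first_four_columns(src, dst) -> None:
    """Copy a CSV from src to dst, keeping only the first four columns of each row."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        writer.writerow(row[:4])


# Placeholder header names; replace with a real file in practice.
src = io.StringIO("col1,col2,col3,col4,SentID\na,b,c,d,3\n")
dst = io.StringIO()
keep_first_four_columns(src, dst)
print(dst.getvalue())
```

With real files, you would open the input and output with open(path, newline="", encoding="utf-8") and pass those file objects to the same function.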

You might have noticed that the number of phrases which are actually categorized into non-unique intents (that is, have the same pattern as another phrase) is only about 20% of the dataset. In the follow-up article, I will explain how you can cluster more of your phrases into existing intents.
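
If you want to check this ratio on your own output file, a rough sketch could look like this. It assumes you have already read the intent name of every phrase row into a list; the labels below are made up for illustration and do not come from the dataset.

```python
from collections import Counter


def shared_intent_ratio(intents: list[str]) -> float:
    """Fraction of phrases whose intent is shared with at least one other phrase."""
    if not intents:
        return 0.0
    counts = Counter(intents)
    shared = sum(1 for name in intents if counts[name] > 1)
    return shared / len(intents)


# Hypothetical intent labels, one per phrase row.
labels = ["pay_bill", "pay_bill", "cancel_service", "no_dial_tone", "wifi_name"]
print(f"{shared_intent_ratio(labels):.0%}")
```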


