Tutorial Draft

I have built a free tool, BotFlo Autotrain (see how to use it), which helps you extract Dialogflow intents from your chat logs.

Note: the intent extraction relies on dependency parse trees and English grammar rules, so the tool only works for English. I don’t know the grammar rules for any other language. 🙂

An example dialog dataset

I took an example dataset in which people contacted customer support at a telecommunications company (personal and other identifying information has already been redacted). It is a very good dataset because it a) is clearly realistic and b) provides a lot of insight into this topic as a whole, as you will see later.

Here is a sample list of sentences from the dataset.

1. I’m away from home and cannot verify my MAC address due to no access to digital receiver
2. I changed my user name why can’t i move on then i am very irritated it said everything was completed an i received a email
3. i paid my bill on 10/21 but it does not appear that the money was taken out of my bank account. my cname– account is showing that I made the payment, but i think something is wrong.
4. I am a subscriber to cname– interntet and phone, and am unable to play EPIX online content. I am told to “Authenticate your EPIX Login with your Television Provider.”
6. Every username I try to enter does not work. It says that the username is not available, and this is impossible. Please advise
7. I am currently only getting 4.9 mbs download speed. Is there some sort of local issue effecting my service?
8. how do I know time and channel of show I want to record to watch latter
9. is there a way to rewire my tv to a diffent room
10. Thanks, by wasting my time I may go to Hulu and other services and dumping you.
11. cname– instant video like utube or hbo go is giving me an error page
12. I need my wireless network name and it’s not written on the bottom of our router.
13. I was just sent a contract and was locked out of my e-mal until I accepted the contract. Is this the way your contracts are typically signed???
14. Why wont cname– COME OUT TO WHERE IM LIVING and set up service?
15. Im head of household, but your shit software won’t let me pay my bill, so that must mean I don’t have to pay it.
16. I need the set up code to sinc my onkyo receiver to my cname- remote
17. how can i cancel my TV service? im not happy with my package, cost or service, im still with in my 3odays
18. tv screen says one moment please, this channel should be available shortly. Ref Code ——-
19. Can you send me an updated card with my channell guide stations?
20. Need to remove silver package TV channels and continue with the basic package.
21. My Wi-Fi is down due to a ssl being unsecured. How do I fix that?
22. This web page is not working it will not let me go to the page it keeps saying session is expired
23. Why does our cable always say we’re not subscribed when I know we are. I’m really getting tired of this.
24. The service I have is for a summer residence. I want to suspend service until next spring, how do I do this?
25. good morning, since I updated my user name as it asked I can no longer access my bill to pay
26. I would like to know when my payment is due and the amount due please?
27. why do some of my channels say they will be available shortly
28. My phone does not have dial tone. Checked to boxes that connect tv, phone, internet.
29. Hi My Cable wont turn on and their are 4 blue dashses where the time is
30. why can’t cname– have a normal email system, cname-s really is bad for email! When you click on an email you can’t read it normally! Your system not worth the cost.
First 30 messages from the dataset

Results from using the tool

Once you upload this dataset to the Autotrain tool, you get a CSV file. Its format is an extension of the 4 column CSV format, which you can use to define and organize your Dialogflow intents.

Here is a screenshot of the resulting CSV file based on the list of messages above (I used a total of 500 messages). Note that the tool is able to group multiple phrases based on their intents (e.g. pay bill) and also outputs the source identifier (SentID) from the original document for each row.

Screenshot of CSV output from my Autotrain tool

Observations from the output CSV file

We can make some interesting observations based on the output file. These observations will in fact make it easier to clean up the output and improve your Dialogflow bot’s training process.

The user message can contain multiple sentences

The conventional approach is to specify a single sentence in a single Dialogflow training phrase. As you can see, that isn’t ideal.

Single user message, multiple sentences
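To make the point concrete, here is a minimal sketch of splitting one user message into sentences before turning each into a training phrase. It uses a naive regex (split after ".", "!" or "?" followed by whitespace), which is my own simplification and not how the Autotrain tool works; the function name `split_sentences` is hypothetical.

```python
import re


def split_sentences(message: str) -> list[str]:
    """Naively split a message after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", message.strip())
    return [part for part in parts if part]


# Message 6 from the sample list above
message = (
    "Every username I try to enter does not work. "
    "It says that the username is not available, and this is impossible. "
    "Please advise"
)

sentences = split_sentences(message)
for sentence in sentences:
    print(sentence)
```

A rule this simple will break on abbreviations and on messages with no punctuation at all (see message 2 in the list), so treat it as a starting point only.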

A single sentence can contain multiple topics

This is even more challenging, because splitting the user input into sentences isn’t always sufficient.

Multiple topics per sentence

A lot of user messages span multiple intents

If you take the output produced by the Autotrain tool and group the results by sentence ID (SentID), you will notice that a single user message often spans multiple intents.
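The grouping can be sketched in a few lines of Python. Only the `SentID` column name comes from the output file described earlier; the `Intent` and `Phrase` column names and the sample rows are assumptions made up for illustration, so adjust them to match your actual CSV.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; "Intent" and "Phrase" are assumed column names.
sample_csv = """SentID,Intent,Phrase
101,pay bill,i paid my bill
101,restore service,please restore my service
102,restore service,my service is down
"""


def intents_by_message(csv_text, id_col="SentID", intent_col="Intent"):
    """Map each source message ID to the set of intents found in it."""
    groups = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[id_col]].add(row[intent_col])
    return groups


groups = intents_by_message(sample_csv)

# Keep only the messages that span more than one intent
multi_intent = {
    sent_id: sorted(intents)
    for sent_id, intents in groups.items()
    if len(intents) > 1
}
print(multi_intent)
```

With the sample rows above, only message 101 is reported, because it is the only one that maps to two different intents.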

The multiple intents may need multiple handlers

That is, if a single user message contains multiple intents, you might need to process them as separate intents because they might require completely different actions. For example, take a look at this user message:

Single user message spans multiple intents

To handle this user’s message, we have to understand two separate intents: (1) the customer has paid their past-due amount, and hence (2) the service needs to be restored.

Contrast that to this other user message.

A different message with a different intent

While the second message is also about restoring service, the reason is entirely different (an outage).

How to create your Dialogflow agent from the output CSV

As I mentioned earlier, the format used in the output CSV file is based on the 4 column CSV file used in the Simple FAQ Bot generator.

So delete everything except the first 4 columns, and you will have the 4 column CSV file you need to generate the Dialogflow agent ZIP file with a single click.
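If you would rather not delete the columns by hand in a spreadsheet, a short script can do it. This is a minimal sketch: the header names and values in the sample are placeholders I invented, not the real column names, and `keep_first_columns` is a hypothetical helper.

```python
import csv
import io


def keep_first_columns(rows, n=4):
    """Return the rows with everything after the first n columns dropped."""
    return [row[:n] for row in rows]


# Placeholder data standing in for the Autotrain output CSV
sample = "col1,col2,col3,col4,SentID\na,b,c,d,17\n"

rows = list(csv.reader(io.StringIO(sample)))
trimmed = keep_first_columns(rows)

# Write the trimmed rows back out as CSV text
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(trimmed)
print(out.getvalue())
```

For a real file, replace the `io.StringIO` objects with `open(path, newline="", encoding="utf-8")` handles for the input and output files.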

You might have noticed that only about 20% of the phrases in the dataset are categorized into non-unique intents (that is, they have the same pattern as another phrase). In the follow-up article, I will explain how you can cluster more of your phrases into existing intents.
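To check that share on your own output, you could count how many phrases fall into an intent that at least one other phrase also falls into. The sketch below assumes you have already read the intent label of each row into a list; the labels shown are invented for illustration and are not taken from the dataset.

```python
from collections import Counter


def shared_intent_ratio(intents):
    """Fraction of phrases whose intent is shared with at least one other phrase."""
    if not intents:
        return 0.0
    counts = Counter(intents)
    shared = sum(1 for intent in intents if counts[intent] > 1)
    return shared / len(intents)


# Invented intent labels, one per phrase
labels = ["pay bill", "pay bill", "cancel service", "wifi name", "slow internet"]

ratio = shared_intent_ratio(labels)
print(f"{ratio:.0%} of phrases share an intent with another phrase")
```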