An example dialog dataset

For this example, I used a dataset in which people called in to customer support at a telecommunications company (personal and other identifying information has already been redacted). It is a very good dataset because it a) is clearly realistic and b) provides a lot of insight into this topic as a whole, as you will see later.

Here is a sample list of sentences from the dataset.

1. I’m away from home and cannot verify my MAC address due to no access to digital receiver
2. I changed my user name why can’t i move on then i am very irritated it said everything was completed an i received a email
3. i paid my bill on 10/21 but it does not appear that the money was taken out of my bank account. my cname– account is showing that I made the payment, but i think something is wrong.
4. I am a subscriber to cname– interntet and phone, and am unable to play EPIX online content. I am told to “Authenticate your EPIX Login with your Television Provider.”
6. Every username I try to enter does not work. It says that the username is not available, and this is impossible. Please advise
7. I am currently only getting 4.9 mbs download speed. Is there some sort of local issue effecting my service?
8. how do I know time and channel of show I want to record to watch latter
9. is there a way to rewire my tv to a diffent room
10. Thanks, by wasting my time I may go to Hulu and other services and dumping you.
11. cname– instant video like utube or hbo go is giving me an error page
12. I need my wireless network name and it’s not written on the bottom of our router.
13. I was just sent a contract and was locked out of my e-mal until I accepted the contract. Is this the way your contracts are typically signed???
14. Why wont cname– COME OUT TO WHERE IM LIVING and set up service?
15. Im head of household, but your shit software won’t let me pay my bill, so that must mean I don’t have to pay it.
16. I need the set up code to sinc my onkyo receiver to my cname- remote
17. how can i cancel my TV service? im not happy with my package, cost or service, im still with in my 3odays
18. tv screen says one moment please, this channel should be available shortly. Ref Code ——-
19. Can you send me an updated card with my channell guide stations?
20. Need to remove silver package TV channels and continue with the basic package.
21. My Wi-Fi is down due to a ssl being unsecured. How do I fix that?
22. This web page is not working it will not let me go to the page it keeps saying session is expired
23. Why does our cable always say we’re not subscribed when I know we are. I’m really getting tired of this.
24. The service I have is for a summer residence. I want to suspend service until next spring, how do I do this?
25. good morning, since I updated my user name as it asked I can no longer access my bill to pay
26. I would like to know when my payment is due and the amount due please?
27. why do some of my channels say they will be available shortly
28. My phone does not have dial tone. Checked to boxes that connect tv, phone, internet.
29. Hi My Cable wont turn on and their are 4 blue dashses where the time is
30. why can’t cname– have a normal email system, cname-s really is bad for email! When you click on an email you can’t read it normally! Your system not worth the cost.
First 30 messages from the dataset

Results from using the tool

Once you upload this dataset to the Autotrain tool, you get back a CSV file. Its format is an extension of the 4-column CSV format, which you can use to define and organize your Dialogflow intents.

Here is a screenshot of the resulting CSV file based on the list of messages above (I used a total of 500 messages). Note that the tool groups multiple phrases by intent (e.g. pay bill) and, for each row, also outputs the source identifier (SentID) from the original document.

Screenshot of CSV output from my Autotrain tool
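
If you want to work with this kind of output in code, here is a minimal Python sketch of how the rows could be grouped by intent while keeping the SentID of each phrase. The column names (Intent, Phrase, SentID), the intent labels, and the sample rows are my assumptions for illustration only; they are not the tool's actual output, so adjust them to match the header of the CSV you receive.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows: the column names and intent labels below are
# assumptions for illustration, not the real output of the tool.
SAMPLE_CSV = """Intent,Phrase,SentID
pay bill,I can no longer access my bill to pay,25
pay bill,when is my payment due,26
cancel service,how can i cancel my TV service,17
"""


def group_by_intent(csv_text):
    """Return {intent: [(sent_id, phrase), ...]} parsed from CSV text."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep the source identifier so each phrase can be traced back
        # to its line in the original document.
        groups[row["Intent"]].append((int(row["SentID"]), row["Phrase"]))
    return dict(groups)


if __name__ == "__main__":
    for intent, items in group_by_intent(SAMPLE_CSV).items():
        print(intent, items)
```

Keeping the SentID next to each phrase makes it easy to go back to the original message when a grouping looks wrong.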