Comparing the accuracy of ChatGPT and Dialogflow

Please note: when I say ChatGPT, I am referring to the API provided by OpenAI. I use the term ChatGPT because it has better name recognition, and because the API exposes many different trained models, so ChatGPT works as an umbrella term that covers all of them.

To summarize what we have learnt so far:

To provide a comparison with Dialogflow (which I have been working with for many years), I took the HN FAQ CSV file and used my Dialogflow ES FAQ Bot Generator tool to convert it into a Dialogflow ES agent ZIP file.
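The generator tool's internals are not shown in this article, but the basic idea of the conversion can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the CSV column names `question` and `answer` are hypothetical, the sample rows are placeholders rather than real FAQ content, and the dictionary layout is a simplification for illustration, not the actual Dialogflow ES export schema.

```python
import csv
import io


def faq_csv_to_intents(csv_text):
    """Turn FAQ rows into simplified intent dicts.

    Assumption: the CSV has 'question' and 'answer' columns (hypothetical
    names). The output is NOT the real Dialogflow ES agent ZIP format.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    intents = []
    for i, row in enumerate(reader, start=1):
        question = (row.get("question") or "").strip()
        answer = (row.get("answer") or "").strip()
        if not question or not answer:
            continue  # skip incomplete rows
        intents.append({
            "name": f"faq.{i:03d}",            # hypothetical naming scheme
            "training_phrases": [question],    # one training phrase per intent
            "responses": [answer],
        })
    return intents


# Placeholder rows, not taken from the real HN FAQ.
sample_csv = """question,answer
Example question one?,Placeholder answer one.
Example question two?,Placeholder answer two.
"""

intents = faq_csv_to_intents(sample_csv)
```

Each row becomes one intent whose only training phrase is the FAQ question itself, which matches the single-training-phrase setup used for the bot in this article.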

This created the following Dialogflow ES agent:

Here are the two queries we tried with the fine-tuning approach, with the corresponding responses:

Query: How is Show HN different from Ask HN?
Response: Ask HN is for asking specific questions about your startup or software project. Show HN is for showing off your work to the community

Query: How is Ask HN different from Show HN?
Response: Ask HN is for asking for help, and Show HN is for sharing personal stories

Notice that the second answer is wrong: nowhere does the FAQ page mention the phrase "personal stories".

Now let us try the same two queries with our Dialogflow ES bot:

As you can see, switching the question order does not cause any problems for our Dialogflow ES bot.

I will go over this in a later article, but three things are worth pointing out: this ES bot has only a single training phrase in the corresponding intent, Dialogflow ES has a very generous free tier, and the Dialogflow Messenger integration is provided for free out of the box.

Does this mean Dialogflow ES is more accurate than ChatGPT?

The answer is actually quite nuanced, and I hope to cover it in future lessons.

But it is important to notice that when you send either the full FAQ page, or a truncated version of the FAQ page based on the top N matches, GPT has to extract the answer from the page. In that setup, the answers were quite accurate.
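To make the "truncated version based on top N matches" idea concrete, here is a minimal sketch of how the prompt could be assembled. It ranks FAQ entries by simple word overlap with the query, which is an assumption made to keep the example self-contained; a real setup would more likely rank by embedding similarity. The function names, the prompt wording, and the sample entries are all hypothetical, and the sketch stops before any API call.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_n_entries(query, faq_entries, n=3):
    """Return the n (question, answer) pairs sharing the most words with the query."""
    query_words = tokenize(query)
    scored = []
    for question, answer in faq_entries:
        overlap = len(query_words & tokenize(question + " " + answer))
        scored.append((overlap, question, answer))
    # Stable sort: ties keep their original FAQ order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(question, answer) for _, question, answer in scored[:n]]


def build_prompt(query, faq_entries, n=3):
    """Build a prompt holding only the top-n FAQ entries plus the user question."""
    context = "\n\n".join(
        f"Q: {question}\nA: {answer}"
        for question, answer in top_n_entries(query, faq_entries, n)
    )
    return (
        "Answer the question using only the FAQ excerpt below.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )


# Placeholder entries, not taken from the real HN FAQ.
faq = [
    ("What is Show HN?", "placeholder text A"),
    ("What is Ask HN?", "placeholder text B"),
    ("How do I reset my password?", "placeholder text C"),
]
prompt = build_prompt("What is Show HN?", faq, n=2)
```

The model only ever sees the entries that survive the top N cut, so the answer it extracts can be no better than the ranking step that selected them.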

On the other hand, when you use fine-tuning, the accuracy goes down because GPT lacks the surrounding context (that is my best explanation at the moment).

But Dialogflow uses an entirely different approach. If we call GPT’s approach Extractive Question Answering, we can call Dialogflow’s approach Intent-based Question Answering.
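The intent-based idea can be illustrated with a toy matcher. This is only a sketch: Dialogflow's real matching is a trained model whose internals are not described here, so the word-overlap (Jaccard) score, the threshold value, the intent name, and the fallback message below are all assumptions made for illustration. What the sketch does show is the overall shape: the query is mapped to an intent, and the bot returns that intent's stored response verbatim instead of generating new text.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_intent(query, intents, threshold=0.5):
    """Return (intent_name, response) for the best-matching intent.

    Scoring is Jaccard overlap between word sets: a toy stand-in for
    Dialogflow's actual matcher, which is not reproduced here.
    """
    query_words = tokens(query)
    best_name, best_score = None, 0.0
    for name, intent in intents.items():
        for phrase in intent["training_phrases"]:
            phrase_words = tokens(phrase)
            union = query_words | phrase_words
            score = len(query_words & phrase_words) / len(union) if union else 0.0
            if score > best_score:
                best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return None, "Sorry, I did not understand that."  # fallback response
    return best_name, intents[best_name]["response"]


# Hypothetical intent with a single training phrase and a placeholder response.
intents = {
    "show_vs_ask": {
        "training_phrases": ["How is Show HN different from Ask HN?"],
        "response": "<stored FAQ answer for Show HN vs Ask HN>",
    }
}

first = match_intent("How is Show HN different from Ask HN?", intents)
second = match_intent("How is Ask HN different from Show HN?", intents)
other = match_intent("What time is it?", intents)
```

In this toy version both word orders resolve to the same intent and therefore the same stored answer, because the response is looked up rather than composed. The trade-off is that a query matching no intent gets only the fallback message.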

Both are useful in their own ways, and in fact both also have their own set of limitations.

In my opinion, you first need to decide which type of FAQ bot you are building before making the choice.