Integrating GPT into your Dialogflow bot

The bot on the bottom left of the site is a demo of the Dialogflow + GPT integration.

It can help you choose between Dialogflow ES, Dialogflow CX and the GPT API depending on your use case [1].

I think Dialogflow is superior to GPT in many ways.

But you don’t have to choose between them. You can also integrate GPT into your Dialogflow bot and use it to improve the bot’s accuracy.

You can find online tutorials where people use Dialogflow as a thin wrapper: the Dialogflow bot has nothing other than the default intents, and it simply forwards the user’s message to the GPT API from a webhook attached to the Fallback Intent. This way, you get the built-in chat widgets that Dialogflow already provides (e.g. the Dialogflow Messenger website integration), and you automatically see the conversation history.
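A minimal sketch of such a thin wrapper is shown below. It relies only on the `queryResult.queryText` field of the Dialogflow ES webhook request and the `fulfillmentText` field of the response. The names `handle_fallback`, `generate_reply` and `echo_model` are hypothetical, and `generate_reply` stands in for whatever GPT API call you actually make.

```python
from typing import Callable


def handle_fallback(request_json: dict, generate_reply: Callable[[str], str]) -> dict:
    """Build a Dialogflow ES webhook response for the Fallback Intent."""
    # Dialogflow ES puts the raw user message in queryResult.queryText.
    user_message = request_json.get("queryResult", {}).get("queryText", "")
    if not user_message:
        return {"fulfillmentText": "Sorry, I did not catch that."}
    # generate_reply is a hypothetical stand-in for the GPT API call.
    reply = generate_reply(user_message)
    return {"fulfillmentText": reply}


# Stub model so the sketch runs without network access (an assumption).
def echo_model(message: str) -> str:
    return f"(model reply to: {message})"


sample_request = {"queryResult": {"queryText": "What is Dialogflow CX?"}}
response = handle_fallback(sample_request, echo_model)
print(response["fulfillmentText"])
```

In a real webhook you would call this function from your HTTP handler and return the dictionary as JSON.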

But a student of mine had this question:

And now, I am getting my head around on how to properly ‘merge’ DialogFlow with GPT. The first and foremost aspect is on how to better handle context of a given conversation/journey. Like considering the principles of DialogFlow workings, but instead adding a custom GPT model to the mix. Have you tried achieving something in this sense?

In other words, is there a way to implement a deeper integration of GPT into your Dialogflow ES bot?

While this is possible, there is an important caveat – it is best to do this offline first.

Improving Dialogflow accuracy using GPT

In fact, you can use this deep integration to systematically improve your Dialogflow bot’s accuracy. But it involves a fair amount of custom Python coding.

When you measure your Dialogflow bot’s accuracy, you will notice four kinds of responses.
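As a rough illustration, the four kinds described in the sections below can be assigned automatically once each test utterance is labelled with the intent you expected. This is only a sketch: the function name `categorize` is hypothetical, `None` is used here to mean "the bot is not expected to handle this", and the fallback intent is assumed to keep its default display name.

```python
from typing import Optional

# Assumed display name of the fallback intent in the agent.
FALLBACK = "Default Fallback Intent"


def categorize(expected_intent: Optional[str], matched_intent: str) -> str:
    """Classify one bot response; expected_intent=None means out of scope."""
    went_to_fallback = matched_intent == FALLBACK
    if expected_intent is None:
        # Out of scope: going to fallback is the correct outcome.
        return "true negative" if went_to_fallback else "false positive"
    if went_to_fallback:
        # An intent existed for this utterance, but Dialogflow missed it.
        return "false negative"
    # Some intent was matched: it is either the right one or the wrong one.
    return "true positive" if matched_intent == expected_intent else "false positive"


print(categorize("pricing", "pricing"))
print(categorize("pricing", FALLBACK))
```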

True Negatives

You can use the GPT API to handle true negatives. These are cases where the Dialogflow bot does not know the answer, is not expected to know it, and goes to the fallback intent. This is a little bit like adding Small Talk to your chatbot, except that the small talk draws on the full repertoire of GPT’s knowledge.

For example, you can run such a bot as a generic answer bot, identify the questions users ask most frequently, and then construct your Dialogflow intents from those questions.

True Positives

Menu bots are chatbots which respond to the user’s initial question using natural language, but then transition to “dumb bots” which focus on achieving the user’s goal by asking them to click buttons or by collecting user input without doing any intent mapping.

You can use the GPT API to handle true positives: ask it to identify which “flow” of your Dialogflow menu bot matches the user’s request, and then let your Dialogflow bot handle the rest.

This is not specific to Dialogflow. It is also an excellent way to add an initial step to something like Zoho SalesIQ Codeless Bot builder, where GPT's powerful text parsing capabilities route the user to the correct flow based on their initial request.
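One way to picture this routing step is a prompt that lists the available flows, plus a check that the model's answer really is one of them. The sketch below is an assumption-heavy illustration: the flow names, the prompt wording and the `ask_model` callable (a stand-in for the GPT API call) are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def build_routing_prompt(utterance: str, flows: Sequence[str]) -> str:
    """Ask the model to pick exactly one of the known flows."""
    options = "\n".join(f"- {name}" for name in flows)
    return (
        "Pick the single flow that best matches the user's request.\n"
        "Answer with the flow name only, or NONE if nothing fits.\n"
        f"Flows:\n{options}\n"
        f"User request: {utterance}"
    )


def route_to_flow(
    utterance: str, flows: Sequence[str], ask_model: Callable[[str], str]
) -> Optional[str]:
    """Return a known flow name, or None if the answer is not a known flow."""
    answer = ask_model(build_routing_prompt(utterance, flows)).strip()
    # Accept the answer only if it matches a known flow, ignoring case.
    by_lowercase = {name.lower(): name for name in flows}
    return by_lowercase.get(answer.lower())


flows = ["book_appointment", "track_order", "talk_to_agent"]
# Stub model so the sketch runs without network access (an assumption).
print(route_to_flow("where is my parcel?", flows, lambda prompt: " Track_Order\n"))
```

Anything the model returns that is not in the list is discarded, so the menu bot only ever starts a flow you have actually built.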

False Negatives

False negatives are perhaps the most important cases that bot builders want to handle in their Dialogflow bots.

These are user utterances which should have been handled by an intent in your Dialogflow bot, but which Dialogflow somehow missed. Sometimes GPT can do a better job of mapping the intent.

The basic idea works like this:

  • you call the webhook in the Fallback intent
  • you use a pretrained GPT classifier to map the user’s utterance to a predefined label
  • based on the label, the webhook will use the followup event to fire the corresponding intent
  • conversation proceeds as expected, and as a bonus, no hallucinations!
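The steps above can be sketched roughly as follows. The webhook response uses the Dialogflow ES `followupEventInput` field, which requires the target intents to have matching events configured. The label names, event names and the `classify` callable (a stand-in for the GPT classifier) are hypothetical.

```python
from typing import Callable

# Hypothetical mapping from classifier labels to Dialogflow event names.
LABEL_TO_EVENT = {
    "pricing": "EVENT_PRICING",
    "cancel_subscription": "EVENT_CANCEL_SUBSCRIPTION",
}


def handle_fallback_with_classifier(
    request_json: dict, classify: Callable[[str], str]
) -> dict:
    """Webhook logic for the Fallback intent: reroute via a followup event."""
    query_result = request_json.get("queryResult", {})
    utterance = query_result.get("queryText", "")
    language = query_result.get("languageCode", "en")
    label = classify(utterance)
    event_name = LABEL_TO_EVENT.get(label)
    if event_name is None:
        # Unknown label: stay in fallback instead of guessing an intent.
        return {"fulfillmentText": "Sorry, I didn't get that. Could you rephrase?"}
    # Firing the event makes Dialogflow continue with the mapped intent.
    return {"followupEventInput": {"name": event_name, "languageCode": language}}


# Stub classifier so the sketch runs without network access (an assumption).
def stub_classifier(utterance: str) -> str:
    return "pricing" if "cost" in utterance else "unknown"


request = {"queryResult": {"queryText": "what will it cost me", "languageCode": "en"}}
print(handle_fallback_with_classifier(request, stub_classifier))
```

Because the webhook only ever returns an event name from a fixed table, the text shown to the user still comes from the intents you wrote.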

In addition to making your bot work much better in real time, this method uses the complete capabilities of the GPT API without sacrificing the conversation flow you have already designed in Dialogflow.

False Positives

False positives are an interesting challenge. These are the cases where Dialogflow did map an intent, but it mapped the wrong intent.

In this case, you can use the pretrained GPT classifier to provide a “second opinion” where it will go over Dialogflow’s responses and flag potential false positives that you can review.

The obvious caveat is that it only makes sense to do this offline, because you don't want GPT's response to interrupt your existing flow. You do it by sending the user utterances from your conversation logs to your pretrained GPT classifier and checking its output.

Note: this step must be paired with versioning to make sure GPT is working with the right version of your Dialogflow bot.

In other words, this provides an automated offline technique to improve your Dialogflow bot’s accuracy.
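A minimal sketch of this offline review, assuming the conversation log has already been exported as (utterance, matched intent) pairs and that the classifier's labels use the same names as your intents. `LogEntry`, `flag_possible_false_positives` and the stub classifier are hypothetical names.

```python
from typing import Callable, Iterable, List, NamedTuple


class LogEntry(NamedTuple):
    utterance: str
    matched_intent: str  # the intent Dialogflow mapped at the time


def flag_possible_false_positives(
    log: Iterable[LogEntry],
    classify: Callable[[str], str],
    ignore_intents: tuple = ("Default Fallback Intent",),
) -> List[dict]:
    """Return log entries where the classifier disagrees with Dialogflow."""
    flagged = []
    for entry in log:
        if entry.matched_intent in ignore_intents:
            continue  # fallback hits are not false positive candidates
        second_opinion = classify(entry.utterance)
        if second_opinion != entry.matched_intent:
            flagged.append(
                {
                    "utterance": entry.utterance,
                    "dialogflow_intent": entry.matched_intent,
                    "classifier_label": second_opinion,
                }
            )
    return flagged


# Stub classifier so the sketch runs without network access (an assumption).
def stub_classifier(utterance: str) -> str:
    return "cancel_subscription" if "stop" in utterance else "pricing"


log = [
    LogEntry("how much does it cost", "pricing"),
    LogEntry("I want to stop paying", "pricing"),
    LogEntry("asdf", "Default Fallback Intent"),
]
for item in flag_possible_false_positives(log, stub_classifier):
    print(item)
```

The output is a review list rather than a verdict: a disagreement only means a person should look at that utterance.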

Try it Offline first!

Despite all these possibilities, it is important to remember that GPT is still in its infancy.

I recommend a slow-but-steady approach if you want to incorporate these ideas into your Dialogflow ES bot. In fact, it makes sense to first do it offline to see how well GPT does and then use it in real-time only after you are satisfied with the output.

[1] The webhook call sometimes times out and comes back with no response because of the latency of the GPT API itself. After I published the lesson, I noticed that the latency of requests sent to the GPT 3.5 and GPT 4 models has recently increased, and responses sometimes take well over the 5 second webhook timeout limit that Dialogflow ES imposes.

For example see this support request on the OpenAI forum:

While everything I have written is still relevant for the offline use case, where you use the GPT API to evaluate Dialogflow ES accuracy, I think it is best not to attempt to integrate GPT into your production Dialogflow bot for the real-time use case until OpenAI fixes the latency issue.

Even though Dialogflow CX has a much larger webhook timeout limit, a delay of nearly 10 seconds would be unacceptable for most users who are interacting with your chatbot. It is even more problematic given that the best use case for the GPT API is extractive question answering (which works better with Dialogflow ES FAQ bots), while the main use case for Dialogflow CX is multi-turn conversation flows, for which GPT is not a good fit.
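Before moving anything into a real-time webhook, you can check offline how often calls would exceed the 5 second limit. The sketch below times a batch of calls and reports the share that ran over. The function names are hypothetical, `call` stands in for your GPT API request, and the sample latencies are made-up numbers used only to show the arithmetic.

```python
import time
from typing import Callable, List, Sequence

WEBHOOK_TIMEOUT_S = 5.0  # Dialogflow ES webhook limit mentioned above


def time_calls(call: Callable[[str], object], utterances: Sequence[str]) -> List[float]:
    """Run call() once per utterance and record each duration in seconds."""
    latencies = []
    for utterance in utterances:
        started = time.perf_counter()
        call(utterance)
        latencies.append(time.perf_counter() - started)
    return latencies


def share_over_limit(latencies: Sequence[float], limit_s: float = WEBHOOK_TIMEOUT_S) -> float:
    """Fraction of calls that would have exceeded the webhook timeout."""
    if not latencies:
        return 0.0
    return sum(1 for seconds in latencies if seconds > limit_s) / len(latencies)


# Made-up latencies, for illustration only.
sample_latencies = [1.2, 6.5, 9.8, 4.9]
print(share_over_limit(sample_latencies))
```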