Is the GPT fine-tuning API too slow for practical chatbots?

On the bottom left of this page, there is a Dialogflow ES chatbot which integrates with the GPT API.

If you tell it what kinds of tasks your bot should handle, it will recommend whether to use Dialogflow CX, Dialogflow ES, or the GPT API.

It is also a demo of a fine-tuned GPT-3.5-turbo model, which answers the user’s question when the Dialogflow ES bot falls through to the Default Fallback Intent.

Here is how it works:

  • if the ES bot can identify the user’s task, it will provide a recommendation based on the existing intents in the bot (so this will not call the GPT API)
  • if the ES bot cannot identify the user’s task, it will go to the Default Fallback Intent. The Default Fallback Intent calls the GPT API using a fine-tuned model I created, and that model returns the best recommendation based on the user’s input (a minimal webhook sketch follows this list)
  • Dialogflow ES then relays the response from the GPT API to the user
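
Here is a minimal sketch of what such a fallback webhook can look like, assuming a Flask server and the openai Python client (v1+); the fine-tuned model name, system prompt and route below are placeholders for illustration, not the exact code behind the demo bot:

    # Sketch of a Dialogflow ES fallback webhook that calls a fine-tuned model.
    # Assumes: Flask, openai>=1.0, OPENAI_API_KEY set in the environment,
    # and a placeholder fine-tuned model name.
    from flask import Flask, request, jsonify
    from openai import OpenAI

    app = Flask(__name__)
    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    FINE_TUNED_MODEL = "ft:gpt-3.5-turbo:your-org::example123"  # placeholder

    @app.route("/webhook", methods=["POST"])
    def webhook():
        req = request.get_json(silent=True) or {}
        # Dialogflow ES sends the user's utterance in queryResult.queryText
        user_text = req.get("queryResult", {}).get("queryText", "")

        completion = client.chat.completions.create(
            model=FINE_TUNED_MODEL,
            messages=[
                {"role": "system",
                 "content": "Recommend Dialogflow CX, Dialogflow ES or the GPT API "
                            "for the chatbot task the user describes."},
                {"role": "user", "content": user_text},
            ],
            temperature=0,
        )
        answer = completion.choices[0].message.content

        # Dialogflow ES relays fulfillmentText back to the user
        return jsonify({"fulfillmentText": answer})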

Is the GPT fine-tuning API too slow for practical chatbots?

Regular prompting with the base GPT-3.5-Turbo model is usually too slow for practical chatbot use cases.

Fine-tuned models are supposed to respond much faster than regular prompting. In fact, that is one of the reasons the GPT documentation gives for using fine-tuning.

From what I have seen, the GPT fine-tuning API is too slow (that is, the latency is too high) for practical chatbots.
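
A quick way to check this for yourself is to time a single call to a fine-tuned model and compare it with the 5-second Dialogflow ES webhook limit. The model name and prompt below are placeholders:

    # Time one call to a fine-tuned model (placeholder name) and compare the
    # latency against the ~5 s Dialogflow ES webhook timeout.
    import time
    from openai import OpenAI

    client = OpenAI()

    start = time.perf_counter()
    completion = client.chat.completions.create(
        model="ft:gpt-3.5-turbo:your-org::example123",  # placeholder
        messages=[{"role": "user", "content": "I want a bot that books appointments"}],
        temperature=0,
    )
    elapsed = time.perf_counter() - start

    print(f"Response: {completion.choices[0].message.content!r}")
    print(f"Latency: {elapsed:.2f} s (Dialogflow ES webhook timeout is about 5 s)")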

For example, when I check the Raw Interaction Log inside the Dialogflow ES console, I see output like this:

In other words, the webhook code took more than 5 seconds to send back its response, and as a result the webhook call itself timed out.
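
One defensive pattern (a general workaround sketched here, not how this demo bot is built) is to give the GPT call a hard deadline shorter than the webhook timeout, and fall back to a canned reply if the model has not answered in time:

    # Run the GPT call with a deadline shorter than the ~5 s webhook timeout.
    # The fine-tuned model name is a placeholder.
    import concurrent.futures
    from openai import OpenAI

    client = OpenAI()
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)

    def ask_model(user_text: str) -> str:
        completion = client.chat.completions.create(
            model="ft:gpt-3.5-turbo:your-org::example123",  # placeholder
            messages=[{"role": "user", "content": user_text}],
            temperature=0,
        )
        return completion.choices[0].message.content

    def answer_within(user_text: str, deadline_s: float = 4.0) -> str:
        future = executor.submit(ask_model, user_text)
        try:
            return future.result(timeout=deadline_s)
        except concurrent.futures.TimeoutError:
            # Stay under the Dialogflow ES limit by returning a canned reply
            return "Sorry, I could not generate a recommendation in time."

Keeping the deadline below the 5-second limit stops Dialogflow ES from timing out, at the cost of sometimes returning the canned reply instead of the model’s answer.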

You can try it and see for yourself. I will also be monitoring the results and keep updating this article every few weeks.