Format the transcript using GPT4 API

[00:00:00] Using Whisper Library and GPT-4 API for Transcript Cleanup

So, once you use the Whisper library to get the transcript of the audio file, what you can do is send this transcript to the GPT-4 API. You can ask it to clean up the transcript. That is, you can ask it to add formatting, punctuations, and sometimes it can change misspelled words back to the correct form. It can do those kinds of things, and what you will find is that the output of this cleanup is actually really good.

[00:00:32] Understanding Large Language Models

By the way, I just want to point out something here. I have this video where I talk about how these large language models work. It’s for a different course, you can go and take a look at it if you want. But the gist of the video is that these large language models, what they do is they work by predicting the next word.

[00:00:52] Mapping Spoken Language to Written Language

When you think about it, what is going on here is that it’s taking spoken language, that is the speech, right, and it’s trying to map it to written language. Usually, written language is not only more formal, but it also needs to have those punctuations for people to be able to read what you have written. You don’t have that constraint when you are just speaking.

[00:01:24] Addressing Run-on Sentences in Audio Transcripts

The second thing is that it’s very easy, and I do it a lot in my courses, to keep saying more and more words in a given sentence. These are called run-on sentences. It’s very easy to have run-on sentences in audio, but if you have those in written format, it’s very hard to read them. So, people are naturally inclined to cut these sentences short so that it’s easy to read.

[00:01:50] GPT-4 API’s Role in Enhancing Readability of Transcripts

When GPT-4 processes your audio, it is able to do things like add additional punctuations and sometimes even shorten the sentences so that it’s easy for people to read the stuff that you have in the audio. The reason I’m saying all this is, it should not be surprising that the GPT-4 API is able to do such a good job of cleaning up your raw transcript from the audio format. It’s trying to map it to the written text format, so it should not be a big surprise that it’s able to do such a good job.

[00:02:23] The Third Step: Cleaning Up the Raw Transcript

So, that’s the third step: you have to use the GPT-4 API to clean up the raw transcript.

About this website

BotFlo1 was created by Aravind Mohanoor as a website which provided training and tools for non-programmers who were2 building Dialogflow chatbots.

This website has now expanded into other topics in Natural Language Processing, including the recent Large Language Models (GPT etc.) with a special focus on helping non-programmers identify and use the right tool for their specific NLP task. 

For example, when not to use GPT

1 BotFlo was previously called MiningBusinessData. That is why you see that name in many videos

2 And still are building Dialogflow chatbots. Dialogflow ES first evolved into Dialogflow CX, and Dialogflow CX itself evolved to add Generative AI features in mid-2023