Convert MP3 audio files to text using speech recognition

[00:00:00] Converting MP4 to MP3 and Transcription Options

Once you convert your MP4 video files into the MP3 audio format, you have two options. The first is the Whisper API, which is provided by the OpenAI team. You can see from this logo that the link goes to the OpenAI website. This API allows you to send an MP3 file, and you get the transcript back as the response. Now, the problem with this is that the default response doesn’t give you the timestamps of individual words or phrases.
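
To make the first option concrete, here is a minimal sketch of sending an MP3 file to the hosted API from Python. It assumes the official `openai` package (version 1 or later), an `OPENAI_API_KEY` environment variable, and the `whisper-1` model name. The function name `transcribe_with_api` and its optional `client` parameter are hypothetical and only here for illustration.

```python
def transcribe_with_api(mp3_path, client=None):
    """Send an MP3 file to the hosted Whisper API and return the plain-text transcript."""
    if client is None:
        # Assumption: `pip install openai` and OPENAI_API_KEY set in the environment.
        from openai import OpenAI
        client = OpenAI()

    # The API expects the audio as an open binary file handle.
    with open(mp3_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )

    # With the default response format, only the transcript text comes back.
    return result.text
```

You would call it as `transcribe_with_api("video.mp3")`, where `video.mp3` stands for whatever file your MP4 conversion produced. Each call is billed by OpenAI.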

[00:00:34] Using the Whisper Python Library

On the other hand, you can also use the free, open-source Python library called Whisper. In fact, you can see that it’s built by the same OpenAI team. This library does give you the option to get the timestamps, which is why I prefer it over the API. First of all, you don’t have to pay for the API if the library by itself is sufficient. Second, as I mentioned, it gives you the timestamps. But most importantly, I have not noticed any major difference in accuracy between the two transcriptions. So my suggestion is to go with the open-source library.
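
A minimal sketch of the library route, assuming the `openai-whisper` package is installed (`pip install openai-whisper`) and `ffmpeg` is available on your system. The `base` model size and the helper names `transcribe_locally` and `segment_timestamps` are assumptions for illustration, not something shown in the video.

```python
def transcribe_locally(mp3_path, model_name="base"):
    """Run the open-source Whisper model on this machine and return its result dict."""
    # Assumption: `pip install openai-whisper`; the model is downloaded on first use.
    import whisper

    model = whisper.load_model(model_name)
    return model.transcribe(mp3_path)


def segment_timestamps(result):
    """Pull (start_seconds, end_seconds, text) out of each transcribed segment."""
    return [
        (segment["start"], segment["end"], segment["text"].strip())
        for segment in result["segments"]
    ]
```

The result also contains the full transcript under `result["text"]`. The `segments` list is what carries the timestamps: each segment is a phrase with its start and end time in seconds.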

[00:01:19] Example of a Transcript Using the Whisper Library

For the video that I showed before, this is what the transcript looks like when you use the Whisper library. You can see it has all these words: “One of the most important aspects of any CRM, not just Zoho, but any CRM, is the fact that you can do blah, blah, blah.” All of this is the transcript generated by the Whisper Python library. This is step two in your process: you convert your MP3 file into text using the Whisper library.
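
If you want the transcript as readable text with a timestamp in front of each phrase, a small formatting step is enough. This is a sketch that assumes segments shaped like the ones the Whisper library returns (dicts with `start` in seconds and `text`). The helper names and the sample timings below are hypothetical.

```python
def format_timestamp(seconds):
    """Convert a time in seconds to an HH:MM:SS string."""
    total_seconds = int(seconds)  # drop the fractional part
    hours, remainder = divmod(total_seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_text(segments):
    """Render one '[HH:MM:SS] text' line per segment."""
    return "\n".join(
        f"[{format_timestamp(segment['start'])}] {segment['text'].strip()}"
        for segment in segments
    )


# Hypothetical segments, only to show the output shape.
sample_segments = [
    {"start": 0.0, "text": " One of the most important aspects of any CRM,"},
    {"start": 79.4, "text": " not just Zoho, but any CRM"},
]
print(segments_to_text(sample_segments))
# [00:00:00] One of the most important aspects of any CRM,
# [00:01:19] not just Zoho, but any CRM
```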
