GPT-4 vs PaLM 2 for generating video transcript subheadings

PaLM (Pathways Language Model) is Google's competitor to OpenAI's GPT models.

I compared PaLM 2 and GPT-4 for generating video transcript subheadings, and I found that PaLM 2 is better at this particular task.

An explanation of the task

Consider this transcript of the video lesson "How does a large language model work?":

(Autogenerated transcript)

[00:00:00]
So, I am going to start this chapter by explaining what a large language model is. First of all, I will point out that this is an introduction meant mainly for non-programmers. If you are a programmer, you might notice that I am not only oversimplifying things but also using some concepts that are not exactly how it works under the hood. What I am trying to do here is provide a way to think about large language models, like the text-bison model that Dialogflow CX uses and the models that ChatGPT uses, because that will help people understand a bit of what is going on under the hood, and it will also help them set the right expectations about what you can and cannot expect from tools like ChatGPT.

[00:01:00]
So the way I am going to explain it is by asking you to look at the Google suggestions that pop up when you type a word into the Google search engine. Here I have just typed in the word Dialogflow, and you notice that it shows this set of suggestions. Now, if I were to ask you how these suggestions are being made, you would probably guess that Google takes a look at its search logs, collects all the things that people usually search for, sorts them from most searched to least searched, and then displays the top ten, or however many fit in this dropdown. And you can see that the list is not in alphabetical order; it is ordered by how frequently each phrase is searched.
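This search-log intuition can be made concrete with a few lines of Python. This is a minimal sketch; the queries and their frequencies are invented for illustration:

from collections import Counter

# Toy "search log": one entry per query someone typed.
# The queries and their frequencies are invented for illustration.
search_log = (
    ["dialogflow console"] * 50
    + ["dialogflow cx"] * 30
    + ["dialogflow chatbot"] * 15
    + ["dialogflow login"] * 5
)

# Count which word follows "dialogflow", then rank from most
# to least searched: the same ordering the suggestion box shows.
next_words = Counter(query.split()[1] for query in search_log)
for word, count in next_words.most_common():
    print(word, count / len(search_log))
# console 0.5, cx 0.3, chatbot 0.15, login 0.05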

[00:02:00]
So this means, when you just take a look at it, that the word console is used after Dialogflow more frequently than the word CX, which is used more frequently than the word chatbot, which is used more frequently than the word login, and so on. In terms of probability, the probability that console shows up after Dialogflow in the search logs is higher than the probability that CX shows up after Dialogflow, which is higher than the probability that chatbot shows up after Dialogflow. Now, the case here is very simple, because search logs are all that Google considers when coming up with this list of suggestions. But a large language model takes a much larger volume of text: it crawls the entire internet

[00:03:00]
and many other sources of information, and then it does a similar analysis: if you see the word Dialogflow, what is the word most likely to occur after it? It is all a calculation of probabilities. But I will add that they do a lot of tuning under the hood to make the output more grammatically correct, and some tuning for safety: you do not want swear words, and you do not want it to provide information that can be harmful to users, so there is a bit of filtering that has to happen too. It is not purely probabilities. But what I will say is that this basic concept of choosing the next word based on the previous words is something you are already familiar with just by using a search engine. It is something that people, even non-programmers,

[00:04:00]
know can be done, and is already being done, outside the context of a large language model. So what LLMs do is this: based on the prompt you provide, the model looks at the probability of the word most likely to occur after the last word of your prompt. It does the calculation based on all the words that went before, so it takes the entire prompt into consideration, and then it guesses the next most likely word. Once it makes a guess, it takes the word it just guessed and uses that entire string as the input for the next probability calculation, and it keeps doing that over and over. As an example: you have Dialogflow, and you see that Dialogflow console is the most

[00:05:00]
frequent continuation. Suppose I were to type Dialogflow console; you see that the next two most likely words are login and CX. On the other hand, if you typed in Dialogflow CX, the most likely next words are console, then versus (that is, versus ES), and then documentation. So depending on whether it filled out the next word with console or with CX, the second word being guessed will be selected from a different list. I think that should become apparent as soon as you see this example. So the large language model takes your prompt, looks at all the words in your prompt, makes a guess based on probabilities of what the next word is likely to be, and it just keeps doing that. Now the first thing you will ask is: how does it know when to stop? One hint you can give it is the maximum

[00:06:00]
number of tokens it is allowed to generate, after which it stops. A second hint you can give it is a delimiter: if the output reaches some kind of paragraph break, line break, or period (that is, a full stop), then you want the output to stop there. So there are some ways to tell the GPT, or rather the LLM, when to stop, and sometimes it also makes that decision based on what it has output so far, when it reaches some kind of logical stopping point. I will also add that this is why, when you are interacting with some of these language models, you can see the words change in real time: it keeps generating the output, and sometimes you can see it updating itself in real time.
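The loop described over the last few minutes, guess the most likely next word, append it, and repeat until a stop hint fires, can be sketched with a toy bigram table standing in for the real probability model. The table entries are invented, and a real LLM conditions on the entire prompt rather than just the last word:

# Toy bigram "model": last word -> candidate next words, ranked by
# (invented) probability. "<stop>" marks a logical stopping point.
bigram_model = {
    "dialogflow": ["console", "cx", "chatbot"],
    "console": ["login", "cx"],
    "cx": ["console", "versus", "documentation"],
    "login": ["<stop>"],
}

def generate(prompt, max_tokens=10):
    words = prompt.lower().split()
    for _ in range(max_tokens):  # hint 1: maximum number of tokens
        candidates = bigram_model.get(words[-1])
        if not candidates or candidates[0] == "<stop>":
            break  # hint 2: a delimiter / logical stopping point
        words.append(candidates[0])  # greedily pick the most likely word
    return " ".join(words)

print(generate("Dialogflow"))     # dialogflow console login
print(generate("Dialogflow CX"))  # dialogflow cx console login

Commercial APIs expose both stopping hints directly; for example, the OpenAI chat API accepts max_tokens and stop parameters.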

[00:07:00]
That is kind of what is going on behind the scenes. So a large language model is effectively a way to use the probabilities of words occurring next to each other to finish your sentences. And it may finish your sentence not just with one word but sometimes with a paragraph, sometimes with a full chunk of text or a full document, and so on. Now, as you are thinking about these large language models, you might ask: how is it possible that it comes up with such intelligent answers for certain types of questions? It is almost as if it actually became intelligent. I am going to give the perspective of someone who understands the basic concept but is not a researcher; I am not actually doing research into these technologies. What I will say is that it mimics what you find online, which is

[00:08:00]
why it looks so convincing: a lot of people have written a lot of text online, all in grammatically correct sentences, and certain words occur next to other words with very high probability. By simply selecting those words, you can form very cogent sentences that often make sense. But this is also the reason why large language models tend to do this thing called hallucination, where the model combines two words that are perfectly good choices to sit next to each other, but when you look at the meaning they generate, it does not make any sense. Sometimes the output is completely nonsensical, sometimes it is inherently contradictory, and sometimes it is simply an obviously wrong answer. That is what we call hallucination, and reducing hallucination is itself a big ongoing

[00:09:00]
field of research; I do not think it has been fully sorted out yet, or at least not in any commercial product. But my point is this: what a large language model is really doing is selecting the next word, and the next word, and the next word, based on the probabilities of finding those words next to each other in all the documents on the internet that are publicly available for training, and based on that it constructs the response. Once you think of it in those terms, you will usually have much lower expectations. This is not to say that a technology like GPT does not have good uses; that is not what I am saying at all. What I am saying is that unless you keep this at the

[00:10:00]
back of your mind and set your expectations accordingly, you will find that you overhype the capabilities of the language models, and you might end up building something on a very shaky foundation. A good example, for me at least, is one I also mention in one of my courses, so let me see if I can pull it up. If I go to my website and open this GPT versus Dialogflow course, I will take this lesson that I have already added to the course. You see that somebody wrote this: writing a chatbot which is not just for Q&A is hard; two months in and still unsuccessful. And when you look at their explanation of the bot they are trying to build, you can immediately see what is going on, because they are trying to use it for a use case with a lot of important custom entities

[00:11:00]
and some state management and things like that, and those are just not things that are suited to ChatGPT, or the GPT API, at the moment. I am sure these things will improve over time, but this is a good example of someone who jumped in with very high hopes without understanding all the nuances involved in a system like this. Once it starts giving unexpected responses, you realize that not only is it not as perfect or accurate as you thought, but the bigger problem is that there are no easy fallbacks; you do not have any easy way of getting around the problem. I think Dialogflow CX solves this in a much better way, because it incorporates generative features into an existing chatbot framework which

[00:12:00]
has its own well-defined API and its own well-defined set of rules that you can use to build these bots in, in my view, a much better way. But if you use the raw GPT API, which does not have any real visual tools, you will run into this problem of overhyping the technology in your own mind, investing time into it, and then finding out that it is not quite where you expected it to be. Having said all this, I do expect the technology to mature, and I do expect that a year or two down the line a lot of these things will get sorted out, because there is so much research and so much interest in this field. I think these problems will get ironed out one after the other, and things will reach a stage where it becomes possible to build complex chatbots using something like the GPT API. All I am saying is that understanding what a large language model is at its essence

[00:13:00]
will help you avoid overly high expectations. It also means that you can build things that are useful, as long as they fit the use case, and that you get to learn this new technology, which I think is going to become very important over the next few years, because it is going to automate a lot of things that people probably did not even expect. On the other hand, it is good to have a somewhat cautious and skeptical view of how the field is progressing, and definitely a skeptical view of people making tall claims that, when you look at them from a realistic perspective, do not even make sense. You want to be careful about those kinds of claims. So that is a quick intro to what large language models are and how they work.

I am interested in generating a subheading for each one-minute segment. This makes it easier to skim the video's contents, since I can add the subheadings as video chapters.
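For example, YouTube builds chapters from a plain list of timestamp-plus-title lines in the video description, starting at 00:00, so generated subheadings like the ones below can be pasted in almost verbatim:

00:00 Introduction to Large Language Models
01:00 Google Search Engine Suggestions
02:00 Large Language Models and Probability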

Comparing the responses from GPT-4 and PaLM 2

I wrote a Python script that generates the per-minute subheadings with both GPT-4 and PaLM 2.
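A minimal sketch of the script, assuming the 2023-era openai and vertexai client libraries (the prompt wording, GCP project name, and transcript file are illustrative placeholders, not the exact values I used):

import openai  # OpenAI 0.x-era API shown; newer library versions differ
import vertexai
from vertexai.language_models import TextGenerationModel

PROMPT = (
    "Generate a short subheading for each minute segment of this video "
    "transcript. Output one timestamped line per segment.\n\n{transcript}"
)

def gpt4_subheadings(transcript):
    # Assumes OPENAI_API_KEY is set in the environment.
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT.format(transcript=transcript)}],
    )
    return response.choices[0].message.content

def palm2_subheadings(transcript):
    # text-bison is the PaLM 2 text model on Vertex AI.
    vertexai.init(project="my-gcp-project", location="us-central1")  # placeholder
    model = TextGenerationModel.from_pretrained("text-bison")
    return model.predict(PROMPT.format(transcript=transcript),
                         max_output_tokens=1024).text

transcript = open("transcript.txt").read()  # placeholder transcript file
print("GPT-4:\n" + gpt4_subheadings(transcript))
print("PaLM 2:\n" + palm2_subheadings(transcript))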

Here is a comparison:

GPT-4:

[00:00]: Introduction to Large Language Models for Non-Programmers
[01:00]: Understanding Large Language Models Using Google Suggestions as Example
[02:00]: The Concept of Word Occurrence and Sequence Probability in Searches
[03:00]: How Large Language Models Use Analogous Analysis to Find Most Likely Word Occurrences
[04:00]: Working Mechanism of Large Language Models Using Prompt Inputs
[05:00]: Illustration of Word Predictions by Large Language Models
[06:00]: Tips on How to Instruct the LLMs on When to Stop Generating Suggestions
[07:00]: How Large Language Models Generate Texts Based on Probabilities
[08:00]: Why LLM Output Sometimes Appears Intelligent and Understanding the Concept of ‘Hallucination’ in LLMs
[09:00]: Importance of Realistic Expectations from Large Language Models
[10:00]: Possible Limitations and Inaccuracies in Using LLMs
[11:00]: Case Study of Overestimated Capabilities of LLMs in Chatbots
[12:00]: Advantages of Dialogflow CX over Raw GPT API and The Danger of Overhyping LLM Technology
[13:00]: Future Prospects of Large Language Models and the Importance of Cautious Evaluation.

PaLM 2:

00:00: Introduction to Large Language Models
00:01: Google Search Engine Suggestions
00:02: Large Language Models and Probability
00:03: Crawling the Internet for Data
00:04: Prompt-Based Word Generation
00:05: Stopping Criteria for Output
00:06: Real-Time Output Generation
00:07: Mimicking Online Content
00:08: Hallucination in Large Language Models
00:09: Ongoing Research on Reducing Hallucination
00:10: ChatGPT vs. Dialogflow for Chatbots
00:11: Overhyping the Capabilities of Language Models
00:12: Generative Features in Existing Chatbot Frameworks
00:13: Maturation of Large Language Model Technology
00:14: Conclusion

There are three reasons why the PaLM 2 subheadings are better than the GPT-4 subheadings:

  • less hallucination
  • better alignment with the timestamps of each segment
  • less "high level" and more to the point

I explain all these in detail in my lesson:

Why I prefer PaLM 2 over GPT-4 for generating video transcript subheadings

