Provide better instructions

Providing better instructions is a good first step towards optimizing GPT cost, and this is where most people start.

Reducing max_tokens

Here is some sample code I sent to the OpenAI API recently for my chatbot demo:

response = openai.Completion.create(
    engine="text-davinci-003",
    prompt=full_text,
    temperature=0,
    max_tokens=200,
    stop=["\n\n"]
)

Notice that I set max_tokens to 200.

Initially I used a max_tokens of 500.

With the higher limit, responses not only took longer to generate but also came back padded with a lot of superfluous text.

After experimenting with a few values, I settled on 200, which reduced the latency a little, but definitely reduced the verbosity and made the final answer much better.
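To see why this matters for cost as well as latency, here is a rough sketch of the worst-case completion cost, assuming text-davinci-003's published rate of $0.02 per 1,000 tokens (verify against current OpenAI pricing, which changes over time):

```python
# Worst-case completion cost, assuming $0.02 per 1,000 tokens
# (text-davinci-003's published rate at the time; check current pricing).
PRICE_PER_1K_TOKENS = 0.02

def max_completion_cost(max_tokens: int) -> float:
    """Upper bound on the cost of a single completion,
    since max_tokens caps how many tokens can be generated."""
    return max_tokens / 1000 * PRICE_PER_1K_TOKENS

old_cost = max_completion_cost(500)  # worst case with my original setting
new_cost = max_completion_cost(200)  # worst case after lowering max_tokens
```

In the worst case, dropping max_tokens from 500 to 200 caps each completion's cost at less than half of what it was before, on top of the latency and verbosity benefits.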

Ask GPT to be concise

Since GPT is good at instruction-following, you can usually get better results simply by asking GPT to be concise in its response.

The full_text variable you see above includes the following prompt:

full_text = f'''
Should I use Dialogflow ES, Dialogflow CX or the GPT API for the following use case?
Please briefly explain your reason.

{query}
'''

As you can see, I asked GPT itself to “briefly” explain its reasoning. If you replace that word with something like “detailed” or “descriptive”, it will usually produce a much longer answer.
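One way to make this adverb easy to swap is to parameterize it in the prompt template. The helper below is a hypothetical sketch, not part of the original demo code:

```python
def build_prompt(query: str, style: str = "briefly") -> str:
    """Build the use-case prompt; `style` controls verbosity.
    Pass "briefly" for short answers, or a word like "thoroughly"
    for longer ones. Illustrative helper, not from the original demo."""
    return (
        "Should I use Dialogflow ES, Dialogflow CX or the GPT API "
        "for the following use case? "
        f"Please explain your reason {style}.\n\n"
        f"{query}"
    )

prompt = build_prompt("A bot that books salon appointments.")
```

Switching styles then becomes a one-argument change, which makes it easy to compare the length and cost of the responses each variant produces.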