Provide better instructions

Providing better instructions is a good first step towards optimizing GPT cost, and this is where most people start.

Reducing max_tokens

Here is some sample code I sent to the OpenAI API recently for my chatbot demo:

response = openai.Completion.create(
    model="text-davinci-003",  # model name assumed; substitute whichever model you call
    prompt=full_text,
    max_tokens=200,
)

Notice that I set max_tokens to 200.

Initially I used a max_tokens of 500.

In addition to taking longer to generate, the response also contained a lot of superfluous text.

After tweaking the numbers a bit, I settled on a value of 200. This reduced the latency slightly, but more importantly it cut down the verbosity and made the final answer much better.
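The tuning process above can be sketched as a small loop that retries the same prompt with different caps. This is a minimal illustration, not the author's actual script: `build_request` is a hypothetical helper, and the model name is assumed.

```python
import time

def build_request(prompt, max_tokens):
    # Hypothetical helper: package the request parameters so the same
    # prompt can be re-sent with different max_tokens caps.
    return {
        "model": "text-davinci-003",  # assumed; use whichever model you call
        "prompt": prompt,
        "max_tokens": max_tokens,
    }

# Sweep a few caps and compare latency and reply length by eye:
# for cap in (500, 300, 200):
#     params = build_request(full_text, cap)
#     start = time.time()
#     response = openai.Completion.create(**params)
#     print(cap, time.time() - start, len(response.choices[0].text))
```

The API call is left commented out since it needs an API key; the point is simply that max_tokens is the only parameter being varied between runs.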

Ask GPT to be concise

Since GPT is good at instruction-following, you can usually get better results simply by asking GPT to be concise in its response.

The full_text variable you see above includes the following prompt:

full_text = f'''
        Should I use Dialogflow ES, Dialogflow CX or the GPT API for the following use case? 
        Please briefly explain your reason.
        '''

As you can see, I asked GPT itself to “briefly” explain its reasoning. If you replace that word with something like “detailed” or “descriptive”, it will usually produce a much longer answer.
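One way to see the effect of that single word is to parameterize the prompt template. This is a hypothetical sketch: `make_prompt` is not from the original code, and only the adverb changes between variants.

```python
def make_prompt(style):
    # Hypothetical template builder: everything is fixed except the
    # word that controls how verbose the explanation should be.
    return f'''
        Should I use Dialogflow ES, Dialogflow CX or the GPT API for the following use case?
        Please {style} explain your reason.
        '''

concise_prompt = make_prompt("briefly")
verbose_prompt = make_prompt("in detail")
```

Sending each variant to the API (with the same max_tokens cap) and comparing the reply lengths makes the cost difference of a single instruction word concrete.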