Provide better instructions

Providing better instructions is a good first step towards optimizing GPT cost, and this is where most people start.

Reducing max_tokens

Here is some sample code I sent to the OpenAI API recently for my chatbot demo:

response = openai.Completion.create(
    model="text-davinci-003",  # model name assumed; substitute whichever model you call
    prompt=full_text,
    max_tokens=200,
)

Notice that I set max_tokens to 200.

Initially I used a max_tokens of 500.

In addition to taking longer to generate, the response also contained a lot of superfluous text.

After tweaking the numbers a bit, I settled on a value of 200. This reduced the latency slightly, but more importantly it cut down the verbosity and made the final answer much better.
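The tuning process above can be sketched as a small loop that retries the same prompt with different caps. This is a minimal illustration, not the author's actual script: `build_request` is a hypothetical helper, and the model name is assumed.

```python
import time

def build_request(prompt, max_tokens):
    # Hypothetical helper: package the request parameters so the same
    # prompt can be re-sent with different max_tokens caps.
    return {
        "model": "text-davinci-003",  # assumed; use whichever model you call
        "prompt": prompt,
        "max_tokens": max_tokens,
    }

# Sweep a few caps and compare latency and reply length by eye:
# for cap in (500, 300, 200):
#     params = build_request(full_text, cap)
#     start = time.time()
#     response = openai.Completion.create(**params)
#     print(cap, time.time() - start, len(response.choices[0].text))
```

The API call is left commented out since it needs an API key; the point is simply that max_tokens is the only parameter being varied between runs.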

Ask GPT to be concise

Since GPT is good at instruction-following, you can usually get better results simply by asking GPT to be concise in its response.

The full_text variable you see above includes the following prompt:

full_text = f'''
        Should I use Dialogflow ES, Dialogflow CX or the GPT API for the following use case? 
        Please briefly explain your reason.
        '''

As you can see, I asked GPT itself to “briefly” explain its reasoning. If you replace that word with something like “detailed” or “descriptive”, it will usually produce a much longer answer.
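One way to see the effect of that single word is to parameterize the prompt template. This is a hypothetical sketch: `make_prompt` is not from the original code, and only the adverb changes between variants.

```python
def make_prompt(style):
    # Hypothetical template builder: everything is fixed except the
    # word that controls how verbose the explanation should be.
    return f'''
        Should I use Dialogflow ES, Dialogflow CX or the GPT API for the following use case?
        Please {style} explain your reason.
        '''

concise_prompt = make_prompt("briefly")
verbose_prompt = make_prompt("in detail")
```

Sending each variant to the API (with the same max_tokens cap) and comparing the reply lengths makes the cost difference of a single instruction word concrete.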