How to fine-tune a GPT bot based on your FAQ page
Please note: this tutorial is based on the legacy fine-tuning API. OpenAI has since released an updated fine-tuning API based on GPT-3.5-Turbo, but the basic concept of fine-tuning is the same, and you can still use this guide to understand how it works.
As we saw in the previous lesson, if you send the entire FAQ page with each completion request, you will use up your token quota very quickly.
Instead, it is a better idea to use the GPT API to create a fine-tuned model trained on your FAQ page.
First, you must install the OpenAI Python CLI tool.
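The CLI ships with the openai Python package (the legacy pre-1.0 version used throughout this tutorial), so installing it is a single pip command. Note that the data-preparation tool used below may also prompt you to install extra dependencies such as pandas the first time you run it.
pip install --upgrade openai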

Next, you must set the OPENAI_API_KEY environment variable. You can do this inside the PyCharm terminal.
export OPENAI_API_KEY=<your-openai-api-key>
Next, convert the CSV file into JSONL format, which is required to create the fine-tuned model.
openai tools fine_tunes.prepare_data -f hn_faq.csv
Note that the CSV file column names need to be “prompt” and “completion”, otherwise it will throw an error.
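For reference, an illustrative row is shown below (based on the FAQ entry quoted later in this lesson; the real file contains 32 such question-answer pairs):
prompt,completion
What are Ask HN and Show HN?,Ask HN lists questions and other text submissions. Show HN is for sharing your personal work and has special rules.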

When you run the fine_tunes.prepare_data command, it will convert the CSV file into a JSONL file that you can use for the next step.
The output of the prepare_data command is actually pretty useful and worth reading carefully.
Analyzing...

- Based on your file extension, your file is formatted as a CSV file
- Your file contains 32 prompt-completion pairs. In general, we recommend having at least a few hundred examples. We've found that performance tends to linearly increase for every doubling of the number of examples
- All prompts end with suffix `? `
- All prompts start with prefix ` `
- Your data does not contain a common ending at the end of your completions. Having a common ending string appended to the end of the completion makes it clearer to the fine-tuned model where the completion should end. See https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset for more detail and examples.
- The completion should start with a whitespace character (` `). This tends to produce better results due to the tokenization we use. See https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset for more details

Based on the analysis we will perform the following actions:
- [Necessary] Your format `CSV` will be converted to `JSONL`
- [Recommended] Add a suffix ending ` END` to all completions [Y/n]: Y
- [Recommended] Add a whitespace character to the beginning of the completion [Y/n]: Y

Your data will be written to a new JSONL file. Proceed [Y/n]: Y

Wrote modified file to `hn_faq_prepared.jsonl`
Feel free to take a look!

Now use that file when fine-tuning:
> openai api fine_tunes.create -t "hn_faq_prepared.jsonl"

After you've fine-tuned a model, remember that your prompt has to end with the indicator string `? ` for the model to start generating completions, rather than continuing with the prompt. Make sure to include `stop=[" END"]` so that the generated texts ends at the expected place.
Once your model starts training, it'll approximately take 2.88 minutes to train a `curie` model, and less for `ada` and `babbage`. Queue will approximately take half an hour per job ahead of you.
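Based on the transformations above (every prompt already ends in `? `, and the tool adds a leading space and an ` END` suffix to every completion), each line of hn_faq_prepared.jsonl should look roughly like this (content illustrative):
{"prompt": "What are Ask HN and Show HN? ", "completion": " Ask HN lists questions and other text submissions. Show HN is for sharing your personal work and has special rules. END"}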
Now that you have generated the JSONL file, it is time to use the file to fine-tune the davinci model.
openai api fine_tunes.create -t "hn_faq_prepared.jsonl" -m davinci
This fine-tunes the davinci base model on the hn_faq_prepared.jsonl file, creating a new model that we will query directly.
This was the initial output:
Upload progress: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.81k/7.81k [00:00<00:00, 6.20Mit/s]
Uploaded file from hn_faq_prepared.jsonl: file-jXQCleJEdygQR48MCcrpOO76
Created fine-tune: ft-EEoJpj3TkOnSaqptiuv9Ntp8
Streaming events until fine-tuning is complete... (Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2023-06-01 20:45:11] Created fine-tune: ft-EEoJpj3TkOnSaqptiuv9Ntp8
Stream interrupted (client disconnected).
To resume the stream, run:
openai api fine_tunes.follow -i ft-EEoJpj3TkOnSaqptiuv9Ntp8
As you can see, the event stream got interrupted. The fine-tune itself keeps running, and we can resume following it with the command already provided in the console output:
> openai api fine_tunes.follow -i ft-EEoJpj3TkOnSaqptiuv9Ntp8
[2023-06-01 20:45:11] Created fine-tune: ft-EEoJpj3TkOnSaqptiuv9Ntp8
[2023-06-01 20:46:24] Fine-tune costs $0.20
[2023-06-01 20:46:24] Fine-tune enqueued. Queue number: 0
[2023-06-01 20:46:36] Fine-tune started
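You don't have to keep the stream open at all. The legacy CLI also provided subcommands for listing jobs and checking their status later; for example (using the fine-tune ID from the output above):
openai api fine_tunes.list
openai api fine_tunes.get -i ft-EEoJpj3TkOnSaqptiuv9Ntp8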
Now that we have our fine-tuned model, we need to query it instead of the base GPT model. This, in turn, means we can send just our question on its own rather than the entire FAQ page as a prompt with each API call.
How to get the name of the fine-tuned model
You can get the name of the fine-tuned model by making a request to the List Models API call.
curl https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY"
You will find your model name listed in the JSON response. Unlike the built-in GPT models, fine-tuned models have names in the format:
<base-model-name>:ft-<org-name>-<timestamp>
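If you prefer to stay in Python, here is a minimal sketch (using the same legacy openai library as the scripts below) that lists your models and keeps only the fine-tuned ones, which are easy to spot thanks to the ":ft-" marker in their names:
import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

# List every model available to the account and keep only fine-tuned ones,
# whose names contain ":ft-" (e.g. davinci:ft-personal-2023-06-01-15-20-33)
models = openai.Model.list()
fine_tuned = [m["id"] for m in models["data"] if ":ft-" in m["id"]]
print(fine_tuned)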
You can also add a suffix to the model name to help distinguish between multiple fine-tuned models.
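For example, passing a suffix when creating the fine-tune bakes it into the generated model name (a hedged example; check openai api fine_tunes.create --help for the exact flag name in your CLI version):
openai api fine_tunes.create -t "hn_faq_prepared.jsonl" -m davinci --suffix "hn-faq"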

Now we will use the model to generate completions. Since the prompts are questions based on the FAQ page, we expect to see answers for the questions based on the fine-tuned training model.
Example 1
First we run the following Python script:
import json
import os
import traceback

import openai
from dotenv import load_dotenv

# Load the API key from the .env file
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

# The fine-tuned model created in the previous step
model_name = 'davinci:ft-personal-2023-06-01-15-20-33'
question = '\n\nHow is Show HN different from Ask HN?'

try:
    # Query the fine-tuned model directly, sending only the question
    response = openai.Completion.create(
        model=model_name,
        prompt=question,
        temperature=0,
        max_tokens=100
    )
    # Save the raw API response for inspection
    with open('response_fine_tuned.json', 'w+') as f:
        json.dump(response, f, indent=4)
except Exception as e:
    print(traceback.format_exc())
    print(f'Exception: {e}')
And here is the response we get:
{
    "id": "-",
    "object": "text_completion",
    "created": 1685636242,
    "model": "davinci:ft-personal-2023-06-01-15-20-33",
    "choices": [
        {
            "text": "\n\nAsk HN is for asking specific questions about your startup or software project. Show HN is for showing off your work to the community.\n\nAsk HN is for asking for help. Show HN is for showing off.\n\nAsk HN is for software. Show HN is for everything else.\n\nAsk HN is for personal stories. Show HN is for work you've done.\n\nAsk HN is for asking for help with a specific problem",
            "index": 0,
            "logprobs": null,
            "finish_reason": "length"
        }
    ],
    "usage": {
        "prompt_tokens": 13,
        "completion_tokens": 100,
        "total_tokens": 113
    }
}
The first thing to note is that the prompt used only 13 tokens (prompt_tokens), a huge reduction from the 2046 tokens we sent in the previous lesson.
Since we allow a maximum of 100 completion tokens, GPT uses up all 100 of them, for a total of 113 tokens.
However, the answer looks like this:
\n\nAsk HN is for asking specific questions about your startup or software project. Show HN is for showing off your work to the community.\n\nAsk HN is for asking for help. Show HN is for showing off.\n\nAsk HN is for software. Show HN is for everything else.\n\nAsk HN is for personal stories. Show HN is for work you've done.\n\nAsk HN is for asking for help with a specific problem
As you can see, we get several answers strung together, because GPT tends to keep generating until it hits the maximum number of tokens allowed (note the finish_reason of "length").
What we want to do instead, is just get the first answer.
We do that by using the "stop" parameter in the API call. This is a list of up to four strings; generation stops as soon as the model would produce any of them. In this case, the string ".\n\n" is a good choice, since each answer in the output above ends with a full stop followed by two newlines.
Example 2
So now we add a stop parameter to the API request.
import json
import os
import traceback
import openai
from dotenv import load_dotenv
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
model_name = 'davinci:ft-personal-2023-06-01-15-20-33'
question = '\n\nHow is Show HN different from Ask HN?'
try:
    response = openai.Completion.create(
        model=model_name,
        prompt=question,
        temperature=0,
        max_tokens=100,
        stop=[".\n\n"]
    )
    with open('response_fine_tuned.json', 'w+') as f:
        json.dump(response, f, indent=4)
except Exception as e:
    print(traceback.format_exc())
    print(f'Exception: {e}')
This time we get a much more terse response and, as a bonus, fewer tokens are used up.
{
    "id": "-",
    "object": "text_completion",
    "created": 1685636155,
    "model": "davinci:ft-personal-2023-06-01-15-20-33",
    "choices": [
        {
            "text": "\n\nAsk HN is for asking specific questions about your startup or software project. Show HN is for showing off your work to the community",
            "index": 0,
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 13,
        "completion_tokens": 29,
        "total_tokens": 42
    }
}
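If you just want the answer text rather than the whole JSON, you can pull it out of the response object and strip the leading newlines; a small sketch, assuming the response object from the script above:
# Extract the first (and only) completion and trim the surrounding whitespace
answer = response["choices"][0]["text"].strip()
print(answer)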
Example 3
What happens if we switch the question around? Can GPT still provide a good answer?
import json
import os
import traceback
import openai
from dotenv import load_dotenv
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
model_name = 'davinci:ft-personal-2023-06-01-15-20-33'
question = '\n\nHow is Ask HN different from Show HN?'
try:
    response = openai.Completion.create(
        model=model_name,
        prompt=question,
        temperature=0,
        max_tokens=100,
        stop=[".\n\n"]
    )
    with open('response_fine_tuned.json', 'w+') as f:
        json.dump(response, f, indent=4)
except Exception as e:
    print(traceback.format_exc())
    print(f'Exception: {e}')
I have switched the question around, and here is the response:
{
    "id": "-",
    "object": "text_completion",
    "created": 1685636331,
    "model": "davinci:ft-personal-2023-06-01-15-20-33",
    "choices": [
        {
            "text": "\n\nAsk HN is for asking for help, and Show HN is for sharing personal stories",
            "index": 0,
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 13,
        "completion_tokens": 20,
        "total_tokens": 33
    }
}
Unfortunately, it looks like GPT is hallucinating a bit in this response.
This is the actual FAQ:
What are Ask HN and Show HN? Ask HN lists questions and other text submissions. Show HN is for sharing your personal work and has special rules.
Here is the GPT response:
Ask HN is for asking for help, and Show HN is for sharing personal stories
Asking the question against the full FAQ text, as we did in the previous lesson, produced an almost verbatim and more accurate response.
Fine-tuning on specific question-and-answer pairs seems to reduce accuracy, even though it also reduces the cost (in terms of the number of tokens used).