How to use the Search and Ask approach to improve GPT accuracy

In the previous lesson, I explained how the fine-tuning approach reduces token usage quite a bit, but the accuracy of the responses is not as high as we would like.

The OpenAI team suggests the Search and Ask method as a workaround for this problem.

They have also provided an example of how to use the Search and Ask method.

I took the code and adapted it for the HN FAQ dataset. The approach boils down to four steps (sketched in code after the list):

1. Generate and save an embeddings CSV file.
2. Search for the top N documents matching the query.
3. Concatenate the top N results into a prompt.
4. Ask the question based on that prompt.
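
Below is a minimal sketch of those four steps in Python. It assumes the pre-1.0 openai package (the same era as the gpt-3.5-turbo-0301 model shown in the response below), plus pandas and numpy, and that OPENAI_API_KEY is set in the environment. The file name hn_faq_embeddings.csv, the faq_entries argument, the prompt wording, and TOP_N are my own illustrative choices, not taken from OpenAI's example.

import ast

import numpy as np
import openai
import pandas as pd

EMBEDDING_MODEL = "text-embedding-ada-002"
CHAT_MODEL = "gpt-3.5-turbo"
TOP_N = 3  # how many FAQ entries to pack into the prompt (illustrative)

def embed(text: str) -> list[float]:
    """Return the embedding vector for a piece of text."""
    response = openai.Embedding.create(model=EMBEDDING_MODEL, input=text)
    return response["data"][0]["embedding"]

# Step 1: embed each FAQ entry once and save the results to a CSV file.
def build_embeddings_csv(faq_entries: list[str], path: str) -> None:
    df = pd.DataFrame({"text": faq_entries})
    df["embedding"] = df["text"].apply(embed)
    df.to_csv(path, index=False)

# Step 2: embed the query and rank FAQ entries by cosine similarity.
def search(query: str, path: str) -> list[str]:
    df = pd.read_csv(path)
    # Embeddings round-trip through CSV as strings; parse them back.
    df["embedding"] = df["embedding"].apply(ast.literal_eval)
    q = np.array(embed(query))

    def similarity(e: list[float]) -> float:
        e = np.array(e)
        return float(np.dot(q, e) / (np.linalg.norm(q) * np.linalg.norm(e)))

    df["similarity"] = df["embedding"].apply(similarity)
    ranked = df.sort_values("similarity", ascending=False)
    return ranked["text"].head(TOP_N).tolist()

# Steps 3 and 4: concatenate the top N matches into a prompt and ask.
def ask(query: str, path: str) -> str:
    context = "\n\n".join(search(query, path))
    prompt = (
        "Answer the question using only the FAQ excerpts below.\n\n"
        f"FAQ excerpts:\n{context}\n\nQuestion: {query}"
    )
    response = openai.ChatCompletion.create(
        model=CHAT_MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]

The embeddings file is built once with build_embeddings_csv(), so each question afterwards costs only one embedding call plus one chat completion.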

Here is the question I sent to the API: “How is Show HN different from Ask HN?”

Here is the JSON response:

{
    "id": "-",
    "object": "chat.completion",
    "created": 1685882427,
    "model": "gpt-3.5-turbo-0301",
    "usage": {
        "prompt_tokens": 436,
        "completion_tokens": 24,
        "total_tokens": 460
    },
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "Show HN is for sharing personal work and has special rules, while Ask HN lists questions and other text submissions."
            },
            "finish_reason": "stop",
            "index": 0
        }
    ]
}

So we see that the accuracy is very good. The number of tokens sent to the API is higher than with the fine-tuning approach, but still far lower than sending the entire FAQ page as the prompt. Clearly, if your FAQ page is larger than our example, the “Search and Ask” method saves you that many more tokens, as you can verify with the snippet below.
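
If you want to check the savings for your own FAQ page, you can count tokens locally with OpenAI's tiktoken library before sending anything to the API. The full_faq_text and retrieval_prompt variables here are placeholders for the whole FAQ page and the assembled Search and Ask prompt.

import tiktoken

# gpt-3.5-turbo models use the cl100k_base encoding.
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

full_faq_text = "..."     # placeholder: the entire FAQ page
retrieval_prompt = "..."  # placeholder: the assembled Search and Ask prompt

print("whole FAQ tokens:", len(encoding.encode(full_faq_text)))
print("Search and Ask tokens:", len(encoding.encode(retrieval_prompt)))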

But this method still has a downside, which I will discuss in a future lesson.