How to build a custom GPT bot trained on your FAQ page

We can use OpenAI's Create completion API to build a custom GPT bot that answers questions from your own FAQ page.

I will use the Hacker News FAQ page as an example throughout this course.

I chose the Hacker News FAQ page for three reasons: its format is very simple (which made it easy to copy and paste the content into a spreadsheet), it is short enough to fit into a single prompt, and its questions are mostly distinct from one another (unambiguous).
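Whether a FAQ fits into a single prompt depends on the model's context window, which has to hold both the prompt and the generated answer. Below is a rough pre-check. It assumes OpenAI's rule of thumb of about 4 characters per English token and a 4,097-token window for text-davinci-003; verify both numbers, and use a real tokenizer (such as tiktoken) when you need exact counts. The helper names are hypothetical.

```python
# Rough check: will the FAQ prompt plus the answer fit in one request?
# Assumptions: ~4 characters per token (rule of thumb for English text)
# and a 4,097-token context window (the limit for text-davinci-003).
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW = 4097


def estimate_tokens(text: str) -> int:
    """Approximate token count; use a real tokenizer for exact numbers."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(prompt: str, max_tokens: int,
                    context_window: int = CONTEXT_WINDOW) -> bool:
    """The prompt and the requested completion share the same window."""
    return estimate_tokens(prompt) + max_tokens <= context_window


sample = "Q: How do I submit a question?\n\n\nA: Use the submit link in the top bar.\n\n\n"
print(estimate_tokens(sample * 100), fits_in_context(sample * 100, max_tokens=200))
```

For the HN FAQ, the API reports 2,046 prompt tokens. With `max_tokens=200`, the request therefore needs at most 2,246 tokens of the window.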

I copied and pasted the questions and answers into a CSV file with two columns, Question and Answer.
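Several answers (for example the one for [dead]) span multiple paragraphs, so the CSV has to quote those fields for the line breaks to survive. The sketch below uses Python's standard csv module with shortened, illustrative sample rows and the same Question/Answer column names the code below reads. It writes to an in-memory buffer instead of hn_faq.csv so that it runs on its own.

```python
import csv
import io

# Shortened sample rows; the real file holds every Q&A from the FAQ page.
rows = [
    {"Question": "How do I submit a question?",
     "Answer": "Use the submit link in the top bar, and leave the url field blank."},
    {"Question": "What does [dead] mean?",
     "Answer": "The post was killed by software, user flags, or moderators.\n\n"
               "Dead posts aren't displayed by default."},
]

# DictWriter quotes any field containing a comma, a quote, or a line break.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Question", "Answer"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Reading it back recovers the multi-paragraph answer intact.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
assert parsed == rows
```

A spreadsheet's "Save as CSV" normally applies the same quoting. pandas' `read_csv` in the script below handles quoted multi-line fields as well.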

With this CSV file, you can now write the following Python code. Remember to add your OpenAI API key to your .env file as OPENAI_API_KEY.

How it works:

You create a prompt, which is just a long text of questions and answers separated by two blank lines (the separator GPT recommends). You then append your question at the end of the text.

GPT is smart enough to understand that you are asking it to answer the question at the end of the text.

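import json
import os
import traceback

import openai
import pandas as pd
from dotenv import load_dotenv

# Legacy completions model used in this example.
model_name = 'text-davinci-003'

# Read OPENAI_API_KEY from the .env file.
# Note: openai.Completion is the pre-1.0 interface of the openai package.
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

# One row per FAQ entry, with Question and Answer columns.
df: pd.DataFrame = pd.read_csv('hn_faq.csv', encoding='utf-8', dtype=object,
                               index_col=False)

# Build the prompt: Q/A pairs separated by two blank lines.
full_text = ''
for _, row in df.iterrows():
    curr_text = f'''
    Q:{row['Question']}


    A:{row['Answer']}


    '''
    full_text += curr_text

# Append the user's question at the very end of the prompt.
question = '\n\nHow is Ask HN different from Show HN?'
full_text += question

try:
    response = openai.Completion.create(
        model=model_name,
        prompt=full_text,
        temperature=0,   # least random, most repeatable output
        max_tokens=200   # upper bound on the length of the answer
    )
    # Save the raw response for inspection, then print the answer text.
    with open('response.json', 'w') as f:
        json.dump(response, f, indent=4)
    print(response['choices'][0]['text'].strip())
except Exception:
    print(traceback.format_exc())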
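You can also write the prompt-building loop as a small pure function. This makes it easy to reuse the same FAQ text for different questions and to test the prompt without calling the API. The sketch keeps the layout described above (Q:/A: pairs separated by two blank lines, with the question appended at the end). It drops the leading spaces that the indented f-string adds to each line. `build_prompt` and the sample data are hypothetical.

```python
def build_prompt(faq, question):
    """Join Q/A pairs with two blank lines between them, then append the question.

    faq is a list of (question, answer) tuples, e.g. loaded from hn_faq.csv.
    """
    blocks = [f"Q: {q}\n\n\nA: {a}" for q, a in faq]
    return "\n\n\n".join(blocks) + "\n\n\n" + question


faq = [
    ("What are Ask HN and Show HN?",
     "Ask HN lists questions and other text submissions. "
     "Show HN is for sharing your personal work and has special rules."),
    ("What do green usernames mean?", "Green indicates a new account."),
]
prompt = build_prompt(faq, "How is Ask HN different from Show HN?")
print(prompt)
```

With the DataFrame from the script, `list(zip(df['Question'], df['Answer']))` produces the `faq` list.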

Here is the full text sent to the API. Notice that the question we asked sits at the very end; as mentioned above, GPT works out on its own that this is the question it should answer.

    Q: Are there rules about submissions and comments? 


    A:https://news.ycombinator.com/newsguidelines.html


    
    Q: How are stories ranked? 


    A: The basic algorithm divides points by a power of the time since a story was submitted. Comments in threads are ranked the same way.

Other factors affecting rank include user flags, anti-abuse software, software which demotes overheated discussions, account or site weighting, and moderator action


    
    Q: How is a user's karma calculated? 


    A: Roughly, the number of upvotes on their posts minus the number of downvotes. These don't match up exactly. Some votes are dropped by anti-abuse software. 


    
    Q: Do posts by users with more karma rank higher? 


    A: No. 


    
    Q: Why don't I see down arrows? 


    A: There are no down arrows on stories. They appear on comments after users reach a certain karma threshold, but never on direct replies. 


    
    Q: What kind of formatting can you use in comments? 


    A:http://news.ycombinator.com/formatdoc


    
    Q: How do I submit a question? 


    A: Use the submit link in the top bar, and leave the url field blank. 


    
    Q: How do I submit a poll? 


    A:http://news.ycombinator.com/newpoll


    
    Q: How do I make a link in a text submission? 


    A: You can't. This is to prevent people from submitting a link with their comments in a privileged position at the top of the page. If you want to submit a link with comments, just submit it, then add a regular comment. 


    
    Q: What are Ask HN and Show HN? 


    A: Ask HN lists questions and other text submissions. Show HN is for sharing your personal work and has special rules. 


    
    Q: Why hasn't my submission appeared on Ask HN or Show HN? 


    A: All Ask HNs appear on newest and asknew, and all Show HNs on newest and shownew, but there is a small points threshold before a post makes it to ask or show. 


    
    Q: What do green usernames mean? 


    A: Green indicates a new account. 


    
    Q: Why are some comments faded? 


    A: Faded text means that a comment has been downvoted. You can read the comment in normal text by clicking on its timestamp to go to its page. 


    
    Q: What does [flagged] mean? 


    A: Users flagged the post as breaking the guidelines or otherwise not belonging on HN.

Moderators sometimes also add [flagged] (though not usually on submissions), and sometimes turn flags off when they are unfair. 


    
    Q: How do I flag a comment? 


    A: Click on its timestamp to go to its page, then click the 'flag' link at the top. There's a small karma threshold before flag links appear. 


    
    Q: What does [dead] mean? 


    A: The post was killed by software, user flags, or moderators.

Dead posts aren't displayed by default, but you can see them all by turning on 'showdead' in your profile.

If you see a [dead] post that shouldn't be dead, you can vouch for it. Click on its timestamp to go to its page, then click 'vouch' at the top. When enough users do this, the post is restored. There's a small karma threshold before vouch links appear. 


    
    Q: What does [deleted] mean? 


    A: The author deleted the post outright, or asked us to. Unlike dead posts, these remain deleted even when showdead is turned on. 


    
    Q: Are reposts ok? 


    A: If a story has not had significant attention in the last year or so, a small number of reposts is ok. Otherwise we bury reposts as duplicates.

Please don't delete and repost the same story. Deletion is for things that shouldn't have been submitted in the first place. 


    
    Q: Are paywalls ok? 


    A: It's ok to post stories from sites with paywalls that have workarounds.

In comments, it's ok to ask how to read an article and to help other users do so. But please don't post complaints about paywalls. Those are off topic. More here. 


    
    Q: Can I ask people to upvote my submission? 


    A: No. Users should vote for a story because they personally find it intellectually interesting, not because someone has content to promote. We penalize or ban submissions, accounts, and sites that break this rule, so please don't. 


    
    Q: Can I ask people to comment on my submission? 


    A: No, for the same reason. It's also not in your interest: HN readers are sensitive to this and will detect it, flag it, and use unkind words like 'spam'. 


    
    Q: Can I post a job ad? 


    A: Please don't post job ads as submissions to HN.

A regular "Who Is Hiring?" thread appears on the first weekday of each month (or Jan 2). Most job ads are welcome there. Only an account called whoishiring is allowed to submit the thread itself. This prevents a race to post it first.

Another kind of job ad is reserved for YC-funded startups. These appear on the front page, but are not stories: they have no vote arrows, points, or comments. They begin part-way down and fall steadily. Only one is on the front page at a time. The rest are listed at jobs. 


    
    Q: What's the relationship between YC and HN? 


    A: Y Combinator owns and funds HN. The HN team is editorially independent.

HN gives three features to YC: job ads (see above) and startup launches get placed on the front page, and YC founder names are displayed to other YC alumni in orange. 


    
    Q: Are negative stories about YC suppressed on HN? 


    A: No, we moderate less, not more, when YC or a YC startup is the topic. The good will of the community is worth more than any story. 


    
    Q: Why can't I post a comment to a thread? 


    A: Threads are closed to new comments after two weeks, or if the submission has been killed by software, moderators, or user flags. 


    
    Q: Why is A ranked below B even though A has more points and is newer? 


    A: You can't derive rank from votes and time alone. See "How are stories ranked?" above. 


    
    Q: In my profile, what is delay? 


    A: It gives you time to edit your comments before they appear to others. Set it to the number of minutes you'd like. The maximum is 10. 


    
    Q: In my profile, what is noprocrast? 


    A: It's a way to help you prevent yourself from spending too much time on HN. If you turn it on you'll only be allowed to visit the site for maxvisit minutes at a time, with gaps of minaway minutes in between. The defaults are 20 and 180, which would let you view the site for 20 minutes at a time, and then not allow you back in for 3 hours. 


    
    Q: How do I reset my password? 


    A: If you have an email address in your profile, you can do that here. If you haven't, email hn@ycombinator.com for help. 


    
    Q: Can I change my username? 


    A: Yes. Email hn@ycombinator.com and we'll help. 


    
    Q: Can I delete my account? 


    A: We try not to delete entire account histories because that would gut the threads the account had participated in. However, we care about protecting individual users and take care of privacy requests every day, so if we can help, please email hn@ycombinator.com. We don't want anyone to get in trouble from anything they posted to HN. More here. 


    
    Q: My IP address seems to be banned. How can I unban it? 


    A:If you request many pages too quickly, your IP address might get banned. This self-serve unbanning procedure works most of the time. If not, email hn@ycombinator.com. 


    

How is Ask HN different from Show HN?

This full text is called a prompt.

This is the JSON response received from the API:

{
    "id": "-",
    "object": "text_completion",
    "created": 1685612868,
    "model": "text-davinci-003",
    "choices": [
        {
            "text": "\n\nAsk HN is for questions and other text submissions, while Show HN is for sharing personal work. Show HN has special rules, such as no links to commercial products or services.",
            "index": 0,
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 2046,
        "completion_tokens": 40,
        "total_tokens": 2086
    }
}
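The answer is in `choices[0].text`, and the token counts are in `usage`. The sketch below reads both from a response like the one above. To keep it self-contained, it parses an inline copy of that JSON (without `id` and `created`) instead of opening response.json. `extract_answer` is a hypothetical helper name.

```python
import json

# Copy of the response above; normally: json.load(open("response.json")).
raw = r"""{
    "object": "text_completion",
    "model": "text-davinci-003",
    "choices": [
        {
            "text": "\n\nAsk HN is for questions and other text submissions, while Show HN is for sharing personal work. Show HN has special rules, such as no links to commercial products or services.",
            "index": 0,
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {"prompt_tokens": 2046, "completion_tokens": 40, "total_tokens": 2086}
}"""


def extract_answer(response: dict) -> str:
    """Return the generated text without the leading blank lines."""
    choice = response["choices"][0]
    if choice["finish_reason"] != "stop":
        # "length" means max_tokens cut the answer off.
        print("warning: answer may be truncated:", choice["finish_reason"])
    return choice["text"].strip()


response = json.loads(raw)
print(extract_answer(response))
print(response["usage"]["total_tokens"])
```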

Here is the answer text from the GPT API response:

Ask HN is for questions and other text submissions, while Show HN is for sharing personal work. Show HN has special rules, such as no links to commercial products or services.

Here is the actual answer from the FAQ page:

Ask HN lists questions and other text submissions. Show HN is for sharing your personal work and has special rules. 

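We can see that the response (in the “text” field of the JSON response) closely matches the FAQ answer, so it is the answer we expected. Note, however, that its example rule (“no links to commercial products or services”) does not appear anywhere in the prompt. GPT added that detail on its own.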
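When you have many questions, a crude automatic check helps. One option is character-level similarity with `difflib.SequenceMatcher` from Python's standard library. This is a rough proxy, not a real evaluation: a correct paraphrase can score well below 1.0, and a high score does not rule out added details. The `similarity` helper is hypothetical.

```python
from difflib import SequenceMatcher


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't count."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


gpt_answer = ("Ask HN is for questions and other text submissions, while Show HN is "
              "for sharing personal work. Show HN has special rules, such as no links "
              "to commercial products or services.")
faq_answer = ("Ask HN lists questions and other text submissions. Show HN is for "
              "sharing your personal work and has special rules.")
score = similarity(gpt_answer, faq_answer)
print(f"similarity: {score:.2f}")
```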

Also notice that every API response includes token usage information. For this particular prompt, we used 2,046 prompt tokens and 40 completion tokens, for a total of 2,086 tokens.
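Because every response carries its own `usage` block, tracking token consumption over several questions comes down to summing those fields. Below is a minimal sketch. The first entry is the usage reported above. The second is a hypothetical example, not a real API result.

```python
from collections import Counter


def total_usage(usages):
    """Sum prompt, completion and total token counts over many responses."""
    totals = Counter()
    for usage in usages:
        totals.update(usage)  # adds the counts key by key
    return dict(totals)


usages = [
    {"prompt_tokens": 2046, "completion_tokens": 40, "total_tokens": 2086},  # response above
    {"prompt_tokens": 2050, "completion_tokens": 35, "total_tokens": 2085},  # hypothetical
]
print(total_usage(usages))
```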

What if we want to ask another question based on the same FAQ page?

The Completion API does not remember earlier requests, so each new question has to be sent along with the entire FAQ page as part of the prompt.

Since pricing depends on the total number of tokens, we will be billed for the same FAQ tokens over and over, once per question. Given that we have already sent the FAQ page once, wouldn’t it be better if there were a way to “save” the FAQ page in GPT’s memory and send only the question?
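To see how quickly the repeated FAQ adds up, here is a back-of-the-envelope projection. It assumes every question costs about as much as the single request above (2,046 prompt + 40 completion tokens). The price per 1,000 tokens is a placeholder, not a quoted rate, so check OpenAI's pricing page for real numbers.

```python
# Placeholder price in USD per 1,000 tokens; NOT a quoted rate.
PRICE_PER_1K_TOKENS = 0.02


def projected_tokens(n_questions: int, prompt_tokens: int, completion_tokens: int) -> int:
    """Each request re-sends the whole FAQ as part of prompt_tokens."""
    return n_questions * (prompt_tokens + completion_tokens)


def projected_cost(total_tokens: int, price_per_1k: float = PRICE_PER_1K_TOKENS) -> float:
    """Linear pricing: tokens / 1000 * price per 1,000 tokens."""
    return total_tokens / 1000 * price_per_1k


# Token counts from the request above, assumed similar for every question.
tokens = projected_tokens(100, prompt_tokens=2046, completion_tokens=40)
prompt_share = 2046 / (2046 + 40)
print(f"{tokens} tokens, ~${projected_cost(tokens):.2f}, "
      f"{prompt_share:.0%} of it prompt (mostly the repeated FAQ)")
```

Almost all of the spend goes to prompt tokens, and nearly all of those are the same FAQ text sent again with every question.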

We can do this by using fine-tuning, which I will discuss in the next lesson.