How to use custom stop words in spaCy
Stop words are very common words such as “a” and “the” that carry little information about the text you are analyzing (low information density words).
spaCy allows you to check if a given word is a stop word. Let us see how we can do that.
First, create a file called stop_words.py and add the following code to it.
Note: you need to download the en_core_web_sm model first (for example, with python -m spacy download en_core_web_sm) to be able to run the script below.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(
    "Dialogflow, previously known as api.ai, is a chatbot framework provided by Google. Google acquired API.AI in 2016.")
full_text = ''
for tok in doc:
    tok_text = tok.text
    if tok.is_stop:
        tok_text = f'[{tok.text}]'
    full_text += tok_text + ' '
print(full_text)
As you can see, I am simply iterating over all the tokens, and if the token is a stop word, I enclose it in square brackets.
This is the output when you run this script

spaCy allows you to modify the list of stop words.
Create a file called custom_stopwords.py and add the following code:
import spacy

cls = spacy.util.get_lang_class('en')
cls.Defaults.stop_words.remove('by')
cls.Defaults.stop_words.add('google')
nlp = spacy.load("en_core_web_sm")
doc = nlp(
    "Dialogflow, previously known as api.ai, is a chatbot framework provided by Google. Google acquired API.AI in 2016.")
full_text = ''
for tok in doc:
    tok_text = tok.text
    if tok.is_stop:
        tok_text = f'[{tok.text}]'
    full_text += tok_text + ' '
print(full_text)
Notice that I have added the word ‘google’ as a stop word, and removed the word ‘by’ from the existing list.
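To see why adding the lowercase ‘google’ is enough to flag “Google”, it may help to model the lookup in plain Python. The sketch below is not spaCy's code: the tiny stop word set, the hand-split token list and the mark_stop_words helper are all made up for illustration, and it assumes the check lowercases each token before looking it up, which is how I understand is_stop to behave.

```python
def mark_stop_words(tokens, stop_words):
    """Wrap every token whose lowercase form is in stop_words in brackets."""
    marked = []
    for tok in tokens:
        if tok.lower() in stop_words:
            marked.append(f"[{tok}]")
        else:
            marked.append(tok)
    return " ".join(marked)


# Hypothetical mini stop word list, standing in for spaCy's much larger default set.
stop_words = {"a", "by", "is", "in"}

# Hand-split tokens, so this runs without spaCy or a model.
tokens = ["Dialogflow", "is", "a", "framework", "provided", "by", "Google"]

before = mark_stop_words(tokens, stop_words)

# The same two changes as in custom_stopwords.py.
stop_words.remove("by")
stop_words.add("google")

after = mark_stop_words(tokens, stop_words)

print(before)
print(after)
```

In this toy version, ‘by’ loses its brackets and ‘Google’ gains them, which is the same change you should expect in the real script.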
Very important: you must modify the list of stop words before you load the model with spacy.load.
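If you need to adjust stop words on a pipeline that is already loaded, one alternative (not the approach used in this post) is to set the is_stop flag on individual vocabulary entries. This is a rough sketch: it uses spacy.blank("en") so that it runs without downloading a model, and the sample sentence is made up. As far as I can tell, the flag applies to the exact spelling only, so each casing you care about has to be flagged separately.

```python
import spacy

# A blank English pipeline: tokenizer and default stop words, no model download.
nlp = spacy.blank("en")

# Flag both casings, since each spelling is its own vocabulary entry.
for word in ("google", "Google"):
    nlp.vocab[word].is_stop = True

# Unflag "by" for this pipeline only.
nlp.vocab["by"].is_stop = False

doc = nlp("A chatbot framework provided by Google.")
stops = [tok.text for tok in doc if tok.is_stop]
print(stops)
```

Unlike editing the defaults, this only touches the vocabulary of the nlp object you call it on.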
This is what the output looks like now

About this website
BotFlo [1] was created by Aravind Mohanoor as a website which provided training and tools for non-programmers who were [2] building Dialogflow chatbots. This website has now expanded into other topics in Natural Language Processing, including the recent Large Language Models (GPT etc.), with a special focus on helping non-programmers identify and use the right tool for their specific NLP task. For example, when not to use GPT.
[1] BotFlo was previously called MiningBusinessData. That is why you see that name in many videos.
[2] And still are building Dialogflow chatbots. Dialogflow ES first evolved into Dialogflow CX, and Dialogflow CX itself evolved to add Generative AI features in mid-2023.