How to get part-of-speech tags using spaCy

What are part-of-speech tags?

In corpus linguistics, part-of-speech tagging (POS tagging or PoS tagging or POST), also called grammatical tagging, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech,[1] based on both its definition and its context. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc.

Source

Create a new file called part_of_speech.py and add the following code to it:

Note: you need to download the en_core_web_sm model before you can run the script below. You can do this from the command line with: python -m spacy download en_core_web_sm
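import spacy

# Load the small English pipeline (must be downloaded first, see the note above)
nlp = spacy.load("en_core_web_sm")
doc = nlp("Dialogflow, previously known as api.ai, is a chatbot framework provided by Google. Google acquired API.AI in 2016.")

# Print each token along with its coarse-grained (universal) part-of-speech tag
for tok in doc:
    print(f'Text: {tok.text} Part-of-speech: {tok.pos_}')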

Here is the output

See the spaCy documentation for the definition of each of these part-of-speech tags.
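You can also look up a tag's definition directly from Python with spacy.explain(), which reads spaCy's built-in glossary. Below is a minimal sketch, assuming spaCy v3 is installed; it does not need en_core_web_sm, because the glossary ships with spaCy itself. The tag list is just a sample chosen for illustration.

```python
import spacy

# A few of the tags that tok.pos_ can return (sample chosen for illustration)
sample_tags = ["PROPN", "NOUN", "VERB", "ADP", "PUNCT"]

for tag in sample_tags:
    # spacy.explain() returns a short human-readable description of a label
    print(f"{tag}: {spacy.explain(tag)}")

# Labels that are not in the glossary return None instead of raising an error
print(spacy.explain("NOT_A_TAG"))
```

In the original script, you could print spacy.explain(tok.pos_) next to each token to get the definitions inline.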

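The tag definitions are grouped into two lists. The first is the list of universal part-of-speech tags, which is what tok.pos_ returns. These tags only describe the coarse word type (noun, verb, adjective and so on) and come from the Universal Dependencies tag set, which is designed to be shared across languages.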
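For reference, here is a small sketch that prints the 17 universal (UPOS) tags from Universal Dependencies v2 together with spaCy's glossary description of each. It assumes spaCy v3 is installed; the constant name UNIVERSAL_POS_TAGS is our own, not part of spaCy's API.

```python
import spacy

# The 17 coarse-grained tags of the Universal Dependencies (UPOS) tag set.
# UNIVERSAL_POS_TAGS is a hypothetical name used only in this example.
UNIVERSAL_POS_TAGS = [
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
]

for tag in UNIVERSAL_POS_TAGS:
    # Left-align the tag so the descriptions line up in a column
    print(f"{tag:<6} {spacy.explain(tag)}")
```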

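There is also a second, more fine-grained list of part-of-speech tags specific to the English language, based on the Penn Treebank tag set. spaCy stores these in tok.tag_ rather than tok.pos_.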
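To see both lists side by side, you can print tok.pos_ (universal tag) and tok.tag_ (English-specific tag) for each token. The sketch below assumes spaCy v3 and, so that it runs without downloading a model, builds a Doc by hand with tags we supplied ourselves. These hand-written tags are an illustrative assumption, not actual en_core_web_sm output, which may differ.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hand-annotated tokens (assumed tags for illustration, not model predictions)
words = ["Google", "acquired", "API.AI", "in", "2016", "."]
pos = ["PROPN", "VERB", "PROPN", "ADP", "NUM", "PUNCT"]  # universal tags
tags = ["NNP", "VBD", "NNP", "IN", "CD", "."]            # English (Penn Treebank) tags

# Doc accepts pre-assigned pos and tags in spaCy v3
doc = Doc(Vocab(), words=words, pos=pos, tags=tags)

for tok in doc:
    # pos_ is the coarse universal tag, tag_ is the fine-grained English tag
    print(f"Text: {tok.text:<8} pos_: {tok.pos_:<6} tag_: {tok.tag_}")
```

With a loaded pipeline, the same loop works unchanged on nlp("...") output: the fine-grained tag distinguishes, for example, past-tense verbs (VBD) from other verb forms that all share the universal tag VERB.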


About this website

BotFlo¹ was created by Aravind Mohanoor as a website which provided training and tools for non-programmers who were² building Dialogflow chatbots.

This website has now expanded into other topics in Natural Language Processing, including the recent Large Language Models (GPT etc.), with a special focus on helping non-programmers identify and use the right tool for their specific NLP task.

For example, when not to use GPT

¹ BotFlo was previously called MiningBusinessData. That is why you see that name in many videos.

² And are still building Dialogflow chatbots. Dialogflow ES first evolved into Dialogflow CX, and Dialogflow CX itself evolved to add Generative AI features in mid-2023.