How to get part of speech tags using spaCy
What are part of speech tags?
In corpus linguistics, part-of-speech tagging (POS tagging, PoS tagging, or POST), also called grammatical tagging, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech,[1] based on both its definition and its context. A simplified form of this is commonly taught to school-age children: identifying words as nouns, verbs, adjectives, adverbs, etc.
Create a new file called part_of_speech.py and add the following code to it.
Note: you need to download the en_core_web_sm model first to be able to run the script below. You can do this by running python -m spacy download en_core_web_sm
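import spacy
# Load the small English pipeline (it must be downloaded beforehand)
nlp = spacy.load("en_core_web_sm")
doc = nlp("Dialogflow, previously known as api.ai, is a chatbot framework provided by Google. Google acquired API.AI in 2016.")
# Print every token together with its coarse-grained part-of-speech tag
for tok in doc:
    print(f'Text: {tok.text} Part-of-speech: {tok.pos_}')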
Here is the output

See the definition of these part of speech tags
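When you expand the tags in the documentation, you can see that there is a list of universal part-of-speech tags. These are the values returned by tok.pos_ in the script above. They only cover the coarse word type (noun, verb, adjective and so on) and are meant to be shared across languages rather than being specific to English.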
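If you want to look up what a universal tag means without leaving Python, a minimal sketch is to ask spaCy's built-in glossary through spacy.explain. This does not need a model to be loaded. The list of tags below is only an illustrative selection chosen for this example, not the full set.

```python
import spacy

# A few universal part-of-speech tags (illustrative selection, not the complete list)
universal_tags = ["NOUN", "PROPN", "VERB", "ADJ", "ADP", "PUNCT"]

# spacy.explain looks a label up in spaCy's glossary and returns a short
# description, or None if the label is not known
descriptions = {tag: spacy.explain(tag) for tag in universal_tags}

for tag, description in descriptions.items():
    print(f"{tag}: {description}")
```

For example, NOUN is described as "noun" and PROPN as "proper noun".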

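There is also a second list of part-of-speech tags specific to the English language. These fine-grained tags are available on each token as tok.tag_ (for example, NNP for a singular proper noun).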
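To compare the two lists side by side, here is a small sketch that collects both the universal tag (tok.pos_) and the language-specific tag (tok.tag_) for every token, together with the glossary description of the specific tag. The helper name describe_tags is made up for this example and is not part of spaCy.

```python
import spacy


def describe_tags(doc):
    """Return (text, universal tag, language-specific tag, description) per token.

    describe_tags is a hypothetical helper written for this example.
    """
    rows = []
    for tok in doc:
        # tok.pos_ is the universal tag, tok.tag_ the language-specific one;
        # spacy.explain returns None when the glossary has no entry for a tag
        rows.append((tok.text, tok.pos_, tok.tag_, spacy.explain(tok.tag_)))
    return rows


def print_tags(doc):
    """Print one line per token using the rows from describe_tags."""
    for text, pos, tag, description in describe_tags(doc):
        print(f"Text: {text} POS: {pos} Tag: {tag} ({description})")
```

With the model loaded as in part_of_speech.py, you could call print_tags(nlp("Google acquired API.AI in 2016.")) to see both tags for each token.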

About this website
BotFlo¹ was created by Aravind Mohanoor as a website which provided training and tools for non-programmers who were² building Dialogflow chatbots. This website has now expanded into other topics in Natural Language Processing, including the recent Large Language Models (GPT etc.), with a special focus on helping non-programmers identify and use the right tool for their specific NLP task. For example, when not to use GPT.
¹ BotFlo was previously called MiningBusinessData. That is why you see that name in many videos.
² And still are building Dialogflow chatbots. Dialogflow ES first evolved into Dialogflow CX, and Dialogflow CX itself evolved to add Generative AI features in mid-2023.