How to get part of speech tags using spaCy

What are part of speech tags?

In corpus linguistics, part-of-speech tagging (POS tagging or PoS tagging or POST), also called grammatical tagging, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc.

Source

Create a new file called part_of_speech.py and add the following code to it:

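import spacy

# Load the small English pipeline (install it first with:
#   python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
doc = nlp("Dialogflow, previously known as api.ai, is a chatbot framework provided by Google. Google acquired API.AI in 2016.")

# Print each token with its coarse-grained part-of-speech tag
for tok in doc:
    print(f'Text: {tok.text} Part-of-speech: {tok.pos_}')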

Here is the output

See the definition of these part of speech tags

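When you expand the tags, you can see a list of universal part of speech tags. These cover only the coarse word class (noun, verb, adjective, and so on) and follow the Universal Dependencies scheme, so the same tag set is shared across languages. In spaCy, this is the value returned by tok.pos_.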
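If you would rather look up a tag's meaning from code than from the documentation page, spaCy's spacy.explain helper returns a short description for a label. The sketch below is only an illustration: the handful of tags in universal_tags was picked by hand for this example and is not the full universal tag list.

```python
import spacy

# A few coarse-grained (universal) tags, chosen here only for illustration.
universal_tags = ["NOUN", "PROPN", "VERB", "ADJ", "ADP", "PUNCT"]

# spacy.explain returns a short description for a known label,
# or None if the label is not in spaCy's glossary.
descriptions = {tag: spacy.explain(tag) for tag in universal_tags}

for tag, description in descriptions.items():
    print(f"{tag}: {description}")
```

This lookup does not need a trained pipeline, so it runs without loading en_core_web_sm.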

There is also a second list of part of speech tags specific to the English language. These fine-grained tags are available on each token as tok.tag_.

