How to extract entities (Named Entity Recognition) in spaCy

This article is part of the Learn spaCy series

What are named entities?

A named entity is a “real-world object” that’s assigned a name – for example, a person, a country, a product or a book title. spaCy can recognize various types of named entities in a document, by asking the model for a prediction. Because models are statistical and strongly depend on the examples they were trained on, this doesn’t always work perfectly and might need some tuning later, depending on your use case.

The default model in spaCy can already extract well-known entities from text.

Create a new file called ner_test.py and add the following code to it.

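import spacy

# Load the small English pipeline (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
doc = nlp("Dialogflow, previously known as api.ai, is a chatbot framework provided by Google. Google acquired API.AI in 2016.")

# doc.ents holds the entity spans predicted by the model
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)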

This code prints four values per entity: the text of the entity, its starting character position, its ending character position, and the entity label (i.e. the type of entity).
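The two character positions are easier to interpret with a small sketch. The snippet below does not call spaCy at all; it uses plain string search to build offsets of the same shape, assuming (in line with spaCy's documented behaviour) that the start offset is inclusive, the end offset is exclusive, and slicing the original text with them gives back the entity text. The helper name find_offsets is hypothetical and exists only for this illustration.

```python
text = (
    "Dialogflow, previously known as api.ai, is a chatbot framework "
    "provided by Google. Google acquired API.AI in 2016."
)


def find_offsets(text, phrase, search_from=0):
    """Return (start_char, end_char) of the first match at or after search_from."""
    start_char = text.find(phrase, search_from)
    if start_char == -1:
        raise ValueError(f"{phrase!r} not found")
    # end_char is exclusive: it points one past the last matched character
    return start_char, start_char + len(phrase)


# Hypothetical illustration: locate both mentions of "Google" by string search
first = find_offsets(text, "Google")
second = find_offsets(text, "Google", search_from=first[1])

for start_char, end_char in (first, second):
    # Slicing the text with the offsets recovers the matched string
    print(text[start_char:end_char], start_char, end_char)
```

Because the offsets refer to the original string, they can be used to highlight or replace an entity in the text without re-tokenizing it.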

Now run the code with python ner_test.py.

You can notice quite a few things by looking at the output.

First, Dialogflow itself is an entity (a software product) but isn’t identified.

Second, api.ai in the first sentence also refers to a software product and not to a company, yet it is mislabeled as an ORG (the ORG label means “organization”).

Also, notice that the first word of the second sentence, Google, is not identified as an entity at all.

Why? Capitalization of the first letter is usually a hint to the statistical model that a word could be an entity. That hint is unavailable for the first word of a sentence, which is always capitalized for grammatical reasons. Normally spaCy can still extract the entity, but the word google is also used as a verb, which creates additional ambiguity for the model.
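To make the capitalization argument concrete, here is a toy sketch in plain Python. It is not how spaCy's statistical model works; it is a deliberately naive rule (the hypothetical function capitalized_candidates) that treats any capitalized word as a possible entity unless it is the first word of the sentence, where capitalization carries no information.

```python
import re


def capitalized_candidates(sentence):
    """Naive rule: capitalized words are entity candidates, except at the sentence start."""
    # Words start with a letter and may contain dots (e.g. "API.AI")
    words = re.findall(r"[A-Za-z][\w.]*", sentence)
    candidates = []
    for position, word in enumerate(words):
        if position == 0:
            # The first word is always capitalized, so the hint is useless here
            continue
        if word[0].isupper():
            candidates.append(word.rstrip("."))
    return candidates


print(capitalized_candidates("Google acquired API.AI in 2016."))     # → ['API.AI']
print(capitalized_candidates("It was acquired by Google in 2016."))  # → ['Google']
```

The toy rule finds Google in the middle of a sentence but has to skip it at the start. A statistical model is far more flexible than this rule, but it loses the same hint in that position and has to rely on other context.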

As you can see, the out-of-the-box entity extraction in spaCy is decent, but there is a lot of scope for improvement.
