How to split text into sentences using spaCy

When you process text with spaCy, one of the first things you usually want to do is split the text (a paragraph, a document, etc.) into individual sentences.

I will explain how to do that in this tutorial.

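First, install spaCy and download the small English model, en_core_web_sm, which the examples below load:

pip install -U spacy
python -m spacy download en_core_web_sm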

Create a new Python file called sentences.py.

Add the following code to sentences.py:

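import spacy

# Load the small English pipeline (tokenizer, tagger, parser, NER, ...)
nlp = spacy.load("en_core_web_sm")
# Processing the text returns a Doc object
doc = nlp('This is the first sentence. This is the second sentence.')
# doc.sents yields one Span per sentence; in this pipeline the parser sets the boundaries
for sent in doc.sents:
    print(sent)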

Run the script, for example with python sentences.py.

Here is the output:

This is the first sentence.
This is the second sentence.

As you can see, spaCy has split the text into two sentences.
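You may only need sentence boundaries and not the full statistical pipeline. In that case, spaCy also provides a rule-based `sentencizer` component that splits on punctuation. The sketch below assumes spaCy v3, where `add_pipe` takes the component name as a string. It uses a blank English pipeline instead of en_core_web_sm, so no model download is needed. It also collects the sentences into a list, because `doc.sents` is a generator and cannot be indexed directly. Each sentence is printed with its character offsets into the original text.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained model required (assumes spaCy v3).
nlp = spacy.blank("en")
# Rule-based sentencizer: starts a new sentence after sentence-final
# punctuation such as ".", "!" and "?".
nlp.add_pipe("sentencizer")

doc = nlp("This is the first sentence. This is the second sentence.")

# doc.sents is a generator, so turn it into a list to index or count sentences.
sentences = list(doc.sents)
for sent in sentences:
    # start_char/end_char are character offsets into the original text.
    print(sent.start_char, sent.end_char, sent.text)
# prints:
# 0 27 This is the first sentence.
# 28 56 This is the second sentence.
```

The sentencizer relies only on punctuation, so it can mis-split text where punctuation is used irregularly. The parser-based boundaries you get from en_core_web_sm take the sentence structure into account, at the cost of loading a trained model.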

You can also iterate over the tokens in each sentence.

For example, create a new Python file called sentence_tokens.py and add the following code to it:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp('This is the first sentence. This is the second sentence.')
for sent in doc.sents:
    # Each sentence is a Span; iterating over a Span yields its Token objects
    for tok in sent:
        print(tok)

When you run this script, each token is printed on its own line:

This
is
the
first
sentence
.
This
is
the
second
sentence
.
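Printing tokens one per line loses track of which sentence each token came from. To keep that grouping, one option is to build one list of token strings per sentence. This is a minimal sketch that assumes spaCy v3. It uses a blank English pipeline with the rule-based `sentencizer` in place of en_core_web_sm, so it runs without a model download. The name `tokens_per_sentence` is only illustrative.

```python
import spacy

# Blank English pipeline plus the rule-based sentencizer (assumes spaCy v3),
# so the example runs without downloading en_core_web_sm.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("This is the first sentence. This is the second sentence.")

# One list of token strings per sentence (illustrative variable name).
tokens_per_sentence = [[tok.text for tok in sent] for sent in doc.sents]

for i, tokens in enumerate(tokens_per_sentence):
    print(i, tokens)
# prints:
# 0 ['This', 'is', 'the', 'first', 'sentence', '.']
# 1 ['This', 'is', 'the', 'second', 'sentence', '.']
```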
