How to download and install spaCy

In this tutorial I explain how to get started with spaCy. It is intended for programmers who are familiar with Python and want to work in the PyCharm IDE.

Create a new Python project. By default, PyCharm creates a virtual environment for it.

Add a requirements.txt file to the project.

In requirements.txt, add a line containing spacy (the package name on PyPI).

Use pip to install spaCy:

pip install -r requirements.txt

You also need to download the en_core_web_sm pipeline package to run the example below.

What is en_core_web_sm?

en_core_web_sm is a small English pipeline trained on written web text (blogs, news, comments) that includes vocabulary, syntax and entities (source: the spaCy documentation).

Download it with the following command:

python -m spacy download en_core_web_sm
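Before writing any spaCy code, it can help to confirm that both installs landed in the project's virtual environment. The check below is a minimal sketch using only the standard library. The helper name is_importable is hypothetical, and the sketch relies on the fact that the download command installs the pipeline as an ordinary Python package named en_core_web_sm.

```python
from importlib.util import find_spec


def is_importable(module_name):
    """Return True if Python can locate the named top-level module."""
    return find_spec(module_name) is not None


# "spacy" is the library; "en_core_web_sm" is the pipeline package
# installed by `python -m spacy download en_core_web_sm`.
for name in ("spacy", "en_core_web_sm"):
    status = "found" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```

If either name is reported as missing, the most likely cause is that the command was run with a different interpreter than the one PyCharm uses for the project.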

Now add a file called main.py containing the following code:

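import spacy

# Load the small English pipeline downloaded earlier
nlp = spacy.load("en_core_web_sm")

# Processing a string returns a Doc, which is a sequence of tokens
doc = nlp("This is the first sentence")
for tok in doc:
    print(tok)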

Run the script.

The Run window inside PyCharm shows the following output from the script:

This
is
the
first
sentence

As you can see, the code takes the sentence, splits it into words (called tokens in natural language processing) and then prints the tokens one per line.
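Tokens are not always whole words: punctuation marks become tokens of their own. The sketch below illustrates this with a sentence that contains punctuation. The helper name token_table is hypothetical; text and is_punct are attributes of spaCy Token objects. The try/except is an assumption about how you might want to handle a missing install: spacy.load raises OSError when it cannot find the pipeline, so the script skips the demo instead of crashing.

```python
def token_table(doc):
    """Return (text, is_punct) pairs for every token in a spaCy Doc."""
    return [(tok.text, tok.is_punct) for tok in doc]


try:
    import spacy

    nlp = spacy.load("en_core_web_sm")
except (ImportError, OSError):
    # spaCy or the en_core_web_sm pipeline is not installed;
    # repeat the install steps above and run the script again.
    nlp = None

if nlp is not None:
    # Each row shows the token text and whether it is punctuation
    for text, is_punct in token_table(nlp("Hello, world!")):
        print(f"{text!r}\tpunctuation={is_punct}")
```

With the pipeline installed, the comma and the exclamation mark should each appear on their own line with punctuation=True.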

