
A Beginner’s guide to the spaCy NLP library

April 18, 2025

A visual tour of spaCy Document objects.

Course Description

The makers of spaCy say this:

“For complex tasks, it’s usually better to train a statistical entity recognition model. However, statistical models require training data, so for many situations, rule-based approaches are more practical. This is especially true at the start of a project: you can use a rule-based approach as part of a data collection process, to help you “bootstrap” a statistical model.

Training a model is useful if you have some examples and you want your system to be able to generalize based on those examples. It works especially well if there are clues in the local context. For instance, if you’re trying to detect person or company names, your application may benefit from a statistical named entity recognition model.

Rule-based systems are a good choice if there’s a more or less finite number of examples that you want to find in the data, or if there’s a very clear, structured pattern you can express with token rules or regular expressions. For instance, country names, IP addresses or URLs are things you might be able to handle well with a purely rule-based approach.”

In other words, even the makers of spaCy recommend relying on rule-based approaches wherever they fit, especially at the start of a project. This is all the more true if you are just beginning to learn spaCy.
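To make the quoted advice concrete, here is a minimal sketch of a purely rule-based extractor for two of the cases the quote mentions: country names (a finite list) and IP addresses (a clear structured pattern). It uses only Python's standard library rather than spaCy itself, and the names COUNTRIES and find_entities, as well as the three-country list, are assumptions made up for this illustration.

```python
import ipaddress
import re

# Hypothetical, deliberately tiny list; a real project would load a full gazetteer.
COUNTRIES = ["France", "Japan", "New Zealand"]

# Finite set of examples: match any listed country name as a whole word or phrase.
COUNTRY = re.compile(r"\b(?:" + "|".join(map(re.escape, COUNTRIES)) + r")\b")

# Structured pattern: four dot-separated groups of 1-3 digits.
# This only finds candidates; validity is checked with the ipaddress module.
IP_CANDIDATE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def find_entities(text):
    """Return (start, end, label, matched_text) tuples sorted by position."""
    found = []
    for match in COUNTRY.finditer(text):
        found.append((match.start(), match.end(), "COUNTRY", match.group()))
    for match in IP_CANDIDATE.finditer(text):
        try:
            ipaddress.IPv4Address(match.group())
        except ValueError:
            continue  # looks like an address but is not valid, e.g. 999.1.1.1
        found.append((match.start(), match.end(), "IP", match.group()))
    return sorted(found)


if __name__ == "__main__":
    sample = "The server at 10.0.0.1 in New Zealand rejected 999.1.1.1."
    for start, end, label, matched in find_entities(sample):
        print(start, end, label, matched)
```

Spans found this way could serve as the first labeled examples in the "bootstrap" data collection step the quote describes, though how well that works will depend on your data.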

In my opinion, rule-based systems are much easier to use once you have a solid understanding of the spaCy Document object, and the visualization technique I explain in this course makes that understanding easy to develop.