spaCy for Natural Language Processing

The document provides an overview of Natural Language Processing (NLP) and its applications, including sentiment analysis and named entity recognition, using the spaCy library. It covers the installation, basic functionalities, and key components of spaCy, such as tokenization, lemmatization, and part-of-speech tagging. Additionally, it introduces the visualizer displaCy for highlighting named entities in text.

Uploaded by

Bhuvnesh Verma


Natural Language Processing (NLP) basics
NATURAL LANGUAGE PROCESSING WITH SPACY

Azadeh Mobasher
Principal Data Scientist
Natural Language Processing (NLP)

A subfield of Artificial Intelligence (AI)

Helps computers to understand human language

Helps extract insights from unstructured data

Incorporates statistics, machine learning models and deep learning models

NLP use cases
Sentiment analysis

Use of computers to determine the underlying subjective tone of a piece of writing


NLP use cases
Named entity recognition (NER)

Locating and classifying named entities mentioned in unstructured text into pre-defined categories

Named entities are real-world objects such as a person or location

NLP use cases

Chatbots

Generate human-like responses to text input, as ChatGPT does


Introduction to spaCy
spaCy is a free, open-source library for NLP in Python which:

Is designed to build systems for information extraction

Provides production-ready code for NLP use cases

Supports 64+ languages

Is robust and fast and has visualization libraries

Install and import spaCy

As the first step, spaCy can be installed using the Python package manager pip:

$ python3 -m pip install spacy

spaCy trained models can be downloaded:

$ python3 -m spacy download en_core_web_sm

Multiple trained models are available for the English language at [Link]. A downloaded model is loaded with spacy.load():

import spacy
nlp = spacy.load("en_core_web_sm")
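Loading a model that has not been downloaded fails, so it can help to check first. The following is a minimal sketch, not part of the original slides: the helper name load_pipeline is hypothetical, and the fallback to a blank English pipeline (tokenizer only) is an assumption made so the snippet runs even without the trained model.

```python
import spacy
from spacy.util import is_package

MODEL_NAME = "en_core_web_sm"


def load_pipeline(name: str = MODEL_NAME):
    """Hypothetical helper: load a trained pipeline if it is installed,
    otherwise fall back to a blank English pipeline (tokenizer only)."""
    if is_package(name):
        return spacy.load(name)
    # Fallback has no tagger, parser, lemmatizer or NER component.
    return spacy.blank("en")


nlp = load_pipeline()
print(spacy.__version__, nlp.lang, nlp.pipe_names)
```

With the blank fallback, only tokenization is available; the linguistic features shown later need the trained model.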


Read and process text with spaCy
The loaded spaCy model (en_core_web_sm) is the nlp object
The nlp object converts text into a Doc object, a container that stores the processed text

spaCy in action
Processing a string using spaCy

import spacy
nlp = spacy.load("en_core_web_sm")
text = "A spaCy pipeline object is created."
doc = nlp(text)

Tokenization
A Token is defined as the smallest meaningful part of the text.

Tokenization: The process of dividing a text into a list of meaningful tokens

print([token.text for token in doc])

['A', 'spaCy', 'pipeline', 'object', 'is', 'created', '.']
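Each token carries more than its text. As a small sketch (using a blank English pipeline as an assumption, since tokenization does not need a trained model), the token index and a few lexical flags can be inspected and used to filter out punctuation:

```python
import spacy

# A blank pipeline contains only the rule-based English tokenizer.
nlp = spacy.blank("en")
doc = nlp("A spaCy pipeline object is created.")

# Print the position, text and two lexical flags for every token.
for token in doc:
    print(token.i, repr(token.text), token.is_alpha, token.is_punct)

# Keep only the non-punctuation tokens.
words = [token.text for token in doc if not token.is_punct]
print(words)
```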

spaCy basics
spaCy NLP pipeline
Import spaCy

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Here's my spaCy pipeline.")

Use spacy.load() to return nlp, an instance of the Language class
The Language object is the text processing pipeline

Apply nlp() on any text to get a Doc container

spaCy NLP pipeline

spaCy applies some processing steps using its Language class:
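The steps a given Language object will run can be listed by name. This is a minimal sketch that assumes a blank English pipeline, which starts with no components after the tokenizer, and adds the rule-based sentencizer for illustration; a trained model such as en_core_web_sm would list its own components instead.

```python
import spacy

nlp = spacy.blank("en")
print(nlp.pipe_names)  # no components yet besides the tokenizer

# Add one built-in component by its registered name.
nlp.add_pipe("sentencizer")

# nlp.pipeline holds (name, component) pairs in processing order.
for name, component in nlp.pipeline:
    print(name, type(component).__name__)
```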


Container objects in spaCy
There are multiple data structures to represent text data in spaCy:

Name    Description
Doc     A container for accessing linguistic annotations of text
Span    A slice from a Doc object
Token   An individual token, i.e. a word, punctuation, whitespace, etc.
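The relationship between the three containers can be sketched with plain indexing and slicing. The example sentence and the blank English pipeline are assumptions chosen to keep the snippet self-contained.

```python
import spacy
from spacy.tokens import Doc, Span, Token

nlp = spacy.blank("en")
doc = nlp("We are learning NLP.")

span = doc[1:3]   # slicing a Doc gives a Span
token = doc[3]    # indexing a Doc gives a Token

print(type(doc).__name__, type(span).__name__, type(token).__name__)
print(span.text, "|", token.text)
print(span.start, span.end)  # token offsets of the slice within the Doc
```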


Pipeline components
The spaCy language processing pipeline always depends on the loaded model and its capabilities.

Component          Name         Description
Tokenizer          Tokenizer    Segment text into tokens and create Doc object
Tagger             Tagger       Assign part-of-speech tags
Lemmatizer         Lemmatizer   Reduce the words to their base forms (lemmas)
EntityRecognizer   NER          Detect and label named entities

Pipeline components

Each component has unique features to process text

Language

DependencyParser

Sentencizer


Tokenization
Always the first operation
All the other operations require tokens

Tokens can be words, numbers and punctuation

import spacy
nlp = spacy.load("en_core_web_sm")

doc = nlp("Tokenization splits a sentence into its tokens.")

print([token.text for token in doc])

['Tokenization', 'splits', 'a', 'sentence', 'into', 'its', 'tokens', '.']

Sentence segmentation
More complex than tokenization
Is a part of the DependencyParser component

import spacy
nlp = spacy.load("en_core_web_sm")

text = "We are learning NLP. This course introduces spaCy."

doc = nlp(text)
# doc.sents yields one Span per detected sentence
for sent in doc.sents:
    print(sent.text)

We are learning NLP.

This course introduces spaCy.
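When no dependency parser is available, the rule-based Sentencizer component listed earlier can set sentence boundaries instead. The sketch below swaps it in on a blank English pipeline; this is an assumption for illustration and not the approach used on the slide, and punctuation-based splitting is generally cruder than parser-based segmentation.

```python
import spacy

nlp = spacy.blank("en")
# The sentencizer splits on sentence-final punctuation, without a parser.
nlp.add_pipe("sentencizer")

doc = nlp("We are learning NLP. This course introduces spaCy.")
sentences = [sent.text for sent in doc.sents]
print(sentences)
```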


Lemmatization
A lemma is the base form of a token
The lemma of eats and ate is eat

Improves accuracy of language models

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("We are seeing her after one year.")
print([(token.text, token.lemma_) for token in doc])

[('We', 'we'), ('are', 'be'), ('seeing', 'see'), ('her', 'she'), ('after', 'after'), ('one', 'one'), ('year', 'year'), ('.', '.')]
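One practical effect of lemmatization is that different surface forms are counted as the same word. The sketch below illustrates this with lemmas assigned by hand when constructing the Doc; that is an assumption made so the snippet runs without a trained model, whereas with en_core_web_sm the lemmatizer component would fill in token.lemma_ itself.

```python
from collections import Counter

import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

words = ["She", "eats", "what", "he", "ate", "."]
# Hand-assigned lemmas standing in for a trained lemmatizer's output.
lemmas = ["she", "eat", "what", "he", "eat", "."]
doc = Doc(nlp.vocab, words=words, lemmas=lemmas)

# Count by lemma so that "eats" and "ate" fall into the same bucket.
lemma_counts = Counter(t.lemma_ for t in doc if not t.is_punct)
print(lemma_counts.most_common(1))
```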

Linguistic features in spaCy
POS tagging
Categorizing words grammatically, based on function and context within a sentence

POS    Description   Example
VERB   Verb          run, eat, ate, take
NOUN   Noun          man, airplane, tree, flower
ADJ    Adjective     big, old, incompatible, conflicting
ADV    Adverb        very, down, there, tomorrow
CONJ   Conjunction   and, or, but

POS tagging with spaCy

POS tagging helps determine the intended meaning of a word

Some words such as watch can be either a noun or a verb

spaCy stores the POS tag of each token in its pos_ attribute

spacy.explain() explains a given POS tag
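As a quick illustration of spacy.explain(), the sketch below looks up descriptions for a handful of POS tags. It needs no trained model because the descriptions come from spaCy's built-in glossary; the particular list of tags is chosen here only for the example.

```python
import spacy

tags = ["PRON", "VERB", "NOUN", "ADP", "PUNCT"]

# Map each tag to its human-readable description.
explanations = {tag: spacy.explain(tag) for tag in tags}

for tag, description in explanations.items():
    print(f"{tag:6} {description}")
```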


POS tagging with spaCy
import spacy
nlp = spacy.load("en_core_web_sm")

verb_sent = "I watch TV."
print([(token.text, token.pos_, spacy.explain(token.pos_))
       for token in nlp(verb_sent)])

[('I', 'PRON', 'pronoun'),
 ('watch', 'VERB', 'verb'),
 ('TV', 'NOUN', 'noun'),
 ('.', 'PUNCT', 'punctuation')]

noun_sent = "I left without my watch."
print([(token.text, token.pos_, spacy.explain(token.pos_))
       for token in nlp(noun_sent)])

[('I', 'PRON', 'pronoun'),
 ('left', 'VERB', 'verb'),
 ('without', 'ADP', 'adposition'),
 ('my', 'PRON', 'pronoun'),
 ('watch', 'NOUN', 'noun'),
 ('.', 'PUNCT', 'punctuation')]

Named entity recognition
A named entity is a word or phrase that refers to a specific entity with a name
Named-entity recognition (NER) classifies named entities into pre-defined categories

Entity type   Description
PERSON        Named person or family
ORG           Companies, institutions, etc.
GPE           Geo-political entity, countries, cities, etc.
LOC           Non-GPE locations, mountain ranges, etc.
DATE          Absolute or relative dates or periods
TIME          Time smaller than a day

NER and spaCy

spaCy models extract named entities using the NER pipeline component

Named entities are available via the doc.ents property

spaCy will also tag each entity with its entity label (.label_)

NER and spaCy

import spacy
nlp = spacy.load("en_core_web_sm")
text = "Albert Einstein was genius."
doc = nlp(text)
print([(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents])

>>> [('Albert Einstein', 0, 15, 'PERSON')]

NER and spaCy
We can also access entity types of each token in a Doc container

import spacy
nlp = spacy.load("en_core_web_sm")
text = "Albert Einstein was genius."
doc = nlp(text)
print([(token.text, token.ent_type_) for token in doc])

>>> [('Albert', 'PERSON'), ('Einstein', 'PERSON'), ('was', ''), ('genius', ''), ('.', '')]
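The same doc.ents and token.ent_type_ interface is filled in regardless of which component sets the entities. As a sketch, the example below swaps the trained NER component for spaCy's rule-based entity_ruler on a blank pipeline; the single hand-written pattern is an assumption for illustration, and a rule like this only finds exact matches rather than generalizing as a statistical model does.

```python
import spacy

nlp = spacy.blank("en")

# Rule-based stand-in for the trained NER component.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "PERSON", "pattern": "Albert Einstein"}])

doc = nlp("Albert Einstein was genius.")

# Entity spans with character offsets and labels.
spans = [(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
# Per-token entity types; tokens outside any entity have an empty string.
token_types = [(token.text, token.ent_type_) for token in doc]

print(spans)
print(token_types)
```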


displaCy
spaCy is equipped with a modern visualizer: displaCy
The displaCy entity visualizer highlights named entities and their labels

import spacy
from spacy import displacy

text = "Albert Einstein was genius."
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
# displacy.serve() starts a local web server so the result can be viewed in a browser
displacy.serve(doc, style="ent")
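Outside a browser session, the highlighted markup can also be obtained as a string with displacy.render(). The sketch below uses displaCy's manual mode, where the entity offsets are supplied by hand in a dictionary; this is an assumption made so the snippet runs without a trained model, and passing a processed Doc instead would work the same way without manual=True.

```python
from spacy import displacy

# Hand-written entity annotation in the format expected by manual mode.
example = {
    "text": "Albert Einstein was genius.",
    "ents": [{"start": 0, "end": 15, "label": "PERSON"}],
    "title": None,
}

# jupyter=False forces the function to return the HTML instead of displaying it.
html = displacy.render(example, style="ent", manual=True, jupyter=False)
print(html[:80])
```

The returned string can be written to an .html file and opened in any browser.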

