NLP Phase - 01

Natural Language Processing (NLP) is a branch of Artificial Intelligence that enables computers to understand and interact with human language. It has various applications, including chatbots, email spam detection, and sentiment analysis, utilizing different approaches such as rule-based, statistical, and deep learning methods. The NLP pipeline involves stages like text acquisition, preprocessing, feature extraction, modeling, evaluation, and deployment.

Uploaded by

shrutishukla2235

NLP

Natural Language Processing

TABLE OF CONTENTS
Introduction
Applications
Approaches
Pipeline
INTRODUCTION
Natural Language Processing (NLP) is a part of Artificial Intelligence
that helps computers understand human language. It’s about teaching
machines to read, listen, and talk like people do.

NEED OF NLP: Humans talk in languages like Hindi, English, and Tamil, but computers
only understand numbers (0 and 1). To bridge this gap we need NLP, so that
computers can read and understand our language.
REAL WORLD APPLICATIONS

Chatbots and Virtual Assistants
They use NLP to understand questions and give answers.
Example: Siri, Alexa, or Copilot replying to your queries in natural language.

Email Spam Detection
NLP helps filter unwanted or harmful emails.
It looks at words, patterns, and sender info to decide if an email is spam.

Sentiment Analysis
NLP checks text to find emotions or opinions.
Example: Analyzing customer reviews to see if they are positive, negative, or neutral.
APPROACHES OF NLP
01
Rule-Based Approach:
How it works: Uses manually written grammar rules, dictionaries, and pattern-matching to process language.
Example: A chatbot that replies based on fixed “if–then” rules.

Drawbacks:
Hard to scale (too many rules needed for complex language).
Not flexible: fails when input doesn’t match predefined rules.
Maintenance is difficult as language evolves.
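
A fixed "if–then" chatbot of the kind described above might look like the following minimal sketch. The rule table, the replies, and the function name `reply` are made up for illustration and are not taken from any particular system.

```python
import re

# Hypothetical rule table: each entry pairs a regex pattern with a fixed reply.
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE), "Hello! How can I help you?"),
    (re.compile(r"\b(hours|open|close)\b", re.IGNORECASE), "We are open from 9 am to 5 pm."),
    (re.compile(r"\b(bye|goodbye)\b", re.IGNORECASE), "Goodbye!"),
]
FALLBACK = "Sorry, I did not understand that."


def reply(message):
    """Return the reply of the first rule whose pattern matches the message."""
    for pattern, response in RULES:
        if pattern.search(message):
            return response
    # No rule matched: the bot has nothing to fall back on except a canned line.
    return FALLBACK


print(reply("Hello there"))
print(reply("When do you close?"))
print(reply("What's the weather like?"))
```

The last call falls through to the fallback, which illustrates the flexibility drawback: any input outside the predefined patterns needs yet another hand-written rule.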

02
Statistical / Machine Learning Approach:
How it works: Uses probability and statistical models
trained on large text datasets.
Example: Naive Bayes classifier for spam detection.

Drawbacks:
Needs a lot of labeled data for good accuracy.
Struggles with rare words or unseen phrases.
Often ignores deeper meaning (focuses on word
frequency, not context).
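
To make the Naive Bayes example concrete, here is a small from-scratch sketch in plain Python. The four training messages are invented, the names `train` and `predict` are hypothetical, and add-one (Laplace) smoothing is an assumed choice for handling unseen words; a real spam filter would train on far more data.

```python
import math
from collections import Counter, defaultdict

# Tiny made-up training set (illustrative only).
TRAIN = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("project report attached", "ham"),
]


def train(examples):
    """Count words per class and documents per class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def predict(text, word_counts, class_counts, vocab):
    """Pick the class with the highest log-probability."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Prior: fraction of training documents with this label.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the probability.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict("free prize inside", *model))  # spam
print(predict("report at noon", *model))     # ham
```

The classifier only multiplies per-word frequencies, so word order and context play no role, which is the "ignores deeper meaning" drawback listed above.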
03
Deep Learning-Based Approach:
How it works: Uses neural networks (RNNs, CNNs, Transformers) to learn complex patterns and context in language.
Example: BERT, GPT models for translation, chatbots, summarization.

Drawbacks:
Requires massive data and computing power.
Can be a “black box”: hard to explain why it gives certain results.
Risk of bias if training data is biased.

Early systems used rule-based sentiment lexicons.

Then came ML classifiers (Naive Bayes, SVM with Bag of Words/TF-IDF).
Now, deep learning with embeddings and transformers dominates.
PIPELINE
1. Text Acquisition
Collect raw text data (documents, tweets, emails, speech converted to
text).

2. Text Preprocessing
Tokenization: Split text into words or sentences.
Normalization: Lowercasing, removing punctuation, handling contractions.
Stopword Removal: Filter out common words like the, is, and.
Stemming/Lemmatization: Reduce words to their root form (running →
run).
Noise Removal: Clean HTML tags, special symbols, etc.
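
The tokenization, stopword removal, and stemming steps can be sketched in a few lines of plain Python. The stopword list is a small illustrative subset, and `crude_stem` is a toy suffix stripper assumed here for demonstration; it is not a real stemming algorithm such as Porter's.

```python
import re

STOPWORDS = {"the", "is", "and", "a", "an", "of", "to", "in"}  # small illustrative list


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    """Strip one common suffix; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo consonant doubling, e.g. "runn" -> "run".
            if (suffix != "s" and stem[-1] == stem[-2]
                    and stem[-1] not in "aeiou"):
                stem = stem[:-1]
            return stem
    return word


def preprocess(text):
    """Tokenize, drop stopwords, then stem what is left."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOPWORDS]


print(preprocess("The dog is running and jumped in the park"))
# ['dog', 'run', 'jump', 'park']
```

Lemmatization would instead look words up in a dictionary to return a valid base form, which this sketch does not attempt.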

3. Feature Extraction
Convert text into numerical form for algorithms.
Methods: Bag of Words, TF‑IDF, Word Embeddings (Word2Vec, GloVe,
BERT).
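
As a rough illustration of Bag of Words and TF-IDF, the sketch below builds both vectors by hand for three invented sentences. It assumes one common TF-IDF variant (tf = count / document length, idf = log(N / df)); libraries often use smoothed or normalized variants, so their numbers will differ.

```python
import math
from collections import Counter

# Three made-up documents (illustrative only).
docs = [
    "the movie was great",
    "the movie was terrible",
    "great acting and great story",
]

tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})


def bag_of_words(tokens):
    """Count vector: one entry per vocabulary word."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def tf_idf(tokens):
    """TF-IDF vector with tf = count / len(doc) and idf = log(N / df)."""
    counts = Counter(tokens)
    n_docs = len(tokenized)
    vector = []
    for w in vocab:
        df = sum(1 for doc in tokenized if w in doc)  # documents containing w
        tf = counts[w] / len(tokens)
        vector.append(tf * math.log(n_docs / df))
    return vector


print(vocab)
print(bag_of_words(tokenized[2]))  # [1, 1, 2, 0, 1, 0, 0, 0]
print([round(x, 3) for x in tf_idf(tokenized[1])])
```

In the second document, "terrible" gets a higher TF-IDF weight than "the" because it appears in fewer documents, which is the point of the inverse document frequency term.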
4. Modeling / Learning
Apply algorithms to learn patterns.
Approaches:
Rule‑based (simple patterns).
Statistical ML (Naive Bayes, SVM).
Deep Learning (RNNs, Transformers like BERT, GPT).

5. Evaluation
Measure performance using metrics like accuracy, precision, recall, F1‑score.
Ensures the model works well on unseen data.
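
The four metrics can be computed directly from the counts of true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is the harmonic mean of the two. The sketch below assumes a binary task with "spam" as the positive class; the labels and the function name `evaluate` are made up for illustration.

```python
def evaluate(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for eight held-out emails.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]
print(evaluate(y_true, y_pred))
```

Here accuracy is 0.75 while precision, recall, and F1 are each 2/3, which shows why accuracy alone can look better than the model's performance on the class of interest.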

6. Deployment
Integrate into applications (chatbots, sentiment analysis tools, search engines).
Monitor and update with new data.
1 TEXT GATHERING
01
Public Datasets (e.g., Kaggle)
What it is: Ready-made datasets shared by researchers, organizations, or communities. Kaggle is one of the most popular platforms.
Advantages:
Free and easily accessible.
Often cleaned, preprocessed, and labeled (e.g., sentiment datasets, spam detection datasets).
Saves time compared to raw data collection.
Limitations:
May not perfectly fit your specific problem.
Risk of overfitting if the dataset is too small or outdated.
Example: Using Kaggle’s "IMDB Movie Reviews" dataset for sentiment analysis.

02
APIs (Application Programming Interfaces)
What it is: Structured access to data provided by platforms (Twitter API, Reddit API, News API, etc.).
Advantages:
Real-time or regularly updated data.
Structured format (JSON/XML), making parsing easier.
Often includes metadata (timestamps, user info, etc.).
Limitations:
Rate limits (restricted number of requests).
Requires authentication (API keys).
Sometimes paid access for large-scale usage.
Example: Collecting tweets via Twitter API for hate speech detection.

03
Web Scraping
What it is: Extracting text directly from websites using tools like BeautifulSoup, Scrapy, or Selenium.
Advantages:
Flexible: you can target any website with textual content.
Useful when APIs are unavailable or limited.
Limitations:
Legal/ethical concerns (must respect [Link] and site policies).
HTML parsing can be messy (ads, navigation text, duplicates).
Requires more preprocessing to clean raw text.
Example: Scraping product reviews from e-commerce sites for sentiment analysis.
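
To show why scraped HTML needs cleanup, here is a minimal sketch that pulls visible text out of a page while skipping navigation and script content. It uses Python's standard `html.parser` module rather than BeautifulSoup so it runs without extra installs. The sample page, the class name `TextExtractor`, and the set of skipped tags are assumptions for illustration, and the page is assumed to be already downloaded.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping tags that rarely hold real content."""

    SKIP = {"script", "style", "nav"}  # assumed noise tags for this example

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def extract_text(html):
    """Return the visible text of an HTML string as one line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Made-up page fragment with navigation and tracking noise around one review.
SAMPLE = """
<html><body>
<nav>Home | Deals | Cart</nav>
<div class="review"><p>Great phone, <b>battery</b> lasts two days.</p></div>
<script>trackVisit();</script>
</body></html>
"""
print(extract_text(SAMPLE))  # Great phone, battery lasts two days.
```

Real pages vary widely, so the list of tags to skip usually has to be tuned per site.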
2 TEXT CLEANING
LOWERCASING
REMOVING PUNCTUATION
REMOVING NUMBERS
REMOVING URLS / LINKS
REMOVING HTML TAGS
REMOVING EMOJIS / SPECIAL CHARACTERS
REMOVING STOPWORDS
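
The cleaning steps listed above could be chained as in the following sketch. The order of the steps, the regular expressions, and the tiny stopword list are assumptions for illustration. The last regex keeps only the letters a to z, so it suits English text only and would also strip Hindi or Tamil characters.

```python
import re
import string

STOPWORDS = {"the", "is", "and", "a", "to", "at"}  # small illustrative list


def clean_text(text):
    """Apply the listed cleaning steps and return a space-joined string."""
    text = text.lower()                                 # lowercasing
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs / links
    text = re.sub(r"<[^>]+>", " ", text)                # HTML tags
    text = re.sub(r"\d+", " ", text)                    # numbers
    text = text.translate(
        str.maketrans("", "", string.punctuation))      # punctuation
    text = re.sub(r"[^a-z\s]", " ", text)               # emojis / special characters
    tokens = [t for t in text.split() if t not in STOPWORDS]  # stopwords
    return " ".join(tokens)


raw = "<p>Check https://example.com NOW!!! The price is 50% off 😀</p>"
print(clean_text(raw))  # check now price off
```

URLs are removed before punctuation on purpose: stripping punctuation first would break a link into fragments that the URL pattern no longer matches.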
THANK
YOU
Presented by Group - 03
