NLP: Natural Language Processing
TOPIC: Nykaa product reviews
Group-1
NLP
Natural language processing (NLP) is a field that focuses on
making natural human language usable by computer programs.
NLTK, or Natural Language Toolkit, is a Python package that you
can use for NLP.
Much of the data that you might analyze is unstructured and
consists of human-readable text.
Since machines can work only with numeric (ultimately binary)
values, we use NLP techniques to convert text into vectors.
LIBRARIES USED
import pandas as pd
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, LancasterStemmer, SnowballStemmer
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer
from sys import getsizeof
from sklearn.feature_extraction.text import TfidfVectorizer
import re
import warnings
warnings.filterwarnings('ignore')
Nykaa product reviews DataFrame
Number of data points = 61284
Dependent and independent columns
Independent text column: Review text
DataFrame after removing rows with NaN values
Number of data points = 61276
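A minimal sketch of the NaN-removal step, using pandas as in the imports above. The tiny DataFrame and the column names `review_text` and `review_rating` are assumptions made for illustration; they are not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical miniature of the reviews table; column names are assumed.
reviews = pd.DataFrame({
    "review_text": ["Great lipstick", None, "Smells odd", "Okay for the price"],
    "review_rating": [5, 4, None, 3],
})
print("Number of data points =", len(reviews))

# Drop every row that has NaN in either column, then renumber the index.
cleaned = reviews.dropna(subset=["review_text", "review_rating"]).reset_index(drop=True)
print("Number of data points =", len(cleaned))
```

Comparing the row count before and after `dropna` is a quick way to confirm how many rows were removed.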
Steps involved in Text Pre-Processing
1. Convert text into lowercase or uppercase.
2. Demojize the emojis.
3. Remove the HTML tags.
4. Remove the web page links (Hypertext Transfer Protocol Secure).
5. Remove special characters except for space.
6. Remove stop words.
7. Apply stemming and lemmatization.
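Steps 1 to 6 can be sketched with the standard library alone. This is an illustration under stated assumptions, not the group's actual code: the function name `clean_review`, the one-entry emoji table, and the short stop-word set are hypothetical stand-ins for a full emoji library and NLTK's English stop-word list. Step 7 is left out here because it relies on NLTK's stemmers and the WordNet lemmatizer.

```python
import re

# Stand-in lookup tables (assumptions); a real pipeline would use fuller resources.
EMOJI_NAMES = {"👍": ":thumbs_up:"}
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "for", "to", "of"}

def clean_review(text: str) -> str:
    text = text.lower()                                   # 1. lowercase
    for symbol, name in EMOJI_NAMES.items():              # 2. emojis -> names
        text = text.replace(symbol, name)
    text = re.sub(r"<[^>]+>", " ", text)                  # 3. HTML tags
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)    # 4. web links
    text = re.sub(r"[^a-z0-9\s]", " ", text)              # 5. special characters
    tokens = [t for t in text.split() if t not in STOP_WORDS]  # 6. stop words
    return " ".join(tokens)

sample = "This is <b>AMAZING</b> 👍!! See https://example.com/p?id=1 for the shade."
print(clean_review(sample))
```

The order matters: tags and links are removed before special characters, because stripping punctuation first would break the patterns that recognize them.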
Text pre-processing
Cleaned review
Converting text to vectors using Bag of Words
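A small sketch of Bag of Words with `CountVectorizer` from the imports above. The three sample reviews are invented for illustration. Each column is one vocabulary word and each cell is the number of times that word occurs in the review.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical cleaned reviews, used only for illustration.
cleaned_reviews = [
    "good product good price",
    "bad product",
    "good shade",
]

vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(cleaned_reviews)  # sparse document-term matrix

# Vocabulary words ordered by their column index in the matrix.
vocabulary = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
print(vocabulary)
print(bow.toarray())
```

The result is stored as a sparse matrix, which keeps memory use low when the vocabulary is large and most counts are zero.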
Converting text to vectors using TF-IDF
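A comparable sketch for TF-IDF with `TfidfVectorizer`, again on invented sample reviews. With scikit-learn's default settings, words that appear in many reviews get a lower inverse document frequency (IDF) weight, and each review vector is scaled to unit length.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical cleaned reviews, used only for illustration.
cleaned_reviews = [
    "good product good price",
    "bad product",
    "good shade",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(cleaned_reviews)  # sparse TF-IDF matrix

# Pair each vocabulary word (in column order) with its learned IDF weight.
vocabulary = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
idf_by_word = dict(zip(vocabulary, vectorizer.idf_.tolist()))
for word, idf in idf_by_word.items():
    print(f"{word:10s} idf={idf:.3f}")

print(tfidf.toarray().round(3))
```

Here "product" occurs in two of the three reviews, so it receives a lower IDF weight than "price", which occurs in only one.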
Vector conversion of Review Rating Column
Steps involved in converting numbers into vectors.
Handle NaNs by removing those rows
Separate the ratings that are >3 from those that are <3 and
label them as positive or negative reviews
Apply one-hot encoding
One-Hot Encoding on Rating column
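The rating steps above can be sketched with pandas as follows. The column name `review_rating` and the sample values are assumptions. The slides do not say what happens to a rating of exactly 3, so this sketch drops those rows; that choice is an assumption too.

```python
import pandas as pd

# Hypothetical ratings; the column name is assumed.
ratings = pd.DataFrame({"review_rating": [5, 1, 3, 4, None, 2]})

# Step 1: remove rows whose rating is NaN.
ratings = ratings.dropna(subset=["review_rating"])

# Step 2: keep ratings above or below 3 (rating 3 is dropped by assumption)
# and label them as positive or negative.
ratings = ratings[ratings["review_rating"] != 3].copy()
ratings["sentiment"] = ratings["review_rating"].apply(
    lambda r: "positive" if r > 3 else "negative"
)

# Step 3: one-hot encode the sentiment label into 0/1 columns.
one_hot = pd.get_dummies(ratings["sentiment"], dtype=int)
print(one_hot)
```

Each remaining review ends up with a 1 in exactly one of the `negative` and `positive` columns.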
GitHub Links
Name                 GitHub Link
Gogula Vinay         [Link] arning_NLP_Assingnment-[Link]
Dhanya Palacharla    [Link] n%20nykaa%[Link]
Sahithi Chowdary     [Link] ews-NLP-
Vijay Bhaskar        [Link]
Suresh Gurrala       [Link]