Data Science | 30 Days of Machine Learning | Day - 15
Educator Name: Nishant Dhote
Support Team: +91-7880-113-112
----Today's Topics | Day 15----
Feature Engineering (Missing Value Imputation)
----
- KNN Imputer
- K-Nearest Neighbour Calculation Method
- What is Euclidean Distance in Machine Learning?
- How to find the K nearest neighbours?
- How to find the imputed value for a missing entry?
Dataset Link GitHub: [Link]
Feature Engineering
- Feature Transformation
  - Missing Value Imputation
  - Handling Categorical Variables
  - Outlier Detection
  - Feature Scaling
- Feature Construction
- Feature Selection
- Feature Extraction
Install our IHHPET Android App: [Link]
Contact : +91-7880-113-112 | Visit Website: [Link]
Today’s Topics:
Missing Values
- Remove Data
- Impute Values
  - Univariate Imputation
    - Numerical Data: Mean/Median Imputer, Arbitrary Values, End of Distribution, Random Sampling
    - Categorical Data: Mode (Most Frequent), Missing Value (treated as its own category)
  - Multivariate Imputation
    - KNN Imputer (today's class)
    - Iterative Imputation
  - Missing Indicator
  - Automatic selection of the imputation value
- KNN Imputer: KNNImputer is a scikit-learn class used to fill in (impute)
the missing values in a dataset. Instead of the naive approach of filling
every missing value with the column mean or median, it builds on the idea
behind the KNN algorithm: for each missing entry, it looks for the K rows
closest to the incomplete row among the rows where that feature is present.
K is the number of neighbours (the n_neighbors parameter). The missing value
is then predicted as the mean of that feature over the K neighbours (uniform
weights by default; weights='distance' gives closer neighbours more influence).
Row   Feature 1   Feature 2   Feature 3   Feature 4
1     28          --          48          22
2     --          40          37          24
3     34          22          55          26
4     26          --          30          --
5     50          20          49          --
(-- = missing value)
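A minimal sketch of the scikit-learn class applied to the table above, assuming each "--" cell is loaded as np.nan and choosing n_neighbors=2 purely for illustration (the library default is 5):

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# The slide's table; np.nan marks each "--" (missing) cell
df = pd.DataFrame(
    {
        "Feature 1": [28, np.nan, 34, 26, 50],
        "Feature 2": [np.nan, 40, 22, np.nan, 20],
        "Feature 3": [48, 37, 55, 30, 49],
        "Feature 4": [22, 24, 26, np.nan, np.nan],
    },
    index=[1, 2, 3, 4, 5],
)

# K = 2 is an illustrative choice, not a recommendation
imputer = KNNImputer(n_neighbors=2)

# fit_transform returns a NumPy array; wrap it back into a DataFrame
filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns, index=df.index)
print(filled)
```

Values that were already present are left unchanged; only the missing cells are filled.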
K-Nearest Neighbour: K is the number of nearest neighbours used to predict
or classify a new, unknown data point. In KNN imputation, these K neighbours
supply the value for a missing entry.
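To see what K changes, the sketch below reuses the slide's table (loaded with np.nan for the missing cells, an assumption for illustration) and reads back only the imputed value for Row 2, Feature 1 for K = 1 to 4. The name `results` is a hypothetical container for the collected values:

```python
import numpy as np
from sklearn.impute import KNNImputer

# The slide's table, with np.nan for each "--" cell
X = np.array([
    [28, np.nan, 48, 22],
    [np.nan, 40, 37, 24],
    [34, 22, 55, 26],
    [26, np.nan, 30, np.nan],
    [50, 20, 49, np.nan],
])

results = {}
for k in (1, 2, 3, 4):
    # Row 2, Feature 1 sits at index [1, 0]
    results[k] = KNNImputer(n_neighbors=k).fit_transform(X)[1, 0]
    print(f"K = {k}: imputed value = {results[k]:.2f}")
```

With this data the imputed values are 26, 27, 29.33 and 34.5: a small K follows the closest rows, while a larger K averages over more, and less similar, rows.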
Reference: sklearn.metrics.pairwise.nan_euclidean_distances, scikit-learn documentation ([Link])
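dist(x, y) = sqrt(weight * sum of squared differences over the coordinates present in both x and y), where
weight = total number of coordinates / number of coordinates present in both x and y
A coordinate that is missing in either row is skipped, and the weight scales the result up to make up for the skipped coordinates.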
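The formula can be written out directly. This is a from-scratch sketch (the helper `nan_euclidean` is a hypothetical name, not a library function) checked against scikit-learn's `nan_euclidean_distances` on Row 1 and Row 2 of the table:

```python
import numpy as np
from sklearn.metrics.pairwise import nan_euclidean_distances


def nan_euclidean(x, y):
    """nan-Euclidean distance between two 1-D vectors (np.nan = missing)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    present = ~np.isnan(x) & ~np.isnan(y)  # coordinates present in both
    n_present = int(present.sum())
    if n_present == 0:
        return float("nan")  # nothing to compare
    weight = x.size / n_present  # total coordinates / present coordinates
    sq_dist = np.sum((x[present] - y[present]) ** 2)
    return float(np.sqrt(weight * sq_dist))


row1 = [28, np.nan, 48, 22]
row2 = [np.nan, 40, 37, 24]

manual = nan_euclidean(row1, row2)
library = nan_euclidean_distances([row1], [row2])[0, 0]
print(round(manual, 2), round(library, 2))
```

Both values are sqrt(2 * 125) = sqrt(250), about 15.81.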
What is Euclidean Distance in Machine Learning?
Euclidean distance is the default distance metric in many machine learning
algorithms and measures how similar two observations are: the smaller the
distance, the more similar they are. It is the square root of the sum of the
squared differences between the feature values, so the features being
compared must be numeric, e.g. weight, height or salary.
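A short sketch of the plain Euclidean distance, using two hypothetical observations (weight, height, salary) with no missing values:

```python
import math

import numpy as np

# Two hypothetical observations: weight (kg), height (cm), salary (thousands)
a = np.array([70.0, 175.0, 50.0])
b = np.array([65.0, 170.0, 55.0])

# Square root of the sum of squared differences
dist_numpy = float(np.sqrt(np.sum((a - b) ** 2)))

# The standard library gives the same value (Python 3.8+)
dist_math = math.dist(a.tolist(), b.tolist())
print(round(dist_numpy, 3), round(dist_math, 3))
```

Each feature differs by 5, so the distance is sqrt(25 + 25 + 25) = sqrt(75), about 8.66.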
We follow two steps:
1. Find the K nearest neighbours.
2. Compute the missing value from them.
Example 1: Calculation between Row 1 and Row 2
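Row 1 = [28, --, 48, 22] and Row 2 = [--, 40, 37, 24]
Coordinates present in both rows: Feature 3 and Feature 4
Sum of squared differences = (48 - 37)^2 + (22 - 24)^2 = 121 + 4 = 125
weight = 4 / 2 = 2
dist(Row 1, Row 2) = sqrt(2 * 125) = sqrt(250) ≈ 15.81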
Example 2: Calculation between Row 2 and Row 3
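Row 2 = [--, 40, 37, 24] and Row 3 = [34, 22, 55, 26]
Coordinates present in both rows: Feature 2, Feature 3 and Feature 4
Sum of squared differences = (40 - 22)^2 + (37 - 55)^2 + (24 - 26)^2 = 324 + 324 + 4 = 652
weight = 4 / 3
dist(Row 2, Row 3) = sqrt(4/3 * 652) = sqrt(869.33) ≈ 29.48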
Example 3: Calculation between Row 2 and Row 4
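Row 2 = [--, 40, 37, 24] and Row 4 = [26, --, 30, --]
Coordinates present in both rows: Feature 3 only
Sum of squared differences = (37 - 30)^2 = 49
weight = 4 / 1 = 4
dist(Row 2, Row 4) = sqrt(4 * 49) = sqrt(196) = 14.00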
Example 4: Calculation between Row 2 and Row 5
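Row 2 = [--, 40, 37, 24] and Row 5 = [50, 20, 49, --]
Coordinates present in both rows: Feature 2 and Feature 3
Sum of squared differences = (40 - 20)^2 + (37 - 49)^2 = 400 + 144 = 544
weight = 4 / 2 = 2
dist(Row 2, Row 5) = sqrt(2 * 544) = sqrt(1088) ≈ 32.98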
All four nan-Euclidean distances from Row 2:
Row 1: 15.81
Row 3: 29.48
Row 4: 14.00
Row 5: 32.98
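The four distances can be reproduced in one call with scikit-learn's `nan_euclidean_distances`, assuming the table is loaded with np.nan for the missing cells:

```python
import numpy as np
from sklearn.metrics.pairwise import nan_euclidean_distances

# The slide's table, with np.nan for each "--" cell
X = np.array([
    [28, np.nan, 48, 22],
    [np.nan, 40, 37, 24],
    [34, 22, 55, 26],
    [26, np.nan, 30, np.nan],
    [50, 20, 49, np.nan],
])

# One row of distances: Row 2 (index 1) against every row of X
dists = nan_euclidean_distances(X[[1]], X)[0]
for row_number, d in enumerate(dists, start=1):
    if row_number != 2:
        print(f"Row 2 to Row {row_number}: {d:.2f}")
```

This prints 15.81, 29.48, 14.00 and 32.98 for Rows 1, 3, 4 and 5.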
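Applying the two steps to the missing Feature 1 value in Row 2:
1. Find the K nearest neighbours: Rows 1, 3, 4 and 5 all have Feature 1, so they are the candidates. Sorted by distance: Row 4 (14.00), Row 1 (15.81), Row 3 (29.48), Row 5 (32.98). Taking K = 2 as an example, the nearest neighbours are Row 4 and Row 1.
2. Find the value: the mean of their Feature 1 values, (26 + 28) / 2 = 27, fills the missing cell.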
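A from-scratch sketch of the two steps for a single cell. The function `knn_impute_cell` is a hypothetical helper, not part of scikit-learn; like KNNImputer's default, it takes a plain (unweighted) mean of the neighbours' values:

```python
import numpy as np


def knn_impute_cell(X, row, col, k):
    """Impute X[row, col] as the mean of `col` over the k nearest donor rows.

    Donors are the other rows where `col` is present; distance is the
    nan-Euclidean distance (missing coordinates skipped, then rescaled).
    """
    X = np.asarray(X, dtype=float)
    target = X[row]
    candidates = []  # (distance, donor value) pairs
    for i, other in enumerate(X):
        if i == row or np.isnan(other[col]):
            continue  # a donor must have the value we want to borrow
        present = ~np.isnan(target) & ~np.isnan(other)
        if not present.any():
            continue  # no shared coordinates, so the distance is undefined
        weight = X.shape[1] / present.sum()
        dist = np.sqrt(weight * np.sum((target[present] - other[present]) ** 2))
        candidates.append((dist, other[col]))
    # Step 1: keep the k nearest donors
    nearest = sorted(candidates)[:k]
    # Step 2: the imputed value is the mean of their values in `col`
    return float(np.mean([value for _, value in nearest]))


# The slide's table, with np.nan for each "--" cell
X = [
    [28, np.nan, 48, 22],
    [np.nan, 40, 37, 24],
    [34, 22, 55, 26],
    [26, np.nan, 30, np.nan],
    [50, 20, 49, np.nan],
]

value = knn_impute_cell(X, row=1, col=0, k=2)  # Row 2, Feature 1
print(value)
```

With K = 2 this prints 27.0, the mean of Row 4 (26) and Row 1 (28).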
<Start-Coding>
#Import Library
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
#Import Dataset (see the Dataset Link above) and keep three features plus the target
df = pd.read_csv('[Link]')[['Age','Pclass','Fare','Survived']]
df.head(10)
#Check Missing Value (percentage of missing entries in each column)
df.isnull().mean() * 100
#Define X & Y
X = df.drop(columns=['Survived'])
y = df['Survived']
#Train Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)
X_train
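#Apply KNN Imputer (fit on the training set only, then reuse it on the test set)
#With n_neighbors=1 there is a single neighbour, so weights='distance' has no effect
knn = KNNImputer(n_neighbors=1, weights='distance')
X_train_trf = knn.fit_transform(X_train)
X_test_trf = knn.transform(X_test)
#Convert back to a DataFrame to inspect the imputed values
pd.DataFrame(X_train_trf, columns=X_train.columns)
#Apply Logistic Regression
lr = LogisticRegression()
lr.fit(X_train_trf, y_train)
y_pred = lr.predict(X_test_trf)
accuracy_score(y_test, y_pred)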
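Because the dataset file is not reproduced here, the sketch below swaps in a synthetic dataset from `make_classification` (an assumption, not the course data) with about 20% of one feature removed, and compares KNNImputer with mean imputation through the already-imported SimpleImputer. The column names f1 to f3 and the dictionary `scores` are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the course data: 3 numeric features, binary target
X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, random_state=2)
X = pd.DataFrame(X, columns=["f1", "f2", "f3"])

# Remove roughly 20% of the values in f1 to simulate missing data
rng = np.random.default_rng(2)
X.loc[rng.random(len(X)) < 0.2, "f1"] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2)

scores = {}
for name, imputer in [("knn", KNNImputer(n_neighbors=5)),
                      ("mean", SimpleImputer(strategy="mean"))]:
    # Fit the imputer on the training split only, then reuse it on the test split
    X_train_trf = imputer.fit_transform(X_train)
    X_test_trf = imputer.transform(X_test)
    lr = LogisticRegression()
    lr.fit(X_train_trf, y_train)
    scores[name] = accuracy_score(y_test, lr.predict(X_test_trf))

print(scores)
```

The exact accuracies depend on the generated data; the point is the pattern of fitting each imputer on the training split and only transforming the test split, so both strategies are compared on equal terms.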
Day 15: Curious Data Minds
What is the role of machine learning in the healthcare sector?
Read Blog: [Link]