Data Normalization and Clustering Guide

The document outlines a data normalization and clustering process using the Standard Scaler and Agglomerative Clustering. It details the creation of a synthetic customer dataset, the normalization of features, and the clustering of customers into four groups based on selected features. Additionally, it includes steps for visualizing the data distributions and cluster characteristics through various plots.

Uploaded by

skandapmwork2003

Data Normalization (Standard Scaler)

Before clustering, it is important to normalize the data, especially when features are on
different scales. For example, age might range from 18 to 70, whereas monthly_spending could
range from 50 to 500. Without normalization, the feature with the larger range dominates the
distance calculations.
StandardScaler standardizes each feature by subtracting its mean and dividing by its standard
deviation, so every feature ends up with zero mean and unit variance and contributes comparably.
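To make the standardization step concrete, here is a minimal sketch in plain Python. The sample values and the helper name `standardize` are hypothetical and chosen only for illustration; the helper uses the population standard deviation, which matches StandardScaler's default behavior.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)  # population std (ddof=0), as StandardScaler uses
    return [(x - mu) / sigma for x in values]


# Hypothetical sample values on very different scales
ages = [18, 30, 44, 70]
spending = [50, 120, 300, 500]

z_ages = standardize(ages)
z_spending = standardize(spending)

# After standardization both features have mean 0 and standard deviation 1
print([round(z, 2) for z in z_ages])
print([round(z, 2) for z in z_spending])
```

Once both features are on this common scale, a one-unit difference means "one standard deviation" for either of them, so neither dominates a Euclidean distance.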

Clustering Process
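Feature Selection: Only a subset of the scaled features (age, tenure, monthly_spending) is
used for clustering; num_products is kept for profiling the resulting clusters.
Agglomerative Clustering:
Uses linkage='ward', which at each step merges the two clusters whose union gives the smallest
increase in within-cluster variance.
The n_clusters=4 parameter specifies that the data will be grouped into 4 clusters.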
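The Ward criterion can be illustrated with a small sketch in plain Python. The point sets and the helper names (`centroid`, `sse`, `ward_cost`) are hypothetical. The sketch shows only the merge cost, not the full algorithm: the cost equals the increase in the within-cluster sum of squared errors (SSE) caused by a merge.

```python
def centroid(points):
    """Mean position of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sum((p[d] - c[d]) ** 2 for d in range(len(c))) for p in points)


def ward_cost(a, b):
    """Increase in within-cluster SSE caused by merging clusters a and b."""
    ca, cb = centroid(a), centroid(b)
    dist_sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist_sq


# Hypothetical clusters: a and b are close together, c is far away
a = [[0.0, 0.0], [0.0, 2.0]]
b = [[1.0, 1.0], [1.0, 3.0]]
c = [[8.0, 8.0], [9.0, 9.0]]

# The closed-form cost matches the directly computed SSE increase
print(ward_cost(a, b), sse(a + b) - sse(a) - sse(b))

# Ward linkage would merge a with b before merging either with c
print(ward_cost(a, b) < ward_cost(a, c))
```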

Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering
import scipy.cluster.hierarchy as sch

# Step 1: Data Aggregation

# 1.1 Create a synthetic customer dataset with random values
np.random.seed(42)
n = 500 # Number of customers

# Features: age, tenure, monthly_spending, number of products

data = {
    'age': np.random.randint(18, 70, size=n),  # Age from 18 to 69 (upper bound is exclusive)
    'tenure': np.random.randint(1, 10, size=n),  # Tenure from 1 to 9 years
    'monthly_spending': np.random.randint(50, 500, size=n),  # Monthly spending from 50 to 499
    'num_products': np.random.randint(1, 6, size=n)  # Number of products from 1 to 5
}

df = pd.DataFrame(data)

# 1.2 Introduce missing values in 'monthly_spending' column

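# Cast to float first so the integer column can hold NaN
df['monthly_spending'] = df['monthly_spending'].astype(float)
df.loc[::10, 'monthly_spending'] = np.nan  # Set every 10th value to NaN

# 1.3 Handle missing values by filling them with the mean of the column
# (assign the result back instead of calling fillna(inplace=True) on a column selection)
df['monthly_spending'] = df['monthly_spending'].fillna(df['monthly_spending'].mean())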

# 1.4 Visualize the distribution of key features

sns.set(style="whitegrid")
plt.figure(figsize=(12, 8))

# Plot the distribution of key features

for i, feature in enumerate(['age', 'tenure', 'monthly_spending', 'num_products'], 1):
    plt.subplot(2, 2, i)
    sns.histplot(df[feature], kde=True, color="teal")
    plt.title(f'Distribution of {feature}')
    plt.xlabel(feature)
    plt.ylabel("Frequency")

plt.tight_layout()
plt.show()

# 1.5 Normalize the numerical columns (age, tenure, monthly_spending, num_products) using StandardScaler
scaler = StandardScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

# Step 2: Clustering Using Hierarchical Clustering
# 2.1 Select features for clustering (scaled data)
X = df_scaled[['age', 'tenure', 'monthly_spending']]

# 2.2 Perform Agglomerative Clustering

agg_clust = AgglomerativeClustering(linkage='ward', n_clusters=4) # We start with 4 clusters
df['cluster'] = agg_clust.fit_predict(X)

# 2.3 Plot a dendrogram to visualize the hierarchical clustering process

plt.figure(figsize=(10, 7))
sch.dendrogram(sch.linkage(X, method='ward'))
plt.title('Dendrogram of Customer Segments')
plt.xlabel('Customers')
plt.ylabel('Euclidean Distance')
plt.show()

# Step 3: Cluster Evaluation

# 3.1 Analyze the characteristics of each cluster using mean, median, and std deviation
cluster_means = df.groupby('cluster')[['age', 'tenure', 'monthly_spending', 'num_products']].mean()
cluster_medians = df.groupby('cluster')[['age', 'tenure', 'monthly_spending', 'num_products']].median()
cluster_std = df.groupby('cluster')[['age', 'tenure', 'monthly_spending', 'num_products']].std()

# 3.2 Print out the cluster statistics (mean, median, std)

print("Cluster Means:\n", cluster_means)
print("\nCluster Medians:\n", cluster_medians)
print("\nCluster Standard Deviations:\n", cluster_std)

# Step 4: Cluster Profiling

# 4.1 Visualize the clusters using a pairplot, color-coded by cluster labels
sns.pairplot(df[['age', 'tenure', 'monthly_spending', 'num_products', 'cluster']], hue='cluster',
             palette='Set2')
plt.suptitle("Pairplot of Customer Features by Cluster", y=1.02)
plt.show()

# 4.2 Visualize clusters using scatter plots

# Scatter plot of Age vs Monthly Spending
plt.figure(figsize=(10, 6))
sns.scatterplot(x='age', y='monthly_spending', hue='cluster', data=df, palette='Set2', s=100,
                alpha=0.7)
plt.title('Customer Segments: Age vs Monthly Spending')
plt.xlabel('Age')
plt.ylabel('Monthly Spending')
plt.show()

# Scatter plot of Tenure vs Number of Products

plt.figure(figsize=(10, 6))
sns.scatterplot(x='tenure', y='num_products', hue='cluster', data=df, palette='Set2', s=100,
                alpha=0.7)
plt.title('Customer Segments: Tenure vs Number of Products')
plt.xlabel('Tenure (Years)')
plt.ylabel('Number of Products')
plt.show()
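
The script above fixes n_clusters=4 without checking whether four groups suit the data. One possible check, which is not part of the original script, is to compare silhouette scores across several cluster counts. The sketch below is an assumption-laden illustration: it uses a hypothetical three-blob dataset (`X_demo`) in place of the customer features, because the synthetic customer data is drawn from independent uniform distributions and is not expected to contain clear groups.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical stand-in data: three well-separated blobs in three dimensions
centers = np.array([[0, 0, 0], [10, 10, 0], [0, 10, 10]])
raw = np.vstack([c + rng.normal(scale=0.5, size=(50, 3)) for c in centers])
X_demo = StandardScaler().fit_transform(raw)

# Score each candidate cluster count with the mean silhouette coefficient
scores = {}
for k in range(2, 7):
    labels = AgglomerativeClustering(n_clusters=k, linkage='ward').fit_predict(X_demo)
    scores[k] = silhouette_score(X_demo, labels)

best_k = max(scores, key=scores.get)
print("Silhouette scores:", {k: round(float(s), 3) for k, s in scores.items()})
print("Best number of clusters:", best_k)
```

To apply the same loop to the customer data, replace X_demo with the scaled feature matrix X from step 2.1. Scores close to zero for every k would suggest that the segments are not well separated.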
