
Unsupervised Learning and Clustering

The document discusses unsupervised learning, highlighting its difference from supervised learning, and emphasizes exploratory analysis to uncover patterns in data. It focuses on clustering techniques, particularly K-means clustering, detailing the process of assigning data points to clusters based on proximity to centroids. The document also includes code snippets for implementing K-means clustering using Python and visualizing the results.

Uploaded by oulla898
Copyright © All Rights Reserved

Unsupervised - Jupyter Notebook [Link]


Unsupervised learning
• Supervised learning
  – Use the data (inputs paired with known outputs) to learn to predict the output values
• Unsupervised learning
  – No output variables are available
  – Use the data itself to learn about the data
  – Sometimes called exploratory analysis
  – What can we find in the data?
    • Structure
    • Regularities
    • Hidden information
    • Etc.

Clustering
• Divide data into groups (subsets/clusters) that are
  – Meaningful: capture the natural structure of the data
  – Useful: depends on the purpose
• Observations in the same cluster are similar in some sense
• Also called unsupervised classification

1 of 10 3/27/2023, 12:33 PM
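One way to make "similar in some sense" concrete is to compare distances. The sketch below uses a small, made-up set of 2-D points with hand-assigned cluster labels (illustrative only, not from the notebook) and computes the average Euclidean distance between points in the same cluster and between points in different clusters; for a meaningful clustering the first should be clearly smaller than the second.

```python
import numpy as np

# Hypothetical toy data: two tight groups of 2-D points (illustrative only)
points = np.array([
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.3],   # group 0
    [3.0, 3.0], [3.2, 2.9], [2.9, 3.1],   # group 1
])
labels = np.array([0, 0, 0, 1, 1, 1])

def mean_pairwise_distances(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance)."""
    # All pairwise Euclidean distances via broadcasting: shape (n, n)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(points), dtype=bool)  # ignore each point's distance to itself
    within = dists[same & off_diag].mean()
    between = dists[~same].mean()
    return within, between

within, between = mean_pairwise_distances(points, labels)
print(f"within-cluster: {within:.2f}, between-cluster: {between:.2f}")
```

Euclidean distance is only one choice of "similar"; other measures (cosine, Manhattan) would change which points count as close.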


K-means clustering
Select K points as initial centroids
Repeat
  – Form K clusters by assigning each point to its closest centroid using Euclidean distance
  – Recompute the centroid of each cluster (the mean of all points in that cluster)
Until the centroids do not change
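The loop above can be sketched directly in NumPy. This is a minimal Lloyd-style implementation written for illustration; the function and parameter names (`kmeans`, `assign_to_nearest`, `max_iter`, `seed`) are hypothetical and are not scikit-learn's API. It assumes two details the outline leaves open: the initial centroids are K distinct randomly chosen data points, and a centroid whose cluster becomes empty stays where it was.

```python
import numpy as np

def assign_to_nearest(X, centroids):
    """Index of the closest centroid (Euclidean distance) for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1)

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal K-means (Lloyd's algorithm); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Select K distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Form K clusters: assign each point to its closest centroid
        labels = assign_to_nearest(X, centroids)
        # Recompute each centroid as the mean of its points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, assign_to_nearest(X, centroids)

# Usage: two well-separated Gaussian groups (toy data)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal([0, 0], 0.3, size=(50, 2)),
               rng.normal([5, 5], 0.3, size=(50, 2))])
centroids, labels = kmeans(X, k=2)
print(centroids)
```

The result depends on the initial centroids; library implementations typically run several initializations and keep the best one.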


In [10]: from sklearn.datasets import make_blobs
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Five Gaussian blobs with different centers and spreads
blob_centers = np.array(
    [[ 0.2, 2.3],
     [-1.5, 2.3],
     [-2.8, 1.8],
     [-2.8, 2.8],
     [-2.8, 1.3]])
blob_std = np.array([0.4, 0.3, 0.1, 0.1, 0.1])
X, y = make_blobs(n_samples=2000, centers=blob_centers,
                  cluster_std=blob_std, random_state=7)

def plot_clusters(X, y=None):
    # Scatter plot of the points, colored by label when y is given
    plt.scatter(X[:, 0], X[:, 1], c=y, s=1)
    plt.xlabel("$x_1$", fontsize=14)
    plt.ylabel("$x_2$", fontsize=14, rotation=0)

plt.figure(figsize=(8, 4))
plot_clusters(X)
plt.show()

<Figure size 576x288 with 0 Axes>
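`make_blobs` draws isotropic Gaussian samples around each center, here with one standard deviation per blob. The sketch below is a rough NumPy-only stand-in; the function `gaussian_blobs` is hypothetical. Like `make_blobs` with an integer `n_samples`, it splits the samples as evenly as possible across the centers, but it does not shuffle them and it draws random numbers differently, so its points will not match `make_blobs(..., random_state=7)`.

```python
import numpy as np

def gaussian_blobs(n_samples, centers, cluster_std, seed=0):
    """Hypothetical stand-in for make_blobs: isotropic Gaussian blobs around given centers."""
    rng = np.random.default_rng(seed)
    centers = np.asarray(centers, dtype=float)
    # One standard deviation per center (a scalar is broadcast to all centers)
    stds = np.broadcast_to(np.asarray(cluster_std, dtype=float), (len(centers),))
    # Split the samples as evenly as possible across the centers
    counts = np.full(len(centers), n_samples // len(centers))
    counts[: n_samples % len(centers)] += 1
    X = np.vstack([rng.normal(c, s, size=(n, centers.shape[1]))
                   for c, s, n in zip(centers, stds, counts)])
    y = np.repeat(np.arange(len(centers)), counts)  # true blob index of each point
    return X, y

blob_centers = [[0.2, 2.3], [-1.5, 2.3], [-2.8, 1.8], [-2.8, 2.8], [-2.8, 1.3]]
blob_std = [0.4, 0.3, 0.1, 0.1, 0.1]
X, y = gaussian_blobs(2000, blob_centers, blob_std)
print(X.shape, np.bincount(y))  # (2000, 2) [400 400 400 400 400]
```

The returned `y` holds the true blob of each point; K-means never sees it, which is what makes the task unsupervised.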


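In [31]: k = 5
kmeans = KMeans(n_clusters=k, random_state=42)
y_pred = kmeans.fit_predict(X)  # fit on X and return each point's cluster index
# One row per cluster (the output below has six rows, so it was produced with k = 6)
print(kmeans.cluster_centers_)
# Assign new points to their nearest learned centroid
X_new = np.array([[-2.8, 1.7], [3, 2], [-3, 3], [-3, 2.5]])
kmeans.predict(X_new)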
[[ 0.06154126 2.58026834]
[-2.80389616 1.80117999]
[-2.79290307 2.79641063]
[-1.47083264 2.28276928]
[ 0.32780688 1.98072917]
[-2.80037642 1.30082566]]

Out[31]: array([1, 4, 2, 2])
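`predict` assigns each new point to the closest learned centroid. As a check, the sketch below copies the six centroids printed by In [31] and picks the nearest one for each row of `X_new` with plain NumPy; the helper `nearest_centroid` is hypothetical, not part of scikit-learn.

```python
import numpy as np

# Centroids copied from the printed output of In [31] (six rows)
centers = np.array([
    [ 0.06154126, 2.58026834],
    [-2.80389616, 1.80117999],
    [-2.79290307, 2.79641063],
    [-1.47083264, 2.28276928],
    [ 0.32780688, 1.98072917],
    [-2.80037642, 1.30082566],
])
X_new = np.array([[-2.8, 1.7], [3, 2], [-3, 3], [-3, 2.5]])

def nearest_centroid(points, centers):
    """Index of the closest center (squared Euclidean distance) for each point."""
    sq_dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return sq_dists.argmin(axis=1)

print(nearest_centroid(X_new, centers))  # [1 4 2 2], matching Out[31]
```

The assignment is hard: even [3, 2], which lies well outside every blob, is still placed in some cluster (cluster 4).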

In [18]: def plot_data(X):
    # Plot the raw points as small black dots
    plt.plot(X[:, 0], X[:, 1], 'k.', markersize=2)

def plot_centroids(centroids, weights=None, circle_color='w', cross_color='k'):
    # Optionally drop centroids whose weight is below 10% of the largest weight
    if weights is not None:
        centroids = centroids[weights > weights.max() / 10]
    # Draw each centroid as a circle with a cross on top
    plt.scatter(centroids[:, 0], centroids[:, 1],
                marker='o', s=30, linewidths=8,
                color=circle_color, zorder=10, alpha=0.9)
    plt.scatter(centroids[:, 0], centroids[:, 1],
                marker='x', s=50, linewidths=50,
                color=cross_color, zorder=11, alpha=1)

def plot_decision_boundaries(clusterer, X, resolution=1000, show_centroids=True,
                             show_xlabels=True, show_ylabels=True):
    # Build a resolution x resolution grid covering the data plus a small margin
    mins = X.min(axis=0) - 0.1
    maxs = X.max(axis=0) + 0.1
    xx, yy = np.meshgrid(np.linspace(mins[0], maxs[0], resolution),
                         np.linspace(mins[1], maxs[1], resolution))
    # Predict the cluster of every grid point, then reshape back to the grid
    Z = clusterer.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Fill each cluster's region, then outline the boundaries between regions
    plt.contourf(Z, extent=(mins[0], maxs[0], mins[1], maxs[1]),
                 cmap="Pastel2")
    plt.contour(Z, extent=(mins[0], maxs[0], mins[1], maxs[1]),
                linewidths=1, colors='k')
    plot_data(X)
    if show_centroids:
        plot_centroids(clusterer.cluster_centers_)

    if show_xlabels:
        plt.xlabel("$x_1$", fontsize=14)
    else:
        plt.tick_params(labelbottom=False)
    if show_ylabels:
        plt.ylabel("$x_2$", fontsize=14, rotation=0)
    else:
        plt.tick_params(labelleft=False)
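`plot_decision_boundaries` calls `predict` on every point of a dense grid, so each filled region is the set of locations closest to one centroid (the Voronoi cell of that centroid). The same grid labelling can be sketched in plain NumPy; `grid_labels` and the two centroids below are hypothetical, and the grid is only 4 x 4 so the label array can be read directly.

```python
import numpy as np

def grid_labels(centers, mins, maxs, resolution=4):
    """Label each point of a resolution x resolution grid with its nearest center."""
    xx, yy = np.meshgrid(np.linspace(mins[0], maxs[0], resolution),
                         np.linspace(mins[1], maxs[1], resolution))
    grid = np.c_[xx.ravel(), yy.ravel()]              # shape (resolution**2, 2)
    sq_dists = ((grid[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return sq_dists.argmin(axis=1).reshape(xx.shape)  # same shape as the grid

# Hypothetical centroids on the left and right edges of the unit square
centers = np.array([[0.0, 0.5], [1.0, 0.5]])
print(grid_labels(centers, mins=(0, 0), maxs=(1, 1)))  # every row is [0 0 1 1]
```

Because the regions depend only on distances to the centroids, K-means boundaries are always straight lines (perpendicular bisectors between neighboring centroids).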


In [32]: plt.figure(figsize=(8, 4))
plot_decision_boundaries(kmeans, X)
plt.show()
# Sum of squared distances from each point to its closest centroid
kmeans.inertia_

Out[32]: 169.23715382893596
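`inertia_` is the sum of squared Euclidean distances between each training point and its closest centroid, the quantity K-means tries to minimize. A NumPy sketch with a tiny hand-made dataset (the helper `inertia` and the data are hypothetical):

```python
import numpy as np

def inertia(X, centers):
    """Sum of squared distances from each point to its nearest center."""
    sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return sq_dists.min(axis=1).sum()

# Hypothetical toy data: two pairs of points, with each pair's midpoint as a center
X = np.array([[0.0, 0.0], [0.0, 2.0], [5.0, 0.0], [5.0, 2.0]])
centers = np.array([[0.0, 1.0], [5.0, 1.0]])
print(inertia(X, centers))  # 4.0: each point is 1 unit from its center
```

Lower inertia means tighter clusters, but it generally keeps decreasing as K grows, so on its own it is not a criterion for choosing K.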
