Unsupervised - Jupyter Notebook [Link]
ipynb
Unsupervised learning
• Supervised learning – Use the data to learn the output values
• Unsupervised learning – No output variables available
• Use the data to learn from the data
– Sometimes called exploratory analysis
– What to find in the data?
• Structure
• Regularities
• Hidden information
• Etc.
Clustering
• Divide data into groups (subsets/clusters) that are
– Meaningful: Capture the natural structure of the data
– Useful: Depends on purpose
• Observations in the same cluster are similar in some sense
• Unsupervised classification
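One way to make "similar in some sense" concrete is to compare distances within a group to distances between groups. The sketch below uses hypothetical toy points and a helper name (`mean_pairwise_distance`) that are not part of the notebook, and it assumes Euclidean distance as the similarity measure; other measures may suit other data.

```python
import numpy as np

# Hypothetical toy data: two visually obvious groups of 2-D points.
group_a = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]])
group_b = np.array([[5.0, 5.0], [5.1, 4.8], [4.9, 5.2]])

def mean_pairwise_distance(P, Q):
    """Average Euclidean distance over all pairs (p, q) with p in P and q in Q.

    When P and Q are the same array, the zero self-distances are included.
    """
    diffs = P[:, None, :] - Q[None, :, :]
    return np.linalg.norm(diffs, axis=2).mean()

within_a = mean_pairwise_distance(group_a, group_a)
between = mean_pairwise_distance(group_a, group_b)

# A meaningful clustering keeps within-cluster distances small
# relative to between-cluster distances.
print("within A:", within_a, "between A and B:", between)
```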
K-means clustering
Select K points as initial centroids
Repeat
– Form K clusters by assigning each point to its closest centroid, using Euclidean distance
– Recompute the centroid of each cluster (the mean of all objects in that cluster)
Until centroids do not change
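The steps above can be sketched in plain NumPy. This is a minimal illustration, not the scikit-learn implementation: the function name `kmeans_sketch`, the random choice of initial centroids, the `max_iter` cap, and the rule of keeping the old centroid when a cluster is empty are all assumptions made for the example.

```python
import numpy as np

def kmeans_sketch(X, k, max_iter=100, seed=0):
    """Basic K-means loop (illustrative sketch, assumptions noted above)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Select K data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign each point to its closest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of the points assigned to it.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # assumption: an empty cluster keeps its centroid
                new_centroids[j] = members.mean(axis=0)
        # Stop once the centroids no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the returned centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)

# Hypothetical toy data: two well-separated groups of three points each.
X_demo = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
centroids_demo, labels_demo = kmeans_sketch(X_demo, k=2)
print(centroids_demo)
print(labels_demo)
```

On data this well separated the loop settles on one centroid per group. On harder data the result can depend on the initial centroids, which is one reason library implementations typically try several initializations.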
In [10]: from sklearn.datasets import make_blobs
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Centers and spreads of the five synthetic blobs
blob_centers = np.array(
    [[ 0.2, 2.3],
     [-1.5, 2.3],
     [-2.8, 1.8],
     [-2.8, 2.8],
     [-2.8, 1.3]])
blob_std = np.array([0.4, 0.3, 0.1, 0.1, 0.1])
X, y = make_blobs(n_samples=2000, centers=blob_centers,
                  cluster_std=blob_std, random_state=7)

def plot_clusters(X, y=None):
    # Scatter plot of the points, colored by label when y is given
    plt.scatter(X[:, 0], X[:, 1], c=y, s=1)
    plt.xlabel("$x_1$", fontsize=14)
    plt.ylabel("$x_2$", fontsize=14, rotation=0)

plt.figure(figsize=(8, 4))
plot_clusters(X)
plt.show()
<Figure size 576x288 with 0 Axes>
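In [31]: k = 5
kmeans = KMeans(n_clusters=k, random_state=42)
y_pred = kmeans.fit_predict(X)  # fit the model and label every training point
print(kmeans.cluster_centers_)
# Assign previously unseen points to the nearest learned centroid
X_new = np.array([[-2.8, 1.7], [3, 2], [-3, 3], [-3, 2.5]])
kmeans.predict(X_new)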
[[ 0.06154126 2.58026834]
[-2.80389616 1.80117999]
[-2.79290307 2.79641063]
[-1.47083264 2.28276928]
[ 0.32780688 1.98072917]
[-2.80037642 1.30082566]]
Out[31]: array([1, 4, 2, 2])
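The labels in `Out[31]` can be reproduced by hand: each new point receives the index of its nearest centroid. The sketch below copies the centroid rows exactly as printed above and applies the Euclidean nearest-centroid rule with NumPy; the variable names are illustrative and not part of the notebook.

```python
import numpy as np

# Centroid rows copied from the printed kmeans.cluster_centers_ output.
centroids = np.array([
    [ 0.06154126, 2.58026834],
    [-2.80389616, 1.80117999],
    [-2.79290307, 2.79641063],
    [-1.47083264, 2.28276928],
    [ 0.32780688, 1.98072917],
    [-2.80037642, 1.30082566],
])
X_new = np.array([[-2.8, 1.7], [3, 2], [-3, 3], [-3, 2.5]])

# Distance from every new point to every centroid: shape (4, 6).
dists = np.linalg.norm(X_new[:, None, :] - centroids[None, :, :], axis=2)

# The predicted cluster is the index of the closest centroid.
manual_labels = dists.argmin(axis=1)
print(manual_labels)  # [1 4 2 2]
```

Note that the point `[3, 2]` lies far from all the data yet still receives a label. This hard assignment always picks the nearest centroid, however distant it is.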
In [18]: def plot_data(X):
    plt.plot(X[:, 0], X[:, 1], 'k.', markersize=2)

def plot_centroids(centroids, weights=None, circle_color='w', cross_color='k'):
    # Optionally keep only centroids whose weight exceeds 10% of the largest weight
    if weights is not None:
        centroids = centroids[weights > weights.max() / 10]
    plt.scatter(centroids[:, 0], centroids[:, 1],
                marker='o', s=30, linewidths=8,
                color=circle_color, zorder=10, alpha=0.9)
    plt.scatter(centroids[:, 0], centroids[:, 1],
                marker='x', s=50, linewidths=50,
                color=cross_color, zorder=11, alpha=1)

def plot_decision_boundaries(clusterer, X, resolution=1000, show_centroids=True,
                             show_xlabels=True, show_ylabels=True):
    # Build a grid that covers the data with a small margin
    mins = X.min(axis=0) - 0.1
    maxs = X.max(axis=0) + 0.1
    xx, yy = np.meshgrid(np.linspace(mins[0], maxs[0], resolution),
                         np.linspace(mins[1], maxs[1], resolution))
    # Predict the cluster of every grid point, then reshape back to the grid
    Z = clusterer.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Filled regions show the cluster areas; contour lines mark the boundaries
    plt.contourf(Z, extent=(mins[0], maxs[0], mins[1], maxs[1]),
                 cmap="Pastel2")
    plt.contour(Z, extent=(mins[0], maxs[0], mins[1], maxs[1]),
                linewidths=1, colors='k')
    plot_data(X)
    if show_centroids:
        plot_centroids(clusterer.cluster_centers_)
    if show_xlabels:
        plt.xlabel("$x_1$", fontsize=14)
    else:
        plt.tick_params(labelbottom=False)
    if show_ylabels:
        plt.ylabel("$x_2$", fontsize=14, rotation=0)
    else:
        plt.tick_params(labelleft=False)
In [32]: plt.figure(figsize=(8, 4))
plot_decision_boundaries(kmeans, X)
plt.show()
kmeans.inertia_
Out[32]: 169.23715382893596
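In scikit-learn, `inertia_` is the sum of squared distances from each sample to its closest cluster center, so lower values mean tighter clusters. The sketch below computes the same quantity by hand on hypothetical toy data; the helper name `inertia_sketch` and the data are assumptions for illustration, not part of the notebook.

```python
import numpy as np

def inertia_sketch(X, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return sq_dists.min(axis=1).sum()

# Hypothetical toy data: two pairs of points, each pair centered on one centroid.
X_toy = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
centroids_toy = np.array([[0.0, 1.0], [10.0, 1.0]])

# Every point is exactly 1 unit from its nearest centroid, so inertia = 4 * 1**2.
toy_inertia = inertia_sketch(X_toy, centroids_toy)
print(toy_inertia)  # 4.0
```

With a fitted model, `inertia_sketch(X, kmeans.cluster_centers_)` should agree with `kmeans.inertia_` up to floating-point error. Inertia keeps decreasing as the number of clusters grows, so it cannot by itself be used to choose K.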