
Unsupervised Learning Model Cheat Sheet

Uploaded by mido473mi3
Copyright © All Rights Reserved

7/5/25, 2:40 PM about:blank

Cheat Sheet: Building Unsupervised Learning Models


Unsupervised learning models

Model Name Brief Description Code Syntax

UMAP
UMAP (Uniform Manifold Approximation and Projection) is used for nonlinear dimensionality reduction.
Pros: High performance; preserves global structure better than t-SNE.
Cons: Sensitive to hyperparameter settings.
Applications: Data visualization, feature extraction.
Key hyperparameters:
n_neighbors: Size of the local neighborhood used to learn the manifold; smaller values emphasize local structure, larger values global structure (default = 15).
min_dist: Minimum distance between points in the embedded space; smaller values pack points more tightly (default = 0.1).
n_components: Dimensionality of the embedding (default = 2).
Code syntax:
from umap import UMAP  # package: umap-learn
reducer = UMAP(n_neighbors=15, min_dist=0.1, n_components=2)
embedding = reducer.fit_transform(X)  # X: array of shape (n_samples, n_features)

t-SNE
t-SNE (t-Distributed Stochastic Neighbor Embedding) is a nonlinear dimensionality reduction technique.
Pros: Good for visualizing high-dimensional data.
Cons: Computationally expensive, prone to overfitting.
Applications: Data visualization, anomaly detection.
Key hyperparameters:
n_components: Number of dimensions of the output (default = 2).
perplexity: Balances attention between local and global aspects of the data; it is related to the number of nearest neighbors considered for each point and must be less than the number of samples (default = 30).
learning_rate: Step size during optimization (default = 'auto' in scikit-learn 1.2 and later; 200 in earlier versions).
Code syntax:
from sklearn.manifold import TSNE
tsne = TSNE(n_components=2, perplexity=30, learning_rate=200)
embedding = tsne.fit_transform(X)  # X: array of shape (n_samples, n_features)
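A minimal sketch, assuming scikit-learn is installed: it embeds three synthetic 10-dimensional blobs into 2D with TSNE. The dataset size, blob count, and random_state are illustrative choices; init="pca" is passed explicitly because it tends to give more stable layouts across runs.

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Synthetic data: 150 points in 10 dimensions, drawn from 3 Gaussian blobs
X, y = make_blobs(n_samples=150, n_features=10, centers=3, random_state=0)

# perplexity (30) must stay below n_samples (150)
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(X)

print(embedding.shape)      # (150, 2)
print(tsne.kl_divergence_)  # final Kullback-Leibler divergence of the optimization
```

Coloring a scatter plot of embedding by y is the usual way to check whether the blobs stay separated; distances between the resulting groups should not be read as meaningful.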

PCA
PCA (principal component analysis) is used for linear dimensionality reduction.
Pros: Easy to interpret, reduces noise.
Cons: Linear, so it may lose information in nonlinear data.
Applications: Feature extraction, compression.
Key hyperparameters:
n_components: Number of principal components to retain (default = None, which keeps min(n_samples, n_features) components).
whiten: Whether to scale the projected components to unit variance (default = False).
svd_solver: Algorithm used to compute the components (default = 'auto').
Code syntax:
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)  # X: array of shape (n_samples, n_features)
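The sketch below, assuming scikit-learn and NumPy, checks two of the hyperparameter notes on the iris dataset that ships with scikit-learn: leaving n_components unset keeps min(n_samples, n_features) components, and whiten=True rescales each projected component to unit variance. svd_solver="full" is an illustrative choice to make the decomposition exact.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # shape (150, 4)

# Default n_components=None keeps min(n_samples, n_features) = 4 components
full = PCA().fit(X)
print(full.n_components_)  # 4

# whiten=True divides each projected component by its standard deviation
pca = PCA(n_components=2, whiten=True, svd_solver="full")
Z = pca.fit_transform(X)
print(Z.shape)                                # (150, 2)
print(np.round(Z.var(axis=0, ddof=1), 6))     # approximately [1. 1.]
```

Whitening discards the relative scale of the components, which helps some downstream models but removes information that a plot of the projection would otherwise show.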

DBSCAN
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm.
Pros: Identifies outliers and does not require the number of clusters in advance.
Cons: Struggles when clusters have varying densities.
Applications: Anomaly detection, spatial data clustering.
Key hyperparameters:
eps: Maximum distance between two samples for one to be considered in the neighborhood of the other (default = 0.5).
min_samples: Number of samples (including the point itself) that must lie within eps of a point for it to be a core point of a cluster (default = 5).
Code syntax:
from sklearn.cluster import DBSCAN
dbscan = DBSCAN(eps=0.5, min_samples=5)
labels = dbscan.fit_predict(X)  # noise points get label -1
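A minimal sketch, assuming scikit-learn and NumPy: DBSCAN labels noise points -1, so a single far-away point added to three synthetic blobs comes back as an outlier. It also computes the sorted k-distance curve, a common heuristic for choosing eps. The blobs and the added point at (50, 50) are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

# Three tight synthetic blobs plus one point far away from all of them
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.4, random_state=0)
X = np.vstack([X, [[50.0, 50.0]]])

min_samples = 5
labels = DBSCAN(eps=0.5, min_samples=min_samples).fit_predict(X)

# -1 is DBSCAN's label for noise, so it is not counted as a cluster
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters)
print("label of the far-away point:", labels[-1])  # -1 (noise)

# k-distance curve: distance from each point to its min_samples-th nearest
# neighbor; kneighbors(X) on the training data counts the point itself,
# matching DBSCAN's convention that min_samples includes the point itself
nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
distances, _ = nn.kneighbors(X)
k_dist = np.sort(distances[:, -1])
print("largest k-distances:", np.round(k_dist[-3:], 2))
```

Plotting k_dist and picking eps near the point where the curve bends sharply is a heuristic, not a rule; because eps is a raw distance, features are usually standardized first.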

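HDBSCAN
HDBSCAN (Hierarchical DBSCAN) improves on DBSCAN by handling clusters of varying density.
Pros: Better handling of varying densities.
Cons: Can be slower than DBSCAN.
Applications: Large datasets, complex clustering problems.
Key hyperparameters:
min_cluster_size: Smallest number of points that can form a cluster (default = 5).
min_samples: Number of samples in a neighborhood for a point to be a core point; larger values make the clustering more conservative, so more points are labeled noise (default = None, meaning the value of min_cluster_size is used).
Code syntax:
import hdbscan
clusterer = hdbscan.HDBSCAN(min_cluster_size=5)
labels = clusterer.fit_predict(X)  # noise points get label -1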

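K-Means clustering
K-Means is a centroid-based clustering algorithm that groups data into k clusters.
Pros: Efficient, simple to implement.
Cons: Sensitive to the initial cluster centroids; k must be chosen in advance.
Applications: Customer segmentation, pattern recognition.
Key hyperparameters:
n_clusters: Number of clusters (default = 8).
init: Method for initializing the centroids ('k-means++' or 'random'; default = 'k-means++').
n_init: Number of times the algorithm runs with different centroid seeds, keeping the run with the lowest inertia (default = 'auto' in scikit-learn 1.4 and later, which means 1 run with 'k-means++' and 10 with 'random'; 10 in earlier versions).
Code syntax:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3)
labels = kmeans.fit_predict(X)  # X: array of shape (n_samples, n_features)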
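Because K-Means needs n_clusters up front, a common approach is to fit several values of k and compare a quality score. The sketch below, assuming scikit-learn, uses the silhouette score on four well-separated synthetic blobs; the centers and spread are illustrative, and n_init is set explicitly so the result does not depend on the version-specific default.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Four well-separated blobs at the corners of a square
centers = [[-5, -5], [-5, 5], [5, -5], [5, 5]]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.5, random_state=0)

scores = {}
for k in range(2, 7):
    kmeans = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0)
    labels = kmeans.fit_predict(X)
    # Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters
    scores[k] = silhouette_score(X, labels)
    print(f"k={k}  inertia={kmeans.inertia_:.1f}  silhouette={scores[k]:.3f}")

best_k = max(scores, key=scores.get)
print("best k by silhouette:", best_k)
```

Inertia always decreases as k grows, which is why it is read as an "elbow" curve rather than maximized; the silhouette score can be compared directly across values of k.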

Associated functions used

Method Brief Description Code Syntax

make_blobs
Generates isotropic Gaussian blobs for clustering.
Code syntax:
from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=100, centers=2, random_state=42)  # y holds each sample's blob index

about:blank 1/2

multivariate_normal
Generates samples from a multivariate normal distribution.
Code syntax:
from numpy.random import multivariate_normal
samples = multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]], size=100)  # shape (100, 2)
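The identity covariance in the example above produces uncorrelated coordinates. The sketch below, assuming NumPy, draws correlated samples instead through the Generator API (numpy.random.default_rng), which NumPy recommends for new code, and checks that the sample statistics approach the requested ones; the mean, covariance, sample size, and seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

mean = [0.0, 0.0]
cov = [[1.0, 0.8],
       [0.8, 1.0]]  # must be symmetric positive semi-definite

samples = rng.multivariate_normal(mean, cov, size=10_000)  # shape (10000, 2)

# With many samples the empirical statistics approach the parameters
print(samples.mean(axis=0))           # close to [0, 0]
print(np.cov(samples, rowvar=False))  # close to [[1, 0.8], [0.8, 1]]
```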

plotly.express.scatter_3d
Creates a 3D scatter plot using Plotly Express.
Code syntax:
import plotly.express as px
fig = px.scatter_3d(df, x='x', y='y', z='z')  # df: DataFrame with columns 'x', 'y', 'z'
fig.show()

geopandas.GeoDataFrame
Creates a GeoDataFrame from a pandas DataFrame.
Code syntax:
import geopandas as gpd
gdf = gpd.GeoDataFrame(df, geometry='geometry')  # 'geometry' column holds shapely geometries

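GeoDataFrame.to_crs
Transforms the coordinate reference system (CRS) of a GeoDataFrame. EPSG:3857 is Web Mercator, the projection used by most web basemap tiles. The GeoDataFrame must already have a CRS, set through the crs argument of GeoDataFrame or with set_crs.
Code syntax:
gdf = gdf.to_crs(epsg=3857)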

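contextily.add_basemap
Adds a basemap to a GeoDataFrame plot for context. By default it assumes the plot is in Web Mercator (EPSG:3857); pass crs= for other projections.
Code syntax:
import contextily as ctx
ax = gdf.plot(figsize=(10, 10))
ctx.add_basemap(ax)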

pca.explained_variance_ratio_
Returns the proportion of variance explained by each principal component.
Code syntax:
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
pca.fit(X)
variance_ratio = pca.explained_variance_ratio_
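A common use of explained_variance_ratio_ is choosing how many components to keep. The sketch below, assuming scikit-learn and NumPy, finds the smallest number of components whose cumulative ratio exceeds 95% on the iris data, then shows that passing a float n_components lets PCA make the same choice itself; the 95% threshold is an illustrative assumption.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

pca = PCA().fit(X)  # keep all components to inspect the full spectrum
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(np.round(cumulative, 3))

# Smallest k whose cumulative explained variance is greater than 95%
k = int(np.searchsorted(cumulative, 0.95, side="right") + 1)
print("components needed:", k)

# A float in (0, 1) asks PCA to pick that number of components automatically
pca95 = PCA(n_components=0.95, svd_solver="full").fit(X)
print(pca95.n_components_)  # same value as k
```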

Author
Jeff Grossman
Abhishek Gagneja
