Atal Bihari Vajpayee Indian Institute of Information Technology and Management, Gwalior
LAB FILE OF
MACHINE LEARNING TECHNIQUES
SUBMITTED BY:-
RADHIKA GUPTA
ROLL No: 2025MIT-009
[Link] (I.T) 1st sem
SUBMITTED TO:-
Prof. ADITYA TRIVEDI
DEPARTMENT OF I.T
INDEX
Exp. No.   Title of experiment                                        Date
1          Polynomial Curve Fitting using Gradient Descent            21/08/2025
2          Binary Classification using Logistic Regression            28/08/2025
3          Binary Logistic Regression: Breast Cancer Classifier       04/09/2025
4          Multiclass Logistic Regression: Softmax vs One-vs-Rest     11/09/2025
5          Logistic Regression using Newton’s Method                  09/10/2025
6          Personalized Book Recommendation using Naïve Bayes         16/10/2025
7          Kernelized SVM Classifier                                  30/10/2025
8          PCA with Variance Retention                                06/11/2025
9          Unsupervised Learning – PCA & K-Means Clustering           13/11/2025
1. Polynomial Curve Fitting
Question / Objective
To study how the polynomial degree (M), noise in data, and gradient descent method (Batch vs.
Stochastic) affect model fitting, underfitting, and overfitting.
Theory
• The target function is:
t = \sin(2\pi x) + \epsilon, \quad x \in [0, 1], \quad \epsilon \sim \mathcal{N}(0, \sigma^2)
• We fit a polynomial model:
y(x, w) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M
• Error is measured by:
E(w) = \frac{1}{2N} \sum_{n=1}^{N} (y(x_n, w) - t_n)^2
• Gradient Descent Methods:
o Batch Gradient Descent (BGD): updates weights using the full dataset.
o Stochastic Gradient Descent (SGD): updates weights using one random sample
at a time.
• RMSE (Root Mean Square Error) is used for evaluating performance.
o Training RMSE decreases as degree M increases.
o Test RMSE first decreases, then increases (U-shape).
o Small M → underfitting.
o Large M (e.g., M=9) → overfitting.
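The underfitting/overfitting pattern described above can be reproduced with a short sketch. It is not the gradient-descent code of this experiment: it swaps in a closed-form least-squares fit (`np.linalg.lstsq`) so that the effect of M is visible without tuning a learning rate. The seed, the noise level 0.05 and the helper names `fit_least_squares` and `rmse_for` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, only for repeatability

# 10 noisy training points and a noise-free test grid, as in the theory above
x_train = np.sort(rng.random(10))
t_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.05, size=10)
x_test = np.linspace(0, 1, 100)
t_test = np.sin(2 * np.pi * x_test)

def fit_least_squares(x, t, M):
    """Closed-form minimiser of E(w) for a degree-M polynomial (hypothetical helper)."""
    X = np.vander(x, N=M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(X, t, rcond=None)
    return w

def rmse_for(x, t, w):
    """RMSE of the polynomial with weights w on the points (x, t)."""
    X = np.vander(x, N=len(w), increasing=True)
    return np.sqrt(np.mean((X @ w - t) ** 2))

train_rmse, test_rmse = {}, {}
for M in range(10):
    w = fit_least_squares(x_train, t_train, M)
    train_rmse[M] = rmse_for(x_train, t_train, w)
    test_rmse[M] = rmse_for(x_test, t_test, w)
    print(f"M={M}: train RMSE={train_rmse[M]:.4f}, test RMSE={test_rmse[M]:.4f}")
```

With 10 training points, a degree-9 polynomial has enough parameters to pass through every point, which is why its training RMSE collapses while its test RMSE need not.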
Result
• Training error goes down as polynomial degree increases.
• Test error follows a U-shape: first reduces, then grows when overfitting occurs.
• With M=9, training error is almost 0, but test error is large.
• BGD and SGD both fit the data: BGD follows a smoother loss curve, while SGD is cheaper per update but noisier.
• Graphs of polynomial fits show:
o M=0: Underfit (flat line).
o M=1: Linear fit, still underfit.
o M=3: Good fit (low test error).
o M=9: Overfit (too wiggly).
• RMSE vs M plots confirm expected outcome.
Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

np.random.seed(0)

# 10 training inputs in [0, 1], rounded to 2 decimals and sorted
N_train = 10
x_train = np.round(np.random.rand(N_train), 2)
x_train = np.sort(x_train)

# Two training sets that differ only in the noise level sigma
sigmas = {"A": 0.05, "B": 0.01}
datasets = {}
for key, sigma in sigmas.items():
    noise = np.random.normal(0, sigma, size=N_train)
    t = np.sin(2 * np.pi * x_train) + noise
    datasets[key] = {"x": x_train, "t": t, "sigma": sigma}

# Noise-free test grid
x_test = np.round(np.linspace(0, 1, 100), 2)
t_test_true = np.sin(2 * np.pi * x_test)

def design_matrix(x, M):
    """Polynomial design matrix up to degree M"""
    return np.vander(x, N=M+1, increasing=True)

def predict(w, X):
    return X @ w

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred)**2))

def compute_gradient(w, X, t):
    """Gradient of the mean squared error E(w) = (1/2N) * sum((y - t)^2)"""
    y = X @ w
    return (X.T @ (y - t)) / len(t)
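
def train_bgd(X, t, lr=1e-3, max_epochs=5000, tol=1e-8):
    """Batch gradient descent: one update per pass over the full dataset."""
    w = np.zeros(X.shape[1])
    losses = []
    prev_loss = None
    for epoch in range(max_epochs):
        y = X @ w
        loss = 0.5 * np.sum((y - t)**2)  # unnormalised sum of squares, i.e. N * E(w)
        losses.append(loss)
        # compare with None explicitly so that a loss of exactly 0.0 is not skipped
        if prev_loss is not None and abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
        grad = compute_gradient(w, X, t)
        w -= lr * grad
    return w, losses

def train_sgd(X, t, lr=1e-4, max_epochs=10000, tol=1e-8):
    """Stochastic gradient descent: one update per randomly drawn sample."""
    w = np.zeros(X.shape[1])
    losses = []
    prev_loss = None
    m = len(t)
    for epoch in range(max_epochs):  # here one "epoch" is a single-sample update
        i = np.random.randint(m)
        xi = X[i:i+1]            # shape (1, M+1)
        ti = t[i]
        yi = xi @ w
        grad = xi.T * (yi - ti)  # shape (M+1, 1)
        w -= lr * grad.flatten()
        # check convergence on the full training loss once every m updates
        if epoch % m == 0:
            y = X @ w
            loss = 0.5 * np.sum((y - t)**2)
            losses.append(loss)
            if prev_loss is not None and abs(prev_loss - loss) < tol:
                break
            prev_loss = loss
    return w, losses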

# Fit every (dataset, method, degree) combination and record train/test RMSE
M_values = range(0, 10)
results = []
for ds_key, ds in datasets.items():
    x, t = ds["x"], ds["t"]
    for method in ["BGD", "SGD"]:
        for M in M_values:
            X_train = design_matrix(x, M)
            X_test = design_matrix(x_test, M)
            if method == "BGD":
                w, _ = train_bgd(X_train, t)
            else:
                w, _ = train_sgd(X_train, t)
            y_train = predict(w, X_train)
            y_test = predict(w, X_test)
            results.append({
                "dataset": ds_key,
                "method": method,
                "M": M,
                "train_rmse": rmse(t, y_train),
                "test_rmse": rmse(t_test_true, y_test),
                "weights": w
            })
df_results = pd.DataFrame(results)
df_results.head()

# Fitted curves for selected degrees
plot_Ms = [0, 1, 3, 9]
for ds_key, ds in datasets.items():
    x, t = ds["x"], ds["t"]
    for method in ["BGD", "SGD"]:
        plt.figure(figsize=(10,8))
        for i, M in enumerate(plot_Ms, 1):
            plt.subplot(2,2,i)
            row = df_results[(df_results.dataset==ds_key) &
                             (df_results.method==method) &
                             (df_results.M==M)].iloc[0]
            w = row["weights"]
            X_test = design_matrix(x_test, M)
            y_test = predict(w, X_test)
            plt.plot(x_test, t_test_true, "k--", label="True sin(2πx)")
            plt.plot(x_test, y_test, label=f"Model M={M}")
            plt.scatter(x, t, c="red", marker="o", label="Train data")
            plt.title(f"{ds_key}, {method}, M={M}")
            plt.legend()
        plt.tight_layout()
        plt.show()

# RMSE vs polynomial degree
for ds_key in datasets.keys():
    for method in ["BGD", "SGD"]:
        sub = df_results[(df_results.dataset==ds_key) &
                         (df_results.method==method)]
        plt.figure(figsize=(6,4))
        plt.plot(sub["M"], sub["train_rmse"], marker="o", label="Train RMSE")
        plt.plot(sub["M"], sub["test_rmse"], marker="o", label="Test RMSE")
        plt.xlabel("Polynomial degree (M)")
        plt.ylabel("RMSE")
        plt.title(f"RMSE vs M ({ds_key}, {method})")
        plt.legend()
        plt.show()
2. Logistic Regression for Spam Email Classification
Question / Objective
To build a logistic regression model from scratch for binary classification using the dataset
[Link], plot the cost function vs iterations, show the confusion matrix, and calculate
performance metrics: Accuracy, Precision, Recall, and F1 Score.
Theory
• Logistic regression is used for binary classification (two classes: e.g., spam / not spam).
• The model predicts probability using the sigmoid function:
σ(z) = 1 / (1 + e^{-z}), z = w^T x + b
• The cost function used is Binary Cross-Entropy (Log Loss):
J(w) = -(1/N) Σ [y * log(p) + (1 - y) * log(1 - p)], where p = σ(z)
• Optimization is done with Gradient Descent:
o Update weights w and bias b until the cost stabilizes.
• After training, predictions are compared with actual labels to form a confusion matrix:
o TP (True Positive): Spam correctly detected
o TN (True Negative): Non-spam correctly detected
o FP (False Positive): Non-spam wrongly marked as spam
o FN (False Negative): Spam missed
• From confusion matrix we calculate:
o Accuracy = (TP + TN) / Total
o Precision = TP / (TP + FP)
o Recall = TP / (TP + FN)
o F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
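As a quick check of the four formulas above, the sketch below evaluates them for one made-up confusion matrix. The counts (TP=40, TN=50, FP=5, FN=5) are illustrative only and are not results from this experiment; `metrics_from_counts` is a hypothetical helper name.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # guard against empty denominators (no predicted or no actual positives)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1

# Illustrative counts, not measured values
acc, prec, rec, f1 = metrics_from_counts(tp=40, tn=50, fp=5, fn=5)
print(f"Accuracy={acc:.4f} Precision={prec:.4f} Recall={rec:.4f} F1={f1:.4f}")
```

Here accuracy is 90/100 = 0.9, while precision and recall are both 40/45 ≈ 0.889, so F1 is also ≈ 0.889.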
Result
• The cost function decreases steadily with iterations, showing proper learning.
• Confusion matrix shows how well the model classifies spam vs non-spam.
• Performance metrics (Accuracy, Precision, Recall, F1 Score) are calculated — good
values indicate a working classifier.
• Logistic regression successfully separates spam from non-spam emails.
Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from google.colab import files

# Upload the dataset in Colab, then read it
uploaded = files.upload()
df = pd.read_csv("[Link]")
print("Dataset shape:", df.shape)
print(df.head())

# The last column is taken as the target; text labels are mapped to 0/1
target_col = df.columns[-1]
if df[target_col].dtype == 'object':
    df[target_col] = df[target_col].map({'spam':1, 'ham':0, 'yes':1, 'no':0}).fillna(df[target_col])

X = df.drop(columns=[target_col])
y = df[target_col].values.reshape(-1,1)

# Keep numeric features only and standardize them
X = X.select_dtypes(include=[np.number]).values
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_cost(y_true, y_pred, w):
    """Binary cross-entropy; 1e-9 avoids log(0). w is unused (no regularization)."""
    m = len(y_true)
    cost = -(1/m) * np.sum(y_true*np.log(y_pred+1e-9) + (1-y_true)*np.log(1-y_pred+1e-9))
    return cost

def train_logistic_regression(X, y, lr=0.01, epochs=1000):
    m, n = X.shape
    w = np.zeros((n,1))
    b = 0
    costs = []
    for i in range(epochs):
        z = np.dot(X, w) + b
        y_pred = sigmoid(z)
        dw = (1/m) * np.dot(X.T, (y_pred - y))
        db = (1/m) * np.sum(y_pred - y)
        w -= lr * dw
        b -= lr * db
        # record the cost every 10 iterations
        if i % 10 == 0:
            costs.append(compute_cost(y, y_pred, w))
    return w, b, costs

def predict(X, w, b, threshold=0.5):
    z = np.dot(X, w) + b
    y_pred = sigmoid(z)
    return (y_pred >= threshold).astype(int)

w, b, costs = train_logistic_regression(X_train, y_train, lr=0.1, epochs=2000)

plt.plot(range(0,2000,10), costs)
plt.xlabel("Iterations")
plt.ylabel("Cost")
plt.title("Cost Function vs Iterations")
plt.show()

def confusion_matrix(y_true, y_pred):
    TP = np.sum((y_true==1) & (y_pred==1))
    TN = np.sum((y_true==0) & (y_pred==0))
    FP = np.sum((y_true==0) & (y_pred==1))
    FN = np.sum((y_true==1) & (y_pred==0))
    return TP, TN, FP, FN

def compute_metrics(y_true, y_pred):
    TP, TN, FP, FN = confusion_matrix(y_true, y_pred)
    accuracy = (TP + TN) / (TP + TN + FP + FN)
    precision = TP / (TP + FP + 1e-9)   # 1e-9 avoids division by zero
    recall = TP / (TP + FN + 1e-9)
    f1 = 2 * precision * recall / (precision + recall + 1e-9)
    return accuracy, precision, recall, f1, (TP,TN,FP,FN)

y_pred_test = predict(X_test, w, b)
acc, prec, rec, f1, (TP,TN,FP,FN) = compute_metrics(y_test, y_pred_test)
print("Confusion Matrix")
print(f"TP={TP}, TN={TN}, FP={FP}, FN={FN}")
print(f"Accuracy: {acc:.4f}")
print(f"Precision: {prec:.4f}")
print(f"Recall: {rec:.4f}")
print(f"F1 Score: {f1:.4f}")
3. Binary Logistic Regression – Breast Cancer Classifier
Question / Objective
To build a Breast Cancer classifier using Logistic Regression from scratch on the Wisconsin
Breast Cancer Dataset, implementing different optimizers (Batch GD, Stochastic GD,
Newton’s Method/IRLS), adding L2 regularization, and comparing performance metrics.
Theory
• Dataset: Breast Cancer Wisconsin Diagnostic dataset has 569 samples and 30 features.
• Preprocessing:
o Check for missing values.
o Standardize features to mean = 0 and variance = 1.
o Optional: Use PCA (via SVD) to reduce to 2D for visualization.
• Model:
o Logistic Regression predicts probability using sigmoid:
\hat{y} = \frac{1}{1 + e^{-(w^T x + b)}}
o Loss function: Negative Log Likelihood (Binary Cross Entropy).
o Gradient and Hessian are derived analytically for optimization.
• Optimizers:
o Batch Gradient Descent (BGD): updates weights using all samples.
o Stochastic Gradient Descent (SGD): updates using random mini-batches.
o Newton’s Method / IRLS: uses Hessian for faster convergence.
o L2 Regularization prevents overfitting by penalizing large weights.
• Metrics: Calculated from scratch (no sklearn.metrics):
o Accuracy, Precision, Recall, F1 Score, ROC-AUC.
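Of the metrics listed above, ROC-AUC is the only one that works on predicted probabilities rather than hard labels. A minimal from-scratch sketch is given below, assuming the pairwise (Mann-Whitney) definition: the fraction of positive/negative pairs in which the positive sample gets the higher score, with ties counted as half. The function name `roc_auc` and the toy scores are illustrative and not part of the lab code.

```python
import numpy as np

def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive outranks a random negative."""
    y_true = np.asarray(y_true).ravel()
    scores = np.asarray(scores, dtype=float).ravel()
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("ROC-AUC needs at least one positive and one negative sample")
    # all positive-vs-negative score differences, shape (n_pos, n_neg)
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return wins / (len(pos) * len(neg))

# Toy example: 3 of the 4 positive/negative pairs are ranked correctly
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print("AUC:", auc)
```

The pairwise matrix needs memory proportional to n_pos × n_neg, which is fine for a test split of about a hundred samples; a rank-based version would be preferable for large datasets.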
Result
• BGD converges slowly but steadily.
• SGD converges faster but with noise.
• Newton’s Method converges very quickly (few iterations).
• With L2 Regularization, weights shrink and generalization improves (less overfitting).
• Performance metrics show high accuracy (>90%), good precision and recall, proving that
logistic regression is effective for breast cancer classification.
Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X = data.data
y = data.target.reshape(-1,1)
print("Dataset shape:", X.shape, y.shape)

# Standardize features to zero mean and unit variance
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)

# Manual 80/20 train/test split
np.random.seed(42)
indices = np.random.permutation(X.shape[0])
train_size = int(0.8 * X.shape[0])
train_idx, test_idx = indices[:train_size], indices[train_size:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_loss(y, y_hat, w, l2=0.0):
    """Binary cross-entropy plus the L2 penalty (l2 / 2m) * ||w||^2."""
    m = len(y)
    loss = -(1/m)*np.sum(y*np.log(y_hat+1e-9) + (1-y)*np.log(1-y_hat+1e-9))
    reg = (l2/(2*m)) * np.sum(w**2)
    return loss + reg

def gradients(X, y, y_hat, w, l2=0.0):
    m = len(y)
    dw = (1/m) * np.dot(X.T, (y_hat - y)) + (l2/m)*w
    db = (1/m) * np.sum(y_hat - y)
    return dw, db

def hessian(X, y_hat, l2=0.0):
    m = X.shape[0]
    R = np.diag((y_hat*(1-y_hat)).flatten())
    # the penalty term is scaled by 1/m to match compute_loss and gradients
    H = (1/m) * np.dot(X.T, np.dot(R, X)) + (l2/m)*np.eye(X.shape[1])
    return H

def predict(X, w, b, threshold=0.5):
    probs = sigmoid(np.dot(X, w) + b)
    return (probs >= threshold).astype(int), probs

def train_bgd(X, y, lr=0.01, epochs=1000, l2=0.0):
    m, n = X.shape
    w = np.zeros((n,1))
    b = 0
    costs = []
    for i in range(epochs):
        y_hat = sigmoid(np.dot(X, w) + b)
        cost = compute_loss(y, y_hat, w, l2)
        dw, db = gradients(X, y, y_hat, w, l2)
        w -= lr * dw
        b -= lr * db
        if i % 10 == 0:
            costs.append(cost)
    return w, b, costs

def train_sgd(X, y, lr=0.01, epochs=100, batch_size=32, l2=0.0):
    """Mini-batch SGD: reshuffle every epoch, one update per mini-batch."""
    m, n = X.shape
    w = np.zeros((n,1))
    b = 0
    costs = []
    for i in range(epochs):
        indices = np.random.permutation(m)
        for start in range(0, m, batch_size):
            idx = indices[start:start+batch_size]
            X_batch, y_batch = X[idx], y[idx]
            y_hat = sigmoid(np.dot(X_batch, w) + b)
            dw, db = gradients(X_batch, y_batch, y_hat, w, l2)
            w -= lr * dw
            b -= lr * db
        if i % 5 == 0:
            y_hat_full = sigmoid(np.dot(X, w) + b)
            costs.append(compute_loss(y, y_hat_full, w, l2))
    return w, b, costs

def train_newton(X, y, epochs=20, l2=0.0):
    m, n = X.shape
    w = np.zeros((n,1))
    b = 0
    costs = []
    for i in range(epochs):
        z = np.dot(X, w) + b
        y_hat = sigmoid(z)
        cost = compute_loss(y, y_hat, w, l2)
        grad_w, grad_b = gradients(X, y, y_hat, w, l2)
        H = hessian(X, y_hat, l2)
        # Newton step for w; solve() avoids forming the explicit inverse of H
        w -= np.linalg.solve(H, grad_w)
        # the bias is not part of H, so it takes a plain gradient step (step size 1)
        b -= grad_b
        costs.append(cost)
    return w, b, costs

w_bgd, b_bgd, costs_bgd = train_bgd(X_train, y_train, lr=0.1, epochs=500)
w_sgd, b_sgd, costs_sgd = train_sgd(X_train, y_train, lr=0.1, epochs=100, batch_size=32)
w_newton, b_newton, costs_newton = train_newton(X_train, y_train, epochs=15)

def compute_metrics(y_true, y_pred):
    TP = np.sum((y_true==1) & (y_pred==1))
    TN = np.sum((y_true==0) & (y_pred==0))
    FP = np.sum((y_true==0) & (y_pred==1))
    FN = np.sum((y_true==1) & (y_pred==0))
    accuracy = (TP+TN)/(TP+TN+FP+FN)
    precision = TP/(TP+FP+1e-9)
    recall = TP/(TP+FN+1e-9)
    f1 = 2*precision*recall/(precision+recall+1e-9)
    return accuracy, precision, recall, f1, (TP,TN,FP,FN)

# Test-set metrics for the Newton-trained model
y_pred_test, probs_test = predict(X_test, w_newton, b_newton)
acc, prec, rec, f1, cm_vals = compute_metrics(y_test, y_pred_test)
print("Confusion Matrix (TP,TN,FP,FN):", cm_vals)
print(f"Accuracy: {acc:.4f}")
print(f"Precision: {prec:.4f}")
print(f"Recall: {rec:.4f}")
print(f"F1 Score: {f1:.4f}")

# Costs are logged every 10 epochs (BGD), every 5 epochs (SGD) and every
# iteration (Newton), so the x-axis counts logged points, not raw iterations
plt.plot(costs_bgd, label="BGD")
plt.plot(costs_sgd, label="SGD")
plt.plot(costs_newton, label="Newton")
plt.xlabel("Iterations")
plt.ylabel("Cost")
plt.title("Cost Function Convergence")
plt.legend()
plt.show()
4. Multiclass Logistic Regression – Softmax vs One-vs-Rest
Question / Objective
To implement and compare two approaches for multiclass classification:
1. Softmax Logistic Regression (single model for all classes)
2. One-vs-Rest Logistic Regression (multiple binary models, one for each class).
Dataset used: Iris dataset (150 samples, 3 classes, 4 features).
Theory
• Iris Dataset: Each sample has 4 features (sepal length, sepal width, petal length, petal
width) and belongs to one of 3 species (Setosa, Versicolor, Virginica).
• Softmax Logistic Regression:
o Extends binary logistic regression to multiple classes.
o Computes scores for each class:
z_j = w_j^T x
Then applies softmax:
P(y = j \mid x) = \frac{e^{z_j}}{\sum_k e^{z_k}}
o Loss: Cross-Entropy Loss.
• One-vs-Rest (OvR):
o Trains K binary logistic regressors (one for each class vs the rest).
o Predicts class with the highest probability among the K models.
• Optimization:
o Gradient Descent (slow but easy).
o Newton/IRLS (fast using Hessian).
o Regularization λ prevents overfitting.
• Evaluation:
o Confusion Matrix.
o Per-class Precision, Recall, F1 Score (must be implemented manually).
o PCA can reduce features to 2D for visualization of decision boundaries.
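The difference between the two probability models above can be seen on a single sample. The sketch below assumes one hypothetical score vector (2.0, 1.0, -1.0) for the three classes: softmax turns it into one distribution that sums to 1, whereas K independent sigmoids (as in One-vs-Rest) give scores that need not sum to 1 and are therefore renormalised before comparison.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical class scores z_j for one sample (illustrative values)
scores = np.array([[2.0, 1.0, -1.0]])

p_softmax = softmax(scores)        # one joint distribution over the classes
p_ovr_raw = sigmoid(scores)        # K independent "class k vs rest" probabilities
p_ovr_norm = p_ovr_raw / p_ovr_raw.sum(axis=1, keepdims=True)

print("softmax       :", p_softmax.round(3), "sum =", p_softmax.sum().round(3))
print("OvR raw       :", p_ovr_raw.round(3), "sum =", p_ovr_raw.sum().round(3))
print("OvR normalised:", p_ovr_norm.round(3))
```

Both approaches pick the same class here, but the raw OvR outputs sum to more than 1, which is the "overlapping/conflicting probabilities" effect that the comparison in this experiment looks at.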
Result
• Softmax regression directly optimizes multiclass probabilities → smooth decision
boundaries.
• One-vs-Rest works but may give overlapping/conflicting probabilities.
• With PCA to 2D, decision regions clearly show class separation.
• Performance metrics (Accuracy, Precision, Recall, F1) are high (>90% on Iris dataset).
• Both methods perform well, but Softmax is more elegant and stable.
Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from copy import deepcopy

sns.set(style="whitegrid", context="notebook")

def one_hot(y, n_classes=None):
    if n_classes is None:
        n_classes = np.max(y) + 1
    Y = np.zeros((len(y), n_classes))
    Y[np.arange(len(y)), y] = 1
    return Y

def standardize(X):
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=0)
    sigma[sigma == 0] = 1.0   # avoid division by zero for constant features
    Xs = (X - mu) / sigma
    return Xs, mu, sigma

def pca_svd(X, n_components=2):
    """
    PCA via SVD. Returns projected data, components (eigenvectors), and explained variance.
    X expected to be zero-centered (or we'll center inside).
    """
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    projected = Xc.dot(components.T)
    explained_variance = (S ** 2) / (len(X) - 1)
    return projected, components, explained_variance[:n_components]

def softmax(logits):
    # subtract the row maximum for numerical stability
    z = logits - np.max(logits, axis=1, keepdims=True)
    expz = np.exp(z)
    probs = expz / np.sum(expz, axis=1, keepdims=True)
    return probs

def cross_entropy_loss(probs, Y_onehot, reg=0.0, W=None):
    n = probs.shape[0]
    eps = 1e-12
    loss = -np.sum(Y_onehot * np.log(probs + eps)) / n
    if reg and W is not None:
        loss += 0.5 * reg * np.sum(W * W)
    return loss

class SoftmaxGD:
    """Multinomial (softmax) logistic regression trained with batch gradient descent."""
    def __init__(self, n_features, n_classes, lr=0.5, reg=1e-3, max_iter=1000, verbose=False):
        self.n_features = n_features
        self.n_classes = n_classes
        self.lr = lr
        self.reg = reg
        self.max_iter = max_iter
        self.verbose = verbose
        self.W = np.zeros((n_features, n_classes))
        self.b = np.zeros(n_classes)

    def fit(self, X, y, X_val=None, y_val=None):
        n, d = X.shape
        Y = one_hot(y, self.n_classes)
        for it in range(self.max_iter):
            logits = X.dot(self.W) + self.b
            probs = softmax(logits)
            error = probs - Y
            grad_W = (X.T.dot(error)) / n + self.reg * self.W
            grad_b = np.sum(error, axis=0) / n
            self.W -= self.lr * grad_W
            self.b -= self.lr * grad_b
            if self.verbose and (it % (self.max_iter // 10 + 1) == 0):
                loss = cross_entropy_loss(probs, Y, reg=self.reg, W=self.W)
                print(f"Iter {it}/{self.max_iter} loss: {loss:.4f}")

    def predict_proba(self, X):
        logits = X.dot(self.W) + self.b
        return softmax(logits)

    def predict(self, X):
        probs = self.predict_proba(X)
        return np.argmax(probs, axis=1)

class SoftmaxNewton:
    """Softmax regression with a damped Newton step on W and a gradient step on b."""
    def __init__(self, n_features, n_classes, reg=1e-3, max_iter=50, damping=1e-3,
                 verbose=False):
        self.n_features = n_features
        self.n_classes = n_classes
        self.reg = reg
        self.max_iter = max_iter
        self.damping = damping
        self.verbose = verbose
        self.W = np.zeros((n_features, n_classes))
        self.b = np.zeros(n_classes)

    def fit(self, X, y):
        n, d = X.shape
        K = self.n_classes
        Y = one_hot(y, K)
        for it in range(self.max_iter):
            logits = X.dot(self.W) + self.b
            P = softmax(logits)
            G = X.T.dot(P - Y) / n + self.reg * self.W
            dK = d * K
            if dK <= 4000:
                # Full (dK x dK) Hessian, accumulated sample by sample
                H = np.zeros((dK, dK))
                Xt = X
                for i in range(n):
                    p = P[i][:, None]
                    Ri = np.diag(p.flatten()) - p.dot(p.T)
                    Xi = Xt[i][:, None]
                    kron = np.kron(Ri, Xi.dot(Xi.T))
                    H += kron
                H = H / n
                reg_block = self.reg * np.eye(dK)
                H += reg_block
                H += self.damping * np.eye(dK)
                # column-major vectorisation matches the block layout of H
                g_vec = G.reshape(-1, order='F')
                try:
                    delta = np.linalg.solve(H, g_vec)
                except np.linalg.LinAlgError:
                    delta = np.linalg.lstsq(H, g_vec, rcond=None)[0]
                deltaW = delta.reshape((d, K), order='F')
                self.W -= deltaW
                grad_b = np.sum(P - Y, axis=0) / n
                self.b -= grad_b
            else:
                if self.verbose:
                    print("Large d*K, falling back to gradient step for stability.")
                self.W -= 0.5 * G
                self.b -= 0.5 * np.sum(P - Y, axis=0) / n
            if self.verbose and (it % max(1, self.max_iter // 10) == 0):
                loss = cross_entropy_loss(P, Y, reg=self.reg, W=self.W)
                print(f"Newton iter {it}/{self.max_iter} loss: {loss:.6f}")

    def predict_proba(self, X):
        logits = X.dot(self.W) + self.b
        return softmax(logits)

    def predict(self, X):
        return np.argmax(self.predict_proba(X), axis=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class BinaryLogisticGD:
    def __init__(self, n_features, lr=0.5, reg=1e-3, max_iter=500, verbose=False):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr
        self.reg = reg
        self.max_iter = max_iter
        self.verbose = verbose

    def fit(self, X, y_binary):
        n = X.shape[0]
        for it in range(self.max_iter):
            z = X.dot(self.w) + self.b
            p = sigmoid(z)
            error = p - y_binary
            grad_w = (X.T.dot(error)) / n + self.reg * self.w
            grad_b = np.sum(error) / n
            self.w -= self.lr * grad_w
            self.b -= self.lr * grad_b
            if self.verbose and (it % (self.max_iter // 10 + 1) == 0):
                loss = -np.mean(y_binary * np.log(p + 1e-12) + (1 - y_binary) * np.log(1 - p + 1e-12))
                print(f"Binary iter {it}/{self.max_iter} loss: {loss:.4f}")

    def predict_proba(self, X):
        return sigmoid(X.dot(self.w) + self.b)

    def predict(self, X):
        return (self.predict_proba(X) >= 0.5).astype(int)

class OneVsRest:
    """One binary logistic regressor per class (class k vs the rest)."""
    def __init__(self, n_features, n_classes, lr=0.5, reg=1e-3, max_iter=500, verbose=False):
        self.n_classes = n_classes
        self.models = [BinaryLogisticGD(n_features, lr=lr, reg=reg, max_iter=max_iter,
                                        verbose=verbose) for _ in range(n_classes)]

    def fit(self, X, y):
        for k in range(self.n_classes):
            yk = (y == k).astype(int)
            self.models[k].fit(X, yk)

    def predict_proba(self, X):
        probs = np.vstack([m.predict_proba(X) for m in self.models]).T
        probs = np.clip(probs, 1e-12, 1 - 1e-12)
        # the K sigmoid outputs do not sum to 1, so renormalise each row
        probs_norm = probs / np.sum(probs, axis=1, keepdims=True)
        return probs_norm

    def predict(self, X):
        return np.argmax(self.predict_proba(X), axis=1)

def confusion_matrix(y_true, y_pred, n_classes=None):
    """cm[t, p] counts samples of true class t predicted as class p."""
    if n_classes is None:
        n_classes = np.max(y_true) + 1
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def precision_recall_f1(cm):
    K = cm.shape[0]
    prec = np.zeros(K)
    rec = np.zeros(K)
    f1 = np.zeros(K)
    for k in range(K):
        tp = cm[k, k]
        fp = cm[:, k].sum() - tp
        fn = cm[k, :].sum() - tp
        prec[k] = tp / (tp + fp) if (tp + fp) > 0 else 0.0
        rec[k] = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        if prec[k] + rec[k] > 0:
            f1[k] = 2 * prec[k] * rec[k] / (prec[k] + rec[k])
        else:
            f1[k] = 0.0
    return prec, rec, f1

def plot_decision_regions_2d(model, X_2d, y, title="Decision regions", resolution=200,
                             cmap='viridis'):
    x_min, x_max = X_2d[:, 0].min() - 1, X_2d[:, 0].max() + 1
    y_min, y_max = X_2d[:, 1].min() - 1, X_2d[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, resolution),
                         np.linspace(y_min, y_max, resolution))
    grid = np.c_[xx.ravel(), yy.ravel()]
    probs = model.predict_proba(grid)
    Z = np.argmax(probs, axis=1).reshape(xx.shape)
    plt.figure(figsize=(8, 6))
    plt.contourf(xx, yy, Z, alpha=0.3, cmap=cmap)
    scatter = plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, edgecolor='k', s=50)
    plt.title(title)
    plt.xlabel("PC1")
    plt.ylabel("PC2")
    plt.legend(*scatter.legend_elements(), title="Classes")
    plt.show()

def plot_probability_heatmaps(model, X_2d, title_prefix="Probabilities", resolution=200):
    x_min, x_max = X_2d[:, 0].min() - 1, X_2d[:, 0].max() + 1
    y_min, y_max = X_2d[:, 1].min() - 1, X_2d[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, resolution),
                         np.linspace(y_min, y_max, resolution))
    grid = np.c_[xx.ravel(), yy.ravel()]
    probs = model.predict_proba(grid)
    K = probs.shape[1]
    for k in range(K):
        Z = probs[:, k].reshape(xx.shape)
        plt.figure(figsize=(6, 5))
        plt.contourf(xx, yy, Z, alpha=0.9)
        plt.colorbar(label=f'P(class={k})')
        plt.title(f"{title_prefix} - class {k}")
        plt.xlabel("PC1")
        plt.ylabel("PC2")
        plt.show()

def plot_weight_bars(W, title="Weights per class"):
    d, K = W.shape
    plt.figure(figsize=(10, 5))
    df = pd.DataFrame(W, index=[f"f{i}" for i in range(d)], columns=[f"class{k}" for k in range(K)])
    # draw on the figure created above; df.plot would otherwise open a second one
    df.plot(kind='bar', ax=plt.gca())
    plt.title(title)
    plt.xlabel("Feature index")
    plt.ylabel("Weight value")
    plt.tight_layout()
    plt.show()

def plot_confusion_matrix(cm, labels=None, title="Confusion matrix"):
    plt.figure(figsize=(5, 4))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=labels, yticklabels=labels)
    plt.ylabel("True")        # rows of cm are the true classes
    plt.xlabel("Predicted")   # columns of cm are the predicted classes
    plt.title(title)
    plt.show()

def calibration_plot(model, X, y, n_bins=10):
    """Compare predicted confidence with empirical accuracy in equal-width bins."""
    probs = model.predict_proba(X)
    pred_conf = np.max(probs, axis=1)
    pred_label = np.argmax(probs, axis=1)
    correct = (pred_label == y).astype(int)
    bins = np.linspace(0, 1, n_bins + 1)
    # clip so that a confidence of exactly 1.0 falls into the last bin
    bin_idx = np.clip(np.digitize(pred_conf, bins) - 1, 0, n_bins - 1)
    bin_acc = []
    bin_conf = []
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.sum() == 0:
            bin_acc.append(np.nan)
            bin_conf.append((bins[b] + bins[b+1]) / 2)
        else:
            bin_acc.append(correct[mask].mean())
            bin_conf.append(pred_conf[mask].mean())
    plt.figure(figsize=(6, 6))
    plt.plot(bin_conf, bin_acc, marker='o', label='Calibration')
    plt.plot([0, 1], [0, 1], '--', color='gray', label='Perfect')
    plt.xlabel("Mean predicted probability")
    plt.ylabel("Empirical accuracy")
    plt.title("Calibration plot")
    plt.legend()
    plt.show()

def main():
    iris = datasets.load_iris()
    X = iris['data']
    y = iris['target']
    class_names = iris['target_names']
    Xs, mu, sigma = standardize(X)
    X_pca2, components, var_explained = pca_svd(Xs, n_components=2)
    print("Explained variance (PC1, PC2):", var_explained)
    n_features = Xs.shape[1]
    n_classes = int(y.max()) + 1

    # Softmax regression with gradient descent (metrics are on the training data)
    soft_gd = SoftmaxGD(n_features=n_features, n_classes=n_classes, lr=0.5, reg=1e-3,
                        max_iter=2000, verbose=False)
    soft_gd.fit(Xs, y)
    y_pred_soft_gd = soft_gd.predict(Xs)
    cm_soft = confusion_matrix(y, y_pred_soft_gd, n_classes=n_classes)
    prec, rec, f1 = precision_recall_f1(cm_soft)
    print("\nSoftmax (GD) metrics:")
    print("Confusion matrix:\n", cm_soft)
    for k in range(n_classes):
        print(f"Class {k} ({class_names[k]}): precision={prec[k]:.3f}, recall={rec[k]:.3f}, f1={f1[k]:.3f}")

    # One-vs-Rest
    ovr = OneVsRest(n_features=n_features, n_classes=n_classes, lr=0.5, reg=1e-3,
                    max_iter=1500, verbose=False)
    ovr.fit(Xs, y)
    y_pred_ovr = ovr.predict(Xs)
    cm_ovr = confusion_matrix(y, y_pred_ovr, n_classes=n_classes)
    prec_o, rec_o, f1_o = precision_recall_f1(cm_ovr)
    print("\nOne-vs-Rest metrics:")
    print("Confusion matrix:\n", cm_ovr)
    for k in range(n_classes):
        print(f"Class {k} ({class_names[k]}): precision={prec_o[k]:.3f}, recall={rec_o[k]:.3f}, f1={f1_o[k]:.3f}")

    # Softmax regression with Newton's method on the 2D PCA features
    X_reduced = X_pca2
    newton = SoftmaxNewton(n_features=X_reduced.shape[1], n_classes=n_classes, reg=1e-3,
                           max_iter=30, damping=1e-3, verbose=True)
    newton.fit(X_reduced, y)
    y_pred_newton = newton.predict(X_reduced)
    cm_newton = confusion_matrix(y, y_pred_newton, n_classes=n_classes)
    prec_n, rec_n, f1_n = precision_recall_f1(cm_newton)
    print("\nSoftmax (Newton on PCA2) metrics:")
    print("Confusion matrix:\n", cm_newton)
    for k in range(n_classes):
        print(f"Class {k} ({class_names[k]}): precision={prec_n[k]:.3f}, recall={rec_n[k]:.3f}, f1={f1_n[k]:.3f}")

    class ProjectedModel:
        """Wraps a 4-feature model so that it can be evaluated on 2D PCA coordinates."""
        def __init__(self, base_model, pca_components, mean_center, sigma):
            self.base_model = base_model
            self.components = pca_components
            self.mean = mean_center
            self.sigma = sigma

        def predict_proba(self, X2d):
            # map PCA coordinates back to the standardized feature space
            X_approx = X2d.dot(self.components) + 0.0
            return self.base_model.predict_proba(X_approx)

    proj_soft = ProjectedModel(soft_gd, components, mu, sigma)
    proj_ovr = ProjectedModel(ovr, components, mu, sigma)

    plot_decision_regions_2d(proj_soft, X_pca2, y, title="Decision regions (Softmax GD) on PCA(2)")
    plot_probability_heatmaps(proj_soft, X_pca2, title_prefix="Softmax GD class probability")
    plot_weight_bars(soft_gd.W, title="Softmax GD weights (original features)")

    plot_decision_regions_2d(proj_ovr, X_pca2, y, title="Decision regions (One-vs-Rest) on PCA(2)")
    plot_probability_heatmaps(proj_ovr, X_pca2, title_prefix="OvR class probability")
    W_ovr = np.column_stack([m.w for m in ovr.models])
    plot_weight_bars(W_ovr, title="One-vs-Rest weights (original features)")

    plot_decision_regions_2d(newton, X_reduced, y, title="Decision regions (Multinomial Newton on PCA2)")
    plot_probability_heatmaps(newton, X_reduced, title_prefix="Newton class probability")
    plot_weight_bars(newton.W, title="Newton weights (PCA features)")

    plot_confusion_matrix(cm_soft, labels=class_names, title="Confusion Matrix - Softmax GD")
    plot_confusion_matrix(cm_ovr, labels=class_names, title="Confusion Matrix - One-vs-Rest")
    plot_confusion_matrix(cm_newton, labels=class_names, title="Confusion Matrix - Newton (PCA)")

    calibration_plot(proj_soft, X_pca2, y)
    calibration_plot(proj_ovr, X_pca2, y)
    calibration_plot(newton, X_pca2, y)
    print("\nDone.")

if __name__ == "__main__":
    main()
5. Binary Logistic Regression using Newton’s Method
Question / Objective
To implement binary logistic regression from scratch using Newton’s Method (IRLS) for
parameter optimization.
We use the Iris dataset and select:
• Two classes: Versicolor (1) vs Virginica (2)
• Two features: Petal length, Petal width
We then:
1. Train logistic regression using Newton’s method
2. Plot loss vs iterations
3. Visualize decision boundary
4. Compare parameters and accuracy with sklearn.linear_model.LogisticRegression
(Newton-CG solver)
Theory
Iris Dataset
• 150 samples, 3 classes
• We select only two classes → reduces it to binary classification
• We select two features → easy 2D visualization
Logistic Regression (Binary)
• Predicts probability of class 1:
p = \sigma(w^T x)
where
\sigma(z) = \frac{1}{1 + e^{-z}}
• Loss: Negative Log Likelihood
L = -\sum [y \ln(p) + (1 - y) \ln(1 - p)]
Newton’s Method (IRLS)
Newton method uses second-order derivatives:
w_{new} = w - H^{-1} g
Where:
• g = gradient
• H = Hessian matrix
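For the negative log-likelihood above, the gradient and Hessian have a simple closed form. The block below writes them out under the assumption that X is the design matrix with a leading column of ones, p is the vector of predicted probabilities and R is the diagonal matrix of p_n(1 − p_n); the symbol R is chosen here for illustration and plays the role of the diagonal weight matrix built in the code of this experiment.

```latex
\begin{aligned}
g &= \nabla_w L = X^T (p - y) \\
H &= \nabla_w^2 L = X^T R X, \qquad R = \mathrm{diag}\bigl(p_n (1 - p_n)\bigr) \\
w_{new} &= w - (X^T R X)^{-1} X^T (p - y)
\end{aligned}
```

Because every p_n(1 − p_n) is positive, H is positive semi-definite, so the loss is convex and the Newton step points downhill whenever H is invertible.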
Properties:
• Converges in very few iterations
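• Needs no learning rate, but each step solves a linear system in H and can fail when H is near-singular (for example, when the classes are linearly separable)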
Evaluation
• Accuracy
• Coefficients comparison
• Decision boundary
Result
• Newton’s method converges very fast (2–5 iterations)
• Smooth decreasing NLL loss curve
• Decision boundary clearly separates the two Iris classes
Code
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

# Petal length and petal width only; drop class 0 and relabel the rest as 0/1
iris = datasets.load_iris()
X = iris.data[:, 2:4]
y = iris.target
mask = y != 0
X = X[mask]
y = y[mask] - 1

# Prepend a column of ones so that w[0] acts as the bias
Xb = np.c_[np.ones((X.shape[0], 1)), X]

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def nll(w, X, y):
    z = X @ w
    return -np.sum(y*np.log(sigmoid(z)) + (1-y)*np.log(1-sigmoid(z)))

def newton_logreg(X, y, max_iter=10):
    w = np.zeros(X.shape[1])
    losses = []
    for _ in range(max_iter):
        z = X @ w
        p = sigmoid(z)
        g = X.T @ (p - y)          # gradient
        W = np.diag(p*(1-p))
        H = X.T @ W @ X            # Hessian
        # solve H * step = g instead of forming the explicit inverse of H
        w -= np.linalg.solve(H, g)
        losses.append(nll(w, X, y))
    return w, losses

w, losses = newton_logreg(Xb, y)

plt.plot(losses)
plt.xlabel("Iteration")
plt.ylabel("Negative Log-Likelihood")
plt.show()

# Decision boundary: w0 + w1*x1 + w2*x2 = 0
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm')
x1 = np.linspace(min(X[:,0]), max(X[:,0]), 100)
x2 = -(w[0] + w[1]*x1) / w[2]
plt.plot(x1, x2)
plt.xlabel("Petal Length")
plt.ylabel("Petal Width")
plt.show()

# A large C makes sklearn's L2 penalty negligible, for a like-for-like comparison
clf = LogisticRegression(C=1e6, solver='newton-cg').fit(X, y)
print("Our Coeff:", w)
print("Sklearn Coeff:", np.r_[clf.intercept_, clf.coef_[0]])
print("Our Accuracy:", np.mean((sigmoid(Xb@w) >= 0.5) == y))
print("Sklearn Accuracy:", clf.score(X, y))
6. Personalized Book Recommender using Multinomial Naïve Bayes
Objective
The goal of this experiment is to build a personalized book recommender system using a
Multinomial Naïve Bayes classifier. The model predicts whether a user will like a book based on
two factors: the user’s past rating behavior and the genre tags associated with the book. The
Goodreads Books Dataset is used.
Theory
The dataset contains four tables: user ratings, book metadata, book–tag mappings, and tag
names. To build a predictive model, all four must be cleaned and merged. Each book may have
multiple genre tags; therefore, tag aggregation is required to create a single genre string per
book.
Multinomial Naïve Bayes is suitable for text-based features such as genres. It models count-
based data and treats each feature independently. Laplace smoothing (α = 1.0) is used to avoid
zero probability when a word or feature does not appear in the training data of a class.
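The effect of Laplace smoothing can be checked by hand on a tiny example. The sketch below uses made-up tag counts for two classes over a four-word vocabulary (they are not taken from the Goodreads data) and applies the smoothed estimate P(word | class) = (count + α) / (class total + α·|V|); `smoothed_likelihood` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical tag counts per class (illustrative, not from the dataset)
vocab = ["fantasy", "romance", "thriller", "history"]
counts = np.array([
    [3, 0, 1, 0],   # class 0 (disliked)
    [5, 2, 0, 1],   # class 1 (liked)
])
alpha = 1.0

def smoothed_likelihood(counts, alpha):
    """P(word | class) = (count + alpha) / (class_total + alpha * vocabulary_size)."""
    totals = counts.sum(axis=1, keepdims=True)
    return (counts + alpha) / (totals + alpha * counts.shape[1])

theta = smoothed_likelihood(counts, alpha)
for word, p0, p1 in zip(vocab, theta[0], theta[1]):
    print(f"{word:10s} P(w|disliked)={p0:.3f}  P(w|liked)={p1:.3f}")
```

Without smoothing, "romance" would have probability 0 in the disliked class and a single romance tag would force the whole class likelihood to zero; with α = 1 it becomes (0 + 1) / (4 + 4) = 0.125.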
The target label liked is defined from ratings: any rating > 3 is considered like (1), otherwise
dislike (0). The final feature matrix combines user identity (one-hot encoded) and book genre
text (converted into bag-of-words representation). The final classifier predicts whether a user
will like a book and outputs the probability.
Data Preparation and Preprocessing
Book tags and tag names are merged to map each book to its list of tag names. The tags are
aggregated into a single string per book. User ratings are merged with book metadata and the
aggregated genre table to form a unified dataset with user_id, book_id, rating, and genres.
Missing entries are removed. The binary target liked is created from rating values.
Feature Engineering
The user_id column is converted into numerical features through one-hot encoding. The genres
text is converted into numerical features using CountVectorizer, resulting in a sparse bag-of-
words representation. Both components are horizontally concatenated to form the final feature
matrix X. The target vector y is the liked column.
Model Training
The dataset is split into 80% training and 20% testing using stratified sampling to maintain the
distribution of liked values. The Multinomial Naïve Bayes classifier is trained with α = 1.0
(Laplace smoothing). This prevents a zero likelihood when a feature seen at prediction time
never occurred with a given class in the training data.
Model Evaluation
The model is evaluated using accuracy, precision, recall, and a confusion matrix. These metrics give insight into how well the recommender predicts user preference. Precision and recall describe performance on the positive (liked) class.
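To make the relation between the confusion matrix and the three metrics concrete, here is a short sketch with invented counts (they are not results from this experiment; the function name is hypothetical).

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall for the positive (liked) class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Illustrative counts only: 60 true likes found, 20 false alarms,
# 15 likes missed, 5 dislikes correctly rejected.
acc, prec, rec = classification_metrics(tp=60, fp=20, fn=15, tn=5)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

Precision answers "of the books recommended, how many did the user actually like", while recall answers "of the books the user liked, how many were recommended".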
Book Recommendation Function
A recommendation function predict_recommendation is created. It accepts a user ID and a new
book’s genre string and transforms these inputs using the fitted user encoder and genre
vectorizer. It returns a predicted label and probability that the user will like the given book. This
function can be used to build an interactive recommender.
Code
import pandas as pd
import numpy as np
from scipy.sparse import hstack
from sklearn.preprocessing import OneHotEncoder
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# Load the four tables. Only book_tags.csv is named in the original listing;
# the other three file names are assumed from the variable names.
ratings = pd.read_csv("ratings.csv")
books = pd.read_csv("books.csv")
book_tags = pd.read_csv("book_tags.csv")
tags = pd.read_csv("tags.csv")

# Map tag ids to names and join all tags of a book into one genre string
tag_map = book_tags.merge(tags, on="tag_id")
tag_map = (tag_map.groupby("goodreads_book_id")["tag_name"]
           .apply(lambda x: " ".join(x.astype(str)))
           .reset_index())

# Attach the genre string to every rating and build the binary target
merged = ratings.merge(books[["book_id", "goodreads_book_id"]], on="book_id")
merged = merged.merge(tag_map, on="goodreads_book_id")
merged = merged.dropna()
merged["liked"] = (merged["rating"] > 3).astype(int)

# One-hot user identity + bag-of-words genres
user_encoder = OneHotEncoder(handle_unknown="ignore")
user_features = user_encoder.fit_transform(merged[["user_id"]])
vectorizer = CountVectorizer()
genre_features = vectorizer.fit_transform(merged["tag_name"])
X = hstack([user_features, genre_features], format="csr")
y = merged["liked"].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = MultinomialNB(alpha=1.0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred)
rec = recall_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("Accuracy:", acc)
print("Precision:", prec)
print("Recall:", rec)
print("Confusion Matrix:\n", cm)

def predict_recommendation(user_id, book_genres, model, user_encoder, genre_vectorizer):
    # Encode the inputs with the already-fitted transformers
    u = user_encoder.transform(pd.DataFrame({"user_id": [user_id]}))
    g = genre_vectorizer.transform([book_genres])
    x = hstack([u, g], format="csr")
    pred = model.predict(x)[0]
    prob = model.predict_proba(x)[0][1]   # probability of the liked class
    return pred, prob
7. Kernelized SVM Classifier
Goal
Implement an SVM classifier using the dual formulation and kernel trick on the Breast Cancer
Wisconsin (Diagnostic) dataset. Map labels to binary: Malignant (0) → -1, Benign (1) → +1.
Theory (brief)
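Support Vector Machine (dual) optimizes the multipliers α in:
$\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j K(x_i, x_j)$
subject to $0 \le \alpha_i \le C$ and $\sum_i \alpha_i y_i = 0$. The kernel function K lets us work in implicit feature spaces.
The linear kernel corresponds to $K(x, x') = x^T x'$. The RBF kernel uses $K(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$.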
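As a quick check of the two kernel definitions, the following sketch evaluates them on three hand-picked 2-D points (the points and γ = 0.5 are illustrative choices).

```python
import numpy as np

def linear_kernel(X, Y):
    # K(x, x') = x^T x'
    return X @ Y.T

def rbf_kernel(X, Y, gamma):
    # K(x, x') = exp(-gamma * ||x - x'||^2), using the expansion
    # ||x - x'||^2 = ||x||^2 + ||x'||^2 - 2 x^T x'
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K_lin = linear_kernel(X, X)
K_rbf = rbf_kernel(X, X, gamma=0.5)

print(K_lin)
print(np.round(K_rbf, 4))
```

The RBF matrix has ones on its diagonal (every point has distance zero to itself), and its off-diagonal entries shrink as points move apart; both matrices are symmetric and positive semi-definite, which the dual problem relies on.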
Dataset & preprocessing
Load sklearn.datasets.load_breast_cancer. Standardize features manually (subtract mean, divide
by std) using NumPy. Implement shuffle and split manually to create 80% train / 20% test.
Kernel functions
Linear kernel and RBF kernel (with tunable gamma).
Model implementation
A simplified SMO-style optimizer updates pairs of alphas and clips each update to the bounds [0, C]. The equality constraint $\sum_i \alpha_i y_i = 0$ is maintained implicitly: whenever one alpha of the pair changes, the other is adjusted by the amount that cancels the change. After training, weights can be computed for the linear kernel as $w = \sum_i \alpha_i y_i x_i$. Bias b is computed from support vectors.
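Two properties mentioned above can be checked numerically. The sketch below uses random, untrained multipliers (an assumption made purely for illustration) to show that a pair update leaves the sum of α_i y_i unchanged, and that for the linear kernel the weight vector w gives the same decision values as the kernel expansion.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
y = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
alpha = rng.uniform(0.0, 1.0, size=6)   # arbitrary multipliers, not trained
b = 0.3

# (1) Pair update: pick a new value for alpha_j, then move alpha_i so that
#     y_i * alpha_i + y_j * alpha_j stays the same. (In the real optimizer
#     the clip bounds L and H on alpha_j keep alpha_i inside [0, C] too.)
i, j = 0, 1
constraint_before = alpha @ y
new_aj = 0.25
new_ai = alpha[i] + y[i] * y[j] * (alpha[j] - new_aj)
alpha_new = alpha.copy()
alpha_new[i], alpha_new[j] = new_ai, new_aj
constraint_after = alpha_new @ y
print(constraint_before, constraint_after)

# (2) Linear kernel: w = sum_i alpha_i y_i x_i reproduces the dual decision values.
w = (alpha * y) @ X
X_new = rng.normal(size=(4, 3))
dual_decision = (X_new @ X.T) @ (alpha * y) + b
primal_decision = X_new @ w + b
print(np.round(dual_decision - primal_decision, 12))
```

The second check is why an explicit w is only available for the linear kernel; with the RBF kernel the feature space is implicit and predictions must go through the kernel expansion.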
Prediction
Decision function: $f(x) = \text{sign}\left(\sum_i \alpha_i y_i K(x_i, x) + b\right)$. Calibrated probabilities are not computed; we report raw decision values and threshold them at zero.
Visualization
Select two features (mean radius, mean texture — feature indices 0 and 1). Train separate models
on only those two features and plot decision boundaries for Linear vs RBF kernels.
Evaluation (from scratch)
Compute accuracy, precision, recall, and F1-score manually. Also count support vectors (alphas
> tol). Compare results across kernels and across different regularization strengths C (example C
= [0.1, 1, 10]).
Expected output
A printed table of performance metrics for each kernel and each C, the number of support
vectors, and decision boundary plots for the two-feature case for linear and RBF kernels.
Below is the code that implements everything. Copy-paste and run in a Python environment with
the allowed packages installed.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer

def standardize_numpy(X):
    # Z-score each feature; constant features keep sigma = 1 to avoid division by zero
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=0)
    sigma[sigma==0] = 1.0
    Xs = (X - mu) / sigma
    return Xs, mu, sigma

def shuffle_split(X, y, test_size=0.2, random_state=None):
    # Shuffle the row indices, then cut them into train and test parts
    if random_state is not None:
        np.random.seed(random_state)
    idx = np.arange(X.shape[0])
    np.random.shuffle(idx)
    Xs = X[idx]
    ys = y[idx]
    split = int(X.shape[0] * (1 - test_size))
    X_train = Xs[:split]
    X_test = Xs[split:]
    y_train = ys[:split]
    y_test = ys[split:]
    return X_train, X_test, y_train, y_test

def linear_kernel(X, Y):
    return X @ Y.T

def rbf_kernel(X, Y, gamma):
    # Squared distances via ||x||^2 + ||y||^2 - 2 x.y, clipped at 0 against round-off
    XX = np.sum(X * X, axis=1).reshape(-1,1)
    YY = np.sum(Y * Y, axis=1).reshape(1,-1)
    distances = np.maximum(XX + YY - 2 * X @ Y.T, 0.0)
    return np.exp(-gamma * distances)
def svm_train_dual_smo(X, y, C=1.0, kernel_func=None, kernel_params=None, max_iter=1000,
                       tol=1e-5):
    n = X.shape[0]
    alpha = np.zeros(n)
    b = 0.0
    # Precompute the full kernel (Gram) matrix once
    if kernel_func is None:
        K = linear_kernel(X, X)
    else:
        if kernel_params is None:
            K = kernel_func(X, X)
        else:
            K = kernel_func(X, X, **kernel_params)
    it = 0
    passes = 0
    max_passes = 5
    # Stop after max_passes consecutive sweeps without any change, or after max_iter sweeps
    while passes < max_passes and it < max_iter:
        num_changed = 0
        for i in range(n):
            Ai = alpha[i]
            Ki = K[i]
            Ei = (np.sum(alpha * y * Ki) + b) - y[i]
            # Only optimize alphas that violate the KKT conditions
            if (y[i]*Ei < -tol and Ai < C) or (y[i]*Ei > tol and Ai > 0):
                j = np.random.choice([x for x in range(n) if x != i])
                Aj = alpha[j]
                Kj = K[j]
                Ej = (np.sum(alpha * y * Kj) + b) - y[j]
                # Box bounds for alpha_j that keep both alphas in [0, C]
                if y[i] == y[j]:
                    L = max(0, Ai + Aj - C)
                    H = min(C, Ai + Aj)
                else:
                    L = max(0, Aj - Ai)
                    H = min(C, C + Aj - Ai)
                if abs(L - H) < 1e-12:
                    continue
                eta = K[i,i] + K[j,j] - 2*K[i,j]
                if eta <= 0:
                    continue
                new_Aj = Aj + y[j]*(Ei - Ej)/eta
                if new_Aj > H:
                    new_Aj = H
                elif new_Aj < L:
                    new_Aj = L
                if abs(new_Aj - Aj) < 1e-12:
                    continue
                # Move alpha_i in the opposite direction to keep sum(alpha * y) fixed
                new_Ai = Ai + y[i]*y[j]*(Aj - new_Aj)
                alpha[i] = new_Ai
                alpha[j] = new_Aj
                b1 = b - Ei - y[i]*(alpha[i]-Ai)*K[i,i] - y[j]*(alpha[j]-Aj)*K[i,j]
                b2 = b - Ej - y[i]*(alpha[i]-Ai)*K[i,j] - y[j]*(alpha[j]-Aj)*K[j,j]
                if 0 < alpha[i] < C:
                    b = b1
                elif 0 < alpha[j] < C:
                    b = b2
                else:
                    b = 0.5*(b1 + b2)
                num_changed += 1
        it += 1
        if num_changed == 0:
            passes += 1
        else:
            passes = 0
    sv_idx = np.where(alpha > tol)[0]
    return alpha, b, sv_idx
def svm_predict(alpha, b, X_train, y_train, X_test, kernel_func=None, kernel_params=None):
    if kernel_func is None:
        Ktest = linear_kernel(X_test, X_train)
    else:
        if kernel_params is None:
            Ktest = kernel_func(X_test, X_train)
        else:
            Ktest = kernel_func(X_test, X_train, **kernel_params)
    decision = (Ktest @ (alpha * y_train)) + b
    preds = np.sign(decision)
    preds[preds==0] = 1   # points exactly on the boundary go to the positive class
    return preds, decision

def compute_weights_linear(alpha, y, X):
    return (alpha * y) @ X

def metrics_from_scratch(y_true, y_pred):
    # Positive class is +1 (Benign), negative class is -1 (Malignant)
    tp = np.sum((y_true==1) & (y_pred==1))
    tn = np.sum((y_true==-1) & (y_pred==-1))
    fp = np.sum((y_true==-1) & (y_pred==1))
    fn = np.sum((y_true==1) & (y_pred==-1))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
    return accuracy, precision, recall, f1, np.array([[tn, fp],[fn, tp]])
data = load_breast_cancer()
X_raw = data.data
y_raw = data.target
y_mapped = np.where(y_raw==0, -1, 1)   # Malignant (0) -> -1, Benign (1) -> +1
Xs, mu, sigma = standardize_numpy(X_raw)
X_train, X_test, y_train, y_test = shuffle_split(Xs, y_mapped, test_size=0.2, random_state=42)
Cs = [0.1, 1.0, 10.0]
gamma = 0.5
results = []
for C in Cs:
    # Linear kernel
    alpha_lin, b_lin, sv_lin = svm_train_dual_smo(X_train, y_train, C=C, kernel_func=None,
                                                  kernel_params=None, max_iter=1000)
    w_lin = compute_weights_linear(alpha_lin, y_train, X_train)
    preds_lin, dec_lin = svm_predict(alpha_lin, b_lin, X_train, y_train, X_test,
                                     kernel_func=None, kernel_params=None)
    acc_l, prec_l, rec_l, f1_l, cm_l = metrics_from_scratch(y_test, preds_lin)
    sv_count_lin = sv_lin.shape[0]
    # RBF kernel
    alpha_rbf, b_rbf, sv_rbf = svm_train_dual_smo(X_train, y_train, C=C,
                                                  kernel_func=rbf_kernel, kernel_params={"gamma":gamma}, max_iter=1000)
    preds_rbf, dec_rbf = svm_predict(alpha_rbf, b_rbf, X_train, y_train, X_test,
                                     kernel_func=rbf_kernel, kernel_params={"gamma":gamma})
    acc_r, prec_r, rec_r, f1_r, cm_r = metrics_from_scratch(y_test, preds_rbf)
    sv_count_rbf = sv_rbf.shape[0]
    results.append({
        "C": C,
        "Linear": {"accuracy": acc_l, "precision": prec_l, "recall": rec_l, "f1": f1_l,
                   "support_vectors": sv_count_lin, "confusion_matrix": cm_l},
        "RBF": {"accuracy": acc_r, "precision": prec_r, "recall": rec_r, "f1": f1_r,
                "support_vectors": sv_count_rbf, "confusion_matrix": cm_r}
    })
for res in results:
    print("C =", res["C"])
    print(" Linear -> acc: {:.4f}, prec: {:.4f}, rec: {:.4f}, f1: {:.4f}, sv: {}".format(
        res["Linear"]["accuracy"], res["Linear"]["precision"], res["Linear"]["recall"],
        res["Linear"]["f1"], res["Linear"]["support_vectors"]))
    print(" Confusion Matrix:\n", res["Linear"]["confusion_matrix"])
    print(" RBF -> acc: {:.4f}, prec: {:.4f}, rec: {:.4f}, f1: {:.4f}, sv: {}".format(
        res["RBF"]["accuracy"], res["RBF"]["precision"], res["RBF"]["recall"], res["RBF"]["f1"],
        res["RBF"]["support_vectors"]))
    print(" Confusion Matrix:\n", res["RBF"]["confusion_matrix"])
    print("-"*60)
# Two-feature models (mean radius, mean texture) for the decision boundary plots
feat_idx = [0,1]
X_2f = Xs[:, feat_idx]
X2_train, X2_test, y2_train, y2_test = shuffle_split(X_2f, y_mapped, test_size=0.2,
                                                     random_state=42)
alpha_lin2, b_lin2, sv_lin2 = svm_train_dual_smo(X2_train, y2_train, C=1.0,
                                                 kernel_func=None, kernel_params=None, max_iter=1000)
alpha_rbf2, b_rbf2, sv_rbf2 = svm_train_dual_smo(X2_train, y2_train, C=1.0,
                                                 kernel_func=rbf_kernel, kernel_params={"gamma":0.5}, max_iter=1000)
# Evaluate both models on a dense grid covering the training points
x_min, x_max = X2_train[:,0].min()-1, X2_train[:,0].max()+1
y_min, y_max = X2_train[:,1].min()-1, X2_train[:,1].max()+1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300), np.linspace(y_min, y_max, 300))
grid = np.c_[xx.ravel(), yy.ravel()]
pred_grid_lin, decg_lin = svm_predict(alpha_lin2, b_lin2, X2_train, y2_train, grid,
                                      kernel_func=None, kernel_params=None)
Z_lin = pred_grid_lin.reshape(xx.shape)
pred_grid_rbf, decg_rbf = svm_predict(alpha_rbf2, b_rbf2, X2_train, y2_train, grid,
                                      kernel_func=rbf_kernel, kernel_params={"gamma":0.5})
Z_rbf = pred_grid_rbf.reshape(xx.shape)
sns.set(style="whitegrid")
plt.figure(figsize=(12,5))
plt.subplot(1,2,1)
plt.contourf(xx, yy, Z_lin, cmap="coolwarm", alpha=0.3)
plt.scatter(X2_train[:,0], X2_train[:,1], c=y2_train, cmap="coolwarm", edgecolors="k")
# Support vectors are circled
plt.scatter(X2_train[sv_lin2,0], X2_train[sv_lin2,1], s=100, facecolors='none', edgecolors='k')
plt.title("Linear Kernel Decision Boundary (features {} & {})".format(feat_idx[0], feat_idx[1]))
plt.xlabel("Feature {}".format(feat_idx[0]))
plt.ylabel("Feature {}".format(feat_idx[1]))
plt.subplot(1,2,2)
plt.contourf(xx, yy, Z_rbf, cmap="coolwarm", alpha=0.3)
plt.scatter(X2_train[:,0], X2_train[:,1], c=y2_train, cmap="coolwarm", edgecolors="k")
plt.scatter(X2_train[sv_rbf2,0], X2_train[sv_rbf2,1], s=100, facecolors='none', edgecolors='k')
plt.title("RBF Kernel Decision Boundary (features {} & {})".format(feat_idx[0], feat_idx[1]))
plt.xlabel("Feature {}".format(feat_idx[0]))
plt.ylabel("Feature {}".format(feat_idx[1]))
plt.tight_layout()
plt.show()
8. PCA with Variance Retention
Goal
Apply PCA to reduce dimensions of the Breast Cancer Wisconsin dataset and determine the
minimum number of components required to retain 90%, 95%, and 99% variance. Visualize
results, reconstruct data, and compute reconstruction error.
Dataset
Used: sklearn.datasets.load_breast_cancer
Samples: 569
Features: 30 numeric
Target: 0 = Malignant, 1 = Benign
Method
1. Standardize all features using Z-score normalization (mean 0, std 1).
2. Fit PCA on the standardized data.
3. Plot cumulative explained variance and identify how many components reach 90%, 95%, and 99%.
4. Reduce dataset to 2 components and visualize samples in 2-D.
5. For each threshold:
   • Project the standardized data onto the selected components
   • Reconstruct using inverse transform
   • Compute reconstruction error (MSE)
6. Create summary table.
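The thresholding and reconstruction steps can be sketched with NumPy alone. The data below is synthetic (a random low-rank mixture plus noise, chosen only for illustration), and PCA is done through an SVD rather than through scikit-learn.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated data standing in for a real feature matrix (assumption)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 8))
X = latent @ mixing + 0.3 * rng.normal(size=(200, 8))

# Z-score standardization with the population std (ddof=0)
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD: rows of Vt are the principal directions
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
cum_var = np.cumsum(S ** 2 / np.sum(S ** 2))

rows = []
for threshold in (0.90, 0.95, 0.99):
    k = int(np.argmax(cum_var >= threshold)) + 1   # smallest k reaching the threshold
    components = Vt[:k]
    Z = Xs @ components.T                          # project to k dimensions
    Xr = Z @ components                            # map back to the original space
    mse = np.mean((Xs - Xr) ** 2)
    rows.append((threshold, k, cum_var[k - 1], mse))
    print(f"threshold={threshold:.2f}  k={k}  retained={cum_var[k-1]:.4f}  mse={mse:.4f}")
```

For standardized data every feature has unit variance, so the reconstruction MSE equals 1 minus the retained variance fraction. This gives a handy sanity check for the summary table.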
Results
• Cumulative variance plot
• PCA 2-D scatter
• Printed table with:
o Threshold
o Components needed
o Variance achieved
o Reconstruction MSE
Code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

data = load_breast_cancer()
X = data.data
y = data.target
scaler = StandardScaler()
Xs = scaler.fit_transform(X)

# Full PCA to obtain the cumulative explained variance curve
pca_full = PCA()
pca_full.fit(Xs)
cum_var = np.cumsum(pca_full.explained_variance_ratio_)

thresholds = [0.90, 0.95, 0.99]
results = []
for t in thresholds:
    # Smallest number of components whose cumulative variance reaches t
    k = int(np.argmax(cum_var >= t)) + 1
    pca_k = PCA(n_components=k)
    Xk = pca_k.fit_transform(Xs)
    Xr = pca_k.inverse_transform(Xk)
    mse = mean_squared_error(Xs, Xr)
    results.append([t, k, cum_var[k-1], mse])
df = pd.DataFrame(results, columns=["Threshold","Components","Variance Achieved","Reconstruction MSE"])
print(df)

plt.figure(figsize=(10,5))
# x-axis starts at 1 so that it reads as the number of components
plt.plot(np.arange(1, len(cum_var) + 1), cum_var, marker='o')
plt.axhline(0.90, color='red', linestyle='--')
plt.axhline(0.95, color='green', linestyle='--')
plt.axhline(0.99, color='purple', linestyle='--')
plt.xlabel("Number of Components")
plt.ylabel("Cumulative Explained Variance")
plt.title("Variance Retention Curve")
plt.grid(True)
plt.show()

pca2 = PCA(n_components=2)
X2 = pca2.fit_transform(Xs)
plt.figure(figsize=(8,6))
sns.scatterplot(x=X2[:,0], y=X2[:,1], hue=y, palette="coolwarm")
plt.title("PCA (2 Components) Projection")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()
9. Unsupervised Learning
Objective
This lab explores two key unsupervised learning techniques.
Part 1 uses PCA for dimensionality reduction to visualize high-dimensional chemical wine data
in 2D space.
Part 2 applies K-Means clustering to segment mall customers based on income and spending
behavior.
Part 1: PCA for Wine Data Visualization
Goal
Convert 13-dimensional wine chemical features into a 2-dimensional representation to see if
natural groupings exist in the data.
Dataset
Wine Dataset (13 features)
Targets represent 3 wine cultivars, but are not used for computing PCA—only for coloring the
final plot.
Method
1. Load the dataset and separate X (features) and y (cultivar label).
2. Standardize X because PCA is variance-based and sensitive to scale.
3. Fit PCA with n_components = 2.
4. Retrieve the variance explained by PC1 and PC2.
5. Create a scatter plot of PC1 vs PC2, coloring points using the true labels to visually
inspect separation.
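Step 2 matters because PCA follows variance, and variance depends on units. A minimal sketch with two synthetic, independent features (one on a scale about 100 times larger than the other; both are illustrative assumptions, not wine measurements) shows the effect on the first principal direction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
small = rng.normal(0.0, 1.0, n)      # feature measured in small units
large = rng.normal(0.0, 100.0, n)    # feature measured in large units
X = np.column_stack([small, large])

def first_direction(X):
    """First principal direction of X, obtained from the SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

raw_pc1 = first_direction(X)
std_pc1 = first_direction((X - X.mean(axis=0)) / X.std(axis=0))

print("PC1 loadings, raw data:         ", np.round(np.abs(raw_pc1), 3))
print("PC1 loadings, standardized data:", np.round(np.abs(std_pc1), 3))
```

On the raw data PC1 points almost entirely along the large-scale feature; after standardization both features contribute with equal weight.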
Results
The two-component PCA projection produces a 2D map where wines form visibly distinct
clusters corresponding closely to their cultivars.
Explained variance values (produced by the script) show the proportion of information preserved
by PC1, PC2, and their combined total.
Part 2: K-Means Customer Segmentation
Goal
Segment shopping mall customers into distinct behavior-based groups using their annual income
and spending score.
Dataset
Mall Customers Dataset
Features: Annual Income (k$), Spending Score (1–100)
Method
1. Use the Elbow Method for K = 1 to 10 to compute inertia values.
2. Identify the “elbow” point as the optimal number of clusters.
3. Apply K-Means using the selected K.
4. Visualize clusters on a 2D scatter plot with colors representing cluster labels.
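The inertia used by the Elbow Method is the sum of squared distances from each point to its assigned centroid. The sketch below recomputes it by hand on synthetic blobs (the make_blobs settings are illustrative assumptions, not the mall data) and compares it with the value reported by KMeans.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with four groups (illustrative stand-in for income / spending)
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=0)

Ks = range(1, 9)
inertias, manual_inertias = [], []
for k in Ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # Squared distance of every point to the centroid of its own cluster
    manual = np.sum((X - km.cluster_centers_[km.labels_]) ** 2)
    inertias.append(km.inertia_)
    manual_inertias.append(manual)

for k, inertia in zip(Ks, inertias):
    print(f"K={k}: inertia={inertia:.1f}")
```

For K = 1 the inertia is simply the total squared deviation from the overall mean. Adding clusters generally lowers it, and the elbow is the K after which the improvement flattens out.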
Results
The elbow point (value shown by the plot) typically occurs at K = 5 for this dataset, indicating
five distinct customer groups.
Cluster visualization displays well-separated groups representing different spending behaviors.
Code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Part 1: PCA on the wine data
wine = load_wine()
X = wine.data
y = wine.target
scaler = StandardScaler()
Xs = scaler.fit_transform(X)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(Xs)
var1 = pca.explained_variance_ratio_[0]
var2 = pca.explained_variance_ratio_[1]
total_var = var1 + var2
print("PC1 Variance:", var1)
print("PC2 Variance:", var2)
print("Total Variance:", total_var)
plt.figure(figsize=(8,6))
# Labels are used only for coloring, not for fitting PCA
sns.scatterplot(x=X_pca[:,0], y=X_pca[:,1], hue=y, palette="viridis")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("PCA of Wine Dataset")
plt.show()

# Part 2: K-Means on the mall customers data
mall = pd.read_csv("Mall_Customers.csv")
Xc = mall[["Annual Income (k$)", "Spending Score (1-100)"]].values
inertias = []
Ks = range(1,11)
for k in Ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(Xc)
    inertias.append(km.inertia_)
plt.figure(figsize=(8,6))
plt.plot(Ks, inertias, marker='o')
plt.xlabel("Number of Clusters (K)")
plt.ylabel("Inertia")
plt.title("Elbow Method")
plt.show()

# The elbow is read off the plot and entered manually
optimal_k = int(input("Enter optimal K from elbow plot: "))
kmeans = KMeans(n_clusters=optimal_k, n_init=10, random_state=42)
labels = kmeans.fit_predict(Xc)
plt.figure(figsize=(8,6))
sns.scatterplot(x=Xc[:,0], y=Xc[:,1], hue=labels, palette="rainbow")
plt.xlabel("Annual Income (k$)")
plt.ylabel("Spending Score")
plt.title("Customer Segments (K-Means)")
plt.show()