Machine Learning Techniques Lab File

The document is a lab file detailing various machine learning experiments conducted by Radhika Gupta at the Indian Institute of Information Technology and Management, Gwalior. It includes experiments on polynomial curve fitting, logistic regression for spam classification, and a breast cancer classifier, along with objectives, theories, results, and code implementations for each experiment. The experiments utilize techniques such as gradient descent, logistic regression, and performance metric calculations to evaluate model effectiveness.

Atal Bihari Vajpayee

Indian Institute of Information Technology

and Management, Gwalior

LAB FILE OF

MACHINE LEARNING TECHNIQUES

SUBMITTED BY:- RADHIKA GUPTA
ROLL No: 2025MIT-009
[Link] (I.T) 1st sem

SUBMITTED TO:- Prof. ADITYA TRIVEDI
DEPARTMENT OF I.T

INDEX
Exp. No.  Title of experiment                                      Date
1         Polynomial Curve Fitting using Gradient Descent          21/08/2025
2         Binary Classification using Logistic Regression          28/08/2025
3         Binary Logistic Regression: Breast Cancer Classifier     04/09/2025
4         Multiclass Logistic Regression: Softmax vs One-vs-Rest   11/09/2025
5         Logistic Regression using Newton’s Method                09/10/2025
6         Personalized Book Recommendation using Naïve Bayes       16/10/2025
7         Kernelized SVM Classifier                                30/10/2025
8         PCA with Variance Retention                              06/11/2025
9         Unsupervised Learning – PCA & K-Means Clustering         13/11/2025

1. Polynomial Curve Fitting

Question / Objective

To study how the polynomial degree (M), noise in data, and gradient descent method (Batch vs.
Stochastic) affect model fitting, underfitting, and overfitting.

Theory

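• The target function is:

$t = \sin(2\pi x) + \epsilon, \quad x \in [0, 1], \quad \epsilon \sim \mathcal{N}(0, \sigma^2)$
• We fit a polynomial model:
$y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M$
• Error is measured by:
$E(\mathbf{w}) = \frac{1}{2N} \sum_{n=1}^{N} \left( y(x_n, \mathbf{w}) - t_n \right)^2$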
• Gradient Descent Methods:
o Batch Gradient Descent (BGD): updates weights using the full dataset.
o Stochastic Gradient Descent (SGD): updates weights using one random sample
at a time.
• RMSE (Root Mean Square Error) is used for evaluating performance.
o Training RMSE decreases as degree M increases.
o Test RMSE first decreases, then increases (U-shape).
o Small M → underfitting.
o Large M (e.g., M=9) → overfitting.
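
The behaviour of training and test RMSE described above can be illustrated without any iterative optimizer. The sketch below is not the gradient-descent procedure of this experiment: it swaps in a closed-form least-squares fit (`np.linalg.lstsq`) purely to show how the two errors move with M. The sample size, noise level, seed, and the helper name `rmse_for_degree` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for repeatability only

# Assumed setup: 10 evenly spaced noisy training points, dense noise-free test grid
x_train = np.linspace(0, 1, 10)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.1, size=x_train.size)
x_test = np.linspace(0, 1, 100)
t_test = np.sin(2 * np.pi * x_test)

def rmse_for_degree(M):
    """Fit a degree-M polynomial by least squares; return (train RMSE, test RMSE)."""
    X_train = np.vander(x_train, N=M + 1, increasing=True)
    X_test = np.vander(x_test, N=M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(X_train, t_train, rcond=None)
    train = np.sqrt(np.mean((X_train @ w - t_train) ** 2))
    test = np.sqrt(np.mean((X_test @ w - t_test) ** 2))
    return train, test

errors = {M: rmse_for_degree(M) for M in range(10)}
for M, (train, test) in errors.items():
    print(f"M={M}: train RMSE={train:.4f}, test RMSE={test:.4f}")
```

With ten training points, the M = 9 polynomial has as many coefficients as there are points, so it can pass through every one of them: its training RMSE is essentially zero even though the fit follows the noise.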

Result

• Training error goes down as polynomial degree increases.

• Test error follows a U-shape: first reduces, then grows when overfitting occurs.
• With M=9, training error is almost 0, but test error is large.
• BGD and SGD both work, but BGD is more stable, while SGD is faster but noisier.
• Graphs of polynomial fits show:
o M=0: Underfit (flat line).
o M=1: Linear fit, still underfit.
o M=3: Good fit (low test error).
o M=9: Overfit (too wiggly).
• RMSE vs M plots confirm expected outcome.

Code:

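import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

np.random.seed(0)

# Ten training inputs drawn uniformly from [0, 1], rounded and sorted
N_train = 10
x_train = np.round(np.random.rand(N_train), 2)
x_train = np.sort(x_train)

# Two noise levels: dataset A (sigma = 0.05) and dataset B (sigma = 0.01)
sigmas = {"A": 0.05, "B": 0.01}

datasets = {}
for key, sigma in sigmas.items():
    noise = np.random.normal(0, sigma, size=N_train)
    t = np.sin(2 * np.pi * x_train) + noise
    datasets[key] = {"x": x_train, "t": t, "sigma": sigma}

# Noise-free test grid
x_test = np.round(np.linspace(0, 1, 100), 2)

t_test_true = np.sin(2 * np.pi * x_test)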

def design_matrix(x, M):
    """Polynomial design matrix up to degree M"""
    return np.vander(x, N=M+1, increasing=True)

def predict(w, X):
    return X @ w

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred)**2))

def compute_gradient(w, X, t):
    """Gradient of the loss E(w) = (1/2N) * sum((Xw - t)^2)"""
    y = X @ w
    return (X.T @ (y - t)) / len(t)

def train_bgd(X, t, lr=1e-3, max_epochs=5000, tol=1e-8):
    """Batch gradient descent: every update uses the full training set."""
    w = np.zeros(X.shape[1])
    losses = []
    prev_loss = None
    for epoch in range(max_epochs):
        y = X @ w
        loss = 0.5 * np.mean((y - t)**2)
        losses.append(loss)
        # Compare against None explicitly so that a previous loss of 0.0 still counts
        if prev_loss is not None and abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
        grad = compute_gradient(w, X, t)
        w -= lr * grad
    return w, losses

def train_sgd(X, t, lr=1e-4, max_epochs=10000, tol=1e-8):
    """Stochastic gradient descent: every update uses one randomly chosen sample."""
    w = np.zeros(X.shape[1])
    losses = []
    prev_loss = None
    m = len(t)
    for epoch in range(max_epochs):
        i = np.random.randint(m)
        xi = X[i:i+1]
        ti = t[i]
        yi = xi @ w
        grad = xi.T * (yi - ti)
        w -= lr * grad.flatten()
        # Track the full-data loss once every m updates
        if epoch % m == 0:
            y = X @ w
            loss = 0.5 * np.mean((y - t)**2)
            losses.append(loss)
            if prev_loss is not None and abs(prev_loss - loss) < tol:
                break
            prev_loss = loss
    return w, losses

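M_values = range(0, 10)

# Reconstructed loop (assumption): the source omits the code that fills `results`.
# The keys below are inferred from the columns used in the plots further down.
results = []
fits = {}
for ds_key, ds in datasets.items():
    x, t = ds["x"], ds["t"]
    for method, train in [("BGD", train_bgd), ("SGD", train_sgd)]:
        for M in M_values:
            w, _ = train(design_matrix(x, M), t)
            fits[(ds_key, method, M)] = w
            results.append({
                "dataset": ds_key, "method": method, "M": M,
                "train_rmse": rmse(t, predict(w, design_matrix(x, M))),
                "test_rmse": rmse(t_test_true, predict(w, design_matrix(x_test, M))),
            })

df_results = pd.DataFrame(results)
df_results.head()

plot_Ms = [0, 1, 3, 9]

# Fitted curves for the selected degrees (loop structure also reconstructed)
for ds_key, ds in datasets.items():
    x, t = ds["x"], ds["t"]
    for method in ["BGD", "SGD"]:
        for M in plot_Ms:
            y_test = predict(fits[(ds_key, method, M)], design_matrix(x_test, M))
            plt.figure(figsize=(6, 4))
            plt.plot(x_test, t_test_true, "k--", label="True sin(2πx)")
            plt.plot(x_test, y_test, label=f"Model M={M}")
            plt.scatter(x, t, c="red", marker="o", label="Train data")
            plt.title(f"{ds_key}, {method}, M={M}")
            plt.legend()
            plt.tight_layout()
            plt.show()

# RMSE vs M for every dataset and optimizer
for ds_key in datasets.keys():
    for method in ["BGD", "SGD"]:
        sub = df_results[(df_results.dataset==ds_key) &
                         (df_results.method==method)]
        plt.figure(figsize=(6,4))
        plt.plot(sub["M"], sub["train_rmse"], marker="o", label="Train RMSE")
        plt.plot(sub["M"], sub["test_rmse"], marker="o", label="Test RMSE")
        plt.xlabel("Polynomial degree (M)")
        plt.ylabel("RMSE")
        plt.title(f"RMSE vs M ({ds_key}, {method})")
        plt.legend()
        plt.show()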
2. Logistic Regression for Spam Email Classification

Question / Objective

To build a logistic regression model from scratch for binary classification using the dataset
[Link], plot the cost function vs iterations, show the confusion matrix, and calculate
performance metrics: Accuracy, Precision, Recall, and F1 Score.

Theory

• Logistic regression is used for binary classification (two classes: e.g., spam / not spam).
• The model predicts probability using the sigmoid function:

$p = \sigma(z) = \frac{1}{1 + e^{-z}}, \quad z = w^T x + b$

• The cost function used is Binary Cross-Entropy (Log Loss):

$J(w) = -\frac{1}{N} \sum_{n=1}^{N} \left[ y_n \log(p_n) + (1 - y_n) \log(1 - p_n) \right]$

• Optimization is done with Gradient Descent:

o Update weights w and bias b until cost stabilizes.
• After training, predictions are compared with actual labels to form a confusion matrix:
o TP (True Positive): Spam correctly detected
o TN (True Negative): Non-spam correctly detected
o FP (False Positive): Non-spam wrongly marked as spam
o FN (False Negative): Spam missed
• From confusion matrix we calculate:
o Accuracy = (TP + TN) / Total
o Precision = TP / (TP + FP)
o Recall = TP / (TP + FN)
o F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
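
As a worked example of the four formulas above, the sketch below evaluates them for one set of confusion-matrix counts. The counts and the function name `classification_metrics` are made up for illustration and are not results from this experiment.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no predicted or no actual positives)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) > 0 else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical counts: 40 spam caught, 50 non-spam passed, 5 non-spam flagged, 5 spam missed
acc, prec, rec, f1 = classification_metrics(tp=40, tn=50, fp=5, fn=5)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

For these counts, accuracy is 90/100 = 0.900, while precision and recall are both 40/45 ≈ 0.889, so the F1 score is also ≈ 0.889.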
Result

• The cost function decreases steadily with iterations, showing proper learning.
• Confusion matrix shows how well the model classifies spam vs non-spam.
• Performance metrics (Accuracy, Precision, Recall, F1 Score) are calculated on the test
set; values close to 1 indicate a working classifier.
• Logistic regression successfully separates spam from non-spam emails.

Code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# Upload the CSV file when running in Google Colab
from google.colab import files

uploaded = files.upload()

df = pd.read_csv("[Link]")
print("Dataset shape:", df.shape)
print(df.head())

# The last column is taken as the label
target_col = df.columns[-1]

# Map text labels to 0/1; values not in the mapping are left unchanged
if df[target_col].dtype == 'object':
    df[target_col] = df[target_col].map({'spam':1, 'ham':0, 'yes':1, 'no':0}).fillna(df[target_col])

X = df.drop(columns=[target_col])
y = df[target_col].values.reshape(-1,1)

# Keep numeric features only, then standardize each column
X = X.select_dtypes(include=[np.number]).values

X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_cost(y_true, y_pred, w):
    """Binary cross-entropy; 1e-9 guards against log(0). `w` is unused (no regularization)."""
    m = len(y_true)
    cost = -(1/m) * np.sum(y_true*np.log(y_pred+1e-9) + (1-y_true)*np.log(1-y_pred+1e-9))
    return cost

def train_logistic_regression(X, y, lr=0.01, epochs=1000):
    m, n = X.shape
    w = np.zeros((n,1))
    b = 0
    costs = []

    for i in range(epochs):
        z = np.dot(X, w) + b
        y_pred = sigmoid(z)

        # Gradients of the cross-entropy cost
        dw = (1/m) * np.dot(X.T, (y_pred - y))
        db = (1/m) * np.sum(y_pred - y)

        w -= lr * dw
        b -= lr * db

        # Record the cost every 10 iterations
        if i % 10 == 0:
            costs.append(compute_cost(y, y_pred, w))

    return w, b, costs

def predict(X, w, b, threshold=0.5):
    z = np.dot(X, w) + b
    y_pred = sigmoid(z)
    return (y_pred >= threshold).astype(int)

w, b, costs = train_logistic_regression(X_train, y_train, lr=0.1, epochs=2000)

# Costs were recorded every 10 iterations, hence the step of 10 on the x-axis
plt.plot(range(0,2000,10), costs)
plt.xlabel("Iterations")
plt.ylabel("Cost")
plt.title("Cost Function vs Iterations")
plt.show()

def confusion_matrix(y_true, y_pred):
    TP = np.sum((y_true==1) & (y_pred==1))
    TN = np.sum((y_true==0) & (y_pred==0))
    FP = np.sum((y_true==0) & (y_pred==1))
    FN = np.sum((y_true==1) & (y_pred==0))
    return TP, TN, FP, FN

def compute_metrics(y_true, y_pred):
    TP, TN, FP, FN = confusion_matrix(y_true, y_pred)
    accuracy = (TP + TN) / (TP + TN + FP + FN)
    # 1e-9 avoids division by zero when a class is never predicted
    precision = TP / (TP + FP + 1e-9)
    recall = TP / (TP + FN + 1e-9)
    f1 = 2 * precision * recall / (precision + recall + 1e-9)
    return accuracy, precision, recall, f1, (TP,TN,FP,FN)

y_pred_test = predict(X_test, w, b)
acc, prec, rec, f1, (TP,TN,FP,FN) = compute_metrics(y_test, y_pred_test)

print("Confusion Matrix")
print(f"TP={TP}, TN={TN}, FP={FP}, FN={FN}")
print(f"Accuracy: {acc:.4f}")
print(f"Precision: {prec:.4f}")
print(f"Recall: {rec:.4f}")
print(f"F1 Score: {f1:.4f}")
3. Binary Logistic Regression – Breast Cancer Classifier

Question / Objective

To build a Breast Cancer classifier using Logistic Regression from scratch on the Wisconsin
Breast Cancer Dataset, implementing different optimizers (Batch GD, Stochastic GD,
Newton’s Method/IRLS), adding L2 regularization, and comparing performance metrics.

Theory

• Dataset: Breast Cancer Wisconsin Diagnostic dataset has 569 samples and 30 features.
• Preprocessing:
o Check for missing values.
o Standardize features to mean = 0 and variance = 1.
o Optional: Use PCA (via SVD) to reduce to 2D for visualization.
• Model:
o Logistic Regression predicts probability using sigmoid:

$\hat{y} = \frac{1}{1 + e^{-(w^T x + b)}}$

o Loss function: Negative Log Likelihood (Binary Cross Entropy).

o Gradient and Hessian are derived analytically for optimization.
• Optimizers:
o Batch Gradient Descent (BGD): updates weights using all samples.
o Stochastic Gradient Descent (SGD): updates using random mini-batches.
o Newton’s Method / IRLS: uses Hessian for faster convergence.
o L2 Regularization prevents overfitting by penalizing large weights.
• Metrics: Calculated from scratch (no [Link]):
o Accuracy, Precision, Recall, F1 Score, ROC-AUC.
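
ROC-AUC is listed among the metrics, but the code for this experiment reports only accuracy, precision, recall and F1. One possible from-scratch version is sketched below. It uses the pairwise-ranking definition of AUC (the fraction of positive/negative pairs in which the positive sample receives the higher score), which is one of several equivalent formulations and not necessarily the one the author intended. The function name `roc_auc` and the four example scores are illustrative.

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true = np.asarray(y_true).ravel()
    scores = np.asarray(scores, dtype=float).ravel()
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("ROC-AUC needs at least one positive and one negative sample")
    diff = pos[:, None] - neg[None, :]  # all pairwise score differences
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

# Three of the four positive/negative pairs are ordered correctly
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```

The function accepts the column-vector labels and probabilities used in this experiment because both inputs are flattened first. The pairwise matrix needs memory proportional to (number of positives) × (number of negatives), which is negligible for 569 samples.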

Result

• BGD converges slowly but steadily.

• SGD converges faster but with noise.
• Newton’s Method converges very quickly (few iterations).
• With L2 Regularization, weights shrink and generalization improves (less overfitting).
• Performance metrics show high accuracy (>90%), good precision and recall, proving that
logistic regression is effective for breast cancer classification.

Code:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X = data.data
y = data.target.reshape(-1,1)

print("Dataset shape:", X.shape, y.shape)

# Standardize features to zero mean and unit variance
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)

# Random 80/20 train/test split
np.random.seed(42)
indices = np.random.permutation(X.shape[0])
train_size = int(0.8 * X.shape[0])
train_idx, test_idx = indices[:train_size], indices[train_size:]

X_train, y_train = X[train_idx], y[train_idx]

X_test, y_test = X[test_idx], y[test_idx]

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_loss(y, y_hat, w, l2=0.0):
    """Mean binary cross-entropy plus the L2 penalty (l2 / 2m) * ||w||^2."""
    m = len(y)
    loss = -(1/m)*np.sum(y*np.log(y_hat+1e-9) + (1-y)*np.log(1-y_hat+1e-9))
    reg = (l2/(2*m)) * np.sum(w**2)
    return loss + reg

def gradients(X, y, y_hat, w, l2=0.0):
    m = len(y)
    dw = (1/m) * np.dot(X.T, (y_hat - y)) + (l2/m)*w
    db = (1/m) * np.sum(y_hat - y)
    return dw, db

def hessian(X, y_hat, l2=0.0):
    m = X.shape[0]
    R = np.diag((y_hat*(1-y_hat)).flatten())
    # The penalty contributes (l2/m) * I, consistent with the (l2/m) * w term in gradients()
    H = (1/m) * np.dot(X.T, np.dot(R, X)) + (l2/m)*np.eye(X.shape[1])
    return H

def predict(X, w, b, threshold=0.5):
    probs = sigmoid(np.dot(X, w) + b)
    return (probs >= threshold).astype(int), probs

def train_bgd(X, y, lr=0.01, epochs=1000, l2=0.0):
    """Batch gradient descent; the cost is recorded every 10 epochs."""
    m, n = X.shape
    w = np.zeros((n,1))
    b = 0
    costs = []
    for i in range(epochs):
        y_hat = sigmoid(np.dot(X, w) + b)
        cost = compute_loss(y, y_hat, w, l2)
        dw, db = gradients(X, y, y_hat, w, l2)
        w -= lr * dw
        b -= lr * db
        if i % 10 == 0:
            costs.append(cost)
    return w, b, costs

def train_sgd(X, y, lr=0.01, epochs=100, batch_size=32, l2=0.0):
    """Mini-batch SGD; the full-data cost is recorded every 5 epochs."""
    m, n = X.shape
    w = np.zeros((n,1))
    b = 0
    costs = []
    for i in range(epochs):
        indices = np.random.permutation(m)
        for start in range(0, m, batch_size):
            idx = indices[start:start+batch_size]
            X_batch, y_batch = X[idx], y[idx]
            y_hat = sigmoid(np.dot(X_batch, w) + b)
            dw, db = gradients(X_batch, y_batch, y_hat, w, l2)
            w -= lr * dw
            b -= lr * db
        if i % 5 == 0:
            y_hat_full = sigmoid(np.dot(X, w) + b)
            costs.append(compute_loss(y, y_hat_full, w, l2))
    return w, b, costs

def train_newton(X, y, epochs=20, l2=0.0):
    """Newton's method on the weights; the cost is recorded every iteration."""
    m, n = X.shape
    w = np.zeros((n,1))
    b = 0
    costs = []
    for i in range(epochs):
        z = np.dot(X, w) + b
        y_hat = sigmoid(z)
        cost = compute_loss(y, y_hat, w, l2)
        grad_w, grad_b = gradients(X, y, y_hat, w, l2)
        H = hessian(X, y_hat, l2)
        # Solve H * step = grad_w instead of forming the inverse explicitly.
        # With l2 = 0 the Hessian can become ill-conditioned as predictions saturate.
        w -= np.linalg.solve(H, grad_w)
        # The bias is updated with a plain gradient step of size 1
        b -= grad_b
        costs.append(cost)
    return w, b, costs

w_bgd, b_bgd, costs_bgd = train_bgd(X_train, y_train, lr=0.1, epochs=500)

w_sgd, b_sgd, costs_sgd = train_sgd(X_train, y_train, lr=0.1, epochs=100, batch_size=32)
w_newton, b_newton, costs_newton = train_newton(X_train, y_train, epochs=15)

def compute_metrics(y_true, y_pred):
    TP = np.sum((y_true==1) & (y_pred==1))
    TN = np.sum((y_true==0) & (y_pred==0))
    FP = np.sum((y_true==0) & (y_pred==1))
    FN = np.sum((y_true==1) & (y_pred==0))

    accuracy = (TP+TN)/(TP+TN+FP+FN)
    precision = TP/(TP+FP+1e-9)
    recall = TP/(TP+FN+1e-9)
    f1 = 2*precision*recall/(precision+recall+1e-9)

    return accuracy, precision, recall, f1, (TP,TN,FP,FN)

# Test-set metrics are reported for the Newton-trained model
y_pred_test, probs_test = predict(X_test, w_newton, b_newton)

acc, prec, rec, f1, cm_vals = compute_metrics(y_test, y_pred_test)

print("Confusion Matrix (TP,TN,FP,FN):", cm_vals)

print(f"Accuracy: {acc:.4f}")
print(f"Precision: {prec:.4f}")
print(f"Recall: {rec:.4f}")
print(f"F1 Score: {f1:.4f}")

# Note: the three curves are recorded at different intervals (every 10 epochs,
# every 5 epochs, every iteration), so the x-axis counts recorded points
plt.plot(costs_bgd, label="BGD")
plt.plot(costs_sgd, label="SGD")
plt.plot(costs_newton, label="Newton")
plt.xlabel("Iterations")
plt.ylabel("Cost")
plt.title("Cost Function Convergence")
plt.legend()
plt.show()
4. Multiclass Logistic Regression – Softmax vs One-vs-Rest

Question / Objective

To implement and compare two approaches for multiclass classification:

1. Softmax Logistic Regression (single model for all classes)

2. One-vs-Rest Logistic Regression (multiple binary models, one for each class).

Dataset used: Iris dataset (150 samples, 3 classes, 4 features).

Theory

• Iris Dataset: Each sample has 4 features (sepal length, sepal width, petal length, petal
width) and belongs to one of 3 species (Setosa, Versicolor, Virginica).
• Softmax Logistic Regression:
o Extends binary logistic regression to multiple classes.
o Computes scores for each class:

$z_j = w_j^T x$

Then applies softmax:

$P(y = j \mid x) = \frac{e^{z_j}}{\sum_k e^{z_k}}$

o Loss: Cross-Entropy Loss.

• One-vs-Rest (OvR):
o Trains K binary logistic regressors (one for each class vs the rest).
o Predicts class with the highest probability among the K models.
• Optimization:
o Gradient Descent (slow but easy).
o Newton/IRLS (fast using Hessian).
o Regularization λ prevents overfitting.
• Evaluation:
o Confusion Matrix.
o Per-class Precision, Recall, F1 Score (must be implemented manually).
o PCA can reduce features to 2D for visualization of decision boundaries.
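
The difference between the two probability models can be seen on a couple of hand-picked score vectors. The sketch below is illustrative only: the logits are invented, and the second row is deliberately large to show why the row maximum is subtracted before exponentiating.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    z = logits - np.max(logits, axis=1, keepdims=True)
    expz = np.exp(z)
    return expz / np.sum(expz, axis=1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical scores z_j for two samples and K = 3 classes
logits = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 1001.0, 1002.0]])  # a naive exp() would overflow here

p_softmax = softmax(logits)   # each row sums to 1 by construction
p_ovr = sigmoid(logits[:1])   # K independent sigmoids: the row need not sum to 1

print(p_softmax.sum(axis=1), p_ovr.sum())
print(np.argmax(p_softmax, axis=1))
```

Softmax rows always sum to 1, whereas the three independent sigmoid outputs for the first sample add up to more than 1. This is why the One-vs-Rest probabilities have to be renormalized before they can be read as a distribution over classes.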

Result

• Softmax regression directly optimizes multiclass probabilities → smooth decision
boundaries.
• One-vs-Rest works but may give overlapping/conflicting probabilities.
• With PCA to 2D, decision regions clearly show class separation.
• Performance metrics (Accuracy, Precision, Recall, F1) are high (>90% on Iris dataset).
• Both methods perform well, but Softmax is more elegant and stable.

Code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from copy import deepcopy

sns.set(style="whitegrid", context="notebook")

def one_hot(y, n_classes=None):
    """Encode integer labels as one-hot rows."""
    if n_classes is None:
        n_classes = np.max(y) + 1
    Y = np.zeros((len(y), n_classes))
    Y[np.arange(len(y)), y] = 1
    return Y

def standardize(X):
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant features
    Xs = (X - mu) / sigma
    return Xs, mu, sigma

def pca_svd(X, n_components=2):
    """
    PCA via SVD. Returns projected data, components (eigenvectors), and explained variance.
    X expected to be zero-centered (or we'll center inside).
    """
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    projected = Xc.dot(components.T)
    explained_variance = (S ** 2) / (len(X) - 1)
    return projected, components, explained_variance[:n_components]

def softmax(logits):
    # Subtract the row maximum for numerical stability
    z = logits - np.max(logits, axis=1, keepdims=True)
    expz = np.exp(z)
    probs = expz / np.sum(expz, axis=1, keepdims=True)
    return probs

def cross_entropy_loss(probs, Y_onehot, reg=0.0, W=None):
    n = probs.shape[0]
    eps = 1e-12
    loss = -np.sum(Y_onehot * np.log(probs + eps)) / n
    if reg and W is not None:
        loss += 0.5 * reg * np.sum(W * W)
    return loss

class SoftmaxGD:
    """Softmax (multinomial) logistic regression trained with batch gradient descent."""
    def __init__(self, n_features, n_classes, lr=0.5, reg=1e-3, max_iter=1000, verbose=False):
        self.n_features = n_features
        self.n_classes = n_classes
        self.lr = lr
        self.reg = reg
        self.max_iter = max_iter
        self.verbose = verbose

        self.W = np.zeros((n_features, n_classes))
        self.b = np.zeros(n_classes)

    def fit(self, X, y, X_val=None, y_val=None):
        n, d = X.shape
        Y = one_hot(y, self.n_classes)
        for it in range(self.max_iter):
            logits = X.dot(self.W) + self.b
            probs = softmax(logits)
            error = probs - Y
            grad_W = (X.T.dot(error)) / n + self.reg * self.W
            grad_b = np.sum(error, axis=0) / n
            self.W -= self.lr * grad_W
            self.b -= self.lr * grad_b
            if self.verbose and (it % (self.max_iter // 10 + 1) == 0):
                loss = cross_entropy_loss(probs, Y, reg=self.reg, W=self.W)
                print(f"Iter {it}/{self.max_iter} loss: {loss:.4f}")

    def predict_proba(self, X):
        logits = X.dot(self.W) + self.b
        return softmax(logits)

    def predict(self, X):
        probs = self.predict_proba(X)
        return np.argmax(probs, axis=1)

class SoftmaxNewton:
    """Softmax logistic regression trained with damped Newton steps on the weights."""
    def __init__(self, n_features, n_classes, reg=1e-3, max_iter=50, damping=1e-3,
                 verbose=False):
        self.n_features = n_features
        self.n_classes = n_classes
        self.reg = reg
        self.max_iter = max_iter
        self.damping = damping
        self.verbose = verbose
        self.W = np.zeros((n_features, n_classes))
        self.b = np.zeros(n_classes)

    def fit(self, X, y):
        n, d = X.shape
        K = self.n_classes
        Y = one_hot(y, K)
        for it in range(self.max_iter):
            logits = X.dot(self.W) + self.b
            P = softmax(logits)
            G = X.T.dot(P - Y) / n + self.reg * self.W

            dK = d * K
            if dK <= 4000:
                # Full (dK x dK) Hessian: average over samples of kron(R_i, x_i x_i^T)
                H = np.zeros((dK, dK))
                for i in range(n):
                    p = P[i][:, None]
                    Ri = np.diag(p.flatten()) - p.dot(p.T)
                    Xi = X[i][:, None]
                    H += np.kron(Ri, Xi.dot(Xi.T))
                H = H / n
                H += self.reg * np.eye(dK)      # L2 regularization term
                H += self.damping * np.eye(dK)  # damping keeps H invertible
                # Column-major flattening matches the block layout of the Kronecker product
                g_vec = G.reshape(-1, order='F')
                try:
                    delta = np.linalg.solve(H, g_vec)
                except np.linalg.LinAlgError:
                    delta = np.linalg.lstsq(H, g_vec, rcond=None)[0]
                deltaW = delta.reshape((d, K), order='F')
                self.W -= deltaW
                # The bias is updated with a plain gradient step
                grad_b = np.sum(P - Y, axis=0) / n
                self.b -= grad_b
            else:
                if self.verbose:
                    print("Large d*K, falling back to gradient step for stability.")
                self.W -= 0.5 * G
                self.b -= 0.5 * np.sum(P - Y, axis=0) / n

            if self.verbose and (it % max(1, self.max_iter // 10) == 0):
                loss = cross_entropy_loss(P, Y, reg=self.reg, W=self.W)
                print(f"Newton iter {it}/{self.max_iter} loss: {loss:.6f}")

    def predict_proba(self, X):
        logits = X.dot(self.W) + self.b
        return softmax(logits)

    def predict(self, X):
        return np.argmax(self.predict_proba(X), axis=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class BinaryLogisticGD:
    """Binary logistic regression trained with batch gradient descent."""
    def __init__(self, n_features, lr=0.5, reg=1e-3, max_iter=500, verbose=False):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr
        self.reg = reg
        self.max_iter = max_iter
        self.verbose = verbose

    def fit(self, X, y_binary):
        n = X.shape[0]
        for it in range(self.max_iter):
            z = X.dot(self.w) + self.b
            p = sigmoid(z)
            error = p - y_binary
            grad_w = (X.T.dot(error)) / n + self.reg * self.w
            grad_b = np.sum(error) / n
            self.w -= self.lr * grad_w
            self.b -= self.lr * grad_b
            if self.verbose and (it % (self.max_iter // 10 + 1) == 0):
                loss = -np.mean(y_binary * np.log(p + 1e-12) + (1 - y_binary) * np.log(1 - p + 1e-12))
                print(f"Binary iter {it}/{self.max_iter} loss: {loss:.4f}")

    def predict_proba(self, X):
        return sigmoid(X.dot(self.w) + self.b)

    def predict(self, X):
        return (self.predict_proba(X) >= 0.5).astype(int)

class OneVsRest:
    """One binary classifier per class; the class with the highest probability wins."""
    def __init__(self, n_features, n_classes, lr=0.5, reg=1e-3, max_iter=500, verbose=False):
        self.n_classes = n_classes
        self.models = [BinaryLogisticGD(n_features, lr=lr, reg=reg, max_iter=max_iter,
                                        verbose=verbose) for _ in range(n_classes)]

    def fit(self, X, y):
        for k in range(self.n_classes):
            yk = (y == k).astype(int)  # class k against the rest
            self.models[k].fit(X, yk)

    def predict_proba(self, X):
        probs = np.vstack([m.predict_proba(X) for m in self.models]).T
        probs = np.clip(probs, 1e-12, 1 - 1e-12)
        # The K sigmoid outputs do not sum to 1, so normalize each row
        probs_norm = probs / np.sum(probs, axis=1, keepdims=True)
        return probs_norm

    def predict(self, X):
        return np.argmax(self.predict_proba(X), axis=1)

def confusion_matrix(y_true, y_pred, n_classes=None):
    """Rows are true classes, columns are predicted classes."""
    if n_classes is None:
        n_classes = np.max(y_true) + 1
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def precision_recall_f1(cm):
    """Per-class precision, recall and F1 from a confusion matrix."""
    K = cm.shape[0]
    prec = np.zeros(K)
    rec = np.zeros(K)
    f1 = np.zeros(K)
    for k in range(K):
        tp = cm[k, k]
        fp = cm[:, k].sum() - tp
        fn = cm[k, :].sum() - tp
        prec[k] = tp / (tp + fp) if (tp + fp) > 0 else 0.0
        rec[k] = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        if prec[k] + rec[k] > 0:
            f1[k] = 2 * prec[k] * rec[k] / (prec[k] + rec[k])
        else:
            f1[k] = 0.0
    return prec, rec, f1

def plot_decision_regions_2d(model, X_2d, y, title="Decision regions", resolution=200,
                             cmap='viridis'):
    x_min, x_max = X_2d[:, 0].min() - 1, X_2d[:, 0].max() + 1
    y_min, y_max = X_2d[:, 1].min() - 1, X_2d[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, resolution),
                         np.linspace(y_min, y_max, resolution))
    grid = np.c_[xx.ravel(), yy.ravel()]
    probs = model.predict_proba(grid)
    Z = np.argmax(probs, axis=1).reshape(xx.shape)
    plt.figure(figsize=(8, 6))
    plt.contourf(xx, yy, Z, alpha=0.3, cmap=cmap)
    scatter = plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, edgecolor='k', s=50, cmap=cmap)
    plt.title(title)
    plt.xlabel("PC1")
    plt.ylabel("PC2")
    plt.legend(*scatter.legend_elements(), title="Classes")
    plt.show()

def plot_probability_heatmaps(model, X_2d, title_prefix="Probabilities", resolution=200):
    x_min, x_max = X_2d[:, 0].min() - 1, X_2d[:, 0].max() + 1
    y_min, y_max = X_2d[:, 1].min() - 1, X_2d[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, resolution),
                         np.linspace(y_min, y_max, resolution))
    grid = np.c_[xx.ravel(), yy.ravel()]
    probs = model.predict_proba(grid)
    K = probs.shape[1]
    # One heatmap per class
    for k in range(K):
        Z = probs[:, k].reshape(xx.shape)
        plt.figure(figsize=(6, 5))
        plt.contourf(xx, yy, Z, alpha=0.9)
        plt.colorbar(label=f'P(class={k})')
        plt.title(f"{title_prefix} - class {k}")
        plt.xlabel("PC1")
        plt.ylabel("PC2")
        plt.show()

def plot_weight_bars(W, title="Weights per class"):
    d, K = W.shape
    df = pd.DataFrame(W, index=[f"f{i}" for i in range(d)],
                      columns=[f"class{k}" for k in range(K)])
    # DataFrame.plot opens its own figure, so the size is passed here
    df.plot(kind='bar', figsize=(10, 5))
    plt.title(title)
    plt.xlabel("Feature index")
    plt.ylabel("Weight value")
    plt.tight_layout()
    plt.show()

def plot_confusion_matrix(cm, labels=None, title="Confusion matrix"):
    plt.figure(figsize=(5, 4))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=labels, yticklabels=labels)
    plt.ylabel("True")
    plt.xlabel("Predicted")
    plt.title(title)
    plt.show()

def calibration_plot(model, X, y, n_bins=10):
    probs = model.predict_proba(X)
    pred_conf = np.max(probs, axis=1)
    pred_label = np.argmax(probs, axis=1)
    correct = (pred_label == y).astype(int)
    bins = np.linspace(0, 1, n_bins + 1)
    # Clip so that a confidence of exactly 1.0 falls into the last bin
    bin_idx = np.clip(np.digitize(pred_conf, bins) - 1, 0, n_bins - 1)
    bin_acc = []
    bin_conf = []
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.sum() == 0:
            bin_acc.append(np.nan)
            bin_conf.append((bins[b] + bins[b+1]) / 2)
        else:
            bin_acc.append(correct[mask].mean())
            bin_conf.append(pred_conf[mask].mean())
    plt.figure(figsize=(6, 6))
    plt.plot(bin_conf, bin_acc, marker='o', label='Calibration')
    plt.plot([0, 1], [0, 1], '--', color='gray', label='Perfect')
    plt.xlabel("Mean predicted probability")
    plt.ylabel("Empirical accuracy")
    plt.title("Calibration plot")
    plt.legend()
    plt.show()

def main():
    iris = datasets.load_iris()
    X = iris['data']
    y = iris['target']
    class_names = iris['target_names']

    Xs, mu, sigma = standardize(X)

    X_pca2, components, var_explained = pca_svd(Xs, n_components=2)

    print("Explained variance (PC1, PC2):", var_explained)

    n_features = Xs.shape[1]
    n_classes = int(y.max()) + 1

    # Softmax regression with gradient descent on the standardized features
    soft_gd = SoftmaxGD(n_features=n_features, n_classes=n_classes, lr=0.5, reg=1e-3,
                        max_iter=2000, verbose=False)
    soft_gd.fit(Xs, y)
    y_pred_soft_gd = soft_gd.predict(Xs)
    cm_soft = confusion_matrix(y, y_pred_soft_gd, n_classes=n_classes)
    prec, rec, f1 = precision_recall_f1(cm_soft)
    print("\nSoftmax (GD) metrics:")
    print("Confusion matrix:\n", cm_soft)
    for k in range(n_classes):
        print(f"Class {k} ({class_names[k]}): precision={prec[k]:.3f}, recall={rec[k]:.3f}, f1={f1[k]:.3f}")

    # One-vs-Rest on the standardized features
    ovr = OneVsRest(n_features=n_features, n_classes=n_classes, lr=0.5, reg=1e-3,
                    max_iter=1500, verbose=False)
    ovr.fit(Xs, y)
    y_pred_ovr = ovr.predict(Xs)
    cm_ovr = confusion_matrix(y, y_pred_ovr, n_classes=n_classes)
    prec_o, rec_o, f1_o = precision_recall_f1(cm_ovr)
    print("\nOne-vs-Rest metrics:")
    print("Confusion matrix:\n", cm_ovr)
    for k in range(n_classes):
        print(f"Class {k} ({class_names[k]}): precision={prec_o[k]:.3f}, recall={rec_o[k]:.3f}, f1={f1_o[k]:.3f}")

    # Newton's method on the two PCA components
    X_reduced = X_pca2
    newton = SoftmaxNewton(n_features=X_reduced.shape[1], n_classes=n_classes, reg=1e-3,
                           max_iter=30, damping=1e-3, verbose=True)
    newton.fit(X_reduced, y)
    y_pred_newton = newton.predict(X_reduced)
    cm_newton = confusion_matrix(y, y_pred_newton, n_classes=n_classes)
    prec_n, rec_n, f1_n = precision_recall_f1(cm_newton)
    print("\nSoftmax (Newton on PCA2) metrics:")
    print("Confusion matrix:\n", cm_newton)
    for k in range(n_classes):
        print(f"Class {k} ({class_names[k]}): precision={prec_n[k]:.3f}, recall={rec_n[k]:.3f}, f1={f1_n[k]:.3f}")

    class ProjectedModel:
        """Wraps a model trained on all features so it can be evaluated on 2D PCA points."""
        def __init__(self, base_model, pca_components, mean_center, sigma):
            self.base_model = base_model
            self.components = pca_components
            self.mean = mean_center
            self.sigma = sigma

        def predict_proba(self, X2d):
            # Map 2D PCA coordinates back to the standardized feature space
            X_approx = X2d.dot(self.components) + 0.0
            return self.base_model.predict_proba(X_approx)

    proj_soft = ProjectedModel(soft_gd, components, mu, sigma)
    proj_ovr = ProjectedModel(ovr, components, mu, sigma)

    plot_decision_regions_2d(proj_soft, X_pca2, y, title="Decision regions (Softmax GD) on PCA(2)")
    plot_probability_heatmaps(proj_soft, X_pca2, title_prefix="Softmax GD class probability")
    plot_weight_bars(soft_gd.W, title="Softmax GD weights (original features)")

    plot_decision_regions_2d(proj_ovr, X_pca2, y, title="Decision regions (One-vs-Rest) on PCA(2)")
    plot_probability_heatmaps(proj_ovr, X_pca2, title_prefix="OvR class probability")
    W_ovr = np.column_stack([m.w for m in ovr.models])
    plot_weight_bars(W_ovr, title="One-vs-Rest weights (original features)")

    plot_decision_regions_2d(newton, X_reduced, y, title="Decision regions (Multinomial Newton on PCA2)")
    plot_probability_heatmaps(newton, X_reduced, title_prefix="Newton class probability")
    plot_weight_bars(newton.W, title="Newton weights (PCA features)")

    plot_confusion_matrix(cm_soft, labels=class_names, title="Confusion Matrix - Softmax GD")
    plot_confusion_matrix(cm_ovr, labels=class_names, title="Confusion Matrix - One-vs-Rest")
    plot_confusion_matrix(cm_newton, labels=class_names, title="Confusion Matrix - Newton (PCA)")

    calibration_plot(proj_soft, X_pca2, y)
    calibration_plot(proj_ovr, X_pca2, y)
    calibration_plot(newton, X_pca2, y)

    print("\nDone.")

if __name__ == "__main__":
    main()
5. Binary Logistic Regression using Newton’s Method

Question / Objective

To implement binary logistic regression from scratch using Newton’s Method (IRLS) for
parameter optimization.
We use the Iris dataset and select:

• Two classes: Versicolor (1) vs Virginica (2)

• Two features: Petal length, Petal width

We then:

1. Train logistic regression using Newton’s method

2. Plot loss vs iterations
3. Visualize decision boundary
4. Compare parameters and accuracy with sklearn.linear_model.LogisticRegression
(Newton-CG solver)

Theory

Iris Dataset

• 150 samples, 3 classes

• We select only two classes → reduces it to binary classification
• We select two features → easy 2D visualization

Logistic Regression (Binary)

• Predicts probability of class 1:

$p = \sigma(w^T x)$

where

$\sigma(z) = \frac{1}{1 + e^{-z}}$

• Loss: Negative Log Likelihood

$L = -\sum \left[ y \ln(p) + (1 - y) \ln(1 - p) \right]$

Newton’s Method (IRLS)

Newton’s method uses second-order derivatives:

$w_{new} = w - H^{-1} g$

Where:

• g = gradient
• H = Hessian matrix

Properties:

• Converges in very few iterations

• Needs no learning-rate tuning, unlike gradient descent, provided the Hessian is invertible
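
The update rule can be traced on a very small problem. The sketch below is a minimal illustration and not the Iris experiment itself: the four data points are invented and deliberately not linearly separable, so a finite solution exists, and `np.linalg.solve` is used in place of an explicit matrix inverse.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical, non-separable toy data: a bias column plus one feature
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])

w = np.zeros(X.shape[1])
grad_norms = []
for _ in range(10):
    p = sigmoid(X @ w)
    g = X.T @ (p - y)                        # gradient of the negative log-likelihood
    H = X.T @ (X * (p * (1 - p))[:, None])   # Hessian X^T R X, without building diag(R)
    w = w - np.linalg.solve(H, g)            # Newton step: w_new = w - H^{-1} g
    grad_norms.append(np.linalg.norm(g))

print("weights:", w)
print("gradient norms:", grad_norms)
```

The printed gradient norms shrink rapidly from one iteration to the next, which is the fast convergence referred to above. On perfectly separable data the weights would grow without bound and the Hessian would approach singularity, so some regularization or damping would be needed in that case.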

Evaluation

• Accuracy
• Coefficients comparison
• Decision boundary

Result

• Newton’s method converges very fast (2–5 iterations)

• Smooth decreasing NLL loss curve
• Decision boundary clearly separates the two Iris classes

Code
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

# Petal length and petal width; keep Versicolor and Virginica, relabelled as 0 and 1
iris = datasets.load_iris()
X = iris.data[:, 2:4]
y = iris.target
mask = y != 0
X = X[mask]
y = y[mask] - 1
# Prepend a column of ones so that w[0] acts as the intercept
Xb = np.c_[np.ones((X.shape[0], 1)), X]

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def nll(w, X, y):
    z = X @ w
    return -np.sum(y*np.log(sigmoid(z)) + (1-y)*np.log(1-sigmoid(z)))

def newton_logreg(X, y, max_iter=10):
    w = np.zeros(X.shape[1])
    losses = []
    for _ in range(max_iter):
        z = X @ w
        p = sigmoid(z)
        g = X.T @ (p - y)          # gradient
        W = np.diag(p*(1-p))
        H = X.T @ W @ X            # Hessian
        w -= np.linalg.solve(H, g) # Newton step, equivalent to inv(H) @ g
        losses.append(nll(w, X, y))
    return w, losses

w, losses = newton_logreg(Xb, y)

plt.plot(losses)
plt.xlabel("Iteration")
plt.ylabel("Negative Log-Likelihood")
plt.show()

# Decision boundary: w0 + w1*x1 + w2*x2 = 0
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm')
x1 = np.linspace(min(X[:,0]), max(X[:,0]), 100)
x2 = -(w[0] + w[1]*x1) / w[2]
plt.plot(x1, x2)
plt.xlabel("Petal Length")
plt.ylabel("Petal Width")
plt.show()

# A large C makes sklearn's regularization negligible, for a fair comparison
clf = LogisticRegression(C=1e6, solver='newton-cg').fit(X, y)

print("Our Coeff:", w)
print("Sklearn Coeff:", np.r_[clf.intercept_, clf.coef_[0]])
print("Our Accuracy:", np.mean((sigmoid(Xb@w) >= 0.5) == y))
print("Sklearn Accuracy:", clf.score(X, y))
6. Personalized Book Recommender using Multinomial Naïve Bayes

Objective

The goal of this experiment is to build a personalized book recommender system using a
Multinomial Naïve Bayes classifier. The model predicts whether a user will like a book based on
two factors: the user’s past rating behavior and the genre tags associated with the book. The
Goodreads Books Dataset is used.

Theory

The dataset contains four tables: user ratings, book metadata, book–tag mappings, and tag
names. To build a predictive model, all four must be cleaned and merged. Each book may have
multiple genre tags; therefore, tag aggregation is required to create a single genre string per
book.

Multinomial Naïve Bayes is suitable for text-based features such as genres. It models count-based
data and assumes that features are conditionally independent given the class. Laplace smoothing
(α = 1.0) is used to avoid a zero probability when a word or feature does not appear in the
training data of a class.

The target label liked is defined from ratings: any rating > 3 is considered like (1), otherwise
dislike (0). The final feature matrix combines user identity (one-hot encoded) and book genre
text (converted into bag-of-words representation). The final classifier predicts whether a user
will like a book and outputs the probability.
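
The effect of Laplace smoothing can be checked by hand on a three-word vocabulary. The sketch below is a toy calculation with invented counts and an illustrative helper name (`smoothed_likelihood`). It computes P(word | class) = (count + α) / (total count + α · vocabulary size).

```python
import numpy as np

def smoothed_likelihood(counts, alpha=1.0):
    """P(feature | class) with additive (Laplace) smoothing over a fixed vocabulary."""
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha * counts.size)

# Hypothetical genre-token counts among liked books; "horror" never occurs
vocab = ["fantasy", "romance", "horror"]
liked_counts = [6, 2, 0]

unsmoothed = smoothed_likelihood(liked_counts, alpha=0.0)
smoothed = smoothed_likelihood(liked_counts, alpha=1.0)
print(dict(zip(vocab, unsmoothed.tolist())))
print(dict(zip(vocab, smoothed.tolist())))
```

Without smoothing, "horror" gets probability 0, so any book carrying that tag would receive a zero likelihood for the liked class regardless of its other tags. With α = 1.0 the estimates become 7/11, 3/11 and 1/11, which still sum to 1.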

Data Preparation and Preprocessing

Book tags and tag names are merged to map each book to its list of tag names. The tags are
aggregated into a single string per book. User ratings are merged with book metadata and the
aggregated genre table to form a unified dataset with user_id, book_id, rating, and genres.
Missing entries are removed. The binary target liked is created from rating values.

Feature Engineering

The user_id column is converted into numerical features through one-hot encoding. The genres
text is converted into numerical features using CountVectorizer, resulting in a sparse bag-of-
words representation. Both components are horizontally concatenated to form the final feature
matrix X. The target vector y is the liked column.

Model Training

The dataset is split into 80% training and 20% testing using stratified sampling to maintain the
distribution of liked values. The Multinomial Naïve Bayes classifier is trained with α = 1.0
(Laplace smoothing). This prevents a zero likelihood when a feature seen at prediction time
never co-occurred with a class during training.
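
The idea behind stratified sampling can be sketched with the standard library alone. This is an illustration of the principle, not scikit-learn's implementation; the function name stratified_split and the 70/30 label mix are assumptions made for the example.

```python
import random

def stratified_split(labels, test_size=0.2, seed=42):
    """Split indices so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the test share from every class separately.
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical labels: 70 liked (1) and 30 disliked (0).
labels = [1] * 70 + [0] * 30
train_idx, test_idx = stratified_split(labels)
liked_share_test = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), liked_share_test)
```

The test part keeps the same 70% share of liked examples as the full data, which is the property the stratify argument of train_test_split is used for in the code below.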

Model Evaluation

The model is evaluated using accuracy, precision, recall, and confusion matrix. These metrics
give insight into how well the recommender predicts user preference. Precision and recall help
analyze performance on the positive (liked) class.
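
As a small worked example of these metrics, the sketch below computes them from raw counts for 0/1 labels. The label lists are invented for illustration, and binary_metrics is a hypothetical helper; the lab code itself uses the scikit-learn metric functions.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and confusion matrix for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        # Rows are true classes (0, 1); columns are predicted classes (0, 1).
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }

# Hypothetical test labels and predictions.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
report = binary_metrics(y_true, y_pred)
print(report)
```

Precision answers "of the books predicted as liked, how many were liked", while recall answers "of the liked books, how many were found".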

Book Recommendation Function

A recommendation function predict_recommendation is created. It accepts a user ID and a new
book’s genre string and transforms these inputs using the fitted user encoder and genre
vectorizer. It returns a predicted label and probability that the user will like the given book. This
function can be used to build an interactive recommender.

Code

import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

ratings = pd.read_csv("ratings.csv")
books = pd.read_csv("books.csv")
book_tags = pd.read_csv("book_tags.csv")
tags = pd.read_csv("tags.csv")

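# Map each book to its tag names, then collapse them into one string per book.
tag_map = book_tags.merge(tags, on="tag_id")
tag_map = (tag_map.groupby("goodreads_book_id")["tag_name"]
           .apply(lambda x: " ".join(x)).reset_index())

# Attach the aggregated tags (column "tag_name") to every user rating.
merged = ratings.merge(books[["book_id", "goodreads_book_id"]], on="book_id")
merged = merged.merge(tag_map, on="goodreads_book_id")
merged["liked"] = (merged["rating"] > 3).astype(int)
merged = merged.dropna()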

user_encoder = OneHotEncoder(handle_unknown="ignore")
user_features = user_encoder.fit_transform(merged[["user_id"]])

vectorizer = CountVectorizer()
genre_features = vectorizer.fit_transform(merged["tag_name"])

from scipy.sparse import hstack

# Concatenate the sparse user and genre blocks column-wise.
X = hstack([user_features, genre_features], format="csr")
y = merged["liked"].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y,
                                                    random_state=42)
model = MultinomialNB(alpha=1.0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred)
rec = recall_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print("Accuracy:", acc)
print("Precision:", prec)
print("Recall:", rec)
print("Confusion Matrix:\n", cm)

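def predict_recommendation(user_id, book_genres, model, user_encoder, genre_vectorizer):
    # Encode the inputs with the already-fitted transformers.
    # An unseen user_id becomes an all-zero row because handle_unknown="ignore".
    u = user_encoder.transform([[user_id]])
    g = genre_vectorizer.transform([book_genres])
    x = hstack([u, g], format="csr")
    pred = model.predict(x)[0]
    prob = model.predict_proba(x)[0][1]  # probability of the "liked" class
    return pred, prob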
7. Kernelized SVM Classifier

Goal
Implement an SVM classifier using the dual formulation and kernel trick on the Breast Cancer
Wisconsin (Diagnostic) dataset. Map labels to binary: Malignant (0) → -1, Benign (1) → +1.

Theory (brief)
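Support Vector Machine (dual) optimizes the multipliers $\alpha_i$ in:

$$\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$

subject to $0 \le \alpha_i \le C$ and $\sum_i \alpha_i y_i = 0$. The kernel function $K$ lets us work in
implicit feature spaces. The linear kernel is $K(x, x') = x^\top x'$. The RBF kernel is
$K(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$.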
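
To make the dual objective concrete, here is a minimal sketch that evaluates it on a two-point toy problem using only the standard library. The points, labels and alphas are assumptions chosen so the arithmetic is easy to follow, and the scalar helpers linear_k, rbf_k and dual_objective are hypothetical names (the lab code further down uses vectorized NumPy versions).

```python
import math

def linear_k(x, z):
    """Linear kernel: inner product of two vectors."""
    return sum(a * b for a, b in zip(x, z))

def rbf_k(x, z, gamma=0.5):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def dual_objective(alpha, y, X, kernel):
    """Sum of alphas minus half the kernel-weighted quadratic term."""
    n = len(X)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * kernel(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad

# Toy problem: one point per class, mirrored around the origin.
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [1, -1]
alpha = [0.5, 0.5]  # satisfies sum(alpha_i * y_i) = 0

value = dual_objective(alpha, y, X, linear_k)
print(value, rbf_k(X[0], X[0]), rbf_k(X[0], X[1]))
```

With the linear kernel the quadratic term equals 1, so the objective is 1 - 0.5 = 0.5. The RBF kernel of a point with itself is always 1 and decays with distance.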

Dataset & preprocessing

Load sklearn.datasets.load_breast_cancer. Standardize features manually (subtract mean, divide
by std) using NumPy. Implement shuffle and split manually to create 80% train / 20% test.

Kernel functions
Linear kernel and RBF kernel (with tunable gamma).

Model implementation
A simplified SMO-style optimizer updates pairs of alphas. Each update clips the second
multiplier to bounds [L, H] derived from the box constraint [0, C], and then sets the first
multiplier so that the equality constraint $\sum_i \alpha_i y_i = 0$ stays satisfied. After training,
weights can be computed for the linear kernel as $w = \sum_i \alpha_i y_i x_i$. Bias b is
computed from support vectors.
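
A single pair update can be sketched in isolation as follows. The numeric inputs are hypothetical values chosen to trigger clipping, and clip_bounds and update_pair are illustrative helper names; the formulas follow the simplified SMO update used in the training function below.

```python
def clip_bounds(alpha_i, alpha_j, y_i, y_j, C):
    """Feasible interval [L, H] for the new alpha_j."""
    if y_i == y_j:
        return max(0.0, alpha_i + alpha_j - C), min(C, alpha_i + alpha_j)
    return max(0.0, alpha_j - alpha_i), min(C, C + alpha_j - alpha_i)

def update_pair(alpha_i, alpha_j, y_i, y_j, E_i, E_j, eta, C):
    """One SMO step: move alpha_j, clip it, then adjust alpha_i to compensate."""
    low, high = clip_bounds(alpha_i, alpha_j, y_i, y_j, C)
    new_j = alpha_j + y_j * (E_i - E_j) / eta
    new_j = min(high, max(low, new_j))
    # Moving alpha_i by the opposite signed amount keeps sum(alpha * y) unchanged.
    new_i = alpha_i + y_i * y_j * (alpha_j - new_j)
    return new_i, new_j

# Hypothetical state of one pair with opposite labels.
new_i, new_j = update_pair(alpha_i=0.25, alpha_j=0.75, y_i=1, y_j=-1,
                           E_i=0.5, E_j=-0.5, eta=2.0, C=1.0)
print(new_i, new_j)
```

The unclipped step would give alpha_j = 0.25, but the lower bound L = 0.5 applies, so alpha_j becomes 0.5 and alpha_i drops to 0. The quantity y_i*alpha_i + y_j*alpha_j is -0.5 both before and after the step.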

Prediction
Decision function $f(x) = \operatorname{sign}\left(\sum_i \alpha_i y_i K(x_i, x) + b\right)$.
No calibrated probabilities are computed; we report raw decision values and threshold at zero.
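
The prediction rule can be traced by hand on a toy linear-kernel example. The two training points and their alphas below are assumptions for illustration (they form the hard-margin solution for this mirrored pair), and the helper names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Toy training set: one support vector per class.
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [1, -1]
alpha = [0.5, 0.5]

# Linear-kernel weights: w = sum_i alpha_i * y_i * x_i
w = [sum(alpha[i] * y[i] * X[i][d] for i in range(len(X))) for d in range(2)]

# Bias from a support vector on the margin: b = y_k - w . x_k
b = y[0] - dot(w, X[0])

def decision(x):
    """Raw decision value using the kernel expansion (linear kernel here)."""
    return sum(alpha[i] * y[i] * dot(X[i], x) for i in range(len(X))) + b

def predict(x):
    # Threshold at zero; ties go to +1, as in the lab's svm_predict.
    return 1 if decision(x) >= 0 else -1

print(w, b, decision((2.0, 3.0)), predict((-0.5, 1.0)))
```

For the linear kernel the kernel expansion and the explicit form w . x + b give the same value, which is why w is only reported for that kernel.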

Visualization
Select two features (mean radius, mean texture — feature indices 0 and 1). Train separate models
on only those two features and plot decision boundaries for Linear vs RBF kernels.

Evaluation (from scratch)

Compute accuracy, precision, recall, and F1-score manually. Also count support vectors (alphas
> tol). Compare results across kernels and across different regularization strengths C (example C
= [0.1, 1, 10]).

Expected output
A printed table of performance metrics for each kernel and each C, the number of support
vectors, and decision boundary plots for the two-feature case for linear and RBF kernels.
Below is the code that implements everything. Copy-paste and run in a Python environment with
the allowed packages installed.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer

def standardize_numpy(X):
    # Z-score each column; constant columns keep sigma = 1 to avoid division by zero.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=0)
    sigma[sigma==0] = 1.0
    Xs = (X - mu) / sigma
    return Xs, mu, sigma

def shuffle_split(X, y, test_size=0.2, random_state=None):
    if random_state is not None:
        np.random.seed(random_state)
    idx = np.arange(X.shape[0])
    np.random.shuffle(idx)
    Xs = X[idx]
    ys = y[idx]
    split = int(X.shape[0] * (1 - test_size))
    X_train = Xs[:split]
    X_test = Xs[split:]
    y_train = ys[:split]
    y_test = ys[split:]
    return X_train, X_test, y_train, y_test

def linear_kernel(X, Y):
    return X @ Y.T

def rbf_kernel(X, Y, gamma):
    # Squared distances via ||x||^2 + ||y||^2 - 2 x.y
    XX = np.sum(X * X, axis=1).reshape(-1,1)
    YY = np.sum(Y * Y, axis=1).reshape(1,-1)
    distances = XX + YY - 2 * X @ Y.T
    return np.exp(-gamma * distances)

def svm_train_dual_smo(X, y, C=1.0, kernel_func=None, kernel_params=None, max_iter=1000,
                       tol=1e-5):
    n = X.shape[0]
    alpha = np.zeros(n)
    b = 0.0
    # Precompute the Gram matrix once; the linear kernel is the default.
    if kernel_func is None:
        K = linear_kernel(X, X)
    else:
        if kernel_params is None:
            K = kernel_func(X, X)
        else:
            K = kernel_func(X, X, **kernel_params)
    it = 0
    passes = 0
    max_passes = 5
    # Stop after max_passes consecutive sweeps without an update, or after max_iter sweeps.
    while passes < max_passes and it < max_iter:
        num_changed = 0
        for i in range(n):
            Ai = alpha[i]
            Ki = K[i]
            Ei = (np.sum(alpha * y * Ki) + b) - y[i]
            # Only optimize multipliers that violate the KKT conditions.
            if (y[i]*Ei < -tol and Ai < C) or (y[i]*Ei > tol and Ai > 0):
                # Pick the second multiplier at random (simplified SMO heuristic).
                j = np.random.choice([x for x in range(n) if x != i])
                Aj = alpha[j]
                Kj = K[j]
                Ej = (np.sum(alpha * y * Kj) + b) - y[j]
                # Bounds that keep both multipliers inside [0, C].
                if y[i] == y[j]:
                    L = max(0, Ai + Aj - C)
                    H = min(C, Ai + Aj)
                else:
                    L = max(0, Aj - Ai)
                    H = min(C, C + Aj - Ai)
                if abs(L - H) < 1e-12:
                    continue
                eta = K[i,i] + K[j,j] - 2*K[i,j]
                if eta <= 0:
                    continue
                new_Aj = Aj + y[j]*(Ei - Ej)/eta
                if new_Aj > H:
                    new_Aj = H
                elif new_Aj < L:
                    new_Aj = L
                if abs(new_Aj - Aj) < 1e-12:
                    continue
                # Compensating change in alpha_i preserves sum(alpha * y) = 0.
                new_Ai = Ai + y[i]*y[j]*(Aj - new_Aj)
                alpha[i] = new_Ai
                alpha[j] = new_Aj
                b1 = b - Ei - y[i]*(alpha[i]-Ai)*K[i,i] - y[j]*(alpha[j]-Aj)*K[i,j]
                b2 = b - Ej - y[i]*(alpha[i]-Ai)*K[i,j] - y[j]*(alpha[j]-Aj)*K[j,j]
                if 0 < alpha[i] < C:
                    b = b1
                elif 0 < alpha[j] < C:
                    b = b2
                else:
                    b = 0.5*(b1 + b2)
                num_changed += 1
        it += 1
        if num_changed == 0:
            passes += 1
        else:
            passes = 0
    # Support vectors are the samples with a non-negligible multiplier.
    sv_idx = np.where(alpha > tol)[0]
    return alpha, b, sv_idx

def svm_predict(alpha, b, X_train, y_train, X_test, kernel_func=None, kernel_params=None):
    if kernel_func is None:
        Ktest = linear_kernel(X_test, X_train)
    else:
        if kernel_params is None:
            Ktest = kernel_func(X_test, X_train)
        else:
            Ktest = kernel_func(X_test, X_train, **kernel_params)
    decision = (Ktest @ (alpha * y_train)) + b
    preds = np.sign(decision)
    preds[preds==0] = 1  # points exactly on the boundary are assigned to +1
    return preds, decision

def compute_weights_linear(alpha, y, X):
    return (alpha * y) @ X

def metrics_from_scratch(y_true, y_pred):
    # The positive class is +1 (benign); the negative class is -1 (malignant).
    tp = np.sum((y_true==1) & (y_pred==1))
    tn = np.sum((y_true==-1) & (y_pred==-1))
    fp = np.sum((y_true==-1) & (y_pred==1))
    fn = np.sum((y_true==1) & (y_pred==-1))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
    return accuracy, precision, recall, f1, np.array([[tn, fp],[fn, tp]])

data = load_breast_cancer()
X_raw = data.data
y_raw = data.target
y_mapped = np.where(y_raw==0, -1, 1)
Xs, mu, sigma = standardize_numpy(X_raw)
X_train, X_test, y_train, y_test = shuffle_split(Xs, y_mapped, test_size=0.2, random_state=42)

Cs = [0.1, 1.0, 10.0]
gamma = 0.5
results = []
for C in Cs:
    # Linear kernel
    alpha_lin, b_lin, sv_lin = svm_train_dual_smo(X_train, y_train, C=C, kernel_func=None,
                                                  kernel_params=None, max_iter=1000)
    w_lin = compute_weights_linear(alpha_lin, y_train, X_train)
    preds_lin, dec_lin = svm_predict(alpha_lin, b_lin, X_train, y_train, X_test,
                                     kernel_func=None, kernel_params=None)
    acc_l, prec_l, rec_l, f1_l, cm_l = metrics_from_scratch(y_test, preds_lin)
    sv_count_lin = sv_lin.shape[0]
    # RBF kernel
    alpha_rbf, b_rbf, sv_rbf = svm_train_dual_smo(X_train, y_train, C=C,
        kernel_func=rbf_kernel, kernel_params={"gamma":gamma}, max_iter=1000)
    preds_rbf, dec_rbf = svm_predict(alpha_rbf, b_rbf, X_train, y_train, X_test,
                                     kernel_func=rbf_kernel, kernel_params={"gamma":gamma})
    acc_r, prec_r, rec_r, f1_r, cm_r = metrics_from_scratch(y_test, preds_rbf)
    sv_count_rbf = sv_rbf.shape[0]
    results.append({
        "C": C,
        "Linear": {"accuracy": acc_l, "precision": prec_l, "recall": rec_l, "f1": f1_l,
                   "support_vectors": sv_count_lin, "confusion_matrix": cm_l},
        "RBF": {"accuracy": acc_r, "precision": prec_r, "recall": rec_r, "f1": f1_r,
                "support_vectors": sv_count_rbf, "confusion_matrix": cm_r}
    })

for res in results:
    print("C =", res["C"])
    print(" Linear -> acc: {:.4f}, prec: {:.4f}, rec: {:.4f}, f1: {:.4f}, sv: {}".format(
        res["Linear"]["accuracy"], res["Linear"]["precision"], res["Linear"]["recall"],
        res["Linear"]["f1"], res["Linear"]["support_vectors"]))
    print(" Confusion Matrix:\n", res["Linear"]["confusion_matrix"])
    print(" RBF -> acc: {:.4f}, prec: {:.4f}, rec: {:.4f}, f1: {:.4f}, sv: {}".format(
        res["RBF"]["accuracy"], res["RBF"]["precision"], res["RBF"]["recall"], res["RBF"]["f1"],
        res["RBF"]["support_vectors"]))
    print(" Confusion Matrix:\n", res["RBF"]["confusion_matrix"])
    print("-"*60)

# Two-feature models (mean radius, mean texture) for the decision-boundary plots.
feat_idx = [0,1]
X_2f = Xs[:, feat_idx]
X2_train, X2_test, y2_train, y2_test = shuffle_split(X_2f, y_mapped, test_size=0.2,
                                                     random_state=42)
alpha_lin2, b_lin2, sv_lin2 = svm_train_dual_smo(X2_train, y2_train, C=1.0,
    kernel_func=None, kernel_params=None, max_iter=1000)
alpha_rbf2, b_rbf2, sv_rbf2 = svm_train_dual_smo(X2_train, y2_train, C=1.0,
    kernel_func=rbf_kernel, kernel_params={"gamma":0.5}, max_iter=1000)

# Evaluate both models on a dense grid covering the training points.
x_min, x_max = X2_train[:,0].min()-1, X2_train[:,0].max()+1
y_min, y_max = X2_train[:,1].min()-1, X2_train[:,1].max()+1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300), np.linspace(y_min, y_max, 300))
grid = np.c_[xx.ravel(), yy.ravel()]

pred_grid_lin, decg_lin = svm_predict(alpha_lin2, b_lin2, X2_train, y2_train, grid,
                                      kernel_func=None, kernel_params=None)
Z_lin = pred_grid_lin.reshape(xx.shape)
pred_grid_rbf, decg_rbf = svm_predict(alpha_rbf2, b_rbf2, X2_train, y2_train, grid,
                                      kernel_func=rbf_kernel, kernel_params={"gamma":0.5})
Z_rbf = pred_grid_rbf.reshape(xx.shape)

sns.set(style="whitegrid")
plt.figure(figsize=(12,5))
plt.subplot(1,2,1)
plt.contourf(xx, yy, Z_lin, cmap="coolwarm", alpha=0.3)
plt.scatter(X2_train[:,0], X2_train[:,1], c=y2_train, cmap="coolwarm", edgecolors="k")
# Circle the support vectors.
plt.scatter(X2_train[sv_lin2,0], X2_train[sv_lin2,1], s=100, facecolors='none', edgecolors='k')
plt.title("Linear Kernel Decision Boundary (features {} & {})".format(feat_idx[0], feat_idx[1]))
plt.xlabel("Feature {}".format(feat_idx[0]))
plt.ylabel("Feature {}".format(feat_idx[1]))
plt.subplot(1,2,2)
plt.contourf(xx, yy, Z_rbf, cmap="coolwarm", alpha=0.3)
plt.scatter(X2_train[:,0], X2_train[:,1], c=y2_train, cmap="coolwarm", edgecolors="k")
plt.scatter(X2_train[sv_rbf2,0], X2_train[sv_rbf2,1], s=100, facecolors='none', edgecolors='k')
plt.title("RBF Kernel Decision Boundary (features {} & {})".format(feat_idx[0], feat_idx[1]))
plt.xlabel("Feature {}".format(feat_idx[0]))
plt.ylabel("Feature {}".format(feat_idx[1]))
plt.tight_layout()
plt.show()
8. PCA with Variance Retention

Goal

Apply PCA to reduce dimensions of the Breast Cancer Wisconsin dataset and determine the
minimum number of components required to retain 90%, 95%, and 99% variance. Visualize
results, reconstruct data, and compute reconstruction error.

Dataset

Used: sklearn.datasets.load_breast_cancer
Samples: 569
Features: 30 numeric
Target: 0 = Malignant, 1 = Benign

Method

1. Standardize all features using Z-score normalization (mean 0, std 1).
2. Fit PCA on the standardized data.
3. Plot cumulative explained variance and identify how many components reach 90%, 95%, and 99%.
4. Reduce the dataset to 2 components and visualize samples in 2-D.
5. For each threshold:
• Project the data onto the selected components
• Reconstruct using inverse transform
• Compute reconstruction error (MSE)
6. Create summary table.
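
The component-selection rule in step 3 can be sketched independently of scikit-learn. The variance ratios below are hypothetical values chosen so the cumulative sums are exact, not the ratios of the breast cancer data, and components_for is an illustrative helper name.

```python
from itertools import accumulate

def components_for(ratios, threshold):
    """Smallest k whose cumulative explained-variance ratio reaches the threshold."""
    for k, total in enumerate(accumulate(ratios), start=1):
        if total >= threshold:
            return k
    # Rounding can leave the full sum slightly below the threshold.
    return len(ratios)

# Hypothetical explained-variance ratios, sorted in decreasing order.
ratios = [0.5, 0.25, 0.125, 0.0625, 0.0625]
for t in (0.90, 0.95, 0.99):
    print(t, components_for(ratios, t))
```

The cumulative sums are 0.5, 0.75, 0.875, 0.9375 and 1.0, so 90% needs four components while 95% and 99% both need all five.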

Results
• Cumulative variance plot
• PCA 2-D scatter
• Printed table with:

• Threshold
• Components needed
• Variance achieved
• Reconstruction MSE

Code

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

data = load_breast_cancer()
X = data.data
y = data.target

scaler = StandardScaler()
Xs = scaler.fit_transform(X)

# Fit PCA with all components to obtain the full variance profile.
pca_full = PCA()
pca_full.fit(Xs)
cum_var = np.cumsum(pca_full.explained_variance_ratio_)

thresholds = [0.90, 0.95, 0.99]
results = []

for t in thresholds:
    # First index where the cumulative variance reaches the threshold (1-based count).
    k = int(np.argmax(cum_var >= t)) + 1
    pca_k = PCA(n_components=k)
    Xk = pca_k.fit_transform(Xs)
    Xr = pca_k.inverse_transform(Xk)
    mse = mean_squared_error(Xs, Xr)
    results.append([t, k, cum_var[k-1], mse])

df = pd.DataFrame(results, columns=["Threshold","Components","Variance Achieved",
                                    "Reconstruction MSE"])
print(df)

plt.figure(figsize=(10,5))
# Plot against 1..n so the x-axis shows the actual number of components.
plt.plot(np.arange(1, len(cum_var) + 1), cum_var, marker='o')
plt.axhline(0.90, color='red', linestyle='--')
plt.axhline(0.95, color='green', linestyle='--')
plt.axhline(0.99, color='purple', linestyle='--')
plt.xlabel("Number of Components")
plt.ylabel("Cumulative Explained Variance")
plt.title("Variance Retention Curve")
plt.grid(True)
plt.show()

pca2 = PCA(n_components=2)
X2 = pca2.fit_transform(Xs)

plt.figure(figsize=(8,6))
sns.scatterplot(x=X2[:,0], y=X2[:,1], hue=y, palette="coolwarm")
plt.title("PCA (2 Components) Projection")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()
9. Unsupervised Learning

Objective

This lab explores two key unsupervised learning techniques.

Part 1 uses PCA for dimensionality reduction to visualize high-dimensional chemical wine data
in 2D space.
Part 2 applies K-Means clustering to segment mall customers based on income and spending
behavior.

Part 1: PCA for Wine Data Visualization

Goal

Convert 13-dimensional wine chemical features into a 2-dimensional representation to see if
natural groupings exist in the data.

Dataset

Wine Dataset (13 features)

Targets represent 3 wine cultivars, but are not used for computing PCA—only for coloring the
final plot.

Method

1. Load the dataset and separate X (features) and y (cultivar label).

2. Standardize X because PCA is variance-based and sensitive to scale.
3. Fit PCA with n_components = 2.
4. Retrieve the variance explained by PC1 and PC2.
5. Create a scatter plot of PC1 vs PC2, coloring points using the true labels to visually
inspect separation.
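
Why standardization matters in step 2 can be seen in a small sketch using only the standard library. The two columns are hypothetical measurements on very different scales (named after wine features for readability, not taken from the dataset), and standardize is an illustrative helper.

```python
from statistics import mean, pstdev, pvariance

def standardize(column):
    """Z-score a column using the population standard deviation (ddof = 0)."""
    mu = mean(column)
    sigma = pstdev(column) or 1.0  # guard against constant columns
    return [(v - mu) / sigma for v in column]

# Hypothetical raw features on very different scales.
alcohol = [12.0, 13.0, 14.0]
proline = [500.0, 1000.0, 1500.0]

print("raw variances:", pvariance(alcohol), pvariance(proline))

z_alcohol = standardize(alcohol)
z_proline = standardize(proline)
print("scaled variances:", pvariance(z_alcohol), pvariance(z_proline))
```

Before scaling, the second column's variance is several hundred thousand times larger and would dominate the first principal component. After scaling both columns have unit variance, so PCA weighs them equally.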

Results

The two-component PCA projection produces a 2D map where wines form visibly distinct
clusters corresponding closely to their cultivars.
Explained variance values (produced by the script) show the proportion of information preserved
by PC1, PC2, and their combined total.

Part 2: K-Means Customer Segmentation

Goal

Segment shopping mall customers into distinct behavior-based groups using their annual income
and spending score.

Dataset

Mall Customers Dataset

Features: Annual Income (k$), Spending Score (1–100)

Method

1. Use the Elbow Method for K = 1 to 10 to compute inertia values.

2. Identify the “elbow” point as the optimal number of clusters.
3. Apply K-Means using the selected K.
4. Visualize clusters on a 2D scatter plot with colors representing cluster labels.
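
The inertia value used by the Elbow Method is the sum of squared distances from each point to its nearest cluster center. A minimal sketch follows, with hand-picked toy points and centers (assumptions for illustration rather than mall customer data or fitted K-Means output); the function name inertia is hypothetical.

```python
def inertia(points, centers):
    """Sum of squared distances from every point to its closest center."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
    return total

# Toy data: two tight pairs of points far apart from each other.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

k1 = inertia(points, [(5.0, 1.0)])               # one center at the overall mean
k2 = inertia(points, [(0.0, 1.0), (10.0, 1.0)])  # one center per pair
k4 = inertia(points, points)                     # every point is its own center
print(k1, k2, k4)
```

Inertia falls sharply from 104 to 4 when going from one to two centers and only slightly afterwards, so the "elbow" of this toy curve is at K = 2. The same reading is applied to the mall data plot.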

Results

The elbow point (value shown by the plot) typically occurs at K = 5 for this dataset, indicating
five distinct customer groups.
Cluster visualization displays well-separated groups representing different spending behaviors.

Code

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Part 1: PCA on the Wine dataset
wine = load_wine()
X = wine.data
y = wine.target

scaler = StandardScaler()
Xs = scaler.fit_transform(X)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(Xs)

var1 = pca.explained_variance_ratio_[0]
var2 = pca.explained_variance_ratio_[1]
total_var = var1 + var2
print("PC1 Variance:", var1)
print("PC2 Variance:", var2)
print("Total Variance:", total_var)

plt.figure(figsize=(8,6))
sns.scatterplot(x=X_pca[:,0], y=X_pca[:,1], hue=y, palette="viridis")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("PCA of Wine Dataset")
plt.show()

# Part 2: K-Means on the Mall Customers dataset
mall = pd.read_csv("Mall_Customers.csv")
Xc = mall[["Annual Income (k$)", "Spending Score (1-100)"]].values

# Elbow Method: record inertia for K = 1..10.
inertias = []
Ks = range(1,11)
for k in Ks:
    km = KMeans(n_clusters=k, random_state=42)
    km.fit(Xc)
    inertias.append(km.inertia_)

plt.figure(figsize=(8,6))
plt.plot(Ks, inertias, marker='o')
plt.xlabel("Number of Clusters (K)")
plt.ylabel("Inertia")
plt.title("Elbow Method")
plt.show()

optimal_k = int(input("Enter optimal K from elbow plot: "))

kmeans = KMeans(n_clusters=optimal_k, random_state=42)
labels = kmeans.fit_predict(Xc)

plt.figure(figsize=(8,6))
sns.scatterplot(x=Xc[:,0], y=Xc[:,1], hue=labels, palette="rainbow")
plt.xlabel("Annual Income (k$)")
plt.ylabel("Spending Score")
plt.title("Customer Segments (K-Means)")
plt.show()

