Machine Learning Lab Assignment Guide

Uploaded by

Kanik Sharma

Machine Learning Lab Assignment

Author: Lab Author
Date: 2024-11-25

Instructions

This lab assignment consists of multiple tasks aimed at applying and understanding
concepts from the provided slides. Complete the tasks using Python and libraries such
as NumPy, pandas, matplotlib, and scikit-learn. Include plots, metrics, and
explanations for your findings. Submit your code and a report summarizing your
results.

Task 1: Debugging Regularized Linear Regression

Objective: Explore the effects of regularization in linear regression and debug common
issues.

Steps:

1. Dataset Preparation:

◦ Create a synthetic dataset with features X and target y using a polynomial function (e.g., y = 3x^2 + 2x + 1 + ε, where ε is Gaussian noise).

◦ Split the dataset into 70% training and 30% test sets.

2. Regularized Linear Regression:

◦ Implement or use Ridge Regression (from scikit-learn) to train the model.

◦ Experiment with different values of the regularization parameter λ (e.g., λ = 0, 0.1, 1, 10, 100).

◦ Plot the training error and test error as a function of λ.

3. Analysis:

◦ Identify underfitting and overfitting regions from the plot.

◦ Discuss how λ affects the weights w_j and the model’s ability to generalize.
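The Task 1 steps can be sketched roughly as follows. This is a minimal illustration, not a reference solution: the sample count, noise level, random seeds, and the degree-6 polynomial feature expansion are all assumptions chosen so that the effect of λ is visible. scikit-learn calls the regularization parameter `alpha`.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)  # fixed seed (assumption) for repeatability

# Synthetic data: y = 3x^2 + 2x + 1 + Gaussian noise
x = rng.uniform(-3, 3, size=200)
y = 3 * x**2 + 2 * x + 1 + rng.normal(0, 2.0, size=x.shape)
X = x.reshape(-1, 1)

# 70% training / 30% test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

lambdas = [0, 0.1, 1, 10, 100]
train_errors, test_errors = [], []
for lam in lambdas:
    # Degree 6 is an assumed, deliberately over-flexible feature set.
    # alpha=0 is accepted by Ridge, although scikit-learn recommends
    # LinearRegression for the unregularized case.
    model = make_pipeline(
        PolynomialFeatures(degree=6, include_bias=False),
        StandardScaler(),
        Ridge(alpha=lam),
    )
    model.fit(X_train, y_train)
    train_errors.append(mean_squared_error(y_train, model.predict(X_train)))
    test_errors.append(mean_squared_error(y_test, model.predict(X_test)))

for lam, tr, te in zip(lambdas, train_errors, test_errors):
    print(f"lambda={lam:>5}: train MSE={tr:.3f}, test MSE={te:.3f}")
```

The two error lists can then be plotted against `lambdas` with matplotlib. If a log-scaled axis is used, λ = 0 has to be handled separately because it has no logarithm.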

Task 2: Training, Validation, and Test Set Splits

Objective: Understand the importance of data splits in model evaluation.

Steps:

1. Dataset Splitting:

◦ Use the dataset provided in the slides:

Size (sq ft): [2104, 1600, 2400, 1416, 3000, 1985, 1534, 1427, 1380, 1494]
Price (k$): [400, 330, 369, 232, 540, 300, 315, 199, 212, 243]

◦ Split the dataset into a 60% training set, a 20% validation set, and a 20% test set.

◦ Display the resulting splits.

2. Linear Regression Model:

◦ Train a linear regression model on the training set.

◦ Compute the Mean Squared Error (MSE) on the training, validation, and test sets.

◦ Compare the errors across the three subsets.

3. Analysis:

◦ Discuss why the validation set is critical for model selection.

◦ Explain the significance of keeping the test set separate from training and validation.
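One way to obtain the 60/20/20 split in Task 2 is to call `train_test_split` twice: first hold out 40% of the data, then divide that held-out part in half. The sketch below assumes this two-stage approach and an arbitrary `random_state`; with only ten examples, the resulting errors depend heavily on which points land in which subset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Dataset from the assignment text
size = np.array([2104, 1600, 2400, 1416, 3000, 1985, 1534, 1427, 1380, 1494],
                dtype=float)
price = np.array([400, 330, 369, 232, 540, 300, 315, 199, 212, 243],
                 dtype=float)
X = size.reshape(-1, 1)

# Stage 1: 60% training, 40% held out
X_train, X_rest, y_train, y_rest = train_test_split(
    X, price, test_size=0.4, random_state=1
)
# Stage 2: split the held-out 40% evenly into validation and test (20% each)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=1
)

print("train:", X_train.ravel(), y_train)
print("val:  ", X_val.ravel(), y_val)
print("test: ", X_test.ravel(), y_test)

# Fit on the training set only, then evaluate on all three subsets
model = LinearRegression().fit(X_train, y_train)
mse = {
    "train": mean_squared_error(y_train, model.predict(X_train)),
    "val": mean_squared_error(y_val, model.predict(X_val)),
    "test": mean_squared_error(y_test, model.predict(X_test)),
}
for name, value in mse.items():
    print(f"{name} MSE: {value:.1f}")
```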

Task 3: Model Selection with Polynomial Regression

Objective: Select the best polynomial model for a given dataset using validation error.

Steps:

1. Dataset Preparation:

◦ Use the same dataset from Task 2 or generate a synthetic dataset with non-linear patterns.

2. Polynomial Regression:

◦ Train polynomial regression models of degree d = 1, 2, ..., 10.

◦ Compute the training error and validation error for each degree d.

3. Visualization:

◦ Plot the training error and validation error as a function of d.

◦ Identify the degree that minimizes the validation error.

4. Analysis:

◦ Discuss the concepts of underfitting and overfitting based on the plot.

◦ Justify the importance of validation error in choosing the polynomial degree.
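The degree sweep in Task 3 might look like the sketch below. It takes the synthetic-data option; the cubic ground truth, noise level, sample count, and 75/25 split are assumptions made for illustration. Features are standardized after expansion because high powers of x otherwise differ by orders of magnitude.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=120)
# Assumed non-linear ground truth: a cubic plus Gaussian noise
y = 0.5 * x**3 - x + rng.normal(0, 1.0, size=x.shape)
X = x.reshape(-1, 1)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

degrees = list(range(1, 11))
train_mse, val_mse = [], []
for d in degrees:
    model = make_pipeline(
        PolynomialFeatures(degree=d, include_bias=False),
        StandardScaler(),
        LinearRegression(),
    )
    model.fit(X_train, y_train)
    train_mse.append(mean_squared_error(y_train, model.predict(X_train)))
    val_mse.append(mean_squared_error(y_val, model.predict(X_val)))

# Model selection uses the validation error, never the training error
best_degree = degrees[int(np.argmin(val_mse))]
print("degree with lowest validation MSE:", best_degree)
```

Training error can only stay flat or fall as d grows, because each higher-degree model contains the lower-degree ones. That is why it cannot be used to pick d.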

Task 4: Effect of Regularization on Bias-Variance Tradeoff

Objective: Examine the impact of regularization on bias and variance.

Steps:

1. Synthetic Dataset:

◦ Generate a dataset with 100 examples and a true relationship y = 2x + 3 + ε, where ε is Gaussian noise.

2. Ridge Regression:

◦ Train Ridge Regression models with λ = 0, 0.01, 0.1, 1, 10, 100.

◦ Compute the training error, validation error, and weights w_j for each λ.

3. Analysis:

◦ Plot the errors as a function of λ.

◦ Plot the magnitude of weights (|w_j|) as a function of λ.

◦ Discuss how increasing λ affects the bias-variance tradeoff.
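A possible starting point for Task 4 is shown below. The input range, noise standard deviation, seed, and the 80/20 training/validation split are assumptions, since the assignment does not fix them. The single input feature is left unscaled, so the weight shrinkage comes from λ alone.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, size=100)
y = 2 * x + 3 + rng.normal(0, 1.0, size=100)  # true relationship y = 2x + 3 + noise
X = x.reshape(-1, 1)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

lambdas = [0, 0.01, 0.1, 1, 10, 100]
results = []
for lam in lambdas:
    model = Ridge(alpha=lam).fit(X_train, y_train)
    results.append({
        "lambda": lam,
        "weight": float(model.coef_[0]),  # w_1; the intercept is not penalized
        "train_mse": mean_squared_error(y_train, model.predict(X_train)),
        "val_mse": mean_squared_error(y_val, model.predict(X_val)),
    })

for r in results:
    print(f"lambda={r['lambda']:>6}: w={r['weight']:.4f}, "
          f"train MSE={r['train_mse']:.4f}, val MSE={r['val_mse']:.4f}")
```

With one feature the shrinkage is easy to see: the learned weight moves from roughly 2 toward 0 as λ grows, and the training error rises with it.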

Task 5: Neural Network Model Selection

Objective: Choose the best neural network architecture based on cross-validation error.

Steps:

1. Dataset Preparation:

◦ Create a synthetic dataset with multiple input features and a non-linear target relationship.

2. Neural Network Architectures:

◦ Define three neural network architectures:

▪ Architecture 1: 25 input units → 15 hidden units → 1 output unit

▪ Architecture 2: 20 input units → 12, 12 hidden units → 1 output unit

▪ Architecture 3: 32 input units → 16, 8, 4 hidden units → 1 output unit

3. Training and Validation:

◦ Train each architecture on the training set and evaluate on the validation set.

◦ Compute the training and validation errors for each architecture.

4. Analysis:

◦ Select the architecture with the lowest validation error.

◦ Discuss why it is important to choose the architecture based on validation error and not training error.
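The comparison in Task 5 could be prototyped with scikit-learn's `MLPRegressor`, as in the sketch below. Two assumptions are worth flagging. First, the three architectures list different "input unit" counts, but a single dataset has one fixed input dimension, so the sketch interprets the first number of each architecture as the width of the first dense layer. Second, the five-feature synthetic target, the split, and the training settings are invented for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.normal(size=(600, 5))
# Assumed non-linear target over several input features
y = (np.sin(X[:, 0]) + X[:, 1] * X[:, 2] + 0.5 * X[:, 3] ** 2
     + rng.normal(0, 0.1, size=600))

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)
# Fit the scaler on the training data only, then apply it to both sets
scaler = StandardScaler().fit(X_train)
X_train_s, X_val_s = scaler.transform(X_train), scaler.transform(X_val)

# Layer widths before the single output unit (interpretation is an assumption)
architectures = {
    "Architecture 1": (25, 15),
    "Architecture 2": (20, 12, 12),
    "Architecture 3": (32, 16, 8, 4),
}

errors = {}
for name, layers in architectures.items():
    net = MLPRegressor(hidden_layer_sizes=layers, max_iter=2000, random_state=0)
    net.fit(X_train_s, y_train)
    errors[name] = {
        "train": mean_squared_error(y_train, net.predict(X_train_s)),
        "val": mean_squared_error(y_val, net.predict(X_val_s)),
    }

best_name = min(errors, key=lambda n: errors[n]["val"])
for name, e in errors.items():
    print(f"{name}: train MSE={e['train']:.4f}, val MSE={e['val']:.4f}")
print("selected:", best_name)
```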

Task 6: Real-World Debugging

Objective: Debug and improve a poorly performing machine learning model.

Steps:

1. Scenario:

◦ You are given a model with the following errors:

▪ Training Error: 10

▪ Validation Error: 40

▪ Test Error: 42

◦ The large gap between training and validation error indicates overfitting.

2. Debugging:

◦ Suggest three corrective actions to address overfitting (e.g., regularization, adding more data, reducing model complexity).

3. Implementation:

◦ Implement one of these actions (e.g., increase λ) and re-train the model.

◦ Compute the new training, validation, and test errors.

4. Analysis:

◦ Compare the errors before and after applying the corrective action.

◦ Discuss the effectiveness of your approach.
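Since the assignment gives only the error values and not the model itself, the sketch below builds a stand-in overfitting setup (an assumption): a degree-10 polynomial fitted to 25 noisy samples of a sine curve with almost no regularization. It then applies one corrective action, a larger λ, and reports the errors before and after. The numbers it prints will not match the 10 / 40 / 42 scenario, and a larger λ is not guaranteed to help on every dataset.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(11)

def make_data(n):
    """Draw n noisy samples from an assumed ground truth y = sin(2x)."""
    x = rng.uniform(-2, 2, size=n)
    y = np.sin(2 * x) + rng.normal(0, 0.3, size=n)
    return x.reshape(-1, 1), y

# A small training set makes the flexible model prone to overfitting
X_train, y_train = make_data(25)
X_val, y_val = make_data(200)
X_test, y_test = make_data(200)

def evaluate(lam):
    """Train a degree-10 ridge model with strength lam; return MSE per subset."""
    model = make_pipeline(
        PolynomialFeatures(degree=10, include_bias=False),
        StandardScaler(),
        Ridge(alpha=lam),
    )
    model.fit(X_train, y_train)
    subsets = {
        "train": (X_train, y_train),
        "val": (X_val, y_val),
        "test": (X_test, y_test),
    }
    return {name: mean_squared_error(ys, model.predict(Xs))
            for name, (Xs, ys) in subsets.items()}

before = evaluate(1e-3)  # nearly unregularized baseline
after = evaluate(1.0)    # corrective action: increase lambda

for name in ("train", "val", "test"):
    print(f"{name}: before={before[name]:.4f}, after={after[name]:.4f}")
```

The comparison of interest is whether the gap between training and validation error narrows after the change, with the test error consulted only once at the end.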

Submission Instructions

1. Submit your Python code in a Jupyter Notebook or Python script.

2. Include a PDF report summarizing:

◦ Key results (tables, plots, metrics, etc.).

◦ Explanations and analyses for each task.

3. Ensure all plots are labeled and interpretations are included.

4. Submit your work by the deadline.

Common questions

How does regularization influence the bias-variance tradeoff in machine learning models?

Regularization influences the bias-variance tradeoff by adding a penalty term to the loss function that shrinks the magnitude of the coefficients. A mild amount of regularization helps prevent overfitting by keeping the model from becoming overly complex, which reduces variance at the expense of some added bias. As regularization strength increases (higher λ), the model becomes simpler, reducing variance further while increasing bias, and potentially leading to underfitting if over-applied. The key is to find a regularization level that minimizes total error by balancing bias against variance.
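For reference, the penalty term can be written out explicitly. The form below is one common convention for L2-regularized (ridge) linear regression and is an assumption here, since the assignment does not state its cost function; m is the number of training examples, n the number of features, and b an unpenalized intercept.

```latex
J(\mathbf{w}, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( \mathbf{w}^\top \mathbf{x}^{(i)} + b - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2
```

scikit-learn's `Ridge` minimizes the unnormalized form, the squared error plus `alpha` times the squared weights, so a given λ in this convention does not correspond to the same numeric `alpha`.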

Explain the significance of tuning the regularization parameter λ specifically in the context of the bias-variance tradeoff.

Tuning the regularization parameter λ matters because it directly controls the regularization strength, and with it the model's flexibility and ability to generalize. A very small λ leads to a model with low bias but high variance that is apt to overfit the training data. Conversely, a large λ simplifies the model, increasing bias while reducing variance, which can result in underfitting. Selecting λ carefully balances these effects to minimize total error and improves the model's performance on unseen data. Proper tuning of λ is thus essential for matching the model's capacity to the complexity of the data.

Why is using validation error crucial in choosing the polynomial degree for regression models?

Using validation error to choose the polynomial degree is crucial because it estimates how well each degree generalizes to data the model was not trained on. The training error typically keeps decreasing as the polynomial degree increases, but this often produces a model that fits the training data too closely, capturing noise and overfitting. Validation error identifies a degree at which the model performs well on both training and unseen data, balancing underfitting against overfitting. The optimal polynomial degree is typically the one that minimizes the validation error.

How do you identify underfitting and overfitting based on error metrics in regression models?

Underfitting and overfitting in regression models can be identified by comparing the training and validation errors. Underfitting is indicated by high errors on both the training and validation sets: the model is too simple to capture the underlying data patterns. Overfitting is characterized by a low training error combined with a high validation error, indicating that the model captures noise from the training data rather than a generalizable pattern. Plots of training and validation (or test) error across different model complexities or regularization strengths make these regimes visible, with the point of minimal validation error lying between them.
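The rule of thumb above can be turned into a small helper. This is only an illustrative sketch: the function name `diagnose`, the `target_error` argument (the error level considered acceptable), and the `gap_factor` threshold are all hypothetical choices, not values taken from the assignment.

```python
def diagnose(train_error, val_error, target_error, gap_factor=2.0):
    """Return rule-of-thumb findings; thresholds are illustrative assumptions."""
    findings = []
    # High bias: the model cannot even fit the training data well enough
    if train_error > target_error:
        findings.append("high bias (underfitting): training error exceeds the target")
    # High variance: validation error is much larger than training error
    if val_error > gap_factor * train_error:
        findings.append("high variance (overfitting): validation error far above training error")
    if not findings:
        findings.append("no obvious bias or variance problem")
    return findings

# Errors from the Task 6 scenario, with an assumed acceptable error of 15
print(diagnose(train_error=10, val_error=40, target_error=15))
# → ['high variance (overfitting): validation error far above training error']
```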

What role does cross-validation play in model selection and how does it contribute to reliable model performance evaluation?

Cross-validation supports model selection by reducing the risk of tuning a model to a single, possibly unrepresentative, train/validation split. It divides the data into k subsets (folds), trains the model on k-1 folds, and validates it on the remaining fold. This process rotates until each fold has been used for validation once, and the results are averaged to give a more robust performance estimate. Cross-validation therefore helps ensure that the chosen hyperparameters and architecture generalize across different subsets of the data, and it exposes how much performance varies from one split to another.
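The fold rotation described above can be written out by hand in a few lines of NumPy. The sketch below is a minimal illustration using ordinary least squares as the model; the helper name `k_fold_mse`, the synthetic data, and the seeds are assumptions. In practice scikit-learn's `KFold` and `cross_val_score` provide the same mechanism.

```python
import numpy as np

def k_fold_mse(X, y, k=5, seed=0):
    """Average validation MSE of ordinary least squares over k folds."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(y))   # shuffle once before splitting
    folds = np.array_split(indices, k)  # k roughly equal index groups
    fold_errors = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Prepend a column of ones so the intercept is learned as well
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        A_val = np.column_stack([np.ones(len(val_idx)), X[val_idx]])
        w, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        fold_errors.append(float(np.mean((A_val @ w - y[val_idx]) ** 2)))
    return float(np.mean(fold_errors)), fold_errors

# Assumed synthetic data: y = 2x + 3 + unit-variance Gaussian noise
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(50, 1))
y = 2 * X[:, 0] + 3 + rng.normal(0, 1.0, size=50)

mean_mse, per_fold = k_fold_mse(X, y, k=5)
print("per-fold MSE:", [round(e, 3) for e in per_fold])
print("mean MSE:", round(mean_mse, 3))
```

The spread of the per-fold values gives a sense of how much a single split could mislead model selection.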

What are the implications of varying the regularization parameter λ in Ridge Regression on the model's ability to generalize?

Varying the regularization parameter λ in Ridge Regression changes the model's ability to generalize by shifting the tradeoff between bias and variance. A very small λ, or λ = 0, may lead to overfitting: the model has low bias but high variance, fitting the training data too closely and capturing noise. Conversely, a very large λ can lead to underfitting, where the model has high bias and low variance and misses relevant patterns. Proper tuning of λ controls the magnitude of the coefficients w_j and steers the model between these two extremes, improving its ability to generalize to unseen data.

Discuss the importance of separating the test set from training and validation sets during machine learning experiments.

Separating the test set from the training and validation sets is important because it ensures an unbiased evaluation of the model's performance. The test set stands in for new, unseen data and so reflects real-world performance. During model development, the training set is used to learn model parameters, while the validation set is used for tuning hyperparameters and selecting models; both are used to improve the model. The test set should remain untouched until the final evaluation so that it provides an independent measure of the model's ability to generalize, and so that the reported performance is not overestimated due to information leaking from the training and validation phases.

Explain the impact of different neural network architectures on model performance and why validation error is preferred over training error for selecting the best architecture.

Neural network architectures with different numbers of layers and units differ in their capacity to learn from data. Larger architectures can capture more complex patterns but are also more prone to overfitting. Smaller architectures may be too simple to capture the data's underlying complexity. Validation error is preferred over training error for selecting the best architecture because the training error only indicates how well a model fits the data it has seen, whereas validation error reflects the model's ability to generalize to new, unseen data. The architecture that minimizes the validation error is therefore typically chosen, as it is more likely to perform well on future data.

In the context of debugging a machine learning model, what corrective actions can address overfitting, and how effective might they be?

To address overfitting in a machine learning model, several corrective actions can be taken: 1) Increase regularization, for example by raising the parameter λ in models like Ridge Regression to penalize large coefficients and simplify the model. This can reduce variance and mitigate overfitting if λ is not pushed so far that the model underfits. 2) Add more training data, which provides a more representative sample of the input space and helps the model learn generalizable patterns instead of noise. 3) Reduce model complexity, for example by lowering the polynomial degree or the number of neural network layers or units, so that the model's capacity matches the complexity of the data. This can narrow the gap between training and validation errors. The effectiveness of these actions depends on the model, the data characteristics, and the extent of overfitting initially observed.

Why is it critical to maintain a separate validation set in addition to training and test sets during model evaluation?

Maintaining a separate validation set is critical because it supports model selection without contaminating the test set, which is reserved for evaluating the final model's performance on unseen data. The validation set is used to tune hyperparameters and compare candidate models, so the test set results remain an unbiased assessment of how the model will perform in real-world scenarios. Using the validation set to guide decisions such as model complexity and hyperparameter values prevents those decisions from being fitted to the test data and yields a more realistic estimate of the model's performance.

