Machine Learning Assignment 3 Guide

This document outlines the instructions for Assignment 3 of a machine learning course. It includes 3 questions - the first two are theoretical and require justifying strategies for handling overfitting and underfitting in CNN models, the third requires implementing a CNN from scratch on MNIST data and extracting features for an SVM on CIFAR-10 data.

Uploaded by

sonu23144

CSE343/CSE543/ECE363/ECE563: Machine Learning

Winter 2024

Assignment-3 (24 points)

Release: Mar 19, 2024 (Tuesday) Submission: 11:00 am Mar 26, 2024 (Tuesday)

Instructions
• The Institute Plagiarism Policy applies. Both programming and theoretical questions are subject
to a strict plagiarism check.
• This assignment should be attempted individually. All questions are compulsory.
• Theory [T]: For theory questions, only hand-written solutions are acceptable. Attempt each question on a
different sheet & staple them together (for ease of checking). Do not start a new question on the back of
the previous one. Do not forget to mention the page number (bottom centre) and your credentials (bottom right)
on each sheet. It must be submitted in the assignment submission box kept in class during class time.
• Programming [P]: For programming questions, only the Python programming language is allowed.
You must submit a single .py file named as A3 [Link]. Make sure the submission is self-contained &
replicable, i.e., you are able to reproduce your results with the submitted files only. Use a random seed wherever
applicable to retain reproducibility.
• [Link] : Create a .pdf report of programming questions that contains your applied approach,
preprocessing, assumptions, analysis, visualizations, etc. Anything not in the report will not be evaluated.
Alternatively, a well-documented .ipynb file (in addition to the single .py file mentioned in the previous bullet) with
answers to all the questions may be submitted as a report. The report must be named as A3 RollNo [Link]
or A3 RollNo [Link].
• File Submission: Submit a .zip file named A3 [Link] (e.g., A3 [Link]) containing the report and
code files.
• Submission Policy: Turn in your submission as early as possible to avoid late submissions. In case of
multiple submissions, the latest submission will be evaluated. Late submissions will not be evaluated
and hence will be awarded zero marks.
• Clarifications: Symbols have their usual meaning. Assume any missing information & mention it in the
report. You are allowed to use any machine learning library unless the question explicitly states that
it is supposed to be done from scratch. You can always use basic Python libraries such as numpy, pandas,
and matplotlib, unless specified otherwise. Use Google Classroom for any queries. In order to keep it fair for
all, no email queries will be entertained. You may attend office/TA hours for personal resolutions. Start your
assignment early. No queries will be answered in Google Classroom comments during the last time.
• Compliance: The questions in this assignment are structured to meet the Course Outcomes CO1, CO2,
CO3, and CO4, as described in the course directory.

• There could be multiple ways to approach a question. Please justify your answers mathematically in the
theory questions, and via commented text in the programming questions appropriately. Questions without
justification will get zero marks.
1. Overfitting/Underfitting (2+2=4 points)
(a) [T ∥ CO1, CO2, CO4] Overfitting - You are building a convolutional neural network model for
classification. How will you verify if your model is overfitting or not? If overfitting, how will you handle
it? List at least 6 different ways of handling overfitting and justify each of these strategies.
(b) [T ∥ CO1, CO2, CO4] Underfitting - You are building a convolutional neural network model for
classification. How will you verify if your model is underfitting or not? If underfitting, how will you
handle it? List at least 6 different ways of handling underfitting and justify each of these strategies.
2. [P ∥ CO2, CO3, CO4] Implementation of CNN from scratch (8 points)

Dataset: MNIST Handwritten Digit Recognition

The dataset is already divided into train and test sets.

• Implement a CNN from scratch (without using a custom-designed CNN model like AlexNet etc.) for the
MNIST handwritten digit recognition task. Your implementation should include convolutional layers,
pooling layers, and fully connected layers.
• Split the dataset into training and validation. The test set is already given; do not alter that. Generate
a grid of images displaying 5 random images of each class. Ensure that the images are displayed in the
original color palette.
• Implement data augmentation techniques like rotation, scaling, and translation to improve model
performance.
• Train your CNN model on the MNIST dataset and report the training and validation accuracy and loss
curves.
• Experiment with different hyper-parameters (e.g., learning rate, batch size, and many more) and analyze
their impact on the model’s performance.
• List the model’s summary with the number of trainable parameters.
• Demonstrate the concepts of overfitting and underfitting with different-sized architectures and the
methods to overcome them.
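
For the model-summary bullet above, the number of trainable parameters can be checked by hand. The sketch below assumes standard convolutional and fully connected layers with one bias per output unit; the layer sizes in the usage lines are hypothetical and are not prescribed by the assignment.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus one bias per output channel of a 2D convolution layer."""
    return (kernel_h * kernel_w * in_channels + 1) * out_channels


def dense_params(in_features, out_features):
    """Weights plus one bias per output unit of a fully connected layer."""
    return (in_features + 1) * out_features


# Hypothetical layers: a 3x3 convolution from 1 to 8 channels,
# then a fully connected layer from 128 features to 10 classes.
total = conv_params(3, 3, 1, 8) + dense_params(128, 10)
print(total)
```

Pooling layers have no trainable parameters, so they do not contribute to the total.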

3. [P ∥ CO2, CO3, CO4] Learning latent features using CNN and classifying using SVM (12 points)

Dataset: CIFAR-10 Image Classification

• Download the CIFAR-10 python version. Preprocess the CIFAR-10 dataset and split it into training,
validation, and test sets.
• Implement data augmentation techniques like horizontal flipping, random cropping, etc.
• Implement a CNN model for the CIFAR-10 image classification task. Your model should include
convolutional layers, pooling layers, and fully connected layers.
• Train your CNN model on the CIFAR-10 dataset and report the training and validation accuracy and
loss curves. Experiment with different architectures, optimizers, and learning rate schedules to improve
model performance.
• Use the latent features learned using the CNN and feed them to an SVM classifier for the CIFAR-10
image classification task. Experiment with different kernel functions and hyper-parameters, and compare
the performance of your SVM model with the CNN model.
• Analyze the performance of your CNN and SVM models on the CIFAR-10 test set. Discuss the strengths
and weaknesses of each model and suggest potential improvements or extensions.

Common questions


Learning curves, which plot the model's performance on the training and validation datasets over epochs, can diagnose issues like overfitting and underfitting. In cases of overfitting, the training accuracy stays high while the validation accuracy stalls or deteriorates, suggesting the model is too complex and is memorizing the training data. Underfitting is indicated by both training and validation accuracies being low, pointing to a model that lacks sufficient capacity to capture the underlying data patterns. Monitoring these curves helps adjust model complexity, regularization parameters, and training duration effectively.
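
The diagnosis described above can be sketched as a simple rule on the final-epoch accuracies. This is a heuristic only: the function name and both thresholds (`low_acc`, `max_gap`) are illustrative assumptions, and sensible values depend on the dataset.

```python
def diagnose_fit(train_acc, val_acc, low_acc=0.7, max_gap=0.1):
    """Label a run from its per-epoch accuracy curves (heuristic thresholds)."""
    final_train, final_val = train_acc[-1], val_acc[-1]
    # Both curves low: the model has not captured the patterns in the data.
    if final_train < low_acc and final_val < low_acc:
        return "underfitting"
    # Training far ahead of validation: the model is memorizing.
    if final_train - final_val > max_gap:
        return "overfitting"
    return "reasonable fit"


print(diagnose_fit([0.60, 0.90, 0.99], [0.60, 0.80, 0.75]))
print(diagnose_fit([0.30, 0.40, 0.45], [0.30, 0.38, 0.43]))
```

In a report, the plotted curves remain the primary evidence; a rule like this only summarizes them.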

To determine if a convolutional neural network (CNN) model is overfitting, observe the relationship between training and validation performance. If the training accuracy continues to improve while the validation accuracy starts degrading, this is indicative of overfitting. Six strategies to handle overfitting include adding more training data, using data augmentation, implementing dropout, introducing early stopping, reducing the complexity of the model, and applying regularization techniques like L2 regularization. Each strategy helps the model generalize better by combating the tendency to memorize the training data rather than learning to predict from it.
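
Of the strategies listed, early stopping is the easiest to show in isolation. The sketch below assumes a patience-based rule applied to a list of per-epoch validation losses; the function name, the `patience` and `min_delta` parameters, and the loss values are all illustrative.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a patience-based early-stopping rule."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Validation loss improved: remember this epoch and reset the counter.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Patience never ran out: training ends at the last epoch.
    return len(val_losses) - 1, best_epoch


# Made-up validation losses that start rising after epoch 2.
print(early_stopping_epoch([1.0, 0.8, 0.7, 0.72, 0.75, 0.9, 1.1]))
```

In a real training loop the weights saved at `best_epoch` would be restored after stopping.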

The requirement to submit a single .py file ensures that the submission is self-contained, simplifying the replication and evaluation processes. It centralizes the code, reducing the likelihood of missing dependencies that could impair result reproduction. This approach enhances evaluation efficiency by providing a streamlined method for examiners to execute and verify the submitted solutions, facilitating fair and consistent assessment across all submissions.

To address underfitting in a CNN, you can increase the model complexity by adding more layers or neurons, employ a more suitable neural network architecture, increase the number of training epochs, tune learning rates for optimization, use better data preprocessing techniques, and ensure your dataset is comprehensive enough. These strategies help prevent underfitting by ensuring the model has enough capacity and variance to learn the complex patterns within the data.

Using latent features from a CNN can improve the performance of an SVM in image classification tasks, such as with the CIFAR-10 dataset. CNNs naturally generate hierarchical feature representations that are highly informative for downstream tasks. By using these learned features as input to an SVM, the model combines the CNN's feature extraction with the SVM's margin-based classification. Experimenting with different kernel functions and hyperparameters can further optimize SVM performance, offering a potential improvement over using either method individually.

Enhancements for CNN models on CIFAR-10 could include incorporating more sophisticated architectures, such as residual networks or attention mechanisms, which can capture more complex patterns and relationships in the data. For SVM models, experimenting with various kernel functions and performing hyperparameter optimization using techniques like grid search or Bayesian optimization could improve classification performance. Another potential extension is the integration of ensemble methods, combining outputs from multiple models to boost accuracy and robustness. Incorporating transfer learning from pre-trained models can also be beneficial in leveraging existing knowledge to improve performance.
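
The grid search mentioned above amounts to trying every combination of candidate values and keeping the best-scoring one. The sketch below is library-free; `toy_score` is a made-up stand-in for the validation accuracy of an SVM trained with the given `C` and `gamma`, and the candidate values are hypothetical.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate score_fn on every parameter combination and return the best."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(C, gamma):
    # Stand-in for "train the SVM, return validation accuracy" (assumption).
    # It simply peaks at C=1.0, gamma=0.1.
    return -((C - 1.0) ** 2) - ((gamma - 0.1) ** 2)


best_params, best_score = grid_search(
    {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}, toy_score
)
print(best_params)
```

The score should be computed on the validation set, not the test set, so that the final test result stays unbiased.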

Experimenting with different hyper-parameters such as learning rate and batch size is vital because they significantly influence the training dynamics and convergence of neural networks. The learning rate determines the step size in updating the model parameters and can affect whether the model converges efficiently, converges too slowly, or diverges. The batch size affects the stability and speed of the training process; smaller batches provide a noisier estimate of the gradient, which can sometimes help escape local minima, while larger batches offer smoother and more stable convergence. Optimizing these hyper-parameters is essential for achieving better model performance and generalization.
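
The effect of the step size can be seen on a one-dimensional toy problem. The sketch below runs plain gradient descent on f(x) = x^2, chosen purely for illustration; the three learning rates are arbitrary examples of a too-small, a reasonable, and a too-large step.

```python
def gradient_descent(learning_rate, start=5.0, steps=50):
    """Minimize f(x) = x**2 with fixed-step gradient descent; return the final x."""
    x = start
    for _ in range(steps):
        gradient = 2.0 * x  # derivative of x**2
        x -= learning_rate * gradient
    return x


for lr in (0.001, 0.1, 1.1):
    print(lr, gradient_descent(lr))
```

With the smallest rate the iterate barely moves from its start, with the middle rate it approaches the minimum at 0, and with the largest rate every step overshoots and the iterate grows in magnitude.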

Data augmentation plays a critical role in training CNN models by increasing the diversity of the training dataset without requiring additional manual labeling. This helps in improving the generalization capability of the model by making it robust to various transformations. Techniques such as rotation, scaling, translation, horizontal flipping, and random cropping can be applied to synthetically expand the dataset, thereby reducing overfitting and enhancing the model’s ability to learn invariant features.
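
Two of these transformations, horizontal flipping and translation, can be sketched on an image stored as a nested list of pixel values. The function names, the `max_shift` parameter, and the 50% flip probability are assumptions made for illustration; a real pipeline would normally rely on an image or deep learning library.

```python
import random


def horizontal_flip(image):
    """Mirror each row of a 2D image left to right."""
    return [row[::-1] for row in image]


def translate(image, dx, dy, fill=0):
    """Shift a 2D image by (dx, dy) pixels, padding exposed pixels with `fill`."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_y, new_x = y + dy, x + dx
            if 0 <= new_y < height and 0 <= new_x < width:
                shifted[new_y][new_x] = image[y][x]
    return shifted


def augment(image, rng, max_shift=1):
    """Randomly flip (50% chance) and then translate by up to max_shift pixels."""
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return translate(image, dx, dy)


rng = random.Random(0)  # seeded so the augmentation is reproducible
augmented = augment([[1, 2, 3], [4, 5, 6], [7, 8, 9]], rng)
```

Flipping suits natural images such as CIFAR-10 but can change the meaning of handwritten digits, which is presumably why the assignment lists it only for CIFAR-10.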

To implement a CNN from scratch for the MNIST dataset, first divide the dataset into training and validation sets, keeping the provided test set intact. Implement convolutional layers for feature extraction, pooling layers for downsampling, and fully connected layers for classification tasks. Apply data augmentation techniques, such as rotation, scaling, and translation, to enhance the training dataset and improve model generalization. Hyperparameter tuning, including learning rate and batch size changes, will further assist in optimizing the model's performance. Visualization of training and validation accuracy and loss curves will offer insights into the model's learning and generalization capabilities.
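
To make the roles of the convolutional and pooling layers concrete, the sketch below computes one "valid" convolution (implemented as cross-correlation, as deep learning libraries do) and one max-pooling step on nested lists. It illustrates the two operations only and is not a full network; the function names and the tiny inputs are assumptions, and an actual solution would typically build the layers with a deep learning library.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding (stride 1)."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(k_h)
                for b in range(k_w)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Downsample by taking the maximum of each non-overlapping size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[i * size + a][j * size + b]
                for a in range(size)
                for b in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


features = conv2d_valid([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, 1]])
print(features)
```

A 28x28 MNIST image passed through a 3x3 valid convolution gives a 26x26 feature map, and 2x2 max pooling then halves that to 13x13.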

Reproducibility is crucial in machine learning experiments to ensure that findings can be verified and built upon by others. In the context of this assignment, ensuring that results can be replicated using the submitted files is emphasized to safeguard the validity and reliability of the outcomes. Using a random seed is a practical measure to achieve reproducibility. This fosters transparency and trust, supporting both scientific progress and effective teaching.
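
A seeded train/validation split is one place where the random seed matters. The sketch below uses only the standard library; the function name, the 20% validation fraction, and the seed value 42 are illustrative assumptions.

```python
import random


def train_val_split(n_samples, val_fraction=0.2, seed=42):
    """Return (train_indices, val_indices) from a seeded shuffle of 0..n_samples-1."""
    indices = list(range(n_samples))
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(indices)
    n_val = int(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = train_val_split(100)
print(len(train_idx), len(val_idx))
```

Calling the function again with the same seed gives the same split. When numpy or a deep learning library is used, its own random generator must be seeded as well.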