DBATU B.Tech Machine Learning Syllabus

The document outlines the curriculum for the B. Tech program in Computer Science and Design at Dr. Babasaheb Ambedkar Technological University, effective from the academic year 2022-23. It details the course structure for the third year, including course titles, codes, evaluation schemes, and credits, with a focus on Machine Learning and its applications. Additionally, it provides objectives, outcomes, and practical lab exercises related to Machine Learning and R programming.

Uploaded by

samirsiot

Dr. Babasaheb Ambedkar Technological University
(Established as a University of Technology in the State of Maharashtra)
(Under Maharashtra Act No. XXIX of 2014)
P.O. Lonere, Dist. Raigad, Pin 402 103, Maharashtra
Telephone and Fax. 02140 - 275142 [Link]
[Link]

CURRICULUM UNDER GRADUATE PROGRAMME FOR

B. TECH
Computer Science and Design
WITH EFFECT FROM THE ACADEMIC YEAR
S.Y. [Link] 2022-23
T.Y. [Link] 2023-24
Proposed Scheme for B. Tech. Computer Science and Design
Semester - VI (Third Year)

Sr.  Course Code  Course Title                                  Weekly Teaching Hrs.  Evaluation Scheme     Credit
No.                                                             L     T     P         CA   MSE  ESE  Total
1    BTCSD 601    Software Engineering & Testing                3     1     --        20   20   60   100    4
2    BTCSD 602    Data Visualization                            3     1     --        20   20   60   100    4
3    BTCSD 603    Machine Learning                              3     1     --        20   20   60   100    4
4    BTCSD 604    Elective-IV                                   3     --    --        20   20   60   100    3
     604 (a)      Internet of Things
     604 (b)      Augmented & Virtual Reality
     604 (c)      Soft Computing
5    BTCSD 605    Elective-V                                    3     --    --        20   20   60   100    3
     605 (a)      Development Engineering
     605 (b)      Employability and Skill Development
     605 (c)      Consumer Behaviour
6    BTCSDL606    Machine Learning Lab and R Programming Lab    1*    --    2         30   --   20   50     2
7    BTCSDM607    Mini Project-II                               --    --    4         60   --   40   100    2
8    BTCSDF608    Field Training / Internship /                 --    --    --        --   --   --   --     Audit (to be evaluated in VII Sem.)
                  Industrial Training-III
Total                                                           16    3     6         190  100  360  650    22

Note: * Lecture should be conducted only for R Programming

BTCSD 603 Machine Learning

Course Objectives:
1. To understand the fundamental concepts of machine learning and its various algorithms.
2. To understand various strategies for generating models from data and evaluating them.
3. To apply ML algorithms to given data and interpret the results obtained.
4. To design an appropriate ML solution to solve real-world problems in the AI domain.

Course Outcomes:
1. Develop a good understanding of the fundamental principles of machine learning.
2. Formulate a machine learning problem.
3. Develop a model using supervised/unsupervised machine learning algorithms for classification/prediction/clustering.
4. Evaluate the performance of various machine learning algorithms on various data sets of a domain.
5. Design and implement various machine learning algorithms to solve a given problem using languages such as Python.

UNIT I: Introduction to Machine Learning [7 Hours]

Introduction to Machine Learning: Definition of Machine Learning, Definition of
learning. Classification of Machine Learning: Supervised learning, unsupervised
learning, Reinforcement learning, Semi-supervised learning. Categorizing based on
required Output: Classification, Regression, and Clustering. Difference in ML and
Traditional Programming, Definition of Data, Information and Knowledge. Split data
in Machine Learning: Training Data, Validation Data and Testing Data. Machine
Learning: Applications.

UNIT II: Machine Learning - Performance Metrics [7 Hours]

Performance Metrics for Classification Problems- Confusion Matrix, Classification
Accuracy, Classification Report- Precision, Recall or Sensitivity, Support, F1 Score,
AUC (Area Under ROC curve). Performance Metrics for Regression Problems- Mean
Absolute Error (MAE), Mean Square Error (MSE), R Squared (R2).

UNIT III: Linear and Logistic Regression [7 Hours]

Introduction to linear regression: Introduction to Linear Regression, Optimal
Coefficients, Cost function, Coefficient of Determination, Analysis of Linear
Regression using dummy Data, Linear Regression Intuition. Multivariable regression
and gradient descent: Generic Gradient Descent, Learning Rate, Complexity Analysis
of Normal Equation Linear Regression, how to find More Complex Boundaries,
Variations of Gradient Descent. Logistic regression: Handling Classification Problems,
Logistic Regression, Cost Function, Finding Optimal Values, Solving Derivatives,
Multiclass Logistic Regression, Finding Complex Boundaries and Regularization,
Using Logistic Regression from Sklearn.

UNIT IV: Decision Trees and Random Forests [7 Hours]

Decision trees: Decision Trees, Decision Trees for Interview call, Building Decision
Trees, Getting to Best Decision Tree, Deciding Feature to Split on, Continuous Valued
Features, Code using Sklearn decision tree, Information Gain, Gain Ratio, Gini Index,
Decision Trees & Overfitting, Pruning. Random forests: Introduction to Random
Forests, Data Bagging and Feature Selection, Extra Trees, Regression using Decision
Trees and Random Forest, Random Forest in Sklearn.

UNIT V: Naive Bayes, KNN and SVM [7 Hours]

Naive Bayes: Bayes Theorem, Independence Assumption in Naive Bayes, Probability
estimation for Discrete Values Features, how to handle zero probabilities,
Implementation of Naive Bayes, Finding the probability for continuous valued features,
Text Classification using Naive Bayes. K-Nearest Neighbours: Introduction to KNN,
Feature scaling before KNN, KNN in Sklearn, Cross Validation, Finding Optimal K,
Implement KNN, Curse of Dimensionality, Handling Categorical Data, Pros & Cons
of KNN. Support Vector Machine: Intuition behind SVM, SVM Cost Function,
Decision Boundary & the C parameter, using SVM from Sklearn, Finding Non-Linear
Decision Boundary, Choosing Landmark Points, Similarity Functions, how to move to
new dimensions, Multi-class Classification, Using Sklearn SVM on Iris, Choosing
Parameters using Grid Search, Using Support Vectors for Regression.

Textbooks:
1. Ethem Alpaydın, Introduction to Machine Learning, PHI, Third Edition, ISBN No. 978-81-203-5078-6.
2. Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, ISBN No. 978-0-387-31073-2.
3. Tom Mitchell, Machine Learning, McGraw-Hill, First Edition, ISBN No. 0-07-115467-1.
4. Bonaccorso, Machine Learning Algorithms, Packt Publishing Limited, ISBN-10: 1785889621, ISBN-13: 978-1785889622.

Reference Books:
1. R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, 2/e, Wiley, 2001.
2. Shai Shalev-Shwartz and Shai Ben-David, Understanding Machine Learning (From Theory to Algorithms), Cambridge University Press, First Edition, ISBN No. 978-1-107-51282-5.
3. A. Rostamizadeh, A. Talwalkar, M. Mohri, Foundations of Machine Learning, MIT Press.
4. A. Webb, Statistical Pattern Recognition, 3/e, Wiley, 2011.
5. [Link]

BTCSDL 606 Machine Learning Lab

List of practicals:
1. Python Libraries for Data Science
a. Pandas Library
b. Numpy Library
c. Scikit Learn Library
d. Matplotlib
2. Evaluation Metrics
a. Accuracy
b. Precision
c. Recall
d. F1-Score
3. Creating Train and Test Sets by Splitting the Data.
4. Linear Regression
5. Multivariable Regression
6. Decision Tree Algorithm implementation.
7. Random Forest Algorithm implementation.
8. Naive Bayes Classification Algorithm implementation.
9. K-Nearest Neighbour Algorithm implementation.
10. SVM Algorithm implementation.
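
As a rough illustration of practical 3, the following sketch splits a small data set into training and test portions using only the Python standard library. The function name `train_test_split`, its parameters, and the toy data are assumptions made for this example; in the lab itself, the Scikit Learn library listed in practical 1 would normally be used for this step.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                      # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)      # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical toy data: (x, y) pairs with y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(20)]
train, test = train_test_split(data, test_fraction=0.25, seed=42)
print(len(train), "training rows,", len(test), "test rows")
```

Shuffling before splitting avoids a test set made only of the last rows, which matters when the data is stored in some sorted order.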

BTCSDL 606 R Programming Lab

List of practicals:
1. Study of data analysis using MS-Excel (Prerequisite).
2. Study of basic syntax in R.
3. Implementation of operations on vector data objects.
4. Implementation of matrix, array and factors and perform va in R.
5. Implementation and use of data frames in R.
6. Create sample (dummy) data in R and perform data manipulation with R.
7. Study and implementation of various control structures in R.
8. Data manipulation with the dplyr package.
9. Data manipulation with the [Link] package.
10. Study and implementation of data visualization with ggplot2.
11. Study and implementation of data transpose operations in R.

Common questions

Data visualization is critical for understanding the patterns, trends, and outliers in datasets, which in turn influences the selection and development of machine learning models. Effective visualization helps identify which preprocessing steps might be necessary, such as handling missing data or removing outliers. It also aids in interpreting model outputs and in evaluating model performance through visual comparisons, such as ROC curves for classification problems, making complex data more accessible and actionable. Visualization also supports clearer communication of findings and can inform decision-making by stakeholders.

Random Forest, an ensemble of decision trees, offers several advantages over a single decision tree. It improves generalization and reduces overfitting by combining the predictions of multiple trees that have been trained on different subsets of the data (data bagging) and that consider only a random subset of features at each split (feature selection). As a result, no individual noisy decision tree significantly influences the model's prediction. Random Forest tends to have higher accuracy and stability than a single decision tree, especially on complex datasets with non-linear relationships.
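
The bagging and random-feature idea described above can be sketched in a few lines. This is a deliberately simplified illustration, not a full Random Forest: each "tree" is a one-split stump, each stump sees a bootstrap sample and a single randomly chosen feature, and the forest predicts by majority vote. All function names and the toy data are assumptions made for this example.

```python
import random
from collections import Counter

def fit_stump(rows, feature):
    """Fit a one-split tree on one feature; rows are (features, label), labels 0/1."""
    values = sorted({x[feature] for x, _ in rows})
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    best = None
    for threshold in candidates:
        for left_label in (0, 1):
            errors = sum(
                (left_label if x[feature] <= threshold else 1 - left_label) != y
                for x, y in rows
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    return feature, best[1], best[2]

def predict_stump(stump, x):
    feature, threshold, left_label = stump
    return left_label if x[feature] <= threshold else 1 - left_label

def fit_forest(rows, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]   # bootstrap sample (bagging)
        feature = rng.randrange(n_features)         # random feature for this stump
        forest.append(fit_stump(sample, feature))
    return forest

def predict_forest(forest, x):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters.
rows = [((1, 2), 0), ((2, 1), 0), ((1, 1), 0), ((2, 2), 0),
        ((8, 9), 1), ((9, 8), 1), ((8, 8), 1), ((9, 9), 1)]
forest = fit_forest(rows)
print(predict_forest(forest, (1.5, 1.5)), predict_forest(forest, (8.5, 8.5)))
```

An odd number of stumps is used so that a two-class vote cannot tie.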

Preprocessing is crucial in the K-Nearest Neighbors (KNN) algorithm because it relies on distance calculations to find the nearest neighbors. Feature scaling techniques such as normalization and standardization are often required because KNN is sensitive to the magnitude of the features: features on a larger scale disproportionately affect the distance calculations, leading to biased predictions. Handling missing values and transforming categorical features are also important preprocessing steps. Effective preprocessing improves the algorithm's accuracy and leads to more reliable and meaningful predictions.
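
The effect of scaling on distance calculations can be seen in a small sketch. The data below is hypothetical (age in years, income in rupees), and the helper names `standardize` and `nearest_index` are assumptions made for this example. Without scaling, the income column dominates the Euclidean distance; after standardization, both features contribute comparably and a different neighbor is chosen.

```python
import math
from statistics import mean, pstdev

def standardize(rows, stats=None):
    """Z-score each column; compute (mean, std) per column unless stats is given."""
    if stats is None:
        stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    scaled = [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]
    return scaled, stats

def nearest_index(points, query):
    """Index of the point with the smallest Euclidean distance to query."""
    return min(range(len(points)), key=lambda i: math.dist(points[i], query))

# Hypothetical data: (age in years, income in rupees).
train = [(25, 50000), (60, 51000), (26, 58000)]
query = (27, 51500)

raw_nearest = nearest_index(train, query)

scaled_train, stats = standardize(train)
# Scale the query with the training statistics, not its own.
scaled_query = standardize([query], stats)[0][0]
scaled_nearest = nearest_index(scaled_train, scaled_query)

print("nearest without scaling:", train[raw_nearest])
print("nearest after scaling:  ", train[scaled_nearest])
```

The sketch assumes no column is constant; a constant column has zero standard deviation and would need separate handling.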

The confusion matrix summarizes prediction results on a classification problem and is used to derive several key performance metrics. Precision is the ratio of true positives to the sum of true positives and false positives, indicating how accurate the positive predictions are. Recall (sensitivity) is the ratio of true positives to the sum of true positives and false negatives, indicating how well the model retrieves the actual positives. The F1 score is the harmonic mean of precision and recall, balancing the two when improving one tends to lower the other. Together, these metrics assess the model's ability to make correct predictions and support a comprehensive evaluation of classification models.
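
The definitions above translate directly into code. The following is a minimal sketch for the binary case; the function name and the example counts are assumptions made for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Returning 0.0 when a denominator is zero is one common convention; other choices (such as raising an error) are equally defensible.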

Regularization in logistic regression, through techniques such as L1 (Lasso) and L2 (Ridge) regularization, adds a penalty for large coefficients to the cost function. This penalty encourages smaller coefficient values, effectively controlling the complexity of the model. Regularization helps prevent overfitting by discouraging extremely flexible models that fit the training data too closely and capture noise instead of the underlying distribution. By consistently penalizing larger weights, regularization helps the model generalize to new, unseen data, improving its robustness and predictive performance.
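
To make the shrinking effect concrete, here is a small sketch of one-feature logistic regression trained by gradient descent, with an optional L2 penalty of the form (l2 / 2) * w^2 applied to the weight only (the bias is left unpenalized). The function names, hyperparameter values, and toy data are assumptions made for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, l2=0.0, lr=0.1, steps=2000):
    """Gradient descent on mean log-loss plus (l2 / 2) * w**2."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n + l2 * w  # penalty gradient
        grad_b = sum(errors) / n                                      # bias not penalized
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical, perfectly separable data: without a penalty the weight keeps growing.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

w_plain, _ = fit_logistic(xs, ys, l2=0.0)
w_ridge, _ = fit_logistic(xs, ys, l2=1.0)
print(f"weight without penalty: {w_plain:.3f}")
print(f"weight with L2 penalty: {w_ridge:.3f}")
```

On separable data the unpenalized weight grows for as long as training continues, while the penalized weight settles at a finite value.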

Supervised learning involves training a model on a labeled dataset, meaning that each training example is paired with an output label. It is mainly used for classification and regression problems, where the algorithm learns a mapping from inputs to the desired output. In contrast, unsupervised learning deals with unlabeled data; its main objectives are clustering and association, where the algorithm tries to learn patterns or structures from the input data without guidance on what to learn. The choice of algorithm therefore depends on whether the data is labeled or unlabeled. Use cases for supervised learning include sentiment analysis and medical diagnosis prediction, whereas unsupervised learning is often used in customer segmentation and anomaly detection.

In a Support Vector Machine (SVM), support vectors are the data points that lie closest to the decision boundary (or hyperplane) and are therefore critical in defining it. The margin is the distance between the support vectors and the hyperplane. SVM aims to maximize this margin to achieve optimal separation between classes, as larger margins are associated with better generalization on unseen data. The influence of support vectors is significant because only they determine the position and orientation of the hyperplane, making them the most informative samples for the classification model.
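
The geometry can be checked numerically. The sketch below does not train an SVM; it takes a hand-picked hyperplane w·x + b = 0 (an assumption made for this example, as are the points and function names), computes each point's distance to it, and identifies the closest points as the support vectors.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.hypot(*w)
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

# Hand-picked separating hyperplane (not learned): 0.5*x1 + 0.5*x2 - 2 = 0.
w, b = (0.5, 0.5), -2.0
points = [(0, 0), (1, 1), (3, 3), (4, 5)]

distances = [abs(signed_distance(w, b, p)) for p in points]
closest = min(distances)

# Support vectors: the points at the minimum distance from the boundary.
support_vectors = [p for p, d in zip(points, distances) if math.isclose(d, closest)]
margin_width = 2 * closest   # distance between the two margin boundaries

print("support vectors:", support_vectors)
print(f"margin width: {margin_width:.3f}")
```

Moving either of the points that are not support vectors further from the boundary would leave the margin unchanged, which is the sense in which only the support vectors matter.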

The learning rate in the gradient descent algorithm is a hyperparameter that determines the size of the steps taken towards the minimum of the cost function. A properly chosen learning rate ensures that the algorithm converges to the minimum efficiently. A learning rate that is too small results in slow convergence, which increases computation time. Conversely, a learning rate that is too large can cause the algorithm to overshoot the minimum, potentially causing divergence rather than convergence. Choosing an appropriate learning rate is therefore essential for the effectiveness and efficiency of gradient descent.
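
These three regimes show up even on the simplest cost function. The sketch below runs gradient descent on f(x) = x^2, whose gradient is 2x, so each step multiplies x by (1 - 2 * learning_rate). The starting point, step count, and the three learning rates are assumptions chosen for illustration.

```python
def gradient_descent(learning_rate, start=10.0, steps=50):
    """Minimize f(x) = x**2 from the given start; return the final x."""
    x = start
    for _ in range(steps):
        gradient = 2 * x              # derivative of x**2
        x -= learning_rate * gradient
    return x

too_small = gradient_descent(0.01)    # shrinks by 0.98 per step: slow progress
reasonable = gradient_descent(0.1)    # shrinks by 0.8 per step: converges quickly
too_large = gradient_descent(1.1)     # factor -1.2 per step: overshoots and diverges

print(f"lr=0.01 -> x = {too_small:.4f}")
print(f"lr=0.1  -> x = {reasonable:.6f}")
print(f"lr=1.1  -> x = {too_large:.1f}")
```

For this particular function, any learning rate above 1.0 makes the per-step factor larger than 1 in magnitude, so the iterates move away from the minimum at 0.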

The Machine Learning lab practical activities are designed to complement theoretical knowledge by enabling students to apply concepts in a hands-on environment. For instance, implementing algorithms such as linear regression, decision trees, and SVM allows students to understand how these models are trained and evaluated on real datasets. The use of Python libraries such as Scikit Learn helps students practice running and tuning machine learning models. Moreover, working with data preprocessing and visualization tools strengthens their ability to interpret and manipulate data, bridging the gap between theory and practical application.

The primary challenge with the Naive Bayes algorithm is its assumption of feature independence, which often does not hold in real-world data where features may be correlated. This can lead to inaccurate probability estimates and consequently affect classification accuracy. To address this, one can use feature selection or dimensionality reduction techniques to reduce the correlation between features before applying Naive Bayes. Another approach is to use techniques such as Bayesian Networks, which partially relax the independence assumption by allowing some degree of dependence between variables.
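
A small numerical sketch shows how correlated features distort the probability estimate. Here a single piece of evidence is counted twice, as would happen if a feature were duplicated (an extreme case of correlation). The probabilities and the function name are hypothetical values chosen for this example.

```python
def naive_bayes_posterior(prior_pos, likelihoods_pos, likelihoods_neg):
    """Posterior P(positive | features), multiplying per-feature likelihoods
    as if the features were independent given the class."""
    pos, neg = prior_pos, 1.0 - prior_pos
    for l_pos, l_neg in zip(likelihoods_pos, likelihoods_neg):
        pos *= l_pos
        neg *= l_neg
    return pos / (pos + neg)

# Hypothetical spam filter: P(word | spam) = 0.8, P(word | not spam) = 0.4.
single = naive_bayes_posterior(0.5, [0.8], [0.4])

# The same word counted twice, as if it were two independent features.
duplicated = naive_bayes_posterior(0.5, [0.8, 0.8], [0.4, 0.4])

print(f"posterior with the feature once:  {single:.3f}")
print(f"posterior with the feature twice: {duplicated:.3f}")
```

The duplicated feature adds no new information, yet the posterior moves further from the prior, which is the overconfidence that removing redundant features is meant to reduce.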

Common questions

How is the concept of data visualization integral to the understanding and application of machine learning models?

What are the advantages of using Random Forest over a single Decision Tree for predictive modeling?

Explain the role of preprocessing in the K-Nearest Neighbors (KNN) algorithm and how it can affect the algorithm’s accuracy.

How can the confusion matrix be utilized to derive other classification performance metrics such as precision, recall, and F1-score?

How can the use of regularization in logistic regression mitigate the problem of overfitting?

What are the key differences between supervised learning and unsupervised learning, and how do they influence the selection of machine learning algorithms?

How does the concept of support vectors and the margin influence the decision boundary in Support Vector Machine (SVM)?

Discuss the importance of learning rate and its impact on the gradient descent algorithm in machine learning model training.

In what ways do the Machine Learning lab practical activities aim to reinforce the theoretical concepts taught in the classroom?

What challenges are associated with the implementation of the Naive Bayes algorithm concerning the independence assumption, and how can they be addressed?
