
DBATU B.Tech Machine Learning Syllabus

The document outlines the curriculum for the B. Tech program in Computer Science and Design at Dr. Babasaheb Ambedkar Technological University, effective from the academic year 2022-23. It details the course structure for the third year, including course titles, codes, evaluation schemes, and credits, with a focus on Machine Learning and its applications. Additionally, it provides objectives, outcomes, and practical lab exercises related to Machine Learning and R programming.

Uploaded by

samirsiot

Dr. Babasaheb Ambedkar Technological University
(Established as a University of Technology in the State of Maharashtra)
(Under Maharashtra Act No. XXIX of 2014)
P.O. Lonere, Dist. Raigad, Pin 402 103, Maharashtra
Telephone and Fax. 02140 - 275142 [Link]
[Link]

CURRICULUM UNDER GRADUATE PROGRAMME FOR


B. TECH
Computer Science and Design
WITH EFFECT FROM THE ACADEMIC YEAR
S.Y. [Link] 2022-23
T.Y. [Link] 2023-24
Proposed Scheme for B. Tech. Computer Science and Design
Semester - VI (Third Year)

Sr.  Course    Course      Course Title                                Weekly Teaching Hrs.  Evaluation Scheme      Credit
No.  Category  Code                                                    L    T    P           CA   MSE  ESE  Total
1              BTCSD 601   Software Engineering & Testing              3    1    --          20   20   60   100    4
2              BTCSD 602   Data Visualization                          3    1    --          20   20   60   100    4
3              BTCSD 603   Machine Learning                            3    1    --          20   20   60   100    4
4              BTCSD       Elective-IV                                 3    --   --          20   20   60   100    3
               604 (a)     Internet of Things
               604 (b)     Augmented & Virtual Reality
               604 (c)     Soft Computing
5              BTCSD       Elective-V                                  3    --   --          20   20   60   100    3
               605 (a)     Development Engineering
               605 (b)     Employability and Skill Development
               605 (c)     Consumer Behaviour
6              BTCSDL606   Machine Learning Lab and R Programming Lab  1*   --   2           30   --   20   50     2
7              BTCSDM607   Mini Project-II                             --   --   4           60   --   40   100    2
8              BTCSDF608   Field Training / Internship /               --   --   --          --   --   --   --     Audit, to be evaluated
                           Industrial Training-III                                                                  in VII Sem.
Total                                                                  16   3    6           190  100  360  650    22

Note: * Lecture should be conducted only for R Programming


BTCSD 603 Machine Learning

Course Objectives:
1. To understand fundamental concepts of machine learning and its various algorithms.
2. To understand various strategies of generating models from data and evaluating them.
3. To apply ML algorithms on given data and interpret the results obtained.
4. To design an appropriate ML solution to solve real-world problems in the AI domain.

Course Outcomes:
1. Develop a good understanding of fundamental principles of machine learning.
2. Formulate a machine learning problem.
3. Develop a model using supervised/unsupervised machine learning algorithms for
classification/prediction/clustering.
4. Evaluate the performance of various machine learning algorithms on various data sets
of a domain.
5. Design and implement various machine learning algorithms to solve a given problem
using languages such as Python.

UNIT I: Introduction to Machine Learning [7 Hours]


Introduction to Machine Learning: Definition of Machine Learning, Definition of
learning. Classification of Machine Learning: Supervised learning, unsupervised
learning, Reinforcement learning, Semi-supervised learning. Categorizing based on
required Output: Classification, Regression, and Clustering. Difference between ML and
Traditional Programming, Definition of Data, Information and Knowledge. Split data
in Machine Learning: Training Data, Validation Data and Testing Data. Machine
Learning: Applications.
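
The three-way data split named in this unit can be sketched as follows. This is a minimal illustration using only the Python standard library, not a prescribed method; the function name `train_val_test_split` and the default fractions (60/20/20) are assumptions chosen for the example.

```python
import random

def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle rows and cut them into train, validation and test lists.

    The fractions and the seed are illustrative defaults, not values
    taken from the syllabus.
    """
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(10))
print(len(train), len(val), len(test))  # 6 2 2
```

The model is fitted on the training list, tuned on the validation list, and scored once on the test list, so the test score is not influenced by tuning decisions.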

UNIT II: Machine Learning - Performance Metrics [7 Hours]


Performance Metrics for Classification Problems- Confusion Matrix, Classification
Accuracy, Classification Report- Precision, Recall or Sensitivity, Support, F1 Score,
AUC (Area Under ROC curve). Performance Metrics for Regression Problems- Mean
Absolute Error (MAE), Mean Squared Error (MSE), R Squared (R²).
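
The regression metrics listed in this unit can be computed directly from their standard definitions. The sketch below uses plain Python lists; the function names and the sample values are made up for illustration.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """R squared: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

print(mae(y_true, y_pred))        # 0.5
print(mse(y_true, y_pred))        # 0.5
print(r_squared(y_true, y_pred))  # 0.9
```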

UNIT III: Linear and Logistic Regression [7 Hours]


Introduction to linear regression: Introduction to Linear Regression, Optimal
Coefficients, Cost function, Coefficient of Determination, Analysis of Linear
Regression using dummy Data, Linear Regression Intuition. Multivariable regression
and gradient descent: Generic Gradient Descent, Learning Rate, Complexity Analysis
of Normal Equation Linear Regression, how to find More Complex Boundaries,
Variations of Gradient Descent. Logistic regression: Handling Classification Problems,
Logistic Regression, Cost Function, Finding Optimal Values, Solving Derivatives,
Multiclass Logistic Regression, Finding Complex Boundaries and Regularization,
Using Logistic Regression from Sklearn.

UNIT IV: Decision Trees and Random Forests [7 Hours]


Decision trees: Decision Trees, Decision Trees for Interview call, Building Decision
Trees, Getting to Best Decision Tree, Deciding Feature to Split on, Continuous Valued
Features, Code using Sklearn Decision Tree, Information Gain, Gain Ratio, Gini Index,
Decision Trees & Overfitting, Pruning. Random forests: Introduction to Random
Forests, Data Bagging and Feature Selection, Extra Trees, Regression using decision
Trees and Random Forest, Random Forest in Sklearn.
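
The split criteria named in this unit (information gain and Gini index) can be illustrated with a short sketch. It uses the standard textbook definitions; the helper names and the toy "yes"/"no" labels are assumptions for the example, and gain ratio is left out for brevity.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index: chance that two random draws have different classes."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node with 5 'yes' and 5 'no' labels, split into two children.
parent = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]

print(entropy(parent))                            # 1.0
print(gini(parent))                               # 0.5
print(round(information_gain(parent, split), 4))  # 0.2781
```

A tree builder would evaluate every candidate feature this way and split on the one with the highest gain (or the lowest weighted Gini index).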

UNIT V: Naive Bayes, KNN and SVM [7 Hours]


Naive Bayes: Bayes Theorem, Independence Assumption in Naive Bayes, Probability
estimation for Discrete Valued Features, how to handle zero probabilities,
Implementation of Naive Bayes, Finding the probability for continuous valued features,
Text Classification using Naive Bayes. K-Nearest Neighbours: Introduction to KNN,
Feature scaling before KNN, KNN in Sklearn, Cross Validation, Finding Optimal K,
Implement KNN, Curse of Dimensionality, Handling Categorical Data, Pros & Cons
of KNN. Support Vector Machine: Intuition behind SVM, SVM Cost Function,
Decision Boundary & the C parameter, using SVM from Sklearn, Finding Non-Linear
Decision Boundary, Choosing Landmark Points, Similarity Functions, how to move to
new dimensions, Multi-class Classification, Using Sklearn SVM on Iris, Choosing
Parameters using Grid Search, Using Support Vectors for Regression.
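
One common way to handle the zero-probability problem listed under Naive Bayes is additive (Laplace) smoothing. The syllabus does not name a specific technique, so treat the sketch below as one possible approach; the function name, the `alpha` parameter, and the weather-style data are assumptions.

```python
from collections import Counter

def smoothed_likelihood(feature_values, value, n_categories, alpha=1.0):
    """Estimate P(feature = value | class) with additive smoothing.

    feature_values: values of one discrete feature observed for one class.
    n_categories:   number of distinct values the feature can take.
    alpha:          smoothing strength (alpha = 0 gives the raw frequency).
    """
    counts = Counter(feature_values)
    return (counts[value] + alpha) / (len(feature_values) + alpha * n_categories)

# Hypothetical 'outlook' values seen for one class; 'overcast' never occurs.
outlook = ["sunny", "sunny", "rainy", "rainy", "rainy"]

print(smoothed_likelihood(outlook, "overcast", 3, alpha=0))  # 0.0
print(smoothed_likelihood(outlook, "overcast", 3))           # 0.125
```

Without smoothing, a single unseen value makes the whole product of likelihoods zero; with it, every value keeps a small non-zero probability and the estimates still sum to one.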

Textbooks:
1. Ethem Alpaydın, Introduction to Machine Learning, PHI, Third Edition,
ISBN No. 978-81-203-5078-6
2. Christopher M. Bishop, Pattern Recognition and Machine Learning, McGraw-Hill,
ISBN No. 0-07-115467-1
3. Tom Mitchell, Machine Learning, McGraw-Hill, First Edition, ISBN No. 0-07-115467-1.
4. [Link] Bonaccorso, “Machine Learning Algorithms”, Packt Publishing Limited,
ISBN-10: 1785889621, ISBN-13: 978-1785889622

Reference Books:
1. R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, 2/e, Wiley, 2001
2. Shai Shalev-Shwartz and Shai Ben-David, Understanding Machine Learning (From
Theory to Algorithms), Cambridge University Press, First Edition,
ISBN No. 978-1-107-51282-5.
3. A. Rostamizadeh, A. Talwalkar, M. Mohri, Foundations of Machine Learning, MIT
Press.
4. A. Webb, Statistical Pattern Recognition, 3/e, Wiley, 2011.
5. [Link]

BTCSDL 606 Machine Learning Lab

List of practicals:
1. Python Libraries for Data Science
a. Pandas Library
b. Numpy Library
c. Scikit Learn Library
d. Matplotlib
2. Evaluation Metrics
a. Accuracy
b. Precision
c. Recall
d. F1-Score
3. Creating Train and Test Sets by Splitting the Data.
4. Linear Regression
5. Multivariable Regression
6. Decision Tree Algorithm implementation.
7. Random Forest Algorithm implementation.
8. Naive Bayes Classification Algorithm implementation.
9. K-Nearest Neighbour Algorithm implementation.
10. SVM Algorithm implementation.

BTCSDL 606 R Programming Lab


List of practicals:
1. Study of data analysis using MS-Excel (Prerequisite).
2. Study of basic syntax in R.
3. Implementation of vector data objects operations.
4. Implementation of matrix, array and factors and perform va in R.
5. Implementation and use of data frames in R.
6. Create Sample (Dummy) Data in R and perform data manipulation with R.
7. Study and implementation of various control structures in R.
8. Data Manipulation with dplyr package.
9. Data Manipulation with [Link] package.
10. Study and implementation of Data Visualization with ggplot2.
11. Study and implementation of data transpose operations in R.

Common questions


Data visualization is critical for understanding the patterns, trends, and outliers in datasets, which in turn influences the selection and development of machine learning models. Effective visualization helps in identifying which preprocessing steps might be necessary, such as handling missing data or outlier removal. It also aids in interpreting model outputs and in evaluating model performance by providing visual comparisons, such as ROC curves for classification problems, thereby making complex data more accessible and actionable. Visualization promotes clearer communication of findings and can influence decision-making by stakeholders.

Random Forest, an ensemble of decision trees, offers several advantages over a single decision tree. It improves generalization and reduces overfitting by averaging the predictions from multiple trees that have been trained on different subsets of the data (data bagging) and by using a random subset of features when considering splits (feature selection). This process ensures that any individual noisy decision tree does not significantly influence the model’s prediction. Random Forest tends to have higher accuracy and stability than a single decision tree, especially on complex datasets with non-linear relationships.

Preprocessing is crucial in the K-Nearest Neighbors (KNN) algorithm because it relies on distance calculations to find nearest neighbors. Feature scaling techniques like normalization and standardization are often required because KNN is sensitive to the magnitude of the features, and unequally scaled features can disproportionately affect the distance calculations, leading to biased predictions. Handling missing values and transforming categorical features are also important preprocessing steps. Effective preprocessing improves the algorithm's accuracy and helps in achieving more reliable and meaningful predictions.
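
The effect of feature scaling on nearest-neighbour search can be seen in a small sketch. It uses z-score standardization from the standard library; the data (ages in years, incomes in rupees) and the helper names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(points, query_index):
    """Index of the point closest to points[query_index], excluding itself."""
    q = points[query_index]
    others = [i for i in range(len(points)) if i != query_index]
    return min(others, key=lambda i: euclidean(points[i], q))

# Hypothetical people described by (age, income).
ages = [25, 27, 60, 45]
incomes = [50_000, 52_000, 51_000, 58_000]

raw = list(zip(ages, incomes))
scaled = list(zip(standardize(ages), standardize(incomes)))

print(nearest(raw, 0))     # 2
print(nearest(scaled, 0))  # 1
```

On the raw data the income column dominates the distance, so the 25-year-old is matched with the 60-year-old. After scaling, both features contribute comparably and the 27-year-old becomes the nearest neighbour.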

The confusion matrix provides a summary of prediction results on a classification problem and is used to derive several key performance metrics:
- **Precision** is the ratio of true positive predictions to the sum of true positives and false positives, indicating the accuracy of positive predictions.
- **Recall (Sensitivity)** is the ratio of true positive predictions to the sum of true positives and false negatives, indicating the ability of a model to retrieve actual positives.
- **F1 Score** is the harmonic mean of precision and recall, balancing the two when there is a trade-off between them.
These metrics help in assessing the model's ability to make correct predictions and are used to evaluate classification models comprehensively.
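
The definitions above translate directly into code. The following sketch takes the four cells of a binary confusion matrix; the function name and the counts are made up for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive common metrics from the cells of a binary confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# precision: 0.800
# recall: 0.667
# f1: 0.727
# accuracy: 0.700
```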

Regularization in logistic regression, through techniques such as L1 (Lasso) and L2 (Ridge) regularization, adds a penalty to the cost function for large coefficients. This penalty encourages smaller coefficient values, effectively controlling the complexity of the model. Regularization helps prevent overfitting by discouraging extremely flexible models that fit the training data too closely, capturing noise instead of the underlying distribution. By consistently penalizing larger weights, regularization helps the model generalize to new, unseen data, thus improving its robustness and predictive performance.
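
Written out, the penalized cost takes the following form. The notation is an assumption introduced here: $m$ training examples, $n$ weights $w_j$, hypothesis $h_{\mathbf{w}}(\mathbf{x}) = 1 / (1 + e^{-\mathbf{w}^{\top}\mathbf{x}})$, and regularization strength $\lambda \ge 0$. The scaling of the penalty term (here $\lambda / 2m$ and $\lambda / m$) varies between textbooks and libraries.

```latex
% L2 (Ridge) penalty
J_{L2}(\mathbf{w}) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\, y_i \log h_{\mathbf{w}}(\mathbf{x}_i)
  + (1 - y_i)\log\big(1 - h_{\mathbf{w}}(\mathbf{x}_i)\big) \Big]
  + \frac{\lambda}{2m}\sum_{j=1}^{n} w_j^{2}

% L1 (Lasso) penalty
J_{L1}(\mathbf{w}) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\, y_i \log h_{\mathbf{w}}(\mathbf{x}_i)
  + (1 - y_i)\log\big(1 - h_{\mathbf{w}}(\mathbf{x}_i)\big) \Big]
  + \frac{\lambda}{m}\sum_{j=1}^{n} \lvert w_j \rvert
```

Setting $\lambda = 0$ recovers the unregularized cost; larger values of $\lambda$ push the weights towards zero, and the L1 form tends to set some of them exactly to zero.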

Supervised learning involves training a model on a labeled dataset, meaning that each training example is paired with an output label. It is mainly used for classification and regression problems. The algorithm learns a mapping from inputs to the desired output. In contrast, unsupervised learning deals with unlabeled data and its main objectives are clustering and association, where the algorithm tries to learn patterns or structures from the input data without guidance on what to learn. The choice of algorithm is influenced by whether the data is labeled or unlabeled. Use cases for supervised learning include sentiment analysis and medical diagnosis prediction, whereas unsupervised learning is often used in customer segmentation and anomaly detection.

In Support Vector Machine (SVM), support vectors are the data points that lie closest to the decision boundary (or hyperplane), and thus are critical in defining it. The margin is the distance between the support vectors and the hyperplane. SVM aims to maximize this margin to achieve optimal separation between classes, as larger margins are associated with better generalization on unseen data. The influence of support vectors is significant because only they are used to determine the position and orientation of the hyperplane, making them the most informative samples for developing the classification model.

The learning rate in the gradient descent algorithm is a hyperparameter that determines the size of the steps taken towards the minimum of the cost function. A properly chosen learning rate ensures that the algorithm converges to the minimum efficiently. A learning rate that is too small results in a slow convergence process, which increases computation time. Conversely, a learning rate that is too large can cause the algorithm to overshoot the minimum, potentially causing divergence rather than convergence. Therefore, choosing an appropriate learning rate is essential for the effectiveness and efficiency of the gradient descent algorithm.
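
The three regimes described above (too small, suitable, too large) can be observed on a toy problem. The sketch below minimizes f(w) = w², whose gradient is 2w; the cost function, the starting point, and the three learning rates are assumptions picked to make the behaviour visible.

```python
def gradient_descent(learning_rate, steps=25, start=5.0):
    """Run plain gradient descent on f(w) = w**2 and return the final w."""
    w = start
    for _ in range(steps):
        gradient = 2 * w              # derivative of w**2
        w -= learning_rate * gradient
    return w

for lr in (0.001, 0.1, 1.1):
    print(f"learning rate {lr}: w after 25 steps = {gradient_descent(lr):.4f}")
# 0.001 -> w has barely moved from 5.0 (slow convergence)
# 0.1   -> w is close to the minimum at 0
# 1.1   -> |w| has grown far beyond 5.0 (divergence)
```

Each update multiplies w by (1 - 2 × learning rate), so for this particular function any rate above 1.0 makes the iterates grow instead of shrink.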

The Machine Learning lab practical activities are designed to complement theoretical knowledge by enabling students to apply concepts in a hands-on environment. For instance, implementations of algorithms such as linear regression, decision trees, and SVM allow students to understand how these models are trained and evaluated on real datasets. The use of Python libraries such as Scikit Learn helps students practice the execution and tuning of machine learning models. Moreover, working with data preprocessing and visualization tools strengthens their ability to interpret and manipulate data, bridging the gap between theory and practical application.

The primary challenge with the Naive Bayes algorithm is its assumption of feature independence, which often does not hold in real-world data where features may be correlated. This can lead to inaccurate probability estimates and consequently affect classification accuracy. To address this, one can use feature selection or dimensionality reduction techniques to minimize correlation between features before applying the Naive Bayes algorithm. Another approach is to use techniques like Bayesian Networks, which partially relax the independence assumption by allowing for some degree of dependence between variables.
