Data Modeling & Evaluation - Practical - List

The document outlines the course framework for 'Data Modeling and Evaluation' as part of the MCA program, detailing practical assignments and topics covered over 60 hours. Key areas include data normalization, feature engineering, various machine learning models, clustering techniques, classification metrics, and model validation. The course emphasizes hands-on experience with real-world datasets and the application of different machine learning techniques and evaluation methods.

Uploaded by

shimtone21

DATA MODELING AND EVALUATION

L-T-P: 0-0-2

COURSE FRAMEWORK
PROGRAM MCA
COURSE CODE/TITLE 23MCDS403 DATA MODELING AND EVALUATION
SEMESTER IV SEMESTER
Credits for the Course 2 Credits

SYLLABUS OF THE COURSE

Course Code Name of the Course
23MCAL401 Lab – III DATA MODELING AND EVALUATION
Total hours: 60 hours
Assignment No. / Topic / Practical Hours
1 Data Model Design, ER Model (5 hours)
1. Normalize a given dataset up to 3NF.
2. Perform denormalization on a normalized dataset
and compare results.
3. Draw the ER diagram of an Online Shopping
Management System.
4. Convert the table into First Normal Form (1NF).
5. Draw the ER diagram of a University Management
System.
6. Consider the following unnormalized table (UNF)
for a Student Course Registration System:

Student_ID  Student_Name  Course_ID  Course_Name  Instructor   Instructor_Phone
101         Alice         CSE101     Database     Prof. John   123-456-7890
101         Alice         CSE102     Algorithms   Prof. Smith  987-654-3210
102         Bob           CSE101     Database     Prof. John   123-456-7890
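
The normalization exercises above can be tried on the registration table with pandas. This is a minimal sketch, not the prescribed solution. It assumes the key of the flat table is (Student_ID, Course_ID), that Course_ID determines Course_Name and Instructor, and that Instructor determines Instructor_Phone. The names `students`, `courses`, `instructors`, and `enrollments` are illustrative.

```python
import pandas as pd

# The registration table from the exercise (values assumed atomic, so it is in 1NF).
unf = pd.DataFrame(
    [
        (101, "Alice", "CSE101", "Database", "Prof. John", "123-456-7890"),
        (101, "Alice", "CSE102", "Algorithms", "Prof. Smith", "987-654-3210"),
        (102, "Bob", "CSE101", "Database", "Prof. John", "123-456-7890"),
    ],
    columns=["Student_ID", "Student_Name", "Course_ID", "Course_Name",
             "Instructor", "Instructor_Phone"],
)

# 2NF: attributes that depend on only part of the assumed key
# (Student_ID, Course_ID) move into their own tables.
students = unf[["Student_ID", "Student_Name"]].drop_duplicates().reset_index(drop=True)
courses_2nf = unf[["Course_ID", "Course_Name", "Instructor",
                   "Instructor_Phone"]].drop_duplicates()

# 3NF: Instructor_Phone depends on Instructor (assumed), not directly on
# Course_ID, so the transitive dependency gets its own table.
instructors = courses_2nf[["Instructor", "Instructor_Phone"]].drop_duplicates().reset_index(drop=True)
courses = courses_2nf[["Course_ID", "Course_Name", "Instructor"]].reset_index(drop=True)

# The enrollment table keeps only the key columns.
enrollments = unf[["Student_ID", "Course_ID"]].drop_duplicates().reset_index(drop=True)

# Denormalization: joining the 3NF tables rebuilds the flat table.
rejoined = (
    enrollments.merge(students, on="Student_ID")
    .merge(courses, on="Course_ID")
    .merge(instructors, on="Instructor")
)
print(rejoined[list(unf.columns)])
```

Comparing `rejoined` with `unf` is one way to approach the denormalization exercise: the joined result carries the same rows, while the normalized tables store each student, course, and instructor only once.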
2 Data Preparation and Feature Engineering (5 hours)

1. Data Cleaning & Preprocessing:

a) Identify and handle any missing values in the
Purchase_Frequency column.
b) Detect and treat potential outliers in the
Annual_Income and Spending_Score columns.
2. Encoding Categorical Variables:
a) Encode the Gender and Membership Type columns
appropriately for machine learning models.
3. Feature Engineering:
a) Perform feature selection to identify the most
relevant features for predicting spending behavior.
b) Apply Principal Component Analysis (PCA) and
explain how it helps in dimensionality reduction.
c) Normalize or scale the Annual_Income and
Spending_Score columns using an appropriate technique.
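
The preparation steps above could be sketched as follows. The syllabus does not supply the dataset, so the data frame here is synthetic and hypothetical, and the column name `Membership_Type` is an assumed spelling. Median imputation, IQR clipping, one-hot encoding, and standardization are one reasonable choice each, not the only valid ones. Feature selection is not covered in this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical customer data standing in for the course dataset.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Gender": rng.choice(["Male", "Female"], size=n),
    "Membership_Type": rng.choice(["Basic", "Silver", "Gold"], size=n),
    "Annual_Income": rng.normal(60_000, 15_000, size=n),
    "Spending_Score": rng.uniform(1, 100, size=n),
    "Purchase_Frequency": rng.poisson(5, size=n).astype(float),
})
df.loc[rng.choice(n, size=10, replace=False), "Purchase_Frequency"] = np.nan
df.loc[0, "Annual_Income"] = 1_000_000  # injected outlier

# 1a) Fill missing Purchase_Frequency values with the column median.
median_freq = df["Purchase_Frequency"].median()
df["Purchase_Frequency"] = df["Purchase_Frequency"].fillna(median_freq)

# 1b) Clip values outside 1.5 * IQR beyond the quartiles.
for col in ["Annual_Income", "Spending_Score"]:
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    df[col] = df[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clipped_max = df["Annual_Income"].max()

# 2a) Binary map for Gender, one-hot columns for Membership_Type.
df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})
df = pd.get_dummies(df, columns=["Membership_Type"], dtype=int)

# 3c) Standardize the numeric columns to zero mean and unit variance.
num_cols = ["Annual_Income", "Spending_Score", "Purchase_Frequency"]
df[num_cols] = StandardScaler().fit_transform(df[num_cols])

# 3b) PCA projects the scaled numeric columns onto two orthogonal directions
# that retain as much variance as possible.
pca = PCA(n_components=2)
components = pca.fit_transform(df[num_cols])
print("explained variance ratio:", pca.explained_variance_ratio_)
```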

3 Linear Regression and Its Applications (5 hours)

1. Load a dataset (e.g., housing prices dataset).

2. Preprocess the data (handle missing values,
normalize features if necessary).
3. Split the data into training and testing sets.
4. Train a linear regression model using libraries like
Scikit-learn.
5. Evaluate the model using metrics like Mean Squared
Error (MSE).
6. Visualize the regression line and residuals.
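
A minimal sketch of these steps with scikit-learn follows. To stay self-contained it generates a small synthetic housing-style dataset. The features `area` and `rooms` and the price formula are hypothetical and would be replaced by a real housing dataset in the lab.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: price grows with area and room count, plus noise.
rng = np.random.default_rng(42)
n = 500
area = rng.uniform(50, 250, size=n)
rooms = rng.integers(1, 7, size=n)
X = np.column_stack([area, rooms])
price = 1500 * area + 10_000 * rooms + rng.normal(0, 20_000, size=n)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=42
)

# Scaling happens inside the pipeline, so it is fitted on training data only.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = model.score(X_test, y_test)
residuals = y_test - y_pred  # plot these against y_pred to inspect the fit

print(f"MSE: {mse:.0f}  RMSE: {np.sqrt(mse):.0f}  R^2: {r2:.3f}")
```

For the visualization step, a scatter plot of `residuals` against `y_pred` (for example with matplotlib) should show no clear pattern if the linear model is adequate.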

4 Decision Trees and Random Forest

1. Load a dataset (e.g., Titanic survival dataset).
2. Preprocess categorical and numerical features.
3. Split into training and testing sets.
4. Train a logistic regression model.
5. Evaluate using accuracy and confusion matrix.
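
The logistic regression steps above could look like the sketch below. The Titanic data is not bundled with scikit-learn, so the code builds a hypothetical passenger table with similar column types. The column names and the survival rule are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical Titanic-like data with numerical and categorical features.
rng = np.random.default_rng(1)
n = 600
passengers = pd.DataFrame({
    "age": rng.uniform(1, 80, size=n),
    "fare": rng.exponential(30, size=n),
    "sex": rng.choice(["male", "female"], size=n),
    "embarked": rng.choice(["C", "Q", "S"], size=n),
})
logit = (2.0 * (passengers["sex"] == "female") + 0.02 * passengers["fare"] - 1.5).to_numpy()
survived = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Scale numeric columns and one-hot encode categorical columns.
features = ColumnTransformer([
    ("num", StandardScaler(), ["age", "fare"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["sex", "embarked"]),
])
model = Pipeline([("prep", features), ("clf", LogisticRegression(max_iter=1000))])

X_train, X_test, y_train, y_test = train_test_split(
    passengers, survived, test_size=0.2, random_state=1, stratify=survived
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class
print(f"accuracy: {accuracy:.3f}")
print(cm)
```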

5 Tree-Based Models (5 hours)
1. Load a classification dataset.
2. Train a Decision Tree and Random Forest model.
3. Compare their accuracies.
4. Load a dataset (e.g., Titanic, Iris, or a custom
dataset).
5. Preprocess data (handle missing values, encode
categorical variables, and normalize if needed).
6. Split data into training and testing sets (80/20 split).
7. Train a Decision Tree classifier with different depths
and criteria (Gini vs. Entropy).
8. Visualize the tree and analyze the decision paths.
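
One way to work through these tree exercises is sketched below on the Iris dataset that ships with scikit-learn. The depth values and the use of `export_text` for a text view of the tree are illustrative choices; `sklearn.tree.plot_tree` gives a graphical view when matplotlib is available.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0, stratify=iris.target
)

# Decision trees with different split criteria and depth limits.
tree_scores = {}
for criterion in ("gini", "entropy"):
    for depth in (1, 2, 3, None):
        tree = DecisionTreeClassifier(criterion=criterion, max_depth=depth, random_state=0)
        tree.fit(X_train, y_train)
        tree_scores[(criterion, depth)] = accuracy_score(y_test, tree.predict(X_test))

# Random forest for comparison.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
forest_accuracy = accuracy_score(y_test, forest.predict(X_test))

for key, score in tree_scores.items():
    print(key, round(score, 3))
print("random forest:", round(forest_accuracy, 3))

# Text rendering of a shallow tree to read off the decision paths.
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)
tree_rules = export_text(shallow, feature_names=iris.feature_names)
print(tree_rules)
```

A depth-1 tree has only two leaves, so on three balanced classes it cannot exceed two-thirds accuracy. That makes the effect of the depth limit easy to see in the printed scores.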

6 Gradient Boosting Machines (GBM, XGBoost, LightGBM) (5 hours)
1. Implement GBM and compare it with other
tree models.
2. Train an XGBoost model using different values
of learning rate (0.01, 0.1, 0.3, 0.5). How does
accuracy change?
3. What happens if you increase the number of
boosting rounds (e.g., n_estimators=500
instead of 100) in XGBoost and LightGBM?
4. Implement early stopping in XGBoost and
LightGBM. How does it affect performance?
5. Try using GPU acceleration for XGBoost and
LightGBM. How much does training time
improve?
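
The learning-rate and early-stopping questions can be explored with the sketch below. It deliberately swaps in scikit-learn's `GradientBoostingClassifier` as a stand-in so that it runs without extra packages. The `xgboost` and `lightgbm` packages expose similarly named `learning_rate` and `n_estimators` parameters, but their early-stopping and GPU options differ and are not shown here. The breast cancer dataset is an assumed example dataset.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Accuracy for the learning rates listed in the exercise.
lr_accuracy = {}
for lr in (0.01, 0.1, 0.3, 0.5):
    gbm = GradientBoostingClassifier(learning_rate=lr, n_estimators=100, random_state=0)
    gbm.fit(X_train, y_train)
    lr_accuracy[lr] = gbm.score(X_test, y_test)
print(lr_accuracy)

# Early stopping: allow up to 500 rounds, but stop once the score on an
# internal 10% validation split has not improved for 10 rounds.
early = GradientBoostingClassifier(
    learning_rate=0.1,
    n_estimators=500,
    validation_fraction=0.1,
    n_iter_no_change=10,
    random_state=0,
)
early.fit(X_train, y_train)
rounds_used = early.n_estimators_
print("boosting rounds actually used:", rounds_used)
print("accuracy with early stopping:", early.score(X_test, y_test))
```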

7 Clustering (K-Means, Hierarchical Clustering) and
Dimensionality Reduction (PCA, t-SNE) (5 hours)
1. Data Preprocessing
a) Load a real-world dataset (e.g., Iris, MNIST,
Wine, or any suitable dataset).
b) Perform data cleaning and preprocessing: handle
missing values (if any) and normalize or
standardize the data (if needed).
2. Clustering Algorithms
a) Apply K-Means Clustering: use the Elbow Method
to determine the optimal number of clusters, then
compute and visualize the clustering results.
b) Apply Hierarchical Clustering: use both
Agglomerative and Divisive clustering, then
generate a dendrogram and analyze the cluster
structure.
3. Dimensionality Reduction
a) Apply Principal Component Analysis (PCA): reduce
the dataset to 2 or 3 dimensions for visualization
and analyze the explained variance of the
principal components.
b) Apply t-Distributed Stochastic Neighbor
Embedding (t-SNE): visualize the high-dimensional
data in a 2D space and compare t-SNE vs. PCA
results.
4. Analysis & Interpretation
a) Compare the performance of K-Means vs.
Hierarchical Clustering.
b) Discuss the effectiveness of PCA vs. t-SNE for
visualization.
c) Provide insights into clustering quality and
visualization results.
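
The clustering and dimensionality reduction tasks above can be sketched on the Iris dataset as follows. This is a minimal sketch under stated assumptions: the silhouette score is one possible measure of clustering quality, only agglomerative clustering is shown for the hierarchical part (divisive clustering is not covered), and plotting is left out.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Standardize the features so each contributes equally to distances.
X = StandardScaler().fit_transform(load_iris().data)

# Elbow Method: inertia (within-cluster sum of squares) for k = 1..8.
inertias = {
    k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in range(1, 9)
}

# K-Means vs. agglomerative clustering with the same number of clusters.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
agglo_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)
scores = {
    "kmeans": silhouette_score(X, kmeans_labels),
    "agglomerative": silhouette_score(X, agglo_labels),
}

# Linkage matrix; pass it to scipy.cluster.hierarchy.dendrogram to draw the tree.
Z = linkage(X, method="ward")

# PCA to two dimensions, with the variance explained by each component.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# t-SNE to two dimensions for a nonlinear view of the same data.
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print("inertia by k:", {k: round(v, 1) for k, v in inertias.items()})
print("silhouette:", scores)
print("PCA explained variance ratio:", pca.explained_variance_ratio_)
```

Plotting `inertias` against k shows where the curve flattens, and scatter plots of `X_pca` and `X_tsne` colored by `kmeans_labels` allow the PCA vs. t-SNE comparison.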

8 Classification Metrics: Confusion Matrix, Accuracy,
Precision, Recall, F1 Score (5 hours)
1. Train a classification model on a real-world dataset
and compute the confusion matrix. What do the
values in each cell represent?
2. Given a highly imbalanced dataset, why might
accuracy not be a reliable metric? Compute
precision, recall, and F1 score to support your
argument.
3. How does changing the decision threshold of a
classifier affect precision and recall? Demonstrate
with an experiment.
4. Plot the ROC curve for a classification model and
compute the AUC score. What does the AUC value
tell you about the model’s performance?
5. Compare the ROC curves of two different models.
How do you determine which model is better? Can a
model have high accuracy but a low AUC score?
Explain with an example.
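
The questions above can be explored with a small experiment such as the one sketched here. The imbalanced dataset is synthetic (about 95% negative, an assumed ratio), and logistic regression is an arbitrary choice of classifier.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score,
                             roc_curve)
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 95% class 0, 5% class 1.
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]

# A model that always predicts the majority class: high accuracy,
# zero recall, and an AUC of 0.5 because its scores carry no ranking.
baseline_pred = np.zeros_like(y_test)
baseline_accuracy = accuracy_score(y_test, baseline_pred)
baseline_recall = recall_score(y_test, baseline_pred)
baseline_auc = roc_auc_score(y_test, np.zeros(len(y_test)))

# Precision, recall, and F1 at several decision thresholds.
threshold_table = {}
for threshold in (0.2, 0.5, 0.8):
    pred = (proba >= threshold).astype(int)
    threshold_table[threshold] = (
        precision_score(y_test, pred, zero_division=0),
        recall_score(y_test, pred),
        f1_score(y_test, pred, zero_division=0),
    )

# Confusion matrix cells at the default threshold of 0.5.
tn, fp, fn, tp = confusion_matrix(y_test, (proba >= 0.5).astype(int)).ravel()

# ROC curve points and area under the curve.
fpr, tpr, _ = roc_curve(y_test, proba)
auc = roc_auc_score(y_test, proba)

print(f"baseline accuracy={baseline_accuracy:.3f} recall={baseline_recall:.3f} auc={baseline_auc:.3f}")
print(f"TN={tn} FP={fp} FN={fn} TP={tp}  AUC={auc:.3f}")
for threshold, (p, r, f1) in threshold_table.items():
    print(f"threshold={threshold}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Lowering the threshold can only add predicted positives, so recall never decreases as the threshold drops, while precision typically falls.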

9 Time Series Modeling: ARIMA, SARIMA, and Prophet (5 hours)

Building an MLP for Structured Data

1. Load a structured dataset (e.g., load_diabetes).
2. Normalize the data and split into training/testing
sets.
3. Build a Multi-Layer Perceptron (MLP) using
TensorFlow/Keras.
4. Train the model and evaluate performance using
MSE/RMSE.
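
The exercise asks for TensorFlow/Keras. The sketch below deliberately swaps in scikit-learn's `MLPRegressor` as a lightweight stand-in so that it runs without TensorFlow installed, and it assumes the dataset is the one returned by `sklearn.datasets.load_diabetes`. The layer sizes and iteration limit are arbitrary illustrative values. A Keras version would follow the same load, scale, split, fit, and evaluate sequence.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Two hidden layers; scaling is fitted on the training split only.
mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0),
)
mlp.fit(X_train, y_train)

mse = mean_squared_error(y_test, mlp.predict(X_test))
rmse = np.sqrt(mse)
print(f"MSE: {mse:.1f}  RMSE: {rmse:.1f}")
```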

10 Model Validation Techniques (5 hours)

Implementing Stratified K-Fold Cross-Validation
1. Load an imbalanced classification dataset (e.g.,
breast cancer dataset).
2. Apply Stratified K-Fold so that each fold preserves
the class proportions of the full dataset.
3. Compare results with standard K-Fold.
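
A minimal sketch of this comparison follows, using the breast cancer dataset bundled with scikit-learn. The classifier, the five folds, and the helper name `positive_rate_per_fold` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))


def positive_rate_per_fold(splitter):
    """Fraction of class-1 samples in each test fold (hypothetical helper)."""
    return [y[test_idx].mean() for _, test_idx in splitter.split(X, y)]


skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Stratified folds keep the class ratio nearly constant; plain folds may drift.
strat_rates = positive_rate_per_fold(skf)
plain_rates = positive_rate_per_fold(kf)

strat_scores = cross_val_score(model, X, y, cv=skf)
plain_scores = cross_val_score(model, X, y, cv=kf)

print("class-1 rate per fold, stratified:", np.round(strat_rates, 3))
print("class-1 rate per fold, plain:     ", np.round(plain_rates, 3))
print(f"stratified accuracy: {strat_scores.mean():.3f} +/- {strat_scores.std():.3f}")
print(f"plain accuracy:      {plain_scores.mean():.3f} +/- {plain_scores.std():.3f}")
```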

Implementing Time-Based Split for Time Series Data

1. Load a time series dataset (e.g., stock prices,
weather data).
2. Perform a time-based split where training data is
from past timestamps, and testing data from future
timestamps.
3. Compare results with a random split.
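
The time-based split can be sketched as below. The series is synthetic (a trend plus a seasonal term and noise, all hypothetical), and the three lag features and linear model are illustrative choices. `TimeSeriesSplit` is shown as the cross-validation counterpart of a single chronological cut.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit, train_test_split

# Hypothetical series: upward trend, seasonality, and noise.
rng = np.random.default_rng(0)
n = 400
t = np.arange(n)
series = 0.05 * t + np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.2, size=n)

# Predict each value from the three values before it.
lags = 3
X = np.column_stack([series[i:n - lags + i] for i in range(lags)])
y = series[lags:]

# Time-based split: train on the first 80%, test on the most recent 20%.
cutoff = int(0.8 * len(y))
time_model = LinearRegression().fit(X[:cutoff], y[:cutoff])
mse_time = mean_squared_error(y[cutoff:], time_model.predict(X[cutoff:]))

# Random split: test points are surrounded by training points in time,
# which can make the error estimate optimistic.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
random_model = LinearRegression().fit(X_tr, y_tr)
mse_random = mean_squared_error(y_te, random_model.predict(X_te))

# Cross-validation variant: every training fold precedes its test fold.
tscv = TimeSeriesSplit(n_splits=4)
folds_in_order = all(
    train_idx.max() < test_idx.min() for train_idx, test_idx in tscv.split(X)
)

print(f"MSE (time-based split): {mse_time:.4f}")
print(f"MSE (random split):     {mse_random:.4f}")
print("training always precedes testing:", folds_in_order)
```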

Ensuring Reproducibility in Machine Learning Experiments

1. Set random seeds for NumPy, TensorFlow, and
Scikit-learn.
2. Use DVC (Data Version Control) for dataset tracking.
3. Store model training scripts in Git for version
control.

11 Overfitting and Underfitting (5 hours)

Compare and Interpret Results:
1. Evaluate how L1, L2, and Dropout affect model
performance.
2. Discuss which regularization technique works best
and why.
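
The effect of L1 and L2 penalties can be seen in a small experiment like the sketch below. It uses linear models on synthetic data with more features than training samples, a setup chosen here to force overfitting. Dropout is not covered because it requires a neural network framework. The `alpha` values are arbitrary illustrative settings.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical data: only 5 of 40 features actually influence the target.
rng = np.random.default_rng(0)
n_samples, n_features = 60, 40
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]
y = X @ true_coef + rng.normal(0, 1.0, size=n_samples)

# 30 training rows against 40 features: plain least squares can fit them exactly.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

models = {
    "none": LinearRegression(),
    "l1": Lasso(alpha=0.1, max_iter=10_000),
    "l2": Ridge(alpha=1.0),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = (
        mean_squared_error(y_train, model.predict(X_train)),
        mean_squared_error(y_test, model.predict(X_test)),
    )

# L1 drives many coefficients to exactly zero; L2 only shrinks them.
nonzero_l1 = int(np.sum(models["l1"].coef_ != 0))

for name, (train_mse, test_mse) in results.items():
    print(f"{name:>4}: train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
print("non-zero Lasso coefficients:", nonzero_l1, "of", n_features)
```

A near-zero training error paired with a much larger test error for the unregularized model is the overfitting signature the exercise asks about.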

Setting Up Random Seeds:

1. Implement a script where all necessary libraries
(NumPy, TensorFlow, Scikit-Learn, etc.) have fixed
random seeds.
2. Run the same model multiple times and verify that
the results are consistent.
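
A minimal seeding script might look like the sketch below. The helper names `set_seeds` and `run_experiment` are hypothetical. scikit-learn has no global seed of its own, so the sketch passes `random_state` explicitly. TensorFlow is not imported here, and its seeding call is mentioned only in a comment.

```python
import random

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

SEED = 42


def set_seeds(seed):
    """Fix the seeds of the random number generators used in this script."""
    random.seed(seed)
    np.random.seed(seed)
    # With TensorFlow installed, tf.random.set_seed(seed) would go here.


def run_experiment(seed):
    """Train and score one model with every source of randomness pinned."""
    set_seeds(seed)
    X, y = make_classification(n_samples=500, n_features=10, random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    return model.fit(X_train, y_train).score(X_test, y_test)


# Repeated runs with the same seed should give identical scores.
scores = [run_experiment(SEED) for _ in range(3)]
print(scores)
print("consistent:", len(set(scores)) == 1)
```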

Version Control for Machine Learning Projects:

1. Use Git to track changes in your model.
2. Create different branches to experiment with
different regularization techniques.
3. Document model performance results for
comparison.

Using Reproducibility Tools:

1. Utilize tools like MLflow or DVC (Data Version
Control) to track experiments.
2. Save hyperparameters and evaluation metrics for
future reference.
3. Demonstrate how to reproduce previous
experiments exactly.
