P.V.P SIDDHARTHA INSTITUTE OF TECHNOLOGY
BRANCH : CSE REGULATION : PVP23
Course: [Link] SUBJECT: DATA WAREHOUSING AND DATA MINING
Year and Semester: III Year / I Semester
Subject Code: 23CS3501
QUESTION BANK
UNIT I
Q. NO PART A CO LEVEL
1 Define Data Mining. 1 L1
2 What is a data warehouse? 1 L1
3 List any two differences between OLTP and OLAP. 1 L1
4 What are the types of data that can be mined? 1 L1
5 Define an attribute and give an example. 1 L1
6 What is the purpose of data cleaning? 1 L1
7 What is meant by data discretization? 1 L1
8 Define a data cube. 1 L1
9 List any two real-life applications of data mining. 1 L1
10 What is data reduction? 1 L1
PART B
11 Classify and describe the different types of data that can be mined in data mining. 1 L3
12 Normalize the following group of data by using the following techniques. 1 L3
200, 300, 400, 600, 1000
i. Min-max normalization
ii. Z-score normalization
iii. Decimal scaling
a) Write your observations on the above techniques.
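A hand-worked answer to Question 12 can be cross-checked with a minimal Python sketch such as the one below. The function names are illustrative. The target range [0, 1] for min-max and the use of the population standard deviation for the z-score are assumptions, since the question does not fix either.

```python
import statistics

data = [200, 300, 400, 600, 1000]

def min_max(values, new_min=0.0, new_max=1.0):
    # v' = (v - min) / (max - min) * (new_max - new_min) + new_min
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    # v' = (v - mean) / std; population std is an assumption here
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def decimal_scaling(values):
    # v' = v / 10**j, with j the smallest integer such that max(|v'|) < 1
    j = 0
    while max(abs(v) for v in values) / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]

print(min_max(data))
print(z_score(data))
print(decimal_scaling(data))
```

If the sample standard deviation is expected instead, `statistics.pstdev` can be swapped for `statistics.stdev`.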
13 In what ways are the various data mining functionalities applied to solve real-world problems? 1 L3
14 Identify and discuss the challenges and issues in data mining. 1 L3
15 Illustrate the multi-tiered architecture of a data warehouse with a neat diagram. 1 L3
16 Illustrate OLAP operations by applying examples for roll-up, drill-down, slice, dice, and pivot. 1 L3
17 How can statistical measures be used to measure the central tendency of data? 1 L3
18 Suppose that the data for analysis includes the attribute age. The age values for the data tuples are (in increasing order): 1 L3
13, 15, 16, 16, 19, 20, 23, 29, 35, 41, 44, 53, 62, 69, 72
(i) Use min-max normalization to transform the value 45 for age onto the range [0, 1].
(ii) Use z-score normalization to transform the value 45 for age, where the standard deviation of age is 20.64 years.
19 With a neat sketch, illustrate the KDD (Knowledge Discovery in Databases) process. 1 L3
20 How are data reduction techniques utilized in dimensionality reduction? 1 L3
UNIT II
Q. NO PART A CO LEVEL
1 Define Machine Learning. 1 L1
2 List two real-world applications of Machine Learning. 1 L2
3 What are the main paradigms of machine learning? 1 L1
4 Mention any two stages in a typical Machine Learning process. 1 L1
5 List any two datasets used in machine learning. 1 L1
6 Define proximity measure in machine learning. 1 L1
7 What is Euclidean distance? 1 L2
8 Give one difference between KNN and Weighted KNN. 1 L2
9 Define KNN Regression. 1 L1
10 List any two performance evaluation metrics for classification. 1 L2
PART B
11 Compare and contrast traditional programming and modern machine learning approaches. 2 L3
12 Distinguish paradigms of machine learning with examples. 2 L3
13 Illustrate the stages of the machine learning process with a neat diagram. 2 L3
14 Describe the different types of datasets used for classification and regression. 2 L2
15 How is machine learning utilized in real-world applications? 2 L3
16 Identify the evaluation measures used for classification and regression. 2 L3
17 Illustrate the K-Nearest Neighbors (KNN) algorithm with an example. 2 L3
18 Apply the K-nearest neighbor classifier to predict whether a patient is diabetic using the features BMI and Age. Assume K = 3. 4 L4
Test example: BMI = 43.6, Age = 40, Sugar = ?
Training examples are given in the table below.
BMI Age Sugar
33.6 50 1
26.6 30 0
23.4 40 0
43.1 67 0
35.3 23 1
35.9 67 1
36.7 45 1
25.7 46 0
23.3 29 0
31.0 56 1
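A hand-worked answer to Question 18 can be checked with a short Python sketch along the following lines. The function name `knn_predict` is illustrative. Euclidean distance on the raw, unscaled BMI and Age values is an assumption, since the question does not name a distance measure or ask for feature scaling.

```python
import math
from collections import Counter

# (BMI, Age, Sugar) rows copied from the training table
training = [
    (33.6, 50, 1), (26.6, 30, 0), (23.4, 40, 0), (43.1, 67, 0), (35.3, 23, 1),
    (35.9, 67, 1), (36.7, 45, 1), (25.7, 46, 0), (23.3, 29, 0), (31.0, 56, 1),
]

def knn_predict(query, rows, k=3):
    # Sort the training rows by Euclidean distance to the query point
    by_distance = sorted(rows, key=lambda r: math.dist(query, r[:2]))
    # Majority vote among the k nearest labels
    votes = Counter(row[2] for row in by_distance[:k])
    return votes.most_common(1)[0][0]

prediction = knn_predict((43.6, 40), training, k=3)
print("Predicted Sugar:", prediction)
```

Because Age spans a wider numeric range than BMI, it dominates the unscaled distance. Normalizing both features first is a reasonable variation to compare against.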
19 Compare and contrast standard KNN with Weighted KNN. 2 L3
20 Apply the K-Nearest Neighbors (KNN) algorithm with Euclidean distance to predict the Fruit Type of Strawberry (Sweetness = 5, Sourness = 5). Analyse the outputs when K = 1, 3, 5. 4 L4
Fruit Sweetness Sourness Fruit Type
Lemon 1 9 Sour
Grapefruit 2 8 Sour
Orange 3 7 Sour
Cherry 6 4 Sweet
Banana 9 1 Sweet
Grapes 8 2 Sweet
Strawberry 5 5 ?
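For Question 20, the Euclidean distances from Strawberry to each labelled fruit can be tabulated with a small sketch like the one below (variable names are illustrative). It deliberately stops at ranking the neighbours: some fruits lie at exactly the same distance, so the vote for larger K depends on a tie-breaking rule that the question leaves open.

```python
import math

# name -> (Sweetness, Sourness, Fruit Type), copied from the table
fruits = {
    "Lemon": (1, 9, "Sour"),
    "Grapefruit": (2, 8, "Sour"),
    "Orange": (3, 7, "Sour"),
    "Cherry": (6, 4, "Sweet"),
    "Banana": (9, 1, "Sweet"),
    "Grapes": (8, 2, "Sweet"),
}
query = (5, 5)  # Strawberry

# Rank fruits by distance; equal distances fall back to alphabetical order
ranked = sorted(
    (math.dist(query, (sweet, sour)), name, label)
    for name, (sweet, sour, label) in fruits.items()
)

for distance, name, label in ranked:
    print(f"{name:<11}{distance:6.2f}  {label}")
```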
UNIT III
Q. NO. PART A CO LEVEL
1 Define classification in the context of machine learning. 1 L1
2 What is a decision tree? 1 L1
3 Mention any two attribute selection measures used in decision trees. 1 L2
4 What is overfitting in classification? 1 L2
5 Define entropy. 1 L2
6 What is pruning in decision trees? 1 L1
7 State Bayes' theorem. 1 L1
8 What is a Naïve Bayes classifier? 1 L1
9 Define information gain. 1 L2
10 List any two methods to improve classification accuracy. 1 L2
PART B
11 Construct a decision tree using the ID3 algorithm for the following example and classify which patients are at high risk for heart disease. 3 L4
12 Construct a decision tree using the ID3 algorithm for the following example and classify whether the person gets the job offer or not. 3 L4
13 Apply the Naïve Bayes algorithm to the data below and analyze whether the person has flu or not for the sample below. 4 L4
Data Sample: X = (Chills = 'Y', RunnyNose = 'N', Headache = 'No', Fever = 'Y', Flu = ?)
14 Analyze the given data and apply the Naïve Bayes algorithm to the given sample to predict whether an accident will happen or not. 4 L4
{Weather Condition = Rain, Road Condition = Good, Traffic Condition = Normal, Engine Problem = No, Accident = ?}
15 Write the Naïve Bayes algorithm and then solve the problem below. Consider the given dataset and apply the Naïve Bayes algorithm to predict the type of a fruit that has the following properties: Fruit = {Yellow, Sweet, Long}. 4 L4
Fruit Yellow Sweet Long Total
Mango 350 450 0 650
Banana 400 300 350 400
Others 50 100 50 150
Total 800 850 400 1200
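The arithmetic for Question 15 can be verified with a brief Python sketch such as the following. It computes the unnormalized posterior P(class) × Π P(feature | class) straight from the count table. The names `counts` and `score` are illustrative, and no Laplace smoothing is applied, which is an assumption since the question does not mention it.

```python
from fractions import Fraction

# Counts copied from the table: per fruit, how many are Yellow, Sweet, Long
counts = {
    "Mango":  {"Yellow": 350, "Sweet": 450, "Long": 0,   "Total": 650},
    "Banana": {"Yellow": 400, "Sweet": 300, "Long": 350, "Total": 400},
    "Others": {"Yellow": 50,  "Sweet": 100, "Long": 50,  "Total": 150},
}
grand_total = 1200
features = ["Yellow", "Sweet", "Long"]

def score(fruit):
    row = counts[fruit]
    # Prior P(fruit)
    result = Fraction(row["Total"], grand_total)
    # Multiply by each conditional P(feature | fruit)
    for feature in features:
        result *= Fraction(row[feature], row["Total"])
    return result

scores = {fruit: score(fruit) for fruit in counts}
prediction = max(scores, key=scores.get)

for fruit, value in scores.items():
    print(fruit, value, float(value))
print("Predicted fruit:", prediction)
```

Exact fractions are used so that the zero count for Long mangoes is visible as an exact zero rather than as a rounding artefact.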
16 Illustrate the different attribute selection measures in decision trees with an example. 2 L3
17 Compare and contrast Bagging and Boosting. 2 L3
18 How does tree pruning contribute to improving the effectiveness of a decision tree during its construction? 2 L3
19 Compare and contrast Naïve Bayes and Decision Tree classifiers. 2 L3
20 Illustrate the techniques used to improve classification accuracy. 2 L3
UNIT IV
Q. NO. PART A CO LEVEL
1 What is a linear discriminant? 1 L1
2 Define perceptron learning rule. 1 L1
3 What is the role of the activation function in a perceptron? 1 L2
4 Mention one difference between logistic regression and linear regression. 1 L2
5 What is a support vector machine (SVM)? 1 L1
6 Define the margin in the context of SVM. 1 L1
7 What is a linearly non-separable case in classification? 1 L2
8 How is a kernel used in non-linear SVMs? 1 L2
9 Define logistic regression. 1 L1
10 List two applications for linear regression. 1 L1
PART B
11 Analyze the given data and apply a Support Vector Machine to plot the hyperplane for the following linearly separable data points: (1,1), (2,1), (1,-1), (2,-1), (4,0), (5,1), (5,-1), (6,0). 4 L4
12 Illustrate the perceptron learning algorithm with an example. 2 L3
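One possible illustration for Question 12 is a single perceptron trained on the logical AND function, sketched below in Python. The AND dataset, the step activation with threshold 0, the learning rate of 1, and the zero initial weights are all assumptions chosen for the example and are not part of the question.

```python
def step(z):
    # Step activation: fire (1) when the weighted sum is non-negative
    return 1 if z >= 0 else 0

def train_perceptron(samples, lr=1.0, max_epochs=100):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in samples:
            output = step(w[0] * x[0] + w[1] * x[1] + b)
            delta = target - output
            if delta != 0:
                # Perceptron learning rule: w <- w + lr * (target - output) * x
                w[0] += lr * delta * x[0]
                w[1] += lr * delta * x[1]
                b += lr * delta
                errors += 1
        if errors == 0:  # a full pass with no mistakes means convergence
            break
    return w, b

# Logical AND: linearly separable, so the rule is guaranteed to converge
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
predictions = [step(weights[0] * x[0] + weights[1] * x[1] + bias)
               for x, _ in and_samples]
print(weights, bias, predictions)
```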
13 Analyze the given data and apply a Support Vector Machine to plot the hyperplane for the following data points, which are not linearly separable in the input space: (2,2), (−2,2), (−2,−2), (2,−2) as positively labelled points and (1,1), (1,−1), (−1,−1), (−1,1) as negatively labelled points. 4 L4
14 In what way does SVM work in the linearly separable case? Explain with an example. 2 L3
15 Illustrate the concept of non-linear SVM and how the kernel trick is used. 2 L3
16 Compare and contrast linear SVM and non-linear SVM with examples. 2 L3
17 Compare and contrast logistic regression and linear regression with examples. 2 L3
18 Illustrate the concept of the logistic regression model with its sigmoid function. 2 L3
19 How is linear regression used for prediction? Explain with an example. 2 L3
20 Apply logistic regression to demonstrate binary classification. 2 L3
UNIT V
Q. NO. PART A CO LEVEL
1 What is clustering in data mining? 1 L1
2 Define partitional clustering. 1 L1
3 What is the basic idea of agglomerative clustering? 1 L1
4 List two differences between hierarchical and partitional clustering. 1 L2
5 Define centroid in K-Means clustering. 1 L1
6 What is soft clustering? 1 L2
7 What is the main difference between K-Means and Fuzzy C-Means clustering? 1 L2
8 What is the goal of Expectation Maximization (EM) clustering? 1 L2
9 Define intra-cluster and inter-cluster similarity. 1 L2
10 State one limitation of K-Means clustering. 1 L2
PART B
11 Compute the hierarchical clustering on the data below and represent the output in a dendrogram. 4 L3
  a b c d e f
a 0
b 0.12 0
c 0.51 0.25 0
d 0.84 0.16 0.14 0
e 0.28 0.77 0.70 0.45 0
f 0.34 0.61 0.93 0.20 0.67 0
12 Compute the hierarchical clustering on the given data and represent the output in a dendrogram. 4 L3
  a b c d e f
a 0
b 0.71 0
c 5.66 4.95 0
d 3.61 2.92 2.24 0
e 4.24 3.54 1.41 1.00 0
f 3.20 2.50 2.50 0.50 1.12 0
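The merge order for the distance matrix in Question 12 can be traced with a compact agglomerative sketch like the one below. Single linkage (minimum distance between clusters) is an assumption, because the question does not name a linkage criterion, and the helper names are illustrative. The printed merge heights are the levels at which branches join in the dendrogram.

```python
labels = ["a", "b", "c", "d", "e", "f"]

# Lower-triangular distances copied from the matrix (row i, column j < i)
lower = [
    [],
    [0.71],
    [5.66, 4.95],
    [3.61, 2.92, 2.24],
    [4.24, 3.54, 1.41, 1.00],
    [3.20, 2.50, 2.50, 0.50, 1.12],
]

def dist(i, j):
    if i == j:
        return 0.0
    hi, lo = max(i, j), min(i, j)
    return lower[hi][lo]

def single_linkage(n):
    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Single linkage: closest pair of members across two clusters
                d = min(dist(i, j) for i in clusters[p] for j in clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        merged = clusters[p] + clusters[q]
        merges.append((d, "".join(labels[i] for i in sorted(merged))))
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return merges

merges = single_linkage(len(labels))
for height, members in merges:
    print(f"{height:.2f}  {members}")
```

Replacing `min` with `max` in the inner distance gives complete linkage, which can be used to compare the two dendrograms.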
13 Apply K-Means clustering to a two-dimensional dataset containing six data points {(1,1), (2,1), (1,4), (4,3), (5,4), (6,5)} using K = 2, Euclidean distance, and initial cluster centroids c1 = (1,1) and c2 = (2,1). 4 L3
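The iterations for Question 13 can be checked with a minimal K-Means sketch in Python, shown below. The function name `k_means` and the stopping rule (stop when assignments no longer change) are illustrative choices, not part of the question.

```python
import math

points = [(1, 1), (2, 1), (1, 4), (4, 3), (5, 4), (6, 5)]
initial_centroids = [(1, 1), (2, 1)]  # c1 and c2 from the question

def k_means(points, centroids, max_iter=100):
    centroids = list(centroids)  # work on a copy
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point
        new_assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:  # converged
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return assignment, centroids

assignment, centroids = k_means(points, initial_centroids)
print("Assignments:", assignment)
print("Centroids:", centroids)
```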
14 Apply K-Means clustering with three clusters to the points A1(2,10), A2(2,5), A3(8,4), A4(5,8), A5(7,5), A6(6,4), A7(1,2), A8(4,9). 4 L3
15 Illustrate Fuzzy C-Means clustering with an example. 3 L3
16 Compare K-Means and Fuzzy C-Means with an example and list the key differences. 3 L3
17 How does the Expectation-Maximization (EM) clustering process work? Explain with an example. 3 L3
18 How is partitioning of data done in the context of reorganization, compression, and summarization? 3 L3
19 Compare and contrast agglomerative and divisive clustering. 3 L3
20 Illustrate the K-Means clustering algorithm with an example. 3 L3
Course Coordinators
1. Dr [Link]
2. Ms K.N Divya
3. Ms A Madhuri
HOD,CSE