Hope Foundation’s
Finolex Academy of Management and Technology, Ratnagiri
Department of Computer Science and Engineering (AIML)
Subject name: Data Warehousing and Mining Lab
Subject Code: CSL503
Class: TE CSE
Semester: VI (CBCGS)
Academic year: 2024-25
Name of Student: Zain Munawar Solkar
Quiz Score:
Roll No.: 75
Experiment No.: 04
Title: Perform clustering using open-source tools.
1. Lab objectives applicable:
LOB4: To make students well versed in all data mining algorithms, methods, and tools.
2. Lab outcomes applicable:
LO3: Demonstrate an understanding of the importance of data mining.
LO6: Implement the appropriate data mining methods like classification, clustering or Frequent Pattern mining on large
data sets.
3. Learning Objectives:
1. To determine similarity and dissimilarity among elements and create clusters accordingly.
4. Practical applications of the assignment/experiment:
Clustering algorithms group similar data points together to uncover patterns and relationships, enhancing data
analysis and decision-making.
5. Prerequisites:
NA
6. Minimum Hardware Requirements:
1. Intel Core i-series processor, 4 GB RAM
7. Software Requirements:
1. Weka 3.8
8. Quiz Questions
[Link]
9. Experiment/Assignment Evaluation:
Sr. No. | Parameters | Marks obtained | Out of
1 | Technical Understanding (Assessment may be done based on Q & A or any other relevant method.) Teacher should mention the other method used: - | | 6
2 | Lab Performance | | 2
3 | Punctuality | | 2
Date of performance (DOP): | Total marks obtained | | 10
Signature of Faculty
Department of Computer Science and Engineering
10. Theory:
Solve by hand the examples that are fed as input to Weka: one one-dimensional and one two-dimensional k-means problem.
Q.1) Apply k-means clustering to form 2 clusters from the following data.
{13, 16, 29 ,78, 21, 43, 56, 90, 21, 8, 88, 60, 34}
Solution: -
Step 1 –
K=2
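Let the two clusters be K1 and K2, with means M1 and M2 respectively. Take two of the data values as the initial means:
M1 = 29, M2 = 13
Step 2 –
Assign each value to the cluster whose mean is nearer, recompute both means, and repeat until the clusters stop changing. The value 21 is equidistant from the two initial means (distance 8 from each) and is placed in K1 here.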
Cluster K1: {29, 78, 21, 43, 56, 90, 21, 88, 60, 34}
Cluster K2: {13, 16, 8}
New M1 = (29 + 78 + 21 + 43 + 56 + 90 + 21 + 88 + 60 + 34) / 10 = 520 / 10 = 52.0
New M2 = (13 + 16 + 8) / 3 = 37 / 3 ≈ 12.33
Cluster K1: {78, 43, 56, 90, 88, 60, 34}
Cluster K2: {13, 16, 29, 21, 21, 8}
New M1 = (78 + 43 + 56 + 90 + 88 + 60 + 34) / 7 = 449 / 7 ≈ 64.14
New M2 = (13 + 16 + 29 + 21 + 21 + 8) / 6 = 108 / 6 = 18.0
Cluster K1: {78, 43, 56, 90, 88, 60}
Cluster K2: {13, 16, 29, 21, 21, 8, 34}
New M1 = (78 + 43 + 56 + 90 + 88 + 60) / 6 = 415 / 6 ≈ 69.17
New M2 = (13 + 16 + 29 + 21 + 21 + 8 + 34) / 7 = 142 / 7 ≈ 20.29
Cluster K1: {78, 56, 90, 88, 60}
Cluster K2: {13, 16, 29, 21, 21, 8, 34, 43}
New M1 = (78 + 56 + 90 + 88 + 60) / 5 = 372 / 5 = 74.4
New M2 = (13 + 16 + 29 + 21 + 21 + 8 + 34 + 43) / 8 = 185 / 8 ≈ 23.13
Cluster K1: {78, 56, 90, 88, 60}
Cluster K2: {13, 16, 29, 21, 21, 8, 34, 43}
No changes in the clusters, so the algorithm has converged.
Step 3 –
Final clusters:
K1 (Mean = 74.4): {78, 56, 90, 88, 60}
K2 (Mean ≈ 23.13): {13, 16, 29, 21, 21, 8, 34, 43}
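The hand computation above can be cross-checked with a short script. The following is a minimal sketch in plain Python, not the Weka workflow used in the lab; the function name `kmeans_1d` and the `max_iter` cap are illustrative assumptions. It follows the same assign-then-update loop, starting from M1 = 29 and M2 = 13, and sends ties to the first mean.

```python
def kmeans_1d(data, means, max_iter=100):
    """Assign-and-update k-means on scalars (hypothetical helper).

    Ties are resolved in favour of the lower-indexed mean.
    """
    means = list(means)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest mean by absolute difference.
        new_assignment = [
            min(range(len(means)), key=lambda j: abs(x - means[j]))
            for x in data
        ]
        if new_assignment == assignment:
            break  # no value changed cluster, so the means are final
        assignment = new_assignment
        # Update step: each mean becomes the average of its members.
        for j in range(len(means)):
            members = [x for x, a in zip(data, assignment) if a == j]
            if members:  # keep the old mean if a cluster becomes empty
                means[j] = sum(members) / len(members)
    clusters = [
        [x for x, a in zip(data, assignment) if a == j]
        for j in range(len(means))
    ]
    return clusters, means


data = [13, 16, 29, 78, 21, 43, 56, 90, 21, 8, 88, 60, 34]
clusters, means = kmeans_1d(data, [29, 13])
print("K1:", clusters[0], "mean:", means[0])
print("K2:", clusters[1], "mean:", means[1])
```

With these initial means the script should settle on the same two clusters as the worked solution, with means 74.4 and 23.125. A different choice of initial means or tie-breaking rule can lead to different intermediate steps.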
Q.2) Apply k means clustering to form 2 clusters.
Object | Attribute 1 (X): Weight index | Attribute 2 (Y): pH
MedicineA | 1 | 1
MedicineB | 2 | 1
MedicineC | 4 | 3
MedicineD | 5 | 4
Solution: -
Step 1 –
K=2
Let the two clusters be K1 and K2, with means M1 and M2 respectively. Take two of the objects as the initial means:
M1 = MedicineC (4, 3), M2 = MedicineA (1, 1)
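Step 2 –
Assign each object to the mean with the smaller Euclidean distance, then recompute the means. Repeat until the clusters stop changing.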
Object | Coordinates | Distance to M1 (4,3) | Distance to M2 (1,1) | Assigned cluster
MedicineA | (1,1) | √((4 − 1)² + (3 − 1)²) = √13 ≈ 3.61 | √((1 − 1)² + (1 − 1)²) = 0.00 | K2
MedicineB | (2,1) | √((4 − 2)² + (3 − 1)²) = √8 ≈ 2.83 | √((1 − 2)² + (1 − 1)²) = √1 = 1.00 | K2
MedicineC | (4,3) | √((4 − 4)² + (3 − 3)²) = 0.00 | √((1 − 4)² + (1 − 3)²) = √13 ≈ 3.61 | K1
MedicineD | (5,4) | √((4 − 5)² + (3 − 4)²) = √2 ≈ 1.41 | √((1 − 5)² + (1 − 4)²) = √25 = 5.00 | K1
K1: {MedicineC (4, 3), MedicineD (5, 4)}
K2: {MedicineA (1, 1), MedicineB (2, 1)}
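Updated means (the coordinate-wise average of each cluster's members):
M1 = ((4 + 5) / 2, (3 + 4) / 2) = (4.5, 3.5)
M2 = ((1 + 2) / 2, (1 + 1) / 2) = (1.5, 1)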
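The first pass above can be checked numerically. This is a small sketch in plain Python (an assumption, since the lab itself uses Weka); the helper names `assign` and `mean_of` are hypothetical. It computes the distance from each medicine to the initial means (4, 3) and (1, 1), picks the nearer one, and then averages the members of each cluster.

```python
import math

points = {
    "MedicineA": (1, 1),
    "MedicineB": (2, 1),
    "MedicineC": (4, 3),
    "MedicineD": (5, 4),
}
m1, m2 = (4, 3), (1, 1)  # initial means from Step 1


def assign(points, m1, m2):
    """Return {name: (distance to m1, distance to m2, cluster)} for one pass."""
    table = {}
    for name, p in points.items():
        d1, d2 = math.dist(p, m1), math.dist(p, m2)
        # Ties (none occur here) would go to K1.
        table[name] = (round(d1, 2), round(d2, 2), "K1" if d1 <= d2 else "K2")
    return table


def mean_of(members):
    """Coordinate-wise average of a list of (x, y) tuples."""
    return tuple(sum(axis) / len(members) for axis in zip(*members))


table = assign(points, m1, m2)
for name, row in table.items():
    print(name, row)

new_m1 = mean_of([points[n] for n, row in table.items() if row[2] == "K1"])
new_m2 = mean_of([points[n] for n, row in table.items() if row[2] == "K2"])
print("M1 =", new_m1, "M2 =", new_m2)
```

The printed rows should match the distance table for the initial means, and the two averages should come out as (4.5, 3.5) and (1.5, 1.0).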
Object | Coordinates | Distance to M1 (4.5,3.5) | Distance to M2 (1.5,1) | Assigned cluster
MedicineA | (1,1) | √((4.5 − 1)² + (3.5 − 1)²) = √18.5 ≈ 4.30 | √((1.5 − 1)² + (1 − 1)²) = √0.25 = 0.50 | K2
MedicineB | (2,1) | √((4.5 − 2)² + (3.5 − 1)²) = √12.5 ≈ 3.54 | √((1.5 − 2)² + (1 − 1)²) = √0.25 = 0.50 | K2
MedicineC | (4,3) | √((4.5 − 4)² + (3.5 − 3)²) = √0.5 ≈ 0.71 | √((1.5 − 4)² + (1 − 3)²) = √10.25 ≈ 3.20 | K1
MedicineD | (5,4) | √((4.5 − 5)² + (3.5 − 4)²) = √0.5 ≈ 0.71 | √((1.5 − 5)² + (1 − 4)²) = √21.25 ≈ 4.61 | K1
K1: {MedicineC (4, 3), MedicineD (5, 4)}
K2: {MedicineA (1, 1), MedicineB (2, 1)}
Updated Means:
M1 = (4.5, 3.5)
M2 = (1.5, 1)
No changes in the clusters, so the algorithm has converged.
Step 3 –
Final clusters:
K1: {MedicineC (4, 3), MedicineD (5, 4)}
K2: {MedicineA (1, 1), MedicineB (2, 1)}
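The whole procedure for Q.2 can also be written as a loop that repeats the assignment and update steps until nothing changes. The sketch below is plain Python and is not Weka's SimpleKMeans; the function name `kmeans_points` and the `max_iter` cap are assumptions made for illustration. It takes the initial means as an argument so that the hand-picked starting points (4, 3) and (1, 1) can be reused.

```python
import math


def kmeans_points(points, centroids, max_iter=100):
    """Assign-and-update k-means on coordinate tuples (hypothetical helper)."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are final
        labels = new_labels
        # Update step: move each centroid to the average of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


names = ["MedicineA", "MedicineB", "MedicineC", "MedicineD"]
coords = [(1, 1), (2, 1), (4, 3), (5, 4)]
labels, centroids = kmeans_points(coords, [(4, 3), (1, 1)])
for name, lab in zip(names, labels):
    print(name, "-> K%d" % (lab + 1))
print("Means:", centroids)
```

Label 0 corresponds to K1 and label 1 to K2. For this data the loop should stop after the second pass with MedicineC and MedicineD in K1 and MedicineA and MedicineB in K2.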
11. Outcome –
K-means 1D -
Source code:
Output:
K-means 2D -
Source code:
Output:
12. Learning Outcomes Achieved
1. Students are able to cluster the given data into k clusters, where k is known in advance.
13. Conclusion:
1. Applications of the Studied Technique in Industry
Clustering algorithms, such as K-means or hierarchical clustering, are widely used in industry for customer
segmentation, market analysis, and anomaly detection. These techniques help businesses tailor marketing strategies,
optimize resource allocation, and identify unusual patterns or trends in large datasets.
2. Engineering Relevance
Clustering algorithms are crucial in engineering for solving complex problems related to pattern recognition, image
processing, and system optimization. They enable engineers to group similar data points, improve model accuracy, and
make informed decisions based on data-driven insights.
3. Skills Developed
The experiment with clustering algorithms enhances skills in data preprocessing, algorithm implementation, and result
interpretation. It also develops expertise in applying statistical techniques to solve real-world problems, as well as
proficiency in using data mining tools and software for effective data analysis.
14. References:
[1] Paulraj Ponniah, “Data Warehousing Fundamentals for IT Professionals”, Wiley Publications.
[2] Han, Kamber, “Data Mining: Concepts and Techniques”, 3rd Edition, Morgan Kaufmann.
[3] Margaret H. Dunham, “Data Mining: Introductory and Advanced Topics”, Pearson Education.
[4] Raghu Ramakrishnan and Johannes Gehrke, “Database Management Systems”, 3rd Edition, McGraw Hill.
[5] Elmasri and Navathe, “Fundamentals of Database Systems”, Pearson Education.