Data Mining Lab Programs 2023-24

Uploaded by

HARI HARAN KRISHNAN

College of Excellence, 2022-4th Rank

Autonomous and Affiliated to Bharathiar University

Accredited with A++ grade by NAAC, An ISO 9001: 2015 Certified Institution
Peelamedu, Coimbatore-641004

DEPARTMENT OF COMPUTER SCIENCE (PG)

DATA MINING LAB (MCS22P5)

2023-2024
DATA MINING LAB (MCS21P5)

Certified that this is a bonafide record work done by of

II M.Sc. (Computer Science) during the year 2023-2024.

FACULTY INCHARGE HEAD OF THE DEPARTMENT

Submitted for the practical examination held on at

PSGR Krishnammal College for Women, Coimbatore.

INTERNAL EXAMINER EXTERNAL EXAMINER

INDEX

S No.  Topics

PYTHON PROGRAMS

1   Linear Regression
2   Decision Tree
3   Naive Bayes
4   Support Vector Machine
5   Comparison of KNN, SVM, Decision Tree, Logistic Regression, Naïve Bayes
6   BIRCH Clustering

R PROGRAMS

7   Data Exploration and Visualization
8   Linear Regression
9   Association Rule Mining
10  Classification using SVM
11  K-Means Clustering and Outliers Detection
12  Text Preprocessing

DATA VISUALIZATION USING TABLEAU

13  Visualizing maximum Confirmed, Cured Covid-19 patients in Covid-19 Dataset
14  Visualizing the difference in Volume Year-Wise and Quarter-Wise
15  Visualizing Delay of Flights
16  Visualizing the Sales of the Product in Bakery Dataset

DATA VISUALIZATION USING POWER BI

17  Visualizing Monthly Changes in Stock Dataset
18  Visualizing Product-Wise Sales using Superstore Dataset
19  Visualizing Region-Wise and Category-Wise Profit using Superstore Dataset
20  Power Query Operations using Walmart and Masters Dataset

Ex. No: 01
LINEAR REGRESSION
Date:

AIM

To write a Python program to perform linear regression using diabetes dataset.

ALGORITHM

STEP 1: Start the process.

STEP 2: Open Jupyter Notebook and create new Python3 Notebook.

STEP 3: Import the packages matplotlib.pyplot and numpy.

STEP 4: Import datasets, linear_model from sklearn and mean_squared_error, r2_score from
sklearn.metrics.

STEP 5: Load the diabetes dataset using datasets.load_diabetes( ) and use one feature.

STEP 6: Split the data and target into training, testing set.

STEP 7: Create linear regression object using linear_model.LinearRegression( ).

STEP 8: Train the model using training dataset and make predictions using testing set by using
predict( ).

STEP 9: Print coefficients, variance score and mean-squared error using coef_, r2_score( ) and
mean_squared_error( ).

STEP 10: Plot the graph using plt.scatter( ) and plt.plot( ), and display it using plt.show( ).

STEP 11: Save the program and run it.

STEP 12: Stop the process.

PROGRAM

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Load the diabetes dataset and keep a single feature (column index 2)
diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]

# Hold out the last 20 samples for testing
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

# Fit the model on the training set and predict on the test set
regr = linear_model.LinearRegression()
regr.fit(diabetes_X_train, diabetes_y_train)
diabetes_y_pred = regr.predict(diabetes_X_test)

# %d truncates each value to an integer when printing
print('Coefficients:%d' % regr.coef_[0])
print('Mean squared error:%d' % mean_squared_error(diabetes_y_test, diabetes_y_pred))
print('Variance score:%d' % r2_score(diabetes_y_test, diabetes_y_pred))

# Plot the test samples and the fitted line
plt.scatter(diabetes_X_test, diabetes_y_test, color='green')
plt.plot(diabetes_X_test, diabetes_y_pred, color='blue', linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()

OUTPUT

Coefficients:938
Mean squared error:2548
Variance score:0

RESULT

Thus, the above Python program to perform linear regression on the diabetes dataset
has been executed and verified successfully.
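
The two metrics printed by the program can be worked out by hand. The sketch below is only an illustration: the helper names mse_by_hand and r2_by_hand and the four data points are hypothetical and not part of the lab record. It also shows why a variance score between 0 and 1 is printed as 0 when it is formatted with %d.

```python
# Hypothetical targets and predictions, chosen so the arithmetic is easy to follow
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.5, 8.5]


def mse_by_hand(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r2_by_hand(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


mse = mse_by_hand(y_true, y_pred)
r2 = r2_by_hand(y_true, y_pred)

print("Mean squared error: %.2f" % mse)
print("Variance score with %%d: %d" % r2)      # integer formatting drops the fraction
print("Variance score with %%.2f: %.2f" % r2)  # two decimals keep it
```

Formatting the scores with %.2f instead of %d would keep the fractional part in the printed output.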

Ex. No: 02
DECISION TREE
Date:

AIM

To write a Python program to classify the samples as man and woman based on the
attributes of height and length of hair using decision tree classifier.

ALGORITHM

STEP 1: Start the process

STEP 2: Open Jupyter Notebook and create new Python3 notebook

STEP 3: Import tree from sklearn and train_test_split from sklearn.model_selection

STEP 4: Assign X as the input features namely height and length of hair

STEP 5: Assign Y as the target variable namely woman and man

STEP 6: Split the dataset into training set and test set using train_test_split()

STEP 7: Create a decision tree classifier object and fit the model using the training set

STEP 8: Predict the target for test data

STEP 9: Save and run the program

STEP 10: Stop the process
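
A decision tree chooses its splits by comparing class impurity before and after a candidate split. The sketch below illustrates this with Gini impurity, which is the default criterion of scikit-learn's DecisionTreeClassifier. The helper names, the threshold of 150 and the choice of the first six samples of the exercise are assumptions made for illustration only.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(rows, labels, feature, threshold):
    """Weighted Gini impurity after splitting on rows[feature] <= threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    n = len(labels)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)


# First six [height, length of hair] samples and their labels
rows = [[165, 19], [175, 32], [136, 35], [174, 65], [141, 28], [176, 15]]
labels = ['Man', 'Woman', 'Woman', 'Man', 'Woman', 'Man']

parent_impurity = gini(labels)
height_split_impurity = split_gini(rows, labels, feature=0, threshold=150)

print("Impurity before the split:", parent_impurity)
print("Impurity after splitting on height <= 150:", height_split_impurity)
```

A lower weighted impurity after the split means the split separates the two classes better than no split at all.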

PROGRAM

from sklearn import tree
from sklearn.model_selection import train_test_split

# Each sample is [height, length of hair]
X = [[165, 19], [175, 32], [136, 35], [174, 65], [141, 28], [176, 15], [131, 32],
     [166, 6], [128, 32], [179, 10], [136, 34], [186, 2], [126, 25], [176, 28],
     [112, 38], [169, 9], [171, 36], [116, 25], [196, 25], [196, 38], [126, 40],
     [197, 20], [150, 25], [140, 32], [136, 35]]
Y = ['Man', 'Woman', 'Woman', 'Man', 'Woman', 'Man', 'Woman', 'Man', 'Woman', 'Man',
     'Woman', 'Man', 'Woman', 'Woman', 'Woman', 'Man', 'Woman', 'Woman', 'Man',
     'Woman', 'Woman', 'Man', 'Man', 'Woman', 'Woman']
data_feature_names = ['height', 'length of hair']

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=10)

# The classifier is fitted on all 25 samples here; the split above is not used for fitting
DTclf = tree.DecisionTreeClassifier()
DTclf1 = DTclf.fit(X, Y)
prediction = DTclf1.predict([[140, 37]])
print(prediction)

OUTPUT

['Woman']

RESULT

Thus, the program is executed successfully and the output is verified.

Ex. No: 03
NAIVE BAYES
Date:

AIM

To write a Python program to implement Naïve Bayes classification using the Breast
Cancer dataset.

ALGORITHM

STEP 1: Start the process.

STEP 2: Open Jupyter Notebook and create new Python3 Notebook.

STEP 3: Import the numpy, pandas, io packages and train_test_split, preprocessing from
sklearn package.

STEP 4: Load the Breast cancer dataset using read_csv.

STEP 5: Split out the validation dataset using train_test_split() function.

STEP 6: Create a model using Gaussian Naïve Bayes algorithm.

STEP 7: Fit the model with the training dataset.

STEP 8: Make the predictions using the validation datasets.

STEP 9: Print the classification report for the expected and predicted data.

STEP 10: Print the confusion matrix for the expected and predicted data.

STEP 11: Stop the process.
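
What the Gaussian Naïve Bayes model does in steps 6 to 8 can be sketched without any library: estimate a prior, a mean and a variance per class and feature, then pick the class with the highest log-posterior. The data, the class labels and the helper names below are hypothetical, and the small constant added to each variance is an assumption of this sketch rather than scikit-learn's exact smoothing rule.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Return {label: (prior, feature means, feature variances)}."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A tiny constant keeps the variance strictly positive
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Pick the class with the largest log prior plus log Gaussian likelihoods."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for value, mean, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical two-feature training data with two well separated classes
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = ['benign', 'benign', 'benign', 'malignant', 'malignant', 'malignant']

model = fit_gaussian_nb(X, y)
first = predict_gaussian_nb(model, [1.1, 2.1])
second = predict_gaussian_nb(model, [5.1, 5.9])
print(first, second)
```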

PROGRAM

import numpy as np
import pandas as pd
from pandas import read_csv
import io
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn import metrics
from sklearn.naive_bayes import GaussianNB

# Load the dataset and separate the features from the class column
path = "D:\\Datasets\\[Link]"
names = ['A1','A2','A3','A4','A5','A6','A7','A8','A9','class']
dataset = read_csv(path, names=names)
array = dataset.values
print(array)
X = array[:,0:8]
print(X)
y = array[:,9]

# Keep 20% of the rows for validation
X_train, X_validation, Y_train, Y_validation = train_test_split(X, y, test_size=0.20,
                                                                random_state=1, shuffle=True)

# Fit Gaussian Naive Bayes and evaluate it on the validation set
model = GaussianNB()
model.fit(X_train, Y_train)
print(model)
expected = Y_validation
predicted = model.predict(X_validation)
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))

OUTPUT

RESULT

Thus, the program is executed successfully and the output is verified.

Ex. No: 04
SUPPORT VECTOR MACHINE
Date:

AIM
To write a Python program to implement Support Vector Machine using wine quality
dataset

ALGORITHM
STEP 1: Start the process
STEP 2: Open Jupyter notebook and create Python3 notebook
STEP 3: Import read_csv from pandas, train_test_split from sklearn.model_selection,
classification_report, confusion_matrix from sklearn.metrics and SVC from sklearn.svm
STEP 4: Load wine quality dataset using read_csv( ) and split out validation dataset using
train_test_split( )
STEP 5: Create the SVC model, fit it on the training set and make predictions on the
validation set
STEP 6: Print classification report and confusion matrix using expected and predicted
STEP 7: Save the program and run it
STEP 8: Stop the process

PROGRAM

from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.svm import SVC

# Load the dataset and separate the features from the class column
path = "C:\\Users\\2msccs02\\Downloads\\[Link]"
names = ['A1','A2','A3','A4','A5','A6','A7','A8','A9','A10','A11','A12','A13','class']
dataset = read_csv(path, names=names)
array = dataset.values
print(array)
X = array[:,0:12]
print(X)
Y = array[:,13]

# Keep 20% of the rows for validation
X_train, X_validation, Y_train, Y_validation = train_test_split(X, Y, test_size=0.20,
                                                                random_state=1, shuffle=True)

# Fit the support vector classifier and evaluate it on the validation set
model = SVC()
model.fit(X_train, Y_train)
print(model)
expected = Y_validation
predicted = model.predict(X_validation)
print(classification_report(expected, predicted))
print(confusion_matrix(expected, predicted))

OUTPUT

[[1.423e+01 1.710e+00 2.430e+00 ... 3.920e+00 1.065e+03 1.000e+00]
 [1.320e+01 1.780e+00 2.140e+00 ... 3.400e+00 1.050e+03 1.000e+00]
 [1.316e+01 2.360e+00 2.670e+00 ... 3.170e+00 1.185e+03 1.000e+00]
 ...
 [1.327e+01 4.280e+00 2.260e+00 ... 1.560e+00 8.350e+02 3.000e+00]
 [1.317e+01 2.590e+00 2.370e+00 ... 1.620e+00 8.400e+02 3.000e+00]
 [1.413e+01 4.100e+00 2.740e+00 ... 1.600e+00 5.600e+02 3.000e+00]]
[[14.23  1.71  2.43 ...  5.64  1.04  3.92]
 [13.2   1.78  2.14 ...  4.38  1.05  3.4 ]
 [13.16  2.36  2.67 ...  5.68  1.03  3.17]
 ...
 [13.27  4.28  2.26 ... 10.2   0.59  1.56]
 [13.17  2.59  2.37 ...  9.3   0.6   1.62]
 [14.13  4.1   2.74 ...  9.2   0.61  1.6 ]]
SVC()
              precision    recall  f1-score   support

         1.0       0.53      0.71      0.61        14
         2.0       0.53      0.69      0.60        13
         3.0       0.00      0.00      0.00         9

    accuracy                           0.53        36
   macro avg       0.35      0.47      0.40        36
weighted avg       0.40      0.53      0.45        36

[[10  4  0]
 [ 4  9  0]
 [ 5  4  0]]

RESULT

Thus, the program is executed successfully and the output is verified.
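
The figures in the classification report follow directly from the confusion matrix printed in the output. The sketch below recomputes them, assuming the scikit-learn convention that rows are the true classes and columns are the predicted classes. The helper names are hypothetical.

```python
# Confusion matrix copied from the output: rows = true class, columns = predicted class
cm = [[10, 4, 0],
      [4, 9, 0],
      [5, 4, 0]]
labels = [1.0, 2.0, 3.0]


def precision(i):
    """Correct predictions of class i divided by all predictions of class i."""
    predicted_as_i = sum(row[i] for row in cm)
    return cm[i][i] / predicted_as_i if predicted_as_i else 0.0


def recall(i):
    """Correct predictions of class i divided by all true members of class i."""
    actual_i = sum(cm[i])
    return cm[i][i] / actual_i if actual_i else 0.0


total = sum(sum(row) for row in cm)
accuracy = sum(cm[i][i] for i in range(len(cm))) / total

for i, label in enumerate(labels):
    print("%s precision=%.2f recall=%.2f" % (label, precision(i), recall(i)))
print("accuracy=%.2f" % accuracy)
```

Class 3.0 is never predicted, so its precision and recall are both 0, and the overall accuracy is 19 correct predictions out of 36.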

Ex. No: 05
COMPARISON OF KNN, SVM, DECISION TREE, LOGISTIC REGRESSION, NAÏVE BAYES
Date:

AIM
To write a Python program to compare classification algorithms on the iris dataset

ALGORITHM
STEP 1: Start the process
STEP 2: Open Jupyter Notebook and create new Python3 notebook
STEP 3: Import read_csv from pandas and pyplot from matplotlib
STEP 4: Import train_test_split, cross_val_score and StratifiedKFold from
sklearn.model_selection
STEP 5: Import LogisticRegression from sklearn.linear_model, DecisionTreeClassifier from
sklearn.tree, KNeighborsClassifier from sklearn.neighbors, GaussianNB from sklearn.naive_bayes
and SVC from sklearn.svm
STEP 6: Load iris dataset and their column names using read_csv( )
STEP 7: Split out validation dataset using train_test_split( )
STEP 8: Create models using LogisticRegression( ), KNeighborsClassifier( ), GaussianNB( ),
DecisionTreeClassifier( ), SVC( )
STEP 9: Evaluate each model in turn using cross_val_score( )
STEP 10: Compare the algorithms using pyplot.boxplot( ) and display the graph using
pyplot.show( )
STEP 11: Save and run the program
STEP 12: Stop the process
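
The cross-validation in step 9 relies on stratified folds, meaning every fold keeps roughly the same class proportions as the full training set. The sketch below shows one simple way to build such folds by dealing the indices of each class round-robin. It is an illustration only: the function name and the 18 labels are hypothetical, and unlike StratifiedKFold with shuffle=True it does not shuffle.

```python
from collections import defaultdict


def stratified_folds(labels, n_splits):
    """Assign sample indices to folds so each fold gets an equal share of every class."""
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)
    folds = [[] for _ in range(n_splits)]
    for indices in indices_by_class.values():
        # Deal the indices of one class to the folds in turn
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds


# Hypothetical labels: six samples per class
labels = ['setosa'] * 6 + ['versicolor'] * 6 + ['virginica'] * 6
folds = stratified_folds(labels, n_splits=3)

for fold_number, test_indices in enumerate(folds, start=1):
    class_counts = defaultdict(int)
    for i in test_indices:
        class_counts[labels[i]] += 1
    print("fold", fold_number, sorted(test_indices), dict(class_counts))
```

Each fold serves once as the test set while the remaining folds are used for training, and the mean and standard deviation of the per-fold accuracies are what the program reports for every model.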

PROGRAM

from pandas import read_csv
from matplotlib import pyplot
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Load the dataset and separate the features from the class column
path = "C:/Users/2msccs02/Downloads/Datasets/[Link]"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = read_csv(path, names=names)
array = dataset.values
X = array[:,0:4]
y = array[:,4]
X_train, X_validation, Y_train, Y_validation = train_test_split(X, y, test_size=0.20,
                                                                random_state=1, shuffle=True)

# Candidate models, each paired with a short label
models = []
models.append(('LR', LogisticRegression(solver='liblinear', multi_class='ovr')))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC(gamma='auto')))

# Evaluate each model with 10-fold stratified cross-validation
results = []
names = []
for name, model in models:
    kfold = StratifiedKFold(n_splits=10, random_state=1, shuffle=True)
    cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring='accuracy')
    results.append(cv_results)
    names.append(name)
    print('%s: %f (%f)' % (name, cv_results.mean(), cv_results.std()))

# Compare the accuracy distributions of the models
pyplot.boxplot(results, labels=names)
pyplot.title('ALGORITHM Comparison')
pyplot.show()

OUTPUT

LR: 0.941667 (0.065085)
KNN: 0.958333 (0.041667)
CART: 0.941667 (0.053359)
NB: 0.950000 (0.055277)
SVM: 0.983333 (0.033333)

RESULT

Thus, the program is executed successfully and the output is verified.

Ex. No: 06
BIRCH CLUSTERING
Date:

AIM

To write a Python program to perform clustering using BIRCH algorithm

ALGORITHM

STEP 1: Start the process

STEP 2: Open Jupyter Notebook and create a new Python3 notebook

STEP 3: Import unique, where from numpy, make_classification from sklearn.datasets, Birch
from sklearn.cluster and pyplot from matplotlib

STEP 4: Create random samples using make_classification()

STEP 5: Create a model using BIRCH

STEP 6: Fit the model

STEP 7: Predict the cluster and assign it to yhat

STEP 8: Retrieve the unique clusters from yhat

STEP 9: Create the scatter for the unique cluster samples

STEP 10: Plot the scatter plot using pyplot.scatter() function

STEP 11: Display the plot using pyplot.show() function

STEP 12: Stop the process.
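
BIRCH does not keep every sample in memory. It summarises each subcluster with a clustering feature (N, LS, SS): the number of points, their linear sum and their squared sum. The sketch below shows how two clustering features merge by simple addition and how the centroid and radius are recovered from the summary. The helper names and the four points are hypothetical, and this is not scikit-learn's internal implementation.

```python
import math


def make_cf(point):
    """Clustering feature of a single point: (count, linear sum, squared sum)."""
    return (1, list(point), sum(v * v for v in point))


def merge_cf(a, b):
    """Clustering features are additive, so merging is element-wise addition."""
    return (a[0] + b[0], [x + y for x, y in zip(a[1], b[1])], a[2] + b[2])


def centroid(cf):
    n, linear_sum, _ = cf
    return [v / n for v in linear_sum]


def radius(cf):
    """Root mean squared distance of the summarised points from their centroid."""
    n, _, squared_sum = cf
    c = centroid(cf)
    return math.sqrt(max(squared_sum / n - sum(v * v for v in c), 0.0))


# Four hypothetical points at the corners of a square
points = [(1.0, 1.0), (1.0, 3.0), (3.0, 1.0), (3.0, 3.0)]
cf = make_cf(points[0])
for p in points[1:]:
    cf = merge_cf(cf, make_cf(p))

print("clustering feature:", cf)
print("centroid:", centroid(cf))
print("radius:", radius(cf))
```

In scikit-learn's Birch, the threshold parameter limits how large the radius of a subcluster may become when a new sample is merged into it, which is why a small value such as 0.01 produces many fine-grained subclusters before the final grouping into n_clusters.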

PROGRAM

from numpy import unique
from numpy import where
from sklearn.datasets import make_classification
from sklearn.cluster import Birch
from matplotlib import pyplot

# Generate a synthetic two-feature dataset
X, _ = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=4)

# Fit BIRCH and assign every sample to a cluster
model = Birch(threshold=0.01, n_clusters=2)
model.fit(X)
yhat = model.predict(X)

# Draw one scatter series per cluster
clusters = unique(yhat)
for cluster in clusters:
    row_ix = where(yhat == cluster)
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1])
pyplot.show()

OUTPUT

RESULT

Thus, the program is executed successfully and the output is verified.

Ex. No: 07
DATA EXPLORATION AND VISUALIZATION
Date:

AIM

To perform the data exploration of the iris dataset and to implement various statistical
operations.

ALGORITHM

STEP 1: Start the process.

STEP 2: Open R Console.

STEP 3: Load the iris dataset in R console.

STEP 4: Display the descriptive features of the dataset using commands like dim(iris),
names(iris), str(iris), attributes(iris), iris[1:3], head(iris,10), tail(iris,10), summary(iris),
iris[1:10,"Sepal.Length"].

STEP 5: Find the mean, range, median, quantiles and histogram using the following commands:
mean(iris$Sepal.Length), range(iris$Sepal.Length), median(iris$Sepal.Length),
quantile(iris$Sepal.Length), hist(iris$Sepal.Length).

STEP 6: Calculate covariance and correlation using cov(iris$Sepal.Length,iris$Sepal.Width)
and cor(iris$Sepal.Length,iris$Sepal.Width).

STEP 7: Plot a pie chart of the species in the iris dataset in the R console.

STEP 8: Stop the process.
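
The statistics named in steps 5 and 6 are simple to compute by hand. The Python sketch below does so for the first two measurement columns of the ten rows that head(iris,10) prints in the session, so its results describe only those ten rows and differ from the full-dataset values in the R output. The helper names are hypothetical, and the covariance uses the n - 1 denominator, as R's cov does.

```python
import math

# First two measurement columns of the first ten iris rows
sepal_length = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9]
sepal_width = [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1]


def mean(values):
    return sum(values) / len(values)


def median(values):
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def covariance(a, b):
    """Sample covariance with an n - 1 denominator."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def correlation(a, b):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(a, b) / math.sqrt(covariance(a, a) * covariance(b, b))


print("mean:", round(mean(sepal_length), 3))
print("median:", median(sepal_length))
print("range:", min(sepal_length), max(sepal_length))
print("covariance:", round(covariance(sepal_length, sepal_width), 4))
print("correlation:", round(correlation(sepal_length, sepal_width), 4))
```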

PROGRAM

> View(iris)
> dim(iris)
[1] 150   5
> names(iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"
[5] "Species"
> structure(iris)
> str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> attributes(iris)
$names
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"
[5] "Species"
$class
[1] "data.frame"
$row.names
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
 [17]  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32
 [33]  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48
 [49]  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64
 [65]  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80
 [81]  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96
 [97]  97  98  99 100 101 102 103 104 105 106 107 108 109 110 111 112
[113] 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128
[129] 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
[145] 145 146 147 148 149 150

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> head(iris,10)
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa

> summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50
 Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
> tail(iris,3)
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
148          6.5         3.0          5.2         2.0 virginica
149          6.2         3.4          5.4         2.3 virginica
150          5.9         3.0          5.1         1.8 virginica
> mean(iris$Sepal.Length)
[1] 5.843333
> range(iris$Sepal.Length)
[1] 4.3 7.9
> cov(iris$Sepal.Length,iris$Sepal.Width)
[1] -0.042434
> median(iris$Sepal.Length)
[1] 5.8

> plot(density(iris$Sepal.Length))

> hist(iris$Sepal.Length)

> table(iris$Sepal.Length)

4.3 4.4 4.5 4.6 4.7 4.8 4.9   5 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9   6
  1   3   1   4   2   5   6  10   9   4   1   6   7   6   8   7   3   6
6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9   7 7.1 7.2 7.3 7.4 7.6 7.7 7.9
  6   4   9   7   5   2   8   3   4   1   1   3   1   1   1   4   1

> pairs(iris)

> pie(table(iris$Species))

RESULT

Thus, the data exploration of the iris dataset is performed and various statistical
operations are implemented.

Ex. No: 08
LINEAR REGRESSION
Date:

AIM

To perform linear regression using the salary dataset to predict salary based on
experience.

ALGORITHM

STEP 1: Start the process.

STEP 2: Open R Console.

STEP 3: Load the salary dataset in the R console.

STEP 4: Install and load the caTools package, which provides sample.split( ) for splitting the dataset.

STEP 5: Split the dataset into train and test sets.

STEP 6: Find the line of fit using the lm formula, using years of experience and salary.

STEP 7: Predict the test set results using predict.

STEP 8: Visualize training and test set using ggplot.

STEP 9: Stop the process.
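
Steps 5 to 7 can be sketched in Python: split the rows with a fixed seed, fit a least-squares line on the training part and predict the held-out part. The data below is hypothetical and exactly linear so the fitted line is easy to check, the helper names are assumptions, and the split is a plain random 2/3 split rather than a re-implementation of caTools' sample.split.

```python
import random


def train_test_split(rows, train_ratio, seed):
    """Shuffle a copy of the rows with a fixed seed and cut it at train_ratio."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (years of experience, salary) pairs: salary = 30000 + 5000 * years
rows = [(float(years), 30000.0 + 5000.0 * years) for years in range(1, 10)]

train, test = train_test_split(rows, 2 / 3, seed=123)
slope, intercept = fit_line(train)
predictions = [intercept + slope * x for x, _ in test]

print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
for (x, actual), predicted in zip(test, predictions):
    print(x, actual, round(predicted, 2))
```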

PROGRAM

> dataset = read.csv("C:/Users/2msccs02/Documents/Salary_Data.csv")
> View(dataset)
> install.packages('caTools')
package ‘caTools’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\2msccs02\AppData\Local\Temp\RtmpkXv9S1\downloaded_packages
> library(caTools)
> set.seed(123)
> split = sample.split(dataset$Salary, SplitRatio = 2/3)
> training_set = subset(dataset, split == TRUE)
> test_set = subset(dataset, split == FALSE)
> regressor = lm(formula = Salary ~ YearsExperience,
+ data = training_set)
> y_pred = predict(regressor, newdata = test_set)
> library(ggplot2)
> ggplot() +
+ geom_point(aes(x = training_set$YearsExperience, y = training_set$Salary),
+ colour = 'red') +
+ geom_line(aes(x = training_set$YearsExperience, y = predict(regressor, newdata = training_set)),
+ colour = 'blue') +
+ ggtitle('Salary vs Experience (Training set)') +
+ xlab('Years of experience') +
+ ylab('Salary')

OUTPUT

RESULT

Thus, the program is executed successfully and the output is verified.

Ex. No: 09
ASSOCIATION RULE MINING
Date:

AIM

To implement Association Rule mining of groceries dataset using Apriori Algorithm.

ALGORITHM

STEP 1: Open the R tool window

STEP 2: Load arules library package using library(arules) function

STEP 3: Use the Groceries dataset

STEP 4: Find the frequent itemsets and their support using eclat() function

STEP 5: Display all the frequent itemsets in the Groceries dataset using inspect(frequentItems)

STEP 6: Generate rules using apriori() with the minimum support as 0.001 and confidence as 0.5

STEP 7: Sort the rules by support, confidence and lift in decreasing order

STEP 8: Display the sorted items

STEP 9: Stop the process
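
The three measures used to rank rules in step 7 can be computed directly from a list of transactions. The sketch below uses five hypothetical baskets, not the Groceries data, and the function names are assumptions made for illustration.

```python
# Five hypothetical shopping baskets
transactions = [
    {"milk", "bread"},
    {"milk", "yogurt"},
    {"milk", "bread", "yogurt"},
    {"bread"},
    {"milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(lhs, rhs):
    """Of the transactions containing lhs, the fraction that also contain rhs."""
    return support(set(lhs) | set(rhs)) / support(lhs)


def lift(lhs, rhs):
    """Confidence relative to how often rhs occurs anyway; 1 means independence."""
    return confidence(lhs, rhs) / support(rhs)


rule_support = support({"bread", "milk"})
rule_confidence = confidence({"bread"}, {"milk"})
rule_lift = lift({"bread"}, {"milk"})

print("support {bread, milk}:", rule_support)
print("confidence {bread} => {milk}:", round(rule_confidence, 4))
print("lift {bread} => {milk}:", round(rule_lift, 4))
```

A lift below 1, as in this toy example, means the left-hand side makes the right-hand side less likely than its overall frequency would suggest, while the high-lift rules in a real run point to items bought together far more often than chance.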

PROGRAM

> install.packages("arules")
> install.packages("arulesViz")
> library(arules)
> library(arulesViz)
> data("Groceries")
> transactions<-Groceries
> summary(Groceries)
transactions as itemMatrix in sparse format with
 9835 rows (elements/itemsets/transactions) and
 169 columns (items) and a density of 0.02609146

most frequent items:
      whole milk other vegetables       rolls/buns             soda
            2513             1903             1809             1715
          yogurt          (Other)
            1372            34055

element (itemset/transaction) length distribution:
sizes
   1    2    3    4    5    6    7    8    9   10   11   12   13   14
2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77
  15   16   17   18   19   20   21   22   23   24   26   27   28   29
  55   46   29   14   14    9   11    4    6    1    1    1    1    3
  32
   1

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   2.000   3.000   4.409   6.000  32.000

includes extended item information - examples:
       labels  level2           level1
1 frankfurter sausage meat and sausage
2     sausage sausage meat and sausage
3  liver loaf sausage meat and sausage
> itemFrequencyPlot(transactions,support=0.1,cex.names=1.0)

> itemFrequencyPlot(transactions,support=0.05,cex.names=0.8)

> itemFrequencyPlot(transactions,topN=20)

> frequentItems<-eclat(Groceries,parameter=list(supp=0.07,maxlen=15))
Eclat

parameter specification:
 tidLists support minlen maxlen            target  ext
    FALSE    0.07      1     15 frequent itemsets TRUE

algorithmic control:
 sparse sort verbose
      7   -2    TRUE

Absolute minimum support count: 688

create itemset ...
set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
sorting and recoding items ... [18 item(s)] done [0.00s].
creating sparse bit matrix ... [18 row(s), 9835 column(s)] done [0.00s].
writing ... [19 set(s)] done [0.00s].
Creating S4 object ... done [0.00s].

> inspect(frequentItems)
     items                          support    count
[1]  {other vegetables, whole milk} 0.07483477  736
[2]  {whole milk}                   0.25551601 2513
[3]  {other vegetables}             0.19349263 1903
[4]  {rolls/buns}                   0.18393493 1809
[5]  {yogurt}                       0.13950178 1372
[6]  {soda}                         0.17437722 1715
[7]  {root vegetables}              0.10899847 1072
[8]  {tropical fruit}               0.10493137 1032
[9]  {bottled water}                0.11052364 1087
[10] {sausage}                      0.09395018  924
[11] {shopping bags}                0.09852567  969
[12] {citrus fruit}                 0.08276563  814
[13] {pastry}                       0.08896797  875
[14] {pip fruit}                    0.07564820  744
[15] {whipped/sour cream}           0.07168277  705
[16] {fruit/vegetable juice}        0.07229283  711
[17] {newspapers}                   0.07981698  785
[18] {bottled beer}                 0.08052872  792
[19] {canned beer}                  0.07768175  764

> rules<-apriori(Groceries,parameter=list(supp=0.001,conf=0.5))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen maxlen target  ext
        0.5    0.1    1 none FALSE            TRUE       5   0.001      1     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 9

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
sorting and recoding items ... [157 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 done [0.01s].
writing ... [5668 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
> rules_conf<-sort(rules,by="support",decreasing=TRUE)[1:20]
> inspect((rules_conf))
     lhs                                       rhs                support     confidence coverage   lift     count
[1]  {other vegetables, yogurt}             => {whole milk}       0.022267412 0.5128806  0.04341637 2.007235   219
[2]  {tropical fruit, yogurt}               => {whole milk}       0.015149975 0.5173611  0.02928317 2.024770   149
[3]  {other vegetables, whipped/sour cream} => {whole milk}       0.014641586 0.5070423  0.02887646 1.984385   144
[4]  {root vegetables, yogurt}              => {whole milk}       0.014539908 0.5629921  0.02582613 2.203354   143
[5]  {pip fruit, other vegetables}          => {whole milk}       0.013523132 0.5175097  0.02613116 2.025351   133
[6]  {root vegetables, yogurt}              => {other vegetables} 0.012913066 0.5000000  0.02582613 2.584078   127
[7]  {root vegetables, rolls/buns}          => {whole milk}       0.012709710 0.5230126  0.02430097 2.046888   125
[8]  {other vegetables, domestic eggs}      => {whole milk}       0.012302999 0.5525114  0.02226741 2.162336   121
[9]  {tropical fruit, root vegetables}      => {other vegetables} 0.012302999 0.5845411  0.02104728 3.020999   121
[10] {root vegetables, rolls/buns}          => {other vegetables} 0.012201322 0.5020921  0.02430097 2.594890   120
[11] {tropical fruit, root vegetables}      => {whole milk}       0.011997966 0.5700483  0.02104728 2.230969   118
[12] {other vegetables, butter}             => {whole milk}       0.011489578 0.5736041  0.02003050 2.244885   113
[13] {yogurt, whipped/sour cream}           => {whole milk}       0.010879512 0.5245098  0.02074225 2.052747   107
[14] {citrus fruit, root vegetables}        => {other vegetables} 0.010371124 0.5862069  0.01769192 3.029608   102
[15] {curd, yogurt}                         => {whole milk}       0.010066090 0.5823529  0.01728521 2.279125    99
[16] {other vegetables, curd}               => {whole milk}       0.009862735 0.5739645  0.01718353 2.246296    97
[17] {other vegetables, frozen vegetables}  => {whole milk}       0.009659380 0.5428571  0.01779359 2.124552    95
[18] {pip fruit, yogurt}                    => {whole milk}       0.009557702 0.5310734  0.01799695 2.078435    94
[19] {yogurt, fruit/vegetable juice}        => {whole milk}       0.009456024 0.5054348  0.01870869 1.978094    93
[20] {root vegetables, whipped/sour cream}  => {whole milk}       0.009456024 0.5535714  0.01708185 2.166484    93
> rules_conf1<-sort(rules,by="confidence",decreasing=TRUE)
> inspect(head(rules_conf1))
    lhs                                             rhs                support     confidence coverage    lift     count
[1] {rice, sugar}                                => {whole milk}       0.001220132 1          0.001220132 3.913649    12
[2] {canned fish, hygiene articles}              => {whole milk}       0.001118454 1          0.001118454 3.913649    11
[3] {root vegetables, butter, rice}              => {whole milk}       0.001016777 1          0.001016777 3.913649    10
[4] {root vegetables, whipped/sour cream, flour} => {whole milk}       0.001728521 1          0.001728521 3.913649    17
[5] {butter, soft cheese, domestic eggs}         => {whole milk}       0.001016777 1          0.001016777 3.913649    10
[6] {citrus fruit, root vegetables, soft cheese} => {other vegetables} 0.001016777 1          0.001016777 5.168156    10
> rules_conf1<-sort(rules,by="lift",decreasing=TRUE)
> inspect(head(rules_conf1))
    lhs                                                     rhs              support     confidence coverage    lift     count
[1] {Instant food products, soda}                        => {hamburger meat} 0.001220132 0.6315789  0.001931876 18.99565    12
[2] {soda, popcorn}                                      => {salty snack}    0.001220132 0.6315789  0.001931876 16.69779    12
[3] {flour, baking powder}                               => {sugar}          0.001016777 0.5555556  0.001830198 16.40807    10
[4] {ham, processed cheese}                              => {white bread}    0.001931876 0.6333333  0.003050330 15.04549    19
[5] {whole milk, Instant food products}                  => {hamburger meat} 0.001525165 0.5000000  0.003050330 15.03823    15
[6] {other vegetables, curd, yogurt, whipped/sour cream} => {cream cheese }  0.001016777 0.5882353  0.001728521 14.83409    10

RESULT

Thus, the program is executed successfully and the output is verified.
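
The columns reported by inspect( ) are tied together by two identities: confidence is the rule's support divided by the coverage (the support of the left-hand side), and lift is the confidence divided by the support of the right-hand side. The short check below applies them to the figures reported for the rule {other vegetables, yogurt} => {whole milk} and for the itemset {whole milk}. The variable names are hypothetical.

```python
# Figures copied from the R output
rule_support = 0.022267412   # support of {other vegetables, yogurt, whole milk}
lhs_coverage = 0.04341637    # support of {other vegetables, yogurt}
rhs_support = 0.25551601     # support of {whole milk}

confidence = rule_support / lhs_coverage
lift = confidence / rhs_support

print("confidence:", round(confidence, 4))
print("lift:", round(lift, 3))
```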

Ex. No: 10
CLASSIFICATION USING SVM
Date:

AIM

To perform classification of iris dataset using support vector machine (SVM).

ALGORITHM

STEP 1: Start the process.

STEP 2: Open R Console.

STEP 3: Import the package kernlab using library(kernlab).

STEP 4: Train the SVM models on the iris data with the ksvm function using the following
commands:

irisSVM <- ksvm(Species~., data=trainData, type="C-bsvc", kernel=rbf, C=10, prob.model=TRUE)
irisSVM1 <- ksvm(Species~., data=iris, type="C-bsvc", kernel="rbfdot", prob.model=TRUE)

STEP 5: Extract the fitted values using the commands fitted(irisSVM) and
fitted(irisSVM1).

STEP 6: Print the Outputs.

STEP 7: Stop the process
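
The SVM in this exercise uses a radial basis (Gaussian) kernel, which kernlab defines as k(x, y) = exp(-sigma * ||x - y||^2). The Python sketch below evaluates that formula with sigma = 0.1, the value passed to rbfdot in the program, on three iris rows (rows 1, 2 and 150 of the dataset). The function and variable names are hypothetical.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian kernel: exp(-sigma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sigma * squared_distance)


row_1 = (5.1, 3.5, 1.4, 0.2)    # setosa
row_2 = (4.9, 3.0, 1.4, 0.2)    # setosa
row_150 = (5.9, 3.0, 5.1, 1.8)  # virginica

same_species = rbf_kernel(row_1, row_2, sigma=0.1)
different_species = rbf_kernel(row_1, row_150, sigma=0.1)

print("k(row 1, row 2):", round(same_species, 4))
print("k(row 1, row 150):", round(different_species, 4))
```

The kernel value is 1 for identical points and falls towards 0 as points move apart, so two flowers of the same species score higher than flowers of different species. A larger sigma makes the similarity drop off faster.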

PROGRAM

> install.packages("kernlab")
package ‘kernlab’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\2msccs02\AppData\Local\Temp\Rtmpyw8HQk\downloaded_packages
> library(kernlab)
> rbf<-rbfdot(sigma=0.1)
> install.packages("partykit")
package ‘libcoin’ successfully unpacked and MD5 sums checked
package ‘mvtnorm’ successfully unpacked and MD5 sums checked
package ‘Formula’ successfully unpacked and MD5 sums checked
package ‘inum’ successfully unpacked and MD5 sums checked
package ‘partykit’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\2msccs02\AppData\Local\Temp\RtmpM1EoLr\downloaded_packages
> library(partykit)
Loading required package: grid
Loading required package: libcoin
Loading required package: mvtnorm
> ind <- sample(2, nrow(iris),replace=TRUE, prob = c(0.7,0.3))
> trainData <- iris[ind==1,]
> testData <- iris[ind==2,]
> irisSVM <- ksvm(Species~., data=trainData, type="C-bsvc", kernel=rbf, C=10,
+ prob.model=TRUE)
> irisSVM1 <-ksvm(Species~.,data=iris,type="C-bsvc",kernel="rbfdot",prob.model=TRUE)
> fitted(irisSVM)
  [1] setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa
 [11] setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa
 [21] setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa
 [31] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor
 [41] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor
 [51] versicolor versicolor versicolor versicolor versicolor versicolor virginica  versicolor versicolor versicolor
 [61] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor virginica  virginica
 [71] virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica
 [81] virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica
 [91] virginica  virginica  versicolor virginica  virginica  virginica  virginica  virginica  virginica  virginica
[101] virginica  virginica
Levels: setosa versicolor virginica
> fitted(irisSVM1)
  [1] setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa
 [11] setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa
 [21] setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa
 [31] setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa
 [41] setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa
 [51] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor
 [61] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor
 [71] versicolor versicolor versicolor versicolor versicolor versicolor versicolor virginica  versicolor versicolor
 [81] versicolor versicolor versicolor virginica  versicolor versicolor versicolor versicolor versicolor versicolor
 [91] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[101] virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica
[111] virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  versicolor
[121] virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica
[131] virginica  virginica  virginica  versicolor virginica  virginica  virginica  virginica  virginica  virginica
[141] virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica
Levels: setosa versicolor virginica

RESULT

Thus, the program is executed successfully and the output is verified.

Ex. No: 11
K-MEANS CLUSTERING AND OUTLIERS DETECTION
Date:

AIM

To create K-means clustering and to detect outliers of Iris dataset using R tool.

ALGORITHM

STEP 1: Start the process.

STEP 2: To perform the K-means clustering, load iris dataset.

STEP 3: The loaded dataset is displayed.

STEP 4: The clustering is done using K-means command by specifying the total number of
clusters.

STEP 5: Finally using the table command, the K-means cluster is displayed.

STEP 6: Find the centers of the clusters generated.

STEP 7: Compute the Euclidean distance between each object and its cluster center.

STEP 8: Find the outliers by sorting the distances in decreasing order using order( ) and
picking the five largest.

STEP 9: Stop the process.
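
The outlier score in steps 7 and 8 is the Euclidean distance between a point and the center of its cluster. The Python sketch below applies this to the three cluster centers reported in the R session and to three rows of the dataset (rows 1, 58 and 99). It is an illustration under assumptions: the names are hypothetical, each point is assigned to its nearest center instead of re-running k-means, and because k-means starts from random centers another run may number or place the clusters differently.

```python
import math

# Cluster centers copied from the recorded k-means run
centers = {
    1: (6.850000, 3.073684, 5.742105, 2.071053),
    2: (5.006000, 3.428000, 1.462000, 0.246000),
    3: (5.901613, 2.748387, 4.393548, 1.433871),
}

# Three rows of the iris measurements, keyed by row number
points = {
    1: (5.1, 3.5, 1.4, 0.2),
    58: (4.9, 2.4, 3.3, 1.0),
    99: (5.1, 2.5, 3.0, 1.1),
}


def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_center(p):
    """Label of the cluster center closest to p."""
    return min(centers, key=lambda label: distance(p, centers[label]))


# Outlier score: distance from each point to its own cluster center
scores = {row: distance(p, centers[nearest_center(p)]) for row, p in points.items()}

# Rows ordered from most to least outlying
ranked = sorted(scores, key=scores.get, reverse=True)
for row in ranked:
    print(row, nearest_center(points[row]), round(scores[row], 4))
```

Rows 99 and 58 lie far from every center and therefore rank as outliers, while row 1 sits close to the center of its cluster.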

43
PROGRAM

> iris2<-iris
> iris2$Species<-NULL
> [Link]<-kmeans(iris2,3)
> table(iris$Species, [Link]$cluster)
1 2 3
setosa 0 50 0
versicolor 2 0 48
virginica 36 0 14
> kmeans.result
K-means clustering with 3 clusters of sizes 38, 50, 62

Cluster means:
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     6.850000    3.073684     5.742105    2.071053
2     5.006000    3.428000     1.462000    0.246000
3     5.901613    2.748387     4.393548    1.433871

Clustering vector:
[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 1 3 3 3 3 3
[59] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 1 1 1 1 3 1 1 1 1 1 1 3 3 1
[117] 1 1 1 3 1 3 1 3 1 1 3 3 1 1 1 1 1 3 1 1 1 1 3 1 1 1 3 1 1 1 3 1 1 3

Within cluster sum of squares by cluster:
[1] 23.87947 15.15100 39.82097
(between_SS / total_SS = 88.4 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss" "betweenss"    "size"
[8] "iter"         "ifault"
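
The line "between_SS / total_SS = 88.4 %" in the printout above is the share of the total sum of squares that is explained by the clustering; a value close to 100 % indicates compact, well separated clusters. The following is a minimal Python sketch of how such a ratio is obtained, assuming between_SS = total_SS - (sum of the within-cluster sums of squares). The 1-D toy values and the helper names are hypothetical.

```python
def mean_point(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sum_sq(points, center):
    """Sum of squared Euclidean distances from the points to center."""
    return sum(sum((x - c) ** 2 for x, c in zip(p, center)) for p in points)

def explained_ratio(points, labels):
    """between_SS / total_SS for a given cluster assignment."""
    total_ss = sum_sq(points, mean_point(points))
    within_ss = 0.0
    for k in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == k]
        within_ss += sum_sq(members, mean_point(members))
    return (total_ss - within_ss) / total_ss

# Hypothetical data: two tight groups far apart on a line.
toy_points = [(0.0,), (2.0,), (10.0,), (12.0,)]
toy_labels = [0, 0, 1, 1]
ratio = explained_ratio(toy_points, toy_labels)
print(round(ratio * 100, 1), "%")
```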

> table(iris$Species, kmeans.result$cluster)
              1  2  3
  setosa      0 50  0
  versicolor  2  0 48
  virginica  36  0 14
> plot(iris2[c("Sepal.Length", "Sepal.Width")],col=kmeans.result$cluster)

> kmeans.result$centers
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     6.850000    3.073684     5.742105    2.071053
2     5.006000    3.428000     1.462000    0.246000
3     5.901613    2.748387     4.393548    1.433871
> kmeans.result$cluster
[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 1 3 3 3 3 3
[59] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 1 1 1 1 3 1 1 1 1 1 1 3 3 1
[117] 1 1 1 3 1 3 1 3 1 1 3 3 1 1 1 1 1 3 1 1 1 1 3 1 1 1 3 1 1 1 3 1 1 3
> centers<-kmeans.result$centers[kmeans.result$cluster,]
> distances<-sqrt(rowSums((iris2 - centers)^2))
> outliers<-order(distances,decreasing=TRUE)[1:5]
> print(outliers)
[1] 99 58 94 61 119
> print(iris2[outliers,])
    Sepal.Length Sepal.Width Petal.Length Petal.Width
99           5.1         2.5          3.0         1.1
58           4.9         2.4          3.3         1.0
94           5.0         2.3          3.3         1.0
61           5.0         2.0          3.5         1.0
119          7.7         2.6          6.9         2.3
> plot(iris2[,c("Sepal.Length","Sepal.Width")],pch="o",col=kmeans.result$cluster,cex=0.3)
> points(kmeans.result$centers[,c("Sepal.Length","Sepal.Width")],col=1:3,pch=8,cex=1.5)
> points(iris2[outliers,c("Sepal.Length","Sepal.Width")],pch="+",col=4,cex=1.5)
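
The outlier step of the program (distance of every observation to the center of its own cluster, then the largest distances picked with order) can be sketched in Python as below. This is only an illustration of the idea: the toy points, labels and centers are hypothetical, and top_outliers is a name chosen for this example.

```python
import math

def top_outliers(points, labels, centers, n):
    """Indices of the n points farthest from their own cluster center."""
    distances = [math.dist(p, centers[lab]) for p, lab in zip(points, labels)]
    # Sort the indices by decreasing distance, like order(..., decreasing=TRUE).
    order = sorted(range(len(points)), key=lambda i: distances[i], reverse=True)
    return order[:n]

# Hypothetical data: the third point lies far from the rest of cluster 0.
toy_points = [(1.0, 1.0), (1.1, 0.9), (3.0, 3.0), (8.0, 8.0), (8.1, 7.9)]
toy_labels = [0, 0, 0, 1, 1]
toy_centers = [(1.7, 1.63), (8.05, 7.95)]
print(top_outliers(toy_points, toy_labels, toy_centers, n=1))
```

Python indices start at 0, whereas the row numbers printed by R (99, 58, 94, 61, 119) start at 1.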

RESULT

Thus, the program is executed successfully and the output is verified.

Ex. No: 12 TEXT PREPROCESSING
Date:

AIM

To perform text preprocessing and to create a document-term matrix using the Reuters
dataset.

ALGORITHM

STEP 1: Open the R tool window.

STEP 2: Load the text mining package using library(tm).

STEP 3: Create a corpus of the text documents from the texts/crude folder of the tm package
and name it reuters.

STEP 4: Perform pre-processing steps like white space removal, lowercase conversion,
punctuation removal, number removal and stop word removal.

STEP 5: Create a document-term matrix using dtm <- DocumentTermMatrix(reuters) for the
reuters dataset and display it.

STEP 6: Run the program
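
The pre-processing chain of STEP 4 can be sketched in plain Python as follows. This is a rough equivalent for illustration, not what tm does internally: the stop-word set below is a tiny hypothetical list (tm's stopwords("english") is much longer), and the function name preprocess is an assumption. The sample sentence is taken from the first crude document.

```python
import re
import string

# Tiny illustrative stop-word list; an assumption for this sketch only.
STOP_WORDS = {"a", "the", "it", "its", "that", "had", "by", "for",
              "to", "of", "and", "in", "is"}

def preprocess(text, stop_words=STOP_WORDS):
    """Apply the same cleaning steps, in the same order, as the R program."""
    text = re.sub(r"\s+", " ", text).strip()                           # strip white space
    text = text.lower()                                                # lowercase
    text = text.translate(str.maketrans("", "", string.punctuation))   # remove punctuation
    text = re.sub(r"\d+", "", text)                                    # remove numbers
    tokens = [t for t in text.split() if t not in stop_words]          # remove stop words
    return " ".join(tokens)

sample = ("Diamond Shamrock Corp said that\n effective today it had cut its "
          "contract prices for crude oil by\n1.50 dlrs a barrel.")
cleaned = preprocess(sample)
print(cleaned)
```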

PROGRAM

> install.packages("tm")
> library(tm)
Loading required package: NLP
> cor <- system.file("texts","crude",package="tm")
> reuters <- Corpus(DirSource(cor))
> reuters
<<SimpleCorpus>>
Metadata: corpus specific: 1, document level (indexed): 0
Content: documents: 20
> inspect(reuters[1])
<<SimpleCorpus>>
Metadata: corpus specific: 1, document level (indexed): 0
Content: documents: 1
[Link]
<?xml version="1.0"?>\n<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN"
CGISPLIT="TRAINING-SET" OLDID="5670" NEWID="127">\n <DATE>26-
FEB-1987 17:00:56.04</DATE>\n <TOPICS>\n <D>crude</D>\n </TOPICS>\n
<PLACES>\n <D>usa</D>\n </PLACES>\n <PEOPLE/>\n
<ORGS/>\n
<EXCHANGES/>\n <COMPANIES/>\n <UNKNOWN>Y\n f0119 reute\nu f BC-
DIAMOND-SHAMROCK-(DIA 02-26 0097</UNKNOWN>\n
<TEXT>\n
<TITLE>DIAMOND SHAMROCK (DIA) CUTS CRUDE PRICES</TITLE>\n
<DATELINE>NEW YORK, FEB 26 -</DATELINE>\n <BODY>Diamond Shamrock
Corp said that\n effective today it had cut its contract prices for crude oil by\n1.50 dlrs
a barrel.\n The reduction brings its posted price for West Texas\nIntermediate to 16.00
dlrs a barrel, the copany said.\n "The price reduction today was made in the light
of falling\noil product prices and a weak crude oil market," a
company\nspokeswoman said.\n Diamond is the latest in a line of U.S. oil companies
that\nhave cut its contract, or posted, prices over the last two days\nciting weak oil
markets.\n Reuter</BODY>\n </TEXT>\n</REUTERS>
> reuters<-tm_map(reuters,stripWhitespace)
> writeLines(as.character(reuters[1]))
<?xml version="1.0"?> <REUTERS TOPICS="YES" LEWISSPLIT="TRAIN"
CGISPLIT="TRAINING-SET" OLDID="5670" NEWID="127"> <DATE>26-FEB-
1987 17:00:56.04</DATE> <TOPICS> <D>crude</D> </TOPICS> <PLACES>
<D>usa</D> </PLACES> <PEOPLE/> <ORGS/>
<EXCHANGES/>
<COMPANIES/> <UNKNOWN>Y f0119 reute u f BC-DIAMOND-SHAMROCK-
(DIA 02-26 0097</UNKNOWN> <TEXT> <TITLE>DIAMOND SHAMROCK
(DIA) CUTS CRUDE PRICES</TITLE> <DATELINE>NEW YORK, FEB 26
</DATELINE> <BODY>Diamond Shamrock Corp said that effective today it had cut
its contract prices for crude oil by 1.50 dlrs a barrel. The reduction brings its posted
price for West Texas Intermediate to 16.00 dlrs a barrel, the copany said. "The
price reduction today was made in the light of falling oil product prices and a weak
crude oil market," a company spokeswoman said. Diamond is the latest in a line
of U.S. oil companies that have cut its contract, or posted, prices over the last two days
citing weak oil markets. Reuter</BODY> </TEXT> </REUTERS>
list(language = "en")
list()
> reuters<-tm_map(reuters,content_transformer(tolower))
> writeLines(as.character(reuters[1]))
<?xml version="1.0"?> <reuters topics="yes" lewissplit="train" cgisplit="training-set"
oldid="5670" newid="127"> <date>26-feb-1987 17:00:56.04</date>
<topics>
<d>crude</d> </topics> <places> <d>usa</d> </places> <people/>
<orgs/>
<exchanges/> <companies/> <unknown>y f0119 reute u f bc-diamond-shamrock-(dia
02-26 0097</unknown> <text> <title>diamond shamrock (dia) cuts crude
prices</title> <dateline>new york, feb 26 -</dateline> <body>diamond shamrock corp
said that effective today it had cut its contract prices for crude oil by 1.50 dlrs a barrel.
the reduction brings its posted price for west texas intermediate to 16.00 dlrs a barrel,
the copany said. "the price reduction today was made in the light of falling oil
product prices and a weak crude oil market," a company spokeswoman said.
diamond is the latest in a line of u.s. oil companies that have cut its contract, or posted,
prices over the last two days citing weak oil markets. reuter</body> </text> </reuters>
list(language = "en")
list()
> reuters<-tm_map(reuters,removePunctuation)
> writeLines(as.character(reuters[1]))
xml version10 reuters topicsyes lewissplittrain cgisplittrainingset oldid5670 newid127
date26feb1987 17005604date topics dcruded topics places dusad places people orgs
exchanges companies unknowny f0119 reute u f bcdiamondshamrockdia 0226
0097unknown text titlediamond shamrock dia cuts crude pricestitle datelinenew york
feb 26 dateline bodydiamond shamrock corp said that effective today it had cut its
contract prices for crude oil by 150 dlrs a barrel the reduction brings its posted price for
west texas intermediate to 1600 dlrs a barrel the copany said quotthe price reduction
today was made in the light of falling oil product prices and a weak crude oil marketquota
company spokeswoman said diamond is the latest in a line of us oil companies that
have cut its contract or posted prices over the last two days citing weak oil markets
reuterbody text reuters
list(language = "en")

list()
> reuters<-tm_map(reuters,removeNumbers)
> writeLines(as.character(reuters[1]))
xml version reuters topicsyes lewissplittrain cgisplittrainingset oldid newid datefeb date
topics dcruded topics places dusad places people orgs exchanges companies unknowny f
reute u f bcdiamondshamrockdia unknown text titlediamond shamrock dia cuts crude
pricestitle datelinenew york feb dateline bodydiamond shamrock corp said that
effective today it had cut its contract prices for crude oil by dlrs a barrel the reduction
brings its posted price for west texas intermediate to dlrs a barrel the copany said
quotthe price reduction today was made in the light of falling oil product prices and a
weak crude oil marketquot a company spokeswoman said diamond is the latest in a line
of us oil companies that have cut its contract or posted prices over the last two days
citing weak oil markets reuterbody text reuters
list(language = "en")

list()
> reuters<-tm_map(reuters,removeWords,stopwords("english"))
> writeLines(as.character(reuters[1]))
xml version reuters topicsyes lewissplittrain cgisplittrainingset oldid newid datefeb date
topics dcruded topics places dusad places people orgs exchanges companies unknowny
f reute u f bcdiamondshamrockdia unknown text titlediamond shamrock dia cuts crude
pricestitle datelinenew york feb dateline bodydiamond shamrock corp said effective
today cut contract prices crude oil dlrs barrel reduction brings posted price west texas
intermediate dlrs barrel copany said quotthe price reduction today made light falling oil
product prices weak crude oil marketquot company spokeswoman said diamond latest
line us oil companies cut contract posted prices last two days citing weak oil markets
reuterbody text reuters
list(language = "en")
list()
> dtm<-DocumentTermMatrix(reuters)
> inspect(dtm)
<<DocumentTermMatrix (documents: 20, terms: 1134)>>
Non-/sparse entries: 2351/20329
Sparsity : 90%
Maximal term length: 22
Weighting : term frequency (tf)
Sample :
Terms

Docs mln oil opec orgs places prices reuters said text topics
[Link] 4 12 10 2 2 6 3 11 2 2
[Link] 4 7 7 2 2 4 2 10 2 2
[Link] 1 3 1 2 2 1 2 1 2 2
[Link] 0 3 2 2 2 2 2 3 2 2
[Link] 0 5 1 1 2 1 2 5 2 2
[Link] 3 9 7 2 2 9 2 7 2 2
[Link] 10 5 5 2 2 5 2 8 2 2
[Link] 3 5 0 1 2 2 2 2 2 2
[Link] 3 6 0 1 2 2 2 2 2 2
[Link] 0 3 0 1 2 3 2 4 2 2
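
A document-term matrix has one row per document and one column per term, and each cell holds the number of times the term occurs in the document. The following Python sketch builds such a matrix and its sparsity for three hypothetical toy documents; the function names are assumptions, and tm additionally applies its own defaults (for example a minimum word length) that this sketch does not reproduce.

```python
from collections import Counter

def document_term_matrix(docs):
    """Return (sorted vocabulary, one row of term counts per document)."""
    counts = [Counter(doc.split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows

def sparsity(rows):
    """Fraction of cells in the matrix that are zero."""
    cells = sum(len(r) for r in rows)
    zeros = sum(1 for r in rows for v in r if v == 0)
    return zeros / cells

# Hypothetical, already pre-processed documents.
toy_docs = ["oil prices cut oil", "opec oil meeting", "prices opec opec"]
vocab, rows = document_term_matrix(toy_docs)
print(vocab)
for row in rows:
    print(row)
print(sparsity(rows))
```

In the printout above, 20329 of the 20 x 1134 = 22680 cells are zero, which gives the reported sparsity of 90%.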

> findFreqTerms(dtm, 5)
[1] "barrel" "cgisplittrainingset" "companies" "company" "contract"
[6] "crude" "date" "datefeb" "dateline" "datelinenew"
[11] "dcruded" "dlrs" "dusad" "exchanges" "feb"
[16] "last" "lewissplittrain" "markets" "newid" "oil"
[21] "oldid" "orgs" "people" "places" "posted"
[26] "price" "prices" "reute" "reuterbody" "reuters"
[31] "said" "text" "today" "topics" "topicsyes"
[36] "unknown" "unknowny" "version" "west" "xml"
[41] "york" "ability" "agreement" "analysts" "april"

[46] "bpd" "december" "demand" "dopecd" "emergency"
[51] "energy" "hold" "industry" "march" "market"
[56] "may" "meet" "meeting" "mln" "new"
[61] "now" "one" "opec" "output" "petroleum"
[66] "production" "quota" "research" "sell" "sources"
[71] "will" "world" "pct" "present" "reserve"
[76] "reserves" "study" "year" "ali" "also"
[81] "arab" "barrels" "ceiling" "daily" "datemar"
[86] "expected" "gulf" "international" "kuwait" "members"
[91] "minister" "month" "official" "plans" "qatar"
[96] "quoted" "recent" "says" "sheikh" "traders"
[101] "united" "unknownrm" "week" "economic" "economy"
[106] "exports" "government" "group" "help" "imports"
[111] "report" "states" "dsaudiarabiad" "abdulaziz" "billion"
[116] "budget" "expenditure" "riyals" "accord" "agency"
[121] "arabia" "commitment" "nazer" "saudi" "february"
[126] "january" "exchange" "futures" "nymex"
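
findFreqTerms(dtm, 5) lists the terms whose total count over all documents is at least 5. A minimal Python sketch of the same idea on a hypothetical toy matrix (the name find_freq_terms and the data are assumptions made for this example):

```python
def find_freq_terms(vocab, rows, lowfreq):
    """Terms whose column total in the document-term matrix is >= lowfreq."""
    totals = [sum(col) for col in zip(*rows)]
    return [term for term, total in zip(vocab, totals) if total >= lowfreq]

# Hypothetical document-term matrix: 3 documents, 5 terms.
toy_vocab = ["cut", "meeting", "oil", "opec", "prices"]
toy_rows = [[1, 0, 2, 0, 1],
            [0, 1, 1, 1, 0],
            [0, 0, 0, 2, 1]]
print(find_freq_terms(toy_vocab, toy_rows, lowfreq=3))
```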
> findAssocs(dtm, "opec", 0.7)
$opec
  meeting     named    dopecd emergency       oil      said    buyers      orgs    prices
     0.86      0.83      0.82      0.82      0.81      0.78      0.78      0.77      0.77
 analysts agreement   however      sell   ability   clearly      late   reports    trying
     0.77      0.74      0.74      0.73      0.72      0.72      0.72      0.72      0.72
   winter       bpd
     0.72      0.70
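
findAssocs(dtm, "opec", 0.7) reports the terms whose counts across the documents correlate with those of "opec" at 0.7 or more. The sketch below illustrates this with the Pearson correlation between columns of a hypothetical toy matrix; the function names and the data are assumptions made for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def find_assocs(vocab, rows, term, corlimit):
    """Terms whose column correlates with the column of term at >= corlimit."""
    cols = dict(zip(vocab, zip(*rows)))
    target = cols[term]
    scores = {t: round(pearson(target, c), 2) for t, c in cols.items() if t != term}
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return {t: s for t, s in ranked if s >= corlimit}

# Hypothetical document-term matrix: 4 documents, 3 terms.
toy_vocab = ["meeting", "oil", "opec"]
toy_rows = [[2, 0, 2],
            [0, 2, 0],
            [1, 1, 1],
            [0, 3, 0]]
assocs = find_assocs(toy_vocab, toy_rows, "opec", 0.7)
print(assocs)
```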

RESULT

Thus, the program is executed successfully and the output is verified.

Ex. No: 13 VISUALIZING MAXIMUM CONFIRMED, CURED COVID-19 PATIENTS IN COVID-19 DATASET
Date:

AIM

To visualize the Covid-19 dataset using Tableau and find the cured cases, death rate, and
confirmed cases.

ALGORITHM

STEP 1: Start the process.

STEP 2: Open Tableau and import the Covid-19 dataset.

STEP 3: In the worksheet choose the dimension and measure as State and Cured to find the
state with the maximum number of people cured of Covid-19.

STEP 4: In the next worksheet choose the dimension and measure as Month and Death Rate to
find the month in 2021 with the highest death rate.

STEP 5: In the next worksheet choose the dimension and measure as State and Confirmed Cases
to find the state with the maximum number of confirmed Covid-19 cases.

STEP 6: Stop the process.
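
The aggregation behind the State and Cured worksheet can be sketched in Python as below. This is only an illustration of a "maximum per group" calculation: the records are hypothetical, the field names State and Cured are taken from the steps above, and max_by_group is a name chosen for this example.

```python
def max_by_group(records, group_key, value_key):
    """Largest value_key seen for each distinct group_key."""
    best = {}
    for r in records:
        group, value = r[group_key], r[value_key]
        best[group] = max(best.get(group, value), value)
    return best

# Hypothetical records: one row per state and reporting date.
toy_rows = [
    {"State": "State A", "Cured": 100},
    {"State": "State A", "Cured": 250},
    {"State": "State B", "Cured": 180},
]
per_state = max_by_group(toy_rows, "State", "Cured")
top_state = max(per_state, key=per_state.get)
print(per_state, top_state)
```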

OUTPUT

Report generated based on State and Cured

RESULT

Thus, the program is executed and the output is verified successfully.

Ex. No: 14 VISUALIZING THE DIFFERENCE IN VOLUME YEAR-WISE AND QUARTER-WISE
Date:

AIM

To create a Tableau chart using the stock dataset 2010-13 that visualizes the percent
difference in volume for each company by year and quarter, and to find in how many quarters
the company Biogen Idec shows a positive percent difference in volume.

ALGORITHM

STEP 1: Start the process.

STEP 2: Open Tableau and import the dataset.

STEP 3: Drag and drop the stock 2010-2013 table.

STEP 4: In the worksheet choose the dimensions as Company and Date.

STEP 5: In the same worksheet choose the measure as Volume and apply the Percent Difference
quick table calculation to it.

STEP 6: Right-click the date and select the quarter option.

STEP 7: Stop the process.
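
The percent difference shown in the chart can be illustrated with a short Python sketch. It assumes the difference is taken relative to the previous quarter, i.e. (current - previous) / |previous| x 100; the quarterly volumes are hypothetical and percent_difference is a name chosen for this example.

```python
def percent_difference(values):
    """Percent change of each value from the one before it (None for the first)."""
    result = [None]
    for prev, cur in zip(values, values[1:]):
        if prev == 0:
            # The change relative to a zero volume is undefined.
            result.append(None)
        else:
            result.append((cur - prev) * 100 / abs(prev))
    return result

# Hypothetical quarterly volumes for one company, in chronological order.
quarterly_volume = [200.0, 250.0, 225.0, 270.0]
diffs = percent_difference(quarterly_volume)
positive_quarters = sum(1 for d in diffs if d is not None and d > 0)
print(diffs, positive_quarters)
```

Counting the positive entries, as in the last lines, corresponds to counting the quarters in which a company shows a positive percent difference.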

OUTPUT

Report generated based on Volume, Date and Company

RESULT

Thus, the program is executed and the output is verified successfully.

Ex. No: 15 VISUALIZING DELAY OF FLIGHTS
Date:

AIM

To use the flights table to create a bar chart showing the average minutes of delay per
flight, broken down by carrier name and filtered by state to show only Minnesota (MN).

ALGORITHM

STEP 1: Start the process.

STEP 2: Open Tableau and import the dataset.

STEP 3: Choose the flight dataset.

STEP 4: In the worksheet choose the dimensions as carrier name and state, and the measure as
minutes of delay per flight.

STEP 5: Select State attribute -> Filter -> Select MN from the list -> Ok.

STEP 6: Display the output.

STEP 7: Stop the process.
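
The filter and the average used in this chart can be sketched in Python as follows. The flight records and carrier names are hypothetical, the field names follow the steps above, and average_by is a name chosen for this example.

```python
from collections import defaultdict

def average_by(records, group_key, value_key):
    """Average of value_key within each distinct group_key."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        sums[r[group_key]] += r[value_key]
        counts[r[group_key]] += 1
    return {group: sums[group] / counts[group] for group in sums}

# Hypothetical flight records.
flights = [
    {"Carrier Name": "Carrier X", "State": "MN", "Minutes of Delay per Flight": 10.0},
    {"Carrier Name": "Carrier X", "State": "MN", "Minutes of Delay per Flight": 20.0},
    {"Carrier Name": "Carrier Y", "State": "MN", "Minutes of Delay per Flight": 30.0},
    {"Carrier Name": "Carrier Y", "State": "TX", "Minutes of Delay per Flight": 90.0},
]

# Filter first (State = MN), then average per carrier.
mn_only = [f for f in flights if f["State"] == "MN"]
avg_delay = average_by(mn_only, "Carrier Name", "Minutes of Delay per Flight")
print(avg_delay)
```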

OUTPUT

Report generated based on Carrier Name, State and Minutes of Delay per Flight

RESULT

Thus, the program is executed and the output is verified successfully.

Ex. No: 16 VISUALIZING THE SALES OF THE PRODUCT IN BAKERY DATASET
Date:

AIM

To use the bakery dataset to summarize the sales of products for each day of the week and
to find the product which has the maximum sales on Monday using Tableau.

ALGORITHM

STEP 1: Start the process.

STEP 2: Open Tableau and import the bakery dataset.

STEP 3: In the worksheet choose the dimensions as items and weekday(date), and the measure as
transactions.

STEP 4: Select Weekday attribute -> Filter -> Select Monday from the list -> Ok.

STEP 5: Display the output.

STEP 6: Stop the process.

OUTPUT

Report generated based on Weekday, Item and Transaction

RESULT

Thus, the program is executed and the output is verified successfully.

Ex. No: 17 VISUALIZING MONTHLY CHANGES IN STOCKS DATASET
Date:

AIM

To create a chart to visualize the monthly change in the volume of stocks from the beginning
of 2010 to the end of 2013 and to find the two consecutive months that have the least
fluctuation (increase or decrease).

ALGORITHM

STEP 1: Start the process.

STEP 2: Open Power BI and import the Stock dataset.

STEP 3: Choose the chart and set the x-axis to date (year, month) and the y-axis to
volume (min) to find the months with the least fluctuation.

STEP 4: On the next page set the x-axis to date (year, month) and the y-axis to
volume (min), and choose any two months and years in the filter type to compare them.

STEP 5: Stop the process.
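
Finding the two consecutive months with the least fluctuation amounts to comparing the absolute month-to-month changes. A minimal Python sketch, assuming the monthly volumes are already aggregated and in chronological order; the values and the name least_fluctuation are hypothetical.

```python
def least_fluctuation(monthly):
    """Return (absolute change, first month, second month) for the
    pair of consecutive months whose volumes differ the least."""
    changes = [(abs(b[1] - a[1]), a[0], b[0])
               for a, b in zip(monthly, monthly[1:])]
    return min(changes)

# Hypothetical (month, volume) pairs in chronological order.
toy_monthly = [("2010-01", 500), ("2010-02", 650),
               ("2010-03", 640), ("2010-04", 700)]
print(least_fluctuation(toy_monthly))
```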

OUTPUT

RESULT

Thus the above program is executed successfully and the output is verified.

Ex. No: 18 VISUALIZING PRODUCT-WISE SALES USING SUPERSTORE DATASET
Date:

AIM

To visualize the Superstore dataset using Power BI and find the product that
produces the maximum sales.

ALGORITHM

STEP 1: Start the process.

STEP 2: Open Power BI Desktop and import the Superstore dataset.

STEP 3: Select and load the Orders, Return and People tables.

STEP 4: In Page three create a Bar Chart. Drag and drop the Product Name field to the X-axis
and the Sales field to the Y-axis. Choose Maximum from the drop-down list to view the
Maximum Sales of each Product.

STEP 5: Stop the process.

OUTPUT

RESULT

Thus the above program is executed successfully and the output is verified.

Ex. No: 19 VISUALIZING REGION-WISE AND CATEGORY-WISE PROFIT USING SUPERSTORE DATASET
Date:

AIM

To visualize the superstore dataset using Power BI and find the region that produces
the maximum profit.

ALGORITHM

STEP 1: Start the process.

STEP 2: Open Power BI Desktop and import the Superstore dataset.

STEP 3: Select and Load the Orders, Return and People Tables.

STEP 4: In Page One create a Pie Chart which displays the Maximum Sales, and in the filter
add the condition to display the maximum sales of Product Names that start with the letter A.
Drag and drop the Product Name field to Legend to view the sales of each product.

STEP 5: In Page two create a Funnel Chart. Drag and drop the Profit field from the table to
the X-axis and the Region field to the Y-axis. Choose Maximum from the drop-down list to view
the Maximum Profit in each region.

STEP 6: In Page three create a Bar Chart. Drag and drop the Product Name field to the X-axis
and the Sales field to the Y-axis. Choose Maximum from the drop-down list to view the
Maximum Sales of each Product.

STEP 7: In Page four create a dashboard view of all the charts and create a Q&A from
Visualization -> Build Visualization -> Select Q&A. In the search box type the query to get
the answer.

STEP 8: Stop the process.

OUTPUT

RESULT

Thus, the Superstore dataset is executed using Power BI and the output is verified
successfully.

Ex. No: 20 POWER QUERY OPERATIONS USING WALMART AND MASTERS DATASET
Date:

AIM

To perform Power Query operations like append, merge, custom column and conditional
column on the Walmart and the Masters datasets in Power BI.

ALGORITHM

STEP 1: Start the process.

STEP 2: Open Power BI, then load and transform the data.

STEP 3: Combine the given year-wise data using Append Queries.

STEP 4: Bring the Masters data into the combined data using Merge Queries.

STEP 5: Change the data type to whole number for the Temperature, Fuel_price, CPI and
Unemployment columns.

STEP 6: Calculate Price using a custom column (Add Column -> Custom Column -> Formula
= weeksales ÷ Quantity).

STEP 7: Check if the date is a holiday using the Holiday Flag column and bring the weekday for
Holiday Flag = 1 (Add Column -> Conditional Column -> Condition).

STEP 8: Stop the process.
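
The four operations (append, merge, custom column, conditional column) can be illustrated in Python on small hypothetical tables. This sketch only mirrors the logic of the steps above and is not Power Query code: the column names Store, Date and Holiday_Flag, the sample values and the helper names are assumptions, while weeksales and Quantity follow the formula in STEP 6.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def append_tables(*tables):
    """Append: stack the rows of several tables with the same columns."""
    return [dict(row) for table in tables for row in table]

def merge_left(left, right, key):
    """Merge: add the columns of the matching right row to each left row."""
    lookup = {row[key]: row for row in right}
    merged = []
    for row in left:
        extra = lookup.get(row[key], {})
        merged.append({**row, **{k: v for k, v in extra.items() if k != key}})
    return merged

# Hypothetical year-wise sales tables and a Masters table.
sales_2010 = [{"Store": 1, "Date": date(2010, 2, 12), "weeksales": 1000.0, "Holiday_Flag": 1}]
sales_2011 = [{"Store": 2, "Date": date(2011, 2, 18), "weeksales": 900.0, "Holiday_Flag": 0}]
masters = [{"Store": 1, "Quantity": 50}, {"Store": 2, "Quantity": 30}]

combined = merge_left(append_tables(sales_2010, sales_2011), masters, "Store")
for row in combined:
    # Custom column: Price = weeksales / Quantity.
    row["Price"] = row["weeksales"] / row["Quantity"]
    # Conditional column: the weekday name only where Holiday_Flag is 1.
    is_holiday = row["Holiday_Flag"] == 1
    row["Holiday_Weekday"] = WEEKDAYS[row["Date"].weekday()] if is_holiday else None

for row in combined:
    print(row)
```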

OUTPUT

a) Combine the year-wise data using append queries

b) Merge the Masters data with the Walmart data

c) Change the Temperature, Fuel_Price, CPI and Unemployment data types to whole number

d) Calculate the price using a custom column

e) Check if the date is a holiday using Holiday Flag = 1 and bring the weekday using a
conditional column

RESULT

Thus the above program is executed successfully and the output is verified.
