Data Mining Lab Programs 2023-24


College of Excellence, 2022-4th Rank

Autonomous and Affiliated to Bharathiar University

Accredited with A++ grade by NAAC, An ISO 9001: 2015 Certified Institution
Peelamedu, Coimbatore-641004

DEPARTMENT OF COMPUTER SCIENCE (PG)

DATA MINING LAB (MCS22P5)

2023-2024
DATA MINING LAB (MCS21P5)

REGISTER NUMBER:

Certified that this is a bonafide record work done by of

II M.Sc. (Computer Science) during the year 2023-2024.

FACULTY INCHARGE HEAD OF THE DEPARTMENT

Submitted for the practical examination held on at


PSGR Krishnammal College for Women, Coimbatore.

INTERNAL EXAMINER EXTERNAL EXAMINER


INDEX

S. No.  Topics

PYTHON PROGRAMS

1   Linear Regression
2   Decision Tree
3   Naive Bayes
4   Support Vector Machine
5   Comparison of KNN, SVM, Decision Tree, Logistic Regression, Naïve Bayes
6   BIRCH Clustering

R PROGRAMS

7   Data Exploration and Visualization
8   Linear Regression
9   Association Rule Mining
10  Classification using SVM
11  K-Means Clustering and Outliers Detection
12  Text Preprocessing

DATA VISUALIZATION USING TABLEAU

13  Visualizing maximum Confirmed, Cured Covid-19 patients in Covid-19 Dataset
14  Visualizing the difference in Volume Year-Wise and Quarter-Wise
15  Visualizing Delay of Flights
16  Visualizing the Sales of the Product in Bakery Dataset

DATA VISUALIZATION USING POWER BI

17  Visualizing Monthly Changes in Stock Dataset
18  Visualizing Product-Wise Sales using Superstore Dataset
19  Visualizing Region-Wise and Category-Wise Profit using Superstore Dataset
20  Power Query Operations using Walmart and Masters Dataset

Ex. No: 01
LINEAR REGRESSION
Date:

AIM

To write a Python program to perform linear regression using diabetes dataset.

ALGORITHM

STEP 1: Start the process.

STEP 2: Open Jupyter Notebook and create new Python3 Notebook.

STEP 3: Import the packages matplotlib.pyplot and numpy.

STEP 4: Import datasets, linear_model from sklearn and mean_squared_error, r2_score from
sklearn.metrics.

STEP 5: Load the diabetes dataset using datasets.load_diabetes( ) and use one feature.

STEP 6: Split the data and target into training, testing set.

STEP 7: Create linear regression object using linear_model.LinearRegression( ).

STEP 8: Train the model using training dataset and make predictions using testing set by using
predict( ).

STEP 9: Print coefficients, variance score and mean-squared error using coef_, r2_score( ) and
mean_squared_error( ).

STEP 10: Plot the graph using plt.scatter( ) and plt.plot( ), and display it using plt.show( ).

STEP 11: Save the program and run it.

STEP 12: Stop the process.
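
The quantities printed in STEP 9 can also be worked out by hand. The following is a minimal sketch in plain Python, assuming a single input feature; the helper names (fit_line, mse, r_squared) and the toy numbers are illustrative only and are not part of the lab program or of scikit-learn.

```python
# Minimal sketch: one-feature least-squares regression without any library.
# The toy data below is hypothetical; it is NOT the diabetes data set.

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


x_train, y_train = [1.0, 2.0, 3.0, 4.0], [3.1, 4.9, 7.2, 8.8]
x_test, y_test = [5.0, 6.0], [11.0, 13.1]

slope, intercept = fit_line(x_train, y_train)
y_pred = [slope * x + intercept for x in x_test]

print("Coefficient: %.2f" % slope)
print("Mean squared error: %.2f" % mse(y_test, y_pred))
print("Variance score: %.2f" % r_squared(y_test, y_pred))
```

Printing with %.2f instead of %d keeps the fractional part of each value, which matters for the variance score because it normally lies between 0 and 1.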

PROGRAM

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Load the diabetes dataset and keep a single feature (column 2)
diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]

# Hold out the last 20 samples as the testing set
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

# Train the model and predict on the testing set
regr = linear_model.LinearRegression()
regr.fit(diabetes_X_train, diabetes_y_train)
diabetes_y_pred = regr.predict(diabetes_X_test)

# %d prints only the integer part of each value
print('Coefficients:%d' % regr.coef_[0])
print("Mean squared error:%d" % mean_squared_error(diabetes_y_test, diabetes_y_pred))
print('Variance score:%d' % r2_score(diabetes_y_test, diabetes_y_pred))

# Test points in green, fitted line in blue
plt.scatter(diabetes_X_test, diabetes_y_test, color='green')
plt.plot(diabetes_X_test, diabetes_y_pred, color='blue', linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()

OUTPUT

Coefficients:938
Mean squared error:2548
Variance score:0

RESULT

Thus, the above Python program to perform linear regression on the diabetes dataset
has been executed successfully and the output is verified.

Ex. No: 02
DECISION TREE
Date:

AIM

To write a Python program to classify the samples as man and woman based on the
attributes of height and length of hair using decision tree classifier.

ALGORITHM

STEP 1: Start the process

STEP 2: Open Jupyter Notebook and create new Python3 notebook

STEP 3: Import tree from sklearn and train_test_split from sklearn.model_selection

STEP 4: Assign X as the input features namely height and length of hair

STEP 5: Assign Y as the target variable namely woman and man

STEP 6: Split the dataset into training set and test set using train_test_split()

STEP 7: Create a decision tree classifier object and fit the model using the training set

STEP 8: Predict the target for test data

STEP 9: Save and run the program

STEP 10: Stop the process
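
How a decision tree picks a split can be illustrated on one feature. The sketch below assumes the Gini impurity criterion (the default of scikit-learn's DecisionTreeClassifier) and tries every midpoint between sorted heights; the helper names gini and best_threshold are hypothetical, and only the first six height values of the exercise are used.

```python
# Minimal sketch: choosing a split threshold on one feature by Gini impurity.

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Return (threshold, weighted Gini) of the best 'value <= threshold' split."""
    distinct = sorted(set(values))
    best = (None, gini(labels))  # start from the impurity of the unsplit node
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# First six samples of the exercise: height only
heights = [165, 175, 136, 174, 141, 176]
labels = ['Man', 'Woman', 'Woman', 'Man', 'Woman', 'Man']

print(gini(labels))                     # impurity before splitting
print(best_threshold(heights, labels))  # best height threshold and its impurity
```

A full tree repeats this search over every feature and then recurses into the two child nodes until the nodes are pure or a stopping rule applies.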

PROGRAM

from sklearn import tree
from sklearn.model_selection import train_test_split

# Each sample is [height, length of hair]
X = [[165, 19], [175, 32], [136, 35], [174, 65], [141, 28], [176, 15], [131, 32],
     [166, 6], [128, 32], [179, 10], [136, 34], [186, 2], [126, 25], [176, 28],
     [112, 38], [169, 9], [171, 36], [116, 25], [196, 25], [196, 38], [126, 40],
     [197, 20], [150, 25], [140, 32], [136, 35]]
Y = ['Man', 'Woman', 'Woman', 'Man', 'Woman', 'Man', 'Woman', 'Man', 'Woman', 'Man',
     'Woman', 'Man', 'Woman', 'Woman', 'Woman', 'Man', 'Woman', 'Woman', 'Man',
     'Woman', 'Woman', 'Man', 'Man', 'Woman', 'Woman']
data_feature_names = ['height', 'length of hair']

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=10)

DTclf = tree.DecisionTreeClassifier()
# As written, the classifier is fitted on the complete data (X, Y)
DTclf1 = DTclf.fit(X, Y)
prediction = DTclf1.predict([[140, 37]])
print(prediction)

OUTPUT

['Woman']

RESULT

Thus, the program is executed successfully and the output is verified.

Ex. No: 03
NAIVE BAYES
Date:

AIM

To write a Python program to implement Naïve Bayes classification using Breast
Cancer dataset.

ALGORITHM

STEP 1: Start the process.

STEP 2: Open Jupyter Notebook and create new Python3 Notebook.

STEP 3: Import the numpy, pandas, io packages and train_test_split, preprocessing from
sklearn package.

STEP 4: Load the Breast cancer dataset using read_csv.

STEP 5: Split out the validation dataset using train_test_split() function.

STEP 6: Create a model using Gaussian Naïve Bayes algorithm.

STEP 7: Fit the model with the training dataset.

STEP 8: Make the predictions using the validation datasets.

STEP 9: Print the classification report for the expected and predicted data.

STEP 10: Print the confusion matrix for the expected and predicted data.

STEP 11: Stop the process.
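
What STEP 6 to STEP 8 do internally can be sketched in plain Python. This is only an illustration of the Gaussian Naïve Bayes idea (per-class mean and variance for each feature, then the class with the highest log-posterior wins); the function names, the small variance floor of 1e-9 and the toy samples are assumptions, not scikit-learn's exact implementation or the Breast Cancer data.

```python
import math

# Minimal sketch of Gaussian Naive Bayes on hypothetical toy data.

def fit_gaussian_nb(X, y):
    """Return {label: (prior, feature means, feature variances)}."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small floor (assumed value) avoids division by zero for constant features
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(y), means, variances)
    return model


def predict_one(model, x):
    """Return the label with the largest log prior + log likelihood."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


X_demo = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y_demo = ['a', 'a', 'a', 'b', 'b', 'b']

model = fit_gaussian_nb(X_demo, y_demo)
print(predict_one(model, [1.1, 2.0]))
print(predict_one(model, [5.1, 6.1]))
```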

PROGRAM

import numpy as np
import pandas as pd
from pandas import read_csv
import io
from sklearn import metrics, preprocessing
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load the dataset and separate the features from the class column
path = "D:\\Datasets\\[Link]"
names = ['A1','A2','A3','A4','A5','A6','A7','A8','A9','class']
dataset = read_csv(path, names=names)
array = dataset.values
print(array)
X = array[:,0:8]
print(X)
y = array[:,9]

# Keep 20% of the rows for validation
X_train, X_validation, Y_train, Y_validation = train_test_split(X, y, test_size=0.20,
    random_state=1, shuffle=True)

# Fit Gaussian Naive Bayes and evaluate on the validation set
model = GaussianNB()
model.fit(X_train, Y_train)
print(model)
expected = Y_validation
predicted = model.predict(X_validation)
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))

OUTPUT

RESULT

Thus, the program is executed successfully and the output is verified.

Ex. No: 04
SUPPORT VECTOR MACHINE
Date:

AIM
To write a Python program to implement Support Vector Machine using wine quality
dataset

ALGORITHM
STEP 1: Start the process
STEP 2: Open Jupyter notebook and create Python3 notebook
STEP 3: Import read_csv from pandas, train_test_split from sklearn.model_selection,
classification_report, confusion_matrix from sklearn.metrics and SVC from sklearn.svm
STEP 4: Load wine quality dataset using read_csv( ) and split out validation dataset using
train_test_split( )
STEP 5: Create the model using SVC( ), fit it on the training set, make predictions on the
validation set and summarize the fit of the model
STEP 6: Print classification report and confusion matrix using expected and predicted values
STEP 7: Save the program and run it
STEP 8: Stop the process

PROGRAM

from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.svm import SVC

# Load the dataset; the last column holds the class label
path = "C:\\Users\\2msccs02\\Downloads\\[Link]"
names = ['A1','A2','A3','A4','A5','A6','A7','A8','A9','A10','A11','A12','A13','class']
dataset = read_csv(path, names=names)
array = dataset.values
print(array)
X = array[:,0:12]
print(X)
Y = array[:,13]

# Keep 20% of the rows for validation
X_train, X_validation, Y_train, Y_validation = train_test_split(X, Y, test_size=0.20,
    random_state=1, shuffle=True)

# Fit the support vector classifier and evaluate on the validation set
model = SVC()
model.fit(X_train, Y_train)
print(model)
expected = Y_validation
predicted = model.predict(X_validation)
print(classification_report(expected, predicted))
print(confusion_matrix(expected, predicted))

OUTPUT

[[1.423e+01 1.710e+00 2.430e+00 ... 3.920e+00 1.065e+03 1.000e+00]
 [1.320e+01 1.780e+00 2.140e+00 ... 3.400e+00 1.050e+03 1.000e+00]
 [1.316e+01 2.360e+00 2.670e+00 ... 3.170e+00 1.185e+03 1.000e+00]
 ...
 [1.327e+01 4.280e+00 2.260e+00 ... 1.560e+00 8.350e+02 3.000e+00]
 [1.317e+01 2.590e+00 2.370e+00 ... 1.620e+00 8.400e+02 3.000e+00]
 [1.413e+01 4.100e+00 2.740e+00 ... 1.600e+00 5.600e+02 3.000e+00]]
[[14.23  1.71  2.43 ...  5.64  1.04  3.92]
 [13.2   1.78  2.14 ...  4.38  1.05  3.4 ]
 [13.16  2.36  2.67 ...  5.68  1.03  3.17]
 ...
 [13.27  4.28  2.26 ... 10.2   0.59  1.56]
 [13.17  2.59  2.37 ...  9.3   0.6   1.62]
 [14.13  4.1   2.74 ...  9.2   0.61  1.6 ]]
SVC()
              precision    recall  f1-score   support

         1.0       0.53      0.71      0.61        14
         2.0       0.53      0.69      0.60        13
         3.0       0.00      0.00      0.00         9

    accuracy                           0.53        36
   macro avg       0.35      0.47      0.40        36
weighted avg       0.40      0.53      0.45        36

[[10  4  0]
 [ 4  9  0]
 [ 5  4  0]]
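
The per-class figures in the classification report follow directly from the confusion matrix printed above. The sketch below is a hand calculation in plain Python, assuming the scikit-learn convention that rows are true classes and columns are predicted classes; the function names are illustrative. It gives a precision of 0.53 and recall of 0.71 for class 1, and an overall accuracy of 0.53.

```python
# Minimal sketch: precision, recall, F1 and accuracy from a confusion matrix.

def per_class_metrics(matrix):
    """matrix[i][j] = number of samples of true class i predicted as class j."""
    n = len(matrix)
    results = []
    for k in range(n):
        true_positive = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column total
        actual_k = sum(matrix[k])                          # row total
        precision = true_positive / predicted_k if predicted_k else 0.0
        recall = true_positive / actual_k if actual_k else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        results.append((precision, recall, f1))
    return results


def accuracy(matrix):
    """Share of samples on the main diagonal."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[k][k] for k in range(len(matrix))) / total


# Confusion matrix taken from the OUTPUT of this exercise
wine_matrix = [[10, 4, 0],
               [4, 9, 0],
               [5, 4, 0]]

for label, (p, r, f) in zip((1.0, 2.0, 3.0), per_class_metrics(wine_matrix)):
    print("%.1f  precision=%.2f  recall=%.2f  f1=%.2f" % (label, p, r, f))
print("accuracy=%.2f" % accuracy(wine_matrix))
```

The third column of the matrix is all zeros, so the model never predicts class 3; that is why its precision, recall and F1 are 0.00.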

RESULT

Thus, the program is executed successfully and the output is verified.

Ex. No: 05
COMPARISON OF KNN, SVM, DECISION TREE, LOGISTIC REGRESSION, NAÏVE BAYES
Date:

AIM
To write a Python program to compare classification algorithms on the iris dataset

ALGORITHM
STEP 1: Start the process
STEP 2: Open Jupyter Notebook and create new Python3 notebook
STEP 3: Import read_csv from pandas and pyplot from matplotlib
STEP 4: Import train_test_split, cross_val_score and StratifiedKFold from
sklearn.model_selection
STEP 5: Import LogisticRegression from sklearn.linear_model, DecisionTreeClassifier from
sklearn.tree, KNeighborsClassifier from sklearn.neighbors, GaussianNB from sklearn.naive_bayes
and SVC from sklearn.svm
STEP 6: Load iris dataset and their column names using read_csv( )
STEP 7: Split out validation dataset using train_test_split( )
STEP 8: Create models using LogisticRegression( ), KNeighborsClassifier( ), GaussianNB( ),
DecisionTreeClassifier( ), SVC( )
STEP 9: Check algorithms and evaluate each model in turn using cross_val_score( )
STEP 10: Compare the algorithms using pyplot.boxplot( ) and display the graph using
pyplot.show( )
STEP 11: Save and run the program
STEP 12: Stop the process
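
The idea behind the stratified folds used in STEP 9 can be sketched in plain Python. This simplified version assumes no shuffling (the program itself shuffles with a fixed random_state) and uses hypothetical helper names; it only shows that every fold receives roughly the same class mix, and how the mean and standard deviation reported for each model are obtained from the per-fold accuracies.

```python
import math
from collections import defaultdict

# Minimal sketch: stratified fold assignment and score summary.

def stratified_folds(labels, n_splits):
    """Deal the sample indices of each class round-robin into n_splits folds."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds


def mean_and_std(scores):
    """Mean and population standard deviation (the NumPy default) of the scores."""
    mean = sum(scores) / len(scores)
    variance = sum((s - mean) ** 2 for s in scores) / len(scores)
    return mean, math.sqrt(variance)


labels = ['a'] * 6 + ['b'] * 3          # hypothetical, imbalanced labels
for fold in stratified_folds(labels, 3):
    print(fold, [labels[i] for i in fold])

print(mean_and_std([1.0, 0.5]))
```

Each fold is used once as the held-out part while the model is trained on the remaining folds, so ten splits yield ten accuracy values per model.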

PROGRAM

from pandas import read_csv
from matplotlib import pyplot
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Load the iris dataset
path = "C:/Users/2msccs02/Downloads/Datasets/[Link]"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = read_csv(path, names=names)
array = dataset.values
X = array[:,0:4]
y = array[:,4]
X_train, X_validation, Y_train, Y_validation = train_test_split(X, y, test_size=0.20,
    random_state=1, shuffle=True)

# Models to compare
models = []
models.append(('LR', LogisticRegression(solver='liblinear', multi_class='ovr')))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC(gamma='auto')))

# Evaluate each model with 10-fold stratified cross-validation
results = []
names = []
for name, model in models:
    kfold = StratifiedKFold(n_splits=10, random_state=1, shuffle=True)
    cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring='accuracy')
    results.append(cv_results)
    names.append(name)
    print('%s: %f (%f)' % (name, cv_results.mean(), cv_results.std()))

# Box plot of the accuracy distribution of each model
pyplot.boxplot(results, labels=names)
pyplot.title('ALGORITHM Comparison')
pyplot.show()

OUTPUT

LR: 0.941667 (0.065085)
KNN: 0.958333 (0.041667)
CART: 0.941667 (0.053359)
NB: 0.950000 (0.055277)
SVM: 0.983333 (0.033333)

RESULT

Thus, the program is executed successfully and the output is verified.

Ex. No: 06
BIRCH CLUSTERING
Date:

AIM

To write a Python program to perform clustering using BIRCH algorithm

ALGORITHM

STEP 1: Start the process

STEP 2: Open Jupyter Notebook and create a new Python3 notebook

STEP 3: Import unique, where from numpy, make_classification from sklearn.datasets, Birch
from sklearn.cluster and pyplot from matplotlib

STEP 4: Create random samples using make_classification()

STEP 5: Create a model using BIRCH

STEP 6: Fit the model

STEP 7: Predict the cluster and assign it to yhat

STEP 8: Retrieve the unique clusters from yhat

STEP 9: Create the scatter for the unique cluster samples

STEP 10: Plot the scatter plot using pyplot.scatter() function

STEP 11: Display the plot using pyplot.show() function

STEP 12: Stop the process.
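
BIRCH does not keep every sample; it summarises groups of points as clustering features (N, LS, SS). The sketch below illustrates that summary in plain Python under stated assumptions: the class name, method names and the "radius stays within the threshold" test are illustrative and do not reproduce scikit-learn's internal code.

```python
import math

# Minimal sketch of a BIRCH clustering feature (CF): count, linear sum, squared sum.

class ClusteringFeature:
    def __init__(self, dim):
        self.n = 0                      # N: number of points summarised
        self.linear_sum = [0.0] * dim   # LS: per-dimension sum of the points
        self.squared_sum = 0.0          # SS: sum of squared norms of the points

    def add(self, point):
        self.n += 1
        self.linear_sum = [s + v for s, v in zip(self.linear_sum, point)]
        self.squared_sum += sum(v * v for v in point)

    def merged_with(self, other):
        """CFs are additive: merging two summaries just adds their fields."""
        merged = ClusteringFeature(len(self.linear_sum))
        merged.n = self.n + other.n
        merged.linear_sum = [a + b for a, b in zip(self.linear_sum, other.linear_sum)]
        merged.squared_sum = self.squared_sum + other.squared_sum
        return merged

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def radius(self):
        """Root mean squared distance of the points from the centroid."""
        centroid_norm = sum(c * c for c in self.centroid())
        return math.sqrt(max(self.squared_sum / self.n - centroid_norm, 0.0))


def would_absorb(cf, point, threshold):
    """Assumed rule: accept the point if the merged radius stays within threshold."""
    candidate = ClusteringFeature(len(point))
    candidate.add(point)
    return cf.merged_with(candidate).radius() <= threshold


cf = ClusteringFeature(2)
cf.add([0.0, 0.0])
cf.add([2.0, 0.0])
print(cf.centroid(), cf.radius())
print(would_absorb(cf, [1.0, 0.0], threshold=0.01))
```

A small threshold such as the 0.01 used in the program therefore produces many tiny subclusters, which the final n_clusters step then groups into the requested number of clusters.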

PROGRAM

from numpy import unique
from numpy import where
from sklearn.datasets import make_classification
from sklearn.cluster import Birch
from matplotlib import pyplot

# Generate 1000 two-dimensional samples
X, _ = make_classification(n_samples=1000, n_features=2, n_informative=2,
    n_redundant=0, n_clusters_per_class=1, random_state=4)

# Fit BIRCH and assign every sample to a cluster
model = Birch(threshold=0.01, n_clusters=2)
model.fit(X)
yhat = model.predict(X)

# Draw one scatter series per cluster
clusters = unique(yhat)
for cluster in clusters:
    row_ix = where(yhat == cluster)
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1])
pyplot.show()

OUTPUT

RESULT

Thus, the program is executed successfully and the output is verified.

Ex. No: 07
DATA EXPLORATION AND VISUALIZATION
Date:

AIM

To perform the data exploration of the iris dataset and to implement various statistical
operations.

ALGORITHM

STEP 1: Start the process.

STEP 2: Open R Console.

STEP 3: Load the iris dataset in R console.

STEP 4: Display the descriptive features of the dataset using commands like dim(iris),
names(iris), str(iris), attributes(iris), iris[1:3], head(iris,10), tail(iris,10), summary(iris),
iris[1:10,"Sepal.Length"].

STEP 5: Find the mean, range, median, quantiles and histogram using the following commands:
mean(iris$Sepal.Length), range(iris$Sepal.Length), median(iris$Sepal.Length),
quantile(iris$Sepal.Length), hist(iris$Sepal.Length).

STEP 6: Calculate covariance and correlation using cov(iris$Sepal.Length,iris$Sepal.Width)
and cor(iris$Sepal.Length,iris$Sepal.Width).

STEP 7: In R console pie chart is plotted for the iris dataset.

STEP 8: Stop the process.
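
The covariance and correlation of STEP 6 can be checked by hand. The sketch below is written in Python (not R) and assumes the sample covariance with an n - 1 denominator, which is what R's cov() uses; the function names are illustrative, and only the first six iris rows shown by head(iris) are used, so the result will differ from the full-dataset value.

```python
import math

# Minimal sketch: sample covariance and Pearson correlation without libraries.

def sample_covariance(xs, ys):
    """Covariance with an (n - 1) denominator."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pearson_correlation(xs, ys):
    """Covariance scaled by the two standard deviations; lies in [-1, 1]."""
    return sample_covariance(xs, ys) / math.sqrt(
        sample_covariance(xs, xs) * sample_covariance(ys, ys))


# First six rows of iris (Sepal.Length, Sepal.Width)
sepal_length = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4]
sepal_width = [3.5, 3.0, 3.2, 3.1, 3.6, 3.9]

print(sample_covariance(sepal_length, sepal_width))
print(pearson_correlation(sepal_length, sepal_width))
```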

PROGRAM

> View(iris)
> dim(iris)
[1] 150 5
> names(iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"
[5] "Species"
> structure(iris)
> str(iris)
'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> attributes(iris)
$names
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"
[5] "Species"
$class
[1] "data.frame"
$row.names
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
[17] 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
[33] 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
[49] 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
[65] 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
[81] 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96
[97] 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112
[113] 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128
[129] 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
[145] 145 146 147 148 149 150

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
> head(iris,10)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa

> summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
 Median :5.800   Median :3.000   Median :4.350   Median :1.300
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
       Species
 setosa    :50
 versicolor:50
 virginica :50
> tail(iris,3)
    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica
> mean(iris$Sepal.Length)
[1] 5.843333
> range(iris$Sepal.Length)
[1] 4.3 7.9
> cov(iris$Sepal.Length,iris$Sepal.Width)
[1] -0.042434
> median(iris$Sepal.Length)
[1] 5.8

> plot(density(iris$Sepal.Length))

> hist(iris$Sepal.Length)

> table(iris$Sepal.Length)
4.3 4.4 4.5 4.6 4.7 4.8 4.9   5 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9   6
  1   3   1   4   2   5   6  10   9   4   1   6   7   6   8   7   3   6
6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9   7 7.1 7.2 7.3 7.4 7.6 7.7 7.9
  6   4   9   7   5   2   8   3   4   1   1   3   1   1   1   4   1

> pairs(iris)

> pie(table(iris$Species))

RESULT

Thus, the data exploration of the iris dataset is performed and various statistical
operations are implemented.

Ex. No: 08
LINEAR REGRESSION
Date:

AIM

To perform linear regression using the salary dataset to predict salary based on
experience.

ALGORITHM

STEP 1: Start the process.

STEP 2: Open R Console.

STEP 3: Load the salary dataset in the R console.

STEP 4: Import the caTools package needed for Linear Regression.

STEP 5: Split the dataset into train and test sets.

STEP 6: Find the line of fit using the lm formula, using years of experience and salary.

STEP 7: Predict the test set results using predict.

STEP 8: Visualize training and test set using ggplot.

STEP 9: Stop the process.

PROGRAM

> dataset = read.csv("C:/Users/2msccs02/Documents/Salary_Data.csv")
> View(dataset)
> install.packages('caTools')
package ‘caTools’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\2msccs02\AppData\Local\Temp\RtmpkXv9S1\downloaded_packages
> library(caTools)
> set.seed(123)
> split = sample.split(dataset$Salary, SplitRatio = 2/3)
> training_set = subset(dataset, split == TRUE)
> test_set = subset(dataset, split == FALSE)
> regressor = lm(formula = Salary ~ YearsExperience,
+ data = training_set)
> y_pred = predict(regressor, newdata = test_set)
> library(ggplot2)
> ggplot() +
+ geom_point(aes(x = training_set$YearsExperience, y = training_set$Salary),
+ colour = 'red') +
+ geom_line(aes(x = training_set$YearsExperience, y = predict(regressor, newdata = training_set)),
+ colour = 'blue') +
+ ggtitle('Salary vs Experience (Training set)') +
+ xlab('Years of experience') +
+ ylab('Salary')

OUTPUT

RESULT

Thus, the program is executed successfully and the output is verified.

Ex. No: 9
ASSOCIATION RULE MINING
Date:

AIM

To implement Association Rule mining of groceries dataset using Apriori Algorithm.

ALGORITHM

STEP 1: Open the R tool window

STEP 2: Load arules library package using library(arules) function

STEP 3: Use the Groceries dataset

STEP 4: Calculate the support for frequent items using eclat() function

STEP 5: Display all the frequent itemsets in the Groceries dataset using inspect(frequentItems)

STEP 6: Generate rules using apriori() with the minimum support set to 0.001 and confidence to 0.5

STEP 7: Sort the rules by support, confidence and lift in decreasing order

STEP 8: Display the sorted rules

STEP 9: Stop the process
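
The measures reported for every rule (support, confidence, coverage and lift) are simple ratios of transaction counts. The Python sketch below recomputes them for the rule {other vegetables, yogurt} => {whole milk} from this exercise's program output: 9835 transactions, 2513 containing whole milk and 219 containing the whole rule. The left-hand-side count of 427 is not printed directly; it is derived here from the reported coverage (0.04341637 x 9835), and the function name is illustrative.

```python
# Minimal sketch: rule quality measures from raw transaction counts.

def rule_measures(n_transactions, count_lhs, count_rhs, count_both):
    """Return support, confidence, coverage and lift of the rule lhs => rhs."""
    support = count_both / n_transactions      # share of baskets with lhs and rhs
    coverage = count_lhs / n_transactions      # share of baskets with lhs
    confidence = count_both / count_lhs        # P(rhs | lhs)
    lift = confidence / (count_rhs / n_transactions)
    return {"support": support, "confidence": confidence,
            "coverage": coverage, "lift": lift}


# {other vegetables, yogurt} => {whole milk}
measures = rule_measures(n_transactions=9835, count_lhs=427,
                         count_rhs=2513, count_both=219)
for name, value in measures.items():
    print("%-10s %.7f" % (name, value))
```

A lift above 1 means the right-hand side appears more often together with the left-hand side than it does in the data set overall, which is why sorting by lift brings out rarer but stronger associations than sorting by support.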

PROGRAM

> install.packages("arules")
> install.packages("arulesViz")
> library(arules)
> library(arulesViz)
> data("Groceries")
> transactions<-Groceries
> summary(Groceries)
transactions as itemMatrix in sparse format with
 9835 rows (elements/itemsets/transactions) and
 169 columns (items) and a density of 0.02609146
most frequent items:
      whole milk other vegetables       rolls/buns             soda           yogurt          (Other)
            2513             1903             1809             1715             1372            34055
element (itemset/transaction) length distribution:
sizes
   1    2    3    4    5    6    7    8    9   10   11   12   13   14
2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77
  15   16   17   18   19   20   21   22   23   24   26   27   28   29
  55   46   29   14   14    9   11    4    6    1    1    1    1    3
  32
   1
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   2.000   3.000   4.409   6.000  32.000

includes extended item information - examples:
       labels  level2           level1
1 frankfurter sausage meat and sausage
2     sausage sausage meat and sausage
3  liver loaf sausage meat and sausage
> itemFrequencyPlot(transactions,support=0.1,cex.names=1.0)

> itemFrequencyPlot(transactions,support=0.05,cex.names=0.8)

> itemFrequencyPlot(transactions,topN=20)

> frequentItems<-eclat(Groceries,parameter=list(supp=0.07,maxlen=15))
Eclat
parameter specification:
tidLists support minlen maxlen target ext
FALSE 0.07 1 15 frequent itemsets TRUE
algorithmic control:
sparse sort verbose
7 -2 TRUE
Absolute minimum support count: 688
create itemset ...
set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
sorting and recoding items ... [18 item(s)] done [0.00s].
creating sparse bit matrix ... [18 row(s), 9835 column(s)] done [0.00s].
writing ... [19 set(s)] done [0.00s].
Creating S4 object ... done [0.00s].

> inspect(frequentItems)
     items                           support    count
[1]  {other vegetables, whole milk}  0.07483477   736
[2]  {whole milk}                    0.25551601  2513
[3]  {other vegetables}              0.19349263  1903
[4]  {rolls/buns}                    0.18393493  1809
[5]  {yogurt}                        0.13950178  1372
[6]  {soda}                          0.17437722  1715
[7]  {root vegetables}               0.10899847  1072
[8]  {tropical fruit}                0.10493137  1032
[9]  {bottled water}                 0.11052364  1087
[10] {sausage}                       0.09395018   924
[11] {shopping bags}                 0.09852567   969
[12] {citrus fruit}                  0.08276563   814
[13] {pastry}                        0.08896797   875
[14] {pip fruit}                     0.07564820   744
[15] {whipped/sour cream}            0.07168277   705
[16] {fruit/vegetable juice}         0.07229283   711
[17] {newspapers}                    0.07981698   785
[18] {bottled beer}                  0.08052872   792
[19] {canned beer}                   0.07768175   764

> rules<-apriori(Groceries,parameter=list(supp=0.001,conf=0.5))
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support
0.5 0.1 1 none FALSE TRUE 5 0.001
minlen maxlen target ext
1 10 rules TRUE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 9
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
sorting and recoding items ... [157 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 done [0.01s].
writing ... [5668 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
> rules_conf<-sort(rules,by="support",decreasing=TRUE)[1:20]
> inspect((rules_conf))
     lhs                                       rhs                   support     confidence coverage   lift     count
[1]  {other vegetables, yogurt}              => {whole milk}       0.022267412 0.5128806  0.04341637 2.007235 219
[2]  {tropical fruit, yogurt}                => {whole milk}       0.015149975 0.5173611  0.02928317 2.024770 149
[3]  {other vegetables, whipped/sour cream}  => {whole milk}       0.014641586 0.5070423  0.02887646 1.984385 144
[4]  {root vegetables, yogurt}               => {whole milk}       0.014539908 0.5629921  0.02582613 2.203354 143
[5]  {pip fruit, other vegetables}           => {whole milk}       0.013523132 0.5175097  0.02613116 2.025351 133
[6]  {root vegetables, yogurt}               => {other vegetables} 0.012913066 0.5000000  0.02582613 2.584078 127
[7]  {root vegetables, rolls/buns}           => {whole milk}       0.012709710 0.5230126  0.02430097 2.046888 125
[8]  {other vegetables, domestic eggs}       => {whole milk}       0.012302999 0.5525114  0.02226741 2.162336 121
[9]  {tropical fruit, root vegetables}       => {other vegetables} 0.012302999 0.5845411  0.02104728 3.020999 121
[10] {root vegetables, rolls/buns}           => {other vegetables} 0.012201322 0.5020921  0.02430097 2.594890 120
[11] {tropical fruit, root vegetables}       => {whole milk}       0.011997966 0.5700483  0.02104728 2.230969 118
[12] {other vegetables, butter}              => {whole milk}       0.011489578 0.5736041  0.02003050 2.244885 113
[13] {yogurt, whipped/sour cream}            => {whole milk}       0.010879512 0.5245098  0.02074225 2.052747 107
[14] {citrus fruit, root vegetables}         => {other vegetables} 0.010371124 0.5862069  0.01769192 3.029608 102
[15] {curd, yogurt}                          => {whole milk}       0.010066090 0.5823529  0.01728521 2.279125  99
[16] {other vegetables, curd}                => {whole milk}       0.009862735 0.5739645  0.01718353 2.246296  97
[17] {other vegetables, frozen vegetables}   => {whole milk}       0.009659380 0.5428571  0.01779359 2.124552  95
[18] {pip fruit, yogurt}                     => {whole milk}       0.009557702 0.5310734  0.01799695 2.078435  94
[19] {yogurt, fruit/vegetable juice}         => {whole milk}       0.009456024 0.5054348  0.01870869 1.978094  93
[20] {root vegetables, whipped/sour cream}   => {whole milk}       0.009456024 0.5535714  0.01708185 2.166484  93
> rules_conf1<-sort(rules,by="confidence",decreasing=TRUE)
> inspect(head(rules_conf1))
    lhs                                               rhs                   support     confidence coverage    lift     count
[1] {rice, sugar}                                  => {whole milk}       0.001220132 1          0.001220132 3.913649 12
[2] {canned fish, hygiene articles}                => {whole milk}       0.001118454 1          0.001118454 3.913649 11
[3] {root vegetables, butter, rice}                => {whole milk}       0.001016777 1          0.001016777 3.913649 10
[4] {root vegetables, whipped/sour cream, flour}   => {whole milk}       0.001728521 1          0.001728521 3.913649 17
[5] {butter, soft cheese, domestic eggs}           => {whole milk}       0.001016777 1          0.001016777 3.913649 10
[6] {citrus fruit, root vegetables, soft cheese}   => {other vegetables} 0.001016777 1          0.001016777 5.168156 10
> rules_conf1<-sort(rules,by="lift",decreasing=TRUE)
> inspect(head(rules_conf1))
    lhs                                                      rhs                support     confidence coverage    lift     count
[1] {Instant food products, soda}                         => {hamburger meat} 0.001220132 0.6315789  0.001931876 18.99565 12
[2] {soda, popcorn}                                       => {salty snack}    0.001220132 0.6315789  0.001931876 16.69779 12
[3] {flour, baking powder}                                => {sugar}          0.001016777 0.5555556  0.001830198 16.40807 10
[4] {ham, processed cheese}                               => {white bread}    0.001931876 0.6333333  0.003050330 15.04549 19
[5] {whole milk, Instant food products}                   => {hamburger meat} 0.001525165 0.5000000  0.003050330 15.03823 15
[6] {other vegetables, curd, yogurt, whipped/sour cream}  => {cream cheese }  0.001016777 0.5882353  0.001728521 14.83409 10

RESULT

Thus, the program is executed successfully and the output is verified.

Ex. No: 10
CLASSIFICATION USING SVM
Date:

AIM

To perform classification of iris dataset using support vector machine (SVM).

ALGORITHM

STEP 1: Start the process.

STEP 2: Open R Console.

STEP 3: Import the package kernlab using library(kernlab).

STEP 4: Build the SVM models with the ksvm( ) function using the following commands:

• irisSVM <- ksvm(Species~., data=trainData, type="C-bsvc", kernel=rbf, C=10, prob.model=TRUE)
• irisSVM1 <- ksvm(Species~., data=iris, type="C-bsvc", kernel="rbfdot", prob.model=TRUE)

STEP 5: Extract the fitted values using the commands fitted(irisSVM) and
fitted(irisSVM1).

STEP 6: Print the Outputs.

STEP 7: Stop the process
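
The kernel passed to ksvm( ) decides how similar two samples are considered to be. The Python sketch below evaluates the Gaussian RBF kernel in the form exp(-sigma * ||x - y||^2), which is the form documented for kernlab's rbfdot; the function name is illustrative, sigma = 0.1 matches the rbfdot(sigma=0.1) call of the program, and the two samples are the first two iris rows.

```python
import math

# Minimal sketch: Gaussian RBF kernel value between two samples.

def rbf_kernel(x, y, sigma):
    """exp(-sigma * squared Euclidean distance); 1.0 for identical points."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sigma * squared_distance)


row_1 = [5.1, 3.5, 1.4, 0.2]   # iris row 1 (setosa)
row_2 = [4.9, 3.0, 1.4, 0.2]   # iris row 2 (setosa)

print(rbf_kernel(row_1, row_1, sigma=0.1))
print(rbf_kernel(row_1, row_2, sigma=0.1))
```

A larger sigma makes the kernel value fall off faster with distance, so the decision boundary follows the training points more closely.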

PROGRAM

> install.packages("kernlab")
package ‘kernlab’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\2msccs02\AppData\Local\Temp\Rtmpyw8HQk\downloaded_packages
> library(kernlab)
> rbf<-rbfdot(sigma=0.1)
> install.packages("partykit")
package ‘libcoin’ successfully unpacked and MD5 sums checked
package ‘mvtnorm’ successfully unpacked and MD5 sums checked
package ‘Formula’ successfully unpacked and MD5 sums checked
package ‘inum’ successfully unpacked and MD5 sums checked
package ‘partykit’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\2msccs02\AppData\Local\Temp\RtmpM1EoLr\downloaded_packages
> library(partykit)
Loading required package: grid
Loading required package: libcoin
Loading required package: mvtnorm
> ind <- sample(2, nrow(iris),replace=TRUE, prob = c(0.7,0.3))
> trainData <- iris[ind==1,]
> testData <- iris[ind==2,]
> irisSVM <- ksvm(Species~., data=trainData, type="C-bsvc", kernel=rbf, C=10,
+ prob.model=TRUE)
> irisSVM1 <-ksvm(Species~.,data=iris,type="C-bsvc",kernel="rbfdot",prob.model=TRUE)
> fitted(irisSVM)
[1] setosa setosa setosa setosa setosa setosa setosa setosa
setosa setosa
[11] setosa setosa setosa setosa setosa setosa setosa setosa
setosa setosa
[21] setosa setosa setosa setosa setosa setosa setosa setosa
setosa setosa
[31] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor

versicolor versicolor
[41] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor
versicolor versicolor
[51] versicolor versicolor versicolor versicolor versicolor versicolor virginica versicolor
versicolor versicolor
[61] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor
virginica virginica
[71] virginica virginica virginica virginica virginica virginica virginica virginica
virginica virginica
[81] virginica virginica virginica virginica virginica virginica virginica virginica
virginica virginica
[91] virginica virginica versicolor virginica virginica virginica virginica virginica
virginica virginica
[101] virginica virginica
Levels: setosa versicolor virginica
> fitted(irisSVM1)
[1] setosa setosa setosa setosa setosa setosa setosa setosa
setosa setosa
[11] setosa setosa setosa setosa setosa setosa setosa setosa
setosa setosa
[21] setosa setosa setosa setosa setosa setosa setosa setosa
setosa setosa
[31] setosa setosa setosa setosa setosa setosa setosa setosa
setosa setosa
[41] setosa setosa setosa setosa setosa setosa setosa setosa
setosa setosa
[51] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor
versicolor versicolor
[61] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor
versicolor versicolor
[71] versicolor versicolor versicolor versicolor versicolor versicolor versicolor virginica
versicolor versicolor
[81] versicolor versicolor versicolor virginica versicolor versicolor versicolor versicolor
versicolor versicolor
[91] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor
versicolor versicolor
[101] virginica virginica virginica virginica virginica virginica virginica virginica
virginica virginica
[111] virginica virginica virginica virginica virginica virginica virginica virginica
virginica versicolor
[121] virginica virginica virginica virginica virginica virginica virginica virginica
virginica virginica
[131] virginica virginica virginica versicolor virginica virginica virginica virginica
virginica virginica
[141] virginica virginica virginica virginica virginica virginica virginica virginica
virginica virginica
Levels: setosa versicolor virginica

RESULT

Thus, the classification of the iris dataset using SVM has been executed successfully and the output is verified.

Ex. No: 11
K-MEANS CLUSTERING AND OUTLIERS DETECTION
Date:

AIM

To create K-means clustering and to detect outliers of Iris dataset using R tool.

ALGORITHM

STEP 1: Start the process.

STEP 2: To perform the K-means clustering, load iris dataset.

STEP 3: The loaded dataset is displayed.

STEP 4: The clustering is done using K-means command by specifying the total number of
clusters.

STEP 5: Finally using the table command, the K-means cluster is displayed.

STEP 6: Find the centers of the clusters generated.

STEP 7: Compute the Euclidean distance between each sample and its cluster center.

STEP 8: Find the outliers, i.e. the samples farthest from their cluster centers, using functions like sqrt and order.

STEP 9: Stop the process.
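
The outlier step amounts to ranking samples by their distance to the center of the cluster they were assigned to. The sketch below shows this in Python on hypothetical toy points; the function name and data are illustrative, and the indices are 0-based, unlike the 1-based row numbers printed by R.

```python
import math

# Minimal sketch: the k samples farthest from their own cluster center.

def top_outliers(points, assignments, centers, k):
    """Return the indices of the k points with the largest distance to their center."""
    distances = []
    for point, cluster in zip(points, assignments):
        center = centers[cluster]
        distances.append(math.sqrt(sum((p - c) ** 2 for p, c in zip(point, center))))
    order = sorted(range(len(points)), key=lambda i: distances[i], reverse=True)
    return order[:k]


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (10.0, 10.0), (10.0, 11.0)]
assignments = [0, 0, 0, 1, 1]               # cluster index of each point
centers = [(0.0, 0.5), (10.0, 10.5)]        # hypothetical cluster centers

print(top_outliers(points, assignments, centers, k=1))
```

The point (5.0, 5.0) belongs to the first cluster but lies far from its center, so it is reported first, in the same way that the R program lists the five rows with the largest distances.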

43
PROGRAM

> # Work on a copy of iris without the class label
> iris2 <- iris
> iris2$Species <- NULL
> # Cluster the four numeric attributes into 3 groups
> kmeans.result <- kmeans(iris2, 3)
> # Cross-tabulate the actual species against the cluster assignments
> table(iris$Species, kmeans.result$cluster)

              1  2  3
  setosa      0 50  0
  versicolor  2  0 48
  virginica  36  0 14
> kmeans.result
K-means clustering with 3 clusters of sizes 38, 50, 62

Cluster means:
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     6.850000    3.073684     5.742105    2.071053
2     5.006000    3.428000     1.462000    0.246000
3     5.901613    2.748387     4.393548    1.433871

Clustering vector:
  [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 1 3 3 3 3 3
 [59] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 1 1 1 1 3 1 1 1 1 1 1 3 3 1
[117] 1 1 1 3 1 3 1 3 1 1 3 3 1 1 1 1 1 3 1 1 1 1 3 1 1 1 3 1 1 1 3 1 1 3

Within cluster sum of squares by cluster:
[1] 23.87947 15.15100 39.82097
 (between_SS / total_SS =  88.4 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss" "betweenss"    "size"
[8] "iter"         "ifault"

> table(iris$Species, kmeans.result$cluster)

              1  2  3
  setosa      0 50  0
  versicolor  2  0 48
  virginica  36  0 14
> plot(iris2[c("Sepal.Length", "Sepal.Width")], col = kmeans.result$cluster)

> plot(iris2[c("Petal.Length", "Petal.Width")], col = kmeans.result$cluster)

> kmeans.result$centers
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     6.850000    3.073684     5.742105    2.071053
2     5.006000    3.428000     1.462000    0.246000
3     5.901613    2.748387     4.393548    1.433871
> kmeans.result$cluster
  [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 1 3 3 3 3 3
 [59] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 1 1 1 1 3 1 1 1 1 1 1 3 3 1
[117] 1 1 1 3 1 3 1 3 1 1 3 3 1 1 1 1 1 3 1 1 1 1 3 1 1 1 3 1 1 1 3 1 1 3
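> # Look up the center of the cluster that each observation belongs to
> centers <- kmeans.result$centers[kmeans.result$cluster, ]
> # Euclidean distance of every observation from its own cluster center
> distances <- sqrt(rowSums((iris2 - centers)^2))
> # The five observations farthest from their centers are reported as outliers
> outliers <- order(distances, decreasing = TRUE)[1:5]
> print(outliers)
[1]  99  58  94  61 119
> print(iris2[outliers, ])
    Sepal.Length Sepal.Width Petal.Length Petal.Width
99           5.1         2.5          3.0         1.1
58           4.9         2.4          3.3         1.0
94           5.0         2.3          3.3         1.0
61           5.0         2.0          3.5         1.0
119          7.7         2.6          6.9         2.3
> # Plot the clusters, then mark the centers with "*" and the outliers with "+"
> plot(iris2[, c("Sepal.Length", "Sepal.Width")], pch = "o", col = kmeans.result$cluster, cex = 0.3)
> points(kmeans.result$centers[, c("Sepal.Length", "Sepal.Width")], col = 1:3, pch = 8, cex = 1.5)
> points(iris2[outliers, c("Sepal.Length", "Sepal.Width")], pch = "+", col = 4, cex = 1.5)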

RESULT

Thus, the program is executed successfully and the output is verified.

Ex. No: 12 TEXT PREPROCESSING
Date:

AIM

To perform text preprocessing and to create a term document matrix using Reuters
dataset.

ALGORITHM

STEP 1: Open R tool window

STEP 2: Load text mining package using library(tm)

STEP 3: Create a corpus of the text documents from the texts/crude folder of tm package and
name it as Reuters dataset.

STEP 4: Perform pre-processing steps such as white space removal, lowercase conversion, punctuation removal, number removal and stop word removal.

STEP 5: Create a document term matrix using dtm <- DocumentTermMatrix(reuters) for the Reuters dataset and display it.

STEP 6: Run the program
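
The pre-processing and matrix-building steps above can be sketched in plain Python to show what each step does to the text. This is an illustrative sketch, not the tm implementation: the stop word list is a tiny hypothetical subset, the two documents are made-up toy sentences, and the function names preprocess and document_term_matrix are assumptions.

```python
import re
import string
from collections import Counter

# Tiny illustrative stop word list (hypothetical; tm's English list is much longer).
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "it", "its", "of", "that", "the", "to"}

# Toy documents (made up for illustration).
DOCS = ["Crude oil prices cut by 1.50 dlrs a barrel.",
        "OPEC said oil prices will hold."]


def preprocess(text):
    """Lowercase, drop punctuation and digits, split on white space, drop stop words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", "", text)
    # split() with no argument also collapses runs of white space.
    return [token for token in text.split() if token not in STOP_WORDS]


def document_term_matrix(docs):
    """Return (sorted terms, one row of term counts per document)."""
    counts = [Counter(preprocess(doc)) for doc in docs]
    terms = sorted(set().union(*counts))
    matrix = [[count[term] for term in terms] for count in counts]
    return terms, matrix


terms, matrix = document_term_matrix(DOCS)
print(terms)
for row in matrix:
    print(row)
```

Each row of the matrix corresponds to one document and each column to one term, which is the same layout that inspect(dtm) displays.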

PROGRAM

> install.packages("tm")
> library(tm)
Loading required package: NLP
> cor <- system.file("texts", "crude", package = "tm")
> reuters <- Corpus(DirSource(cor))
> reuters
<<SimpleCorpus>>
Metadata: corpus specific: 1, document level (indexed): 0
Content: documents: 20
> inspect(reuters[1])
<<SimpleCorpus>>
Metadata: corpus specific: 1, document level (indexed): 0
Content: documents: 1
reut-00001.xml
<?xml version="1.0"?>\n<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN"
CGISPLIT="TRAINING-SET" OLDID="5670" NEWID="127">\n <DATE>26-
FEB-1987 17:00:56.04</DATE>\n <TOPICS>\n <D>crude</D>\n </TOPICS>\n
<PLACES>\n <D>usa</D>\n </PLACES>\n <PEOPLE/>\n
<ORGS/>\n
<EXCHANGES/>\n <COMPANIES/>\n <UNKNOWN>Y\n f0119 reute\nu f BC-
DIAMOND-SHAMROCK-(DIA 02-26 0097</UNKNOWN>\n
<TEXT>\n
<TITLE>DIAMOND SHAMROCK (DIA) CUTS CRUDE PRICES</TITLE>\n
<DATELINE>NEW YORK, FEB 26 -</DATELINE>\n <BODY>Diamond Shamrock
Corp said that\n effective today it had cut its contract prices for crude oil by\n1.50 dlrs
a barrel.\n The reduction brings its posted price for West Texas\nIntermediate to 16.00
dlrs a barrel, the copany said.\n &quot;The price reduction today was made in the light
of falling\noil product prices and a weak crude oil market,&quot; a
company\nspokeswoman said.\n Diamond is the latest in a line of U.S. oil companies
that\nhave cut its contract, or posted, prices over the last two days\nciting weak oil
markets.\n Reuter</BODY>\n </TEXT>\n</REUTERS>
> reuters<-tm_map(reuters,stripWhitespace)
> writeLines(as.character(reuters[1]))
<?xml version="1.0"?> <REUTERS TOPICS="YES" LEWISSPLIT="TRAIN"
CGISPLIT="TRAINING-SET" OLDID="5670" NEWID="127"> <DATE>26-FEB-
1987 17:00:56.04</DATE> <TOPICS> <D>crude</D> </TOPICS> <PLACES>
<D>usa</D> </PLACES> <PEOPLE/> <ORGS/>
<EXCHANGES/>
<COMPANIES/> <UNKNOWN>Y f0119 reute u f BC-DIAMOND-SHAMROCK-
(DIA 02-26 0097</UNKNOWN> <TEXT> <TITLE>DIAMOND SHAMROCK
(DIA) CUTS CRUDE PRICES</TITLE> <DATELINE>NEW YORK, FEB 26
</DATELINE> <BODY>Diamond Shamrock Corp said that effective today it had cut
its contract prices for crude oil by 1.50 dlrs a barrel. The reduction brings its posted
price for West Texas Intermediate to 16.00 dlrs a barrel, the copany said. &quot;The
price reduction today was made in the light of falling oil product prices and a weak
crude oil market,&quot; a company spokeswoman said. Diamond is the latest in a line
of U.S. oil companies that have cut its contract, or posted, prices over the last two days
citing weak oil markets. Reuter</BODY> </TEXT> </REUTERS>
list(language = "en")
list()
> reuters<-tm_map(reuters,tolower)
> writeLines(as.character(reuters[1]))
<?xml version="1.0"?> <reuters topics="yes" lewissplit="train" cgisplit="training-set"
oldid="5670" newid="127"> <date>26-feb-1987 17:00:56.04</date>
<topics>
<d>crude</d> </topics> <places> <d>usa</d> </places> <people/>
<orgs/>
<exchanges/> <companies/> <unknown>y f0119 reute u f bc-diamond-shamrock-(dia
02-26 0097</unknown> <text> <title>diamond shamrock (dia) cuts crude
prices</title> <dateline>new york, feb 26 -</dateline> <body>diamond shamrock corp
said that effective today it had cut its contract prices for crude oil by 1.50 dlrs a barrel.
the reduction brings its posted price for west texas intermediate to 16.00 dlrs a barrel,
the copany said. &quot;the price reduction today was made in the light of falling oil
product prices and a weak crude oil market,&quot; a company spokeswoman said.
diamond is the latest in a line of u.s. oil companies that have cut its contract, or posted,
prices over the last two days citing weak oil markets. reuter</body> </text> </reuters>
list(language = "en")
list()
> reuters<-tm_map(reuters,removePunctuation)
> writeLines(as.character(reuters[1]))
xml version10 reuters topicsyes lewissplittrain cgisplittrainingset oldid5670 newid127
date26feb1987 17005604date topics dcruded topics places dusad places people orgs
exchanges companies unknowny f0119 reute u f bcdiamondshamrockdia 0226
0097unknown text titlediamond shamrock dia cuts crude pricestitle datelinenew york
feb 26 dateline bodydiamond shamrock corp said that effective today it had cut its
contract prices for crude oil by 150 dlrs a barrel the reduction brings its posted price for
west texas intermediate to 1600 dlrs a barrel the copany said quotthe price reduction
today was made in the light of falling oil product prices and a weak crude oil marketquota
company spokeswoman said diamond is the latest in a line of us oil companies that
have cut its contract or posted prices over the last two days citing weak oil markets
reuterbody text reuters
list(language = "en")

list()
> reuters<-tm_map(reuters,removeNumbers)
> writeLines(as.character(reuters[1]))
xml version reuters topicsyes lewissplittrain cgisplittrainingset oldid newid datefeb date
topics dcruded topics places dusad places people orgs exchanges companies unknownyf
reute u f bcdiamondshamrockdia unknown text titlediamond shamrock dia cuts crude
pricestitle datelinenew york feb dateline bodydiamond shamrock corp said that
effective today it had cut its contract prices for crude oil by dlrs a barrel the reduction
brings its posted price for west texas intermediate to dlrs a barrel the copany said
quotthe price reduction today was made in the light of falling oil product prices and a
weak crude oil marketquot a company spokeswoman said diamond is the latest in a line
of us oil companies that have cut its contract or posted prices over the last two days
citing weak oil markets reuterbody text reuters
list(language = "en")

list()
> reuters<-tm_map(reuters,removeWords,stopwords("english"))
> writeLines(as.character(reuters[1]))
xml version reuters topicsyes lewissplittrain cgisplittrainingset oldid newid datefeb date
topics dcruded topics places dusad places people orgs exchanges companies unknowny
f reute u f bcdiamondshamrockdia unknown text titlediamond shamrock dia cuts crude
pricestitle datelinenew york feb dateline bodydiamond shamrock corp said effective
today cut contract prices crude oil dlrs barrel reduction brings posted price west texas
intermediate dlrs barrel copany said quotthe price reduction today made light falling oil
product prices weak crude oil marketquot company spokeswoman said diamond latest
line us oil companies cut contract posted prices last two days citing weak oil markets
reuterbody text reuters
list(language = "en")
list()
> dtm<-DocumentTermMatrix(reuters)
> inspect(dtm)
<<DocumentTermMatrix (documents: 20, terms: 1134)>>
Non-/sparse entries: 2351/20329
Sparsity : 90%
Maximal term length: 22
Weighting : term frequency (tf)
Sample :
        Terms
Docs     mln oil opec orgs places prices reuters said text topics
[Link]     4  12   10    2      2      6       3   11    2      2
[Link]     4   7    7    2      2      4       2   10    2      2
[Link]     1   3    1    2      2      1       2    1    2      2
[Link]     0   3    2    2      2      2       2    3    2      2
[Link]     0   5    1    1      2      1       2    5    2      2
[Link]     3   9    7    2      2      9       2    7    2      2
[Link]    10   5    5    2      2      5       2    8    2      2
[Link]     3   5    0    1      2      2       2    2    2      2
[Link]     3   6    0    1      2      2       2    2    2      2
[Link]     0   3    0    1      2      3       2    4    2      2

> findFreqTerms(dtm, 5)
[1] "barrel" "cgisplittrainingset" "companies" "company" "contract"
[6] "crude" "date" "datefeb" "dateline" "datelinenew"
[11] "dcruded" "dlrs" "dusad" "exchanges" "feb"
[16] "last" "lewissplittrain" "markets" "newid" "oil"
[21] "oldid" "orgs" "people" "places" "posted"
[26] "price" "prices" "reute" "reuterbody" "reuters"
[31] "said" "text" "today" "topics" "topicsyes"
[36] "unknown" "unknowny" "version" "west" "xml"
[41] "york" "ability" "agreement" "analysts" "april"

[46] "bpd" "december" "demand" "dopecd" "emergency"
[51] "energy" "hold" "industry" "march" "market"
[56] "may" "meet" "meeting" "mln" "new"
[61] "now" "one" "opec" "output" "petroleum"
[66] "production" "quota" "research" "sell" "sources"
[71] "will" "world" "pct" "present" "reserve"
[76] "reserves" "study" "year" "ali" "also"
[81] "arab" "barrels" "ceiling" "daily" "datemar"
[86] "expected" "gulf" "international" "kuwait" "members"
[91] "minister" "month" "official" "plans" "qatar"
[96] "quoted" "recent" "says" "sheikh" "traders"
[101] "united" "unknownrm" "week" "economic" "economy"
[106] "exports" "government" "group" "help" "imports"
[111] "report" "states" "dsaudiarabiad" "abdulaziz" "billion"
[116] "budget" "expenditure" "riyals" "accord" "agency"
[121] "arabia" "commitment" "nazer" "saudi" "february"
[126] "january" "exchange" "futures" "nymex"
> findAssocs(dtm, "opec", 0.7)
$opec
  meeting     named    dopecd emergency       oil      said    buyers      orgs    prices
     0.86      0.83      0.82      0.82      0.81      0.78      0.78      0.77      0.77
 analysts agreement   however      sell   ability   clearly      late   reports    trying
     0.77      0.74      0.74      0.73      0.72      0.72      0.72      0.72      0.72
   winter       bpd
     0.72      0.70
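
The association scores above are correlations between the count columns of the document term matrix. A minimal Python sketch of that idea follows, assuming a Pearson correlation rounded to two decimals; the function names pearson and find_assocs and the three-document toy matrix are hypothetical and are not part of tm.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no association
    return cov / (sd_x * sd_y)


def find_assocs(dtm, terms, term, corlimit):
    """Terms whose count column correlates with `term` at or above corlimit."""
    target = [row[terms.index(term)] for row in dtm]
    scores = {}
    for j, other in enumerate(terms):
        if other == term:
            continue
        r = round(pearson(target, [row[j] for row in dtm]), 2)
        if r >= corlimit:
            scores[other] = r
    return dict(sorted(scores.items(), key=lambda item: -item[1]))


# Toy matrix (made up): rows are documents, columns follow TERMS.
TERMS = ["opec", "oil", "mln"]
DTM = [[2, 4, 1],
       [0, 0, 3],
       [1, 2, 0]]
print(find_assocs(DTM, TERMS, "opec", 0.7))
```

In the toy matrix the oil counts rise and fall exactly with the opec counts, so oil is the only term reported.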

RESULT

Thus, the program is executed successfully and the output is verified.

Ex. No: 13    VISUALIZING MAXIMUM CONFIRMED, CURED COVID-19 PATIENTS IN COVID-19 DATASET
Date:

AIM

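To visualize the Covid-19 dataset using Tableau and find the state with the maximum number of cured cases, the month with the highest death rate, and the state with the maximum number of confirmed cases.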

ALGORITHM

STEP 1: Start the process.

STEP 2: Open Tableau and import the Covid-19 dataset.

STEP 3: In the worksheet choose the dimension as State and the measure as Cured to find the state with the maximum number of people cured of Covid-19.

STEP 4: In the next worksheet choose the dimension as Month and the measure as Death Rate to find the month in 2021 with the highest death rate.

STEP 5: In the next worksheet choose the dimension as State and the measure as Confirmed to find the state with the maximum number of confirmed Covid-19 cases.

STEP 6: Stop the process.

OUTPUT

Report generated based on State and Cured

RESULT

Thus, the program is executed and the output is verified successfully.

Ex. No: 14    VISUALIZING THE DIFFERENCE IN VOLUME YEAR-WISE AND QUARTER-WISE
Date:

AIM

To use the Stock dataset (2010-2013) in Tableau to create a chart that visualizes the percent difference in volume for each company by year and quarter, and to find in how many quarters the company Biogen Idec shows a positive percent difference in volume.

ALGORITHM

STEP 1: Start the process.

STEP 2: Open Tableau and import the dataset.

STEP 3: Drag and drop the Stock 2010-2013 table.

STEP 4: In the worksheet choose the dimensions as Company and Date.

STEP 5: In the same worksheet choose the measure as Volume.

STEP 6: Right-click the date and select the quarter option.

STEP 7: Stop the process.
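
The percent difference that the chart displays can be cross-checked outside Tableau with a short Python sketch. It assumes that the percent difference of a quarter is (current - previous) / |previous| x 100 on the quarterly volume totals; the rows are made-up toy values, not the stock dataset, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Toy (date, volume) rows for one company (made up for illustration).
ROWS = [(date(2010, 1, 15), 100), (date(2010, 2, 15), 100),
        (date(2010, 4, 15), 250), (date(2010, 7, 15), 200),
        (date(2010, 10, 15), 300)]


def quarterly_totals(rows):
    """Sum the volume per (year, quarter), returned in chronological order."""
    totals = defaultdict(int)
    for day, volume in rows:
        quarter = (day.month - 1) // 3 + 1
        totals[(day.year, quarter)] += volume
    return dict(sorted(totals.items()))


def percent_differences(totals):
    """Percent change of each quarter relative to the previous quarter."""
    keys = list(totals)
    result = {}
    for prev, cur in zip(keys, keys[1:]):
        if totals[prev] != 0:  # skip quarters with no defined baseline
            result[cur] = 100 * (totals[cur] - totals[prev]) / abs(totals[prev])
    return result


diffs = percent_differences(quarterly_totals(ROWS))
positive_quarters = sum(1 for value in diffs.values() if value > 0)
print(diffs)
print(positive_quarters)
```

The first quarter has no previous quarter to compare with, so it gets no percent difference.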

OUTPUT

Report generated based on Volume, Date and Company

RESULT

Thus, the program is executed and the output is verified successfully.

Ex. No: 15    VISUALIZING DELAY OF FLIGHTS
Date:

AIM

To use the flights table to create a bar chart showing the average minutes of delay per flight, broken down by carrier name and filtered by state to show only Minnesota (MN).

ALGORITHM

STEP 1: Start the process.

STEP 2: Open Tableau and import the dataset.

STEP 3: Choose the flights dataset.

STEP 4: In the worksheet choose the dimensions as Carrier Name and State, and the measure as the average of Minutes of Delay per Flight.

STEP 5: Select State attribute -> Filter -> Select MN from the list -> OK.

STEP 6: Display the output.

STEP 7: Stop the process.
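
The same filter-then-average logic can be sketched in Python as a cross-check on the bar chart. The records, the carrier names and the dictionary keys below are made up for illustration and are not taken from the flights table.

```python
from collections import defaultdict

# Toy flight records (hypothetical carriers and delays).
FLIGHTS = [
    {"carrier": "Carrier A", "state": "MN", "delay": 10.0},
    {"carrier": "Carrier A", "state": "MN", "delay": 20.0},
    {"carrier": "Carrier B", "state": "MN", "delay": 5.0},
    {"carrier": "Carrier B", "state": "TX", "delay": 60.0},
]


def average_delay_by_carrier(flights, state):
    """Average delay per carrier, using only the flights of one state."""
    sums = defaultdict(lambda: [0.0, 0])  # carrier -> [total delay, flight count]
    for flight in flights:
        if flight["state"] == state:
            entry = sums[flight["carrier"]]
            entry[0] += flight["delay"]
            entry[1] += 1
    return {carrier: total / count for carrier, (total, count) in sums.items()}


print(average_delay_by_carrier(FLIGHTS, "MN"))
```

The Texas record is dropped by the state filter before the averages are computed, which mirrors applying the State filter in the worksheet.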

OUTPUT

Report generated based on Carrier Name, State and Minutes of Delay per Flight

RESULT

Thus, the program is executed and the output is verified successfully.

Ex. No: 16    VISUALIZING THE SALES OF THE PRODUCT IN BAKERY DATASET
Date:

AIM

To use the bakery dataset in Tableau to summarize the sales of products by day of the week and to find the product with the maximum sales on Monday.

ALGORITHM

STEP 1: Start the process.

STEP 2: Open Tableau and import the bakery dataset.

STEP 3: In the worksheet choose the dimensions as Item and Weekday(Date), and the measure as Transaction.

STEP 4: Select Weekday attribute -> Filter -> Select Monday from the list -> OK.

STEP 5: Display the output.

STEP 6: Stop the process.
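
As a cross-check on the worksheet, the weekday filter and the search for the best-selling item can be sketched in Python. The transactions and item names are made up for illustration, and one row is assumed to represent one sale of one item.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Toy (date, item) rows, one row per item sold (hypothetical items).
TRANSACTIONS = [
    (date(2024, 1, 1), "Bread"), (date(2024, 1, 1), "Coffee"),
    (date(2024, 1, 8), "Coffee"),
    (date(2024, 1, 2), "Cake"), (date(2024, 1, 2), "Cake"),
    (date(2024, 1, 2), "Cake"),
]


def top_item_on_weekday(transactions, weekday):
    """Return (item, count) for the most sold item on the given weekday."""
    target = WEEKDAYS.index(weekday)  # date.weekday() uses Monday == 0
    counts = Counter(item for day, item in transactions
                     if day.weekday() == target)
    return counts.most_common(1)[0] if counts else None


print(top_item_on_weekday(TRANSACTIONS, "Monday"))
```

Cake sells the most overall in the toy data, but it is sold on a Tuesday, so the Monday filter excludes it.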

OUTPUT

Report generated based on Weekday, Item and Transaction

RESULT

Thus, the program is executed and the output is verified successfully.


Ex. No: 17    VISUALIZING MONTHLY CHANGES IN STOCKS DATASET
Date:

AIM

To create a chart that visualizes the monthly change in the volume of stocks from the beginning of 2010 to the end of 2013, and to find the two consecutive months that have the least fluctuation (increase or decrease) in volume.

ALGORITHM

STEP 1: Start the process.

STEP 2: Open Power BI and import the Stock dataset.

STEP 3: Choose the chart, and choose the x-axis as date (year, month) and the y-axis as volume (min), to find the months with the least fluctuation.

STEP 4: In the next page choose the x-axis as date (year, month) and the y-axis as volume (min), and choose any two months and years in the filter type to compare their fluctuation.

STEP 5: Stop the process.

OUTPUT

RESULT

Thus the above program is executed successfully and the output is verified.

Ex. No: 18    VISUALIZING PRODUCT-WISE SALES USING SUPERSTORE DATASET
Date:

AIM

To visualize the Superstore dataset using Power BI and find the product that produces the maximum sales.

ALGORITHM

STEP 1: Start the process.

STEP 2: Open Power BI Desktop and import the Superstore dataset.

STEP 3: Select and load the Orders, Return and People tables.

STEP 4: In Page three create a Bar Chart. Drag and drop the Product Name field to the X-axis and the Sales field to the Y-axis. Choose Maximum from the drop-down list to view the maximum sales of each product.

STEP 5: Stop the process.

OUTPUT

RESULT

Thus the above program is executed successfully and the output is verified.

Ex. No: 19    VISUALIZING REGION-WISE AND CATEGORY-WISE PROFIT USING SUPERSTORE DATASET
Date:

AIM

To visualize the superstore dataset using Power BI and find the region that produces
the maximum profit.

ALGORITHM

STEP 1: Start the process.

STEP 2: Open Power BI Desktop and import the Superstore dataset.

STEP 3: Select and Load the Orders, Return and People Tables.

STEP 4: In Page one create a Pie Chart that displays the maximum sales, and in the filter add a condition to show only the Product Names that start with the letter A. Drag and drop the Product Name field to Legend to view the sales of each product.

STEP 5: In Page two create a Funnel Chart. Drag and drop the Profit field to the X-axis and the Region field to the Y-axis. Choose Maximum from the drop-down list to view the maximum profit in each region.

STEP 6: In Page three create a Bar Chart. Drag and drop the Product Name field to the X-axis and the Sales field to the Y-axis. Choose Maximum from the drop-down list to view the maximum sales of each product.

STEP 7: In Page four create a dashboard view of all the charts and create a Q&A from
Visualization -> Build Visualization -> Select Q&A. In the search box type the query to get
the answer.

STEP 8: Stop the process.

OUTPUT

RESULT

Thus, the Superstore dataset is visualized using Power BI and the output is verified successfully.

Ex. No: 20    POWER QUERY OPERATIONS USING WALMART AND MASTERS DATASET
Date:

AIM

To perform Power Query operations such as append, merge, custom column and conditional column on the Walmart and the Masters datasets in Power BI.

ALGORITHM

STEP 1: Start the process.

STEP 2: Open Power BI, then load and transform the data.

STEP 3: Combine the year-wise data using Append Queries.

STEP 4: Bring the Masters data into the combined data using Merge Queries.

STEP 5: Change the data type of the Temperature, Fuel_Price, CPI and Unemployment columns to Whole Number.

STEP 6: Calculate Price using a custom column (Add Column -> Custom Column -> Formula = weeksales ÷ Quantity).

STEP 7: Check whether the date is a holiday using the Holiday Flag column and bring in the weekday where Holiday Flag = 1 (Add Column -> Conditional Column -> Condition).

STEP 8: Stop the process.
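
The four operations above (append, merge, custom column, conditional column) can be illustrated with plain Python lists of dictionaries. This is a sketch of the logic only, not of Power Query itself: the rows are made up, the join key Store and the Region column are hypothetical, and only the names weeksales, Quantity and Holiday_Flag follow the steps above.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Toy year-wise tables and a toy Masters table (all values made up).
SALES_2010 = [{"Store": 1, "Date": date(2010, 2, 5), "weeksales": 1000.0,
               "Quantity": 50, "Holiday_Flag": 0}]
SALES_2011 = [{"Store": 2, "Date": date(2011, 2, 11), "weeksales": 900.0,
               "Quantity": 30, "Holiday_Flag": 1}]
MASTERS = [{"Store": 1, "Region": "North"}, {"Store": 2, "Region": "South"}]


def append_queries(*tables):
    """Append: stack the rows of several tables into one table."""
    return [dict(row) for table in tables for row in table]


def merge_left(left, right, key):
    """Merge: left join that copies matching columns from `right` onto `left`."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup.get(row[key], {})} for row in left]


def add_price(rows):
    """Custom column: Price = weeksales / Quantity."""
    return [{**row, "Price": row["weeksales"] / row["Quantity"]} for row in rows]


def add_holiday_weekday(rows):
    """Conditional column: weekday name when Holiday_Flag is 1, else None."""
    return [{**row, "Holiday_Weekday":
             WEEKDAYS[row["Date"].weekday()] if row["Holiday_Flag"] == 1 else None}
            for row in rows]


combined = append_queries(SALES_2010, SALES_2011)
result = add_holiday_weekday(add_price(merge_left(combined, MASTERS, "Store")))
for row in result:
    print(row)
```

Each helper returns a new list instead of changing its input, which is similar to how each Power Query step produces a new version of the table.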

OUTPUT

a) Combine the Year wise data using append queries

b) Merge the Masters data with the Walmart data

c) Change the Temperature, Fuel_Price, CPI and Unemployment data types to whole number

d) Calculate price using custom column

e) Check whether the date is a holiday using Holiday Flag = 1 and bring in the weekday using a conditional column

RESULT

Thus the above program is executed successfully and the output is verified.
