0% found this document useful (0 votes)

11 views30 pages

FIND-S and Candidate-Elimination Algorithms

ML lab manual

Uploaded by

danishkhanmitcs

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

11 views30 pages

FIND-S and Candidate-Elimination Algorithms

ML lab manual

Uploaded by

danishkhanmitcs

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

1.

Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data
from a .CSV file.
import csv

with open('[Link]', 'r') as f:

reader = [Link](f)
your_list = list(reader)

h = [['0', '0', '0', '0', '0', '0']]

for i in your_list:
print(i)
if i[-1] == "True":
j=0
for x in i:
if x != "True":
if x != h[0][j] and h[0][j] == '0':
h[0][j] = x
elif x != h[0][j] and h[0][j] != '0':
h[0][j] = '?'
else:
pass
j=j+1
print("Most specific hypothesis is")
print(h)

Output

'Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same',True

'Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same',True
'Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change',False
'Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change',True

Maximally Specific set

[['Sunny', 'Warm', '?', 'Strong', '?', '?']]
2. For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of the set of all
hypotheses consistent with the training examples.

class Holder:
factors={} #Initialize an empty dictionary
attributes = () #declaration of dictionaries parameters with an arbitrary length

'''
Constructor of class Holder holding two parameters,
self refers to the instance of the class
'''
def init (self,attr): #
[Link] = attr
for i in attr:
[Link][i]=[]

def add_values(self,factor,values):
[Link][factor]=values

class CandidateElimination:
Positive={} #Initialize positive empty dictionary
Negative={} #Initialize negative empty dictionary

def init (self,data,fact):

self.num_factors = len(data[0][0])
[Link] = [Link]
[Link] = [Link]
[Link] = data

def run_algorithm(self):
'''
Initialize the specific and general boundaries, and loop the dataset against the
algorithm
'''
G = [Link]()
S = [Link]()

'''
Programmatically populate list in the iterating variable trial_set
'''
count=0
for trial_set in [Link]:
if self.is_positive(trial_set): #if trial set/example consists of positive examples
G = self.remove_inconsistent_G(G,trial_set[0]) #remove inconsitent data from
the general boundary
S_new = S[:] #initialize the dictionary with no key-value pair
print (S_new)
for s in S:
if not [Link](s,trial_set[0]):
S_new.remove(s)
generalization = self.generalize_inconsistent_S(s,trial_set[0])
generalization = self.get_general(generalization,G)
if generalization:
S_new.append(generalization)
S = S_new[:]
S = self.remove_more_general(S)
print(S)

else:#if it is negative

S = self.remove_inconsistent_S(S,trial_set[0]) #remove inconsitent data from

the specific boundary
G_new = G[:] #initialize the dictionary with no key-value pair (dataset can
take any value)
print (G_new)
for g in G:
if [Link](g,trial_set[0]):
G_new.remove(g)
specializations = self.specialize_inconsistent_G(g,trial_set[0])
specializationss = self.get_specific(specializations,S)
if specializations != []:
G_new += specializationss
G = G_new[:]
G = self.remove_more_specific(G)
print(G)

print (S)
print (G)

def initializeS(self):
''' Initialize the specific boundary '''
S = tuple(['-' for factor in range(self.num_factors)]) #6 constraints in the vector
return [S]

def initializeG(self):
''' Initialize the general boundary '''
G = tuple(['?' for factor in range(self.num_factors)]) # 6 constraints in the vector
return [G]

def is_positive(self,trial_set):
''' Check if a given training trial_set is positive '''
if trial_set[1] == 'Y':
return True
elif trial_set[1] == 'N':
return False
else:
raise TypeError("invalid target value")

def match_factor(self,value1,value2):
''' Check for the factors values match,
necessary while checking the consistency of
training trial_set with the hypothesis '''
if value1 == '?' or value2 == '?':
return True
elif value1 == value2 :
return True
return False

def consistent(self,hypothesis,instance):
''' Check whether the instance is part of the hypothesis '''
for i,factor in enumerate(hypothesis):
if not self.match_factor(factor,instance[i]):
return False
return True

def remove_inconsistent_G(self,hypotheses,instance):
''' For a positive trial_set, the hypotheses in G
inconsistent with it should be removed '''
G_new = hypotheses[:]

for g in hypotheses:
if not [Link](g,instance):
G_new.remove(g)
return G_new

def remove_inconsistent_S(self,hypotheses,instance):
''' For a negative trial_set, the hypotheses in S
inconsistent with it should be removed '''
S_new = hypotheses[:]
for s in hypotheses:
if [Link](s,instance):
S_new.remove(s)
return S_new

def remove_more_general(self,hypotheses):
''' After generalizing S for a positive trial_set, the hypothesis in S
general than others in S should be removed '''
S_new = hypotheses[:]
for old in hypotheses:
for new in S_new:
if old!=new and self.more_general(new,old):
S_new.remove[new]
return S_new

def remove_more_specific(self,hypotheses):
''' After specializing G for a negative trial_set, the hypothesis in G
specific than others in G should be removed '''
G_new = hypotheses[:]
for old in hypotheses:
for new in G_new:
if old!=new and self.more_specific(new,old):
G_new.remove[new]
return G_new

def generalize_inconsistent_S(self,hypothesis,instance):
''' When a inconsistent hypothesis for positive trial_set is seen in the specific
boundary S,
it should be generalized to be consistent with the trial_set ... we will get one
hypothesis'''
hypo = list(hypothesis) # convert tuple to list for mutability
for i,factor in enumerate(hypo):
if factor == '-':
hypo[i] = instance[i]
elif not self.match_factor(factor,instance[i]):
hypo[i] = '?'
generalization = tuple(hypo) # convert list back to tuple for immutability
return generalization

def specialize_inconsistent_G(self,hypothesis,instance):
''' When a inconsistent hypothesis for negative trial_set is seen in the general
boundary G
should be specialized to be consistent with the trial_set.. we will get a set of
hypotheses '''
specializations = []
hypo = list(hypothesis) # convert tuple to list for mutability
for i,factor in enumerate(hypo):
if factor == '?':
values = [Link][[Link][i]]
for j in values:
if instance[i] != j:
hyp=hypo[:]
hyp[i]=j
hyp=tuple(hyp) # convert list back to tuple for immutability
[Link](hyp)
return specializations
def get_general(self,generalization,G):
''' Checks if there is more general hypothesis in G
for a generalization of inconsistent hypothesis in S
in case of positive trial_set and returns valid generalization '''

for g in G:
if self.more_general(g,generalization):
return generalization
return None

def get_specific(self,specializations,S):
''' Checks if there is more specific hypothesis in S
for each of hypothesis in specializations of an
inconsistent hypothesis in G in case of negative trial_set
and return the valid specializations'''
valid_specializations = []
for hypo in specializations:
for s in S:
if self.more_specific(s,hypo) or s==[Link]()[0]:
valid_specializations.append(hypo)
return valid_specializations

def exists_general(self,hypothesis,G):
'''Used to check if there exists a more general hypothesis in
general boundary for version space'''

for g in G:
if self.more_general(g,hypothesis):
return True
return False

def exists_specific(self,hypothesis,S):
'''Used to check if there exists a more specific hypothesis in
general boundary for version space'''

for s in S:
if self.more_specific(s,hypothesis):
return True
return False

def more_general(self,hyp1,hyp2):
''' Check whether hyp1 is more general than hyp2 '''
hyp = zip(hyp1,hyp2)
for i,j in hyp:
if i == '?':
continue
elif j == '?':
if i != '?':
return False
elif i != j:
return False
else:
continue
return True

def more_specific(self,hyp1,hyp2):
''' hyp1 more specific than hyp2 is
equivalent to hyp2 being more general than hyp1 '''
return self.more_general(hyp2,hyp1)

dataset=[(('sunny','warm','normal','strong','warm','same'),'Y'),(('sunny','warm','high','stron
g','warm','same'),'Y'),(('rainy','cold','high','strong','warm','change'),'N'),(('sunny','warm','hi
gh','strong','cool','change'),'Y')]
attributes =('Sky','Temp','Humidity','Wind','Water','Forecast')
f = Holder(attributes)
f.add_values('Sky',('sunny','rainy','cloudy')) #sky can be sunny rainy or cloudy
f.add_values('Temp',('cold','warm')) #Temp can be sunny cold or warm
f.add_values('Humidity',('normal','high')) #Humidity can be normal or high
f.add_values('Wind',('weak','strong')) #wind can be weak or strong
f.add_values('Water',('warm','cold')) #water can be warm or cold
f.add_values('Forecast',('same','change')) #Forecast can be same or change
a = CandidateElimination(dataset,f) #pass the dataset to the algorithm class and call the
run algoritm method
a.run_algorithm()

Output

[('sunny', 'warm', 'normal', 'strong', 'warm', 'same')]

[('sunny', 'warm', 'normal', 'strong', 'warm', 'same')]
[('sunny', 'warm', '?', 'strong', 'warm', 'same')]
[('?', '?', '?', '?', '?', '?')]
[('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?'), ('?', '?', '?', '?', '?', 'same')]
[('sunny', 'warm', '?', 'strong', 'warm', 'same')]
[('sunny', 'warm', '?', 'strong', '?', '?')]
[('sunny', 'warm', '?', 'strong', '?', '?')]
[('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?')]
3. Write a program to demonstrate the working of the decision tree based ID3 algorithm.
Use an appropriate data set for building the decision tree and apply this knowledge to
classify a new sample.

import numpy as np
import math
from data_loader import read_data

class Node:
def init (self, attribute):
[Link] = attribute
[Link] = []
[Link] = ""

def str (self):

return [Link]

def subtables(data, col, delete):

dict = {}
items = [Link](data[:, col])

count = [Link](([Link][0], 1), dtype=np.int32)

for x in range([Link][0]):
for y in range([Link][0]):
if data[y, col] == items[x]:
count[x] += 1

for x in range([Link][0]):
dict[items[x]] = [Link]((int(count[x]), [Link][1]), dtype="|S32")

pos = 0
for y in range([Link][0]):
if data[y, col] == items[x]:
dict[items[x]][pos] = data[y]
pos += 1

if delete:
dict[items[x]] = [Link](dict[items[x]], col, 1)

return items, dict

def entropy(S):
items = [Link](S)
if [Link] == 1:
return 0

counts = [Link](([Link][0], 1))

sums = 0

for x in range([Link][0]):

counts[x] = sum(S == items[x]) / ([Link] * 1.0)

for count in counts:

sums += -1 * count * [Link](count, 2)
return sums

def gain_ratio(data, col):

items, dict = subtables(data, col, delete=False)

total_size = [Link][0]
entropies = [Link](([Link][0], 1))
intrinsic = [Link](([Link][0], 1))
for x in range([Link][0]):
ratio = dict[items[x]].shape[0]/(total_size * 1.0)
entropies[x] = ratio * entropy(dict[items[x]][:, -1])
intrinsic[x] = ratio * [Link](ratio, 2)

total_entropy = entropy(data[:, -1])

iv = -1 * sum(intrinsic)

for x in range([Link][0]):
total_entropy -= entropies[x]

return total_entropy / iv

def create_node(data, metadata):

if ([Link](data[:, -1])).shape[0] == 1:
node = Node("")
[Link] = [Link](data[:, -1])[0]
return node

gains = [Link](([Link][1] - 1, 1))

for col in range([Link][1] - 1):
gains[col] = gain_ratio(data, col)

split = [Link](gains)

node = Node(metadata[split])
metadata = [Link](metadata, split, 0)
items, dict = subtables(data, split, delete=True)

for x in range([Link][0]):
child = create_node(dict[items[x]], metadata)
[Link]((items[x], child))

return node

def empty(size):
s = ""
for x in range(size):
s += " "
return s

def print_tree(node, level):

if [Link] != "":
print(empty(level), [Link])
return

print(empty(level), [Link])

for value, n in [Link]:

print(empty(level + 1), value)
print_tree(n, level + 2)

metadata, traindata = read_data("[Link]")

data = [Link](traindata)
node = create_node(data, metadata)
print_tree(node, 0)

Data_loader.py
import csv
def read_data(filename):
with open(filename, 'r') as csvfile:
datareader = [Link](csvfile, delimiter=',')
headers = next(datareader)
metadata = []
traindata = []
for name in headers:
[Link](name)
for row in datareader:
[Link](row)

return (metadata, traindata)

[Link]

outlook,temperature,humidity,wind,
answer sunny,hot,high,weak,no
sunny,hot,high,strong,no
overcast,hot,high,weak,yes
rain,mild,high,weak,yes
rain,cool,normal,weak,yes
rain,cool,normal,strong,no
overcast,cool,normal,strong,yes
sunny,mild,high,weak,no
sunny,cool,normal,weak,yes
rain,mild,normal,weak,yes
sunny,mild,normal,strong,yes
overcast,mild,high,strong,yes
overcast,hot,normal,weak,yes
rain,mild,high,strong,no

Output
outlook
overcast
b'yes'
rain
wind
b'strong'
b'no'
b'weak'
b'yes'
sunny
humidity
b'high'
b'no'
b'normal'
b'yes
4. Build an Artificial Neural Network by implementing the Backpropagation
algorithm and test the same using appropriate data sets.

import numpy as np
X = [Link](([2, 9], [1, 5], [3, 6]), dtype=float)
y = [Link](([92], [86], [89]), dtype=float)
X = X/[Link](X,axis=0) # maximum of X array longitudinally
y = y/100

#Sigmoid Function
def sigmoid (x):
return 1/(1 + [Link](-x))

#Derivative of Sigmoid Function

def derivatives_sigmoid(x):
return x * (1 - x)

#Variable initialization
epoch=7000 #Setting training iterations
lr=0.1 #Setting learning rate
inputlayer_neurons = 2 #number of features in data set
hiddenlayer_neurons = 3 #number of hidden layers neurons
output_neurons = 1 #number of neurons at output layer
#weight and bias initialization
wh=[Link](size=(inputlayer_neurons,hiddenlayer_neurons))
bh=[Link](size=(1,hiddenlayer_neurons))
wout=[Link](size=(hiddenlayer_neurons,output_neurons))
bout=[Link](size=(1,output_neurons))
#draws a random range of numbers uniformly of dim x*y
for i in range(epoch):

#Forward Propogation
hinp1=[Link](X,wh)
hinp=hinp1 + bh
hlayer_act = sigmoid(hinp)
outinp1=[Link](hlayer_act,wout)
outinp= outinp1+ bout
output = sigmoid(outinp)

#Backpropagation
EO = y-output
outgrad = derivatives_sigmoid(output)
d_output = EO* outgrad
EH = d_output.dot(wout.T)
hiddengrad = derivatives_sigmoid(hlayer_act)#how much hidden layer wts
contributed to error
d_hiddenlayer = EH * hiddengrad
wout += hlayer_act.[Link](d_output) *lr# dotproduct of nextlayererror and
currentlayerop
# bout += [Link](d_output, axis=0,keepdims=True) *lr
wh += [Link](d_hiddenlayer) *lr
#bh += [Link](d_hiddenlayer, axis=0,keepdims=True) *lr
print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n" ,output)

output
Input:
[[ 0.66666667 1. ]
[ 0.33333333 0.55555556]
[ 1. 0.66666667]]
Actual Output:
[[ 0.92]
[ 0.86]
[ 0.89]]
Predicted Output:
[[ 0.89559591]
[ 0.88142069]
[ 0.8928407 ]]
5. Write a program to implement the naïve Bayesian classifier for a sample training data
set stored as a .CSV file. Compute the accuracy of the classifier, considering few test data
sets.

import csv
import random
import math

def loadCsv(filename):
lines = [Link](open(filename, "r"));
dataset = list(lines)
for i in range(len(dataset)):
#converting strings into numbers for processing
dataset[i] = [float(x) for x in dataset[i]]

return dataset

def splitDataset(dataset, splitRatio):

#67% training size
trainSize = int(len(dataset) * splitRatio);
trainSet = []
copy = list(dataset);
while len(trainSet) < trainSize:
#generate indices for the dataset list randomly to pick ele for training data
index = [Link](len(copy));
[Link]([Link](index))
return [trainSet, copy]
def separateByClass(dataset):
separated = {}
#creates a dictionary of classes 1 and 0 where the values are the instacnes belonging to
each class
for i in range(len(dataset)):
vector = dataset[i]
if (vector[-1] not in separated):
separated[vector[-1]] = []
separated[vector[-1]].append(vector)
return separated

def mean(numbers):
return sum(numbers)/float(len(numbers))

def stdev(numbers):
avg = mean(numbers)
variance = sum([pow(x-avg,2) for x in numbers])/float(len(numbers)-1)
return [Link](variance)
def summarize(dataset):
summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)];
del summaries[-1]
return summaries

def summarizeByClass(dataset):
separated = separateByClass(dataset);
summaries = {}
for classValue, instances in [Link]():
#summaries is a dic of tuples(mean,std) for each class value
summaries[classValue] = summarize(instances)
return summaries

def calculateProbability(x, mean, stdev):

exponent = [Link](-([Link](x-mean,2)/(2*[Link](stdev,2))))
return (1 / ([Link](2*[Link]) * stdev)) * exponent

def calculateClassProbabilities(summaries, inputVector):

probabilities = {}
for classValue, classSummaries in [Link]():#class and attribute information
as mean and sd
probabilities[classValue] = 1
for i in range(len(classSummaries)):
mean, stdev = classSummaries[i] #take mean and sd of every attribute
for class 0 and 1 seperaely
x = inputVector[i] #testvector's first attribute
probabilities[classValue] *= calculateProbability(x, mean, stdev);#use
normal dist
return probabilities

def predict(summaries, inputVector):

probabilities = calculateClassProbabilities(summaries, inputVector)
bestLabel, bestProb = None, -1
for classValue, probability in [Link]():#assigns that class which has he
highest prob
if bestLabel is None or probability > bestProb:
bestProb = probability
bestLabel = classValue
return bestLabel

def getPredictions(summaries, testSet):

predictions = []
for i in range(len(testSet)):
result = predict(summaries, testSet[i])
[Link](result)
return predictions
def getAccuracy(testSet, predictions):
correct = 0
for i in range(len(testSet)):
if testSet[i][-1] == predictions[i]:
correct += 1
return (correct/float(len(testSet))) * 100.0

def main():
filename = '[Link]'
splitRatio = 0.67
dataset = loadCsv(filename);

trainingSet, testSet = splitDataset(dataset, splitRatio)

print('Split {0} rows into train={1} and test={2} rows'.format(len(dataset),
len(trainingSet), len(testSet)))
# prepare model
summaries = summarizeByClass(trainingSet);
# test model
predictions = getPredictions(summaries, testSet)
accuracy = getAccuracy(testSet, predictions)
print('Accuracy of the classifier is : {0}%'.format(accuracy))

main()

Output
confusion matrix is as
follows [[17 0 0]
[ 0 17 0]
[ 0 0 11]]
Accuracy metrics
precision recall f1-score support

0 1.00 1.00 1.00 17

1 1.00 1.00 1.00 17
2 1.00 1.00 1.00 11

avg / total 1.00 1.00 1.00 45

6. Assuming a set of documents that need to be classified, use the naïve Bayesian
Classifier model to perform this task. Built-in Java classes/API can be used to write
the program. Calculate the accuracy, precision, and recall for your data set.

import pandas as pd
msg=pd.read_csv('[Link]',names=['message','label'])
print('The dimensions of the dataset',[Link])
msg['labelnum']=[Link]({'pos':1,'neg':0})
X=[Link]
y=[Link]
print(X)
print(y)

#splitting the dataset into train and test data

from sklearn.model_selection import train_test_split
xtrain,xtest,ytrain,ytest=train_test_split(X,y)
print([Link])
print([Link])
print([Link])
print([Link])
#output of count vectoriser is a sparse matrix
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm=count_vect.transform(xtest)
print(count_vect.get_feature_names())

df=[Link](xtrain_dtm.toarray(),columns=count_vect.get_feature_names())
print(df)#tabular representation
print(xtrain_dtm) #sparse matrix representation

# Training Naive Bayes (NB) classifier on training data.

from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(xtrain_dtm,ytrain)
predicted = [Link](xtest_dtm)

#printing accuracy metrics

from sklearn import metrics
print('Accuracy metrics')
print('Accuracy of the classifer is',metrics.accuracy_score(ytest,predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest,predicted))
print('Recall and Precison ')
print(metrics.recall_score(ytest,predicted))
print(metrics.precision_score(ytest,predicted))

'''docs_new = ['I like this place', 'My boss is not my saviour']

X_new_counts = count_vect.transform(docs_new)
predictednew = [Link](X_new_counts)
for doc, category in zip(docs_new, predictednew):
print('%s->%s' % (doc, [Link][category]))'''

I love this sandwich,pos

This is an amazing place,pos
I feel very good about these beers,pos
This is my best work,pos
What an awesome view,pos
I do not like this restaurant,neg
I am tired of this stuff,neg
I can't deal with this,neg
He is my sworn enemy,neg
My boss is horrible,neg
This is an awesome place,pos
I do not like the taste of this juice,neg
I love to dance,pos
I am sick and tired of this place,neg
What a great holiday,pos
That is a bad locality to stay,neg
We will have good fun tomorrow,pos
I went to my enemy's house today,neg

OUTPUT

['about', 'am', 'amazing', 'an', 'and', 'awesome', 'beers', 'best', 'boss', 'can', 'deal',
'do', 'enemy', 'feel', 'fun', 'good', 'have', 'horrible', 'house', 'is', 'like', 'love', 'my',
'not', 'of', 'place', 'restaurant', 'sandwich', 'sick', 'stuff', 'these', 'this', 'tired', 'to',
'today', 'tomorrow', 'very', 'view', 'we', 'went', 'what', 'will', 'with', 'work']
about am amazing an and awesome beers best boss can ... today \
0 1 0 0 0 0 0 1 0 0 0 ... 0
1 0 0 0 0 0 0 0 1 0 0 ... 0
2 0 0 1 1 0 0 0 0 0 0 ... 0
3 0 0 0 0 0 0 0 0 0 0 ... 1
4 0 0 0 0 0 0 0 0 0 0 ... 0
5 01 0 0 1 0 0 0 0 0 ... 0
6 0 0 0 0 0 0 0 0 0 1 ... 0
7 0 0 0 0 0 0 0 0 0 0 ... 0
8 0 1 0 0 0 0 0 0 0 0 ... 0
9 0 0 0 1 0 1 0 0 0 0 ... 0
10 0 0 0 0 0 0 0 0 0 0 ... 0
11 0 0 0 0 0 0 0 0 1 0 ... 0
12 0 0 0 1 0 1 0 0 0 0 ... 0
tomorrow very view we went what will with work
0 0 1 0 0 00 0 00
1 0 0 0 0 00 0 0 1
2 0 0 0 0 00 0 0 0
3 0 0 0 0 10 0 0 0
4 0 0 0 0 00 0 0 0
5 0 0 0 0 00 0 0 0
6 0 0 0 0 00 0 1 0
7 1 0 0 1 00 1 0 0
8 0 0 0 0 00 0 0 0
7. Write a program to construct a Bayesian network considering medical data. Use
this model to demonstrate the diagnosis of heart patients using standard Heart
Disease Data Set. You can use Java/Python ML library classes/API.

From pomegranate import*

Tuberculosis=ConditionalProbabilityTable(

Smoking = DiscreteDistribut
Lung = ConditionalProbabilityTable(

Bronchitis = ConditionalProbabilityTable(
0.92],

Tuberculosis_or_cancer = ConditionalProbabilityTable(

0.0],
1.0],

Xray = ConditionalProbabilityTable(
dyspnea = ConditionalProbabilityTable(

0.04],
0.89],

network.add_nodes(s0,s1,s2)
network.add_edge(s0,s1)
network.add_edge(s1.s2)
[Link]()
8. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data
set for clustering using k-Means algorithm. Compare the results of these two
algorithms and comment on the quality of clustering. You can add Java/Python ML
library classes/API in the program.

import numpy as np
import [Link] as plt
from [Link].samples_generator import make_blobs
X, y_true = make_blobs(n_samples=100, centers =
4,Cluster_std=0.60,random_state=0)
X = X[:, ::-1]

#flip axes for better plotting

from [Link] import GaussianMixture
gmm = GaussianMixture (n_components = 4).fit(X)
lables = [Link](X)
[Link](X[:, 0], X[:, 1], c=labels, s=40,
probs = gmm.predict_proba(X)
print(probs[:5].round(3))
size = 50 * [Link](1) ** 2 # square emphasizes differences

from [Link] import Ellipse

def draw_ellipse(position, covariance, ax=None, **kwargs);

Ax = ax or [Link]()
# Convert covariance to principal axes
if [Link] ==(2,2):

U, s, Vt = [Link](covariance)
Angle = [Link](np.arctan2(U[1, 0], U[0,0]))
Width, height = 2 * [Link](s)
else:
angle = 0
width, height = 2 * [Link](covariance)

#Draw the Ellipse

for nsig in range(1,4):
ax.add_patch(Ellipse(position, nsig * width, nsig *height,
angle, **kwargs))

def plot_gmm(gmm, X, label=True, ax=None):

ax = ax or [Link]()
labels = [Link](X).predict(X)
if label:
else:
[Link](X[:, 0], x[:, 1], s=40, zorder=2)

w_factor = 0.2 / gmm.weights_.max()

for pos, covar, w in zip(gmm.means_, gmm.covariances_, gmm.weights_):
draw_ellipse(pos, covar, alpha=w * w_factor)

gmm = GaussianMixture(n_components=4, random_state=42)

plot_gmm(gmm, X)
gmm = Gaussian
random_state=42)
plot_gmm(gmm, X)

Output

[[1 ,0, 0, 0]
[0 ,0, 1, 0]
[1 ,0, 0, 0]
[1 ,0, 0, 0]
[1 ,0, 0, 0]]
K-means
from [Link] import KMeans

#from sklearn import metrics

import numpy as np
import [Link] as plt
import pandas as pd
data=pd.read_csv("[Link]")
df1=[Link](data)
print(df1)
f1 = df1['Distance_Feature'].values
f2 = df1['Speeding_Feature'].values

X=[Link](list(zip(f1,f2)))
[Link]()
[Link]([0, 100])
[Link]([0, 50])
[Link]('Dataset')
[Link]('speeding_feature')
[Link]('Distance_Feature')
[Link](f1,f2)
[Link]()

# create new plot and data

[Link]()
colors = ['b', 'g', 'r']
markers = ['o', 'v', 's']

# KMeans algorithm
#K = 3
kmeans_model = KMeans(n_clusters=3).fit(X)

[Link]()
for i, l in enumerate(kmeans_model.labels_):
[Link](f1[i], f2[i], color=colors[l], marker=markers[l],ls='None')
[Link]([0, 100])
[Link]([0, 50])

[Link]()

Driver_ID,Distance_Feature,Speeding_Feature
3423311935,71.24,28
3423313212,52.53,25
3423313724,64.54,27
3423311373,55.69,22
3423310999,54.58,25
3423313857,41.91,10
3423312432,58.64,20
3423311434,52.02,8
3423311328,31.25,34
3423312488,44.31,19
3423311254,49.35,40
3423312943,58.07,45
3423312536,44.22,22
3423311542,55.73,19
3423312176,46.63,43
3423314176,52.97,32
3423314202,46.25,35
3423311346,51.55,27
3423310666,57.05,26
3423313527,58.45,30
3423312182,43.42,23
3423313590,55.68,37
3423312268,55.15,18
9. Write a program to implement k-Nearest Neighbour algorithm to classify the iris
data set. Print both correct and wrong predictions. Java/Python ML library classes
can be used for this problem.
import csv
import random
import math
import operator

def loadDataset(filename, split, trainingSet=[] , testSet=[]):

with open(filename, 'rb') as csvfile:
lines = [Link](csvfile)
dataset = list(lines)
for x in range(len(dataset)-1):
for y in range(4):
dataset[x][y] = float(dataset[x][y])
if [Link]() < split:
[Link](dataset[x])
else:
[Link](dataset[x])

def euclideanDistance(instance1, instance2, length):

distance = 0
for x in range(length):
distance += pow((instance1[x] - instance2[x]), 2)
return [Link](distance)

def getNeighbors(trainingSet, testInstance, k):

distances = []
length = len(testInstance)-1
for x in range(len(trainingSet)):
dist = euclideanDistance(testInstance, trainingSet[x], length)
[Link]((trainingSet[x], dist))
[Link](key=[Link](1))
neighbors = []
for x in range(k):
[Link](distances[x][0])
return neighbors

def getResponse(neighbors):
classVotes = {}
for x in range(len(neighbors)):
response = neighbors[x][-1]
if response in classVotes:
classVotes[response] += 1
else:
classVotes[response] = 1
sortedVotes =
sorted([Link](),
reverse=True)
return sortedVotes[0][0]

def getAccuracy(testSet,
predictions): correct = 0
for x in
range(len(testSet)):
key=[Link](1
),
if testSet[x][-1] == predictions[x]:
correct += 1
return (correct/float(len(testSet))) * 100.0

def main():
# prepare
data
trainingSet=
[] testSet=[]
split = 0.67
loadDataset('[Link]', split, trainingSet,
testSet) print('Train set: ' + repr(len(trainingSet)))
print('Test set: ' + repr(len(testSet)))
# generate
predictions
predictions=[]
k=3
for x in range(len(testSet)):
neighbors = getNeighbors(trainingSet, testSet[x],
k) result = getResponse(neighbors)
[Link](result)
print('> predicted=' + repr(result) + ', actual=' + repr(testSet[x][-
1])) accuracy = getAccuracy(testSet, predictions)
print('Accuracy: ' + repr(accuracy) +

'%') main()
OUTPUT
Confusion matrix is as follows

[[11 0 0]

[0 9 1]

[0 1 8]]

Accuracy metrics

0 1.00 1.00 1.00 11

1 0.90 0.90 0.90 10

2 0.89 0.89 0,89 9

Avg/Total 0.93 0.93 0.93 30

10. Implement the non-parametric Locally Weighted Regression algorithm in order
to fit data points. Select appropriate data set for your experiment and draw graphs.

from numpy import *

import operator
from os import listdir
import matplotlib
import [Link] as plt
import pandas as pd
import numpy as np1
import [Link] as np
from [Link] import pearsonr

def kernel(point,xmat, k):

m,n = [Link](xmat)
weights = [Link]([Link]((m)))
for j in range(m):
diff = point - X[j]
weights[j,j] = [Link](diff*diff.T/(-2.0*k**2))
return weights

def localWeight(point,xmat,ymat,k):
wei = kernel(point,xmat,k)
W=(X.T*(wei*X)).I*(X.T*(wei*ymat.T))
return W

def localWeightRegression(xmat,ymat,k):
m,n = [Link](xmat)
ypred = [Link](m)
for i in range(m):
ypred[i] = xmat[i]*localWeight(xmat[i],xmat,ymat,k)
return ypred

# load data points

data = pd.read_csv('[Link]')
bill = [Link](data.total_bill)
tip = [Link]([Link])

#preparing and add 1 in bill

mbill = [Link](bill)
mtip = [Link](tip)
m= [Link](mbill)[1]
one = [Link]([Link](m))
X= [Link]((one.T,mbill.T))
#set k here
ypred = localWeightRegression(X,mtip,2)
SortIndex = X[:,1].argsort(0)
xsort = X[SortIndex][:,0]

Output

Common questions

The k-Nearest Neighbors (k-NN) algorithm is advantageous due to its simplicity and effectiveness in diverse applications, as it requires no prior assumptions about data distribution and works well with smaller datasets. It is also intuitive, as it classifies new instances based on majority voting from the 'k' nearest neighbors. However, k-NN is computationally intensive, especially as the dataset size grows, because it must calculate distances to all training data points for classification. It is also sensitive to irrelevant or redundant features and the choice of 'k' can significantly impact performance. Large 'k' values tend to smooth decision boundaries more, potentially ignoring minor class-specific characteristics .

The Gaussian Mixture Model (GMM) uses the Expectation-Maximization (EM) algorithm to fit data into clusters modeled by multivariate Gaussian distributions. In the E-step, it calculates the probability of each data point belonging to each cluster, and in the M-step, it updates the Gaussian parameters (mean, variance) to maximize these probabilities. GMM is beneficial over k-Means as it accounts for the data's probabilistic nature and varying cluster shapes by allowing for clusters of different sizes and ellipticity. Unlike k-Means, which assumes clusters are spherical and equally sized, GMM can model more complex cluster structures, offering flexibility and improved modeling of actual data distribution .

Choosing an appropriate 'k', the number of clusters, in the k-Means algorithm is significant because it directly affects the algorithm's ability to correctly partition the data into meaningful groups. If 'k' is too small, it may lead to underfitting, where distinct clusters are merged, failing to capture the data's structure. Conversely, if 'k' is too large, it may lead to overfitting, where one cluster is split into many, potentially capturing noise as part of different clusters. An incorrect choice of 'k' could result in poor data partitioning, thereby affecting the algorithm's ability to provide meaningful insights .

Entropy is used in decision tree learning to measure the unpredictability or disorder in the dataset. In the ID3 algorithm, it is used for selecting which attribute to split a node. During node splitting, the algorithm calculates the information gain, which is the difference in entropy before and after a dataset is split on an attribute. The ID3 algorithm selects the attribute with the highest information gain, meaning the one that best reduces uncertainty about the class labels. This step is critical because it directly influences the effectiveness of the decision tree in classifying data by ensuring that the nodes are split in the most informative way, leading to a more accurate and concise decision tree .

The Naïve Bayesian classifier works by applying Bayes' theorem with the 'naïve' assumption of independence between features. It calculates the posterior probability for each class, given the feature values for a data instance, and classifies the instance into the class with the highest likelihood. The key assumption is that the presence of a particular feature is independent of the presence of any other feature, given the class label. This can lead to suboptimal performance when features are correlated, as the independence assumption fails, potentially resulting in incorrect probability estimates. Despite this limitation, it is effective for large datasets with many features due to its simplicity and computational efficiency .

The Candidate-Elimination algorithm uses both the specific boundary, S, and the general boundary, G, to find all hypotheses consistent with the training data, unlike the FIND-S algorithm which finds only the most specific hypothesis. Candidate-Elimination iteratively refines these boundaries by removing hypotheses in G inconsistent with positive examples and hypotheses in S inconsistent with negative examples. It specializes G in response to negative examples to ensure no negative instances are covered, and generalizes S in response to positive examples to ensure all positive instances are covered, thus carefully maintaining consistency across the entire version space .

Feature scaling, such as normalization or standardization, is crucial in k-Means clustering because it ensures that features contribute equally to the distance calculations. k-Means uses the Euclidean distance to assign instances to clusters, so features with larger ranges would disproportionately influence distances and, consequently, the clustering results. Without scaling, the algorithm might place undue weight on some features, leading to biased clustering. By scaling features to a uniform range, each feature can have an equitable impact on cluster formation, resulting in more accurate and meaningful clusters .

The FIND-S algorithm is used to find the most specific hypothesis that fits positive examples in a training dataset. It works by iteratively generalizing from the most specific hypothesis, represented initially by all '0's, to a hypothesis that is as specific as possible while still being consistent with all positive examples. For each positive example, the algorithm compares each attribute with the corresponding attribute of the hypothesis. If the attribute in the hypothesis is '0', it is replaced by the attribute from the example; if the attribute in the hypothesis is different from the example and not '0', it is replaced by '?' indicating any value is acceptable. The process continues until all positive examples are processed .

Gradient descent optimization functions by iteratively adjusting parameters to minimize a cost function, which measures the error between the predicted output and the actual output. In each iteration, it calculates the gradient, or partial derivative, of the cost function with respect to each parameter, and updates the parameters in the direction that reduces the cost. This process continues until the model converges at a minimum point, ideally reaching optimal parameter values. It is a common choice because it is computationally efficient, applicable to a wide variety of machine learning models, and capable of handling large datasets effectively due to its ability to scale well with the amount of data .

Parametric models, like linear regression, assume a specific form for the data distribution and learn a finite number of parameters corresponding to this assumed model. They are computationally efficient and suitable when prior knowledge about the data distribution exists. Non-parametric models, like k-nearest neighbors, do not assume any data distribution and instead learn from the data achieved with potentially infinite parameters as more data is available. They are flexible, handling varying data, complex distributions, and scenarios with high dimensionality better than parametric models, although they require more data and computational resources. As such, non-parametric models are preferred when data characteristics are unknown or do not fit any particular parametric form .

ML Lab Manual
No ratings yet
ML Lab Manual
32 pages
Candidate-Elimination Algorithm in Python
No ratings yet
Candidate-Elimination Algorithm in Python
6 pages
ML Lab Manual: FIND-S & Candidate-Elimination
No ratings yet
ML Lab Manual: FIND-S & Candidate-Elimination
30 pages
Machine Learning Lab Manual
No ratings yet
Machine Learning Lab Manual
26 pages
Machine Learning Algorithms Implementation
No ratings yet
Machine Learning Algorithms Implementation
33 pages
Machine Learning Lab Experiments Guide
No ratings yet
Machine Learning Lab Experiments Guide
34 pages
Machine Learning Algorithms Manual
No ratings yet
Machine Learning Algorithms Manual
30 pages
6CS4-22 Machine Learning Lab
No ratings yet
6CS4-22 Machine Learning Lab
36 pages
FIND-S and Candidate-Elimination Algorithms
No ratings yet
FIND-S and Candidate-Elimination Algorithms
22 pages
Candidate-Elimination Algorithm Lab
No ratings yet
Candidate-Elimination Algorithm Lab
23 pages
Machine Learning Lab Manual 1
No ratings yet
Machine Learning Lab Manual 1
29 pages
B.Tech II Sem Machine Learning Lab Manual
No ratings yet
B.Tech II Sem Machine Learning Lab Manual
51 pages
FIND-S and Candidate-Elimination Algorithms
No ratings yet
FIND-S and Candidate-Elimination Algorithms
105 pages
Machine Learning Lab Experiments Guide
No ratings yet
Machine Learning Lab Experiments Guide
31 pages
FIND-S and Candidate-Elimination Algorithms
No ratings yet
FIND-S and Candidate-Elimination Algorithms
23 pages
ML Lab Manual for JNTUA R20 Students
No ratings yet
ML Lab Manual for JNTUA R20 Students
77 pages
Naive Bayes Classifier in ML Lab
No ratings yet
Naive Bayes Classifier in ML Lab
22 pages
FIND-S Algorithm Implementation in Python
No ratings yet
FIND-S Algorithm Implementation in Python
49 pages
ML Lab Manual: FIND-S & ID3 Algorithms
No ratings yet
ML Lab Manual: FIND-S & ID3 Algorithms
54 pages
Machine Learning Lab: CSV Implementations
No ratings yet
Machine Learning Lab: CSV Implementations
31 pages
FIND-S and Candidate-Elimination Algorithms
No ratings yet
FIND-S and Candidate-Elimination Algorithms
31 pages
B.tech Machine Learning-Lab
No ratings yet
B.tech Machine Learning-Lab
33 pages
Machine Learning Lab Manual 2019
No ratings yet
Machine Learning Lab Manual 2019
85 pages
Decision Tree ID3 Implementation in Python
No ratings yet
Decision Tree ID3 Implementation in Python
18 pages
Machine Learning Algorithm Implementations
No ratings yet
Machine Learning Algorithm Implementations
25 pages
Candidate Elimination Algorithm
No ratings yet
Candidate Elimination Algorithm
4 pages
MLT BCAI 651 Lab Manual - Student
No ratings yet
MLT BCAI 651 Lab Manual - Student
57 pages
FIND-S and Candidate Elimination Algorithms
No ratings yet
FIND-S and Candidate Elimination Algorithms
55 pages
AD3461 Machine Learning Lab Manual
No ratings yet
AD3461 Machine Learning Lab Manual
43 pages
Find-S and Candidate-Elimination Algorithms
No ratings yet
Find-S and Candidate-Elimination Algorithms
3 pages
Machine Learning Hypothesis Algorithms
No ratings yet
Machine Learning Hypothesis Algorithms
9 pages
Data Analysis with Machine Learning Techniques
No ratings yet
Data Analysis with Machine Learning Techniques
14 pages
FIND-S and Candidate-Elimination Algorithms
No ratings yet
FIND-S and Candidate-Elimination Algorithms
15 pages
Implementing Find-S Algorithm in ML
No ratings yet
Implementing Find-S Algorithm in ML
42 pages
Candidate-Elimination Algorithm Explained
No ratings yet
Candidate-Elimination Algorithm Explained
49 pages
Machine Learning Practical File GITAM
No ratings yet
Machine Learning Practical File GITAM
99 pages
Candidate-Elimination Algorithm Demo
No ratings yet
Candidate-Elimination Algorithm Demo
5 pages
Implementing the FIND-S Algorithm
No ratings yet
Implementing the FIND-S Algorithm
19 pages
Implementing Machine Learning Algorithms
No ratings yet
Implementing Machine Learning Algorithms
4 pages
Machine Learning Practical Experiments
No ratings yet
Machine Learning Practical Experiments
36 pages
FIND-S and Candidate Elimination Algorithms
No ratings yet
FIND-S and Candidate Elimination Algorithms
33 pages
FIND-S and Candidate-Elimination Algorithms
No ratings yet
FIND-S and Candidate-Elimination Algorithms
27 pages
Find-S and Candidate Elimination Algorithms
No ratings yet
Find-S and Candidate Elimination Algorithms
5 pages
ML LAB Removed Removed
No ratings yet
ML LAB Removed Removed
22 pages
FINDS & Candidate Elimination Algorithms
No ratings yet
FINDS & Candidate Elimination Algorithms
7 pages
Machine Learning Lab Manual: FIND-S & Candidate-Elimination
No ratings yet
Machine Learning Lab Manual: FIND-S & Candidate-Elimination
25 pages
FIND-S, Candidate Elimination, ID3 Algorithms
No ratings yet
FIND-S, Candidate Elimination, ID3 Algorithms
40 pages
FIND-S, Candidate-Elimination, ID3, ANN Guide
No ratings yet
FIND-S, Candidate-Elimination, ID3, ANN Guide
9 pages
Implement FIND-S and Candidate Elimination Algorithms
No ratings yet
Implement FIND-S and Candidate Elimination Algorithms
30 pages
Candidate-Elimination and Classifiers Code
No ratings yet
Candidate-Elimination and Classifiers Code
10 pages
Candidate Elimination & ID3 Algorithms in Python
No ratings yet
Candidate Elimination & ID3 Algorithms in Python
90 pages
Machine Learning Algorithms Overview
No ratings yet
Machine Learning Algorithms Overview
23 pages
FIND-S and Candidate-Elimination Algorithms
No ratings yet
FIND-S and Candidate-Elimination Algorithms
24 pages
The Candidate Elimination Algorithm
No ratings yet
The Candidate Elimination Algorithm
3 pages
Implementing Machine Learning Algorithms
No ratings yet
Implementing Machine Learning Algorithms
10 pages
Practical - 1
No ratings yet
Practical - 1
31 pages
Machine Learning Lab Record for AI
No ratings yet
Machine Learning Lab Record for AI
34 pages
Understanding Communicative Competence
No ratings yet
Understanding Communicative Competence
28 pages
Sysml V2 Submission Team (SST) Sysml V2 Update January 30, 2021
No ratings yet
Sysml V2 Submission Team (SST) Sysml V2 Update January 30, 2021
31 pages
GARDEN 2T Safety Data Sheet (REACH)
No ratings yet
GARDEN 2T Safety Data Sheet (REACH)
8 pages
Static Analysis of Manipulators
No ratings yet
Static Analysis of Manipulators
15 pages
WAEC Chief Examiner's Report 2019
No ratings yet
WAEC Chief Examiner's Report 2019
316 pages
Reading Strategies: Note Making & Summarizing
No ratings yet
Reading Strategies: Note Making & Summarizing
12 pages
NLP Practitioner: INLPTA Standards
No ratings yet
NLP Practitioner: INLPTA Standards
5 pages
Understanding Generative AI Concepts
No ratings yet
Understanding Generative AI Concepts
22 pages
Understanding HRD Climate and Culture
67% (12)
Understanding HRD Climate and Culture
24 pages
Class 2 Maths Multiplication Worksheet
94% (18)
Class 2 Maths Multiplication Worksheet
6 pages
E-Bike Design and Fabrication Overview
No ratings yet
E-Bike Design and Fabrication Overview
8 pages
Power Commander V Installation Guide
No ratings yet
Power Commander V Installation Guide
5 pages
Face Detection with OpenCV in Python
No ratings yet
Face Detection with OpenCV in Python
17 pages
Carer NK Series Forklift Overview
No ratings yet
Carer NK Series Forklift Overview
5 pages
MT9000 Series Cutting Inserts Overview
No ratings yet
MT9000 Series Cutting Inserts Overview
30 pages
Foam System Installation Method Statement
No ratings yet
Foam System Installation Method Statement
17 pages
MED3LAB Practice Exam Overview
No ratings yet
MED3LAB Practice Exam Overview
6 pages
Understanding Constellations and Visibility
No ratings yet
Understanding Constellations and Visibility
4 pages
V-I Characteristics of P-N Junction Diode
No ratings yet
V-I Characteristics of P-N Junction Diode
3 pages
Understanding Global Citizenship Concepts
No ratings yet
Understanding Global Citizenship Concepts
18 pages
Circular
No ratings yet
Circular
5 pages
High-Level AM Transmitter Overview
No ratings yet
High-Level AM Transmitter Overview
125 pages
On Tobler's First Law of Geography
No ratings yet
On Tobler's First Law of Geography
8 pages
Understanding Safety and Hazards Basics
No ratings yet
Understanding Safety and Hazards Basics
23 pages
Light Propagation: Reflection and Refraction
No ratings yet
Light Propagation: Reflection and Refraction
20 pages
Vaccine Vial Monitor Performance Specs
No ratings yet
Vaccine Vial Monitor Performance Specs
10 pages
Java OOP: Classes, Interfaces, and Inheritance
No ratings yet
Java OOP: Classes, Interfaces, and Inheritance
1 page
Correlation of Reading and Translation Skills
No ratings yet
Correlation of Reading and Translation Skills
7 pages
Media and Information Literacy Guide
No ratings yet
Media and Information Literacy Guide
20 pages
SCFM to ACFM Conversion Guide
No ratings yet
SCFM to ACFM Conversion Guide
37 pages

FIND-S and Candidate-Elimination Algorithms

Uploaded by

FIND-S and Candidate-Elimination Algorithms

Uploaded by

1.

with open('[Link]', 'r') as f:

h = [['0', '0', '0', '0', '0', '0']]

'Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same',True

Maximally Specific set

def init (self,data,fact):

S = self.remove_inconsistent_S(S,trial_set[0]) #remove inconsitent data from

[('sunny', 'warm', 'normal', 'strong', 'warm', 'same')]

def str (self):

def subtables(data, col, delete):

count = [Link](([Link][0], 1), dtype=np.int32)

return items, dict

counts = [Link](([Link][0], 1))

counts[x] = sum(S == items[x]) / ([Link] * 1.0)

for count in counts:

def gain_ratio(data, col):

total_entropy = entropy(data[:, -1])

def create_node(data, metadata):

gains = [Link](([Link][1] - 1, 1))

def print_tree(node, level):

for value, n in [Link]:

metadata, traindata = read_data("[Link]")

return (metadata, traindata)

#Derivative of Sigmoid Function

def splitDataset(dataset, splitRatio):

def calculateProbability(x, mean, stdev):

def calculateClassProbabilities(summaries, inputVector):

def predict(summaries, inputVector):

def getPredictions(summaries, testSet):

trainingSet, testSet = splitDataset(dataset, splitRatio)

0 1.00 1.00 1.00 17

avg / total 1.00 1.00 1.00 45

#splitting the dataset into train and test data

# Training Naive Bayes (NB) classifier on training data.

#printing accuracy metrics

'''docs_new = ['I like this place', 'My boss is not my saviour']

I love this sandwich,pos

From pomegranate import*

#flip axes for better plotting

from [Link] import Ellipse

#Draw the Ellipse

def plot_gmm(gmm, X, label=True, ax=None):

w_factor = 0.2 / gmm.weights_.max()

gmm = GaussianMixture(n_components=4, random_state=42)

#from sklearn import metrics

# create new plot and data

def loadDataset(filename, split, trainingSet=[] , testSet=[]):

def euclideanDistance(instance1, instance2, length):

def getNeighbors(trainingSet, testInstance, k):

0 1.00 1.00 1.00 11

1 0.90 0.90 0.90 10

2 0.89 0.89 0,89 9

Avg/Total 0.93 0.93 0.93 30

from numpy import *

def kernel(point,xmat, k):

# load data points

#preparing and add 1 in bill

Common questions

What are the advantages and disadvantages of using the k-Nearest Neighbors (k-NN) algorithm for classification tasks?

What are the advantages and disadvantages of using the k-Nearest Neighbors (k-NN) algorithm for classification tasks?

Explain how the Gaussian Mixture Model (GMM) applies the Expectation-Maximization (EM) algorithm in clustering, and what benefits does it provide compared to k-Means?

Explain how the Gaussian Mixture Model (GMM) applies the Expectation-Maximization (EM) algorithm in clustering, and what benefits does it provide compared to k-Means?

Discuss the significance of choosing appropriate 'k' in the k-Means algorithm and the impact of an incorrect choice on the clustering results.

Discuss the significance of choosing appropriate 'k' in the k-Means algorithm and the impact of an incorrect choice on the clustering results.

In the context of decision tree learning, how is entropy used to select attributes for node splitting, and why is this step critical for the ID3 algorithm?

In the context of decision tree learning, how is entropy used to select attributes for node splitting, and why is this step critical for the ID3 algorithm?

How does the Naïve Bayesian classifier work, and what assumptions does it make about the data that can impact its performance?

How does the Naïve Bayesian classifier work, and what assumptions does it make about the data that can impact its performance?

How does the Candidate-Elimination algorithm differ from the FIND-S algorithm in handling data inconsistencies?

How does the Candidate-Elimination algorithm differ from the FIND-S algorithm in handling data inconsistencies?

What role does feature scaling play in the k-Means clustering algorithm, and why is it important?

What role does feature scaling play in the k-Means clustering algorithm, and why is it important?

What is the primary purpose of the FIND-S algorithm and how does it work when applied to a dataset?

What is the primary purpose of the FIND-S algorithm and how does it work when applied to a dataset?

How does gradient descent optimization function in machine learning algorithms, and why is it a common choice for training models?

How does gradient descent optimization function in machine learning algorithms, and why is it a common choice for training models?