Decision Tree Learning with PlayTennis

The document discusses implementing various machine learning algorithms including breadth first search, iterative depth first search, A* search, recursive best first search, decision tree learning, feed forward backpropagation neural network, support vector machines, and AdaBoost ensemble learning. Code snippets are provided for implementing each algorithm along with sample outputs. Practical problems involving graph search, classification, and regression are presented.

Uploaded by

ssaahil.o6o4

PRACTICAL 1

Breadth First Search and Iterative Depth First Search

A) Implement the Breadth First Search algorithm to solve the given problem.

Dataset: [Link] File

import queue as Q
from RMP import dict_gn

start = 'Arad'
goal = "Bucharest"
result = ''

def BFS(city, cityq, visitedq):
    global result
    if city == start:
        result = result + " " + city
    # Queue every neighbour that has not been queued or visited yet
    for eachcity in dict_gn[city].keys():
        if eachcity == goal:
            result = result + " " + eachcity
            return
        if eachcity not in cityq.queue and eachcity not in visitedq.queue:
            cityq.put(eachcity)
            result = result + " " + eachcity
    visitedq.put(city)
    # Stop if nothing is left to explore (get() would block on an empty queue)
    if cityq.empty():
        return
    BFS(cityq.get(), cityq, visitedq)

def main():
    cityq = Q.Queue()
    visitedq = Q.Queue()
    BFS(start, cityq, visitedq)
    print(result)

main()

Output :
B) Implement the Iterative Depth First Search algorithm to solve the same problem.

from RMP import dict_gn

start = "Arad"
goal = "Bucharest"
result = ""

def DLS(city, visitedstack, startlimit, endlimit):
    # Depth-limited search: returns 1 if the goal is reached within endlimit
    global result
    found = 0
    result = result + city + " "
    visitedstack.append(city)
    if city == goal:
        return 1
    if startlimit == endlimit:
        return 0
    for eachcity in dict_gn[city].keys():
        if eachcity not in visitedstack:
            found = DLS(eachcity, visitedstack, startlimit + 1, endlimit)
            if found:
                return found
    return 0

def IDDFS(city, visitedstack, endlimit):
    # Repeat the depth-limited search with limits 0, 1, ..., endlimit - 1
    global result
    for i in range(0, endlimit):
        print("Searching at Limit:", i)
        found = DLS(city, visitedstack, 0, i)
        if found:
            print("Found")
            break
        else:
            print("Not Found!")
            print(result)
            print(" ")
            result = ""
            visitedstack = []

def main():
    visitedstack = []
    IDDFS(start, visitedstack, 9)
    print("IDDFS Traversal from ", start, " to ", goal, " is:")
    print(result)

main()

Output :
PRACTICAL 2
A* Search and Recursive Best First Search

A) Implement the A* Search algorithm for solving a pathfinding problem.

Dataset: [Link] File

import queue as Q
from RMP import dict_gn
from RMP import dict_hn

start = 'Arad'
goal = 'Bucharest'
result = ''

def get_fn(citystr):
    # f(n) = g(n) + h(n) for a comma-separated path of cities
    cities = citystr.split(",")
    hn = gn = 0
    for ctr in range(0, len(cities) - 1):
        gn = gn + dict_gn[cities[ctr]][cities[ctr + 1]]
    hn = dict_hn[cities[len(cities) - 1]]
    return hn + gn

def expand(cityq):
    global result
    # Take the path with the lowest f(n) from the priority queue
    tot, citystr, thiscity = cityq.get()
    if thiscity == goal:
        result = citystr + "::" + str(tot)
        return
    for cty in dict_gn[thiscity]:
        cityq.put((get_fn(citystr + "," + cty), citystr + "," + cty, cty))
    expand(cityq)

def main():
    cityq = Q.PriorityQueue()
    thiscity = start
    cityq.put((get_fn(start), start, thiscity))
    expand(cityq)
    print("The A* path with the total is: ")
    print(result)

main()
Output :
B) Implement the Recursive Best-First Search algorithm for the same problem.

import queue as Q
from RMP import dict_gn
from RMP import dict_hn

start = 'Arad'
goal = 'Bucharest'
result = ''

def get_fn(citystr):
    # f(n) = g(n) + h(n) for a comma-separated path of cities
    cities = citystr.split(",")
    hn = gn = 0
    for ctr in range(0, len(cities) - 1):
        gn = gn + dict_gn[cities[ctr]][cities[ctr + 1]]
    hn = dict_hn[cities[len(cities) - 1]]
    return hn + gn

def printout(cityq):
    for i in range(0, cityq.qsize()):
        print(cityq.queue[i])

def expand(cityq):
    global result
    tot, citystr, thiscity = cityq.get()
    nexttot = 999
    # f(n) of the best alternative path still in the queue
    if not cityq.empty():
        nexttot, nextcitystr, nextthiscity = cityq.queue[0]
    if thiscity == goal and tot < nexttot:
        result = citystr + "::" + str(tot)
        return
    print("Expanded city ", thiscity)
    print("second best f(n) ", nexttot)
    tempq = Q.PriorityQueue()
    for cty in dict_gn[thiscity]:
        tempq.put((get_fn(citystr + ',' + cty), citystr + ',' + cty, cty))
    # Keep at most the two best successors
    for ctr in range(1, 3):
        if tempq.empty():
            break
        ctrtot, ctrcitystr, ctrthiscity = tempq.get()
        if ctrtot < nexttot:
            cityq.put((ctrtot, ctrcitystr, ctrthiscity))
        else:
            # Back up the successor's f(n) to the current city and stop
            cityq.put((ctrtot, citystr, thiscity))
            break
    printout(cityq)
    expand(cityq)

def main():
    cityq = Q.PriorityQueue()
    thiscity = start
    cityq.put((999, "NA", "NA"))
    cityq.put((get_fn(start), start, thiscity))
    expand(cityq)
    print(result)

main()

Output :
PRACTICAL NO: 3

Implement the decision tree learning algorithm to build a decision tree for a given dataset. Evaluate the accuracy and efficiency on the test data set.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, export_text, export_graphviz
from sklearn.metrics import accuracy_score

# Load the dataset
PlayTennis = pd.read_csv("[Link]")

# Label encoding for categorical variables (the encoder is refitted per column)
Le = LabelEncoder()
PlayTennis['outlook'] = Le.fit_transform(PlayTennis['outlook'])
PlayTennis['temp'] = Le.fit_transform(PlayTennis['temp'])
PlayTennis['humidity'] = Le.fit_transform(PlayTennis['humidity'])
PlayTennis['windy'] = Le.fit_transform(PlayTennis['windy'])
PlayTennis['play'] = Le.fit_transform(PlayTennis['play'])

# Split the data into features (x) and target variable (y)
y = PlayTennis['play']
x = PlayTennis.drop(['play'], axis=1)

# Split the data into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2,
                                                    random_state=42)

# Create and train the decision tree classifier
clf = DecisionTreeClassifier(criterion='entropy')
clf.fit(x_train, y_train)

# Evaluate the accuracy on the test dataset
y_pred = clf.predict(x_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy on test data: {accuracy * 100:.2f}%")

# Display the decision tree rules
tree_rules = export_text(clf, feature_names=x.columns.tolist())
print("Decision Tree Rules:\n", tree_rules)

# Render the tree with Graphviz (install first if needed: pip install graphviz)
import graphviz
dot_data = export_graphviz(clf, out_file=None)
graph = graphviz.Source(dot_data)
graph.render(view=True)

Output :
PRACTICAL NO: 4
Feed Forward Backpropagation Neural Network
● Implement the Feed Forward Backpropagation algorithm to train a neural network.
● Use a given dataset to train the neural network for a specific task.

import numpy as np

class NeuralNetwork:
    def __init__(self):
        np.random.seed()
        # One neuron with 3 inputs; weights start in the range [-1, 1)
        self.synaptic_weights = 2 * np.random.random((3, 1)) - 1

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        # x is already a sigmoid output, so the derivative is x * (1 - x)
        return x * (1 - x)

    def train(self, training_inputs, training_outputs, training_iterations):
        for iteration in range(training_iterations):
            output = self.think(training_inputs)
            error = training_outputs - output
            # Weight update: inputs^T . (error * slope of the sigmoid)
            adjustments = np.dot(training_inputs.T,
                                 error * self.sigmoid_derivative(output))
            self.synaptic_weights += adjustments

    def think(self, inputs):
        inputs = inputs.astype(float)
        output = self.sigmoid(np.dot(inputs, self.synaptic_weights))
        return output

if __name__ == "__main__":
    # Initializing the neural network
    neural_network = NeuralNetwork()

    print("Beginning Randomly Generated Weights:")
    print(neural_network.synaptic_weights)

    # Training data consisting of 4 examples - 3 input values and 1 output
    training_inputs = np.array([[0, 0, 1],
                                [1, 1, 1],
                                [1, 0, 1],
                                [0, 1, 1]])

    training_outputs = np.array([[0, 1, 1, 0]]).T

    # Training
    neural_network.train(training_inputs, training_outputs, 15000)

    print("Ending Weights After Training:")
    print(neural_network.synaptic_weights)

    user_input_one = float(input("User Input One: "))
    user_input_two = float(input("User Input Two: "))
    user_input_three = float(input("User Input Three: "))

    print("Considering New Situation:", user_input_one, user_input_two,
          user_input_three)
    print("New Output data:")
    print(neural_network.think(np.array([user_input_one, user_input_two,
                                         user_input_three])))

Output :
PRACTICAL NO: 5
Implement the SVM algorithm for binary classification. Train an SVM model using the given dataset. Evaluate the performance on test data and analyze the results.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)

# Load the dataset
diabetes_data = pd.read_csv("[Link]")
diabetes_data.head()

# Separate features (x) and target variable (y)
x = diabetes_data.drop("Outcome", axis=1)
y = diabetes_data["Outcome"]

# Split the data into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2,
                                                    random_state=42)

# Standardize features by removing the mean and scaling to unit variance
# (fit on the training set only, then apply the same transform to the test set)
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

# Create and train the SVM model
svm_model = SVC(kernel='linear', C=1.0)
svm_model.fit(x_train, y_train)

# Predictions on the test set
y_pred = svm_model.predict(x_test)

# Evaluate the performance
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

# Analyze the results
print(f"Accuracy on test data: {accuracy * 100:.2f}%")
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", class_report)

Output :
PRACTICAL NO: 6
AdaBoost Ensemble Learning
● Implement the AdaBoost algorithm to create an ensemble of weak classifiers.
● Train the ensemble model on a given dataset and evaluate its performance.
● Compare the results with individual weak classifiers.

import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

url = "[Link][Link]"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pd.read_csv(url, names=names)

array = dataframe.values
X = array[:, 0:8]
Y = array[:, 8]

seed = 7
num_trees = 30

# Split the data into training and testing sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2,
                                                    random_state=seed)

# Create an AdaBoost model; scikit-learn's default base estimator is a
# depth-1 decision tree (a decision stump)
model = AdaBoostClassifier(n_estimators=num_trees, random_state=seed)

# Train the AdaBoost model
model.fit(X_train, Y_train)

# Predictions on the test set
Y_pred = model.predict(X_test)

# Evaluate performance using different metrics
accuracy = accuracy_score(Y_test, Y_pred)
precision = precision_score(Y_test, Y_pred)
recall = recall_score(Y_test, Y_pred)
f1 = f1_score(Y_test, Y_pred)
conf_matrix = confusion_matrix(Y_test, Y_pred)

# Display the evaluation metrics
print(f"Accuracy: {accuracy * 100:.2f}%")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")
print("Confusion Matrix:\n", conf_matrix)

Output :
PRACTICAL NO: 7
Naive Bayes Classifier
● Implement the Naive Bayes algorithm for classification.
● Train a Naive Bayes model using a given dataset and calculate class probabilities.
● Evaluate the accuracy of the model on test data and analyze the results.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
import seaborn as sns

# Load your dataset, replace 'your_dataset.csv' with your dataset file
data = pd.read_csv('[Link]')
data.head()

# Split the dataset into features (X) and the target variable (y)
X = data.drop('Outcome', axis=1)  # Features
y = data['Outcome']  # Target variable

# Split the data into a training set and a testing set
# (e.g., 70% for training and 30% for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

# Experiment with a specific value of k
k = 5
knn_classifier = KNeighborsClassifier(n_neighbors=k)
knn_classifier.fit(X_train, y_train)

# Predictions on the test set
y_pred = knn_classifier.predict(X_test)

# Calculate accuracy and plot actual vs. predicted outcomes as a histogram
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')
plt.figure(figsize=(10, 6))
plt.hist([y_test, y_pred], bins=[-0.5, 0.5, 1.5], align='mid',
         color=['blue', 'orange'], label=['Actual', 'Predicted'])
plt.title('Histogram of Actual vs. Predicted Outputs')
plt.xlabel('Outcome (0: Non-Diabetic, 1: Diabetic)')
plt.ylabel('Frequency')
plt.legend()
plt.show()

# Display confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Non-Diabetic', 'Diabetic'],
            yticklabels=['Non-Diabetic', 'Diabetic'])
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()

Output :
PRACTICAL NO: 8

Implement the K-NN algorithm for classification or regression. Apply the K-NN algorithm to the given dataset and predict the class or value for test data.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Load your dataset, replace 'your_dataset.csv' with your dataset file
data = pd.read_csv('[Link]')
data.head()

# Split the dataset into features (X) and the target variable (y)
X = data.drop('Outcome', axis=1)  # Features
y = data['Outcome']  # Target variable

# Split the data into a training set and a testing set
# (e.g., 70% for training and 30% for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

# For Classification
k = 5  # Choose the number of neighbors (k)
knn_classifier = KNeighborsClassifier(n_neighbors=k)
knn_classifier.fit(X_train, y_train)

# Predictions on the test set
y_pred = knn_classifier.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')

# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)
Output :
PRACTICAL NO: 9
Implement the Association Rule Mining algorithm (e.g. Apriori) to find frequent itemsets. Generate association rules from the frequent itemsets and calculate their support.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Load your dataset, replace 'Groceries_dataset.csv' with your dataset file
data = pd.read_csv('Groceries_dataset.csv')

# Convert the dataset to a transaction format:
# one list of items per (member, date) pair
transactions = data.groupby(['Member_number',
                             'Date'])['itemDescription'].apply(list).values.tolist()

# Use TransactionEncoder to one-hot encode the dataset
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)

# Create a DataFrame from the one-hot encoded array
df = pd.DataFrame(te_ary, columns=te.columns_)

# Apply Apriori algorithm to find frequent itemsets
frequent_itemsets = apriori(df, min_support=0.01, use_colnames=True)

# Generate association rules
rules = association_rules(frequent_itemsets, metric='confidence',
                          min_threshold=0.5)

# Print frequent itemsets
print("Frequent Itemsets:")
print(frequent_itemsets)

# Print association rules
print("\nAssociation Rules:")
print(rules[['antecedents', 'consequents', 'support', 'confidence']])

Output :

Common questions

The A* Search algorithm keeps a priority queue of partial paths ordered by f(n) = g(n) + h(n), where g(n) is the cost of the path so far and h(n) is the heuristic estimate of the remaining cost, and always expands the path with the lowest f(n). The first goal path taken from the queue is optimal provided the heuristic is admissible, that is, it never overestimates the remaining cost. RBFS uses the same f(n) but explores recursively: it follows the best path while remembering the f-value of the best alternative, and backtracks as soon as the current path's f-value exceeds that alternative. Because it stores only the current path and its siblings, RBFS needs memory linear in the search depth, far less than A*, at the price of re-expanding some nodes.
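The f(n) ordering described above can be sketched with a binary heap. This is a minimal illustration, not the document's listing: the `ROADS` and `H` dictionaries are an assumed subset of the textbook Romania map (the contents of the `RMP` module are not shown in the source), and `a_star` is a hypothetical helper name.

```python
import heapq

# Assumed subset of the textbook Romania road map (distances in km).
ROADS = {
    "Arad": {"Zerind": 75, "Sibiu": 140, "Timisoara": 118},
    "Zerind": {"Arad": 75},
    "Timisoara": {"Arad": 118},
    "Sibiu": {"Arad": 140, "Fagaras": 99, "Rimnicu Vilcea": 80},
    "Fagaras": {"Sibiu": 99, "Bucharest": 211},
    "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97},
    "Pitesti": {"Rimnicu Vilcea": 97, "Bucharest": 101},
    "Bucharest": {"Fagaras": 211, "Pitesti": 101},
}
# Assumed straight-line distances to Bucharest, used as h(n).
H = {"Arad": 366, "Zerind": 374, "Timisoara": 329, "Sibiu": 253,
     "Fagaras": 176, "Rimnicu Vilcea": 193, "Pitesti": 100, "Bucharest": 0}

def a_star(graph, h, start, goal):
    """Return (cost, path) for the cheapest path found, or None."""
    frontier = [(h[start], 0, [start])]      # entries are (f, g, path)
    best_g = {start: 0}                      # cheapest known g per city
    while frontier:
        f, g, path = heapq.heappop(frontier)  # lowest f(n) first
        city = path[-1]
        if city == goal:
            return g, path
        for neighbour, km in graph[city].items():
            new_g = g + km
            if new_g < best_g.get(neighbour, float("inf")):
                best_g[neighbour] = new_g
                heapq.heappush(frontier,
                               (new_g + h[neighbour], new_g, path + [neighbour]))
    return None

cost, path = a_star(ROADS, H, "Arad", "Bucharest")
print(cost, path)
```

On this assumed data the route through Rimnicu Vilcea and Pitesti (418 km) beats the route through Fagaras (450 km), even though the latter has fewer edges.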

Decision trees are inherently interpretable: the fitted model can be drawn or printed as explicit decision rules, which makes it easy to extract rules and judge feature importance. Neural networks can reach higher prediction accuracy because they model more complex patterns, but they are considered black-box models and are harder to interpret. Both can overfit: neural networks need careful tuning and adequate regularization, whereas decision trees can be pruned.
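The decision tree listing in Practical 3 selects splits with `criterion='entropy'`. As a rough sketch of what that criterion measures, the snippet below computes entropy and information gain by hand for a single feature. The rows are an assumption: they follow the classic 14-row PlayTennis table (outlook and play columns only), since the CSV used in the source is not shown.

```python
from collections import Counter
from math import log2

# Assumed (outlook, play) pairs from the classic 14-row PlayTennis table.
ROWS = [
    ("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rainy", "yes"),
    ("rainy", "yes"), ("rainy", "no"), ("overcast", "yes"), ("sunny", "no"),
    ("sunny", "yes"), ("rainy", "yes"), ("sunny", "yes"), ("overcast", "yes"),
    ("overcast", "yes"), ("rainy", "no"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows):
    """Entropy of the labels minus the weighted entropy after the split."""
    labels = [label for _, label in rows]
    remainder = 0.0
    for value in {v for v, _ in rows}:
        subset = [label for v, label in rows if v == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

print(round(entropy([label for _, label in ROWS]), 3))
print(round(information_gain(ROWS), 3))
```

A tree builder using this criterion would compute the gain for every candidate feature and split on the one with the largest value.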

The K-NN listing classifies each test instance by majority vote among its k nearest neighbors in the feature space. It splits the data 70/30, fits the model with k = 5, and predicts on the test set. The reported accuracy is approximately 75%. Performance is sensitive to the choice of k, to the distance metric, and to feature scaling, which the listing does not apply. Because K-NN stores the training set rather than fitting parameters, these choices mainly affect accuracy and prediction time, not training time.
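To make the neighbor vote concrete, here is a small from-scratch sketch. It assumes Euclidean distance and a simple majority vote; the points, labels, and the `knn_predict` name are made up for illustration and are not taken from the document's dataset.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k):
    """Predict the label of `query` by majority vote of its k nearest points."""
    # Indices of the training points, sorted by Euclidean distance to the query.
    nearest = sorted(range(len(train_points)),
                     key=lambda i: dist(train_points[i], query))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training data: two well-separated clusters.
POINTS = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
          (6.0, 6.0), (7.0, 7.5), (6.5, 5.5)]
LABELS = [0, 0, 0, 1, 1, 1]

print(knn_predict(POINTS, LABELS, (1.2, 1.4), k=3))
print(knn_predict(POINTS, LABELS, (6.2, 6.1), k=3))
```

There is no training step beyond keeping `POINTS` and `LABELS` in memory; all of the work happens at prediction time.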

The BFS listing starts at a given node ('Arad') and explores the graph level by level until it reaches the goal node ('Bucharest'). For each city, it checks the neighbors and adds them to a queue if they have not already been queued or visited. Because BFS finishes every node at the current depth before moving on to the next depth, it finds a path with the fewest edges, which is not necessarily the shortest by road distance. Note that the listing's result string records the order in which cities are discovered, not the path itself.
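One way to recover an actual path from BFS is to store a parent link for every discovered city. The sketch below is an illustration under assumptions: `ROADS` is an assumed unweighted subset of the textbook Romania map (the `RMP` data is not shown in the source), and `bfs_path` is a hypothetical name.

```python
from collections import deque

# Assumed subset of the textbook Romania road map, without distances.
ROADS = {
    "Arad": ["Zerind", "Sibiu", "Timisoara"],
    "Zerind": ["Arad", "Oradea"],
    "Oradea": ["Zerind", "Sibiu"],
    "Timisoara": ["Arad", "Lugoj"],
    "Lugoj": ["Timisoara"],
    "Sibiu": ["Arad", "Oradea", "Fagaras", "Rimnicu Vilcea"],
    "Fagaras": ["Sibiu", "Bucharest"],
    "Rimnicu Vilcea": ["Sibiu", "Pitesti"],
    "Pitesti": ["Rimnicu Vilcea", "Bucharest"],
    "Bucharest": ["Fagaras", "Pitesti"],
}

def bfs_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    parent = {start: None}            # also serves as the visited set
    frontier = deque([start])
    while frontier:
        city = frontier.popleft()
        if city == goal:
            path = []
            while city is not None:   # follow parent links back to the start
                path.append(city)
                city = parent[city]
            return path[::-1]
        for neighbour in graph[city]:
            if neighbour not in parent:
                parent[neighbour] = city
                frontier.append(neighbour)
    return None

print(bfs_path(ROADS, "Arad", "Bucharest"))
```

Using `collections.deque` avoids the blocking behavior of `queue.Queue.get()` on an empty queue, so the function simply returns `None` when the goal is unreachable.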

The Apriori algorithm identifies frequent itemsets by incrementally building larger sets from smaller ones, keeping only those that meet a specified minimum support threshold. It then generates association rules from these itemsets and scores them with metrics such as support and confidence. Support is the fraction of transactions that contain an itemset, while confidence is the fraction of transactions containing the rule's antecedent that also contain its consequent. The listing applies Apriori to a groceries dataset with a minimum support of 0.01 and keeps rules whose confidence is at least 0.5.
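The two metrics can be computed directly from their definitions. The snippet below is only a sketch of support and confidence, not of Apriori's candidate generation, and the transactions are invented for illustration rather than taken from the groceries dataset.

```python
# Hypothetical shopping baskets, one set of items per transaction.
TRANSACTIONS = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "butter", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

print(support({"milk", "bread"}, TRANSACTIONS))
print(confidence({"milk"}, {"bread"}, TRANSACTIONS))
```

Here {milk, bread} appears in 3 of 5 baskets, and 3 of the 4 baskets with milk also contain bread, so the rule milk -> bread has support 0.6 and confidence 0.75.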

AdaBoost combines many weak classifiers into a strong composite model. It trains the weak learners one after another, increasing the weights of the instances the previous learner misclassified so that later learners focus on the difficult cases, and it gives more accurate learners a larger vote in the final prediction. The listing reports accuracy, precision, recall, and F1 score for the ensemble; it does not include the comparison against an individual weak classifier that the practical asks for.
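The reweighting step can be illustrated with one round of the classic discrete AdaBoost update for labels in {-1, +1}. This is a sketch of the textbook formula, which may differ in detail from what scikit-learn's `AdaBoostClassifier` does internally; `adaboost_round` and the sample labels are hypothetical.

```python
from math import exp, log

def adaboost_round(weights, y_true, y_pred):
    """One discrete AdaBoost step; requires a weighted error strictly in (0, 1)."""
    # Weighted error: total weight of the misclassified examples.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    alpha = 0.5 * log((1 - err) / err)        # vote given to this weak learner
    # t * p is +1 for a correct prediction and -1 for a mistake, so
    # correct examples shrink and misclassified examples grow.
    raw = [w * exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return alpha, [w / total for w in raw]

y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]     # the weak learner misses the last example
weights = [0.2] * 5             # start from uniform weights

alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print(alpha, new_weights)
```

After this round the single misclassified example carries half of the total weight, which is what pushes the next weak learner to get it right.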

The Naive Bayes algorithm estimates, from the training data, the probability of each class given the feature values. It assumes the features are independent given the class, which reduces the computation to multiplying the class prior by the per-feature probabilities. The model is then evaluated on test data to assess its accuracy. The main limitation is the independence assumption, which may not hold for all datasets and can lead to inaccurate predictions when features are highly correlated. Note that the listing under Practical 7 fits a KNeighborsClassifier rather than a Naive Bayes model.
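The multiplication of a prior by per-feature probabilities can be sketched for categorical features as follows. This is an illustrative from-scratch version with Laplace (add-one) smoothing, which is an added assumption; the toy rows and the function names are hypothetical and unrelated to the document's dataset.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count classes and per-class feature values for categorical data."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    values = defaultdict(set)               # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values

def predict_proba(model, row):
    """Return normalized class probabilities for one row."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total                # prior P(class)
        for i, value in enumerate(row):
            # Laplace-smoothed estimate of P(value | class)
            score *= ((feature_counts[(i, label)][value] + 1)
                      / (count + len(values[i])))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Hypothetical (outlook, humidity) rows with a yes/no label.
rows = [("sunny", "high"), ("sunny", "normal"), ("rainy", "high"),
        ("rainy", "normal"), ("overcast", "high")]
labels = ["no", "yes", "no", "yes", "yes"]

model = train_nb(rows, labels)
proba = predict_proba(model, ("sunny", "normal"))
print(proba)
```

For continuous features such as those in a medical dataset, a Gaussian likelihood per feature would replace the value counts; that variant is not shown here.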

Feature encoding and scaling play critical roles in preparing data for SVM classification. Categorical variables must be encoded numerically before an SVM can use them. Scaling features to zero mean and unit variance matters for SVMs with any kernel, because the margin depends on distances in feature space: without scaling, features with larger value ranges dominate the hyperplane and the optimization converges more slowly. The scaler should be fitted on the training set only and then applied unchanged to the test set.
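Standardization itself is a short computation. The sketch below mimics, for a single column, what a scaler of this kind does: it learns the mean and the population standard deviation from the training values and reuses them for the test values. The glucose readings and helper names are made up for illustration.

```python
from statistics import mean, pstdev

def fit_scaler(column):
    """Learn the mean and population standard deviation from training data."""
    return mean(column), pstdev(column)

def transform(column, mu, sigma):
    """Apply z = (x - mu) / sigma using previously learned statistics."""
    return [(x - mu) / sigma for x in column]

# Hypothetical training and test values for one feature.
train_glucose = [85.0, 90.0, 110.0, 115.0]
test_glucose = [100.0, 130.0]

mu, sigma = fit_scaler(train_glucose)            # statistics from train only
scaled_train = transform(train_glucose, mu, sigma)
scaled_test = transform(test_glucose, mu, sigma)  # reuse train statistics

print(mu, sigma)
print(scaled_train)
print(scaled_test)
```

Fitting on the training column alone keeps information about the test set from leaking into the model, which is why the test values are not guaranteed to end up with zero mean.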

The SVM listing splits the dataset into training and testing sets, trains a model with a linear kernel, and evaluates it with accuracy, a confusion matrix, and a classification report that gives precision, recall, and F1 score for each class. The reported test accuracy is around 80%.
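The metrics named above all derive from the four cells of a binary confusion matrix. The following sketch computes them by hand on invented labels; `binary_metrics` is a hypothetical helper, and the matrix layout (rows are true classes, columns are predicted classes) is chosen to match the usual scikit-learn convention.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix cells and derived metrics for labels 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp)      # of predicted positives, how many are right
    recall = tp / (tp + fn)         # of actual positives, how many are found
    return {
        "confusion": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical true and predicted labels.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

The helper assumes at least one predicted positive and one actual positive; otherwise precision or recall would divide by zero.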

IDDFS combines the space efficiency of Depth First Search (DFS) with the level-by-level completeness of BFS. It repeatedly applies depth-limited DFS, raising the depth limit by one each time until the goal is found. Its space complexity is O(bd), like DFS, rather than the O(b^d) of BFS, where b is the branching factor and d is the depth of the shallowest goal. Shallow levels are re-expanded on every iteration, but the total work remains O(b^d). IDDFS is useful when the goal depth is unknown or the search tree is large. Like BFS, it finds a goal at the shallowest depth; BFS avoids the repeated work but must store an entire level of nodes, which leads to higher space requirements.
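The repeated depth-limited search can be sketched as below. This is an illustration under assumptions rather than the document's listing: `ROADS` is an assumed unweighted subset of the textbook Romania map, and `depth_limited` and `iddfs` are hypothetical names. Unlike the listing, it tracks only the cities on the current path, so a city rejected on one branch can still be reached through another.

```python
# Assumed subset of the textbook Romania road map, without distances.
ROADS = {
    "Arad": ["Zerind", "Sibiu", "Timisoara"],
    "Zerind": ["Arad", "Oradea"],
    "Oradea": ["Zerind", "Sibiu"],
    "Timisoara": ["Arad", "Lugoj"],
    "Lugoj": ["Timisoara"],
    "Sibiu": ["Arad", "Oradea", "Fagaras", "Rimnicu Vilcea"],
    "Fagaras": ["Sibiu", "Bucharest"],
    "Rimnicu Vilcea": ["Sibiu", "Pitesti"],
    "Pitesti": ["Rimnicu Vilcea", "Bucharest"],
    "Bucharest": ["Fagaras", "Pitesti"],
}

def depth_limited(graph, city, goal, limit, path):
    """DFS that goes at most `limit` edges below `city`; returns a path or None."""
    if city == goal:
        return path
    if limit == 0:
        return None
    for neighbour in graph[city]:
        if neighbour not in path:             # avoid cycles on the current path
            found = depth_limited(graph, neighbour, goal,
                                  limit - 1, path + [neighbour])
            if found is not None:
                return found
    return None

def iddfs(graph, start, goal, max_depth):
    """Try limits 0..max_depth; return (limit, path) for the first success."""
    for limit in range(max_depth + 1):
        found = depth_limited(graph, start, goal, limit, [start])
        if found is not None:
            return limit, found
    return None

print(iddfs(ROADS, "Arad", "Bucharest", 8))
```

Only the current path is held in memory at any time, which is where the O(bd) space bound comes from.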
