
8 Python Programs for Data Classification

The document contains a series of programming exercises that demonstrate various machine learning techniques using Python. Each program utilizes different datasets and algorithms, including binary classification with make_circles and make_moons, clustering with make_blobs, decision trees with ID3, Q-learning, neural networks, and locally weighted regression. The examples include code snippets and visualizations to illustrate the results of each implementation.

Uploaded by

ririn79475

Program – 1

▪ Write a program that generates 2D binary classification data with make_circles(), which produces two classes separated by a spherical decision boundary.

# Import necessary libraries
from sklearn.datasets import make_circles
import matplotlib.pyplot as plt

# Generate 2D classification dataset
X, y = make_circles(n_samples=200, shuffle=True,
                    noise=0.1, random_state=42)

# Plot the generated dataset, colored by class label
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()

Program – 2
▪ Write a program that generates 2D binary classification data with make_moons(), which produces two interlocking half circles.

# Import necessary libraries
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt

# Generate 2D classification dataset
X, y = make_moons(n_samples=500, shuffle=True,
                  noise=0.15, random_state=42)

# Plot the generated dataset, colored by class label
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()

Program – 3

▪ Write a program that generates data with make_blobs(), which produces blobs of points that can be used for clustering.

# Import necessary libraries
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Generate 2D clustering dataset with three blob centers
X, y = make_blobs(n_samples=500, centers=3, n_features=2, random_state=23)

# Plot the generated dataset, colored by blob label
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()

Program – 4

▪ Write a program that generates data with make_classification(), balancing the n_informative, n_redundant, n_repeated and n_classes parameters. Without shuffling, the useful features occupy the columns X[:, :n_informative + n_redundant + n_repeated].

# Import necessary libraries
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt

# Generate 2D classification dataset with three classes
X, y = make_classification(n_samples=100,
                           n_features=2,
                           n_redundant=0,
                           n_informative=2,
                           n_repeated=0,
                           n_classes=3,
                           n_clusters_per_class=1)

# Plot the generated dataset, colored by class label
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()

Program – 5
▪ Write a program for the implementation of Q-Learning (Reinforcement Learning).

import numpy as np
import pylab as pl
import networkx as nx

# Edges of the state graph: each node is a state, each edge a possible move
edges = [(0, 1), (1, 5), (5, 6), (5, 4), (1, 2),
         (1, 3), (9, 10), (2, 4), (0, 6), (6, 7),
         (8, 9), (7, 8), (1, 7), (3, 9)]

# Node the agent has to reach
goal = 10

# Build the graph and draw it
G = nx.Graph()
G.add_edges_from(edges)
pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos)
nx.draw_networkx_edges(G, pos)
nx.draw_networkx_labels(G, pos)
pl.show()
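
Program 5 only builds and draws the state graph. A minimal sketch of the Q-learning step on the same edges and goal might look like the following. The reward of 100 for reaching the goal, the values of alpha, gamma and episodes, the random exploration policy, and the helper greedy_path are all assumptions made for illustration; none of them come from the original program.

```python
import random

# Edge list and goal node taken from Program 5.
edges = [(0, 1), (1, 5), (5, 6), (5, 4), (1, 2),
         (1, 3), (9, 10), (2, 4), (0, 6), (6, 7),
         (8, 9), (7, 8), (1, 7), (3, 9)]
goal = 10

# Undirected adjacency list: an action is "move to a neighbouring node".
neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, []).append(b)
    neighbors.setdefault(b, []).append(a)

# Assumed hyperparameters (not given in the source).
alpha, gamma, episodes = 0.5, 0.8, 1000
rng = random.Random(0)

# Q[state][action] starts at zero for every edge in both directions.
Q = {s: {a: 0.0 for a in nbrs} for s, nbrs in neighbors.items()}

for _ in range(episodes):
    state = rng.choice([s for s in neighbors if s != goal])
    while state != goal:
        action = rng.choice(neighbors[state])        # purely random exploration
        reward = 100.0 if action == goal else 0.0    # assumed reward scheme
        # The goal is terminal, so it contributes no future value.
        future = 0.0 if action == goal else max(Q[action].values())
        Q[state][action] += alpha * (reward + gamma * future - Q[state][action])
        state = action

def greedy_path(start, max_steps=20):
    """Follow the highest-valued action from start until the goal is reached."""
    path = [start]
    while path[-1] != goal and len(path) <= max_steps:
        s = path[-1]
        path.append(max(Q[s], key=Q[s].get))
    return path

path = greedy_path(0)
print("Greedy route from node 0:", path)
```

With these settings the greedy route should follow the shortest path 0, 1, 3, 9, 10, because each extra step discounts the goal reward by gamma.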

Program – 6
▪ Write a program to demonstrate the working of the decision tree
based ID3 algorithm. Use an appropriate data set for building the
decision tree and apply this knowledge to classify a new sample.

# Import necessary libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load a dataset (for example, the iris dataset)
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create the decision tree classifier with the entropy criterion, i.e. the
# information-gain measure used by ID3 (scikit-learn's tree itself is an optimized CART)
clf = DecisionTreeClassifier(criterion="entropy")

# Train the classifier using the training data
clf.fit(X_train, y_train)

# Make predictions on the testing data
y_pred = clf.predict(X_test)

# Evaluate the performance of the classifier
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

# Now, let's classify a new sample
new_sample = [[5.1, 3.5, 1.4, 0.2]]  # example values for the new sample
predicted_class = clf.predict(new_sample)
print("\nPredicted class for new sample:", predicted_class)

Program – 7
▪ Build an Artificial Neural Network by implementing the Backpropagation
algorithm and test the same using appropriate data sets.

import numpy as np

class NeuralNetwork:
    # Two-layer sigmoid network without bias terms
    def __init__(self, input_size, hidden_size, output_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.learning_rate = 0.1
        # Initialize weights randomly (standard normal draws assumed)
        self.weights_input_hidden = np.random.randn(self.input_size, self.hidden_size)
        self.weights_hidden_output = np.random.randn(self.hidden_size, self.output_size)

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        # x is already a sigmoid output, so the derivative is x * (1 - x)
        return x * (1 - x)

    def forward_propagation(self, input_data):
        self.hidden_input = np.dot(input_data, self.weights_input_hidden)
        self.hidden_output = self.sigmoid(self.hidden_input)
        # Apply the sigmoid at the output layer as well, so that it matches
        # the sigmoid derivative used in backward_propagation
        self.output = self.sigmoid(np.dot(self.hidden_output, self.weights_hidden_output))
        return self.output

    def backward_propagation(self, input_data, target):
        # Error and delta at the output layer
        error = target - self.output
        d_output = error * self.sigmoid_derivative(self.output)
        # Propagate the error back to the hidden layer
        error_hidden = d_output.dot(self.weights_hidden_output.T)
        d_hidden = error_hidden * self.sigmoid_derivative(self.hidden_output)
        # Update the weights
        self.weights_hidden_output += self.hidden_output.T.dot(d_output) * self.learning_rate
        self.weights_input_hidden += input_data.T.dot(d_hidden) * self.learning_rate

    def train(self, input_data, target, epochs):
        for _ in range(epochs):
            self.forward_propagation(input_data)
            self.backward_propagation(input_data, target)

    def predict(self, input_data):
        return self.forward_propagation(input_data)

# Example usage: the XOR truth table
input_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
target = np.array([[0], [1], [1], [0]])

# Create a neural network with 2 input nodes, 2 hidden nodes, and 1 output node
nn = NeuralNetwork(2, 2, 1)

# Train the neural network
nn.train(input_data, target, epochs=10000)

# Test the neural network
print("Predictions:")
for i in range(len(input_data)):
    print(f"Input: {input_data[i]}, Predicted Output: {nn.predict(input_data[i])}")

Program – 8
▪ Implement the non-parametric Locally Weighted Regression
algorithm in order to fit datapoints. Select appropriate data set for
your experiment and draw graphs.

import numpy as np
import matplotlib.pyplot as plt

# Generate a sample dataset: a noisy straight line
np.random.seed(0)
X = np.linspace(0, 10, 100)
y = 2 * X + 1 + np.random.normal(scale=2, size=100)

# Locally Weighted Regression function
def lowess(x, y, tau=0.1):
    n = len(x)
    y_hat = np.zeros(n)
    for i in range(n):
        # Gaussian kernel weights centered on the query point x[i]
        weights = np.exp(-0.5 * ((x - x[i]) / tau) ** 2)
        # Weighted least-squares normal equations A @ theta = b
        b = np.array([np.sum(weights * y), np.sum(weights * y * x)])
        A = np.array([[np.sum(weights), np.sum(weights * x)],
                      [np.sum(weights * x), np.sum(weights * x * x)]])
        theta = np.linalg.solve(A, b)
        # Evaluate the local line at the query point
        y_hat[i] = theta[0] + theta[1] * x[i]
    return y_hat

# Fit the data using Locally Weighted Regression
tau = 0.3  # Bandwidth parameter
y_lowess = lowess(X, y, tau)

# Plot the original data and the fitted curve
plt.scatter(X, y, label='Original data', color='b')
plt.plot(X, y_lowess, label=f'LOWESS (tau={tau})', color='r')
plt.legend()
plt.show()


Common questions


When choosing between deterministic methods like Decision Trees and probabilistic models, consider the nature of the data, interpretability needs, and training complexity. Decision Trees provide clear rule-based classifications, which suits cases where transparency matters. Probabilistic models can handle uncertainty and provide predictive distributions, which suits data with inherent variance. Scalability, handling of missing data, prediction speed, and underlying assumptions about the data distribution also play significant roles in model selection.

The make_circles function generates two concentric circles, so the decision boundary between the classes is spherical. This pattern is well suited to testing kernel-based algorithms. In contrast, make_moons produces two interlocking half circles, a less symmetric pattern that is also not linearly separable. Both datasets are therefore useful for testing algorithms that can learn nonlinear decision boundaries.
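
As a rough illustration of why these two datasets call for nonlinear models, the sketch below fits a linear and an RBF-kernel support vector machine to each and compares training accuracy. The choice of SVC, the factor=0.5 setting for make_circles, and the use of training accuracy as the measure are assumptions made for this example only.

```python
from sklearn.datasets import make_circles, make_moons
from sklearn.svm import SVC

# factor=0.5 (an assumed value) keeps the inner circle clearly inside the outer one.
data = {
    "circles": make_circles(n_samples=200, noise=0.1, factor=0.5, random_state=42),
    "moons": make_moons(n_samples=500, noise=0.15, random_state=42),
}

scores = {}
for name, (X, y) in data.items():
    for kernel in ("linear", "rbf"):
        clf = SVC(kernel=kernel).fit(X, y)
        scores[(name, kernel)] = clf.score(X, y)   # accuracy on the training data
        print(f"{name:8s} {kernel:6s} training accuracy = {scores[(name, kernel)]:.2f}")
```

The linear kernel can only draw a straight line through the plane, so it should lag behind the RBF kernel on both datasets, most visibly on the circles.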

Reinforcement learning, particularly Q-Learning, can be applied to graph structures by modeling states as nodes and actions as edges. The goal can be a specific node, and Q-Learning helps find the optimal path with the highest cumulative reward from a starting node to the goal. This setup is effective for problems like maze solving, network routing, and game tactics, where decisions must be made sequentially over graph-based state spaces.

Synthetic datasets generated by functions like make_circles and make_blobs provide a controlled environment to test and understand algorithm behavior under known conditions. Their simplified nature, such as clean class separation or well-formed clusters, allows for fundamental insights into model performance without the unpredictability of real-world data. While useful for understanding model limitations and behaviors, their tidy structure may not carry over to complex real-world scenarios, which limits how far the conclusions generalize.

The parameter 'tau' in Locally Weighted Regression acts as a bandwidth, controlling how much of the dataset influences the regression estimate at each point. A small 'tau' leads to a fit that closely follows individual datapoints and captures more of their variability (overfitting), whereas a large 'tau' results in a smoother fit that captures the overall trend but may miss finer details (underfitting).
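
The effect of the bandwidth can be checked numerically. The sketch below reuses the dataset and the local linear fit from Program 8 and reports, for a few values of tau, the training error and a simple roughness measure. The roughness measure (mean squared second difference of the fitted curve) and the particular tau values are assumptions chosen for illustration.

```python
import numpy as np

def lowess(x, y, tau):
    """Locally weighted linear regression evaluated at every training point."""
    y_hat = np.zeros(len(x))
    for i in range(len(x)):
        w = np.exp(-0.5 * ((x - x[i]) / tau) ** 2)    # Gaussian kernel weights
        A = np.array([[np.sum(w), np.sum(w * x)],
                      [np.sum(w * x), np.sum(w * x * x)]])
        b = np.array([np.sum(w * y), np.sum(w * y * x)])
        theta = np.linalg.solve(A, b)                 # weighted normal equations
        y_hat[i] = theta[0] + theta[1] * x[i]
    return y_hat

# Same synthetic data as Program 8.
np.random.seed(0)
X = np.linspace(0, 10, 100)
y = 2 * X + 1 + np.random.normal(scale=2, size=100)

results = {}
for tau in (0.1, 0.3, 3.0):
    fit = lowess(X, y, tau)
    mse = np.mean((y - fit) ** 2)                 # error on the training points
    roughness = np.mean(np.diff(fit, 2) ** 2)     # how wiggly the fitted curve is
    results[tau] = (mse, roughness)
    print(f"tau={tau}: training MSE={mse:.3f}, roughness={roughness:.5f}")
```

A smaller tau should give a lower training error together with a rougher curve, which is the overfitting behavior described above.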

The learning rate in Backpropagation determines the step size at each iteration while moving toward a minimum of the loss function. A high learning rate may converge quickly but risks overshooting the minimum or diverging, while a low learning rate takes smaller, more precise steps but lengthens training and can leave the optimizer stuck in local minima. A well-chosen learning rate balances convergence speed against stability.
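
The trade-off is easiest to see on a toy problem rather than a full network. The sketch below runs plain gradient descent on f(w) = w**2 with three learning rates. The loss function, the starting point, and the helper gradient_descent are assumptions made for illustration and are much simpler than backpropagation through a real network.

```python
def gradient_descent(learning_rate, steps=50, w=1.0):
    """Minimise f(w) = w**2, whose gradient is 2*w, starting from w."""
    for _ in range(steps):
        w -= learning_rate * 2 * w
    return w

results = {lr: gradient_descent(lr) for lr in (0.001, 0.1, 1.1)}
for lr, w in results.items():
    print(f"learning_rate={lr}: w after 50 steps = {w:.3g}")
```

Here the smallest rate has barely moved away from the starting point, the middle rate ends very close to the minimum at 0, and the largest rate overshoots on every step and diverges.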

When using make_classification, it is essential to balance the number of informative, redundant, and repeated features. Informative features carry the class signal, redundant features are random linear combinations of the informative ones, and repeated features are duplicates drawn at random from the informative and redundant features. Additionally, specifying the number of classes and clusters per class helps tailor the complexity of the generated dataset to the classification problem being simulated.
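
The balance is enforced by make_classification itself: the informative, redundant and repeated counts must fit within n_features, and n_classes * n_clusters_per_class must not exceed 2**n_informative. The sketch below shows one valid call, using the settings of Program 4 plus an assumed random_state, and one call that violates the second constraint.

```python
from sklearn.datasets import make_classification

# Valid: 2 informative + 0 redundant + 0 repeated <= 2 features,
# and 3 classes * 1 cluster per class = 3 <= 2**2 = 4.
X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, n_repeated=0, n_classes=3,
                           n_clusters_per_class=1, random_state=0)
print(X.shape, sorted(set(y.tolist())))

# Invalid: 3 classes * 2 clusters per class = 6 > 2**2 = 4.
try:
    make_classification(n_samples=100, n_features=2, n_informative=2,
                        n_redundant=0, n_repeated=0, n_classes=3,
                        n_clusters_per_class=2, random_state=0)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print("Error:", error_message)
```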

The 'gini' index and 'entropy' (the measure behind ID3's information gain) are both criteria used to split nodes in a Decision Tree. The 'gini' index measures the impurity of a node without the logarithm that 'entropy' requires, which makes it slightly cheaper to compute. 'Entropy' measures impurity in information-theoretic terms and can lead to slightly different splits. In practice, both perform similarly in terms of accuracy, but may yield different tree structures.
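
Both measures are short formulas over the class proportions in a node. The sketch below computes them for a few distributions and then the information gain of one candidate split. The helper names and the example class counts are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a class distribution."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def gini(probs):
    """Gini impurity of a class distribution."""
    return 1.0 - sum(p * p for p in probs)

def class_probs(counts):
    total = sum(counts)
    return [c / total for c in counts]

def information_gain(parent_counts, children_counts):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy(class_probs(child))
                   for child in children_counts)
    return entropy(class_probs(parent_counts)) - weighted

for probs in [(0.5, 0.5), (0.9, 0.1), (1.0, 0.0)]:
    print(probs, "entropy =", round(entropy(probs), 3), "gini =", round(gini(probs), 3))

# Hypothetical split: a node with 5 vs 5 samples divided into (4, 1) and (1, 4).
gain = information_gain([5, 5], [[4, 1], [1, 4]])
print("information gain =", round(gain, 3))
```

Both measures are largest for an evenly mixed node and drop to zero for a pure one, which is why they usually rank candidate splits in a similar order.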

The graphical representation of a Decision Tree provides intuitive insights into decision rules and data structure, making it easier to interpret which variables are most important and how decisions are reached. This representation aids in better understanding and communication of the model's logic, facilitates error analysis, and supports domain experts in validating and refining decision-making criteria.
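
A text rendering already conveys most of what a tree diagram shows. The sketch below trains a shallow entropy-based tree on the iris data and prints its rules with scikit-learn's export_text. The max_depth=2 limit and the random_state are assumptions added to keep the output short and repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A depth limit of 2 (assumed here) keeps the printed rule set small.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=42)
clf.fit(iris.data, iris.target)

# Each indented line is one decision rule; leaves show the predicted class.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

For a graphical version, sklearn.tree.plot_tree draws the same structure with matplotlib.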

Datasets created by make_blobs are useful for clustering tasks because they let different clustering algorithms be tested under controlled conditions. The number of clusters, their spread, and their centers are known in advance, which helps in evaluating a clustering algorithm's ability to identify distinct groups in a dataset. This controlled complexity allows for testing model robustness and for evaluating cluster validity metrics such as silhouette scores.
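
As a sketch of such an evaluation, the code below clusters the blobs from Program 3 with k-means for several values of k and prints the silhouette score of each result. The choice of KMeans, the range of k, and the n_init and random_state settings are assumptions made for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Same blob settings as Program 3; the true labels are not used for clustering.
X, _ = make_blobs(n_samples=500, centers=3, n_features=2, random_state=23)

scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)   # ranges from -1 (poor) to 1 (well separated)
    print(f"k={k}: silhouette = {scores[k]:.3f}")
```

Because the data were generated with a known number of centers, the scores can be compared against that ground truth to judge how well the metric recovers it.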
