Machine Learning
1. Write a Python program to compute measures of central tendency:
mean, median, and mode.
CODE:
from collections import Counter
def compute_mean(numbers):
    # Arithmetic mean: sum of the values divided by their count
    return sum(numbers) / len(numbers)
def compute_median(numbers):
    sorted_numbers = sorted(numbers)
    n = len(sorted_numbers)
    if n % 2 == 0:
        # Even count: average the two middle values
        mid = n // 2
        return (sorted_numbers[mid - 1] + sorted_numbers[mid]) / 2
    else:
        # Odd count: take the single middle value
        return sorted_numbers[n // 2]
def compute_mode(numbers):
    count = Counter(numbers)
    max_count = max(count.values())
    # Keep every value that reaches the highest frequency
    mode = [num for num, freq in count.items() if freq == max_count]
    return mode if mode else None
if __name__ == "__main__":
    # Sample input, you can change this list to test with different data
    data = [1, 2, 3, 4, 5, 6, 6, 7, 8, 8, 8]
    mean = compute_mean(data)
    median = compute_median(data)
    mode = compute_mode(data)
    print(f"Data: {data}")
    print(f"Mean: {mean}")
    print(f"Median: {median}")
    print(f"Mode: {mode}")
OUTPUT:
Data: [1, 2, 3, 4, 5, 6, 6, 7, 8, 8, 8]
Mean: 5.2727272727272725
Median: 6
Mode: [8]
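As a cross-check, the same three measures can be obtained from Python's built-in statistics module. This is only a sketch that reuses the sample list above; it assumes Python 3.8 or later, since statistics.multimode was added in that version.

```python
import statistics

# Same sample data as in the program above
data = [1, 2, 3, 4, 5, 6, 6, 7, 8, 8, 8]

mean = statistics.mean(data)        # arithmetic mean
median = statistics.median(data)    # middle value of the sorted data
modes = statistics.multimode(data)  # list of all most frequent values

print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Mode: {modes}")
```

Like compute_mode above, multimode returns a list, so data with several equally frequent values (for example [1, 1, 2, 2, 3]) yields more than one mode.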
2. Measure of Dispersion: Variance, Standard Deviation
PROGRAM CODE:
def compute_mean(numbers):
    return sum(numbers) / len(numbers)
def compute_variance(numbers):
    # Population variance: mean of the squared deviations from the mean
    mean = compute_mean(numbers)
    squared_diff = [(x - mean) ** 2 for x in numbers]
    variance = sum(squared_diff) / len(numbers)
    return variance
def compute_standard_deviation(numbers):
    # Standard deviation is the square root of the variance
    variance = compute_variance(numbers)
    standard_deviation = variance ** 0.5
    return standard_deviation
if __name__ == "__main__":
    # Taking user input for a list of numbers
    input_data = input("Enter a list of numbers separated by spaces: ")
    try:
        # Convert the user input into a list of floats
        data = [float(num) for num in input_data.split()]
        # Calculate the measures of dispersion
        variance = compute_variance(data)
        standard_deviation = compute_standard_deviation(data)
        print(f"Data: {data}")
        print(f"Variance: {variance}")
        print(f"Standard Deviation: {standard_deviation}")
    except ValueError:
        print("Invalid input! Please enter a list of numbers separated by spaces.")
OUTPUT:
Enter a list of numbers separated by spaces: 10 12 16 20 23 26 29 45
Data: [10.0, 12.0, 16.0, 20.0, 23.0, 26.0, 29.0, 45.0]
Variance: 109.484375
Standard Deviation: 10.463478150213723
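The program above divides by the number of values, which gives the population variance. When the data is a sample drawn from a larger population, the sum of squared deviations is usually divided by n - 1 instead. The sketch below compares the two using Python's built-in statistics module on the same input; the variable names are chosen here only for illustration.

```python
import statistics

# Same values as entered in the sample run above
data = [10.0, 12.0, 16.0, 20.0, 23.0, 26.0, 29.0, 45.0]

# Population versions divide by n (this matches the program above)
population_variance = statistics.pvariance(data)
population_std = statistics.pstdev(data)

# Sample versions divide by n - 1
sample_variance = statistics.variance(data)
sample_std = statistics.stdev(data)

print(f"Population variance: {population_variance}")
print(f"Sample variance: {sample_variance}")
print(f"Population standard deviation: {population_std}")
print(f"Sample standard deviation: {sample_std}")
```

For this input the population variance is 109.484375, as in the output above, and the sample variance is slightly larger because of the smaller divisor.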
3. Here is an example of applying the K-Nearest Neighbors (KNN) algorithm
for both classification and regression using Python. We'll use the popular
scikit-learn library and some sample datasets to illustrate the concepts.
PROGRAM CODE:
# Import necessary libraries
import numpy as np
from sklearn.datasets import load_iris, make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.metrics import accuracy_score, mean_squared_error
# ---------------- KNN for Classification ---------------- #
# Load the Iris dataset for classification
iris = load_iris()
X_classification = iris.data
y_classification = iris.target
# Split the dataset into training and testing sets
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(
    X_classification, y_classification, test_size=0.3, random_state=42
)
# Initialize the KNN classifier with k=3
knn_classifier = KNeighborsClassifier(n_neighbors=3)
# Train the model
knn_classifier.fit(X_train_c, y_train_c)
# Predict on the test set
y_pred_c = knn_classifier.predict(X_test_c)
# Calculate accuracy
accuracy = accuracy_score(y_test_c, y_pred_c)
print("Classification Results:")
print(f"Accuracy: {accuracy * 100:.2f}%")
# ---------------- KNN for Regression ---------------- #
# Create a synthetic dataset for regression
X_regression, y_regression = make_regression(
    n_samples=200, n_features=1, noise=10, random_state=42
)
# Split the dataset into training and testing sets
X_train_r, X_test_r, y_train_r, y_test_r = train_test_split(
    X_regression, y_regression, test_size=0.3, random_state=42
)
# Initialize the KNN regressor with k=3
knn_regressor = KNeighborsRegressor(n_neighbors=3)
# Train the model
knn_regressor.fit(X_train_r, y_train_r)
# Predict on the test set
y_pred_r = knn_regressor.predict(X_test_r)
# Calculate mean squared error
mse = mean_squared_error(y_test_r, y_pred_r)
print("\nRegression Results:")
print(f"Mean Squared Error: {mse:.2f}")
OUTPUT:
Classification Results:
Accuracy: 100.00%
Regression Results:
Mean Squared Error: 269.01
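To show what the KNN estimators above do internally, here is a minimal from-scratch sketch in plain Python. It is not scikit-learn's implementation: the helper names (euclidean, k_nearest, knn_classify, knn_regress) and the toy data are hypothetical, and it assumes Euclidean distance with a simple majority vote for classification and an unweighted average for regression.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two points of equal dimension
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(train_X, train_y, query, k):
    # Rank training points by distance to the query, keep the k closest targets
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    return [target for _, target in ranked[:k]]

def knn_classify(train_X, train_y, query, k=3):
    # Classification: majority vote among the k nearest labels
    votes = Counter(k_nearest(train_X, train_y, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k=3):
    # Regression: average of the k nearest target values
    neighbors = k_nearest(train_X, train_y, query, k)
    return sum(neighbors) / len(neighbors)

# Hypothetical toy data: two well-separated groups of points
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.0), (6.5, 8.0)]
labels = ["a", "a", "a", "b", "b", "b"]
targets = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

print(knn_classify(points, labels, (1.2, 1.1)))
print(knn_regress(points, targets, (6.4, 6.9)))
```

The only difference between the two tasks is how the neighbors' targets are combined, which is why scikit-learn offers KNeighborsClassifier and KNeighborsRegressor with the same n_neighbors parameter.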
4. Here’s a Python program to demonstrate the Decision Tree algorithm for a
classification problem using the Iris dataset. The program also tunes the tree's
parameters with Grid Search and compares the tuned model with the default one.
PROGRAM CODE:
# Import necessary libraries
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, classification_report
import matplotlib.pyplot as plt
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
# Train a decision tree with default parameters
dt_classifier = DecisionTreeClassifier(random_state=42)
dt_classifier.fit(X_train, y_train)
# Predict on the test set and evaluate
y_pred = dt_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Decision Tree Classification Results(Default Parameters):")
print(f"Accuracy:{accuracy * 100:.2f}%")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
# Visualize the default tree (class names passed as a list of strings)
plt.figure(figsize=(15, 10))
plot_tree(
    dt_classifier,
    filled=True,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
)
plt.title("Decision Tree Visualization")
plt.show()
# Tune the parameters with a 5-fold cross-validated grid search
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [None, 3, 5, 10],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}
grid_search = GridSearchCV(
    estimator=dt_classifier, param_grid=param_grid, cv=5, scoring="accuracy"
)
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
best_model = grid_search.best_estimator_
# Evaluate the tuned model on the test set
y_pred_tuned = best_model.predict(X_test)
accuracy_tuned = accuracy_score(y_test, y_pred_tuned)
print("\nDecision Tree Classification Results(Tuned Parameters):")
print(f"Accuracy:{accuracy_tuned * 100:.2f}%")
print(f"Best Parameters: {best_params}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred_tuned))
# Visualize the tuned tree
plt.figure(figsize=(15, 10))
plot_tree(
    best_model,
    filled=True,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
)
plt.title("Tuned Decision Tree Visualization")
plt.show()
OUTPUT:
Decision Tree Classification Results(Default Parameters):
Accuracy:100.00%

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

Decision Tree Classification Results(Tuned Parameters):
Accuracy:100.00%
Best Parameters: {'criterion': 'gini', 'max_depth': None, 'min_samples_leaf': 1, 'min_samples_split': 10}

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45
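The "criterion" values searched above, "gini" and "entropy", name two ways of measuring how mixed the class labels in a tree node are. The sketch below computes both from their textbook formulas in plain Python. The function names and sample labels are hypothetical, and this is not how scikit-learn implements them internally.

```python
import math
from collections import Counter

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    # Shannon entropy in bits: -sum(p * log2(p)) over class proportions
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def weighted_impurity(left, right, impurity):
    # Impurity of a candidate split: size-weighted average of both children
    n = len(left) + len(right)
    return len(left) / n * impurity(left) + len(right) / n * impurity(right)

# Hypothetical node holding two classes in equal proportion
node = [0, 0, 1, 1]
print(gini(node))     # 0.5
print(entropy(node))  # 1.0

# A split that separates the classes perfectly has zero impurity
print(weighted_impurity([0, 0], [1, 1], gini))  # 0.0
```

A decision tree picks, at each node, the split with the lowest weighted impurity, so both criteria reward splits that produce purer child nodes.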
5. Here’s an example of using the Decision Tree algorithm for regression in Python. We'll
use a synthetic regression dataset and evaluate the model's performance with the
Mean Squared Error (MSE) and the R² score.
PROGRAM CODE:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, plot_tree
from sklearn.metrics import mean_squared_error, r2_score
# Create a synthetic dataset for regression
X, y = make_regression(n_samples=200, n_features=1, noise=15, random_state=42)
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train the decision tree regressor
dt_regressor = DecisionTreeRegressor(random_state=42)
dt_regressor.fit(X_train, y_train)
# Predict on the test set
y_pred = dt_regressor.predict(X_test)
# Evaluate with MSE and R² score
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Decision Tree Regression Results:")
print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"R² Score: {r2:.2f}")
# Visualize the fitted tree
plt.figure(figsize=(12, 8))
plot_tree(dt_regressor, filled=True, feature_names=["Feature"], rounded=True)
plt.title("Decision Tree Visualization")
plt.show()
# Plot the predicted values against the actual values
plt.figure(figsize=(8, 6))
plt.scatter(X_test, y_test, color="blue", label="Actual Values")
plt.scatter(X_test, y_pred, color="red", label="Predicted Values")
plt.title("Decision Tree Regression: Predictions vs Actual Values")
plt.xlabel("Feature")
plt.ylabel("Target")
plt.legend()
plt.show()
OUTPUT:
Decision Tree Regression Results:
Mean Squared Error (MSE): 527.88
R² Score: 0.94
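The two metrics reported above can be written out directly from their definitions. The following plain-Python sketch uses hypothetical helper names (mse_from_scratch, r2_from_scratch) and made-up values. It is meant only to show the formulas, not to replace scikit-learn's mean_squared_error and r2_score.

```python
def mse_from_scratch(y_true, y_pred):
    # Mean of the squared differences between actual and predicted values
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_from_scratch(y_true, y_pred):
    # R² = 1 - (residual sum of squares) / (total sum of squares)
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical actual and predicted values
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

print(f"MSE: {mse_from_scratch(y_true, y_pred):.2f}")
print(f"R²: {r2_from_scratch(y_true, y_pred):.2f}")
```

MSE is expressed in squared units of the target, so its size depends on the scale of the data, while R² is unitless and equals 1 for perfect predictions.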
6. Here’s a demonstration of the Naïve Bayes classification algorithm using Python. We'll
use the Gaussian Naïve Bayes model from sklearn and apply it to the Iris dataset to
classify different species of flowers.
PROGRAM CODE:
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report
# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the Gaussian Naive Bayes classifier
nb_classifier = GaussianNB()
nb_classifier.fit(X_train, y_train)
# Predict on the test set
y_pred = nb_classifier.predict(X_test)
# Evaluate the predictions
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred, target_names=iris.target_names)
print(f"Accuracy: {accuracy:.2f}")
print("Classification Report:\n", report)
OUTPUT:
Accuracy: 1.00
Classification Report:
               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00         9
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
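To illustrate the idea behind Gaussian Naïve Bayes, here is a minimal from-scratch sketch in plain Python. It assumes each feature is normally distributed within a class and independent of the other features, and it adds a small fixed constant to each variance for numerical safety. The function names and toy data are hypothetical, and scikit-learn's GaussianNB handles smoothing differently.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    # Group the rows by class label
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Per-feature variance, plus a small constant to avoid division by zero
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        prior = n / len(X)
        model[label] = (prior, means, variances)
    return model

def log_gaussian(x, mean, var):
    # Log of the normal density with the given mean and variance
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict_gaussian_nb(model, row):
    # Score each class by log prior plus the sum of per-feature log likelihoods
    scores = {}
    for label, (prior, means, variances) in model.items():
        likelihood = sum(log_gaussian(x, m, v) for x, m, v in zip(row, means, variances))
        scores[label] = math.log(prior) + likelihood
    return max(scores, key=scores.get)

# Hypothetical toy data with two features and two classes
X_toy = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (5.0, 6.0), (5.2, 5.8), (4.8, 6.2)]
y_toy = ["small", "small", "small", "large", "large", "large"]

model = fit_gaussian_nb(X_toy, y_toy)
print(predict_gaussian_nb(model, (1.1, 2.1)))
print(predict_gaussian_nb(model, (5.1, 6.1)))
```

Working with log probabilities avoids numerical underflow when many small likelihoods are multiplied together.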