Date:
Experiment No. 1
Aim:
To implement Linear Regression.
Theory:
Linear regression uses the relationship between the data points to fit a straight line that passes as
close as possible to all of them.
This line can be used to predict future values.
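As a minimal sketch of how such a line can be fitted, the snippet below estimates the intercept and slope by ordinary least squares with NumPy. The function name `fit_line` and the sample data are illustrative assumptions and are not taken from the experiment's listing.

```python
import numpy as np

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = sum of cross-deviations / sum of squared deviations of x
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Hypothetical sample data lying exactly on y = 1 + 2x
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
b0, b1 = fit_line(x, y)
print(f"intercept={b0:.2f}, slope={b1:.2f}")
# Use the fitted line to predict a future value
print(f"prediction at x=5: {b0 + b1 * 5:.2f}")
```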
CODE:
import numpy as nmp
import matplotlib.pyplot as mtplt
# ...
    plot_regression_line(p, q, b)

if __name__ == "__main__":
    main()
OUTPUT:
CONCLUSION:
Linear regression is a statistical technique that describes the relationship between a dependent variable
and one or more independent variables. This experiment covered the basic concepts of linear
regression and its application in Python, starting from the most basic form of linear regression,
i.e., "simple linear regression".
**
OUTPUT:
ACCURACY : 95.61%
LOGISTIC REGRESSION MODEL : 2.63%
CONCLUSION:
We learned how to implement a custom binary logistic regression model in Python while
understanding the underlying math. We also saw how similar the logistic regression model is
to a simple neural network.
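To make the comparison with a neural network concrete, here is a minimal sketch of binary logistic regression trained by gradient descent. It is not the listing used in the experiment. The function names, learning rate, iteration count and toy data are all assumptions chosen for illustration. The forward pass is a single linear layer followed by a sigmoid, which is the structure of a one-neuron network.

```python
import numpy as np

def sigmoid(z):
    """Map real values to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.1, n_iter=1000):
    """Fit weights and bias by gradient descent on the binary cross-entropy loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)            # forward pass: linear layer + sigmoid
        grad_w = X.T @ (p - y) / len(y)   # gradient of the loss w.r.t. the weights
        grad_b = np.mean(p - y)           # gradient of the loss w.r.t. the bias
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical toy data: class 1 when the two features sum to a positive value
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b = train_logistic(X, y)
accuracy = np.mean((sigmoid(X @ w + b) >= 0.5) == y)
print("weights:", w, "bias:", b, "training accuracy:", accuracy)
```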
**
Date:
Experiment No. 3
Aim:
To implement Ensemble learning (bagging/boosting)
Theory:
Bootstrap Aggregating, also known as bagging, is a machine learning ensemble meta-algorithm designed
to improve the stability and accuracy of machine learning algorithms used in statistical classification and
regression. It decreases the variance and helps to avoid overfitting. It is usually applied to decision tree
methods. Bagging is a special case of the model averaging approach.
Implementation Steps of Bagging
Step 1: Multiple subsets of equal size are drawn from the original data set by sampling
observations with replacement.
Step 2: A base model is created on each of these subsets.
Step 3: Each model is trained in parallel on its own training subset, independently of the others.
Step 4: The final predictions are determined by combining the predictions from all the models.
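The four steps above can be sketched by hand as follows. This is an illustrative version only: the helper names `bagging_fit` and `bagging_predict`, the number of models and the toy data set are assumptions, the models are trained in a plain loop instead of in parallel, and the majority vote assumes binary labels 0/1 (a tie counts as class 1).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_models=10, seed=0):
    """Steps 1-3: draw bootstrap subsets and fit one decision tree per subset."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # sample row indices with replacement
        model = DecisionTreeClassifier(random_state=0)
        model.fit(X[idx], y[idx])
        models.append(model)
    return models

def bagging_predict(models, X):
    """Step 4: combine the individual predictions by majority vote."""
    votes = np.stack([m.predict(X) for m in models])  # shape (n_models, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)

# Hypothetical data: first 200 rows for training, last 100 held out
X, y = make_classification(n_samples=300, n_features=10, random_state=5)
models = bagging_fit(X[:200], y[:200])
pred = bagging_predict(models, X[200:])
print("held-out accuracy:", (pred == y[200:]).mean())
```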
CODE:
# evaluate bagging algorithm for classification
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.ensemble import BaggingClassifier
# define dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5,
random_state=5)
# define the model
model = BaggingClassifier()
# evaluate the model
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
n_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')
# report performance
print('Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))
OUTPUT:
RAISE, SCORES
**
Date:
Experiment No. 4
Aim:
To implement multivariate Linear Regression
Theory:
Step 1: Select the features. First, you need to select that one feature that drives the
multivariate regression. ...
Step 2: Normalize the feature. ...
Step 3: Select loss function and formulate a hypothesis. ...
Step 4: Minimize the cost and loss function. ...
Step 5: Test the hypothesis
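A minimal sketch of these five steps with NumPy is given below, assuming a mean squared error loss and batch gradient descent. The function names, learning rate, iteration count and synthetic data are illustrative assumptions, not part of the experiment's listing.

```python
import numpy as np

def normalize(X):
    """Step 2: scale each feature to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def gradient_descent(X, y, lr=0.1, n_iter=2000):
    """Steps 3-4: hypothesis h(x) = X @ theta, fitted by minimising the squared error."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend a bias column
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        error = Xb @ theta - y
        theta -= lr * (Xb.T @ error) / len(y)   # gradient of half the mean squared error
    return theta

# Step 1: two hypothetical features; the target is y = 3 + 2*x1 - x2
rng = np.random.default_rng(0)
X = normalize(rng.normal(size=(200, 2)))
y = 3 + 2 * X[:, 0] - X[:, 1]

theta = gradient_descent(X, y)

# Step 5: test the hypothesis by comparing its predictions with the targets
pred = np.column_stack([np.ones(len(X)), X]) @ theta
mse = float(np.mean((pred - y) ** 2))
print("theta:", np.round(theta, 3), "MSE:", mse)
```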
CODE:
import numpy as np
import statsmodels.api as sm

# dependent variable
y = [1,2,3,4,3,4,5,4,5,5,4,5,4,5,4,5,6,5,4,5,4,3,4]
# three independent variables, one list per variable
x = [
    [4,2,3,4,5,4,5,6,7,4,8,9,8,8,6,6,5,5,5,5,5,5,5],
    [4,1,2,3,4,5,6,7,5,8,7,8,7,8,7,8,7,7,7,7,7,6,5],
    [4,1,2,5,6,7,8,9,7,8,7,8,7,7,7,7,7,7,6,6,4,4,4]
]

def reg_m(y, x):
    # build the design matrix: one column per variable plus a column of ones
    ones = np.ones(len(x[0]))
    X = sm.add_constant(np.column_stack((x[0], ones)))
    for ele in x[1:]:
        X = sm.add_constant(np.column_stack((ele, X)))
    # fit ordinary least squares
    results = sm.OLS(y, X).fit()
    return results
**
Date:
Experiment No. 5
Aim:
To implement SVM
Theory:
Support Vector Machine (SVM) is a powerful machine learning algorithm used for linear or
nonlinear classification, regression, and even outlier detection tasks. SVMs can be used for a
variety of tasks, such as text classification, image classification, spam detection, handwriting
identification, gene expression analysis, face detection, and anomaly detection. SVMs are
adaptable and efficient in a variety of applications because they can manage high-dimensional
data and nonlinear relationships.
SVM algorithms are very effective because they look for the maximum-margin separating hyperplane
between the different classes available in the target feature.
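As an illustration of the maximum-margin idea, the sketch below fits a linear SVM with scikit-learn's `SVC` on a small synthetic data set. The cluster centres, `C=1.0` and the variable names are assumptions made for the example.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated clusters in two dimensions
X, y = make_blobs(n_samples=100, centers=[[-2, -2], [2, 2]],
                  cluster_std=0.6, random_state=0)

# A linear kernel looks for the maximum-margin separating hyperplane
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the hyperplane w.x + b = 0
b = clf.intercept_[0]   # offset of the hyperplane
print("hyperplane normal:", w, "offset:", b)
print("number of support vectors:", len(clf.support_vectors_))
print("training accuracy:", clf.score(X, y))
```

Only the support vectors, the points closest to the hyperplane, determine where the boundary lies. For data that is not linearly separable, `kernel="rbf"` is a common alternative.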
**
Date:
EXPERIMENT NO. 06
AIM:
To implement Graph Based Clustering
THEORY:
1. Click the counts data node.
2. Click the Exploratory analysis section of the toolbox.
3. Click Graph-based clustering.
4. Configure the parameters.
5. Click Finish to run.
CODE:
import networkx as nx
G = nx.Graph()
G.add_edges_from([('A', 'B'), ('A', 'K'), ('B', 'K'), ('A', 'C'),
                  ('B', 'C'), ('C', 'F'), ('F', 'G'), ('C', 'E'),
                  ('E', 'F'), ('E', 'D'), ('E', 'H'), ('I', 'J')])
# clustering coefficient of node 'C'
print(nx.clustering(G, 'C'))
**
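# plot the clustered points coloured by label; label -1 marks noise points
# (plt is assumed to be matplotlib.pyplot)
plt.figure(figsize =(9, 9))
plt.scatter(X_principal['P1'], X_principal['P2'], c = cvec)
plt.legend((r, g, b, c, y, m, k),
           ('Label 0', 'Label 1', 'Label 2', 'Label 3', 'Label 4',
            'Label 5', 'Label -1'),
           scatterpoints = 1,
           loc ='upper left',
           ncol = 3,
           fontsize = 8)
plt.show()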
OUTPUT:
BEFORE
AFTER
CONCLUSION:
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a clustering
algorithm that was proposed in 1996. In 2014, the algorithm received the ‘Test of
Time’ award at the leading data mining conference, KDD.
Dataset – Credit Card
**
Date:
EXPERIMENT NO. 08
AIM:
To implement CART
THEORY:
CART (Classification And Regression Trees) is a variation of the decision tree
algorithm. It can handle both classification and regression tasks. Scikit-Learn uses the
CART algorithm to train Decision Trees (also called
“growing” trees). CART was first introduced by Leo Breiman, Jerome Friedman, Richard
Olshen, and Charles Stone in 1984.
CART (Classification And Regression Tree) for Decision Tree
CART is a predictive algorithm used in machine learning that explains how the target
variable’s values can be predicted from other variables. It is a decision tree in which each fork
is a split on a predictor variable and each leaf node holds a prediction for the target variable.
The term CART serves as a generic term for the following categories of decision trees:
Classification trees: These are used to determine which “class” the target variable is most
likely to fall into when it is categorical.
Regression trees: These are used to predict a continuous variable’s value.
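The way a classification tree chooses a fork can be sketched with the Gini impurity, the criterion used in the listing below. The functions `gini` and `best_split` and the six-point data set are hypothetical helpers written for this illustration. They search a single feature for the threshold that gives the purest pair of child nodes.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum(p_k ** 2)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Find the threshold on one feature that minimises the weighted Gini impurity."""
    best_thr, best_score = None, float("inf")
    values = np.unique(x)
    # candidate thresholds: midpoints between consecutive distinct feature values
    for thr in (values[:-1] + values[1:]) / 2:
        left, right = y[x <= thr], y[x > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_thr, best_score = thr, score
    return best_thr, best_score

# Hypothetical one-feature data set with two classes
x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([0, 0, 0, 1, 1, 1])
thr, score = best_split(x, y)
print("best threshold:", thr, "weighted Gini:", score)
```

A full CART implementation repeats this search over every feature and then recurses into each child node until a stopping rule is met.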
CODE:
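import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# load the iris data set
iris = load_iris()
X = iris.data
y = iris.target
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df['species'] = iris.target_names[iris.target]
# hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# train a CART classifier with the Gini criterion
clf = DecisionTreeClassifier(criterion='gini', random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Accuracy Score:", accuracy_score(y_test, y_pred))
print("Classification Report:")
print(classification_report(y_test,y_pred, target_names=iris.target_names))
# plot the fitted tree
plt.figure(figsize=(20,10))
tree.plot_tree(clf, feature_names=iris.feature_names, class_names=list(iris.target_names),
               filled=True)
plt.show()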
**
Date:
EXPERIMENT NO. 09
AIM:
To implement LDA
THEORY:
Linear Discriminant Analysis (LDA), also known as Normal Discriminant Analysis or
Discriminant Function Analysis, is a dimensionality reduction technique primarily utilized
in supervised classification problems. It facilitates the modeling of distinctions between
groups, effectively separating two or more classes. LDA operates by projecting features
from a higher-dimensional space into a lower-dimensional one. In machine learning, LDA
serves as a supervised learning algorithm specifically designed for classification tasks,
aiming to identify a linear combination of features that optimally segregates classes within
a dataset.
For example, suppose we have two classes that we need to separate efficiently. Each class can
have multiple features. Using only a single feature to classify them may leave some
overlap, as shown in the figure below, so we keep increasing the number of
features until the classes separate properly.
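A minimal sketch of LDA in Python is shown below, assuming scikit-learn's `LinearDiscriminantAnalysis` and the iris data set. The data set, split ratio and variable names are assumptions for illustration, since this write-up does not include a listing for the experiment.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# With 3 classes, LDA can project onto at most 3 - 1 = 2 discriminant axes
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_2d = lda.fit_transform(X_train, y_train)  # supervised: the labels are used

print("projected shape:", X_train_2d.shape)
print("test accuracy:", lda.score(X_test, y_test))
```

The same fitted object serves both purposes described above: `fit_transform` gives the lower-dimensional projection, and `score` or `predict` uses it as a classifier.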
CONCLUSION:
Linear discriminant analysis (LDA), also known as normal discriminant analysis (NDA) or
discriminant function analysis (DFA), builds on Fisher's linear discriminant, a statistical approach
pioneered by Sir Ronald Fisher. It is a dimensionality reduction technique that is used
in supervised machine learning.
The primary function of LDA is to project high-dimensional data onto a lower-dimensional space
while retaining the data's inherent class separability. LDA can be applied to improve the performance
of classification algorithms such as a decision tree or random forest.
**
Date:
EXPERIMENT NO. 10
AIM:
To implement PCA/SVD/LDA
THEORY:
PCA is suitable for unsupervised dimensionality reduction, LDA is effective for supervised
problems with a focus on class separability, and SVD is versatile, catering to various
applications including collaborative filtering and matrix factorization.
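The link between PCA and SVD can be checked numerically. The sketch below, which assumes the iris data set and two components, projects mean-centred data onto the leading right singular vectors from `numpy.linalg.svd` and compares the result with scikit-learn's `PCA`. The variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
Xc = X - X.mean(axis=0)          # PCA works on mean-centred data

# SVD of the centred data: Xc = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_scores = U[:, :2] * S[:2]    # projection onto the first two right singular vectors

pca_scores = PCA(n_components=2).fit_transform(X)

# The two projections agree up to the sign of each component
same = np.allclose(np.abs(svd_scores), np.abs(pca_scores), atol=1e-6)
print("PCA matches the SVD projection (up to sign):", same)
```

Neither PCA nor SVD uses the class labels, which is the main contrast with LDA.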
CODE:
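import pandas as pd
import seaborn as sns  # assumed: the heatmap calls below lost their module name
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# load the iris data into a DataFrame
iris = datasets.load_iris()
df = pd.DataFrame(iris['data'], columns = iris['feature_names'])
df.head()
# standardise the features before applying PCA
scalar = StandardScaler()
scaled_data = pd.DataFrame(scalar.fit_transform(df)) #scaling the data
scaled_data
sns.heatmap(scaled_data.corr())
# project onto the first three principal components
pca = PCA(n_components = 3)
pca.fit(scaled_data)
data_pca = pca.transform(scaled_data)
data_pca = pd.DataFrame(data_pca,columns=['PC1','PC2','PC3'])
data_pca.head()
sns.heatmap(data_pca.corr())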
**