Unit 2 Data Analytics

The document discusses regression modeling and its applications in data analytics, focusing on the relationship between dependent and independent variables for forecasting and causal analysis. It covers key concepts such as linear regression, multivariate analysis, Bayesian statistics, and time series analysis, providing examples and methodologies for predicting outcomes based on data. Additionally, it highlights the importance of regression coefficients and the coefficient of determination in evaluating model performance.

Uploaded by

immahima169

Unit-2

Data Analytics
BIT-601
Ashish Tripathi
Assistant Professor
Department of Information Technology
Pranveer Singh Institute of Technology

Prerequisite:

Regression Modelling
Regression analysis is a predictive modelling technique that investigates the relationship between
a dependent (target) variable and one or more independent (predictor) variables. It is used for forecasting, time series modelling,
and finding causal relationships between variables.
For example, the relationship between rash driving and the number of road accidents caused by a driver is best studied through regression.
Regression analysis is an important tool for modelling and analyzing data. Here, we fit a curve or line to the data points in such
a way that the distances of the data points from the curve or line are minimized.

In other words, “regression is a statistical technique to determine the linear relationship between two or more
variables.”
In its simplest form, regression involves two variables (often denoted X and Y) and shows the empirical relationship
between one independent variable (X) and a dependent variable (Y),
as in the formula below:
Y = β0 + β1X + µ
• The magnitude and direction of the relationship are given by the slope parameter (β1).
• The value of the dependent variable when the independent variable is zero is given by the intercept parameter (β0).
• The error term (µ) captures the variation not explained by the slope and intercept terms.
• The coefficient of determination (R²) shows how well the fitted line matches the data.
Terminologies Related to the Regression Analysis
• Dependent Variable: The main factor in regression analysis that we want to predict or understand is called the
dependent variable. It is also called the target, response, regressand, predicted, or output variable.
• Independent Variable: The factors that affect the dependent variable, or that are used to predict its values,
are called independent variables. They are also called predictor, regressor, explanatory, or input variables.
• Outliers: An outlier is an observation with a very low or very high value in comparison to the other
observed values. An outlier may distort the result, so it should be identified and handled with care.
• Multi-collinearity: If the independent variables are highly correlated with each other, the
condition is called multi-collinearity. It should not be present in the dataset, because it makes it hard to rank the
variables by how strongly they affect the dependent variable.
• Underfitting and Overfitting: If our algorithm works well with the training dataset but not with the test dataset, the
problem is called overfitting. If our algorithm does not perform well even with the training dataset, the
problem is called underfitting.

Regression models
Regression models are of two types: simple regression models and multiple regression models. Both are further divided into linear and
nonlinear models.

Linear Regression model

Linear regression is a statistical procedure for predicting the value of
a dependent variable from an independent variable when the
relationship between the variables can be described with a linear
model.
A linear regression equation can be written as Yp = mX + b, where Yp
is the predicted value of the dependent variable, m is the slope of the
regression line, and b is the Y-intercept of the regression line. In the
notation used earlier, m corresponds to β1 and b to β0.
Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data.

The slope (β1) is the expected change in Y for each one-unit increase in X.

For example, if β1 = 2, then Y is expected to increase by 2 for each one-unit increase in X.

Similarly, the Y-intercept (β0) is the average value of Y when X = 0. For example, if β0 = 4, then the average Y is expected to be 4 when X is 0.

Example: Consider the following data about food intake by cows and milk yield collected from a cattle farm:
What is the relationship between cows’ food intake and milk yield?
Food (kg) Milk Yield (Ltrs)
4 3.0
6 5.5
10 6.5
12 9.0

Solution: [Figure: scatter diagram of milk yield (litres) against food intake (kg)]
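
The relationship in this example can also be quantified by fitting a least-squares line to the four data points. The following is a minimal sketch in plain Python; the helper name `fit_line` and the variable names are illustrative assumptions, not part of the original slides.

```python
# Least-squares fit of milk yield (litres) on food intake (kg).
food = [4, 6, 10, 12]
milk = [3.0, 5.5, 6.5, 9.0]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of X.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


b0, b1 = fit_line(food, milk)
print(f"Milk yield = {b0:.2f} + {b1:.2f} * Food")  # Milk yield = 0.80 + 0.65 * Food
```

Under this fit, each additional kilogram of food is associated with roughly 0.65 litres more milk.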

Advantages:
• Can be used to predict the future: By fitting a suitable model to a data set, regression analysis can
predict useful quantities such as stock prices, medical conditions, and even public sentiment.
• Can be used to back major decisions and policies: Results from regression analysis add scientific backing to a
decision or policy and make it more reliable, as its likelihood of success is then higher.
• Can correct an error in thinking: Sometimes a discrepancy between the prediction of a regression analysis
and a decision or belief can help expose the flaw in that decision.
• Provides a new perspective: Large data sets realize their potential to provide new dimensions to a study through the
application of regression analysis.

Coefficient of Determination
The coefficient of determination (denoted by R2) is a key output of regression
analysis. It is interpreted as the proportion of the variance in the dependent variable
that is predictable from the independent variable.
• The coefficient of determination is the square of the correlation (R) between
predicted y scores and actual y scores; thus, it ranges from 0 to 1.
• An R2 of 0 means that the dependent variable cannot be predicted from the
independent variable.
• An R2 of 1 means the dependent variable can be predicted without error from the
independent variable.
• An R2 between 0 and 1 indicates the extent to which the dependent variable is
predictable. An R2 of 0.10 means that 10 percent of the variance in Y is
predictable from X; an R2 of 0.20 means that 20 percent is predictable; and so
on.
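
As a rough illustration of these points, R² can be computed as one minus the ratio of unexplained variation to total variation, which for a least-squares line with an intercept equals the squared correlation described above. The sketch below reuses the cow data from the earlier example with the line Y = 0.8 + 0.65X (the least-squares fit to that data); the function name `r_squared` is an assumption made for illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot


food = [4, 6, 10, 12]
milk = [3.0, 5.5, 6.5, 9.0]
# Predictions from the least-squares line for this data set.
predicted = [0.8 + 0.65 * x for x in food]

r2 = r_squared(milk, predicted)
print(f"R^2 = {r2:.4f}")  # about 0.91: food intake explains roughly 91% of the variance
```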
Multivariate Analysis
Multivariate analysis is the statistical process of simultaneously analyzing multiple independent (or
predictor) variables together with multiple dependent (outcome or criterion) variables. Multivariate analysis (MVA) can help
summarize the data and can reduce the chance of obtaining spurious results.
Multiple linear regression (MLR) aims to quantify the degree of linear association between one response variable
and several explanatory variables. It also refers to a set of techniques for studying the straight-line relationships
among two or more variables.

In the general MLR equation, a response variable (y; known as the regressand) is predicted by a number of
explanatory variables (x1, x2 ... xn; the regressors). The strength of each regressor's effect on the response variable is
given by the regression coefficients β1 ... βn.
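
The equation itself did not survive in the text of this slide. Its standard form, written with the symbols defined above plus an assumed intercept β0 and the error term µ used earlier for simple regression, is sketched here:

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \mu
```

With n = 1 this reduces to the simple regression formula Y = β0 + β1X + µ.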

Multivariate Analysis methods
There are two general types of multivariate analysis:
a) Analysis of dependence: used when one or more variables depend on others.
This is a category of multivariate statistical techniques in which dependence methods explain or predict one or more
dependent variables on the basis of two or more independent variables.
E.g. multiple regression, partial least squares (PLS) regression, multiple discriminant analysis (MDA)

Variance analysis: Determines the influence of several or individual variables on groups by calculating statistical
averages. Here you can compare variables within a group as well as across different groups, depending on where
deviations are expected. For example: Which groups most often click on the 'Buy Now' button in
your shopping cart?

Discriminant analysis: Used in the context of variance analysis to differentiate between groups that can be
described by similar or identical characteristics. Multiple discriminant analysis (MDA), also known as canonical
variates analysis (CVA) or canonical discriminant analysis (CDA), constructs functions to maximally discriminate
between n groups of objects. For example: By which variables do different groups of buyers differ?
b) Analysis of interdependence: used when no variable is treated as dependent on the others.
This is a category of multivariate statistical techniques in which interdependence methods give meaning
to a set of variables or seek to group things together.
E.g. cluster analysis, factor analysis

Factor analysis: Reduces the structure to relevant data and individual variables. Factor studies focus on
different variables, so they are further subdivided into principal component analysis and correspondence analysis.
For example: Which website elements have the greatest influence on purchasing behavior?

Cluster analysis: Observations are graphically assigned to individual variable groups and classified on the
basis of these. The results are clusters and segments, such as the number of buyers of a particular product
who are between 35 and 47 years old and have a high income.

Question: Using the data below on salary against years of experience, apply regression analysis to find the equation of the
regression line and predict the salary of an employee with 5 years of experience.
Years of Experience    Salary
1    30000
2    35000
3    45000
4    50000
5    X (to be predicted)

n = 4 (number of data points)

X    Y        XY        X²    Y²
1    30000    30000     1     900,000,000
2    35000    70000     4     1,225,000,000
3    45000    135000    9     2,025,000,000
4    50000    200000    16    2,500,000,000
ΣX = 10    ΣY = 160,000    ΣXY = 435,000    ΣX² = 30    ΣY² = 6,650,000,000

Means: X̄ = ΣX / n = 2.5, Ȳ = ΣY / n = 40,000

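Slope (β₁): β₁ = (nΣXY − ΣX·ΣY) / (nΣX² − (ΣX)²) = (4 × 435,000 − 10 × 160,000) / (4 × 30 − 10²) = 140,000 / 20 = 7,000
Intercept (β₀): β₀ = Ȳ − β₁X̄ = 40,000 − 7,000 × 2.5 = 22,500

Predict Salary for 5 Years Experience: Y = 22,500 + 7,000 × 5 = 57,500
Verification: the fitted line passes through the point of means, since 22,500 + 7,000 × 2.5 = 40,000 = Ȳ.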
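
The slope, intercept and prediction for this question can be reproduced from the column sums in the table. The following sketch uses the usual sum-based least-squares formulas; the variable names are illustrative assumptions.

```python
years = [1, 2, 3, 4]
salary = [30000, 35000, 45000, 50000]

n = len(years)
sum_x = sum(years)                                   # 10
sum_y = sum(salary)                                  # 160000
sum_xy = sum(x * y for x, y in zip(years, salary))   # 435000
sum_x2 = sum(x * x for x in years)                   # 30

# Sum-based least-squares formulas for the slope and intercept.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

predicted_salary = intercept + slope * 5
print(slope, intercept, predicted_salary)  # 7000.0 22500.0 57500.0
```

The regression line is therefore Salary = 22,500 + 7,000 × (years of experience), giving 57,500 for 5 years.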

Question: Apply regression modelling to calculate the linear regression equation of the form Y = β0 + β1X. Also predict the
price of a house of 150 sq m, based on the following data:
No.    House Size (X) (sq m)    Price (Y) (100000s)
1      30                       20
2      50                       30
3      70                       40
4      100                      60
5      120                      70

Bayesian Modeling

Bayesian Statistics

“Bayesian statistics is a mathematical procedure that applies probabilities to statistical problems. It provides people the tools
to update their beliefs in the evidence of new data.”

BAYESIAN INFERENCE & NETWORK

Bayesian inference derives the posterior probability as a consequence of two antecedents: a prior probability and
a "likelihood function" derived from a statistical model for the observed data. Bayesian inference computes the
posterior probability according to Bayes' theorem:
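
The formula itself is missing from the extracted text. In its standard form, with H standing for a hypothesis and E for the observed evidence (both symbols are assumptions introduced here), Bayes' theorem reads:

```latex
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}
```

Here P(H) is the prior probability, P(E | H) is the likelihood, P(E) is the probability of the evidence, and P(H | E) is the posterior probability.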

Question: A clinic uses a Bayesian network to diagnose a rare disease.

If a patient tests positive, what is the probability that they actually have the disease?
P(Disease | Positive) = ?

The figures used in the solution are P(D) = 0.01, P(+ | D) = 0.95 and P(+ | D^c) = 0.05, where D^c means "no disease".

Apply Bayes' Theorem

P(D | +) = P(+ | D) × P(D) / P(+)

Calculate P(+) using the Law of Total Probability

P(+) = P(+ | D) P(D) + P(+ | D^c) P(D^c)
P(+) = 0.95 × 0.01 + 0.05 × 0.99
P(+) = 0.0095 + 0.0495 = 0.059

Substitute into Bayes' Theorem

P(D | +) = 0.0095 / 0.059 = 0.1610
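
The same calculation can be written as a short function. This is a sketch using the figures that appear in the worked solution (prior 0.01, true-positive rate 0.95, false-positive rate 0.05); the function and parameter names are assumptions made for illustration.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(D | +) via Bayes' theorem and the law of total probability."""
    # P(+) = P(+|D) P(D) + P(+|D^c) P(D^c)
    evidence = true_positive_rate * prior + false_positive_rate * (1 - prior)
    return true_positive_rate * prior / evidence


p_disease_given_positive = posterior(0.01, 0.95, 0.05)
print(f"{p_disease_given_positive:.4f}")  # 0.1610
```

Even with a 95% sensitive test, a positive result implies only about a 16% chance of disease, because the disease is rare.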

Question: A security system at an airport uses two sensors: Metal Detector and Bag Scanner.

If both sensors alarm (MetalDetector=True and BagScanner=True), what is the probability that the passenger poses
a threat?

Find: P(T | M, B)

P(T) = 0.001 (Threat)

T    P(M | T)    P(B | T)
Y    0.95        0.90
N    0.02        0.03
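
One way to answer this question is sketched below. It assumes that the two sensors are conditionally independent given the threat status (that is, the network has arrows from T to M and from T to B only); the network diagram is not available in the text, so this structure is an assumption. The variable names are also illustrative.

```python
# Probabilities from the table above, keyed by threat status.
p_threat = 0.001
p_metal_alarm = {True: 0.95, False: 0.02}   # P(M = True | T)
p_bag_alarm = {True: 0.90, False: 0.03}     # P(B = True | T)


def joint_with_both_alarms(threat):
    """P(T = threat, M = True, B = True), assuming M and B are independent given T."""
    prior = p_threat if threat else 1 - p_threat
    return prior * p_metal_alarm[threat] * p_bag_alarm[threat]


numerator = joint_with_both_alarms(True)
evidence = numerator + joint_with_both_alarms(False)
p_threat_given_alarms = numerator / evidence
print(f"P(T | M, B) = {p_threat_given_alarms:.4f}")  # roughly 0.59
```

Under this assumption, two simultaneous alarms raise the probability of a threat from 0.1% to roughly 59%.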

A Bayesian belief network is a key technique for dealing with probabilistic events and
for solving problems that involve uncertainty. We can define a Bayesian network as:

"A Bayesian network is a probabilistic graphical model which represents a set of variables and
their conditional dependencies using a directed acyclic graph."

It is also called a Bayes network, belief network, decision network, or Bayesian model.

Bayesian networks are probabilistic because they are built from a probability
distribution and use probability theory for prediction and anomaly detection.

A Bayesian network consists of two parts:

• Directed acyclic graph
• Table of conditional probabilities
The generalized form of a Bayesian network that represents and solves decision problems under
uncertain knowledge is known as an influence diagram.

A Bayesian network graph is made up of nodes and arcs (directed links), where:

• Each node corresponds to a random variable, which can be continuous or discrete.
• Arcs (directed arrows) represent the causal relationships or conditional probabilities between
random variables. Each directed link connects a pair of nodes in the graph.
A link indicates that one node directly influences the other; if there is no directed link
between two nodes, neither node directly influences the other.
• In the above diagram, A, B, C, and D are random variables represented by the nodes
of the network graph.
• Node B is connected to node A by a directed arrow from A,
so node A is called the parent of node B.
• Node C is independent of node A.
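
The two parts of a Bayesian network combine through the standard chain-rule factorization of the joint distribution, in which each variable is conditioned only on its parents in the graph. The notation below (X_1 ... X_n for the variables, Parents(X_i) for the parents of node X_i) is an assumption introduced for this sketch:

```latex
P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Parents}(X_i)\bigr)
```

Each factor on the right-hand side is one entry of a node's conditional probability table.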
Analysis of Time Series

Time series analysis comprises methods for analyzing time series data in order to extract meaningful
statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future
values based on previously observed values.

A time series is a sequence of observations of categorical or numeric variables indexed by a date or
timestamp. A clear example of time series data is a stock price recorded over time.

In the following table, we can see the basic structure of time series data. In this case the observations are
recorded every hour.

Timestamp Stock - Price

2020-10-11 09:00:00 100
2020-10-11 10:00:00 110
2020-10-11 11:00:00 105
2020-10-11 12:00:00 90
2020-10-11 13:00:00 120

• Time-domain vs. frequency-domain
– Time-domain approach: how does what happened today affect what will happen tomorrow? These approaches
view the investigation of lagged relationships as most important, e.g. autocorrelation analysis.
– Frequency-domain approach: what is the economic cycle through periods of expansion and recession? These
approaches view the investigation of cycles as most important, e.g. spectral analysis and wavelet analysis.

Types of Time Series

• Univariate vs. multivariate
A time series containing records of a single variable is termed univariate; if records of more than one
variable are considered, it is termed multivariate.
• Linear vs. non-linear
A time series model is said to be linear or non-linear depending on whether the current value of the series is
a linear or non-linear function of past observations.
• Discrete vs. continuous
In a continuous time series, observations are measured at every instant of time, whereas a discrete time
series contains observations measured at discrete points in time.
Components of Time Series

Autocorrelation
Informally, autocorrelation is the similarity between observations as a function of the time lag between them.
Autocorrelation represents the degree of similarity between a given time series and a lagged version of itself over
successive time intervals. It measures the relationship between a variable's current value and its past
values. An autocorrelation of +1 represents a perfect positive correlation, while an autocorrelation of -1
represents a perfect negative correlation. It is also known as serial correlation. For example, the temperatures on different
days in a month are autocorrelated.
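
The idea of comparing a series with a lagged copy of itself can be sketched as follows. This uses the common sample estimator (lagged cross-deviations divided by the total squared deviation); the function name and the synthetic series, which simply repeats every 4 steps, are assumptions made for illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    numerator = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator


# Hypothetical series with a period of 4 time steps.
periodic = [1, 2, 3, 4] * 6

lag4 = autocorrelation(periodic, 4)  # strongly positive: same phase of the cycle
lag2 = autocorrelation(periodic, 2)  # negative: opposite phase of the cycle
print(round(lag4, 3), round(lag2, 3))
```

A high value at a lag equal to the period is what the slide describes: observations one full cycle apart are very similar.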

The 1st value and the 24th value have a high autocorrelation.
Similarly, the 12th and 36th observations are highly correlated. This means that we will find a
very similar value every 24 units of time.
Seasonality
Seasonality refers to fixed, periodic fluctuations. For example, electricity consumption
is high during the day and low during night, or online sales increase during Christmas
before slowing down again.
Trends
A trend is the long-term movement of data over time. It denotes whether the observation values
are increasing or decreasing.
Stationarity
Stationarity is an important characteristic of time series. A time series is said to be stationary if its
statistical properties do not change over time. In other words, it has constant mean and
variance, and its covariance is independent of time.

Rule Induction

Rule induction is an area of machine learning in which formal rules are extracted from a set of
observations. The rules extracted may represent a full scientific model of the data, or merely represent
local patterns in the data.

Usually rules are expressions of the form

if (attribute-1, value-1) and (attribute-2, value-2) and ··· and (attribute-n, value-n)
then (decision, value).

• It is a data mining process of deducing if-then rules from a dataset.
• It represents an inherent relationship between the attributes and class labels in the dataset.
• It is used in predictive analytics to classify unknown data.
• It describes the patterns in the data.
• It can be done by constructing a decision tree from the same dataset.

Let us consider a rule R1,
R1: IF age = youth AND student = yes
THEN buys_computer = yes

• The IF part of the rule is called the rule antecedent or precondition.
• The THEN part of the rule is called the rule consequent.
• The antecedent consists of one or more attribute tests, and these tests are logically ANDed.
• The consequent consists of the class prediction.

We can also write rule R1 as follows:

R1: (age = youth) ^ (student = yes) ⇒ (buys_computer = yes)

If the condition holds true for a given tuple, then the antecedent is satisfied.
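
Rule R1 and the notion of a satisfied antecedent can be illustrated with a small sketch. The dictionary layout, the function names and the sample customer records are all assumptions made for illustration.

```python
# Rule R1: antecedent is a set of attribute tests (ANDed), consequent is a class prediction.
rule_r1 = {
    "antecedent": {"age": "youth", "student": "yes"},
    "consequent": ("buys_computer", "yes"),
}


def antecedent_satisfied(rule, record):
    """True if every attribute test in the antecedent holds for the record."""
    return all(record.get(attr) == value for attr, value in rule["antecedent"].items())


def apply_rule(rule, record):
    """Return the predicted (attribute, value) pair, or None if the rule does not fire."""
    return rule["consequent"] if antecedent_satisfied(rule, record) else None


# Hypothetical tuples.
young_student = {"age": "youth", "student": "yes", "income": "medium"}
senior = {"age": "senior", "student": "no", "income": "high"}

print(apply_rule(rule_r1, young_student))  # ('buys_computer', 'yes')
print(apply_rule(rule_r1, senior))         # None
```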

Rule Extraction
To extract a rule from a decision tree −
•One rule is created for each path from the root to the leaf node.
•To form a rule antecedent, each splitting criterion is logically ANDed.
•The leaf node holds the class prediction, forming the rule consequent.
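
The three points above can be sketched as a recursive walk over a tree. The tree below is a hypothetical example (an internal node is a dictionary with an attribute name and its branches, a leaf is a class label string); neither the representation nor the tree comes from the original slides.

```python
# Hypothetical decision tree.
tree = {
    "attribute": "age",
    "branches": {
        "youth": {
            "attribute": "student",
            "branches": {"yes": "buys_computer = yes", "no": "buys_computer = no"},
        },
        "senior": "buys_computer = yes",
    },
}


def extract_rules(node, conditions=()):
    """Create one IF-THEN rule for each path from the root to a leaf."""
    if not isinstance(node, dict):
        # Leaf: the accumulated splitting criteria are ANDed to form the antecedent.
        antecedent = " AND ".join(f"{attr} = {value}" for attr, value in conditions)
        return [f"IF {antecedent} THEN {node}"]
    rules = []
    for value, subtree in node["branches"].items():
        rules.extend(extract_rules(subtree, conditions + ((node["attribute"], value),)))
    return rules


rules = extract_rules(tree)
for rule in rules:
    print(rule)
```

For this tree the sketch produces three rules, one per leaf, the first being `IF age = youth AND student = yes THEN buys_computer = yes`.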

Rule Induction Using Sequential Covering Algorithm

The sequential covering algorithm can be used to extract IF-THEN rules from the training data directly, without
generating a decision tree first. In this algorithm, each rule for a given class covers many of the tuples of that class.
Some sequential covering algorithms are AQ, CN2, and RIPPER. As per the general strategy, the rules are
learned one at a time. Each time a rule is learned, the tuples covered by the rule are removed and the process
continues on the remaining tuples. This contrasts with decision tree induction, where the path to each leaf corresponds
to a rule, so all rules are in effect learned together.

The following is the sequential learning algorithm, in which rules are learned for one class at a time. When
learning a rule for a class Ci, we want the rule to cover all the tuples from class Ci only and no tuple from any
other class.

Steps in algorithm:
1. Class Selection:
• Classes are selected one at a time, and all the rules for the selected class are learned.
2. Rule Development:
• A single rule is created by considering all the points in the data.
• The points (tuples) covered by that rule are then deleted before the next iteration.
3. Rule Accuracy:
• The accuracy of the single rule is measured as
  Rule Accuracy = (records correctly classified by the rule) / (all records covered by the rule)
4. Next rule:
• After all the points covered by a particular rule have been processed, these data points are deleted and the rule is
appended to the rule set. The next rule is then learned from the remaining data points, and the iteration is repeated.
5. Pruning the rule set:
• After the rule set is developed, it is pruned according to the measure (p − n) / (p + n).
• Here p is the number of positive records covered by the rule and n is the number of negative records
covered by the rule.
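
The two measures used in steps 3 and 5 are simple ratios. The sketch below computes both for a hypothetical rule that covers 10 records, 8 of them from the target (positive) class; the function names and the numbers are assumptions made for illustration.

```python
def rule_accuracy(correct_records, covered_records):
    """Fraction of the records covered by the rule that it classifies correctly."""
    return correct_records / covered_records


def prune_measure(p, n):
    """Pruning measure (p - n) / (p + n) for p positive and n negative covered records."""
    return (p - n) / (p + n)


# Hypothetical rule: covers 10 records, 8 positive and 2 negative.
accuracy = rule_accuracy(correct_records=8, covered_records=10)
measure = prune_measure(p=8, n=2)
print(accuracy, measure)  # 0.8 0.6
```

A pruned version of a rule would typically be preferred when it gives a higher value of this measure on a separate pruning set.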

Support Vector and Kernel Methods

Supervised learning (SL)
“Machine Learning is the field of study that gives computers the ability to learn without
being explicitly programmed.”
Supervised learning is the machine learning task of learning a
function that maps an input to an output based on example
input-output pairs. It infers a function from labeled training
data consisting of a set of training examples.
This is a situation where you put in some data you already have
answers to. For example, to predict whether a dog is a particular
breed, we load in the properties of millions of dogs, such as
type, height, skin color, and body hair length.
In ML lingo, these properties are referred to as ‘features’. A
single entry of this list of features is a data instance, while the
collection of everything is the training data, which forms the
basis of our prediction: if you know the skin color, body hair
length, height and so on of a particular dog, then you can
predict the breed it probably belongs to.

Support Vector Machines (SVMs) are supervised machine learning models
with associated learning algorithms that analyze data for classification
(classification means knowing what belongs to what, e.g. ‘apple’ belongs to
class ‘fruit’ while ‘dog’ belongs to class ‘animals’). After being given sets
of labeled training data for each category, an SVM model is able to categorize new
examples.
• An SVM is a classifier formally defined by a separating hyperplane.
• A hyperplane is a subspace of one dimension less than its ambient space.
The dimension of a mathematical space (or object) is informally defined as the
minimum number of coordinates (x, y, z axes) needed to specify any point (like each
blue and red point) within it, while an ambient space is the space surrounding a
mathematical object.
• A mathematical object is an abstract object arising in mathematics.
• An abstract object is an object which does not exist at any particular time or place,
but rather exists as a type of thing, i.e., an idea or abstraction.

Basically, SVM finds a hyper-plane that creates a boundary between the types of data. In 2-dimensional space,
this hyper-plane is nothing but a line.
In SVM, we plot each data item in the dataset in an N-dimensional space, where N is the number of
features/attributes in the data. Next, find the optimal hyperplane to separate the data.

Usually a learning algorithm tries to learn the most common characteristics of a class (what differentiates one class from
another), and the classification is based on those representative characteristics (so classification is based
on differences between classes). The SVM works the other way around. It finds the most similar examples between
classes. Those will be the support vectors.

As an example, let's consider two classes, apples and lemons. If we visualize the example in 2D, we will have
something like this:

As we go from left to right, all the examples will be classified as
apples until we reach the yellow apple. From this point, the
confidence that a new example is an apple drops while the
lemon class confidence increases. When the lemon class
confidence becomes greater than the apple class confidence,
the new examples will be classified as lemons (somewhere
between the yellow apple and the green lemon).

Based on these support vectors, the algorithm tries to find the best hyperplane that separates the classes. In 2D the
hyperplane is a line, so it would look like this:

We have an infinite number of possibilities to draw the decision boundary.

Finding the Optimal Hyperplane

Intuitively, the best line is the line that is far away from both
the apple and lemon examples (has the largest margin). To have an
optimal solution, we have to maximize the margin on both
sides.

Each of the calculations (distances and optimal
hyperplanes) is made in vector space, so each data point is
considered a vector. All in all, support vectors are the data points
that define the position and the margin of the hyperplane.
We call them “support” vectors because these are the
representative data points of the classes: if we move one of
them, the position and/or the margin will change.
Basic Steps
The basic steps of the SVM are:
1. Select two hyperplanes (in 2D) which separate the data with no points between them (red lines)
2. Maximize their distance (the margin)
3. The average line (here the line halfway between the two red lines) will be the decision boundary

So why Kernels?

The red and blue balls cannot be separated by a straight line, because the two
classes are intermixed. In reality, most real-life problem data look like this:
they are not linearly separable.

Kernels or kernel methods (also called kernel functions) are a family of algorithms
used for pattern analysis. They make it possible to solve a non-linear problem using a linear classifier.
Kernel methods are employed in SVMs (Support Vector Machines), which are used in classification and
regression problems. The SVM uses what is called the “kernel trick”, in which the data is transformed and an
optimal boundary is found between the possible outputs.

To get a mathematical understanding of kernels, let us look at Lili Jiang’s equation of a
kernel, which is:
K(x, y) = ⟨f(x), f(y)⟩ where,

• K is the kernel function,
• x and y are n-dimensional inputs,
• f is the map from n-dimensional to m-dimensional space, and ⟨x, y⟩ is the dot product.
Illustration with the help of an example.
Let us say that we have two points, x = (2, 3, 4) and y = (3, 4, 5).
As we have seen, K(x, y) = ⟨f(x), f(y)⟩.
Let us first calculate ⟨f(x), f(y)⟩:
f(x) = (x1x1, x1x2, x1x3, x2x1, x2x2, x2x3, x3x1, x3x2, x3x3)
f(y) = (y1y1, y1y2, y1y3, y2y1, y2y2, y2y3, y3y1, y3y2, y3y3)
so,
f(2, 3, 4) = (4, 6, 8, 6, 9, 12, 8, 12, 16) and
f(3, 4, 5) = (9, 12, 15, 12, 16, 20, 15, 20, 25)
so the dot product is
f(x) · f(y) = f(2, 3, 4) · f(3, 4, 5) = (36 + 72 + 120 + 72 + 144 + 240 + 120 + 240 + 400) = 1444
And, using the kernel K(x, y) = (x · y)²,
K(x, y) = (2*3 + 3*4 + 4*5)² = (6 + 12 + 20)² = 38*38 = 1444.
As we can see, f(x) · f(y) and K(x, y) give the same result, but the former method required a lot of
calculation (because of projecting 3 dimensions into 9 dimensions), while using the kernel was much easier.
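
The worked example can be checked numerically. The sketch below computes the result both ways, first through the explicit 9-dimensional map f and then through the kernel shortcut; the function names are assumptions made for illustration.

```python
def feature_map(v):
    """Explicit map f: all pairwise products v_i * v_j (3 dimensions -> 9 dimensions)."""
    return [a * b for a in v for b in v]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def kernel(x, y):
    """Kernel shortcut: the squared dot product in the original space."""
    return dot(x, y) ** 2


x = (2, 3, 4)
y = (3, 4, 5)

explicit = dot(feature_map(x), feature_map(y))  # works in 9 dimensions
shortcut = kernel(x, y)                         # works in 3 dimensions
print(explicit, shortcut)  # 1444 1444
```

Both routes agree, but the kernel never builds the 9-dimensional vectors, which is the practical point of the kernel trick.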
Important Parameters in Kernelized SVC (Support Vector Classifier)

The Kernel:
• The kernel is selected based on the type of data and the type of transformation required. By default,
the kernel is the Radial Basis Function (RBF) kernel.
Gamma:
• This parameter decides how far the influence of a single training example reaches during
transformation, which in turn affects how tightly the decision boundaries end up surrounding
points in the input space. With a small value of gamma, points farther apart are considered
similar, so more points are grouped together and the decision boundaries are smoother (may be
less accurate). With larger values of gamma, points must be closer together to be considered similar (may cause overfitting).
The ‘C’ parameter:
• This parameter controls the amount of regularization applied to the model. Large values of C mean
low regularization, which in turn causes the model to fit the training data very closely (may cause overfitting).
Lower values of C mean higher regularization, which makes the model more tolerant of
errors (may lead to lower accuracy).
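
The effect of gamma can be seen directly in the usual form of the RBF kernel, K(x, y) = exp(−gamma · ‖x − y‖²). The sketch below evaluates it for one pair of points at several gamma values; the function name, the two points and the chosen gamma values are assumptions made for illustration.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Two hypothetical points whose squared distance is 2.
a = (0.0, 0.0)
b = (1.0, 1.0)

similarities = {gamma: rbf_kernel(a, b, gamma) for gamma in (0.1, 1.0, 10.0)}
for gamma, value in similarities.items():
    # Similarity falls as gamma grows: about 0.82, then 0.14, then almost 0.
    print(gamma, value)
```

With small gamma the two points still count as similar, so each training example influences a wide region; with large gamma only very close points are similar, which gives tighter and more complex decision boundaries.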

Tuning Parameters

Regularization
The regularization parameter (called C in Python libraries such as scikit-learn) tells the SVM optimization how much you want to avoid
misclassifying each training example.
If C is high, the optimization will choose a smaller-margin hyperplane, so the misclassification rate on the training data will be
lower.
On the other hand, if C is low, the margin will be large, even if some training examples are
misclassified. This is shown in the following two diagrams:

Gamma
The next important parameter is gamma. The gamma parameter defines how far the influence of a single training
example reaches. A high gamma considers only points close to the plausible hyperplane,
while a low gamma also considers points at a greater distance.

As you can see, decreasing gamma means that points at greater distances are considered
when finding the correct hyperplane, so more and more points are used.

Margin
The last parameter is the margin. A larger margin generally gives a better model, and so better classification (or prediction). The
margin should always be maximized.

Popular Use Cases

1. Text classification
2. Detecting spam
3. Sentiment analysis
4. Aspect-based recognition
5. Handwritten digit recognition

Pros of Kernelized SVM:
1. They perform very well on a range of datasets, including those where the number of dimensions is greater than the number of
samples.
2. They are versatile: different kernel functions can be specified, and custom kernels can also be defined for
specific datatypes.
3. They work well for both high and low dimensional data.
4. Not very sensitive to overfitting.
5. They can have high accuracy (sometimes even better than neural networks).

Cons of Kernelized SVM:

1. Efficiency (running time and memory usage) decreases as the size of the training set increases.
2. Needs careful normalization of input data and parameter tuning.
3. Does not provide a direct probability estimator.
4. Difficult to interpret why a prediction was made.
Neural Networks: Learning and Generalization

Generalisation is the ability of a neural network to perform accurately on new, unseen data after learning from training
data. A model that generalises well has learned underlying patterns, not just memorised the training set.

Key Challenges
Overfitting: The model learns the training data too well, including its noise and outliers, and performs poorly on new data.
This is analogous to memorising answers to practice questions without understanding the underlying principles.
Underfitting: The model fails to capture the underlying trend in the data and performs poorly even on the training set.

Core Concepts for Improving Generalisation

• Bias-Variance Trade-off: This is a fundamental trade-off in machine learning.
  • Bias is the error from erroneous assumptions in the learning algorithm. High bias can cause underfitting.
  • Variance is the error from sensitivity to fluctuations in the training set. High variance can cause overfitting.
  • The goal is to find the model complexity that balances the two and minimises the total error.

• The Curse of Dimensionality: As the number of features (dimensions) increases, the data becomes sparse. This sparsity
makes it increasingly difficult for a model to generalise, as the training data becomes a less representative sample of the
overall space.

Regularisation Techniques

These are methods used to prevent overfitting and enhance generalisation.

•Lasso Regularisation (L1): Adds a penalty proportional to the sum of the absolute values of the coefficients. This can shrink
some coefficients to zero, effectively performing feature selection.

•Dropout: A technique where randomly selected neurons are ignored during training. This forces the network to learn
more robust features that are useful in conjunction with many different random subsets of the other neurons.

•Data Augmentation: Increasing the diversity of the training set by applying transformations (e.g., rotating or cropping
images) to create new, plausible training examples.
Supervised Learning | Unsupervised Learning
--- | ---
Supervised learning algorithms are trained using labeled data. | Unsupervised learning algorithms are trained using unlabeled data.
Supervised learning model takes direct feedback to check if it is predicting correct output or not. | Unsupervised learning model does not take any feedback.
Supervised learning model predicts the output. | Unsupervised learning model finds the hidden patterns in data.
In supervised learning, input data is provided to the model along with the output. | In unsupervised learning, only input data is provided to the model.
The goal of supervised learning is to train the model so that it can predict the output when it is given new data. | The goal of unsupervised learning is to find the hidden patterns and useful insights from the unknown dataset.
Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model.
Supervised learning can be categorized into Classification and Regression problems. | Unsupervised learning can be classified into Clustering and Association problems.
Supervised learning can be used for those cases where we know the input as well as corresponding outputs. | Unsupervised learning can be used for those cases where we have only input data and no corresponding output data.
Supervised learning model generally produces a more accurate result. | Unsupervised learning model may give a less accurate result as compared to supervised learning.
Supervised learning is not close to true Artificial Intelligence, as in this we first train the model on labeled data, and only then can it predict the correct output. | Unsupervised learning is closer to true Artificial Intelligence, as it learns in the way a child learns daily routine things from experience.
It includes various algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Multi-class Classification, Decision tree, Bayesian Logic, etc. | It includes various algorithms such as Clustering (for example, k-means) and the Apriori algorithm.
Perceptron
A simple neural network architecture that allows only unidirectional forward connections among neurons is called a
feed-forward neural network. The simplest type of feed-forward neural network, the Perceptron, consists of only one
layer of neural units connected to a set of n input terminals. The number of outputs is the same as the number of neural
units. Each unit is a single artificial neuron that computes the weighted sum of its inputs and applies a threshold activation
function. It is a computer model devised to represent or simulate the ability of the brain to recognize and
discriminate. The most basic form of an activation function is a simple binary function that has only two possible results.
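
The behaviour of a single perceptron unit can be sketched in a few lines. This is a minimal illustration rather than a definitive implementation: the function name `perceptron_output`, the threshold at zero, and the AND-gate weights are assumptions chosen for the example and do not come from the slides.

```python
def perceptron_output(weights, bias, inputs):
    """Weighted sum of the inputs plus bias, passed through a binary threshold."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if net >= 0 else 0  # binary activation: only two possible results


# Hypothetical weights and bias that make the unit behave like a logical AND gate.
and_weights = [1.0, 1.0]
and_bias = -1.5

and_outputs = [
    perceptron_output(and_weights, and_bias, [a, b])
    for a in (0, 1)
    for b in (0, 1)
]
print(and_outputs)
```

Only the input (1, 1) pushes the weighted sum above the threshold, so the unit fires for that pattern alone.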

Multi-Layer Perceptron
Networks with more than one layer of artificial neurons, where only forward connections from the input towards the
output are allowed, are called multilayer perceptrons (MLP) or multilayer feed-forward neural networks.
• A multilayer perceptron is a feedforward neural network with one or more hidden layers.
• The network consists of an input layer of source neurons, at least one middle or hidden layer
of computational neurons, and an output layer of computational neurons.
• The input signals are propagated in a forward direction on a layer-by-layer basis.

In a multilayer perceptron, the bias b(n) is treated as a synaptic weight driven by a fixed input equal to +1:
$x(n) = [+1, x_1(n), x_2(n), \ldots, x_m(n)]^T$
Correspondingly, we define the weight vector as
$w(n) = [b(n), w_1(n), w_2(n), \ldots, w_m(n)]^T$
Accordingly, the linear combiner output is written in the compact form
$V(n) = \sum_{i=0}^{m} w_i(n)\, x_i(n) = w^T(n)\, x(n)$
where $x_0(n) = +1$ and $w_0(n) = b(n)$.

Neural Networks: Competitive Learning

Competitive learning

Competitive learning is an unsupervised learning paradigm where neurons in a network compete to respond to a given
input. This is often described as a "winner-takes-all" approach.
The ultimate objective of training a neural network is to obtain a set of weights that makes almost all the tuples in the
training data classified correctly. The steps involved are given below:
1. Initialize weights with random values
2. Feed the input tuples into the network one by one
3. For each unit
   a. Compute the net input to the unit as a linear combination of all the inputs to the unit
   b. Compute the output value using the activation function
   c. Compute the error
   d. Update the weights and the bias

Core Principle

•Neurons specialise in recognising specific types of input patterns. For a given input, the neuron whose weight vector is
most similar to the input is declared the "winner."

•Only the winning neuron (and perhaps its neighbours) is allowed to update its weights, becoming even more sensitive
to that type of input. This process leads to a natural partitioning of the feature space
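
The winner-takes-all update described above can be sketched as follows. This is an illustrative sketch under stated assumptions: similarity is measured by squared Euclidean distance, only the winner is updated (no neighbourhood), and the function names, learning rate, and sample data are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance, used here as the (assumed) similarity measure."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def competitive_step(weights, x, learning_rate=0.5):
    """Find the unit closest to x, move only its weights towards x, return its index."""
    winner = min(range(len(weights)), key=lambda j: squared_distance(weights[j], x))
    weights[winner] = [
        w + learning_rate * (xi - w) for w, xi in zip(weights[winner], x)
    ]
    return winner


# Two units and four hypothetical inputs that form two loose groups.
weights = [[0.0, 0.0], [1.0, 1.0]]
samples = [[0.1, 0.2], [0.9, 0.8], [0.0, 0.1], [1.0, 0.9]]
winners = [competitive_step(weights, x) for x in samples]
print(winners, weights)
```

Each unit ends up responding to one group of inputs, which is the partitioning of the feature space that the principle describes.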

Types of Network optimizing algorithms

(A) Growing Algorithms
(B) Pruning Algorithms

Growing Algorithms
• Train a small network and then add new units to it.
• Example: Upstart Algorithm, Tiling Algorithm and Cascade Correlation Algorithm

Pruning Algorithms
• Train a large network and then remove the unwanted weights or units from it.
• The large initial size helps the network train quickly, and the reduced final size helps improve generalization.
• Example: weight decay method, cross-validation method, significance-based method
Back-propagation neural network

• Learning in a multilayer network proceeds the same way as for a perceptron.

• A training set of input patterns is presented to the network.
• The network computes its output pattern, and if there is an error − or in other words a
difference between actual and desired output patterns − the weights are adjusted to reduce
this error.
• In a back-propagation neural network, the learning algorithm has two phases.
• First, a training input pattern is presented to the network input layer. The network
propagates the input pattern from layer to layer until the output pattern is generated
by the output layer.
• Second, if this pattern is different from the desired output, an error is calculated and
then propagated backwards through the network from the output layer to the input
layer. The weights are modified as the error is propagated.
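
The two phases can be sketched for a tiny network with two inputs, two hidden units, and one output. This is a minimal sketch, not a definitive implementation: sigmoid activations, a squared-error measure, the learning rate of 0.5, and all function and variable names are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Phase 1: propagate x layer by layer. Index 0 of each weight list is the bias."""
    hidden = [
        sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in w_hidden
    ]
    output = sigmoid(w_out[0] + sum(wi * hi for wi, hi in zip(w_out[1:], hidden)))
    return hidden, output


def train_step(x, target, w_hidden, w_out, lr=0.5):
    """Phase 2: propagate the error backwards and adjust the weights in place."""
    hidden, output = forward(x, w_hidden, w_out)
    delta_out = (target - output) * output * (1.0 - output)
    # Hidden-layer error terms use the output weights before they are updated.
    delta_hidden = [
        delta_out * w_out[j + 1] * h * (1.0 - h) for j, h in enumerate(hidden)
    ]
    w_out[0] += lr * delta_out
    for j, h in enumerate(hidden):
        w_out[j + 1] += lr * delta_out * h
    for j, w in enumerate(w_hidden):
        w[0] += lr * delta_hidden[j]
        for i, xi in enumerate(x):
            w[i + 1] += lr * delta_hidden[j] * xi
    return 0.5 * (target - output) ** 2  # error measured before this update


random.seed(0)
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_out = [random.uniform(-1, 1) for _ in range(3)]
x, target = [1.0, 0.0], 1.0  # one hypothetical training pattern
errors = [train_step(x, target, w_hidden, w_out) for _ in range(200)]
print(errors[0], errors[-1])
```

Repeating the step on the same pattern drives the error down, which is the behaviour the two-phase description predicts.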

Parameters to tune in the back-propagation algorithm
• Number of hidden nodes: This should be kept as small as the task allows, since extra hidden nodes increase the risk of
overfitting.
• Momentum coefficient: It adds a fraction of the previous weight change to the current one, which smooths the updates
and helps the search move past shallow local minima.
• Sigmoidal gain: It sets the scaling factor (steepness) of the sigmoid so that the same weight range can be
applied to a wide variety of functions.
• Local minima: The weight changes should be such that the network does not get stuck in a
local minimum; if it does, the weights should be perturbed so that the search can continue towards the global minimum.
• Learning coefficient: It should be between 0 and 1; it controls how large each weight adjustment is.

Neural Network: Principal Components Analysis

Dimension Reduction-

In pattern recognition, Dimension Reduction is defined as-

•It is a process of converting a data set having vast dimensions into a data set with fewer dimensions.
•It ensures that the converted data set conveys similar information concisely.

Example-

Consider the following example-

•Consider a data set with two dimensions, x1 and x2.
•x1 represents the measurement of several objects in cm.
•x2 represents the measurement of the same objects in inches.
In machine learning,
•Using both these dimensions conveys the same information twice.
•Also, the redundant dimension introduces noise into the system.
•So, it is better to use just one dimension.

Using dimension reduction techniques-

•We convert the dimensions of data from 2 dimensions (x1 and x2) to 1 dimension (z1).
•It makes the data relatively easier to explain.

Dimension Reduction Techniques-

Principal Component Analysis-

•Principal Component Analysis reduces the number of variables in a data set by extracting the most important ones from a
larger set.
•It transforms the variables into a new set of variables called principal components.
•These principal components are linear combinations of the original variables.
•They are orthogonal to one another; they are the eigenvectors of the covariance matrix.
•The first principal component accounts for as much of the variation in the original data as possible.
•The second principal component captures as much of the remaining variance as possible while staying orthogonal to the first.
•There can be at most two principal components for a two-dimensional data set.
WHAT IS PRINCIPAL COMPONENT ANALYSIS?

Principal Component Analysis, or PCA, is a dimensionality-reduction method that is often used to reduce the
dimensionality of large data sets, by transforming a large set of variables into a smaller one that still contains most of the
information in the large set.

PCA Algorithm-
Step-01: Get data.
Step-02: Compute the mean vector (µ).
Step-03: Subtract mean from the given data.
Step-04: Calculate the covariance matrix.
Step-05: Calculate the eigen vectors and eigen values of the covariance matrix.
Step-06: Choose components and form a feature vector.
Step-07: Derive the new data set.

Example:
Given data = { 2, 3, 4, 5, 6, 7 ; 1, 5, 3, 6, 7, 8 }.
Compute the principal component using PCA Algorithm.

Step-01:

Get data.
The given feature vectors are-
•x1 = (2, 1)
•x2 = (3, 5)
•x3 = (4, 3)
•x4 = (5, 6)
•x5 = (6, 7)
•x6 = (7, 8)

Step-02:

Calculate the mean vector (µ).

Mean vector (µ)

= ((2 + 3 + 4 + 5 + 6 + 7) / 6, (1 + 5 + 3 + 6 + 7 + 8) / 6)
= (4.5, 5)

Step-03:

Subtract mean vector (µ) from the given feature vectors.

•x1 – µ = (2 – 4.5, 1 – 5) = (-2.5, -4)
•x2 – µ = (3 – 4.5, 5 – 5) = (-1.5, 0)
•x3 – µ = (4 – 4.5, 3 – 5) = (-0.5, -2)
•x4 – µ = (5 – 4.5, 6 – 5) = (0.5, 1)
•x5 – µ = (6 – 4.5, 7 – 5) = (1.5, 2)
•x6 – µ = (7 – 4.5, 8 – 5) = (2.5, 3)

Feature vectors (xi) after subtracting mean vector (µ) are therefore (-2.5, -4), (-1.5, 0), (-0.5, -2), (0.5, 1), (1.5, 2) and (2.5, 3).

Step-04:

Calculate the covariance matrix.

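Covariance matrix is given by-

$\text{Covariance matrix} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^T$

Let $m_i = (x_i - \mu)(x_i - \mu)^T$ for each of the six mean-subtracted vectors. For example,

$m_1 = \begin{bmatrix} -2.5 \\ -4 \end{bmatrix} \begin{bmatrix} -2.5 & -4 \end{bmatrix} = \begin{bmatrix} 6.25 & 10 \\ 10 & 16 \end{bmatrix}$

Now,
Covariance matrix
= (m1 + m2 + m3 + m4 + m5 + m6) / 6

On adding the above matrices and dividing by 6, we get-

$\text{Covariance matrix} = \begin{bmatrix} 2.92 & 3.67 \\ 3.67 & 5.67 \end{bmatrix}$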

Step-05:

Calculate the eigen values and eigen vectors of the covariance matrix.
λ is an eigen value for a matrix M if it is a solution of the characteristic
equation |M – λI| = 0.
So, we have-

$\begin{vmatrix} 2.92 - \lambda & 3.67 \\ 3.67 & 5.67 - \lambda \end{vmatrix} = 0$

From here,
$(2.92 - \lambda)(5.67 - \lambda) - (3.67 \times 3.67) = 0$
$16.56 - 2.92\lambda - 5.67\lambda + \lambda^2 - 13.47 = 0$
$\lambda^2 - 8.59\lambda + 3.09 = 0$

Solving this quadratic equation, we get λ = 8.22, 0.38

Thus, two eigen values are λ1 = 8.22 and λ2 = 0.38.

Clearly, the second eigen value is very small compared to the first eigen value.
So, the second eigen vector can be left out.

Eigen vector corresponding to the greatest eigen value is the principal component for the given data set.
So, we find the eigen vector corresponding to eigen value λ1.

We use the following equation to find the eigen vector-

MX = λX
where-
•M = Covariance Matrix
•X = Eigen vector
•λ = Eigen value

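Substituting the values in the above equation, we get-

$\begin{bmatrix} 2.92 & 3.67 \\ 3.67 & 5.67 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = 8.22 \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$

Expanding this, we get-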
2.92X1 + 3.67X2 = 8.22X1
3.67X1 + 5.67X2 = 8.22X2

On simplification, we get-
5.3X1 = 3.67X2 ………(1)
3.67X1 = 2.55X2 ………(2)

From (1) and (2), X1 = 0.69X2

From (2), the eigen vector is-

$\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} 2.55 \\ 3.67 \end{bmatrix}$

Thus, principal component for the given data set is-

PC1 = 2.55 X + 3.67 Y
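
The worked example can be checked with a short script that follows the same seven steps for two-dimensional data. This is a sketch under stated assumptions: the function name `pca_2d` is hypothetical, the covariance uses the divisor n as in the example, and the closed-form eigenvalue formula only applies to a symmetric 2 x 2 matrix with a non-zero off-diagonal entry.

```python
import math


def pca_2d(points):
    """Return mean, covariance, eigenvalues, unit first direction, and projected scores."""
    n = len(points)
    mean = [sum(p[k] for p in points) / n for k in range(2)]        # Step 2
    centred = [[p[0] - mean[0], p[1] - mean[1]] for p in points]    # Step 3
    # Step 4: covariance with divisor n, matching the worked example.
    sxx = sum(c[0] * c[0] for c in centred) / n
    syy = sum(c[1] * c[1] for c in centred) / n
    sxy = sum(c[0] * c[1] for c in centred) / n
    # Step 5: roots of the characteristic equation of a symmetric 2 x 2 matrix.
    trace, det = sxx + syy, sxx * syy - sxy * sxy
    root = math.sqrt(trace * trace - 4.0 * det)
    lam1, lam2 = (trace + root) / 2.0, (trace - root) / 2.0
    # Eigenvector for lam1 from the first row: (sxx - lam1) * X1 + sxy * X2 = 0.
    # Assumes sxy != 0; otherwise the axes are already the principal directions.
    vec = [sxy, lam1 - sxx]
    norm = math.hypot(vec[0], vec[1])
    direction = [vec[0] / norm, vec[1] / norm]                      # Step 6
    scores = [c[0] * direction[0] + c[1] * direction[1] for c in centred]  # Step 7
    return mean, [[sxx, sxy], [sxy, syy]], (lam1, lam2), direction, scores


data = [(2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8)]
mean, cov, eigenvalues, direction, scores = pca_2d(data)
print(mean, cov, eigenvalues, direction)
```

Without intermediate rounding the larger eigenvalue comes out at about 8.21 rather than 8.22; the small difference arises because the example rounds the covariance entries to two decimals before solving the quadratic. The ratio X1 / X2 of the first direction is about 0.69, in agreement with the hand calculation.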

FUZZY LOGIC

Fuzzy Logic
Fuzzy logic is an extension of Boolean logic introduced by Lotfi Zadeh in 1965, based on the mathematical theory of fuzzy sets,
which is a generalization of classical set theory. One advantage of fuzzy logic for formalizing human reasoning is that the rules
are stated in natural language.
Fuzzy logic (FL) is defined as a form of knowledge representation suitable for notions that cannot be defined precisely, but
which depend upon their contexts. Fuzzy means “not clear, distinct, or precise; blurred”.

Fuzzy Logic (FL) is a method of reasoning that resembles human reasoning. The approach of FL imitates the way of decision
making in humans that involves all intermediate possibilities between digital values YES and NO.
Fuzzy inference systems rely on membership functions to explain to the computer how to calculate the correct value between 0
and 1. The degree to which any fuzzy statement is true is denoted by a value between 0 and 1.

Classical sets – either an element belongs to the set or it does not. It is a set of distinct objects. For example,
for the set of even integers, either an integer is even (member) or it is not (non-member).

Fuzzy sets – A fuzzy set has a graphical description that expresses how the transition from one to
another takes place. This graphical description is called a membership function.

Fuzzy Logic is a multivalued logic, that allows intermediate values to be defined between conventional evaluations like
true/false, yes/no, high/low, etc.
Crisp set provides-
• Precise location of the set boundaries
• Membership values of the set

Key Techniques and Tools

•Rule Generation: Algorithms like the Wang-Mendel method can generate fuzzy rules from numerical data by
partitioning the input space and determining the relationship between inputs and outputs. This allows for the
creation of interpretable "If-Then" rules.

•Linguistic Summarisation: Fuzzy logic can be used to create concise, human-readable summaries of quantitative
data (e.g., summarising sales data as "Most products in the North region had high sales in the summer months").

•Fuzzy Clustering for Model Extraction: Techniques like the Evolving Clustering Method (ECM) can be used to find
the natural groupings in data, which can then be used to define fuzzy sets and the initial structure of a fuzzy model.

Basic elements of fuzzy logic system

The architecture of a fuzzy logic controller includes four components: Fuzzifier, Rule Base, Fuzzy Inference
Engine, and Defuzzifier.

Fuzzifier: The fuzzifier is the input interface which maps a numeric input to a fuzzy set so that it can be matched with the
premises of the fuzzy rules defined in the application-specific rule base.

Rule Base: The rule base contains a set of fuzzy if-then rules which defines the actions of the controller in terms of
linguistic variables and membership functions of linguistic terms.

Fuzzy Inference Engine: The fuzzy inference engine applies the inference mechanism to the set of rules in the fuzzy rule
base to produce a fuzzy output set. This involves matching the input fuzzy set with the premises of the rules, activation of
the rules to deduce the conclusion of each rule that is fired, and combination of all activated conclusions using fuzzy set
union to generate fuzzy set output.

Defuzzifier: The defuzzifier is an output mapping which converts fuzzy set output to a crisp output. Based on the crisp
output, the fuzzy logic controller can drive the system under control.

The fuzzy rule base contains a set of linguistic rules. These linguistic rules are expressed using linguistic values and linguistic
variables. Different linguistic values can be assigned to a linguistic variable. These linguistic values are modeled as fuzzy
sets. Based on the linguistic values, their corresponding membership functions can be expressed based on application
requirements.
Membership Functions
Membership functions allow you to quantify a linguistic term and represent a fuzzy set graphically. A membership
function is the core component of Fuzzy Logic. It defines how much a value belongs to a fuzzy set. A membership
function for a fuzzy set A on the universe of discourse X is defined as µA: X → [0, 1].
Here, each element of X is mapped to a value between 0 and 1. It is called membership value or degree of
membership. It quantifies the degree of membership of the element in X to the fuzzy set A.
In symbols, a membership function maps an input value to a degree between 0 and 1:
$\mu_A(x) \in [0, 1]$
Where:
$x$ = input value
$A$ = fuzzy set (e.g., “Hot”, “Cold”)
$\mu_A(x)$ = degree of membership

In Fuzzy ID3:
•Membership functions:
• Convert crisp data → fuzzy values
• Determine how much each sample contributes to each
branch
• Used in entropy & information gain calculations

Types of Membership Functions
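1. Triangular Membership Function
Most commonly used (simple & efficient)

$\mu(x) = \begin{cases} 0 & x \le a \\ \dfrac{x - a}{b - a} & a < x \le b \\ \dfrac{c - x}{c - b} & b < x < c \\ 0 & x \ge c \end{cases}$

•Defined by 3 points: $a, b, c$

2. Trapezoidal Membership Function

$\mu(x) = \begin{cases} 0 & x \le a \\ \dfrac{x - a}{b - a} & a < x \le b \\ 1 & b < x \le c \\ \dfrac{d - x}{d - c} & c < x < d \\ 0 & x \ge d \end{cases}$

•Defined by 4 points: $a, b, c, d$

3. Gaussian Membership Function

$\mu(x) = e^{-\frac{(x - c)^2}{2\sigma^2}}$

•$c$ = center
•$\sigma$ = spread

4. Sigmoid Membership Function

$\mu(x) = \dfrac{1}{1 + e^{-a(x - c)}}$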
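
The four membership functions translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and the triangular and trapezoidal versions assume strictly increasing break points (a < b < c, and a < b < c < d).

```python
import math


def triangular(x, a, b, c):
    """Triangular membership; assumes a < b < c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership; assumes a < b < c < d."""
    if x <= a or x >= d:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def gaussian(x, c, sigma):
    """Gaussian membership with centre c and spread sigma."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def sigmoid(x, a, c):
    """Sigmoid membership with slope a and crossover point c."""
    return 1.0 / (1.0 + math.exp(-a * (x - c)))


# Hypothetical "Warm" fuzzy set on a temperature axis in degrees Celsius.
print(triangular(25, 20, 30, 40))       # halfway up the rising edge
print(trapezoidal(35, 10, 20, 30, 40))  # halfway down the falling edge
print(gaussian(30, 30, 5))              # at the centre
print(sigmoid(30, 1, 30))               # at the crossover point
```

Every function returns a value in [0, 1], so any of them can serve as the degree of membership of a crisp input in a fuzzy set.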

Core Parameters of a Membership Function

Parameter | Meaning
--- | ---
Support | Range where membership > 0
Core | Region where membership = 1
Boundary | Region where 0 < membership < 1 (the crossover points are where membership = 0.5)
Shape parameters | Define structure (a, b, c, etc.)
Height | Maximum membership value
Width/Spread | Range of influence
Slope | Rate of change

Fuzzy Logic In Control Systems
Fuzzy Logic provides an efficient and resourceful way to solve control problems. Some examples:
– Temperature Controller
– Anti-lock Braking System (ABS)

Temperature Controller:
→ The problem: change the speed of a heater fan based on the room temperature and humidity
→ A temperature control system has four settings: Cold, Cool, Warm, and Hot
→ Humidity can be defined by: Low, Medium, and High
→ Using these terms we can define the fuzzy sets

Example of a Fuzzy Logic System

Let us consider an air conditioning system with a 5-level fuzzy logic system. This system adjusts the temperature of the
air conditioner by comparing the room temperature and the target temperature value.

Algorithm
1. Define linguistic Variables and terms (start)
2. Construct membership functions for them (start)
3. Construct knowledge base of rules (start)
4. Convert crisp data into fuzzy data sets using membership
functions (Fuzzification)
5. Evaluate rules in the rule base (inference engine)
6. Combine results from each rule (inference engine)
7. Convert output data into non-fuzzy values. (De-Fuzzification)
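
The seven steps can be sketched as a very small controller. This is a deliberately simplified illustration, not the system on the slides: it uses three linguistic terms instead of five, triangular membership functions, one rule per term, and weighted-average defuzzification. All names, break points, and output levels are assumptions.

```python
def triangular(x, a, b, c):
    """Triangular membership; assumes a < b < c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Steps 1-2: hypothetical linguistic terms for the error (room minus target, in deg C).
TERMS = {
    "too_cold": (-10.0, -5.0, 0.0),
    "ok": (-5.0, 0.0, 5.0),
    "too_hot": (0.0, 5.0, 10.0),
}

# Step 3: hypothetical rule base, "IF error is <term> THEN cooling power is <level> %".
RULES = {"too_cold": 0.0, "ok": 50.0, "too_hot": 100.0}


def cooling_power(error):
    # Step 4: fuzzification of the crisp error value.
    degrees = {term: triangular(error, *abc) for term, abc in TERMS.items()}
    total = sum(degrees.values())
    if total == 0.0:
        # Outside every term's support: saturate at the nearest extreme.
        return 100.0 if error > 0 else 0.0
    # Steps 5-7: fire each rule, combine, and defuzzify by weighted average.
    return sum(degrees[term] * RULES[term] for term in RULES) / total


print(cooling_power(2.5))
print(cooling_power(0.0))
```

An error of 2.5 degrees belongs equally to "ok" and "too_hot", so the output lands midway between the two rule levels; this gradual blending is what distinguishes the fuzzy controller from a thermostat with hard thresholds.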

Fuzzy Decision Trees

FUZZY DECISION TREES
Decision trees are one of the most popular methods for learning and reasoning from instances. Fuzzy Decision
Trees (FDTs) aim to combine the ability of decision trees (to learn from examples, to present knowledge in
comprehensible form) with fuzzy representation (to deal with inexact and uncertain information).
FDTs are an extension of classical decision trees designed to handle uncertainty and
continuous data more gracefully.

Limitations of Classical Trees

Classical trees like ID3 or CART often struggle with continuous data.

•They may require discretisation of continuous features into crisp categories, which can lead to information
loss.

•They often use a dichotomous (binary) splitting approach, which can generate overly complex trees with
many rules, leading to overfitting.
Fuzzy Decision Trees Working

FDTs address these issues by using fuzzy logic at the decision nodes. Instead of a hard split (e.g., "Age > 30"), an
instance can follow multiple branches simultaneously with different membership degrees.

•Fuzzy ID3 and IFD are well-established algorithms that use fuzziness for feature selection and tree induction

Fuzzy ID3 is a generalization of the ID3 algorithm, a popular and efficient method for inducing decision trees from symbolic
data. The key motivation for Fuzzy ID3 is to handle the imprecision and uncertainty inherent in much human
knowledge and real-world data, which traditional "crisp" decision trees cannot process effectively.

By incorporating fuzzy logic, Fuzzy ID3 can process continuous-valued attributes directly without the need for sharp cut-offs
and produce more robust and interpretable models.

The general procedure for generating fuzzy decision trees using Fuzzy ID3 is outlined as follows:
Prerequisites: A fuzzy partition space, a leaf selection threshold βth, and the best node selection criterion.
Procedure:
While there exist candidate nodes
DO
Select one of them using a search strategy.
Generate its child-nodes according to an expanded attribute obtained by the given heuristic.
Check the child-nodes against the leaf selection threshold.
Child-nodes meeting the leaf threshold are terminated as leaf-nodes.
The remaining child-nodes are regarded as new candidate nodes.
end
Before training, the α-cut is usually used for the initial data to
reduce the fuzziness. The α-cut of a fuzzy set A is defined as:

Stochastic Search Methods

Stochastic search is the method of choice for solving many hard combinatorial problems.
Combinatorial Decision Problems: For a given problem instance, decide whether a solution (grouping, ordering, or
assignment) exists which satisfies the given constraints.
Stochastic search methods are a broad class of optimisation algorithms that use randomness to find optima in a
search space. They are particularly useful for complex, high-dimensional, or non-differentiable problems where
traditional gradient-based methods fail.

Key Characteristics

•Randomness: They incorporate random elements to explore the search space, which helps in escaping local
optima.

•Adaptive Search: Many modern algorithms are highly adaptive, changing their search strategy based on feedback
from the environment.

•Parallelisation: These methods are often well-suited for implementation on parallel computers, as multiple
independent searches can be run simultaneously.

Common Examples in Machine Learning

•Genetic Algorithms (GAs): Inspired by natural selection, GAs evolve a population of candidate solutions over
generations using operators like mutation and crossover.

•Simulated Annealing (SA): Inspired by annealing in metallurgy, SA probabilistically accepts worse solutions at the
beginning to explore the space and gradually reduces this probability to converge on an optimum.

•Particle Swarm Optimisation (PSO): Models a population (swarm) of candidate solutions (particles) that move around
the search space, influenced by their own best-known position and the swarm's best-known position.

Applications

Stochastic methods are used to solve a wide variety of real-world problems, from engineering design to financial
modelling, especially when the problem is too complex for deterministic algorithms

General Working Principle
1. Start with an initial solution
2. Generate new candidate solutions randomly
3. Evaluate using objective/fitness function
4. Accept or reject based on probability rules
5. Repeat until stopping condition
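
The five steps can be illustrated with simulated annealing on a one-dimensional objective. This is a minimal sketch, not a tuned implementation: it accepts a worse candidate with probability exp(-ΔE / T), and the step size, starting temperature, geometric cooling factor, iteration count, and test function are all assumptions.

```python
import math
import random


def simulated_annealing(objective, start, step=1.0, temperature=10.0,
                        cooling=0.95, iterations=500, seed=0):
    """Minimise `objective` over a single real variable; return (best_x, best_cost)."""
    rng = random.Random(seed)
    current, current_cost = start, objective(start)          # 1. initial solution
    best, best_cost = current, current_cost
    for _ in range(iterations):                               # 5. repeat
        candidate = current + rng.uniform(-step, step)        # 2. random candidate
        candidate_cost = objective(candidate)                 # 3. evaluate
        delta = candidate_cost - current_cost
        # 4. always accept improvements; accept worse moves with prob exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature = max(temperature * cooling, 1e-12)       # cool down gradually
    return best, best_cost


# Hypothetical objective with its minimum at x = 3.
best_x, best_cost = simulated_annealing(lambda x: (x - 3.0) ** 2, start=-8.0)
print(best_x, best_cost)
```

Early on, the high temperature lets the search accept uphill moves and explore; as the temperature falls the rule becomes almost greedy, so the search settles near the minimum.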

Types of Stochastic Search Methods

1. Random Search
•Simplest method
•Randomly samples solutions
•No memory, no learning

2. Hill Climbing (Stochastic Variant)
•Moves to a better neighbor randomly
•If multiple better options exist → choose randomly
•May still get stuck in local optimum

3. Simulated Annealing
•Inspired by annealing in metallurgy
•Accepts worse solutions with probability: $P = e^{-\Delta E / T}$
•$T$ = temperature (decreases over time)
•Helps escape local minima

4. Genetic Algorithm
•Based on natural evolution:
• Selection
• Crossover
• Mutation
•Works with a population of solutions

5. Particle Swarm Optimization
•Inspired by bird flocking
•Particles move based on:
• Personal best
• Global best

6. Ant Colony Optimization
•Inspired by ants finding shortest paths
•Uses pheromone trails

Advantages
•Can escape local optima
•Works for complex/non-linear problems
•No need for gradient information

Disadvantages
•No guarantee of global optimum
•May require many iterations
•Performance depends on randomness

Feature | Deterministic | Stochastic
--- | --- | ---
Output | Same every run | May vary
Exploration | Limited | Wide
Speed | Fast | Slower
Accuracy | May get stuck | Better global search

ASHISH TRIPATHI

ML Notes Regression Unit-II
No ratings yet
ML Notes Regression Unit-II
39 pages
Da Unit2
No ratings yet
Da Unit2
31 pages
Da Unit 3 Notes
No ratings yet
Da Unit 3 Notes
13 pages
Understanding Regression Analysis Techniques
No ratings yet
Understanding Regression Analysis Techniques
22 pages
UNIT 2pdf
No ratings yet
UNIT 2pdf
71 pages
Regression Analysis in Machine Learning
No ratings yet
Regression Analysis in Machine Learning
20 pages
Introduction to Regression Modeling Techniques
No ratings yet
Introduction to Regression Modeling Techniques
48 pages
Regression Analysis in Machine Learning
No ratings yet
Regression Analysis in Machine Learning
22 pages
Business Analytics: Advance: Simple & Multiple Linear Regression
No ratings yet
Business Analytics: Advance: Simple & Multiple Linear Regression
38 pages
Understanding Regression Analysis in ML
No ratings yet
Understanding Regression Analysis in ML
41 pages
Understanding Regression Analysis in ML
No ratings yet
Understanding Regression Analysis in ML
53 pages
Predictive Analytics and Regression Techniques
No ratings yet
Predictive Analytics and Regression Techniques
19 pages
Understanding Regression and Covariance
No ratings yet
Understanding Regression and Covariance
34 pages
Understanding Regression Analysis in ML
No ratings yet
Understanding Regression Analysis in ML
30 pages
Unit 2 - ML
No ratings yet
Unit 2 - ML
67 pages
Understanding Linear Regression Basics
100% (2)
Understanding Linear Regression Basics
11 pages
Regression Analysis
No ratings yet
Regression Analysis
30 pages
15 Types of Regression Explained
No ratings yet
15 Types of Regression Explained
42 pages
Understanding Linear Models and Regression
No ratings yet
Understanding Linear Models and Regression
20 pages
Understanding Regression Modeling Techniques
No ratings yet
Understanding Regression Modeling Techniques
37 pages
Understanding Regression Analysis in ML
No ratings yet
Understanding Regression Analysis in ML
12 pages
Regression Guide for Supporting Characters
100% (1)
Regression Guide for Supporting Characters
21 pages
Regression Analysis in Machine Learning
No ratings yet
Regression Analysis in Machine Learning
15 pages
Regression Analysis in Machine Learning
No ratings yet
Regression Analysis in Machine Learning
21 pages
Linear Regression Analysis Explained
No ratings yet
Linear Regression Analysis Explained
5 pages
MLT Unit-2
No ratings yet
MLT Unit-2
43 pages
Understanding Regression in Machine Learning
100% (2)
Understanding Regression in Machine Learning
20 pages
Correlation Analysis Overview and Methods
No ratings yet
Correlation Analysis Overview and Methods
52 pages
Environmental Modeling Techniques Explained
No ratings yet
Environmental Modeling Techniques Explained
28 pages
Understanding Regression Analysis Basics
100% (1)
Understanding Regression Analysis Basics
30 pages
Regression Analysis in Machine Learning
No ratings yet
Regression Analysis in Machine Learning
32 pages
Linear Regression in Machine Learning
No ratings yet
Linear Regression in Machine Learning
9 pages
Understanding Regression Analysis Basics
No ratings yet
Understanding Regression Analysis Basics
19 pages
Regression Analysis in Machine Learning
No ratings yet
Regression Analysis in Machine Learning
12 pages
Regression Analysis in Data Analytics
No ratings yet
Regression Analysis in Data Analytics
76 pages
Understanding Regression Analysis Basics
No ratings yet
Understanding Regression Analysis Basics
9 pages
Module 2-ML
No ratings yet
Module 2-ML
36 pages
Linear Regression: Simple vs. Multiple
No ratings yet
Linear Regression: Simple vs. Multiple
6 pages
Regression Analysis Techniques Explained
No ratings yet
Regression Analysis Techniques Explained
21 pages
Correlation and Regression Analysis Guide
No ratings yet
Correlation and Regression Analysis Guide
48 pages
Understanding Regression Analysis Basics
No ratings yet
Understanding Regression Analysis Basics
27 pages
Understanding Multiple Regression Analysis
No ratings yet
Understanding Multiple Regression Analysis
33 pages
Understanding Regression Analysis Basics
No ratings yet
Understanding Regression Analysis Basics
20 pages
Regression Analysis in Machine Learning
No ratings yet
Regression Analysis in Machine Learning
57 pages
Understanding Regression Analysis Basics
100% (1)
Understanding Regression Analysis Basics
52 pages
Regression Analysis: Concepts & Techniques
No ratings yet
Regression Analysis: Concepts & Techniques
54 pages
Understanding Linear Regression in ML
No ratings yet
Understanding Linear Regression in ML
66 pages
Understanding Regression Analysis in ML
No ratings yet
Understanding Regression Analysis in ML
48 pages
Classification vs. Regression Algorithms
No ratings yet
Classification vs. Regression Algorithms
19 pages
Understanding Linear Regression Basics
No ratings yet
Understanding Linear Regression Basics
14 pages
Understanding Regression Variables
No ratings yet
Understanding Regression Variables
3 pages
Understanding Regression Analysis Basics
No ratings yet
Understanding Regression Analysis Basics
3 pages
Introduction to Regression Modeling Techniques
No ratings yet
Introduction to Regression Modeling Techniques
11 pages
Data Analytics Unit-2
No ratings yet
Data Analytics Unit-2
47 pages
Supervised Learning: Regression Techniques
No ratings yet
Supervised Learning: Regression Techniques
67 pages
Unit-2

Linear Regression model

For example, if β1 = 2, then Y is expected to increase by 2 for each one-unit increase in X.
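
The slope interpretation can be checked numerically. The following is a minimal sketch, assuming hypothetical noise-free data generated from Y = 1 + 2X; the data values and the names `beta0`, `beta1`, and `predict` are illustrative and not taken from the notes.

```python
import numpy as np

# Hypothetical data generated from Y = 1 + 2*X with no noise,
# so the fitted slope (beta1) should come out as 2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

# Least-squares estimates for simple linear regression.
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()


def predict(x_new):
    """Predicted Y for a given X under the fitted line."""
    return beta0 + beta1 * x_new


# Change in the prediction when X goes up by one unit.
increase = predict(6.0) - predict(5.0)
print(beta0, beta1, increase)
```

Because the fitted model is a straight line, the one-unit increase in the prediction equals the slope regardless of the starting value of X.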

Solution: The scatter diagram for the above mentioned data is

n = 4 (number of data points)

Predict Salary for 5 Years Experience

Verification

“Bayesian statistics is a mathematical procedure that applies

probabilities to statistical problems. It provides people the tools

to update their beliefs in the evidence of new data.”

Apply Bayes' Theorem

Calculate P(+) using the Law of Total Probability

Substitute into Bayes' Theorem
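
The two steps (total probability of a positive result, then Bayes' Theorem) can be sketched as follows. All numbers here are hypothetical and chosen only for illustration: a prior of 1%, a true-positive rate of 95%, and a false-positive rate of 5%. The variable names are likewise assumptions.

```python
# Hypothetical inputs (not taken from the notes).
p_d = 0.01               # prior: P(D), probability the condition is present
p_pos_given_d = 0.95     # P(+ | D), true-positive rate
p_pos_given_not_d = 0.05 # P(+ | not D), false-positive rate

# Step 1: Law of Total Probability gives the overall chance of a positive result.
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1.0 - p_d)

# Step 2: Bayes' Theorem gives the posterior P(D | +).
p_d_given_pos = p_pos_given_d * p_d / p_pos

print(round(p_pos, 4), round(p_d_given_pos, 4))
```

With these assumed inputs the posterior stays well below the true-positive rate, because the low prior makes false positives far more common than true positives.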

It is also called a Bayes network, belief network, decision network, or Bayesian

It consists of two parts:

Time series is a sequence of observations of categorical or numeric variables indexed by a date, or

Timestamp Stock - Price

Types of Time Series

The 1st value and the 24th value have a high

Usually rules are expressions of the form

if (attribute − 1, value − 1) and (attribute − 2,

• It is a data mining process of deducing if-then rules from a dataset.

• The IF part of the rule is called rule antecedent or precondition.

We can also write rule R1 as follows −

R1: (age = youth) ^ (student = yes) → (buys computer = yes)
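
Rule R1 can be read as a small function: if both conditions of the antecedent hold, the rule fires and predicts buys computer = yes. The sketch below assumes records are plain dictionaries; the function name `rule_r1` and the sample records are hypothetical.

```python
def rule_r1(record):
    """Return 'yes' when the antecedent of R1 holds, otherwise None (rule does not fire)."""
    if record.get("age") == "youth" and record.get("student") == "yes":
        return "yes"
    return None


# Hypothetical records used only to exercise the rule.
records = [
    {"age": "youth", "student": "yes"},
    {"age": "youth", "student": "no"},
    {"age": "senior", "student": "yes"},
]

predictions = [rule_r1(r) for r in records]
print(predictions)
```

Returning `None` when the antecedent fails reflects that a single rule says nothing about records it does not cover; a full rule-based classifier would fall through to other rules or a default class.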

Rule Induction Using Sequential Covering Algorithm

Support Vector Machine or SVM are supervised machine learning models

As we go from left to right, all the examples will be classified as

we have an infinite number of possibilities to draw the

Finding the Optimal Hyperplane

Each of the calculations (calculate distance and optimal

• K is the kernel function,

Popular Use Cases

Cons of Kernelized SVM:

Core Concepts for Improving Generalisation

• Bias-Variance Trade-off: This is a fundamental trade-off in machine learning.

These are methods used to prevent overfitting and enhance generalisation.

Types of Network optimizing algorithms

• Learning in a multilayer network proceeds the same way as for a perceptron.

In pattern recognition, Dimension Reduction is defined as-

Consider the following example-

Using dimension reduction techniques-

Principal Component Analysis-

Calculate the mean vector (µ).

Mean vector (µ)

Subtract mean vector (µ) from the given feature vectors.

Feature vectors (xi) after subtracting mean vector (µ) are-

Calculate the covariance matrix.

On adding the above matrices and dividing by 6, we get-

Solving this quadratic equation, we get λ = 8.22, 0.38

We use the following equation to find the eigen vector-

Substituting the values in the above equation, we get-

From (1) and (2), X1 = 0.69X2

Thus, principal component for the given data set is-

PC1 = 2.55 X + 3.67 Y
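
The PCA steps above (mean vector, mean subtraction, covariance matrix divided by the number of points, eigenvalues, eigenvector) can be sketched with NumPy. The six two-dimensional feature vectors below are an assumption: they are illustrative values chosen so that the results land close to the figures quoted above, and they are not reproduced from the notes.

```python
import numpy as np

# ASSUMPTION: six hypothetical 2-D feature vectors (not taken from the notes).
X = np.array([[2, 1], [3, 5], [4, 3], [5, 6], [6, 7], [7, 8]], dtype=float)

# Step 1: mean vector.
mu = X.mean(axis=0)

# Step 2: subtract the mean vector from every feature vector.
centered = X - mu

# Step 3: covariance matrix, dividing by the number of points (6), not n - 1.
cov = centered.T @ centered / len(X)

# Step 4: eigenvalues and eigenvectors of the symmetric covariance matrix.
# np.linalg.eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 5: the principal component is the eigenvector of the largest eigenvalue.
v = eigvecs[:, -1]
ratio = v[0] / v[1]  # X1 / X2 for that eigenvector; independent of sign and scale

print(mu, eigvals, ratio)
```

The eigenvalues computed this way differ slightly in the second decimal place from hand calculations that round the covariance entries first. Any scalar multiple of `v` is an equally valid eigenvector, which is why only the ratio X1 / X2 is compared.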

3. Gaussian Membership Function

4. Sigmoid Membership Function
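
As a rough illustration of the Gaussian and sigmoid membership functions, the sketch below uses their commonly cited forms. The parameter names (`c` for the centre or crossover point, `sigma` for the width, `a` for the slope) are assumptions and may differ from the notation used in the notes.

```python
import math


def gaussian_mf(x, c, sigma):
    """Gaussian membership: peaks at 1 when x == c and decays symmetrically."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def sigmoid_mf(x, a, c):
    """Sigmoid membership: equals 0.5 at x == c; a controls the steepness."""
    return 1.0 / (1.0 + math.exp(-a * (x - c)))


# Membership grades at a few hypothetical points.
g_centre = gaussian_mf(5.0, c=5.0, sigma=2.0)
g_off = gaussian_mf(7.0, c=5.0, sigma=2.0)
s_cross = sigmoid_mf(5.0, a=1.0, c=5.0)
s_high = sigmoid_mf(10.0, a=1.0, c=5.0)

print(g_centre, g_off, s_cross, s_high)
```

Both functions return grades between 0 and 1; the Gaussian suits concepts such as "around 5", while the sigmoid suits open-ended concepts such as "large".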

Limitations of Classical Trees

By incorporating fuzzy logic, Fuzzy ID3 can process continuous-

3. Simulated Annealing

5. Particle Swarm Optimization

6. Ant Colony Optimization

Feature Deterministic Stochastic
