8 Machine Learning Models Overview

The document provides an overview of machine learning models, explaining their types, applications, and implementation using the Scikit-Learn library in Python. It covers key concepts such as supervised and unsupervised learning, regression and classification models, and evaluation metrics. The article aims to simplify complex machine learning concepts for data science beginners without a strong background in math or statistics.

Uploaded by

nyesiga65

8 Machine Learning Models Explained in 20 Minutes
Find out everything you need to know about the types of machine learning
models, including what they're used for and examples of how to implement
them.
Sep 16, 2022 · 25 min read
CONTENTS
• Why use Machine Learning Models?
• Machine Learning Regression Models
• Machine Learning Classification Models
• Machine Learning Tree-Based Models
• Machine Learning Clustering
• Machine Learning Models Explained - Next Steps

Machine learning models are algorithms that can identify patterns or make
predictions on unseen datasets. Unlike rule-based programs, these models do
not have to be explicitly coded and can evolve over time as new data enters
the system.

This article will introduce you to the different types of problems that can be
solved using machine learning. Then, you will learn about the eight most
popular machine learning algorithms used by data scientists to solve business
problems.

By the end of this article, you will be familiar with the theory and mathematical
intuition behind these models, along with how to implement them using
the Scikit-Learn library in Python.

We will explain complex machine learning concepts in plain English, and this
article is recommended for data science aspirants with no strong background
in math or statistics.

Why use Machine Learning Models?

Today, many large organizations use some form of predictive modeling to
maximize revenue and drive business growth.

Machine learning has a variety of use-cases in different domains.

Subscription-based platforms like Netflix and Spotify, for instance, use
machine learning to recommend content based on user activity on the
application.

Recommendation systems add direct business value to these companies, since a
better user experience makes customers more likely to continue subscribing to
the platform. This is an example of an unsupervised machine learning model.

Similarly, a mobile service provider might use machine learning to analyze
user sentiment and curate its product offering according to market demand.
This is an example of a supervised machine learning model.

All machine learning models can be classified as supervised or unsupervised.

The biggest difference between the two is that a supervised algorithm
requires labeled input and output training data, while an unsupervised model
can process raw, unlabeled datasets.

Supervised machine learning models can then be further classified into
regression and classification algorithms, which will be explained in more
detail in this article.

Machine Learning Regression Models

Regression algorithms are used to predict a continuous outcome (y) using
independent variables (x).

For example, look at the table below:

Image by author
In this case, we would like to predict the rent of a house based on its size,
the number of bedrooms, and whether it is fully furnished. The dependent
variable, “Rent”, is numeric, which makes this a regression problem.

A problem with several input variables like the one above is called a
multiple regression problem (often loosely referred to as multivariate
regression).

Regression Metrics

A common misconception among data science beginners is that a regression
model can be evaluated using a metric like accuracy. Accuracy is a metric
used to assess the performance of classification models, as will be explained
later in this article.

Regression models, on the other hand, are evaluated using metrics such as
MAE (Mean Absolute Error), MSE (Mean Squared Error), and RMSE (Root
Mean Squared Error).

Let’s add a predicted value to the house price problem above and evaluate
these predictions using a few regression metrics:

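1. Mean Absolute Error:

The mean absolute error sums the absolute differences between all true and
predicted values, and divides this by the total number of observations. Here
is the formula to calculate MAE:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

Here, n is the number of observations, y_i is the true value, and \hat{y}_i
is the predicted value.

Applying this formula to the values above, the mean absolute error between
the actual and predicted house price is approximately $155.

2. Mean Squared Error:

The formula to calculate a model’s mean squared error is similar to that of its
mean absolute error:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

Note that while the mean absolute error calculates the average absolute
distance between the actual and predicted values, the mean squared error
finds the average squared distance between actual and predicted values.

For the actual and predicted values above, the MSE is 54,520.25.

3. Root Mean Squared Error: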

The RMSE of an estimator is calculated by finding the square root of its mean
squared error. One advantage of calculating a dataset’s RMSE over its MSE is
that the error is returned in the same unit as the variable we are predicting.

In this case, for instance, the RMSE is √54,520.25 ≈ 233.5. This value is
interpretable since it is in terms of house price, while the Mean Squared Error
was not.
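
As a rough illustration, the three metrics can be computed by hand in a few
lines of Python. The prices below are made-up values, not the ones from the
table above, and the function names are our own.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """Square root of the MSE, expressed in the same unit as the target."""
    return math.sqrt(mean_squared_error(y_true, y_pred))

# Hypothetical house prices (not the article's data)
actual = [1000, 1500, 2000, 2500]
predicted = [1100, 1400, 2300, 2500]

mae = mean_absolute_error(actual, predicted)
mse = mean_squared_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MAE={mae:.1f}, MSE={mse:.1f}, RMSE={rmse:.1f}")
```

Because the errors are squared before averaging, the single large miss (300)
dominates the MSE and RMSE far more than it does the MAE.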

Now that you understand the concept of regression, let’s look into the
different types of regression models:

Simple Linear Regression

Linear regression is a linear approach to modeling the relationship between a
dependent and one or more independent variables. This algorithm involves
finding a line that best fits the data at hand.

Here is a visual representation of how a simple linear regression model
works:

Image by author

The chart above showcases the relationship between house price and size.
The linear regression model will create a line that best models this
relationship. All house price predictions relative to different values of size will
lie on the best fit line.

Observe that there are three lines drawn on the diagram above. Which of
these lines is the “line of best fit?”
Line of Best Fit

Just by looking at the diagram above, we can see that the orange line is the
closest to all the data points showcased. Hence, we can intuitively say that it
represents the “line of best fit.”

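Here is a more formal explanation as to how the line of best fit is found in
linear regression:

The equation of a straight line is y=mx+c. Here, m represents the slope of
the line and c represents its y-intercept. There are infinitely many ways to
draw this line, as there are infinitely many possible values for m and c.

The line of best fit, also known as the least squares regression line, is found
by minimizing the sum of squared distances between the true and predicted
values:

$$\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^{n}\left(y_i - (m x_i + c)\right)^2$$

You can read the Essentials of Linear Regression in Python tutorial to
gain a deeper understanding of the linear regression machine learning model
and its implementation.

Ridge Regression

Ridge regression is an extension of the linear regression model explained
above. It is a technique used to keep a regression model’s coefficients as low
as possible.

One problem with a simple linear regression model is that its coefficients can
become large, which makes the model more sensitive to inputs. This can
lead to overfitting.

Let’s take a simple example to understand the concept of overfitting: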

Image by author

In the figure above, the line of best fit models the relationship between X
and y perfectly, and the sum of squared distances between the true and
predicted values is 0. Recall that the equation for this line is y=mx+c.

While this line is a perfect fit on the training dataset, it likely would not
generalize well to test data. This phenomenon is called overfitting, and you
can read this article on overfitting to learn more about it.

In simple words, a model that is highly complex will pick up on unnecessary
nuances of the training dataset that aren’t reflected in the real world. This
model will perform extremely well on training data but will underperform on
datasets outside what it was trained on.

A linear regression model with large coefficients is prone to overfitting.
Ridge regression is a regularization technique that forces the algorithm to
choose smaller coefficients by adding a penalty term to its loss function.

As shown in the previous section, here is the error that we want to minimize
in simple linear regression:

$$\sum_{i=1}^{n}\left(y_i - (m x_i + c)\right)^2$$

In ridge regression, this equation will change slightly, and a penalty term will
be added to the above error:

$$\sum_{i=1}^{n}\left(y_i - (m x_i + c)\right)^2 + \lambda m^2$$

Notice that there is a value λ (lambda) multiplied by the square of the
model’s coefficient m. Since this model only has one variable, there is a
single coefficient with a penalty term added to it. If there are multiple
independent variables, lambda will be multiplied by the sum of squared
coefficients.

This penalty term punishes the model for choosing larger coefficients. The
aim here is to shrink the coefficient values so that variables with a minor
contribution to the outcome will have their coefficients close to 0. This
reduces model variance and helps mitigate overfitting.

What is the optimal lambda value for ridge regression?

Observe that a lambda value of 0 will have no effect whatsoever, and the
penalty term is eliminated. A higher value of lambda will add a larger
shrinkage penalty, and the model coefficients will get closer to zero.

When choosing a lambda value, make sure to strike a balance between
simplicity and a good training data fit. A higher lambda value results in a
simple, generalized model, but choosing a value that is too high comes with
the risk of underfitting. On the other hand, choosing a value of lambda that is
very close to zero can lead to a highly complex model.

Lasso Regression

Lasso regression is another extension of linear regression that shrinks model
coefficients by adding a penalty term to its cost function.

Here is the error that needs to be minimized in lasso regression:

$$\sum_{i=1}^{n}\left(y_i - (m x_i + c)\right)^2 + \lambda \left|m\right|$$

Notice that this equation is like that of a ridge regression model, except that
instead of multiplying lambda by the square of the coefficient, we are
multiplying it by the coefficient’s absolute value.

The biggest difference between ridge and lasso regression is that in ridge
regression, while model coefficients can shrink towards zero, they never
actually become zero. In lasso regression, it is possible for model coefficients
to become zero.

If an independent variable’s coefficient reaches zero, the feature can be
eliminated from the model. This reduces the feature space and makes the
algorithm easier to interpret, which is the biggest advantage of lasso
regression.

Due to this, lasso regression can also be used as a feature selection
technique, since variables with low importance can have coefficients that
reach zero and will be removed entirely from the model.
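
To make the contrast concrete, here is a minimal sketch for the simplest
possible case: one feature and no intercept (which assumes the data is already
centered). Under that assumption, minimizing the sum of squared errors plus
lambda times the squared coefficient (ridge) or lambda times its absolute
value (lasso) has a closed-form answer. The function names and data are
hypothetical, chosen only for illustration.

```python
import math

def _sums(x, y):
    """Return sum(x*y) and sum(x*x), the two quantities every slope needs."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy, sxx

def ols_slope(x, y):
    """Slope minimizing sum((y - m*x)^2)."""
    sxy, sxx = _sums(x, y)
    return sxy / sxx

def ridge_slope(x, y, lam):
    """Slope minimizing sum((y - m*x)^2) + lam * m^2."""
    sxy, sxx = _sums(x, y)
    return sxy / (sxx + lam)

def lasso_slope(x, y, lam):
    """Slope minimizing sum((y - m*x)^2) + lam * |m| (soft-thresholding)."""
    sxy, sxx = _sums(x, y)
    shrunk = max(abs(sxy) - lam / 2.0, 0.0)
    return math.copysign(shrunk, sxy) / sxx

# Hypothetical centered data whose unpenalized slope is 0.5
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-1.0, -0.5, 0.0, 0.5, 1.0]

for lam in (0.0, 4.0, 10.0, 100.0):
    print(lam, round(ridge_slope(x, y, lam), 4), round(lasso_slope(x, y, lam), 4))
```

As lambda grows, the ridge slope keeps shrinking but stays positive, while the
lasso slope hits exactly zero once lambda is large enough.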

How to Build a Regression Machine Learning Model in Python

You can build linear, ridge, and lasso regression models using the
Scikit-Learn library:

1. Linear Regression

from sklearn.linear_model import LinearRegression

lr_model = LinearRegression()

To fit the model on your training dataset, run:

lr_model.fit(X_train,y_train)

2. Ridge Regression

from sklearn.linear_model import Ridge

model = Ridge(alpha=1.0)

The lambda term can be configured via the “alpha” parameter when defining
the model.

3. Lasso Regression

from sklearn.linear_model import Lasso

model = Lasso(alpha=1.0)
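
Putting the three snippets together, a small end-to-end sketch might look like
the following. The training data is hypothetical (it follows y = 2x + 1
exactly), and the variable names are our own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Hypothetical data that follows y = 2x + 1 exactly
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=1.0),
}

slopes = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # coef_ holds the slope (m) and intercept_ holds the intercept (c)
    slopes[name] = float(model.coef_[0])
    print(name, round(slopes[name], 3), round(float(model.intercept_), 3))

# Predict the outcome for a new, unseen value of x
prediction = models["linear"].predict(np.array([[6.0]]))
```

The plain linear model recovers the slope of 2, while the two penalized models
settle on smaller slopes because of the alpha penalty.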

If you’d like to learn more about linear models and how to build them in
Python, take our Introduction to Linear Modeling in Python course.

Machine Learning Classification Models

We use classification algorithms to predict a discrete outcome (y) using
independent variables (x). The dependent variable, in this case, is always a
class or category.

For example, predicting whether a patient is likely to develop heart disease
based on their risk factors is a classification problem:

Image by author

The table above showcases a classification problem with four independent
variables and one dependent variable, heart disease. Since there are only
two possible outcomes (Yes and No), this is called a binary classification
problem.

A multiclass classification problem is one with three or more possible
outcomes, such as weather forecasting or distinguishing between different
animal species.

Classification Metrics

There are many ways to evaluate a classification model. While accuracy is
the most commonly used metric, it is not always the most reliable.

Let’s look at some common methods used to evaluate a classification
algorithm based on the dataset below:

Image by author

In the formulas below, TP, TN, FP, and FN stand for the number of true
positives, true negatives, false positives, and false negatives.

1. Accuracy: Accuracy can be defined as the fraction of correct predictions
made by the machine learning model.

The formula to calculate accuracy is:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

In this case, the accuracy is 4/6, or 0.67.

2. Precision: Precision is a metric used to calculate the quality of positive
predictions made by the model. It is defined as:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

The above model has a precision of 2/4, or 0.5.

3. Recall: Recall measures how many of the actual positive cases the model
correctly identifies. It is defined as:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

The above model has a recall of 2/2, or 1.

Let’s look at a simple example to understand the difference between
precision and recall:

There is a rare, fatal disease that affects a fraction of the population. 95% of
the patients in a hospital’s database do not have the disease, while only 5%
do. If we build a machine learning algorithm that predicts that nobody has
the disease, then the training accuracy of this model will be 95%. Despite
the high accuracy, we know this is not a good model since it fails to identify
patients with the disease.

This is where metrics like precision and recall come in. Precision tells us
what fraction of the patients flagged by the model actually have the disease.
Recall, or sensitivity, tells us how well the model identifies people with the
disease. The model above has a recall of 0, since it misses every patient who
has the disease.

A “good” precision and recall value is subjective and depends on your use
case.

In this disease prediction scenario, we always want to identify people with
the disease, even if this comes with the risk of a false positive. Here, we will
build the model to have higher recall than precision.

On the other hand, if we were to build a model that prevents malicious
actors from entering an e-commerce website, we might want higher
precision since blocking legitimate users will lead to a decline in sales.

We often use a metric called the F1-Score, which is the harmonic mean of a
classifier’s precision and recall. Simply put, the F1-Score combines precision
and recall into a single metric that is high only when both are high:

$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
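
The metrics above can be sketched in plain Python as follows. The labels are
hypothetical: they are chosen so that the results match the accuracy (0.67),
precision (0.5), and recall (1) quoted earlier, and are not taken from the
article's table. The F1-Score is computed as the harmonic mean of precision
and recall.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, true negatives, false positives, false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

# Hypothetical labels: 1 = positive class, 0 = negative class
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_score = 2 * precision * recall / (precision + recall)

print(round(accuracy, 2), precision, recall, round(f1_score, 2))
```

Note that the F1-Score here sits below the simple average of precision and
recall (0.75), because the harmonic mean is pulled toward the lower value.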

AUC, or Area Under the Curve (usually the ROC curve), is another popular
metric used to measure the performance of a classification model. An
algorithm’s AUC tells us about its ability to distinguish between positive and
negative classes.

To learn more about measures like AUC and how they are calculated, take
the Supervised Learning in R course by Datacamp.

Now, let’s look at the different types of classification models and how they
work:

Logistic Regression

Logistic regression is a simple classification model that predicts the
probability of an event taking place.

Here is an example of how the logistic regression model works:

Image by author

The chart above displays a logistic function that maps email data into two
categories, “Spam” and “Not Spam,” based on the frequency of suspicious
keywords in its text.

Observe that, unlike the linear regression algorithm, logistic regression is
modeled with an S-shaped curve. This is known as the logistic function and
has the following formula:

$$f(x) = \frac{1}{1 + e^{-x}}$$

While the linear function does not have an upper and lower bound, the
logistic function ranges between 0 and 1. The model predicts a probability
that ranges from 0 to 1, which determines the class that the data point
belongs to.

In this spam email example, if the text contains few or no suspicious
keywords, then the probability of it being spam will be low and close to 0. On
the other hand, an email with many suspicious keywords will have a high
probability of being spam, close to 1.

This probability is then turned into a classification outcome:

Image by author

All the points colored in red have a probability >= 0.5 of being spam. Hence,
they are classified as spam and the logistic regression model will return a
classification outcome of 1. The points colored in green have a probability <
0.5 of being spam, so they are classified by the model as “Not Spam” and
will return a classification outcome of 0.

For binary classification problems like the above, the default threshold of a
logistic regression model is 0.5, which means that data points with a higher
probability than 0.5 will automatically be assigned a label of 1. This threshold
value can be manually changed depending on your use case to achieve
better results.
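
Here is a minimal sketch of how a probability is produced and then turned into
a class label. The weight and bias are made-up numbers rather than fitted
coefficients, and the function names are hypothetical.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def spam_probability(keyword_count, weight=0.8, bias=-4.0):
    """Probability of spam for an email; weight and bias are illustrative only."""
    return logistic(weight * keyword_count + bias)

def classify(probability, threshold=0.5):
    """Return 1 (spam) if the probability reaches the threshold, else 0."""
    return 1 if probability >= threshold else 0

for count in (0, 3, 10):
    p = spam_probability(count)
    print(count, round(p, 3), classify(p))
```

Lowering the threshold, for example classify(p, threshold=0.25), labels more
emails as spam, which trades precision for recall.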

Now, recall that in linear regression, we found the line of best fit by
minimizing the sum of squared error between the predicted and true values.
In logistic regression, however, the coefficients are estimated using a
technique called maximum likelihood estimation instead of least squares.

Read the Python logistic regression tutorial to learn more about the concept
of maximum likelihood estimation and how logistic regression works.

K-Nearest Neighbors

KNN is a classification algorithm that classifies a data point based on the
group that the data points nearest to it belong to.

Here is a simple example to demonstrate how the K-Nearest Neighbors
model works:

Image by author

In the diagram above, there are two classes of data points - A and B. The
black triangle represents a new data point that needs to be classified into
one of these two classes.

The K-Nearest Neighbors algorithm works like this:

• Step 1: The model first stores all the training data.
• Step 2: Then, it calculates the distance from the new data point to
all points in the dataset.
• Step 3: The model sorts these data points based on their distance
to the new data point.
• Step 4: The new data point is assigned to the class of its nearest
neighbors depending on the value of “k.”

In the visual above, the value of k is 1. This means that we look at only one
closest neighbor to the black triangle and assign the data point to that class.
The new data point is closest to the blue point, so we assign it to class B.

Now, let’s amend the value of k. Let’s try two possible values of k, 3 and 7:

Image by author
Now, notice that when we choose k=3, the new data point is between two
categories. This means that we pick the majority class. Two nearest neighbors
are blue, and one nearest neighbor is green, so the data point will again be
assigned to the class with blue points, class B.

When k=7, however, things change. Now, two nearest neighbors are blue,
and five are green. In this case, the data point will be assigned to the
green class, class A.

Choosing different values of k will impact what class the new point is
assigned to.

Selecting a value that is too small can be noisy and subject to outliers while
selecting a large value might make you overlook categories with fewer data
points.
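
The steps described above can be sketched from scratch in a few lines. The
points below are hypothetical and arranged so that k=1 and k=3 give class B
while k=7 gives class A, as in the example above. The function name is our own.

```python
import math
from collections import Counter

def knn_predict(training_points, new_point, k):
    """Classify new_point by majority vote among its k nearest neighbors."""
    # Sort the stored training points by distance to the new point
    by_distance = sorted(
        training_points, key=lambda item: math.dist(item[0], new_point)
    )
    # Keep the labels of the k closest points and pick the most common one
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training data: ((x, y), class label)
training_points = [
    ((0.5, 0.0), "B"), ((0.0, 1.0), "B"),
    ((1.5, 0.0), "A"), ((0.0, 2.0), "A"), ((2.0, 1.0), "A"),
    ((-2.0, -2.0), "A"), ((3.0, 0.0), "A"), ((4.0, 4.0), "A"),
]
new_point = (0.0, 0.0)

for k in (1, 3, 7):
    print(k, knn_predict(training_points, new_point, k))
```

Using an odd value of k with two classes avoids tied votes in this sketch.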

If you’d like to learn more about the K-Nearest Neighbors algorithm and how
to select an optimal “k” value, read this KNN tutorial.

Build a Classification Model in Python

Here are some code snippets you can use to build a classification model in
Python using the Scikit-Learn library:

1. Logistic Regression

from sklearn.linear_model import LogisticRegression

log_reg = LogisticRegression()

2. K-Nearest Neighbors

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier()
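
As a rough end-to-end sketch, both classifiers can be fitted and used for
prediction as shown below. The single feature (a count of suspicious keywords
per email) and its values are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical feature: number of suspicious keywords per email
X_train = np.array([[0], [1], [2], [3], [7], [8], [9], [10]])
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 1 = spam, 0 = not spam

log_reg = LogisticRegression().fit(X_train, y_train)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

X_new = np.array([[1], [9]])
log_reg_pred = log_reg.predict(X_new)             # class labels
knn_pred = knn.predict(X_new)                     # class labels
spam_proba = log_reg.predict_proba(X_new)[:, 1]   # probability of class 1

print(log_reg_pred, knn_pred, spam_proba.round(3))
```

The n_neighbors argument is the “k” of K-Nearest Neighbors, and predict_proba
exposes the probabilities that logistic regression thresholds internally.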

Machine Learning Tree-Based Models

Tree-based models are supervised machine learning algorithms that
construct a tree-like structure to make predictions. They can be used for
both classification and regression problems.

In this section, we will explore two of the most commonly used tree-based
machine learning models: decision trees and random forests.

Decision Trees

A decision tree is the simplest tree-based machine learning algorithm. This
model repeatedly splits the dataset based on the values of specific features
until a final decision is made.

Here is a simple example demonstrating how the decision tree algorithm
works:

Image by author

Decision trees split on different nodes until an outcome is obtained.

In this case, if a student does not study every week, they will fail. If they
study every week but do not complete their homework, the result is still
“Fail.” They will only pass if they were to study every week and finish all
their homework.

Notice that the decision tree above splits first on the variable “Studies Every
Week?” It then stops splitting if the answer is “No,” saying that the student
will fail.

The decision tree will choose a variable to split on first based on a metric
called entropy. It will stop splitting when a “pure split” is obtained, i.e., when
all the data points belong to a single class.

There are many ways to build a decision tree. The tree needs to find a
feature to split on first, second, third, etc. This structure is created based on
a metric called information gain. The best possible decision tree is one with
the highest information gain.
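
To illustrate, entropy and information gain can be computed by hand as in the
sketch below. The student records are hypothetical (they are not the data
behind the diagram), and the function names are our own.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def information_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of its children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

def split_labels(rows, feature_index):
    """Split the result labels by whether a boolean feature is True or False."""
    yes = [row[2] for row in rows if row[feature_index]]
    no = [row[2] for row in rows if not row[feature_index]]
    return [yes, no]

# Hypothetical students: (studies_every_week, finishes_homework, result)
students = [
    (True, True, "Pass"), (True, True, "Pass"),
    (True, False, "Fail"), (False, True, "Fail"),
    (False, False, "Fail"), (False, True, "Fail"),
]

all_labels = [row[2] for row in students]
gain_studies = information_gain(all_labels, split_labels(students, 0))
gain_homework = information_gain(all_labels, split_labels(students, 1))
print(round(gain_studies, 3), round(gain_homework, 3))
```

With this data, splitting on “studies every week” yields the larger
information gain, so a tree built this way would split on it first.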

To learn more about how decision trees work, along with metrics like entropy
and information gain, this Python decision tree classification article has
more details.

One of the biggest advantages of decision trees is that they are highly
interpretable. It is easy to work backward and understand how a decision
tree has obtained its final outcome based on the training dataset.

However, decision trees are also highly prone to overfitting if left to grow
completely. This is because they are designed to split perfectly on all
samples of the training dataset, which makes them unable to generalize well
to external data.

This drawback of decision trees can be solved by using the random forest
algorithm.

Random Forests

The random forest model is a tree-based algorithm that helps us mitigate
some of the problems that arise when using decision trees, one of which is
overfitting. Random forests are created by combining the predictions made
by multiple decision tree models and returning a single output.

It does this in two steps:

• Step 1: First, the rows of the dataset are randomly sampled with
replacement, and the variables are randomly subsampled. Multiple
decision trees are then created, each trained on a different data
sample.
• Step 2: Next, the predictions made by all these decision trees are
combined to come up with a single output. For instance, if 3
separate decision trees were trained and 2 of them predicted “Yes”
while 1 predicted “No,” then the final outcome of the random forest
algorithm would be “Yes.”

In the case of a regression problem, the outcome will be the average
prediction of all decision trees.

Here is a simple visual to showcase how the random forest algorithm works:
Image by author

In the diagram above, the first and third decision trees predict “Yes” while
the second predicts “No.”

Since this is a classification task, the majority class is selected. In this case,
the random forest algorithm will return a final outcome of “Yes” based on the
predictions made by 2 out of 3 decision trees.
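
The two building blocks described above, sampling rows with replacement and
combining the trees' predictions, can be sketched as follows. This is only an
illustration of the idea, not a full random forest. The helper names and
values are hypothetical.

```python
import random
from collections import Counter
from statistics import mean

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random with replacement."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Classification: return the most common predicted class."""
    return Counter(predictions).most_common(1)[0][0]

def average_prediction(predictions):
    """Regression: return the mean of the predicted values."""
    return mean(predictions)

rng = random.Random(0)           # fixed seed so the sketch is repeatable
rows = list(range(10))           # stand-in for the rows of a dataset
sample = bootstrap_sample(rows, rng)

# Predictions from three hypothetical decision trees
final_class = majority_vote(["Yes", "No", "Yes"])
final_value = average_prediction([200.0, 220.0, 240.0])
print(sample, final_class, final_value)
```

Because rows are drawn with replacement, a bootstrap sample usually contains
some rows more than once and leaves others out.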

One of the biggest advantages of the random forest algorithm is that it
generalizes well, since it combines the output of multiple decision trees that
are trained on a subset of features.

Furthermore, while the output of a single decision tree can vary dramatically
based on a small change in the training dataset, this problem does not arise
with the random forest algorithm as the training dataset is sampled many
times.

Build a Tree-Based Model in Python

Run the following lines of code to build a tree-based machine learning
algorithm with Scikit-Learn:

1. Decision Tree

# classification

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier()

# regression

from sklearn.tree import DecisionTreeRegressor

dt_reg = DecisionTreeRegressor()

2. Random Forests

# classification

from sklearn.ensemble import RandomForestClassifier

rf_clf = RandomForestClassifier()

# regression

from sklearn.ensemble import RandomForestRegressor

rf_reg = RandomForestRegressor()
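
As a small usage sketch, the classifiers can be fitted on a dataset that
mirrors the student example from earlier. The data is hypothetical: each of
the four possible combinations is simply repeated 25 times so that the random
forest has enough rows to sample from.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Hypothetical data: [studies_every_week, finishes_homework] -> 1 = Pass, 0 = Fail
combos = [[0, 0], [0, 1], [1, 0], [1, 1]]
outcomes = [0, 0, 0, 1]
X_train = combos * 25
y_train = outcomes * 25

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
rf_clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

tree_pred = clf.predict(combos)
forest_pred = rf_clf.predict(combos)
print(list(tree_pred), list(forest_pred))
```

The n_estimators argument sets how many decision trees the forest combines,
and random_state only fixes the random sampling so that runs are repeatable.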

Machine Learning Clustering

So far, we’ve explored supervised machine learning models to tackle
classification and regression problems. Now, we will dive into a popular
unsupervised learning approach called clustering.

In simple words, clustering is the task of creating groups of objects that are
similar to each other but different from the objects in other groups. This
technique has a variety of business use cases, such as recommending movies to
users with similar viewing patterns on a video streaming site, anomaly
detection, and customer segmentation.

In this section, we will examine an algorithm called K-Means clustering - one
of the simplest and most popular machine learning models used for
unsupervised learning tasks.

K-Means Clustering

K-Means clustering is an unsupervised machine learning technique that is
used to group similar objects together in data.

Here is an example of how the K-Means clustering algorithm works:

Image by author

Step 1: The image above consists of unlabeled observations that have not
been grouped. Initially, each observation will be assigned to a cluster at
random. A centroid will then be computed for each cluster.

These are represented with the “+” symbol in the diagram below:
Image by author

Step 2: Next, the distance of each data point to the centroid is measured,
and each point is assigned to the nearest centroid:
Image by author

Step 3: The centroid of the new cluster is then recalculated, and data points
will be reassigned accordingly.

Step 4: This process is repeated until data points are no longer being
reassigned:
Image by author

Observe that three clusters were created in the example above. The number
of clusters is referred to as “k” in the K-Means clustering algorithm, and this
has to be determined by us.
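
The loop described in the steps above can be sketched from scratch as follows.
One simplification to note: instead of starting from a random assignment of
points to clusters, this sketch starts from hand-picked initial centroids so
that the result is repeatable. The points and function names are hypothetical.

```python
import math

def assign_clusters(points, centroids):
    """Return, for every point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]

def update_centroids(points, labels, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(old)  # leave an empty cluster's centroid in place
    return new_centroids

def k_means(points, initial_centroids, max_iterations=100):
    """Alternate between assignment and centroid updates until nothing changes."""
    centroids = list(initial_centroids)
    labels = assign_clusters(points, centroids)
    for _ in range(max_iterations):
        centroids = update_centroids(points, labels, centroids)
        new_labels = assign_clusters(points, centroids)
        if new_labels == labels:  # no point was reassigned, so stop
            break
        labels = new_labels
    return labels, centroids

# Hypothetical observations forming three loose groups
points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
    (8.0, 8.0), (9.0, 8.5), (8.5, 9.0),
    (1.0, 9.0), (1.5, 8.0), (2.0, 9.0),
]
labels, centroids = k_means(points, [points[0], points[3], points[6]])
print(labels)
```

Here k is 3 because three initial centroids are supplied.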

There are a few different ways to select “k” in K-Means, the most popular of
which is the elbow method. This technique consists of plotting the error for
different numbers of clusters on a graph and choosing as “k” the point where
the curve bends (the “elbow”), beyond which adding more clusters yields
little improvement.

Read our K-Means clustering in Python tutorial to learn more about the
elbow method and the inner workings of K-Means clustering.

Build a K-Means Clustering Model in Python

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters = 3, init='k-means++')

The n_clusters argument indicates the number of clusters “k” that you need
to define when building the algorithm.
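
A short usage sketch follows, fitting the model on hypothetical 2-D data and
collecting the error for several values of “k” as the elbow method would. The
n_init and random_state arguments are added here only to make the run
repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D observations forming three loose groups
X = np.array([
    [1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
    [8.0, 8.0], [9.0, 8.5], [8.5, 9.0],
    [1.0, 9.0], [1.5, 8.0], [2.0, 9.0],
])

kmeans = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# Error (inertia: within-cluster sum of squared distances) for several k values
inertias = {}
for k in range(1, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = float(model.inertia_)

print(cluster_labels, {k: round(v, 2) for k, v in inertias.items()})
```

Plotting inertias against k would show a sharp drop up to k=3 and little
change afterwards, which is the “elbow” for this toy dataset.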

Machine Learning Models Explained - Next Steps:

If you managed to follow along with this entire article, congratulations! You
now know about some of the most popular supervised and unsupervised
machine learning models and algorithms and how they can be applied to
solve a variety of predictive modeling problems.

To become a data scientist, you need to understand how different types of
machine learning models work to apply them to solve a problem. For
instance, if you’d like to build a model that is interpretable and has low
computation time, it might make sense to create a decision tree. If your aim
is to create a model that generalizes well, however, then you can choose to
build a random forest model instead.

It is also important to understand how to evaluate machine learning models.
A “good” model is subjective and highly dependent on your use case. In
classification problems, for instance, high accuracy alone isn’t indicative of a
good model. As a data scientist, you need to review metrics like precision,
recall, and F1-Score to get a better idea of how well your model is
performing.

If you would like to gain a deeper understanding of machine learning models
than the concepts covered in this article, take the Machine Learning
Scientist with Python course. This career track will teach you the theory
behind how machine learning models operate and how they can be
implemented in Python. You will also learn data preparation techniques such
as normalization, decorrelation, and feature selection in the course.



9 pages
Supervided Learning With Python
No ratings yet
Supervided Learning With Python
14 pages
Understanding Regression Analysis Techniques
No ratings yet
Understanding Regression Analysis Techniques
18 pages
Linear Regression Essentials in Python
No ratings yet
Linear Regression Essentials in Python
23 pages
Linear & Logistic Regression Insights
No ratings yet
Linear & Logistic Regression Insights
27 pages
ML Unit 3
No ratings yet
ML Unit 3
17 pages
Understanding Linear Regression Basics
No ratings yet
Understanding Linear Regression Basics
9 pages
Supervised Learning: Linear Regression Guide
No ratings yet
Supervised Learning: Linear Regression Guide
31 pages
Machine Learning Module 2 Notes
No ratings yet
Machine Learning Module 2 Notes
12 pages
Controlling 3-Phase AC Induction Motors Using PIC18F4431
No ratings yet
Controlling 3-Phase AC Induction Motors Using PIC18F4431
24 pages
Homemade Chicken Burger Patties Guide
No ratings yet
Homemade Chicken Burger Patties Guide
2 pages
Ceniza Pearl Corp Income Statement 2024
No ratings yet
Ceniza Pearl Corp Income Statement 2024
42 pages
CuNi Thin Film Resistors TCR Study
No ratings yet
CuNi Thin Film Resistors TCR Study
5 pages
Logistic Equation Predictions for Jordan's Population
No ratings yet
Logistic Equation Predictions for Jordan's Population
11 pages
Factors Influencing Global Tourism Growth
No ratings yet
Factors Influencing Global Tourism Growth
34 pages
Leadership Communication and Decision-Making
No ratings yet
Leadership Communication and Decision-Making
5 pages
Understanding the Three Levels of Ethics
No ratings yet
Understanding the Three Levels of Ethics
3 pages
HemoHIM Plus: Immune Boosting Supplement
No ratings yet
HemoHIM Plus: Immune Boosting Supplement
13 pages
Lossy vs Lossless Compression Explained
No ratings yet
Lossy vs Lossless Compression Explained
12 pages
Delivery Manager Roles and Responsibilities
No ratings yet
Delivery Manager Roles and Responsibilities
4 pages
Overview of Pigmented Lesions
No ratings yet
Overview of Pigmented Lesions
10 pages
Cost of Capital Calculations Guide
No ratings yet
Cost of Capital Calculations Guide
6 pages
Luciferian Gnosis Merchandise Catalog
No ratings yet
Luciferian Gnosis Merchandise Catalog
15 pages
App Initialization and Module Loading Logs
No ratings yet
App Initialization and Module Loading Logs
8 pages
Nexans Expert Tool Kit Compressed
No ratings yet
Nexans Expert Tool Kit Compressed
12 pages
Operations Improvement Strategies Explained
No ratings yet
Operations Improvement Strategies Explained
41 pages
5SL43067RC Datasheet en
No ratings yet
5SL43067RC Datasheet en
5 pages
Mechanization in Rice Production Challenges
No ratings yet
Mechanization in Rice Production Challenges
45 pages
Amdahl's Law and CPU Performance Analysis
No ratings yet
Amdahl's Law and CPU Performance Analysis
26 pages
Flood, Engaging Men and Boys in Violence Prevention. in Gottzen 2020
No ratings yet
Flood, Engaging Men and Boys in Violence Prevention. in Gottzen 2020
20 pages
Rectifiers, Filters, and Regulators Explained
100% (1)
Rectifiers, Filters, and Regulators Explained
48 pages
BCMP Admission Requirements at UP
No ratings yet
BCMP Admission Requirements at UP
7 pages
Business Process Reengineering Exam Guide
No ratings yet
Business Process Reengineering Exam Guide
11 pages
Welded Connection Design and Calculations
No ratings yet
Welded Connection Design and Calculations
5 pages
Exploring Needs of Living Things
No ratings yet
Exploring Needs of Living Things
24 pages
EARP Project Brief for Gakenke Area
No ratings yet
EARP Project Brief for Gakenke Area
61 pages
Chatbot Recovery: Warmth vs. Competence
No ratings yet
Chatbot Recovery: Warmth vs. Competence
21 pages
TC Electronic M-100 User Manual
No ratings yet
TC Electronic M-100 User Manual
21 pages
On-Site Problem Categorization in SCM
No ratings yet
On-Site Problem Categorization in SCM
22 pages

8 Machine Learning Models Overview

Uploaded by

8 Machine Learning Models Overview

Uploaded by

8 Machine Learning Models Explained in 20

Why use Machine Learning Models?

Machine learning has a variety of use-cases in different domains.

Recommendation systems add direct business value to these companies

Similarly, a mobile service provider might use machine learning to analyze

All machine learning models can be classified as supervised or unsupervised.

Supervised machine learning models can then be further classified into

Machine Learning Regression Models

Regression algorithms are used to predict a continuous outcome (y) using

For example, look at the table below:

A common misconception among data science beginners is that a regression

1. Mean Absolute Error:

In this case, for instance, the RMSE is √54,520.25 ≈ 233.5. This value is
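
As a rough illustration of the error metrics discussed here, the sketch below computes the mean absolute error and the root mean squared error with NumPy. The `y_true` and `y_pred` arrays are made-up values, not the data from the article's table; the last lines only re-check the square root quoted above.

```python
import math
import numpy as np

# Hypothetical actual and predicted values (not taken from the article).
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 330.0, 360.0])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))   # mean absolute error
mse = np.mean(errors ** 2)      # mean squared error
rmse = math.sqrt(mse)           # root mean squared error
print(mae, mse, rmse)

# Re-check the figure quoted in the text: the square root of 54,520.25.
quoted_rmse = math.sqrt(54520.25)
print(round(quoted_rmse, 1))
```

Because the RMSE squares each error before averaging, a few large misses raise it more than they raise the MAE.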

Simple Linear Regression

Linear regression is a linear approach to modeling the relationship between a

Here is a visual representation of how a simple linear regression model

Let’s take a simple example to understand the concept of overfitting:

In simple words, a model that is highly complex will pick up on unnecessary

A linear regression model with large coefficients is prone to overfitting.

Ridge regression is a regularization technique that will force the algorithm to

Notice that there is a value (lambda) multiplied by the model’s coefficients.

What is the optimal lambda value for ridge regression?

When choosing a lambda value, make sure to strike a balance between
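
To make the effect of lambda concrete, here is a minimal sketch using Scikit-Learn's `Ridge`, where the penalty strength is passed as the `alpha` parameter. The toy data (a single feature with `y = 3 * x`) and the list of alpha values are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy data (assumed): y is exactly 3 * x, and x is centred on zero.
X = np.array([[-1.5], [-0.5], [0.5], [1.5]] * 2)
y = 3.0 * X[:, 0]

coefs = {}
for alpha in [0.001, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    coefs[alpha] = model.coef_[0]
    # The coefficient shrinks toward zero as the penalty grows.
    print(alpha, round(model.coef_[0], 3))
```

With a very small penalty the coefficient stays close to 3, while a large penalty pushes it toward zero, which is the trade-off between fitting the data and keeping the model simple.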

Lasso regression is another extension of linear regression that shrinks model

Here is the error that needs to be minimized in lasso regression:

If an independent variable’s coefficient reaches zero, the feature can be

Due to this, lasso regression can also be used as a feature selection
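
The feature-selection behaviour can be sketched with Scikit-Learn's `Lasso`. The two-feature toy dataset below is an assumption: `x0` fully determines `y`, while `x1` is constructed to be unrelated to it. The `alpha` value of 0.1 is likewise just an illustrative choice.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy data (assumed): x0 drives y, x1 is unrelated to both x0 and y.
x0 = np.array([-1.5, -0.5, 0.5, 1.5] * 2)
x1 = np.array([1.0, -1.0, -1.0, 1.0] * 2)
X = np.column_stack([x0, x1])
y = 3.0 * x0

model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)

# Features whose coefficient was not shrunk to zero are the ones kept.
kept = [i for i, c in enumerate(model.coef_) if c != 0.0]
print(kept)
```

The coefficient of the unrelated feature is driven to zero, so only the informative feature remains in `kept`.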

How to Build a Regression Machine Learning Model in Python

from sklearn.linear_model import LinearRegression

To fit the model on your training dataset, run:

from sklearn.linear_model import Ridge

from sklearn.linear_model import Lasso

Machine Learning Classification Models

We use classification algorithms to predict a discrete outcome (y) using

For example, predicting whether a patient is likely to develop heart disease

The table above showcases a classification problem with four independent

Other examples of a binary classification problem include classifying whether

A multiclass classification problem is one with three or more possible

There are many ways to evaluate a classification model. While accuracy is

Let’s look at some common methods used to evaluate a classification

1. Accuracy: Accuracy can be defined as the fraction of correct predictions

The formula to calculate accuracy is:

In this case, the accuracy is 4/6, or 0.67.

2. Precision: Precision is a metric used to calculate the quality of positive

The above model has a precision of 2/4, or 0.5.

3. Recall: Recall measures how many of the actual positive cases the model correctly identifies.

The above model has a recall of 2/2 or 1.
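
The three metrics can be reproduced with a few lines of Python. The confusion-matrix counts below are an assumption: they are one set of counts consistent with the figures quoted above (accuracy of about 0.67, precision of 0.5, recall of 1), since the underlying table is not reproduced here.

```python
# Assumed counts, chosen to match the quoted accuracy, precision, and recall.
tp = 2  # true positives: actual positives predicted as positive
fp = 2  # false positives: actual negatives predicted as positive
tn = 2  # true negatives: actual negatives predicted as negative
fn = 0  # false negatives: actual positives predicted as negative

accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of positive predictions that are correct
recall = tp / (tp + fn)                     # share of actual positives that were found

print(round(accuracy, 2), precision, recall)
```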

In this disease prediction scenario, we always want to identify people with

On the other hand, if we were to build a model that prevents malicious

Logistic regression is a simple classification model that predicts the

Here is an example of how the logistic regression model works:

Observe that, unlike the linear regression algorithm, logistic regression is

In this spam email example, if the text contains little to no suspicious

This probability is then turned into a classification outcome:
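
A minimal sketch of that last step, assuming the model has already produced a raw score `z` for an email: the logistic (sigmoid) function maps the score to a probability between 0 and 1, and a cut-off converts the probability into a label. The 0.5 threshold, the function names, and the example scores are illustrative assumptions, not values from the article.

```python
import math

def sigmoid(z):
    """Map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    """Return a label and the probability behind it (threshold is assumed)."""
    p = sigmoid(z)
    label = "spam" if p >= threshold else "not spam"
    return label, p

# Hypothetical scores: a high score for suspicious text, a low one otherwise.
for score in (2.0, -2.0):
    print(score, classify(score))
```

Raising the threshold makes the model flag fewer emails as spam, which trades recall for precision.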

KNN is a classification algorithm that classifies a data point based on what

Here is a simple example to demonstrate how the K-Nearest Neighbors

The K-Nearest Neighbors algorithm works like this:

Step 1: The model first stores all the training data.

Build a Classification Model in Python

from sklearn.linear_model import LogisticRegression

from sklearn.neighbors import KNeighborsClassifier
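
Putting the two classifiers together, the following is a small end-to-end sketch rather than the article's own code. The one-feature training set, the new points, and `n_neighbors=3` are assumptions made up for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Toy training data (assumed): one feature, two well-separated classes.
X_train = np.array([[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_new = np.array([[1.5], [8.5]])

# Fit both models on the same training data.
log_reg = LogisticRegression().fit(X_train, y_train)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

print(log_reg.predict(X_new))        # predicted class labels
print(log_reg.predict_proba(X_new))  # class probabilities behind those labels
print(knn.predict(X_new))            # majority vote of the 3 nearest neighbours
```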

Machine Learning Tree-Based Models

Tree-based models are supervised machine learning algorithms that

A decision tree is the simplest tree-based machine learning algorithm. This

Here is a simple example demonstrating how the decision tree algorithm

The random forest model is a tree-based algorithm that helps us mitigate

It does this in two steps:

One of the biggest advantages of the random forest algorithm is that it

Build a Tree-Based Model in Python

Run the following lines of code to build a tree-based machine learning

from sklearn.tree import DecisionTreeClassifier

from sklearn.tree import DecisionTreeRegressor

from sklearn.ensemble import RandomForestClassifier

from sklearn.ensemble import RandomForestRegressor
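
As a hedged usage sketch, the four estimators can be fitted and queried in the same way. The toy data, `n_estimators=50`, and `random_state=0` are assumptions for illustration; real projects would use a proper train/test split.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Toy data (assumed): one feature, with a class label and a numeric target.
X_train = np.array([[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]])
y_class = np.array([0, 0, 0, 1, 1, 1])
y_value = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
X_new = np.array([[1.5], [8.5]])

# Single decision trees for classification and regression.
tree_clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_class)
tree_reg = DecisionTreeRegressor(random_state=0).fit(X_train, y_value)

# Random forests combine many trees trained on resampled data.
forest_clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_class)
forest_reg = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_value)

print(tree_clf.predict(X_new), tree_reg.predict(X_new))
print(forest_clf.predict(X_new), forest_reg.predict(X_new))
```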

Machine Learning Clustering

So far, we’ve explored supervised machine learning models to tackle