Understanding Machine Learning Basics

Machine learning is a branch of AI that uses data and algorithms to mimic human learning and improve prediction accuracy across various fields such as finance, entertainment, and insurance. Key processes in machine learning include data importing, exploratory data analysis, model selection, training, testing, and deployment, with algorithms categorized into supervised and unsupervised learning. Techniques like linear regression and K-nearest neighbors are commonly used for prediction and classification tasks.

Uploaded by

adityadas8512

Machine Learning

What is Machine Learning?

● Machine learning is a branch of artificial intelligence (AI) and computer science that focuses on using data and algorithms to imitate the way humans learn, gradually improving its accuracy.
● It uses data to help the computer recognise patterns, which in turn helps it make better (more accurate) predictions.
Why Machine Learning?
Machine learning helps us make better predictions. It is like knowing
approximately what is going to happen in the future. Almost every field now
uses machine learning. Let's discuss a few examples.
● Stock Market Price Predictions (Finance)
● Recommendation System (Entertainment / Sales)
● Customer Segmentation (All Fields)
● Fraud Detection (Insurance)
Stock Market Price Predictions (Finance)
Stock prices vary every second, and huge amounts of data are generated on
a daily basis. Understanding that data, deriving useful information and
patterns from it, and then making predictions is one of the use cases of
machine learning.
Recommendation System (Entertainment / Sales)
A system that looks at the various factors it is provided with and then
makes similar suggestions based on them.
The recommendations made by the program can be judged by how appealing
they are to the customers.
If it works perfectly, then the GIF on the
right comes true.
Customer Segmentation (All Fields)
This system helps us segregate the
customers on the basis of various
factors and then cater to them
separately so that we can get the best
results.
We again use various factors to identify
and determine which customer falls into
which group.
Fraud Detection (Insurance)
The system predicts whether a particular transaction seems to be fraudulent
or not. This is one of the famous use cases for ML.
With the advancement of technology, this topic has gained a lot of
attention. Normal transactions are becoming harder to differentiate from
fraudulent ones as the days pass by.
Basic Requirements

● You must know how to code in some language (Python is best)
● Basic understanding of statistics (applied statistics)
● Basic understanding of mathematics (applied math)
Steps in Machine Learning
Importing the Data → EDA → Data Transformation → Model Selection → Training the Model → Testing the Model → Deployment
Importing the Data
● The data can come in various formats and be spread across several files.
Combining them is a crucial task before any further processing.
● Often the data is unstructured. Understanding its flow and patterns and
converting it into a consistent format is another task that falls under
this step.
● Data cleaning: sometimes the data has missing values that need to be
handled, and there are various ways of doing so.
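The import and cleaning steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not a prescribed workflow: the two CSV snippets, the column names `age` and `income`, and the choice of mean imputation for missing values are all assumptions made for the example.

```python
import csv
import io
from statistics import mean

# Two hypothetical CSV sources with the same columns (stand-ins for separate files).
part_a = "age,income\n25,30000\n,42000\n"
part_b = "age,income\n40,\n31,38000\n"

def read_rows(text):
    """Parse CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))

# Step 1: combine the sources into one data set.
rows = read_rows(part_a) + read_rows(part_b)

# Step 2: convert cells to numbers, marking empty cells as None (missing).
def to_number(cell):
    return float(cell) if cell.strip() else None

data = [{key: to_number(value) for key, value in row.items()} for row in rows]

# Step 3: fill each missing value with the mean of its column (mean imputation,
# one of several possible strategies).
for column in ("age", "income"):
    known = [record[column] for record in data if record[column] is not None]
    fill_value = mean(known)
    for record in data:
        if record[column] is None:
            record[column] = fill_value

print(data)
```

Dropping incomplete records or filling with the median are equally common choices; which one fits depends on the data.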
Exploratory Data Analysis (EDA)
● View the data in various ways to understand its structure.
● Understand how the values are distributed in each column.
● Use visual and non-visual methods to understand the data.
● Make a note of all the inferences, insights and assumptions that you
gain or make from the visuals.
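As a small example of the non-visual side of EDA, the sketch below summarises one numeric and one categorical column. The column values are made up for illustration, and only the Python standard library is used.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical columns, invented for this example.
ages = [22, 25, 25, 31, 38, 40, 67]
cities = ["Pune", "Delhi", "Pune", "Mumbai", "Pune", "Delhi", "Pune"]

# Summary statistics describe how a numeric column is distributed.
summary = {
    "count": len(ages),
    "min": min(ages),
    "max": max(ages),
    "mean": mean(ages),
    "median": median(ages),
    "stdev": stdev(ages),
}

# Value counts describe how a categorical column is distributed.
category_counts = Counter(cities)

print(summary)
print(category_counts.most_common())
```

Here the mean is noticeably higher than the median, which hints at a large value (67) pulling the distribution to the right. That is the kind of inference worth noting down.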
Data Transformation
● If there is any discrepancy in the data types of the column values,
convert the data into a consistent format.
● Sometimes you need to derive new columns from the existing ones, or
remove a few columns because they add no information that helps build
a better model.
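A minimal sketch of these transformations, assuming a hypothetical data set with string-typed `height_cm` and `weight_kg` columns and an `id` column that carries no predictive information:

```python
# Hypothetical raw records: numbers stored as strings, plus an uninformative id.
records = [
    {"id": "a1", "height_cm": "170", "weight_kg": "65"},
    {"id": "a2", "height_cm": "180", "weight_kg": "81"},
]

transformed = []
for record in records:
    height_m = float(record["height_cm"]) / 100   # fix the data type and unit
    weight_kg = float(record["weight_kg"])        # fix the data type
    transformed.append({
        "height_m": height_m,
        "weight_kg": weight_kg,
        "bmi": weight_kg / height_m ** 2,         # new column derived from existing ones
    })                                            # "id" is dropped on purpose

print(transformed)
```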
Model selection
Once we understand the data and have transformed it as well as we can
for further computation, choosing the model is an important task. Even
though a complex model generally gives better accuracy, we don't start
off with one.
We start off with simple (weak) models and gradually move up the
ladder. Each model is selected, trained, tested and tuned until we feel
that it has done the best it can on this data set.
Training the Model
Every selected model is trained multiple times in multiple ways, to
understand why one training set is better than another or why one model
performs better than another.
Testing the Model
We can only understand how good the model is by testing what it has learned
on data that it has not been trained on.
The test data has to be unseen by the model during training, so that we can
measure how well the model works on data it hasn't seen before.
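One simple way to keep some data unseen is a hold-out split, sketched below. The function name `train_test_split`, the 20% test fraction and the fixed seed are choices made for this example rather than part of the original slides.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)      # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(50))                         # stand-in for 50 records
train, test = train_test_split(rows)

print(len(train), len(test))
```

The model is fitted on `train` only; `test` is touched once, at evaluation time.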
Deployment
After being satisfied with the results we deploy the model in the
real world to see how it works on the data from the real world.
The task doesn’t end here. If the model works fine we try to
make it work better. If the model fails to work, then we take it up
again and go through the entire cycle all over again. It’s a
continuous process.
Types of machine learning algorithms
Supervised
When the data set we are working with has labels that tell us which
category or continuous value each record corresponds to, we perform a
supervised task on it.
Regression:
When the value that you want to predict is of continuous type.
Classification:
When the value that you are trying to predict is a category.
[Figure: regression task vs. classification task]
Unsupervised
If the target variable (the value that we are trying to predict) is not
available, then we perform an unsupervised task.
Clustering:
Where you group similar points together.
Association:
Where you try to find patterns in the data and make recommendations based on them.
[Figure: clustering vs. association]
What is Linear Regression?
Linear regression predicts the value of a dependent variable (y) from
a given independent variable (x). This regression technique finds a
linear relationship between x (input) and y (output), hence the name
linear regression.
Formula
Since we are trying to build a linear relation between the two
variables, we use the formula for a straight line:
y = θ1 + θ2·x
The same formula can be written as y = mx + c.
● c = θ1 = Intercept
● m = θ2 = Slope
How do we update the θ1 and θ2 values?
To find the best-fit line we need the best θ1 and θ2 values. To find
them we minimise the cost function (J).
The cost function measures the difference between the predicted and
actual values.
Since the individual differences can be positive or negative, we square
each error to make it positive and then average over all n records.
This is the Mean Squared Error (MSE):
J = (1/n) Σ (predicted_i − actual_i)²
How do we know whether the line is the best fit?
θ1 and θ2 are randomly selected at first and then optimised using gradient
descent on the cost function. The line at which the cost function reaches
its minimum is considered the best-fit line.
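The optimisation described above can be sketched in a few lines. This is an illustrative implementation, not the one used in the slides: the learning rate, step count and toy data are assumptions, and the parameters start at zero instead of random values so that the run is reproducible.

```python
def fit_line(xs, ys, learning_rate=0.02, steps=5000):
    """Fit y = theta1 + theta2 * x by gradient descent on the mean squared error."""
    theta1, theta2 = 0.0, 0.0      # intercept and slope (zeros instead of random values)
    n = len(xs)
    for _ in range(steps):
        # Error of the current line on every record: predicted - actual.
        errors = [(theta1 + theta2 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of J = (1/n) * sum(error^2) with respect to each theta.
        grad1 = (2 / n) * sum(errors)
        grad2 = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        # Move both parameters a small step against the gradient.
        theta1 -= learning_rate * grad1
        theta2 -= learning_rate * grad2
    return theta1, theta2

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]              # toy data generated from y = 1 + 2x
theta1, theta2 = fit_line(xs, ys)

print(round(theta1, 3), round(theta2, 3))
```

On this toy data the fitted intercept and slope approach 1 and 2. A learning rate that is too large makes the updates diverge, so it usually needs tuning for real data.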
Other ways of evaluation
MAE: Mean Absolute Error, the mean of the absolute values of the
errors.
RMSE: Root Mean Squared Error. The Mean Squared Error is in squared
units, so it can be difficult to interpret when we are dealing with
large values. Taking its square root brings it back to the units of y
and makes it easier to understand.
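A short sketch of the three error measures, using made-up actual and predicted values:

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean Squared Error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))

actual = [3, 5, 7, 9]        # hypothetical true values
predicted = [2, 5, 9, 9]     # hypothetical model output

print(mae(actual, predicted), mse(actual, predicted), rmse(actual, predicted))
```

Because the errors are squared before averaging, MSE and RMSE penalise the single large miss (7 vs. 9) more heavily than MAE does.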
Other ways of evaluation (Cont.)
For R2 we first measure the variation in the error terms when we fit a
line at the mean of the distribution, Var(mean). Then we fit our line and
measure the variation that remains around it, Var(line). R2 is the
fraction of the variation explained by the new fit line:
R2 = (Var(mean) - Var(line)) / Var(mean)
The higher the variation explained, the better the line. For a
least-squares line evaluated on the data it was fitted to, the value
lies between 0 and 1.
Example for R2
Consider Var(mean) = 32 and Var(line) = 6.
R2 = (Var(mean) - Var(line)) / Var(mean)
R2 = (32 - 6) / 32
R2 = 26/32 = 0.8125

That means the line explains 81.25% of the variance; the remaining 18.75% is
considered error that the line can't explain.
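The same calculation can be sketched in code. The function below uses sums of squared deviations in place of variances; the common 1/n factor cancels in the ratio, so the result is the same. The function name and the tiny data sets are illustrative.

```python
def r_squared(actual, predicted):
    """R2 = (Var(mean) - Var(line)) / Var(mean), computed from sums of squares."""
    mean_y = sum(actual) / len(actual)
    var_mean = sum((y - mean_y) ** 2 for y in actual)                 # variation around the mean
    var_line = sum((y - p) ** 2 for y, p in zip(actual, predicted))   # variation around the line
    return (var_mean - var_line) / var_mean

# The worked example from the slide.
var_mean, var_line = 32, 6
r2_example = (var_mean - var_line) / var_mean

perfect_fit = r_squared([1, 2, 3], [1, 2, 3])   # predictions match exactly
mean_only = r_squared([1, 2, 3], [2, 2, 2])     # always predicting the mean

print(r2_example, perfect_fit, mean_only)
```

A perfect fit gives 1, and a line that does no better than the mean gives 0.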
Classification
What is Logistic Regression?
Logistic regression is basically a
supervised classification algorithm. In
this analytics approach, the dependent
variable is finite or categorical: either
A or B (binary regression) or a range
of finite options A, B, C or D
(multinomial regression).
It is used in statistical software to
understand the relationship between
the dependent variable and one or
more independent variables by
estimating probabilities using a logistic
regression equation.
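To make "estimating probabilities using a logistic regression equation" concrete, here is a minimal sketch for the binary case with a single input. The coefficients `theta1 = -4` and `theta2 = 2`, the 0.5 threshold and the class names are assumptions chosen for illustration; in practice the coefficients are learned from data.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1 / (1 + math.exp(-z))

def predict_probability(x, theta1, theta2):
    """Estimated probability that x belongs to class B."""
    return sigmoid(theta1 + theta2 * x)

def predict_class(x, theta1, theta2, threshold=0.5):
    """Turn the probability into a class label, A or B."""
    return "B" if predict_probability(x, theta1, theta2) >= threshold else "A"

# Hypothetical, hand-picked coefficients (not fitted to any data).
theta1, theta2 = -4.0, 2.0

for x in (0, 2, 3):
    print(x, round(predict_probability(x, theta1, theta2), 3), predict_class(x, theta1, theta2))
```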
Evaluate Logistic Regressions

A confusion matrix is a good way to look at the correctly identified
classes and the misclassified classes.

Using the values from it we can find the accuracy: the total number of
correctly classified records divided by the total number of records.
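A small sketch of both ideas, using invented labels for a binary problem. The confusion matrix is stored as a count of (actual, predicted) pairs.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

def accuracy(actual, predicted):
    """Correctly classified records divided by the total number of records."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

actual    = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical true classes
predicted = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical model output

matrix = confusion_matrix(actual, predicted)
print("true positives :", matrix[(1, 1)])
print("true negatives :", matrix[(0, 0)])
print("false positives:", matrix[(0, 1)])
print("false negatives:", matrix[(1, 0)])
print("accuracy       :", accuracy(actual, predicted))
```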
Additional Information

Stratified Sampling: when there is a class imbalance it's best to use
stratified sampling. This makes sure that the test data and the train data
have the same class proportions.

Example:

● Total number of records per class: {0: 100, 1: 50}

● Considering the test size to be 20%:
○ Test records for the model: {0: 20, 1: 10}
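The example above can be reproduced with a short sketch. The helper `stratified_split` is written for this illustration; it splits each class separately so that both parts keep the original proportions.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) with class proportions preserved."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                              # shuffle within each class
        n_test = round(len(indices) * test_fraction)      # same fraction from every class
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test

labels = [0] * 100 + [1] * 50                             # class counts from the example
train_idx, test_idx = stratified_split(labels)

test_counts = Counter(labels[i] for i in test_idx)
train_counts = Counter(labels[i] for i in train_idx)
print(dict(test_counts), dict(train_counts))
```

Both parts keep the 2:1 ratio between class 0 and class 1.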
What is KNN?

KNN stands for K Nearest Neighbours.

K is nothing but a placeholder for the number of neighbours you want to
take into consideration. For example, with k = 3, I take the 3 nearest
neighbours.

How do we measure which elements are close? We use a distance measure
to decide that.
Introduction to KNN

● K-Nearest Neighbour is one of the simplest machine learning algorithms, based on the supervised learning technique.
● The K-NN algorithm assumes similarity between the new case and the available cases, and puts the new case into the category it is most similar to.
● The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category.
● The K-NN algorithm can be used for regression as well as classification, but it is mostly used for classification problems.
● It is also called a lazy learner algorithm because it does not learn from the training set immediately. Instead it stores the data set and does the work at the time of classification.
Steps of KNN:
Step 1: Load the training data as well as the test data.

Step 2: Choose the value of K, i.e. the number of nearest data points to consider. K can be any positive integer.

Step 3: For each point in the test data, do the following:

● Calculate the distance between the test point and each row of the training data, using Euclidean distance.
● Take the K nearest neighbours according to the calculated Euclidean distance.
● Among these K neighbours, count the number of data points in each category.
● Assign the test point to the category with the largest number of neighbours.

Step 4: The model is ready.
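The steps above can be sketched as a small classifier. This is a minimal illustration on invented 2-D points, not an optimised implementation: it compares the query against every training point, which is fine for small data sets only.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep the labels of the k closest points.
    nearest_labels = [label for _, label in distances[:k]]
    # Count the labels and return the most common one.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training data: two well-separated groups.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (2, 2)))   # near the first group
print(knn_predict(train_points, train_labels, (6, 5)))   # near the second group
```

Note that nothing is computed until a query arrives, which is why KNN is described as a lazy learner.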

How to select value of K:
● Selecting the right K value is a process called parameter tuning, which is important for achieving higher accuracy.

● There is no definitive way to determine the best value of K.

● It depends on the type of problem you are solving.

● A K value of one or two makes the model sensitive to noise and outliers, which results in overfitting.

● A common rule of thumb is to start with the square root of n (sqrt(n)), where n is the total number of data points.

● Usually an odd value of K is selected to avoid ties between two classes.
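The square-root and odd-value guidelines can be combined into a starting value for K, as in the sketch below. The helper name `rule_of_thumb_k` is made up for this example, and the result is only a starting point for tuning, not a guaranteed best value.

```python
import math

def rule_of_thumb_k(n_samples):
    """Start from sqrt(n) and move to the next odd number if the result is even."""
    k = max(1, round(math.sqrt(n_samples)))
    return k if k % 2 == 1 else k + 1

for n in (81, 100, 150):
    print(n, rule_of_thumb_k(n))
```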
Distance Measures for KNN

Euclidean: the straight-line distance between two points.

Manhattan: the sum of the absolute differences along each axis (the
horizontal distance plus the vertical distance).

Minkowski: a generalisation of the two, d(a, b) = (Σ |a_i − b_i|^p)^(1/p).
With p = 1 it is the Manhattan distance and with p = 2 it is the
Euclidean distance.
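The three distance measures can be written with one general function, since Manhattan and Euclidean are the Minkowski distance with p = 1 and p = 2. The points used are arbitrary examples.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two points of equal dimension."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def manhattan(a, b):
    """Sum of absolute differences along each axis (Minkowski with p = 1)."""
    return minkowski(a, b, 1)

def euclidean(a, b):
    """Straight-line distance (Minkowski with p = 2)."""
    return minkowski(a, b, 2)

a, b = (0, 0), (3, 4)
print(manhattan(a, b))       # 3 + 4
print(euclidean(a, b))       # sqrt(9 + 16)
print(minkowski(a, b, 3))    # between the two extremes
```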
Additional Information

Model Summary (TP = true positives, FP = false positives, FN = false negatives):

● Precision: What proportion of positive identifications was actually correct? Precision = TP / (TP + FP)
● Recall: What proportion of actual positives was identified correctly? Recall = TP / (TP + FN)
● F1 Score: The harmonic mean of the precision and recall. F1 = 2 × Precision × Recall / (Precision + Recall)
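A short sketch of the three scores on invented labels. Returning 0.0 when a denominator is zero is a convention chosen for this example.

```python
def precision_recall_f1(actual, predicted, positive=1):
    """Compute precision, recall and F1 for the class treated as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)   # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)   # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)   # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # hypothetical true classes
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]   # hypothetical model output

precision, recall, f1 = precision_recall_f1(actual, predicted)
print(precision, recall, round(f1, 4))
```

Here 3 of the 4 predicted positives are correct (precision 0.75) and 3 of the 5 actual positives are found (recall 0.6).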
Additional Information

Grid Search Cross Validation: This is a hyperparameter tuning method where
you pass in all the parameter values that you want to train and test your
model with. Every combination of the values passed is evaluated with
cross-validation, and you can select the best one.

Random Search Cross Validation: This is similar to Grid Search but doesn't
try every combination. It evaluates a fixed number of randomly sampled
combinations, which is cheaper and often finds a comparably good result.
(Best for larger data sets and a larger number of parameters.)
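The difference between the two search strategies can be sketched as follows. The parameter grid and the `score` function are stand-ins invented for this example: in a real run, `score` would train the model with the given parameters and return its mean cross-validated accuracy.

```python
import itertools
import random

# Hypothetical hyperparameter values to try for a KNN model.
param_grid = {"k": [1, 3, 5, 7], "metric": ["euclidean", "manhattan"]}

def grid_candidates(grid):
    """Grid search: every combination of the values passed."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in itertools.product(*grid.values())]

def random_candidates(grid, n_iter, seed=0):
    """Random search: n_iter randomly sampled combinations (repeats are possible)."""
    rng = random.Random(seed)
    return [{key: rng.choice(values) for key, values in grid.items()} for _ in range(n_iter)]

def score(params):
    """Placeholder for a cross-validated accuracy; NOT a real evaluation."""
    penalty = 0.02 if params["metric"] == "manhattan" else 0.0
    return 0.9 - 0.01 * abs(params["k"] - 5) - penalty

all_combinations = grid_candidates(param_grid)
best_from_grid = max(all_combinations, key=score)

sampled = random_candidates(param_grid, n_iter=3)
best_from_random = max(sampled, key=score)

print(len(all_combinations), best_from_grid)
print(len(sampled), best_from_random)
```

Grid search evaluates all 8 combinations, while random search evaluates only the 3 it sampled, so it may miss the overall best combination but costs far less when the grid is large.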
