Understanding Machine Learning Basics

Machine Learning is a subset of artificial intelligence focused on developing algorithms that enable computers to learn from data and past experiences. It is classified into three main types: supervised learning, unsupervised learning, and reinforcement learning, each with distinct methods and applications. Key applications include regression analysis for predicting continuous values and classification algorithms for categorizing data into distinct classes.

Uploaded by

Rishabh Tiwari

Machine Learning

Machine learning is a subset of artificial intelligence that is mainly concerned with the
development of algorithms that allow a computer to learn on its own from data and past
experience.

Using sample historical data, known as training data, machine learning algorithms build a model
that makes predictions or decisions without being explicitly programmed. Machine learning brings
computer science and statistics together to create predictive models, and it constructs or uses
algorithms that learn from historical data.

A machine has the ability to learn if it can improve its performance by gaining more data.

A machine learning system learns from historical data, builds prediction models, and predicts
the output whenever it receives new data. The accuracy of the predicted output depends in part
on the amount of data: a larger amount of data generally helps to build a better model, which
predicts the output more accurately.

Features of Machine Learning:

o Machine learning uses data to detect various patterns in a given dataset.
o It can learn from past data and improve automatically.
o It is a data-driven technology.
o Machine learning is similar to data mining in that it also deals with huge
amounts of data.

Classification of Machine Learning
At a broad level, machine learning can be classified into three types:

1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning

1) Supervised Learning
Supervised learning is a type of machine learning in which we provide sample
labeled data to the machine learning system in order to train it, and on that basis it
predicts the output.

The system builds a model from the labeled data to understand the dataset and learn about
each example. Once training and processing are done, we test the model by providing
sample data and checking whether it predicts the correct output.
The goal of supervised learning is to map input data to output data. Supervised
learning is based on supervision, much as a student learns under the
supervision of a teacher. An example of supervised learning is spam filtering.

Supervised learning can be grouped further in two categories of algorithms:

o Classification
o Regression

2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any
supervision.

The machine is trained on a set of data that has not been labeled,
classified, or categorized, and the algorithm needs to act on that data without any
supervision. The goal of unsupervised learning is to restructure the input data into new
features or into groups of objects with similar patterns.

In unsupervised learning, we don't have a predetermined result. The machine tries to find
useful insights from a huge amount of data. It can be further classified into two
categories of algorithms:

o Clustering
o Association
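
As a rough illustration of clustering, the sketch below groups unlabeled one-dimensional points with a simplified k-means loop. The notes do not name a specific clustering algorithm, so k-means, the function name kmeans_1d, the data, and the starting centroids are all assumptions made for this example.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Group 1-D points around the given starting centroids (simplified k-means)."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # nothing moved, so the grouping is stable
        centroids = new_centroids
    return centroids, clusters


points = [1, 2, 3, 10, 11, 12]  # made-up, unlabeled data
centroids, clusters = kmeans_1d(points, centroids=[1, 12])
print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1, 2, 3], [10, 11, 12]]
```

No labels are supplied at any point; the two groups emerge only from the distances between the points.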
3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method in which a learning agent
gets a reward for each right action and a penalty for each wrong action. The agent
learns automatically from this feedback and improves its performance. In reinforcement
learning, the agent interacts with the environment and explores it. The goal of the agent
is to collect the most reward points, and in pursuing that goal it improves its performance.
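
The reward-and-penalty loop can be sketched with a tiny agent that chooses between two actions. Everything here is an assumption for illustration: the environment (action 0 is rewarded with +1, action 1 is penalized with -1), the epsilon-greedy exploration rule, and the names run_agent, epsilon, and learning_rate.

```python
import random


def run_agent(steps=100, epsilon=0.2, learning_rate=0.1, seed=0):
    """Learn the value of two actions from rewards and penalties."""
    rng = random.Random(seed)
    # Hypothetical environment: action 0 is "right" (+1), action 1 is "wrong" (-1).
    rewards = {0: 1.0, 1: -1.0}
    q = [0.0, 0.0]  # the agent's current estimate of each action's value
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)                    # explore the environment
        else:
            action = max(range(2), key=lambda a: q[a])   # exploit what was learned
        reward = rewards[action]
        # Nudge the estimate for the chosen action toward the feedback received.
        q[action] += learning_rate * (reward - q[action])
    return q


q = run_agent()
best_action = max(range(2), key=lambda a: q[a])
print(best_action)  # 0
```

The agent is never told which action is correct; it only sees the feedback, and its estimates drift toward the action that earns rewards.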
Applications of Machine Learning
Supervised Learning

Regression Analysis
Regression analysis is a statistical method for modeling the relationship between a dependent (target)
variable and one or more independent (predictor) variables. More specifically,
regression analysis helps us understand how the value of the dependent variable changes
with one independent variable when the other independent variables are held fixed. It
predicts continuous/real values such as temperature, age, salary, price, etc.

Example: Suppose a marketing company A runs various advertisements every year and earns
sales from them. The list below shows the advertisement spend of the
company in the last 5 years and the corresponding sales:

[List not recovered: advertisement spend and corresponding sales for the last 5 years]

Now the company wants to spend $200 on advertisement in the year 2019 and wants a
prediction of the sales for that year. To solve this type of prediction problem in
machine learning, we need regression analysis.
In regression, we fit a line or curve to the given datapoints on a graph of the variables; using
this fit, the machine learning model can make predictions about the data. In simple
words, "Regression shows a line or curve that fits the datapoints on the target-predictor
graph in such a way that the vertical distance between the datapoints and the
regression line is minimum."

Some examples of regression are:

o Prediction of rain using temperature and other factors
o Determining market trends
o Prediction of road accidents due to rash driving

Terminologies Related to the Regression Analysis:

o Dependent Variable: The main factor in regression analysis that we want to
predict or understand is called the dependent variable. It is also called the target
variable.
o Independent Variable: The factors that affect the dependent variable, or that
are used to predict its values, are called independent
variables, also called predictors.
o Outliers: An outlier is an observation with either a very low or a very high
value in comparison to the other observed values. An outlier may distort the result,
so it should be identified and handled with care.
o Underfitting and Overfitting: If our algorithm works well with the training dataset
but not with the test dataset, the problem is called overfitting. If our
algorithm does not perform well even with the training dataset, the problem is
called underfitting.
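
The difference between overfitting and underfitting can be seen by comparing training and test error for two deliberately extreme models. This is only a sketch on made-up data: memorizer simply returns the target of the closest training point (so it reproduces the training set perfectly), while constant always predicts the training mean. Both models and the helper mse are illustrative choices, not methods from these notes.

```python
def mse(model, data):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


# Made-up data that roughly follows y = 2x, with noise in the training targets.
train = [(1, 2.3), (2, 3.8), (3, 6.4), (4, 7.7)]
test = [(1.5, 3.0), (2.5, 5.0), (3.5, 7.0)]


def memorizer(x):
    """Overfit-style model: return the target of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


mean_y = sum(y for _, y in train) / len(train)


def constant(x):
    """Underfit-style model: ignore x and always predict the training mean."""
    return mean_y


for name, model in [("memorizer", memorizer), ("constant", constant)]:
    print(name, "train error:", mse(model, train), "test error:", mse(model, test))
```

The memorizer has zero training error but a nonzero test error, which is the overfitting pattern; the constant model has a large error on both sets, which is the underfitting pattern.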
Linear Regression:

o It is one of the simplest algorithms for regression and
shows the relationship between continuous variables.
o It is used for solving regression problems in machine learning.
o Linear regression shows the linear relationship between the independent variable
(X-axis) and the dependent variable (Y-axis), hence the name linear regression.
o If there is only one input variable (x), it is called simple
linear regression. If there is more than one input variable, it
is called multiple linear regression.
o Below is the mathematical equation for linear regression:

Y = aX + b

Here, Y = dependent variable (target variable),
X = independent variable (predictor variable),
a = slope of the line and b = intercept (the linear coefficients)

Example: Prediction of house price based upon size
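
A minimal sketch of fitting Y = aX + b by ordinary least squares, which picks a and b so that the squared vertical distances to the line are as small as possible. The house sizes and prices are made-up values chosen to lie exactly on a line, and fit_line is an illustrative name.

```python
def fit_line(xs, ys):
    """Return slope a and intercept b of the least-squares line Y = aX + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    a = covariance / variance
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - a * mean_x
    return a, b


sizes = [50, 80, 100, 120]     # hypothetical house sizes
prices = [150, 240, 300, 360]  # hypothetical prices, exactly 3 * size
a, b = fit_line(sizes, prices)
print(a, b)        # 3.0 0.0
print(a * 90 + b)  # 270.0, the predicted price for a size of 90
```

With real, noisy data the points would not fall exactly on the line, and a and b would be the values that minimize the total squared error.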

Decision Tree Regression:
o Decision Tree is a supervised learning algorithm that can be used for solving both
classification and regression problems.
o It can handle both categorical and numerical data.
o Decision tree regression builds a tree-like structure in which each internal node represents
a "test" on an attribute, each branch represents the result of the test, and each leaf node
represents the final decision or result.
o A decision tree is constructed starting from the root node/parent node (the dataset), which
splits into left and right child nodes (subsets of the dataset). These child nodes are further
divided into their own child nodes and thus become the parent nodes of those nodes.
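
The splitting idea can be sketched with the smallest possible regression tree: a root node with one test and two leaves. The notes do not say how a split is chosen, so this example assumes a common criterion (minimize the sum of squared errors within the two child nodes); the data and the names best_split and predict are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors of values around their own mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) for the split with the lowest total SSE."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between two identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]    # subset sent to the left child
        right = [y for _, y in pairs[i:]]   # subset sent to the right child
        error = sse(left) + sse(right)
        if best is None or error < best[0]:
            best = (error, threshold, mean(left), mean(right))
    if best is None:
        raise ValueError("need at least two distinct x values to split")
    return best[1:]


xs = [1, 2, 3, 10, 11, 12]   # made-up feature values
ys = [5, 6, 7, 20, 21, 22]   # made-up continuous targets
threshold, left_value, right_value = best_split(xs, ys)


def predict(x):
    """Root-node test: go left if x <= threshold, otherwise go right."""
    return left_value if x <= threshold else right_value


print(threshold, left_value, right_value)  # 6.5 6.0 21.0
print(predict(2), predict(11))             # 6.0 21.0
```

A full tree would repeat the same search inside each child node until a stopping rule is reached.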

Example:
Here the model is trying to predict whether a person is fit or not.
[Figure not recovered: example decision tree]

Classification Algorithm

Supervised machine learning algorithms can be broadly classified into regression and
classification algorithms. Regression algorithms predict continuous
values; to predict categorical values, we need classification algorithms.

The classification algorithm is a supervised learning technique used to identify the
category of new observations on the basis of training data. In classification, a program learns from
the given dataset or observations and then classifies each new observation into one of a number of
classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc.

Unlike regression, the output variable of classification is a category, not a value, such as "Green
or Blue", "fruit or animal", etc. Since classification is a supervised learning technique,
it takes labeled input data, which means each input comes with its corresponding output.
The algorithm that implements classification on a dataset is known as a classifier.
There are two types of classification:

o Binary Classifier: If the classification problem has only two possible outcomes,
it is called a binary classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
o Multi-class Classifier: If a classification problem has more than two outcomes,
it is called a multi-class classifier.
Examples: classification of types of crops, classification of types of music.
Logistic Regression in Machine Learning

o Logistic regression predicts the output of a categorical dependent variable.
Therefore, the outcome must be a categorical or discrete value. It can be either Yes
or No, 0 or 1, True or False, etc., but instead of giving the exact value as 0 or 1, it
gives probabilistic values that lie between 0 and 1.
o Logistic regression is similar to linear regression except in how the two
are used. Linear regression is used for solving regression problems,
whereas logistic regression is used for solving classification problems.
o In logistic regression, instead of fitting a regression line, we fit an "S"-shaped
logistic function, whose output is bounded by two limiting values (0 and 1).
o The curve from the logistic function indicates the likelihood of something, such as
whether cells are cancerous or not, or whether a mouse is obese or not based on its
weight.
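
The "S"-shaped logistic function and the step from a probability to a class label can be sketched as follows. The coefficients weight and bias are hypothetical values picked for illustration (a trained model would learn them from data), and the 0.5 threshold is an assumed, commonly used default.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weight, bias):
    """Probability of class 1 for a single input feature x."""
    return sigmoid(weight * x + bias)


def predict_label(x, weight, bias, threshold=0.5):
    """Turn the probability into a 0/1 class label."""
    return 1 if predict_proba(x, weight, bias) >= threshold else 0


weight, bias = 1.5, -6.0  # hypothetical coefficients; the boundary falls at x = 4

print(sigmoid(0))                      # 0.5
print(predict_label(2, weight, bias))  # 0
print(predict_label(6, weight, bias))  # 1
```

The linear part weight * x + bias is the same form as in linear regression; the logistic function is what turns it into a probability.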
K-Nearest Neighbor (KNN) Algorithm for Machine Learning
o K-Nearest Neighbor is one of the simplest machine learning algorithms based on
the supervised learning technique.
o The K-NN algorithm assumes similarity between the new case/data and the available
cases and puts the new case into the category that is most similar to the available
categories.
o The K-NN algorithm stores all the available data and classifies a new data point based
on similarity. This means that when new data appears, it can be easily classified
into a well-suited category by using the K-NN algorithm.
o The K-NN algorithm can be used for regression as well as for classification, but
it is mostly used for classification problems.
o It is also called a lazy learner algorithm because it does not learn from the
training set immediately; instead it stores the dataset and, at the time of
classification, performs an action on the dataset.
KNN Algorithm working
Suppose there are two categories, Category A and Category B, and we have a new data point
x1. In which of these categories does this data point lie? To solve this type of problem, we need
the K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular
data point.

[Figure not recovered: new data point x1 between Category A and Category B]

The working of K-NN can be explained by the following algorithm:

o Step-1: Select the number K of neighbors.
o Step-2: Calculate the Euclidean distance from the new data point to each available data point.
o Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
o Step-4: Among these K neighbors, count the number of data points in each
category.
o Step-5: Assign the new data point to the category for which the number of
neighbors is maximum.
o Step-6: Our model is ready.
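
The steps above can be sketched in a few lines of Python. The training points, the labels, and the function name knn_classify are made up for illustration; math.dist computes the Euclidean distance and requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_classify(train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest training points."""
    # Steps 2-3: sort the stored points by Euclidean distance and keep the k nearest.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], new_point))
    neighbors = by_distance[:k]
    # Step 4: count how many of the neighbors fall in each category.
    votes = Counter(label for _, label in neighbors)
    # Step 5: assign the category with the most neighbors.
    return votes.most_common(1)[0][0]


# Made-up training data: (point, category) pairs.
train = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]

print(knn_classify(train, (2, 2)))  # A
print(knn_classify(train, (6, 5)))  # B
```

Nothing is computed before a query arrives; all of the work happens at classification time, which is why K-NN is described as a lazy learner.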
Support Vector Machine

Support Vector Machine, or SVM, is one of the most popular supervised learning algorithms.
It is primarily used for classification problems in machine learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can segregate
n-dimensional space into classes so that we can easily put a new data point in the correct
category in the future. This best decision boundary is called a hyperplane.

Types of SVM
SVM can be of two types:

o Linear SVM: Linear SVM is used for linearly separable data. If a
dataset can be classified into two classes by using a single straight line, the
data is termed linearly separable data, and the classifier used is called a linear SVM
classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If
a dataset cannot be classified by using a straight line, the data is
termed non-linear data, and the classifier used is called a non-linear SVM classifier.
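
Once a linear SVM has been trained, classifying a point only requires checking which side of the hyperplane it lies on. The sketch below assumes a hyperplane is already given; the weights w and offset b are hypothetical values, not the result of training, and the function names are illustrative.

```python
import math


def decision_value(w, b, x):
    """Signed value of w . x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 for one side of the hyperplane and -1 for the other."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from x to the hyperplane w . x + b = 0."""
    return abs(decision_value(w, b, x)) / math.sqrt(sum(wi * wi for wi in w))


w, b = (1.0, 1.0), -5.0  # hypothetical hyperplane: x1 + x2 = 5

print(classify(w, b, (1, 1)))  # -1
print(classify(w, b, (4, 4)))  # 1
print(distance_to_hyperplane(w, b, (4, 4)))
```

Training an SVM amounts to searching for the w and b that keep this distance as large as possible for the closest points of each class.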

Common questions

Can you explain the working mechanism of a K-Nearest Neighbor (KNN) algorithm?
The K-Nearest Neighbor (KNN) algorithm works as follows: 1) Select the number of K neighbors; 2) Calculate the Euclidean distance between the new data point and all available points in the dataset; 3) Choose the K nearest neighbors based on the calculated distances; 4) Count the number of data points in each category among these K neighbors; 5) Assign the new data point to the category with the maximum neighbors. This makes KNN a lazy learner because it does not build an internal model until a classification is required.

Describe how logistic regression differs from linear regression and provide an example of each.
Logistic regression predicts discrete, categorical outcomes by computing probabilistic values between 0 and 1 using an S-shaped logistic function, making it suitable for binary classification problems like determining if an email is spam or not. Linear regression, however, models the relationship between continuous input and output variables to predict real values, such as predicting house prices based on size. While logistic regression categorizes data, linear regression is used for assessing the relationship between variables.

How does the concept of overfitting differ from underfitting in machine learning models?
Overfitting occurs when a machine learning model performs well on the training data but poorly on unseen test data. This happens because the model has learned not only the underlying patterns but also the noise in the training data. Underfitting, however, occurs when a model performs poorly on both training and test datasets because it is too simple to capture the underlying pattern of the data. This typically indicates that the model has not learned sufficiently from the training data.

What are the key differences between supervised learning and unsupervised learning in machine learning?
Supervised learning involves training a machine learning model on a labeled dataset, where each input has a corresponding output. The goal is to map the input data to the output data, often used in tasks like classification and regression, such as spam filtering. Unsupervised learning, on the other hand, works with unlabeled data, and the algorithm tries to find patterns or groupings inherently present in the data without specific supervision, often used in clustering and association tasks.

Identify the types of classification algorithms and distinguish between a binary classifier and a multi-class classifier with examples.
Classification algorithms in machine learning can be broadly categorized into binary classifiers and multi-class classifiers. Binary classifiers handle problems with two possible outcomes, such as 'Spam' or 'Not Spam'. Multi-class classifiers, however, deal with situations having more than two outcomes, like classifying types of crops or genres of music. Both types use labeled training data to learn and predict the categories of new data points.

What impacts do outliers have on regression analysis and how can they be mitigated?
Outliers in regression analysis are observations significantly distant from other data points. They can distort or impact the calculated relationships between variables, leading to biased or misleading results. To mitigate these effects, methods such as removing outliers, transforming data, or using robust regression techniques that reduce the influence of outliers can be employed. Identifying and handling outliers properly is crucial for building accurate and reliable regression models.

What is the importance of predictive algorithms in machine learning and how do they benefit from larger datasets?
Predictive algorithms are crucial in machine learning as they enable computers to learn from historical data and make informed predictions for new data. These algorithms benefit significantly from larger datasets as the extensive data helps refine the model with more patterns and correlations, resulting in improved accuracy and the ability to generalize better to new, unseen data. More data typically leads to the development of more robust and reliable predictive models.

How does reinforcement learning differ from supervised and unsupervised learning in terms of goal and mechanism?
Reinforcement learning differs from supervised and unsupervised learning by focusing on learning through interaction with an environment to maximize rewards. The agent receives feedback in the form of rewards or penalties based on its actions, learning a policy to select actions to achieve the highest reward over time. This is unlike supervised learning, where the model learns from labeled data, or unsupervised learning, where it finds patterns in unlabeled data without direct feedback. Reinforcement learning is particularly useful for dynamic systems like robotics or game playing.

Explain the role of hyperplanes in Support Vector Machines and how they classify data.
In Support Vector Machines (SVM), a hyperplane is used to segregate the data into different classes. It is a decision boundary that optimally separates data points of different classes with the maximum margin. For linearly separable data, a single straight hyperplane is used, while for non-linear data, SVM uses kernel tricks to transform the data into higher dimensions where a linear hyperplane can then separate the classes, thus classifying data accurately.

How are decision trees used in both classification and regression problems, and what similarities do they share in these applications?
Decision trees can be used for both classification and regression problems. In classification, they partition data into classes based on feature values, represented in a tree-like model where each node denotes a test on an attribute. In regression, they estimate outcomes for continuous labels similarly but focus on minimizing the prediction error for numerical outputs. Despite these differences, both applications involve splitting data recursively according to feature thresholds and organizing results into a hierarchical structure readable as a tree.

