Understanding Machine Learning Basics

Machine Learning is a subset of artificial intelligence focused on developing algorithms that enable computers to learn from data and past experiences. It is classified into three main types: supervised learning, unsupervised learning, and reinforcement learning, each with distinct methods and applications. Key applications include regression analysis for predicting continuous values and classification algorithms for categorizing data into distinct classes.

Uploaded by

Rishabh Tiwari

Machine Learning

Machine learning is a subset of artificial intelligence that is mainly concerned with the
development of algorithms that allow a computer to learn from data and past experience
on its own.

With the help of sample historical data, known as training data, machine learning algorithms
build a model that makes predictions or decisions without being explicitly programmed. Machine
learning brings computer science and statistics together to create predictive models. It
constructs or uses algorithms that learn from historical data.

A machine has the ability to learn if it can improve its performance by gaining more data.

A machine learning system learns from historical data, builds prediction models, and,
whenever it receives new data, predicts the output for it. The accuracy of the predicted output
depends in part on the amount of data: a larger, representative dataset generally helps to build
a better model, which predicts the output more accurately.

Features of Machine Learning:


o Machine learning uses data to detect various patterns in a given dataset.
o It can learn from past data and improve automatically.
o It is a data-driven technology.
o Machine learning is similar to data mining in that it also deals with large
amounts of data.
Classification of Machine Learning
At a broad level, machine learning can be classified into three types:

1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning

1) Supervised Learning
Supervised learning is a type of machine learning method in which we provide sample
labeled data to the machine learning system in order to train it, and on that basis, it
predicts the output.

The system creates a model using labeled data to understand the dataset and learn about
each example. Once training and processing are done, we test the model by providing
sample data to check whether it predicts the correct output.
The goal of supervised learning is to map input data to output data. Supervised
learning is based on supervision, much as a student learns under the
supervision of a teacher. An example of supervised learning is spam filtering.

Supervised learning can be further grouped into two categories of algorithms:

o Classification
o Regression

2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any
supervision.

The machine is trained on a set of data that has not been labeled,
classified, or categorized, and the algorithm needs to act on that data without any
supervision. The goal of unsupervised learning is to restructure the input data into new
features or into groups of objects with similar patterns.

In unsupervised learning, we don't have a predetermined result. The machine tries to find
useful insights from a large amount of data. It can be further classified into two
categories of algorithms:

o Clustering
o Association
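
To make the idea of grouping objects with similar patterns concrete, here is a minimal clustering sketch. It assumes k-means, which is one common clustering algorithm (the text above does not name a specific one), and uses made-up one-dimensional points and hand-picked starting centroids. The function name `kmeans_1d` is hypothetical.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Repeatedly assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Made-up, unlabeled data: no category is given for any point.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[1.0, 12.0])
print(centroids, clusters)
```

No labels are supplied anywhere; the two groups emerge only from how close the points are to each other.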
3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method in which a learning agent
gets a reward for each right action and a penalty for each wrong action. The agent
learns automatically from this feedback and improves its performance. In reinforcement
learning, the agent interacts with the environment and explores it. The goal of the agent
is to collect the most reward, and in pursuing that goal it improves its performance.
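
The reward-and-penalty loop can be sketched with tabular Q-learning, which is one well-known reinforcement learning method (the text above does not name a specific algorithm). Everything in the example is an assumption made for illustration: a five-cell corridor, a reward of +1 at the right end, a penalty of -1 at the left end, the values of `ALPHA` and `GAMMA`, and a systematic sweep over every state and action in place of random exploration so that the result is reproducible.

```python
GOAL, PIT = 4, 0                      # terminal cells at the two ends of the corridor
ACTIONS = {"left": -1, "right": +1}
ALPHA, GAMMA = 0.5, 0.9               # learning rate and discount factor (assumed values)


def step(state, action):
    """Environment: move one cell and return (next_state, reward)."""
    next_state = state + ACTIONS[action]
    if next_state == GOAL:
        return next_state, 1.0        # reward for reaching the goal
    if next_state == PIT:
        return next_state, -1.0       # penalty for falling into the pit
    return next_state, 0.0


# q[state][action] estimates the long-term reward of taking that action.
q = {s: {a: 0.0 for a in ACTIONS} for s in range(5)}

for _ in range(100):
    for state in (1, 2, 3):           # non-terminal cells
        for action in ACTIONS:        # try every action instead of exploring randomly
            next_state, reward = step(state, action)
            future = 0.0 if next_state in (GOAL, PIT) else max(q[next_state].values())
            q[state][action] += ALPHA * (reward + GAMMA * future - q[state][action])

# The learned policy picks the highest-valued action in each cell.
policy = {s: max(q[s], key=q[s].get) for s in (1, 2, 3)}
print(policy)
```

After training, the agent prefers "right" in every cell, because that direction leads to the reward and away from the penalty.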
Applications of Machine learning
Supervised Learning

Regression Analysis
Regression analysis is a statistical method to model the relationship between a dependent (target)
variable and one or more independent (predictor) variables. More specifically,
regression analysis helps us to understand how the value of the dependent variable changes
with an independent variable when the other independent variables are held fixed. It
predicts continuous/real values such as temperature, age, salary, price, etc.

Example: Suppose there is a marketing company A that runs various advertisements
every year and earns sales from them. The list below shows the advertising spend of the
company in the last 5 years and the corresponding sales:

Now, the company wants to spend $200 on advertising in the year 2019 and wants to
predict the sales for that year. To solve this type of prediction problem in
machine learning, we need regression analysis.
In regression, we fit a line or curve to the given datapoints on a graph of the variables. Using
this fit, the machine learning model can make predictions about the data. In simple
words, "Regression finds the line or curve that best fits the datapoints on the target-
predictor graph, in such a way that the vertical distance between the datapoints and the
regression line is minimum." The line generally does not pass through every datapoint.
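
As a rough sketch of how such a prediction could be made, the snippet below fits a straight line by ordinary least squares, one standard way of minimizing the vertical distances. The spend and sales figures are invented for illustration and are not the company's data; `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    return slope * x + intercept


# Invented figures for five past years (advertising spend in $, sales in $).
ad_spend = [100, 120, 140, 160, 180]
sales = [1050, 1190, 1420, 1590, 1750]

slope, intercept = fit_line(ad_spend, sales)
forecast = predict(200, slope, intercept)   # predicted sales for a $200 spend
print(slope, intercept, forecast)
```

With these invented numbers the fitted line is sales = 9 * spend + 140, so the forecast for a $200 spend is 1940. Real data would give different coefficients.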

Some examples of regression are:

o Prediction of rain using temperature and other factors
o Determining market trends
o Prediction of road accidents due to rash driving

Terminology Related to Regression Analysis:

o Dependent Variable: The main factor in regression analysis that we want to
predict or understand is called the dependent variable. It is also called the target
variable.
o Independent Variable: The factors that affect the dependent variable, or that
are used to predict its values, are called independent
variables, also called predictors.
o Outliers: An outlier is an observation with either a very low or a very high
value in comparison to other observed values. An outlier may distort the result,
so outliers should be identified and handled before fitting.
o Underfitting and Overfitting: If our algorithm works well with the training dataset
but not with the test dataset, the problem is called overfitting. If our
algorithm does not perform well even with the training dataset, the problem is
called underfitting.
Linear Regression:

o It is one of the simplest regression algorithms, and
shows the relationship between continuous variables.
o It is used for solving regression problems in machine learning.
o Linear regression shows the linear relationship between the independent variable
(X-axis) and the dependent variable (Y-axis), hence the name linear regression.
o If there is only one input variable (x), it is called simple
linear regression. If there is more than one input variable, it
is called multiple linear regression.

o Below is the mathematical equation for linear regression:

Y = aX + b

Here, Y = dependent variable (target variable),
X = independent variable (predictor variable),
a and b are the linear coefficients (a is the slope of the line and b is its intercept)
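
One common way to choose a and b, assuming the ordinary least-squares criterion (the notes do not name a fitting method), is to minimize the sum of squared vertical distances over n observations (x_i, y_i). Here x̄ and ȳ denote the sample means; the index i and the count n are notation introduced for this sketch.

```latex
\min_{a,\,b}\ \sum_{i=1}^{n} \bigl( y_i - (a x_i + b) \bigr)^2
\quad\Longrightarrow\quad
a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - a\,\bar{x}
```

The formula for b shows that the fitted line always passes through the point of means (x̄, ȳ).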

Example: Prediction of house price based upon size


Decision Tree Regression:
o Decision Tree is a supervised learning algorithm which can be used for solving both
classification and regression problems.
o It can solve problems for both categorical and numerical data.
o Decision tree regression builds a tree-like structure in which each internal node represents
the "test" for an attribute, each branch represents the result of the test, and each leaf node
represents the final decision or result.
o A decision tree is constructed starting from the root node/parent node (dataset), which
splits into left and right child nodes (subsets of the dataset). These child nodes are further
divided into their own child nodes, and so become the parent nodes of those nodes.
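
A single split of the kind described above can be sketched as follows. The example assumes the split is chosen to minimize the squared error of predicting each side by its mean, which is a common criterion for regression trees but is not stated in the notes. The data and the names `best_split` and `predict` are made up. A full tree would apply the same step recursively to each child node.

```python
def sum_squared_error(values):
    """Squared error of predicting every value by the mean of the group."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Try a threshold between each pair of adjacent x values and keep the
    one whose two child nodes have the smallest total squared error."""
    best = None
    candidates = sorted(set(xs))
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        error = sum_squared_error(left) + sum_squared_error(right)
        if best is None or error < best[0]:
            best = (error, threshold, sum(left) / len(left), sum(right) / len(right))
    return best[1:]


# Made-up data: one feature and one numerical target.
sizes = [1, 2, 3, 10, 11, 12]
prices = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
threshold, left_mean, right_mean = best_split(sizes, prices)


def predict(x):
    # Root node test: go left or right, then return that leaf's mean.
    return left_mean if x <= threshold else right_mean


print(threshold, predict(2), predict(11))
```

Here the root node tests whether the size is at most 6.5, and each leaf returns the mean target of the training points that reached it.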

Example:
Here the model is trying to predict whether a person is fit or not.
Classification Algorithm

Supervised machine learning algorithms can be broadly classified into regression and
classification algorithms. Regression algorithms predict continuous
values, but to predict categorical values, we need classification algorithms.

The classification algorithm is a supervised learning technique that is used to identify the
category of new observations on the basis of training data. In classification, a program learns from
the given dataset or observations and then classifies each new observation into one of a number of
classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc.

Unlike regression, the output variable of classification is a category, not a value, such as "Green
or Blue", "fruit or animal", etc. Since classification is a supervised learning technique,
it takes labeled input data, which means each input comes with its corresponding output.
The algorithm that implements classification on a dataset is known as a classifier.
There are two types of Classifications:

o Binary Classifier: If the classification problem has only two possible outcomes,
then the classifier is called a binary classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
o Multi-class Classifier: If a classification problem has more than two outcomes,
then the classifier is called a multi-class classifier.
Examples: classification of types of crops, classification of types of music.
Logistic Regression in Machine Learning

o Logistic regression predicts the output of a categorical dependent variable.
Therefore, the outcome must be a categorical or discrete value. It can be either Yes
or No, 0 or 1, True or False, etc., but instead of giving the exact value as 0 or 1, it
gives probabilistic values that lie between 0 and 1.
o Logistic regression is very similar to linear regression except in how they
are used. Linear regression is used for solving regression problems,
whereas logistic regression is used for solving classification problems.
o In logistic regression, instead of fitting a regression line, we fit an "S"-shaped
logistic function, whose output is bounded between the two limiting values 0 and 1.
o The curve from the logistic function indicates the likelihood of something, such as
whether cells are cancerous or not, or whether a mouse is obese or not based on its
weight.
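
The S-shaped curve is commonly written as the sigmoid function 1 / (1 + e^(-z)), applied to a linear combination of the inputs. The sketch below shows how a probability between 0 and 1 becomes a class label. The `weight`, `bias`, and 0.5 threshold are made-up values chosen for illustration, not fitted parameters, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, weight, bias):
    """Probability of the positive class for a single input feature x."""
    return sigmoid(weight * x + bias)


def predict_label(x, weight, bias, threshold=0.5):
    """Turn the probability into a 0/1 class label."""
    return 1 if predict_probability(x, weight, bias) >= threshold else 0


# Assumed coefficients: the probability crosses 0.5 at x = 5.
weight, bias = 1.0, -5.0
for x in (2, 5, 8):
    print(x, round(predict_probability(x, weight, bias), 3), predict_label(x, weight, bias))
```

Inputs well below 5 get probabilities near 0, inputs well above 5 get probabilities near 1, and the curve changes most quickly around the crossover point.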
K-Nearest Neighbor (KNN) Algorithm for Machine Learning
o K-Nearest Neighbor is one of the simplest Machine Learning algorithms based on
Supervised Learning technique.
o The K-NN algorithm assumes that similar cases lie close together: it compares the new
case/data with the available cases and puts the new case into the category it is most
similar to.
o The K-NN algorithm stores all the available data and classifies a new data point based
on similarity. This means that when new data appears, it can be easily classified
into a well-suited category by using the K-NN algorithm.
o The K-NN algorithm can be used for regression as well as for classification, but it is
mostly used for classification problems.
o It is also called a lazy learner algorithm because it does not learn from the
training set immediately; instead, it stores the dataset and, at the time of
classification, performs an action on the dataset.
KNN Algorithm working
Suppose there are two categories, Category A and Category B, and we have a new data point
x1. Which of these categories does the data point belong to? To solve this type of problem, we need
a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular
data point. Consider the diagram below:

The K-NN working can be explained on the basis of the below algorithm:

o Step-1: Select the number K of neighbors.
o Step-2: Calculate the Euclidean distance from the new data point to each point in the
training data.
o Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
o Step-4: Among these K neighbors, count the number of data points in each
category.
o Step-5: Assign the new data point to the category for which the number of
neighbors is maximum.
o Step-6: Our model is ready.
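
The steps above can be sketched in a few lines of Python. The training points, the choice K = 3, and the function name `knn_classify` are assumptions made for illustration. Distances are computed from the new point to every stored training point, and ties between categories are not handled specially.

```python
import math
from collections import Counter


def knn_classify(training, new_point, k):
    """Classify new_point by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from the new point to every stored point.
    distances = [
        (math.dist(features, new_point), label) for features, label in training
    ]
    # Step 3: keep the k closest points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: count how many of those neighbors fall in each category.
    votes = Counter(label for _, label in nearest)
    # Step 5: the category with the most neighbors wins.
    return votes.most_common(1)[0][0]


# Made-up labeled points for Category A and Category B.
training = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B"),
]

print(knn_classify(training, (2, 2), k=3))
print(knn_classify(training, (5, 5), k=3))
```

There is no separate training phase: the "model" is just the stored data, which is why K-NN is called a lazy learner.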
Support Vector Machine

Support Vector Machine (SVM) is one of the most popular supervised learning algorithms.
It is primarily used for classification problems in machine learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can segregate
n-dimensional space into classes so that we can easily put the new data point in the correct
category in the future. This best decision boundary is called a hyperplane.

Types of SVM
SVM can be of two types:

o Linear SVM: Linear SVM is used for linearly separable data. If a
dataset can be classified into two classes by using a single straight line, the
data is termed linearly separable, and the classifier used is called a linear SVM
classifier.
o Non-linear SVM: Non-linear SVM is used for data that is not linearly separable. If
a dataset cannot be classified by using a straight line, the data is
termed non-linear, and the classifier used is called a non-linear SVM classifier.
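
Once a linear SVM has been trained, classifying a new point only requires checking which side of the hyperplane it falls on. The sketch below shows that final step only. The `weights` and `bias` are hand-picked to describe the line x + y = 7 and are not the result of SVM training, which would choose the hyperplane with the largest margin between the classes. The function names are hypothetical.

```python
import math


def decision_value(point, weights, bias):
    """Signed score w . x + b: positive on one side of the hyperplane,
    negative on the other, zero exactly on it."""
    return sum(w * x for w, x in zip(weights, point)) + bias


def classify(point, weights, bias):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(point, weights, bias) >= 0 else -1


def distance_to_hyperplane(point, weights, bias):
    """Perpendicular distance from the point to the decision boundary."""
    norm = math.sqrt(sum(w * w for w in weights))
    return abs(decision_value(point, weights, bias)) / norm


# Hand-picked hyperplane x + y - 7 = 0 (an assumption, not a trained model).
weights, bias = (1.0, 1.0), -7.0

print(classify((6, 6), weights, bias))                 # one side of the line
print(classify((1, 2), weights, bias))                 # the other side
print(distance_to_hyperplane((3, 4), weights, bias))   # a point on the line
```

The same functions work in any number of dimensions, since the hyperplane is defined by one weight per feature plus a bias.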
