Introduction to Machine Learning Concepts

Uploaded by

morkheri32
Copyright
© All Rights Reserved
UNIT – 1

(MACHINE LEARNING CONCEPTS)


What is Machine Learning
In the real world, we are surrounded by humans who can learn from their experiences, and we have computers or machines that work on our instructions. But can a machine also learn from experience or past data as a human does? This is where machine learning comes in.

Introduction to Machine Learning

Machine learning is a subset of artificial intelligence that focuses primarily on the creation of algorithms that enable a computer to learn independently from data and previous experience. Arthur Samuel first used the term "machine learning" in 1959. It can be summarized as follows:

Without being explicitly programmed, machine learning enables a machine to automatically learn from data, improve its performance from experience, and make predictions.

Machine learning algorithms build a mathematical model from sample historical data, known as training data, that helps make predictions or decisions without being explicitly programmed. To develop predictive models, machine learning brings together statistics and computer science. In machine learning, we construct or use algorithms that learn from historical data. In general, performance improves as we provide more relevant, good-quality data, although not in strict proportion to its quantity.

A machine can learn if it can gain more data to improve its performance.

How does Machine Learning work


A machine learning system learns from previous data, builds prediction models, and predicts the output whenever it receives new data. The accuracy of the predicted output depends on the amount of data, because more data generally helps to build a better model.
Let's say we have a complex problem in which we need to make predictions. Instead of writing code for it, we feed the data to generic algorithms, which build the logic from the data and predict the output. Machine learning has changed the way we think about such problems. The operation of a machine learning algorithm is depicted in the following block diagram:

Features of Machine Learning:


o Machine learning uses data to detect various patterns in a given dataset.
o It can learn from past data and improve automatically.
o It is a data-driven technology.
o Machine learning is very similar to data mining, as it also deals with huge amounts of data.

Classification of Machine Learning


At a broad level, machine learning can be classified into three types:

1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
1) Supervised Learning
In supervised learning, sample labeled data are provided to the machine learning system
for training, and the system then predicts the output based on the training data.

The system uses labeled data to build a model that learns the relationship between each input and its label. After training and processing are done, we test the model with sample data to see if it can accurately predict the output.

The objective of supervised learning is to map input data to output data. Supervised learning depends on supervision, in the same way that a student learns under the supervision of a teacher. Spam filtering is an example of supervised learning.

Supervised learning can be grouped further in two categories of algorithms:

o Classification
o Regression
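
The idea of learning from labeled examples and then predicting the label of new data can be sketched in a few lines. This is only an illustration, not a method prescribed by these notes: the email features, the four training examples, and the choice of a 1-nearest-neighbour rule are all assumptions made for the example.

```python
# Hypothetical labeled training data: (features, label).
# Features are (number of links, number of exclamation marks) in an email.
training_data = [
    ((0, 0), "ham"),
    ((1, 0), "ham"),
    ((5, 4), "spam"),
    ((7, 6), "spam"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(features):
    """Return the label of the closest training example (1-nearest neighbour)."""
    _, label = min(
        training_data,
        key=lambda pair: squared_distance(pair[0], features),
    )
    return label


print(predict((6, 5)))  # resembles the spam examples
print(predict((0, 1)))  # resembles the ham examples
```

Here the "training" is simply storing the labeled examples; most supervised algorithms instead fit the parameters of a model to them.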
2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any
supervision.

The machine is trained on a set of data that has not been labeled, classified, or categorized, and the algorithm needs to act on that data without any supervision. The goal of unsupervised learning is to restructure the input data into new features or groups of objects with similar patterns.

In unsupervised learning, we don't have a predetermined result. The machine tries to find useful insights from a huge amount of data. It can be further classified into two categories of algorithms:

o Clustering
o Association

3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method in which a learning agent gets a reward for each right action and a penalty for each wrong action. The agent learns automatically from this feedback and improves its performance. In reinforcement learning, the agent interacts with the environment and explores it. The goal of the agent is to collect the most reward points, and hence it improves its performance.

A robotic dog that automatically learns the movement of its limbs is an example of reinforcement learning.
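
The reward-and-penalty loop can be illustrated with a very small sketch. Everything here is an assumption for the example: the two-action environment, the `reward_for` function, and the epsilon-greedy strategy are hypothetical stand-ins, not a description of how any particular robot is trained.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable


def reward_for(action):
    """Hypothetical environment: action 1 is right (+1), action 0 is wrong (-1)."""
    return 1 if action == 1 else -1


value = [0.0, 0.0]  # the agent's running estimate of each action's reward
count = [0, 0]      # how often each action has been tried
epsilon = 0.2       # fraction of steps spent exploring at random

for _ in range(100):
    if random.random() < epsilon:
        action = random.randrange(2)                      # explore
    else:
        action = max(range(2), key=lambda a: value[a])    # exploit best estimate
    reward = reward_for(action)
    count[action] += 1
    # Incremental average of the rewards seen so far for this action.
    value[action] += (reward - value[action]) / count[action]

best_action = max(range(2), key=lambda a: value[a])
print(best_action, value)
```

The agent is never told which action is correct; it only sees rewards and penalties, and its estimates steer it toward the rewarded action.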

Note: We will learn about the above types of machine learning in detail in later chapters.

History of Machine Learning


Some 40 to 50 years ago, machine learning was science fiction, but today it is part of our daily life. Machine learning is making our day-to-day life easier, from self-driving cars to Amazon's virtual assistant "Alexa". However, the idea behind machine learning is old and has a long history. Below are some milestones in the history of machine learning:
The early history of Machine Learning (Pre-1940):

o 1834: Charles Babbage, the father of the computer, conceived a device that could be programmed with punch cards. Although the machine was never built, all modern computers rely on its logical structure.
o 1936: Alan Turing presented a theory of how a machine can determine and execute a set of instructions.

The era of stored program computers:

o 1943: A human neural network was modeled with an electrical circuit. In 1950, scientists started applying this idea and analyzed how human neurons might work.
o 1945: "ENIAC", the first electronic general-purpose computer, was completed. After that, stored-program computers such as EDSAC in 1949 and EDVAC in 1951 were built.

Computer machinery and intelligence:


o 1950: Alan Turing published a seminal paper, "Computing Machinery and Intelligence," on the topic of artificial intelligence. In his paper, he asked, "Can machines think?"

Machine intelligence in Games:

o 1952: Arthur Samuel, a pioneer of machine learning, created a program that helped an IBM computer to play checkers. The program performed better the more it played.
o 1959: The term "Machine Learning" was coined by Arthur Samuel.

The first "AI" winter:

o The period from 1974 to 1980 was a tough time for AI and ML researchers, and it came to be called the AI winter.
o During this period, machine translation failed to meet expectations and interest in AI declined, which led governments to reduce research funding.

Machine Learning from theory to reality

o 1959: The first neural network was applied to a real-world problem, removing echoes over phone lines using an adaptive filter.
o 1985: Terry Sejnowski and Charles Rosenberg invented the neural network NETtalk, which was able to teach itself how to correctly pronounce 20,000 words in one week.
o 1997: IBM's Deep Blue computer won a chess match against world champion Garry Kasparov, becoming the first computer to defeat a reigning world chess champion in a match.

Machine Learning in the 21st century

2006:

o Geoffrey Hinton and his group presented the idea of deep learning using deep belief networks.
o Amazon launched the Elastic Compute Cloud (EC2) to provide scalable computing resources that made it easier to create and deploy machine learning models.

2007:

o The Netflix Prize competition, which had opened in October 2006, tasked participants with increasing the accuracy of Netflix's recommendation algorithm.
o Reinforcement learning made notable progress when a group of researchers used it to train a computer to play backgammon at a high level.

2008:

o Google released the Google Prediction API, a cloud-based service that allowed developers to integrate machine learning into their applications.
o Restricted Boltzmann Machines (RBMs), a kind of generative neural network, gained attention for their ability to model complex data distributions.

2009:

o Deep learning gained ground as researchers showed its effectiveness in various tasks, including speech recognition and image classification.
o The term "Big Data" gained popularity, highlighting the challenges and opportunities associated with handling huge datasets.

2010:

o The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) was introduced, driving advances in computer vision and prompting the development of deep convolutional neural networks (CNNs).

2011:

o IBM's Watson defeated human champions on Jeopardy!, demonstrating the potential of question-answering systems and natural language processing.
2012:

o AlexNet, a deep CNN created by Alex Krizhevsky, won the ILSVRC, significantly improving image classification accuracy and establishing deep learning as a dominant approach in computer vision.
o Google's Brain project, led by Andrew Ng and Jeff Dean, used deep learning to train a neural network to recognize cats from unlabeled YouTube videos.

2014:

o Ian Goodfellow and colleagues introduced generative adversarial networks (GANs), which made it possible to create realistic synthetic data.
o Google acquired the startup DeepMind Technologies, which focused on deep learning and artificial intelligence.
o Facebook presented the DeepFace system, which achieved near-human accuracy in facial recognition.
o Attention mechanisms were introduced, improving the performance of sequence-to-sequence models in tasks like machine translation.

2015:

o Microsoft released CNTK (later renamed the Cognitive Toolkit), an open-source deep learning library.

2016:

o The goal of explainable AI, which focuses on making machine learning models easier to understand, received growing attention.
o AlphaGo, a program created by DeepMind at Google, defeated world champion Go player Lee Sedol and demonstrated the potential of reinforcement learning in challenging games.

2017:

o Google's DeepMind created AlphaGo Zero, which reached superhuman strength at Go without human game data, using only reinforcement learning.
o Transfer learning gained prominence, allowing pretrained models to be used for different tasks with limited data.
o Generative models such as variational autoencoders (VAEs) and Wasserstein GANs made better synthesis and generation of complex data possible.

These are only some of the notable advances and achievements in machine learning during this period. The field has continued to advance rapidly beyond 2017, with new breakthroughs, techniques, and applications emerging.

Machine Learning at present:


The field of machine learning has made significant strides in recent years, and its applications are numerous, including self-driving cars, Amazon Alexa, chatbots, and recommender systems. It incorporates clustering, classification, decision tree, and SVM algorithms, and covers supervised, unsupervised, and reinforcement learning.

Modern machine learning models can be used for making various predictions, including weather prediction, disease prediction, stock market analysis, and so on.

Applications of Machine learning


Machine learning is a buzzword for today's technology, and it is growing very rapidly day by day. We use machine learning in our daily life, often without knowing it, in products such as Google Maps, Google Assistant, Alexa, etc. Below are some of the most popular real-world applications of machine learning:
1. Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used to identify objects, persons, places, etc. in digital images. A popular use case of image recognition and face detection is automatic friend tagging suggestion:

Facebook provides a feature of automatic friend tagging suggestion. Whenever we upload a photo with our Facebook friends, we automatically get a tagging suggestion with names, and the technology behind this is machine learning's face detection and recognition algorithm.

It is based on the Facebook project named "Deep Face," which is responsible for face recognition and person identification in the picture.
2. Speech Recognition
While using Google, we get an option of "Search by voice". It comes under speech recognition, which is a popular application of machine learning.

Speech recognition is the process of converting voice instructions into text, and it is also known as "speech to text" or "computer speech recognition." At present, machine learning algorithms are widely used in speech recognition applications. Google Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow voice instructions.

3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the shortest route and predicts the traffic conditions.

It predicts whether traffic is clear, slow-moving, or heavily congested in two ways:

o Real-time location of vehicles from the Google Maps app and sensors
o Average time taken on past days at the same time of day

Everyone who uses Google Maps is helping to make the app better. It takes information from the user and sends it back to its database to improve performance.

4. Product recommendations:
Machine learning is widely used by various e-commerce and entertainment companies such as Amazon, Netflix, etc., for product recommendation to the user. Whenever we search for a product on Amazon, we start getting advertisements for the same product while surfing the internet in the same browser, and this is because of machine learning.

Google understands the user's interests using various machine learning algorithms and suggests products accordingly.

Similarly, when we use Netflix, we find recommendations for series, movies, etc., and this is also done with the help of machine learning.
5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars, in which machine learning plays a significant role. Tesla, a well-known car manufacturer, is working on self-driving cars. It trains machine learning models to detect people and objects while driving.

6. Email Spam and Malware Filtering:


Whenever we receive a new email, it is filtered automatically as important, normal, or spam. Important mail arrives in our inbox marked with the important symbol and spam emails go to our spam box, and the technology behind this is machine learning. Below are some spam filters used by Gmail:

o Content Filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters

Some machine learning algorithms such as Multi-Layer Perceptron, Decision tree, and Naïve Bayes classifier are used for email spam filtering and malware detection.

7. Virtual Personal Assistant:


We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As the name suggests, they help us find information using our voice instructions. These assistants can help us in various ways just by our voice instructions, such as playing music, calling someone, opening an email, scheduling an appointment, etc.

Machine learning algorithms are an important part of these virtual assistants.

These assistants record our voice instructions, send them to a server in the cloud, decode them using ML algorithms, and act accordingly.
8. Online Fraud Detection:
Machine learning makes our online transactions safe and secure by detecting fraudulent transactions. Whenever we perform an online transaction, fraud can take place in various ways, such as fake accounts, fake IDs, and theft of money in the middle of a transaction. To detect this, a feed-forward neural network helps us by checking whether a transaction is genuine or fraudulent.

For each genuine transaction, the output is converted into hash values, and these values become the input for the next round. Each genuine transaction follows a specific pattern, which changes for a fraudulent transaction; the network detects this change and makes our online transactions more secure.

9. Stock Market trading:


Machine learning is widely used in stock market trading. In the stock market, there is always a risk of ups and downs in share prices, so machine learning's long short-term memory (LSTM) neural network is used to predict stock market trends.

10. Medical Diagnosis:


In medical science, machine learning is used for diagnosing diseases. With this, medical technology is growing very fast and is able to build 3D models that can predict the exact position of lesions in the brain.

It helps in finding brain tumors and other brain-related diseases easily.

11. Automatic Language Translation:


Nowadays, if we visit a new place and do not know the language, it is not a problem at all, as machine learning helps us here too by converting the text into a language we know. Google's GNMT (Google Neural Machine Translation) provides this feature. It is a neural machine translation system that translates text into our familiar language, and this is called automatic translation.

The technology behind automatic translation is a sequence-to-sequence learning algorithm, which can be combined with image recognition to translate text from one language to another.
Machine learning Life cycle
Machine learning has given computer systems the ability to learn automatically without being explicitly programmed. But how does a machine learning system work? It can be described using the machine learning life cycle. The machine learning life cycle is a cyclic process for building an efficient machine learning project. The main purpose of the life cycle is to find a solution to the problem or project.

Machine learning life cycle involves seven major steps, which are given below:

o Gathering Data
o Data preparation
o Data Wrangling
o Analyse Data
o Train the model
o Test the model
o Deployment
The most important thing in the complete process is to understand the problem and to know its purpose. Therefore, before starting the life cycle, we need to understand the problem, because a good result depends on a good understanding of the problem.

In the complete life cycle process, to solve a problem, we create a machine learning system called a "model", and this model is created by "training" it. But to train a model we need data; hence, the life cycle starts with collecting data.

1. Gathering Data:
Data gathering is the first step of the machine learning life cycle. The goal of this step is to identify and obtain all the data related to the problem.

In this step, we need to identify the different data sources, as data can be collected from various sources such as files, databases, the internet, or mobile devices. It is one of the most important steps of the life cycle. The quantity and quality of the collected data determine the quality of the output. In general, the more relevant data we have, the more accurate the prediction will be.
This step includes the below tasks:

o Identify various data sources
o Collect data
o Integrate the data obtained from different sources

By performing the above tasks, we get a coherent set of data, also called a dataset. It will be used in further steps.

2. Data preparation
After collecting the data, we need to prepare it for further steps. Data preparation is a
step where we put our data into a suitable place and prepare it to use in our machine
learning training.

In this step, first, we put all data together, and then randomize the ordering of data.

This step can be further divided into two processes:

o Data exploration:
It is used to understand the nature of the data that we have to work with. We need to understand the characteristics, format, and quality of the data.
A better understanding of the data leads to a more effective outcome. In this step, we find correlations, general trends, and outliers.
o Data pre-processing:
The next step is to preprocess the data for analysis.

3. Data Wrangling
Data wrangling is the process of cleaning raw data and converting it into a usable format. It involves cleaning the data, selecting the variables to use, and transforming the data into a proper format to make it more suitable for analysis in the next step. It is one of the most important steps of the complete process. Cleaning of data is required to address quality issues.

Not all the data we have collected is necessarily useful. In real-world applications, collected data may have various issues, including:

o Missing values
o Duplicate data
o Invalid data
o Noise

So, we use various filtering techniques to clean the data.

It is essential to detect and handle the above issues because they can negatively affect the quality of the outcome.
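
A minimal sketch of such a cleaning pass is shown here. The records, the valid age range of 0 to 120, and the choice to drop problem rows (rather than, say, filling in missing values) are assumptions made for illustration.

```python
# Hypothetical raw records: (name, age). None marks a missing value.
raw = [
    ("Asha", 34),
    ("Ben", None),   # missing value
    ("Asha", 34),    # duplicate of the first record
    ("Chen", -5),    # invalid: an age cannot be negative
    ("Dana", 41),
]


def clean(records):
    """Drop rows with missing values, out-of-range ages, or exact duplicates."""
    seen = set()
    cleaned = []
    for name, age in records:
        if age is None:            # missing value
            continue
        if not 0 <= age <= 120:    # invalid value
            continue
        if (name, age) in seen:    # duplicate
            continue
        seen.add((name, age))
        cleaned.append((name, age))
    return cleaned


print(clean(raw))
```

Noise is harder to handle with a simple rule and usually calls for techniques such as smoothing or outlier detection, which this sketch does not attempt.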

4. Data Analysis
Now the cleaned and prepared data is passed on to the analysis step. This step involves:

o Selection of analytical techniques
o Building models
o Review the result

The aim of this step is to build a machine learning model to analyze the data using various analytical techniques and review the outcome. It starts with determining the type of problem, where we select a machine learning technique such as classification, regression, cluster analysis, association, etc. We then build the model using the prepared data and evaluate it.

Hence, in this step, we take the data and use machine learning algorithms to build the
model.
5. Train Model
The next step is to train the model. In this step, we train our model to improve its performance and obtain a better outcome for the problem.

We use datasets to train the model using various machine learning algorithms. Training a model is required so that it can understand the various patterns, rules, and features.

6. Test Model
Once our machine learning model has been trained on a given dataset, then we test the
model. In this step, we check for the accuracy of our model by providing a test dataset
to it.

Testing the model determines the percentage accuracy of the model as per the
requirement of project or problem.
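
The train-then-test procedure can be sketched as follows. The dataset (hours studied versus pass or fail), the 70/30 split, and the simple threshold "model" are hypothetical choices for the example; the point is only that accuracy is measured on data the model did not see during training.

```python
import random

# Hypothetical labeled dataset: (hours studied, passed?) pairs.
data = [(hours, hours >= 5) for hours in range(10)]

random.seed(42)                       # fixed seed so the split is repeatable
random.shuffle(data)                  # randomize the ordering of the data
split = len(data) * 7 // 10           # 70% for training, 30% for testing
train, test = data[:split], data[split:]

# "Training": place a threshold halfway between the highest failing
# and the lowest passing example seen in the training set.
fails = [hours for hours, passed in train if not passed]
passes = [hours for hours, passed in train if passed]
threshold = (max(fails) + min(passes)) / 2


def model(hours):
    """Predict a pass when the hours exceed the learned threshold."""
    return hours > threshold


def accuracy(predict, examples):
    """Fraction of examples whose prediction matches the true label."""
    correct = sum(predict(x) == y for x, y in examples)
    return correct / len(examples)


test_accuracy = accuracy(model, test)
print(len(train), len(test), test_accuracy)
```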

7. Deployment
The last step of the machine learning life cycle is deployment, where we deploy the model in a real-world system.

If the model prepared above produces accurate results as per our requirement with acceptable speed, then we deploy it in the real system. But before deploying the project, we check whether it is improving its performance using the available data. The deployment phase is similar to making the final report for a project.
Machine Learning Models
A machine learning model is defined as a mathematical representation of the
output of the training process. Machine learning is the study of different algorithms
that can improve automatically through experience & old data and build the model. A
machine learning model is similar to computer software designed to recognize patterns
or behaviors based on previous experience or data. The learning algorithm discovers
patterns within the training data, and it outputs an ML model which captures these
patterns and makes predictions on new data.

Let's understand an example of an ML model where we are creating an app to recognize the user's emotions based on facial expressions. Creating such an app is possible with machine learning models, where we train a model by feeding it images of faces labeled with various emotions. Whenever this app is used to determine the user's mood, the trained model applies the patterns it learned from that data to the new image.

Hence, in simple words, we can say that a machine learning model is a simplified representation of something or a process. In this topic, we will discuss different machine learning models and their techniques and algorithms.

What is Machine Learning Model?


A machine learning model can be understood as a program that has been trained to find patterns within new data and make predictions. These models are represented as a mathematical function that takes requests in the form of input data, makes predictions on the input data, and then provides an output in response. First, a learning algorithm is run over a set of training data to reason over it, extract patterns from it, and learn from it. Once these models are trained, they can be used to make predictions on unseen data.

There are various types of machine learning models available based on different
business goals and data sets.

Classification of Machine Learning Models:


Based on different business goals and data sets, there are three learning models for algorithms. Each machine learning algorithm falls into one of the three models:

o Supervised Learning
o Unsupervised Learning
o Reinforcement Learning

Supervised Learning is further divided into two categories:

o Classification
o Regression
Unsupervised Learning is also divided into below categories:

o Clustering
o Association Rule
o Dimensionality Reduction

1. Supervised Machine Learning Models


Supervised learning is the simplest machine learning model to understand. Its input data is called training data, and each input has a known label or result as its output. So, it works on the principle of input-output pairs. It requires creating a function that can be trained using a training data set; the function is then applied to unknown data to make predictions. Supervised learning is task-based and tested on labeled data sets.

We can implement a supervised learning model on simple real-life problems. For example, if we have a dataset consisting of age and height, we can build a supervised learning model to predict a person's height based on their age.

Supervised Learning models are further classified into two categories:

Regression
In regression problems, the output is a continuous variable. Some commonly used
Regression models are as follows:

a) Linear Regression

Linear regression is the simplest machine learning model, in which we try to predict one output variable using one or more input variables. The representation of linear regression is a linear equation that combines a set of input values (x) into the predicted output (y) for those input values. With a single input, it is represented in the form of a line:

y = bx + c

where b is the slope of the line and c is the intercept.
The main aim of the linear regression model is to find the line that best fits the data points.

Linear regression is extended to multiple linear regression (find a plane of best fit) and
polynomial regression (find the best fit curve).
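
As an illustration, the best-fit line y = bx + c for a single input can be found with the ordinary least-squares formulas. The age and height values below are made up so that they lie exactly on a line, and `fit_line` is a name chosen for this sketch.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b*x + c with a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - b * mean_x
    return b, c


ages = [2, 4, 6, 8]               # hypothetical ages in years
heights = [86, 102, 118, 134]     # hypothetical heights in cm
b, c = fit_line(ages, heights)
print(b, c)                       # slope and intercept
print(b * 5 + c)                  # predicted height at age 5
```

With real, noisy data the points would not fall exactly on the line, and the same formulas would give the line that minimizes the sum of squared errors.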

b) Decision Tree

Decision trees are popular machine learning models that can be used for both regression and classification problems.

A decision tree uses a tree-like structure of decisions along with their possible consequences and outcomes. Each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf holds a prediction. Adding nodes lets a tree fit the training data more closely, but a tree with too many nodes tends to overfit and predict poorly on new data.

The advantage of decision trees is that they are intuitive and easy to implement, but a single tree is often less accurate than ensemble methods such as random forests.

Decision trees are widely used in operations research, specifically in decision analysis and strategic planning, and in machine learning.
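
The node, branch, and leaf structure can be made concrete with a small sketch. The tree below is written by hand rather than learned from data, and its attributes ("outlook", "windy") and outcomes are hypothetical.

```python
# A hand-written (not learned) tree. Each internal node tests one attribute,
# each branch is an outcome of that test, and each leaf is a predicted class.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "windy",
            "branches": {True: "stay in", False: "play"},
        },
        "rainy": "stay in",
        "overcast": "play",
    },
}


def classify(node, example):
    """Follow the branches that match the example until a leaf is reached."""
    while isinstance(node, dict):
        outcome = example[node["attribute"]]
        node = node["branches"][outcome]
    return node


print(classify(tree, {"outlook": "sunny", "windy": False}))
```

A learning algorithm would choose which attribute to test at each node automatically, typically by picking the test that best separates the training examples.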

c) Random Forest
Random forest is an ensemble learning method that consists of a large number of decision trees. Each decision tree in a random forest predicts an outcome, and the individual predictions are combined into the final outcome.

A random forest model can be used for both regression and classification problems.

For the classification task, the outcome of the random forest is taken from the majority
of votes. Whereas in the regression task, the outcome is taken from the mean or
average of the predictions generated by each tree.
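
The two ways of combining the trees' predictions can be sketched as below. The per-tree outputs are made-up values, and the function names are chosen for this example; building the trees themselves is not shown.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_votes):
    """Majority vote: the class predicted by the most trees wins."""
    return Counter(tree_votes).most_common(1)[0][0]


def aggregate_regression(tree_predictions):
    """Average of the numeric predictions made by the individual trees."""
    return mean(tree_predictions)


print(aggregate_classification(["spam", "ham", "spam"]))
print(aggregate_regression([10.0, 12.0, 14.0]))
```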

d) Neural Networks

Neural networks are a subset of machine learning and are also known as artificial neural networks. Neural networks are made up of artificial neurons and are designed in a way that resembles the structure and working of the human brain. Each artificial neuron connects with many other neurons in a neural network, and millions of such connected neurons create a sophisticated cognitive structure.

Neural networks have a multilayer structure containing one input layer, one or more hidden layers, and one output layer. As each neuron is connected to neurons in the next layer, data is passed from one layer to the next. Finally, data reaches the output layer of the neural network, which generates the output.
Neural networks depend on training data to learn and improve their accuracy. Once trained to a high accuracy, a neural network can classify and cluster data quickly and becomes a powerful machine learning and AI tool. One of the best-known applications of neural networks is Google's search algorithm.
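
How data flows from the input layer through a hidden layer to the output layer can be sketched with a tiny forward pass. The weights and biases below are hand-picked, hypothetical values (a real network would learn them during training), and the sigmoid activation is an assumption for the example.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


# Hypothetical weights: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden_w = [[0.5, -0.5], [1.0, 1.0]]
hidden_b = [0.0, -1.0]
output_w = [[1.0, -1.0]]
output_b = [0.0]

x = [1.0, 1.0]                                # input layer values
hidden = layer(x, hidden_w, hidden_b)         # hidden layer activations
output = layer(hidden, output_w, output_b)    # output layer activation
print(hidden, output)
```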

Classification
Classification models are the second type of supervised learning technique. They are used to draw conclusions from observed values in categorical form. For example, a classification model can identify whether an email is spam or not, or whether a buyer will purchase a product or not. Classification algorithms predict one of two or more classes and so categorize the output into different groups.

In classification, a classifier model is designed that classifies the dataset into different categories, and each category is assigned a label.

There are two types of classification in machine learning:

o Binary classification: If the problem has only two possible classes, it is called a binary classifier. For example, cat or dog, Yes or No.
o Multi-class classification: If the problem has more than two possible classes, it is a multi-class classifier.

Some popular classification algorithms are as below:

a) Logistic Regression

Logistic regression is used to solve classification problems in machine learning. It is similar to linear regression but is used to predict categorical variables. It can predict an output such as Yes or No, 0 or 1, True or False, etc. However, rather than giving exact values, it provides probabilistic values between 0 and 1.
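
The step from a linear score to a probability between 0 and 1 can be sketched as follows. The weight and bias are hypothetical values chosen so that the decision boundary falls at x = 4; in practice they would be fitted to training data.

```python
import math


def predict_probability(x, weight, bias):
    """Logistic (sigmoid) function applied to the linear score weight*x + bias."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))


def predict_class(x, weight, bias, threshold=0.5):
    """Turn the probability into a 0/1 class using a threshold."""
    return 1 if predict_probability(x, weight, bias) >= threshold else 0


weight, bias = 1.5, -6.0   # hypothetical fitted parameters
print(predict_probability(4.0, weight, bias))   # on the boundary
print(predict_class(6.0, weight, bias), predict_class(2.0, weight, bias))
```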

b) Support Vector Machine

Support vector machine, or SVM, is a popular machine learning algorithm that is widely used for classification and regression tasks, although it is mainly used to solve classification problems. The main aim of SVM is to find the best decision boundary in an N-dimensional space that can segregate data points into classes; this best decision boundary is known as a hyperplane. SVM selects the extreme vectors (data points) that define the hyperplane, and these vectors are known as support vectors.
c) Naïve Bayes

Naïve Bayes is another popular classification algorithm used in machine learning. It is
called so because it is based on Bayes' theorem and makes the naïve (independence)
assumption between the features, which is stated as follows:

Each naïve Bayes classifier assumes that the value of a specific feature is independent of
the value of any other feature, given the class. For example, suppose a fruit needs to be
classified based on color, shape, and taste. A fruit that is yellow, oval, and sweet will
be recognized as a mango, and each feature contributes to that decision independently of
the other features.
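The fruit example can be sketched as a tiny categorical naïve Bayes classifier. The training rows below are made up for illustration, and the add-one (Laplace) smoothing is an assumption added so that unseen feature values do not produce a zero probability; neither comes from this text.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count how often each class and each (class, feature, value) occurs."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values_seen = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            values_seen[i].add(value)
    return class_counts, feature_counts, values_seen


def predict(row, model):
    """Score each class as P(class) times the product of P(value | class)."""
    class_counts, feature_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # Independence assumption: multiply per-feature likelihoods.
            numerator = feature_counts[label][i][value] + 1  # add-one smoothing
            denominator = count + len(values_seen[i])
            score *= numerator / denominator
        scores[label] = score
    return max(scores, key=scores.get), scores


# Hypothetical training data: (color, shape, taste) -> fruit.
rows = [
    ("yellow", "oval", "sweet"),
    ("yellow", "oval", "sweet"),
    ("yellow", "long", "sweet"),
    ("yellow", "long", "sweet"),
    ("green", "round", "sour"),
    ("red", "round", "sweet"),
]
labels = ["mango", "mango", "banana", "banana", "apple", "apple"]

model = train_naive_bayes(rows, labels)
best, scores = predict(("yellow", "oval", "sweet"), model)
print(best)  # mango
```

The scores are unnormalized; dividing each by their sum would give class probabilities.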

2. Unsupervised Machine learning models


Unsupervised machine learning models work in the opposite way to supervised learning:
the model learns from an unlabeled training dataset and produces its output from that
unlabeled data alone. Using unsupervised learning, the model learns hidden patterns from
the dataset by itself, without any supervision.
Unsupervised learning models are mainly used to perform three tasks, which are as
follows:

o Clustering
Clustering is an unsupervised learning technique that involves grouping the data points
into different clusters based on similarities and differences. The objects with the most
similarities remain in the same group, and they have few or no similarities with objects
in other groups.
Clustering algorithms are widely used in tasks such as image segmentation, statistical
data analysis, market segmentation, etc.
Some commonly used clustering algorithms are K-means clustering, hierarchical
clustering, DBSCAN, etc.
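As a rough illustration of K-means clustering, the sketch below alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The data, the choice of two clusters, and the fixed starting centroids are assumptions made so the example is deterministic; practical implementations usually pick starting centroids randomly or with a smarter scheme.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, centroids, iterations=10):
    """Alternate the assignment step and the centroid update step."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    return centroids, assignments


# Hypothetical 2-D data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignments = k_means(points, centroids=[points[0], points[3]])
print(assignments)  # [0, 0, 0, 1, 1, 1]
```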

o Association Rule Learning


Association rule learning is an unsupervised learning technique that finds interesting
relations among variables within a large dataset. The main aim of this learning algorithm
is to find the dependency of one data item on another data item and map those
variables accordingly, for example so that a business can increase its profit. This
algorithm is mainly applied in market basket analysis, web usage mining, continuous
production, etc.
Some popular association rule learning algorithms are the Apriori algorithm, Eclat, and
the FP-growth algorithm.
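The "dependency of one data item on another" is usually measured with support, confidence, and lift. The sketch below computes these three measures for a single rule on a made-up set of shopping baskets; the transactions and function names are assumptions for illustration, and it is not an implementation of Apriori, Eclat, or FP-growth, which search for such rules efficiently.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the share that also has the consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to how common the consequent is overall."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

rule_support = support({"bread", "milk"}, transactions)          # 3 of 5 baskets
rule_confidence = confidence({"bread"}, {"milk"}, transactions)  # about 0.75
rule_lift = lift({"bread"}, {"milk"}, transactions)              # about 0.94
print(round(rule_support, 2), round(rule_confidence, 2), round(rule_lift, 2))
```

A lift below 1, as here, suggests that buying bread does not make milk more likely than it already is in this toy data.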
o Dimensionality Reduction
The number of features/variables present in a dataset is known as the dimensionality of
the dataset, and the technique used to reduce the dimensionality is known as the
dimensionality reduction technique.
Although more features can provide more information, they can also hurt the performance
of the model/algorithm, for example through overfitting. In such cases, dimensionality
reduction techniques are used.
"It is a process of converting a higher-dimensional dataset into a lower-dimensional
dataset while ensuring that it provides similar information."
Common dimensionality reduction methods include PCA (Principal Component
Analysis), Singular Value Decomposition, etc.
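To make the idea of "fewer dimensions, similar information" concrete, here is a small sketch of PCA that reduces 2-D points to 1-D. It uses the closed-form largest eigenvalue of a 2x2 covariance matrix, so it only works for two features; the data and function name are assumptions for illustration, and general-purpose PCA relies on a linear algebra library instead.

```python
import math


def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix (closed form).
    largest = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)

    # Matching eigenvector: the direction of greatest variance.
    if b != 0:
        direction = (b, largest - a)
    else:
        direction = (1.0, 0.0) if a >= c else (0.0, 1.0)
    length = math.hypot(*direction)
    component = (direction[0] / length, direction[1] / length)

    # One coordinate per point instead of two.
    projected = [x * component[0] + y * component[1] for x, y in centered]
    explained = largest / (a + c)  # share of total variance kept
    return component, projected, explained


# Hypothetical data where the two features are strongly correlated.
points = [(2.0, 1.9), (4.0, 4.1), (6.0, 6.2), (8.0, 7.8)]
component, projected, explained = pca_2d_to_1d(points)
print(len(projected), round(explained, 3))
```

Because the two features move together, a single coordinate retains almost all of the variance. The sketch assumes the points are not all identical, otherwise the total variance is zero.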

Reinforcement Learning
In reinforcement learning, the algorithm learns which actions to take in a given set of
states in order to reach a goal state. It is a feedback-based learning model that receives
feedback signals after each state or action by interacting with the environment. This
feedback works as a reward (positive for each good action and negative for each bad
action), and the agent's goal is to maximize the total reward to improve its performance.

The behavior of the model in reinforcement learning is similar to human learning, as
humans learn from experience by interacting with the environment and receiving feedback.

Below are some popular algorithms that come under reinforcement learning:

o Q-learning: Q-learning is one of the popular model-free algorithms of reinforcement
learning, which is based on the Bellman equation.

It aims to learn the policy that can help the AI agent take the best action for
maximizing the reward under a specific circumstance. It maintains a Q-value for each
state-action pair that estimates the expected cumulative reward of taking that action in
that state, and it tries to maximize the Q-value.

o State-Action-Reward-State-Action (SARSA): SARSA is an on-policy algorithm based
on the Markov decision process. It uses the action performed by the current policy to
learn the Q-value. The name SARSA stands for State Action Reward State Action,
which symbolizes the tuple (s, a, r, s', a').
o Deep Q Network: DQN, or Deep Q Network, combines Q-learning with a neural
network. It is mainly employed in environments with a large state space, where defining
a Q-table would be impractical. In such a case, rather than using a Q-table, the neural
network approximates the Q-values for each action based on the state.
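The difference between Q-learning and SARSA comes down to one term in the update rule, which the sketch below shows for a table stored as a dictionary. The learning rate alpha, discount factor gamma, state names, and starting Q-values are assumptions chosen for illustration. Q-learning uses the best Q-value available in the next state, while SARSA uses the Q-value of the action a' that the current policy actually takes, matching the tuple (s, a, r, s', a').

```python
def q_learning_update(q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Off-policy update: the target uses the best action in the next state."""
    best_next = max(q.get((s_next, a2), 0.0) for a2 in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy update: the target uses the action the policy actually chose."""
    old = q.get((s, a), 0.0)
    target = r + gamma * q.get((s_next, a_next), 0.0)
    q[(s, a)] = old + alpha * (target - old)


actions = ["left", "right"]

# Two copies of the same hypothetical Q-table, one per algorithm.
q_table = {("s1", "left"): 1.0, ("s1", "right"): 2.0}
sarsa_table = dict(q_table)

# Same experience for both: in s0, took "right", got reward 1.0, landed in s1.
q_learning_update(q_table, "s0", "right", 1.0, "s1", actions)
# SARSA also needs the next action; here the policy is assumed to pick "left".
sarsa_update(sarsa_table, "s0", "right", 1.0, "s1", "left")

print(round(q_table[("s0", "right")], 2), round(sarsa_table[("s0", "right")], 2))
```

A Deep Q Network keeps the same Q-learning target but replaces the dictionary lookup with a neural network that estimates the Q-values.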


Training Machine Learning Models


Once a machine learning model is built, it is trained in order to produce appropriate
results. To train a machine learning model, one needs a large amount of pre-processed
data. Here, pre-processed data means data in a structured form with few null values,
etc. If we do not provide pre-processed data, there is a high chance that our model
will perform poorly.
How to get datasets for Machine Learning
The field of ML depends heavily on datasets for training models and making accurate
predictions. Datasets play a vital part in the success of AI/ML projects and are
fundamental to becoming a skilled data scientist. In this article, we will explore the
various types of datasets used in machine learning and give a detailed guide on where
to find them.

What is a dataset?
A dataset is a collection of data in which the data is arranged in some order. A dataset
can contain anything from a simple array to a database table. The table below shows an
example of a dataset:

Country    Age    Salary    Purchased
India      38     48000     No
France     43     45000     Yes
Germany    30     54000     No
France     48     65000     No
Germany    40               Yes
India      35     58000     Yes

A tabular dataset can be understood as a database table or matrix, where each column
corresponds to a particular variable, and each row corresponds to a record of the
dataset. The most widely supported file type for a tabular dataset is the
"Comma-Separated Values" file, or CSV. To store tree-like data, however, a JSON file is
often a better fit.
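The example table can be written as CSV and read with Python's standard csv module, as sketched below. The sketch assumes that the blank cell in the second Germany row is a missing Salary, and it fills that gap with the mean of the known salaries. Mean imputation is only one common way to reduce null values and is an assumption here, not a method prescribed by this text.

```python
import csv
import io

# The example table as CSV text; the empty field is the missing Salary.
CSV_TEXT = """Country,Age,Salary,Purchased
India,38,48000,No
France,43,45000,Yes
Germany,30,54000,No
France,48,65000,No
Germany,40,,Yes
India,35,58000,Yes
"""

# Each row becomes a dict keyed by column name; convert the numeric columns.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
for row in rows:
    row["Age"] = int(row["Age"])
    row["Salary"] = float(row["Salary"]) if row["Salary"] else None

# Mean of the salaries that are present.
known = [row["Salary"] for row in rows if row["Salary"] is not None]
mean_salary = sum(known) / len(known)

# Copy the rows, replacing the missing value with the mean.
cleaned = [
    {**row, "Salary": mean_salary if row["Salary"] is None else row["Salary"]}
    for row in rows
]
print(mean_salary)  # 54000.0
```

In practice the CSV text would come from a file opened with open(path, newline=""), which the csv module reads in the same way.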

Types of data in datasets

o Numerical data: Such as house price, temperature, etc.
o Categorical data: Such as Yes/No, True/False, Blue/Green, etc.
o Ordinal data: These data are similar to categorical data, but the categories have a
meaningful order and can be compared with one another.
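The distinction between categorical and ordinal data matters when the values are turned into numbers for a model. The sketch below shows one common convention, given here as an assumption rather than a rule from this text: one-hot encoding for unordered categories and integer ranks for ordered ones. The Small/Medium/Large feature is a hypothetical example of ordinal data.

```python
def one_hot(value, categories):
    """Encode an unordered category as a vector with a single 1."""
    return [1 if value == category else 0 for category in categories]


def ordinal_encode(value, ordered_levels):
    """Encode an ordered category as its rank, so comparisons stay meaningful."""
    return ordered_levels.index(value)


colors = ["Blue", "Green"]            # categorical: no natural order
sizes = ["Small", "Medium", "Large"]  # ordinal: Small < Medium < Large

print(one_hot("Green", colors))        # [0, 1]
print(ordinal_encode("Large", sizes))  # 2
print(ordinal_encode("Small", sizes) < ordinal_encode("Medium", sizes))  # True
```

Numerical data such as house price or temperature can usually be used as it is, often after scaling.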
Note: A real-world dataset is often huge, which makes it difficult to manage and
process when starting out. Therefore, to practice machine learning algorithms, we
can use any dummy dataset.

Types of datasets
Machine learning spans different domains, each requiring specific types of datasets. A
few common types of datasets used in machine learning include:

Image Datasets:
Image datasets contain a collection of images and are commonly used in computer
vision tasks such as image classification, object detection, and image segmentation.

Examples :

o ImageNet
o CIFAR-10
o MNIST

Text Datasets:
Text datasets consist of textual information, such as articles, books, or social media
posts. These datasets are used in NLP tasks such as sentiment analysis, text
classification, and machine translation.

Examples :

o Project Gutenberg dataset
o IMDb movie reviews dataset

Time Series Datasets:


Time series datasets consist of data points collected over time. They are generally used
in forecasting, anomaly detection, and trend analysis.

Examples :

o Stock market data
o Weather data
o Sensor readings
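As a small illustration of forecasting and anomaly detection on sensor readings, the sketch below predicts each value from the mean of the few readings before it and flags values that fall far from that prediction. The readings, the window of 3, and the tolerance of 6.0 are made-up assumptions; real systems typically use more robust models.

```python
def trailing_mean_forecast(values, window):
    """Forecast each point as the mean of the `window` readings before it."""
    return [sum(values[i - window:i]) / window for i in range(window, len(values))]


def flag_anomalies(values, window, tolerance):
    """Return indices whose reading differs from its forecast by more than tolerance."""
    forecasts = trailing_mean_forecast(values, window)
    return [
        i
        for i, forecast in zip(range(window, len(values)), forecasts)
        if abs(values[i] - forecast) > tolerance
    ]


# Hypothetical temperature readings with one spike at index 4.
readings = [20.0, 21.0, 20.0, 21.0, 35.0, 21.0, 20.0]

forecasts = trailing_mean_forecast(readings, window=3)
anomalies = flag_anomalies(readings, window=3, tolerance=6.0)
print(anomalies)  # [4]
```

The spike also inflates the forecasts for the readings that follow it, which is why the tolerance has to be chosen with some care.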

Tabular Datasets:
Tabular datasets are structured data organized in tables or spreadsheets. They contain
rows representing instances or samples and columns representing features or
attributes. Tabular datasets are used for tasks like regression and classification.
The dataset given earlier in the article is an example of a tabular dataset.

Need of Dataset
o Well-prepared and pre-processed datasets are essential for machine learning projects.
o They provide the foundation for training accurate and reliable models. However,
working with large datasets can introduce challenges in terms of management and
processing.
o To address these challenges, efficient data management techniques and processing
algorithms are required.
