
UNIT III-Machine Learning Full Notes

This document provides an overview of machine learning, detailing the modeling process and types of machine learning, including supervised, unsupervised, and semi-supervised learning. It outlines the steps involved in data science modeling, such as defining objectives, data collection, and model evaluation, as well as explaining classification and regression techniques. Additionally, it discusses clustering methods and outlier analysis, emphasizing the importance of data quality and the characteristics of outliers.

Uploaded by

vetrivendan7177

UNIT III

MACHINE LEARNING

The modeling process - Types of machine learning - Supervised learning - Unsupervised learning -
Semi-supervised learning - Classification, regression - Clustering - Outliers and Outlier Analysis

THE MODELING PROCESS:

What is Data Science Modelling?


Data science modeling is the sequence of steps that leads from defining the problem to
deploying the model in a real-world setting.
Data Science Modelling Steps
1. Define Your Objective
2. Collect Data
3. Clean Your Data
4. Explore Your Data
5. Split Your Data
6. Choose a Model
7. Train Your Model
8. Evaluate Your Model
9. Improve Your Model
10. Deploy Your Model
1. Define Your Objective
First, define clearly what problem you are going to solve. Whether it is predicting customer
churn, improving product recommendations, or finding patterns in data, you need to know
your direction before you start. A clear objective guides the choice of data, algorithms, and
evaluation metrics.
2. Collect Data
Gather data relevant to your objective. This can include internal data from your company,
publicly available datasets, or data purchased from external sources. Ensure you have enough
data to train your model effectively.
3. Clean Your Data
Data cleaning is a critical step to prepare your dataset for modeling. It involves handling
missing values, removing duplicates, and correcting errors. Clean data ensures the reliability
of your model's predictions.
4. Explore Your Data
Data exploration, or exploratory data analysis (EDA), involves summarizing the main
characteristics of your dataset. Use visualizations and statistics to uncover patterns,
anomalies, and relationships between variables.
5. Split Your Data
Divide your dataset into training and testing sets. The training set is used to train your model,
while the testing set evaluates its performance. A common split ratio is 80% for training and
20% for testing.
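The 80/20 split described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name train_test_split, the seed value, and the 100-record dataset are assumptions made for the example, not part of these notes.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)                      # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)      # fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))                        # hypothetical dataset of 100 records
train, test = train_test_split(data)
print(len(train), len(test))                   # 80 20
```

Shuffling before splitting matters when the records are stored in some order (for example by date or by class), because otherwise the testing set may not resemble the training set.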
6. Choose a Model
Select a model that suits your problem type (e.g., regression, classification) and data.
Beginners can start with simpler models like linear regression or decision trees before
moving on to more complex models like neural networks.
7. Train Your Model
Feed your training data into the model. This process involves the model learning from the
data, adjusting its parameters to minimize errors. Training a model can take time, especially
with large datasets or complex models.
8. Evaluate Your Model
After training, assess your model's performance using the testing set. Common evaluation
metrics for classification include accuracy, precision, recall, and F1 score. Evaluation helps
you understand how well your model will perform on unseen data.
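The four metrics named above can be computed from the counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The sketch below assumes a binary problem where the label 1 is the positive class; the function name and the two label lists are made up for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)                 # share of all predictions that are right
    precision = tp / (tp + fp) if tp + fp else 0.0    # of predicted positives, how many are real
    recall = tp / (tp + fn) if tp + fn else 0.0       # of real positives, how many were found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)             # harmonic mean of precision and recall
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # hypothetical correct labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical model predictions
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this example TP = 3, FP = 1, FN = 1 and TN = 3, so all four metrics come out as 0.75.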
9. Improve Your Model
Based on the evaluation, you may need to refine your model. This can involve tuning
hyperparameters, choosing a different model, or going back to data cleaning and preparation
for further improvements.
10. Deploy Your Model
Once satisfied with your model's performance, deploy it for real-world use. This could mean
integrating it into an application or using it for decision-making within your organization.

MACHINE LEARNING:
Machine learning is the branch of Artificial Intelligence that focuses on developing
models and algorithms that let computers learn from data and improve from previous
experience without being explicitly programmed for every task.
Types of Machine Learning:


Supervised Machine Learning

Supervised learning means training a model on a labeled dataset. A labeled dataset
contains both the inputs and the correct outputs, and the algorithm learns to map each
input to its correct output. Both the training and the validation datasets are labeled.
Example: Consider a scenario where you have to build an image classifier to
differentiate between cats and dogs. If you feed labeled images of dogs and cats to the
algorithm, the machine learns from these labeled images to tell a dog from a cat.

Supervised learning has two main categories:

• Classification
• Regression

Classification: Classification deals with predicting categorical target variables, which
represent discrete classes or labels.

Here are some classification algorithms:

• Logistic Regression
• Support Vector Machine
• Random Forest
• Decision Tree

Regression: Regression, on the other hand, deals with predicting continuous target
variables, which represent numerical values.

Here are some regression algorithms:

• Linear Regression
• Polynomial Regression
• Decision Tree
• Random Forest

Advantages of Supervised Machine Learning

• Supervised learning models can have high accuracy because they are trained on labeled data.

Disadvantages of Supervised Machine Learning

• It can be time-consuming and costly because it relies entirely on labeled data.
• It may generalize poorly to new data.

Unsupervised Machine Learning


Unsupervised learning is a type of machine learning technique in which an algorithm
discovers patterns and relationships using unlabeled data. Its primary goal is often to
discover hidden patterns, similarities, or clusters within the data.
Unsupervised learning has two main categories:

• Clustering
• Association

Clustering

Clustering is the process of grouping data points into clusters based on their similarity.

Here are some clustering algorithms:

• K-Means Clustering algorithm

Association

Association rule learning is a technique for discovering relationships between items in a
dataset.

Here are some association rule learning algorithms:

• Apriori Algorithm
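To make "relationships between items" concrete, the sketch below computes support and confidence, the two standard measures that association rule algorithms such as Apriori rely on. It is not an implementation of Apriori itself, and the shopping-basket transactions are invented for the example.

```python
# Hypothetical shopping baskets; each set is one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)

print(support({"bread", "milk"}, transactions))        # 0.6
print(confidence({"bread"}, {"milk"}, transactions))   # 0.75
```

Here the rule "bread → milk" has support 0.6 (3 of 5 baskets contain both) and confidence 0.75 (3 of the 4 baskets with bread also contain milk).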

Reinforcement Machine Learning:


Reinforcement learning is a learning method in which an algorithm interacts with its
environment by producing actions and discovering errors or rewards. Trial-and-error search
and delayed reward are the most relevant characteristics of reinforcement learning.

Semi-Supervised Learning: Supervised + Unsupervised Learning


Semi-supervised learning sits between supervised and unsupervised learning: it uses both
labeled and unlabeled data, typically a small labeled set together with a larger unlabeled one.
CLASSIFICATION IN MACHINE LEARNING:

Classification teaches a machine to sort things into categories. It learns by looking at
examples with labels.

For example, a classification model might be trained on a dataset of images labeled as
either dogs or cats, and it can then be used to predict the class of new, unseen images as
dogs or cats based on features such as color, texture, and shape.

Types of Classification

When we talk about classification in machine learning, we’re talking about the process
of sorting data into categories based on specific features or characteristics.

There are three main classification types in machine learning:

1. Binary Classification

2. Multiclass Classification

3. Multi-Label Classification

1. Binary Classification

This is the simplest kind of classification. In binary classification, the goal is to sort the
data into two distinct categories. Think of it like a simple choice between two options.
Imagine a system that sorts emails into either spam or not spam.
2. Multiclass Classification

Here, instead of just two categories, the data needs to be sorted into more than two
categories. The model picks the one category that best matches the input. Think of an image
recognition system that sorts pictures of animals into categories like cat, dog, and bird.

3. Multi-Label Classification

In multi-label classification, a single piece of data can belong to multiple categories at
once. Unlike multiclass classification, where each data point belongs to only one class,
multi-label classification allows data points to belong to several classes. For example, a
movie recommendation system could tag a movie as both action and comedy.

How does Classification in Machine Learning Work?

Classification involves training a model using a labeled dataset, where each input is
paired with its correct output label.

1. Data Collection: You start with a dataset in which each item is labeled with the
correct class (for example, “cat” or “dog”).

2. Feature Extraction: The system identifies features (like color, shape, or texture)
that help distinguish one class from another. These features are what the model
uses to make predictions.

3. Model Training: The classification algorithm uses the labeled data to learn how
to map the features to the correct class. It looks for patterns and relationships
in the data.

4. Model Evaluation: Once the model is trained, it is tested on new, unseen data to
check how accurately it classifies the items. Evaluation is a key step because it
shows how well the model performs and how well it handles data it has not seen
before.

5. Prediction: After being trained and evaluated, the model can be used to predict
the class of new data based on the features it has learned.
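The workflow above can be sketched end to end with a very simple classifier. The example uses k-nearest neighbours, chosen here only because it is short; the notes do not prescribe it. The two features (weight in kg, height in cm) and every data value are hypothetical.

```python
import math
from collections import Counter

# Data collection and feature extraction: each item is (features, label).
train = [
    ((4.0, 25.0), "cat"), ((3.5, 23.0), "cat"), ((5.0, 26.0), "cat"),
    ((20.0, 55.0), "dog"), ((30.0, 60.0), "dog"), ((12.0, 40.0), "dog"),
]
test = [((4.5, 24.0), "cat"), ((25.0, 58.0), "dog")]

def predict(features, training_data, k=3):
    """Training and prediction: majority label among the k closest training items."""
    nearest = sorted(training_data,
                     key=lambda item: math.dist(features, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Model evaluation: accuracy on data the model has not seen.
correct = sum(1 for features, label in test if predict(features, train) == label)
accuracy = correct / len(test)
print("accuracy:", accuracy)

# Prediction: classify a new, unlabeled item.
print(predict((6.0, 27.0), train))
```

With k-nearest neighbours the "training" step only stores the labeled examples; other algorithms such as logistic regression or decision trees adjust internal parameters during training instead.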

Regression in Machine Learning:

Regression in machine learning refers to a supervised learning technique where the goal
is to predict a continuous numerical value based on one or more independent features. It
finds relationships between variables so that predictions can be made. Two types of
variables are present in regression:

• Dependent Variable (Target): The variable we are trying to predict, e.g., house price.
• Independent Variables (Features): The input variables that influence the prediction,
e.g., locality, number of rooms.

Regression analysis applies when the output variable is a real or continuous value, such
as “salary” or “weight”.

Different types of Regression:

Simple Linear Regression:

Simple Linear Regression is a statistical method used to model the relationship between
two variables:

• Independent Variable (X) → The input or predictor variable.
• Dependent Variable (Y) → The output or target variable.

It assumes that there is a linear relationship between X and Y, which means that an
increase or decrease in X leads to a proportional change in Y.
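A straight line Y = slope × X + intercept can be fitted with the ordinary least-squares formulas: the slope is the sum of (x − mean_x)(y − mean_y) divided by the sum of (x − mean_x)², and the intercept is mean_y − slope × mean_x. The sketch below applies them to a small made-up dataset; the function name and the data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]            # hypothetical X values
ys = [3, 5, 7, 9, 11]           # generated from y = 2x + 1, so the fit is exact
slope, intercept = fit_line(xs, ys)
print(slope, intercept)         # 2.0 1.0
print(slope * 6 + intercept)    # prediction for X = 6: 13.0
```

Real data rarely lies exactly on a line; in that case the same formulas return the line that minimizes the sum of squared vertical distances to the points.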

Problems :
Multiple Linear Regression:

Clustering in Machine Learning:

Clustering is a way of grouping data points into clusters that consist of similar data
points. Objects that are similar to each other stay in one group, which has few or no
similarities with any other group.
Types of Clustering Methods

The clustering methods are broadly divided into Hard Clustering (each data point belongs
to only one group) and Soft Clustering (a data point can belong to more than one
group). Several other approaches to clustering also exist. Below are the main
clustering methods used in machine learning:

1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering

Partitioning Clustering
Partitioning clustering divides the data into non-hierarchical groups. It is also known as
the centroid-based method. The most common example of partitioning clustering is the
K-Means Clustering algorithm.

In this type, the dataset is divided into k groups, where k is the number of groups and is fixed
in advance. The cluster centers are placed so that each data point is closer to the centroid of
its own cluster than to the centroid of any other cluster.
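A minimal K-Means sketch follows. It repeats two steps until the centroids stop moving: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points. The initial centroids are picked at random from the data, which is one simple choice among several; the function name, the seed, and the six 2-D points are hypothetical.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Return (centroids, clusters) after running K-Means on a list of tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # start from k randomly chosen points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]         # keep the old centroid if a cluster is empty
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:           # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = k_means(points, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

On this well-separated data the two groups of three points are recovered. On harder data the result can depend on the initial centroids, so K-Means is often run several times with different starting points.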

Density-Based Clustering

The density-based clustering method connects highly dense areas into clusters, so
clusters of arbitrary shape can form as long as the dense regions can be connected.
It works by finding regions of high density in the dataset and joining neighbouring
dense regions into a single cluster.
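The notes do not name a specific density-based algorithm. As an illustration, the sketch below follows DBSCAN, one widely used example. It assumes two parameters, eps (the neighbourhood radius) and min_pts (the number of neighbours, counting the point itself, needed for a point to count as dense); these values and the data points are invented for the example.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    NOISE = -1
    labels = [None] * len(points)                # None means "not visited yet"

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:                 # not dense enough: tentatively noise
            labels[i] = NOISE
            continue
        cluster_id += 1                          # point i starts a new cluster
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:               # noise reachable from a dense point
                labels[j] = cluster_id           # becomes a border point of the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:     # j is dense too: keep expanding
                queue.extend(j_neighbours)
    return labels

points = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3), (1.1, 0.8),
          (5.0, 5.0), (5.1, 5.2), (4.9, 5.1), (5.2, 4.9),
          (9.0, 1.0)]
labels = dbscan(points, eps=0.6, min_pts=3)
print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The last point has no close neighbours, so it is labeled -1 (noise). Unlike K-Means, the number of clusters is not given in advance; it follows from eps and min_pts.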
Distribution Model-Based Clustering

In the distribution model-based clustering method, the data is divided according to the
probability that a data point belongs to a particular distribution. The grouping is
done by assuming a distribution for each cluster, most commonly the Gaussian distribution.

Hierarchical Clustering

Hierarchical clustering can be used as an alternative to partitioning clustering
because the number of clusters does not have to be specified in advance. In
this technique, the dataset is divided into clusters that form a tree-like structure,
which is called a dendrogram.
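One common way to build the tree is bottom-up (agglomerative): start with every point as its own cluster and repeatedly merge the two closest clusters. The sketch below assumes single linkage, where the distance between two clusters is the distance between their closest members; that choice, the function name, and the one-dimensional data are assumptions for the example. The recorded merges, read in order, describe the dendrogram.

```python
import math

def agglomerative(points):
    """Single-linkage agglomerative clustering; returns the merges in order."""
    clusters = [[p] for p in points]             # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest single-linkage distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges

points = [(1.0,), (2.0,), (6.0,), (7.0,), (15.0,)]
merges = agglomerative(points)
for left, right, distance in merges:
    print(left, "+", right, "at distance", distance)
```

The merge distances here are 1, 1, 4 and 8. Cutting the tree between two of these heights gives a flat clustering; for example, stopping before the merge at distance 4 leaves the clusters {1, 2}, {6, 7} and {15}.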

Fuzzy Clustering

Fuzzy clustering is a soft clustering method in which a data object may belong to more
than one group or cluster. Each data object has a set of membership coefficients, one per
cluster, that express its degree of membership in that cluster. The Fuzzy C-means algorithm
is an example of this type of clustering; it is sometimes also known as the Fuzzy k-means
algorithm.
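The membership coefficients can be illustrated with the membership update used in Fuzzy C-means: for fixed cluster centers, the membership of a point in cluster i is 1 / Σ_k (d_i / d_k)^(2/(m−1)), where d_i is the distance to center i and m > 1 is the fuzziness parameter. The sketch shows only this step, with m = 2 and two hypothetical centers; a full Fuzzy C-means run would alternate it with recomputing the centers.

```python
import math

def memberships(point, centers, m=2.0):
    """Fuzzy C-means membership of one point in each cluster (values sum to 1)."""
    dists = [math.dist(point, c) for c in centers]
    if 0.0 in dists:
        # The point sits exactly on a center: it belongs fully to that cluster.
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

centers = [(0.0, 0.0), (10.0, 0.0)]              # hypothetical cluster centers
print(memberships((2.0, 0.0), centers))          # mostly cluster 0, slightly cluster 1
print(memberships((5.0, 0.0), centers))          # [0.5, 0.5]
```

A point halfway between the two centers belongs equally to both clusters, while a point near one center gets a membership close to 1 for that cluster. Hard clustering would force each point into exactly one of them.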
Outliers and Outlier Analysis

What is an Outlier?

An outlier is a data point that significantly differs from the rest of the dataset. It deviates
so much from the other observations that it raises suspicion of being generated by a
different process or containing errors.

Types of Outliers

1. Global Outliers (Point Outliers)
   • A single data point is far removed from the rest of the dataset.
   • Example: In a dataset of student ages (18-22), an entry of 80 years is a
     global outlier.
2. Contextual Outliers (Conditional Outliers)
   • A data point is an outlier only in a specific context.
   • Example: A temperature of 30°C is normal in summer but an outlier in
     winter.
3. Collective Outliers
   • A group of data points together deviates from the expected pattern.
   • Example: A sudden drop in stock prices that does not match market
     trends.
Detecting Outliers Using the IQR Method:
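The usual form of the IQR rule is: compute the first quartile Q1 and the third quartile Q3, take IQR = Q3 − Q1, and flag any value below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR. The sketch below applies it to the student-age example from the previous section. The function name and the exact list of ages are assumptions, and the quartiles come from Python's statistics.quantiles with its default method; other quartile conventions can give slightly different fences.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

ages = [18, 19, 19, 20, 20, 21, 21, 22, 22, 80]     # hypothetical student ages
print(iqr_outliers(ages))                           # [80]
```

For these ages Q1 = 19 and Q3 = 22, so IQR = 3 and the fences are 14.5 and 26.5; only the entry of 80 lies outside them.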
