Feature Engineering in Machine Learning

Uploaded by

niranjan

Basics of Feature Engineering

Feature engineering is a critical preparatory process in machine learning.

It takes raw input data and converts it into well-aligned features that are ready to be used by machine
learning models.

Unstructured data is raw, unorganized data which doesn’t follow a specific format or hierarchy.

Typical examples of unstructured data include text data from social networks (e.g. Twitter, Facebook) and
data from server logs.

“Feature engineering refers to the process of translating a data set into features such that these features
are able to represent the data set more effectively and result in a better learning performance.”

What is a feature?

• A feature is an attribute of a data set that is used in a machine learning process.

• The features in a data set are also called its dimensions.

• So a data set having ‘n’ features is called an n-dimensional data set.

For example, consider the famous machine learning data set Iris, introduced by the British statistician
and biologist Ronald Fisher. It has five attributes or features, namely Sepal.Length, Sepal.Width,
Petal.Length, Petal.Width and Species.
Out of these, the feature ‘Species’ represents the class variable and the remaining features are the
predictor variables. It is a five-dimensional data set.

FIG. 4.1 Data set features
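
The split between predictor variables and the class variable can be sketched in a few lines. This is only an illustration: the column names are assumed to follow the standard Iris data set, and only three rows are included rather than the full data.

```python
# Column names assumed from the standard Iris data set.
COLUMNS = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"]

# Three illustrative rows, one per species (not the full data set).
rows = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
]

# The last column is the class variable; the rest are predictor variables.
X = [row[:-1] for row in rows]   # predictor values
y = [row[-1] for row in rows]    # class labels

# The number of features is the dimensionality of the data set.
n_dimensions = len(COLUMNS)
print(n_dimensions, "features;", len(X[0]), "predictors per row")
```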

What is feature engineering?

Feature engineering refers to the process of translating a data set into features such that these features are able to
represent the data set more effectively and result in a better learning performance.

Feature engineering is an important pre-processing step for machine learning. It has two major elements:

• feature transformation
• feature subset selection

Feature transformation transforms the data, structured or unstructured, into a new set of features which can
represent the underlying problem which machine learning is trying to solve. There are two variants of feature
transformation:

• feature construction
• feature extraction

Both are sometimes known as feature discovery.

The feature construction process discovers missing information about the relationships between features and
augments the feature space by creating additional features.

Hence, if there are ‘n’ features or dimensions in a data set, after feature construction ‘m’ more features or
dimensions may get added.

So at the end, the data set will become ‘n + m’ dimensional.

Feature extraction is the process of extracting or creating a new set of features from the original set of features
using some functional mapping.

Unlike feature transformation, in case of feature subset selection (or simply feature selection) no new feature is
generated.

The objective of feature selection is to derive a subset of features from the full feature set which is most
meaningful in the context of a specific machine learning problem.

So, essentially the job of feature selection is to derive a subset Fj (F1, F2, …, Fm) of Fi (F1, F2, …, Fn), where m < n,
such that Fj is most meaningful and gets the best result for a machine learning problem.

4.2 FEATURE TRANSFORMATION

Feature transformation is used as an effective tool for dimensionality reduction and hence for boosting learning
model performance. Broadly, there are two distinct goals of feature transformation:

• Achieving best reconstruction of the original features in the data set

• Achieving highest efficiency in the learning task

4.2.1 Feature construction
Feature construction involves transforming a given set of input features to generate a new set of more
powerful features.

Let’s take the example of a real estate data set having details of all apartments sold in a specific region.

The data set has three features: apartment length, apartment breadth, and price of the apartment.
If it is used as an input to a regression problem, such data can be training data for the regression model.
So given the training data, the model should be able to predict the price of an apartment whose price is not
known or which has just come up for sale.

• However, instead of using the length and breadth of the apartment as predictors, it is much more convenient
and makes more sense to use the area of the apartment, which is not an existing feature of the data set.
• So such a feature, namely apartment area, can be added to the data set.
• In other words, we transform the three-dimensional data set to a four-dimensional data set, with the newly
‘discovered’ feature apartment area being added to the original data set.

FIG. 4.2 Feature construction
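
The apartment example can be sketched as follows. The records and the field names (length, breadth, price, area) are hypothetical values chosen only to illustrate how a constructed feature turns a three-dimensional data set into a four-dimensional one.

```python
# Hypothetical apartment records: length and breadth in metres, price in any currency.
apartments = [
    {"length": 10.0, "breadth": 8.0, "price": 120000},
    {"length": 12.0, "breadth": 9.5, "price": 171000},
    {"length": 7.5, "breadth": 6.0, "price": 67500},
]

def add_area(records):
    """Return new records with a constructed 'area' feature (length * breadth)."""
    return [{**r, "area": r["length"] * r["breadth"]} for r in records]

augmented = add_area(apartments)

# The original records have 3 features; the augmented ones have 4.
print(len(apartments[0]), "->", len(augmented[0]))
```

The original records are left unchanged, so the constructed feature can be compared against the raw ones before deciding which to feed to the regression model.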

There are certain situations where feature construction is an essential activity before starting with the machine
learning task. These situations are:

• when features have categorical values and machine learning needs numeric inputs
• when features have numeric (continuous) values and need to be converted to ordinal values
• when text-specific feature construction needs to be done

Encoding categorical (nominal) variables

Let’s take the example of another data set on athletes, as presented in Figure 4.3a.

Say the data set has the features age, city of origin, parents athlete (i.e. an indication of whether either
parent was an athlete) and chance of win.

The feature chance of win is the class variable while the others are predictor variables.

Most machine learning algorithms, whether classification algorithms (like kNN) or regression algorithms,
require numerical values to learn from. Three of the features (city of origin, parents athlete, and chance of
win) are categorical in nature and therefore cannot be used directly by such algorithms; they first have to be
encoded as numbers.
FIG. 4.3 Feature construction (encoding nominal variables)
FIG. 4.4 Feature construction (encoding ordinal variables)
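
A minimal sketch of both encodings is given below. It assumes one common scheme for each case: one dummy (0/1) column per category for a nominal feature, and consecutive integers that follow the category order for an ordinal feature. The city names and the ordinal levels are hypothetical, since the figures are not reproduced here.

```python
def one_hot(values):
    """Encode a nominal feature as one 0/1 column per distinct category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded

def ordinal_encode(values, order):
    """Encode an ordinal feature as integers 1..k that follow the given order."""
    rank = {category: i + 1 for i, category in enumerate(order)}
    return [rank[v] for v in values]

# Hypothetical nominal feature: city of origin.
cities = ["City A", "City B", "City C", "City A"]
city_columns, city_encoded = one_hot(cities)

# Hypothetical ordinal feature with a natural order Low < Medium < High.
levels = ["Low", "High", "Medium"]
level_encoded = ordinal_encode(levels, order=["Low", "Medium", "High"])

print(city_columns, city_encoded)
print(level_encoded)
```

Integer codes are avoided for the nominal feature because they would suggest an ordering between cities that does not exist.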

Transforming numeric (continuous) features to categorical features

FIG. 4.5 Feature construction (numeric to categorical)
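
One simple way to turn a continuous feature into a categorical one is to cut its range into bins. The sketch below assumes a hypothetical marks feature, with bin edges and labels chosen purely for illustration.

```python
from bisect import bisect_right

def to_category(value, edges, labels):
    """Map a numeric value to a label; labels must have len(edges) + 1 entries.

    A value equal to an edge falls into the higher bin.
    """
    return labels[bisect_right(edges, value)]

# Hypothetical bins: below 40 -> Low, 40 up to (not including) 70 -> Medium, 70+ -> High.
EDGES = [40, 70]
LABELS = ["Low", "Medium", "High"]

marks = [39.5, 40, 69.9, 70, 85]
grades = [to_category(m, EDGES, LABELS) for m in marks]
print(grades)
```

Because the labels keep their order, the result can be treated as an ordinal feature and encoded as integers if the learning algorithm needs numeric input.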

Text-specific feature construction

In the current world, text is arguably the most predominant medium of communication.

Whether we think about social networks like Facebook, micro-blogging channels like Twitter, emails, or
short messaging services such as WhatsApp, text plays a major role in the flow of information.

Hence, text mining is an important area of research – not only for technology practitioners but also for industry
practitioners.

However, making sense of text data, due to the inherent unstructured nature of the data, is not so
straightforward.

In the first place, the text data chunks that we can think about do not have readily available features, as
structured data sets do, on which machine learning tasks can be executed.

Machine learning models need numerical data as input, so the text data in the data sets needs to be
transformed into numerical features.

Text data, or a corpus, which is the more popular term, is converted to a numerical representation by a
process known as vectorization. In this process, word occurrences in all documents belonging to the corpus are
consolidated in the form of a bag-of-words. There are three major steps that are followed:

• Tokenize

• Count

• Normalize

In order to tokenize a corpus, blank spaces and punctuation marks are used as delimiters to separate out the
words, or tokens.

Then the number of occurrences of each token is counted for each document.

Lastly, the counts are normalized: tokens are given reduced weight when they occur in the majority of the
documents.

A matrix is then formed with each token representing a column and each document of the corpus
representing a row.

Each cell contains the count of occurrences of the token in a specific document. This matrix is known as a
document-term matrix (its transpose, with tokens as rows, is known as a term-document matrix).
FIG. 4.6 Feature construction (text-specific)

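
The three steps can be sketched with the standard library alone. The corpus below is made up, and the weighting in the last step is an assumption: the text does not name a scheme, so the sketch uses a plain inverse document frequency, log(N / df), which is one common way of reducing the weight of tokens that appear in most documents.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split on blank spaces and punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def document_term_matrix(corpus):
    """Return (vocabulary, matrix) with one row per document and one column per token."""
    token_lists = [tokenize(doc) for doc in corpus]
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    counts = [Counter(tokens) for tokens in token_lists]
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix

def idf_weighted(matrix):
    """Scale each count by log(N / df); a token present in every document gets weight 0."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    doc_freq = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / df) for df in doc_freq]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in matrix]

# Hypothetical three-document corpus.
corpus = ["The cat sat.", "The dog sat, the dog barked!", "The bird flew."]
vocabulary, dtm = document_term_matrix(corpus)
weights = idf_weighted(dtm)

print(vocabulary)
for row in dtm:
    print(row)
```

In this corpus the token "the" occurs in every document, so its weighted value is zero in every row, while rarer tokens such as "cat" keep a higher weight.
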
Feature extraction

• In feature extraction, new features are created from a combination of original features.

• Some of the commonly used operators for combining the original features include:

For Boolean features: Conjunctions, Disjunctions, Negation, etc.

For nominal features: Cartesian product, M of N, etc.

For numerical features: Min, Max, Addition, Subtraction, Multiplication, Division, Average, Equivalence,
Inequality, etc.
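
A few of these operators can be sketched on a single record. The student record and all of its field names are hypothetical and serve only to show one operator per feature type.

```python
def extract_features(record):
    """Combine original features into new ones using simple operators."""
    return {
        # Boolean features: conjunction, disjunction, negation
        "scholarship_and_hostel": record["has_scholarship"] and record["lives_in_hostel"],
        "scholarship_or_hostel": record["has_scholarship"] or record["lives_in_hostel"],
        "no_scholarship": not record["has_scholarship"],
        # Nominal features: Cartesian product of two categories
        "department_x_year": (record["department"], record["year"]),
        # Numerical features: max, subtraction, average
        "best_marks": max(record["marks_maths"], record["marks_science"]),
        "marks_gap": record["marks_maths"] - record["marks_science"],
        "average_marks": (record["marks_maths"] + record["marks_science"]) / 2,
    }

# Hypothetical input record.
student = {
    "has_scholarship": True,
    "lives_in_hostel": False,
    "department": "CSE",
    "year": "Second",
    "marks_maths": 82,
    "marks_science": 74,
}

new_features = extract_features(student)
print(new_features)
```
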
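After feature extraction using a mapping function f (F1, F2, …, Fn), say, we will have a new set of features
F′ (F′1, F′2, …, F′m) such that each new feature F′j is obtained from the original features through the
mapping, i.e. F′j = f (F1, F2, …, Fn).
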
The most popular feature extraction algorithms used in machine learning are:

• Principal Component Analysis

• Singular Value Decomposition

Principal Component Analysis

Every data set has multiple attributes or dimensions, many of which might be similar to each other. For
example, the height and weight of a person are usually closely related.

In general, a machine learning algorithm tends to perform better as the number of related attributes or
features is reduced.

In other words, a key to the success of machine learning lies in the features being few in number and having
very little similarity with each other.

This is the main guiding philosophy of the principal component analysis (PCA) technique of feature extraction.
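
The idea can be sketched for the two-feature case without any library. The sketch below assumes hypothetical height and weight values, centres them, and uses the closed-form principal axis of a 2 x 2 covariance matrix; it applies no scaling, and a real analysis with more features would normally rely on a linear algebra library.

```python
import math

def pca_2d(points):
    """Return (direction, scores, explained) for the first principal component of 2-D data.

    direction: unit vector of the first principal component
    scores:    projection of each centred point onto that direction
    explained: share of the total variance captured by the component
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(dx * dx for dx, _ in centred) / (n - 1)
    syy = sum(dy * dy for _, dy in centred) / (n - 1)
    sxy = sum(dx * dy for dx, dy in centred) / (n - 1)

    # Angle of the eigenvector that belongs to the largest eigenvalue.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))

    scores = [dx * direction[0] + dy * direction[1] for dx, dy in centred]
    component_variance = sum(s * s for s in scores) / (n - 1)
    explained = component_variance / (sxx + syy)
    return direction, scores, explained

# Hypothetical (height in cm, weight in kg) pairs.
people = [(150, 52), (160, 57), (170, 68), (180, 72), (190, 83)]
direction, scores, explained = pca_2d(people)
print(direction, round(explained, 3))
```

Because height and weight move together in this data, a single extracted feature (the scores) retains most of the variance of the two original features.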
