Introduction to Machine Learning Basics

Machine learning (ML) enables computers to learn from data and make decisions without explicit programming, with applications in various fields such as healthcare and finance. It encompasses different types including supervised, unsupervised, and reinforcement learning, each with unique methodologies for pattern recognition and decision-making. The importance of quality data is emphasized, as it underpins the effectiveness of ML models, while challenges such as data bias and interpretability are also discussed.

Uploaded by

lavanya.d


Introduction to Machine Learning

Machine learning (ML) allows computers to learn and make decisions without being
explicitly programmed. It involves feeding data into algorithms to identify patterns
and make predictions on new data. Machine learning is used in various applications,
including image and speech recognition, natural language processing, and recommender
systems.

Why do we need Machine Learning?

Machine learning algorithms learn from data, train on patterns, and solve or predict
complex problems beyond the scope of traditional programming. They drive better
decision-making and tackle intricate challenges efficiently.
Here’s why ML is indispensable across industries:
1. Solving Complex Business Problems
Traditional programming struggles with tasks like image recognition, natural language
processing (NLP), and medical diagnosis. ML, however, thrives by learning from examples
and making predictions without relying on predefined rules.
Example Applications:
 Image and speech recognition in healthcare.
 Language translation and sentiment analysis.
2. Handling Large Volumes of Data
With the internet’s growth, the data generated daily is immense. ML effectively processes
and analyzes this data, extracting valuable insights and enabling real-time predictions.
Use Cases:
 Fraud detection in financial transactions.
 Social media platforms like Facebook and Instagram predicting personalized feed
recommendations from billions of interactions.
3. Automating Repetitive Tasks
ML automates time-intensive and repetitive tasks with precision, reducing manual effort
and the errors that come with it.
Examples:
 Email Filtering: Gmail uses ML to keep your inbox spam-free.
 Chatbots: ML-powered chatbots resolve common issues like order tracking and
password resets.
 Data Processing: Automating large-scale invoice analysis for key insights.
4. Personalized User Experience
ML enhances user experience by tailoring recommendations to individual preferences. Its
algorithms analyze user behavior to deliver highly relevant content.
Real-World Applications:
 Netflix: Suggests movies and TV shows based on viewing history.
 E-Commerce: Recommends products you’re likely to purchase.
5. Self-Improvement in Performance
ML models evolve and improve with more data, making them smarter over time. They
adapt to user behavior and refine their performance.
Examples:
 Voice Assistants (e.g., Siri, Alexa): Learn user preferences, improve voice recognition,
and handle diverse accents.
 Search Engines: Refine ranking algorithms based on user interactions.
 Self-Driving Cars: Enhance decision-making using millions of miles of data from
simulations and real-world driving.
What Makes a Machine “Learn”?
A machine “learns” by recognizing patterns and improving its performance on a task based
on data, without being explicitly programmed.
The process involves:
1. Data Input: Machines require data (e.g., text, images, numbers) to analyze.
2. Algorithms: Algorithms process the data, finding patterns or relationships.
3. Model Training: Machines learn by adjusting their parameters based on the input data
using mathematical models.
4. Feedback Loop: The machine compares predictions to actual outcomes and corrects
errors (via optimization methods like gradient descent).
5. Experience and Iteration: Repeating this process with more data improves the
machine’s accuracy over time.
6. Evaluation and Generalization: The model is tested on unseen data to ensure it
performs well on real-world tasks.
In essence, machines “learn” by continuously refining their understanding through data-
driven iterations, much like humans learn from experience.
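The feedback loop in steps 3 to 5 can be sketched in a few lines. The example below is a minimal illustration, not a definitive implementation: it assumes a model of the form y = w*x + b, a mean-squared-error measure, and a hypothetical function name `train_linear`, none of which come from the text above.

```python
def train_linear(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w*x + b by gradient descent on mean squared error (illustrative sketch)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Predictions with the current parameters
        preds = [w * x + b for x in xs]
        # Gradients of the mean squared error with respect to w and b
        grad_w = (2.0 / n) * sum((p - y) * x for p, y, x in zip(preds, ys, xs))
        grad_b = (2.0 / n) * sum(p - y for p, y in zip(preds, ys))
        # Move the parameters a small step against the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1 (assumed for illustration)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = train_linear(xs, ys)
```

Each pass compares predictions with the actual outcomes and nudges the parameters to reduce the error, which is the "correct errors" step described above.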
Importance of Data in Machine Learning
Data is the foundation of machine learning (ML). Without quality data, ML models cannot
learn, perform, or make accurate predictions.
 Data provides the examples from which models learn patterns and relationships.
 High-quality and diverse data improves model accuracy and generalization.
 Data ensures models understand real-world scenarios and adapt to practical applications.
 Features derived from data are critical for training models.
 Separate datasets for validation and testing assess how well the model performs on
unseen data.
 Data fuels iterative improvements in ML models through feedback loops.
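As a rough sketch of the point about separate datasets, the example below splits a dataset into training, validation, and test portions. The 70/15/15 proportions, the fixed seed, and the function name `split_dataset` are assumptions chosen for illustration.

```python
import random

def split_dataset(rows, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle a copy of the rows and cut it into train/validation/test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is held out for testing
    return train, val, test

rows = list(range(100))  # stand-in for 100 data records
train, val, test = split_dataset(rows)
```

The model is fitted on `train`, tuned against `val`, and only evaluated on `test` at the end, so the test score reflects performance on data the model has not seen.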
Types of Machine Learning
1. Supervised learning
Supervised learning is a type of machine learning where a model is trained on labeled data,
meaning each input is paired with the correct output. The model learns by comparing its
predictions with the actual answers provided in the training data.
Both classification and regression problems are supervised learning problems.
Example: Consider the following data regarding patients entering a clinic. The data
consists of the gender and age of the patients and each patient is labeled as “healthy” or
“sick”.
Gender  Age  Label
M       48   sick
M       67   sick
F       53   healthy
M       49   sick
F       32   healthy
M       34   healthy
M       21   healthy

In this example, supervised learning uses the labeled data to train a model that can
predict the label (“healthy” or “sick”) for new patients based on their gender and age. For
instance, if a new patient (e.g., male, 50 years old) visits the clinic, the model can classify
the patient as “healthy” or “sick” based on the patterns it learned during training.
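One possible way to realize this is a k-nearest-neighbours classifier over the patient table. The sketch below is only an illustration: the choice of k-NN, the value k=3, and the `gender_penalty` used to mix gender into the distance are assumptions, not something the text prescribes.

```python
from collections import Counter

# Labeled training data from the patient table: (gender, age, label)
patients = [
    ("M", 48, "sick"), ("M", 67, "sick"), ("F", 53, "healthy"),
    ("M", 49, "sick"), ("F", 32, "healthy"), ("M", 34, "healthy"),
    ("M", 21, "healthy"),
]

def distance(a, b, gender_penalty=10):
    """Age difference in years, plus a fixed (assumed) penalty if genders differ."""
    (gender_a, age_a), (gender_b, age_b) = a, b
    return abs(age_a - age_b) + (gender_penalty if gender_a != gender_b else 0)

def predict(gender, age, k=3):
    """Return the majority label among the k most similar known patients."""
    nearest = sorted(
        patients, key=lambda p: distance((gender, age), (p[0], p[1]))
    )[:k]
    votes = Counter(label for _, _, label in nearest)
    return votes.most_common(1)[0][0]

print(predict("M", 50))  # the three closest patients are M 49, M 48 and F 53
```

With this toy data the 50-year-old male patient is classified as "sick", because two of his three nearest neighbours carry that label.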
2. Unsupervised learning
Unsupervised learning algorithms draw inferences from datasets consisting of input data
without labeled responses. In unsupervised learning algorithms, classification or
categorization is not included in the observations.
Example: Consider the following data regarding patients entering a clinic. The dataset
includes unlabeled data, where only the gender and age of the patients are available, with
no health status labels.
Gender  Age
M       48
M       67
F       53
M       49
F       34
M       21

Here, an unsupervised learning technique is used to find patterns or groupings in the data,
such as clustering patients by age or gender. For example, the algorithm might group
patients into clusters such as “younger patients” and “older patients” without any prior
knowledge of their health status.
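A minimal sketch of such a grouping is one-dimensional k-means on the ages from the table. The choice of k-means, k=2, and the simple min/max initialization are assumptions made for illustration.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Cluster numbers into k groups (k >= 2) with a basic k-means loop."""
    lo, hi = min(values), max(values)
    # Assumed initialization: spread the starting centroids evenly from min to max
    centroids = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

ages = [48, 67, 53, 49, 34, 21]  # unlabeled ages from the table
centroids, clusters = kmeans_1d(ages)
```

No labels are used at any point: the two groups (ages 21 and 34 versus the rest) emerge only from how close the values are to each other.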
3. Reinforcement Learning
Reinforcement Learning (RL) trains an agent to act in an environment by maximizing
rewards through trial and error. Unlike other machine learning types, RL doesn’t provide
explicit instructions.
Instead, the agent learns by:
 Exploring Actions: Trying different actions.
 Receiving Feedback: Rewards for correct actions, punishments for incorrect ones.
 Improving Performance: Refining strategies over time.
Example: Identifying a Fruit
The system receives an input (e.g., an apple) and initially makes an incorrect prediction
(“It’s a mango”). Feedback is provided to correct the error (“Wrong! It’s an apple”), and
the system updates its model based on this feedback.
Over time, it learns to respond correctly (“It’s an apple”) when encountering similar inputs,
improving accuracy through trial, error, and feedback.

Beyond these three types of machine learning, two additional approaches
have gained significant attention in modern machine learning: Self-Supervised
Learning and Semi-Supervised Learning.
Self-Supervised Learning
Self-supervised learning (SSL) is a machine learning technique that trains models on unlabeled
data by generating the labels, or supervisory signals, from the data itself.
Semi-Supervised Learning
Semi-supervised learning is a type of machine learning that falls in between supervised
and unsupervised learning. It uses a small amount of labeled data and a large amount of
unlabeled data to train a model. The goal of semi-supervised learning is to learn a
function that can accurately predict the output variable from the input variable.

Benefits of Machine Learning

 Enhanced Efficiency and Automation: ML automates repetitive tasks, freeing up
human resources for more complex work. It also streamlines processes, leading to
increased efficiency and productivity.
 Data-Driven Insights: ML can analyze vast amounts of data to identify patterns and
trends that humans might miss. This allows for better decision-making based on real-
world data.
 Improved Personalization: ML personalizes user experiences across various platforms.
From recommendation systems to targeted advertising, ML tailors content and services
to individual preferences.
 Advanced Automation and Robotics: ML empowers robots and machines to perform
complex tasks with greater accuracy and adaptability. This is revolutionizing fields like
manufacturing and logistics.
Challenges of Machine Learning
 Data Bias and Fairness: ML algorithms are only as good as the data they are trained
on. Biased data can lead to discriminatory outcomes, requiring careful data selection and
monitoring of algorithms.
 Security and Privacy Concerns: As ML relies heavily on data, security breaches can
expose sensitive information. Additionally, the use of personal data raises privacy
concerns that need to be addressed.
 Interpretability and Explainability: Complex ML models can be difficult to
understand, making it challenging to explain their decision-making processes. This lack
of transparency can raise questions about accountability and trust.
 Job Displacement and Automation: Automation through ML can lead to job
displacement in certain sectors. Addressing the need for retraining and reskilling the
workforce is crucial.
Evolution of Machine Learning
 Machine learning is a subset of Artificial Intelligence (AI) that uses algorithms to
learn from data. AI has been in the news recently, both for fending off cyber attacks
and for its role in self-driving car accidents, so the term may already be familiar.
The idea that machines can “learn” without being explicitly programmed by humans goes
back much further than recent years, to the first computers, which could only do one
thing at a time. For a practical example of the concept in action, consider how
Clickworker facilitated the training of face recognition software through a
crowdsourced approach.
 One of the main reasons machine learning matters for business today is that it makes
processes more efficient and improves customer service through AI-assisted chatbots or
automated emails. It also provides tools that help teach people about new subjects,
such as geography or historical events.

Machine Learning Paradigms:

Machine learning (ML) is a dynamic field dedicated to developing methods that enable
machines to learn from extensive datasets and make predictions. The learning paradigms
in ML are categorized by how much human supervision they involve, and each has its own
approach to handling data and its own purposes and applications.

Supervised and Unsupervised learning

Supervised Learning (SL)

Supervised learning involves labelled datasets, where each data observation is paired with a
corresponding class label. Algorithms in supervised learning aim to build a mathematical
function that maps input features to desired output values based on these labeled examples.
Common applications include classification and regression.
Stages in Supervised Learning

Understanding Supervised Learning pictorially

Unsupervised Learning

In unsupervised learning, algorithms work with unlabeled data to identify patterns and
relationships. These methods uncover commonalities within the data without predefined
categories. Techniques such as clustering and association rules fall under unsupervised
learning.

Stages in Unsupervised Learning

Understanding Unsupervised Learning pictorially

Semi-supervised Learning

Semi-supervised learning strikes a balance by combining a small amount of labelled data with
a larger pool of unlabeled data. This approach leverages the benefits of both supervised and
unsupervised learning paradigms, making it a cost-effective and efficient method for training
models when the labeled data is limited.

Understanding Semi-supervised Learning pictorially

Self-supervised Learning (SSL)

In scenarios where obtaining high-quality labeled data is challenging, self-supervised learning
emerges as a solution. In this paradigm, models are pre-trained using unlabeled data, and data
labels are generated automatically during subsequent iterations. SSL transforms unsupervised
ML problems into supervised ones, enhancing learning efficiency. This paradigm is
particularly relevant with the rise of large language models.
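To illustrate how labels can be generated automatically from unlabeled data, the sketch below turns raw text into (context, next word) training pairs, the kind of objective commonly associated with language models. The function name `next_word_pairs` and the context size of two words are assumptions for illustration.

```python
def next_word_pairs(text, context_size=2):
    """Build (context, target) pairs from plain text; the target is the next word."""
    words = text.split()
    pairs = []
    for i in range(context_size, len(words)):
        context = tuple(words[i - context_size:i])  # the preceding words are the input
        pairs.append((context, words[i]))           # the following word is the label
    return pairs

pairs = next_word_pairs("the cat sat on the mat")
for context, target in pairs:
    print(context, "->", target)
```

No human annotation is involved: every label comes from the text itself, which is how an unsupervised problem is recast as a supervised one.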

Reinforcement Learning

Reinforcement learning focuses on enabling intelligent agents to learn tasks through trial-and-
error interactions with dynamic environments. Without the need for labelled datasets, agents
make decisions to maximize a reward function. This autonomous exploration and learning
approach is crucial for tasks where explicit programming is challenging.

Fig: Action-reward feedback loop. An agent takes actions in an environment; the outcome is
interpreted into a reward and a representation of the state, which are fed back into
the agent.

Action-Reward Feedback Loop:

Reinforcement learning operates on an action-reward feedback loop, where agents take
actions, receive rewards, and interpret the environment’s state. This iterative process allows
the agent to autonomously learn optimal actions to maximize positive feedback.
Learning by rote
Rote learning in machine learning is a simple learning pattern that involves memorizing new
information and storing it for future use. It involves the accumulation of information over
time, which can impact the speed and accuracy of recognition as the database grows.
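In its simplest form, rote learning amounts to a lookup table. The sketch below is a minimal illustration with a hypothetical class name `RoteLearner`; it stores each example exactly as seen and cannot answer anything it has not memorized.

```python
class RoteLearner:
    """Memorizes (example, answer) pairs and recalls them verbatim."""

    def __init__(self):
        self.memory = {}

    def learn(self, example, answer):
        # Store the pair as-is; nothing is generalized
        self.memory[example] = answer

    def recall(self, example):
        # Returns None for anything that was never memorized
        return self.memory.get(example)

learner = RoteLearner()
learner.learn("2+2", "4")
print(learner.recall("2+2"))  # a memorized example is recalled
print(learner.recall("3+3"))  # an unseen example yields None
```

The second call shows the limitation of the approach: without generalization, an unseen input produces no answer.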
Learning by induction
Learning by induction in machine learning is the process by which a model learns general
rules or patterns by analyzing specific examples from a dataset. The model makes inferences
about broader concepts from observed data, which allows it to predict outcomes on unseen
data. This is the core principle of most machine learning algorithms and is considered the
primary method of learning in the field.
Key points about inductive learning:
 Generalization:
The key goal of inductive learning is to generalize from the training data to make accurate
predictions on new, unseen data.
 Example-based learning:
Models learn by observing labelled examples where input data is paired with corresponding
outputs.
 Pattern recognition:
The algorithm identifies patterns and relationships within the data to build a model that can
represent the underlying structure.
 Inductive bias:
This refers to the assumptions a model makes about the data, which can be built into the
algorithm and influence how it learns patterns.
How it works:
1. Training phase:
The model is presented with a labeled dataset containing input data and corresponding target
values.
2. Pattern extraction:
The model analyzes the data, identifying recurring patterns and relationships between features
and targets.
3. Model building:
Based on the identified patterns, the model builds a mathematical representation that can be
used to make predictions on new data.
Examples of inductive learning algorithms:
 Decision trees: Learn by splitting data based on features to create a set of rules for
classification.
 Naive Bayes classifiers: Based on Bayes' theorem, they make probabilistic predictions from
features.
 Neural networks: Learn complex patterns through a layered structure, adapting weights to
improve predictions.
Important considerations:
 Overfitting:
If a model learns the training data too closely, it may not generalize well to unseen data.
 Underfitting:
If a model is too simple, it may not capture important patterns in the data.
 Data quality:
The quality of the training data significantly impacts the model's ability to learn accurate
patterns.
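As a small illustration of induction, the sketch below learns a single general rule ("sick if age is at least some threshold") from specific labeled examples. It is a one-level decision tree, often called a decision stump; the ages and labels are toy data and the function name `learn_stump` is hypothetical.

```python
def learn_stump(examples):
    """Pick the age threshold that misclassifies the fewest training examples."""
    best_threshold, best_errors = None, len(examples) + 1
    for threshold in sorted({age for age, _ in examples}):
        # Candidate rule: predict "sick" when age >= threshold, else "healthy"
        errors = sum(
            ("sick" if age >= threshold else "healthy") != label
            for age, label in examples
        )
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors

# Toy labeled examples: (age, label)
examples = [
    (48, "sick"), (67, "sick"), (53, "healthy"), (49, "sick"),
    (32, "healthy"), (34, "healthy"), (21, "healthy"),
]
threshold, errors = learn_stump(examples)
print(threshold, errors)
```

The learned rule still misclassifies one training example. A rule this simple can underfit, while a model flexible enough to memorize every example would risk overfitting, which is the trade-off noted above.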
Reinforcement Learning
Reinforcement Learning: An Overview
Reinforcement Learning (RL) is a branch of machine learning focused on making decisions
to maximize cumulative rewards in a given situation. Unlike supervised learning, which
relies on a training dataset with predefined answers, RL involves learning through
experience. In RL, an agent learns to achieve a goal in an uncertain, potentially complex
environment by performing actions and receiving feedback through rewards or penalties.
Key Concepts of Reinforcement Learning
 Agent: The learner or decision-maker.
 Environment: Everything the agent interacts with.
 State: A specific situation in which the agent finds itself.
 Action: A move the agent can make in a given state; the set of all possible moves is the action space.
 Reward: Feedback from the environment based on the action taken.
How Reinforcement Learning Works
RL operates on the principle of learning optimal behaviour through trial and error. The
agent takes actions within the environment, receives rewards or penalties, and adjusts its
behaviour to maximize the cumulative reward. This learning process is characterized by the
following elements:
 Policy: A strategy used by the agent to determine the next action based on the current
state.
 Reward Function: A function that provides a scalar feedback signal based on the state
and action.
 Value Function: A function that estimates the expected cumulative reward from a given
state.
 Model of the Environment: A representation of the environment that helps in planning
by predicting future states and rewards.
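The "cumulative reward" that the value function estimates is often computed as a discounted sum of future rewards. The sketch below assumes a discount factor `gamma`, which is a common convention but is not mentioned in the text above.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum rewards over time, weighting a reward t steps ahead by gamma**t."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

print(discounted_return([1, 1, 1], gamma=0.5))  # 1 + 0.5 + 0.25
print(discounted_return([1, 2, 3], gamma=1.0))  # plain sum when nothing is discounted
```

With `gamma` below 1, rewards that arrive sooner count for more than rewards that arrive later.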
Example: Navigating a Maze
The problem is as follows: we have an agent and a reward, with many hurdles in between.
The agent is supposed to find the best possible path to reach the reward. The following
scenario illustrates the problem.
The scenario consists of a robot, a diamond, and fire. The goal of the robot is to get the
reward, which is the diamond, and to avoid the hurdles, which are the fires. The robot learns
by trying the possible paths and then choosing the path that gives it the reward with the
fewest hurdles. Each right step gives the robot a reward, and each wrong step subtracts from
the robot's reward. The total reward is calculated when the robot reaches the final reward,
the diamond.
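The maze idea can be sketched with a small table of action values (Q-values). Everything specific here is an assumption made for illustration: the 3x3 grid, the positions of the diamond and the fire, the reward values, and the discount factor. To keep the result deterministic, the sketch sweeps over every state-action pair instead of exploring randomly, so it is closer to Q-value iteration than to sampled Q-learning.

```python
ROWS, COLS = 3, 3
START, DIAMOND, FIRE = (0, 0), (2, 2), (1, 1)  # assumed layout
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GAMMA = 0.9  # assumed discount factor

def step(state, action):
    """Return (next_state, reward, done); moving into a wall leaves the robot in place."""
    dr, dc = ACTIONS[action]
    row = min(max(state[0] + dr, 0), ROWS - 1)
    col = min(max(state[1] + dc, 0), COLS - 1)
    nxt = (row, col)
    if nxt == DIAMOND:
        return nxt, 10.0, True   # final reward
    if nxt == FIRE:
        return nxt, -10.0, True  # hurdle: large penalty, episode ends
    return nxt, -1.0, False      # small cost per step favours short paths

def learn_q(sweeps=50):
    """Repeatedly update Q(s, a) = reward + GAMMA * best Q in the next state."""
    states = [(r, c) for r in range(ROWS) for c in range(COLS)]
    q = {(s, a): 0.0 for s in states for a in ACTIONS}
    for _ in range(sweeps):
        for s in states:
            if s in (DIAMOND, FIRE):
                continue  # terminal states have no outgoing moves
            for a in ACTIONS:
                nxt, reward, done = step(s, a)
                future = 0.0 if done else max(q[(nxt, b)] for b in ACTIONS)
                q[(s, a)] = reward + GAMMA * future
    return q

def greedy_path(q, max_steps=20):
    """Follow the highest-valued action from START until the episode ends."""
    state, path = START, [START]
    for _ in range(max_steps):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path

q = learn_q()
path = greedy_path(q)
print(path)
```

After learning, following the highest-valued action from the start leads to the diamond in four moves without entering the fire cell.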

Main points in Reinforcement learning –

 Input: The input should be an initial state from which the model will start.
 Output: There are many possible outputs, as there are a variety of solutions to a particular
problem.
 Training: The training is based upon the input. The model returns a state, and the user
decides to reward or punish the model based on its output.
 The model continues to learn.
 The best solution is decided based on the maximum reward.

Difference between Reinforcement learning and Supervised learning:

Decision making
  Reinforcement learning: Decisions are made sequentially. In simple words, the output
  depends on the state of the current input, and the next input depends on the output of
  the previous input.
  Supervised learning: The decision is made on the initial input, that is, the input
  given at the start.

Dependence between decisions
  Reinforcement learning: Decisions are dependent, so labels are given to sequences of
  dependent decisions.
  Supervised learning: Decisions are independent of each other, so a label is given to
  each decision.

Examples
  Reinforcement learning: Chess game, text summarization.
  Supervised learning: Object recognition, spam detection.

Types of Reinforcement:
1. Positive: Positive reinforcement occurs when an event that follows a particular
behaviour increases the strength and frequency of that behaviour. In other words, it
has a positive effect on behaviour.
Advantages of positive reinforcement:
 Maximizes performance
 Sustains change for a long period of time
Disadvantage:
 Too much reinforcement can lead to an overload of states, which can diminish the
results
2. Negative: Negative reinforcement is the strengthening of behaviour because a
negative condition is stopped or avoided.
Advantages of negative reinforcement:
 Increases behaviour
 Provides defiance to a minimum standard of performance
Disadvantage:
 It only provides enough to meet the minimum behaviour
Elements of Reinforcement Learning
i) Policy: Defines the agent’s behaviour at a given time.
ii) Reward Function: Defines the goal of the RL problem by providing feedback.
iii) Value Function: Estimates long-term rewards from a state.
iv) Model of the Environment: Helps in predicting future states and rewards for planning.
Types of Data in ML
Data types are a way of classification that specifies which type of value a variable
can store and which mathematical, relational, or logical operations can be applied to
the variable without causing an error. In machine learning, it is very important to know
the appropriate data types of the independent and dependent variables, as they provide
the basis for selecting classification or regression models. Incorrect identification of
data types leads to incorrect modelling, which in turn leads to an incorrect solution.

This section discusses the different data types with suitable examples.

Different Types of Data Types

The data type is broadly classified into

1. Quantitative
2. Qualitative

Fig: Different Data Types

1. Quantitative Data Type

This data type consists of numerical values: anything that is measured by numbers.
E.g., profit, quantity sold, height, weight, temperature, etc.

It is again of two types.

A. Discrete Data Type

Numeric data that takes discrete values or whole numbers. A value of this type has no
proper meaning if expressed as a decimal. Discrete values can be counted.

E.g., number of cars you have, number of marbles in a container, students in a class, etc.

Fig: Discrete Data Types

B. Continuous Data Type

Numerical measures that can take any value within a certain range. A value of this type
remains meaningful when expressed as a decimal. Continuous values cannot be counted, only
measured, and the number of possible values is infinite.

E.g., height, weight, time, area, distance, measurement of rainfall, etc.

Fig: Continuous Data Types

2. Qualitative Data Type

These are data types that cannot be expressed as meaningful numbers. They describe
categories or groups and are hence known as the categorical data type.

This can be divided into:

A. Structured Data
This type of data is either numbers or words. It can take numerical values, but
mathematical operations cannot be meaningfully performed on them. This type of data is
expressed in tabular format.

E.g., Sunny=1, Cloudy=2, Windy=3, or binary data such as 0 or 1, good or bad, etc.

Fig: Structured Data

B. Unstructured Data
This type of data does not have a proper format and is therefore known as unstructured
data. It comprises textual data, sounds, images, videos, etc.

Fig: Unstructured Data
Besides these, there are other types referred to as data type preliminaries or data
measures:

1. Nominal
2. Ordinal
3. Interval
4. Ratio
These are also referred to as the different scales of measurement.
I. Nominal Data Type
This is used to express names or labels that have no order and are not measurable.

E.g., male or female (gender), race, country, etc.

Fig: Gender (Female, Male), an example of nominal data type

II. Ordinal Data Type

This is also a categorical data type like nominal data, but it has a natural ordering
associated with it.

E.g., Likert rating scale, shirt sizes, ranks, grades, etc.

Fig: Rating (Good, Average, Poor), an example of ordinal data type

III. Interval Data Type

This is numeric data that has a proper order and in which the difference between values
is meaningful, but there is no absolute zero. Here zero does not mean a complete absence
of the quantity; it is just another value on the scale. This is the local scale.

E.g., temperature measured in degrees Celsius, time, SAT score, credit score, pH, etc.

Fig: Temperature, an example of interval data type

IV. Ratio Data Type

This quantitative data type is the same as the interval data type but has an absolute
zero. Here zero means complete absence, and the scale starts from zero. This is the
global scale.

E.g., temperature in kelvin, height, weight, etc.

Fig: Weight, an example of ratio data type
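The measurement scales matter in practice because they determine how a value may be encoded and compared. The sketch below is illustrative only: the helper names and the shirt-size ordering are assumptions, not part of the text above.

```python
def one_hot(value, categories):
    """Nominal data: one indicator per category, so no order is implied."""
    return [1 if value == c else 0 for c in categories]

# Ordinal data: integers that preserve the natural ordering (assumed size order)
SHIRT_SIZE_ORDER = {"S": 0, "M": 1, "L": 2, "XL": 3}

def encode_ordinal(value, order=SHIRT_SIZE_ORDER):
    return order[value]

def celsius_to_kelvin(celsius):
    """Convert interval-scale Celsius to ratio-scale kelvin."""
    return celsius + 273.15

weather = one_hot("Cloudy", ["Sunny", "Cloudy", "Windy"])

# Interval vs ratio: 20 degrees Celsius is not "twice as hot" as 10 degrees Celsius.
# Only on the kelvin scale, which has an absolute zero, is the ratio meaningful.
kelvin_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)
```

Encoding the weather as Sunny=1, Cloudy=2, Windy=3 would wrongly suggest an order, which is why one-hot encoding is a common choice for nominal data.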
The stages in machine learning
The stages include data collection, model development, and model deployment.
Data collection
 Data collection: The first step in the machine learning process
 Data preparation: The process of cleaning, normalizing, and organizing data so it can be
used for analysis and modeling
Model development
 Model engineering: The process of creating, training, and refining machine learning models
 Model evaluation: The process of assessing the model's performance
 Hyperparameter tuning and optimization: The process of adjusting the model's hyperparameters
(settings fixed before training, such as the learning rate) to improve its performance
Model deployment
 Model deployment
The process of putting the machine learning model into production so it can be used to make
predictions
 Monitoring and maintenance
The process of continuously monitoring and improving the model's performance
ML LifeCycle
The machine learning lifecycle is a process that guides the development and deployment of
machine learning models in a structured way. It consists of several steps, each of which
plays a crucial role in the success and effectiveness of the model. By following the
lifecycle we can solve complex problems, gain data-driven insights, and create scalable
and sustainable models. The steps are:
1. Problem Definition
2. Data Collection
3. Data Cleaning and Preprocessing
4. Exploratory Data Analysis (EDA)
5. Feature Engineering and Selection
6. Model Selection
7. Model Training
8. Model Evaluation and Tuning
9. Model Deployment
10. Model Monitoring and Maintenance

Fig: Machine Learning Lifecycle
Step 1: Problem Definition
In this initial phase we identify the business problem and frame it. By framing the
problem comprehensively, the team establishes the foundation for the machine learning
lifecycle. Crucial elements such as project objectives, desired outcomes, and the scope of
the task are carefully defined during this stage.
Here are some steps for problem definition:
 Collaboration: Work together with stakeholders to understand and define the business
problem.
 Clarity: Clearly write down the objectives, desired outcomes and scope of the task.
 Foundation: Establish a solid foundation for the machine learning process by framing
the problem comprehensively.
Step 2: Data Collection
After problem definition, the machine learning lifecycle progresses to data collection. This
phase involves the systematic collection of datasets that serve as raw data for training
the model. The quality and diversity of the data collected directly impact the robustness
and generalization of the model.
During data collection we must consider the relevance of the data to the defined problem,
ensuring that the selected datasets contain all necessary features and characteristics. A
well-organized approach to data collection supports effective model training, evaluation,
and deployment, so that the resulting model is accurate and usable in real-world
scenarios.
Here are some basic features of Data Collection:
 Relevance: Collected data should be relevant to the defined problem and include
necessary features.
 Quality: Ensure data quality by considering factors like accuracy and ethical use.
 Quantity: Gather sufficient data volume to train a robust model.
 Diversity: Include diverse datasets to capture a broad range of scenarios and patterns.
Step 3: Data Cleaning and Preprocessing
With datasets in hand, the next step is data cleaning and preprocessing. Raw data is
often messy and unstructured, and training on it directly can lead to poor accuracy and
to the model capturing spurious relationships. Data cleaning addresses issues such as
missing values, outliers, and inconsistencies that could compromise the accuracy and
reliability of the machine learning model.
Preprocessing standardizes formats, scales values, and encodes categorical variables to
create a consistent and well-organized dataset. The objective is to refine the raw data
into a format that is meaningful for analysis and training. Data cleaning and
preprocessing ensure that the model is trained on high-quality, reliable data.
Here are the basic features of Data Cleaning and Preprocessing:
 Data Cleaning: Address issues such as missing values, outliers and inconsistencies in
the data.
 Data Preprocessing: Standardize formats, scale values, and encode categorical
variables for consistency.
 Data Quality: Ensure that the data is well-organized and prepared for meaningful
analysis.
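Two of the operations named above, handling missing values and scaling, can be sketched as follows. Median imputation and min-max scaling are one possible choice among many; the function names and the sample ages are assumptions for illustration.

```python
from statistics import median

def impute_missing(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]

ages = [48, None, 53, 49, 32]  # one missing value
clean = impute_missing(ages)
scaled = min_max_scale(clean)
```

The missing age is filled with 48.5, the median of the four observed values, before the column is scaled to the range 0 to 1.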
Step 4: Exploratory Data Analysis (EDA)
Exploratory Data Analysis (EDA) is used to find the patterns and characteristics hidden
in the data and to understand the dataset's structure. EDA reveals patterns, trends, and
insights that may not be visible to the naked eye. These insights can be used to make
informed decisions.
Visualizations help present statistical summaries in an easy and understandable way. They
also help in making choices about feature engineering, model selection, and other critical
aspects.
Here are the basic features of Exploratory Data Analysis:
 Exploration: Use statistical and visual tools to explore patterns in data.
 Patterns and Trends: Identify underlying patterns, trends and potential challenges
within the dataset.
 Insights: Gain valuable insights for informed decision making in later stages.
 Decision Making: Use EDA for feature engineering and model selection.
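A first EDA pass often starts with summary statistics and label counts. The sketch below uses only the Python standard library; the `summarize` helper and the toy ages and labels are assumptions for illustration.

```python
from collections import Counter
from statistics import mean, median, stdev

def summarize(values):
    """Basic descriptive statistics for one numeric column."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
    }

ages = [48, 67, 53, 49, 32, 34, 21]
labels = ["sick", "sick", "healthy", "sick", "healthy", "healthy", "healthy"]

summary = summarize(ages)
label_counts = Counter(labels)  # class balance: how many examples per label
print(summary)
print(label_counts)
```

Even this small summary is useful: the range of ages and the balance between labels both influence later choices in feature engineering and model selection.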
Step 5: Feature Engineering and Selection
Feature engineering and selection is a transformative process that determines which
inputs the model uses for prediction. Feature selection refines the pool of variables,
identifying the most relevant ones to enhance model efficiency and effectiveness.
Feature engineering involves creating new features by transforming existing ones. This
creative process requires domain expertise and a deep understanding of the problem,
ensuring that the engineered features contribute meaningfully to model prediction.
Together they improve accuracy while minimizing computational complexity.
Here are the basic features of Feature Engineering and Selection:
 Feature Engineering: Create new features or transform existing ones to better capture
patterns and relationships.
 Feature Selection: Identify the subset of features that most significantly impacts the
model's performance.
 Domain Expertise: Use domain knowledge to engineer features that contribute
meaningfully to prediction.
 Optimization: Balance the set of features for accuracy while minimizing computational
complexity.
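The sketch below shows one possible way to combine the two ideas: a new feature (area) is engineered from two existing ones, and the features are then ranked by the absolute Pearson correlation with the target. The housing rows, the column names and the choice of correlation as the selection criterion are assumptions for illustration only; real projects often use other selection methods.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (spread_x * spread_y)


# Hypothetical housing data (all values invented for this example).
rows = [
    {"length": 10, "width": 5, "house_number": 7, "price": 100},
    {"length": 8, "width": 10, "house_number": 3, "price": 160},
    {"length": 12, "width": 10, "house_number": 9, "price": 240},
    {"length": 20, "width": 5, "house_number": 1, "price": 200},
    {"length": 15, "width": 12, "house_number": 5, "price": 360},
]

# Feature engineering: derive a new feature from existing ones.
for row in rows:
    row["area"] = row["length"] * row["width"]

# Feature selection: rank features by |correlation| with the target.
target = [row["price"] for row in rows]
feature_names = ["length", "width", "house_number", "area"]
scores = {
    name: pearson([row[name] for row in rows], target) for name in feature_names
}
selected = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)[:2]
print(scores)
print("Selected features:", selected)
```

Here the engineered area feature ranks first because the toy prices were constructed to depend on it, which mirrors how domain knowledge can produce a feature more useful than any raw column.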
Step 6: Model Selection
Model selection is a very important part of building a good machine learning model, as we need
to find a model that aligns with our defined problem and the characteristics of the dataset.
It is an important decision that determines the algorithmic framework for prediction. The
choice depends on the nature of the data, the complexity of the problem and the desired
outcomes.
Here are the basic features of Model Selection:
 Alignment: Select a model that aligns with the defined problem and characteristics of
the dataset.
 Complexity: Consider the complexity of the problem and the nature of the data when
choosing a model.
 Decision Factors: Evaluate factors like performance, interpretability and scalability
when selecting a model.
 Experimentation: Experiment with different models to find the best fit for the problem.
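A minimal sketch of the experimentation idea follows: two candidate models are fitted on the same training data and compared on held-out validation data, and the one with the lower error is kept. The two candidates (a mean baseline and a least-squares line), the data values and the use of mean squared error are assumptions chosen for this example.

```python
def fit_mean_baseline(xs, ys):
    """Candidate 1: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear_regression(xs, ys):
    """Candidate 2: simple least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and true values."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data, split into training and validation parts.
train_x, train_y = [1, 2, 3, 4, 5, 6], [4.1, 6.9, 10.2, 12.8, 16.1, 19.0]
valid_x, valid_y = [7, 8], [22.1, 24.9]

candidates = {
    "mean_baseline": fit_mean_baseline,
    "linear_regression": fit_linear_regression,
}

# Fit every candidate on the training data, score it on the validation data.
scores = {
    name: mean_squared_error(fit(train_x, train_y), valid_x, valid_y)
    for name, fit in candidates.items()
}
best_name = min(scores, key=scores.get)
print(scores)
print("Best model:", best_name)
```

In practice the comparison would also weigh interpretability and scalability, as noted in the list above, not only the validation error.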
Step 7: Model Training
With the model selected, the machine learning lifecycle moves to the model training process.
This process involves exposing the model to historical data, allowing it to learn patterns,
relationships and dependencies within the dataset.
Model training is an iterative process where the algorithm adjusts its parameters to
minimize errors and enhance predictive accuracy. During this phase the model fine-tunes
itself to better fit the data and improve its ability to make predictions.
A rigorous training process, checked against held-out data, helps the trained model work well
with new unseen data for reliable predictions in real-world scenarios.
Here are the basic features of Model Training:
 Training Data: Expose the model to historical data to learn patterns, relationships and
dependencies.
 Iterative Process: Train the model iteratively, adjusting parameters to minimize errors
and enhance accuracy.
 Optimization: Fine-tune the model to optimize its predictive capabilities.
 Validation: Check the trained model against held-out data to confirm it stays accurate on
new unseen data.
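The iterative adjustment of parameters can be sketched with gradient descent on a one-variable linear model. This is only one of many training procedures; the function name train_linear_model, the learning rate, the number of epochs and the data are assumptions for illustration.

```python
def train_linear_model(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    loss_history = []
    for _ in range(epochs):
        # Prediction errors with the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        loss_history.append(sum(e * e for e in errors) / n)
        # Gradients of the mean squared error with respect to w and b.
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Move the parameters a small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, loss_history


# Hypothetical historical data generated from y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

w, b, loss_history = train_linear_model(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
print(f"first loss = {loss_history[0]:.3f}, last loss = {loss_history[-1]:.6f}")
```

Each pass over the data reduces the error a little, which is the "adjusting parameters to minimize errors" loop described above.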
Step 8: Model Evaluation and Tuning
Model evaluation involves rigorous testing against validation or test datasets to measure the
accuracy of the model on new unseen data. We can use metrics such as accuracy, precision,
recall and F1 score to check model effectiveness.
Evaluation is critical because it provides insights into the model's strengths and weaknesses.
If the model fails to achieve the desired performance levels, we may need to tune the model
again and adjust its hyperparameters to enhance predictive accuracy. This iterative cycle of
evaluation and tuning is crucial for achieving the desired level of model robustness and
reliability.
Here are the basic features of Model Evaluation and Tuning:
 Evaluation Metrics: Use metrics like accuracy, precision, recall and F1 score to
evaluate model performance.
 Strengths and Weaknesses: Identify the strengths and weaknesses of the model
through rigorous testing.
 Iterative Improvement: Initiate model tuning to adjust hyperparameters and enhance
predictive accuracy.
 Model Robustness: Tune iteratively to achieve the desired levels of model robustness and
reliability.
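The four metrics named above can be computed from the counts of true and false positives and negatives. The sketch below assumes binary labels where 1 is the positive class; the label lists and the function name classification_metrics are invented for this example.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test-set labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision answers "of the items predicted positive, how many were right", while recall answers "of the truly positive items, how many were found"; F1 is their harmonic mean.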
Step 9: Model Deployment
Upon successful evaluation, the machine learning model is ready for deployment in real-world
applications. Model deployment involves integrating the predictive model with existing
systems, allowing the business to use it for informed decision-making.
Here are the basic features of Model Deployment:
 Integration: Integrate the trained model into existing systems or processes for real-
world application.
 Decision Making: Use the model's predictions for informed decision-making.
 Practical Solutions: Deploy the model to transform theoretical insights into practical
uses that address business needs.
 Continuous Improvement: Monitor model performance and make adjustments as
necessary to maintain effectiveness over time.
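One simple way to picture the integration step is to save the trained parameters to a file and load them again inside the system that serves predictions. The sketch below assumes a linear model stored as JSON; the file name, parameter names and helper functions are hypothetical, and production deployments typically involve much more (versioning, monitoring, access control).

```python
import json
import os
import tempfile


def save_model(params, path):
    """Write trained model parameters to disk as JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(params, handle)


def load_model(path):
    """Read model parameters back, as a serving system would at startup."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def predict(params, x):
    """Apply the deployed linear model to a new input."""
    return params["slope"] * x + params["intercept"]


# Hypothetical parameters produced by an earlier training step.
trained_params = {"slope": 2.0, "intercept": 1.0}

with tempfile.TemporaryDirectory() as model_dir:
    model_path = os.path.join(model_dir, "linear_model.json")
    save_model(trained_params, model_path)   # training side
    deployed_params = load_model(model_path)  # serving side
    prediction = predict(deployed_params, 4.0)

print("Prediction for x = 4.0:", prediction)
```

Keeping the saved parameters separate from the serving code makes it easier to swap in a retrained model later, which supports the continuous improvement point above.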
CRISP MODEL: CRISP-DM FRAMEWORK
The CRISP-DM (Cross Industry Standard Process for Data Mining) framework is a standard
process for solving analytical problems. The framework comprises a six-phase workflow and
was designed to be flexible, so it suits a wide variety of projects.
What is CRISP in ML?
Cross-Industry Standard Process for Machine Learning with Quality Assurance is abbreviated
as CRISP-ML(Q).
What are the 6 CRISP-DM Phases?
I. Business Understanding
II. Data Understanding
III. Data Preparation
IV. Modeling
V. Evaluation
VI. Deployment
Fig: CRISP-DM Waterfall: Horizontal Slicing
CRISP-DM

Creating a machine learning system involves more than just selecting a model, training it, and
applying it to new data. There are frameworks that help us organize machine learning
projects.

One such framework is CRISP-DM — the Cross-Industry Standard Process for Data Mining.
It was invented quite long ago, in 1996, but in spite of its age, it’s still applicable to today’s
problems.

According to CRISP-DM, the machine learning process has six steps:

1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment

Each phase covers typical tasks:

 In the business understanding step, we try to identify the problem, to understand how
we can solve it, and to decide whether machine learning will be a useful tool for
solving it.
 In the data understanding step, we analyze available datasets and decide whether we
need to collect more data.
 In the data preparation step, we transform the data into tabular form that we can use as
input for a machine learning model.
 When the data is prepared, we move to the modeling step, in which we train a model.
 After the best model is identified, there’s the evaluation step, where we evaluate the
model to see if it solves the original business problem and measure its success at
doing that.
 Finally, in the deployment step, we deploy the model to the production environment.

Example
Suppose we want to build a spam detection system: for each email we get, we want to
determine if it’s spam or not. If it is, we want to put it into the “spam” folder.

