
Introduction to Machine Learning Concepts

Machine learning, a subset of artificial intelligence, enables systems to learn from data without explicit programming, improving performance through experience and pattern recognition. Key applications include speech recognition, self-driving cars, recommendation systems, fraud detection, and image recognition. The document outlines the evolution of machine learning, its paradigms (supervised, unsupervised, semi-supervised, and reinforcement learning), and discusses the advantages and disadvantages of supervised learning.

Uploaded by

hema
Copyright
© All Rights Reserved

UNIT-1

INTRODUCTION TO MACHINE LEARNING


Machine learning is a subset of artificial intelligence that enables systems
to learn and improve from data without being explicitly programmed. Instead
of following strict instructions, machine learning algorithms find patterns in
large datasets to make predictions or decisions. This allows them to perform
tasks like identifying spam, recommending products, or even powering
self-driving cars, with performance improving as they are exposed to more data.

Figure: data → model training → prediction → improvement
The key features of machine learning
 Learning from Experience/Data: Machine learning systems improve their
performance on tasks by learning from large volumes of data, rather than
being explicitly programmed with rigid rules for every possible scenario.
The more relevant data a model is exposed to during training, the better
it can become at its given task.
 Pattern Recognition: ML algorithms excel at automatically discovering
hidden patterns, trends, and relationships within data that might be
difficult or impossible for humans to identify through simple analysis.
 Generalization: The primary goal of a trained model is not merely to
memorize the data it was trained on, but to generalize its learned
patterns to make accurate predictions or decisions on new, previously
unseen data.
 Adaptive Improvement: Machine learning models can continuously
evolve and refine their accuracy over time as new data becomes
available. This is often achieved through feedback loops where the
model's predictions are compared with actual outcomes to correct errors
and optimize its internal parameters.
 Automation: ML automates complex and repetitive analytical tasks,
enabling fast, data-driven insights and decision-making at a scale and
speed that is not feasible manually.
Key Applications of Machine Learning

1. Speech Recognition
Speech recognition allows devices to convert spoken words into text. Applications
like virtual assistants (e.g., Alexa and Siri) rely on natural language processing (NLP)
and neural networks to understand and respond to commands. This technology
enhances user experiences, particularly in accessibility and hands-free
environments.

2. Self-Driving Cars
Self-driving cars are a prime example of real-time decision-making enabled by
machine learning. Autonomous vehicles use sensors and cameras combined with
reinforcement learning to navigate roads, avoid obstacles, and adapt to changing
conditions. These cars rely on machine learning models to continuously improve their
performance.

3. Recommendation Systems
Platforms like Netflix and Amazon use recommendation systems to suggest movies,
products, or shows tailored to user preferences. These systems leverage large data
sets to analyze user behavior and employ advanced recommendation engines
powered by machine learning algorithms.

4. Fraud Detection
Banks and financial institutions use machine learning applications to combat
fraudulent activities. By analyzing transaction data, machine learning can detect
fraudulent transactions in real time, ensuring customer safety. These systems use
pattern recognition to flag anomalies that deviate from typical behavior.

5. Image Recognition
Image recognition is widely used in areas like healthcare, security, and social media.
Machine learning enables systems to analyze visual data, such as medical images,
to detect diseases or identify objects and faces in photos. This technology is powered
by deep neural networks.

Evolution of Machine Learning

Machine Learning (ML) has evolved significantly since its inception, progressing through various
stages of theoretical development, practical applications, and technological advances. Here's an
overview of its journey:

Early Foundations (1940s-1950s)

Alan Turing proposed the concept of machine intelligence in his 1950 paper
"Computing Machinery and Intelligence".
The Perceptron was introduced in 1958 by Frank Rosenblatt as one of the earliest
artificial neural networks capable of learning.

Symbolic AI and Expert Systems (1960s-1970s)


 Focused on rule-based systems where machines used explicit rules to perform
reasoning and problem-solving.
 Development of Expert Systems, such as DENDRAL and MYCIN, which used
domain-specific knowledge.
 Limitations: struggled with ambiguity and required extensive manual rule creation.

Statistical Methods and Early ML (1980s)


 Shift from symbolic AI to data-driven approaches due to increased computational power.
 Development of foundational ML algorithms:
o Decision Trees
o Linear Regression
o K-Nearest Neighbors (KNN)
o Support Vector Machines (SVMs)
 Key milestones:
o Introduction of backpropagation in neural networks.
o Adoption of probabilistic reasoning, e.g., Bayesian networks.
Emergence of Big Data and Modern ML (1990s-2000s)

 Growth of the internet and digital storage enabled the collection of massive datasets.
 Breakthroughs in unsupervised and supervised learning techniques:
o Random Forests and Ensemble Methods improved prediction accuracy.
o Advancements in Clustering (e.g., k-means) and Dimensionality Reduction
(e.g., PCA).
 Key applications emerged, including:
o Speech and image recognition.
o Predictive analytics in business and finance.

Deep Learning Revolution (2010s)


 Introduction of deep neural networks and architectures such as:
o Convolutional Neural Networks (CNNs) for image data.
o Recurrent Neural Networks (RNNs) and later Transformers for sequential
data.
 Enablers:
o Access to large labeled datasets (e.g., ImageNet).
o Advances in hardware, especially GPUs.
o Development of frameworks like TensorFlow and PyTorch.
 Major breakthroughs:
o AlphaGo's success in defeating human Go champions.
o Image and speech recognition achieving near-human performance.

Current Trends and Future Directions (2020s-Present)


 Generative AI:
o Models like GPT, DALL·E, and Stable Diffusion revolutionize text, image,
and multimodal generation.
 Reinforcement Learning:
o Applied in autonomous systems, robotics, and complex decision-making.
 Edge AI:
o Deploying ML models on devices like smartphones and IoT hardware for real-
time applications.
 Ethics and Explainability:
o Increased focus on AI fairness, accountability, and reducing bias.
 Quantum ML:
o Exploration of quantum computing to solve problems beyond classical
capabilities.
PARADIGMS FOR MACHINE LEARNING

Machine learning paradigms are the fundamental approaches and categorizations used to
frame how algorithms learn from data [1]. These are typically classified into four major
categories, with several specialized paradigms also recognized [1, 2, 3].
The four primary paradigms are:

1. Supervised Learning
2. Unsupervised Learning

3. Semi-Supervised Learning

4. Reinforcement Learning (RL)

Supervised learning
Supervised learning is a type of machine learning where a model learns from labelled data—meaning
every input has a corresponding correct output. The model makes predictions and compares them
with the true outputs, adjusting itself to reduce errors and improve accuracy over time. The goal is to
make accurate predictions on new, unseen data. For example, a model trained on images of
handwritten digits can recognise new digits it has never seen before.

Types of Supervised Learning in Machine Learning


Supervised learning can be applied to two main types of problems:
 Classification: Classification is a supervised learning task where an algorithm assigns new,
unseen data points to predefined categories or classes. It learns this mapping from a labeled
training dataset, where each data point is already associated with a correct category. The goal
is to predict the correct class label for new observations based on learned patterns, such as
in spam filtering (spam vs. not spam) or image recognition.
 Regression: Regression is a supervised learning technique where the goal is to predict a
continuous numerical value based on one or more independent features. It finds relationships
between variables so that predictions can be made. There are two types of variables in
regression:
 Dependent Variable (Target): The variable we are trying to predict, e.g. house price.
 Independent Variables (Features): The input variables that influence the prediction, e.g.
locality, number of rooms.
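To make the regression idea concrete, here is a minimal sketch of fitting a straight line by ordinary least squares with one independent variable. The function name fit_line and the toy room/price values are assumptions made up for illustration; they are not from any real dataset.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical toy data: number of rooms (feature) and house price (target).
rooms = [1, 2, 3, 4, 5]
prices = [50, 70, 90, 110, 130]  # made-up values following price = 20 * rooms + 30

slope, intercept = fit_line(rooms, prices)
predicted_price = slope * 6 + intercept  # prediction for an unseen 6-room house
```

Here the number of rooms is the independent variable and the price is the dependent variable; the fitted line is then used to predict a continuous value for an input that was not in the data.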
Supervised Learning
While training the model, the data is usually split in the ratio 80:20, i.e. 80% as training data and
the remaining 20% as testing data. For the training data, the model is given both the inputs and
their correct outputs. The model learns from the training data only; the testing data is held back
for evaluation.
Working of Supervised Machine Learning
The working of supervised machine learning follows these key steps:
1. Collect Labeled Data
 Gather a dataset where each input has a known correct output (label).
 Example: Images of handwritten digits with their actual numbers as labels.
2. Split the Dataset
 Divide the data into training data (about 80%) and testing data (about 20%).
 The model will learn from the training data and be evaluated on the testing data.
3. Train the Model
 Feed the training data (inputs and their labels) to a suitable supervised learning algorithm
(like Decision Trees, SVM or Linear Regression).
 The model tries to find patterns that map inputs to correct outputs.
4. Validate and Test the Model
 Evaluate the model using testing data it has never seen before.
 The model predicts outputs and these predictions are compared with the actual labels to
calculate accuracy or error.
5. Deploy and Predict on New Data
 Once the model performs well, it can be used to predict outputs for completely new, unseen
data.
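The five steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data points are invented, the helper names split_dataset and predict_nearest are hypothetical, and a 1-nearest-neighbour rule stands in for whichever supervised algorithm would actually be chosen.

```python
import random

def split_dataset(samples, train_fraction=0.8, seed=0):
    """Shuffle the labeled samples and split them into training and testing sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def predict_nearest(train, features):
    """Predict the label of the closest training example (1-nearest neighbour)."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    nearest = min(train, key=lambda item: squared_distance(item[0], features))
    return nearest[1]

# Step 1: hypothetical labeled data, two well-separated groups of 2-D points.
data = [((x, y), "low") for x in range(5) for y in range(5)]
data += [((x, y), "high") for x in range(10, 15) for y in range(10, 15)]

# Step 2: 80:20 split.
train_set, test_set = split_dataset(data)

# Steps 3-4: "training" a nearest-neighbour model is just storing train_set;
# evaluation compares predictions with the true labels of the test set.
correct = sum(predict_nearest(train_set, f) == label for f, label in test_set)
accuracy = correct / len(test_set)

# Step 5: predict for a new, unseen input.
new_label = predict_nearest(train_set, (12.5, 11.0))
```

Note that the test set is never used while fitting, so the accuracy computed on it is an estimate of how the model behaves on unseen data.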
Supervised Machine Learning Algorithms
Supervised learning can be further divided into several different types, each with its own unique
characteristics and applications. Here are some of the most common types of supervised learning
algorithms:
 Linear Regression: Linear regression is a type of supervised learning regression algorithm
that is used to predict a continuous output value. It is one of the simplest and most widely
used algorithms in supervised learning.
 Logistic Regression: Logistic regression is a type of supervised learning classification
algorithm that is used to predict a binary output variable.
 Decision Trees: A decision tree is a tree-like structure used to model decisions and their
possible consequences. Each internal node in the tree represents a decision, while each leaf
node represents a possible outcome.
 Random Forests: A random forest is made up of multiple decision trees that work
together to make predictions. Each tree in the forest is trained on a different subset of the
input features and data. The final prediction is made by aggregating the predictions of all the
trees in the forest.
 Support Vector Machine (SVM): The SVM algorithm finds a hyperplane that separates
n-dimensional space into classes, so that new data points can be assigned to the correct
category. The extreme cases that define the hyperplane are called support vectors, hence the
name Support Vector Machine.
 K-Nearest Neighbors: KNN works by finding k training examples closest to a given input and
then predicts the class or value based on the majority class or average value of these
neighbors. The performance of KNN can be influenced by the choice of k and the distance
metric used to measure proximity.
 Gradient Boosting: Gradient Boosting combines weak learners, like decision trees, to create a
strong model. It iteratively builds new models that correct errors made by previous ones.
 Naive Bayes Algorithm: The Naive Bayes algorithm is a supervised machine learning
algorithm based on applying Bayes' Theorem with the “naive” assumption that features are
independent of each other given the class label.
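The "naive" independence assumption of Naive Bayes can be written out as a formula. In this sketch the symbols are chosen only for illustration: y is a class label, x_1, ..., x_n are the feature values of one input, and ŷ is the predicted class.

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

Because the features are treated as independent given the class, the joint likelihood reduces to a product of per-feature terms, each of which can be estimated separately from the labeled training data.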
Let's summarize the supervised machine learning algorithms in table:
These types of supervised learning in machine learning vary based on the problem we're trying to
solve and the dataset we're working with. In classification problems, the task is to assign inputs to
predefined classes, while regression problems involve predicting numerical outcomes.
Practical Examples of Supervised learning
A few practical examples of supervised machine learning across various industries:
 Fraud Detection in Banking: Uses supervised learning algorithms on historical transaction
data, training models with labeled datasets of legitimate and fraudulent transactions to
accurately predict fraud patterns.
 Parkinson's Disease Prediction: Parkinson's disease is a progressive disorder that affects the
nervous system and the parts of the body controlled by the nerves. Supervised models trained
on labeled patient data can be used to predict whether the disease is present.
 Customer Churn Prediction: Uses supervised learning techniques to analyze historical
customer data, identifying features associated with churn in order to predict customer
retention effectively.
 Cancer Cell Classification: Applies supervised learning to classify cancer cells, based on
their features, as 'malignant' or 'benign'.
 Stock Price Prediction: Applies supervised learning to predict a signal that indicates whether
buying a particular stock will be helpful or not.
Advantages
Here are some advantages of supervised learning listed below:
 Simplicity & clarity: Easy to understand and implement since it learns from labeled examples.
 High accuracy: When sufficient labeled data is available, models achieve strong predictive
performance.
 Versatility: Works for both classification (e.g. spam detection, disease prediction) and
regression (e.g. price forecasting).
 Generalization: With enough diverse data and proper training, models can generalize well to
unseen inputs.
 Wide application: Used in speech recognition, medical diagnosis, sentiment analysis, fraud
detection and more.
Disadvantages
 Requires labeled data: Large amounts of labeled datasets are expensive and time-consuming
to prepare.
 Bias from data: If training data is biased or unbalanced, the model may learn and amplify
those biases.
 Overfitting risk: Model may memorize training data instead of learning general patterns,
especially with small datasets.
 Limited adaptability: Performance drops significantly when applied to data distributions very
different from training data.
 Not scalable for some problems: In tasks with millions of possible labels like natural
language, supervised labeling becomes impractical.

Unsupervised learning
Unsupervised learning is a type of machine learning that analyzes and models data without labelled
responses or predefined categories. Unlike supervised learning, where the algorithm learns from
input-output pairs, unsupervised learning algorithms work solely with input data and aim to discover
hidden patterns, structures or relationships within the dataset independently, without any human
intervention or prior knowledge of the data's meaning.

Working of Unsupervised Learning


The working of unsupervised machine learning can be explained in these steps:
1. Collect Unlabeled Data
 Gather a dataset without predefined labels or categories.
 Example: Images of various animals without any tags.
2. Select an Algorithm
 Choose a suitable unsupervised algorithm such as clustering like K-Means, association rule
learning like Apriori or dimensionality reduction like PCA based on the goal.
3. Train the Model on Raw Data
 Feed the entire unlabeled dataset to the algorithm.
 The algorithm looks for similarities, relationships or hidden structures within the data.
4. Group or Transform Data
 The algorithm organizes data into groups (clusters), rules or lower-dimensional forms without
human input.
 Example: It may group similar animals together or extract key patterns from large datasets.
5. Interpret and Use Results
 Analyze the discovered groups, rules or features to gain insights or use them for further tasks
like visualization, anomaly detection or as input for other models.
Unsupervised Learning Algorithms
There are three main types of unsupervised algorithms: clustering, association rule learning and
dimensionality reduction. Clustering is described below.
1. Clustering Algorithms
Clustering is an unsupervised machine learning technique that groups unlabeled data into clusters
based on similarity. Its goal is to discover patterns or relationships within the data without any prior
knowledge of categories or labels.
 Groups data points that share similar features or characteristics.
 Helps find natural groupings in raw, unclassified data.
 Commonly used for customer segmentation, anomaly detection and data organization.
 Works purely from the input data without any output labels.
 Enables understanding of data structure for further analysis or decision-making.
Some common clustering algorithms:
 K-means Clustering: Groups data into K clusters based on how close the points are to each
other.
 Hierarchical Clustering: Creates clusters by building a tree step-by-step, either merging or
splitting groups.
 Density-Based Clustering (DBSCAN): Finds clusters in dense areas and treats scattered points
as noise.
 Mean-Shift Clustering: Discovers clusters by moving points toward the most crowded areas.
 Spectral Clustering: Groups data by analyzing connections between points using graphs.
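As an illustration of the first algorithm in this list, here is a minimal sketch of K-means on one-dimensional data. It is a simplified version under stated assumptions: the starting centroids are passed in by hand rather than chosen randomly, the number of iterations is fixed, and the function name kmeans_1d and the data values are made up.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D points, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster is empty
                centroids[i] = sum(cluster) / len(cluster)
    return centroids, clusters

# Hypothetical unlabeled data with two visible groups.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, groups = kmeans_1d(values, centroids=[0.0, 5.0])
```

No labels are given anywhere; the two groups emerge only from how close the points are to each other, which is the sense in which clustering works purely from the input data.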

Applications of Unsupervised learning


Unsupervised learning has diverse applications across industries and domains. Key applications
include:
 Customer Segmentation: Algorithms cluster customers based on purchasing behavior or
demographics, enabling targeted marketing strategies.
 Anomaly Detection: Identifies unusual patterns in data, aiding fraud detection, cybersecurity
and equipment failure prevention.
 Recommendation Systems: Suggests products, movies or music by analyzing user behavior
and preferences.
 Image and Text Clustering: Groups similar images or documents for tasks like organization,
classification or content recommendation.
 Social Network Analysis: Detects communities or trends in user interactions on social media
platforms.
Advantages
 No need for labeled data: Works with raw, unlabeled data hence saving time and effort on
data annotation.
 Discovers hidden patterns: Finds natural groupings and structures that might be missed by
humans.
 Handles complex and large datasets: Effective for high-dimensional or vast amounts of data.
 Useful for anomaly detection: Can identify outliers and unusual data points without prior
examples.
Challenges
Here are the key challenges of unsupervised learning:
 Noisy Data: Outliers and noise can distort patterns and reduce the effectiveness of
algorithms.
 Assumption Dependence: Algorithms often rely on assumptions (e.g., cluster shapes) which
may not match the actual data structure.
 Overfitting Risk: Overfitting can occur when models capture noise instead of meaningful
patterns in the data.
 Limited Guidance: The absence of labels restricts the ability to guide the algorithm toward
specific outcomes.
 Cluster Interpretability: Results such as clusters may lack clear meaning or alignment with
real-world categories.
 Sensitivity to Parameters: Many algorithms require careful tuning of hyperparameters such
as the number of clusters in k-means.
 Lack of Ground Truth: Unsupervised learning lacks labeled data making it difficult to evaluate
the accuracy of results.

Semi-Supervised Learning in ML
Semi-supervised learning is a hybrid machine learning approach that combines supervised and
unsupervised learning. It uses a small amount of labelled data together with a large amount of
unlabelled data to train models. The goal is to learn a function that accurately predicts outputs based
on inputs, similar to supervised learning, but with much less labelled data.
Working of Semi-Supervised Learning
Several techniques fall under semi-supervised learning including:
 Self-Training: The model is first trained on labeled data. It then predicts labels for unlabeled
data, adding high-confidence predictions to the labeled set iteratively to refine the model.
 Co-Training: Two models are trained on different feature subsets of the data. Each model
labels unlabeled data for the other, enabling them to learn from complementary views.
 Multi-View Training: A variation of co-training where models train on different data
representations (e.g., images and text) to predict the same output.
 Graph-Based Models: Data is represented as a graph with nodes (data points) and edges
(similarities). Labels are propagated from labeled nodes to unlabeled ones based on graph
connectivity.

After semi-supervised training, the model is able to classify images into their categories or
labels.
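The self-training technique listed above can be sketched as follows. This is a toy illustration under several assumptions: the "model" is just the mean feature value per class, confidence is measured as the gap between the distances to the two nearest class means, and the names, threshold and data are all hypothetical. It expects at least two classes in the labeled set.

```python
def centroids_of(labeled):
    """Mean feature value per class: a very small 'model'."""
    sums, counts = {}, {}
    for x, label in labeled:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict_with_confidence(model, x):
    """Return (label, confidence); confidence is the gap between the two nearest class means."""
    ranked = sorted(model, key=lambda label: abs(x - model[label]))
    best, runner_up = ranked[0], ranked[1]
    confidence = abs(x - model[runner_up]) - abs(x - model[best])
    return best, confidence

def self_train(labeled, unlabeled, threshold=2.0, rounds=5):
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        model = centroids_of(labeled)          # 1. train on the labeled set
        still_unlabeled = []
        for x in unlabeled:                    # 2. predict labels for unlabeled data
            label, confidence = predict_with_confidence(model, x)
            if confidence >= threshold:        # 3. keep only high-confidence predictions
                labeled.append((x, label))
            else:
                still_unlabeled.append(x)
        if len(still_unlabeled) == len(unlabeled):
            break                              # nothing new was added
        unlabeled = still_unlabeled
    return centroids_of(labeled), labeled, unlabeled

# Hypothetical data: two labeled points and several unlabeled ones.
seed_labels = [(1.0, "A"), (9.0, "B")]
pool = [0.5, 2.0, 3.0, 7.0, 8.5, 5.0]
model, final_labeled, leftover = self_train(seed_labels, pool)
```

The point 5.0 sits almost exactly between the two classes, so it never reaches the confidence threshold and stays unlabeled. This shows why self-training only adds high-confidence predictions: wrong pseudo-labels would otherwise feed back into the model.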
When to Use Semi-Supervised Learning
 When labeled data is scarce or costly, such as medical imaging requiring expert annotation.
 When large volumes of unlabeled data exist, like social media or web content.
 For unstructured data types (text, images, audio) where labeling is difficult.
 When classes are rare and labeled examples few, improving class recognition.
 When purely supervised or unsupervised methods are insufficient.
Applications
Some applications of semi-supervised learning:
 Face Recognition: Enhancing accuracy by learning from limited labeled face images plus
many unlabeled ones using graph-based methods.
 Handwritten Text Recognition: Adapting models to diverse handwriting styles through
generative models.
 Speech Recognition: Improving transcription quality by using unlabeled speech data with
CNNs and other techniques.
 Security: Google uses semi-supervised learning for anomaly detection in network traffic and
malware detection.
 Finance: PayPal applies it for fraud detection and creditworthiness assessment using
transaction data.
Advantages
 Better Generalization: Utilizes both labeled and unlabeled data to capture the whole data
structure, improving prediction robustness.
 Cost Efficient: Reduces dependency on costly manual labeling by exploiting unlabeled data.
 Flexible and Robust: Handles different data types and sources, adapting well to changing
data distributions.
 Improved Clustering: Refines clusters by leveraging unlabeled data, yielding better class
separation.
 Handling Rare Classes: Enhances learning for underrepresented classes where labeled
examples are minimal.
Limitations
 Model Complexity: Requires careful choice of architecture and hyperparameters, which may
require extensive tuning.
 Noisy Data: Unlabeled data may contain errors or irrelevant information, risking degraded
model performance.
 Assumption Sensitivity: Relies on assumptions such as data consistency and clusterability,
which may not hold in all cases.
 Evaluation Challenge: Assessing performance is difficult due to limited labeled data and
varied quality of unlabeled data.

Reinforcement Learning
Reinforcement Learning (RL) is a branch of machine learning that focuses on how agents can learn to
make decisions through trial and error to maximize cumulative rewards. RL allows machines to learn
by interacting with an environment and receiving feedback based on their actions. This feedback
comes in the form of rewards or penalties.
Reinforcement Learning revolves around the idea that an agent (the learner or decision-maker)
interacts with an environment to achieve a goal. The agent performs actions and receives feedback to
optimize its decision-making over time.
 Agent: The decision-maker that performs actions.
 Environment: The world or system in which the agent operates.
 State: The situation or condition the agent is currently in.
 Action: The possible moves or decisions the agent can make.
 Reward: The feedback or result from the environment based on the agent’s action

Working of Reinforcement Learning


The agent interacts iteratively with its environment in a feedback loop:
 The agent observes the current state of the environment.
 It chooses and performs an action based on its policy.
 The environment responds by transitioning to a new state and providing a reward (or
penalty).
 The agent updates its knowledge (policy, value function) based on the reward received and
the new state.
 This cycle repeats with the agent balancing exploration (trying new actions)
and exploitation (using known good actions) to maximize the cumulative reward over time.
This process is mathematically framed as a Markov Decision Process (MDP) where future states
depend only on the current state and action, not on the prior sequence of events.
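The feedback loop above can be sketched with tabular Q-learning, which is one common RL algorithm (the text does not name a specific one, so this choice is an assumption). The environment is a made-up line of five states where only reaching the last state gives a reward; the values of alpha, gamma and epsilon are illustrative.

```python
import random

N_STATES = 5            # states 0..4; state 4 is the goal (hypothetical toy environment)
ACTIONS = ["left", "right"]

def step(state, action):
    """Environment: move along a line; reaching the last state gives reward 1."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy policy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state].values())
    return rng.choice([a for a in ACTIONS if q[state][a] == best])  # random tie-break

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
    for _ in range(episodes):
        state = 0
        for _ in range(100):                                    # cap on episode length
            action = choose_action(q, state, epsilon, rng)      # agent acts
            next_state, reward, done = step(state, action)      # environment responds
            target = reward + gamma * max(q[next_state].values())
            q[state][action] += alpha * (target - q[state][action])  # agent updates
            state = next_state
            if done:
                break
    return q

q_table = q_learning()
# Greedy policy for every non-goal state after learning.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

The update uses only the current state, the action taken, the reward and the next state, which is exactly the Markov property described above. The epsilon parameter controls the balance between exploration and exploitation.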
Types of Reinforcements
1. Positive Reinforcement: Positive reinforcement occurs when an event that follows a particular
behavior increases the strength and frequency of that behavior. In other words, it has a
positive effect on behavior.
 Advantages: Maximizes performance, helps sustain change over time.
 Disadvantages: Overuse can lead to excess states that may reduce effectiveness.
2. Negative Reinforcement: Negative Reinforcement is defined as strengthening of behavior because
a negative condition is stopped or avoided.
 Advantages: Increases behavior frequency, ensures a minimum performance standard.
 Disadvantages: It may only encourage just enough action to avoid penalties.

Applications
 Robotics: RL is used to automate tasks in structured environments such as manufacturing,
where robots learn to optimize movements and improve efficiency.
 Games: Advanced RL algorithms have been used to develop strategies for complex games like
chess, Go and video games, outperforming human players in many instances.
 Industrial Control: RL helps in real-time adjustments and optimization of industrial
operations, such as refining processes in the oil and gas industry.
 Personalized Training Systems: RL enables the customization of instructional content based
on an individual's learning patterns, improving engagement and effectiveness.
Advantages
 Solves complex sequential decision problems where other approaches fail.
 Learns from real-time interaction, enabling adaptation to changing environments.
 Does not require labeled data, unlike supervised learning.
 Can innovate by discovering new strategies beyond human intuition.
 Handles uncertainty and stochastic environments effectively.
Disadvantages
 Computationally intensive, requiring large amounts of data and processing power.
 Reward function design is critical; poor design leads to unintended behaviors.
 Not suitable for simple problems where traditional methods are more efficient.
 Challenging to debug and interpret, making it hard to explain decisions.
 Exploration-exploitation trade-off requires careful balancing to optimize learning.

1. Self-Supervised Learning

 Definition: A form of learning in which the data generates its own labels based on its
inherent structure, so no manually labeled examples are needed.
 Key Characteristics:
o Often used to pre-train models on large datasets.
o Particularly popular in deep learning for representation learning.
 Applications:
o Natural Language Processing (e.g., GPT, BERT)
o Computer Vision (e.g., contrastive learning, SimCLR)
 Common Techniques:
o Contrastive Learning
o Masked Language Models
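The idea that "the data generates its own labels" can be illustrated with a simplified masked-word task. This sketch hides each word of a sentence in turn and uses the hidden word as the label; the function name and the mask token string are assumptions for illustration, and real masked language models use a more elaborate masking scheme.

```python
def make_masked_examples(sentence, mask_token="[MASK]"):
    """Turn one unlabeled sentence into (masked input, target word) training pairs."""
    words = sentence.split()
    examples = []
    for i, word in enumerate(words):
        masked = words[:i] + [mask_token] + words[i + 1:]
        examples.append((" ".join(masked), word))  # the hidden word is the label
    return examples

pairs = make_masked_examples("machines learn from data")
```

No human annotation is involved: every training pair comes from the raw text itself, which is why this approach suits pre-training on large unlabeled datasets.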

2. Online Learning

 Definition: Learning continuously from a stream of data, updating the model incrementally.
 Key Characteristics:
o Useful in dynamic environments where data arrives in real-time.
o Can adapt to changes in data distribution.
 Applications:
o Stock market prediction
o Recommendation systems
 Common Algorithms:
o Stochastic Gradient Descent (SGD)
o Online variants of supervised methods
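A minimal sketch of online learning with Stochastic Gradient Descent is shown below. The model is a straight line updated after every single example; the stream, the learning rate and the function names are assumptions chosen for illustration.

```python
def sgd_update(weights, x, y, learning_rate=0.05):
    """One incremental update of a linear model y ≈ w * x + b from a single example."""
    w, b = weights
    error = (w * x + b) - y
    # Gradient of the squared error 0.5 * error**2 with respect to w and b.
    return (w - learning_rate * error * x, b - learning_rate * error)

def data_stream(n):
    """Hypothetical stream: examples arrive one at a time from y = 2x + 1."""
    for t in range(n):
        x = float(t % 4)
        yield x, 2.0 * x + 1.0

weights = (0.0, 0.0)
for x, y in data_stream(2000):
    weights = sgd_update(weights, x, y)   # the model is updated after every example
```

The full dataset is never stored or revisited; each example is used once and then discarded, which is what makes this style of learning suitable for data that arrives in real time.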

3. Evolutionary Algorithms and Genetic Programming

 Definition: Inspired by biological evolution, these methods use populations of solutions to
evolve better models over time.

 Key Characteristics:
o Stochastic and adaptive.
o Does not require gradient information.
 Applications:
o Hyperparameter optimization
o Designing neural network architectures
 Common Techniques:
o Genetic Algorithms
o Evolution Strategies
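A simple evolutionary loop can be sketched as follows. This is an illustrative toy, not a specific published algorithm: the objective function, population size, mutation scale and the "keep the fitter half" selection rule are all assumptions.

```python
import random

def fitness(x):
    """Hypothetical objective to maximise; its peak is at x = 3."""
    return -(x - 3.0) ** 2

def evolve(generations=50, population_size=20, mutation_scale=0.5, seed=0):
    rng = random.Random(seed)
    population = [rng.uniform(-10.0, 10.0) for _ in range(population_size)]
    history = [max(fitness(x) for x in population)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        parents = sorted(population, key=fitness, reverse=True)[: population_size // 2]
        # Variation: each parent produces one mutated child (Gaussian noise).
        children = [p + rng.gauss(0.0, mutation_scale) for p in parents]
        population = parents + children      # parents survive, so the best is never lost
        history.append(max(fitness(x) for x in population))
    best = max(population, key=fitness)
    return best, history

best_x, best_history = evolve()
```

The loop only ever evaluates the fitness function and never uses its gradient, which matches the characteristic listed above. The same pattern can be applied to search over hyperparameters when the fitness is a validation score.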

4. Hybrid and Meta-Learning Paradigms

 Definition: Combines multiple paradigms or focuses on learning how to learn (meta-learning).

 Key Characteristics:
o Increases flexibility and generalization.
o Optimizes model design and performance.
 Applications:
o Few-shot learning
o Multi-task learning
 Common Approaches:
o Neural Architecture Search (NAS)
o Model-Agnostic Meta-Learning (MAML)

Each of these paradigms has distinct use cases and is often chosen based on the type of problem,
the available data, and the desired outcomes.
LEARNING BY ROTE
Learning by rote in machine learning is a method of memorizing facts and knowledge
through repetition, where new information is simply stored without deep understanding. This
technique allows for quick recall of basic information but can lead to poor generalization on
new, unseen data and overfitting, hindering the ability to apply knowledge in new contexts.
WORKING OF LEARNING BY ROTE
 Storing information:
The system stores specific pieces of information as they are encountered. This can be as simple
as storing a computed value or as complex as saving a feature set from an image.
 Repetition:
Rote learning in machine learning is similar to traditional rote learning: it relies on
repetition to memorize information without deep conceptual understanding.
Strengths
 Quick information recall: Rote learning allows for rapid retrieval of stored facts and
data.
 Foundation building: It can establish a basic foundation of knowledge for more
advanced learning.
 Efficiency: It can be an efficient method for early-stage learning or when quick, accurate
retrieval is critical, such as with standardized tests.
Weaknesses
 Poor generalization:
The primary weakness is that models trained this way perform poorly on data that was not in
the original training set.
 Overfitting:
Rote learning can cause models to "overfit" to the training data, meaning they have
memorized it too well and cannot react effectively to new data.
 Lack of deep understanding:
It does not build a deep, conceptual understanding, making it difficult to relate information to
new or real-world scenarios.
Use cases
 Simple AI applications that require fast lookup of specific information.
 As a part of more complex systems, where stored values can be used more efficiently or
to handle specific, repetitive tasks.
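Rote learning can be pictured as a lookup table. The sketch below uses a hypothetical class RoteLearner (the name and the example of squares are made up) to show both the strength, fast recall, and the weakness, no generalization.

```python
class RoteLearner:
    """Stores every (input, output) pair it sees and can only recall exact matches."""

    def __init__(self):
        self.memory = {}

    def learn(self, example, answer):
        self.memory[example] = answer      # storing: no pattern is extracted

    def recall(self, example):
        return self.memory.get(example)    # None if the example was never stored

learner = RoteLearner()
for n in [1, 2, 3, 4]:
    learner.learn(n, n * n)                # memorise squares of a few numbers

seen = learner.recall(3)      # stored earlier, so it is recalled instantly
unseen = learner.recall(5)    # never stored: the learner cannot generalise
```

Even though the stored answers follow an obvious rule, the learner has no way to answer for an input it has not memorized.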
LEARNING BY INDUCTION
Learning by induction is a fundamental concept in machine learning where a system
learns general rules or patterns from specific examples. Instead of being explicitly
programmed, the machine observes input–output pairs and builds a model that can make
accurate predictions on new, unseen data. Most supervised learning techniques such as
decision trees, neural networks, SVMs, and regression follow the inductive approach.
Induction is a reasoning process that moves from specific observations to general
conclusions. In machine learning, this means analyzing a dataset of examples and discovering
hidden patterns. The model formed through induction is then used to generalize to new cases.

2. Inductive Learning Process


Inductive learning follows a structured workflow:
1. Training Data Collection:
Gather labeled examples (input features + correct output).
2. Hypothesis/Model Selection:
Choose an algorithm such as decision tree, neural network, or regression.
3. Learning / Pattern Extraction:
The algorithm analyzes the data to find relationships and patterns.
4. Model Formation:
The system constructs a general rule (hypothesis) that fits the examples.
5. Generalization:
The learned model is applied to predict outputs for unseen data.
6. Evaluation:
Accuracy, error rate, and other metrics are used to measure performance.
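The six steps above can be sketched with a very small regression example. This is only an illustration under stated assumptions: the training data is made up (generated from y = 2x + 1), the model is a straight line, and NumPy's polyfit is used as the learning algorithm.

```python
import numpy as np

# 1. Training data collection: labeled examples (hypothetical values).
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

# 2-4. Model selection, learning, and model formation:
# choose a straight line and fit it by least squares.
slope, intercept = np.polyfit(x_train, y_train, deg=1)


def predict(x):
    """The induced hypothesis: a general rule learned from the examples."""
    return slope * x + intercept


# 5. Generalization: apply the rule to an input that was never seen.
y_new = predict(10.0)

# 6. Evaluation: mean squared error on held-out examples.
x_test = np.array([5.0, 6.0])
y_test = np.array([11.0, 13.0])
mse = float(np.mean((predict(x_test) - y_test) ** 2))
```

Because the examples here are noise-free, the induced line recovers the underlying rule almost exactly; with real, noisy data the error would not be close to zero.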

3. Diagram: Inductive Learning Workflow


Training Examples (Input + Output)
↓
Learning Algorithm (Pattern Extraction)
↓
Induced Hypothesis / Model
↓
Prediction on New Unseen Data
Characteristics of Inductive Learning
• Data-driven approach
• Depends on the quality and quantity of examples
• Involves pattern discovery
• Supports prediction and classification
• Often involves uncertainty and approximations

6. Advantages
1. Generalization ability: A well-trained model can work well on unseen data.
2. Flexible: Can handle large and complex datasets.
3. Automated learning: Reduces manual programming.
4. Applicable to many tasks: Classification, prediction, detection.

7. Disadvantages
1. Requires high-quality data: Noisy data leads to wrong patterns.
2. May overfit: Learns noise instead of true patterns.
3. Cannot guarantee correctness: Generalization may fail sometimes.
4. Computationally expensive: Training may require more time and resources.

8. Applications
• Email spam detection
• Image and speech recognition
• Medical diagnosis
• Weather forecasting
• Recommender systems (YouTube, Amazon)
MATCHING

Matching is the process of comparing data vectors using distance or similarity measures to find
the best corresponding pattern in machine learning.
Matching = finding the most similar thing
👉 Just like:
Finding the student whose marks are closest to yours
Finding the nearest house to your home
Finding the most similar image or document
Why do we need Matching in ML?
Because machines do not understand similarity like humans.
So we teach the machine:
How to compare
How to decide which is close
How to choose the best match
How does a machine “see” data?
A machine sees numbers only.
So every object is converted into a vector (list of numbers).
Example: Student marks
Student A → (80, 70, 90)
Student B → (78, 72, 88)
Student C → (40, 50, 45)
Each student is a vector.
4️⃣ How does the machine compare two vectors?
It uses a proximity measure.
Proximity measure = a rule to compare
Matching is done using a proximity measure, which can be:
Distance measure → smaller distance = better match
Similarity measure → larger similarity = better match
5️⃣ Distance measure (Euclidean Distance)
Real-life example:
Distance between two houses:
Small distance → houses are close
Large distance → houses are far
In ML:
Distance between two vectors:
Small distance → very similar
Large distance → very different
Euclidean distance works like a ruler.
$$d(u, v) = \sqrt{\sum_{i=1}^{l} \bigl(u(i) - v(i)\bigr)^2}$$

Simple numeric example


Compare:
A = (2, 3)
B = (3, 4)
C = (10, 10)
Distance(A, B) → small
Distance(A, C) → very large
👉 So A matches B, not C.
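The comparison above can be checked with a few lines of code. This is a small sketch; the function name euclidean_distance is chosen here for illustration.

```python
import math


def euclidean_distance(u, v):
    """Square root of the sum of squared component differences."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


A, B, C = (2, 3), (3, 4), (10, 10)

d_ab = euclidean_distance(A, B)  # sqrt(1 + 1)   = sqrt(2),   about 1.41
d_ac = euclidean_distance(A, C)  # sqrt(64 + 49) = sqrt(113), about 10.63
```

Since d_ab is much smaller than d_ac, B is the better match for A.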
6️⃣ Similarity measure (Cosine Similarity)
Instead of distance, sometimes we check direction.
Real-life example:
Two people walking:
Same direction → very similar
Opposite direction → not similar
In ML:
Cosine similarity measures the angle between two vectors
Value near 1 → same direction, very similar
Value near 0 → unrelated directions, not similar
Value near -1 → opposite directions
Used a lot in:
Text documents
Recommendation systems
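A small sketch of cosine similarity, computed as the dot product divided by the product of the vector lengths. The function name and the sample vectors are made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


same_direction = cosine_similarity((1, 2), (2, 4))   # close to 1
perpendicular = cosine_similarity((1, 0), (0, 1))    # 0
opposite = cosine_similarity((1, 2), (-1, -2))       # close to -1
```

Note that (1, 2) and (2, 4) have different lengths but the same direction, so cosine similarity treats them as a perfect match. This is why it suits text documents, where document length should not matter much.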
Nearest Neighbor (Very Important)
Meaning:
The object which is closest to the input is called the Nearest Neighbor.
Example:
You ask:
“Who is most similar to me?”
Machine answers:
“This one, because distance is smallest.”
This idea is used in k-Nearest Neighbor (k-NN) algorithm.
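As a rough sketch, the nearest neighbor of Student A can be found by computing the distance to every other student and keeping the smallest. The marks are the ones used earlier in these notes; the function name nearest_neighbor is hypothetical.

```python
import math

# Marks of the other students (from the example earlier in these notes).
candidates = {
    "B": (78, 72, 88),
    "C": (40, 50, 45),
}


def nearest_neighbor(query, candidates):
    """Return the name of the candidate with the smallest Euclidean distance."""
    return min(candidates, key=lambda name: math.dist(query, candidates[name]))


student_a = (80, 70, 90)
best_match = nearest_neighbor(student_a, candidates)
```

Here math.dist (available from Python 3.8) computes the Euclidean distance. Student B is returned because its marks are only a few points away from A's, while C's marks are far lower.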
Applications of Matching
Nearest Neighbor classification
Clustering
Pattern recognition
Information retrieval
Recommendation systems
Stages in machine learning
Building a machine learning system involves a number of steps.
Data Acquisition
Data acquisition is the first step in machine learning (ML): collecting raw data from diverse
sources (databases, sensors, the web) and converting it into a digital format for training and
testing ML models. The data should be relevant, accurate, and sufficient in volume to capture
the patterns the model is expected to learn. The process involves identifying sources, using
methods such as web scraping or APIs, and often digitizing physical signals via Data
Acquisition Systems (DAQ) to build a comprehensive, high-quality dataset.
Feature Engineering
Feature engineering is the process of transforming raw data into meaningful input features
that help algorithms learn patterns better, leading to more accurate models. It uses domain
knowledge to select, create, transform, or combine existing data into new variables (for
example, calculating BMI from height and weight). This improves model performance, reduces
complexity, and bridges the gap between messy real-world data and what the algorithm
requires.
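The BMI example can be sketched as follows. The field names height_cm and weight_kg are assumptions made for this illustration; BMI itself is weight in kilograms divided by the square of height in meters.

```python
def add_bmi(record):
    """Return a copy of the record with a derived 'bmi' feature (kg / m^2)."""
    height_m = record["height_cm"] / 100.0
    bmi = record["weight_kg"] / (height_m ** 2)
    return {**record, "bmi": round(bmi, 1)}


person = {"height_cm": 170, "weight_kg": 65}
enriched = add_bmi(person)  # adds "bmi": 22.5 to the original two fields
```

The new feature combines two raw measurements into one number that is often more useful to a model than either measurement alone.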
Data Preprocessing
Data preprocessing is a crucial, multi-stage process for ensuring raw data is clean,
standardized, and correctly formatted for machine learning models.
Occurrence of Missing Data: Missing data is a common issue in various domains, often
resulting from the inability to measure a feature value or from errors during data entry.
Algorithm Compatibility: Some machine learning (ML) algorithms are capable of working
effectively even with a reasonable number of missing data values, eliminating the need for
specific preprocessing in those cases.
Algorithm Limitations: A large number of other ML models, however, are unable to handle
missing values, which necessitates the use of specific techniques to address this problem.
Need for Techniques: Due to these limitations, it is essential to examine and implement
various techniques for dealing with missing data.
Prediction Schemes: Different schemes and methods are employed specifically for the
prediction and imputation of these missing values.
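One simple imputation scheme is to replace each missing value with the mean of its feature. The sketch below assumes missing values are stored as NaN in a NumPy array; mean imputation is used here only as an easy illustration, not as the scheme prescribed by the textbook.

```python
import numpy as np


def impute_with_column_mean(X):
    """Replace every NaN with the mean of the known values in its column."""
    X = np.array(X, dtype=float)        # work on a copy
    col_means = np.nanmean(X, axis=0)   # column means, ignoring NaN
    missing = np.isnan(X)
    # For each missing cell, look up the mean of the column it belongs to.
    X[missing] = np.take(col_means, np.where(missing)[1])
    return X


data = [
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
]
filled = impute_with_column_mean(data)
```

In this made-up dataset the first column has known values 1 and 3, so its gap is filled with 2; the second column has 10 and 20, so its gap is filled with 15.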

(Exercise: work through the corresponding numerical problem from the textbook.)


DATA FROM DIFFERENT DOMAINS
Different features (attributes) in data may be measured in different units and
ranges. Example: height in meters, weight in grams.
Features with large numerical values (like weight) will dominate distance
calculations, while features with small values (like height) will have very little effect.
When feature ranges are unequal, only a few features may decide the distance, even
though other features are also important. (Exercise: work through the corresponding
numerical problem from the textbook.)
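The effect can be seen in a small sketch. The records below are hypothetical, and min-max scaling (mapping every feature to the range 0 to 1) is assumed here as one common way to put features on an equal footing.

```python
import numpy as np

# Hypothetical records: [height in meters, weight in grams].
X = np.array([
    [1.60, 55000.0],   # person 0 (the query)
    [1.90, 55500.0],   # person 1: very different height, similar weight
    [1.61, 57000.0],   # person 2: almost the same height, slightly heavier
    [1.75, 80000.0],   # person 3: much heavier
])


def min_max_scale(X):
    """Rescale every column to the range [0, 1]."""
    mins = X.min(axis=0)
    ranges = X.max(axis=0) - mins
    ranges[ranges == 0] = 1.0  # avoid division by zero for constant features
    return (X - mins) / ranges


def nearest_to_first(X):
    """Index of the row closest to row 0 (Euclidean distance)."""
    dists = np.linalg.norm(X[1:] - X[0], axis=1)
    return int(np.argmin(dists)) + 1


raw_nn = nearest_to_first(X)                    # decided almost entirely by weight
scaled_nn = nearest_to_first(min_max_scale(X))  # both features now contribute
```

On the raw data, person 1 is the nearest neighbor because the 30 cm height difference is invisible next to differences of hundreds of grams. After scaling, person 2 becomes the nearest neighbor.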
OUTLIERS IN THE DATA
An outlier is a data item that is very different from most of the data. It does not behave
like normal data points.
Example idea:
If most values are around 50, and one value is 500, that value is an outlier.
🔹 Step 2: Why do outliers occur?
Outliers usually occur because of:
Noise
– Random errors during data collection
– Sensor errors, typing mistakes
Erroneous data
– Wrong measurements
– Incorrect data entry
📌 So, outliers may not represent true behavior.
🔹 Step 3: Outliers are common in real-world data
Outliers appear in many applications:
Machine Learning
Data Mining
Medical data
Network intrusion detection
Financial transactions
👉 When do we call a data item an outlier?
🔹 Step 4: Condition 1 – Far away from average values
“Assumes values that are far away from those of the average data items.”
Explanation:
Most data values lie near the mean (average).
An outlier lies very far from the mean.
Example:
Marks of students:
Average ≈ 48
200 is far away → Outlier
🔹 Step 5: Condition 2 – Deviates from normal behavior
“Deviates from the normally behaving data item.”
Explanation:
Normal data follows a regular pattern
Outliers break that pattern
Example:
Daily temperature (°C):
85°C does not follow normal pattern
Hence, outlier
🔹 Step 6: Condition 3 – Not similar or not connected to others
“Not connected/similar to any other object in terms of its characteristics.”
Explanation:
Normal data points form groups (clusters)
Outliers stay alone, far from clusters
Example (2D data):
Height Weight
160 55
162 57
159 54
190 20
The last point does not match the others; it is isolated → Outlier
1️⃣ Noisy measurements
What it means
👉 Noise is unwanted or random error in data caused by faulty measurement.
Why it happens
Sensor malfunction
Instrument calibration problems
Environmental interference
How it creates outliers
The instrument records a value outside the expected range
That value looks very different from normal data
Example
Normal temperature readings:
30, 31, 32, 30
Faulty sensor reading:
120
📌 120 is an outlier caused by noisy measurement, not real behavior.
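A faulty reading like this can be flagged automatically. The sketch below uses the median and the median absolute deviation (MAD), which is an assumption made here because these are not pulled upward by the outlier itself; the constant 0.6745 and the threshold 3.5 are conventional choices for this rule, not values from these notes.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on the median) is too large."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # more than half the values are identical; rule not usable
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


readings = [30, 31, 32, 30, 120]
flagged = mad_outliers(readings)  # only the faulty reading 120 is flagged
```

A plain mean-based rule is less reliable on such a small sample, because the value 120 drags the mean and the standard deviation toward itself.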
2️⃣ Erroneous data entry
What it means
👉 Outliers caused by human mistakes during data entry.
Common mistakes
Extra zero
Missing decimal point
Wrong unit
Example
Employee salary data:
Correct values:
180000, 200000, 220000
Mistaken entry:
2,000,000
📌 The extra zero makes the value extremely large → Outlier
3️⃣ Evolving systems
What it means
👉 Systems change over time, and new patterns appear gradually.
Why outliers appear
Early-stage data is sparse
New items may not yet belong to any group
Example: Social network
Early users have many connections
A new user has very few or no connections
📌 The new user appears isolated, so it looks like an outlier
➡️ But it is not wrong, just new
4️⃣ Naturally occurring outliers
What it means
👉 Some outliers are genuine and meaningful, not errors.
Why they exist
Real-world data is not always uniform
Extreme cases naturally occur
Examples
A highly paid sportsperson
A luxury car costing ₹5 crore
📌 These values are:
Rare
Far from average
But important and informative
DATA REPRESENTATION
Data representation is how raw data is converted into a form that a machine learning
(ML) model can understand.
In ML and Deep Learning (DL), data is usually represented as vectors.
Each data item (image, document, record, etc.) → numerical vector.
Why data representation is important
Good representation → better learning and prediction.
Poor representation → model performs badly, even if the algorithm is strong.
2. Representation in ML vs DL
In Traditional ML
Representation is explicit.
Humans design features manually (feature engineering).
Feature selection and feature extraction are critical steps.
In Deep Learning
Representation is often learned automatically.
Neural networks learn features layer by layer
Representation of Data Items as Vectors
Each data item is represented as a vector
$X = (x_1, x_2, \ldots, x_L)$
Collections of vectors are often stored as matrices.
Present-day applications deal with high-dimensional data.
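As a small sketch, the student marks used earlier in these notes can be stored as a matrix with one row per data item and one column per feature. NumPy is assumed here as the storage format.

```python
import numpy as np

# Each student is a vector of L = 3 marks (values from the earlier example).
students = {
    "A": (80, 70, 90),
    "B": (78, 72, 88),
    "C": (40, 50, 45),
}

# Stack the vectors into a matrix: rows are data items, columns are features.
X = np.array(list(students.values()), dtype=float)
n_items, L = X.shape  # number of data items, number of features (dimensions)
```

With text or image data the same layout is used, but the number of columns L can run into the thousands or millions.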
High-Dimensional Data
What is High Dimensionality?
High dimensionality means that a dataset has a very large number of features
(dimensions).
Each feature = one dimension
Example
Text data → each word = a feature → thousands or millions of dimensions
Image data → each pixel = a feature
👉 When L is very large, the data is called high-dimensional data
Dimensionality (L) = number of features.
In real-world problems, L can be very large:
Text data → vocabulary size (e.g., billions of n-grams)
Bioinformatics, image retrieval, satellite imagery
Problems with high dimensionality
Computation time increases
Storage space increases
Model performance degrades
Accuracy improves only up to a point.
After that, accuracy decreases.
Reason: Overfitting
Model memorizes training data.
Performs poorly on validation/test data.
Solution
Reduce dimensionality while keeping important information.
6. Dimensionality Reduction Techniques
Two main approaches: feature selection and feature extraction.
7. Feature Selection
Feature selection is the process of choosing a subset of relevant features from the
original feature set without creating new features. It keeps important features and removes
irrelevant ones. The selected features are exactly the same as the original features.
👉 No transformation, no combination — only selection.
Why Feature Selection is Needed
Reduces dimensionality
Reduces overfitting
Improves model accuracy
Decreases training time
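Mathematical view
Original feature set is $F = \{F_1, F_2, \ldots, F_L\}$
Feature selection keeps a subset $F' \subseteq F$ containing fewer than L features.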
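A minimal sketch of feature selection. The selection rule used here, dropping features whose variance is not above a threshold, is just one simple criterion assumed for illustration; the data and the names F1 to F3 are made up.

```python
import numpy as np


def select_by_variance(X, feature_names, min_variance):
    """Keep only the columns whose variance is above min_variance."""
    variances = X.var(axis=0)
    keep = variances > min_variance
    kept_names = [name for name, k in zip(feature_names, keep) if k]
    return X[:, keep], kept_names


X = np.array([
    [80.0, 1.0, 90.0],
    [78.0, 1.0, 88.0],
    [40.0, 1.0, 45.0],
])
names = ["F1", "F2", "F3"]

# F2 has the same value for every item, so it carries no information.
X_selected, kept = select_by_variance(X, names, min_variance=0.0)
```

The surviving columns are copied unchanged from the original matrix, which is the defining property of feature selection: only selection, no transformation.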
What is Feature Extraction?
Simple meaning
Feature extraction is the process of creating new, meaningful features from the original
features to represent the data better and in fewer dimensions.
👉 We do not select features.
👉 We create new features.
Feature extraction:
Reduces dimensionality
Improves learning
Mathematical view
Original feature set is $F = \{F_1, F_2, \ldots, F_L\}$
Feature extraction creates:
$H = \{h_1, h_2, \ldots, h_l\}$, $l < L$
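A rough sketch of feature extraction using Principal Component Analysis (PCA), which is assumed here as one well-known extraction technique; these notes do not name a specific method. Each new feature is a weighted combination of all the original features. The function name pca_extract is hypothetical.

```python
import numpy as np


def pca_extract(X, l):
    """Project the data onto its first l principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:l].T


# L = 3 original features per item (student marks from the earlier example).
X = np.array([
    [80.0, 70.0, 90.0],
    [78.0, 72.0, 88.0],
    [40.0, 50.0, 45.0],
])

H = pca_extract(X, l=2)  # l = 2 new features per item, with l < L
```

Unlike feature selection, none of the columns of H is an original feature; each one mixes all three original marks.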
STEP 1: MODEL SELECTION
A model is a mathematical method that learns patterns from data.
Examples of models:
Decision Tree
Naive Bayes
SVM
Neural Network
Why do we need model selection?
Because no single model works well for all problems.
What decides which model to choose?
1. Type of data
Numerical data
Example: age, height, salary
Categorical data
Example: gender, color, yes/no
2. Example from the book
SVM / Perceptron
Use dot product
Dot product works only with numbers
❌ Not suitable for categorical data
Decision Trees / Bayesian models
✅ Work well with categorical data
Simple example
If your data is:
Age = 25
Gender = Male
Education = Graduate
✔ Decision Tree is better
❌ SVM is not suitable
Model selection means choosing the most suitable learning algorithm based on data
and application
STEP 2: MODEL LEARNING (Teaching the model)
What is model learning?
This is the step where the model learns from data.
Just like:
A student learns from textbooks and examples.
Types of data used
1. Training data
Used to teach the model
Model finds patterns
2. Validation data
Used to check how well the model learned
Used for tuning
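A minimal sketch of separating the available data into a training part and a validation part. The 25% validation fraction, the fixed seed, and the function name are assumptions made for this illustration.

```python
import numpy as np


def train_validation_split(X, y, validation_fraction=0.25, seed=0):
    """Shuffle the items, then hold out a fraction of them for validation."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_val = max(1, int(round(len(X) * validation_fraction)))
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]


# Hypothetical dataset: 8 items with 2 features each, and binary labels.
X = np.arange(16, dtype=float).reshape(8, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

X_train, y_train, X_val, y_val = train_validation_split(X, y)
```

The model is fitted only on X_train and y_train. The validation items are kept aside so that checking and tuning are done on data the model has not learned from.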
Transparent vs Black-box models
Transparent models
Decision Trees
You can clearly see rules
Black-box models
Neural Networks
Internal calculations are hidden
You get output but:
You don’t know how the decision was made ❌
Model learning is the process of training an ML model using data.
STEP 3: MODEL EVALUATION (Is the model good?)
Why evaluation is needed?
Because a model may:
Perform very well on training data
Perform badly on new data
Overfitting (Very important concept)
What is overfitting?
The model memorizes training data instead of learning general rules.
How do we detect overfitting?
Use validation data
If:
Training accuracy = high
Validation accuracy = low
➡ Overfitting
Model evaluation checks how well the model generalizes to unseen data.
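The overfitting check can be sketched as a comparison of two accuracies. The labels and predictions below are made up, and the 0.10 gap threshold is an arbitrary assumption; in practice the acceptable gap depends on the problem.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def looks_overfit(train_acc, val_acc, max_gap=0.10):
    """Flag a model whose training accuracy is far above validation accuracy."""
    return (train_acc - val_acc) > max_gap


# Hypothetical results: perfect on training data, poor on validation data.
train_acc = accuracy([1, 0, 1, 1], [1, 0, 1, 1])
val_acc = accuracy([1, 0, 1, 0], [0, 0, 1, 1])

overfit = looks_overfit(train_acc, val_acc)
```

Here the model gets every training item right but only half of the validation items, so it is flagged as overfitting.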
STEP 4: MODEL PREDICTION (Using the model)
What is prediction?
Using the trained and validated model to make decisions on new data.
Example
ML model for disease detection:
Trained using old patient data
Now:
New patient comes
Model predicts disease
Like a doctor 👨‍⚕️
Types of prediction
Classification → Class label (e.g., Yes/No)
Regression → Numerical value
One-line meaning
Model prediction applies the learned model to new unseen data.
STEP 5: MODEL EXPLANATION (Why did the model decide this?)
Why explanation is important?
Doctors
Managers
Engineers
All want to know:
“Why did the model give this output?”
Problem with Deep Learning
Neural networks are black boxes
Very accurate
Very hard to explain
Example
Model says:
“Loan rejected”
User asks:
“Why?”
❌ Neural network cannot explain clearly
Solution: Explainable AI (XAI)
Techniques to explain ML decisions
Makes models trustworthy
One-line meaning
Model explanation helps humans understand and trust ML decisions.
COMPLETE FLOW (Very important)
Select model → right tool
Learn model → train using data
Evaluate model → check performance
Predict → use on new data
Explain → justify decisions

CHAPTER 2 NEAREST NEIGHBOR BASED MODELS


K-Nearest Neighbors (KNN) is a supervised machine learning algorithm generally used
for classification, though it can also be used for regression. It works by finding the "k"
closest data points (neighbors) to a given input and making a prediction based on the
majority class (for classification) or the average value (for regression).
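A minimal sketch of both uses of KNN, assuming Euclidean distance as the proximity measure. The training points, labels, target values, and function names are all made up for illustration.

```python
import math
from collections import Counter


def k_nearest(query, points, k):
    """Indices of the k training points closest to the query."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return order[:k]


def knn_classify(query, points, labels, k=3):
    """Classification: majority class among the k nearest neighbors."""
    votes = Counter(labels[i] for i in k_nearest(query, points, k))
    return votes.most_common(1)[0][0]


def knn_regress(query, points, targets, k=3):
    """Regression: average target value of the k nearest neighbors."""
    neighbors = k_nearest(query, points, k)
    return sum(targets[i] for i in neighbors) / k


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8)]
labels = ["low", "low", "low", "high", "high"]
targets = [10.0, 12.0, 14.0, 80.0, 90.0]

predicted_class = knn_classify((2, 2), points, labels, k=3)
predicted_value = knn_regress((2, 2), points, targets, k=3)
```

The query (2, 2) lies next to the first three points, so all three votes go to "low" and the regression output is the average of 10, 12, and 14.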
INTRODUCTION TO PROXIMITY MEASURES
