Introduction to Machine Learning Concepts
The key features of machine learning
Learning from Experience/Data: Machine learning systems improve their
performance on tasks by learning from large volumes of data, rather than
being explicitly programmed with rigid rules for every possible scenario.
The more relevant data a model is exposed to during training, the better
it can become at its given task.
Pattern Recognition: ML algorithms excel at automatically discovering
hidden patterns, trends, and relationships within data that might be
difficult or impossible for humans to identify through simple analysis.
Generalization: The primary goal of a trained model is not merely to
memorize the data it was trained on, but to generalize its learned
patterns to make accurate predictions or decisions on new, previously
unseen data.
Adaptive Improvement: Machine learning models can continuously
evolve and refine their accuracy over time as new data becomes
available. This is often achieved through feedback loops where the
model's predictions are compared with actual outcomes to correct errors
and optimize its internal parameters.
Automation: ML automates complex and repetitive analytical tasks,
enabling fast, data-driven insights and decision-making at a scale and
speed that is not feasible manually.
Key Applications of Machine Learning
1. Speech Recognition
Speech recognition allows devices to convert spoken words into text. Applications
like virtual assistants (e.g., Alexa and Siri) rely on natural language processing (NLP)
and neural networks to understand and respond to commands. This technology
enhances user experiences, particularly in accessibility and hands-free
environments.
2. Self-Driving Cars
Self-driving cars are a prime example of real-time decision-making enabled by
machine learning. Autonomous vehicles use sensors and cameras combined with
reinforcement learning to navigate roads, avoid obstacles, and adapt to changing
conditions. These cars rely on machine learning models to continuously improve their
performance.
3. Recommendation Systems
Platforms like Netflix and Amazon use recommendation systems to suggest movies,
products, or shows tailored to user preferences. These systems leverage large data
sets to analyze user behavior and employ advanced recommendation engines
powered by machine learning algorithms.
4. Fraud Detection
Banks and financial institutions use machine learning applications to combat
fraudulent activities. By analyzing transaction data, machine learning can detect
fraudulent transactions in real time, ensuring customer safety. These systems use
pattern recognition to flag anomalies that deviate from typical behavior.
5. Image Recognition
Image recognition is widely used in areas like healthcare, security, and social media.
Machine learning enables systems to analyze visual data, such as medical images,
to detect diseases or identify objects and faces in photos. This technology is powered
by deep neural networks.
Growth of the internet and digital storage enabled the collection of massive datasets.
Breakthroughs in unsupervised and supervised learning techniques:
o Random Forests and Ensemble Methods improved prediction accuracy.
o Advancements in Clustering (e.g., k-means) and Dimensionality Reduction
(e.g., PCA).
Key applications emerged, including:
o Speech and image recognition.
o Predictive analytics in business and finance.
Machine learning paradigms are the fundamental approaches and categorizations used to
frame how algorithms learn from data [1]. These are typically classified into four major
categories, with several specialized paradigms also recognized [1, 2, 3].
The four primary paradigms are:
1. Supervised Learning
2. Unsupervised Learning
3. Semi-Supervised Learning
4. Reinforcement Learning
Supervised learning
Supervised learning is a type of machine learning where a model learns from labelled data—meaning
every input has a corresponding correct output. The model makes predictions and compares them
with the true outputs, adjusting itself to reduce errors and improve accuracy over time. The goal is to
make accurate predictions on new, unseen data. For example, a model trained on images of
handwritten digits can recognise new digits it has never seen before.
Regression in machine learning refers to a supervised learning technique where the goal is to
predict a continuous numerical value based on one or more independent features. It finds
relationships between variables so that predictions can be made. There are two types of
variables in regression:
Dependent Variable (Target): The variable we are trying to predict, e.g., house price.
Independent Variables (Features): The input variables that influence the prediction, e.g.,
locality, number of rooms.
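As a minimal sketch of regression with one independent variable, the snippet below fits a straight line by least squares. The data (number of rooms and prices) and the helper names `fit_line` and `predict_price` are made up for illustration and are not part of these notes.

```python
# Hypothetical toy data: number of rooms (feature) and house price (target).
rooms = [1, 2, 3, 4, 5]
prices = [100.0, 150.0, 200.0, 250.0, 300.0]  # made-up values, in thousands


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(rooms, prices)


def predict_price(n_rooms):
    """Predict the continuous target for a new feature value."""
    return slope * n_rooms + intercept


print(predict_price(6))  # prediction for a house size not in the data
```

Here the price is the dependent variable and the number of rooms is the only independent variable; real problems usually have several features.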
Supervised Learning
While training the model, data is usually split in an 80:20 ratio, i.e. 80% as training data and the
remaining 20% as testing data. For the training portion, the model is given both the inputs and their
correct outputs. The model learns from the training data only.
Working of Supervised Machine Learning
The working of supervised machine learning follows these key steps:
1. Collect Labeled Data
Gather a dataset where each input has a known correct output (label).
Example: Images of handwritten digits with their actual numbers as labels.
2. Split the Dataset
Divide the data into training data (about 80%) and testing data (about 20%).
The model will learn from the training data and be evaluated on the testing data.
3. Train the Model
Feed the training data (inputs and their labels) to a suitable supervised learning algorithm
(like Decision Trees, SVM or Linear Regression).
The model tries to find patterns that map inputs to correct outputs.
4. Validate and Test the Model
Evaluate the model using testing data it has never seen before.
The model predicts outputs and these predictions are compared with the actual labels to
calculate accuracy or error.
5. Deploy and Predict on New Data
Once the model performs well, it can be used to predict outputs for completely new, unseen
data.
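The five steps above can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the data is synthetic, and a simple nearest-centroid classifier stands in for the algorithms named in step 3 (it is not one of them). The names `train_centroids` and `predict` are hypothetical.

```python
import random

# Step 1: collect labeled data (synthetic, well-separated 2-D points).
rng = random.Random(0)
data = []
for _ in range(50):
    data.append(((rng.uniform(0, 2), rng.uniform(0, 2)), "low"))
    data.append(((rng.uniform(4, 6), rng.uniform(4, 6)), "high"))

# Step 2: shuffle, then split into 80% training and 20% testing data.
rng.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]


# Step 3: train the model (here: one centroid per class, from training data only).
def train_centroids(examples):
    sums = {}
    for (x, y), label in examples:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


centroids = train_centroids(train)


def predict(point):
    """Return the label whose centroid is closest to the point."""
    px, py = point
    return min(
        centroids,
        key=lambda lab: (px - centroids[lab][0]) ** 2 + (py - centroids[lab][1]) ** 2,
    )


# Step 4: evaluate on testing data the model has never seen.
correct = sum(predict(point) == label for point, label in test)
accuracy = correct / len(test)
print("test accuracy:", accuracy)

# Step 5: predict on completely new data.
new_prediction = predict((5.2, 4.7))
print("new point ->", new_prediction)
```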
Supervised Machine Learning Algorithms
Supervised learning can be further divided into several different types, each with its own unique
characteristics and applications. Here are some of the most common types of supervised learning
algorithms:
Linear Regression: Linear regression is a type of supervised learning regression algorithm
that is used to predict a continuous output value. It is one of the simplest and most widely
used algorithms in supervised learning.
Logistic Regression: Logistic regression is a type of supervised learning classification
algorithm that is used to predict a binary output variable.
Decision Trees: A decision tree is a tree-like structure that is used to model decisions and their
possible consequences. Each internal node in the tree represents a decision, while each leaf
node represents a possible outcome.
Random Forests: Random forests are made up of multiple decision trees that work
together to make predictions. Each tree in the forest is trained on a different subset of the
input features and data. The final prediction is made by aggregating the predictions of all the
trees in the forest.
Support Vector Machine (SVM): The SVM algorithm creates a hyperplane to segregate n-dimensional
space into classes and identify the correct category of new data points. The
extreme cases that help create the hyperplane are called support vectors, hence the name
Support Vector Machine.
K-Nearest Neighbors: KNN works by finding k training examples closest to a given input and
then predicts the class or value based on the majority class or average value of these
neighbors. The performance of KNN can be influenced by the choice of k and the distance
metric used to measure proximity.
Gradient Boosting: Gradient Boosting combines weak learners, like decision trees, to create a
strong model. It iteratively builds new models that correct errors made by previous ones.
Naive Bayes Algorithm: The Naive Bayes algorithm is a supervised machine learning
algorithm based on applying Bayes' Theorem with the “naive” assumption that features are
independent of each other given the class label.
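To make one of the algorithms above concrete, here is a small sketch of K-Nearest Neighbors for classification: find the k closest training examples and take the majority class. The training points, labels, and the function name `knn_predict` are hypothetical, and Euclidean distance is assumed as the distance metric.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training examples by Euclidean distance to the query, keep the k closest.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical labeled training data: two well-separated groups.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 6.5), "B"), ((6.5, 7.0), "B"),
]

print(knn_predict(train, (1.2, 1.4)))
print(knn_predict(train, (6.4, 6.2)))
```

Changing `k` or swapping the distance function changes which neighbours are counted, which is why both choices affect KNN's performance.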
Let's summarize the supervised machine learning algorithms in table:
These types of supervised learning in machine learning vary based on the problem we're trying to
solve and the dataset we're working with. In classification problems, the task is to assign inputs to
predefined classes, while regression problems involve predicting numerical outcomes.
Practical Examples of Supervised learning
A few practical examples of supervised machine learning across various industries:
Fraud Detection in Banking: Utilizes supervised learning algorithms on historical transaction
data, training models with labeled datasets of legitimate and fraudulent transactions to
accurately predict fraud patterns.
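Parkinson's Disease Prediction: Parkinson’s disease is a progressive disorder that affects the
nervous system and the parts of the body controlled by the nerves. Supervised models trained on
labeled patient data can be used to predict whether a patient has the disease.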
Customer Churn Prediction: Uses supervised learning techniques to analyze historical
customer data, identifying features associated with churn rates to predict customer retention
effectively.
Cancer Cell Classification: Applies supervised learning to classify cells as ‘malignant’ or
‘benign’ based on their features.
Stock Price Prediction: Applies supervised learning to predict a signal that indicates whether
buying a particular stock is likely to be worthwhile.
Advantages
Here are some advantages of supervised learning listed below:
Simplicity & clarity: Easy to understand and implement since it learns from labeled examples.
High accuracy: When sufficient labeled data is available, models achieve strong predictive
performance.
Versatility: Works for both classification (e.g., spam detection, disease prediction) and
regression (e.g., price forecasting).
Generalization: With enough diverse data and proper training, models can generalize well to
unseen inputs.
Wide application: Used in speech recognition, medical diagnosis, sentiment analysis, fraud
detection and more.
Disadvantages
Requires labeled data: Large amounts of labeled datasets are expensive and time-consuming
to prepare.
Bias from data: If training data is biased or unbalanced, the model may learn and amplify
those biases.
Overfitting risk: Model may memorize training data instead of learning general patterns,
especially with small datasets.
Limited adaptability: Performance drops significantly when applied to data distributions very
different from training data.
Not scalable for some problems: In tasks with millions of possible labels, such as natural
language tasks, manual labeling becomes impractical.
Unsupervised learning
Unsupervised learning is a type of machine learning that analyzes and models data without labelled
responses or predefined categories. Unlike supervised learning, where the algorithm learns from
input-output pairs, unsupervised learning algorithms work solely with input data and aim to discover
hidden patterns, structures or relationships within the dataset independently, without any human
intervention or prior knowledge of the data's meaning.
Semi-Supervised Learning in ML
Semi-supervised learning is a hybrid machine learning approach which uses both supervised and
unsupervised learning. It uses a small amount of labelled data combined with a large amount of
unlabelled data to train models. The goal is to learn a function that accurately predicts outputs based
on inputs, similar to supervised learning, but with much less labelled data.
Working of Semi-Supervised Learning
Several techniques fall under semi-supervised learning including:
Self-Training: The model is first trained on labeled data. It then predicts labels for unlabeled
data, adding high-confidence predictions to the labeled set iteratively to refine the model.
Co-Training: Two models are trained on different feature subsets of the data. Each model
labels unlabeled data for the other, enabling them to learn from complementary views.
Multi-View Training: A variation of co-training where models train on different data
representations (e.g., images and text) to predict the same output.
Graph-Based Models: Data is represented as a graph with nodes (data points) and edges
(similarities). Labels are propagated from labeled nodes to unlabeled ones based on graph
connectivity.
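Of the techniques above, self-training is the simplest to sketch. The example below is an illustration only: it uses 1-D data, a class-mean classifier, and a made-up confidence score and threshold (`0.7`). None of these choices come from the notes.

```python
def class_means(labeled):
    """Train the base model: the mean of the labeled points in each class."""
    groups = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}


def predict_with_confidence(means, x):
    """Return (label, confidence); confidence is high when x is much closer
    to one class mean than to the other (assumes exactly two classes)."""
    ranked = sorted((abs(x - m), y) for y, m in means.items())
    (d_near, label), (d_far, _) = ranked[0], ranked[1]
    confidence = 1.0 - d_near / (d_near + d_far)
    return label, confidence


def self_train(labeled, unlabeled, threshold=0.7):
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        means = class_means(labeled)  # retrain on the current labeled set
        confident, uncertain = [], []
        for x in remaining:
            label, conf = predict_with_confidence(means, x)
            (confident if conf >= threshold else uncertain).append((x, label))
        if not confident:  # nothing new to add: stop iterating
            break
        labeled.extend(confident)  # adopt high-confidence pseudo-labels
        remaining = [x for x, _ in uncertain]
    return labeled, remaining


# Two labeled points and six unlabeled ones (hypothetical values).
labeled = [(1.0, 0), (9.0, 1)]
unlabeled = [2.0, 3.0, 4.0, 6.0, 7.0, 8.0]
final_labeled, still_unlabeled = self_train(labeled, unlabeled)
print(final_labeled)
print(still_unlabeled)
```

Points near the middle never reach the confidence threshold and stay unlabeled, which shows why the threshold matters: set too low, wrong pseudo-labels can be added and reinforce themselves.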
After semi-supervised training, the model is able to classify images into their categories or
labels.
When to Use Semi-Supervised Learning
When labeled data is scarce or costly, such as medical imaging requiring expert annotation.
When large volumes of unlabeled data exist, like social media or web content.
For unstructured data types (text, images, audio) where labeling is difficult.
When classes are rare and labeled examples few, improving class recognition.
When purely supervised or unsupervised methods are insufficient.
Applications
Some applications of semi-supervised learning:
Face Recognition: Enhancing accuracy by learning from limited labeled face images plus
many unlabeled ones using graph-based methods.
Handwritten Text Recognition: Adapting models to diverse handwriting styles through
generative models.
Speech Recognition: Improving transcription quality by using unlabeled speech data with
CNNs and other techniques.
Security: Google uses semi-supervised learning for anomaly detection in network traffic and
malware detection.
Finance: PayPal applies it for fraud detection and creditworthiness assessment using
transaction data.
Advantages
Better Generalization: Utilizes both labeled and unlabeled data to capture the whole data
structure, improving prediction robustness.
Cost Efficient: Reduces dependency on costly manual labeling by exploiting unlabeled data.
Flexible and Robust: Handles different data types and sources, adapting well to changing
data distributions.
Improved Clustering: Refines clusters by leveraging unlabeled data, yielding better class
separation.
Handling Rare Classes: Enhances learning for underrepresented classes where labeled
examples are minimal.
Limitations
Model Complexity: Requires careful choice of architecture and hyperparameters, which may
require extensive tuning.
Noisy Data: Unlabeled data may contain errors or irrelevant information, risking degraded
model performance.
Assumption Sensitivity: Relies on assumptions such as data consistency and clusterability,
which may not hold in all cases.
Evaluation Challenge: Assessing performance is difficult due to limited labeled data and
varied quality of unlabeled data.
Reinforcement Learning
Reinforcement Learning (RL) is a branch of machine learning that focuses on how agents can learn to
make decisions through trial and error to maximize cumulative rewards. RL allows machines to learn
by interacting with an environment and receiving feedback based on their actions. This feedback
comes in the form of rewards or penalties.
Reinforcement Learning revolves around the idea that an agent (the learner or decision-maker)
interacts with an environment to achieve a goal. The agent performs actions and receives feedback to
optimize its decision-making over time.
Agent: The decision-maker that performs actions.
Environment: The world or system in which the agent operates.
State: The situation or condition the agent is currently in.
Action: The possible moves or decisions the agent can make.
Reward: The feedback or result from the environment based on the agent’s action.
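The five components above can be seen working together in a small sketch. It uses tabular Q-learning, one common RL algorithm that these notes do not name, on a made-up five-cell corridor where the agent is rewarded for reaching the right-hand end. The hyperparameters and all names (`step`, `choose_action`, `train`) are assumptions for illustration.

```python
import random

N_STATES = 5          # states 0..4 in a corridor; state 4 is the goal
ACTIONS = (-1, +1)    # action: move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # assumed learning rate, discount, exploration


def step(state, action):
    """Environment: apply the action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def choose_action(q, state, rng):
    """Agent: explore with probability EPSILON, otherwise act greedily."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, rng)
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted future value.
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q


q_table = train()
# Greedy policy for each non-goal state: +1 means "move right".
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

The agent is never told that moving right is correct; it learns this only from the reward received at the goal, through trial and error.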
Applications
Robotics: RL is used to automate tasks in structured environments such as manufacturing,
where robots learn to optimize movements and improve efficiency.
Games: Advanced RL algorithms have been used to develop strategies for complex games like
chess, Go and video games, outperforming human players in many instances.
Industrial Control: RL helps in real-time adjustments and optimization of industrial
operations, such as refining processes in the oil and gas industry.
Personalized Training Systems: RL enables the customization of instructional content based
on an individual's learning patterns, improving engagement and effectiveness.
Advantages
Solves complex sequential decision problems where other approaches fail.
Learns from real-time interaction, enabling adaptation to changing environments.
Does not require labeled data, unlike supervised learning.
Can innovate by discovering new strategies beyond human intuition.
Handles uncertainty and stochastic environments effectively.
Disadvantages
Computationally intensive, requiring large amounts of data and processing power.
Reward function design is critical; poor design leads to unintended behaviors.
Not suitable for simple problems where traditional methods are more efficient.
Challenging to debug and interpret, making it hard to explain decisions.
Exploration-exploitation trade-off requires careful balancing to optimize learning.
1. Self-Supervised Learning
Definition: A form of supervised learning where the data generates its own labels
based on its inherent structure.
Key Characteristics:
o Often used to pre-train models on large datasets.
o Particularly popular in deep learning for representation learning.
Applications:
o Natural Language Processing (e.g., GPT, BERT)
o Computer Vision (e.g., contrastive learning, SimCLR)
Common Techniques:
o Contrastive Learning
o Masked Language Models
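To illustrate how data can generate its own labels, the sketch below builds masked-word training pairs from a plain sentence: each word is hidden in turn, and the hidden word becomes the label. This is a simplified picture of the masked language model idea, not the actual procedure of any specific model; `masked_examples` and the `[MASK]` token are illustrative assumptions.

```python
def masked_examples(sentence, mask_token="[MASK]"):
    """Turn one unlabeled sentence into (masked input, target word) pairs."""
    tokens = sentence.split()
    examples = []
    for i, target in enumerate(tokens):
        masked = tokens[:i] + [mask_token] + tokens[i + 1:]
        examples.append((" ".join(masked), target))
    return examples


pairs = masked_examples("machines learn from data")
for masked_input, label in pairs:
    print(masked_input, "->", label)
```

No human annotation is needed: every sentence in a large text collection yields several input and label pairs for pre-training.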
2. Evolutionary Learning
Key Characteristics:
o Stochastic and adaptive.
o Does not require gradient information.
Applications:
o Hyperparameter optimization
o Designing neural network architectures
Common Techniques:
o Genetic Algorithms
o Evolution Strategies
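The stochastic, gradient-free search described by the characteristics above can be sketched with a very small evolution strategy: mutate the current candidate at random and keep the mutant only if it is at least as good. The objective function, the step size `sigma`, and the function names are arbitrary choices for illustration.

```python
import random


def loss(x):
    """Objective to minimise; its minimum is at x = 3 (an arbitrary toy choice)."""
    return (x - 3.0) ** 2


def evolve(start=0.0, sigma=0.5, generations=200, seed=0):
    """A (1+1) evolution strategy: one parent, one mutated child per generation."""
    rng = random.Random(seed)
    parent = start
    for _ in range(generations):
        child = parent + rng.gauss(0.0, sigma)  # mutation: random perturbation
        if loss(child) <= loss(parent):         # selection: keep the better one
            parent = child
    return parent


best = evolve()
print(best, loss(best))
```

Only the value of `loss` is used, never its derivative, which is why such methods can be applied to problems like hyperparameter search where gradients are unavailable.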
3. Hybrid and Meta-Learning
Definition: Combines multiple paradigms or focuses on learning how to learn (meta-learning).
Key Characteristics:
o Increases flexibility and generalization.
o Optimizes model design and performance.
Applications:
o Few-shot learning
o Multi-task learning
Common Approaches:
o Neural Architecture Search (NAS)
o Model-Agnostic Meta-Learning (MAML)
Each of these paradigms has distinct use cases and is often chosen based on the type of problem,
the available data, and the desired outcomes.
LEARNING BY ROTE
Learning by rote in machine learning is a method of memorizing facts and knowledge
through repetition, where new information is simply stored without deep understanding. This
technique allows for quick recall of basic information but can lead to poor generalization on
new, unseen data and overfitting, hindering the ability to apply knowledge in new contexts.
WORKING OF LEARNING BY ROTE
Storing information:
The system stores specific pieces of information as they are encountered. This can be as simple
as storing a computed value or as complex as saving a feature set from an image.
Repetition:
The process of rote learning in machine learning is similar to traditional learning methods,
relying on repetition to memorize information without deep conceptual understanding.
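A rote learner can be sketched as a plain lookup table: it stores each input with its output and can only recall what it has stored. The class name `RoteLearner` and the squaring example are hypothetical.

```python
class RoteLearner:
    """Memorises input/output pairs exactly as they are encountered."""

    def __init__(self):
        self.memory = {}

    def learn(self, inp, out):
        self.memory[inp] = out  # store the fact, no generalisation

    def recall(self, inp):
        return self.memory.get(inp)  # None for anything never stored


learner = RoteLearner()
for n in (1, 2, 3):
    learner.learn(n, n * n)  # store a computed value for each seen input

print(learner.recall(2))  # a stored input is recalled immediately
print(learner.recall(4))  # an unseen input cannot be answered
```

The learner never discovers the rule "square the input"; it only repeats stored answers, so any input outside its memory gets no prediction.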
Strengths
Quick information recall: Rote learning allows for rapid retrieval of stored facts and
data.
Foundation building: It can establish a basic foundation of knowledge for more
advanced learning.
Efficiency: It can be an efficient method for early-stage learning or when quick, accurate
retrieval is critical, such as with standardized tests.
Weaknesses
Poor generalization:
The primary weakness is that models trained this way perform poorly on data that was not in
the original training set.
Overfitting:
Rote learning can cause models to "overfit" to the training data, meaning they have
memorized it too well and cannot react effectively to new data.
Lack of deep understanding:
It does not build a deep, conceptual understanding, making it difficult to relate information to
new or real-world scenarios.
Use cases
Simple AI applications that require fast lookup of specific information.
As a part of more complex systems, where stored values can be used more efficiently or
to handle specific, repetitive tasks.
LEARNING BY INDUCTION
Learning by induction is a fundamental concept in machine learning where a system
learns general rules or patterns from specific examples. Instead of being explicitly
programmed, the machine observes input–output pairs and builds a model that can make
accurate predictions on new, unseen data. Most supervised learning techniques such as
decision trees, neural networks, SVMs, and regression follow the inductive approach.
Induction is a reasoning process that moves from specific observations to general
conclusions. In machine learning, this means analyzing a dataset of examples and discovering
hidden patterns. The model formed through induction is then used to generalize to new cases.
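As a minimal sketch of induction, the snippet below derives a general rule (a single threshold) from a handful of specific examples and then applies it to cases it has never seen. The data (hours studied and pass/fail outcomes) and the names `induce_threshold` and `predict` are made up for illustration.

```python
def induce_threshold(examples):
    """From (value, label) examples, pick the cut-off that misclassifies the
    fewest examples under the rule "value > cut-off means True"."""
    values = sorted({x for x, _ in examples})
    # Candidate cut-offs: midpoints between neighbouring observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        return sum((x > t) != label for x, label in examples)

    return min(candidates, key=errors)


# Specific observations: hours studied -> passed the exam?
examples = [(1, False), (2, False), (3, False), (6, True), (7, True), (8, True)]

threshold = induce_threshold(examples)


def predict(hours):
    """The induced general rule, usable on new cases."""
    return hours > threshold


print(threshold)
print(predict(5), predict(4))
```

The rule was never programmed explicitly; it was inferred from the examples, and like any induced rule it may be wrong for cases that differ from the data it was built on.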
Advantages
1. Generalization ability: Works well on unseen data.
2. Flexible: Can handle large and complex datasets.
3. Automated learning: Reduces manual programming.
4. Applicable to many tasks: Classification, prediction, detection.
Disadvantages
1. Requires high-quality data: Noisy data leads to wrong patterns.
2. May overfit: Learns noise instead of true patterns.
3. Cannot guarantee correctness: Generalization may fail sometimes.
4. Computationally expensive: Training may require more time and resources.
Applications
Email spam detection
Image and speech recognition
Medical diagnosis
Weather forecasting
Recommender systems (YouTube, Amazon)
MATCHING
Matching is the process of comparing data vectors using distance or similarity measures to find
the best corresponding pattern in machine learning.
Matching = finding the most similar thing
For example:
Finding the student whose marks are closest to yours
Finding the nearest house to your home
Finding the most similar image or document
Why do we need Matching in ML?
Because machines do not understand similarity like humans.
So we teach the machine:
How to compare
How to decide which is close
How to choose the best match
How does a machine “see” data?
A machine sees numbers only.
So every object is converted into a vector (list of numbers).
Example: Student marks
Student A → (80, 70, 90)
Student B → (78, 72, 88)
Student C → (40, 50, 45)
Each student is a vector.
How does the machine compare two vectors?
It uses a proximity measure.
Proximity measure = a rule to compare
Matching is done using a proximity measure, which can be:
Distance measure → smaller distance = better match
Similarity measure → larger similarity = better match
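As an illustration of a similarity measure, the sketch below uses cosine similarity, a common choice that these notes do not name, on the student vectors from the earlier example. A larger value means a better match. The function name `cosine_similarity` is an assumption.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means the same direction."""
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    return dot / (norm_u * norm_v)


student_a = (80, 70, 90)
student_b = (78, 72, 88)
student_c = (40, 50, 45)

sim_ab = cosine_similarity(student_a, student_b)
sim_ac = cosine_similarity(student_a, student_c)
print(sim_ab, sim_ac)  # the larger value marks the better match for Student A
```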
Distance measure (Euclidean Distance)
Real-life example:
Distance between two houses:
Small distance → houses are close
Large distance → houses are far
In ML:
Distance between two vectors:
Small distance → very similar
Large distance → very different
Euclidean distance works like a ruler.
$$d(u, v) = \sqrt{\sum_{i=1}^{l} \bigl(u(i) - v(i)\bigr)^2}$$
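The Euclidean distance can be computed directly and used for matching. The sketch below reuses the student vectors from the earlier example and finds the best match for Student A as the student at the smallest distance. The function names `euclidean` and `best_match` are illustrative.

```python
import math


def euclidean(u, v):
    """Square root of the summed squared differences between components."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


students = {
    "A": (80, 70, 90),
    "B": (78, 72, 88),
    "C": (40, 50, 45),
}


def best_match(query_name):
    """Return the other student whose vector is closest to the query's."""
    query = students[query_name]
    others = {name: vec for name, vec in students.items() if name != query_name}
    return min(others, key=lambda name: euclidean(query, others[name]))


print(euclidean(students["A"], students["B"]))  # small distance: very similar
print(euclidean(students["A"], students["C"]))  # large distance: very different
print(best_match("A"))
```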