0% found this document useful (0 votes)

259 views25 pages

Neural Networks and Genetic Algorithms Overview

Q: How do issues like local minima or saddle points affect training in neural networks, and what strategies are used to overcome them?

Local minima and saddle points affect neural network training by trapping the optimization process in sub-optimal solutions or flat spots with no clear direction for improvement, leading to slow convergence and reduced accuracy. Strategies to overcome these issues include using advanced optimizers like Adam or RMSProp that incorporate momentum or adaptive learning rates, careful initialization methods such as Xavier or He, and practices like batch normalization which stabilize and expedite training .

Q: What challenges do neural networks face in terms of interpretability and how does this affect their deployment in critical applications?

Neural networks face challenges in interpretability because they function as 'black boxes', where the decision-making process is not transparent or easily understood. This lack of clarity poses significant issues in critical applications such as medical diagnoses or safety-critical systems where understanding the rationale behind decisions is vital to ensuring trust and reliability. As a result, deploying neural networks in these domains requires additional tools and methods to interpret and explain the inner workings and decisions made by the models .

Q: How does the Universal Approximation Theorem highlight the representation power of neural networks?

The Universal Approximation Theorem demonstrates the representation power of neural networks by stating that a feedforward neural network with a single hidden layer containing a finite number of neurons can approximate any continuous function on a closed and bounded subset of Euclidean space, under mild assumptions about the activation function. This means that neural networks can, in theory, represent a very wide range of complex functions given sufficient neurons and proper training .

Q: Why is the choice of activation function crucial in neural network architecture, particularly for hidden layers?

The choice of activation function is crucial because it determines the network's ability to capture non-linearities in the data. For hidden layers, non-linear activation functions such as ReLU, Sigmoid, or Tanh allow the network to learn complex patterns and hierarchies within the data, which linear functions cannot capture. These functions enable the model to approximate a wide range of functions, crucial for tasks requiring intricate pattern recognition and decision boundaries .

Q: How do genetic algorithms improve hyperparameter optimization for neural networks compared to traditional methods like grid search?

Genetic algorithms improve hyperparameter optimization by exploring a broader and more diverse range of potential solutions through processes like selection, crossover, and mutation. Unlike grid search, which systematically examines each possible parameter combination, genetic algorithms can cover a more complex search space more efficiently by evolving the parameter sets and focusing computational resources on the most promising configurations. This capability makes them particularly useful for tuning hyperparameters in neural networks, where iterative and vast combination evaluations are computationally expensive .

Q: In what ways does backpropagation benefit from advanced optimizers like Adam and RMSProp, and what are their impacts on learning efficiency?

Advanced optimizers like Adam and RMSProp enhance backpropagation by adapting the learning rates of model parameters based on their gradients, which accelerates convergence and leads to more stable learning. Adam, specifically, combines the benefits of two other extensions of stochastic gradient descent, AdaGrad and RMSProp, by maintaining an adaptive learning rate for each parameter's gradient. These optimizers help avoid the pitfalls of poor convergence associated with traditional gradient descent, especially in deep networks, by efficiently managing momentum and learning rate adjustments .

Q: Explain the significance of ReLU and its variants, such as Leaky ReLU, in addressing the vanishing gradient problem.

ReLU and its variants, like Leaky ReLU, address the vanishing gradient problem by preventing the gradient from becoming too small, which can occur with activation functions like sigmoid or tanh. ReLU introduces sparsity and avoids saturating activation regions, which help preserve gradient flow. Leaky ReLU further improves upon ReLU by allowing a small, non-zero gradient for negative inputs, which mitigates the issue of dying ReLUs where a model might stop updating weights because of inactive neurons .

Q: What are the computational challenges of deploying neural networks on low-resource devices, and how can they be mitigated?

Deploying neural networks on low-resource devices presents challenges such as high memory consumption, CPU/GPU demand, and latency. These challenges can be mitigated using techniques such as model pruning, quantization, or reducing network size and complexity, thereby reducing the resource requirements without significantly impacting performance. Techniques like distillation can also help by transferring knowledge from large models to smaller ones that are more suitable for low-power devices .

Q: Why are ReLU activation functions preferred in deep networks, and how do they address the vanishing gradient problem?

ReLU (Rectified Linear Unit) functions are preferred in deep networks because they introduce non-linearity while being computationally efficient, as they simply output the input if it is positive, otherwise zero. ReLU addresses the vanishing gradient problem by allowing direct gradient flow through the active neurons, maintaining larger gradient values during backpropagation, hence helping to avoid the diminishing updates in weights that occur with other activation functions like sigmoid or tanh .

Q: Discuss how batch normalization can address the issues of vanishing gradients and improve neural network training.

Batch normalization addresses vanishing gradients by normalizing layer inputs, which stabilizes the learning process and reduces the internal covariate shift that can lead to vanishing or exploding gradients. This normalization process ensures that the input to each layer maintains a consistent distribution, which helps preserve effective gradient magnitudes throughout training layers, thereby enabling deeper networks to be trained more efficiently and with reduced sensitivity to initialization and learning rates .

This document provides an overview of Neural Networks and Genetic Algorithms, detailing their structures, functions, advantages, and challenges. It explains the components of neural networks, including layers, neurons, and activation functions, and contrasts them with genetic algorithms, which are inspired by natural evolution for optimization tasks. Additionally, it covers advanced topics in both fields, such as deep learning, regularization techniques, and the integration of genetic algorithms for optimizing neural networks.

Uploaded by

surekha

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

259 views25 pages

Neural Networks and Genetic Algorithms Overview

Uploaded by

surekha

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

UNIT II

Neural networks and genetic algorithms Neural Network Representation – Problems –

Perceptrons – Multilayer Networks and Back Propagation Algorithms – Advanced Topics –
Genetic Algorithms – Hypothesis Space Search – Genetic Programming – Models of Evaluation
and Learning.

NEURAL NETWORKS AND GENETIC ALGORITHMS

1. Neural Networks (NN)

A Neural Network is a computational model inspired by the human brain. It contains

interconnected units called neurons that process information in layers.

Structure

1. Input Layer – receives input features.

2. Hidden Layers – perform computations using weights and activation functions.
3. Output Layer – produces predicted class/output value.

How it Works

 Each neuron computes:

y = f(wT x + b)
where f = activation function (ReLU, Sigmoid, Tanh).
 Network learns by adjusting weights using Backpropagation and Gradient Descent.

Applications

 Image recognition
 Speech processing
 Natural language processing
 Medical diagnosis
 Pattern recognition

Advantages

✔ Can model complex non-linear patterns

✔ Automatic feature extraction
✔ High accuracy on large datasets

Disadvantages

✘ Needs large data

✘ Computationally expensive
✘ Hard to interpret

2. Genetic Algorithms (GA)

Genetic Algorithms are search and optimization algorithms inspired by natural evolution and
genetics.

GA repeatedly transforms a set of candidate solutions (population) using:

 Selection (choose best individuals)

 Crossover (combine two solutions)
 Mutation (randomly modify solutions)

Steps in GA

1. Initialization – generate random population

2. Fitness Evaluation – measure quality of each solution
3. Selection – choose best parents
4. Crossover – mix parents to create offspring
5. Mutation – introduce variations
6. Replacement – produce new population
7. Termination – stop when optimal solution found

Advantages

✔ Works well for complex optimization problems

✔ Does not require gradient or differentiability
✔ Can escape local minima

Disadvantages

✘ Slow convergence
✘ No guarantee of optimal solution
✘ Large number of evaluations required

3. Neural Networks vs Genetic Algorithms

Feature Neural Networks Genetic Algorithms

Purpose Learning from data, prediction Search & optimization
Inspiration Human brain Biological evolution
Learning Backpropagation Evolutionary operations
Requires gradient? Yes No
Application Classification, regression Optimization problems

4. Combined Use: Neuro-Genetic Systems

Sometimes Genetic Algorithms are used to optimize Neural Networks, such as:

 Optimizing weights
 Tuning hyperparameters
 Designing network architecture
NEURAL NETWORK REPRESENTATION

A Neural Network Representation refers to the way in which a neural network models a
function using neurons, layers, weights, biases, and activation functions. It shows how inputs
are transformed step by step to produce an output.

Neural networks represent a complex non-linear function by combining many simple

computational units (neurons).

1. Basic Elements of Neural Network Representation

(a) Neurons (Nodes)

 The fundamental processing units.

 Each neuron takes input, multiplies by weight, adds bias, and applies an activation
function.

a = f(wT x + b)

(b) Layers

A neural network is structured into layers:

1. Input Layer – receives raw features

2. Hidden Layers – perform intermediate computations
3. Output Layer – produces final prediction

The number of hidden layers and neurons decides the capacity of the network.

2. Parameters: Weights and Bias

 Weights (w) determine importance of each input.

 Bias (b) shifts the activation function.
 Together, they represent the learned function.
 Learning = adjusting weights & biases using training data.

3. Activation Functions

Represent how a neuron transforms input to output.

Common functions:

 Sigmoid – for binary classification

 ReLU – for deep networks
 Tanh
 Softmax – for multi-class outputs

They introduce non-linearity, enabling the network to learn complex patterns.

4. Forward Propagation Representation

Data flows from input → hidden layers → output.

Each layer performs:

a (l) = f(W (l) a (l-1)+ b(l) )

This chained transformation represents the overall function:

F(x) = fn(fn-1(...f1(x)...))

5. Training Representation (Backpropagation)

To learn from data:

 Compute output from forward pass

 Compare with actual label
 Compute error
 Adjust weights using Gradient Descent
 This repeated process tunes parameters to represent the target function accurately.

6. Representation Power

 A neural network can represent linear and non-linear functions.

 With enough neurons, it can approximate any function (Universal Approximation
Theorem).

7. Graphical Representation (Structure)

A neural network is represented as a directed graph:

 Nodes = neurons
 Edges = weighted connections
 Flow = from input → output

NEURAL NETWORK PROBLEMS / CHALLENGES

Neural networks are powerful models, but they suffer from several practical and theoretical
problems. These issues affect training, performance, interpretability, and computational cost.

1. Overfitting

 Neural networks have many parameters.

 They may “memorize” the training data instead of learning patterns.
 Leads to poor performance on test data.

Reasons:

 Small training dataset

 Too many layers or neurons
 No regularization

Solutions:
 Dropout, early stopping, regularization, more data.

2. Vanishing & Exploding Gradients

During backpropagation, gradients become very small (vanishing) or very large (exploding),
especially in deep networks.

Effects:

 Slow learning
 Unstable training
 Weights unable to update properly

Solutions:

 ReLU activation
 Batch normalization
 Proper weight initialization (Xavier/He)

3. High Computational Cost

 Training deep neural networks requires powerful GPUs, large memory, and long time.
 Real-time or low-resource environments struggle to run NNs.

4. Need for Large Amount of Data

 Neural networks perform well only with large labeled datasets.

 Data collection and annotation are expensive.

5. Black-Box Nature (Lack of Interpretability)

 Neural networks do not clearly show how they arrived at a decision.

 Hard to interpret weights and internal layers.
 Difficult for medical, legal, and safety-critical applications.

6. Hyperparameter Tuning Difficulty

NNs need careful tuning of:

 Learning rate
 Batch size
 Number of layers
 Number of neurons
 Activation functions
 Dropout rate

This process is time-consuming and requires expertise.

7. Local Minima & Saddle Points

 Training may get stuck at sub-optimal minima or flat regions.

 Slows down learning and reduces accuracy.

Solutions:

 Advanced optimizers (Adam, RMSProp)

 Learning rate schedules

8. Sensitivity to Input Scaling

Neural networks perform poorly if input features are not:

 Normalized
 Standardized

Unscaled inputs cause unstable gradients.

9. Hardware Dependence

 Efficient training requires GPUs/TPUs.

 Hardware cost is high for beginners or small organizations.

10. Difficult Deployment

 Neural networks require large memory

 High latency
 Not suitable for simple, low-power devices unless optimized (e.g., quantization)

Perceptrons

A Perceptron is the simplest type of artificial neural network model, introduced by Frank
Rosenblatt (1957).
It is a binary classifier that decides whether an input belongs to one of two classes.

The perceptron forms the basic building block of neural networks.

2. Working of a Perceptron

1. The perceptron computes the weighted sum of inputs.

2. Applies the step activation.
3. Output is either 0 or 1.
4. The perceptron learns by adjusting weights using the Perceptron Learning Rule.

4. Types of Perceptrons

(a) Single-Layer Perceptron

 Only one neuron.

 Handles linearly separable problems.
 Cannot solve XOR problem.

(b) Multi-Layer Perceptron (MLP)

 Has hidden layers.

 Uses nonlinear activation functions.
 Trained using backpropagation.
 Solves non-linear problems (including XOR).

5. Limitations of Perceptron

 Only works for linearly separable data.

 Cannot model complex patterns.
 Uses a discontinuous step function → not suitable for gradient-based training.

6. Applications

 Binary classification
 Pattern recognition
 Character recognition
 Simple decision-based tasks
MULTILAYER NETWORKS AND BACKPROPAGATION ALGORITHM

1. Multilayer Networks (Multilayer Perceptrons – MLP)

A Multilayer Network is an extension of the single-layer perceptron.

It has:

1. Input Layer
2. One or More Hidden Layers
3. Output Layer

Each layer contains neurons, and each neuron performs:

a=f(wTx+b)

where f is a nonlinear activation function (ReLU, Sigmoid, Tanh).

Why Multiple Layers?

 Single-layer perceptron can solve only linearly separable problems.

 Multilayer networks learn complex, non-linear patterns.
 Can approximate any function → Universal Approximation Theorem.

Key Features of MLP

 Fully connected layers

 Non-linear activations
 Uses Gradient Descent for training
 Requires a learning algorithm → Backpropagation

2. Backpropagation Algorithm

Backpropagation is the standard learning algorithm used to train multilayer networks.

It adjusts the weights to reduce the error between predicted output and actual output.

Main Idea

1. Compute output using forward pass

2. Compare output with target → compute error
3. Propagate the error backward through the network
4. Update weights using Gradient Descent

2.1 Steps in Backpropagation

Step 1: Forward Pass

 Pass input through layers

 Compute outputs of each neuron
 Get the final network output

y=f(Wx+b)
Step 2: Compute Error

E=1/2(t−y)2

where t = target value.

Step 3: Backward Pass (Error Propagation)

3. Advantages of Backpropagation

✔ Efficient for training deep networks

✔ Works with any differentiable activation function
✔ Supports multiple hidden layers
✔ Fast learning with gradient descent variants (Adam, RMSProp)

4. Disadvantages

✘ Suffers from vanishing/exploding gradients

✘ Requires differentiable activation functions
✘ Training can be slow
✘ Sensitive to learning rate and weight initialization

5. Applications of MLP + Backpropagation

 Handwriting and digit recognition (MNIST)

 Speech and image processing
 Medical diagnosis
 Financial forecasting
 Pattern classification & regression tasks

ADVANCED TOPICS IN MULTILAYER NETWORKS & BACKPROPAGATION

Below are the most important advanced topics that extend and improve the basic MLP +
Backpropagation architecture.

1. Deep Neural Networks (DNNs)

When multilayer networks have many hidden layers, they are called Deep Neural Networks.
Deeper architectures learn high-level hierarchical features.

Key ideas:

 Layer-wise feature extraction

 Better representation power
 Used in audio, image, text, and speech tasks

2. Activation Function Innovations

New activation functions solve problems like vanishing gradients.

Advanced Activations:

 ReLU (Rectified Linear Unit)

 Leaky ReLU
 ELU, SELU
 Softplus
 Swish
 GELU

These help deeper networks train faster and more accurately.

3. Regularization Techniques

Advanced regularization is used to reduce overfitting.

Methods:

 Dropout – randomly deactivate neurons

 Batch Normalization – normalizes layer outputs
 L2/L1 Regularization
 Early Stopping
 Data Augmentation

Batch Normalization also helps reduce vanishing gradient.

4. Optimization Enhancements

Classic gradient descent is slow and unstable.

Advanced optimizers speed up training.

Modern Optimizers:

 Adam
 RMSProp
 Adagrad
 Nesterov Momentum

These help faster convergence and stable learning.

5. Weight Initialization Strategies

Proper weight initialization reduces vanishing/exploding gradient issues.

Techniques:

 Xavier (Glorot) Initialization

 He Initialization
 Orthogonal Initialization

These help deep networks learn better.

6. Convolutional Neural Networks (CNNs)

A specialized architecture for image and video processing.

Features:

 Convolution layers
 Pooling layers
 Feature maps
 Sparse connectivity

CNNs learn spatial hierarchies (edges → textures → objects).

7. Recurrent Neural Networks (RNNs) and LSTMs

Designed for sequential data like text, speech, time-series.

Advanced RNNs:
 LSTM (Long Short-Term Memory)
 GRU (Gated Recurrent Unit)

These overcome the vanishing gradient problem in sequences.

8. Autoencoders

Autoencoders learn compressed representations (latent space).

Advanced types:

 Denoising Autoencoder
 Sparse Autoencoder
 Variational Autoencoder (VAE)

Used for compression, anomaly detection, generative tasks.

9. Transfer Learning

Using a pre-trained deep network on a new dataset.

Advantages:

 Requires less data

 High accuracy
 Faster training

Popular models: VGG, ResNet, Inception, BERT.

10. Gradient Vanishing/Exploding Solutions

Advanced approaches to reduce gradient problems:

 Residual Networks (ResNet)

 Skip Connections
 Highway Networks

These allow very deep networks to train successfully.

11. Hyperparameter Optimization

Systematic ways to find optimal model settings.

Methods:

 Grid Search
 Random Search
 Bayesian Optimization
 Genetic Algorithms for NN tuning
12. Advanced Loss Functions

New loss functions help specialized tasks.

Examples:

 Cross-Entropy Loss
 Focal Loss (for class imbalance)
 Huber Loss
 Contrastive Loss
 Triplet Loss

13. Backpropagation Variants

 Backpropagation Through Time (BPTT) – for RNNs

 Truncated BPTT – reduces computation
 Parallel and Distributed Backpropagation (GPU, TPU training)

14. Gradient Checking

A numerical method to verify correctness of backpropagation implementation.

15. Ensemble of Neural Networks

Combining predictions of multiple networks:

 Bagging
 Boosting
 Stacking
 Random Forest-like NN ensembles

Increases accuracy and robustness.

GENETIC ALGORITHMS

Genetic Algorithms (GAs) are search and optimization algorithms inspired by the principles of
natural selection and genetics, first proposed by John Holland (1975).
They are widely used in machine learning for optimization problems, feature selection,
parameter tuning, and generating optimal solutions when traditional methods fail.

1. Basic Idea of Genetic Algorithms

GAs imitate biological evolution:

 Best individuals survive

 Weak solutions are eliminated
 New solutions evolve over generations
 Population gradually improves

Each solution is encoded as a chromosome (string of bits, numbers, or symbols).

2. Components of a Genetic Algorithm

(a) Population

A collection of candidate solutions.

Each individual is called a chromosome.
Example:
10101011 or a vector like [0.3, 0.2, 0.9].

(b) Fitness Function

Measures quality of each solution.

Higher fitness → better solution.

Used to select parents for the next generation.

Chooses the fittest individuals for reproduction.

Common methods:

 Roulette Wheel Selection

 Tournament Selection
 Rank Selection

Better solutions have higher probability to reproduce.

(d) Crossover (Recombination)

Two parents exchange parts of their chromosome to form new offspring.

Types:

 Single-point crossover
 Two-point crossover
 Uniform crossover

Example:
Parent 1: 1010 | 1100
Parent 2: 0011 | 0101
Child: 10100101

(e) Mutation

Random change in genes to introduce diversity.

Example:
100101 → 100111

Mutation avoids premature convergence and allows exploration of new search space.
(f) Replacement

The new offspring replace some or all individuals in the population.

3. Genetic Algorithm Process (Step-by-Step)

1. Initialize Population
Randomly generate N solutions.
2. Evaluate Fitness
Compute fitness of each individual.
3. Selection
Choose best-performing parents.
4. Crossover
Generate new offspring by mixing genes.
5. Mutation
Randomly alter genes to preserve diversity.
6. Replacement
Form new population from parents and offspring.
7. Termination
Stop when:
o Maximum number of generations reached
o Fitness stops improving
o Optimal solution found

4. Applications of Genetic Algorithms in Machine Learning

(a) Feature Selection

GAs identify important features to improve:

 Accuracy
 Speed
 Generalization

(b) Hyperparameter Optimization

Used to tune:

 Learning rate
 Number of layers
 Batch size
 Activation functions

Works better than grid search for complex models.

(c) Neural Network Training

GAs can:

 Initialize NN weights
 Evolve network structure
 Optimize architecture
 Replace or supplement backpropagation

Used in Neuro-Genetic Systems.

(d) Clustering

GA-based clustering avoids issues in k-means or hierarchical clustering.

(e) Reinforcement Learning

GAs evolve the best policies or action strategies.

(f) Function Optimization

Widely used to find global optima for complicated functions.

5. Advantages of Genetic Algorithms

✔ Can find global optimum

✔ Works on non-linear, multi-modal, complex search spaces
✔ Does not require gradient or differentiability
✔ Easy to implement
✔ Highly parallelizable
✔ Robust to noise

6. Disadvantages of Genetic Algorithms

✘ Computationally expensive
✘ Slow convergence for large search spaces
✘ No guarantee of optimal solution
✘ Requires careful tuning of parameters (mutation rate, population size)
✘ Performance depends on representation quality

7. Real-World Applications

 Scheduling problems
 Route optimization (Traveling Salesman Problem)
 Game playing strategies
 Robotics path planning
 Cryptography
 Medical diagnosis
 Financial modeling and portfolio optimization
9. Flowchart of Genetic Algorithm (for diagram-based questions)

1. Start
2. Initialize population
3. Evaluate fitness
4. Selection
5. Crossover
6. Mutation
7. Replacement
8. If stopping condition met → Stop
9. Else → repeat

HYPOTHESIS SPACE SEARCH IN MACHINE LEARNING – DETAILED NOTES

In machine learning, learning = searching for the best hypothesis that fits the given data.
A hypothesis is a possible model, pattern, rule, or function that maps inputs to outputs.

Example:
In classification, a hypothesis might be:
IF (Age > 40 AND Income > 50K) THEN Class = “Yes”

1. What is Hypothesis Space?

The Hypothesis Space (H) is the set of all possible hypotheses a learning algorithm can choose
from.

Examples:

 For linear regression: all linear equations

 For decision trees: all possible trees
 For a perceptron: all possible separating hyperplanes

The hypothesis space depends on:

✔ model type
✔ parameters
✔ features
✔ constraints

2. The Learning Problem as a Search Problem

Learning can be viewed as:
[
Searching the hypothesis space H to find the best hypothesis } h\*.
]

Goal:
Choose the hypothesis that best fits training data AND generalizes well.

3. How is Hypothesis Space Searched?

Different algorithms use different search strategies.

A. Enumerative (Brute Force) Search

 Try every hypothesis in H

 Choose the one that fits the data
 Works only for small spaces
 Computationally expensive

Examples:

 Exhaustively checking all decision stumps

B. Greedy Search

 Expand the best partial solution at each step

 Used when H is huge

Examples:

 Decision Tree Learning (ID3, C4.5) chooses best attribute at each node
 Hill-climbing search

Pros: fast
Cons: may get stuck in local optimum

C. Gradient-Based Search

Used when hypothesis is expressed using continuous parameters.

Examples:

 Linear Regression: minimize squared error

 Neural Networks: backpropagation uses gradient descent

Algorithm:

1. Compute gradient
2. Move in opposite direction
3. Stop when convergence
D. Evolutionary Search (Genetic Algorithms)

 Treat hypotheses as chromosomes

 Use mutation, crossover, selection
 Good for large, non-differentiable spaces

E. Randomized or Stochastic Search

 Random sampling of hypotheses

 E.g., Random Search, Monte Carlo methods

Used for:

 Hyperparameter tuning
 Feature selection

F. Bayesian Search

Searches hypothesis space by computing:

P(h | D) = P(D | h) . P(h)/P(D)

Choose hypothesis with:

 Maximum Posterior Probability (MAP)

 Maximum Likelihood (ML)

Used in:

 Naive Bayes
 Bayesian Networks

4. Version Space Model (Concept Learning)

Introduced in MITCHELL's Candidate-Elimination Algorithm.

 Hypothesis space represented by two boundaries:

o S-set (Specific boundary): most specific hypothesis
o G-set (General boundary): most general hypothesis

Learning = narrowing down the version space until only valid hypotheses remain.

This is a structured hypothesis space search.

5. Bias in Hypothesis Space Search

To make search efficient, ML algorithms use:

 Inductive Bias (preferences for some hypotheses)

 Language Bias (restriction on representation)
 Search Bias (restriction on how the space is searched)

Example:

 Decision tree prefers shorter trees (inductive bias).

6. Hypothesis Evaluation

Once a hypothesis is found, its quality is checked using:

 Training error
 Validation error
 Generalization performance

The best hypothesis is the one with minimum loss.

7. Hypothesis Space Types

Type Description Example

Finite Countable hypotheses Boolean functions of n variables
Infinite Continuous parameterized Linear regression weights
Discrete Symbolic Decision trees
Continuous Real-valued Neural networks

8. Challenges in Hypothesis Space Search

 Extremely large or infinite hypothesis spaces

 Over fitting and under fitting
 Local minima
 Computational limitations
 Non-convex search spaces

GENETIC PROGRAMMING (GP) IN MACHINE LEARNING – DETAILED NOTES

Genetic Programming (GP) is an evolutionary algorithm that automatically evolves computer

programs, expressions, or models to solve a problem.
It is an extension of Genetic Algorithms (GA), introduced by John Koza (1992).

While GA evolves fixed-length chromosomes,

GP evolves complete programs represented as trees.

1. What is Genetic Programming (GP)?

Genetic Programming is a technique where:

 A population of computer programs is created randomly.

 Programs evolve using selection, crossover, and mutation.
 Over generations, the programs become better at solving the task.

Programs are usually represented as syntax trees:

Example Tree for f(x) = x + (3 * y):

+
/\
x *
/\
3 y

GP is widely used for:

 Symbolic regression
 Automatic feature generation
 Rule learning
 Robot control
 Optimization

2. How GP Differs from GA

Genetic Algorithms (GA) Genetic Programming (GP)

Evolves strings (bitstrings, arrays) Evolves programs (tree structures)
Fixed length chromosomes Variable-length programs
Optimizes parameter values Discovers entire algorithms / models
Simple encoding Complex hierarchical encoding
Output: solution values Output: an executable model

3. Components of Genetic Programming

1. Representation

Programs are usually represented as:

 Syntax trees (most common)

 Lisp expressions
 Graphs

2. Functions and Terminals

GP requires defining:

 Function set (F): operations

Examples: +, -, *, /, sin, cos, if-else
 Terminal set (T): inputs
Examples: x, y, constants, variables

A program is built from combinations of F and T.

3. Fitness Function

Measures how well a program solves the problem.

Examples:
 Mean squared error (symbolic regression)
 Classification accuracy
 Reward score (robot control)

4. Genetic Operators

GP uses modified GA operators suitable for tree structures.

A. Selection

Choose best programs using:

 Tournament selection
 Roulette wheel
 Rank-based selection

B. Crossover (Subtree crossover)

Two parent trees exchange randomly selected subtrees.

C. Mutation

Randomly replace a subtree with a new randomly generated subtree.

D. Reproduction

Copy the fittest programs to the next generation.

4. Working of Genetic Programming (Step-by-Step)

1. Initialize population of random programs

2. Evaluate fitness for each program
3. Select best programs based on fitness
4. Apply crossover & mutation to create offspring
5. Replace population with offspring + elite programs
6. Repeat for many generations
7. Best evolving program is the solution

5. Applications of Genetic Programming in ML

1. Symbolic Regression

GP discovers mathematical equations that describe a dataset:

 Physics formula discovery

 Predictive modeling

2. Feature Construction & Feature Selection

GP automatically:
 Creates new features
 Removes irrelevant features

3. Decision Rule Learning

Automatically evolves IF–THEN rules:

 Medical diagnosis
 Fraud detection
 Pattern recognition

4. Neural Network Optimization

GP can evolve:

 Network architectures
 Activation functions
 Learning strategies

(Used in Neuro-Evolution)

5. Reinforcement Learning

GP evolves:

 Control policies
 Robot movement rules

6. Automatic Program Generation

Used for:

 Algorithm discovery
 Code generation
 Bug fixes

6. Advantages of Genetic Programming

✔ Can automatically create interpretable models

✔ Finds global optima
✔ Handles variable-length and complex structures
✔ Works on non-linear, symbolic tasks
✔ No need for gradient or differentiability
✔ Highly flexible and domain-independent

7. Disadvantages of Genetic Programming

✘ Computationally expensive
✘ Slow convergence
✘ Final model may be too large (bloat problem)
✘ Requires careful tuning of function/terminal sets
✘ Hard to control complexity

8. Example of GP Output

GP might evolve a mathematical expression like:

f(x) = sin(x) + 0.5x2

Or a rule:

IF temperature > 30 AND humidity < 60 THEN play = "no"

ELSE play = "yes"

These are automatically discovered through evolution.

9. Genetic Programming Workflow (Diagram – easy to draw in exam)

Start
↓
Generate initial random programs
↓
Evaluate fitness
↓
Selection of best programs
↓
Crossover + Mutation → create new programs
↓
Replacement → new generation
↓
Stopping condition?
→ No → Repeat cycle
→ Yes → Output best program

Models of Evaluation in Machine Learning

Model evaluation assesses how well machine learning models generalize to unseen data using
metrics and techniques to detect overfitting, underfitting, or bias, ensuring reliable predictions
before deployment.

Key Evaluation Metrics

 Accuracy: Ratio of correct predictions to total predictions, suitable for balanced datasets.
 Precision/Recall/F1-Score: Precision measures true positives among predicted positives;
recall captures true positives among actual positives; F1 balances both for imbalanced
data.
 AUC-ROC: Area under receiver operating characteristic curve, evaluating classification
across thresholds.
 Confusion Matrix: Table showing true positives (TP), true negatives (TN), false
positives (FP), and false negatives (FN).

For regression: Mean Absolute Error (MAE), Mean Squared Error (MSE), R² score quantify
prediction errors.

Learning and Evaluation Techniques

 Holdout: Split data into train/test sets for quick evaluation.

 K-Fold Cross-Validation: Divide data into K folds, train on K-1, validate on 1; average
results reduce variance.
 Leave-One-Out: Extreme cross-validation using all but one sample for training.

Comparison Table

Technique/Metric Use Case Strengths

Holdout Simple, fast Low computation
K-Fold CV Robust generalization Reduces bias
Accuracy Balanced classification Intuitive
F1-Score Imbalanced data Handles class skew
AUC-ROC Threshold-independent Model comparison

These models guide model selection, hyperparameter tuning, and deployment decisions.

Common questions