UNIT II
Neural networks and genetic algorithms Neural Network Representation – Problems –
Perceptrons – Multilayer Networks and Back Propagation Algorithms – Advanced Topics –
Genetic Algorithms – Hypothesis Space Search – Genetic Programming – Models of Evaluation
and Learning.
NEURAL NETWORKS AND GENETIC ALGORITHMS
1. Neural Networks (NN)
A Neural Network is a computational model inspired by the human brain. It contains
interconnected units called neurons that process information in layers.
Structure
1. Input Layer – receives input features.
2. Hidden Layers – perform computations using weights and activation functions.
3. Output Layer – produces predicted class/output value.
How it Works
Each neuron computes:
y = f(wT x + b)
where f = activation function (ReLU, Sigmoid, Tanh).
Network learns by adjusting weights using Backpropagation and Gradient Descent.
Applications
Image recognition
Speech processing
Natural language processing
Medical diagnosis
Pattern recognition
Advantages
✔ Can model complex non-linear patterns
✔ Automatic feature extraction
✔ High accuracy on large datasets
Disadvantages
✘ Needs large data
✘ Computationally expensive
✘ Hard to interpret
2. Genetic Algorithms (GA)
Genetic Algorithms are search and optimization algorithms inspired by natural evolution and
genetics.
GA repeatedly transforms a set of candidate solutions (population) using:
Selection (choose best individuals)
Crossover (combine two solutions)
Mutation (randomly modify solutions)
Steps in GA
1. Initialization – generate random population
2. Fitness Evaluation – measure quality of each solution
3. Selection – choose best parents
4. Crossover – mix parents to create offspring
5. Mutation – introduce variations
6. Replacement – produce new population
7. Termination – stop when optimal solution found
Advantages
✔ Works well for complex optimization problems
✔ Does not require gradient or differentiability
✔ Can escape local minima
Disadvantages
✘ Slow convergence
✘ No guarantee of optimal solution
✘ Large number of evaluations required
3. Neural Networks vs Genetic Algorithms
Feature Neural Networks Genetic Algorithms
Purpose Learning from data, prediction Search & optimization
Inspiration Human brain Biological evolution
Learning Backpropagation Evolutionary operations
Requires gradient? Yes No
Application Classification, regression Optimization problems
4. Combined Use: Neuro-Genetic Systems
Sometimes Genetic Algorithms are used to optimize Neural Networks, such as:
Optimizing weights
Tuning hyperparameters
Designing network architecture
NEURAL NETWORK REPRESENTATION
A Neural Network Representation refers to the way in which a neural network models a
function using neurons, layers, weights, biases, and activation functions. It shows how inputs
are transformed step by step to produce an output.
Neural networks represent a complex non-linear function by combining many simple
computational units (neurons).
1. Basic Elements of Neural Network Representation
(a) Neurons (Nodes)
The fundamental processing units.
Each neuron takes input, multiplies by weight, adds bias, and applies an activation
function.
a = f(wT x + b)
(b) Layers
A neural network is structured into layers:
1. Input Layer – receives raw features
2. Hidden Layers – perform intermediate computations
3. Output Layer – produces final prediction
The number of hidden layers and neurons decides the capacity of the network.
2. Parameters: Weights and Bias
Weights (w) determine importance of each input.
Bias (b) shifts the activation function.
Together, they represent the learned function.
Learning = adjusting weights & biases using training data.
3. Activation Functions
Represent how a neuron transforms input to output.
Common functions:
Sigmoid – for binary classification
ReLU – for deep networks
Tanh
Softmax – for multi-class outputs
They introduce non-linearity, enabling the network to learn complex patterns.
4. Forward Propagation Representation
Data flows from input → hidden layers → output.
Each layer performs:
a (l) = f(W (l) a (l-1)+ b(l) )
This chained transformation represents the overall function:
F(x) = fn(fn-1(...f1(x)...))
5. Training Representation (Backpropagation)
To learn from data:
Compute output from forward pass
Compare with actual label
Compute error
Adjust weights using Gradient Descent
This repeated process tunes parameters to represent the target function accurately.
6. Representation Power
A neural network can represent linear and non-linear functions.
With enough neurons, it can approximate any function (Universal Approximation
Theorem).
7. Graphical Representation (Structure)
A neural network is represented as a directed graph:
Nodes = neurons
Edges = weighted connections
Flow = from input → output
NEURAL NETWORK PROBLEMS / CHALLENGES
Neural networks are powerful models, but they suffer from several practical and theoretical
problems. These issues affect training, performance, interpretability, and computational cost.
1. Overfitting
Neural networks have many parameters.
They may “memorize” the training data instead of learning patterns.
Leads to poor performance on test data.
Reasons:
Small training dataset
Too many layers or neurons
No regularization
Solutions:
Dropout, early stopping, regularization, more data.
2. Vanishing & Exploding Gradients
During backpropagation, gradients become very small (vanishing) or very large (exploding),
especially in deep networks.
Effects:
Slow learning
Unstable training
Weights unable to update properly
Solutions:
ReLU activation
Batch normalization
Proper weight initialization (Xavier/He)
3. High Computational Cost
Training deep neural networks requires powerful GPUs, large memory, and long time.
Real-time or low-resource environments struggle to run NNs.
4. Need for Large Amount of Data
Neural networks perform well only with large labeled datasets.
Data collection and annotation are expensive.
5. Black-Box Nature (Lack of Interpretability)
Neural networks do not clearly show how they arrived at a decision.
Hard to interpret weights and internal layers.
Difficult for medical, legal, and safety-critical applications.
6. Hyperparameter Tuning Difficulty
NNs need careful tuning of:
Learning rate
Batch size
Number of layers
Number of neurons
Activation functions
Dropout rate
This process is time-consuming and requires expertise.
7. Local Minima & Saddle Points
Training may get stuck at sub-optimal minima or flat regions.
Slows down learning and reduces accuracy.
Solutions:
Advanced optimizers (Adam, RMSProp)
Learning rate schedules
8. Sensitivity to Input Scaling
Neural networks perform poorly if input features are not:
Normalized
Standardized
Unscaled inputs cause unstable gradients.
9. Hardware Dependence
Efficient training requires GPUs/TPUs.
Hardware cost is high for beginners or small organizations.
10. Difficult Deployment
Neural networks require large memory
High latency
Not suitable for simple, low-power devices unless optimized (e.g., quantization)
Perceptrons
A Perceptron is the simplest type of artificial neural network model, introduced by Frank
Rosenblatt (1957).
It is a binary classifier that decides whether an input belongs to one of two classes.
The perceptron forms the basic building block of neural networks.
2. Working of a Perceptron
1. The perceptron computes the weighted sum of inputs.
2. Applies the step activation.
3. Output is either 0 or 1.
4. The perceptron learns by adjusting weights using the Perceptron Learning Rule.
4. Types of Perceptrons
(a) Single-Layer Perceptron
Only one neuron.
Handles linearly separable problems.
Cannot solve XOR problem.
(b) Multi-Layer Perceptron (MLP)
Has hidden layers.
Uses nonlinear activation functions.
Trained using backpropagation.
Solves non-linear problems (including XOR).
5. Limitations of Perceptron
Only works for linearly separable data.
Cannot model complex patterns.
Uses a discontinuous step function → not suitable for gradient-based training.
6. Applications
Binary classification
Pattern recognition
Character recognition
Simple decision-based tasks
MULTILAYER NETWORKS AND BACKPROPAGATION ALGORITHM
1. Multilayer Networks (Multilayer Perceptrons – MLP)
A Multilayer Network is an extension of the single-layer perceptron.
It has:
1. Input Layer
2. One or More Hidden Layers
3. Output Layer
Each layer contains neurons, and each neuron performs:
a=f(wTx+b)
where f is a nonlinear activation function (ReLU, Sigmoid, Tanh).
Why Multiple Layers?
Single-layer perceptron can solve only linearly separable problems.
Multilayer networks learn complex, non-linear patterns.
Can approximate any function → Universal Approximation Theorem.
Key Features of MLP
Fully connected layers
Non-linear activations
Uses Gradient Descent for training
Requires a learning algorithm → Backpropagation
2. Backpropagation Algorithm
Backpropagation is the standard learning algorithm used to train multilayer networks.
It adjusts the weights to reduce the error between predicted output and actual output.
Main Idea
1. Compute output using forward pass
2. Compare output with target → compute error
3. Propagate the error backward through the network
4. Update weights using Gradient Descent
2.1 Steps in Backpropagation
Step 1: Forward Pass
Pass input through layers
Compute outputs of each neuron
Get the final network output
y=f(Wx+b)
Step 2: Compute Error
E=1/2(t−y)2
where t = target value.
Step 3: Backward Pass (Error Propagation)
3. Advantages of Backpropagation
✔ Efficient for training deep networks
✔ Works with any differentiable activation function
✔ Supports multiple hidden layers
✔ Fast learning with gradient descent variants (Adam, RMSProp)
4. Disadvantages
✘ Suffers from vanishing/exploding gradients
✘ Requires differentiable activation functions
✘ Training can be slow
✘ Sensitive to learning rate and weight initialization
5. Applications of MLP + Backpropagation
Handwriting and digit recognition (MNIST)
Speech and image processing
Medical diagnosis
Financial forecasting
Pattern classification & regression tasks
ADVANCED TOPICS IN MULTILAYER NETWORKS & BACKPROPAGATION
Below are the most important advanced topics that extend and improve the basic MLP +
Backpropagation architecture.
1. Deep Neural Networks (DNNs)
When multilayer networks have many hidden layers, they are called Deep Neural Networks.
Deeper architectures learn high-level hierarchical features.
Key ideas:
Layer-wise feature extraction
Better representation power
Used in audio, image, text, and speech tasks
2. Activation Function Innovations
New activation functions solve problems like vanishing gradients.
Advanced Activations:
ReLU (Rectified Linear Unit)
Leaky ReLU
ELU, SELU
Softplus
Swish
GELU
These help deeper networks train faster and more accurately.
3. Regularization Techniques
Advanced regularization is used to reduce overfitting.
Methods:
Dropout – randomly deactivate neurons
Batch Normalization – normalizes layer outputs
L2/L1 Regularization
Early Stopping
Data Augmentation
Batch Normalization also helps reduce vanishing gradient.
4. Optimization Enhancements
Classic gradient descent is slow and unstable.
Advanced optimizers speed up training.
Modern Optimizers:
Adam
RMSProp
Adagrad
Nesterov Momentum
These help faster convergence and stable learning.
5. Weight Initialization Strategies
Proper weight initialization reduces vanishing/exploding gradient issues.
Techniques:
Xavier (Glorot) Initialization
He Initialization
Orthogonal Initialization
These help deep networks learn better.
6. Convolutional Neural Networks (CNNs)
A specialized architecture for image and video processing.
Features:
Convolution layers
Pooling layers
Feature maps
Sparse connectivity
CNNs learn spatial hierarchies (edges → textures → objects).
7. Recurrent Neural Networks (RNNs) and LSTMs
Designed for sequential data like text, speech, time-series.
Advanced RNNs:
LSTM (Long Short-Term Memory)
GRU (Gated Recurrent Unit)
These overcome the vanishing gradient problem in sequences.
8. Autoencoders
Autoencoders learn compressed representations (latent space).
Advanced types:
Denoising Autoencoder
Sparse Autoencoder
Variational Autoencoder (VAE)
Used for compression, anomaly detection, generative tasks.
9. Transfer Learning
Using a pre-trained deep network on a new dataset.
Advantages:
Requires less data
High accuracy
Faster training
Popular models: VGG, ResNet, Inception, BERT.
10. Gradient Vanishing/Exploding Solutions
Advanced approaches to reduce gradient problems:
Residual Networks (ResNet)
Skip Connections
Highway Networks
These allow very deep networks to train successfully.
11. Hyperparameter Optimization
Systematic ways to find optimal model settings.
Methods:
Grid Search
Random Search
Bayesian Optimization
Genetic Algorithms for NN tuning
12. Advanced Loss Functions
New loss functions help specialized tasks.
Examples:
Cross-Entropy Loss
Focal Loss (for class imbalance)
Huber Loss
Contrastive Loss
Triplet Loss
13. Backpropagation Variants
Backpropagation Through Time (BPTT) – for RNNs
Truncated BPTT – reduces computation
Parallel and Distributed Backpropagation (GPU, TPU training)
14. Gradient Checking
A numerical method to verify correctness of backpropagation implementation.
15. Ensemble of Neural Networks
Combining predictions of multiple networks:
Bagging
Boosting
Stacking
Random Forest-like NN ensembles
Increases accuracy and robustness.
GENETIC ALGORITHMS
Genetic Algorithms (GAs) are search and optimization algorithms inspired by the principles of
natural selection and genetics, first proposed by John Holland (1975).
They are widely used in machine learning for optimization problems, feature selection,
parameter tuning, and generating optimal solutions when traditional methods fail.
1. Basic Idea of Genetic Algorithms
GAs imitate biological evolution:
Best individuals survive
Weak solutions are eliminated
New solutions evolve over generations
Population gradually improves
Each solution is encoded as a chromosome (string of bits, numbers, or symbols).
2. Components of a Genetic Algorithm
(a) Population
A collection of candidate solutions.
Each individual is called a chromosome.
Example:
10101011 or a vector like [0.3, 0.2, 0.9].
(b) Fitness Function
Measures quality of each solution.
Higher fitness → better solution.
Used to select parents for the next generation.
(c) Selection
Chooses the fittest individuals for reproduction.
Common methods:
Roulette Wheel Selection
Tournament Selection
Rank Selection
Better solutions have higher probability to reproduce.
(d) Crossover (Recombination)
Two parents exchange parts of their chromosome to form new offspring.
Types:
Single-point crossover
Two-point crossover
Uniform crossover
Example:
Parent 1: 1010 | 1100
Parent 2: 0011 | 0101
Child: 10100101
(e) Mutation
Random change in genes to introduce diversity.
Example:
100101 → 100111
Mutation avoids premature convergence and allows exploration of new search space.
(f) Replacement
The new offspring replace some or all individuals in the population.
3. Genetic Algorithm Process (Step-by-Step)
1. Initialize Population
Randomly generate N solutions.
2. Evaluate Fitness
Compute fitness of each individual.
3. Selection
Choose best-performing parents.
4. Crossover
Generate new offspring by mixing genes.
5. Mutation
Randomly alter genes to preserve diversity.
6. Replacement
Form new population from parents and offspring.
7. Termination
Stop when:
o Maximum number of generations reached
o Fitness stops improving
o Optimal solution found
4. Applications of Genetic Algorithms in Machine Learning
(a) Feature Selection
GAs identify important features to improve:
Accuracy
Speed
Generalization
(b) Hyperparameter Optimization
Used to tune:
Learning rate
Number of layers
Batch size
Activation functions
Works better than grid search for complex models.
(c) Neural Network Training
GAs can:
Initialize NN weights
Evolve network structure
Optimize architecture
Replace or supplement backpropagation
Used in Neuro-Genetic Systems.
(d) Clustering
GA-based clustering avoids issues in k-means or hierarchical clustering.
(e) Reinforcement Learning
GAs evolve the best policies or action strategies.
(f) Function Optimization
Widely used to find global optima for complicated functions.
5. Advantages of Genetic Algorithms
✔ Can find global optimum
✔ Works on non-linear, multi-modal, complex search spaces
✔ Does not require gradient or differentiability
✔ Easy to implement
✔ Highly parallelizable
✔ Robust to noise
6. Disadvantages of Genetic Algorithms
✘ Computationally expensive
✘ Slow convergence for large search spaces
✘ No guarantee of optimal solution
✘ Requires careful tuning of parameters (mutation rate, population size)
✘ Performance depends on representation quality
7. Real-World Applications
Scheduling problems
Route optimization (Traveling Salesman Problem)
Game playing strategies
Robotics path planning
Cryptography
Medical diagnosis
Financial modeling and portfolio optimization
9. Flowchart of Genetic Algorithm (for diagram-based questions)
1. Start
2. Initialize population
3. Evaluate fitness
4. Selection
5. Crossover
6. Mutation
7. Replacement
8. If stopping condition met → Stop
9. Else → repeat
HYPOTHESIS SPACE SEARCH IN MACHINE LEARNING – DETAILED NOTES
In machine learning, learning = searching for the best hypothesis that fits the given data.
A hypothesis is a possible model, pattern, rule, or function that maps inputs to outputs.
Example:
In classification, a hypothesis might be:
IF (Age > 40 AND Income > 50K) THEN Class = “Yes”
1. What is Hypothesis Space?
The Hypothesis Space (H) is the set of all possible hypotheses a learning algorithm can choose
from.
Examples:
For linear regression: all linear equations
For decision trees: all possible trees
For a perceptron: all possible separating hyperplanes
The hypothesis space depends on:
✔ model type
✔ parameters
✔ features
✔ constraints
2. The Learning Problem as a Search Problem
Learning can be viewed as:
[
Searching the hypothesis space H to find the best hypothesis } h\*.
]
Goal:
Choose the hypothesis that best fits training data AND generalizes well.
3. How is Hypothesis Space Searched?
Different algorithms use different search strategies.
A. Enumerative (Brute Force) Search
Try every hypothesis in H
Choose the one that fits the data
Works only for small spaces
Computationally expensive
Examples:
Exhaustively checking all decision stumps
B. Greedy Search
Expand the best partial solution at each step
Used when H is huge
Examples:
Decision Tree Learning (ID3, C4.5) chooses best attribute at each node
Hill-climbing search
Pros: fast
Cons: may get stuck in local optimum
C. Gradient-Based Search
Used when hypothesis is expressed using continuous parameters.
Examples:
Linear Regression: minimize squared error
Neural Networks: backpropagation uses gradient descent
Algorithm:
1. Compute gradient
2. Move in opposite direction
3. Stop when convergence
D. Evolutionary Search (Genetic Algorithms)
Treat hypotheses as chromosomes
Use mutation, crossover, selection
Good for large, non-differentiable spaces
E. Randomized or Stochastic Search
Random sampling of hypotheses
E.g., Random Search, Monte Carlo methods
Used for:
Hyperparameter tuning
Feature selection
F. Bayesian Search
Searches hypothesis space by computing:
P(h | D) = P(D | h) . P(h)/P(D)
Choose hypothesis with:
Maximum Posterior Probability (MAP)
Maximum Likelihood (ML)
Used in:
Naive Bayes
Bayesian Networks
4. Version Space Model (Concept Learning)
Introduced in MITCHELL's Candidate-Elimination Algorithm.
Hypothesis space represented by two boundaries:
o S-set (Specific boundary): most specific hypothesis
o G-set (General boundary): most general hypothesis
Learning = narrowing down the version space until only valid hypotheses remain.
This is a structured hypothesis space search.
5. Bias in Hypothesis Space Search
To make search efficient, ML algorithms use:
Inductive Bias (preferences for some hypotheses)
Language Bias (restriction on representation)
Search Bias (restriction on how the space is searched)
Example:
Decision tree prefers shorter trees (inductive bias).
6. Hypothesis Evaluation
Once a hypothesis is found, its quality is checked using:
Training error
Validation error
Generalization performance
The best hypothesis is the one with minimum loss.
7. Hypothesis Space Types
Type Description Example
Finite Countable hypotheses Boolean functions of n variables
Infinite Continuous parameterized Linear regression weights
Discrete Symbolic Decision trees
Continuous Real-valued Neural networks
8. Challenges in Hypothesis Space Search
Extremely large or infinite hypothesis spaces
Over fitting and under fitting
Local minima
Computational limitations
Non-convex search spaces
GENETIC PROGRAMMING (GP) IN MACHINE LEARNING – DETAILED NOTES
Genetic Programming (GP) is an evolutionary algorithm that automatically evolves computer
programs, expressions, or models to solve a problem.
It is an extension of Genetic Algorithms (GA), introduced by John Koza (1992).
While GA evolves fixed-length chromosomes,
GP evolves complete programs represented as trees.
1. What is Genetic Programming (GP)?
Genetic Programming is a technique where:
A population of computer programs is created randomly.
Programs evolve using selection, crossover, and mutation.
Over generations, the programs become better at solving the task.
Programs are usually represented as syntax trees:
Example Tree for f(x) = x + (3 * y):
+
/\
x *
/\
3 y
GP is widely used for:
Symbolic regression
Automatic feature generation
Rule learning
Robot control
Optimization
2. How GP Differs from GA
Genetic Algorithms (GA) Genetic Programming (GP)
Evolves strings (bitstrings, arrays) Evolves programs (tree structures)
Fixed length chromosomes Variable-length programs
Optimizes parameter values Discovers entire algorithms / models
Simple encoding Complex hierarchical encoding
Output: solution values Output: an executable model
3. Components of Genetic Programming
1. Representation
Programs are usually represented as:
Syntax trees (most common)
Lisp expressions
Graphs
2. Functions and Terminals
GP requires defining:
Function set (F): operations
Examples: +, -, *, /, sin, cos, if-else
Terminal set (T): inputs
Examples: x, y, constants, variables
A program is built from combinations of F and T.
3. Fitness Function
Measures how well a program solves the problem.
Examples:
Mean squared error (symbolic regression)
Classification accuracy
Reward score (robot control)
4. Genetic Operators
GP uses modified GA operators suitable for tree structures.
A. Selection
Choose best programs using:
Tournament selection
Roulette wheel
Rank-based selection
B. Crossover (Subtree crossover)
Two parent trees exchange randomly selected subtrees.
C. Mutation
Randomly replace a subtree with a new randomly generated subtree.
D. Reproduction
Copy the fittest programs to the next generation.
4. Working of Genetic Programming (Step-by-Step)
1. Initialize population of random programs
2. Evaluate fitness for each program
3. Select best programs based on fitness
4. Apply crossover & mutation to create offspring
5. Replace population with offspring + elite programs
6. Repeat for many generations
7. Best evolving program is the solution
5. Applications of Genetic Programming in ML
1. Symbolic Regression
GP discovers mathematical equations that describe a dataset:
Physics formula discovery
Predictive modeling
2. Feature Construction & Feature Selection
GP automatically:
Creates new features
Removes irrelevant features
3. Decision Rule Learning
Automatically evolves IF–THEN rules:
Medical diagnosis
Fraud detection
Pattern recognition
4. Neural Network Optimization
GP can evolve:
Network architectures
Activation functions
Learning strategies
(Used in Neuro-Evolution)
5. Reinforcement Learning
GP evolves:
Control policies
Robot movement rules
6. Automatic Program Generation
Used for:
Algorithm discovery
Code generation
Bug fixes
6. Advantages of Genetic Programming
✔ Can automatically create interpretable models
✔ Finds global optima
✔ Handles variable-length and complex structures
✔ Works on non-linear, symbolic tasks
✔ No need for gradient or differentiability
✔ Highly flexible and domain-independent
7. Disadvantages of Genetic Programming
✘ Computationally expensive
✘ Slow convergence
✘ Final model may be too large (bloat problem)
✘ Requires careful tuning of function/terminal sets
✘ Hard to control complexity
8. Example of GP Output
GP might evolve a mathematical expression like:
f(x) = sin(x) + 0.5x2
Or a rule:
IF temperature > 30 AND humidity < 60 THEN play = "no"
ELSE play = "yes"
These are automatically discovered through evolution.
9. Genetic Programming Workflow (Diagram – easy to draw in exam)
Start
↓
Generate initial random programs
↓
Evaluate fitness
↓
Selection of best programs
↓
Crossover + Mutation → create new programs
↓
Replacement → new generation
↓
Stopping condition?
→ No → Repeat cycle
→ Yes → Output best program
Models of Evaluation in Machine Learning
Model evaluation assesses how well machine learning models generalize to unseen data using
metrics and techniques to detect overfitting, underfitting, or bias, ensuring reliable predictions
before deployment.
Key Evaluation Metrics
Accuracy: Ratio of correct predictions to total predictions, suitable for balanced datasets.
Precision/Recall/F1-Score: Precision measures true positives among predicted positives;
recall captures true positives among actual positives; F1 balances both for imbalanced
data.
AUC-ROC: Area under receiver operating characteristic curve, evaluating classification
across thresholds.
Confusion Matrix: Table showing true positives (TP), true negatives (TN), false
positives (FP), and false negatives (FN).
For regression: Mean Absolute Error (MAE), Mean Squared Error (MSE), R² score quantify
prediction errors.
Learning and Evaluation Techniques
Holdout: Split data into train/test sets for quick evaluation.
K-Fold Cross-Validation: Divide data into K folds, train on K-1, validate on 1; average
results reduce variance.
Leave-One-Out: Extreme cross-validation using all but one sample for training.
Comparison Table
Technique/Metric Use Case Strengths
Holdout Simple, fast Low computation
K-Fold CV Robust generalization Reduces bias
Accuracy Balanced classification Intuitive
F1-Score Imbalanced data Handles class skew
AUC-ROC Threshold-independent Model comparison
These models guide model selection, hyperparameter tuning, and deployment decisions.