Machine Learning
Neural Networks
Slides mostly adapted from Tom
Mitchell, Han and Kamber
Introduction to Artificial
Neural Networks
Neural networks to the rescue
Neural network: information processing
paradigm inspired by biological nervous systems,
such as our brain
Structure: large number of highly interconnected
processing elements (neurons) working together
Like people, they learn from experience (by
example)
Neural networks to the rescue
Neural networks are configured for a specific
application, such as pattern recognition or data
classification, through a learning process
In a biological system, learning involves
adjustments to the synaptic connections
between neurons
Same for artificial neural networks (ANNs)
Inspiration from Neurobiology
A neuron: many-inputs / one-output unit
output can be excited or not
excited
incoming signals from other
neurons determine if the
neuron shall excite ("fire")
Output subject to attenuation
in the synapses, which are
junction parts of the neuron
Synapse concept
The synapse resistance to the incoming signal can be changed
during a "learning" process [1949]
Hebb’s Rule:
If an input of a neuron is repeatedly and persistently
causing the neuron to fire, a metabolic change
happens in the synapse of that particular input to
reduce its resistance
Mathematical representation
The neuron calculates a weighted sum of inputs and compares it
to a threshold. If the sum is higher than the threshold, the
output is set to 1, otherwise to -1.
Non-linearity
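The thresholded unit described above can be sketched in a few lines. This is only an illustration: the function name, the weights, and the threshold value are hypothetical and not taken from the slides.

```python
def threshold_neuron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else -1."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else -1


# Hypothetical example: two inputs with equal weights 0.4 and threshold 0.5.
low = threshold_neuron([1, 0], [0.4, 0.4], 0.5)   # weighted sum 0.4, below the threshold
high = threshold_neuron([1, 1], [0.4, 0.4], 0.5)  # weighted sum 0.8, above the threshold
print(low, high)
```

The jump from -1 to 1 at the threshold is what makes the unit non-linear.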
A simple perceptron
It’s a single-unit network
Change the weight by an
amount proportional to the
difference between the desired
output and the actual output.
ΔWi = η · (D − Y) · Ii
where η is the learning rate, D the desired output, Y the actual output, and Ii the i-th input
Perceptron Learning Rule
Example: A simple single unit
adaptive network
The network has 2 inputs,
and one output. All are
binary. The output is
1 if W0I0 + W1I1 + Wb > 0
0 if W0I0 + W1I1 + Wb ≤ 0
We want it to learn
simple OR: output a 1 if
either I0 or I1 is 1.
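A minimal sketch of how the perceptron learning rule could train this single unit on OR. The starting weights (all zero), the learning rate 0.1, the epoch limit, and the function names are assumptions chosen for illustration; the bias weight Wb is treated as a weight on a constant input of 1.

```python
def step(z):
    """Output 1 if the weighted sum is positive, otherwise 0."""
    return 1 if z > 0 else 0


def train_or(eta=0.1, max_epochs=100):
    """Train one unit on OR with the rule: delta W_i = eta * (D - Y) * I_i."""
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    w0, w1, wb = 0.0, 0.0, 0.0  # assumed starting point
    for _ in range(max_epochs):
        mistakes = 0
        for (i0, i1), desired in data:
            actual = step(w0 * i0 + w1 * i1 + wb)
            if actual != desired:
                delta = eta * (desired - actual)
                w0 += delta * i0
                w1 += delta * i1
                wb += delta  # the bias input is the constant 1
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: weights are stable
            break
    return w0, w1, wb


w0, w1, wb = train_or()
for i0, i1 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(i0, i1, step(w0 * i0 + w1 * i1 + wb))
```

Because OR is linearly separable, the rule settles on weights that classify all four patterns correctly.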
Learning
From experience: examples / training data
Strength of connection between the neurons is
stored as a weight-value for the specific
connection
Learning the solution to a problem = changing
the connection weights
Artificial Neural Networks
Adaptive interaction between individual neurons
Power: collective behavior of interconnected neurons
The hidden layer learns to
recode (or to provide a
representation of) the
inputs: associative mapping
Evolving networks
Continuous process of:
Evaluate output
Adapt weights
Take new inputs
ANN
“Learning”
Evolving drives the weights to a stable state,
but the neurons keep working: the network has
‘learned’ to deal with the problem
Where are NN used?
Recognizing and matching complicated, vague, or
incomplete patterns
Data is unreliable
Problems with noisy data
Prediction
Classification
Data association
Data conceptualization
Filtering
Planning
Applications
Prediction: learning from past experience
pick the best stocks in the market
predict weather
identify people with cancer risk
Classification
Image processing
Predict bankruptcy for credit card companies
Risk assessment
Applications
Recognition
Pattern recognition: SNOOPE (bomb detector in U.S.
airports)
Character recognition
Handwriting: processing checks
Data association
Not only identify the characters that were scanned but
identify when the scanner is not working properly
Applications
Data Conceptualization
infer grouping relationships
e.g. extract from a database the names of those most likely to
buy a particular product.
Data Filtering
e.g. take the noise out of a telephone signal, signal smoothing
Planning
Unknown environments
Sensor data is noisy
Fairly new approach to planning
Artificial Neural Networks
Computational models inspired by the human
brain:
Algorithms that try to mimic the brain.
Massively parallel, distributed system, made up of
simple processing units (neurons)
Synaptic connection strengths among neurons are
used to store the acquired knowledge.
Knowledge is acquired by the network from its
environment through a learning process
History
Late 1800s – Neural networks appear as an
analogy to biological systems
1960s and 70s – Simple neural networks appear
Fall out of favor because the perceptron is not
effective by itself, and there were no good algorithms
for multilayer nets
1986 – Backpropagation algorithm appears
Neural networks have a resurgence in popularity
More computationally expensive
Applications of ANNs
ANNs have been widely used in various domains
for:
Pattern recognition
Function approximation
Associative memory
Properties
Inputs are flexible
any real values
Highly correlated or independent
Target function may be discrete-valued, real-valued, or
vectors of discrete or real values
Outputs are real numbers between 0 and 1
Resistant to errors in the training data
Long training time
Fast evaluation
The function produced can be difficult for humans to
interpret
When to consider neural networks
Input is high-dimensional discrete or real-valued
Output is discrete or real-valued
Output is a vector of values
Possibly noisy data
Form of target function is unknown
Human readability of the result is not important
Examples:
Speech phoneme recognition
Image classification
Financial prediction
A Neuron (= a perceptron)
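[Figure: inputs x0 ... xn (input vector x) with weights w0 ... wn (weight vector w) feed a weighted sum, offset by -t, followed by an activation function f that produces the output y]
For example: $y = \operatorname{sign}\left(\sum_{i=0}^{n} w_i x_i - t\right)$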
The n-dimensional input vector x is mapped into variable y by
means of the scalar product and a nonlinear function mapping
Data Mining: Concepts and
January 18, 2025 Techniques 22
Perceptron
Basic unit in a neural network
Linear separator
Parts
N inputs, x1 ... xn
Weights for each input, w1 ... wn
A bias input x0 (constant) and associated weight w0
Weighted sum of inputs, y = w0x0 + w1x1 + ... + wnxn
A threshold function or activation function,
i.e., 1 if y > t, -1 if y ≤ t
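The parts list mentions both a bias input x0 and a threshold t. A small sketch, assuming x0 is fixed at 1, shows that the two views are interchangeable: a threshold t behaves like a bias weight w0 = -t compared against zero. The function names and the example weights are hypothetical.

```python
def unit_with_threshold(x, w, t):
    """Threshold form: +1 if w1*x1 + ... + wn*xn > t, otherwise -1."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s > t else -1


def unit_with_bias(x, w, w0):
    """Bias form: constant input x0 = 1 with weight w0, compared against 0."""
    s = w0 * 1 + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s > 0 else -1


# Hypothetical weights; every binary input pattern gives the same answer in both forms.
w, t = [2.0, -1.0], 1.0
patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
same = all(unit_with_threshold(x, w, t) == unit_with_bias(x, w, -t) for x in patterns)
print(same)
```

Folding the threshold into w0 means the learning rule can adjust it like any other weight.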
Artificial Neural Networks (ANN)
Model is an assembly of inter-connected nodes
and weighted links
Output node sums up each of its input values
according to the weights of its links
Compare output node against some threshold t
[Figure: black box in which input nodes X1, X2, X3 connect through weights w1, w2, w3 to an output node Y with threshold t]
Perceptron Model:
$Y = I\left(\sum_i w_i X_i - t > 0\right)$ or $Y = \operatorname{sign}\left(\sum_i w_i X_i - t\right)$
Types of connectivity
Feedforward networks
These compute a series of
transformations
Typically, the first layer is the
input and the last layer is the
output.
Recurrent networks
These have directed cycles in their
connection graph. They can have
complicated dynamics.
More biologically realistic.
[Figure: layers of input units, hidden units, and output units]
Different Network Topologies
Single layer feed-forward networks
Input layer projecting into the output layer
[Figure: single-layer network with an input layer and an output layer]
Different Network Topologies
Multi-layer feed-forward networks
One or more hidden layers. Input projects only
from previous layers onto a layer.
[Figure: 2-layer (1-hidden-layer) fully connected network with an input layer, a hidden layer, and an output layer]
Different Network Topologies
Multi-layer feed-forward networks
[Figure: network with an input layer, several hidden layers, and an output layer]
Different Network Topologies
Recurrent networks
A network with feedback, where some of its
inputs are connected to some of its outputs (discrete
time).
[Figure: recurrent network with an input layer and an output layer]
Algorithm for learning ANN
Initialize the weights (w0, w1, …, wk)
Adjust the weights in such a way that the output
of ANN is consistent with class labels of training
examples
Error function: $E = \sum_i \left[ Y_i - f(w_i, X_i) \right]^2$
Find the weights wi’s that minimize the above error
function
e.g., gradient descent, backpropagation algorithm
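As a sketch of the procedure above, the following code minimizes the sum-of-squared-errors function with batch gradient descent. It assumes f is a single linear unit with the bias stored in w[0]; the toy data set (points on y = 2x + 1), the learning rate, and the step count are hypothetical choices, not values from the slides.

```python
def predict(w, x):
    """Linear unit: w[0] is the bias weight, w[1:] pair up with the inputs."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def squared_error(w, data):
    """E = sum over examples of (Y_i - f(w, X_i))^2."""
    return sum((y - predict(w, x)) ** 2 for x, y in data)


def gradient_descent(data, n_inputs, eta=0.05, steps=500):
    """Initialize the weights, then repeatedly step against the gradient of E."""
    w = [0.0] * (n_inputs + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = y - predict(w, x)
            grad[0] += -2.0 * err           # dE/dw0 (bias input is 1)
            for j, xj in enumerate(x):
                grad[j + 1] += -2.0 * err * xj  # dE/dwj
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
    return w


# Hypothetical training data generated from y = 2x + 1.
data = [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 5.0)]
w = gradient_descent(data, n_inputs=1)
print(w, squared_error(w, data))
```

The learning rate has to be small enough for the updates to shrink the error; too large a value makes the weights diverge.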
Optimizing concave/convex function
Maximum of a concave function = minimum of a
convex function
Gradient ascent (concave) / Gradient descent (convex)
Gradient ascent rule
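The rule itself is not spelled out in the text here. A common way to state it, assuming a differentiable function and a small step size η (the same symbol used for the learning rate in the perceptron rule), is sketched below.

```latex
% Assumed notation: f is concave, g is convex, \eta > 0 is a small step size.
\begin{align*}
\text{Gradient ascent:}  \quad & w \leftarrow w + \eta \, \nabla f(w) \\
\text{Gradient descent:} \quad & w \leftarrow w - \eta \, \nabla g(w)
\end{align*}
```

Setting g = -f turns one rule into the other, which is why maximizing a concave function and minimizing a convex function are treated as the same problem.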
Multi-layer Networks
Linear units inappropriate
No more expressive than a single layer
Introduce non-linearity
Threshold not differentiable
Use sigmoid function
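Both points can be illustrated with a short sketch. The first function shows two stacked linear units collapsing into one; the others show the sigmoid and its derivative, which exists everywhere (unlike the threshold). The function names and the example weights are hypothetical.

```python
import math


def two_linear_layers(x, a, b):
    """Two stacked linear units with weights a and b: the result equals (a * b) * x."""
    return b * (a * x)


def sigmoid(z):
    """Smooth, differentiable squashing function with outputs between 0 and 1.

    Note: math.exp can overflow for very large negative z; this sketch ignores that.
    """
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """Derivative of the sigmoid, expressed through the sigmoid itself."""
    s = sigmoid(z)
    return s * (1.0 - s)


# Stacking linear units adds no expressive power: one unit with weight a*b does the same job.
print(two_linear_layers(3.0, 2.0, 0.5), (2.0 * 0.5) * 3.0)
print(sigmoid(0.0), sigmoid_derivative(0.0))
```

The derivative is what gradient-based training needs at every unit.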
Backpropagation
Iteratively process a set of training tuples & compare the network's
prediction with the actual known target value
For each training tuple, the weights are modified to minimize the mean
squared error between the network's prediction and the actual target
value
Modifications are made in the “backwards” direction: from the output
layer, through each hidden layer down to the first hidden layer, hence
“backpropagation”
Steps
Initialize weights (to small random #s) and biases in the network
Propagate the inputs forward (by applying activation function)
Backpropagate the error (by updating weights and biases)
Terminating condition (when error is very small, etc.)
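The four steps can be sketched with NumPy as follows. This is an illustrative sketch only, not the exact algorithm from the source slides: it assumes one hidden layer of sigmoid units, a squared-error loss, one weight update per training tuple, and hypothetical values for the hidden-layer size, learning rate, epoch limit, and stopping threshold. The OR data set from the earlier perceptron example is reused as training data.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_backprop(X, T, n_hidden=3, eta=0.5, epochs=2000, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialize weights and biases to small random numbers.
    W1 = rng.uniform(-0.5, 0.5, size=(X.shape[1], n_hidden))
    b1 = rng.uniform(-0.5, 0.5, size=n_hidden)
    W2 = rng.uniform(-0.5, 0.5, size=(n_hidden, T.shape[1]))
    b2 = rng.uniform(-0.5, 0.5, size=T.shape[1])
    history = []
    for _ in range(epochs):
        sq_err = 0.0
        for x, t in zip(X, T):
            # Step 2: propagate the inputs forward through the activation function.
            h = sigmoid(x @ W1 + b1)
            y = sigmoid(h @ W2 + b2)
            # Step 3: backpropagate the error, output layer first, then hidden layer.
            delta_out = (y - t) * y * (1.0 - y)
            delta_hid = (delta_out @ W2.T) * h * (1.0 - h)
            W2 -= eta * np.outer(h, delta_out)
            b2 -= eta * delta_out
            W1 -= eta * np.outer(x, delta_hid)
            b1 -= eta * delta_hid
            sq_err += float(np.sum((y - t) ** 2))
        history.append(sq_err / len(X))
        # Step 4: stop once the mean squared error is very small.
        if history[-1] < 1e-3:
            break
    return (W1, b1, W2, b2), history


def predict(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [1]], dtype=float)  # OR targets
params, history = train_backprop(X, T)
print(history[0], history[-1])
print(predict(params, X).round(2).ravel())
```

The hidden-layer deltas are computed before the output weights are updated, so the error is propagated back through the same weights that produced the prediction.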
Neural Network as a Classifier
Weakness
Long training time
Require a number of parameters typically best determined empirically,
e.g., the network topology or “structure.”
Poor interpretability: Difficult to interpret the symbolic meaning behind
the learned weights and of “hidden units” in the network
Strength
High tolerance to noisy data
Ability to classify untrained patterns
Well-suited for continuous-valued inputs and outputs
Successful on a wide array of real-world data
Algorithms are inherently parallel
Techniques have recently been developed for the extraction of rules
from trained neural networks
Artificial Neural Networks (ANN)
X1  X2  X3  Y
1   0   0   0
1   0   1   1
1   1   0   1
1   1   1   1
0   0   1   0
0   1   0   0
0   1   1   1
0   0   0   0
[Figure: black box in which input nodes X1, X2, X3 each connect with weight 0.3 to an output node Y with threshold t = 0.4]
$Y = I(0.3 X_1 + 0.3 X_2 + 0.3 X_3 - 0.4 > 0)$, where $I(z) = 1$ if $z$ is true and $0$ otherwise
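A quick check that the model with weights 0.3 and threshold 0.4 reproduces every row of the table. The function names are hypothetical; the weights, threshold, and rows come from the slide.

```python
def indicator(z):
    """I(z): 1 if the condition z is true, 0 otherwise."""
    return 1 if z else 0


def black_box(x1, x2, x3):
    """Y = I(0.3*X1 + 0.3*X2 + 0.3*X3 - 0.4 > 0)."""
    return indicator(0.3 * x1 + 0.3 * x2 + 0.3 * x3 - 0.4 > 0)


# Rows of the table: (X1, X2, X3, Y).
table = [
    (1, 0, 0, 0), (1, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 1),
    (0, 0, 1, 0), (0, 1, 0, 0), (0, 1, 1, 1), (0, 0, 0, 0),
]
all_match = all(black_box(x1, x2, x3) == y for x1, x2, x3, y in table)
print(all_match)
```

With these weights the unit outputs 1 exactly when at least two of the three inputs are 1.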
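General Structure of ANN
[Figure: inputs x1 ... x5 enter the input layer, pass through the hidden layer, and reach the output layer, which produces y]
Neuron i: inputs I1, I2, I3 arrive with weights wi1, wi2, wi3 and are combined into the weighted sum Si; the activation function g, with threshold t, produces the output Oi = g(Si)
Training ANN means learning the weights of the neurons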