Intro to Artificial Neural Networks


Artificial Neural Network Tutorial

Instructor: Young H. Cho, Ph.D.

This tutorial is a quick introduction to the most commonly used form of artificial neural network,
written for readers with minimal prior knowledge of the algorithm. Because students come to this
tutorial with different levels of familiarity with the prerequisite topics, review the following
skill sets as needed before continuing.

• Python
    o Official Python Documentation – [Link]
    o Real Python Tutorials – [Link]
    o W3Schools Python Tutorial – [Link]
• NumPy
    o NumPy Official Documentation – [Link]
    o NumPy Tutorial (W3Schools) – [Link]
    o SciPy/NumPy Guide – [Link]
• Artificial Neural Network Basics
    o Neural Networks and Deep Learning (Michael Nielsen, free book) – [Link]
    o Deep Learning Specialization (Coursera, Andrew Ng) – [Link]
    o CS231n: Convolutional Neural Networks for Visual Recognition – [Link]

PART – I: Basics
1. Crash Introduction to the ANN-ReLU Network

In this example, we will build a basic Multi-Layer Perceptron (MLP), one of the most common types
of artificial neural network (ANN) in use today. The perceptron, introduced by Rosenblatt in the
late 1950s, was one of the earliest trainable models of a neuron. It takes inputs, multiplies each
by a weight, sums the results together with a bias, and applies an activation function (e.g., step,
sigmoid, ReLU).
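
As a minimal sketch of the computation just described, a single perceptron fits in a few lines of
NumPy. The function name `perceptron` and the example weights are illustrative assumptions and are
not part of the tutorial code that follows.

```python
import numpy as np

def perceptron(x, w, b, activation="step"):
    """Weighted sum of the inputs plus a bias, followed by an activation."""
    z = np.dot(w, x) + b                      # weighted sum with bias
    if activation == "step":
        return 1.0 if z > 0 else 0.0
    if activation == "sigmoid":
        return 1.0 / (1.0 + np.exp(-z))
    if activation == "relu":
        return max(0.0, z)
    raise ValueError(f"unknown activation: {activation}")

# Hypothetical weights that make the perceptron behave like a logical AND
w = np.array([1.0, 1.0])
b = -1.5
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_outputs = [perceptron(np.array(p, dtype=float), w, b) for p in inputs]
print(and_outputs)   # [0.0, 0.0, 0.0, 1.0]
```

Only the input (1, 1) pushes the weighted sum above zero, so the step activation fires for that
case alone.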

An MLP is just many perceptrons stacked in layers:

• Input layer → receives raw data.
• Hidden layers → apply weighted sums and nonlinear activations.
• Output layer → produces predictions (e.g., regression values, classification probabilities).
An example structure may look like the following:

A single perceptron can only separate its inputs with a straight line (a linear decision boundary).
By adding hidden layers with nonlinear activations, MLPs can represent complex, nonlinear
relationships. Adding layers and neurons increases the network's capacity, but also makes it harder
to train.

When you hear MLP, it usually means a feedforward neural network (no recurrence, no
convolution). Training is done via backpropagation + gradient descent. The network is called fully
connected (every neuron in one layer connects to every neuron in the next).

2. First MLP Code

If we wanted to classify a set of points in 2D space (x, y) into two classes, an MLP might look like:
• Input layer: 2 nodes
• Hidden layer 1: 4 nodes with ReLU
• Hidden layer 2: 4 nodes with ReLU
• Output layer: 2 nodes (probability of Class A vs Class B)
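
To make the layer sizes concrete, here is a rough sketch of a forward pass through a 2-4-4-2
network with random, untrained weights. The names `layer_sizes`, `weights`, and `biases`, and the
weight scale of 0.5, are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [2, 4, 4, 2]   # input, hidden 1, hidden 2, output

# One weight matrix and one bias row per pair of consecutive layers
weights = [rng.normal(0, 0.5, (m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros((1, n)) for n in layer_sizes[1:]]

def forward(X):
    h = X
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)        # hidden layers: linear + ReLU
    logits = h @ weights[-1] + biases[-1]     # output layer: linear
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)   # softmax -> class probabilities

X = rng.uniform(-1, 1, (5, 2))   # five points in 2D
P = forward(X)
print([W.shape for W in weights])   # [(2, 4), (4, 4), (4, 2)]
print(P.shape)                      # (5, 2)
```

Each row of `P` holds the two class probabilities for one input point and sums to 1.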

For gradient descent, we will implement the Stochastic Gradient Descent (SGD) algorithm. SGD is an
iterative optimization algorithm that starts from randomly initialized weights. At each step it
estimates the gradient of the loss on a small random subset (mini-batch) of the training data and
moves the weights a small distance in the opposite direction, which tends to decrease the overall
loss.
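
The update rule is easiest to see on a toy problem. The sketch below fits a single weight `w` so
that `w * x` matches `3 * x`; the data, the learning rate of 0.1, and the batch size of 20 are
arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 3.0 * x                       # noiseless target, true weight is 3.0

w = rng.normal()                  # random initial weight
lr = 0.1
for epoch in range(50):
    idx = rng.permutation(len(x))             # new random order every epoch
    for start in range(0, len(x), 20):        # mini-batches of 20 samples
        batch = idx[start:start + 20]
        err = w * x[batch] - y[batch]
        grad = np.mean(err * x[batch])        # d/dw of 0.5 * mean(err**2)
        w -= lr * grad                        # step against the gradient
print(round(float(w), 3))
```

After training, `w` should be very close to 3.0. Each update uses only 20 samples, so individual
steps are noisy, but on average they point downhill.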

Since gradient descent needs a differentiable loss function, we will use Mean Squared Error (MSE)
to compute the “loss” between the desired output (the target value supplied with each training
example) and the predicted output computed with the current weights, which the algorithm then
updates.

MSE is a loss function for regression tasks:

• It measures the average squared difference between predicted outputs and true outputs.
• Lower MSE → predictions closer to true values.
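
As a small worked example (the helper name `mse` and the numbers are made up for illustration),
MSE can be computed in one line of NumPy:

```python
import numpy as np

def mse(y_pred, y_true):
    """Average squared difference between predictions and targets."""
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([1.0, 2.0, 3.0])
good = np.array([1.1, 1.9, 3.0])   # close to the targets
bad = np.array([2.0, 0.0, 5.0])    # far from the targets

print(mse(good, y_true))   # about 0.0067
print(mse(bad, y_true))    # about 3.0
```

The squared errors for `bad` are 1, 4, and 4, so their mean is 3; the much smaller value for `good`
reflects predictions that sit close to the true values.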

In a supervised learning task where the goal is to assign inputs to discrete classes, a different
loss is usually a better fit. For example, predicting whether an image is a cat, dog, or rabbit may
use softmax + cross-entropy as the loss. Softmax converts raw outputs into probabilities over
classes, while cross-entropy measures how well the predicted probabilities match the true class labels.
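
The following sketch shows both pieces on two made-up score vectors for the cat/dog/rabbit case.
The helper name `cross_entropy` and the logit values are assumptions for illustration.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # avoid overflow in exp
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    n = labels.shape[0]
    return -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))

# Columns: cat, dog, rabbit
logits = np.array([[4.0, 1.0, 0.5],    # confident "cat"
                   [0.2, 0.1, 0.3]])   # nearly undecided
labels = np.array([0, 0])              # both images are actually cats

probs = softmax(logits)
ce_confident = cross_entropy(probs[:1], labels[:1])
ce_undecided = cross_entropy(probs[1:], labels[1:])
print(probs.round(3))
print(ce_confident, ce_undecided)
```

The confident, correct prediction gets a loss near 0, while the undecided one gets a loss close to
log(3), the value for a uniform guess over three classes.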

To better understand this type of network, implement the following example code for a minimal
2-layer MLP (one hidden layer plus an output layer) with SGD and backpropagation on your machine
using Python and the NumPy library.
import math, random
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def d_relu(x):
    return (x > 0).astype(np.float64)

def softmax(logits):
    logits = logits - logits.max(axis=1, keepdims=True)  # stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def one_hot(y, num_classes):
    Y = np.zeros((y.size, num_classes))
    Y[np.arange(y.size), y] = 1.0
    return Y

class MLP2:
    """
    2-layer MLP:
        hidden = ReLU(X @ W1 + b1)
        out    = hidden @ W2 + b2
    mode: 'regression' (MSE) or 'classification' (softmax CE)
    """
    def __init__(self, in_dim, hidden_dim, out_dim, mode='regression', seed=0):
        rng = np.random.default_rng(seed)
        k1 = np.sqrt(2.0/in_dim)  # He init for ReLU
        k2 = np.sqrt(2.0/hidden_dim)
        self.W1 = rng.normal(0, k1, (in_dim, hidden_dim))
        self.b1 = np.zeros((1, hidden_dim))
        self.W2 = rng.normal(0, k2, (hidden_dim, out_dim))
        self.b2 = np.zeros((1, out_dim))
        assert mode in ('regression','classification')
        self.mode = mode

    def forward(self, X):
        self.X = X
        self.z1 = X @ self.W1 + self.b1
        self.h1 = relu(self.z1)
        self.z2 = self.h1 @ self.W2 + self.b2
        if self.mode == 'classification':
            self.probs = softmax(self.z2)
            return self.probs
        return self.z2  # regression raw output

    def loss(self, y):
        if self.mode == 'regression':
            # y: shape (N, out_dim)
            diff = self.z2 - y
            return 0.5 * np.mean(np.sum(diff*diff, axis=1))
        else:
            # y: integer labels shape (N,)
            N = y.shape[0]
            # cross-entropy
            logp = -np.log(self.probs[np.arange(N), y] + 1e-12)
            return np.mean(logp)

    def backward(self, y):
        N = y.shape[0]
        if self.mode == 'regression':
            # dL/dz2 = (z2 - y)
            dz2 = (self.z2 - y) / N
        else:
            # dL/dz2 = probs - onehot(y)
            Y = one_hot(y, self.probs.shape[1])
            dz2 = (self.probs - Y) / N

        # Gradients for layer 2
        dW2 = self.h1.T @ dz2
        db2 = dz2.sum(axis=0, keepdims=True)

        # Backprop through ReLU
        dh1 = dz2 @ self.W2.T
        dz1 = dh1 * d_relu(self.z1)

        # Gradients for layer 1
        dW1 = self.X.T @ dz1
        db1 = dz1.sum(axis=0, keepdims=True)

        return dW1, db1, dW2, db2

    def step(self, grads, lr=1e-2, weight_decay=0.0):
        dW1, db1, dW2, db2 = grads

        # L2 regularization on weights (not biases)
        if weight_decay > 0:
            dW1 += weight_decay * self.W1
            dW2 += weight_decay * self.W2

        self.W1 -= lr * dW1
        self.b1 -= lr * db1
        self.W2 -= lr * dW2
        self.b2 -= lr * db2

def batch_iter(X, y, batch_size, shuffle=True, seed=0):
    rng = np.random.default_rng(seed)
    idx = np.arange(X.shape[0])
    if shuffle:
        rng.shuffle(idx)
    for i in range(0, len(idx), batch_size):
        j = idx[i:i+batch_size]
        yield X[j], y[j]

Training Function:
def train(model, X, y, epochs=200, lr=1e-2, batch_size=64, weight_decay=0.0,
          verbose=True):
    losses = []
    for e in range(1, epochs+1):
        for Xb, yb in batch_iter(X, y, batch_size, shuffle=True, seed=e):
            model.forward(Xb)
            grads = model.backward(yb)
            model.step(grads, lr=lr, weight_decay=weight_decay)
        # track loss full-batch
        model.forward(X)
        L = model.loss(y)
        losses.append(L)
        if verbose and (e % max(1, epochs//10) == 0):
            print(f"epoch {e:4d}/{epochs} loss={L:.4f}")
    return losses

The above code implements a two-layer fully connected neural network and a training function using
only basic NumPy functions. The input layer takes the data (features). The hidden layer applies a
linear transform and passes the result through a ReLU activation. The output layer then applies
another linear transform. In regression mode the raw outputs are used directly; in classification
mode softmax converts them into class probabilities.

The basic training tasks are:

1. Forward pass: compute predictions.
2. Loss: measure prediction error.
3. Backward pass: compute gradients using the chain rule.
4. Step: update weights with SGD.
5. Repeat over epochs.

Step by Step Breakdown

def relu(x):
    return np.maximum(0.0, x)

def d_relu(x):
    return (x > 0).astype(np.float64)

def softmax(logits):
    logits = logits - logits.max(axis=1, keepdims=True)  # stability trick
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def one_hot(y, num_classes):
    Y = np.zeros((y.size, num_classes))
    Y[np.arange(y.size), y] = 1.0
    return Y

• ReLU: Rectified Linear Unit, ReLU(z) = max(0, z). It introduces non-linearity so networks can
  approximate complex functions.
• d_ReLU: the derivative of ReLU, which is 1 if z > 0 and 0 otherwise. Used in backprop.
• softmax: converts raw scores (“logits”) into probabilities that sum to 1.
• one_hot: turns class labels into vectors (needed for cross-entropy loss).
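
The comment "stability trick" in `softmax` deserves a short demonstration. The sketch below
compares a naive softmax against the shifted version on deliberately large logits; the function
names `softmax_naive` and `softmax_stable` are illustrative and do not appear in the tutorial code.

```python
import numpy as np

def softmax_naive(logits):
    e = np.exp(logits)                        # overflows for large logits
    return e / e.sum(axis=1, keepdims=True)

def softmax_stable(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)                       # largest exponent is now 0
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[1000.0, 1001.0, 1002.0]])
with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(logits)             # inf / inf gives nan
stable = softmax_stable(logits)
small = softmax_stable(np.array([[0.0, 1.0, 2.0]]))

print(naive)
print(stable)
```

Subtracting the row maximum does not change the result, because softmax depends only on the
differences between logits; `stable` therefore matches the softmax of `[0, 1, 2]`.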

class MLP2:
    def __init__(self, in_dim, hidden_dim, out_dim, mode='regression', seed=0):
        rng = np.random.default_rng(seed)
        k1 = np.sqrt(2.0/in_dim)  # He init for ReLU
        k2 = np.sqrt(2.0/hidden_dim)
        self.W1 = rng.normal(0, k1, (in_dim, hidden_dim))
        self.b1 = np.zeros((1, hidden_dim))
        self.W2 = rng.normal(0, k2, (hidden_dim, out_dim))
        self.b2 = np.zeros((1, out_dim))
        self.mode = mode

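• Architecture: input → hidden → output.
• Initialization: “He initialization” for ReLU layers samples each weight from a zero-mean normal
  distribution with variance 2/n_in, where n_in is the number of inputs to the layer (in_dim or
  hidden_dim in the code). This helps keep the scale of activations and gradients stable.
• Mode: 'regression' or 'classification' determines how we interpret outputs.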
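
A quick numerical check makes the effect of He initialization visible. This is a sketch under
assumed layer sizes (256 inputs, 512 hidden units) and unit-variance random inputs; none of these
numbers come from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)
in_dim, hidden_dim = 256, 512
W = rng.normal(0.0, np.sqrt(2.0 / in_dim), (in_dim, hidden_dim))  # He init

X = rng.normal(size=(1000, in_dim))     # unit-variance inputs
h = np.maximum(0.0, X @ W)              # output of one ReLU layer

print(W.var(), 2.0 / in_dim)            # empirical vs. target weight variance
print(np.mean(X**2), np.mean(h**2))     # signal scale before and after the layer
```

The mean squared activation after the ReLU layer stays close to that of the input, which is why
this choice of variance helps signals pass through stacked ReLU layers without blowing up or
shrinking.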

    def forward(self, X):
        self.X = X
        self.z1 = X @ self.W1 + self.b1
        self.h1 = relu(self.z1)
        self.z2 = self.h1 @ self.W2 + self.b2
        if self.mode == 'classification':
            self.probs = softmax(self.z2)
            return self.probs
        return self.z2

Step 1: Compute the first layer pre-activation: $z_1 = X W_1 + b_1$

Step 2: Apply activation function (ReLU): $h_1 = \mathrm{ReLU}(z_1) = \max(0, z_1)$

Step 3: Compute the second layer pre-activation (output layer): $z_2 = h_1 W_2 + b_2$

Step 4: Compute softmax probabilities (for classification): $\hat{y} = \mathrm{softmax}(z_2)$

We also store intermediate values (z1, h1, etc.) because they’re needed for backprop.

    def loss(self, y):
        if self.mode == 'regression':
            diff = self.z2 - y
            return 0.5 * np.mean(np.sum(diff*diff, axis=1))
        else:
            N = y.shape[0]
            logp = -np.log(self.probs[np.arange(N), y] + 1e-12)
            return np.mean(logp)

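The loss function uses MSE for regression, $L = \frac{1}{2N}\sum_{i=1}^{N} \lVert z_2^{(i)} - y^{(i)} \rVert^2$,
and softmax cross-entropy for classification, $L = -\frac{1}{N}\sum_{i=1}^{N} \log \hat{y}^{(i)}_{y^{(i)}}$,
where $\hat{y}^{(i)}_{y^{(i)}}$ is the probability the network assigns to the true class of sample $i$.
The factor of 1/2 in the regression loss cancels the 2 produced by differentiating the square.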

    def backward(self, y):
        N = y.shape[0]
        if self.mode == 'regression':
            dz2 = (self.z2 - y) / N
        else:
            Y = one_hot(y, self.probs.shape[1])
            dz2 = (self.probs - Y) / N

        dW2 = self.h1.T @ dz2
        db2 = dz2.sum(axis=0, keepdims=True)

        dh1 = dz2 @ self.W2.T
        dz1 = dh1 * d_relu(self.z1)

        dW1 = self.X.T @ dz1
        db1 = dz1.sum(axis=0, keepdims=True)

        return dW1, db1, dW2, db2

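Backward propagation is the chain rule applied in reverse. Start from the derivative of the loss
with respect to the outputs, then flow backward through output layer → hidden layer → input layer.

Step 1: Output layer error: $\delta_2 = \partial L / \partial z_2 = (z_2 - y)/N$ for regression, or
$\delta_2 = (\hat{y} - Y)/N$ for classification, where $\hat{y}$ is the softmax output and $Y$ is
the one-hot encoding of the labels.

Step 2: Gradients for second layer weights and biases: $\partial L / \partial W_2 = h_1^\top \delta_2$
and $\partial L / \partial b_2 = \sum_i \delta_2^{(i)}$ (sum over the samples in the batch).

Step 3: Backpropagate through ReLU: $\delta_1 = (\delta_2 W_2^\top) \odot \mathrm{ReLU}'(z_1)$,
where ReLU’(z1) is 0 for inactive neurons and 1 for active neurons.

Step 4: Gradients for first layer weights and biases: $\partial L / \partial W_1 = X^\top \delta_1$
and $\partial L / \partial b_1 = \sum_i \delta_1^{(i)}$.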
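
A common way to gain confidence in hand-written backprop is a finite-difference gradient check.
The sketch below rebuilds a tiny regression network outside the `MLP2` class (so it runs on its
own) and compares the analytic gradients against central differences. The sizes, the `params`
dictionary, and the helper names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
y = rng.normal(size=(6, 2))
params = {
    "W1": rng.normal(0, 0.5, (3, 4)), "b1": rng.normal(0, 0.1, (1, 4)),
    "W2": rng.normal(0, 0.5, (4, 2)), "b2": rng.normal(0, 0.1, (1, 2)),
}

def loss_fn(p):
    h1 = np.maximum(0.0, X @ p["W1"] + p["b1"])
    z2 = h1 @ p["W2"] + p["b2"]
    return 0.5 * np.mean(np.sum((z2 - y) ** 2, axis=1))

def analytic_grads(p):
    N = X.shape[0]
    z1 = X @ p["W1"] + p["b1"]
    h1 = np.maximum(0.0, z1)
    z2 = h1 @ p["W2"] + p["b2"]
    dz2 = (z2 - y) / N                         # output layer error
    dz1 = (dz2 @ p["W2"].T) * (z1 > 0)         # backprop through ReLU
    return {"W1": X.T @ dz1, "b1": dz1.sum(axis=0, keepdims=True),
            "W2": h1.T @ dz2, "b2": dz2.sum(axis=0, keepdims=True)}

def numeric_grads(p, eps=1e-6):
    grads = {}
    for name, value in p.items():
        g = np.zeros_like(value)
        for idx in np.ndindex(value.shape):
            old = value[idx]
            value[idx] = old + eps
            plus = loss_fn(p)
            value[idx] = old - eps
            minus = loss_fn(p)
            value[idx] = old                   # restore the parameter
            g[idx] = (plus - minus) / (2 * eps)    # central difference
        grads[name] = g
    return grads

ana, num = analytic_grads(params), numeric_grads(params)
max_err = max(np.abs(ana[k] - num[k]).max() for k in params)
print(f"largest analytic-vs-numeric gap: {max_err:.2e}")
```

A gap many orders of magnitude smaller than the gradients themselves suggests the chain-rule steps
are implemented consistently. The check can misfire if a pre-activation lies almost exactly at 0,
where ReLU is not differentiable.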

    def step(self, grads, lr=1e-2, weight_decay=0.0):
        dW1, db1, dW2, db2 = grads
        if weight_decay > 0:
            dW1 += weight_decay * self.W1
            dW2 += weight_decay * self.W2

        self.W1 -= lr * dW1
        self.b1 -= lr * db1
        self.W2 -= lr * dW2
        self.b2 -= lr * db2

Weights are updated using SGD. The parameter weight_decay adds L2 regularization to help
prevent overfitting. Learning rate (step size) can be adjusted.
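
To isolate what the weight_decay term does, the sketch below sets the data gradient to zero and
applies three updates. The starting weights and the values of `lr` and `weight_decay` are
exaggerated, made-up numbers chosen so the effect is easy to see.

```python
import numpy as np

W = np.array([[2.0, -4.0]])
data_grad = np.zeros_like(W)      # pretend the loss gradient is zero
lr, weight_decay = 0.1, 0.5

for _ in range(3):
    grad = data_grad + weight_decay * W    # L2 term points away from zero
    W = W - lr * grad                      # so the update shrinks W
print(W)
```

With no data gradient, each step multiplies the weights by (1 - lr * weight_decay) = 0.95, pulling
them steadily toward zero. During real training this pull competes with the data gradient.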

def batch_iter(X, y, batch_size, shuffle=True, seed=0):
    rng = np.random.default_rng(seed)
    idx = np.arange(X.shape[0])
    if shuffle:
        rng.shuffle(idx)
    for i in range(0, len(idx), batch_size):
        j = idx[i:i+batch_size]
        yield X[j], y[j]

This function splits the dataset into small, randomly shuffled batches. Mini-batches reduce the
memory needed per update, add gradient noise that can help the optimizer escape poor local minima,
and still let the weights converge through stochastic approximation.
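
The batching logic can be checked on a toy index range. The helper `batch_indices` below is a
simplified stand-in written for this illustration; it yields only indices rather than data.

```python
import numpy as np

def batch_indices(n, batch_size, seed=0):
    """Yield shuffled index arrays that together cover range(n) once."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    for start in range(0, n, batch_size):
        yield idx[start:start + batch_size]

batches = list(batch_indices(n=10, batch_size=4, seed=1))
sizes = [len(b) for b in batches]
seen = sorted(np.concatenate(batches).tolist())
print(sizes)   # [4, 4, 2]
print(seen)    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The last batch is smaller when the dataset size is not a multiple of the batch size, and every
sample is visited exactly once per epoch.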

def train(model, X, y, epochs=200, lr=1e-2, batch_size=64, weight_decay=0.0,
          verbose=True):
    losses = []
    for e in range(1, epochs+1):
        for Xb, yb in batch_iter(X, y, batch_size, shuffle=True, seed=e):
            model.forward(Xb)
            grads = model.backward(yb)
            model.step(grads, lr=lr, weight_decay=weight_decay)
        model.forward(X)
        L = model.loss(y)
        losses.append(L)
        if verbose and (e % max(1, epochs//10) == 0):
            print(f"epoch {e:4d}/{epochs} loss={L:.4f}")
    return losses
Training loops over the epochs; within each epoch it runs a forward pass, a backward pass, and a
weight update for every mini-batch. At the end of each epoch, the loss over the full dataset is
computed and recorded.

3. Application Example 1 — 1D regression (piecewise-linear)

Goal: learn the function f(x) = max(0, 0.5x + 0.2) + 0.3*max(0, -x+0.5) with noise.
This shows how ReLU builds piecewise linear fits.

# Synthetic data
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, (400, 1))
def target(x):
    return np.maximum(0, 0.5*x + 0.2) + 0.3*np.maximum(0, -x + 0.5)
y = target(X) + 0.05*rng.normal(size=(400,1))

# Model: 1->32->1
m_reg = MLP2(in_dim=1, hidden_dim=32, out_dim=1, mode='regression', seed=1)
train(m_reg, X, y, epochs=500, lr=1e-2, batch_size=64, weight_decay=1e-4)

# Test few points
xt = np.linspace(-2,2,9).reshape(-1,1)
pred = m_reg.forward(xt)
print(np.hstack([xt, target(xt), pred]))

What’s happening:
• ReLU units carve the x-axis into regions where each unit is active/inactive.
• The output layer forms a weighted sum of the unit outputs, which lets the network represent
  piecewise-linear functions such as the target.
• Backprop computes gradients of MSE wrt each parameter and nudges weights by SGD.
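
The target function in this example is itself a tiny ReLU network, which can be verified without
any training. In the sketch below the parameters `W1`, `b1`, `W2`, and `b2` are picked by hand to
match the formula for f(x); they are not values the trained model is expected to find.

```python
import numpy as np

def target(x):
    return np.maximum(0, 0.5 * x + 0.2) + 0.3 * np.maximum(0, -x + 0.5)

# Hand-picked parameters for a 1 -> 2 -> 1 ReLU network (no training)
W1 = np.array([[0.5, -1.0]])     # slopes of the two hidden units
b1 = np.array([[0.2, 0.5]])      # unit 1 switches at x = -0.4, unit 2 at x = 0.5
W2 = np.array([[1.0], [0.3]])    # output weights
b2 = np.array([[0.0]])

x = np.linspace(-2, 2, 9).reshape(-1, 1)
hidden = np.maximum(0, x @ W1 + b1)
out = hidden @ W2 + b2
active = hidden > 0              # which unit is active at each x

print(np.hstack([x, out, target(x)]))
print(active.astype(int))
```

The two output columns agree exactly, and the `active` table shows the x-axis split into regions
according to which units are switched on. A trained network with 32 hidden units has more pieces
available than it needs for this target.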

4. Application Example 2 — Binary classification (non-linear)

We’ll classify a two-circle (bullseye) dataset, which is not linearly separable but is easy for a
ReLU network.
# Concentric rings dataset
def rings(n=600, inner_r=0.5, gap=0.2, noise=0.06, seed=0):
    rng = np.random.default_rng(seed)
    n2 = n//2
    theta1 = rng.uniform(0, 2*np.pi, n2)
    r1 = inner_r + noise*rng.normal(size=n2)
    x1 = np.c_[r1*np.cos(theta1), r1*np.sin(theta1)]
    theta2 = rng.uniform(0, 2*np.pi, n-n2)
    r2 = inner_r + gap + noise*rng.normal(size=n-n2)
    x2 = np.c_[r2*np.cos(theta2), r2*np.sin(theta2)]
    X = np.vstack([x1, x2])
    y = np.array([0]*n2 + [1]*(n-n2))
    return X, y

Xb, yb = rings(n=800, inner_r=0.6, gap=0.5, noise=0.07, seed=1)

m_bin = MLP2(in_dim=2, hidden_dim=64, out_dim=2, mode='classification', seed=2)
train(m_bin, Xb, yb, epochs=200, lr=5e-3, batch_size=64, weight_decay=1e-4)

probs = m_bin.forward(Xb)
preds = probs.argmax(axis=1)
acc = (preds == yb).mean()
print("train accuracy:", acc)

What’s happening:
• The hidden ReLU layer learns a piecewise-linear, polygon-like decision boundary that approximates
  the circular gap between the two rings.
• Softmax turns logits into class probabilities; cross-entropy penalizes wrong classes.
• Gradients backprop through softmax→linear→ReLU→linear.

5. Application Example 3 — Multiclass classification (3 blobs)


def three_blobs(n=900, seed=0):
    rng = np.random.default_rng(seed)
    means = np.array([[0,0], [2.5, 0.5], [-2.0, 1.5]])
    cov = np.array([[0.4,0.0],[0.0,0.4]])
    Xs, ys = [], []
    for k, m in enumerate(means):
        Xk = rng.multivariate_normal(m, cov, size=n//3)
        yk = np.full(n//3, k)
        Xs.append(Xk); ys.append(yk)
    return np.vstack(Xs), np.concatenate(ys)

Xm, ym = three_blobs(900, seed=4)

m_multi = MLP2(in_dim=2, hidden_dim=64, out_dim=3, mode='classification', seed=3)
train(m_multi, Xm, ym, epochs=200, lr=5e-3, batch_size=64)
print("train acc:", (m_multi.forward(Xm).argmax(1)==ym).mean())

What’s happening:
• The model learns multiple linear pieces (via ReLUs) to separate 3 clusters.

PART – II: Application

Use the Kaggle Dogs vs. Cats dataset ([Link]).

It contains images stored in folders like this:

data/
├── train/
│ ├── dog/
│ └── cat/
└── test/

We will construct a small CNN step by step using only NumPy operations to gain a deeper
understanding of its underlying mechanics.

1. Imports

import numpy as np
import os
from PIL import Image
import matplotlib.pyplot as plt

2. Load and Preprocess Images


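def load_data(data_dir):
    X, y = [], []
    for label, folder in enumerate(['cat', 'dog']):   # cat -> 0, dog -> 1
        path = os.path.join(data_dir, folder)
        for file in os.listdir(path):
            img = Image.open(os.path.join(path, file)).convert('RGB').resize((64, 64))
            X.append(np.array(img) / 255.0)   # scale pixel values to [0, 1]
            y.append(label)
    return np.array(X), np.array(y)

X, y = load_data('data/train')

# Shuffle before splitting: load_data returns all cats followed by all dogs,
# so an unshuffled 80/20 split would leave only dogs in the test portion
perm = np.random.default_rng(0).permutation(len(X))
X, y = X[perm], y[perm]

split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]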

3. Helper Functions

def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    return (x > 0).astype(float)

def mse_loss(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mse_grad(y_true, y_pred):
    return 2 * (y_pred - y_true) / y_true.size

4. Initialize Parameters

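conv_filter = np.random.randn(3, 3, 3) * 0.1
# A 64x64 image gives a 62x62 conv output and a 31x31 pooled map, so the
# flattened vector fed to the fully connected layer has 31 * 31 = 961 entries
W_fc = np.random.randn(31 * 31, 1) * 0.01
b_fc = np.zeros((1,))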

5. Forward Pass
def conv2d(img, kernel):
    h, w, c = img.shape
    kh, kw, kc = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            region = img[i:i+kh, j:j+kw, :]
            out[i, j] = np.sum(region * kernel)
    return out

def maxpool(img, size=2):
    h, w = img.shape
    new_h, new_w = h // size, w // size
    pooled = np.zeros((new_h, new_w))
    for i in range(new_h):
        for j in range(new_w):
            pooled[i, j] = np.max(img[i*size:(i+1)*size, j*size:(j+1)*size])
    return pooled

def forward(img):
    global conv_filter, W_fc, b_fc
    conv_out = conv2d(img, conv_filter)
    relu_out = relu(conv_out)
    pooled = maxpool(relu_out)
    flat = pooled.flatten()
    fc_out = relu(np.dot(flat, W_fc) + b_fc)
    return fc_out, flat, pooled, relu_out
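
It helps to track array shapes through this forward pass by hand. The sketch below does the
arithmetic for a 64 x 64 x 3 input, a single 3 x 3 x 3 filter with stride 1 and no padding, and
2 x 2 max pooling, as in the functions above; the helper name `conv_output_size` is an assumption
for illustration.

```python
def conv_output_size(n, k):
    """Output length of a stride-1 convolution with no padding."""
    return n - k + 1

h = w = 64                       # input image is 64 x 64 x 3
kh = kw = 3                      # 3 x 3 x 3 filter, summed over the channels
conv_h, conv_w = conv_output_size(h, kh), conv_output_size(w, kw)
pool_h, pool_w = conv_h // 2, conv_w // 2      # 2 x 2 max pooling
flat_len = pool_h * pool_w                     # length of the flattened vector

print((conv_h, conv_w), (pool_h, pool_w), flat_len)   # (62, 62) (31, 31) 961
```

Because the filter sums over all three color channels, the convolution output has a single
channel, so the fully connected weight matrix needs 961 rows to match the flattened vector.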

6. Training with MSE Loss

We’ll train only the fully connected layer for simplicity.

lr = 0.001
epochs = 10

for epoch in range(epochs):
    total_loss = 0
    for i in range(len(X_train)):
        img = X_train[i]
        label = np.array([y_train[i]])

        y_pred, flat, pooled, relu_out = forward(img)
        loss = mse_loss(label, y_pred)
        total_loss += loss

        # Chain rule through the output ReLU, then the fully connected layer
        grad_y = mse_grad(label, y_pred) * relu_derivative(y_pred)
        grad_W = np.outer(flat, grad_y)
        grad_b = grad_y

        W_fc -= lr * grad_W
        b_fc -= lr * grad_b

    print(f"Epoch {epoch+1}/{epochs} | Loss: {total_loss/len(X_train):.4f}")

7. Evaluation

correct = 0
for i in range(len(X_test)):
    y_pred, _, _, _ = forward(X_test[i])
    prediction = int(y_pred[0] > 0.5)   # y_pred has shape (1,); threshold its single value
    if prediction == y_test[i]:
        correct += 1

accuracy = correct / len(X_test)
print(f"Test Accuracy: {accuracy*100:.2f}%")

• Write evaluation code that automatically trains on a set of images and then tests your model on
  large batches of unseen images
• Measure the time it takes to process the images
• Submit all of the relevant results in the report
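
For the timing measurement, one possible approach is a small wrapper around `time.perf_counter`.
The helper `timed` and the stand-in workload below are hypothetical; substitute your own training
or evaluation call.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload; replace with your own training or evaluation function
def dummy_workload(n):
    return sum(i * i for i in range(n))

result, seconds = timed(dummy_workload, 100_000)
print(f"processed in {seconds:.4f} s")
```

`time.perf_counter` is a monotonic, high-resolution clock, which makes it a better fit for
measuring elapsed time than `time.time`. Dividing the elapsed time by the number of images gives a
per-image figure for the report.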

PART – III: GPU Acceleration


Now that you understand how a CNN works internally, you’ll build and train the same model
using PyTorch. Please keep in mind that you must write your own code for the same system.

1. Imports

Import the following PyTorch modules:


- torch
- [Link]
- [Link]
- [Link]
- [Link] and transforms

2. Using CUDA-driven GPU Library

Check if a GPU is available and set the device accordingly:

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)
```

Hint: Always move both your model and tensors to this device to utilize GPU.

3. Define Transformations and Load Dataset

Create a transform pipeline that:


- Resizes images to (64, 64)
- Converts them to tensors

Then use `datasets.ImageFolder()` to load your data and `DataLoader` to batch and shuffle the
samples.

Try: Print a batch of image shapes to confirm it’s loading correctly.

4. Define the CNN Model

Define a class `CatDogCNN(nn.Module)` that includes:


- Two convolution layers with ReLU activation and max pooling
- One or two fully connected layers
- ReLU activation for all layers

Tip: Each convolution should increase the channel depth (e.g., 3→16→32).

Use `F.relu()` for activation and `F.max_pool2d()` for pooling.

5. Move Model to GPU

After defining your model, move it to the device:


```python
model = CatDogCNN().to(device)
```

Hint: Failing to move tensors and model to the same device will cause runtime errors.

6. Choose Loss Function and Optimizer

Since you’re using ReLU for the output, you can use `nn.MSELoss()` as the loss function.
For optimization, use Adam (`torch.optim.Adam`) with a learning rate of 0.001.

7. Implement Training Loop

Write a training loop that:


1. Iterates through all epochs
2. For each batch:
- Move `images` and `labels` to the same device as the model
- Clear gradients (`optimizer.zero_grad()`)
- Compute outputs
- Calculate loss
- Backpropagate and update weights

Hint: `images.to(device)` and `labels.to(device)`

8. Evaluate the Model on GPU

After training, evaluate your model:


• Disable gradient computation using `torch.no_grad()`
• Move images and labels to the device
• Compute outputs and accuracy
• Write evaluation code that automatically trains on a set of images and then tests your model on
  large batches of unseen images
• Measure the time it takes to process the images
• Compare the performance of the NumPy version against the GPU-accelerated version of your model
• Submit all of the relevant results in the report
