
Computer Vision and Deep Learning Overview

Uploaded by

kmedo8080966
Copyright
© All Rights Reserved

I Introduction

Computer Vision (CV): The goal is to make machines see in a way similar to how humans do, and Machine
Learning methods are the means to get there. CV is central to robotics, where a system has to understand
its environment before it can act in it. CV also covers a large share of image and video processing.

I.1 The history of Computer Vision

Hubel and Wiesel Experiment


Hubel and Wiesel (neurobiologists) experimented on cats' brains by inserting electrodes and recording the
cell responses while the cat was shown stimuli (mostly edges) on a screen. They found that cells in the
visual cortex are sensitive to the orientation of edges, yet largely insensitive to the position of the
edges. This is a property that we will see again later in convolutional networks.

I.2 The summer vision project 1966

The project tried to construct a significant part of a visual system, and the term "pattern recognition"
was coined around this time.
CV is also a core element of many other areas, such as robotics, NLP, optics and image processing,
algorithm optimization, neuroscience, AI and ML.

I.3 Image classification

Previously DL was not used for image classification; it only became popular later. The earlier pipeline
started with preprocessing (e.g. normalizing the colors of the images), followed by a feature descriptor,
which functioned much like the cells in the Hubel and Wiesel experiment in the sense that certain
properties, such as the position of edges, were not important.
Common feature descriptors are HAAR, HOG, SIFT and SURF. These descriptors had to be hand-engineered, and
most of them are gradient based. After that came aggregators such as SVM, RF or ANN, which aggregate the
features and output the label.
With deep learning, feature extraction and aggregation are replaced by a single "magic box". We do not
have to hand-engineer the feature descriptors; instead, we let the data set decide which descriptor gives
the best results.
Image Classification Issues:

• Occlusion: the object is partially hidden.
• Background clutter: background and foreground (object) have similar colors.
• Representation, e.g. a cat drawing vs. a cat photo.

Downloaded by Eng Esraa ([Link]@[Link])

I.4 History of Deep Learning

Deep learning started in the 1940s with the "electronic brain". Each cell encoded a certain pattern: it
accumulated weighted impulses and eventually made a decision.
Around 1960 came the perceptron. Instead of using fixed weights, the weights could be learned: the system
is shown a number of examples in order to learn the parameters of the perceptron, i.e. the weights
(feature extraction) and the threshold. This was all hardwired.
Then came Adaline, in what was the golden age of deep learning: there was a lot of hype and progress
being made.
Then in 1969, people realized the limitations of perceptrons, specifically the XOR problem: a linear model
(a single perceptron) cannot separate the two classes of XOR. The era that followed was called the AI
winter.
In 1986, the multi-layer perceptron came to light. It has several layers that can be trained, i.e. whose
weights can be optimized. The training method is called backpropagation, a gradient-based method.
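The XOR limitation can be illustrated with a small sketch. It is not taken from the original notes: the grid search over single-perceptron weights is only a coarse illustration, and the two-layer weights (`W1`, `b1`, `w2`, `b2`) are hand-picked assumptions rather than learned by backpropagation.

```python
import itertools
import numpy as np

def step(z):
    """Threshold activation: 1 where z > 0, else 0."""
    return (np.asarray(z) > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR labels

# Single perceptron: try a coarse grid of weights and biases (illustrative search only).
grid = np.linspace(-2.0, 2.0, 21)
best_single_acc = max(
    np.mean(step(X @ np.array([w1, w2]) + b) == y)
    for w1, w2, b in itertools.product(grid, repeat=3)
)

# Two-layer perceptron with hand-picked weights:
# hidden unit 0 computes OR, hidden unit 1 computes AND.
W1 = np.array([[1.0, 1.0], [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])
w2 = np.array([1.0, -1.0])  # output fires for "OR and not AND", which is XOR
b2 = -0.5

hidden = step(X @ W1 + b1)
mlp_pred = step(hidden @ w2 + b2)

print("best single-perceptron accuracy on the grid:", best_single_acc)
print("two-layer predictions:", mlp_pred)
```

No setting on the grid classifies all four points, while the hidden layer makes the problem linearly separable for the output unit.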
In 1995 the SVM appeared. Because it was so successful, it put a halt to deep learning.
In 2006, Hinton and Ruslan Salakhutdinov developed Deep Belief Networks, and the idea of pretraining came
around: a neural network is trained once and then trained again for a specific task. Pretraining is still
highly relevant today (for example, transfer learning with ImageNet weights). Despite this, neural
networks were still not a mainstream method.
In 2012, the AlexNet architecture (see Section X.2) was the first neural-network-based architecture to
win the ImageNet competition, achieving the lowest top-5 error.
Definition of top-5 error: given an image, ask the method which class it is and check whether the top five
predictions include the correct class. The top-5 error is the fraction of images for which they do not.
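As a minimal sketch of this definition (the function name `top_k_error` and the toy scores are assumptions made for illustration, not part of the notes), the top-5 error can be computed from a matrix of class scores as follows:

```python
import numpy as np

def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is NOT among the k highest-scoring classes.

    scores: array of shape (n_samples, n_classes)
    labels: integer array of shape (n_samples,)
    """
    top_k = np.argsort(scores, axis=1)[:, -k:]        # indices of the k largest scores per row
    hit = (top_k == labels[:, None]).any(axis=1)      # True where the true class is in the top k
    return 1.0 - hit.mean()

# Toy example: every row scores class 9 highest, then 8, 7, 6, 5, ...
scores = np.tile(np.arange(10.0), (3, 1))
labels = np.array([9, 5, 0])

top5_err = top_k_error(scores, labels, k=5)  # classes 9 and 5 are in the top five, class 0 is not
top1_err = top_k_error(scores, labels, k=1)  # only class 9 is the top prediction
print(top5_err, top1_err)
```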

I.5 What made this [Deep Learning] possible?

• Big Data: Models have far more examples to learn from, since we have much more data today than we did
back then. The datasets are also available online.
• Better Hardware: Not only the data has changed, the hardware has changed as well (e.g. GPUs). This
hardware was developed for rendering images in games and is now also used in deep learning to train
models faster.
• More complex models

I.6 Different Tasks in DL

• Object Detection
• Self-Driving Cars
• Gaming (e.g. AlphaGo, AlphaStar)
• Machine Translation
• Automated Text Generation (ChatBots)
• Healthcare, Cancer Detection


II Machine Learning Basics

Unsupervised Learning
Supervised Learning: An underlying assumption is that the train and test data come from the same
distribution.
Nearest neighbor Model: Supervised learning method that labels a sample based on the majority label of
its k nearest neighboring samples. The hyper-parameters to be tweaked in KNN are k and the distance
metric (L1 or L2).
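A minimal sketch of the nearest neighbor rule, assuming NumPy arrays for the data; the function name `knn_predict`, its parameters and the toy data are hypothetical and only serve the illustration:

```python
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x, k=3, metric="l2"):
    """Label x with the majority label among its k nearest training samples."""
    diff = X_train - x
    if metric == "l1":
        dist = np.abs(diff).sum(axis=1)            # L1 (Manhattan) distance
    else:
        dist = np.sqrt((diff ** 2).sum(axis=1))    # L2 (Euclidean) distance
    nearest = np.argsort(dist)[:k]                 # indices of the k closest samples
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

# Two well-separated toy clusters.
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])

pred_a = knn_predict(X_train, y_train, np.array([0.5, 0.5]), k=3, metric="l2")
pred_b = knn_predict(X_train, y_train, np.array([5.5, 5.5]), k=3, metric="l1")
print(pred_a, pred_b)
```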
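Cross Validation: Split the data into K folds; train on K-1 folds and validate on the remaining fold,
rotating until every fold has been used for validation once.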
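The fold rotation can be sketched as below. This is an illustrative helper (`k_fold_indices` is a hypothetical name, and shuffling with a fixed seed is an assumption), not a replacement for a library implementation:

```python
import numpy as np

def k_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, val_idx) pairs; each fold serves as validation set exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for i in range(n_folds):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, val_idx

splits = list(k_fold_indices(n_samples=10, n_folds=5))
for train_idx, val_idx in splits:
    print("train:", np.sort(train_idx), "val:", np.sort(val_idx))
```

A hyper-parameter such as k in KNN would then be chosen by averaging the validation score over all folds.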
Decision Boundaries are boundaries where the data is separated into classes.
The pros and cons of using linear decision boundaries:

+ It's very easy to implement and derive
+ It's easy to find the hyperparameters
- The classes must be clearly (linearly) separable
- Harder to use for multiple classes (?)

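Linear Regression is a supervised learning method that finds a linear model explaining a target $y$ given
inputs $x$ with weights $\theta$:

\[ \hat{y}_i = \sum_{j=1}^{d} x_{ij} \theta_j \]

and, with a bias term, the prediction looks like:

\[ \hat{y}_i = \theta_0 + \sum_{j=1}^{d} x_{ij} \theta_j \implies \hat{y} = X\theta \]

$x_{ij}$ are the features; $\theta$ are the weights (model parameters). $\theta_0$ is a bias, which is
absorbed into $X\theta$ by adding a constant feature of 1 to every $x_i$.
Mean squared error:

\[ J(\theta) = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i \theta - y_i)^2 \]

Matrix notation (the constant factor $\frac{1}{n}$ does not change the minimizer):

\[ \min_{\theta} J(\theta) = \min_{\theta} (X\theta - y)^T (X\theta - y) \]

This loss function is convex and quadratic in $\theta$, so setting its gradient to zero gives a
closed-form solution:

\[ \theta = (X^T X)^{-1} X^T y = X^{\dagger} y \]

where $X^{\dagger}$ denotes the pseudo-inverse of $X$.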
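The closed-form solution can be checked numerically. The sketch below uses synthetic data; the true weights, noise level and sample size are assumptions chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3

# Design matrix with a leading column of ones so that theta[0] plays the role of the bias.
features = rng.normal(size=(n, d))
X = np.hstack([np.ones((n, 1)), features])

theta_true = np.array([0.5, 2.0, -1.0, 3.0])
y = X @ theta_true + 0.1 * rng.normal(size=n)   # linear model plus Gaussian noise

# Normal equations: solve (X^T X) theta = X^T y instead of forming the inverse explicitly.
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Same estimate via the pseudo-inverse.
theta_pinv = np.linalg.pinv(X) @ y

mse = np.mean((X @ theta_normal - y) ** 2)
print(theta_normal, mse)
```

Solving the linear system is usually preferred over computing the inverse of X^T X directly, since it is cheaper and numerically more stable.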

II.1 Maximum Likelihood

Find the parameter values that maximize the likelihood of the observations given the parameters.


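\[
\begin{aligned}
\theta_{ML} &= \arg\max_{\theta} \; p_{model}(Y \mid X, \theta) \\
            &= \arg\max_{\theta} \prod_{i=1}^{n} p_{model}(y_i \mid x_i, \theta) \\
            &= \arg\max_{\theta} \sum_{i=1}^{n} \log p_{model}(y_i \mid x_i, \theta)
\end{aligned}
\]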

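MLE assumes that the training samples are independent and generated by the same distribution.
What shape does our probability distribution have? Assuming a Gaussian distribution,
$y_i \sim \mathcal{N}(x_i \theta, \sigma^2)$, i.e. $y_i = x_i \theta + \mathcal{N}(0, \sigma^2)$:

\[ p(y_i \mid x_i, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2\sigma^2} (y_i - x_i \theta)^2} \]

then after more matrix calculations we get:

\[ \theta_{ML} = \arg\max_{\theta} \left[ -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} (y - X\theta)^T (y - X\theta) \right] \]

\[ \theta_{ML} = (X^T X)^{-1} X^T y \]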

So the MLE is the same as the least squares estimate we found previously.
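A small numerical check of this equivalence, as a sketch under assumed synthetic data (the data, the fixed `sigma` and the perturbation scale are illustrative choices): the least squares estimate should have a higher Gaussian log-likelihood than any perturbed parameter vector.

```python
import numpy as np

def gaussian_log_likelihood(theta, X, y, sigma=1.0):
    """Log-likelihood of y under y ~ N(X theta, sigma^2 I)."""
    n = len(y)
    residual = y - X @ theta
    return -0.5 * n * np.log(2 * np.pi * sigma ** 2) - (residual @ residual) / (2 * sigma ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, -2.0]) + 0.3 * rng.normal(size=50)

# Least squares estimate.
theta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
ll_best = gaussian_log_likelihood(theta_ls, X, y)

# Log-likelihood at randomly perturbed parameters.
ll_perturbed = [
    gaussian_log_likelihood(theta_ls + delta, X, y)
    for delta in rng.normal(scale=0.1, size=(20, 2))
]
print(ll_best, max(ll_perturbed))
```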

II.2 Logistic Regression

Sigmoid function:

\[ \sigma(x) = \frac{1}{1 + e^{-x}} \]

Probability of a binary output:

\[ p(y \mid X, \theta) = \prod_{i=1}^{n} \hat{y}_i^{\,y_i} (1 - \hat{y}_i)^{(1 - y_i)} \]

\[ \hat{y}_i = \sigma(x_i \theta) \]

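Maximum Likelihood Estimate:

\[
\begin{aligned}
\theta_{ML} &= \arg\max_{\theta} \; \log p(y \mid X, \theta) \\
            &= \arg\max_{\theta} \; \log \prod_{i=1}^{n} \hat{y}_i^{\,y_i} (1 - \hat{y}_i)^{1 - y_i} \\
            &= \arg\max_{\theta} \sum_{i=1}^{n} y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)
\end{aligned}
\]

The negative of this sum is called the binary cross-entropy loss, or BCE; maximizing the log-likelihood
is the same as minimizing the BCE.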
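A minimal NumPy sketch of the sigmoid and the binary cross-entropy. The clipping constant `eps` is an assumption added to avoid log(0), and the loss is averaged over samples rather than summed, which does not change the maximizer:

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # keep the logarithms finite
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

y_true = np.array([1.0, 0.0])
bce_uninformed = binary_cross_entropy(y_true, np.array([0.5, 0.5]))   # equals log(2)
bce_good = binary_cross_entropy(y_true, np.array([0.9, 0.1]))
bce_bad = binary_cross_entropy(y_true, np.array([0.1, 0.9]))
print(bce_uninformed, bce_good, bce_bad)
```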


In the more general case (number of classes > 2), the cross-entropy loss for sample $i$ can be written as:

\[ L(\hat{y}_i, y_i) = \sum_{j=1}^{C} y_{i,j} \log \hat{y}_{i,j} \]

where the sum runs over the $C$ classes and $y_{i,j}$ is 1 if sample $i$ belongs to class $j$ and 0
otherwise.
Cost function (mean of losses for all samples):

\[ C(\theta) = -\frac{1}{n} \sum_{i=1}^{n} L(\hat{y}_i, y_i) \]

It is optimized via gradient descent (no closed form).
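Since there is no closed form, the parameters are found iteratively. The sketch below shows plain batch gradient descent for the binary case, using the gradient of the mean BCE, $\frac{1}{n} X^T (\hat{y} - y)$. The function name `fit_logistic_regression`, the learning rate, the number of steps and the synthetic data are assumptions chosen for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.5, n_steps=1000):
    """Minimize the mean binary cross-entropy with batch gradient descent."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_steps):
        y_hat = sigmoid(X @ theta)
        grad = X.T @ (y_hat - y) / n   # gradient of the mean BCE with respect to theta
        theta -= lr * grad
    return theta

# Synthetic, linearly separable data: the label is 1 when the two features sum to > 0.
rng = np.random.default_rng(0)
n = 200
features = rng.normal(size=(n, 2))
X = np.hstack([np.ones((n, 1)), features])   # leading column of ones for the bias
y = (features[:, 0] + features[:, 1] > 0).astype(float)

theta = fit_logistic_regression(X, y)
accuracy = np.mean((sigmoid(X @ theta) > 0.5) == (y == 1))
print(theta, accuracy)
```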
