Deep Learning Exam Questions December 2024
Deep Learning Exam Questions December 2024
Training deep neural networks can be challenging due to issues such as overfitting, computational cost, and the vanishing gradient problem. The vanishing gradient problem occurs when the gradients used to update weights during backpropagation become extremely small as they are propagated backwards through the layers, especially with deep networks. This results in very slow convergence or failure to converge because the weights in the earlier layers of the network receive minimal updates. The problem is exacerbated in deeper networks where multiple layers can compound the issue. Addressing it often requires using architectures like LSTMs or techniques such as batch normalization to stabilize learning .
Sparse coding is a technique in machine learning where data is represented as a combination of a small number of active basis functions or components. This promotes a compact and efficient representation of data. A practical application of sparse coding is in image processing, particularly in denoising applications. For example, sparse coding has been used to successfully remove noise from images by learning a dictionary of image patches from clean images, then using only a small number of basis functions to reconstruct noisy images, effectively filtering out noise while preserving important features .
Pretrained models obtained through greedy layer-wise training are applied effectively across different tasks through the process of transfer learning. In this approach, a model trained on one task is reused as the starting point for a model on a second task. This is particularly useful when the second task has limited data. By transferring the hidden weights learned during the pretrained phase, models can benefit from the feature representations captured across different tasks. Fine-tuning the model on the new task data allows it to adjust and optimize the higher layers for the specific characteristics of the new task, improving performance with minimal resource investment .
Multi-layer neural networks enable more complex learning because each layer learns a different level of abstraction. In a multi-layer setup, lower layers might learn simple features like edges or textures, whereas higher layers might learn more complex structures like shapes or objects. This abstraction hierarchy allows the network to capture complex representations that a single-layer network, with a simple linear combination of inputs, cannot. Each additional layer allows for the composition of these learned features, enabling the network to approximate more complex functions .
Primary challenges in NLP include language ambiguity, context understanding, and the need for vast amounts of data for model training. Language ambiguity arises from homonyms and varying syntax, complicating tasks like translation and sentiment analysis. Context understanding is crucial for capturing meaning beyond word-to-word translations. Examples of NLP applications include chatbots in customer service, automated translation services, and sentiment analysis for market research. In healthcare, NLP assists in processing medical records and in finance, it’s used for sentiment analysis of market data to predict fluctuations. These applications have transformed industries by improving efficiency and providing deep insights .
In Hopfield Networks, energy minimization is the fundamental principle that governs how the network stabilizes to a pattern. Each configuration of the network has an associated energy level, and the network dynamics are designed to evolve towards states with lower energy. When a pattern is presented to the network, it iteratively updates its state based on the rule of minimizing energy until it reaches a stable state, or attractor, that corresponds to a learned pattern. This mechanism makes Hopfield Networks effective in pattern recognition and associative memory tasks, as they can retrieve stored patterns by associating an input with the nearest attractor state .
Autoencoders are neural networks designed to learn efficient codings of input data. Their architecture typically consists of an encoder, which maps the input to a lower-dimensional space, and a decoder, which reconstructs the input from this compressed representation. The primary purpose of autoencoders is dimensionality reduction and feature learning. By training the network to reconstruct the input from the compressed code, it learns salient features that capture the essence of the data. They are often used in tasks like denoising, where the model learns to remove noise from data while retaining essential information .
Theano and Torch are both powerful tools for deep learning, but they have distinct characteristics. Theano is a Python library that allows for efficient definition, optimization, and evaluation of mathematical expressions involving multi-dimensional arrays, facilitating dynamic graph building. Torch is known for its efficiency and flexibility in tensor computation, particularly due to its Lua-based interface. The benefits of Theano include symbolic differentiation and integration with the Python ecosystem, whereas Torch offers incredible computational efficiency and widespread use for GPU acceleration. Drawbacks include Theano's steep learning curve and complicated installation, while Torch's Lua interface can be less accessible to those unfamiliar with the language .
Regularization techniques are strategies used to prevent overfitting in machine learning models by introducing a penalty term to the loss function or constraining the model parameters. Common regularization techniques include L1 and L2 regularization, dropout, and data augmentation. L1 and L2 regularization add a penalty for large coefficients, which encourages the network to learn simpler models. Dropout randomly sets some weights to zero during training, preventing the network from becoming too reliant on any particular node, thereby promoting robustness. By minimizing overfitting, regularization helps models generalize better to unseen data, effectively minimizing risk .
LSTM cells mitigate the vanishing gradient problem by using gating mechanisms to selectively allow information to pass through. The three gates—input, forget, and output gates—determine what information is written to, removed from, and read from the cell's memory, respectively. This architecture allows LSTMs to maintain a constant error flow, enabling gradients to continue propagating through the network without vanishing. The cell state within an LSTM cell is shielded from the effects of small gradients, allowing it to preserve longer-term dependencies than traditional RNNs, which suffer heavily from vanishing gradients with large temporal gaps .