
AI Applications in Vision & Language

Uploaded by

reema.dn644
Copyright
© All Rights Reserved

Unit VI

Case Study and Applications

Syllabus:

1. Computer Vision

Computer Vision (CV) is a field of Artificial Intelligence that enables machines to interpret and
understand visual information from the world. It focuses on analyzing images, detecting patterns,
and predicting outcomes using machine learning and deep learning techniques. The key
applications covered in this unit include Image Classification, ImageNet, Detection methods, and
Audio WaveNet.

1.1 Image Classification

 Image Classification is the process of assigning a label or category to an input image based
on the objects or patterns present in it.
 Deep learning models like Convolutional Neural Networks (CNNs) are widely used due to
their ability to learn hierarchical features directly from image data.
 The process includes collecting images, preprocessing, data augmentation, feature
extraction, training the CNN model, and evaluating the predictions.
 Popular architectures: LeNet, AlexNet, VGG, ResNet, Inception.
 Applications: medical image diagnosis, traffic sign recognition, plant disease detection,
satellite image analysis.

1.2 ImageNet

 ImageNet is a large-scale visual database designed for research on visual object
recognition.
 Contains over 14 million labeled images across more than 20,000 categories.
 The ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) played a major role
in the revolution of deep learning.
 Famous breakthroughs:
o AlexNet (2012) – drastically reduced error rates using deep CNNs.
o ResNet (2015) – introduced residual connections and achieved a top-5 error of
3.57%, below the roughly 5% error estimated for human annotators.
 ImageNet models are commonly used for transfer learning, enabling high accuracy even
with small datasets.

1.3 Detection Techniques

 Detection focuses not only on identifying the object class but also finding its location in
the image using bounding boxes.
 Object detection involves classification + localization.
 Popular detection frameworks:
o R-CNN, Fast R-CNN, Faster R-CNN
o YOLO (You Only Look Once) – real-time detection.
o SSD (Single Shot Detector) – fast and efficient.
 Applications: autonomous vehicles, video surveillance, defect detection, person tracking,
and robotics.

1.4 Audio WaveNet

 WaveNet is a deep generative model for raw audio signals created by DeepMind.
 It uses dilated causal convolutions to model audio at the waveform level.
 Capable of generating realistic human speech and natural audio patterns.
 Applications:
o Speech synthesis (text-to-speech)
o Audio enhancement
o Music generation
o Voice assistants

 WaveNet serves as the foundation of modern speech technologies used in Google Assistant
and many TTS systems.

2. Natural Language Processing (NLP)

NLP enables machines to understand, generate, and interact using human language. It integrates
linguistics, machine learning, and deep learning techniques.

2.1 Sentiment Analysis

 The process of determining the emotional tone behind a text: positive, negative, or
neutral.
 Used widely in product reviews, social media monitoring, customer feedback systems.
 Techniques:
o Rule-based methods using lexicons.
o Machine learning models like Naive Bayes, SVM.
o Deep learning using LSTM, GRU, and BERT-based models.
 Steps include text cleaning, vectorization (TF-IDF/Word Embeddings), model training,
and sentiment prediction.

2.2 Text Preprocessing

 Preprocessing is essential to convert raw text into a machine-interpretable format.


 Key steps:
o Tokenization – splitting text into words or sentences.
o Stop-word removal – eliminating commonly used words (e.g., “the”, “is”).
o Stemming/Lemmatization – reducing words to root form.
o Lowercasing – standardizing text.
o Removing noise – punctuation, numbers, URLs, emojis (if required).
 These steps improve model accuracy and reduce computational complexity.

2.3 ChatBot

 A chatbot is an AI system designed to simulate conversation and provide automated
responses.
 Types:
o Rule-based chatbots – simple pattern matching, limited intelligence.
o Retrieval-based chatbots – respond based on a predefined set of answers.
o Generative chatbots – produce natural language answers using deep learning
(Seq2Seq, Transformers, GPT).
 Modern chatbots use:
o NLP for understanding user queries.
o Intent classification.
o Named Entity Recognition (NER).
o Dialogue management and response generation.
 Applications: customer service, healthcare, banking, education, and virtual assistance.

Unit VI focuses on practical applications of AI in two major domains: Computer Vision and
Natural Language Processing. Students learn real-world techniques such as image classification,
object detection, large-scale datasets like ImageNet, and audio generation through WaveNet. In
the NLP domain, the unit covers sentiment analysis, text preprocessing fundamentals, and chatbot
development—providing a comprehensive understanding of modern AI applications.

Detail Notes:

Computer Vision

Computer Vision (CV) is a field of Artificial Intelligence that enables machines to interpret and
understand visual information from the world. It focuses on analyzing images, detecting patterns,
and predicting outcomes using machine learning and deep learning techniques. The key
applications covered in this unit include Image Classification, ImageNet, Detection methods, and
Audio WaveNet.

1.1 Image Classification

 Image Classification is the process of assigning a label or category to an input image based
on the objects or patterns present in it.
 Deep learning models like Convolutional Neural Networks (CNNs) are widely used due to
their ability to learn hierarchical features directly from image data.
 The process includes collecting images, preprocessing, data augmentation, feature
extraction, training the CNN model, and evaluating the predictions.
 Popular architectures: LeNet, AlexNet, VGG, ResNet, Inception.
 Applications: medical image diagnosis, traffic sign recognition, plant disease detection,
satellite image analysis.
 Image Classification is one of the most fundamental and widely implemented tasks in the
field of Computer Vision. It refers to the process of analyzing an input image and assigning
a predefined label or category to it based on the patterns, features, or objects present within
the image. The main goal is to map an input image (x) to a class label (y). Traditional image
classification approaches relied heavily on feature engineering, where experts designed
handcrafted features such as SIFT, HOG, and SURF. However, with the development of
deep learning, Convolutional Neural Networks (CNNs) became the dominant technique
because they automatically learn hierarchical features from raw input images.

 A typical image classification pipeline begins with data collection. A large dataset with
labeled images must be assembled, ensuring that there are enough samples to capture the
variability of each class. Preprocessing is the next step and involves resizing images to a
fixed input size and normalizing pixel values. The dataset is then split into training,
validation, and testing sets to ensure unbiased evaluation. Data augmentation (rotation,
flipping, cropping, brightness adjustment) is applied to the training set only; it improves
generalization by exposing the model to varied conditions without contaminating the
held-out data.
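
The preprocessing, splitting, and augmentation steps described above can be sketched in plain Python. This is only an illustrative sketch: images are represented as nested lists of 8-bit grayscale values, and the helper names (`normalize`, `horizontal_flip`, `split_dataset`) and the 70/15/15 split are assumptions chosen for the example, not part of the notes.

```python
import random

def normalize(image):
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in image]

def horizontal_flip(image):
    """Mirror the image left to right (a simple augmentation)."""
    return [row[::-1] for row in image]

def split_dataset(samples, train_pct=70, val_pct=15, seed=0):
    """Shuffle once, then cut into train / validation / test lists."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Toy dataset: 20 tiny 2x2 grayscale "images" with alternating labels.
dataset = [([[i, 255 - i], [0, 255]], i % 2) for i in range(20)]
train, val, test = split_dataset(dataset)

# Augment only the training split so validation and test data stay untouched.
augmented_train = train + [(horizontal_flip(img), label) for img, label in train]
train_ready = [(normalize(img), label) for img, label in augmented_train]
```

In practice a library such as torchvision or Keras would handle these steps; the point here is only the order of operations.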

 The core component of image classification is feature extraction. CNN architectures extract
low-level features in early layers (edges, colors, textures) and gradually more complex
patterns in deeper layers (shapes, objects). This hierarchical learning capability has
revolutionized modern image classification. Prominent CNN architectures include LeNet,
AlexNet, VGGNet, GoogLeNet, ResNet, DenseNet, and MobileNet. Each architecture
improved on its predecessors in depth, speed, accuracy, or computational efficiency.

 Training involves minimizing a loss function such as categorical cross-entropy using
optimization algorithms like SGD or Adam. Backpropagation computes the gradient of the
loss with respect to each weight, and the optimizer uses these gradients to update the
weights. Techniques such as dropout, weight decay, and batch normalization help reduce
overfitting. After training, the model is evaluated on held-out data using metrics like
accuracy, precision, recall, F1-score, and the confusion matrix.
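
To make the loss and the evaluation metrics concrete, here is a minimal sketch using only the standard library. The function names and the toy scores and labels are assumptions made for illustration; a real project would use the implementations in a deep learning framework.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """Categorical cross-entropy for one sample with an integer label."""
    return -math.log(probs[true_class])

def precision_recall_f1(y_true, y_pred, positive):
    """Metrics for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Loss for one sample whose true class is 0 (scores are made up).
probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy(probs, true_class=0)

# Metrics on five toy predictions: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive=1)
```

The loss shrinks toward zero as the probability assigned to the true class approaches 1, which is what the optimizer pushes toward during training.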

 Image classification has many practical applications, such as recognizing handwritten
digits, classifying animal species, plant disease detection, medical imaging (for example,
tumor classification), facial recognition, textile defect detection, satellite image
interpretation, and perception modules for autonomous driving. Transfer learning has become
an essential technique in modern workflows: CNNs pretrained on ImageNet are fine-tuned
for domain-specific tasks, achieving high accuracy even with smaller datasets. This makes
image classification one of the most accessible yet powerful tasks in the computer vision
domain.

1.2 ImageNet

 ImageNet is a large-scale visual database designed for research on visual object
recognition.
 Contains over 14 million labeled images across more than 20,000 categories.
 The ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) played a major role
in the revolution of deep learning.
 Famous breakthroughs:
o AlexNet (2012) – drastically reduced error rates using deep CNNs.
o ResNet (2015) – introduced residual connections and achieved a top-5 error of
3.57%, below the roughly 5% error estimated for human annotators.
 ImageNet models are commonly used for transfer learning, enabling high accuracy even
with small datasets.

ImageNet is one of the largest and most influential visual databases used in the field of computer
vision. It contains over 14 million labeled images across more than 21,000 synsets (categories).
Each category typically contains hundreds to thousands of manually labeled images, making it
extremely valuable for training deep learning models. The most impactful aspect of ImageNet is the
ILSVRC (ImageNet Large Scale Visual Recognition Challenge), which ran annually from 2010 to 2017
on a 1,000-category subset and served as a benchmark for evaluating object recognition algorithms.

The significance of ImageNet lies in its scale, diversity, and high-quality labeling. The images
were collected from the internet, validated using human annotators via Amazon Mechanical Turk,
and grouped into WordNet hierarchy categories. Before ImageNet, computer vision datasets were
relatively small, typically containing a few thousand images. Training deep neural networks on
such small datasets was nearly impossible, which is why pre-2012 methods focused heavily on
manual feature extraction.

The major breakthrough came in 2012 when AlexNet, a deep CNN developed by Alex Krizhevsky,
Ilya Sutskever, and Geoffrey Hinton, won the ILSVRC competition with a top-5 error rate of about
15%, compared with about 26% for the next-best entry. This single event is widely regarded as the
beginning of the deep learning revolution. Following AlexNet, architectures like
VGGNet (2014), Inception (2014), ResNet (2015), and DenseNet (2016) further improved
performance, with ResNet reaching a top-5 error of 3.57%, below the roughly 5% estimated for
human annotators. These advancements were driven by architectural innovations such as deeper
networks, residual connections, batch normalization, and more efficient convolutional modules.
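
The top-5 error rate quoted for ILSVRC counts a prediction as correct when the true label appears among the model's five highest-scoring classes. A minimal sketch of that metric follows; the function name `top_k_error` and the toy scores are assumptions for illustration only.

```python
def top_k_error(scores_per_image, true_labels, k=5):
    """Fraction of images whose true label is NOT among the k highest scores."""
    misses = 0
    for scores, label in zip(scores_per_image, true_labels):
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(true_labels)

# Toy example with 6 classes and 4 images (scores are made up).
scores = [
    [0.1, 0.5, 0.2, 0.05, 0.1, 0.05],   # true class 1 has the highest score
    [0.3, 0.1, 0.1, 0.2, 0.25, 0.05],   # true class 5 has the lowest score
    [0.05, 0.1, 0.4, 0.2, 0.15, 0.1],   # true class 3 is in the top five, not first
    [0.2, 0.2, 0.1, 0.3, 0.15, 0.05],   # true class 0 is in the top five, not first
]
labels = [1, 5, 3, 0]

top1 = top_k_error(scores, labels, k=1)
top5 = top_k_error(scores, labels, k=5)
```

On this toy data the top-1 error is 0.75 and the top-5 error is 0.25, which shows why top-5 figures are always at least as favorable as top-1 figures.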

ImageNet has been instrumental in enabling transfer learning. Pretrained models trained on
ImageNet are capable of extracting general features such as edges, textures, shapes, and object
structures. These pretrained weights are then fine-tuned on other datasets for specialized
applications, dramatically reducing training time and improving accuracy. Many fields such as
medical imaging, agriculture, robotics, and remote sensing now rely on ImageNet-based pretrained
models because they reduce the need for large domain-specific datasets.

Furthermore, ImageNet continues to inspire new research areas such as self-supervised learning,
zero-shot learning, few-shot learning, and large multimodal models. Its legacy has shaped modern
artificial intelligence, making it a foundational resource for developing state-of-the-art computer
vision systems.

1.3 Detection Techniques

 Detection focuses not only on identifying the object class but also finding its location in
the image using bounding boxes.
 Object detection involves classification + localization.
 Popular detection frameworks:
o R-CNN, Fast R-CNN, Faster R-CNN
o YOLO (You Only Look Once) – real-time detection.
o SSD (Single Shot Detector) – fast and efficient.
 Applications: autonomous vehicles, video surveillance, defect detection, person tracking,
and robotics.
 Object Detection is an advanced computer vision task that involves identifying multiple
objects within an image and localizing them using bounding boxes. Unlike image
classification, which assigns only a single label to an entire image, object detection predicts
both the class label and the position of each object. This makes it one of the most important
and computationally challenging tasks in computer vision.
 Object detection models operate by combining classification and localization. Earlier
approaches slid a window across the image and classified each window using HOG features
and an SVM, which was computationally expensive. The breakthrough came with R-CNN
(Regions with CNN Features), which generates region proposals and classifies each one
using a CNN. Fast R-CNN improved speed by computing the convolutional features once per
image and sharing them across proposals, and Faster R-CNN replaced the external proposal
step with a learned Region Proposal Network (RPN).
 Single-shot detectors such as YOLO (You Only Look Once) and SSD (Single Shot
MultiBox Detector) further revolutionized detection by enabling real-time performance.
YOLO divides the image into a grid of cells and predicts bounding boxes and class
probabilities for every cell in a single forward pass, allowing extremely fast inference. SSD
predicts from default (anchor) boxes on feature maps at multiple scales, making it efficient
for detecting objects of different sizes.
 Modern object detection models include YOLOv5, YOLOv7, YOLOv8, EfficientDet,
RetinaNet (with Focal Loss), and DETR (Transformer-based detection). These achieve
high accuracy with reduced computational cost. Key improvements include multi-scale
feature extraction using Feature Pyramid Networks (FPN), transformer-based
architectures, attention mechanisms, and anchor-free detection techniques like CenterNet
and FCOS.
 Object detection has a huge number of applications: self-driving cars (pedestrian/vehicle
detection), medical imaging (tumor/organ detection), drone surveillance, retail analytics,
manufacturing quality control, agriculture (fruit counting, weed detection), traffic
monitoring, security systems, and video analytics. With advancements in hardware
acceleration and deep learning, object detection has become one of the foundational tasks
enabling real-world AI systems.
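
Two small pieces of bounding-box arithmetic help make the detection notes concrete. The sketch below assumes boxes are given as (x_min, y_min, x_max, y_max) in pixels. Intersection over Union (IoU) is not named in the notes above; it is included here as the commonly used measure of how well a predicted box matches a ground-truth box. The `responsible_cell` helper is a hypothetical illustration of the YOLO-style grid, with a 7x7 grid and a 448x448 image chosen as example values.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def responsible_cell(box, image_w, image_h, grid=7):
    """Return (row, col) of the grid cell that contains the box centre."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    col = min(int(cx * grid / image_w), grid - 1)
    row = min(int(cy * grid / image_h), grid - 1)
    return row, col

ground_truth = (50, 50, 150, 150)
prediction = (100, 100, 200, 200)
overlap = iou(ground_truth, prediction)   # 2500 / 17500
cell = responsible_cell(ground_truth, image_w=448, image_h=448, grid=7)
```

An IoU of 1.0 means the boxes coincide and 0.0 means they do not overlap at all; detectors are usually scored by counting a prediction as correct when its IoU with a ground-truth box exceeds a chosen threshold.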

1.4 Audio WaveNet

 WaveNet is a deep generative model for raw audio signals created by DeepMind.
 It uses dilated causal convolutions to model audio at the waveform level.
 Capable of generating realistic human speech and natural audio patterns.
 Applications:
o Speech synthesis (text-to-speech)
o Audio enhancement
o Music generation
o Voice assistants
 WaveNet serves as the foundation of modern speech technologies used in Google Assistant
and many TTS systems.
 WaveNet is a deep generative model for producing raw audio waveforms, developed by
DeepMind. Unlike traditional speech synthesis systems that rely on predefined acoustic
models or concatenation of recorded audio, WaveNet generates speech sample-by-sample
at the waveform level. This allows it to capture natural human tone, pitch, rhythm, and
expressive characteristics with high fidelity.
 WaveNet uses dilated causal convolutions, which allow the neural network to look far back
into the audio sequence without using recurrent layers. “Causal” means that each output
depends only on current and past samples, never on future ones. Because the dilation factor
doubles from one layer to the next, the receptive field grows exponentially with depth,
making the network efficient at modeling long-term dependencies. This architecture is
particularly advantageous for audio since waveform signals require context across
thousands of samples.
 The model operates autoregressively: each audio sample is generated conditioned on the
previously generated samples. This gives WaveNet the ability to produce extremely realistic
and human-like speech, but it makes generation slow because samples are produced one at a
time. Later versions, such as Parallel WaveNet, generate samples in parallel to speed up
synthesis. WaveNet became the foundation for Google Assistant and Google’s text-to-speech
(TTS) systems, replacing older parametric and concatenative speech synthesis.
 Applications of WaveNet include speech generation, text-to-speech conversion, music
synthesis, audio denoising, voice conversion, and sound effect generation. It has also
influenced newer generative audio models like WaveGlow, WaveRNN, and diffusion-based
audio models. With its natural-sounding output and its direct probabilistic modeling of raw
waveforms, WaveNet represents a major advancement in audio processing.
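
The effect of dilation on the receptive field can be checked with a few lines of Python. This is a sketch under stated assumptions: a kernel size of 2 and dilations that double from 1 to 512 are used as example values, and `causal_dilated_conv` is a bare 1-D convolution without the gating, residual connections, or learned weights of the real model.

```python
def receptive_field(dilations, kernel_size=2):
    """Number of current-and-past samples that one output sample can see."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def causal_dilated_conv(signal, weights, dilation):
    """1-D causal convolution: output[t] uses signal[t], signal[t - d], ...

    Positions before the start of the signal are treated as zeros.
    """
    output = []
    for t in range(len(signal)):
        total = 0.0
        for k, w in enumerate(weights):
            index = t - k * dilation
            if index >= 0:
                total += w * signal[index]
        output.append(total)
    return output

# Ten layers with doubling dilation: 1 + (1 + 2 + ... + 512) = 1024 samples.
dilations = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
field = receptive_field(dilations)

# With dilation 2, each output adds the current sample and the one two steps back.
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
out = causal_dilated_conv(signal, weights=[1.0, 1.0], dilation=2)
```

Ten stacked layers cover 1,024 samples, whereas ten undilated layers with the same kernel would cover only 11, which is why dilation makes long audio context affordable.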

2. Natural Language Processing (NLP)

NLP enables machines to understand, generate, and interact using human language. It integrates
linguistics, machine learning, and deep learning techniques.

2.1 Sentiment Analysis

 The process of determining the emotional tone behind a text: positive, negative, or
neutral.
 Used widely in product reviews, social media monitoring, customer feedback systems.
 Techniques:
o Rule-based methods using lexicons.
o Machine learning models like Naive Bayes, SVM.
o Deep learning using LSTM, GRU, and BERT-based models.
 Steps include text cleaning, vectorization (TF-IDF/Word Embeddings), model training,
and sentiment prediction.
 Sentiment Analysis (also called Opinion Mining) is a technique used to determine the
emotional tone or attitude expressed in text data. It classifies text into categories such as
positive, negative, or neutral. This technique has become essential for analyzing social
media posts, customer reviews, feedback forms, product ratings, and brand reputation.
 The process begins with text preprocessing to clean and prepare the data. Tokenization
splits the text into meaningful units, while stop-word removal eliminates irrelevant words.
Stemming or lemmatization reduces words to their root form. After preprocessing, text is
transformed into numerical representations using Bag-of-Words, TF-IDF, Word2Vec,
GloVe, or transformer embeddings.
 Traditional machine learning techniques used for sentiment analysis include Naive Bayes,
Support Vector Machines, and Logistic Regression. These worked well for simpler datasets
but struggled with long sentences, sarcasm, contextual meanings, and ambiguous
expressions.
 Deep learning models such as LSTM, GRU, and Bi-LSTM improved performance by
capturing long-term dependencies in text. Modern transformer-based models achieve
state-of-the-art accuracy: BERT and RoBERTa read context in both directions, while
GPT-style models read left to right, and all of them learn rich language semantics from
large-scale pretraining.
 Sentiment analysis has many applications: analyzing brand sentiment on Twitter,
monitoring customer service interactions, detecting negative reviews in e-commerce,
analyzing political opinions, understanding student feedback, and tracking public mood
during events or crises. As NLP advances, sentiment analysis continues to evolve into more
nuanced tasks like emotion detection, sarcasm detection, and aspect-based sentiment
analysis.
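
As an illustration of the traditional machine learning approach mentioned above, here is a minimal multinomial Naive Bayes sentiment classifier with add-one smoothing. It is a sketch only: the four training sentences are invented, tokenization is a plain whitespace split, and the function names are assumptions for this example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents):
    """documents: list of (list_of_tokens, label). Returns model parameters."""
    label_counts = Counter(label for _, label in documents)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in documents:
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary

def predict(tokens, model):
    """Pick the label with the highest log-probability (add-one smoothing)."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total_docs)          # log prior
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocabulary:
                continue  # ignore words never seen in training
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy training data.
training_data = [
    ("great phone love it".split(), "positive"),
    ("excellent battery great screen".split(), "positive"),
    ("terrible battery hate it".split(), "negative"),
    ("poor screen terrible support".split(), "negative"),
]
model = train_naive_bayes(training_data)

first = predict("great screen love".split(), model)
second = predict("terrible support".split(), model)
```

Because the model treats words independently, it cannot tell "not good" from "good", which is one reason the notes point to sequence models and transformers for harder text.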

2.2 Text Preprocessing

 Preprocessing is essential to convert raw text into a machine-interpretable format.


 Key steps:
o Tokenization – splitting text into words or sentences.
o Stop-word removal – eliminating commonly used words (e.g., “the”, “is”).
o Stemming/Lemmatization – reducing words to root form.
o Lowercasing – standardizing text.
o Removing noise – punctuation, numbers, URLs, emojis (if required).
 These steps improve model accuracy and reduce computational complexity.

Text preprocessing is the foundation of all NLP tasks. Raw text data is often messy, unstructured,
and inconsistent. Preprocessing transforms text into a clean, standardized, and machine-readable
format. The main steps include:

1. Lowercasing – converting all characters to lowercase ensures uniformity.
2. Removing noise – eliminating punctuation, special symbols, extra spaces, emojis
(optional).
3. Tokenization – splitting text into words or sentences.
4. Stop-word removal – removing frequently occurring but unimportant words.
5. Stemming and Lemmatization – reducing words to their base form.
6. Handling negations – keeping a negation word attached to the word it modifies (e.g.,
treating “not good” as a single unit such as “not_good”) so that the meaning is not lost.
7. Handling numbers, URLs, hashtags, mentions – cleaning social media text.
8. Spelling correction – fixing misspelled words so that variants map to the same token.
9. Vectorization – converting text into numerical features.

Vectorization methods range from simple Bag-of-Words to TF-IDF, word embeddings like
Word2Vec and GloVe, and modern contextual embeddings like BERT embeddings. Proper
preprocessing significantly improves model performance and reduces computational overhead.
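
A few of the steps listed above (lowercasing, noise removal, tokenization, stop-word removal, and Bag-of-Words vectorization) can be sketched with the standard library. The stop-word list here is a tiny illustrative set, the example sentences and URL are invented, and stemming, negation handling, and spelling correction are left out to keep the sketch short.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; real projects use a much larger one.
STOP_WORDS = {"the", "is", "a", "an", "and", "this", "it", "to", "of"}

def preprocess(text):
    """Lowercase, strip URLs and punctuation, tokenize, drop stop-words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)       # remove punctuation, digits, emojis
    tokens = text.split()                       # whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]

def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

docs = [
    "The camera is AMAZING!!! See https://example.com/review",
    "This phone is slow, and the camera is bad.",
]
tokenized = [preprocess(d) for d in docs]
vocabulary = sorted({token for tokens in tokenized for token in tokens})
vectors = [bag_of_words(tokens, vocabulary) for tokens in tokenized]
```

Here the vocabulary is `['amazing', 'bad', 'camera', 'phone', 'see', 'slow']`, so each document becomes a six-element count vector that a classifier can consume.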

Text preprocessing is essential for all NLP applications such as text classification, sentiment
analysis, chatbot training, machine translation, summarization, and named entity recognition.
Without thorough preprocessing, even the best models will struggle to understand noisy or
inconsistent text.

2.3 ChatBot

 A chatbot is an AI system designed to simulate conversation and provide automated
responses.
 Types:
o Rule-based chatbots – simple pattern matching, limited intelligence.
o Retrieval-based chatbots – respond based on a predefined set of answers.
o Generative chatbots – produce natural language answers using deep learning
(Seq2Seq, Transformers, GPT).

A chatbot is an AI system designed to simulate conversation and respond intelligently to user
inputs. Chatbots combine several NLP techniques such as intent detection, entity recognition,
response generation, and dialogue management. There are three major types:

1. Rule-based chatbots – rely on predefined rules or pattern matching (e.g., ELIZA).
2. Retrieval-based chatbots – select the best response from a set of existing responses using
similarity matching.
3. Generative chatbots – produce new responses dynamically using deep learning.
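
The retrieval-based type is the easiest to sketch in a few lines. The example below assumes a hypothetical FAQ dictionary and uses Jaccard word overlap as the similarity measure; production systems more often use TF-IDF or embedding similarity, so treat this as an illustration of the idea rather than a recommended design.

```python
import re

# Hypothetical question-answer pairs; a real bot would load these from a knowledge base.
FAQ = {
    "what are your opening hours": "We are open from 9 am to 6 pm, Monday to Friday.",
    "how can i reset my password": "Use the 'Forgot password' link on the login page.",
    "where is my order": "You can track your order from the 'My Orders' page.",
}

def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Overlap between two word sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def reply(user_message, threshold=0.2):
    """Return the stored answer whose question is most similar to the message."""
    words = tokenize(user_message)
    best_question = max(FAQ, key=lambda q: jaccard(words, tokenize(q)))
    if jaccard(words, tokenize(best_question)) < threshold:
        return "Sorry, I did not understand that."
    return FAQ[best_question]

answer = reply("How do I reset my password?")
fallback = reply("Tell me a joke")
```

The threshold gives the bot a fallback when nothing in the stored set is close enough, which is the main limitation that generative chatbots address.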

Modern chatbots rely on machine learning models such as Seq2Seq with attention, Transformer
models, GPT architectures, and LLMs (Large Language Models). In a modular chatbot pipeline,
each component plays a specific role:

 Intent classification determines the user's purpose.
 Named Entity Recognition (NER) extracts entities such as names, dates, and locations.
 Dialogue manager decides how the system should respond.
 Response generator forms the final output.

Chatbots are used widely across customer service, healthcare, banking, education, agriculture, and
e-commerce. Examples include virtual assistants like Siri, Alexa, Google Assistant, and support
chatbots used by companies. With advancements in NLP, chatbots are becoming more intelligent,
context-aware, and personalized. The development process involves data collection, conversation
design, training intent models, integrating APIs, and testing real conversations.

