CNN and LSTM for Image Captioning

This document describes a proposed framework that uses convolutional neural networks (CNNs) and long short-term memory (LSTM) for automatic image captioning. The framework aims to replace the recurrent neural network (RNN) encoder in existing models with a combination of CNN and LSTM. CNN is used to extract visual features from images, which are then encoded into text descriptions using LSTM. The model is trained end-to-end to generate captions that accurately describe the content of input images.

Uploaded by

Auliya Syahputra Siregar

Materials Today: Proceedings xxx (xxxx) xxx

Contents lists available at ScienceDirect

Materials Today: Proceedings

journal homepage: [Link]/locate/matpr

CNN & LSTM using python for automatic image captioning

K. Loganathan a,⇑, R. Sarath Kumar b, V. Nagaraj c, Tegil J. John d
a Department of Information Technology, Mailam Engineering College, Mailam, Tamilnadu, India
b Department of Electronics and Communication Engineering, Sri Krishna College of Engineering and Technology, Coimbatore, Tamilnadu, India
c Department of Electronics and Communication Engineering, Knowledge Institute of Technology, Salem, Tamilnadu, India
d Department of Computer Science, Kaamadhenu Arts and Science College, Erode, Tamilnadu, India

Article info

Article history:
Received 12 October 2020
Accepted 21 October 2020
Available online xxxx

Keywords:
Caption generation
Deep learning
Encoding-decoding
Neural networks
Long Short Term Memory (LSTM)

Abstract

In this research paper, we design a framework that uses the capabilities of artificial neural networks to "caption an image based on its significant features." Recurrent Neural Networks (RNNs) are increasingly used as encoder-decoder architectures for machine translation. Our goal is to replace the RNN encoder with a combination of a Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM). Image captioning is a very interesting AI problem. Deep learning, built on deep neural networks, is the state of the art for this problem. The main task of image captioning is to generate the description of an image automatically, which requires an understanding of the image content. The model is trained so that, when an input image is given to it, it produces captions that closely describe the image. Generating a natural language description from images is an important problem in computer vision, natural language processing, artificial intelligence and image processing.
© 2020 Elsevier Ltd. All rights reserved.
Selection and peer-review under responsibility of the scientific committee of the Emerging Trends in Materials Science, Technology and Engineering.

⇑ Corresponding author.
E-mail address: klnathan83@[Link] (K. Loganathan).

1. Introduction

When you see an image, your brain can quickly tell what the image is about, but can a computer tell you what the image shows? Computer vision researchers have worked on this a great deal and, until recently, considered it impossible. Recent advances in deep learning have brought new techniques, large datasets and available computing resources, so we can now build models that generate captions for an image. Machine translation is driving advances in AI at an unmatched pace [1-8]. Rapid improvements in AI also strengthen other technical areas of industry. Artificial intelligence and neural networks are applied to complicated communication problems in natural languages, such as speech recognition and automatic translation.

Machine translation is developing at an unprecedented pace because of technological advances in AI [9-13]. The rapid progress in AI is shaping and improving various branches of business software. Applying artificial intelligence and neural networks to complex natural language processing problems, such as speech recognition and machine translation, leads to relatively fast progress [14-20]. One such example, among other advances, is progress in the field of image description. Summarizing an image is challenging. It involves first understanding the visual information and then using natural language processing to translate that information into sentences. This requires developing a model capable of capturing the relationship between the visual content of an image and natural language. The problem is multimodal, which creates the need for a hybrid model that can exploit the multidimensionality of the problem. Methods [21-28] such as template-based techniques and retrieval-based techniques were commonly used to address the problem [28-35].

2. Related techniques

A Convolutional Neural Network is a deep learning algorithm that takes images as input, assigns weights and biases, and thereby distinguishes one image from another. Convolutional Neural Networks (CNNs) involve a mathematical operation called convolution, together with the detection of suitable filter features
learned by the CNN through training. For example, we do not use any filters already known to us, such as an edge detector or Gaussian noise removal. Instead, we build an algorithm that learns image processing filters by itself through training of the convolutional neural network, and these learned filters may differ noticeably from standard image processing filters. To understand how a CNN works, assume that we run a CNN on an image of dimension 16*16*3. The first layer, the input layer, holds the raw input image, whose width, height and depth are in our case 16, 16 and 3 respectively. The second layer, the convolution layer, computes the output volume by taking the dot product of every filter with each image patch [36-42].

Suppose we use 10 filters for this layer, so that we get an output volume of dimension 16*16*10 (this assumes the input is padded so that the spatial size is preserved). The third layer, the activation layer, applies an element-wise activation function to its input, which is simply the output of the preceding convolution layer. Activation functions in use include max, sigmoid, tanh and leaky ReLU. The output volume stays unchanged, so it is still 16*16*10. The fourth layer is the pooling layer, which reduces the size of the volume; this lowers computation time, avoids unnecessary delay and uses less memory. There are several kinds of pooling layers, but the most common are max pooling and average pooling. If we use average pooling with a 2*2 filter and a stride of 2, the output volume has dimension 8*8*10. The fifth and last layer is the fully connected layer, a standard neural network that takes its input from the previous layers and converts it into a one-dimensional array whose size equals the number of classes [43-51]. The architecture is shown in Fig. 1.

Fig. 1. CNN Architecture [15].

RNN (Recurrent Neural Network): A Recurrent Neural Network (RNN) is essentially a densely connected neural network. The most noticeable difference from an ordinary feed-forward network is the introduction of time. The name "recurrent" comes from the fact that every element of its architecture performs the same function. RNNs have been especially beneficial in natural language processing, because the words of any language depend on one another sequentially. In general, an RNN computes a memory based on what it has calculated up to the current point in a sequence, i.e., from past memory and current input [53–61]. Long Short Term Memory networks (LSTMs) are a special variant of RNNs that are capable of learning long-distance dependencies in a sentence. The architecture of an LSTM is different from that of conventional RNNs. The working of an LSTM is shown in Fig. 2.

Fig. 2. LSTM Architecture [16].

3. Proposed technique

The LSTM cell generates a candidate set of new values using x_t and h_{t-1} as input. The sigmoid activation function produces continuous values between 0 and 1, which give smooth gradients for backpropagation. In the LSTM, the forget gate plays a vital role. When the forget gate units output zero, the recurrent gradients become zero and the corresponding old cell state units are discarded. In this way, the LSTM throws away information that it judges will not be useful later. Likewise, when the forget gate units output 1, the error flows unimpeded through the cell units, and the model can learn to relate terms that are far apart in time. Another key element of the LSTM is the output gate. The output gate unit helps ensure that not all of the information in the cell state units C_t is exposed to the rest of the network and that only the relevant information is emitted as h_t. The unused information therefore does not affect the rest of the network, while the cell state is still retained to help in future decisions.

First, we collect the dataset that we use to train, validate and test our algorithm. In this research paper we used the Flickr8k dataset. The proposed block diagram is shown in Fig. 3. Second, we clean the data, which includes lowercasing and removing punctuation and words containing numbers. Third, the dataset contains a .txt file listing the names of 6000 images, and we load that file to use it for training. Fourth, we define the structure of our CNN-RNN model. Then we define the structure of our LSTM model. Then we train the whole model. Finally, we check our algorithm by testing it on different test images.

Each input image, in .jpg format, is first taken from the dataset. The image is then passed to the CNN to produce feature vectors from the spatial data in the image, and the vectors are fed through a fully connected linear layer. The result is then fed to the LSTM, a special kind of RNN that includes a memory cell in order to retain information for a longer period of time (Fig. 4).

We use the LSTM to generate the sequential data, i.e., the sequence of words that finally forms the description of an image, by applying an activation function, which in our case is Softmax.
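To make the CNN stage of this pipeline concrete, the layer shapes of the 16*16*3 example from Section 2 can be traced with a few lines of Python. This is only a sketch under stated assumptions: the paper does not give the kernel size or the number of classes, so `kernel=3` and `num_classes=5` are hypothetical values, and "same" padding is assumed because it is what makes the 16*16*10 output in the text possible.

```python
def conv_output_shape(height, width, num_filters, kernel=3, stride=1, padding="same"):
    """Return (height, width, depth) of a convolution layer's output volume."""
    if padding == "same":
        # Assumed zero-padding: at stride 1 the spatial size is preserved.
        out_h = -(-height // stride)  # ceiling division
        out_w = -(-width // stride)
    else:
        # "valid": no padding, the filter must fit entirely inside the image.
        out_h = (height - kernel) // stride + 1
        out_w = (width - kernel) // stride + 1
    # One output channel per filter.
    return out_h, out_w, num_filters


def pool_output_shape(height, width, depth, window=2, stride=2):
    """Return the shape after max or average pooling; depth is unchanged."""
    return (height - window) // stride + 1, (width - window) // stride + 1, depth


def trace_shapes(num_classes=5):
    """Follow the 16*16*3 example through the five layers described in Section 2."""
    shapes = {"input": (16, 16, 3)}
    h, w, _ = shapes["input"]
    shapes["conv"] = conv_output_shape(h, w, num_filters=10)
    shapes["activation"] = shapes["conv"]  # element-wise, so the shape is unchanged
    shapes["pool"] = pool_output_shape(*shapes["conv"])
    ph, pw, pd = shapes["pool"]
    shapes["flatten"] = (ph * pw * pd,)  # input to the fully connected layer
    shapes["fully_connected"] = (num_classes,)  # hypothetical number of classes
    return shapes


for layer, shape in trace_shapes().items():
    print(layer, shape)
```

With these assumptions the convolution output is 16*16*10 and the 2*2, stride-2 pooling output is 8*8*10, matching the figures quoted in the text.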


Fig. 3. Hybrid LSTM Architecture [16].
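The gate behaviour described in Section 3 can be illustrated with a single LSTM step written in plain Python. This is a minimal sketch using the standard LSTM update equations, not the authors' implementation: it models a cell with one hidden unit, and the helper `make_params` and its hand-picked weights and biases are hypothetical, chosen only to force the forget and input gates close to 0 or 1.

```python
import math


def sigmoid(z):
    """Squash a value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(weights, values):
    return sum(w * v for w, v in zip(weights, values))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step for a cell with a single hidden unit.

    params maps each gate name to (weights over [h_prev] + x_t, bias).
    """
    z = [h_prev] + list(x_t)  # concatenated input [h_{t-1}, x_t]
    f = sigmoid(dot(params["forget"][0], z) + params["forget"][1])  # forget gate
    i = sigmoid(dot(params["input"][0], z) + params["input"][1])  # input gate
    g = math.tanh(dot(params["candidate"][0], z) + params["candidate"][1])  # candidate values
    o = sigmoid(dot(params["output"][0], z) + params["output"][1])  # output gate
    c_t = f * c_prev + i * g  # keep part of the old state, add part of the candidate
    h_t = o * math.tanh(c_t)  # expose only a gated view of the cell state
    return h_t, c_t


def make_params(forget_bias, input_bias):
    """Hypothetical parameters: the gates are driven by their biases only."""
    zeros = [0.0, 0.0, 0.0]  # weights for [h_prev, x_1, x_2]
    return {
        "forget": (zeros, forget_bias),
        "input": (zeros, input_bias),
        "candidate": ([0.0, 1.0, 1.0], 0.0),
        "output": (zeros, 0.0),
    }


x_t, h_prev, c_prev = [0.5, -0.2], 0.1, 2.0
# Forget gate near 1, input gate near 0: the old cell state passes through.
h_keep, c_keep = lstm_step(x_t, h_prev, c_prev, make_params(50.0, -50.0))
# Forget gate near 0, input gate near 1: the old cell state is discarded.
h_drop, c_drop = lstm_step(x_t, h_prev, c_prev, make_params(-50.0, 50.0))
print(c_keep, c_drop)
```

In the first call the new cell state stays essentially equal to the old one, and in the second it is replaced by the candidate value, which is the keep-or-discard behaviour attributed to the forget gate above.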

4. Results and implementation

4.1. Tools and technologies

Anaconda (free and open-source distribution of Python)
Jupyter Lab (text editor)
Python deep learning libraries, such as TensorFlow and Keras, and various other Python libraries, such as Pillow, Tqdm and NumPy.

4.2. Dataset

In this research paper we used the Flickr8k dataset, with 6000 images for training, 1000 images for validation and 1000 images for testing the algorithm.
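Before the captions are used for training, they go through the cleaning step listed in Section 3 (lowercasing, removing punctuation, dropping words that contain numbers), and the list of training image names is loaded from a .txt file. A minimal sketch of both steps follows; the function names and the sample file names are hypothetical, and the sketch assumes the list file holds one image name per line, which the paper does not state.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_caption(caption):
    """Lowercase a caption, strip punctuation, and drop words containing digits."""
    words = caption.lower().translate(_PUNCT_TABLE).split()
    kept = [w for w in words if not any(ch.isdigit() for ch in w)]
    return " ".join(kept)


def parse_image_list(text):
    """Return the image names in a list file, assuming one name per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


print(clean_caption("A man in a RED shirt, riding bike #2 on Route66!"))
print(parse_image_list("img_1.jpg\n\n img_2.jpg \n"))
```

Filtering on digits rather than on purely alphabetic tokens keeps the rule as close as possible to the wording in the text; a real pipeline might also drop very short tokens, but that would be a separate design choice.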

4.3. Epochs

Our model was trained for 15 epochs. We tested our algorithm on many test images. Given below are 10 test images and their respective generated captions (Figs. 5, 6 and 7).
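How a caption is produced word by word from Softmax outputs can be sketched as follows. The paper does not say which decoding strategy was used, so greedy decoding (always taking the most probable next word) is an assumption here, as are the `startseq` and `endseq` marker tokens and the `next_word_scores` callable, which stands in for the trained CNN-LSTM model.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_caption(next_word_scores, vocab, max_len=20, start="startseq", end="endseq"):
    """Build a caption by repeatedly picking the most probable next word.

    next_word_scores(words) is a placeholder for the trained model: it must
    return one raw score per vocabulary entry given the words generated so far.
    """
    words = [start]
    for _ in range(max_len):
        probs = softmax(next_word_scores(words))
        best = max(range(len(vocab)), key=probs.__getitem__)
        if vocab[best] == end:  # the model signals that the caption is complete
            break
        words.append(vocab[best])
    return " ".join(words[1:])  # drop the start marker
```

A toy scorer that always favours the next vocabulary entry in order is enough to exercise the loop; with a real model, `next_word_scores` would combine the image feature vector with the partial caption.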
While working on this algorithm, we implemented an image captioning model from scratch and saw how useful deep learning can be for generating correct captions in a natural language such as English. We ran our algorithm on the Flickr8k dataset and were eventually able to generate captions with moderate accuracy. We also concluded that the larger the dataset, the higher the accuracy and the lower the loss. Automatic image captioning is a fairly new task, and great progress has been made thanks to the efforts of researchers in this area. In our view, there is still a lot of room for improving image captioning performance.

Fig. 4. Flowchart of proposed method.

Fig. 5. Test Image 1 - Caption generated by our algorithm is: man is walking along the beach.

Fig. 6. Test Image 2 - Caption generated by our algorithm is: surfer is riding wave in the ocean.

Fig. 7. Test Image 3 - Caption generated by our algorithm is: man is red shirt is riding bike on the side of road. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

5. Conclusion and future enhancement

First, the rapid development of deep neural networks will certainly improve the efficiency of image description generation through more efficient network structures as language models and/or visual models. Second, since images consist of objects distributed in space while image captions are sequences of words, it is important for image captioning to analyze the presence and order of visual concepts in image captions. It would also be interesting to work on image captioning problems in various special cases. We performed our experiments using the Flickr8k dataset. As a future enhancement, we can use larger datasets, such as Flickr30k or MSCOCO, in order to get more accurate captions.

CRediT authorship contribution statement

K. Loganathan: Conceptualization, Data curation, Formal analysis. R. Sarath Kumar: Investigation, Methodology, Project administration. V. Nagaraj: Resources, Software, Supervision, Validation, Visualization, Writing - original draft. Tegil J. John: Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] Y. Wu, et al. (2016). CoRR abs/1609.08144.
[2] M. Jordan, T.M. Mitchell, Science 349 (6245) (2015) 255–260.
[3] S. Amershi, M. Cakmak, W.B. Knox, T. Kulesza (2014) AI Magazine.
[4] A. Meltzoff, P. Kuhl, J. Movellan, T. Sejnowski (2009) 325:284-8.
[5] G. Hinton et al., IEEE Signal Process Mag. 29 (6) (2012) 82–97.
[6] J. Mishra, SahaI (2010) Neurocomputing 74(1):239-255. Artificial Brain.
[7] H.R. Maier, G.C. Dandy, Environ. Modell. Softw. 15 (1) (2000) 101–124.
[8] A.N. Bhute, B.B. Meshram (2014) CoRR abs/1404.1514.
[9] A. Karpathy, L. Fei-Fei, IEEE Trans. Pattern Anal. Mach. Intell. 39 (4) (2017) 664–676.
[10] O. Vinyals, A. Toshev, S. Bengio, D. Erhan (2014) CoRR abs/1411.4555.
[11] K. O’Shea, R. Nash (2015) CoRR abs/1511.08458.
[12] Z.C. Lipton, D.C. Kale, C. Elkan, Wetzel RC (2015) CoRR abs/1511.03677.
[13] J. Schmidhuber, Neural Netw. 61 (2015) 85–117.
[14] Santanu Pattanayak. Karnataka: Apress; 2017. Available: [Link] [Link]/20171207/Pro%20Deep%20Learning%20with%20TensorFlow.pdf.
[15] Dr.S. Artheeswari. J. Soc. Technol. Environ. Sci. Vol. 6(2), August 2017.
[16] M.S. Kalaivany. Int. J. Appl. Eng. Res. 10 Nov 2015.
[17] Mr.P. Saravanan. Int. J. Innov. Res. Comput. Commun. Eng. Vol. 3, Special Issue 8, Oct 2015.
[18] M. Ramalingam, R.M.S. Parvathi, Eur. J. Sci. Res. 74 (1) (2012) 154–163.
[19] M. Ramalingam, R.M.S. Parvathi, Int. Rev. Comput. Softw. 8 (9) (2013) 2136–2141.
[20] Sharma, Grishma, Priyanka Kalena, Nishi Malde, Aromal Nair, Saurabh Parkar. Available at SSRN 3368837 (2019).
[21] S.A. SivaKumar, R. Naveen, D. Dhabliya et al., Mater. Today Proc., [Link] org/10.1016/[Link].2020.07.064.
[22] B. Maruthi Shankar, S.A. Sivakumar, B. Vidhya et al., Mater. Today Proc., [Link]
[23] Dr.S.A. Sivakumar, Dr.S. Karthikeyan, Ms.M. Benedict Tephila, Dr.R. Senthil Ganesh, Mr.R. Sarath Kumar, Dr.B. Maruthi Shankar, IJAST, vol. 29, no. 8s, pp. 2254 - 2260, May 2020.
[24] S. Satheesh Kumar, R. Sowmya, B. Maruthi Shankar et al., Mater. Today Proc., [Link]
[25] T. Sathish, Dinesh Kumar Singaravelu, J. Sci. Ind. Res. 79 (6) (2020) 547–551.
[26] T. Sathish, Dinesh Kumar Singaravelu, J. Sci. Ind. Res. 79 (5) (2020) 449–452.
[27] T. Sathish, S. Karthick, J. Mater. Res. Technol. 9 (3) (2020) 3481–3487.
[28] T. Sathish, N. Sabarirajan, S. Karthick, Mater. Today Proc. (2019), [Link] org/10.1016/[Link].2019.12.085.
[29] T. Sathish, S. Karthick, Mater. Today Proc. (2019), [Link] [Link].2019.12.084.
[30] Thanikodi Sathish, Singaravelu Dinesh Kumar, Devarajan Chandramohan, Venkatraman Vijayan, Rathinavelu Venkatesh, Therm. Sci. Vinca Inst. Nuclear Sci. 24 (1B) (2020) 575–581.
[31] Krishnaswamy Haribabu, Muthukrishnan Sivaprakash, Thanikodi Sathish, Arockiaraj Godwin Antony, Venkatraman Vijayan, Therm. Sci. Vinca Inst. Nuclear Sci. 24 (1B) (2020) 495–498.
[32] Muthukrishnan Sivaprakash, Krishnaswamy Haribabu, Thanikodi Sathish, Sundaresan Dinesh, Venkatraman Vijayan, Therm. Sci. Vinca Inst. Nuclear Sci. 24 (1B) (2020) 499–503.
[33] S.P. Palaniappan, K. Muthukumar, R.V. Sabariraj, S. Dinesh Kumar, T. Sathish, Mater. Today Proc. 21 (1) (2020) 1013–1021.
[34] T. Sathish, S. Dinesh Kumar, S. Karthick, Mater. Today Proc. 21 (1) (2020) 971–975.
[35] T. Sathish, J. Mater. Res. Technol. 8 (5) (2019) 4354–4363.
[36] T. Sathish, Trans. Can. Soc. Mech. Eng. 43 (04) (2019) 551–559.
[37] T. Sathish, J. New Mater. Electrochem. Syst. 22 (1) (2019) 5–9.

[38] T. Sathish, Mater. Today Proc. 05 (13) (2018) 26860–26865.
[39] T. Sathish, J. Appl. Fluid Mech. 10 (24) (2017) 41–50.
[40] T. Sathish. J. New Mater. Electrochem. Syst. Vol. 20, pp. 161-167, 2017.
[41] T. Sathish, Int. J. Ambient Energy 41 (07) (2020) 1–6.
[42] T. Sathish, Lecture notes on Mechanical Engineering – Springer, DOI: [Link] 2019.
[43] T. Sathish, S. Dinesh Kumar, K. Muthukumar, S. Karthick, Mater. Today Proc. 21 (1) (2020) 847–856.
[44] R. Praveen Kumar, P. Periyasamy, S. Rangarajan, T. Sathish, Mater. Today Proc. 21 (1) (2020) 504–510.
[45] T. Sathish, J. Jayaprakash, P.V. Senthil, R. Saravanan, FME Trans. 45 (1) (2017) 172–180.
[46] T. Sathish, G. Muthu, M.D. Vijayakumar, V. Dhinakaran, P.M. Bupathi Ram, Mater. Today Proc. (2020), [Link]
[47] T. Sathish, J. New Mater. Electrochem. Syst. 21 (3) (2018) 179–185.
[48] T. Sathish, J. Appl. Fluid Mech. 11 (2018) 39–44.
[49] T. Sathish, Int. J. Ambient Energy, Taylor and Francis Publishers, Accepted, DOI: [Link]
[50] T. Sathish, J. Sci. Ind. Res. 79 (8) (2020) 750–752.
[51] T. Sathish, Dinesh Kumar Singaravelu, J. Sci. Ind. Res. 79 (9) (2020) 843–845.

Further Reading

[1] Dhinakaran Veeman, N. Siva Shanmugam, T. Sathish, Vijay Petley, Gokulakrishnan Sriram, Trans. Can. Soc. Mech. Eng. 44 (3) (2020) 471–480.
[2] T. Sathish, Prog. Ind. Ecol. Int. J. (PIE) 12 (1/2) (2018) 46–58.
[3] T. Sathish, Prog. Ind. Ecol. Int. J. (PIE) 12 (1/2) (2018) 112–119.
[4] T. Sathish, Mater. Today Proc. 05 (6) (2018) 14448–14457.
[5] T. Sathish, Mater. Today Proc. 05 (6) (2018) 14545–14552.
[6] T. Sathish, Mater. Today Proc. 05 (6) (2018) 14416–14422.
[7] T. Sathish, Int. J. Mech. Prod. Eng. Res. Dev. 07 (2017) 551–560.

