At present, encoder-decoder-based video captioning models mainly rely on a single video input source. The content of the generated captions is limited because few studies employ external corpus information to guide caption generation, which hinders accurate description and understanding of video content. To address this issue, this work proposes a novel video captioning method guided by a sentence retrieval generation network (ED-SRG). First, we construct an encoder-decoder by integrating a ResNeXt network model, an efficient convolutional network for online video understanding (ECO) model, and a long short-term memory (LSTM) network model, which are used to extract 2D features, 3D features, and object features of the video data, respectively. These features are decoded into textual sentences that conform to the video content and serve as queries for sentence retrieval. Then, a sentence-transformer network model is employed to retrieve sentences from an external corpus that are semantically similar to the generated sentences, and candidate sentences are selected through similarity measurement. Finally, a novel network model is constructed based on the GPT-2 architecture. The model introduces a designed random selector that randomly selects predicted words with a high probability of appearance in the corpus, guiding the generation of textual sentences that better match natural human language expressions. Experiments on the widely used MSVD and MSR-VTT datasets, in comparison with existing works, demonstrate that the proposed method generates sentences with richer semantics and outperforms several state-of-the-art approaches.
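To make the retrieval-and-screening step concrete, the following is a minimal sketch using the sentence-transformers library: the caption produced by the encoder-decoder is embedded, semantically similar sentences are retrieved from an external corpus, and candidates are kept by a cosine-similarity cutoff. The checkpoint name, `top_k`, and `threshold` values here are illustrative assumptions, not the paper's settings.

```python
# Sketch of sentence retrieval and screening: embed the generated caption,
# retrieve semantically similar corpus sentences, and keep candidates whose
# cosine similarity exceeds a threshold.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder checkpoint (assumption)

def retrieve_candidates(query_sentence, corpus, top_k=5, threshold=0.6):
    """Return (sentence, score) pairs from the corpus ranked by cosine similarity."""
    query_emb = model.encode(query_sentence, convert_to_tensor=True)
    corpus_emb = model.encode(corpus, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, corpus_emb, top_k=top_k)[0]
    return [(corpus[h["corpus_id"]], h["score"]) for h in hits if h["score"] >= threshold]

corpus = [
    "a man is slicing vegetables in a kitchen",
    "a chef chops onions on a cutting board",
    "a dog is running across a field",
]
print(retrieve_candidates("a person is cutting vegetables", corpus))
```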
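The "random selector" described above reads, in our interpretation, as sampling among the highest-probability next-word predictions rather than always taking the single most likely word. Below is a minimal top-k sampling sketch over the standard Hugging Face GPT-2 model; the paper constructs its own GPT-2-based model, so this is only an illustrative stand-in, and the value of `k` is an assumption.

```python
# Sketch of a "random selector" over GPT-2 next-word predictions: restrict to
# the k most probable tokens and sample among them instead of taking argmax.
# This is a top-k reading of the selector; the paper's exact rule may differ.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def random_select_next(prompt, k=10):
    """Sample the next word from the k highest-probability GPT-2 predictions."""
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1, :]  # scores for the next token
    top_probs, top_ids = torch.topk(torch.softmax(logits, dim=-1), k)
    choice = top_ids[torch.multinomial(top_probs / top_probs.sum(), 1)]
    return tokenizer.decode(choice)

print(random_select_next("A man is slicing"))
```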