Video Summarization Using a Dense Captioning (DenseCap) Model

Das, Sourav; Kolya, Anup Kumar; Kundu, Aniruddha

doi:10.1002/9781119571452.ch5

Cited by 1 publication

(1 citation statement)

References 26 publications


“…Additionally, they employed a proposal generator primarily reliant on video features, which deviated from the core concept of the dense video captioning task. Das et al [42] proposed a model that starts by producing region captions as its primary output. These region captions are then subjected to our clustering technique, resulting in the creation of sentence clusters.…”

Section: B Multi-modal Dense Video Captioning (mentioning)

confidence: 99%

TAPER-WE: Transformer-Based Model Attention with Relative Position Encoding and Word Embedding for Video Captioning and Summarization in Dense Environment

Himanshu Tyagi

2023

IJRITCC


In the era of burgeoning digital content, the need for automated video captioning and summarization in dense environments has become increasingly critical. This paper introduces TAPER-WE, a novel methodology for enhancing the performance of these tasks through the integration of state-of-the-art techniques. TAPER-WE leverages the power of Transformer-based models, incorporating advanced features such as Relative Position Encoding and Word Embedding. Our approach demonstrates substantial advancements in the domain of video captioning. By harnessing the contextual understanding abilities of Transformers, TAPER-WE excels in generating descriptive and contextually coherent captions for video frames. Furthermore, it provides a highly effective summarization mechanism, condensing lengthy videos into concise, informative summaries. One of the key innovations of TAPER-WE lies in its utilization of Relative Position Encoding, enabling the model to grasp temporal relationships within video sequences. This fosters accurate alignment between video frames and generated captions, resulting in superior captioning quality. Additionally, Word Embedding techniques enhance the model's grasp of semantics, enabling it to produce captions and summaries that are not only coherent but also linguistically rich. To validate the effectiveness of our proposed approach, we conducted extensive experiments on benchmark datasets, demonstrating significant improvements in captioning accuracy and summarization quality compared to existing methods. TAPER-WE not only achieves state-of-the-art performance but also showcases its adaptability and generalizability across a wide range of video content. In conclusion, TAPER-WE represents a substantial leap forward in the field of video captioning and summarization. Its amalgamation of Transformer-based architecture, Relative Position Encoding, and Word Embedding empowers it to produce captions and summaries that are not only informative but also contextually aware, addressing the growing need for efficient content understanding in the digital age.


