2023
DOI: 10.3390/s23167050
Video Scene Detection Using Transformer Encoding Linker Network (TELNet)

Abstract: This paper introduces a transformer encoding linker network (TELNet) for automatically identifying scene boundaries in videos without prior knowledge of their structure. Videos consist of sequences of semantically related shots or chapters, and recognizing scene boundaries is crucial for various video processing tasks, including video summarization. TELNet utilizes a rolling window to scan through video shots, encoding their features extracted from a fine-tuned 3D CNN model (transformer encoder). By establishi…
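The rolling-window encoding described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the window size, feature dimension, and the single-head identity-projection attention below are illustrative assumptions standing in for TELNet's transformer encoder operating on 3D-CNN shot features.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(window_feats):
    # Single-head scaled dot-product attention with identity Q/K/V
    # projections, for brevity (a real transformer encoder learns these).
    d = window_feats.shape[-1]
    scores = window_feats @ window_feats.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ window_feats

def rolling_window_encode(shot_feats, window=5, stride=1):
    """Slide a window over consecutive shot features and return one
    contextual encoding per window (here, the centre shot's encoding)."""
    n = len(shot_feats)
    outputs = []
    for start in range(0, n - window + 1, stride):
        enc = self_attention(shot_feats[start:start + window])
        outputs.append(enc[window // 2])  # centre shot, contextualised
    return np.stack(outputs)

rng = np.random.default_rng(0)
feats = rng.normal(size=(12, 8))  # 12 shots, 8-dim features (stand-in for 3D-CNN output)
ctx = rolling_window_encode(feats, window=5)
print(ctx.shape)  # one encoding per window position
```

A boundary classifier (the "linker" in TELNet) would then score each contextual encoding for whether a scene boundary falls at that shot.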

Cited by 4 publications (3 citation statements)
References 42 publications
“…Firstly, we add multiple noise sources (Gaussian Noise, Salt and Pepper Noise, Dropout Noise) at the input end of the model to comprehensively consider the feature extraction ability under different noisy conditions. Secondly, we incorporate a self-attention mechanism [45,46] into the hidden layer of the encoder to utilize the correlation between high-dimensional features and use the importance degree to focus the neural network's attention more on important information among various features. Finally, we deepen the autoencoder's layers and use a deep neural network to learn the deep feature representation of high-dimensional features to improve the robustness of the dimensionality reduction model.…”
Section: Feature Dimension Reduction Module Based On Mdsaementioning
confidence: 99%
“…Highlight detection is the process of identifying interesting or important segments within a video [22][23][24][25]. Traditional datasets [26] do not provide personalized highlights as they lack queries related to specific video segments.…”
Section: Introductionmentioning
confidence: 99%