2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI)
DOI: 10.1109/sami50585.2021.9378632
A Framework for Lecture Video Segmentation from Extracted Speech Content

Cited by 4 publications (4 citation statements)
References 13 publications
“…Through the utilization of neural networks and boosted margin maximization, this method exhibits promise in accurately identifying and organizing crucial segments within lecture videos, thereby enhancing content retrieval and accessibility. [22] focuses on the development of a segmentation method for lecture videos based on speech patterns, including pitch, volume, pause rates, and the initial time of each audio chunk, as well as content cues. Leveraging speech recognition techniques, the proposed approach aims to accurately identify and segment lecture videos into meaningful segments.…”
Section: Video Segmentation Based on Audio/Image/Text Algorithms
Confidence: 99%
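The statement above describes segmentation driven by speech patterns such as pause rates and the start time of each audio chunk. A minimal sketch of the pause-based part of that idea follows; the chunk representation, threshold, and function name are illustrative assumptions, not the cited paper's actual implementation.

```python
# Hypothetical sketch: start a new lecture segment whenever the silence
# between consecutive audio chunks exceeds a threshold. Chunk format
# (start_time, duration) and the 2.0 s threshold are assumptions.

def split_at_pauses(chunks, pause_threshold=2.0):
    """chunks: list of (start_time, duration) tuples in seconds,
    ordered by start_time. Returns a list of segments, each a list
    of chunks."""
    segments = [[chunks[0]]]
    for prev, cur in zip(chunks, chunks[1:]):
        pause = cur[0] - (prev[0] + prev[1])  # silence between chunks
        if pause > pause_threshold:
            segments.append([cur])   # long pause: open a new segment
        else:
            segments[-1].append(cur)
    return segments

chunks = [(0.0, 4.0), (4.5, 3.0), (11.0, 5.0), (16.2, 2.0)]
print(split_at_pauses(chunks))  # two segments, split at the 3.5 s pause
```

A full system in the spirit of the citation would combine this with pitch and volume features rather than relying on pauses alone.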
“…The video is usually divided into small segments (e.g., by using voice activity detection [38]) or temporal windows. Then, existing transcripts or automatic speech recognition (ASR) are used to create an embedding for each segment based on methods like bags of words [39], word2vec [38]–[40], and TF-IDF [41]. Some methods also consider additional acoustic features [38], [40]. The embeddings of contiguous audio segments are then compared, and if the distance is too large, both segments are assumed to belong to different topics and the video is split.…”
Section: Audio-Based Analysis
Confidence: 99%
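The pipeline quoted above (embed each segment's transcript, compare contiguous embeddings, split where similarity drops) can be sketched with plain bag-of-words vectors and cosine similarity. The function names, the similarity threshold, and the toy transcripts are assumptions for illustration; the cited works use richer embeddings such as word2vec or TF-IDF.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words vectors
    represented as Counters."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def topic_boundaries(transcripts, min_similarity=0.2):
    """Return indices i where a topic change is assumed between
    segment i and segment i+1 (hypothetical threshold)."""
    bows = [Counter(t.lower().split()) for t in transcripts]
    return [i for i, (a, b) in enumerate(zip(bows, bows[1:]))
            if cosine(a, b) < min_similarity]

transcripts = [
    "gradient descent updates the weights using the gradient",
    "the gradient descent step size is the learning rate",
    "next we discuss convolution filters and feature maps",
]
print(topic_boundaries(transcripts))  # → [1]: topic shifts after segment 1
```

Replacing `Counter`-based bags of words with TF-IDF or word2vec embeddings, as in the cited methods, changes only the vectorization step; the boundary rule stays the same.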