Discovery and effective retrieval of useful user-generated content (UGC) depend on proper metadata annotation of an object, such as a title and keywords. In this study, a simple, unsupervised, non-graph-based algorithm for extracting keywords is proposed. A novel keyphrase chunking approach was adopted, which uses word sequences as they appear in the original document. The simple but effective term frequency-inverse document frequency (tf-idf) weighting scheme was used to rank the newly created keyphrases. Compared with a similar algorithm that uses a three-metric weighting scheme, tf-idf yielded a precision of 89%. Thus, applying the tf-idf algorithm to YouTube's metadata-based keywords proves to be a useful approach because of its objectivity. [...] our proposed automatic keyword extraction approach and the evidence-based discourse of the UGC creator's subjectivity in tagging. Our keyword extraction method is an approach towards solving that bias.
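The chunk-then-rank idea described above can be illustrated with a minimal sketch. The abstract does not give the exact chunking rule or tf-idf variant, so the following are assumptions made only for illustration: candidate phrases are maximal runs of non-stopword tokens kept in their original order, the stopword list is a tiny hypothetical one, and the score is relative phrase frequency multiplied by log(N / df). The function names `chunk_phrases` and `rank_keyphrases` are likewise hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the study's list).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to",
             "is", "are", "with", "this", "that"}


def chunk_phrases(text):
    """Return candidate phrases: runs of consecutive non-stopword tokens,
    in document order, never crossing punctuation."""
    phrases = []
    for segment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in segment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(" ".join(current))
    return phrases


def rank_keyphrases(documents, doc_index, top_k=3):
    """Rank the phrases of one document by tf-idf against the whole corpus."""
    chunked = [chunk_phrases(doc) for doc in documents]
    # Document frequency: number of documents containing each phrase.
    df = Counter()
    for phrases in chunked:
        df.update(set(phrases))
    n_docs = len(documents)
    tf = Counter(chunked[doc_index])
    total = sum(tf.values())
    scores = {p: (c / total) * math.log(n_docs / df[p]) for p, c in tf.items()}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


corpus = [
    "Linear algebra lecture on matrix multiplication. "
    "Matrix multiplication is associative.",
    "Calculus lecture on the chain rule.",
    "Linear algebra lecture on eigenvalues.",
]
for phrase, score in rank_keyphrases(corpus, doc_index=0):
    print(f"{phrase}: {score:.3f}")
```

In this toy corpus, a phrase that recurs within one video's text but is rare across the others rises to the top, while a phrase shared by several documents is pushed down by its lower inverse document frequency.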
Given the importance of textual information in content retrieval, it is desirable that the textual representations of educational videos on social media platforms like YouTube capture the semantics of the content they represent. Such coherent textual representations are important for objective video content retrieval, repurposing, reuse and sense-making. In this study, the Automatic Speech Recognition (ASR) transcript of the video tracks was leveraged to supplement the insufficient representation of video content provided by the video title alone. The Latent Dirichlet Allocation (LDA) topic modeling approach, implemented with Gibbs sampling, was used to evaluate the suitability of various textual representations for YouTube educational videos and to extract the candidate topic that best extends the original YouTube keywords. The results show that, in topic space, the YouTube ASR script performs better as a representative textual source for the dominant topic than the combined textual representations. The automatic keyword extensions obtained using our method add value to applications that use tags for content discovery or retrieval.
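As a rough illustration of LDA fitted by Gibbs sampling, the sketch below implements a small collapsed Gibbs sampler in plain Python. It is not the implementation used in the study, which the abstract does not name. The hyperparameters `alpha` and `beta`, the iteration count, the toy token lists and the helper names `lda_gibbs` and `dominant_topic` are all assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns (theta, topic_word, vocab), where theta[d][k] is the smoothed
    proportion of topic k in document d and topic_word[k][v] is a count.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word, vocab


def dominant_topic(theta_row):
    """Index of the topic with the largest proportion in one document."""
    return max(range(len(theta_row)), key=theta_row.__getitem__)


# Hypothetical token lists standing in for a title and an ASR transcript.
representations = {
    "title": ["matrix", "lecture"],
    "asr": ["matrix", "vector", "matrix", "eigenvalue", "vector", "lecture"],
}
theta, _, _ = lda_gibbs(list(representations.values()), n_topics=2)
for name, row in zip(representations, theta):
    print(name, dominant_topic(row), [round(p, 3) for p in row])
```

Under these assumptions, each textual representation of a video (title only, ASR transcript, or a combination) is treated as one document, and the dominant topic of each can then be compared to judge which representation gives the clearest topical signal.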