Abstract: Accurate voice humming transcription and efficient indexing and retrieval schemes are essential to a large-scale humming-based audio retrieval system. Although much research has been done to develop such schemes, their performance in terms of precision, recall, and F-measure remains unsatisfactory across similarity metrics. In this paper, we propose a new voice query transcription scheme. It considers the following features: note onset detection using dynamic threshold methods, fundamental frequency (F0…
“…Another possible future direction is to combine data hiding [watermarking (Liu et al 2015; Liu et al 2016), image and video steganography (Mstafa and Elleithy 2015; Muhammad et al 2015; Lin et al 2015)] with the video summarization frameworks by embedding the patient and gynecologists data in DH videos/keyframes, resulting in secure and privacy-preserving VS framework as presented in (Muhammad et al 2015) for secure visual contents retrieval from personalized repositories and other mobile healthcare applications (Lv et al 2016). Furthermore, we are also planning to explore deep learning and incorporate GPUs based processing (Mei and Tian 2016; Mei 2014) for efficient keyframes extraction, their indexing and retrieval (Rho et al 2008; Rho et al 2011; Rho and Hwang 2006).…”
In clinical practice, diagnostic hysteroscopy (DH) videos are recorded in full and stored in long-term video libraries for later inspection of previous diagnoses, for research and training, and as evidence for patients’ complaints. However, only a limited number of frames are required for the actual diagnosis, and these can be extracted using video summarization (VS). Unfortunately, general-purpose VS methods are not very effective for DH videos because of their high similarity in color and texture, unedited content, and lack of shot boundaries. Therefore, in this paper, we investigate visual saliency models for effective abstraction of DH videos by extracting the diagnostically important frames. The objective of this study is to analyze the performance of various visual saliency models in light of domain knowledge and to nominate the best saliency model for DH video summarization in healthcare systems. Our experimental results indicate that a hybrid saliency model, comprising motion, contrast, texture, and curvature saliency, is the most suitable for summarizing DH videos in terms of extracted keyframes and accuracy.
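To make the hybrid-saliency idea concrete, here is a toy sketch (not the authors' implementation) that scores each frame by a weighted sum of four crude stand-ins for the motion, contrast, texture, and curvature cues named in the abstract, then keeps the top-scoring frames as keyframes. The specific formulas and weights are illustrative assumptions.

```python
import numpy as np

def hybrid_saliency_scores(frames, w=(0.4, 0.2, 0.2, 0.2)):
    """Score each grayscale frame with a toy hybrid saliency measure.

    Combines four per-frame cues loosely analogous to motion, contrast,
    texture, and curvature saliency; the concrete formulas and the
    weights `w` are illustrative stand-ins, not the paper's model.
    """
    scores = []
    prev = frames[0]
    for f in frames:
        motion = np.abs(f - prev).mean()       # frame-difference motion cue
        contrast = f.std()                     # global intensity contrast
        gy, gx = np.gradient(f)
        texture = np.hypot(gx, gy).mean()      # gradient-magnitude texture cue
        lap = np.gradient(gx, axis=1) + np.gradient(gy, axis=0)
        curvature = np.abs(lap).mean()         # Laplacian as a crude curvature proxy
        scores.append(w[0] * motion + w[1] * contrast
                      + w[2] * texture + w[3] * curvature)
        prev = f
    return np.array(scores)

def select_keyframes(frames, k=2):
    """Return the indices of the k highest-scoring frames, in order."""
    scores = hybrid_saliency_scores(frames)
    return sorted(np.argsort(scores)[-k:].tolist())

# Synthetic clip: mostly flat frames plus two "interesting" textured frames
rng = np.random.default_rng(0)
flat = np.full((32, 32), 0.5)
frames = [flat, flat, rng.random((32, 32)), flat, rng.random((32, 32)), flat]
keyframes = select_keyframes(frames, k=2)  # picks the two textured frames
```

A flat frame scores near zero on every cue, while a textured frame scores high on all four, so the selection concentrates on visually distinctive frames, which is the intuition behind saliency-driven keyframe extraction.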
“…Finally, a different kind of IR, related to query by humming, is reported by Rho et al [154]. The target of the paper is music retrieval, where human voice is used to produce a short clip of singing, whistling or humming to give a rough approximation of the music requested.…”
https://adwords.googleblog.com/2015/05/building-for-next-moment.html
* The use of the word "mainly" here indicates that the boundary between the two areas is becoming fuzzy, with some IR systems working in a more deterministic way and DB systems working in a more probabilistic or "uncertainty conscious" fashion.
* The semantic meaning of different words is inferred using a deep learning technique named word2vec, https://code.google.com/p/word2vec/.
† https://code.google.com/p/androguard/
* trec.nist.gov
† research.nii.ac.jp/ntcir/index-en.html
‡ https://sites.google.com/site/treccontext/
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.