A review of text and image retrieval approaches for broadcast news video

Yan, Rong; Hauptmann, Alexander G.

doi:10.1007/s10791-007-9031-y

Cited by 64 publications

(38 citation statements)

References 95 publications

“…It represents an area by roughness, directionality, repeatability and variability features over a certain spatial extent while color is a point property in an image [7]. Texture features are extracted by finding energy distribution in frequency domain by different techniques [39], [40], [41]. Gabor wavelet features are obtained using one such technique to retrieve and classify images and videos [42].…”

Section: Key Frame Features (mentioning)

confidence: 99%

CBVR and Classification of Video Database–Latest Trends, Methods, Effective Techniques, Problems and Challenges

Ansari, Vasishtha

2015

IJCA

The term Content Based Video Retrieval (CBVR) is increasingly used to describe the process of retrieving desired videos from a large collection on the basis of features extracted from the videos. The extracted features are used to index, classify and retrieve relevant videos while filtering out undesired ones. Videos can be represented by the audio, text, faces and objects in their frames. An individual video possesses unique motion features, color histograms, motion histograms, text features, audio features, and features extracted from the faces and objects in its frames. Videos that contain useful information and occupy significant database space remain under-utilized unless CBVR systems exist that can select relevant videos while filtering out undesired ones. Results have shown improved performance (higher precision and recall) when features suited to particular types of videos are used. Various combinations of these features can also be used to achieve the desired performance. This paper presents the complex and wide area of CBVR and CBVR systems in a comprehensive and simple way. Processes at the different stages of CBVR systems are described systematically. Types of features, their combinations, and the methods, techniques and algorithms for using them are also shown. Various querying methods, features such as GLCM and Gabor magnitude, similarity measures such as the Kullback-Leibler distance, and the relevance feedback method are discussed. The functioning of Support Vector Machines (SVMs), which are vital for automatic classification of videos, is also discussed.
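The abstract above names the Kullback-Leibler distance as one way to measure similarity between video features such as color histograms. The following is a minimal sketch of that idea, not the cited paper's implementation. The function names, the `eps` smoothing constant, and the choice of a symmetrized divergence for ranking are all assumptions made for illustration.

```python
import math


def normalize(hist, eps=1e-9):
    """Convert raw bin counts to a probability distribution.

    eps is an assumed smoothing constant that keeps empty bins
    from producing log(0) or division by zero.
    """
    smoothed = [h + eps for h in hist]
    total = sum(smoothed)
    return [s / total for s in smoothed]


def kl_divergence(p_hist, q_hist, eps=1e-9):
    """Kullback-Leibler divergence D(P || Q) between two histograms."""
    if len(p_hist) != len(q_hist):
        raise ValueError("histograms must have the same number of bins")
    p = normalize(p_hist, eps)
    q = normalize(q_hist, eps)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(p_hist, q_hist, eps=1e-9):
    """Symmetrized variant (an assumption): D(P || Q) + D(Q || P)."""
    return kl_divergence(p_hist, q_hist, eps) + kl_divergence(q_hist, p_hist, eps)


def rank_by_kl(query_hist, database):
    """Rank key frames by ascending symmetric KL distance to the query.

    database is a hypothetical dict mapping a frame name to its histogram.
    """
    return sorted(database, key=lambda name: symmetric_kl(query_hist, database[name]))
```

A smaller value means the two histograms are more alike, so the first name returned by `rank_by_kl` is the closest match to the query. Plain KL divergence is not symmetric, which is why the sketch sums both directions before ranking.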

“…As an approach, cross-modal associative learning has been applied to multimodal data retrieval although cross-modal learning is from cognitive science and neuroscience [6]. Snoek et al proposed concept-based video retrieval method [7] and Yan et al studied a multimodal retrieval approach including text and image for broadcast new video [8]. D. Li et al [9] suggested cross-modal association based factor analysis method as alternatives to Latent Semantic Indexing (LSI) and Canonical Correlation Analysis (CCA).…”

Section: Related Work (mentioning)

confidence: 99%

Layered Hypernetwork Models for Cross-Modal Associative Text and Image Keyword Generation in Multimodal Information Retrieval

Byounghee, Lee, et al.

2010

PRICAI 2010: Trends in Artificial Intelligence

Abstract. Conventional methods for multimodal data retrieval use text-tag based or cross-modal approaches such as tag-image co-occurrence and canonical correlation analysis. Since text and image features differ in granularity, however, approaches based on lower-order relationships between modalities may have limitations. Here, we propose a novel text and image keyword generation method based on cross-modal associative learning and inference with multimodal queries. We use a modified hypernetwork model, i.e. layered hypernetworks (LHNs), which consist of a first (lower) layer with two or more modality-dependent hypernetworks and a second (upper) layer with one modality-integrating hypernetwork. LHNs learn higher-order associative relationships between text and image modalities by training on an example set. After training, LHNs are used to extend multimodal queries by generating text and image keywords via cross-modal inference, i.e. text-to-image and image-to-text. The LHNs are evaluated on Korean magazine articles with images on women's fashion and lifestyle. Experimental results show that the proposed method generates vision-language cross-modal keywords with high accuracy. The results also show that multimodal queries improve the accuracy of keyword generation compared with uni-modal ones.

“…And cross-modal association learning has been applied to video data. Yan et al studied a text-image multimodal retrieval task on data of a broadcast new video [9] and Snoek et al suggested a concept-based video retrieval method [8]. Additionally, D. Li et al proposed a factor analysis method based on cross-modal association [10].…”

Section: Related Work (mentioning)

confidence: 99%

Visual Query Expansion via Incremental Hypernetwork Models of Image and Text

Heo, Kang, Zhang

2010

PRICAI 2010: Trends in Artificial Intelligence

Abstract. Humans can associate vision and language modalities and thus generate mental imagery, i.e. visual images, from linguistic input in an environment of unlimited inflowing information. Inspired by human memory, we separate a text-to-image retrieval task into two steps: 1) text-to-image conversion (generating visual queries for the second step) and 2) image-to-image retrieval. This separation is advantageous for visualizing inner representations, learning from incremental datasets, and using the results of content-based image retrieval. Here, we propose a visual query expansion method that simulates the capability of human associative memory. We use a hypernetwork model (HN) that combines visual words and linguistic words. HNs learn higher-order cross-modal associative relationships incrementally from a sequence of image-text pairs. An incremental HN generates images by assembling visual words based on linguistic cues, and we retrieve similar images with the generated visual query. The method is evaluated on 26 video clips of 'Thomas and Friends'. Experiments show a successive image retrieval rate of up to 98.1% with a single text cue. The method also shows potential to generate the visual query from several text cues simultaneously.
