Comparison of Visual Features and Fusion Techniques in Automatic Detection of Concepts from News Video

Rautiainen, Mika; Seppänen, T.

doi:10.1109/icme.2005.1521577

Cited by 8 publications

(7 citation statements)

References 8 publications


“…Westerveld and van Gemert adopted a similar approach, although they applied dimensionality reduction techniques to the heterogeneous vectors for compressing the multimodal information (Westerveld 2000; van Gemert 2003). Similar approaches are reported in (Rautiainen et al 2004; Rautiainen and Seppänen 2005; Snoek et al 2005). Compared to LF, EF can be more efficient as a single retrieval stage is performed, however, the dimensionality in which EF methods may work can be huge.…”

Section: Related Work (mentioning)

confidence: 50%

“…Usually a single method is used per modality (Peinado et al 2005; Izquierdo-Beviá et al 2005; Besancon and Millet 2006; Chang and Chen 2006; Rautiainen et al 2004), although the use of multiple and heterogeneous techniques has been also studied (Escalante et al 2008b). The EF formulation, on the other hand, consists of merging the vectors corresponding to textual and visual information beforehand and then using a straight retrieval technique (Rautiainen et al 2004; Rautiainen and Seppänen 2005; Snoek et al 2005; Westerveld 2000; van Gemert 2003). In its basic form, EF consists of concatenating the vectors of textual and visual features.…”

Section: Related Work (mentioning)

confidence: 96%

“…Feature vectors have been usually combined through late fusion (LF) or early fusion (EF) techniques. The LF approach to multimedia image retrieval consists of running several unimodal retrieval methods (either textual or visual) and combining their outputs to obtain a single list of ranked images (documents) per query (Westerveld 2004; Escalante et al 2008b; Peinado et al 2005; Izquierdo-Beviá et al 2005; Besancon and Millet 2006; Chang and Chen 2006; Rautiainen and Seppänen 2005; Rautiainen et al 2004; Snoek et al 2005). Because of its simplicity and its effectiveness, LF is one of the most used techniques for information fusion in general (Shu and Taska 2005), although a disadvantage of this method is that it may be inefficient as several retrieval methods must be run for each query.…”

Section: Related Work (mentioning)

confidence: 99%
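
The early fusion (EF) and late fusion (LF) schemes described in the citation statements above can be contrasted with a minimal Python sketch. It is an illustration only, not the method of any cited paper: the cosine similarity measure, the weighted-sum score combination for LF, the `w_text` parameter, and the toy vectors are all assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def early_fusion_rank(query, docs):
    """EF: concatenate text and visual vectors first, then run one retrieval stage."""
    q = query[0] + query[1]  # list concatenation of the two modalities
    scores = {name: cosine(q, t + v) for name, (t, v) in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)

def late_fusion_rank(query, docs, w_text=0.5):
    """LF: run one retrieval per modality, then combine the per-modality scores.

    The weighted sum is an assumed combination rule; other rules
    (e.g. rank-based voting) are equally possible.
    """
    scores = {}
    for name, (t, v) in docs.items():
        s_text = cosine(query[0], t)    # textual retrieval stage
        s_visual = cosine(query[1], v)  # visual retrieval stage
        scores[name] = w_text * s_text + (1.0 - w_text) * s_visual
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: each item is (text_vector, visual_vector).
docs = {
    "img_a": ([1.0, 0.0], [0.9, 0.1]),
    "img_b": ([0.0, 1.0], [0.2, 0.8]),
}
query = ([1.0, 0.0], [1.0, 0.0])

ef_ranking = early_fusion_rank(query, docs)
lf_ranking = late_fusion_rank(query, docs)
```

In this sketch EF scores each item once in a four-dimensional space, whereas LF scores each item twice in two-dimensional spaces. That mirrors the trade-off noted above: EF needs a single retrieval stage but works in a higher-dimensional space, while LF must run several retrieval methods per query.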


Multimodal indexing based on semantic cohesion for image retrieval

Montes

Sucar

2011

Inf Retrieval


This paper introduces two novel strategies for representing multimodal images with application to multimedia image retrieval. We consider images that are composed of both text and labels: while text describes the image content at a very high semantic level (e.g., making reference to places, dates or events), labels provide a mid-level description of the image (i.e., in terms of the objects that can be seen in the image). Accordingly, the main assumption of this work is that by combining information from text and labels we can develop very effective retrieval methods. We study standard information fusion techniques for combining both sources of information. However, whereas the performance of such techniques is highly competitive, they cannot capture effectively the content of images. Therefore, we propose two novel representations for multimodal images that attempt to exploit the semantic cohesion among terms from different modalities. Such representations are based on distributional term representations widely used in computational linguistics. Under the considered representations the content of an image is modeled by a distribution of co-occurrences over terms or of occurrences over other images, in such a way that the representation can be considered an expansion of the multimodal terms in the image. We report experimental results using the SAIAPR TC12 benchmark on two sets of topics used in ImageCLEF competitions with manually and automatically generated labels. Experimental results show that the proposed representations outperform significantly both, standard multimodal techniques and unimodal methods. Results on manually assigned labels provide an upper bound in the retrieval performance that can be obtained, whereas results with automatically generated labels are encouraging. The novel representations are able to capture more effectively the content of multimodal images. 
We emphasize that although we have applied our representations to multimedia image retrieval the same formulation can be adopted for modeling other multimodal documents (e.g., videos).
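
The idea of modeling an image by a distribution of co-occurrences over terms can be sketched as follows. This is a simplified illustration under stated assumptions, not the formulation used in the paper: the raw co-occurrence counts, the sum-then-normalize step, and the names `cooccurrence_table` and `represent` are hypothetical choices, and the toy collection is invented for the example.

```python
from collections import Counter

def cooccurrence_table(collection):
    """Map each term to a Counter of the terms it shares a document with.

    `collection` is a list of term lists; text words and labels are treated
    alike. A term also co-occurs with itself (assumed convention).
    """
    table = {}
    for terms in collection:
        unique = set(terms)
        for t in unique:
            row = table.setdefault(t, Counter())
            for u in unique:
                row[u] += 1
    return table

def represent(terms, table):
    """Represent an image as a normalized sum of its terms' co-occurrence rows."""
    total = Counter()
    for t in terms:
        total.update(table.get(t, {}))  # unseen terms contribute nothing
    norm = sum(total.values())
    return {u: c / norm for u, c in total.items()} if norm else {}

# Hypothetical toy collection of multimodal term lists.
collection = [
    ["beach", "sea", "sky"],
    ["sea", "boat"],
    ["sky", "cloud"],
]
table = cooccurrence_table(collection)
rep = represent(["boat"], table)
```

Here an image annotated only with "boat" also receives weight on "sea", because the two terms co-occur in the collection. This is the sense in which such a representation acts as an expansion of the terms in the image.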


“…We notice that the coder neural-network obtains superior scores to those obtained by the other systems (1,2,3,4) for all semantic concepts. This supports further the importance of feature fusion.…”

mentioning

confidence: 77%

“…Many application domains making use of video data are available: Security, digital library, interactive TV, etc... Many of those rely on video content analysis and in particular video shot classification [1,2].…”

Section: Introduction (mentioning)

confidence: 99%

Low-level feature fusion models for soccer scene classification

Benmokhtar

Huet

Berrani

2008

2008 IEEE International Conference on Multimedia and Expo


This paper presents an automatic semantic concept extraction method which employs low-level visual feature fusion. Both static and dynamic feature fusion approaches are studied and evaluated. The main contributions of this paper are: a novel dynamic feature fusion approach, inspired by coding, that creates compact yet rich signatures; and a statistical study of descriptors with and without fusion. To validate and evaluate our approach, we have conducted a set of experiments on the classification of soccer video shots. These experiments show, in particular, that the feature fusion step of our system increases the classification rate by 17% compared to a system without feature fusion.


Fusion, Rank-Level

Pathak

2009

Encyclopedia of Biometrics
