2012
DOI: 10.1002/int.21567

Image Classification Based on the Combination of Text Features and Visual Features

Abstract: With more and more text‐image co‐occurrence data becoming available on the Web, we are interested in how text, especially the Chinese context around images, can aid image classification. The goal is to construct a classification system for images, and we used the context of the images to improve the classification system. First, we extracted three kinds of features, including global visual features, local visual features, and text features, using both the image content and context. Then, we tried various feature comb…

Citations: cited by 17 publications (9 citation statements)
References: 11 publications
“…Few‐shot learning: Few‐shot learning, based on meta‐learning, typically uses episodic training strategies 31,32. In each episode, the model based on meta‐learning is trained on a meta‐task, which can be viewed as a classification task 33,34. During training, the tasks were randomly selected from the training data set in the episodes.…”
Section: Related Work (mentioning)
confidence: 99%
“…31,32 In each episode, the model based on meta-learning is trained on a meta-task, which can be viewed as a classification task. 33,34 During training, the tasks were randomly selected from the training data set in the episodes. During the model evaluation, the tasks were selected from a separate test data set consisting of novel classes not included in the training data set.…”
Section: Introduction (mentioning)
confidence: 99%
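
The episodic sampling described in the statement above can be sketched as follows. This is a minimal illustration, not the citing paper's implementation: the function name `sample_episode` and the parameters `n_way`, `k_shot`, and `n_query` are hypothetical, chosen to match common few-shot terminology. Each episode draws a few classes at random, then splits a handful of examples per class into a support set and a query set.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Sample one N-way K-shot episode (hypothetical helper).

    labels: list of class labels, one per example in the data set.
    Returns (support, query), each a list of (example_index, label) pairs.
    """
    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    # Only classes with enough examples can appear in an episode.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query]
    classes = rng.sample(eligible, n_way)

    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += [(i, c) for i in picked[:k_shot]]
        query += [(i, c) for i in picked[k_shot:]]
    return support, query


# Toy data set: 5 classes with 10 examples each (assumed for illustration).
labels = [i // 10 for i in range(50)]
support, query = sample_episode(labels, n_way=3, k_shot=2, n_query=4, rng=random.Random(0))
```

For evaluation on novel classes, the same function would be called on a separate test label list whose classes do not occur in the training list.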
“…A lot of literature exists that combines audio and video data [2][3][4]. Thus a system is proposed combining text (tweets) and video features [5] to predict user's emotion as this combination gave a better result than any other [1]. Since Twitter is the most popular micro-blogging site which is regularly and frequently used to express sentiments it was our ideal choice as a source for text, video data is taken in real time as user answers the beck 9 questionnaire.…”
Section: Introduction (mentioning)
confidence: 99%
“…Here, a noise filtering algorithm is necessarily developed to remove the irrelevant web texts for the BoW-based model. Since web resources have great reliability diversity, it may not be an optimal practice to allocate fixed weights to the visual feature-based and text feature-based classifiers as in [9][10][11][105]. In this chapter, an adaptive fusion algorithm is developed for the integration of the visual feature-based and web textual feature-based classification results.…”
Section: Motivations (mentioning)
confidence: 99%
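
The contrast drawn above between fixed-weight and adaptive fusion of classifier outputs can be illustrated with a small sketch. The adaptive rule here is an assumption made purely for illustration (each classifier is weighted by the margin between its top two class probabilities); it is not the algorithm of the citing chapter, which the excerpt does not specify. The names `fuse_fixed`, `fuse_adaptive`, and `margin` are hypothetical.

```python
def fuse_fixed(p_visual, p_text, w_visual=0.5):
    """Late fusion with a fixed weight on the visual classifier."""
    return [w_visual * v + (1.0 - w_visual) * t for v, t in zip(p_visual, p_text)]


def margin(probs):
    """Gap between the two largest class probabilities (needs >= 2 classes)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def fuse_adaptive(p_visual, p_text):
    """Per-sample weighting by confidence margin (illustrative assumption)."""
    m_v, m_t = margin(p_visual), margin(p_text)
    total = m_v + m_t
    # Fall back to equal weights when neither classifier is confident.
    w_visual = 0.5 if total == 0 else m_v / total
    return fuse_fixed(p_visual, p_text, w_visual)


# Toy class-probability vectors (made up for illustration).
p_visual = [0.40, 0.35, 0.25]   # visual classifier is unsure
p_text = [0.10, 0.80, 0.10]     # text classifier is confident in class 1
fixed = fuse_fixed(p_visual, p_text)
adaptive = fuse_adaptive(p_visual, p_text)
```

In this toy case the adaptive rule shifts most of the weight to the text classifier, because its prediction is far more decisive for this sample; with noisy web text the weight would shift the other way.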
“…Different from homogeneous web data-aided approaches, heterogeneous web data-aided frameworks [9][10][11] have been developed to explore different modality data and facilitate image classification, such as image tags or descriptions in the form of short text. Compared to homogeneous frameworks, heterogeneous frameworks not only use the extra images that have the same feature representation for training, but also investigate different feature representations for the web text information.…”
mentioning
confidence: 99%