supervised setting when only little labeled data is available. Specifically, we investigate web text-aided one-shot learning, which identifies unlabeled data from novel classes based on a single observation using an adaptive attention mechanism.

This thesis is organized as follows. Chapter 1 introduces the motivation behind web resource-aided image classification. Chapter 2 reviews related work in this field, including image representation learning, text representation learning, and multimodal fusion learning. Chapter 3 investigates decision-level data fusion for web-aided image classification. An adaptive combiner for two separate bimodal classifiers is developed at the decision level. This adaptive fusion algorithm is inspired by the multisensory integration mechanism of humans, and adaptability is achieved by reliability-dependent weighting of the different sensory modalities. In Chapter 4, a novel text model, namely the semantic matching neural network (SMNN), is proposed, in which semantic matching is quantified by cosine similarity between the embedded text input and task-specific semantic filters. The SMNN is capable of learning semantic features from the text associated with web images, and the resulting text features offer improved reliability and applicability compared to text features obtained by other methods. The SMNN text features and convolutional neural network visual features are then jointly learned in a shared representation, which aims to capture the correlations between the two modalities at the feature level. Improving upon the task-specific filters of the SMNN, Chapter 5 presents a novel semantic CNN (s-CNN) model for high-level text representation learning that encodes semantic correlation based on task-generic semantic filters. However, to achieve better applicability and generalization across tasks, the s-CNN model inevitably introduces surplus semantic filters, which may lead to semantic overlap and feature redundancy.
To address this issue, the s-CNN Clustered (s-CNNC) model, which uses filter clusters instead of individual filters, is presented. Interacting with image CNN models, the s-CNNC model can further boost image classification under a multimodal framework that can be trained end-to-end. Chapter 6 develops an adaptive encoder-decoder attention network that uses web text to aid one-shot image classification. Without any ground-truth semantic clues, e.g., class tag information, our model is able to extract useful information from web-sourced data instead. To address the noisy nature of web text, an adaptive mechanism is introduced to determine when to attend to text-inferred visual features and when to rely on the original visual features. Finally, Chapter 7 summarizes my PhD work and discusses prospects for future research.
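To make the semantic matching idea of Chapter 4 concrete, the core SMNN scoring step can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the embedding dimension, the toy filter values, and the names `semantic_match` and `cosine_similarity` are all inventions of this sketch.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two vectors (small epsilon avoids division by zero)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def semantic_match(text_embedding, semantic_filters):
    """Score an embedded text against each semantic filter.

    Returns one cosine-similarity score per filter; the score vector
    serves as the semantic text feature.
    """
    return np.array([cosine_similarity(text_embedding, f) for f in semantic_filters])

# Toy example: a 4-d text embedding scored against 3 hypothetical semantic filters.
text = np.array([0.2, 0.9, 0.1, 0.0])
filters = np.array([
    [0.0, 1.0, 0.0, 0.0],  # filter most aligned with the embedding
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])
features = semantic_match(text, filters)
```

Each score acts as one dimension of the semantic text feature, so the number of filters determines the feature dimensionality, which is why surplus filters translate directly into redundant feature dimensions.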
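The adaptive mechanism of Chapter 6, deciding when to trust noisy text-inferred visual features, can be caricatured as a learned gate that forms a convex combination of the two feature vectors. This is a hedged sketch rather than the thesis architecture: `adaptive_fuse`, `gate_w`, and `gate_b` are hypothetical stand-ins for learned parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_fuse(visual_feat, text_inferred_feat, gate_w, gate_b):
    """Gate between the original visual feature and the text-inferred one.

    A scalar gate g in (0, 1), computed from both features, decides how much
    to trust the (possibly noisy) text-inferred feature.
    """
    g = sigmoid(np.dot(gate_w, np.concatenate([visual_feat, text_inferred_feat])) + gate_b)
    return g * text_inferred_feat + (1.0 - g) * visual_feat

# Toy usage with random stand-ins for features and gate parameters.
rng = np.random.default_rng(0)
v = rng.normal(size=8)    # original visual feature
t = rng.normal(size=8)    # text-inferred visual feature
w = rng.normal(size=16)   # stand-in for learned gate weights
fused = adaptive_fuse(v, t, w, 0.0)
```

Because the gate produces a convex combination, each fused component lies between the corresponding components of the two inputs: reliable text pushes g toward 1, while noisy text lets the model fall back on the original visual feature.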