2016
DOI: 10.48550/arxiv.1604.04573
Preprint

CNN-RNN: A Unified Framework for Multi-label Image Classification

Cited by 24 publications (38 citation statements)
References 16 publications

“…To achieve this, a series of works introduced graphical models, such as Conditional Random Fields [8], Dependency Networks [10], or co-occurrence matrices [29], to capture pairwise label correlations. Recently, Wang et al. [24] formulated a CNN-RNN framework that implicitly exploited semantic redundancy and co-occurrence dependency to facilitate effective multi-label classification. Some works [33,2] further took advantage of proposal generation and visual attention mechanisms to search for locally discriminative regions, and of LSTMs [13] to explicitly model label dependencies.…”
Section: Related Work
confidence: 99%
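The co-occurrence matrix mentioned in this passage can be built directly from binary label annotations. A minimal sketch (function names and the conditional-probability normalization are illustrative, not taken from any cited paper):

```python
import numpy as np

def label_cooccurrence(labels):
    """Count pairwise label co-occurrences in a binary (N images, C labels) matrix.

    M[i, j] = number of images annotated with both label i and label j;
    the diagonal M[i, i] is the per-label frequency.
    """
    labels = np.asarray(labels, dtype=float)
    return labels.T @ labels

def conditional_probs(cooc, eps=1e-12):
    """Normalize counts to P(label j | label i) = M[i, j] / M[i, i]."""
    diag = np.maximum(np.diag(cooc), eps)
    return cooc / diag[:, None]
```

For example, with annotations `[[1,1,0],[1,0,1],[1,1,0]]`, label 0 appears 3 times and co-occurs with label 1 twice, giving P(1|0) = 2/3; such pairwise statistics are what graphical-model approaches exploit.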
“…Since the ground-truth annotations of the test set are unavailable, our method and all existing competitors are trained on the training set and evaluated on the validation set. For the OP, OR, OF1 and CP, CR, CF1 metrics under the top-3 constraint, we follow existing methods [24] and exclude labels whose probabilities fall below a threshold (0.5 in our experiments).…”
Section: Comparison on Microsoft COCO
confidence: 99%
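The evaluation protocol described in this statement (top-3 predictions per image, filtered by a probability threshold, then overall and per-class precision/recall/F1) can be sketched as follows. This is an illustrative reimplementation under stated assumptions (function name, tie-breaking via `argsort`, and the per-class averaging over all classes are my choices), not the evaluation code of [24]:

```python
import numpy as np

def multilabel_metrics(scores, targets, top_k=3, threshold=0.5):
    """OP/OR/OF1 (overall) and CP/CR/CF1 (per-class) multi-label metrics.

    scores:  (N, C) array of predicted label probabilities
    targets: (N, C) binary ground-truth label matrix
    A label counts as predicted if it is among the image's top_k scores
    AND its probability is at least `threshold`.
    """
    scores = np.asarray(scores, dtype=float)
    targets = np.asarray(targets, dtype=int)
    n, c = scores.shape

    preds = np.zeros((n, c), dtype=int)
    for i in range(n):
        top = np.argsort(scores[i])[::-1][:top_k]      # top-k label indices
        keep = top[scores[i, top] >= threshold]        # apply threshold filter
        preds[i, keep] = 1

    tp = (preds * targets).sum(axis=0).astype(float)   # correct predictions per class
    npred = preds.sum(axis=0).astype(float)            # predictions per class
    ngt = targets.sum(axis=0).astype(float)            # ground-truth labels per class

    # Overall metrics: pool counts across all classes.
    op = tp.sum() / max(npred.sum(), 1)
    orec = tp.sum() / max(ngt.sum(), 1)
    of1 = 2 * op * orec / max(op + orec, 1e-12)

    # Per-class metrics: average each class's ratio (empty denominators -> 0).
    cp = float(np.mean(tp / np.maximum(npred, 1)))
    cr = float(np.mean(tp / np.maximum(ngt, 1)))
    cf1 = 2 * cp * cr / max(cp + cr, 1e-12)

    return dict(OP=op, OR=orec, OF1=of1, CP=cp, CR=cr, CF1=cf1)
```

The threshold step matters under the top-k constraint: without it, every image contributes exactly k predictions, which inflates false positives for images with fewer than k true labels.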
“…However, the model must allow appropriate label interactions to yield beneficial results. Multi-task learning is frequently applied to tag images with multiple labels (Wang et al., 2016; Wei et al., 2015). Multi-task sequence learning (Sutton et al., 2007; Collobert et al., 2011) is the task of jointly tagging sequence values with multiple label categories.…”
Section: Literature Review
confidence: 99%
“…Kiros et al. used deep representations of images for automatic image annotation [52], [72]. While convolutional neural networks such as [53] and [91] were designed to identify a single object label per image, deep and recurrent neural networks have been employed for the multi-label ranking and classification problem [33], [111]. Training a deep neural network requires a large training dataset and is computationally extremely expensive.…”
Section: B. Deep Convolutional Neural Network
confidence: 99%