2020
DOI: 10.1109/tpami.2019.2913857

Learning More Universal Representations for Transfer-Learning

Abstract: A representation is said to be universal if it encodes any element of the visual world (e.g., objects, scenes) in any configuration (e.g., scale, context). While not expecting pure universal representations, the goal in the literature is to improve the level of universality, starting from a representation that already has a certain level. To do so, the state of the art consists in learning CNN-based representations on a diversified training problem (e.g., ImageNet modified by adding annotated data). While it effectively increases…
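
The transfer-learning setting the abstract refers to can be illustrated with a minimal sketch (not the authors' code; the backbone, pretrained weights, and target-task size below are illustrative assumptions): a representation learned on a source problem is frozen and reused as a feature extractor for a new target task, with only a small head trained on top.

```python
import torch
import torch.nn as nn
from torchvision import models

# Representation learned on the source problem (here: standard ImageNet weights).
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()          # keep the 2048-d penultimate features
for p in backbone.parameters():
    p.requires_grad = False          # freeze: the representation is transferred as-is

num_target_classes = 67              # hypothetical target task (e.g., scene recognition)
classifier = nn.Linear(2048, num_target_classes)

def target_logits(images: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        feats = backbone(images)     # transferred representation
    return classifier(feats)         # only the target head is trained
```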

Cited by 45 publications (42 citation statements)
References 45 publications
“…The best outcomes among various combinations of our single-layer depth augmented AlexNet and VGG-16 evaluated by our layer-wise fine-tuning scheme are shown. For other approaches, the performance gap between our implementation and that reported by [5], [25], [26], [27], [15], [28] is due to different target sets, train-test splits, network architectures, and iterations. Note that we have used similar hyper-parameters, iterations, and train-test splits for all approaches in Tables 3 and 4 to maintain a fair comparison.…”
Section: Comparison With Contemporary Transfer Learning Workmentioning
confidence: 78%
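
The excerpt above refers to a "layer-wise fine-tuning scheme" without spelling it out. As a rough illustration only, one common variant unfreezes the last few convolutional layers of a pretrained network while keeping earlier layers fixed; the VGG-16 weights, the value of k, and the helper name below are assumptions, not details taken from the cited work.

```python
import torch.nn as nn
from torchvision import models

def build_partially_finetuned_vgg16(num_classes: int, k_trainable: int = 4) -> nn.Module:
    # Pretrained VGG-16 as the starting representation.
    model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
    # Freeze everything first.
    for p in model.parameters():
        p.requires_grad = False
    # Unfreeze only the last k_trainable convolutional layers.
    conv_layers = [m for m in model.features if isinstance(m, nn.Conv2d)]
    for layer in conv_layers[-k_trainable:]:
        for p in layer.parameters():
            p.requires_grad = True
    # New classifier head for the target task (trainable by default).
    model.classifier[6] = nn.Linear(4096, num_classes)
    return model

# Usage: model = build_partially_finetuned_vgg16(num_classes=67, k_trainable=4)
```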
“…iCaRL halves this score while our best configurations with DFE based on 100 and 1000 classes lose only 22 and 12 points respectively. The gap could probably be further reduced if the feature extractors were more universal [10,11]. This could, for instance, be achieved if DeeSIL's initial training were done with an even larger number of classes.…”
Section: Evaluation and Discussionmentioning
confidence: 99%
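
For context, the setup this excerpt describes (a frozen deep feature extractor, DFE, with cheap classifiers added as new classes arrive) can be sketched as follows. This is an assumed, simplified reading of a DeeSIL-style pipeline, not code from the cited paper; the class and method names are hypothetical.

```python
import numpy as np
from sklearn.svm import LinearSVC

class IncrementalLinearClassifiers:
    """One-vs-rest linear classifiers trained on features from a frozen DFE."""

    def __init__(self):
        self.per_class = {}  # class id -> trained linear SVM

    def add_class(self, cls, feats_pos, feats_neg):
        # feats_* are DFE features; the DFE itself is never retrained.
        X = np.vstack([feats_pos, feats_neg])
        y = np.concatenate([np.ones(len(feats_pos)), np.zeros(len(feats_neg))])
        self.per_class[cls] = LinearSVC(C=1.0).fit(X, y)

    def predict(self, feats):
        classes = list(self.per_class)
        scores = np.stack(
            [self.per_class[c].decision_function(feats) for c in classes], axis=1
        )
        return [classes[i] for i in scores.argmax(axis=1)]
```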
“…One thousand classes are selected to form a diversified subset of ImageNet and thus increase universality (i.e., optimize their transferability toward new tasks) [10,11]. FL1000: train with a more challenging dataset, which is obtained from weakly annotated Flickr group data and is visually more distant from the test set.…”
Section: Methodsmentioning
confidence: 99%
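
A schematic of the source-training step this excerpt mentions (selecting 1,000 classes and training a feature extractor on them) might look like the sketch below. The random selection is only a stand-in for whatever diversity-driven criterion the cited work actually uses, and the ResNet-18 backbone is an arbitrary choice.

```python
import random
import torch.nn as nn
from torchvision import models

def build_source_model(all_class_ids, num_selected=1000, seed=0):
    # Placeholder for a real diversity-driven class selection.
    rng = random.Random(seed)
    selected = rng.sample(list(all_class_ids), num_selected)
    # Feature extractor trained from scratch on the selected source classes.
    model = models.resnet18(weights=None)
    model.fc = nn.Linear(model.fc.in_features, num_selected)
    return model, selected
```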
“…Existing unsupervised methods also do not use feature projection. Some other works have also been done for semi-supervised representation learning (Kevin Clark, 2018) and transfer learning (Tamaazousti et al., 2018). Jason Phang (2019) also proposed to use data-rich intermediate supervised tasks for pre-training to help produce better representations for the end task.…”
Section: Related Workmentioning
confidence: 99%