Text annotation is a common task in machine learning applications that involves hand-labeling data sets. Although automatic and semi-automatic annotation of text data is a growing field, researchers need models that use resources as efficiently as possible for a learning task. The goal of this work was to learn faster with fewer resources. In this paper, the combination of active and transfer learning was examined with the purpose of developing an effective text categorization method. These two forms of learning have proven their efficiency and their capacity to train accurate models with substantially less training data. We considered three criteria for selecting training points: random selection, the uncertainty sampling criterion, and active transfer selection. Experimental evaluation was performed on five data sets from different domains. The findings of the experiments suggest that by combining active and transfer learning, the algorithm performs better with fewer labels than random selection of training points.

INDEX TERMS active learning, active transfer learning, text classification, transfer learning
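The uncertainty sampling criterion mentioned above can be sketched as follows — a minimal illustration, assuming a pool-based setting where the model exposes class-probability estimates. The entropy-based scoring and the helper name `select_uncertain` are our own illustrative choices, not the paper's implementation:

```python
import numpy as np

def select_uncertain(probs, k):
    """Pick the k pool points whose predicted class distribution
    has the highest entropy, i.e., where the model is least certain."""
    probs = np.asarray(probs, dtype=float)
    # Entropy of each row; a small epsilon avoids log(0).
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    # Indices of the k most uncertain points, most uncertain first.
    return np.argsort(entropy)[::-1][:k]

# Usage: three unlabeled points; the second (0.5/0.5) is the most uncertain.
probs = [[0.9, 0.1], [0.5, 0.5], [0.8, 0.2]]
print(select_uncertain(probs, 1))  # -> [1]
```

The selected points would then be labeled, added to the training set, and the model retrained — the standard active-learning loop.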
Images and text are types of content that are often used together to convey a message. The process of mapping images to text can provide very useful information and can be included in many applications, from the medical domain and applications for blind people to social networking. In this paper, we investigate an approach for mapping images to text using a Kernel Ridge Regression model. We considered two types of features: simple RGB pixel-value features and image features extracted with deep-learning approaches. We investigated several neural network architectures for image feature extraction: VGG16, Inception V3, ResNet50, and Xception. The experimental evaluation was performed on three data sets from different domains. The texts associated with the images represent objective descriptions for two of the three data sets and subjective descriptions for the other. The experimental results show that the more complex deep-learning approaches used for feature extraction perform better than the simple RGB pixel-value approach. Moreover, the ResNet50 architecture performs best among the four deep network architectures considered for extracting image features: the model error obtained with ResNet50 is approximately 0.30 lower than that obtained with the other architectures. We extracted natural-language descriptors of images and compared the original and generated descriptive words. Furthermore, we investigated whether performance differs with the type of text associated with the images: subjective or objective. The proposed model generated descriptions more similar to the original ones for the data sets containing objective descriptions, whose vocabulary is simpler, larger, and clearer.
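The Kernel Ridge Regression model used here has a closed-form solution, sketched below in a minimal, self-contained form with an RBF kernel. The hyperparameter values, the kernel choice, and the function names are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the row vectors of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] \
         - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def krr_fit(X, y, lam=1e-4, gamma=0.5):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, gamma=0.5):
    """A prediction is a kernel-weighted combination of training targets."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Usage: fit a toy 1-D regression problem; for small lam the model
# nearly interpolates the training targets.
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 4.0])
alpha = krr_fit(X, y)
pred = krr_predict(X, alpha, X)
```

In the image-to-text setting described above, the rows of X would be image feature vectors (e.g., ResNet50 activations) and y the target text-embedding values; the regularization term lam controls the trade-off between fitting the training pairs and keeping the dual coefficients small.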