For almost every online service, understanding the patterns, differences and trends revealed by age demographic analysis is fundamental, for example when detecting malicious activity such as identity theft, violations of community guidelines and fake profiles. On platforms such as Facebook, Twitter and Yahoo! Answers, user demographics affect both revenue and user experience: they help ensure that the needs of each cohort are met by personalizing and contextualizing content. Although technology has become more accessible and ever more prevalent in both personal and professional life, older people continue to trail Gen Z and Millennials in its adoption. This lag leads to an under-representation that harms both demographic analysis and supervised machine learning models. To that end, this paper makes a pioneering attempt at examining this and other major challenges facing three distinct modalities (i.e., texts, images and metadata) on community question answering (cQA) platforms. For textual inputs, we propose an age-batched greedy curriculum learning (AGCL) approach to lessen the effects of their inherent class imbalance. When built on top of FastText shallow neural networks, AGCL achieved an increase of ca. 4% in macro-F1-score with respect to baseline systems (i.e., off-the-shelf deep neural networks). With regard to metadata, our experiments show that random forest classifiers perform significantly better when individuals close to generational borders are excluded (up to 20% higher accuracy). By experimenting with neural network-based visual classifiers, we discovered that images are the most challenging modality for age prediction. In fact, it is hard to connect profile pictures with age cohorts by visual inspection, and their group distributions differ considerably from those of metadata and textual inputs. All in all, we envisage that our findings will be highly relevant as guidelines for constructing assorted multimodal supervised models for automatic age recognition across cQA platforms.
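The exclusion of individuals close to generational borders can be illustrated with a minimal sketch. The cohort boundaries, the `margin` parameter and the function names below are assumptions made for illustration only; the abstract does not state which birth-year ranges or border widths the authors used.

```python
from typing import List, Optional, Tuple

# Hypothetical cohort boundaries (inclusive birth years). These are an
# assumption for illustration, not the definitions used in the paper.
COHORTS = [
    ("Baby Boomers", 1946, 1964),
    ("Gen X", 1965, 1980),
    ("Millennials", 1981, 1996),
    ("Gen Z", 1997, 2012),
]


def cohort_of(birth_year: int, margin: int = 0) -> Optional[str]:
    """Return the cohort label for a birth year.

    Returns None when the year falls outside every cohort, or when it lies
    fewer than `margin` years from a border shared with a neighbouring cohort.
    """
    for i, (label, start, end) in enumerate(COHORTS):
        if start <= birth_year <= end:
            # Only borders between two cohorts count; the outer edges do not.
            near_lower = i > 0 and birth_year - start < margin
            near_upper = i < len(COHORTS) - 1 and end - birth_year < margin
            return None if (near_lower or near_upper) else label
    return None


def filter_border_cases(
    users: List[Tuple[str, int]], margin: int
) -> List[Tuple[str, str]]:
    """Keep only users whose cohort is unambiguous under the given margin."""
    kept = []
    for user_id, birth_year in users:
        label = cohort_of(birth_year, margin)
        if label is not None:
            kept.append((user_id, label))
    return kept


users = [("u1", 1950), ("u2", 1964), ("u3", 1981),
         ("u4", 1990), ("u5", 1997), ("u6", 2005)]
clean = filter_border_cases(users, margin=2)
```

With `margin=2`, the users born in 1964, 1981 and 1997 are dropped because they sit on a border between two cohorts, so a classifier trained on `clean` sees only clearly separated groups.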
Automatic recognition of visual objects using deep learning has been successfully applied in multiple areas. However, deep learning techniques require a large amount of labeled data, which is usually expensive to obtain. An alternative is to use semi-supervised models, such as co-training, where multiple complementary views are combined using a small amount of labeled data. A simple way to associate views with visual objects is to apply a degree of rotation or a type of filter. In this work, we propose a co-training model for visual object recognition using deep neural networks, in which layers of self-supervised neural networks are added as intermediate inputs to the views and the views are diversified through cross-entropy regularization of their outputs. Since the model merges the concepts of co-training and self-supervised learning by considering the differentiation of outputs, we call it Differential Self-Supervised Co-Training (DSSCo-Training). This paper presents experiments applying the DSSCo-Training model to well-known image datasets such as MNIST, CIFAR-100, and SVHN. The results indicate that the proposed model is competitive with state-of-the-art models and shows an average relative improvement of 5% in accuracy for several datasets, despite its greater simplicity with respect to more recent approaches.
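The idea of deriving views by rotation can be sketched as follows. This is only an illustration of view construction under assumed choices (four views at multiples of 90 degrees, images stored as nested lists, hypothetical function names); it does not reproduce the DSSCo-Training architecture or its regularization term.

```python
from typing import Dict, List

Image = List[List[int]]  # a grayscale image as rows of pixel values


def rotate90(image: Image) -> Image:
    """Rotate a 2D grid 90 degrees clockwise."""
    # Reversing the row order and then transposing gives a clockwise turn.
    return [list(row) for row in zip(*image[::-1])]


def rotation_views(image: Image) -> Dict[int, Image]:
    """Build one view per rotation angle (0, 90, 180, 270 degrees)."""
    views = {0: [list(row) for row in image]}  # copy of the original
    current = views[0]
    for angle in (90, 180, 270):
        current = rotate90(current)
        views[angle] = current
    return views


image = [[1, 2],
         [3, 4]]
views = rotation_views(image)
```

In a co-training setup, each entry of `views` would be fed to a separate classifier, so that the classifiers see complementary versions of the same object.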