In online video systems, viewer demographic information (gender, age, etc.) is of huge commercial value for delivering targeted advertising and video recommendations, but it is generally not available directly. This paper targets inferring viewers' gender from implicit watching history in large-scale online video systems. To tackle the sparsity problem without filtering out any cold users or videos, we not only introduce video tags as features, but also use an efficient Chinese word segmentation method to extract hot keywords from video titles as features. Moreover, because users' viewing behavior is lognormally distributed, we apply a logarithmic transformation to the inference matrices and then find key features via principal component analysis (PCA). We then treat gender inference as a classification problem and define modified evaluation metrics adapted to the imbalanced classification problem. We compare a set of classifiers including Class prior, EM, SVM, Logistic regression, Partially supervised soft-label and belief-based mixture, and find that Logistic regression performs best. The inference results show that our algorithms obtain high F1 values for all classes. The highest value on the PPTV dataset reaches nearly 0.75, and inference based on keywords results in a 14.63% increase in F1 compared with the ratings of MovieLens.
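As a rough illustration of two steps mentioned in the abstract above, the following minimal sketch applies a logarithmic transformation to a small matrix of view counts and computes F1 separately for each class. It is not the paper's implementation: the use of log(1 + x), the function names `log_transform`, `per_class_f1` and `macro_f1`, and the toy labels are all assumptions made for illustration, and the standard per-class and macro-averaged F1 stand in for the paper's modified metrics, which the abstract does not specify.

```python
import math


def log_transform(counts):
    """Apply log(1 + x) element-wise (an assumed variant; it keeps zero counts at zero)."""
    return [[math.log1p(x) for x in row] for row in counts]


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, with F1 computed separately for every class."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 values, so each class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy, made-up labels: 8 viewers of class "M" and 2 of class "F".
y_true = ["M"] * 8 + ["F"] * 2
# A classifier that always predicts the majority class.
y_pred = ["M"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1_by_class = per_class_f1(y_true, y_pred)
```

On this toy data the majority-class predictor is correct for 8 of 10 viewers, yet its F1 for the minority class is zero, which is why reporting F1 for all classes is more informative than accuracy when classes are imbalanced.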
With the surging demand for high-quality mobile video services and the unabated development of new network technology, including fog computing, there is a need for a generalized quality of user experience (QoE) model that can provide insight for various network optimization designs. A good QoE, especially when measured as engagement, is an important optimization goal for investors and advertisers. Therefore, many works have focused on understanding how various factors, especially quality of service (QoS) factors, impact user engagement. However, the divergence of user interest is usually ignored or deliberately decoupled from QoS and/or other objective factors. With the increasing trend towards personalized applications, it is both necessary and feasible to consider user interest, so as to satisfy the aesthetic and personal needs of users when optimizing user engagement. We first propose an Extraction-Inference (E-I) algorithm to estimate user interest from easily obtained user behaviors. Based on our empirical analysis of a large-scale dataset, we then build a QoS and user Interest based Engagement (QI-E) regression model. Through experiments on our dataset, we demonstrate that the proposed model improves accuracy by 9.99% over the baseline model, which considers only QoS factors. The proposed model has potential for designing QoE-oriented scheduling strategies in various network scenarios, especially in the fog computing context.