Yanru Qu scite author profile

Abstract-Predicting user responses, such as clicks and conversions, is of great importance and has found its usage in many Web applications including recommender systems, web search and online advertising. The data in those applications is mostly categorical and contains multiple fields; a typical representation is to transform it into a high-dimensional sparse binary feature representation via one-hot encoding. Facing with the extreme sparsity, traditional models may limit their capacity of mining shallow patterns from the data, i.e. low-order feature combinations. Deep models like deep neural networks, on the other hand, cannot be directly applied for the high-dimensional input because of the huge feature space. In this paper, we propose a Product-based Neural Networks (PNN) with an embedding layer to learn a distributed representation of the categorical data, a product layer to capture interactive patterns between interfield categories, and further fully connected layers to explore high-order feature interactions. Our experimental results on two large-scale real-world ad click datasets demonstrate that PNNs consistently outperform the state-of-the-art models on various metrics.

show abstract

Product-Based Neural Networks for User Response Prediction over Multi-Field Categorical Data

Fang

Zhang

et al. 2018

ACM Trans. Inf. Syst.

174

193

View full text Add to dashboard Cite

User response prediction is a crucial component for personalized information retrieval and filtering scenarios, such as recommender system and web search. The data in user response prediction is mostly in a multi-field categorical format and transformed into sparse representations via one-hot encoding. Due to the sparsity problems in representation and optimization, most research focuses on feature engineering and shallow modeling. Recently, deep neural networks have attracted research attention on such a problem for their high capacity and end-to-end training scheme. In this paper, we study user response prediction in the scenario of click prediction. We first analyze a coupled gradient issue in latent vector-based models and propose kernel product to learn field-aware feature interactions. Then we discuss an insensitive gradient issue in DNN-based models and propose Product-based Neural Network (PNN) which adopts a feature extractor to explore feature interactions. Generalizing the kernel product to a net-in-net architecture, we further propose Product-network In Network (PIN) which can generalize previous models. Extensive experiments on 4 industrial datasets and 1 contest dataset demonstrate that our models consistently outperform 8 baselines on both AUC and log loss. Besides, PIN makes great CTR improvement (relatively 34.67%) in online A/B test.Many machine learning models are leveraged or proposed to work on such a problem, including linear models, latent vector-based models, tree models, and DNN-based models. Linear models, such as Logistic Regression (LR) [25] and Bayesian Probit Regression [14], are easy to implement and with high efficiency. A typical latent vector-based model is Factorization Machine (FM) [36]. FM uses weights and latent vectors to represent categories. According to their parametric representations, LR has a linear feature extractor, and FM has a bi-linear 2 feature extractor. The prediction of LR and FM are simply based on the sum over weights, thus their classifiers are linear. FM works well on sparse data, and inspires a lot of extensions, including Field-aware FM (FFM) [21]. FFM introduces field-aware latent vectors, which gain FFM higher capacity and better performance. However, FFM is restricted by space complexity. Inspired by FFM, we find a coupled gradient issue of latent vector-based models and refine feature interactions 3 as field-aware feature interactions. To solve this issue as well as saving memory, we propose kernel product methods and derive Kernel FM (KFM) and Network in FM (NIFM).Trees and DNNs are potent function approximators. Tree models, such as Gradient Boosting Decision Tree (GBDT) [6], are popular in various data science contests as well as industrial applications. GBDT explores very high order feature combinations in a non-parametric way, yet its exploration ability is restricted when feature space becomes extremely high-dimensional and sparse. DNN has also been preliminarily studied in information system literature [8,33,40,51]. In [51], FM supported Neura...

show abstract

Dynamically Fused Graph Network for Multi-hop Reasoning

Qiu¹,

Xiao²,

Qu³

et al. 2019

134

149

View full text Add to dashboard Cite

Text-based question answering (TBQA) has been studied extensively in recent years. Most existing approaches focus on finding the answer to a question within a single paragraph. However, many difficult questions require multiple supporting evidence from scattered text across two or more documents. In this paper, we propose the Dynamically Fused Graph Network (DFGN), a novel method to answer those questions requiring multiple scattered evidence and reasoning over them. Inspired by human's step-by-step reasoning behavior, DFGN includes a dynamic fusion layer that starts from the entities mentioned in the given query, explores along the entity graph dynamically built from the text, and gradually finds relevant supporting entities from the given documents. We evaluate DFGN on HotpotQA, a public TBQA dataset requiring multi-hop reasoning. DFGN achieves competitive results on the public board. Furthermore, our analysis shows DFGN could produce interpretable reasoning chains.

show abstract

Wasserstein Distance Guided Representation Learning for Domain Adaptation

Shen

Zhang

et al. 2018

AAAI

367

136

View full text Add to dashboard Cite

Domain adaptation aims at generalizing a high-performance learner on a target domain via utilizing the knowledge distilled from a source domain which has a different but related data distribution. One solution to domain adaptation is to learn domain invariant feature representations while the learned representations should also be discriminative in prediction. To learn such representations, domain adaptation frameworks usually include a domain invariant representation learning approach to measure and reduce the domain discrepancy, as well as a discriminator for classification. Inspired by Wasserstein GAN, in this paper we propose a novel approach to learn domain invariant feature representations, namely Wasserstein Distance Guided Representation Learning (WDGRL). WDGRL utilizes a neural network, denoted by the domain critic, to estimate empirical Wasserstein distance between the source and target samples and optimizes the feature extractor network to minimize the estimated Wasserstein distance in an adversarial manner. The theoretical advantages of Wasserstein distance for domain adaptation lie in its gradient property and promising generalization bound. Empirical studies on common sentiment and image classification adaptation datasets demonstrate that our proposed WDGRL outperforms the state-of-the-art domain invariant representation learning approaches.

show abstract

Wasserstein Distance Guided Representation Learning for Domain Adaptation

Shen

Zhang

et al. 2017

Preprint

120

View full text Add to dashboard Cite

Domain adaptation aims at generalizing a high-performance learner on a target domain via utilizing the knowledge distilled from a source domain which has a different but related data distribution. One solution to domain adaptation is to learn domain invariant feature representations while the learned representations should also be discriminative in prediction. To learn such representations, domain adaptation frameworks usually include a domain invariant representation learning approach to measure and reduce the domain discrepancy, as well as a discriminator for classification. Inspired by Wasserstein GAN, in this paper we propose a novel approach to learn domain invariant feature representations, namely Wasserstein Distance Guided Representation Learning (WD-GRL). WDGRL utilizes a neural network, denoted by the domain critic, to estimate empirical Wasserstein distance between the source and target samples and optimizes the feature extractor network to minimize the estimated Wasserstein distance in an adversarial manner. The theoretical advantages of Wasserstein distance for domain adaptation lie in its gradient property and promising generalization bound. Empirical studies on common sentiment and image classification adaptation datasets demonstrate that our proposed WDGRL outperforms the state-of-the-art domain invariant representation learning approaches.

show abstract

GIKT: A Graph-Based Interaction Model for Knowledge Tracing

Yang

Shen

et al. 2021

View full text Add to dashboard Cite

An end-to-end neighborhood-based interaction model for knowledge-enhanced recommendation

Bai

Zhang

et al. 2019

View full text Add to dashboard Cite

This paper studies graph-based recommendation, where an interaction graph is constructed built from historical records and is leveraged to alleviate data sparsity and cold start problems. We reveal an early summarization problem in existing graph-based models, and propose Neighborhood Interaction (NI) model to capture each neighbor pair (between user-side and item-side) distinctively. NI model is more expressive and can capture more complicated structural patterns behind user-item interactions. To further enrich node connectivity and utilize high-order structural information, we incorporate extra knowledge graphs (KGs) and adopt graph neural networks (GNNs) in NI, called Knowledge-enhanced Neighborhood Interaction (KNI). Compared with the state-of-the-art recommendation methods, e.g., feature-based, meta path-based, and KG-based models, our KNI achieves superior performance in click-through rate prediction (1.1%-8.4% absolute AUC improvements) and outperforms by a wide margin in top-N recommendation on 4 real world datasets.

show abstract

Theoretical investigation on lithium polysulfide adsorption and conversion for high-performance Li–S batteries

Chen

et al. 2021

Nanoscale

View full text Add to dashboard Cite

show abstract

12 3 4

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Yanru Qu

Product-Based Neural Networks for User Response Prediction

Product-Based Neural Networks for User Response Prediction over Multi-Field Categorical Data

Dynamically Fused Graph Network for Multi-hop Reasoning

Wasserstein Distance Guided Representation Learning for Domain Adaptation

Wasserstein Distance Guided Representation Learning for Domain Adaptation

GIKT: A Graph-Based Interaction Model for Knowledge Tracing

An end-to-end neighborhood-based interaction model for knowledge-enhanced recommendation

Theoretical investigation on lithium polysulfide adsorption and conversion for high-performance Li–S batteries

Contact Info

Product

Resources

About