Proceedings of the 1st International Workshop on Deep Learning Practice for High-Dimensional Sparse Data 2019
DOI: 10.1145/3326937.3341261

Behavior sequence transformer for e-commerce recommendation in Alibaba

Abstract: Deep learning based methods have been widely used in industrial recommendation systems (RSs). Previous works adopt an Embedding&MLP paradigm: raw features are embedded into low-dimensional vectors, which are then fed into MLP for final recommendations. However, most of these works just concatenate different features, ignoring the sequential nature of users' behaviors. In this paper, we propose to use the powerful Transformer model to capture the sequential signals underlying users' behavior sequences for recommendation…
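To make the paradigm concrete, below is a minimal sketch of an Embedding&MLP model extended with one Transformer encoder block over the user's behavior sequence, assuming PyTorch; all names, dimensions, and the mean-pooling step are illustrative choices, not the paper's published implementation.

```python
# Minimal sketch: Embedding&MLP with a Transformer block over the
# behavior sequence (hypothetical names; not the paper's exact code).
import torch
import torch.nn as nn

class BehaviorSequenceModel(nn.Module):
    def __init__(self, num_items, d_model=64, seq_len=20, other_feat_dim=32):
        super().__init__()
        # Raw item IDs are embedded into low-dimensional vectors.
        self.item_emb = nn.Embedding(num_items, d_model)
        # Learned positional embedding preserves the order of behaviors.
        self.pos_emb = nn.Embedding(seq_len, d_model)
        # One self-attention (Transformer encoder) block captures
        # sequential signals among the user's clicked items.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=256, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=1)
        # The final MLP consumes the sequence summary concatenated with
        # the other features, as in the Embedding&MLP paradigm.
        self.mlp = nn.Sequential(
            nn.Linear(d_model + other_feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, item_ids, other_feats):
        # item_ids: (batch, seq_len); other_feats: (batch, other_feat_dim)
        pos = torch.arange(item_ids.size(1), device=item_ids.device)
        x = self.item_emb(item_ids) + self.pos_emb(pos)
        x = self.encoder(x)                  # (batch, seq_len, d_model)
        seq_summary = x.mean(dim=1)          # pool over the sequence
        logits = self.mlp(torch.cat([seq_summary, other_feats], dim=-1))
        return torch.sigmoid(logits)         # predicted click probability
```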

Cited by 284 publications (173 citation statements)
References 14 publications
“…Our second approach to implementing temporal dynamics is DL-based. We make use of the Transformer architecture [4] in this model for two prominent tasks: to capture the sequential reading behavior of a news reader and to perform the next click prediction. We build separate reader and news components in our proposed framework.…”
Section: Preliminary Results (mentioning)
confidence: 99%
“…Zhu et al. [17] proposed an improved long short-term memory (LSTM) method to learn the correlation between users' adjacent behaviors, which can predict users' short-term and long-term interests. Chen et al. [18] proposed a Transformer model based on the attention mechanism to extract features from user behavior sequences, which can be used to predict user preferences for products. Zhong et al. [19] proposed multiple aspect attentive graph neural networks to extract user social network features, which can be used to generate user geographic information tags.…”
Section: Related Work (mentioning)
confidence: 99%
“…Self-attention blocks. Self-attention [30], which is an attention mechanism relating different positions of a single sequence in order to compute a new representation of the sequence, has achieved state-of-the-art performance for sequence modeling in many tasks [3, 30]. An attention function maps a query and a set of key-value pairs to an output, which is a weighted sum of the values, where the weight assigned to each value is computed based on the query and the corresponding key.…”
Section: Deep Interest (mentioning)
confidence: 99%