Yanyan Shen scite author profile

Efficient processing of k nearest neighbor joins using MapReduce

k nearest neighbor join (kNN join), designed to find the k nearest neighbors in a dataset S for every object in another dataset R, is a primitive operation widely adopted by many data mining applications. As a combination of the k nearest neighbor query and the join operation, kNN join is an expensive operation. Given the increasing volume of data, it is difficult to perform a kNN join efficiently on a centralized machine. In this paper, we investigate how to perform kNN join using MapReduce, a well-accepted framework for data-intensive applications over clusters of computers. In brief, the mappers cluster objects into groups, and the reducers perform the kNN join on each group of objects separately. We design an effective mapping mechanism that exploits pruning rules for distance filtering and hence reduces both the shuffling and computational costs. To reduce the shuffling cost, we propose two approximate algorithms to minimize the number of replicas. Extensive experiments on our in-house cluster demonstrate that our proposed methods are efficient, robust, and scalable.
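
The map/shuffle/reduce split described above can be illustrated with a small single-process simulation in Python. This is a sketch under stated assumptions, not the paper's algorithm: R objects are grouped by their closest pivot (the pivots are supplied by the caller), and each S object is replicated only to groups where a simple triangle-inequality bound says it could be among the k nearest neighbors. The function name `knn_join_mapreduce` and the bound `theta` are hypothetical, and the paper's own pruning rules and replica-minimizing algorithms may differ.

```python
import math
from collections import defaultdict


def knn_join_mapreduce(R, S, pivots, k, eps=1e-9):
    """Single-process simulation of a pivot-partitioned kNN join of R with S.

    Returns {index of r in R: [indices of its k nearest objects in S]}.
    Assumes S is non-empty and pivots are points in the same space (hypothetical setup).
    """
    pivot_ids = range(len(pivots))

    # Map (R side): each r goes to the group of its closest pivot.
    groups_r = defaultdict(list)
    for rid, r in enumerate(R):
        pid = min(pivot_ids, key=lambda i: math.dist(r, pivots[i]))
        groups_r[pid].append((rid, r))

    # Per-group bounds for distance filtering (triangle inequality).
    radius, theta = {}, {}
    kk = min(k, len(S))
    for pid, members in groups_r.items():
        p = pivots[pid]
        radius[pid] = max(math.dist(r, p) for _, r in members)
        # The kk S-objects closest to the pivot lie within radius + d_k of every r
        # in this group, so theta bounds each r's k-th nearest-neighbor distance.
        d_k = sorted(math.dist(s, p) for s in S)[kk - 1]
        theta[pid] = radius[pid] + d_k

    # Map (S side): replicate s only to groups where it can be a k nearest neighbor.
    groups_s = defaultdict(list)
    for sid, s in enumerate(S):
        for pid in groups_r:
            # eps absorbs floating-point rounding in the triangle inequality.
            if math.dist(s, pivots[pid]) <= radius[pid] + theta[pid] + eps:
                groups_s[pid].append((sid, s))

    # Reduce: brute-force kNN of each r against its group's S candidates.
    result = {}
    for pid, members in groups_r.items():
        candidates = groups_s[pid]
        for rid, r in members:
            nearest = sorted(candidates, key=lambda c: math.dist(r, c[1]))[:kk]
            result[rid] = [sid for sid, _ in nearest]
    return result
```

Because the replication test only discards S objects that provably cannot be among the k nearest neighbors, the result matches a brute-force join; tighter bounds would shrink the shuffle further.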

Adaptive Factorization Network: Learning Adaptive-Order Feature Interactions

Cheng, Shen, Huang

2020

AAAI

109

Various factorization-based methods have been proposed to leverage second-order or higher-order cross features for boosting the performance of predictive models. They generally enumerate all the cross features under a predefined maximum order and then identify useful feature interactions through model training, an approach that suffers from two drawbacks. First, they must trade off the expressiveness of higher-order cross features against the computational cost, resulting in suboptimal predictions. Second, enumerating all the cross features, including irrelevant ones, may introduce noisy feature combinations that degrade model performance. In this work, we propose the Adaptive Factorization Network (AFN), a new model that adaptively learns arbitrary-order cross features from data. The core of AFN is a logarithmic transformation layer that converts the power of each feature in a feature combination into a coefficient to be learned. Experimental results on four real datasets demonstrate the superior predictive performance of AFN over state-of-the-art methods.
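
The logarithmic transformation rests on the identity exp(Σ_j w_j ln e_j) = Π_j e_j^{w_j}, so learning the weights w_j amounts to learning the order of each feature in the cross feature. A minimal Python sketch of a single such neuron follows. Taking absolute values and clamping to a small positive eps so the logarithm is defined, and applying the neuron independently to each embedding dimension, are assumptions of this sketch; `log_neuron` is a hypothetical name, and in a full model the weights would be learned and the outputs fed to further layers.

```python
import math


def log_neuron(embeddings, weights, eps=1e-7):
    """One logarithmic neuron over m field embeddings of equal dimension.

    Per embedding dimension d, computes exp(sum_j w_j * ln|e_j[d]|), which
    equals prod_j |e_j[d]| ** w_j: a cross feature whose order (the exponents
    w_j) is a coefficient rather than a fixed integer.
    """
    dim = len(embeddings[0])
    out = []
    for d in range(dim):
        # Clamp to a small positive value so the logarithm is defined (assumption).
        log_sum = sum(w * math.log(max(abs(e[d]), eps)) for e, w in zip(embeddings, weights))
        out.append(math.exp(log_sum))
    return out
```

With weights [1.0, 2.0], the inputs [[2.0, 3.0], [4.0, 5.0]] yield values close to [32.0, 75.0], i.e. 2 × 4^2 and 3 × 5^2; fractional weights such as 0.5 give non-integer orders that fixed-order enumeration cannot express.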

Discovering queries based on example tuples

Shen, Chakrabarti, Chaudhuri

et al. 2014

An enterprise information worker is often aware of a few example tuples (but not the entire result) that should be present in the output of a query. We study the problem of discovering the minimal project-join query that contains the given example tuples in its output. Efficient discovery of such queries is challenging. We propose novel algorithms to solve this problem. Our experiments on real-life datasets show that the proposed solution is significantly more efficient than naïve adaptations of known techniques.
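
To make the problem statement concrete, the sketch below is a brute-force search, closer to the naïve baselines the abstract mentions than to the paper's proposed algorithms: it enumerates sets of tables in increasing size, joins each connected set along foreign-key edges, and tries every projection until all example tuples appear in the output. The table and edge representation, the spanning-tree join, and treating "minimal" as "fewest tables" are assumptions, and `join_rows` and `discover_query` are hypothetical names.

```python
from itertools import combinations, permutations


def join_rows(tables, subset, edges):
    """Equi-join the tables in `subset` along `edges`; None if they are not connected.

    Columns are renamed "table.column". Joins follow a spanning tree of the edges.
    """
    first, *rest = subset
    rows = [{f"{first}.{c}": v for c, v in row.items()} for row in tables[first]]
    joined, remaining = {first}, set(rest)
    while remaining:
        for a, col_a, b, col_b in edges:
            if a in joined and b in remaining:
                new, old_key, new_col = b, f"{a}.{col_a}", col_b
                break
            if b in joined and a in remaining:
                new, old_key, new_col = a, f"{b}.{col_b}", col_a
                break
        else:
            return None  # no edge links the joined part to the remaining tables
        rows = [
            {**row, **{f"{new}.{c}": v for c, v in other.items()}}
            for row in rows
            for other in tables[new]
            if row[old_key] == other[new_col]
        ]
        joined.add(new)
        remaining.discard(new)
    return rows


def discover_query(tables, edges, examples):
    """Return (tables, projected columns) of a smallest project-join query whose
    output contains every example tuple, or None if no such query exists."""
    width = len(examples[0])
    for size in range(1, len(tables) + 1):
        for subset in combinations(tables, size):
            rows = join_rows(tables, subset, edges)
            if not rows:
                continue
            for proj in permutations(rows[0], width):
                output = {tuple(row[c] for c in proj) for row in rows}
                if all(tuple(ex) in output for ex in examples):
                    return subset, proj
    return None
```

The search is exponential in both the number of tables and the projection width, which illustrates why efficient discovery is challenging.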

Predicting Multi-step Citywide Passenger Demands Using Attention-based Neural Networks

et al. 2018

DELF: A Dual-Embedding based Deep Latent Factor Model for Recommendation

Cheng, Shen, Zhu

et al. 2018

Among various recommendation methods, latent factor models, which learn user and item embeddings to predict user-item preferences, are usually considered state-of-the-art techniques. When latent factor models are applied to recommendation with implicit feedback, the quality of the embeddings always suffers from inadequate positive feedback and noisy negative feedback. Inspired by the idea of NSVD, which represents users based on their interacted items, this paper proposes a dual-embedding based deep latent factor model named DELF for recommendation with implicit feedback. In addition to learning a single embedding for a user (resp. item), we represent each user (resp. item) with an additional embedding from the perspective of the interacted items (resp. users). We employ an attentive neural method to discriminate the importance of interacted users/items for dual-embedding learning. We further introduce a neural network architecture to incorporate dual embeddings for recommendation. A novel aspect of DELF is that it models each user-item interaction with four deep representations that are subtly fused for preference prediction. We conducted extensive experiments on real-world datasets. The results verify the effectiveness of user/item dual embeddings and the superior performance of DELF on item recommendation.
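
The dual-embedding idea can be sketched as follows: a user's second embedding is an attention-weighted average of the embeddings of the items it interacted with (and symmetrically for items), and the score combines the four user/item representation pairs. This is a simplified illustration, not DELF itself: the paper uses neural networks to build and fuse the four deep representations, whereas this sketch substitutes dot products, uses each user's (item's) own embedding as the attention query, and averages the four scores before a sigmoid. All function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_embedding(neighbors, query):
    """Attention-weighted average of the embeddings of interacted items (or users)."""
    weights = softmax([dot(n, query) for n in neighbors])
    return [sum(w * n[d] for w, n in zip(weights, neighbors)) for d in range(len(query))]


def dual_embedding_score(user_emb, item_emb, user_history, item_history):
    """Preference score in (0, 1) from four user/item representation pairs."""
    user_dual = attentive_embedding(user_history, user_emb)  # user seen through its items
    item_dual = attentive_embedding(item_history, item_emb)  # item seen through its users
    pairs = [(user_emb, item_emb), (user_emb, item_dual),
             (user_dual, item_emb), (user_dual, item_dual)]
    logit = sum(dot(u, i) for u, i in pairs) / len(pairs)
    return 1.0 / (1.0 + math.exp(-logit))
```

Replacing the dot products with small learned networks per pair would move this sketch closer to the "four deep representations" the abstract describes.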

Yanyan Shen

Efficient processing of k nearest neighbor joins using MapReduce

Adaptive Factorization Network: Learning Adaptive-Order Feature Interactions

Discovering queries based on example tuples

Predicting Multi-step Citywide Passenger Demands Using Attention-based Neural Networks

DELF: A Dual-Embedding based Deep Latent Factor Model for Recommendation
