Abhay Harpale scite author profile

Abhay Harpale

5 Publications

156 Citation Statements Received

75 Citation Statements Given

Affiliations

GE Global Research (United States), Carnegie Mellon University, Indian Institute of Technology Bombay

Publications

Personalized active learning for collaborative filtering

Harpale, Yang (2008)

100

Collaborative Filtering (CF) requires user-rated training examples for statistical inference about the preferences of new users. Active learning strategies identify the most informative set of training examples through minimum interactions with the users. Current active learning approaches in CF make an implicit and unrealistic assumption that a user can provide a rating for any queried item. This paper introduces a new approach to the problem which does not make such an assumption. We personalize active learning for the user, and query for only those items which the user can provide a rating for. We propose an extended form of Bayesian active learning and use the Aspect Model for CF to illustrate and examine the idea. A comparative evaluation of the new method and a well-established baseline method on benchmark datasets shows statistically significant improvements with our method over the performance of the baseline method, which is representative of existing approaches that do not take personalization into account.

Document Classification Through Interactive Supervision of Document and Term Labels

Godbole, Harpale, Sarawagi, et al. (2004)

Effective incorporation of human expertise, while exerting a low cognitive load, is a critical aspect of real-life text classification applications that is not adequately addressed by batch-supervised high-accuracy learners. Standard text classifiers are supervised in only one way: assigning labels to whole documents. They are thus deprived of the enormous wisdom that humans carry about the significance of words and phrases in context. We present HIClass, an interactive and exploratory labeling package that actively collects user opinion on feature representations and choices, as well as whole-document labels, while minimizing redundancy in the input sought. Preliminary experience suggests that, starting with essentially an unlabeled corpus, very little cognitive labor suffices to set up a labeled collection on which standard classifiers perform well.

Utility-based information distillation over temporally sequenced documents

Yang, Lad, Lao, et al. (2007)

This paper examines a new approach to information distillation over temporally ordered documents, and proposes a novel evaluation scheme for such a framework. It combines the strengths of and extends beyond conventional adaptive filtering, novelty detection and non-redundant passage ranking with respect to long-lasting information needs ('tasks' with multiple queries). Our approach supports fine-grained user feedback via highlighting of arbitrary spans of text, and leverages such information for utility optimization in adaptive settings. For our experiments, we defined hypothetical tasks based on news events in the TDT4 corpus, with multiple queries per task. Answer keys (nuggets) were generated for each query and a semiautomatic procedure was used for acquiring rules that allow automatically matching nuggets against system responses. We also propose an extension of the NDCG metric for assessing the utility of ranked passages as a combination of relevance and novelty. Our results show encouraging utility enhancements using the new approach, compared to the baseline systems without incremental learning or the novelty detection components.

GigaTensor

et al. 2012

Protein Identification from Tandem Mass Spectra with Probabilistic Language Modeling

Yang, Harpale, Ganapathy (2009)

This paper presents an interdisciplinary investigation of statistical information retrieval (IR) techniques for protein identification from tandem mass spectra, a challenging problem in proteomic data analysis. We formulate the task as an IR problem, by constructing a "query vector" whose elements are system-predicted peptides with confidence scores based on spectrum analysis of the input sample, and by defining the vector space of "documents" with protein profiles, each of which is constructed based on the theoretical spectrum of a protein. This formulation establishes a new connection from the protein identification problem to a probabilistic language modeling approach as well as the vector space models in IR, and enables us to compare fundamental differences in the IR models and common approaches in protein identification. Our experiments on benchmark spectrometry query sets and large protein databases demonstrate that the IR models significantly outperform well-established methods in protein identification, by enhancing precision in high-recall regions in particular, where the conventional approaches are weak.
