Geli Fei scite author profile

Learning Cumulatively to Become More Knowledgeable

In classic supervised learning, a learning algorithm takes a fixed set of training data covering several classes and builds a classifier. In this paper, we propose to study a new problem, i.e., building a learning system that learns cumulatively. As time goes by, the system sees and learns more and more classes of data and becomes more and more knowledgeable. We believe that this is similar to human learning. We humans learn continuously, retaining the learned knowledge, identifying and learning new things, and updating the existing knowledge with new experiences. Over time, we accumulate more and more knowledge. A learning system should be able to do the same. As algorithmic learning matures, it is time to tackle this cumulative machine learning (or simply cumulative learning) problem, which is a kind of lifelong machine learning problem. It presents two major challenges. First, the system must be able to detect data from unseen classes in the test set. Classic supervised learning, however, assumes that all classes seen in testing are known at training time. Second, the system needs to be able to selectively update its models whenever a new class of data arrives, without retraining the whole system on the entire past and present training data. This paper proposes a novel approach and system to tackle these challenges. Experimental results on two datasets, with learning from 2 classes up to 100 classes, show that the proposed approach is highly promising in terms of both classification accuracy and computational efficiency.


Bimodal Distribution and Co-Bursting in Review Spam Detection

Fei, Wang, et al. 2017

111


Breaking the Closed World Assumption in Text Classification

Fei, Liu 2016

109


Existing research on multiclass text classification mostly makes the closed world assumption, which focuses on designing accurate classifiers under the assumption that all test classes are known at training time. A more realistic scenario is to expect unseen classes during testing (open world). In this case, the goal is to design a learning system that classifies documents of the known classes into their respective classes and rejects documents from unknown classes. This problem is called open (world) classification. This paper approaches the problem by reducing the open space risk while balancing the empirical risk. It proposes to use a new learning strategy, called center-based similarity (CBS) space learning (or CBS learning), to provide a novel solution to the problem. Extensive experiments across two datasets show that CBS learning gives promising results on multiclass open text classification compared to state-of-the-art baselines.
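To make the open (world) classification setting concrete, here is a minimal sketch of a classifier that assigns a document to a known class or rejects it as unknown. It is not the paper's CBS learning method: it swaps in a plain nearest-center rule with cosine similarity over word counts, and the function names, the toy data, and the `threshold` value are all assumptions made for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased whitespace tokens counted into a sparse vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def class_centers(labeled_docs):
    """Sum the word counts of each class's documents into one center per class."""
    centers = {}
    for label, text in labeled_docs:
        centers.setdefault(label, Counter()).update(bag_of_words(text))
    return centers


def classify_open(text, centers, threshold=0.3):
    """Return the most similar known class, or "unknown" below the threshold.

    The threshold is a hypothetical value; in practice it would be tuned.
    """
    vec = bag_of_words(text)
    label, score = max(
        ((lbl, cosine(vec, center)) for lbl, center in centers.items()),
        key=lambda pair: pair[1],
    )
    return label if score >= threshold else "unknown"


# Toy training data (invented for this sketch).
train = [
    ("sports", "the team won the football match"),
    ("sports", "the player scored a goal in the match"),
    ("tech", "the new phone has a fast processor"),
    ("tech", "the laptop processor and memory are fast"),
]
centers = class_centers(train)
```

With these toy centers, a document that shares vocabulary with the sports class is assigned to it, while a document with no overlapping words scores zero against every center and is rejected. Bounding how far from a class center a document may lie is one simple way to limit the region in which the classifier is willing to assign a known label.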


Social Media Text Classification under Negative Covariate Shift

Fei, Liu 2015


In a typical social media content analysis task, the user is interested in analyzing posts on a particular topic. Identifying such posts is often formulated as a classification problem. However, this problem is challenging. One key issue is covariate shift, that is, the training data is not fully representative of the test data. We observed that the covariate shift mainly occurs in the negative data because topics discussed in social media are highly diverse and numerous, but the user-labeled negative training data may cover only a small number of topics. This paper proposes a novel technique to solve the problem. The key novelty of the technique is the transformation of document representation from the traditional n-gram feature space to a center-based similarity (CBS) space. In the CBS space, the covariate shift problem is significantly mitigated, which enables us to build much better classifiers. Experimental results show that the proposed approach markedly improves classification.
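The idea of moving a document from an n-gram feature space into a similarity space can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact construction: it assumes the center is the summed n-gram counts of the positive training documents, uses cosine similarity as the only similarity measure, and uses unigrams and bigrams as the only representations. All names and the toy data are hypothetical.

```python
import math
from collections import Counter


def ngrams(text, n):
    """Count the word n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def positive_centers(positive_docs, orders=(1, 2)):
    """Build one center per n-gram order from the positive documents only."""
    centers = {}
    for n in orders:
        center = Counter()
        for doc in positive_docs:
            center.update(ngrams(doc, n))
        centers[n] = center
    return centers


def to_similarity_space(text, centers):
    """Map a document to one similarity value per n-gram order."""
    return [cosine(ngrams(text, n), centers[n]) for n in sorted(centers)]


# Toy positive (on-topic) documents, invented for this sketch.
positives = [
    "the new phone has a great camera",
    "this phone camera takes sharp photos",
]
centers = positive_centers(positives)
on_topic = to_similarity_space("great phone camera", centers)
off_topic = to_similarity_space("the election results were announced", centers)
```

Each document becomes a short vector of similarities to the positive center, so a classifier trained on these features depends on how close a post is to the topic of interest rather than on which negative topics happened to be labeled. In this toy run the on-topic post scores higher on the unigram feature than the off-topic post, whose bigram feature is zero.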


Targeted Topic Modeling for Focused Analysis

Wang, Chen, Fei, et al. 2016


