A number of feature selection metrics have been explored in text categorization, among which information gain (IG), chi-square (CHI), correlation coefficient (CC) and odds ratio (OR) are considered most effective. CC and OR are one-sided metrics, while IG and CHI are two-sided. Feature selection using one-sided metrics selects only the features most indicative of class membership, while feature selection using two-sided metrics implicitly combines the features most indicative of membership (positive features) and non-membership (negative features) by ignoring the signs of their scores. The former never considers the negative features, which are quite valuable; the latter cannot ensure an optimal combination of the two kinds of features, especially on imbalanced data. In this work, we investigate the usefulness of explicitly controlling that combination within a proposed feature selection framework. Using multinomial naïve Bayes and regularized logistic regression as classifiers, our experiments show both the potential and the actual merit of explicitly combining positive and negative features in a nearly optimal ratio determined by the degree of class imbalance.
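The idea of explicitly mixing positive and negative features can be sketched as follows. This is an illustrative toy implementation, not the paper's exact framework: a signed chi-square score plays the role of a one-sided metric, and a hypothetical `pos_ratio` parameter controls the positive/negative mix that the paper tunes to the class imbalance.

```python
import numpy as np

def signed_chi2(X, y):
    # X: binary document-term matrix (n_docs, n_terms); y: binary labels.
    # Returns a signed chi-square score per term: positive if the term
    # co-occurs with the positive class more than expected, else negative.
    n = len(y)
    pos = y == 1
    A = X[pos].sum(axis=0)           # term present, class positive
    B = X[~pos].sum(axis=0)          # term present, class negative
    C = pos.sum() - A                # term absent, class positive
    D = (~pos).sum() - B             # term absent, class negative
    chi2 = n * (A * D - B * C) ** 2 / (
        (A + B) * (C + D) * (A + C) * (B + D) + 1e-12)
    return np.sign(A * D - B * C) * chi2

def select_features(X, y, k, pos_ratio=0.7):
    # Explicitly combine the top positive and top negative features
    # instead of ranking by |score| alone; pos_ratio controls the mix.
    scores = signed_chi2(X, y)
    n_pos = int(k * pos_ratio)
    pos_idx = np.argsort(-scores)[:n_pos]        # most positive scores
    neg_idx = np.argsort(scores)[:k - n_pos]     # most negative scores
    return np.concatenate([pos_idx, neg_idx])
```

A two-sided metric corresponds to ranking by `abs(scores)`; the explicit combination above instead fixes how many features of each sign are kept, which is the knob the abstract argues should depend on the data's imbalance.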
Like many purely data-driven machine learning methods, Support Vector Machine (SVM) classifiers are learned exclusively from the evidence presented in the training dataset; thus a larger training dataset is required for better performance. In some applications, however, human knowledge is available that could, in principle, compensate for the lack of data. In this paper, we propose a simple generalization of the SVM, the Weighted Margin SVM (WMSVM), that permits the incorporation of prior knowledge. We show that Sequential Minimal Optimization can be used to train a WMSVM, and we discuss the issues involved in incorporating prior knowledge through this rather general formulation. The experimental results show that the proposed method of incorporating prior knowledge is effective.
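One simple way to get intuition for margin weighting is to scale each example's slack penalty by a confidence weight. The sketch below is an assumption-laden stand-in, not the paper's WMSVM formulation or its SMO solver: it trains a linear soft-margin SVM by subgradient descent on a per-example weighted hinge loss, where weights could encode prior knowledge (e.g. lower weight for noisy pseudo-labeled examples).

```python
import numpy as np

def train_weighted_svm(X, y, weights, C=1.0, lr=0.01, epochs=200):
    # Linear soft-margin SVM via subgradient descent on a weighted hinge
    # loss: minimize 0.5*||w||^2 + C * sum_i weights[i] * hinge(y_i, x_i).
    # y must be in {-1, +1}; weights scale each example's slack penalty.
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                  # examples violating the margin
        grad_w = w - C * (weights[active] * y[active]) @ X[active]
        grad_b = -C * np.sum(weights[active] * y[active])
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Setting all weights to 1 recovers the ordinary soft-margin objective; varying them shifts the learned boundary toward the examples the prior knowledge trusts most.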
This paper presents a hybrid approach to named entity (NE) tagging that combines a Maximum Entropy model (MaxEnt), a Hidden Markov Model (HMM) and handcrafted grammatical rules. Each approach has innate strengths and weaknesses, and their combination yields a very high-precision tagger. External gazetteers are incorporated into the system through the MaxEnt component. Sub-category generation is also discussed.
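The general shape of such a hybrid can be sketched as a cascade: high-precision handcrafted rules fire first, and tokens they do not cover fall through to a statistical model. Everything here is hypothetical (the patterns, tag names, and fallback tagger are illustrative placeholders, not the paper's actual rules or models).

```python
import re

# Hypothetical high-precision rule patterns; the real system's
# grammatical rules are far richer than these toy examples.
RULES = [
    (re.compile(r"^(Mr|Mrs|Dr)\.?$"), "PERSON-TRIGGER"),
    (re.compile(r"^[A-Z][a-z]+ (Inc|Corp|Ltd)\.?$"), "ORG"),
]

def rule_tag(token):
    # Return a tag if any handcrafted rule matches, else None.
    for pattern, tag in RULES:
        if pattern.match(token):
            return tag
    return None

def hybrid_tag(tokens, statistical_tagger):
    # Rules take precedence; uncovered tokens fall back to the
    # statistical component (a MaxEnt/HMM stand-in here).
    return [rule_tag(t) or statistical_tagger(t) for t in tokens]
```

In practice the statistical components would be full sequence models over features, not per-token callables, but the precedence ordering shown here is what lets rules boost precision without retraining the models.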