Rahul Kumar scite author profile

Enhanced word clustering for hierarchical text classification

In this paper we propose a new information-theoretic divisive algorithm for word clustering applied to text classification. In previous work, such "distributional clustering" of features has been found to improve classification accuracy over feature selection, especially at lower numbers of features [2,28]. However, existing clustering techniques are agglomerative in nature and result in (i) sub-optimal word clusters and (ii) high computational cost. To explicitly capture the optimality of word clusters in an information-theoretic framework, we first derive a global criterion for feature clustering. We then present a fast, divisive algorithm that monotonically decreases this objective function value, thus converging to a local minimum. We show that our algorithm minimizes the "within-cluster Jensen-Shannon divergence" while simultaneously maximizing the "between-cluster Jensen-Shannon divergence". Compared with previously proposed agglomerative strategies, our divisive algorithm achieves higher classification accuracy, especially at lower numbers of features. We further show that feature clustering is an effective technique for building smaller class models in hierarchical classification. We present detailed experimental results using Naive Bayes and Support Vector Machines on the 20 Newsgroups data set and a 3-level hierarchy of HTML documents collected from the Dmoz Open Directory.
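The global criterion and divisive algorithm described above can be illustrated with a minimal sketch. It assumes the criterion is the loss in mutual information between words and classes, written as the prior-weighted KL divergence of each word's class distribution p(C|w) from its cluster's distribution p(C|W_j); within one cluster this sum equals the cluster's total prior times a weighted Jensen-Shannon divergence of its members. The function names, the argmax-class initialization and the stopping tolerance are illustrative assumptions, not the paper's exact procedure. Each iteration recomputes the cluster distributions and moves every word to its closest cluster; neither step can raise the objective, so the loop stops at a local minimum.

```python
import numpy as np


def kl_rows(P, q, eps=1e-12):
    """KL(P[i] || q) for every row i of P, using natural logs; 0 * log 0 counts as 0."""
    q = np.clip(q, eps, None)  # guard against log(0) in the denominator
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log(P / q), 0.0)
    return terms.sum(axis=1)


def evaluate(P, priors, labels, k):
    """Cluster distributions p(C|W_j), word-to-cluster divergences and the objective."""
    n_words, n_classes = P.shape
    Q = np.full((k, n_classes), 1.0 / n_classes)  # placeholder for empty clusters
    for j in range(k):
        members = labels == j
        if members.any():
            # p(C|W_j) is the prior-weighted mean of its members' p(C|w)
            Q[j] = priors[members] @ P[members] / priors[members].sum()
    D = np.stack([kl_rows(P, Q[j]) for j in range(k)], axis=1)  # shape (n_words, k)
    objective = float(priors @ D[np.arange(n_words), labels])
    return objective, D


def divisive_word_clustering(P, priors, k, max_iter=100, tol=1e-10):
    """Partition words into k clusters, locally minimizing sum_w pi_w * KL(p(C|w) || p(C|W(w)))."""
    P = np.asarray(P, dtype=float)            # row w: p(C | w), sums to 1
    priors = np.asarray(priors, dtype=float)  # entry w: pi_w = p(w)
    n_words = P.shape[0]
    # Assumed initialization: order words by most likely class, cut into k contiguous groups
    order = np.argsort(P.argmax(axis=1), kind="stable")
    labels = np.empty(n_words, dtype=int)
    labels[order] = np.arange(n_words) * k // n_words
    objective, D = evaluate(P, priors, labels, k)
    for _ in range(max_iter):
        new_labels = D.argmin(axis=1)  # move each word to its closest cluster
        new_objective, new_D = evaluate(P, priors, new_labels, k)
        if objective - new_objective <= tol:  # no meaningful decrease: local minimum
            break
        labels, objective, D = new_labels, new_objective, new_D
    return labels, objective
```

With k=1 the returned objective equals I(C;W), the full mutual information between classes and words, because a single cluster keeps no class information; this makes a convenient sanity check.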

A context-aware robust intrusion detection system: a reinforcement learning-based approach

Sethi, Rupesh, Kumar, et al. (2019). Int. J. Inf. Secur.

MUX: algorithm selection for software model checkers

Tulsian, Kanade, Kumar, et al. (2014)

With the growing complexity of modern-day software, software model checking has become a critical technology for ensuring software correctness. As with any promising technology, there are a number of tools for software model checking. However, their respective performance trade-offs are difficult to characterize accurately, making it difficult for practitioners to select a suitable tool for the task at hand. This paper proposes a technique called MUX that addresses the problem of selecting the most suitable software model checker for a given input instance. MUX performs machine learning on a repository of software verification instances. The algorithm selector synthesized through machine learning uses structural features of an input instance, comprising a program-property pair, at runtime to determine which tool to use. We have implemented MUX for Windows device drivers and evaluated it on a number of drivers and model checkers. Our results are promising: the algorithm selector not only avoids a significant number of timeouts but also improves the total runtime by a large margin compared to any individual model checker. It also outperforms a portfolio-based algorithm selector currently used at Microsoft. In addition, MUX identifies structural features of programs that are key factors in determining the performance of model checkers.
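To make the idea of an algorithm selector concrete, here is a minimal sketch that picks a model checker for a new program-property pair from a repository of past runs. The abstract does not say which learning method or features MUX uses, so this sketch swaps in a simple k-nearest-neighbour vote; the feature vectors, the tool names "A" and "B", the TIMEOUT value and the toy data are all hypothetical. It also shows one way to compare a selector against always running a single tool, using leave-one-out total runtime.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass

TIMEOUT = 3600.0  # hypothetical per-instance time limit in seconds


@dataclass
class Instance:
    """One verification instance (a program-property pair) in the training repository."""
    features: list[float]       # hypothetical structural features of the instance
    runtimes: dict[str, float]  # tool name -> runtime in seconds (TIMEOUT if it timed out)

    def best_tool(self) -> str:
        return min(self.runtimes, key=self.runtimes.get)


def select_tool(repository: list[Instance], features: list[float], k: int = 3) -> str:
    """k-nearest-neighbour selector: majority vote over the best tools of the k closest instances."""
    nearest = sorted(repository, key=lambda inst: math.dist(inst.features, features))[:k]
    votes = Counter(inst.best_tool() for inst in nearest)
    # Ties go to the tool seen first, i.e. the one preferred by the nearest neighbour
    return votes.most_common(1)[0][0]


def leave_one_out_total(repository: list[Instance], k: int = 3) -> float:
    """Total runtime if every instance is solved by the tool selected from the others."""
    total = 0.0
    for i, inst in enumerate(repository):
        others = repository[:i] + repository[i + 1:]
        total += inst.runtimes[select_tool(others, inst.features, k)]
    return total


def single_tool_total(repository: list[Instance], tool: str) -> float:
    """Total runtime if one fixed tool is used for every instance."""
    return sum(inst.runtimes[tool] for inst in repository)


# Hypothetical toy data: tool "A" is fast on small instances and times out on large ones.
TOY_REPOSITORY = [
    Instance([0.0, 0.0], {"A": 1.0, "B": 10.0}),
    Instance([0.1, 0.0], {"A": 2.0, "B": 12.0}),
    Instance([0.0, 0.2], {"A": 1.5, "B": 9.0}),
    Instance([5.0, 5.0], {"A": TIMEOUT, "B": 3.0}),
    Instance([5.1, 4.9], {"A": TIMEOUT, "B": 4.0}),
    Instance([4.8, 5.2], {"A": TIMEOUT, "B": 2.0}),
]

if __name__ == "__main__":
    print("selector:", leave_one_out_total(TOY_REPOSITORY))
    for tool in ("A", "B"):
        print(f"always {tool}:", single_tool_total(TOY_REPOSITORY, tool))
```

On this toy repository the leave-one-out selector avoids every timeout, and its total runtime is lower than that of either fixed tool. A nearest-neighbour vote is used only because it needs no training step; any classifier over the same features could take its place.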

Combinatorial feature selection problems

Charikar, Guruswami, Kumar, et al.

WhoDo: automating reviewer suggestions at scale

Asthana, Kumar, Bhagwan, et al. (2019)

The Static Driver Verifier Research Platform

Ball, Bounimova, Levin, et al. (2010)

The SDV Research Platform (SDVRP) is a new academic release of Static Driver Verifier (SDV) and the SLAM software model checker that contains: (1) a parameterized version of SDV that allows one to write custom API rules for APIs independent of device drivers; (2) thousands of Boolean programs generated by SDV in the course of verifying Windows device drivers, including the functional and performance results (of the Bebop model checker) and test scripts to allow comparison against other Boolean program model checkers; and (3) a new version of the SLAM analysis engine, called SLAM2, that is much more robust and performant.

Parallelizing top-down interprocedural analyses

Albarghouthi, Kumar, Nori, et al. (2012)

Deep Reinforcement Learning based Intrusion Detection System for Cloud Infrastructure

Sethi, Kumar, Prajapati, et al. (2020)
