Proceedings of the Eleventh International Conference on Information and Knowledge Management 2002
DOI: 10.1145/584792.584911
High-performing feature selection for text classification

Abstract: This paper reports a controlled study on a large number of filter feature selection methods for text classification. Over 100 variants of five major feature selection criteria were examined using four well-known classification algorithms: a Naive Bayesian (NB) approach, a Rocchio-style classifier, a k-nearest neighbor (kNN) method, and a Support Vector Machine (SVM) system. Two benchmark collections were chosen as the testbeds: Reuters-21578 and a small portion of Reuters Corpus Version 1 (RCV1), making the new r…

Cited by 208 publications (43 citation statements)
References 3 publications (2 reference statements)
“…Several studies [31,32] found that feature selection methods based on χ 2 statistics consistently outperformed those based on other criteria (including information gain) for the most popular classifiers used in TC. The terms with a document frequency less than 5 were also removed, as χ 2 is known to be less reliable for rare words [31].…”
Section: Text Classification: Experimental Settings
confidence: 99%
“…The terms with a document frequency less than 5 were also removed, as χ 2 is known to be less reliable for rare words [31]. Both methods were applied and 10% of the terms were selected for the vector space representation.…”
Section: Text Classification: Experimental Settings
confidence: 99%
“…-Support-vector machines (SVM) [Cortes and Vapnik 1995] are a class of powerful methods for classification tasks, involving the construction of hyperplanes that have the largest distance to the nearest training points. Several papers reference support-vector machines as the state-of-the-art method for text classification [Gabrilovich and Markovitch 2004; Rogati and Yang 2002; Tong and Koller 2000]. We use a nonlinear poly-2 kernel [Joachims 1998] to train our classifiers, as preliminary experiments with a linear kernel did not yield statistically significant differences with a poly-2 kernel, which has also been a finding in some recent empirical evaluations of SVM kernels [Gao and Sun 2010].…”
Section: Statistical Machine Learning Techniques
confidence: 92%
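Training a text classifier with a degree-2 polynomial ("poly-2") SVM kernel, as this citation statement describes, can be sketched with scikit-learn. The corpus, labels, TF-IDF weighting, and kernel parameters below are illustrative assumptions, not the cited papers' actual pipeline:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy two-class corpus (0 = finance, 1 = sport); purely illustrative.
docs = [
    "stock market shares rise today",
    "market trading shares fall sharply",
    "team wins the match today",
    "match ends and the team loses",
]
labels = [0, 0, 1, 1]

# SVC with a degree-2 polynomial kernel; coef0=1 gives the
# inhomogeneous (1 + gamma * x.y)^2 form (a common choice, assumed here).
clf = make_pipeline(TfidfVectorizer(),
                    SVC(kernel="poly", degree=2, coef0=1))
clf.fit(docs, labels)

pred = clf.predict(["shares rally in the market"])
```

Swapping `kernel="poly"` for `kernel="linear"` is the comparison the passage refers to; on many text collections the two perform similarly.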
“…Feature selection can be defined as the process of removing irrelevant features and retaining only the relevant ones; an optimal selection of features can improve overall knowledge of the domain, reduce size, increase generalization capacity, etc. [9] J. Arturo Olvera-López stated that sufficient identification of features is necessary in real-world scenarios, hence the identification of features is important. [20] Yiming Yang stated that feature selection is the best solution for text classification problems, as it increases both classification effectiveness and computational efficiency. Instance selection is a process in which the dataset size is reduced, which eventually decreases the runtime, especially in the case of instance-based classifiers; the commonly used instance selection mechanisms are wrapper and filter, here…”
Section: Literature Survey
confidence: 99%