Optimal stop word selection for text mining in critical infrastructure domain

Amarasinghe, Kasun; Manic, Milos; Hruska, Ryan

doi:10.1109/rweek.2015.7287440

Cited by 15 publications

(9 citation statements)

References 13 publications


“…-Elimination of stop words: The stop words are a set of words that provide little or no semantic meaning in the texts, they are generally the words that appear most frequently in a language and contain prepositions, pronouns, auxiliary verbs, etc. Eliminating stop words is a basic step in pre-processing to perform text mining, which, as the name suggests, consists of removing the stop words from the set of characteristics of the texts [1]. The catalog used contains 613 stop words in Spanish.…”

Section: Methods (mentioning)

confidence: 99%

Application of Natural Language Processing Techniques for Classification of Web Published News in Spanish

Hernandez-Cruz, Chi-Poot, Martínez-Luna

2019

RCS


Web-published news written in Spanish was analyzed using categories related to its content, such as 'Culture', 'Sports' and 'Finances', or very general categories such as 'National' or 'International'. The large volume of documents created the need to provide the user with an analysis of such documents, particularly where search engines are involved. First, a pre-processing step was applied to enable text mining; it includes lemmatization, homologation of synonyms, and representation of the documents with a Boolean method. This pre-processing also includes a dimensionality reduction of the resulting matrix. Second, different classification methods were applied and their performance compared in order to find the one that best assigns a category to each news item.
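The pre-processing named in the citation statement and abstract above (stop-word elimination followed by a Boolean document representation) can be illustrated with a minimal Python sketch. It is not the cited authors' implementation: the stop-word list `STOP_WORDS` is a tiny hypothetical stand-in for their 613-word Spanish catalog, the helper names are invented for illustration, and tokenization is assumed to be a plain whitespace split with no lemmatization.

```python
# Hypothetical, tiny Spanish stop-word list for illustration only; the
# 613-word catalog mentioned in the cited work is not reproduced here.
STOP_WORDS = {"el", "la", "los", "las", "de", "en", "y", "a", "que", "un", "una"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in stop_words]


def boolean_matrix(docs, stop_words=STOP_WORDS):
    """Return (vocab, rows); rows[i][j] is 1 if vocab[j] occurs in docs[i]."""
    token_sets = [set(remove_stop_words(d, stop_words)) for d in docs]
    vocab = sorted(set().union(*token_sets)) if token_sets else []
    rows = [[1 if term in toks else 0 for term in vocab] for toks in token_sets]
    return vocab, rows


docs = ["El equipo gana la final", "La bolsa de valores cae"]
vocab, rows = boolean_matrix(docs)
# vocab holds the surviving terms in sorted order; each row marks only the
# presence or absence of a term, not how often it occurs.
```

Because stop words are dropped before the vocabulary is built, they never become columns of the matrix, which is the sense in which stop-word removal reduces the feature set.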



“…statistical, word distribution in documents using variance measure and using the entropy measure. An evolutionary technique was proposed by [9] to extract the optimal set of stop words from the critical infrastructure domain.…”

Section: Related Studies (mentioning)

confidence: 99%

Construction of a generic stopwords list for Hindi language without corpus statistics

Siddiqi, Sharan

2018

IJACR


“…Many articles on the bag-of-words method [1,4,7,18,23] show that an integral part of the algorithm is the processing of stop-words. In Amarasinghe, Manic and Hruska [23] this stage was given special attention. They emphasized that the removal of the words leads to the loss of some useful information.…”

Section: B Types Of Text Mining (mentioning)

confidence: 99%

“…Therefore, it was proposed that an alternate method, in which the stop-words are considered separately from the key words and the dimension is reduced using a genetic algorithm, be used instead. In [23] experiments were carried out that showed that the accuracy of the algorithm increased by two percent. However, their experiments have been conducted on a fairly small amount of data and there are questions as whether or not the proposed method is effective and, most importantly, can it quickly reduce the dimension of stop-words with a large amount of data?…”

Section: B Types Of Text Mining (mentioning)

confidence: 99%


Application of formal grammar in text mining and construction of an ontology

Kanev, Cunningham, Terekhov

2017

2017 Internet Technologies and Applications (ITA)


This work describes an investigation of formal grammar with application to text mining. This is an important area, since text is the most widespread type of data and contains a lot of potentially useful information. The unstructured nature of text requires different processing methods from those used in other types of data mining. In this work, the authors propose an original approach to text mining that builds a parse tree for each sentence using a regular grammar and creates an ontology, and they demonstrate an implementation of this system in a constrained scenario. This ontology can be used for different tasks, ranging from expert systems to automatic machine translation. The ontology is a network of concepts linked by relations. The authors developed a new system that implements the proposed approach and works in different languages.
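The citation statements above describe the cited paper's approach only as a genetic algorithm (an "evolutionary technique") that selects stop words. As a rough illustration of that general idea, and not a reconstruction of the paper's method, the Python sketch below evolves a bit mask over candidate words. Everything specific in it is an assumption: the function names, the truncation selection, the one-point crossover, the parameter defaults, and above all `toy_fitness`, which stands in for whatever objective the paper used (in practice this would more plausibly be a downstream measure such as classification accuracy).

```python
import random


def evolve_stop_words(candidates, fitness, generations=30, pop_size=20,
                      mutation_rate=0.1, seed=0):
    """Evolve a bit mask over `candidates` (at least 2 words); bit 1 = stop word."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    n = len(candidates)

    def score(mask):
        return fitness({w for w, bit in zip(candidates, mask) if bit})

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation, applied independently to each position.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = parents + children
    best = max(population, key=score)
    return {w for w, bit in zip(candidates, best) if bit}


# Hypothetical objective: reward selecting known filler words and penalize
# selecting domain terms. This is a stand-in, not the paper's fitness function.
NOISE = {"the", "of", "and", "is"}


def toy_fitness(stop_set):
    return len(stop_set & NOISE) - len(stop_set - NOISE)


candidates = ["the", "of", "and", "is", "grid", "outage", "reactor", "valve"]
best = evolve_stop_words(candidates, toy_fitness)
```

The concern raised in the statements above, whether such a search scales to large data, shows up in this sketch too: each generation calls `fitness` once per comparison key, so an expensive objective dominates the running time.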

