Abstract: This study reports the results of a series of experiments in the techniques of automatic document classification. Two different classification schedules are compared along with two methods of automatically classifying documents into categories. It is concluded that, while there is no significant difference in the predictive efficiency between the Bayesian and the Factor Score methods, automatic document classification is enhanced by the use of a factor-analytically-derived classification schedule. Approx…
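To make the "Bayesian method" mentioned in the abstract concrete, the following is a minimal sketch of multinomial naive Bayes document classification in modern Python. It is an illustration of the general technique only, not a reconstruction of the 1963 implementation; the category names and training documents are invented for the example.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (token_list, category) pairs.
    Returns log-priors, Laplace-smoothed log-likelihoods, and the vocabulary."""
    cat_counts = Counter(cat for _, cat in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, cat in docs:
        word_counts[cat].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(docs)) for c, n in cat_counts.items()}
    log_lik = {}
    for c in cat_counts:
        total = sum(word_counts[c].values())
        # Add-one (Laplace) smoothing so unseen words get nonzero probability.
        log_lik[c] = {w: math.log((word_counts[c][w] + 1) / (total + len(vocab)))
                      for w in vocab}
    return log_prior, log_lik, vocab

def classify(tokens, log_prior, log_lik, vocab):
    """Assign the category with the highest posterior log-probability."""
    scores = {c: log_prior[c] + sum(log_lik[c][w] for w in tokens if w in vocab)
              for c in log_prior}
    return max(scores, key=scores.get)
```

For example, a classifier trained on a few "physics" and "biology" token lists will assign `["plasma", "energy"]` to the physics category. The Factor Score method compared in the study instead scores documents against factor-analytically derived category loadings rather than word likelihoods.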
“…Comparison between categorization methods would be aided by the use of common testsets, something which has rarely been done. (An exception is [BB64].) Development of standard collections would be an important first step to better understanding of text categorization.…”
While certain standard procedures are widely used for evaluating text retrieval systems and algorithms, the same is not true for text categorization. Omission of important data from reports is common and methods of measuring effectiveness vary widely. This has made judging the relative merits of techniques for text categorization difficult and has disguised important research issues. In this paper I discuss a variety of ways of evaluating the effectiveness of text categorization systems, drawing both on reported categorization experiments and on methods used in evaluating query-driven retrieval. I also consider the extent to which the same evaluation methods may be used with systems for text extraction, a more complex task. In evaluating either kind of system, the purpose for which the output is to be used is crucial in choosing appropriate evaluation methods.
“…For example, research in information retrieval as early as 1963 used Factor Analysis (FA) on text documents to extract topics and automatically classify documents [5,6]. Whilst this work received a lot of attention as an unsupervised approach to document classification, it has rarely been cited as an example of topic identification.…”
“…It can be deduced from the mathematical notation and diagrammatic representation of Automatic Text Classification (ATC) that the definition by Borko and Bernick (1963) [6] extends the first definition, the definition by Merkl (1998) [7] extends that of Borko and Bernick (1963) [6], and the definition by Manning and Schütze (1999) [8] is the union of the definitions by Merkl (1998) [7] and Borko and Bernick (1963) [6].…”
Section: Discussion (mentioning, confidence: 99%)
“…Automatic Text Classification (ATC) can be defined as automatic identification of such a set of categories "definition by Borko and Bernick (1963)" [6].…”
As digitization of text continues to increase enormously, the need to organize, categorize, and classify text has become indispensable: disorganized and poorly categorized text slows text and information retrieval. It is therefore important to organize, categorize, and classify texts and digitized documents according to the definitions proposed by text-mining experts and computer scientists. Work has been done on Text Mining, Text Categorization, and Automatic Text Classification by computer and information scientists, but considerable space for novel research remains in this domain. In this paper we propose mathematical notation and graphical models for Text Mining, Text Categorization, and Automatic Text Classification to give an in-depth understanding of these techniques and concepts. These mathematical and graphical models can shorten the response time of text and information retrieval, and the performance of web search engines can also be improved by employing them.