A document classification method by using field association words

Fuketa, Masao; Lee, Sangkon; Tsuji, T.; Okada, Makoto; Aoe, Jun-ichi

doi:10.1016/s0020-0255(00)00042-6

Cited by 41 publications (17 citation statements); references 6 publications

“…There has been much research and development related to automatic document classification and summarization [1][2][3][4][5] including the vector space model [6,7] and the probabilistic model [8][9][10][11]. These approaches need to read the whole document to calculate document similarity.…”

Section: Introduction (mentioning; confidence: 99%)

An automatic filtering method for field association words by deleting unnecessary words

Ghada, Atlam, Fuketa et al. 2006

International Journal of Computer Mathematics

Document classification and summarization are very important for document text retrieval. Generally, humans can recognize fields such as Sports or Politics from specific words, called Field Association (FA) words, that occur in documents belonging to those fields. The traditional method registers misleading redundant words (unnecessary words) because the quality of the resulting FA words depends on learning data pre-classified by hand. Therefore, the recall and precision of document classification are degraded if the fields assigned by hand are ambiguous. We propose two criteria: deleting unnecessary words with low frequencies, and deleting unnecessary words using category information. Using the proposed criteria, unnecessary words can be deleted from the FA word dictionary created by the traditional method. Experimental results showed that 25% of 38 372 FA word candidates were identified as unnecessary and deleted automatically when the presented method was used. Furthermore, precision and F-measure were improved by 26% and 15%, respectively, compared with the traditional method.
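The two deletion criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the candidate representation (each word mapped to per-field frequencies), the hypothetical thresholds `MIN_FREQ` and `MIN_CONCENTRATION`, and the reading of "category information" as how strongly a word is concentrated in a single field are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical thresholds; the abstract does not state the paper's actual values.
MIN_FREQ = 3
MIN_CONCENTRATION = 0.6


def filter_fa_candidates(candidates, min_freq=MIN_FREQ, min_concentration=MIN_CONCENTRATION):
    """Split FA word candidates into kept and deleted dictionaries.

    `candidates` maps each word to a Counter of {field: frequency},
    an assumed stand-in for the hand-classified learning data.
    Returns (kept, deleted): kept maps word -> dominant field,
    deleted maps word -> the (assumed) reason it was removed.
    """
    kept, deleted = {}, {}
    for word, field_counts in candidates.items():
        total = sum(field_counts.values())
        # Criterion 1 (sketch): drop words seen too rarely to be reliable.
        if total < min_freq:
            deleted[word] = "low frequency"
            continue
        # Criterion 2 (sketch, assumed interpretation of "category information"):
        # drop words whose occurrences are not concentrated in one field.
        best_field, best_count = field_counts.most_common(1)[0]
        if best_count / total < min_concentration:
            deleted[word] = "spread across fields"
            continue
        kept[word] = best_field
    return kept, deleted


if __name__ == "__main__":
    # Toy, made-up counts purely for demonstration.
    candidates = {
        "home run": Counter({"Sports": 12, "Politics": 1}),
        "election": Counter({"Politics": 9, "Sports": 1}),
        "today": Counter({"Sports": 5, "Politics": 6}),
        "quorum": Counter({"Politics": 2}),
    }
    kept, deleted = filter_fa_candidates(candidates)
    print(kept)     # {'home run': 'Sports', 'election': 'Politics'}
    print(deleted)  # {'today': 'spread across fields', 'quorum': 'low frequency'}
```

Checking the frequency criterion first means a word with no recorded occurrences is discarded before `most_common` is called, so the concentration ratio never divides by zero.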

“…For these problems, extraction of schedule and topics information in communication was presented by Mani [10] and Fuketa et al [16]. Moreover, personalized E-mail ranking based on communication history was proposed by Hasegawa [11].…”

Section: Introduction (mentioning; confidence: 99%)

An efficient e-mail filtering using time priority measurement

Kadoya, Fuketa, Atlam et al. 2004

Information Sciences

“…Since it is difficult to read through all of these articles, document classification [4], information extraction [14] and automatic text summarization are necessary for the effective use of the articles. Techniques for information extraction and summarization are currently under investigation.…”

Section: Introduction (mentioning; confidence: 99%)