2003
DOI: 10.1080/713827177

Summarization and categorization of text data in high-level data cleaning for information retrieval

Cited by 16 publications
(7 citation statements)
References 0 publications
“…The goal of this paper is to eliminate most of the "dirty data" and the irrelevant information from the documents retrieved. Saravanan et al. (2003) design a high-level data cleaning framework for building the relevance between text categorization and summarization. The main contribution of this framework is that it effectively applies Katz's K-mixture model of term distribution to the summarization tasks.…”
Section: Desirable Contributions (mentioning)
confidence: 99%
“…Saravanan et al [18] further outlines that clean data implies relevant data. Hence as not all data provided as part of the document collection was particularly useful in this research, it was therefore vital that the document collection be cleaned before the index process could take place.…”
Section: Data Cleaning (mentioning)
confidence: 99%
“…Data cleaning refers to removing noise or outliers, collecting relevant information for modelling noise, deciding on strategies on missing data, and accounting for time sequence information and known changes [18]. Saravanan et al [18] further outlines that clean data implies relevant data.…”
Section: Data Cleaning (mentioning)
confidence: 99%
“…Turmo et al [13] introduce and compare different approaches to adaptive information extraction from textual documents and different machine language techniques. Saravanan et al [14] discuss how to automatically clean data by discovering classes of similar items that can be grouped into prescribed domains. Srinivasan [15] develops an algorithm to generate interesting hypotheses from a set of text collections using Medline database.…”
Section: Literature Review: Related Work (mentioning)
confidence: 99%