Summarization and categorization of text data in high-level data cleaning for information retrieval

Saravanan, M.; Raj, P. C. Reghu; Raman, Shanmuganathan

doi:10.1080/713827177

Cited by 16 publications

(7 citation statements)

References 0 publications

“…The goal of this paper is to eliminate most of the "dirty data" and the irrelevant information from the documents retrieved. Saravanan et al (2003) design a high-level data cleaning framework for building the relevance between text categorization and summarization. The main contribution of this framework is that it effectively applies Katz's K-mixture model of term distribution to the summarization tasks.…”

Section: Desirable Contributions (mentioning)

confidence: 99%

Data preparation for data mining

Zhang

Yang

2003

Applied Artificial Intelligence

392

176

Data preparation is a fundamental stage of data analysis. While a lot of low-quality information is available in various data sources and on the Web, many organizations or companies are interested in how to transform the data into cleaned forms which can be used for high-profit purposes. This goal generates an urgent need for data analysis aimed at cleaning the raw data. In this paper, we first show the importance of data preparation in data analysis, then introduce some research achievements in the area of data preparation. Finally, we suggest some future directions of research and development.

“…Saravanan et al [18] further outlines that clean data implies relevant data. Hence as not all data provided as part of the document collection was particularly useful in this research, it was therefore vital that the document collection be cleaned before the index process could take place.…”

Section: Data Cleaning (mentioning)

confidence: 99%

“…Data cleaning refers to removing noise or outliers, collecting relevant information for modelling noise, deciding on strategies on missing data, and accounting for time sequence information and known changes [18]. Saravanan et al [18] further outlines that clean data implies relevant data.…”

Section: Data Cleaning (mentioning)

confidence: 99%

Assessing the Significance of Incorporating User Profiles in Social Book Search

Apadile

Thuma

Mosweunyane

2018

IJCA

In this article, it is hypothesized that personalizing the book search application by incorporating user profiles such as background of personal tastes, interests and previously seen books can produce a more effective query result set as well as an effective book recommendation. To meet this end, experiments were carried out to explore which topic representation gives the best result. Four different query representations, which are title, request, group and a combination of title-request-group, were used. It was observed that the title-request-group query representation was best. In addition, an investigation was conducted to determine whether a learning to rank framework that incorporates topical relevance by exploiting user profiles for document re-ranking according to individual preference will issue a more effective result set. Moreover, an investigation was conducted to determine whether the use of keywords from profiles for query expansion and reformulation improves the search results. The results of these investigations suggest that a more effective query result set as well as an effective book recommendation can be attained by incorporating user profiles such as background of personal tastes, interests and previously seen books into the social book search application.

“…Turmo et al [13] introduce and compare different approaches to adaptive information extraction from textual documents and different machine language techniques. Saravanan et al [14] discuss how to automatically clean data by discovering classes of similar items that can be grouped into prescribed domains. Srinivasan [15] develops an algorithm to generate interesting hypotheses from a set of text collections using Medline database.…”

Section: Literature Review: Related Work (mentioning)

confidence: 99%

Sensitivity of Semantic Signatures in Text Mining

Peddada
