A. Muthusamy scite author profile

2019

IJIES

Attribute reduction is an important preprocessing step for big data in data mining. A multi-step dimension reduction approach was previously proposed for attribute reduction in big data. It addressed the non-linear relationships among the attributes by reducing the data dimension through a parametric mapping whose parameters were estimated using low-rank Singular Value Decomposition (SVD). However, the user-defined criterion in the multi-step dimension reduction approach strongly influenced the efficiency of attribute reduction. The approach was also designed for a single machine, which means that the entire data set must fit in main memory and parallelism is limited. This paper therefore proposes a parallel rough set theory based attribute reduction approach for big data. A rough set is constructed from two descriptions, the lower approximation and the upper approximation. A reduct is then detected using an inner importance measure and an outer importance measure. Rough set theory is applied within the MapReduce framework to parallelize attribute reduction, which reduces the computation time. Finally, experiments are carried out on the Amazon customer review, REUTERS-21578 and International Cancer Genome Consortium (ICGC) on AWS datasets to demonstrate the effectiveness of the approach in terms of accuracy, precision, recall and computation time.
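The rough set notions named in the abstract above can be illustrated with a small single-machine sketch. It is not the paper's implementation: the function names, the dictionary-based row format, and the greedy search are assumptions made for illustration, the inner and outer importance measures follow definitions commonly used in the rough set literature (change in the dependency degree when an attribute is removed from or added to a subset), and the MapReduce parallelization is not reproduced.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())


def approximations(rows, attrs, target):
    """Return (lower, upper) approximations of the index set `target`."""
    lower, upper = set(), set()
    for block in partition(rows, attrs):
        if block <= target:      # class lies entirely inside the target
            lower |= block
        if block & target:       # class overlaps the target
            upper |= block
    return lower, upper


def dependency(rows, attrs, decision):
    """Fraction of rows in the positive region of `decision` given attrs."""
    if not rows:
        return 0.0
    positive = set()
    for decision_class in partition(rows, [decision]):
        lower, _ = approximations(rows, attrs, decision_class)
        positive |= lower
    return len(positive) / len(rows)


def inner_importance(rows, attrs, a, decision):
    """Assumed definition: dependency lost when `a` is removed from attrs."""
    rest = [x for x in attrs if x != a]
    return dependency(rows, attrs, decision) - dependency(rows, rest, decision)


def outer_importance(rows, attrs, a, decision):
    """Assumed definition: dependency gained when `a` is added to attrs."""
    return dependency(rows, attrs + [a], decision) - dependency(rows, attrs, decision)


def greedy_reduct(rows, cond_attrs, decision):
    """Add the attribute with the largest outer importance until the subset
    preserves the dependency degree of the full attribute set."""
    full = dependency(rows, cond_attrs, decision)
    reduct = []
    while dependency(rows, reduct, decision) < full:
        candidates = [a for a in cond_attrs if a not in reduct]
        best = max(candidates,
                   key=lambda a: outer_importance(rows, reduct, a, decision))
        reduct.append(best)
    return reduct
```

Each row is a dictionary mapping attribute names to values, so a call looks like `greedy_reduct(rows, ["a", "b", "c"], "d")`. The greedy search returns a subset that preserves the dependency degree but is not guaranteed to be minimal; a pruning pass using the inner importance measure could remove redundant attributes afterwards.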

Extracting Textual Information from Google Using Wrapper Class

Muthusamy

2017

NET

Abstract: Web text documents come in structured, unstructured, or semi-structured formats, and their volume grows every day. Users are provided with many tools for searching for relevant information. Keyword searching and topic and subject browsing can help users find relevant information quickly. In addition, index search mechanisms allow the user to retrieve a set of relevant documents. Occasionally these search mechanisms are not sufficient. With the rapid development of the Internet, the amount of data available on the web has steadily increased, which makes it difficult for humans to distinguish relevant information. A wrapper class is proposed to extract relevant text information and find useful facts from unstructured web documents using Google. Techniques from information retrieval (IR), information extraction (IE), and pattern recognition are explored.

Framework for pattern generation from discriminating datasets

Muthusamy

2015

IJCI

A Survey of Automatic Extraction of Personal Name Alias from the Web

Muthusamy

2014

IJSIP

Automatic Discovery of Lexical Patterns using Pattern Extraction Algorithm to Identify Personal Name Aliases with Entities

Muthusamy

2015

IJSEIA