The explosive growth of the Web makes it a very useful information resource to all types of users. Today, everyone accesses the Internet for various purposes and retrieving the required information within the stipulated time is the major demand from users. Also, the Internet provides millions of Web pages for each and every search term. Getting interesting and required results from the Web becomes very difficult and turning the classification of Web pages into relevant categories is the current research topic. Web page classification is the current research problem that focuses on classifying the documents into different categories, which are used by search engines for producing the result. In this chapter we focus on different machine learning techniques and how Web pages can be classified using these machine learning techniques. The automatic classification of Web pages using machine learning techniques is the most efficient way used by search engines to provide accurate results to the users. Machine learning classifiers may also be trained to preserve the personal details from unauthenticated users and for privacy preserving data mining.
Increasing with the number of users, the need for automatic classification techniques with good classification accuracy increases as search engines depend on previously classified web pages stored as classified directories to retrieve the relevant results. Machine learning techniques for automatic classification gains more interest as the classifier improves its performance with experience. In this paper we show that lazy learners are capable of solving the web page classification problem. Our experimental results show that lazy learners classify the web page with acceptable accuracy using optimum number of attributes and LBR classifies more accurately than LWL classifiers.
The explosive growth of the Web makes it a very useful information resource to all types of users. Today, everyone accesses the Internet for various purposes and retrieving the required information within the stipulated time is the major demand from users. Also, the Internet provides millions of Web pages for each and every search term. Getting interesting and required results from the Web becomes very difficult and turning the classification of Web pages into relevant categories is the current research topic. Web page classification is the current research problem that focuses on classifying the documents into different categories, which are used by search engines for producing the result. In this chapter we focus on different machine learning techniques and how Web pages can be classified using these machine learning techniques. The automatic classification of Web pages using machine learning techniques is the most efficient way used by search engines to provide accurate results to the users. Machine learning classifiers may also be trained to preserve the personal details from unauthenticated users and for privacy preserving data mining.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.