This paper proposes a system capable of identifying and categorising web pages on the basis of information filtering. The system is a three-layer Probabilistic Neural Network (PNN) with biases and radial basis neurons in the middle layer and competitive neurons in the output layer. The domain of study is e-commerce. Thus, the PNN aims to identify e-commerce web pages and classify them into the respective type according to a framework that describes the fundamental phases of commercial transactions on the web. The system was tested with many types of web pages, demonstrating the robustness of the method, since no restrictions were imposed except for the language of the content, which is English. The probabilistic classifier was used for estimating the population of specific e-commerce web pages. Potential applications involve surveying web activity in commercial servers, as well as web page classification in rapidly expanding information areas such as e-government or news and media.
technique, according to which some distance measures are found and the most frequent code among the respective corpus texts is assigned to the text [2].
A second large group of techniques comprises neural networks. This class of models originated in engineering, and two of its main areas of application are classification and decision problems. The numerical input obtained from each web page is a vector containing the frequency of appearance of terms. Because thousands of terms may appear, the dimension of the vectors can be reduced either by Singular Value Decomposition or by projecting them onto spaces with fewer dimensions [3], [4]. Text classification and document classification have also been tested with neural network architectures called Self-Organised Maps (SOMs) [4], [5], [6].
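The term-frequency input and the layered PNN structure described above can be illustrated with a minimal sketch. It is not the system evaluated in this paper: the vocabulary, the toy pages, the class labels, the smoothing width `sigma`, and the function names `term_frequency_vector` and `pnn_classify` are all assumptions made for illustration, and equal class priors are assumed so that the class with the largest averaged kernel response is also the class with the largest Bayes posterior.

```python
import math
from collections import Counter


def term_frequency_vector(text, vocabulary):
    """Count how often each vocabulary term appears in the text."""
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocabulary]


def pnn_classify(x, training_set, sigma=1.0):
    """Return (winning_label, per_class_scores) for input vector x.

    training_set: list of (vector, label) pairs.
    sigma: width of the Gaussian kernels (an assumed value).
    """
    sums, sizes = {}, {}
    for vector, label in training_set:
        # Radial basis layer: one Gaussian neuron per training pattern.
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, vector))
        activation = math.exp(-sq_dist / (2.0 * sigma ** 2))
        sums[label] = sums.get(label, 0.0) + activation
        sizes[label] = sizes.get(label, 0) + 1
    # Per-class average: proportional to a Parzen estimate of p(x | class).
    scores = {label: sums[label] / sizes[label] for label in sums}
    # Competitive layer: the class with the largest score wins.
    winner = max(scores, key=scores.get)
    return winner, scores


# Hypothetical toy vocabulary and pages, for illustration only.
VOCABULARY = ["price", "order", "payment", "news", "weather"]
TRAINING_PAGES = [
    ("order now best price price", "e-commerce"),
    ("secure payment for your order", "e-commerce"),
    ("weather news for today", "other"),
    ("latest news news headlines", "other"),
]
TRAINING_SET = [
    (term_frequency_vector(text, VOCABULARY), label)
    for text, label in TRAINING_PAGES
]

query = term_frequency_vector(
    "compare price and order with online payment", VOCABULARY
)
label, scores = pnn_classify(query, TRAINING_SET)
print(label, scores)
```

In this sketch the network has no training loop: every stored pattern becomes one radial basis neuron. In practice the choice of `sigma` would strongly affect the result and would have to be tuned on held-out pages.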
Other approaches, such as evolution-based genetic algorithms and fuzzy function approximation, have also been presented as possible solutions to the classification problem [7], [8], [9], [10].
Neural networks are chosen mainly for computational reasons: once trained, they operate very fast, and the creation of thesauri and indices is avoided. Nevertheless, basic concepts from information filtering and retrieval are still used in the computations. Thus, many experimental investigations on the use of neural networks for implementing relevance feedback in an interactive information retrieval system have been proposed. The aim of these investigations was to compare relevance feedback mechanisms with neural-network-based techniques on the basis of relevant and non-relevant document segmentation [11], [12].
The proposed classification method
This paper describes a Probabilistic Neural Network that classifies web pages under the concepts of the Business Media Framework (BMF) [13]. The classification is performed by estimating the likelihood of an input feature vector according to Bayes posterior probabilities. Moreover, the theoretical background used consists of information filtering tec...