Information filtering and information retrieval applications rely on web page classification methods. Web pages usually serve different functions or cover different topics. This diversity of content increases the need for automatic web page classification and, at the same time, makes it a challenging task. The main component of a web page's content is most often its text, and text classification has been studied intensively in recent years, with researchers reporting state-of-the-art results for various methods; applying these methods to the text extracted from web pages could therefore lead to important results. In this work, we revisit our experimental study on convolutional neural networks for multi-label, multi-language web page classification with a new approach that divides the problem into functional classification and subject classification of web pages. The experimental evaluation indicates that separating the functional and subject classification of web pages improves the overall results.
This work compares several task distribution methods for load balancing using an original implementation of a solution based on a multi-agent system. The original contributions include the design and implementation of the agent-based solution and the proposal of various scenarios, strategies, and metrics, which are analyzed in the experimental case studies. The best strategy depends on the context. When the objective is to use the processors at their highest processing potential, the agents' preferences strategy makes the best use of the processing resources, with an aggregated load per turn for all PAs up to four times higher than that of the other strategies. When the loads of the processing elements must be balanced, the maximum availability strategy performs best, producing the lowest imbalance rate between PAs in most scenarios. The random distribution strategy produces the lowest average load, especially for tasks with higher required processing time, and should therefore generally be avoided.
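To make the contrast between two of the strategies concrete, the following is a minimal Python sketch, not the authors' multi-agent implementation. It assumes that "random distribution" means picking a processor uniformly at random and that "maximum availability" means picking the processor with the smallest accumulated load. The function names `distribute` and `imbalance`, the task costs, and the imbalance formula (relative gap between the busiest processor and the mean load) are all hypothetical choices made for illustration; the paper's own metrics may be defined differently.

```python
import random


def distribute(tasks, n_processors, strategy, seed=0):
    """Assign each task (a required processing time) to a processor.

    Returns the accumulated load of every processor.
    The strategy definitions here are illustrative assumptions.
    """
    rng = random.Random(seed)
    loads = [0.0] * n_processors
    for cost in tasks:
        if strategy == "random":
            # Assumed meaning: uniform random choice of processor.
            target = rng.randrange(n_processors)
        elif strategy == "max_availability":
            # Assumed meaning: the least-loaded processor is the most available.
            target = min(range(n_processors), key=loads.__getitem__)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        loads[target] += cost
    return loads


def imbalance(loads):
    """Hypothetical metric: relative gap between the busiest processor and the mean."""
    mean = sum(loads) / len(loads)
    return (max(loads) - mean) / mean if mean else 0.0


if __name__ == "__main__":
    tasks = [5, 3, 8, 2, 7, 4, 6, 1]  # made-up processing times
    for name in ("random", "max_availability"):
        loads = distribute(tasks, 4, name)
        print(name, loads, round(imbalance(loads), 3))
```

Under these assumptions, the greedy least-loaded choice keeps per-processor loads close to the mean, while the random choice gives no such guarantee for any single run. The agents' preferences strategy is not sketched because the abstract does not describe how preferences are expressed.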