Abstract. Web authoring tools, new web standards, and improved web accessibility have led to a wide variety of Web content being produced very quickly. In such an environment, providing Web services that match users' needs requires quickly and accurately extracting relevant information from Web documents and removing irrelevant content such as advertisements. In this paper, we propose a method that accurately extracts the main content from HTML Web documents. The method builds a decision tree and uses it to classify whether each block of text is part of the main content. For classification, we use contextual features around text blocks, including word density, link density, HTML tag distribution, and distances between text blocks. We evaluated our method on a published data set and a data set that we collected. The experimental results show that our method performs 19% better in F-measure than the best-performing existing method.
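To make the block features concrete, the following is a minimal sketch of how word density and link density might be computed for a text block, with a hand-written rule standing in for the learned decision tree. The abstract does not define the features or give the tree, so everything here is an assumption for illustration: the `TextBlock` type, the definitions (link density as the share of words inside anchors, word density as words per 80-character wrapped line), and both thresholds are hypothetical.

```python
import textwrap
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextBlock:
    """Hypothetical container for one block of visible text."""
    text: str          # all visible text in the block
    anchor_text: str   # the portion of that text found inside <a> tags


def link_density(block: TextBlock) -> float:
    """Assumed definition: words inside anchors divided by all words."""
    words = block.text.split()
    if not words:
        return 0.0
    return len(block.anchor_text.split()) / len(words)


def word_density(block: TextBlock, line_width: int = 80) -> float:
    """Assumed definition: average words per line after wrapping the text."""
    lines = textwrap.wrap(block.text, line_width)
    if not lines:
        return 0.0
    return len(block.text.split()) / len(lines)


# Illustrative thresholds only; a real tree would learn its splits from data.
MAX_LINK_DENSITY = 0.33
MIN_WORD_DENSITY = 9.0


def is_main_content(block: TextBlock, prev_block: Optional[TextBlock] = None) -> bool:
    """Hand-written stand-in for a decision tree over block features."""
    if link_density(block) > MAX_LINK_DENSITY:
        return False  # link-heavy blocks look like navigation or ads
    if word_density(block) >= MIN_WORD_DENSITY:
        return True   # dense running text looks like article body
    # Contextual feature: a sparse block right after a dense, low-link block
    # (for example a byline or short closing line) is kept as content.
    return (
        prev_block is not None
        and link_density(prev_block) <= MAX_LINK_DENSITY
        and word_density(prev_block) >= MIN_WORD_DENSITY
    )
```

In this sketch a navigation bar made entirely of links is rejected by the link-density test, while a long paragraph with no links passes the word-density test. The `prev_block` argument is one simple way to represent the contextual information the abstract mentions; HTML tag distribution and inter-block distance are left out.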
To speed up the fetching of web pages, this paper presents an intelligent web pre-fetching technique. We use a simplified WWW data model to represent the data in the web browser's cache and mine association rules from it. We store these rules in a knowledge base in order to predict the user's actions. Intelligent agents are responsible for mining the users' interests and pre-fetching web pages based on the interest association repository. In this way, user browsing time is reduced transparently.
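As a rough illustration of rule-based pre-fetching, the sketch below mines simple "page A is followed by page B" rules from browsing sessions and uses them to pick pages to pre-fetch. It is not the paper's WWW data model or agent architecture, which the abstract does not specify. The session format, the function names, and the support and confidence thresholds are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Rules = Dict[str, List[Tuple[str, float]]]


def mine_rules(
    sessions: Sequence[Sequence[str]],
    min_support: int = 2,
    min_confidence: float = 0.5,
) -> Rules:
    """Mine first-order rules 'current page -> next page' from sessions.

    Support is the number of times the transition was seen; confidence is
    that count divided by the number of times the current page had any
    successor. Both thresholds are illustrative defaults.
    """
    pair_counts: Counter = Counter()
    page_counts: Counter = Counter()
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            pair_counts[(current, nxt)] += 1
            page_counts[current] += 1

    rules: Rules = defaultdict(list)
    for (current, nxt), count in pair_counts.items():
        confidence = count / page_counts[current]
        if count >= min_support and confidence >= min_confidence:
            rules[current].append((nxt, confidence))

    # Strongest rules first, so the most likely next page is pre-fetched first.
    for current in rules:
        rules[current].sort(key=lambda rule: rule[1], reverse=True)
    return dict(rules)


def pages_to_prefetch(rules: Rules, current_page: str, limit: int = 2) -> List[str]:
    """Look up the knowledge base and return up to `limit` pages to pre-fetch."""
    return [page for page, _ in rules.get(current_page, [])[:limit]]


if __name__ == "__main__":
    history = [
        ["/home", "/news", "/sports"],
        ["/home", "/news"],
        ["/home", "/about"],
    ]
    knowledge_base = mine_rules(history)
    print(pages_to_prefetch(knowledge_base, "/home"))
```

Here the dictionary returned by `mine_rules` plays the role of the knowledge base, and `pages_to_prefetch` is the lookup an agent could perform each time the user opens a page.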