Every activity that a firm or institution performs on the Web is registered, leaving a "digital footprint". Part of this digital footprint is reflected on their websites, which officially represent them on the Web. We plan to automatically monitor the changes that periodically occur in a website and relate them to business activity. The aim of this paper is to propose a theoretical classification of corporate webpages, so that changes occurring on them can be associated with the regular activity of firms, and to evaluate whether these webpages can be categorized automatically using classification models. To generate the classification, a significant number of current corporate webpages were analyzed, distinguishing four theoretical types of corporate webpages. To evaluate the automatic categorization, a dataset of 1005 current corporate webpages was manually labeled and then used to assess classification models.