Every text has at least one topic and at least one genre. Evidence for a text's topic and genre comes, in part, from its lexical and syntactic features, the same features used in both Automatic Topic Classification and Automatic Genre Classification (AGC). Because an ideal AGC system should be stable in the face of changes in topic distribution, we assess five previously published AGC methods on two criteria: their performance on the same topic–genre distribution on which they were trained, and the stability of that performance across changes in topic–genre distribution. Our experiments lead us to conclude that (1) stability in the face of changing topical distributions should be added to the evaluation criteria for new approaches to AGC, and (2) Part-of-Speech features should be considered individually when developing a high-performing, stable AGC system for a particular, possibly changing corpus.
There is considerable interest in developing landmark saliency models as a basis for describing urban […] is perceived by the user? This paper presents a web-based experiment in which users were asked to tag and label the most salient features in urban images for the purposes of navigation and exploration. To rank landmark popularity in each scene, it was necessary to determine which tags referred to the same object (e.g. tags relating to a particular café). Existing clustering techniques did not perform well for this task, so a new spatial-semantic clustering method was developed that considers both the proximity of nearby tags and the similarity of their label content. Annotation similarity was initially calculated using trigrams in conjunction with a synonym list, generating a set of networks formed from the links between related tags. These networks were used to build related-word lists encapsulating conceptual connections (e.g. church tower related to clock), so that during a secondary pass of the data, related network segments could be merged. This approach gives insight into the partonomic relationships between the constituent parts of landmarks and into the range and frequency of terms used to describe them. The knowledge gained will be used to help calibrate a landmark saliency model and to gain a deeper understanding of the terms typically associated with different types of landmarks.
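The first-pass spatial-semantic clustering described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the similarity measure, the thresholds, or the linking rule, so the Jaccard overlap of character trigrams, the hypothetical `max_dist` and `min_sim` thresholds, the hypothetical `SYNONYMS` list, and the union-find grouping of linked tags into networks are all illustrative choices. The secondary pass that merges network segments using related-word lists is not shown.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Tag:
    x: float      # tag position in the image (assumed: pixel coordinates)
    y: float
    label: str    # free-text label entered by the user


# Hypothetical synonym list: maps variant words to one canonical form.
SYNONYMS = {"cafe": "café", "coffee": "café", "steeple": "tower"}


def normalise(label: str) -> str:
    """Lower-case the label and replace each word by its canonical synonym."""
    return " ".join(SYNONYMS.get(w, w) for w in label.lower().split())


def trigrams(label: str) -> set[str]:
    """Character trigrams of the normalised label, padded at both ends."""
    text = f"  {normalise(label)} "
    return {text[i:i + 3] for i in range(len(text) - 2)}


def label_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two labels' trigram sets (assumed measure)."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def cluster_tags(tags: list[Tag], max_dist: float = 50.0,
                 min_sim: float = 0.3) -> list[list[Tag]]:
    """Link tags that are both spatially close and similarly labelled,
    then return each connected network of linked tags as one cluster."""
    parent = list(range(len(tags)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(tags)), 2):
        a, b = tags[i], tags[j]
        close = math.hypot(a.x - b.x, a.y - b.y) <= max_dist
        if close and label_similarity(a.label, b.label) >= min_sim:
            parent[find(i)] = find(j)  # merge the two networks

    groups: dict[int, list[Tag]] = {}
    for i, tag in enumerate(tags):
        groups.setdefault(find(i), []).append(tag)
    return list(groups.values())


if __name__ == "__main__":
    tags = [Tag(100, 80, "Cafe"), Tag(110, 85, "coffee shop"),
            Tag(400, 60, "church tower"), Tag(405, 70, "steeple")]
    for cluster in cluster_tags(tags):
        print([t.label for t in cluster])
```

In this toy example the synonym list maps "Cafe" and "coffee" to "café" and "steeple" to "tower", so each pair of nearby tags shares enough trigrams to be linked, and the script prints two clusters. Requiring both conditions (proximity and label similarity) keeps distant tags with similar words, or adjacent tags with unrelated words, in separate networks.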