In this work, we provide an empirical analysis of differences in word use between genders in telephone conversations, which complements the considerable body of work in sociolinguistics concerned with linguistic differences between genders. Experiments are performed on a large speech corpus of roughly 12,000 conversations. We employ machine learning techniques to automatically categorize the gender of each speaker given only the transcript of his or her speech, achieving 92% accuracy. An analysis of the most characteristic words for each gender is also presented. Experiments reveal that the gender of one conversation side influences the lexical use of the other side. A surprising result is that we were able to classify male-only vs. female-only conversations with almost perfect accuracy.
Abstract. Three methods for combining multiple clustering systems are presented and evaluated, focusing on the problem of finding the correspondence between clusters of different systems. In this work, the clusters of individual systems are represented in a common space and their correspondence is estimated either by "clustering clusters" or by Singular Value Decomposition. The approaches are evaluated on the task of topic discovery using three major corpora and eight different clustering algorithms. Experiments show that combination schemes almost always offer gains over single systems, although the size of the gain depends on the underlying clustering systems.
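To make the "clustering clusters" idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes each cluster is represented as a binary membership vector over the shared set of documents (one possible "common space"), that cluster vectors are grouped by average-link agglomerative merging on cosine similarity, and that the combined output is a per-document majority vote. The function names (`cluster_vectors`, `cluster_the_clusters`, `consensus`) and the toy labelings are hypothetical.

```python
from itertools import combinations
from math import sqrt

def cluster_vectors(labelings):
    """One binary membership vector per (system, cluster) pair, all over the same documents."""
    vectors = {}
    for sys_id, labels in enumerate(labelings):
        for c in set(labels):
            vectors[(sys_id, c)] = [1.0 if l == c else 0.0 for l in labels]
    return vectors

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def cluster_the_clusters(vectors, k):
    """Average-link agglomerative merging of cluster vectors until k groups remain."""
    groups = [[key] for key in sorted(vectors)]
    while len(groups) > k:
        def avg_sim(pair):
            g, h = groups[pair[0]], groups[pair[1]]
            total = sum(cosine(vectors[a], vectors[b]) for a in g for b in h)
            return total / (len(g) * len(h))
        # Merge the two most similar groups (i < j, so popping j keeps i valid).
        i, j = max(combinations(range(len(groups)), 2), key=avg_sim)
        groups[i] += groups.pop(j)
    return groups

def consensus(labelings, k):
    """Relabel every system's clusters by their group, then take a majority vote per document."""
    vectors = cluster_vectors(labelings)
    groups = cluster_the_clusters(vectors, k)
    group_of = {key: g for g, members in enumerate(groups) for key in members}
    result = []
    for d in range(len(labelings[0])):
        votes = [group_of[(s, labels[d])] for s, labels in enumerate(labelings)]
        result.append(max(set(votes), key=votes.count))
    return result

# Hypothetical toy input: three systems label six documents; the second system
# uses swapped label names and the third disagrees on one document.
systems = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
combined = consensus(systems, k=2)
```

In this toy case the swapped labels of the second system are matched to the first system's clusters, and the single disagreement from the third system is outvoted. An SVD-based variant would instead factor the stacked matrix of membership vectors and compare clusters in the reduced space.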