Abstract. What is the likelihood that a Web page is considered relevant to a query, given the relevance assessment of the corresponding snippet? Using a new federated IR test collection that contains search results from over a hundred search engines on the Internet, we are able to investigate such research questions from a global perspective. Our test collection covers the main Web search engines like Google, Yahoo!, and Bing, as well as a number of smaller search engines dedicated to multimedia, shopping, etc., and as such reflects a realistic Web environment. Using a large set of relevance assessments, we are able to investigate the connection between snippet quality and page relevance. The dataset is strongly inhomogeneous, and although the assessors' consistency is shown to be satisfactory, care is required when comparing resources. To this end, a number of probabilistic quantities, based on snippet and page relevance, are introduced and evaluated.
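The opening question is a conditional probability: P(page relevant | snippet judgement). As a minimal sketch, assuming each assessment can be reduced to a pair of binary labels (snippet relevant?, page relevant?), the quantity can be estimated by simple counting. The function name `conditional_relevance` and the toy judgements are hypothetical; the paper's own probabilistic quantities may be defined differently (for example, over graded judgements).

```python
from collections import Counter

def conditional_relevance(pairs):
    """Estimate P(page relevant | snippet label) from (snippet_rel, page_rel) pairs.

    Each pair holds two booleans from the same assessment. Returns a dict that
    maps each observed snippet label to the fraction of its pages judged
    relevant (a plain maximum-likelihood estimate, no smoothing).
    """
    totals = Counter()    # assessments seen per snippet label
    relevant = Counter()  # relevant pages seen per snippet label
    for snippet_rel, page_rel in pairs:
        totals[snippet_rel] += 1
        if page_rel:
            relevant[snippet_rel] += 1
    return {label: relevant[label] / totals[label] for label in totals}

# Hypothetical toy judgements: (snippet relevant?, page relevant?)
judgements = [(True, True), (True, True), (True, False),
              (False, False), (False, True), (False, False)]
probs = conditional_relevance(judgements)
print("P(page rel | snippet rel)     =", probs[True])
print("P(page rel | snippet not rel) =", probs[False])
```

The same counting works unchanged if the snippet label is a grade rather than a boolean; with few assessments per label, a smoothed estimate may be preferable.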
Welcome to the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology. The workshop aims to bring together researchers interested in applying computational techniques to problems in morphology, phonology, and phonetics. Our program this year highlights the ongoing and important interaction between work in computational linguistics and work in theoretical linguistics. We received 23 submissions and accepted 11. The volume of submissions made it necessary to recruit several additional reviewers. We'd like to thank all of these people for agreeing to review papers on what seemed like impossibly short notice. This year also marks the first SIGMORPHON shared task, on morphological reinflection. The shared task received 9 submissions, all of which were accepted, and greatly advanced the state of the art in this area. We thank all the authors, reviewers, and organizers for their efforts on behalf of the community.

Abstract. This paper conceptualizes speech prosody data mining and its potential application in data-driven phonology/phonetics research. We first conceptualize Speech Prosody Mining (SPM) in a time-series data mining framework. Specifically, we propose using efficient symbolic representations for computing similarity between speech prosody time series. We experiment with both symbolic and numeric representations and distance measures in a series of time-series classification and clustering experiments on a dataset of Mandarin tones. Evaluation results show that the symbolic representation performs comparably with other representations at a reduced cost, which enables us to mine large speech prosody corpora efficiently while opening up the possibility of using a wide range of algorithms that require discrete-valued data. We discuss the potential of SPM using time-series mining techniques for future work.
Introduction
Current investigations of the phonology of intonation and tones (or pitch accent) typically employ data-driven approaches, building research on top of manual annotations of large amounts of speech prosody data (for example, (Morén and Zsiga, 2006; Zsiga and Zec, 2013), and many others). Meanwhile, researchers are also limited by the resources that such expensive manual annotation demands. Given this tension, we believe that this type of data-driven approach at the phonology-phonetics interface can benefit from tools that can efficiently index, query, classify, cluster, summarize, and discover meaningful prosodic patterns in a large speech prosody corpus. The data mining of f0¹ (pitch) contour patterns from audio data has recently gained success in the domain of Music Information Retrieval (aka MIR; see (Gulati and Serra, 2014; Gulati et al., 2015; Ganguli, 2015) for examples). In contrast, the data mining of speech prosody f0 data (henceforth referred to as Speech Prosody Mining (SPM)²) is a less explored research topic (Raskinis and Kazlauskiene, 2013). Fundamentally, SPM in a large prosody corpus aims at discovering meaningful patterns in the f ...
On Twitter, many users tweet in more than one language. In this study, we examine the use of two Dutch minority languages. Users can engage with different audiences, and by analyzing different types of tweets, we find that characteristics of the audience influence whether a minority language is used. Furthermore, while most tweets are written in Dutch, users often switch to the minority language in conversations.