Information retrieval evaluation based on the pooling method is inherently biased against systems that did not contribute to the pool of judged documents. This may distort the results obtained about the relative quality of the systems evaluated and thus lead to incorrect conclusions about the performance of a particular ranking technique.We examine the magnitude of this effect and explore how it can be countered by automatically building an unbiased set of judgements from the original, biased judgements obtained through pooling. We compare the performance of this method with other approaches to the problem of incomplete judgements, such as bpref, and show that the proposed method leads to higher evaluation accuracy, especially if the set of manual judgements is rich in documents, but highly biased against some systems.
To enable more websites to provide content in the form of sign language, we investigate software to partially automate the synthesis of animations of American Sign Language (ASL), based on a human-authored message specification. We automatically select: where prosodic pauses should be inserted (based on the syntax or other features), the timeduration of these pauses, and the variations of the speed at which individual words are performed (e.g. slower at the end of phrases). Based on an analysis of a corpus of multisentence ASL recordings with motion-capture data, we trained machine-learning models, which were evaluated in a cross-validation study. The best model out-performed a prior state-of-the-art ASL timing model. In a study with native ASL signers evaluating animations generated from either our new model or from a simple baseline (uniform speed and no pauses), participants indicated a preference for speed and pausing in ASL animations from our model.
Recent research has investigated automatic methods for identifying how important each word in a text is for the overall message, in the context of people who are Deaf and Hard of Hearing (DHH) viewing video with captions. We examine whether DHH users report benefits from visual highlighting of important words in video captions. In formative interview and prototype studies, users indicated a preference for underlining of 5%-15% of words in a caption text to indicate that they are important, and they expressed an interest for such text markup in the context of educational lecture videos. In a subsequent user study, 30 DHH participants viewed lecture videos in two forms: with and without such visual markup. Users indicated that the videos with captions containing highlighted words were easier to read and follow, with lower perceived task-load ratings, compared to the videos without highlighting. This study motivates future research on caption highlighting in online educational videos, and it provides a foundation for how to evaluate the efficacy of such systems with users.
For enterprise search, there exists a relationship between work task and document type that can be used to refine search results [3]. In this poster, we adapt the popular Okapi BM25 scoring function to weight term frequency based on the relevance of a document type to a work task. Also, we use click frequency for each task-type pair to estimate a realistic weight. Using the W3C collection from the TREC Enterprise track for evaluations, our approach leads to significant improvements on search precision.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.