We consider the problem of question-focused sentence retrieval from complex news articles describing multi-event stories published over time. Annotators generated a list of questions central to understanding each story in our corpus. Because the stories are dynamic, many questions are time-sensitive (e.g., "How many victims have been found?"). Judges then identified the sentences that answer each question. To address the sentence retrieval problem, we apply a stochastic, graph-based method for comparing the relative importance of textual units, which was previously used successfully for generic summarization. Here, we present a topic-sensitive version of our method and hypothesize that it can outperform a competitive baseline that compares the similarity of each sentence to the input question via IDF-weighted word overlap. In our experiments, the method achieves a TRDR score that is significantly higher than that of the baseline.
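The abstract does not give the exact formulation, so the following is only a minimal sketch of how a topic-sensitive, graph-based sentence ranker of this kind might look, assuming a LexRank-style biased random walk. Each sentence's score mixes a jump toward sentences that overlap the question (weight `bias`) with a walk over an inter-sentence cosine-similarity graph (weight `1 - bias`). Several details are assumptions rather than the authors' settings: IDF is computed over the candidate sentences themselves, the question-relevance term is the IDF-weighted word overlap (the same idea as the baseline), `bias` defaults to 0.8, and TRDR is taken as the sum of reciprocal ranks of all correct sentences. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(docs: list[list[str]]) -> dict[str, float]:
    """Assumption: IDF over the candidate sentences; the +1/+0.5 smoothing keeps every weight positive."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    return {w: math.log((n + 1) / (c + 0.5)) for w, c in df.items()}


def overlap_score(sentence: list[str], question: list[str], idf: dict[str, float]) -> float:
    """Baseline-style relevance: sum of IDF weights of words shared by sentence and question."""
    return sum(idf.get(w, 0.0) for w in set(sentence) & set(question))


def cosine(a: list[str], b: list[str], idf: dict[str, float]) -> float:
    """Cosine similarity between IDF-weighted term-frequency vectors."""
    va = {w: c * idf.get(w, 0.0) for w, c in Counter(a).items()}
    vb = {w: c * idf.get(w, 0.0) for w, c in Counter(b).items()}
    dot = sum(v * vb.get(w, 0.0) for w, v in va.items())
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def topic_sensitive_rank(sentences: list[str], question: str, bias: float = 0.8,
                         iters: int = 200, tol: float = 1e-10) -> list[float]:
    """Stationary scores of a random walk biased toward question-relevant sentences."""
    docs = [tokenize(s) for s in sentences]
    q = tokenize(question)
    idf = idf_table(docs)
    n = len(docs)
    rel = [overlap_score(d, q, idf) for d in docs]
    total_rel = sum(rel)
    # Fall back to a uniform jump distribution if no sentence overlaps the question.
    jump = [r / total_rel for r in rel] if total_rel > 0 else [1.0 / n] * n
    # Similarity graph (self-loops included, so non-empty sentences have non-zero columns).
    sim = [[cosine(docs[i], docs[j], idf) for j in range(n)] for i in range(n)]
    col_sums = [sum(sim[i][j] for i in range(n)) for j in range(n)]
    p = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            # Mass flowing into sentence i from every sentence j, column-normalized.
            walk = sum(sim[i][j] / col_sums[j] * p[j] for j in range(n) if col_sums[j] > 0)
            new.append(bias * jump[i] + (1 - bias) * walk)
        total = sum(new)
        new = [x / total for x in new]  # renormalize in case of empty sentences
        converged = max(abs(x - y) for x, y in zip(new, p)) < tol
        p = new
        if converged:
            break
    return p


def trdr(ranked_ids: list[int], answer_ids: set[int]) -> float:
    """Total reciprocal document rank: sum of 1/rank over correct sentences in the ranking."""
    return sum(1.0 / (r + 1) for r, sid in enumerate(ranked_ids) if sid in answer_ids)


# Toy usage example (invented sentences, not data from the study).
sentences = [
    "Rescue teams found 12 victims near the river on Monday.",
    "The storm damaged hundreds of homes.",
    "By Tuesday, officials said 20 victims had been found.",
    "Schools remained closed across the region.",
]
question = "How many victims have been found?"
scores = topic_sensitive_rank(sentences, question)
ranking = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
print(ranking, trdr(ranking, {0, 2}))
```

Because the jump term carries weight `bias`, sentences that share no words with the question can only collect the walk's share of probability mass. The similarity graph mainly redistributes mass among sentences that resemble the relevant ones, which is what lets the method surface answers beyond exact word overlap.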
As Twitter becomes a more common means for officials to communicate with their constituents, it becomes more important that we understand how officials use these communication tools. Using data on the Twitter activity of 380 members of Congress during the winter of 2012, we find that officials frequently use Twitter to advertise their political positions and to provide information, but rarely to request political action from their constituents or to recognize the good work of others. We highlight a number of differences in communication frequency between men and women, between Senators and Representatives, and between Republicans and Democrats. We provide groundwork for future research examining the behavior of public officials online and testing the predictive power of officials' social media behavior.
Extractive summaries produced from multiple source documents suffer from an array of problems with respect to text cohesion. In this preliminary study, we seek to understand which problems occur in such summaries and how often. We present an analysis of a small corpus of manually revised summaries and discuss the feasibility of making such repairs automatically. Additionally, we present a taxonomy of the problems that occur in the corpus, as well as the revision operators that, when applied to the summaries, can address them. This study represents a first step toward identifying and automating revision operators that could work with current summarization systems to repair cohesion problems in multi-document summaries.
There is much concern about the algorithms that underlie information services and the view of the world they present. We develop a novel method for examining the content and strength of gender stereotypes in image search, inspired by the trait adjective checklist method. We compare the gender distribution in photos retrieved by Bing for the query "person" and for queries based on 68 character traits (e.g., "intelligent person") in four regional markets. For "person," photos of men are retrieved more often than photos of women. As predicted, photos of women are more often retrieved for warm traits (e.g., "emotional"), whereas agentic traits (e.g., "rational") are represented by photos of men. A backlash effect, in which stereotype-incongruent individuals are penalized, is observed. However, backlash is more prevalent for "competent women" than for "warm men." The results underline the need to understand how and why biases enter search algorithms and at which stages of the engineering process.
There is growing evidence that search engines produce results that are socially biased, reinforcing a view of the world that aligns with prevalent social stereotypes. One means to promote greater transparency of search algorithms, which are typically complex and proprietary, is to raise user awareness of biased result sets. However, to date, little is known about how users perceive bias in search results, and the degree to which their perceptions differ and/or might be predicted based on user attributes. One particular area of search that has recently gained attention, and forms the focus of this study, is image retrieval and gender bias. We conduct a controlled experiment via crowdsourcing, using participants recruited from three countries, to measure the extent to which workers perceive a given image result set to be subjective or objective. Demographic information about the workers, along with measures of sexism, is gathered and analysed to investigate whether (gender) biases in the image search results can be detected. Amongst other findings, the results confirm that sexist people are less likely to detect and report gender biases in image search results.