We examine the issue of bias in datasets designed to train visual question answering (VQA) algorithms. These datasets include a collection of natural language questions about images (i.e., visual questions). We consider three popular datasets whose visual questions were captured by sighted people, captured by people who are blind, or generated by computers. We first demonstrate that machine learning algorithms can be trained to recognize each dataset's bias, and thus determine the source of a novel visual question. We then discuss potential risks and benefits of biased VQA datasets and of machine learning algorithms that can identify the source of a visual question; e.g., whether it comes from a sighted person, a person who is blind, or a bot (i.e., a computer). Our ultimate aim is to inspire the development of more inclusive VQA systems.
The purpose of the SIGIR 2019 workshop on Fairness, Accountability, Confidentiality, Transparency, and Safety (FACTS-IR) was to explore challenges in responsible information retrieval system development and deployment. To this end, the workshop aimed to crowd-source from the larger SIGIR community an actionable research agenda on five key dimensions of responsible information retrieval: fairness, accountability, confidentiality, transparency, and safety. Such an agenda can guide others in the community who are interested in pursuing FACTS-IR research, as well as inform potential funders about relevant research avenues. The workshop brought together a diverse set of researchers and practitioners interested in contributing to the development of a technical research agenda for responsible information retrieval.
We present PROTOTEX, a novel white-box NLP classification architecture based on prototype networks (Li et al., 2018). PROTOTEX faithfully explains model decisions based on prototype tensors that encode latent clusters of training examples. At inference time, classification decisions are based on the distances between the input text and the prototype tensors, and are explained via the training examples most similar to the most influential prototypes. We also describe a novel interleaved training algorithm that effectively handles classes characterized by the absence of indicative features. On a propaganda detection task, PROTOTEX's accuracy matches that of BART-large and exceeds that of BERT-large, with the added benefit of providing faithful explanations. A user study also shows that prototype-based explanations help non-experts to better recognize propaganda in online news.
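The inference step described above, classifying by distance to prototype tensors and surfacing the most influential prototype, can be sketched as follows. This is a hypothetical minimal illustration, not the PROTOTEX implementation: it assumes the input text has already been encoded into a fixed-size vector, and the function name `prototype_classify` and the per-class max-similarity scoring are illustrative choices.

```python
import numpy as np

def prototype_classify(x, prototypes, proto_labels):
    """Classify x by its distances to prototype vectors.

    x:            (d,) encoded input text
    prototypes:   (p, d) learned prototype vectors
    proto_labels: (p,) class index of each prototype
    Returns the predicted class and the index of the most
    influential (nearest) prototype, which can then be explained
    via its most similar training examples.
    """
    dists = np.linalg.norm(prototypes - x, axis=1)  # distance to each prototype
    sims = -dists                                   # closer prototype => higher score
    n_classes = int(proto_labels.max()) + 1
    # Score each class by its best-matching prototype.
    scores = np.array([sims[proto_labels == c].max() for c in range(n_classes)])
    pred = int(scores.argmax())
    influential = int(dists.argmin())
    return pred, influential
```

In this sketch, the explanation attaches to `influential`: retrieving the training examples nearest to that prototype yields the kind of example-based rationale the abstract describes.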
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.