Rico Angell scite author profile

Themis: automatically testing software for discrimination

Bias in decisions made by modern software is becoming a common and serious problem. We present Themis, an automated test suite generator to measure two types of discrimination, including causal relationships between sensitive inputs and program behavior. We explain how Themis can measure discrimination and aid its debugging, describe a set of optimizations Themis uses to reduce test suite size, and demonstrate Themis' effectiveness on open-source software. Themis is open-source, and all our evaluation data are available at http://fairness.cs.umass.edu/. See a video of Themis in action: https://youtu.be/brB8wkaUesY

CCS Concepts: Software and its engineering → Software testing and debugging
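
One of the two measures mentioned above is causal discrimination. The following is a minimal sketch, assuming the measure is the fraction of inputs whose output changes when only the sensitive attribute is varied while every other attribute is held fixed, and that it is estimated by plain uniform random sampling. The function name `causal_discrimination`, the dict-based input encoding, and the fixed sample count are illustrative assumptions. The sketch does not reproduce Themis's own test generation or its test-suite-size optimizations.

```python
import random
from typing import Any, Callable, Dict, Sequence

Input = Dict[str, Any]  # one test input: attribute name -> value


def causal_discrimination(
    program: Callable[[Input], Any],
    domains: Dict[str, Sequence[Any]],
    sensitive: str,
    num_samples: int = 1000,
    seed: int = 0,
) -> float:
    """Estimate the share of inputs whose output changes when only `sensitive` changes.

    Hypothetical helper: uses a fixed number of uniform random samples rather
    than an adaptive stopping rule. Program outputs must be hashable.
    """
    rng = random.Random(seed)
    discriminated = 0
    for _ in range(num_samples):
        # Draw one random input from the attribute domains.
        x = {attr: rng.choice(list(values)) for attr, values in domains.items()}
        # Re-run the program once per sensitive value, keeping all else fixed.
        outputs = set()
        for value in domains[sensitive]:
            variant = dict(x)
            variant[sensitive] = value
            outputs.add(program(variant))
        if len(outputs) > 1:
            discriminated += 1
    return discriminated / num_samples


if __name__ == "__main__":
    domains = {"gender": ["male", "female"], "income": ["low", "medium", "high"]}

    # Toy loan program: gender matters only for medium-income applicants.
    def approve(x: Input) -> bool:
        return x["income"] == "high" or (x["income"] == "medium" and x["gender"] == "male")

    print(causal_discrimination(approve, domains, "gender", num_samples=5000))
```

Because gender changes the toy program's decision only for medium-income inputs, the printed estimate should land near 1/3.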

Clustering-based Inference for Biomedical Entity Linking

Angell, Monath, Mohan et al. (2021)

Due to the large number of entities in biomedical knowledge bases, only a small fraction of entities have corresponding labelled training data. This necessitates entity linking models that can link mentions of unseen entities using learned representations of entities. Previous approaches link each mention independently, ignoring the relationships between entity mentions within and across documents. These relations can be very useful for linking mentions in biomedical text, where linking decisions are often difficult because mentions have a generic or a highly specialized form. In this paper, we introduce a model in which linking decisions can be made not merely by linking to a knowledge base entity but also by grouping multiple mentions together via clustering and jointly making linking predictions. In experiments, we improve the state-of-the-art entity linking accuracy on two biomedical entity linking datasets, including the largest publicly available one.
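
To illustrate the idea of linking by grouping mentions, here is a hedged sketch that is not the paper's inference procedure. It assumes a hypothetical table of pairwise affinities (higher means more similar) between entities and mentions and between mentions. It merges nodes greedily in order of decreasing affinity (Kruskal-style single linkage) while forbidding any merge that would put two entities in one cluster. Each mention is then linked to the entity in its cluster, or to `None` if its cluster contains no entity. The names `cluster_and_link` and `scores` are illustrative.

```python
from typing import Dict, List, Optional, Tuple


def cluster_and_link(
    entities: List[str],
    mentions: List[str],
    scores: Dict[Tuple[str, str], float],
) -> Dict[str, Optional[str]]:
    """Greedy single-linkage clustering that never puts two entities in one cluster.

    `scores` maps hypothetical (node, node) pairs to affinities; higher is more similar.
    Returns each mention's linked entity, or None if its cluster has no entity.
    """
    parent = {node: node for node in [*entities, *mentions]}
    entity_in = {e: e for e in entities}  # cluster root -> the entity it contains

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Consider pairs from most to least similar.
    for (a, b), _score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        ea, eb = entity_in.get(ra), entity_in.get(rb)
        if ea is not None and eb is not None:
            continue  # merging would place two entities in one cluster
        parent[rb] = ra
        entity_in.pop(rb, None)
        entity_in[ra] = ea if ea is not None else eb
    return {m: entity_in.get(find(m)) for m in mentions}


if __name__ == "__main__":
    scores = {
        ("E1", "m1"): 0.9,  # m1 closely matches entity E1
        ("m1", "m2"): 0.8,  # m2 looks like m1 (e.g. an abbreviation)
        ("E2", "m3"): 0.9,
        ("m2", "m3"): 0.5,
        ("E2", "m2"): 0.3,  # m2's only direct entity score favours E2
    }
    print(cluster_and_link(["E1", "E2"], ["m1", "m2", "m3"], scores))
    # prints {'m1': 'E1', 'm2': 'E1', 'm3': 'E2'}
```

With these made-up scores, linking m2 independently would choose E2, its only scored entity. Grouped with m1, it inherits E1 instead. This is the kind of joint decision the abstract describes.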

A propositional logic with subjunctive conditionals

Angell (1962), J. Symb. Log.

In this paper a formalized logic of propositions, PA1, is presented. It is proven consistent and its relationships to traditional logic, to PM ([15]), to subjunctive (including contrary-to-fact) implication and to the “paradoxes” of material and strict implication are developed. Apart from any intrinsic merit it possesses, its chief significance lies in demonstrating the feasibility of a general logic containing the principle of subjunctive contrariety, i.e., the principle that ‘If p were true then q would be true’ and ‘If p were true then q would be false’ are incompatible.

Don’t Be Greedy: Leveraging Community Structure to Find High Quality Seed Sets for Influence Maximization

Angell, Schoenebeck (2017)

We consider the problem of maximizing the spread of influence in a social network by choosing a fixed number of initial seeds, a central problem in the study of network cascades. The majority of existing work on this problem, formally referred to as the influence maximization problem, is designed for submodular cascades. Despite the empirical evidence that many cascades are non-submodular, little work has focused on non-submodular influence maximization. We propose a new heuristic for solving the influence maximization problem and show via simulations on real-world and synthetic networks that our algorithm outputs more influential seed sets than the state-of-the-art greedy algorithm in many natural cases, with average improvements of 7% for submodular cascades and 55% for non-submodular cascades. Our heuristic uses a dynamic programming approach on a hierarchical decomposition of the social network to leverage the relation between the spread of cascades and the community structure of social networks. We verify the importance of network structure by showing that the quality of the hierarchical decomposition impacts the quality of the seed set output by our algorithm. We also present "worst-case" theoretical results proving that in certain settings our algorithm outputs seed sets that are a factor of Θ(√n) more influential than those of the greedy algorithm, where n is the number of nodes in the network. Finally, we generalize our algorithm to a message-passing version that can be used to find seed sets that have at least as much influence as those found by the dynamic programming algorithms.
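
For context, the greedy baseline that the abstract compares against can be sketched as follows. This sketch assumes the independent cascade model (a standard submodular cascade, chosen here for illustration) with a uniform activation probability `p` and Monte Carlo estimates of expected spread. The function names and parameters are hypothetical, and the paper's dynamic program over a hierarchical decomposition is not shown.

```python
import random
from typing import Dict, List

Graph = Dict[int, List[int]]  # directed adjacency list; every node must be a key


def simulate_cascade(graph: Graph, seeds: List[int], p: float, rng: random.Random) -> int:
    """Run one independent-cascade simulation and return the number of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph[u]:
                # Each newly active node gets one chance to activate each out-neighbour.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(graph: Graph, seeds: List[int], p: float, trials: int, seed: int) -> float:
    """Monte Carlo estimate of the expected cascade size."""
    rng = random.Random(seed)  # same seed per call: common random numbers across candidates
    return sum(simulate_cascade(graph, seeds, p, rng) for _ in range(trials)) / trials


def greedy_seeds(graph: Graph, k: int, p: float, trials: int = 200, seed: int = 0) -> List[int]:
    """Pick k seeds, each time adding the node whose addition maximizes estimated spread."""
    seeds: List[int] = []
    for _ in range(k):
        best_node, best_spread = None, float("-inf")
        for v in sorted(graph):
            if v in seeds:
                continue
            spread = estimate_spread(graph, seeds + [v], p, trials, seed)
            if spread > best_spread:  # strict: ties keep the smaller node id
                best_node, best_spread = v, spread
        seeds.append(best_node)
    return seeds


if __name__ == "__main__":
    # Two directed stars: hub 0 reaches 3 leaves, hub 4 reaches 2 leaves.
    stars = {0: [1, 2, 3], 1: [], 2: [], 3: [], 4: [5, 6], 5: [], 6: []}
    print(greedy_seeds(stars, 2, p=1.0))  # prints [0, 4]
```

Because the seed set grows one node at a time, greedy never reconsiders earlier picks. Under non-submodular cascades, this myopia is the weakness the abstract's community-based heuristic targets.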

Entity Linking and Discovery via Arborescence-based Supervised Clustering

Agarwal, Angell, Monath et al. (2021)

Preprint

Previous work has shown promising results in performing entity linking by measuring not only the affinities between mentions and entities but also those amongst mentions. In this paper, we present novel training and inference procedures that fully utilize mention-to-mention affinities by building minimum arborescences (i.e., directed spanning trees) over mentions and entities across documents in order to make linking decisions. We also show that this method gracefully extends to entity discovery, enabling the clustering of mentions that do not have an associated entity in the knowledge base. We evaluate our approach on the Zero-Shot Entity Linking dataset and MedMentions, the largest publicly available biomedical dataset, and show significant improvements in performance for both entity linking and discovery compared to identically parameterized models. We further show significant efficiency improvements with only a small loss in accuracy over previous work, which uses more computationally expensive models.
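
The minimum arborescence mentioned above can be computed with the textbook Chu-Liu/Edmonds algorithm. The following is a hedged sketch, not the authors' training or inference code. `min_arborescence` implements Chu-Liu/Edmonds by recursive cycle contraction. `link_by_arborescence` is a hypothetical decoder that assumes a virtual root with a zero-weight edge to every entity, edge weight equal to negated affinity, and each mention linked to the first entity above it in the tree. The affinity inputs are made up, and the paper's training objective and entity-discovery step are not reproduced.

```python
from typing import Any, Dict, List, Optional, Set, Tuple

Node = Any  # node ids must be hashable; contracted cycles become frozensets
Edge = Tuple[Node, Node]


def _find_cycle(parent: Dict[Node, Node]) -> Optional[List[Node]]:
    """Return one cycle in the child -> chosen-parent map, or None if there is none."""
    for start in parent:
        path: List[Node] = []
        v = start
        while v in parent and v not in path:
            path.append(v)
            v = parent[v]
        if v in path:
            return path[path.index(v):]
    return None


def min_arborescence(nodes: List[Node], edges: Dict[Edge, float], root: Node) -> Set[Edge]:
    """Chu-Liu/Edmonds: minimum-weight spanning arborescence rooted at `root`."""
    # Every non-root node picks its cheapest incoming edge.
    best: Dict[Node, Node] = {}
    for (u, v), w in edges.items():
        if v != root and u != v and (v not in best or w < edges[(best[v], v)]):
            best[v] = u
    missing = [v for v in nodes if v != root and v not in best]
    if missing:
        raise ValueError(f"no incoming edge for {missing}")
    cycle = _find_cycle(best)
    if cycle is None:
        return {(u, v) for v, u in best.items()}

    # Contract the cycle into a super-node; edges entering it are reweighted
    # by the cost of the cycle edge they would replace.
    in_cycle = set(cycle)
    c = frozenset(in_cycle)
    new_edges: Dict[Edge, float] = {}
    origin: Dict[Edge, Edge] = {}  # contracted edge -> edge it stands for
    for (u, v), w in edges.items():
        if u in in_cycle and v in in_cycle:
            continue
        if v in in_cycle:
            key = (u, c)
            w = w - edges[(best[v], v)]
        elif u in in_cycle:
            key = (c, v)
        else:
            key = (u, v)
        if key not in new_edges or w < new_edges[key]:
            new_edges[key] = w
            origin[key] = (u, v)

    # Solve the smaller instance, then expand the super-node.
    tree = min_arborescence([n for n in nodes if n not in in_cycle] + [c], new_edges, root)
    result: Set[Edge] = set()
    entered_at = None
    for key in tree:
        u, v = origin[key]
        result.add((u, v))
        if key[1] == c:
            entered_at = v  # this cycle node takes the external edge instead
    result.update((best[v], v) for v in cycle if v != entered_at)
    return result


def link_by_arborescence(
    entities: List[str],
    mentions: List[str],
    entity_mention: Dict[Tuple[str, str], float],
    mention_mention: Dict[Tuple[str, str], float],
    root: str = "ROOT",
) -> Dict[str, str]:
    """Hypothetical decoder: link each mention to the entity heading its subtree."""
    edges: Dict[Edge, float] = {(root, e): 0.0 for e in entities}
    for (e, m), score in entity_mention.items():
        edges[(e, m)] = -score  # minimizing weight = maximizing affinity
    for (a, b), score in mention_mention.items():
        edges[(a, b)] = edges[(b, a)] = -score  # treat mention affinity as symmetric
    tree = min_arborescence([root, *entities, *mentions], edges, root)
    parent = {v: u for u, v in tree}
    entity_set = set(entities)
    links = {}
    for m in mentions:
        node = m
        while node not in entity_set:  # walk up until the first entity
            node = parent[node]
        links[m] = node
    return links
```

For example, take entity scores {("E1", "m1"): 0.5, ("E1", "m2"): 0.4, ("E2", "m2"): 0.45} and a mention score {("m1", "m2"): 0.95}. The two mentions first pick each other as parents, forming a cycle. Contraction resolves the cycle by entering at m1 from E1, so both mentions link to E1, even though m2's best direct entity score is for E2.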

Entity Linking via Explicit Mention-Mention Coreference Modeling

Agarwal, Angell, Monath et al. (2022)

Learning representations of entity mentions is a core component of modern entity linking systems for both candidate generation and making linking predictions. In this paper, we present and empirically analyze a novel training approach for learning mention and entity representations that is based on building minimum spanning arborescences (i.e., directed spanning trees) over mentions and entities across documents to explicitly model mention coreference relationships. We demonstrate the efficacy of our approach by showing significant improvements in both candidate generation recall and linking accuracy on the Zero-Shot Entity Linking dataset and MedMentions, the largest publicly available biomedical dataset. In addition, we show that our improvements in candidate generation yield higher-quality re-ranking models downstream, setting a new state-of-the-art result in linking accuracy on MedMentions. Finally, we demonstrate that our improved mention representations are also effective for the discovery of new entities via cross-document coreference.

Clustering-based Inference for Biomedical Entity Linking

Angell, Monath, Mohan et al. (2020)

Preprint

Due to the large number of entities in biomedical knowledge bases, only a small fraction of entities have corresponding labelled training data. This necessitates a zero-shot entity linking model that can link mentions of unseen entities using learned representations of entities. Existing zero-shot entity linking models, however, link each mention independently, ignoring the inter- and intra-document relationships between the entity mentions. These relations can be very useful for linking mentions in biomedical text, where linking decisions are often difficult because mentions have a generic or a highly specialized form. In this paper, we introduce a model in which linking decisions can be made not merely by linking to a KB entity but also by grouping multiple mentions together via clustering and jointly making linking predictions. In experiments on the largest publicly available biomedical dataset, we improve the best independent prediction for zero-shot entity linking by 2.5 points of accuracy, and our joint inference model further improves entity linking by 1.8 points.

Low resource recognition and linking of biomedical concepts from a large ontology

Mohan, Angell, Monath et al. (2021)

Tools to explore scientific literature are essential for scientists, especially in biomedicine, where about a million new papers are published every year. Many such tools let users search for specific entities (e.g., proteins, diseases) by tracking their mentions in papers. PubMed, the best-known database of biomedical papers, relies on human curators to add these annotations. This can take several weeks for new papers, and not all papers get tagged. Machine learning models have been developed to facilitate the semantic indexing of scientific papers. However, their performance on the more comprehensive ontologies of biomedical concepts does not reach the levels seen in the entity recognition problems typically studied in NLP. In large part, this is because these are low-resource settings: the ontologies are large, most entities lack descriptive text defining them, and labeled data covers only a small portion of the ontology. In this paper, we develop a new model that overcomes these challenges by (1) generalizing to entities unseen at training time, and (2) incorporating linking predictions into the mention segmentation decisions. Our approach achieves new state-of-the-art results for the UMLS ontology in both traditional recognition/linking (+8 F1 pts) and semantic indexing-based evaluation (+10 F1 pts).
