Mark Goadrich scite author profile

Receiver Operator Characteristic (ROC) curves are commonly used to present results for binary decision problems in machine learning. However, when dealing with highly skewed datasets, Precision-Recall (PR) curves give a more informative picture of an algorithm's performance. We show that a deep connection exists between ROC space and PR space, such that a curve dominates in ROC space if and only if it dominates in PR space. A corollary is the notion of an achievable PR curve, which has properties much like the convex hull in ROC space; we show an efficient algorithm for computing this curve. Finally, we also note that differences between the two types of curves are significant for algorithm design. For example, in PR space it is incorrect to linearly interpolate between points. Furthermore, algorithms that optimize the area under the ROC curve are not guaranteed to optimize the area under the PR curve.
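The point about interpolation in PR space can be illustrated with a small sketch. The abstract does not spell out a procedure, so the following assumes one reasonable approach: interpolate the underlying true-positive and false-positive counts between two operating points, then recompute precision and recall from those counts. The function name `interpolate_pr` and the example counts are illustrative assumptions, not taken from the abstract.

```python
def interpolate_pr(tp_a, fp_a, tp_b, fp_b, total_pos):
    """Return (recall, precision) points from A to B, one true positive at a time.

    Counts are interpolated linearly; precision is then recomputed from the
    counts, so the resulting path in PR space is generally not a straight line.
    """
    if tp_b <= tp_a:
        raise ValueError("point B must have more true positives than point A")
    # False positives gained per additional true positive between A and B.
    local_skew = (fp_b - fp_a) / (tp_b - tp_a)
    points = []
    for x in range(tp_b - tp_a + 1):
        tp = tp_a + x
        fp = fp_a + local_skew * x
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


# Hypothetical dataset with 20 positives; A has TP=5, FP=5 and B has TP=10, FP=30.
points = interpolate_pr(5, 5, 10, 30, total_pos=20)
for recall, precision in points:
    print(f"recall={recall:.2f}  precision={precision:.3f}")
```

In this example the endpoints have precision 0.5 and 0.25, so a straight line would give precision 0.375 halfway between them (recall 0.375). The count-based points fall below that line: precision is already down to 0.375 at recall 0.30, which is why linear interpolation in PR space gives an overly optimistic curve.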

Learning Ensembles of First-Order Clauses for Recall-Precision Curves: A Case Study in Biomedical Information Extraction

Goadrich

Oliphant

Shavlik

2004

Smart smartphone development

Goadrich

Rogers

2011

Gleaner: Creating ensembles of first-order clauses to improve recall-precision curves

2006

Many domains in the field of Inductive Logic Programming (ILP) involve highly unbalanced data. A common way to measure performance in these domains is to use precision and recall instead of simply using accuracy. The goal of our research is to find new approaches within ILP particularly suited for large, highly-skewed domains. We propose Gleaner, a randomized search method that collects good clauses from a broad spectrum of points along the recall dimension in recall-precision curves and employs an "at least L of these K clauses" thresholding method to combine sets of selected clauses. Our research focuses on Multi-Slot Information Extraction (IE), a task that typically involves many more negative examples than positive examples. We formulate this problem into a relational domain, using two large testbeds involving the extraction of important relations from the abstracts of biomedical journal articles. We compare Gleaner to ensembles of standard theories learned by Aleph, finding that Gleaner produces comparable testset results in a fraction of the training time.
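The "at least L of these K clauses" thresholding rule can be sketched in a few lines. This is a minimal illustration of the combination rule only; it does not model how Gleaner searches for or selects clauses, and the clause functions and example sentence are hypothetical stand-ins for first-order clauses.

```python
from typing import Any, Callable, Dict, Sequence

Example = Dict[str, Any]
Clause = Callable[[Example], bool]


def at_least_l_of_k(clauses: Sequence[Clause], example: Example, l: int) -> bool:
    """Classify `example` as positive when at least `l` of the clauses cover it."""
    covered = sum(1 for clause in clauses if clause(example))
    return covered >= l


# Hypothetical clauses over a toy sentence representation (illustration only).
clauses = [
    lambda ex: "protein" in ex["words"],
    lambda ex: "localizes" in ex["words"],
    lambda ex: ex["length"] < 30,
]

sentence = {"words": {"protein", "binds", "membrane"}, "length": 12}

# Sweeping L from 1 to K gives progressively stricter classifiers.
predictions = [at_least_l_of_k(clauses, sentence, l) for l in range(1, len(clauses) + 1)]
print(predictions)
```

Two of the three toy clauses cover the sentence, so it is labeled positive for L = 1 and L = 2 but not for L = 3. Raising L generally trades recall for precision, which is what lets a single set of clauses cover several points along a recall-precision curve.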

Analyzing Two-Color Babylon

Goadrich

Schlatter

2011

An undergraduate laboratory activity on molecular dynamics simulations

Spitznagel

Pritchett

Messina

et al. 2016

Biochem Molecular Bio Educ

Vision and Change [AAAS, 2011] outlines a blueprint for modernizing biology education by addressing conceptual understanding of key concepts, such as the relationship between structure and function. The document also highlights skills necessary for student success in 21st-century biology, such as the use of modeling and simulation. Here we describe a laboratory activity that allows students to investigate the dynamic nature of protein structure and function through the use of a modeling technique known as molecular dynamics (MD). The activity takes place over two lab periods of 3 hr each. The first lab period unpacks the basic approach behind MD simulations, beginning with the kinematic equations that all bioscience students learn in an introductory physics course. During this period students are taught rudimentary programming skills in Python while guided through simple modeling exercises that lead up to the simulation of the motion of a single atom. In the second lab period students extend concepts learned in the first period to develop skills in the use of expert MD software. Here students simulate and analyze changes in protein conformation resulting from temperature change, solvation, and phosphorylation. The article will describe how these activities can be carried out using free software packages, including Abalone and VMD/NAMD.
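As a rough idea of what a single-atom exercise built on the kinematic equations might look like, here is a minimal sketch. It is not the article's actual lab code: it assumes a one-dimensional atom tethered by a harmonic spring, reduced (unitless) quantities, and a velocity Verlet update, and the name `simulate_atom` and all parameter values are hypothetical.

```python
def simulate_atom(x0, v0, mass, k, dt, steps):
    """Simulate a 1D atom on a harmonic spring and return its positions over time."""
    x, v = x0, v0
    a = -k * x / mass  # Newton's second law with a spring force F = -k x
    trajectory = [x]
    for _ in range(steps):
        # Kinematic position update: x = x + v*dt + (1/2)*a*dt^2
        x = x + v * dt + 0.5 * a * dt * dt
        # Recompute the acceleration at the new position.
        a_new = -k * x / mass
        # Velocity update using the average of old and new accelerations.
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
        trajectory.append(x)
    return trajectory


# Assumed reduced units: mass = 1 and k = 1 give an oscillation period of 2*pi.
positions = simulate_atom(x0=1.0, v0=0.0, mass=1.0, k=1.0, dt=0.01, steps=628)
print(f"start={positions[0]:.3f}  after ~one period={positions[-1]:.3f}")
```

After roughly one period (628 steps of 0.01) the atom should return close to its starting position, which gives a quick sanity check on the integrator before moving to a full MD package.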

Combining Clauses with Various Precisions and Recalls to Produce Accurate Probabilistic Estimates

Goadrich

Shavlik

Statistical Relational Learning (SRL) combines the benefits of probabilistic machine learning approaches with complex, structured domains from Inductive Logic Programming (ILP). We propose a new SRL algorithm, GleanerSRL, to generate probabilities for highly-skewed relational domains. In this work, we combine clauses from Gleaner, an ILP algorithm for learning a wide variety of first-order clauses, with the propositional learning technique of support vector machines to learn well-calibrated probabilities. We find that our results are comparable to SRL algorithms SAYU and SAYU-VISTA on a well-known relational testbed.

Civic Engagement Across the Computing Curriculum

Goadrich

Goldweber

Jadud

et al. 2019

As outlined in the ACM Computer Science Curricula 2013 Guidelines section on Social Issues and Professional Practice, "Students must also be exposed to the larger societal context of computing to develop an understanding of the relevant social [and] ethical ... issues." [1] In this panel, we demonstrate diverse approaches used to achieve this goal with respect to civic engagement. Drawing from experiences with non-major, introductory computing, mobile applications, software engineering, and interdisciplinary courses, we discuss how to move beyond surface-level discussions of ethical case studies toward an integration of civic engagement activities and personal reflection into standard computing curriculum.

The relationship between Precision-Recall and ROC curves
