Receiver Operating Characteristic (ROC) curves are commonly used to present results for binary decision problems in machine learning. However, when dealing with highly skewed datasets, Precision-Recall (PR) curves give a more informative picture of an algorithm's performance. We show that a deep connection exists between ROC space and PR space, such that a curve dominates in ROC space if and only if it dominates in PR space. A corollary is the notion of an achievable PR curve, which has properties much like the convex hull in ROC space; we present an efficient algorithm for computing this curve. Finally, we note that the differences between the two types of curves are significant for algorithm design. For example, in PR space it is incorrect to linearly interpolate between points. Furthermore, algorithms that optimize the area under the ROC curve are not guaranteed to optimize the area under the PR curve.
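The achievable PR curve and the interpolation issue can be illustrated in a few lines of Python. This is a minimal sketch, not the authors' code. It assumes each classifier point is given as integer (TP, FP) counts with TP ≥ 1 and makes three simplifications. First, it computes the upper convex hull in (FP, TP) space, which matches the ROC convex hull up to a per-axis scaling. Second, it converts the hull points to PR space. Third, it fills each hull segment by adding one true positive at a time together with a proportional share of false positives. All function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (TP count, FP count) for one confusion matrix; assumes TP >= 1


def interpolate_pr(a: Point, b: Point, n_pos: int) -> List[Tuple[float, float]]:
    """(recall, precision) points from a to b, one per additional true positive.

    Each extra TP brings a proportional share of the extra FPs: a straight
    line in ROC space, which is generally a curve in PR space.
    """
    (tp_a, fp_a), (tp_b, fp_b) = a, b
    if tp_a <= 0 or tp_b <= tp_a:
        raise ValueError("need 0 < tp_a < tp_b")
    fp_per_tp = (fp_b - fp_a) / (tp_b - tp_a)
    points = []
    for x in range(tp_b - tp_a + 1):
        tp = tp_a + x
        fp = fp_a + fp_per_tp * x  # may be fractional
        points.append((tp / n_pos, tp / (tp + fp)))
    return points


def _cross(o: Point, a: Point, b: Point) -> float:
    # z-component of (a - o) x (b - o), with FP on the x-axis and TP on the y-axis
    return (a[1] - o[1]) * (b[0] - o[0]) - (a[0] - o[0]) * (b[1] - o[1])


def upper_hull(points: List[Point]) -> List[Point]:
    """Upper convex hull in (FP, TP) space, truncated once TP stops increasing."""
    hull: List[Point] = []
    # Sort by FP ascending; for equal FP, put the larger TP first.
    for p in sorted(set(points), key=lambda q: (q[1], -q[0])):
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    # The hull is concave, so TP rises then falls; drop the dominated tail.
    best = max(range(len(hull)), key=lambda i: hull[i][0])
    return hull[: best + 1]


def achievable_pr_curve(points: List[Point], n_pos: int) -> List[Tuple[float, float]]:
    """Convert the ROC-style hull to PR space and interpolate each segment."""
    hull = upper_hull(points)
    if len(hull) == 1:
        tp, fp = hull[0]
        return [(tp / n_pos, tp / (tp + fp))]
    curve: List[Tuple[float, float]] = []
    for i in range(len(hull) - 1):
        segment = interpolate_pr(hull[i], hull[i + 1], n_pos)
        curve.extend(segment if i == 0 else segment[1:])  # avoid duplicate endpoints
    return curve
```

Consider the points (TP, FP) = (5, 5), (10, 30), and (6, 20) with 20 positives. The point (6, 20) lies under the hull and is dropped. On the segment from (5, 5) to (10, 30), the interpolated point at recall 0.3 has precision 6/16 = 0.375. A straight line between the PR endpoints (0.25, 0.5) and (0.5, 0.25) would instead claim 0.45, so in this example linear interpolation in PR space is overly optimistic.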
Many domains in the field of Inductive Logic Programming (ILP) involve highly unbalanced data. A common way to measure performance in these domains is to use precision and recall instead of simply using accuracy. The goal of our research is to find new approaches within ILP that are particularly suited to large, highly skewed domains. We propose Gleaner, a randomized search method that collects good clauses from a broad spectrum of points along the recall dimension in recall-precision curves and employs an "at least L of these K clauses" thresholding method to combine sets of selected clauses. Our research focuses on Multi-Slot Information Extraction (IE), a task that typically involves many more negative examples than positive examples. We cast this problem as a relational learning task, using two large testbeds involving the extraction of important relations from the abstracts of biomedical journal articles. We compare Gleaner to ensembles of standard theories learned by Aleph, finding that Gleaner produces comparable test-set results in a fraction of the training time.
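The two ideas in this abstract can be sketched in Python. The first is keeping good clauses across the recall dimension. The second is the "at least L of these K clauses" combination. This is a hedged sketch, not Gleaner's implementation. It assumes recall is split into K equal-width bins and that the best clause in each bin is the one with the highest precision, a scoring choice made here for illustration. Clause coverage is assumed to be precomputed as a boolean matrix with at least one positive example. The names `best_clause_per_bin`, `at_least_l_of_k`, and `sweep_l` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Candidate = Tuple[str, float, float]  # hypothetical: (clause name, recall, precision)


def best_clause_per_bin(candidates: Sequence[Candidate], k: int) -> List[Candidate]:
    """Keep the highest-precision clause in each of k equal-width recall bins."""
    best: Dict[int, Candidate] = {}
    for name, recall, precision in candidates:
        b = min(int(recall * k), k - 1)  # recall 1.0 falls into the last bin
        if b not in best or precision > best[b][2]:
            best[b] = (name, recall, precision)
    return [best[b] for b in sorted(best)]


def at_least_l_of_k(matches: Sequence[bool], l: int) -> bool:
    """Predict positive when at least l of the selected clauses match the example."""
    return sum(matches) >= l


def sweep_l(match_matrix: Sequence[Sequence[bool]],
            labels: Sequence[bool]) -> List[Tuple[int, float, float]]:
    """Return (l, recall, precision) for each l that yields at least one positive prediction.

    match_matrix[i][j] is True when selected clause j covers example i.
    Assumes at least one positive label.
    """
    k = len(match_matrix[0])
    n_pos = sum(1 for y in labels if y)
    results = []
    for l in range(1, k + 1):
        tp = fp = 0
        for row, y in zip(match_matrix, labels):
            if at_least_l_of_k(row, l):
                if y:
                    tp += 1
                else:
                    fp += 1
        if tp + fp > 0:
            results.append((l, tp / n_pos, tp / (tp + fp)))
    return results
```

Raising L from 1 to K trades recall for precision. Sweeping it traces out one recall-precision curve from a single set of clauses.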
We examine the impartial combinatorial game Babylon. We abstract the game so that it is amenable to combinatorial analysis, and present a full characterization and winning strategies for the cases with an odd number of tokens and only two colors. We also present partial results for games with an even number of tokens and two colors, an initial extension to three colors, and directions for future work.
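Characterizations of this kind are usually checked against exhaustive search on small positions. The sketch below is a brute-force win/loss solver under stated assumptions, not the paper's abstraction. It assumes the rules commonly given for Babylon. A stack may be placed on another stack when the two have the same height or the same top color, and the player who cannot move loses (normal play). Each stack is modeled as a hypothetical (height, top color) pair, since under these rules only those two attributes affect which moves are legal.

```python
from functools import lru_cache
from typing import Iterator, Sequence, Tuple

Stack = Tuple[int, str]      # hypothetical model: (height, top color)
State = Tuple[Stack, ...]    # canonical form: sorted tuple of stacks


def can_stack(top: Stack, bottom: Stack) -> bool:
    """Assumed rule: equal heights or equal top colors."""
    return top[0] == bottom[0] or top[1] == bottom[1]


def moves(state: State) -> Iterator[State]:
    """Yield every position reachable by placing one stack on another."""
    for i, top in enumerate(state):
        for j, bottom in enumerate(state):
            if i != j and can_stack(top, bottom):
                rest = [s for k, s in enumerate(state) if k not in (i, j)]
                merged = (top[0] + bottom[0], top[1])  # moved stack's color ends up on top
                yield tuple(sorted(rest + [merged]))


@lru_cache(maxsize=None)
def is_win(state: State) -> bool:
    """True if the player to move wins: some move leads to a losing position."""
    return any(not is_win(nxt) for nxt in moves(state))


def first_player_wins(stacks: Sequence[Stack]) -> bool:
    return is_win(tuple(sorted(stacks)))
```

For example, `first_player_wins([(1, "a"), (1, "a"), (1, "b")])` explores every line of play from three single tokens in two colors. Enumerating positions like this for growing token counts is one way to spot or confirm the odd/even patterns the abstract describes.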
Vision and Change [AAAS, 2011] outlines a blueprint for modernizing biology education by addressing conceptual understanding of key concepts, such as the relationship between structure and function. The document also highlights skills necessary for student success in 21st-century biology, such as the use of modeling and simulation. Here we describe a laboratory activity that allows students to investigate the dynamic nature of protein structure and function through a modeling technique known as molecular dynamics (MD). The activity takes place over two 3-hour lab periods. The first lab period unpacks the basic approach behind MD simulations, beginning with the kinematic equations that all bioscience students learn in an introductory physics course. During this period students learn rudimentary programming skills in Python while working through guided modeling exercises that lead up to a simulation of the motion of a single atom. In the second lab period students extend the concepts from the first period and develop skills with expert MD software. Here students simulate and analyze changes in protein conformation resulting from temperature change, solvation, and phosphorylation. The article describes how these activities can be carried out using free software packages, including Abalone and VMD/NAMD.
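The progression from kinematic equations to a simulated atom can be sketched with the velocity Verlet integrator. This is a standard MD time-stepping scheme whose position update is the familiar kinematic formula x + vΔt + ½aΔt². It is a minimal sketch, not the lab's actual exercise. It assumes a single particle moving in one dimension, arbitrary but consistent units, and a user-supplied force function. The function `velocity_verlet` and its parameters are hypothetical.

```python
from typing import Callable, List, Tuple


def velocity_verlet(x0: float, v0: float, mass: float,
                    force: Callable[[float], float],
                    dt: float, n_steps: int) -> List[Tuple[float, float, float]]:
    """Integrate one particle in 1D; returns (time, position, velocity) at each step."""
    x, v = x0, v0
    a = force(x) / mass
    trajectory = [(0.0, x, v)]
    for step in range(1, n_steps + 1):
        # Kinematic position update: x + v*dt + (1/2)*a*dt^2
        x = x + v * dt + 0.5 * a * dt * dt
        # Recompute the force at the new position, then update the velocity
        # using the average of the old and new accelerations.
        a_new = force(x) / mass
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
        trajectory.append((step * dt, x, v))
    return trajectory
```

With a constant force such as `force=lambda x: -9.8` and `mass=1.0`, the scheme reproduces the constant-acceleration kinematic equations up to rounding. Swapping in a spring force such as `lambda x: -k * x` turns the particle into an oscillating "atom" whose total energy stays nearly constant over long runs, which is one reason this integrator is widely used in MD codes.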
Statistical Relational Learning (SRL) combines the benefits of probabilistic machine learning approaches with complex, structured domains from Inductive Logic Programming (ILP). We propose a new SRL algorithm, GleanerSRL, to generate probabilities for highly skewed relational domains. In this work, we combine clauses from Gleaner, an ILP algorithm for learning a wide variety of first-order clauses, with the propositional learning technique of support vector machines to learn well-calibrated probabilities. We find that our results are comparable to those of the SRL algorithms SAYU and SAYU-VISTA on a well-known relational testbed.
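The pipeline described here has three stages: clauses from Gleaner become features, an SVM scores them, and the scores are mapped to probabilities. Two of those steps can be sketched in plain Python. This is a hedged sketch, not GleanerSRL itself, and it rests on three assumptions. First, each example's feature vector counts how many Gleaner clauses match it within each recall bin. Second, SVM training is left to any SVM library. Third, the SVM's decision values become probabilities through Platt scaling, which fits a sigmoid to the scores. Platt scaling is one standard calibration method, and the paper's exact method may differ. Clauses are modeled as hypothetical Python predicates over an example, and all function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Clause = Callable[[dict], bool]  # hypothetical: a learned clause as a predicate over an example


def bin_count_features(example: dict, clause_bins: Sequence[Sequence[Clause]]) -> List[int]:
    """Feature j = number of clauses in recall bin j that cover the example."""
    return [sum(1 for clause in bin_ if clause(example)) for bin_ in clause_bins]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores: Sequence[float], labels: Sequence[bool],
              lr: float = 0.5, epochs: int = 2000) -> Tuple[float, float]:
    """Fit p = sigmoid(a * score + b) by gradient descent on the log loss.

    Uses Platt's smoothed targets (N+ + 1)/(N+ + 2) and 1/(N- + 2) to avoid
    overconfident probabilities on separable scores.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1) / (n_pos + 2)
    t_neg = 1.0 / (n_neg + 2)
    targets = [t_pos if y else t_neg for y in labels]
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, t in zip(scores, targets):
            err = _sigmoid(a * s + b) - t  # d(log loss)/dz for z = a*s + b
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrated_probability(score: float, a: float, b: float) -> float:
    return _sigmoid(a * score + b)
```

In a fuller version, the bin-count vectors would be the SVM's inputs. The Platt parameters would be fit on decision values from held-out data rather than the training set, to keep the probabilities from being overconfident.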
As outlined in the ACM Computer Science Curricula 2013 Guidelines section on Social Issues and Professional Practice, "Students must also be exposed to the larger societal context of computing to develop an understanding of the relevant social [and] ethical ... issues." [1] In this panel, we demonstrate diverse approaches used to achieve this goal with respect to civic engagement. Drawing from experiences with non-major introductory computing, mobile applications, software engineering, and interdisciplinary courses, we discuss how to move beyond surface-level discussions of ethical case studies toward integrating civic engagement activities and personal reflection into the standard computing curriculum.