Brian Wallace scite author profile

Brian Wallace

3 Publications

23 Citation Statements Received

59 Citation Statements Given


Affiliations

Bucknell University

Publications


“Influence sketching”: Finding influential samples in large-scale regressions

Wojnowicz, Cruz, Zhao et al. 2016


There is an especially strong need in modern large-scale data analysis to prioritize samples for manual inspection. For example, the inspection could target important mislabeled samples or key vulnerabilities exploitable by an adversarial attack. In order to solve the "needle in the haystack" problem of which samples to inspect, we develop a new scalable version of Cook's distance, a classical statistical technique for identifying samples which unusually strongly impact the fit of a regression model (and its downstream predictions). In order to scale this technique up to very large and high-dimensional datasets, we introduce a new algorithm which we call "influence sketching." Influence sketching embeds random projections within the influence computation; in particular, the influence score is calculated using the randomly projected pseudo-dataset from the post-convergence Generalized Linear Model (GLM). We validate that influence sketching can reliably and successfully discover influential samples by applying the technique to a malware detection dataset of over 2 million executable files, each represented with almost 100,000 features. For example, we find that randomly deleting approximately 10% of training samples reduces predictive accuracy only slightly from 99.47% to 99.45%, whereas deleting the same number of samples with high influence sketch scores reduces predictive accuracy all the way down to 90.24%. Moreover, we find that influential samples are especially likely to be mislabeled. In the case study, we manually inspect the most influential samples, and find that influence sketching pointed us to new, previously unidentified pieces of malware.
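The idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes ordinary least squares rather than a post-convergence GLM, and the function names `cooks_distance` and `sketched_influence`, the Gaussian projection matrix, and the parameter `k` are all hypothetical choices made for illustration. The first function computes the classical Cook's distance; the second replaces the leverage term with one computed on a randomly projected copy of the design matrix.

```python
import numpy as np


def cooks_distance(X, y):
    """Classical Cook's distance for an ordinary least-squares fit."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Leverage h_i = x_i^T (X^T X)^{-1} x_i, the diagonal of the hat matrix.
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    s2 = resid @ resid / (n - p)  # residual variance estimate
    return resid**2 * h / (p * s2 * (1.0 - h) ** 2)


def sketched_influence(X, y, k, seed=0):
    """Illustrative approximation: leverage is computed on a random
    projection of X down to k columns (an assumption of this sketch,
    not the published procedure)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Gaussian random projection from p features to k features.
    omega = rng.standard_normal((p, k)) / np.sqrt(k)
    Z = X @ omega
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    h = np.einsum("ij,jk,ik->i", Z, ZtZ_inv, Z)
    s2 = resid @ resid / (n - p)
    return resid**2 * h / (p * s2 * (1.0 - h) ** 2)
```

Ranking samples by either score and inspecting the top of the list mirrors the prioritization described above; the projected version only needs to invert a k-by-k matrix instead of a p-by-p one, which is where the savings would come from when p is very large.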


Lagrangian chaos and multiphase processes in vortex flows

Solomon, Wallace, Miller et al. 2003

Communications in Nonlinear Science and Numerical Simulation


We discuss experimental and numerical studies of the effects of Lagrangian chaos (chaotic advection) on the stretching of a drop of an immiscible impurity in a flow. We argue that the standard capillary number used to describe this process is inadequate since it does not account for advection of a drop between regions of the flow with varying velocity gradient. Consequently, we propose a Lagrangian-generalized capillary number C_L based on finite-time Lyapunov exponents. We present preliminary tests of this formalism for the stretching of a single drop of oil in an oscillating vortex flow, which has been shown previously to exhibit Lagrangian chaos. Probability distribution functions (PDFs) of the stretching of this drop have features that are similar to PDFs of C_L. We also discuss ongoing experiments that we have begun on drop stretching in a blinking vortex flow.


SUSPEND: Determining software suspiciousness by non-stationary time series modeling of entropy signals

Wojnowicz, Chisholm, Wallace et al. 2017

Expert Systems with Applications


