Karen Willbrand scite author profile

Unbiased pattern detection in microarray data series

Here we introduce a new method of detecting pattern in microarray data series which is independent of the nature of this pattern. Our approach provides a measure of the algorithmic compressibility of each data series. A series which is significantly compressible is much more likely to result from simple underlying mechanisms than a series which is incompressible. Accordingly, the gene associated with a compressible series is more likely to be biologically significant. We test our method on microarray time series of the yeast cell cycle and show that it blindly selects genes exhibiting the expected cyclic behaviour as well as detecting other forms of pattern. Our results successfully predict two independent non-microarray experimental studies.

Identifying genes from up-down properties of microarray expression series

Willbrand¹, Radvanyi², Nadal³ et al. 2005, Bioinformatics

A new upscaling method for fractured porous media

Chen, Clauser, Marquart et al. 2015, Advances in Water Resources

1-D random landscapes and non-random data series

Fink, Willbrand², Brown³ 2007, Europhys. Lett.

We study the simplest random landscape, the curve formed by joining consecutive data points $f_1, \ldots, f_{N+1}$ with line segments, where the $f_i$ are i.i.d. random numbers and $f_i \neq f_j$. We label each segment increasing (+) or decreasing (−) and call this string of +'s and −'s the up-down signature $\sigma$. We calculate the probability $P(\sigma(f))$ for a random curve and use it to bound the algorithmic information content of $f$. We show that $f$ can be compressed by $k = \log_2 \frac{1}{P(\sigma)} - N$ bits, where $k$ is a universal currency for comparing the amount of pattern in different curves. By applying our results to microarray time series data, we blindly identify regulatory genes.

Introduction - Identifying trends or pattern in a data series is the traditional basis of hypothesis formation in the physical sciences [1]. Typically, the pattern is incontrovertible and can be encapsulated by a concise mathematical relation between the data and the independent variable. However, many systems exhibiting collective behaviour, such as genetic networks, financial markets and social systems, exhibit weak pattern, that is, the pattern does not look significantly different from a random curve. Moreover, because the dynamics of collective systems are in general not understood (at most a statistical description is possible), it is not clear what kind of pattern to look for.

Random landscapes are central to the disciplines of spin glasses, drainage networks, protein folding, neural networks and combinatorial optimisation [2,3]. Properties of these systems are related to simple questions about their landscapes: How many minima are there? What is the size of their basins of attraction? What is the pattern of rises and falls?

In this Letter we show that there are fruitful underlying connections between the dynamical properties of a 1-D landscape and the presence of pattern in a series of data. Considering a series as a sequence of increases and decreases provides a method of compressing a curve, in the sense that the size of the file needed to store instructions for generating the curve is less than it would be by storing the curve outright. We derive a formal relation between the up-down properties of a curve and the algorithmic information content (AIC) of the equivalent data series, or size of the smallest file needed to store it, which is the ultimate test of pattern. As a demonstration of its efficacy, we use our method to blindly identify regulatory genes from a classic yeast cell cycle microarray data set.

Random data and permutations - We study the simplest form of random landscape, a sequence of $N+1$ identically and independently distributed random numbers. We connect pairs of consecutive data points with line segments to form a curve. If we assume that the probability that two points are identical is negligible, we can label these
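
The up-down signature and the compression k described in the abstract can be illustrated with a short sketch. It is not the authors' code. It assumes the data points are distinct and that P(σ) equals the fraction of the (N+1)! orderings of the points that share the signature σ, which holds when the values are i.i.d. draws from a continuous distribution. The function names (`up_down_signature`, `count_orderings`, `compression_bits`) and the dynamic-programming count are illustrative choices.

```python
from math import factorial, log2


def up_down_signature(series):
    """Label each consecutive step '+' (increase) or '-' (decrease).

    Assumes all values are distinct; a tie would be labelled '-'.
    """
    values = list(series)
    return "".join("+" if b > a else "-" for a, b in zip(values, values[1:]))


def count_orderings(signature):
    """Count the orderings of len(signature) + 1 distinct points with this signature.

    dp[j] holds the number of valid arrangements of the points placed so far
    whose last point has rank j (0-based) among them.
    """
    dp = [1]
    for step in signature:
        m = len(dp)
        new = [0] * (m + 1)
        running = 0
        if step == "+":
            # The new last point must rank above the previous last point.
            for j in range(1, m + 1):
                running += dp[j - 1]
                new[j] = running
        else:
            # The new last point must rank below the previous last point.
            for j in range(m - 1, -1, -1):
                running += dp[j]
                new[j] = running
        dp = new
    return sum(dp)


def compression_bits(series):
    """Return k = log2(1 / P(sigma)) - N for the series' signature."""
    sig = up_down_signature(series)
    n = len(sig)
    # P(sigma) = count_orderings(sig) / (N + 1)!  under the i.i.d. assumption.
    return log2(factorial(n + 1)) - log2(count_orderings(sig)) - n


if __name__ == "__main__":
    for series in ([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]):
        print(up_down_signature(series), compression_bits(series))
```

Under these assumptions a monotonic series, whose signature is produced by a single ordering, receives a larger k than a zigzag series of the same length, whose signature is shared by many orderings.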

Upscaling permeability for three-dimensional fractured porous rocks with the multiple boundary method

et al. 2018
