In this paper, we propose a simple, patch-based machine learning technique for image denoising using the higher order singular value decomposition (HOSVD). The technique groups together similar patches from a noisy image (with similarity defined by a statistically motivated criterion) into a 3D stack, computes the HOSVD coefficients of this stack, manipulates these coefficients by hard thresholding, and inverts the HOSVD transform to produce the final filtered image. Our technique chooses all required parameters in a principled way, relating them to the noise model. We also discuss our motivation for adopting the HOSVD as an appropriate transform for image denoising. We experimentally demonstrate the excellent performance of the technique on grayscale as well as color images. On color images, our method produces state-of-the-art results, outperforming other color image denoising algorithms at moderately high noise levels. A criterion for optimal patch-size selection and noise variance estimation from the residual images (after denoising) is also presented.
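The transform, threshold, and inversion steps described above can be sketched for a single patch stack as follows. This is a minimal illustration with NumPy, not the authors' implementation: the function names are hypothetical, the threshold is left as a caller-supplied value (the abstract only says it is tied to the noise model), and the patch grouping and the aggregation of filtered patches back into an image are omitted.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-`mode` unfolding: bring that axis to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)


def hosvd_hard_threshold(stack, threshold):
    """Filter one stack of similar patches (hypothetical helper).

    `stack` has shape (patch_height, patch_width, num_patches).
    `threshold` is an assumed, caller-chosen cutoff on coefficient magnitude.
    """
    stack = np.asarray(stack, dtype=float)
    # One orthogonal basis per mode: left singular vectors of each unfolding.
    factors = [np.linalg.svd(unfold(stack, m))[0] for m in range(stack.ndim)]

    # Forward HOSVD: project the stack onto the transposed bases.
    core = stack
    for m, U in enumerate(factors):
        core = mode_product(core, U.T, m)

    # Hard thresholding: zero every coefficient below the cutoff.
    core = np.where(np.abs(core) >= threshold, core, 0.0)

    # Inverse HOSVD: map the surviving coefficients back to patch space.
    out = core
    for m, U in enumerate(factors):
        out = mode_product(out, U, m)
    return out
```

Because every basis is orthogonal, a threshold of zero returns the input stack unchanged, and raising the threshold can only reduce the energy of the output.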
Analyses of frequency profiles of markers on disease or drug-response related genes in diverse populations are important for the dissection of common diseases. We report the results of analyses of data on 405 SNPs from 75 such genes and a 5.2 Mb chromosome 22 genomic region in 1871 individuals from 55 diverse endogamous Indian populations. These include 32 large (>10 million individuals) and 23 isolated populations, representing a large fraction of the people of India. We observe high levels of genetic divergence between groups of populations that cluster largely on the basis of ethnicity and language. Indian populations not only overlap with the diversity of HapMap populations, but also contain population groups that are genetically distinct. These data and results are useful for addressing stratification and study design issues in complex traits, especially for heterogeneous populations.
In this paper, we present a novel feature allocation model to describe tumor heterogeneity (TH) using next-generation sequencing (NGS) data. Taking a Bayesian approach, we extend the Indian buffet process (IBP) to define a class of nonparametric models, the categorical IBP (cIBP). A cIBP takes categorical values to denote homozygous or heterozygous genotypes at each SNV. We define a subclone as a vector of these categorical values, each corresponding to an SNV. Instead of partitioning somatic mutations into non-overlapping clusters with similar cellular prevalences, we take a different approach using feature allocation. Importantly, we do not assume that somatic mutations with similar cellular prevalence must be from the same subclone, and we allow overlapping mutations shared across subclones. We argue that this is closer to the underlying theory of phylogenetic clonal expansion, as somatic mutations that occurred in parent subclones should be shared across the parent and child subclones. Bayesian inference yields posterior probabilities of the number, genotypes, and proportions of subclones in a tumor sample, thereby providing point estimates as well as the variability of the estimates for each subclone. We report results on both simulated and real data. BayClone is available at
A hybrid censoring scheme is a mixture of Type-I and Type-II censoring schemes. This article presents the statistical inferences on Weibull parameters when the data are Type-II hybrid censored. The maximum likelihood estimators, and the approximate maximum likelihood estimators are developed for estimating the unknown parameters. Asymptotic distributions of the maximum likelihood estimators are used to construct approximate confidence intervals. Bayes estimates, and the corresponding highest posterior density credible intervals of the unknown parameters, are obtained using suitable priors on the unknown parameters, and by using Markov Chain Monte Carlo techniques. The method of obtaining the optimum censoring scheme based on the maximum information measure is also developed. We perform Monte Carlo simulations to compare the performances of the different methods, and we analyse one data set for illustrative purposes.
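To make the censoring scheme concrete, the sketch below applies Type-II hybrid censoring to a sample of lifetimes. It assumes the usual definition, in which the test stops at the later of the r-th failure and a fixed time T; the abstract does not spell out the rule, so treat that as an assumption. The function name and the Weibull shape and scale values are illustrative only, and no estimation step is shown.

```python
import numpy as np


def type2_hybrid_censor(lifetimes, r, T):
    """Apply Type-II hybrid censoring (hypothetical helper).

    Assumed rule: stop at max(r-th ordered failure time, T), so at least
    r failures are always observed.
    Returns (observed failure times, stopping time, number censored).
    """
    x = np.sort(np.asarray(lifetimes, dtype=float))
    if not 1 <= r <= x.size:
        raise ValueError("r must be between 1 and the sample size")
    stop = max(x[r - 1], T)      # later of the r-th failure and time T
    observed = x[x <= stop]      # failures seen before the test stops
    return observed, stop, x.size - observed.size


# Illustrative Weibull sample: shape 1.5, scale 2.0 (arbitrary values).
rng = np.random.default_rng(1)
sample = 2.0 * rng.weibull(1.5, size=20)
observed, stop, n_censored = type2_hybrid_censor(sample, r=10, T=1.0)
```

Replacing `max` with `min` would give the Type-I hybrid scheme, which bounds the test duration by T but may observe fewer than r failures.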
Several recent models have proposed the use of precise timing of spikes for cortical computation. Such models rely on growing experimental evidence that neurons in the thalamus as well as many primary sensory cortical areas respond to stimuli with remarkable temporal precision. Models of computation based on spike timing, where the output of the network is a function not only of the input but also of an independently initializable internal state of the network, must, however, satisfy a critical constraint: the dynamics of the network should not be sensitive to initial conditions. We have previously developed an abstract dynamical system for networks of spiking neurons that has allowed us to identify the criterion for the stationary dynamics of a network to be sensitive to initial conditions. Guided by this criterion, we analyzed the dynamics of several recurrent cortical architectures, including one from the orientation selectivity literature. Based on the results, we conclude that under conditions of sustained, Poisson-like, weakly correlated, low to moderate levels of internal activity as found in the cortex, it is unlikely that recurrent cortical networks can robustly generate precise spike trajectories, that is, spatiotemporal patterns of spikes precise to the millisecond timescale.
We present a new method for compact representation of large image datasets. Our method is based on treating small patches from a 2-D image as matrices as opposed to the conventional vectorial representation, and encoding these patches as sparse projections onto a set of exemplar orthonormal bases, which are learned a priori from a training set. The end result is a low-error, highly compact image/patch representation that has significant theoretical merits and compares favorably with existing techniques (including JPEG) on experiments involving the compression of ORL and Yale face databases, as well as a database of miscellaneous natural images. In the context of learning multiple orthonormal bases, we show the easy tunability of our method to efficiently represent patches of different complexities. Furthermore, we show that our method is extensible in a theoretically sound manner to higher-order matrices ("tensors"). We demonstrate applications of this theory to compression of well-known color image datasets such as the GaTech and CMU-PIE face databases and show performance competitive with JPEG. Lastly, we also analyze the effect of image noise on the performance of our compression schemes.
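The matrix-based encoding described above can be illustrated with a short NumPy sketch. It assumes each exemplar is a pair of orthonormal matrices (U, V), projects a patch P to coefficients S = UᵀPV, keeps the k largest-magnitude coefficients, and picks the pair with the smallest reconstruction error. The function names are hypothetical, and the bases in the usage lines are random stand-ins, whereas the method in the abstract learns them from a training set.

```python
import numpy as np


def sparse_project(patch, U, V, k):
    """Project `patch` onto the basis pair (U, V); keep the k largest coefficients."""
    S = U.T @ patch @ V
    keep = np.argsort(np.abs(S).ravel())[::-1][:k]   # indices of the k largest magnitudes
    mask = np.zeros(S.size, dtype=bool)
    mask[keep] = True
    return np.where(mask.reshape(S.shape), S, 0.0)


def reconstruct(S, U, V):
    """Invert the projection: map coefficients back to a patch."""
    return U @ S @ V.T


def encode(patch, bases, k):
    """Try every (U, V) pair; return (pair index, sparse coefficients, error)."""
    best = None
    for idx, (U, V) in enumerate(bases):
        S = sparse_project(patch, U, V, k)
        err = np.linalg.norm(patch - reconstruct(S, U, V))
        if best is None or err < best[2]:
            best = (idx, S, err)
    return best


# Stand-in bases: the identity pair and one random orthonormal pair (via QR).
rng = np.random.default_rng(0)
U = np.linalg.qr(rng.standard_normal((4, 4)))[0]
V = np.linalg.qr(rng.standard_normal((4, 4)))[0]
bases = [(np.eye(4), np.eye(4)), (U, V)]

# A patch that is exactly 2-sparse in the (U, V) pair.
S_true = np.zeros((4, 4))
S_true[0, 0], S_true[1, 2] = 3.0, -2.0
patch = reconstruct(S_true, U, V)
index, S_hat, error = encode(patch, bases, k=2)
```

Storing only the pair index and the k retained coefficients (with their positions) is what makes the representation compact; larger k trades size for lower error.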