Sayan Mukherjee scite author profile

Three-dimensional geometric morphometric (3DGM) methods for placing landmarks on digitized bones have become increasingly sophisticated in the last 20 years, including greater degrees of automation. One aspect shared by all 3DGM methods is that the researcher must designate initial landmarks. Thus, researcher interpretations of homology and correspondence are required for and influence representations of shape. We present an algorithm allowing fully automatic placement of correspondence points on samples of 3D digital models representing bones of different individuals/species, which can then be input into standard 3DGM software and analyzed with dimension reduction techniques. We test this algorithm against several samples, primarily a dataset of 106 primate calcanei represented by 1,024 correspondence points per bone. Results of our automated analysis of these samples are compared to a published study using a traditional 3DGM approach with 27 landmarks on each bone. Data were analyzed with morphologika 2.5 and PAST. Our analyses returned strong correlations between principal component scores, similar variance partitioning among components, and similarities between the shape spaces generated by the automatic and traditional methods. While cluster analyses of both automatically generated and traditional datasets produced broadly similar patterns, there were also differences. Overall these results suggest to us that automatic quantifications can lead to shape spaces that are as meaningful as those based on observer landmarks, thereby presenting potential to save time in data collection, increase completeness of morphological quantification, eliminate observer error, and allow comparisons of shape diversity between different types of bones. We provide an R package for implementing this analysis. Anat Rec,


Detecting epistasis with the marginal epistasis test in genetic mapping studies of quantitative traits

Crawford et al. 2017


Epistasis, commonly defined as the interaction between multiple genes, is an important genetic component underlying phenotypic variation. Many statistical methods have been developed to model and identify epistatic interactions between genetic variants. However, because of the large combinatorial search space of interactions, most epistasis mapping methods face enormous computational challenges and often suffer from low statistical power due to multiple test correction. Here, we present a novel, alternative strategy for mapping epistasis: instead of directly identifying individual pairwise or higher-order interactions, we focus on mapping variants that have non-zero marginal epistatic effects—the combined pairwise interaction effects between a given variant and all other variants. By testing marginal epistatic effects, we can identify candidate variants that are involved in epistasis without the need to identify the exact partners with which the variants interact, thus potentially alleviating much of the statistical and computational burden associated with standard epistatic mapping procedures. Our method is based on a variance component model, and relies on a recently developed variance component estimation method for efficient parameter inference and p-value computation. We refer to our method as the “MArginal ePIstasis Test”, or MAPIT. With simulations, we show how MAPIT can be used to estimate and test marginal epistatic effects, produce calibrated test statistics under the null, and facilitate the detection of pairwise epistatic interactions. We further illustrate the benefits of MAPIT in a QTL mapping study by analyzing the gene expression data of over 400 individuals from the GEUVADIS consortium.
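The idea of a marginal epistatic effect can be illustrated with a small numerical sketch. It assumes (this formulation is not spelled out in the abstract) that the summed pairwise interactions between variant k and every other variant are modeled as a random effect whose covariance is proportional to D K D, where D = diag(x_k) and K is a linear kernel over the remaining variants. The function names and the simulated genotype matrix are hypothetical, and this is not the MAPIT software itself.

```python
import numpy as np

def marginal_epistasis_kernel(X, k):
    """Covariance (up to scale) of the summed interactions between variant k and all others.

    X is an (individuals x variants) matrix of standardized genotypes (assumed input).
    """
    n, p = X.shape
    others = np.delete(X, k, axis=1)        # genotypes of every variant except k
    K = others @ others.T / (p - 1)         # linear kernel over the other variants
    xk = X[:, k]
    return xk[:, None] * K * xk[None, :]    # same as diag(x_k) @ K @ diag(x_k)

def explicit_interaction_kernel(X, k):
    """Same quantity, built by looping over every pairwise interaction term x_k * x_l."""
    n, p = X.shape
    G = np.zeros((n, n))
    for l in range(p):
        if l == k:
            continue
        z = X[:, k] * X[:, l]               # interaction feature between variants k and l
        G += np.outer(z, z)
    return G / (p - 1)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))            # simulated stand-in for genotype data
G_fast = marginal_epistasis_kernel(X, 2)
G_slow = explicit_interaction_kernel(X, 2)
print(np.allclose(G_fast, G_slow))
```

The point of the comparison is that one n x n kernel per variant summarizes all of its pairwise interactions, so a single variance component can be tested per variant instead of one test per pair.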


Persistent homology transform for modeling shapes and surfaces

Turner, Mukherjee, Boyer 2014

Information and Inference

139

160


In this paper we introduce a statistic, the persistent homology transform (PHT), to model surfaces in R^3 and shapes in R^2. This statistic is a collection of persistence diagrams, multiscale topological summaries used extensively in topological data analysis. We use the PHT to represent shapes and execute operations such as computing distances between shapes or classifying shapes. We prove the map from the space of simplicial complexes in R^3 into the space spanned by this statistic is injective. This implies that the statistic is a sufficient statistic for probability densities on the space of piecewise linear shapes. We also show that a variant of this statistic, the Euler Characteristic Transform (ECT), admits a simple exponential family formulation which is of use in providing likelihood based inference for shapes and surfaces. We illustrate the utility of this statistic on simulated and real data.

Keywords: persistence homology, surfaces, shape spaces, sufficient shape statistics. arXiv:1310.1030v2 [math.ST]
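As a rough illustration of the Euler characteristic variant, the sketch below computes an Euler characteristic curve for a small simplicial complex in R^2: for a given direction, each simplex enters the sublevel set at the height of its highest vertex, and the curve records vertices minus edges plus triangles at each threshold. Collecting these curves over several directions gives a discretized transform. The function names, the threshold grid, and the example shape are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def euler_characteristic_curve(vertices, edges, triangles, direction, thresholds):
    """Euler characteristic of the sublevel sets of the height function along `direction`."""
    v = np.asarray(vertices, dtype=float)
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    h = v @ d                                              # height of each vertex
    # A simplex is present once its highest vertex is at or below the threshold.
    edge_h = np.array([h[list(e)].max() for e in edges])
    tri_h = np.array([h[list(t)].max() for t in triangles])
    return np.array([
        (h <= t).sum() - (edge_h <= t).sum() + (tri_h <= t).sum()
        for t in thresholds
    ])

def euler_characteristic_transform(vertices, edges, triangles, directions, thresholds):
    """Stack one curve per direction into a (directions x thresholds) array."""
    return np.vstack([
        euler_characteristic_curve(vertices, edges, triangles, d, thresholds)
        for d in directions
    ])

# Hollow unit square: four vertices, four edges, no filled triangles.
square_v = [(0, 0), (1, 0), (1, 1), (0, 1)]
square_e = [(0, 1), (1, 2), (2, 3), (3, 0)]
thresholds = [-0.5, 0.0, 0.5, 1.0]
curve = euler_characteristic_curve(square_v, square_e, [], (0, 1), thresholds)
print(curve.tolist())   # [0, 1, 1, 0]
ect = euler_characteristic_transform(square_v, square_e, [], [(0, 1), (1, 0)], thresholds)
```

In the example the curve is 0 before the shape is reached, 1 while the sublevel set is a single connected piece, and 0 again once the full loop is present, since a closed loop has Euler characteristic 0.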


Molecular classification of multiple tumor types

et al. 2001


Using gene expression data to classify tumor types is a very promising tool in cancer diagnosis. Previous work shows that several pairs of tumor types can be successfully distinguished by their gene expression patterns (Golub et al. 1999, Ben-Dor et al. 2000, Alizadeh et al. 2000). However, simultaneous classification across a heterogeneous set of tumor types has not yet been well studied. We obtained 190 samples from 14 tumor classes and generated a combined expression dataset containing 16063 genes for each of those samples. We performed multi-class classification by combining the outputs of binary classifiers. Three binary classifiers (k-nearest neighbors, weighted voting, and support vector machines) were applied in conjunction with three combination scenarios (one-vs-all, all-pairs, hierarchical partitioning). We achieved a best cross-validation error rate of 18.75% and a best test error rate of 21.74% by using the one-vs-all support vector machine algorithm. The results demonstrate the feasibility of performing clinically useful classification from samples of multiple tumor types.
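The one-vs-all combination scheme can be sketched as follows: train one binary scorer per class (that class against all others) and assign each sample to the class whose scorer is most confident. To stay dependency-free, this sketch swaps in a ridge-regularized least-squares scorer in place of the support vector machines used in the study, and it runs on simulated two-dimensional data, not gene expression data. All function names and parameters are hypothetical.

```python
import numpy as np

def fit_binary_scorer(X, y_pm1, ridge=1e-3):
    """Ridge-regularized least-squares fit of +1/-1 targets (a stand-in for an SVM)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])            # append a bias column
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y_pm1)

def fit_one_vs_all(X, labels):
    """Train one binary scorer per class: that class (+1) against all others (-1)."""
    classes = np.unique(labels)
    W = np.column_stack([
        fit_binary_scorer(X, np.where(labels == c, 1.0, -1.0)) for c in classes
    ])
    return classes, W

def predict_one_vs_all(model, X):
    """Assign each sample to the class whose binary scorer gives the largest output."""
    classes, W = model
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return classes[np.argmax(Xb @ W, axis=1)]

# Simulated data: three well-separated classes in two dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])

def sample(n_per_class):
    X = np.vstack([c + 0.5 * rng.standard_normal((n_per_class, 2)) for c in centers])
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y

X_train, y_train = sample(40)
X_test, y_test = sample(20)
model = fit_one_vs_all(X_train, y_train)
test_error = np.mean(predict_one_vs_all(model, X_test) != y_test)
print(f"test error rate: {test_error:.2%}")
```

The all-pairs and hierarchical schemes mentioned in the abstract differ only in which binary problems are trained and how their outputs are combined.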


Efficient Genome-Wide Sequencing and Low-Coverage Pedigree Analysis from Noninvasively Collected Samples

Snyder-Mackler, Yuan, Shaver et al. 2016


Research on the genetics of natural populations was revolutionized in the 1990s by methods for genotyping noninvasively collected samples. However, these methods have remained largely unchanged for the past 20 years and lag far behind the genomics era. To close this gap, here we report an optimized laboratory protocol for genome-wide capture of endogenous DNA from noninvasively collected samples, coupled with a novel computational approach to reconstruct pedigree links from the resulting low-coverage data. We validated both methods using fecal samples from 62 wild baboons, including 48 from an independently constructed extended pedigree. We enriched fecal-derived DNA samples up to 40-fold for endogenous baboon DNA and reconstructed near-perfect pedigree relationships even with extremely low-coverage sequencing. We anticipate that these methods will be broadly applicable to the many research systems for which only noninvasive samples are available. The lab protocol and software (“WHODAD”) are freely available at www.tung-lab.org/protocols-and-software.html and www.xzlab.org/software.html, respectively.


Permutation Tests for Classification

et al. 2005


© 2003 Massachusetts Institute of Technology, Cambridge, MA 02139 USA, www.ai.mit.edu. Massachusetts Institute of Technology, Artificial Intelligence Laboratory.

Abstract: We introduce and explore an approach to estimating statistical significance of classification accuracy, which is particularly useful in scientific applications of machine learning where high dimensionality of the data and the small number of training examples render most standard convergence bounds too loose to yield a meaningful guarantee of the generalization ability of the classifier. Instead, we estimate statistical significance of the observed classification accuracy, or the likelihood of observing such accuracy by chance due to spurious correlations of the high-dimensional data patterns with the class labels in the given training set. We adopt permutation testing, a non-parametric technique previously developed in classical statistics for hypothesis testing in the generative setting (i.e., comparing two probability distributions). We demonstrate the method on real examples from neuroimaging studies and DNA microarray analysis and suggest a theoretical analysis of the procedure that relates the asymptotic behavior of the test to the existing convergence bounds.
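A permutation test of this kind can be sketched in a few lines: compute the cross-validated accuracy on the true labels, recompute it many times after shuffling the labels, and report the fraction of shuffles that do at least as well. The sketch below assumes a nearest-centroid classifier, five interleaved folds, and simulated high-dimensional data with few samples; none of these choices come from the paper, and the function names are hypothetical.

```python
import numpy as np

def cv_accuracy(X, y, n_folds=5):
    """Cross-validated accuracy of a nearest-centroid classifier for 0/1 labels."""
    folds = np.arange(len(y)) % n_folds                  # interleaved fold assignment
    correct = 0
    for f in range(n_folds):
        train, test = folds != f, folds == f
        c0 = X[train & (y == 0)].mean(axis=0)            # centroid of class 0
        c1 = X[train & (y == 1)].mean(axis=0)            # centroid of class 1
        d0 = np.linalg.norm(X[test] - c0, axis=1)
        d1 = np.linalg.norm(X[test] - c1, axis=1)
        correct += np.sum((d1 < d0).astype(int) == y[test])
    return correct / len(y)

def permutation_test(X, y, n_perm=200, seed=0):
    """Estimate how often shuffled labels reach the observed accuracy."""
    rng = np.random.default_rng(seed)
    observed = cv_accuracy(X, y)
    null = np.array([cv_accuracy(X, rng.permutation(y)) for _ in range(n_perm)])
    # Counting the observed labeling itself keeps the estimate strictly above zero.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value

# Simulated data: 40 samples, 100 features, class 1 shifted by one unit per feature.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
X = rng.standard_normal((40, 100)) + y[:, None] * 1.0
observed, p_value = permutation_test(X, y)
print(f"accuracy = {observed:.2f}, p = {p_value:.3f}")
```

The smallest p-value this estimate can return is 1 / (1 + n_perm), so the number of permutations bounds the resolution of the test.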


Local Homology Transfer and Stratification Learning

Bendich, Wang, Mukherjee 2012


The objective of this paper is to show that point cloud data can under certain circumstances be clustered by strata in a plausible way. For our purposes, we consider a stratified space to be a collection of manifolds of different dimensions which are glued together in a locally trivial manner inside some Euclidean space. To adapt this abstract definition to the world of noise, we first define a multi-scale notion of stratified spaces, providing a stratification at different scales which are indexed by a radius parameter. We then use methods derived from kernel and cokernel persistent homology to cluster the data points into different strata. We prove a correctness guarantee for this clustering method under certain topological conditions. We then provide a probabilistic guarantee for the clustering in the point sample setting: we provide bounds on the minimum number of sample points required to state with high probability which points belong to the same stratum. Finally, we give an explicit algorithm for the clustering.


Nonlinear prediction of chaotic time series using support vector machines

A New Fully Automated Approach for Aligning and Comparing Shapes
