The recent curation of large-scale databases with 3D surface scans of shapes has motivated the development of tools that better detect global patterns in morphological variation. Studies that focus on identifying differences between shapes have been limited to simple pairwise comparisons and rely on pre-specified landmarks (that are often known). We present SINATRA: the first statistical pipeline for analyzing collections of shapes without requiring any correspondences. Our novel algorithm takes in two classes of shapes and highlights the physical features that best describe the variation between them. We use a rigorous simulation framework to assess our approach. Lastly, as a case study, we use SINATRA to analyze mandibular molars from four different suborders of primates and demonstrate its ability to recover known morphometric variation across phylogenies.

Introduction

Sub-image analysis is an important open problem in both medical imaging studies and geometric morphometric applications. The problem asks which physical features of shapes are most important for differentiating between two classes of 3D images or shapes, such as computed tomography (CT) scans of bones or magnetic resonance images (MRI) of different tissues. More generally, the sub-image analysis problem can be framed as a regression-based task: given a collection of shapes, find the properties that explain the greatest variation in some response variable (continuous or binary). One example is identifying the structures of glioblastoma tumors that best indicate signs of potential relapse and other clinical outcomes [1]. From a statistical perspective, the sub-image selection problem is directly related to the variable selection problem: given high-dimensional covariates and a univariate outcome, we want to infer which variables are most relevant in explaining or predicting variation in the observed response.
Framing sub-image analysis as a regression presents several challenges. The first challenge centers around representing a 3D object as a (square integrable) covariate or feature vector. The transformation should lose a minimum amount of geometric information and apply to a wide range of shape and imaging datasets. In this paper, we use a tool from integral geometry and differential topology called the Euler characteristic (EC) transform [1][2][3][4], which maps shapes into vectors without requiring pre-specified landmark points or pairwise correspondences. This property is central to our innovations.
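To make the idea of mapping a shape to a vector concrete, the following is a minimal sketch of an Euler characteristic curve computed on a triangle mesh, under several assumptions that are ours rather than the paper's: the shape is given as arrays of vertex coordinates and triangular faces, a simplex enters the sublevel set once its highest vertex does, and the curves for a handful of directions are simply concatenated. The function names `euler_characteristic_curve` and `ec_transform` are hypothetical and this is not the SINATRA implementation.

```python
import numpy as np


def euler_characteristic_curve(vertices, faces, direction, thresholds):
    """Euler characteristic (V - E + F) of the sublevel sets of a triangle
    mesh along one direction, evaluated at each threshold."""
    vertices = np.asarray(vertices, dtype=float)
    faces = np.asarray(faces, dtype=int)
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)

    # Height of every vertex along the chosen direction.
    heights = vertices @ direction

    # Unique undirected edges, collected from the three sides of each face.
    edges = np.vstack([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [0, 2]]])
    edges = np.unique(np.sort(edges, axis=1), axis=0)

    # Assumption: a simplex is included once its highest vertex is included.
    edge_heights = heights[edges].max(axis=1)
    face_heights = heights[faces].max(axis=1)

    curve = []
    for t in thresholds:
        n_vertices = np.count_nonzero(heights <= t)
        n_edges = np.count_nonzero(edge_heights <= t)
        n_faces = np.count_nonzero(face_heights <= t)
        curve.append(n_vertices - n_edges + n_faces)
    return np.array(curve)


def ec_transform(vertices, faces, directions, thresholds):
    """Concatenate the EC curves over several directions into one vector
    (hypothetical helper; the choice of directions is left to the caller)."""
    curves = [
        euler_characteristic_curve(vertices, faces, d, thresholds)
        for d in directions
    ]
    return np.concatenate(curves)


# Toy example: the surface of a tetrahedron, filtered along +z and -z.
tetra_vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
tetra_faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
feature_vector = ec_transform(
    tetra_vertices, tetra_faces,
    directions=[(0, 0, 1), (0, 0, -1)],
    thresholds=[-1.0, 0.5, 1.5],
)
```

Because each entry depends only on the mesh and a direction, no landmarks or correspondences between shapes are needed; two meshes evaluated on the same directions and thresholds yield vectors of the same length, provided they are expressed in a common coordinate frame.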
After finding a vector representation of the shape, our second challenge is quantifying which topological features are most relevant in explaining variation in a continuous outcome or binary class label. We address this classic take on variable selection by using a Bayesian regression model and an information theoretic metric to measure the relevance of each topological feature. Our Bayesian method allows us to perform variable selection for nonlinear func...