Functional data occur more and more often in practice, and various statistical techniques have been developed to analyze them. In this paper we consider multivariate functional data, where for each curve and each time point a p-dimensional vector of measurements is observed. For functional data the study of outlier detection has started only recently, and was mostly limited to univariate curves (p = 1). We set up a taxonomy of functional outliers, and construct new numerical and graphical techniques for the detection of outliers in multivariate functional data, with univariate curves included as a special case. Our tools include statistical depth functions and distance measures derived from them. The methods we study are affine invariant in p-dimensional space, and do not assume elliptical or any other symmetry.
We construct classifiers for multivariate and functional data. Our approach is based on a kind of distance between data points and classes. The distance measure needs to be robust to outliers and invariant to linear transformations of the data. For this purpose we can use the bagdistance which is based on halfspace depth. It satisfies most of the properties of a norm but is able to reflect asymmetry when the class is skewed. Alternatively we can compute a measure of outlyingness based on the skew-adjusted projection depth. In either case we propose the DistSpace transform which maps each data point to the vector of its distances to all classes, followed by k-nearest neighbor (kNN) classification of the transformed data points. This combines invariance and robustness with the simplicity and wide applicability of kNN. The proposal is compared with other methods in experiments with real and simulated data.
arXiv:1504.01128v3 [stat.ME] 7 Jul 2016
$z := \lambda x + (1 - \lambda) y$. We can verify that $\tilde{z} := (\lambda g(x) + (1 - \lambda) g(y))^{-1} z$ is a convex combination of $c_x$ and $c_y$. By compactness of $B$ we know that $c_x, c_y \in B$, and from convexity of $B$ it then follows that $\tilde{z} \in B$. Therefore $\|c_z\| = \|c_{\tilde{z}}\| \geq \|\tilde{z}\|$ so that finally $g(z) = \|z\| / \|c_z\| \leq \|z\| / \|\tilde{z}\| = \lambda g(x) + (1 - \lambda) g(y)$.
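
The DistSpace idea from the abstract above (map each point to its vector of distances to all classes, then run kNN on the transformed points) can be sketched roughly as follows. This is only an illustration under stated assumptions: the distance used here is the plain Euclidean distance to each class's coordinate-wise median, a simple stand-in that is neither the bagdistance nor the skew-adjusted projection depth outlyingness, and it is not affine invariant. The function names `class_centers`, `dist_space` and `knn_predict` and the toy data are hypothetical.

```python
from collections import Counter
from math import dist
from statistics import median


def class_centers(X, y):
    """Coordinate-wise median of each class (stand-in for a depth-based center)."""
    centers = {}
    for label in sorted(set(y)):
        members = [x for x, lab in zip(X, y) if lab == label]
        centers[label] = tuple(median(col) for col in zip(*members))
    return centers


def dist_space(X, centers):
    """Map each point to the vector of its distances to all classes."""
    return [tuple(dist(x, c) for c in centers.values()) for x in X]


def knn_predict(train_T, train_y, query_T, k=3):
    """Majority vote among the k nearest training points in the transformed space."""
    preds = []
    for q in query_T:
        nearest = sorted(range(len(train_T)), key=lambda i: dist(q, train_T[i]))[:k]
        votes = Counter(train_y[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds


# Toy data (assumed for illustration): two well-separated classes in the plane.
X_train = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (3.0, 3.1), (2.9, 3.0), (3.2, 2.8)]
y_train = ["a", "a", "a", "b", "b", "b"]

centers = class_centers(X_train, y_train)
T_train = dist_space(X_train, centers)          # one coordinate per class
T_query = dist_space([(0.1, 0.0), (3.1, 3.0)], centers)
predictions = knn_predict(T_train, y_train, T_query)
```

Whatever the dimension of the original data, the transformed points have one coordinate per class, so the kNN step always works in a low-dimensional space. Swapping the stand-in distance for a robust, affine invariant one would change only `dist_space`.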
Correct classification of breast cancer subtypes is of high importance as it directly affects the therapeutic options. We focus on triple-negative breast cancer, which has the worst prognosis among breast cancer types. Using cutting-edge methods from the field of robust statistics, we analyze Breast Invasive Carcinoma transcriptomic data publicly available from The Cancer Genome Atlas data portal. Our analysis identifies statistical outliers that may correspond to misdiagnosed patients. Furthermore, it is illustrated that classical statistical methods may fail to identify outliers due to their heavy influence, prompting the need for robust statistics. Using robust sparse logistic regression we obtain 36 relevant genes, of which ca. 60% have been previously reported as biologically relevant to triple-negative breast cancer, reinforcing the validity of the method. The remaining 14 genes identified are new potential biomarkers for triple-negative breast cancer. Out of these, JAM3, SFT2D2, and PAPSS1 were previously associated with breast tumors or other types of cancer. The relevance of these genes is confirmed by the new DetectDeviatingCells outlier detection technique. A comparison of gene networks on the selected genes showed significant differences between triple-negative breast cancer and non-triple-negative breast cancer data. The individual role of FOXA1 in triple-negative breast cancer and non-triple-negative breast cancer, and the strong FOXA1-AGR2 connection in triple-negative breast cancer stand out. The goal of our paper is to contribute to the understanding and management of breast cancer, and of triple-negative breast cancer in particular. At the same time it demonstrates that robust regression and outlier detection constitute key strategies to cope with high-dimensional clinical data such as omics data.
Healthy ageing is associated with decline in cognitive abilities such as language. Aerobic fitness has been shown to ameliorate decline in some cognitive domains, but the potential benefits for language have not been examined. In a cross-sectional sample, we investigated the relationship between aerobic fitness and tip-of-the-tongue states. These are among the most frequent cognitive failures in healthy older adults and occur when a speaker knows a word but is unable to produce it. We found that healthy older adults indeed experience more tip-of-the-tongue states than young adults. Importantly, higher aerobic fitness levels decrease the probability of experiencing tip-of-the-tongue states in healthy older adults. Fitness-related differences in word finding abilities are observed over and above effects of age. This is the first demonstration of a link between aerobic fitness and language functioning in healthy older adults.
Insurers are faced with the challenge of estimating the future reserves needed to handle historic and outstanding claims that are not fully settled. A well-known and widely used technique is the chain-ladder method, which is a deterministic algorithm. To include a stochastic component one may apply generalized linear models to the run-off triangles based on past claims data. Analytical expressions for the standard deviation of the resulting reserve estimates are typically difficult to derive. A popular alternative approach to obtain inference is to use the bootstrap technique. However, the standard procedures are very sensitive to the possible presence of outliers. These atypical observations, deviating from the pattern of the majority of the data, may either inflate or deflate traditional reserve estimates and corresponding inference such as their standard errors. Even when paired with a robust chain-ladder method, classical bootstrap inference may break down. Therefore, we discuss and implement several robust bootstrap procedures in the claims reserving framework and we investigate and compare their performance on both simulated and real data. We also illustrate their use for obtaining the distribution of one-year risk measures.
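
For readers unfamiliar with the deterministic chain-ladder method mentioned in the abstract above, here is a minimal sketch of the classical (non-robust) version on a cumulative run-off triangle. It is not the robust procedure studied in the paper; the function name `chain_ladder` and the toy triangle are assumptions made for illustration.

```python
def chain_ladder(triangle):
    """Classical chain-ladder on a cumulative run-off triangle.

    triangle: list of rows, one per accident year; row i holds the
    cumulative claims observed so far, so it has n - i entries.
    Returns the development factors and the reserve per accident year.
    """
    n = len(triangle)
    factors = []
    for j in range(n - 1):
        # Use only the rows in which both development periods j and j + 1 are observed.
        rows = [r for r in triangle if len(r) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))

    reserves = []
    for row in triangle:
        ultimate = row[-1]
        # Roll the latest observed value forward through the remaining periods.
        for j in range(len(row) - 1, n - 1):
            ultimate *= factors[j]
        reserves.append(ultimate - row[-1])
    return factors, reserves


# Toy cumulative triangle (assumed values, three accident years).
triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 165.0],
    [120.0],
]
factors, reserves = chain_ladder(triangle)
total_reserve = sum(reserves)
```

Each development factor is a ratio of column sums, so a single atypical cell in the triangle shifts the factor and every reserve projected with it. This is the kind of sensitivity to outliers that the abstract refers to.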