Biomedical research has come to rely on p-values as a deterministic measure for data-driven decision-making. In the widely used null hypothesis significance testing framework for identifying statistically significant differences among groups of observations, a single p-value is computed from sample data. It is then routinely compared with a threshold, commonly set to 0.05, to assess the evidence against the null hypothesis, that is, the hypothesis that there are no significant differences among groups. Because the estimated p-value tends to decrease as the sample size increases, applying this methodology to datasets with large sample sizes typically results in the rejection of the null hypothesis, which makes the test uninformative in this setting. We propose a new approach to detect differences based on the dependence of the p-value on the sample size. We introduce new descriptive parameters that overcome the effect of the sample size on the interpretation of the p-value for datasets with large sample sizes, reducing the uncertainty in deciding whether biological differences exist between the compared experiments. The methodology enables graphical and quantitative characterization of the differences between the compared experiments, guiding researchers in the decision process. An in-depth study of the methodology is carried out on simulated and experimental data. Code is available at https://github.com/BIIG-UC3M/pMoSS.
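The dependence of the p-value on the sample size described above can be illustrated with a minimal sketch. This is not the authors' method or the pMoSS code; it is a textbook two-sample z-test evaluated at its expected statistic, under assumed conditions: two groups of n observations each, unit variance, and a hypothetical small standardized difference of 0.1. The names `expected_p_value`, `EFFECT_SIZE`, and `SAMPLE_SIZES` are introduced here purely for illustration.

```python
import math


def expected_p_value(effect_size: float, n: int) -> float:
    """Two-sided p-value of a two-sample z-test at its expected statistic.

    Assumes two groups of n observations each, unit variance, and a true
    standardized mean difference of `effect_size` (illustrative choices).
    """
    z = abs(effect_size) * math.sqrt(n / 2.0)  # expected z-statistic
    return math.erfc(z / math.sqrt(2.0))       # equals 2 * (1 - Phi(z))


ALPHA = 0.05                           # conventional threshold from the text
EFFECT_SIZE = 0.1                      # hypothetical small group difference
SAMPLE_SIZES = [10, 100, 1000, 10000]  # hypothetical per-group sample sizes

p_values = [expected_p_value(EFFECT_SIZE, n) for n in SAMPLE_SIZES]

for n, p in zip(SAMPLE_SIZES, p_values):
    verdict = "reject" if p < ALPHA else "do not reject"
    print(f"n = {n:>6}: p = {p:.3g} -> {verdict} the null hypothesis")
```

With the difference between groups held fixed, the p-value in this sketch shrinks steadily as n grows and eventually falls below 0.05, even though the underlying difference never changes. This is the behavior that makes a single thresholded p-value hard to interpret for very large datasets.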
In this paper we address the problem of visualizing, in a bounded region and as convex objects, a set of individuals to which a dissimilarity measure and a statistical value are attached. This problem, which extends standard Multidimensional Scaling analysis, is written as a global optimization problem whose objective is the difference of two convex functions (DC). Suitable DC decompositions allow us to use the Difference of Convex Algorithm (DCA) in a very efficient way. Our algorithmic approach is used to visualize two real-world datasets.
In this paper we address the problem of visualizing a set of individuals to which a statistical value, given as a proportion, and a dissimilarity measure are attached. Each individual is represented as a region within the unit square, in such a way that the areas of the regions represent the proportions and the distances between them represent the dissimilarities. To enhance the interpretability of the representation, the regions are required to satisfy two properties. First, they must form a partition of the unit square; that is, the portions into which it is divided must cover its area without overlapping. Second, each portion must be a connected union of rectangles that satisfies the so-called box-connectivity constraints, yielding a visualization map called a Space-filling Box-connected Map (SBM). The construction of an SBM is formally stated as a mathematical optimization problem, which is solved heuristically using the Large Neighborhood Search technique. The methodology proposed in this paper is applied to three real-world datasets: the first concerns financial markets in Europe and Asia, the second concerns the letters of the English alphabet, and the third uses the provinces of the Netherlands as a geographical application.
* This version of the article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the publisher's final version AKA Version of Record.