A note on Type S/M errors in hypothesis testing

Lu, Jiannan; Qiu, Yixuan; Deng, Alex

doi:10.1111/bmsp.12132

Cited by 26 publications

(23 citation statements)

References 48 publications

“…To set up this threshold, it is important to evaluate the relative penalties of false positive and false negative errors. In most clinically relevant applications, this relative balance factor (B) varies between 0.25 and 4 [41][42][43][44][45]. For higher B values, the test sensitivity (SN) is low, and lower B means lower specificity (SP).…”

Section: Discussion (mentioning)

confidence: 99%

Flexible Data Trimming Improves Performance of Global Machine Learning Methods in Omics-Based Personalized Oncology

Tkachev, Sorokin, Borisov, et al. 2020, IJMS

(1) Background: Machine learning (ML) methods are rarely used for an omics-based prescription of cancer drugs, due to shortage of case histories with clinical outcome supplemented by high-throughput molecular data. This causes overtraining and high vulnerability of most ML methods. Recently, we proposed a hybrid global-local approach to ML termed floating window projective separator (FloWPS) that avoids extrapolation in the feature space. Its core property is data trimming, i.e., sample-specific removal of irrelevant features. (2) Methods: Here, we applied FloWPS to seven popular ML methods, including linear SVM, k nearest neighbors (kNN), random forest (RF), Tikhonov (ridge) regression (RR), binomial naïve Bayes (BNB), adaptive boosting (ADA) and multi-layer perceptron (MLP). (3) Results: We performed computational experiments for 21 high throughput gene expression datasets (41–235 samples per dataset) totally representing 1778 cancer patients with known responses on chemotherapy treatments. FloWPS essentially improved the classifier quality for all global ML methods (SVM, RF, BNB, ADA, MLP), where the area under the receiver-operator curve (ROC AUC) for the treatment response classifiers increased from 0.61–0.88 range to 0.70–0.94. We tested FloWPS-empowered methods for overtraining by interrogating the importance of different features for different ML methods in the same model datasets. (4) Conclusions: We showed that FloWPS increases the correlation of feature importance between the different ML methods, which indicates its robustness to overtraining. For all the datasets tested, the best performance of FloWPS data trimming was observed for the BNB method, which can be valuable for further building of ML classifiers in personalized oncology.

“…Several practitioners of clinical diagnostic tests have different opinions on how high/low should be this balance factor. In different applications, the preferred values can be B = 4 [41,42,45], B < 0.16 [70], 4.5 < B < 5 [44], B < 5 [43], B > 10 for emergency medicine only [71], B > 5 for toxicology [72].…”

Section: False Positive Vs False Negative Error Balance (mentioning)

confidence: 99%

Flexible Data Trimming Improves Performance of Global Machine Learning Methods in Omics-Based Personalized Oncology

Tkachev, Sorokin, Borisov, et al. 2020, IJMS

“…by Gelman and Carlin (2014) 9,10 and discussed further and more recently in greater mathematical detail in Lu, Qiu, and Deng (2019). Gelman and Carlin (2014) advocate using power calculations-reemphasized and named "design calculations" to focus on errors in magnitude and sign instead of declarations of statistical significance-after the data have been collected to help inform a statistical data summary.…”

Section: Discussion (mentioning)

confidence: 99%

emagnification: A tool for estimating effect-size magnification and performing design calculations in epidemiological studies

2020

Artificial effect-size magnification (ESM) may occur in underpowered studies, where effects are reported only because they or their associated p-values have passed some threshold. Ioannidis (2008, Epidemiology 19: 640–648) and Gelman and Carlin (2014, Perspectives on Psychological Science 9: 641–651) have suggested that the plausibility of findings for a specific study can be evaluated by computation of ESM, which requires statistical simulation. In this article, we present a new command called emagnification that allows straightforward implementation of such simulations in Stata. The commands automate these simulations for epidemiological studies and enable the user to assess ESM routinely for published studies using user-selected, study-specific inputs that are commonly reported in published literature. The intention of the command is to allow a wider community to use ESMs as a tool for evaluating the reliability of reported effect sizes and to put an observed statistically significant effect size into a fuller context with respect to potential implications for study conclusions.
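
The kind of simulation this abstract refers to can be illustrated with a minimal Python sketch of a Gelman and Carlin style design calculation. It is not the `emagnification` Stata command and does not reproduce its interface. The function name `design_calculation` and its parameters are hypothetical, and the sketch assumes a normally distributed estimate, a known standard error, and a two-sided z-test.

```python
import random
from statistics import NormalDist


def design_calculation(true_effect, std_error, alpha=0.05,
                       n_sims=100_000, seed=0):
    """Simulate power, Type S error rate, and Type M (exaggeration) ratio.

    Assumptions (illustrative, not taken from the cited papers' code):
    the estimate is Normal(true_effect, std_error) and significance is
    declared by a two-sided z-test at level alpha.
    """
    if true_effect == 0 or std_error <= 0:
        raise ValueError("need a nonzero true effect and a positive standard error")

    # Smallest absolute estimate that is declared statistically significant.
    threshold = NormalDist().inv_cdf(1 - alpha / 2) * std_error

    rng = random.Random(seed)
    n_significant = 0
    n_wrong_sign = 0
    sum_abs_significant = 0.0

    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, std_error)
        if abs(estimate) > threshold:
            n_significant += 1
            sum_abs_significant += abs(estimate)
            # Type S: significant, but the sign disagrees with the truth.
            if estimate * true_effect < 0:
                n_wrong_sign += 1

    if n_significant == 0:
        raise RuntimeError("no significant replications; increase n_sims")

    power = n_significant / n_sims
    type_s = n_wrong_sign / n_significant
    # Type M: average magnitude of significant estimates relative to the truth.
    type_m = (sum_abs_significant / n_significant) / abs(true_effect)
    return power, type_s, type_m


if __name__ == "__main__":
    power, type_s, type_m = design_calculation(true_effect=2.0, std_error=8.1)
    print(f"power={power:.3f}  type_S={type_s:.3f}  type_M={type_m:.2f}")
```

Both error rates are computed conditional on statistical significance, which is why a low-powered design can produce significant estimates that are both greatly exaggerated and frequently of the wrong sign.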

“…Such studies tends to be retrospective (Thomas, 1997) and, unfortunately, they are often uninformative (Gillett, 1996; Thomas, 1997; Hoenig & Heisey, 2001; Lenth, 2007) or even misleading (Gelman & Carlin, 2014; Vasishth & Gelman, 2017). The latter results from the use of statistical significant effect sizes reported in the literature: statistical significance preferentially selects badly estimated effect sizes (Lane & Dunlap, 1978), that can be exaggerated or even of the wrong sign (Gelman & Tuerlinckx, 2000; Lu, Qiu & Deng, 2018). Thus, the problem with statistical power may not be solely caused by measurement difficulties, but also by structural ones with Null Hypothesis Significance Testing (Lash, 2017).…”

Section: Introduction (mentioning)

confidence: 99%

Of power and despair in cetacean conservation: estimation and detection of trend in abundance with noisy and short time-series

Authier, Galatius, Gilles, et al. 2020, PeerJ

Many conservation instruments rely on detecting and estimating a population decline in a target species to take action. Trend estimation is difficult because of small sample size and relatively large uncertainty in abundance/density estimates of many wild populations of animals. Focusing on cetaceans, we performed a prospective analysis to estimate power, type-I, sign (type-S) and magnitude (type-M) error rates of detecting a decline in short time-series of abundance estimates with different signal-to-noise ratio. We contrasted results from both unregularized (classical) and regularized approaches. The latter allows to incorporate prior information when estimating a trend. Power to detect a statistically significant estimates was in general lower than 80%, except for large declines. The unregularized approach (status quo) had inflated type-I error rates and gave biased (either over- or under-) estimates of a trend. The regularized approach with a weakly-informative prior offered the best trade-off in terms of bias, statistical power, type-I, type-S and type-M error rates and confidence interval coverage. To facilitate timely conservation decisions, we recommend to use the regularized approach with a weakly-informative prior in the detection and estimation of trend with short and noisy time-series of abundance estimates.
