In pursuit of the perfect supervised NLP classifier, razor-thin margins and low-resource test sets can make modeling decisions difficult. Popular metrics such as Accuracy, Precision, and Recall are often insufficient because they fail to give a complete picture of the model's behavior. We present a probabilistic extension of Precision, Recall, and F1 score, which we refer to as confidence-Precision (cPrecision), confidence-Recall (cRecall), and confidence-F1 (cF1), respectively. The proposed metrics address some of the challenges faced when evaluating large-scale NLP systems, specifically when the model's confidence score assignments have an impact on the system's behavior. We describe four key benefits of our proposed metrics as compared to their threshold-based counterparts. Two of these benefits, which we refer to as robustness to missing values and sensitivity to model confidence score assignments, are self-evident from the metrics' definitions; the remaining two, generalization and functional consistency, are demonstrated empirically.
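The abstract does not state the metric definitions, so the following is only a minimal sketch of one plausible confidence-weighted formulation, not necessarily the authors' exact one. It assumes that each sample contributes its confidence score for a class in place of a 0/1 count: confidence given to the class on samples that truly belong to it counts toward the true-positive mass, and confidence given to the class on other samples counts toward the false-positive mass. The function name `confidence_prf` and its arguments are hypothetical.

```python
def confidence_prf(y_true, probs, cls):
    """Confidence-weighted precision, recall, and F1 for one class.

    y_true: list of integer class labels, one per sample.
    probs:  list of per-class confidence lists, one per sample.
    cls:    index of the class being evaluated.
    Assumed formulation for illustration; not taken from the source.
    """
    # Confidence mass assigned to `cls` on samples that truly are `cls`.
    ctp = sum(p[cls] for t, p in zip(y_true, probs) if t == cls)
    # Confidence mass assigned to `cls` on samples of any other class.
    cfp = sum(p[cls] for t, p in zip(y_true, probs) if t != cls)
    # Number of samples whose true label is `cls`.
    n_cls = sum(1 for t in y_true if t == cls)

    c_precision = ctp / (ctp + cfp) if (ctp + cfp) > 0 else 0.0
    c_recall = ctp / n_cls if n_cls > 0 else 0.0
    denom = c_precision + c_recall
    c_f1 = 2 * c_precision * c_recall / denom if denom > 0 else 0.0
    return c_precision, c_recall, c_f1


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    scores = [[0.1, 0.9], [0.4, 0.6], [0.8, 0.2], [0.7, 0.3]]
    print(confidence_prf(labels, scores, cls=1))
```

Under this assumed formulation no decision threshold is applied, so changing how confident the model is on a sample changes the score even when the predicted label stays the same, which is the kind of sensitivity the abstract describes.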
In stress sensing, Window-derived Heart Rate Variability (W-HRV) methods are by far the most heavily used feature extraction methods. However, these W-HRV methods come with a variety of tradeoffs that motivate the development of alternative methods in stress sensing. We compare our method of using HeartBeat Morphology (HBM) features for stress sensing to the traditional W-HRV method for feature extraction. To evaluate these methods adequately, we conduct a Trier Social Stress Test (TSST) to elicit stress in a group of 13 firefighters while recording their ECG, actigraphy, and psychological self-assessment measures. We use the data from this experiment to analyze both feature extraction methods in terms of computational complexity, detection resolution performance, and event localization performance. Our results show that each method has an ideal niche for its use in stress sensing. HBM features tend to be more effective in an online stress-detection context. W-HRV appears to be more suitable for offline post-processing to determine the exact localization of the stress event.
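To make the window-derived approach concrete, here is a minimal sketch of sliding-window HRV feature extraction from a series of RR intervals. The abstract does not say which HRV features or window lengths were used, so the choice of mean RR, SDNN, and RMSSD, the beat-count window, and the name `window_hrv` are all assumptions made for illustration.

```python
import math
import statistics


def window_hrv(rr_ms, window, step):
    """Compute simple HRV features over sliding windows of RR intervals.

    rr_ms:  list of RR intervals in milliseconds.
    window: window length in beats (assumed; must be at least 2).
    step:   hop size in beats between consecutive windows.
    Returns one feature dict per window. Illustrative only.
    """
    if window < 2:
        raise ValueError("window must contain at least 2 beats")
    features = []
    for start in range(0, len(rr_ms) - window + 1, step):
        w = rr_ms[start:start + window]
        # Successive differences between adjacent RR intervals.
        diffs = [b - a for a, b in zip(w, w[1:])]
        features.append({
            "start": start,
            "mean_rr": statistics.fmean(w),
            # SDNN: sample standard deviation of the RR intervals.
            "sdnn": statistics.stdev(w),
            # RMSSD: root mean square of the successive differences.
            "rmssd": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        })
    return features


if __name__ == "__main__":
    rr = [800, 810, 790, 800, 820, 780]
    for row in window_hrv(rr, window=4, step=2):
        print(row)
```

Each feature vector summarizes a whole window, so it cannot resolve changes shorter than the window itself. That is one example of the kind of tradeoff the abstract attributes to W-HRV methods.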