Confidence interval for micro-averaged F1 and macro-averaged F1 scores

Takahashi, Kanae; Yamamoto, Keiji; Kuchiba, Aya; Koyama, Tatsuki

doi:10.1007/s10489-021-02635-5

Cited by 101 publications

(47 citation statements)

References 19 publications


“…In addition to the accuracy, we used the averaged F1 (AF1) score (short for macro-averaged F1 score), which treats all classes equally and can be used to evaluate the class imbalance problem (as shown in Equation (6)). It can be defined by using Precision (Equation (3)), Recall (Equation (4)), and F1 score (Equation (5)) [55, 56].…”

Section: Methods (mentioning)

confidence: 99%

SenseHunger: Machine Learning Approach to Hunger Detection Using Wearable Sensors

Irshad, Nisar, Huang, et al. 2022. Sensors


The perception of hunger and satiety is of great importance to maintaining a healthy body weight and avoiding chronic diseases such as obesity, underweight, or deficiency syndromes due to malnutrition. There are a number of disease patterns, characterized by a chronic loss of this perception. To our best knowledge, hunger and satiety cannot be classified using non-invasive measurements. Aiming to develop an objective classification system, this paper presents a multimodal sensory system using associated signal processing and pattern recognition methods for hunger and satiety detection based on non-invasive monitoring. We used an Empatica E4 smartwatch, a RespiBan wearable device, and JINS MEME smart glasses to capture physiological signals from five healthy normal weight subjects inactively sitting on a chair in a state of hunger and satiety. After pre-processing the signals, we compared different feature extraction approaches, either based on manual feature engineering or deep feature learning. Comparative experiments were carried out to determine the most appropriate sensor channel, device, and classifier to reliably discriminate between hunger and satiety states. Our experiments showed that the most discriminative features come from three specific sensor modalities: Electrodermal Activity (EDA), infrared Thermopile (Tmp), and Blood Volume Pulse (BVP).
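The citation statement for this entry describes the macro-averaged F1 score as treating all classes equally. The sketch below illustrates that property under one common definition (the unweighted mean of per-class F1 scores); the cited equations are not reproduced on this page, so that definition, the function names, and the toy confusion matrix are all assumptions made for illustration.

```python
def per_class_f1(confusion, k):
    """F1 for class k from a square confusion matrix.

    Assumed layout: rows are true classes, columns are predicted classes.
    """
    tp = confusion[k][k]
    fp = sum(row[k] for row in confusion) - tp  # predicted k, truly another class
    fn = sum(confusion[k]) - tp                 # truly k, predicted another class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(confusion):
    """Unweighted mean of per-class F1 (assumed definition of macro F1)."""
    n = len(confusion)
    return sum(per_class_f1(confusion, k) for k in range(n)) / n


def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total


# Hypothetical imbalanced example: 90 samples of class 0, 10 of class 1,
# and a classifier that always predicts class 0.
toy = [[90, 0],
       [10, 0]]

if __name__ == "__main__":
    print("accuracy:", accuracy(toy))
    print("macro F1:", macro_f1(toy))
```

In this toy case accuracy stays high because the majority class dominates, while the macro average is pulled down by the minority class that is never predicted.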


“…The F1 score is the harmonic mean of precision and recall with poorest performance at 0 and the highest score of 1 and is suited to situations where there is a high rate of true negatives, and which are not a relevant measure (i.e. non-variant positions) (31). Pairwise core SNP distance matrices were calculated using snp-dist (32).…”

Section: Methods (mentioning)

confidence: 99%

Systematic benchmarking of ‘all-in-one’ microbial SNP calling pipelines

Falconer, Cuddihy, Beatson, et al. 2022. Preprint


Clinical and public health microbiology is increasingly utilising whole genome sequencing (WGS) technology and this has led to the development of a myriad of analysis tools and bioinformatics pipelines. However, in order to ensure robust methodologies suitable for clinical application of this technology, accurate, reproducible, traceable and benchmarked analysis pipelines are necessary. To date, the approach to benchmarking of these has been largely ad-hoc with new pipelines benchmarked on their own datasets with limited comparisons to previously published pipelines. In this study, Snpdragon, a fast and accurate SNP calling pipeline is introduced. Written in Nextflow, Snpdragon is capable of handling small to very large and incrementally growing datasets. Snpdragon is benchmarked using previously published datasets against six other all-in-one microbial SNP calling pipelines, Lyveset, Lyveset2, Snippy, SPANDx, BactSNP and Nesoni. The effect of dataset choice on performance measures is demonstrated to highlight some of the issues associated with the current benchmarking approaches. The establishment of an agreed upon gold-standard benchmarking process for microbial variant analysis is becoming increasingly important to aid in its robust application, improve transparency of pipeline performance under different settings and direct future improvements and development of new pipelines. Snpdragon is available at https://github.com/FordeGenomics/SNPdragon.
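The citation statement for this entry points out that F1 is the harmonic mean of precision and recall and suits settings where true negatives are abundant but uninformative. A minimal sketch of why: the harmonic mean simplifies to 2·TP / (2·TP + FP + FN), which never uses the true-negative count. The function names and the counts below are hypothetical and are not taken from the cited benchmark.

```python
def f1_from_counts(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall.

    Algebraically equal to 2*TP / (2*TP + FP + FN); TN does not appear.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy_from_counts(tp, fp, fn, tn):
    """Accuracy, which does depend on the true-negative count."""
    return (tp + tn) / (tp + fp + fn + tn)


if __name__ == "__main__":
    # Hypothetical counts: the same calls scored with few or many true negatives.
    tp, fp, fn = 80, 10, 10
    for tn in (0, 1_000_000):
        print(
            f"TN={tn}: F1={f1_from_counts(tp, fp, fn):.4f}, "
            f"accuracy={accuracy_from_counts(tp, fp, fn, tn):.5f}"
        )
```

Adding true negatives leaves F1 unchanged while pushing accuracy towards 1, which is one way to read the statement's remark about non-variant positions.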


“…Because the numbers of HC and PD subjects and their voice records were matched in the UCI dysphonic voice data set, the F1-score can be calculated as the harmonic mean of precision and recall based on the confusion matrix [41], derived as…”

Section: Methods (mentioning)

confidence: 99%

Classification of Dysphonic Voices in Parkinson’s Disease with Semi-Supervised Competitive Learning Algorithm

et al. 2022


This article proposes a novel semi-supervised competitive learning (SSCL) algorithm for vocal pattern classifications in Parkinson’s disease (PD). The acoustic parameters of voice records were grouped into the families of jitter, shimmer, harmonic-to-noise, frequency, and nonlinear measures, respectively. The linear correlations were computed within each acoustic parameter family. According to the correlation matrix results, the jitter, shimmer, and harmonic-to-noise parameters presented as highly correlated in terms of Pearson’s correlation coefficients. Then, the principal component analysis (PCA) technique was implemented to eliminate the redundant dimensions of the acoustic parameters for each family. The Mann–Whitney–Wilcoxon hypothesis test was used to evaluate the significant difference of the PCA-projected features between the healthy subjects and PD patients. Eight dominant PCA-projected features were selected based on the eigenvalue threshold criterion and the statistical significance level (p < 0.05) of the hypothesis test. The SSCL algorithm proposed in this paper included the procedures of the competitive prototype seed selection, K-means optimization, and the nearest neighbor classifications. The pattern classification experimental results showed that the proposed SSCL method can provide the excellent diagnostic performances in terms of accuracy (0.838), recall (0.825), specificity (0.85), precision (0.846), F-score (0.835), Matthews correlation coefficient (0.675), area under the receiver operating characteristic curve (0.939), and Kappa coefficient (0.675), which were consistently better than those results of conventional KNN or SVM classifiers.
