The K-means algorithm is one of the most popular clustering algorithms in current use because it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of these assumptions whilst remaining almost as fast and simple. This novel algorithm, which we call MAP-DP (maximum a posteriori Dirichlet process mixtures), is statistically rigorous because it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example binary, count or ordinal data. It can also efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means: MAP-DP typically converges within seconds for many practical problems. Finally, in contrast to K-means, because the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross-validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism.
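The MAP-DP assignment idea can be illustrated with a minimal one-dimensional sketch. It assumes Gaussian clusters with a known, shared variance `var` and a conjugate normal prior (`mu0`, `var0`) on each cluster mean, with `N0` playing the role of the Dirichlet process concentration parameter. `map_dp_1d` and its parameters are hypothetical names, and this is an illustration rather than the authors' reference implementation. Each point goes to the existing cluster that minimizes its negative log posterior predictive density minus the log of that cluster's size, or to a new cluster whose cost uses the prior predictive density and log `N0`; sweeps repeat until no assignment changes, so K is never fixed in advance.

```python
import math


def neg_log_predictive(x, n, s, mu0, var0, var):
    """-log p(x | cluster) with the cluster mean integrated out.

    Assumes a Normal(mu0, var0) prior on the mean and known noise variance var;
    n and s are the count and sum of the points already in the cluster.
    """
    prec = 1.0 / var0 + n / var           # posterior precision of the mean
    mean = (mu0 / var0 + s / var) / prec  # posterior mean
    pred_var = var + 1.0 / prec           # posterior predictive variance
    return 0.5 * math.log(2 * math.pi * pred_var) + (x - mean) ** 2 / (2 * pred_var)


def map_dp_1d(x, N0=1.0, mu0=0.0, var0=100.0, var=1.0, max_iter=100):
    """Hypothetical 1-D MAP-DP sketch; returns cluster labels 0..K-1."""
    z = [0] * len(x)  # start with every point in a single cluster
    for _ in range(max_iter):
        changed = False
        for i, xi in enumerate(x):
            # Count and sum of each cluster with point i left out.
            counts, sums = {}, {}
            for j, (xj, zj) in enumerate(zip(x, z)):
                if j != i:
                    counts[zj] = counts.get(zj, 0) + 1
                    sums[zj] = sums.get(zj, 0.0) + xj
            # Existing clusters: predictive cost minus log cluster size.
            best_k, best_cost = None, math.inf
            for k, n_k in counts.items():
                cost = neg_log_predictive(xi, n_k, sums[k], mu0, var0, var) - math.log(n_k)
                if cost < best_cost:
                    best_k, best_cost = k, cost
            # New cluster: prior predictive cost minus log N0.
            new_cost = neg_log_predictive(xi, 0, 0.0, mu0, var0, var) - math.log(N0)
            if new_cost < best_cost:
                # Reuse i's label if it was a singleton, else take a fresh label.
                best_k = z[i] if z[i] not in counts else max(z) + 1
            if best_k != z[i]:
                z[i] = best_k
                changed = True
        if not changed:  # simplification: stop when no label changes
            break
    # Relabel clusters 0..K-1 in order of first appearance.
    relabel = {}
    return [relabel.setdefault(k, len(relabel)) for k in z]


print(map_dp_1d([-5.1, -4.9, -5.0, -5.2, 5.0, 5.1, 4.9, 5.2, 40.0]))
```

On this toy input the point at 40.0 ends up in a cluster of its own, while the two tight groups form one cluster each. Cluster statistics are recomputed from scratch for every point to keep the sketch short; a practical version would update counts and sums incrementally.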
Single-cell RNA-seq (scRNAseq) is a powerful tool for studying cellular heterogeneity. Recently, several clustering-based methods have been proposed to identify distinct cell populations. These methods are based on different statistical models and usually require several additional steps, such as preprocessing or dimension reduction, before the clustering algorithm is applied. Individual steps are often controlled by method-specific parameters, so the same method can be run in different modes on the same dataset depending on the user's choices. The large number of possibilities that these methods provide can intimidate non-expert users, since the available choices are not always clearly documented. In addition, no large study to date has investigated the role and impact of these choices in different experimental contexts. This work aims to provide new insights into the advantages and drawbacks of scRNAseq clustering methods and to describe the range of possibilities offered to users. In particular, we provide an extensive evaluation of several methods with respect to different modes of usage and parameter settings by applying them to real and simulated datasets that vary in dimensionality, number of cell populations and level of noise. Remarkably, the results presented here show that the large variability in the performance of the models is strongly attributable to the user's choice of parameter settings. We describe several performance trends associated with the modes of usage and different types of datasets, and identify which methods' computational time is strongly affected by data dimensionality. Finally, we highlight some open challenges in scRNAseq data clustering, such as identifying the number of clusters.
Passive infrared (PIR) sensors are widely used in many applications, including motion detectors for alarms, lighting systems and hand dryers. Combinations of multiple PIR sensors have also been used to count the number of people passing through doorways. In this paper, we demonstrate the potential of the PIR sensor as a tool for estimating occupancy inside a monitored environment. Our approach shows how flexible nonparametric machine learning algorithms can extract useful information about occupancy from a single PIR sensor. The approach allows us to understand and make use of the motion patterns generated by people within the monitored environment. The proposed counting system uses these patterns to provide an accurate estimate of room occupancy that can be updated every 30 seconds. The system was successfully tested on data from more than 50 real office meetings with at most 14 room occupants.
Background: Wearable sensors have been used successfully to characterize bradykinetic gait in patients with Parkinson disease (PD), but most studies to date have been conducted in highly controlled laboratory environments.
Objective: This paper aims to assess whether sensor-based analysis of real-life gait can be used to objectively and remotely monitor motor fluctuations in PD.
Methods: The Parkinson@Home validation study provides a new reference data set for the development of digital biomarkers to monitor persons with PD in daily life. Specifically, a group of 25 patients with PD with motor fluctuations and 25 age-matched controls performed unscripted daily activities in and around their homes for at least one hour while being recorded on video. Patients with PD did this twice: once after overnight withdrawal of dopaminergic medication and again 1 hour after medication intake. Participants wore sensors on both wrists and ankles, on the lower back, and in the front pants pocket, capturing movement and contextual data. Gait segments of 25 seconds were extracted from accelerometer signals based on manual video annotations. The power spectral density of each segment and device was estimated using Welch's method, from which the total power in the 0.5- to 10-Hz band, the width of the dominant frequency, and cadence were derived. The ability to discriminate between before and after medication intake, and between patients with PD and controls, was evaluated using leave-one-subject-out nested cross-validation.
Results: At least 10 gait segments were available for 18 patients with PD (11 men; median age 65 years) and 24 controls (13 men; median age 68 years). Using logistic LASSO (least absolute shrinkage and selection operator) regression, we classified whether the unscripted gait segments occurred before or after medication intake, with mean areas under the receiver operating characteristic curves (AUCs) ranging from 0.70 (ankle of least affected side, 95% CI 0.60-0.81) to 0.82 (ankle of most affected side, 95% CI 0.72-0.92) across sensor locations. Combining all sensor locations did not significantly improve classification (AUC 0.84, 95% CI 0.75-0.93). Of all signal properties, the total power in the 0.5- to 10-Hz band was most responsive to dopaminergic medication. Discriminating between patients with PD and controls was generally more difficult (AUC of all sensor locations combined: 0.76, 95% CI 0.62-0.90). The video recordings revealed that the positioning of the hands during real-life gait had a substantial impact on the power spectral density of both the wrist and pants pocket sensors.
Conclusions: We present a new video-referenced data set that includes unscripted activities in and around the participants' homes. Using this data set, we show the feasibility of using sensor-based analysis of real-life gait to monitor motor fluctuations with a single sensor location. Future work may assess the value of contextual sensors to control for real-world confounders.
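The spectral features described in the Methods can be sketched with NumPy alone. This is a minimal illustration, assuming a single-axis accelerometer signal sampled at a hypothetical 100 Hz and Welch parameters the abstract does not state (Hann window, 256-sample segments, 50% overlap); `welch_psd` and `gait_features` are hypothetical names. It computes the total power in the 0.5- to 10-Hz band and the dominant frequency only, because the paper's exact definitions of dominant-frequency width and cadence are not given in the abstract. In practice `scipy.signal.welch` provides Welch's method directly.

```python
import numpy as np


def welch_psd(x, fs, nperseg=256, overlap=0.5):
    """One-sided PSD by Welch's method: average of Hann-windowed periodograms."""
    x = np.asarray(x, dtype=float)
    step = int(nperseg * (1 - overlap))
    win = np.hanning(nperseg)
    scale = fs * np.sum(win ** 2)  # density scaling (units^2 / Hz)
    starts = range(0, len(x) - nperseg + 1, step)  # needs len(x) >= nperseg
    psd = np.zeros(nperseg // 2 + 1)
    for s in starts:
        seg = x[s:s + nperseg]
        seg = seg - seg.mean()  # remove the segment's constant offset (e.g. gravity)
        psd += np.abs(np.fft.rfft(seg * win)) ** 2 / scale
    psd /= len(starts)
    psd[1:-1] *= 2  # fold in negative frequencies (assumes even nperseg)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, psd


def gait_features(x, fs, band=(0.5, 10.0)):
    """Total band power and dominant frequency of one gait segment (sketch)."""
    freqs, psd = welch_psd(x, fs)
    df = freqs[1] - freqs[0]
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    band_power = psd[in_band].sum() * df  # rectangle-rule integral over the band
    dominant = freqs[in_band][np.argmax(psd[in_band])]
    return band_power, dominant


# Synthetic 25-s segment: a unit-amplitude 2 Hz oscillation at a hypothetical 100 Hz.
fs = 100.0
t = np.arange(int(25 * fs)) / fs
print(gait_features(np.sin(2 * np.pi * 2.0 * t), fs))
```

For a unit-amplitude sinusoid the band power comes out close to its mean square value of 0.5, which is a quick sanity check on the density scaling.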
This paper presents the implementation and deployment of a compute- and memory-intensive nonparametric Bayesian machine learning algorithm on a microcontroller unit (MCU) to estimate room occupancy in a Smart Room using a single analogue PIR sensor. We envisage an IoT device consisting of a resource-constrained MCU, a PIR sensor and a battery, running the occupancy estimation algorithm and operating for days or months without recharging or replacing the battery. Both hardware-independent and hardware-dependent optimizations are performed to reduce the memory footprint while still providing acceptable real-time performance and consuming less energy. We show that optimising the machine learning models, the static memory footprint and the dynamic memory usage significantly reduces the algorithm's on-chip memory usage in the MCUs. We also show that a low-end MCU cannot meet the real-time requirements of the application without incurring high average power consumption. However, a moderately high-performance MCU with a higher clock frequency and a hardware floating-point unit provides a 19x improvement in the execution time of the algorithm, better meeting the real-time specification of the application and reducing power consumption. Further, we estimate the battery lifetime of the IoT device if it operates continuously in a Smart Room. With a typical-size battery, an IoT device consisting of a Cortex-M4F MCU and a PIR sensor can operate for more than a month without replacement or recharging of the battery while running the compute-intensive Bayesian machine learning algorithm.
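A battery-lifetime estimate of this kind reduces to duty-cycle arithmetic. The sketch below assumes the device draws a constant `active_ma` while running the algorithm for `active_s` seconds in every `period_s`-second update period and `sleep_ma` the rest of the time, and it ignores self-discharge, temperature and regulator losses. All function names and numbers are hypothetical illustrations, not measurements or results from the paper.

```python
def average_current_ma(active_ma, sleep_ma, active_s, period_s):
    """Duty-cycle-weighted average current draw in mA (idealized model)."""
    duty = active_s / period_s  # fraction of each period spent computing
    return duty * active_ma + (1.0 - duty) * sleep_ma


def battery_life_days(capacity_mah, avg_current_ma):
    """Ideal lifetime: capacity / average current, converted from hours to days."""
    return capacity_mah / avg_current_ma / 24.0


# Hypothetical figures, NOT taken from the paper.
avg = average_current_ma(active_ma=10.0, sleep_ma=0.1, active_s=1.0, period_s=30.0)
days = battery_life_days(capacity_mah=2000.0, avg_current_ma=avg)
print(f"average current {avg:.3f} mA, lifetime {days:.1f} days")
```

Under this model a faster MCU can lower the average current even if its active current is higher, because `active_s` shrinks and the device spends more of each period asleep; this is one way a shorter execution time can translate into lower power consumption.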