Emergency departments (EDs) in France are put under strain each winter by respiratory viruses. Limiting the impact of these viruses requires a better understanding of how they affect patient flow. To this end, we propose to combine ICD-10 codes with laboratory-confirmed data in order to extract the relevant patient flow. We first take advantage of the almost periodicity of both the clinical diagnosis data and the laboratory-confirmed data, and then embed the underlying time series on the Stiefel manifold. The distance on the Stiefel manifold is then used to extract the clinical codes that are nearest to the laboratory-confirmed time series. The results reveal that some respiratory and cardiac disorder codes behave in the same way as the viruses circulating in winter. Finally, the flag mean is used to obtain a picture of both the patient flow and the length of stay for patients who might be infected by winter viruses.
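The nearest-code extraction described above can be sketched as follows. The abstract does not specify how the series are embedded or which distance is used, so everything here is an assumption made for illustration: each series is standardized and turned into a delay matrix, mapped to the Stiefel manifold through its polar factor, and compared with the Frobenius distance. The function names, the window size `p`, and the synthetic series (`code_a`, `code_b`, `code_c` are placeholders, not ICD-10 codes) are hypothetical.

```python
import numpy as np


def delay_matrix(series, p):
    """Standardize a 1-D series and stack p lagged copies into an n x p matrix."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()  # assumes the series is not constant
    n = x.size - p + 1
    return np.column_stack([x[j:j + n] for j in range(p)])


def to_stiefel(matrix):
    """Map a full-rank n x p matrix to its polar factor U V^T (orthonormal columns)."""
    u, _, vt = np.linalg.svd(matrix, full_matrices=False)
    return u @ vt


def stiefel_distance(a, b):
    """Frobenius distance between two n x p matrices with orthonormal columns."""
    return float(np.linalg.norm(a - b))


def nearest_codes(reference, candidates, p=2):
    """Rank candidate series by increasing distance to the reference series."""
    ref = to_stiefel(delay_matrix(reference, p))
    dists = {name: stiefel_distance(ref, to_stiefel(delay_matrix(s, p)))
             for name, s in candidates.items()}
    return sorted(dists.items(), key=lambda item: item[1])


def flag_mean(points, r):
    """Flag mean of orthonormal bases: first r left singular vectors of their concatenation."""
    u, _, _ = np.linalg.svd(np.hstack(points), full_matrices=False)
    return u[:, :r]


# Synthetic weekly counts over three years (placeholders, not real data).
weeks = np.arange(156)
season = 2 * np.pi / 52
virus = 10 + 8 * np.cos(season * weeks)
candidates = {
    "code_a": 30 + 20 * np.cos(season * (weeks - 1)),  # seasonal, one week late
    "code_b": 25 + 0.05 * weeks,                       # slow trend, no season
    "code_c": 20 + 5 * np.cos(4 * season * weeks),     # 13-week cycle
}
ranking = nearest_codes(virus, candidates)
print(ranking[0][0])

# Summarize the reference and its nearest candidate by their flag mean.
nearest = to_stiefel(delay_matrix(candidates[ranking[0][0]], 2))
summary = flag_mean([to_stiefel(delay_matrix(virus, 2)), nearest], r=2)
```

Because the polar factor is unchanged when a matrix is multiplied by a positive constant, this construction compares the shape of the seasonal pattern rather than the raw counts. In this toy setup, the candidate sharing the 52-week cycle ranks first.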
Clustering is an unsupervised machine learning method that gives insight into data without prior knowledge. Classes are returned by grouping similar elements together. Given the growing amount of available data, this method is now applied in many fields and to various data types. Here, we propose to explore the case of time series clustering. Time series are one of the most classic data types and are present in various fields, such as medicine and finance. This kind of data can be pre-processed with dimension reduction methods, such as the recent UMAP algorithm. In this paper, a benchmark of time series clustering is created in which UMAP is used as a pre-processing step to enhance clustering results. For completeness, three different clustering algorithms and two different geometric representations of the time series (classic Euclidean geometry and Riemannian geometry on the Stiefel manifold) are applied. The results are compared with and without UMAP as a pre-processing step on the databases available at the UCR Time Series Classification Archive, www.cs.ucr.edu/~eamonn/time_series_data/.
Most existing methods for time series clustering rely on distances computed from the entire raw data, using the Euclidean distance or the Dynamic Time Warping distance. In this work, we propose to embed the time series into higher-dimensional spaces to obtain geometric representations of the time series themselves. In particular, the embeddings on R^{n×p}, on the Stiefel manifold, and on the unit sphere are analyzed for their performance with several well-known clustering algorithms. The gain brought by the geometric representation for time series clustering is illustrated through a large benchmark of databases. In particular, we show that, firstly, embedding the time series in higher-dimensional spaces gives better results than classical approaches and, secondly, that the embedding on the Stiefel manifold, in conjunction with UMAP and the HDBSCAN clustering algorithm, is the recommended framework for time series clustering.
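To make the three representations concrete, here is a minimal sketch of how a series could be embedded in R^{n×p}, on the unit sphere, and on the Stiefel manifold, and how pairwise distances could then be computed. The abstract does not give the construction, so the delay-matrix embedding, the normalizations, the Frobenius distance, and all function names and parameters below are assumptions for illustration only.

```python
import numpy as np


def delay_matrix(series, p):
    """Standardize a 1-D series and stack p lagged copies into an n x p matrix."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()  # assumes the series is not constant
    n = x.size - p + 1
    return np.column_stack([x[j:j + n] for j in range(p)])


def embed(series, p, kind):
    """Return one of three assumed geometric representations of a series."""
    m = delay_matrix(series, p)
    if kind == "euclidean":   # plain matrix in R^{n x p}
        return m
    if kind == "sphere":      # unit Frobenius norm
        return m / np.linalg.norm(m)
    if kind == "stiefel":     # orthonormal columns via the polar factor
        u, _, vt = np.linalg.svd(m, full_matrices=False)
        return u @ vt
    raise ValueError(f"unknown embedding: {kind}")


def pairwise_distances(dataset, p, kind):
    """Symmetric matrix of Frobenius distances between embedded series."""
    points = [embed(s, p, kind) for s in dataset]
    k = len(points)
    d = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            d[i, j] = d[j, i] = np.linalg.norm(points[i] - points[j])
    return d


# Toy dataset: two slow and two fast oscillations (synthetic placeholders).
t = np.arange(120)
dataset = [
    np.sin(2 * np.pi * t / 40),
    3 * np.sin(2 * np.pi * (t - 1) / 40) + 5,
    np.sin(2 * np.pi * t / 8),
    2 * np.sin(2 * np.pi * (t - 1) / 8) - 1,
]
distance_matrices = {kind: pairwise_distances(dataset, 2, kind)
                     for kind in ("euclidean", "sphere", "stiefel")}
```

Each resulting matrix can be handed to any clustering routine that accepts precomputed distances, which makes it straightforward to swap the representation while keeping the rest of a benchmark unchanged.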