We consider cell line classification using multivariate time series data obtained from electric cell-substrate impedance sensing (ECIS) technology. The ECIS device, which monitors the attachment and spreading of mammalian cells in real time by collecting electrical impedance data, has historically been used to study one cell line at a time. We show, however, that when applied to data from multiple cell lines, ECIS can be used to classify unknown or potentially mislabeled cells, which may help to mitigate the current reproducibility crisis in the biological literature. We assess a range of approaches to this new problem, testing different classification methods and deriving a dictionary of 29 features to characterize ECIS data. Our analysis also makes use of simultaneous multi-frequency ECIS data, whereas previous studies have focused on a single frequency. In classification tests on fifteen mammalian cell lines, we obtain very high out-of-sample accuracy. These preliminary findings provide a baseline for future large-scale studies in this field.
We explore the behavior of wind speed over time, using the Eastern Wind Dataset published by the National Renewable Energy Laboratory. This dataset gives wind speeds over three years at hundreds of potential wind farm sites. Wind speed analysis is necessary for integrating wind energy into the power grid: short-term variability in wind speed affects decisions about the use of other power sources, so the shape of the wind speed curve becomes as important as its overall level. To assess differences in intra-day time series, we propose a functional distance measure, the band distance, which extends the band depth of Lopez-Pintado and Romo (2009). This measure emphasizes the shape of time series or functional observations relative to other members of a dataset, and allows observations to be clustered without relying on pointwise Euclidean distance. To emphasize short-term variability, we examine the short-time Fourier transform of the nonstationary wind speed time series; we can also adjust for seasonal effects, and we use these standardized series as input to the band distance. We show that these approaches go beyond standard mean-dependent clustering methods, such as k-means, and yield cluster representatives that better reflect curve shape, which is useful for power grid decisions.
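The band distance itself is not defined in this abstract, so the sketch below does not implement it. It shows only the band depth it builds on, as defined by Lopez-Pintado and Romo (2009) with bands formed from pairs of curves (J = 2), together with their modified band depth variant. It assumes all curves are sampled on a common time grid and measures each curve's depth against every pair drawn from the whole sample; the function name `band_depths` is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def band_depths(curves: Sequence[Sequence[float]], modified: bool = False) -> List[float]:
    """Band depth (bands from pairs of curves, J = 2) of each curve in a sample.

    Assumes every curve is a list of values on the same time grid.
    A pair of sample curves (i, k) defines a band: the pointwise min and
    max of curves i and k at each time point.

    - Plain band depth: fraction of pairs whose band contains the curve
      at every time point.
    - Modified band depth: average over pairs of the fraction of time
      points at which the curve lies inside the band.
    """
    n = len(curves)
    if n < 2:
        raise ValueError("need at least two curves to form a band")
    T = len(curves[0])
    if any(len(c) != T for c in curves):
        raise ValueError("all curves must share the same time grid")

    pairs = list(combinations(range(n), 2))
    depths = []
    for x in curves:
        total = 0.0
        for i, k in pairs:
            # Count the time points where x lies within the band of curves i and k.
            inside = sum(
                1
                for t in range(T)
                if min(curves[i][t], curves[k][t]) <= x[t] <= max(curves[i][t], curves[k][t])
            )
            if modified:
                total += inside / T  # partial credit for partial containment
            else:
                total += 1.0 if inside == T else 0.0  # full containment only
        depths.append(total / len(pairs))
    return depths


if __name__ == "__main__":
    # Three flat curves: the middle one lies inside every band, so it is deepest.
    print(band_depths([[0, 0, 0], [1, 1, 1], [2, 2, 2]]))
```

Curves with high depth sit near the center of the sample in shape as well as level, which is the property the abstract's shape-emphasizing distance relies on. The modified version is less brittle when curves cross often, because a curve that leaves a band at only a few time points still earns partial credit.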
An analysis of the characteristics and behavior of individual bus stops can reveal clusters of similar stops, which can inform routing and scheduling decisions as well as decisions about which facilities to provide at each stop. This paper provides an exploratory analysis, including several possible clustering results, of a dataset provided by the Regional Transit Service of Rochester, NY. The dataset describes ridership on public buses, recording the time, location, and number of entering and exiting passengers each time a bus stops. A description of the overall behavior of bus ridership is followed by a stop-level analysis. We compare multiple measures of stop similarity, based on location, route information, and ridership volume over time.