Mosquito-borne dengue fever is a major public health problem in tropical countries, where its incidence is strongly conditioned by climate factors such as temperature. In this paper, we formulate a holistic machine learning strategy that analyzes the temporal dynamics of temperature and dengue data and uses this knowledge to produce accurate annual-scale predictions of dengue from temperature. The temporal dynamics are extracted from historical data through a novel multi-stage combination of auto-encoding, window-based data representation and trend-based temporal clustering. The prediction is performed with a trend association-based nearest neighbour predictor. The effectiveness of the proposed strategy is evaluated in a case study that comprises the number of dengue and dengue hemorrhagic fever cases collected over the period 1985-2010 in 32 federal states of Mexico. The empirical study demonstrates the viability of the proposed strategy and shows that it outperforms various state-of-the-art competitor methods from both regression and time series forecasting.
INDEX TERMS Clustering, machine learning, time-series analysis, predictive analysis.
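To make the window-based representation and the trend-association idea more concrete, here is a minimal Python sketch. It is not the authors' implementation: it skips the auto-encoding and temporal clustering stages, assumes aligned annual temperature and dengue series, and uses a hypothetical trend signature (the signs of successive differences) together with hypothetical function names (`sliding_windows`, `trend_pattern`, `trend_knn_forecast`).

```python
import numpy as np

def sliding_windows(series, width):
    """Overlapping windows of length `width` with stride 1, one window per row."""
    series = np.asarray(series, dtype=float)
    return np.stack([series[i:i + width] for i in range(len(series) - width + 1)])

def trend_pattern(windows):
    """Hypothetical trend signature: sign of successive differences
    (+1 rising, -1 falling, 0 flat) along the last axis."""
    return np.sign(np.diff(windows, axis=-1))

def trend_knn_forecast(temp, dengue, width=3, k=2):
    """Predict the dengue value at the next time step from the latest
    temperature window.

    Historical temperature windows are ranked by how many trend steps agree
    with the latest window (ties broken by Euclidean distance); the forecast
    is the mean dengue value that followed the k best-ranked windows.
    """
    temp = np.asarray(temp, dtype=float)
    dengue = np.asarray(dengue, dtype=float)
    if len(temp) != len(dengue):
        raise ValueError("temperature and dengue series must be aligned")
    history = sliding_windows(temp[:-1], width)  # windows whose successor is known
    targets = dengue[width:]                     # dengue value right after each window
    query = temp[-width:]                        # most recent temperature window
    agreement = (trend_pattern(history) == trend_pattern(query)).sum(axis=1)
    distance = np.linalg.norm(history - query, axis=1)
    # np.lexsort uses the last key as primary: most agreement first, then nearest.
    order = np.lexsort((distance, -agreement))
    return float(targets[order[:k]].mean())
```

On the toy series `temp = [1, 2, 3, 2, 1, 2, 3, 2, 1, 2, 3]` with `dengue = [10 * t for t in temp]`, the call `trend_knn_forecast(temp, dengue, width=3, k=2)` returns `20.0`: the two historical windows with the same rising-rising pattern as the latest window `[1, 2, 3]` were each followed by a dengue value of 20.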
Summary
We present a novel statistical analysis of legislative rhetoric in the US Senate that sheds light on hidden patterns in the behaviour of Senators as a function of their time in office. Using natural language processing, we create a novel comprehensive data set based on the speeches of all Senators who served on the US Senate Committee on Energy and Natural Resources in 2001–2011. We develop a new measure of congressional speech, based on Senators’ attitudes towards the dominant energy interests. To evaluate the intrinsically dynamic formation of groups among Senators, we adopt a model‐free unsupervised space–time data mining algorithm that was originally proposed for tracking dynamic clusters in environmental georeferenced data streams. Our approach, based on a two‐stage hybrid supervised–unsupervised learning methodology, is innovative and data driven and transcends conventional disciplinary borders. We discover that legislators become much more alike after the first few years of their term, regardless of their partisanship and campaign promises.
The Chesapeake Bay Program, initiated in 1983, is a regional partnership between several state governments, federal agencies, and advisory groups that is involved in the cleanup and restoration of the Bay. To study the ecological trends in the area, we propose a new data‐driven procedure for optimal selection of tuning parameters in dynamic clustering algorithms, using the notion of a stability probe. We refer to the new procedure as Downhill Riding (DR) because of the dynamics of the clustering stability probe. We study the finite sample performance of DR when clustering the benchmark Iris data and synthetic time series, and illustrate the method using data on water quality in the Chesapeake Bay.
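The general notion of a clustering stability probe can be illustrated with a short Python sketch. It is a generic illustration, not DR itself: it assumes plain k-means (with farthest-first initialisation) as a stand-in for the dynamic clustering algorithm, the number of clusters k as the tuning parameter, and the Rand index between clusterings of two random subsamples (compared on the points they share) as the stability measure. All function names are hypothetical. The sketch stops at the stability curve, because the rule by which DR reads the "downhill" dynamics of the probe is not described here.

```python
import numpy as np

def farthest_first_init(X, k, rng):
    """Pick a random first center, then repeatedly add the point farthest
    from all centers chosen so far."""
    idx = [int(rng.integers(len(X)))]
    for _ in range(k - 1):
        d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    return X[idx]

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    centers = farthest_first_init(X, k, rng)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree
    (both 'same cluster' or both 'different clusters')."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)
    return float((same_a == same_b)[iu].mean())

def stability_probe(X, ks, n_pairs=20, frac=0.8, seed=0):
    """For each candidate k, average the Rand index between clusterings of
    two random subsamples, measured on the points the subsamples share."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, m = len(X), int(frac * len(X))
    curve = {}
    for k in ks:
        scores = []
        for _ in range(n_pairs):
            i1 = rng.choice(n, size=m, replace=False)
            i2 = rng.choice(n, size=m, replace=False)
            lab1 = dict(zip(i1.tolist(), kmeans(X[i1], k, rng).tolist()))
            lab2 = dict(zip(i2.tolist(), kmeans(X[i2], k, rng).tolist()))
            shared = np.intersect1d(i1, i2).tolist()
            a = np.array([lab1[i] for i in shared])
            b = np.array([lab2[i] for i in shared])
            scores.append(rand_index(a, b))
        curve[k] = float(np.mean(scores))
    return curve
```

Calling `stability_probe(X, ks=[2, 3, 4])` returns a dictionary mapping each candidate k to its average stability. Candidates should start at k = 2, since a single cluster is trivially perfectly stable. On three well-separated Gaussian blobs the value at k = 3 is essentially 1, while other values of k tend to split or merge blobs inconsistently across subsamples.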