A large number of extreme floods were closely related to heavy precipitation which lasted for several days or weeks. Long-lead prediction of extreme precipitation, i.e., prediction of 6-15 days ahead of time, is important for understanding the prognostic forecasting potential of many natural disasters, such as floods. Yet, long-lead flood forecasting is a challenging task due to the cascaded uncertainty with prediction errors from measurements to modeling, which makes the current physics-based numerical simulation models extremely complex and inaccurate. In this paper, we formulate the modeling work as a machine learning problem and introduce a complementary data mining framework for heavy precipitation prediction. Heavy precipitation that may lead to extreme floods is a rare event. Long-lead prediction requires the corresponding feature space to be sampled from extremely high spatio-temporal dimensions. Such a complexity makes long-lead heavy precipitation prediction a high dimensional and imbalanced machine learning problem. In this work, we firstly define the extreme precipitation and non-extreme precipitation clusters and then design the Nearest-Sample Choosing method to handle the imbalanced data sets. We introduce streaming feature selection and subspace learning to extract the most relevant features from high dimensional data. We evaluate the machine learning tools using historical flood data collected in the State of Iowa, the United States and associated hydrometeorological variables from 1948 to 2010.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.