Valid data are required to make climate assessments and to make climate-related decisions. The objective of this paper is threefold: to introduce an explicit treatment of Type I and Type II errors in evaluating the performance of quality assurance procedures, to illustrate a quality control approach that allows tailoring to regions and subregions, and to introduce a new spatial regression test. Threshold testing, step change, persistence, and spatial regression were included in a test of three decades of temperature and precipitation data at six weather stations representing different climate regimes. The magnitude of thresholds was addressed in terms of the climatic variability, and multiple thresholds were tested to determine the number of Type I errors generated. In a separate test, random errors were seeded into the data and the performance of the tests was such that most Type II errors were made in the range of ±1°C for temperature, not too different from the sensor field accuracy. The study underscores the fact that precipitation is more difficult to quality control than temperature. The new spatial regression test presented in this document outperformed all the other tests, which together identified only a few errors beyond those identified by the spatial regression test.
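The way Type I and Type II errors trade off across thresholds can be illustrated with a minimal sketch. It is not the paper's procedure: the threshold form (mean plus or minus `f` standard deviations), the synthetic temperature series, the seeded error range, and the names `threshold_flags` and `count_errors` are all assumptions made for illustration.

```python
import random
import statistics

def threshold_flags(values, f):
    """Flag values farther than f standard deviations from the series mean.
    (Assumed threshold form; the abstract only says thresholds scale with
    climatic variability.)"""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [abs(v - mean) > f * sd for v in values]

def count_errors(flags, seeded):
    """Type I: a valid value is flagged. Type II: a seeded error is missed."""
    type1 = sum(1 for i, flagged in enumerate(flags) if flagged and i not in seeded)
    type2 = sum(1 for i in seeded if not flags[i])
    return type1, type2

rng = random.Random(0)
# Synthetic "clean" daily temperatures in degrees C (hypothetical data).
clean = [rng.gauss(15.0, 5.0) for _ in range(1000)]
# Seed random errors at 50 positions; index -> error magnitude.
seeded = {i: rng.uniform(-10.0, 10.0) for i in rng.sample(range(len(clean)), 50)}
data = [v + seeded.get(i, 0.0) for i, v in enumerate(clean)]

# Try several thresholds and record (Type I, Type II) counts for each.
results = {f: count_errors(threshold_flags(data, f), seeded) for f in (2.0, 3.0, 4.0)}
for f, (type1, type2) in results.items():
    print(f"f={f}: Type I={type1}, Type II={type2}")
```

Because a wider threshold flags a subset of what a narrower one flags, Type I counts can only fall and Type II counts can only rise as `f` grows. Small seeded errors remain inside any reasonable threshold, which is consistent with the abstract's observation that most missed errors are small.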
The multiple-instance learning (MIL) model has been very successful in application areas such as drug discovery and content-based image retrieval. Recently, a generalization of this model and an algorithm for this generalization were introduced, showing significant advantages over the conventional MIL model in certain application areas. Unfortunately, this algorithm is inherently inefficient, preventing scaling to high dimensions. We reformulate this algorithm using a kernel for a support vector machine, reducing its time complexity from exponential to polynomial. Computing the kernel is equivalent to counting the number of axis-parallel boxes in a discrete, bounded space that contain at least one point from each of two multisets P and Q. We show that this problem is #P-complete, but then give a fully polynomial randomized approximation scheme (FPRAS) for it. Finally, we empirically evaluate our kernel.
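The counting problem behind the kernel can be made concrete with a brute-force sketch. This is not the paper's FPRAS or kernel implementation; it is an exact enumeration that assumes the discrete space is the grid {0, ..., size-1}^dim, and the function name `count_boxes` is hypothetical.

```python
from itertools import product

def count_boxes(P, Q, size, dim):
    """Count axis-parallel boxes in {0..size-1}^dim that contain at least
    one point of P and at least one point of Q (brute force)."""
    # Every closed integer interval [lo, hi] along one axis.
    intervals = [(lo, hi) for lo in range(size) for hi in range(lo, size)]

    def inside(point, box):
        return all(lo <= x <= hi for x, (lo, hi) in zip(point, box))

    count = 0
    # A box is one interval per axis, so there are len(intervals)**dim boxes.
    for box in product(intervals, repeat=dim):
        if any(inside(p, box) for p in P) and any(inside(q, box) for q in Q):
            count += 1
    return count

# Two points at opposite corners of a 3x3 grid: only the full grid holds both.
print(count_boxes([(0, 0)], [(2, 2)], size=3, dim=2))
```

The number of candidate boxes grows as (size·(size+1)/2)^dim, so exact enumeration is exponential in the dimension, which is why an approximation scheme is attractive for high-dimensional data.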