To design an efficient survey or monitoring program for a natural resource, it is important to consider the spatial distribution of the resource. Generally, spatially balanced sample designs are more efficient than designs that are not. A spatially balanced design selects a sample that is evenly distributed over the extent of the resource. In this article we present a new spatially balanced design that can be used to select a sample from discrete and continuous populations in multi-dimensional space. The design, which we call balanced acceptance sampling (BAS), uses the Halton sequence to ensure the spatial diversity of selected locations. Targeted inclusion probabilities are achieved by acceptance sampling. The BAS design is conceptually simpler than competing spatially balanced designs, executes faster, and achieves better spatial balance as measured by a number of quantities. The algorithm has been programmed in an R package that is freely available for download.
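The two ingredients named above, Halton points for spatial spread and acceptance sampling for unequal inclusion probabilities, can be sketched in Python for a population on the unit square. This is a simplified illustration, not the authors' R implementation: the random start index, the use of bases 2 and 3 for the coordinates, and a third Halton coordinate (base 5) as the acceptance draw are assumptions made here, and `incl_prob` is a hypothetical user-supplied function returning the inclusion density rescaled so that its maximum is 1.

```python
import random


def radical_inverse(i: int, base: int) -> float:
    """Van der Corput radical inverse: mirror the base-b digits of i about the radix point."""
    result, f = 0.0, 1.0 / base
    while i > 0:
        result += f * (i % base)
        i //= base
        f /= base
    return result


def bas_sample(n, incl_prob, seed=None):
    """Simplified BAS-style sample of n points from the unit square.

    incl_prob(x, y) is a hypothetical user-supplied inclusion density,
    rescaled so that its maximum is 1. It must be positive somewhere,
    otherwise the loop below never terminates.
    """
    rng = random.Random(seed)
    i = rng.randrange(10_000)  # assumed random start along the Halton sequence
    sample = []
    while len(sample) < n:
        x = radical_inverse(i, 2)  # Halton coordinate 1 (base 2)
        y = radical_inverse(i, 3)  # Halton coordinate 2 (base 3)
        u = radical_inverse(i, 5)  # base-5 coordinate used as a quasi-random acceptance draw
        if u < incl_prob(x, y):    # keep the point in proportion to its inclusion density
            sample.append((x, y))
        i += 1
    return sample


# Usage: the left half of the square (x < 0.5) is twice as likely to be sampled.
pts = bas_sample(20, lambda x, y: 1.0 if x < 0.5 else 0.5, seed=42)
print(pts[:3])
```

Because consecutive Halton points fill the square evenly, the accepted points inherit that spread, while the acceptance step thins them according to the target density.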
Decision trees are a popular technique for statistical data classification. They recursively partition the feature space into disjoint sub-regions until each sub-region becomes homogeneous with respect to a particular class. The basic Classification and Regression Tree (CART) algorithm partitions the feature space using axis-parallel splits. When the true decision boundaries are not aligned with the feature axes, this approach can produce a complicated boundary structure. Oblique decision trees use oblique decision boundaries to potentially simplify the boundary structure. The major limitation of this approach is that the tree induction algorithm is computationally expensive. In this article we present a new decision tree algorithm, called HHCART. The method uses a series of Householder matrices to reflect the training data at each node during tree construction. Each reflection is based on the direction of an eigenvector of each class's covariance matrix. Considering axis-parallel splits in the reflected training data provides an efficient way of finding oblique splits in the unreflected training data. Experimental results show that the accuracy and size of HHCART trees are comparable with those of some benchmark methods in the literature. An appealing feature of HHCART is that it can handle both qualitative and quantitative features in the same oblique split.
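The reflection step can be sketched with NumPy. For a unit eigenvector d, one standard choice of Householder matrix is H = I - 2 v v^T / (v^T v) with v = d - e1, which satisfies H d = e1. Because H is symmetric and orthogonal, an axis-parallel split (H x)_j <= t on the reflected data is the oblique split h_j^T x <= t on the original data, where h_j is column j of H. The sketch below is a simplified single-node split search under assumptions of its own: Gini impurity as the split criterion, an exhaustive threshold search, one reflection per eigenvector of each class covariance plus the unreflected data, and numeric features only (it does not attempt the mixed qualitative/quantitative handling described above). The function names are hypothetical.

```python
import numpy as np


def householder(d):
    """Householder matrix H (symmetric, orthogonal) with H @ d = e1 for a unit vector d."""
    p = d.shape[0]
    e1 = np.zeros(p)
    e1[0] = 1.0
    v = d - e1
    vv = v @ v
    if vv < 1e-12:  # d is already e1, so no reflection is needed
        return np.eye(p)
    return np.eye(p) - 2.0 * np.outer(v, v) / vv


def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def best_axis_split(Z, y):
    """Exhaustive axis-parallel search; returns (weighted Gini, feature j, threshold t)."""
    best = (np.inf, None, None)
    n = len(y)
    for j in range(Z.shape[1]):
        for t in np.unique(Z[:, j])[:-1]:  # drop the max so both sides are non-empty
            left = Z[:, j] <= t
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / n
            if score < best[0]:
                best = (score, j, t)
    return best


def oblique_split(X, y):
    """Best split w @ x <= t over the unreflected data and one reflection per
    eigenvector of each class covariance (numeric features, p >= 2)."""
    mats = [np.eye(X.shape[1])]
    for c in np.unique(y):
        _, vecs = np.linalg.eigh(np.cov(X[y == c], rowvar=False))
        mats += [householder(vecs[:, k]) for k in range(vecs.shape[1])]
    best = (np.inf, None, None)
    for H in mats:
        score, j, t = best_axis_split(X @ H, y)  # row i of X @ H is (H @ x_i)^T since H = H^T
        if j is not None and score < best[0]:
            best = (score, H[:, j], t)  # axis split on column j == oblique split H[:, j] @ x <= t
    return best


# Usage: two classes separated by the diagonal line x0 + x1 = 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
score, w, t = oblique_split(X, y)
print(score, w, t)
```

Each candidate reflection costs only one ordinary axis-parallel search, which is the efficiency argument the abstract makes for this approach.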
Some environmental studies use non-probabilistic sampling designs to draw samples from spatially distributed populations. Unfortunately, these samples can be difficult to analyse statistically and can give biased estimates of population characteristics. Spatially balanced sampling designs are probabilistic designs that spread the sampling effort evenly over the resource. These designs are particularly useful for environmental sampling because they produce good sample coverage over the resource, they have precise design-based estimators, and they can potentially reduce the sampling cost. The most popular spatially balanced design is the Generalized Random Tessellation Stratified (GRTS) design, which has many desirable features, including a spatially balanced sample, design-based estimators and the ability to select spatially balanced oversamples. This article considers the popularity of spatially balanced sampling, reviews several spatially balanced sampling designs and shows how these designs can be implemented in the statistical programming language R. We hope to increase the visibility of spatially balanced sampling and encourage environmental scientists to use these designs.
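The core idea behind GRTS, hierarchically randomized quadrant addressing followed by systematic sampling along the resulting one-dimensional ordering, can be sketched in Python. This is a simplified, equal-probability version for points in the unit square, written for illustration only; it is not the R implementation the article reviews, and the function names, the fixed `levels` depth and the equal-probability step are assumptions made here.

```python
import random


def grts_order(points, levels=8, seed=None):
    """Order points in [0, 1)^2 by a hierarchically randomized quadtree address."""
    rng = random.Random(seed)
    perms = {}  # randomized parent-cell address -> random relabelling of its 4 children

    def address(x, y):
        addr = ()
        for level in range(1, levels + 1):
            scale = 2 ** level
            quad = 2 * (int(y * scale) % 2) + (int(x * scale) % 2)  # child quadrant 0..3
            if addr not in perms:
                perms[addr] = rng.sample(range(4), 4)  # independent permutation per cell
            addr += (perms[addr][quad],)
        return addr

    return sorted(range(len(points)), key=lambda i: address(*points[i]))


def grts_sample(points, n, levels=8, seed=None):
    """Equal-probability systematic sample of n indices along the randomized ordering."""
    rng = random.Random(seed)
    order = grts_order(points, levels, seed=rng.randrange(2 ** 32))
    step = len(points) / n
    start = rng.random() * step  # single random start, as in systematic sampling
    return [order[min(int(start + k * step), len(points) - 1)] for k in range(n)]


# Usage: a 10 x 10 grid of cell centres, sample of 10.
grid = [((i + 0.5) / 10, (j + 0.5) / 10) for i in range(10) for j in range(10)]
print(sorted(grts_sample(grid, 10, seed=7)))
```

Points in the same quadtree cell are contiguous in the ordering, so a systematic pass along it cannot concentrate the sample in one cell; this is the mechanism that yields spatial balance in the sketch.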