The aim of this paper is to track articulated hand motion from monocular video. Bayesian filtering is implemented using a tree-based representation of the posterior distribution, in which each tree node corresponds to a partition of the state space with piecewise constant density. In a hierarchical search, regions with low probability mass can be rapidly discarded, while the modes of the posterior can be approximated to high precision. Large sets of training data are captured using a data glove, and two techniques for constructing the tree are described. One method clusters the collected data points with a hierarchical clustering algorithm and uses the cluster centres as nodes. Alternatively, a lower-dimensional eigenspace can be partitioned by a grid at multiple resolutions, with each partition centre corresponding to a node in the tree. The effectiveness of these techniques is demonstrated by tracking 3D articulated hand motion in front of a cluttered background.
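The core idea of the hierarchical search can be illustrated with a minimal sketch. The following is not the paper's implementation: it assumes a hypothetical one-dimensional, two-mode unnormalised posterior and a simple binary partition tree, and estimates each region's probability mass from a few density samples (piecewise constant approximation). Regions whose estimated mass falls below a threshold are pruned early, while high-mass regions are recursively refined, so that only the partitions around the posterior modes survive at fine resolution.

```python
import numpy as np

def posterior(x):
    # Hypothetical unnormalised 1-D posterior with two modes,
    # standing in for the posterior over hand pose parameters.
    return (np.exp(-0.5 * ((x - 1.0) / 0.1) ** 2)
            + 0.5 * np.exp(-0.5 * ((x + 2.0) / 0.2) ** 2))

def region_mass(lo, hi, n_samples=8):
    # Piecewise-constant estimate of the probability mass in [lo, hi]:
    # max sampled density times region width (a crude upper bound).
    xs = np.linspace(lo, hi, n_samples)
    return posterior(xs).max() * (hi - lo)

def refine(lo, hi, depth, max_depth, threshold, leaves):
    """Recursively partition [lo, hi]; discard partitions whose estimated
    mass is below `threshold`, subdivide the rest up to `max_depth`."""
    mass = region_mass(lo, hi)
    if mass < threshold:
        return  # low-probability region: pruned without further refinement
    if depth == max_depth:
        leaves.append((lo, hi, mass))  # surviving fine-resolution partition
        return
    mid = 0.5 * (lo + hi)
    refine(lo, mid, depth + 1, max_depth, threshold, leaves)
    refine(mid, hi, depth + 1, max_depth, threshold, leaves)

leaves = []
refine(-4.0, 4.0, depth=0, max_depth=6, threshold=1e-3, leaves=leaves)
```

After running, `leaves` contains only narrow intervals concentrated around the two modes (near x = 1 and x = -2), while most of the state space was rejected at coarse resolution, which is the source of the method's efficiency.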