This paper presents the computation of 3-D structure from motion using a monocular sequence of images within the paradigm of active vision. Robotic tasks such as navigation, manipulation, and object recognition all require a 3-D description of the scene. The description each task requires varies in resolution, accuracy, robustness, range, and time. A robotic system intended for a wide range of applications must therefore be able to actively control its imaging parameters so that a 3-D description sufficient for the task at hand is generated. In the approach presented here, the 3-D structure is determined in two steps. In the first step, the spatial and temporal gradients of an image stream are analyzed to characterize the 3-D information in terms of the camera displacements that result in a fixed disparity. In the second step, disparity values extrapolated between the first and last frames of the image stream are refined using normalized cross-correlation. The length of the image stream, the interframe camera displacement, and the disparity value are actively controlled to obtain 3-D structure of the desired quality. The approach has been implemented in a pipeline-based computing environment to provide real-time performance. Extensive experiments have been conducted to verify its performance and capabilities.
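The second step, refining an extrapolated disparity by normalized cross-correlation, can be illustrated with a minimal sketch. It is not the paper's implementation, which the abstract does not detail. It assumes integer disparities along a single scanline, and the names `ncc`, `refine_disparity`, `half_window`, and `search` are hypothetical choices made for illustration.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        # A flat window carries no texture, so the score is undefined; report 0.
        return 0.0
    return sum(p * q for p, q in zip(da, db)) / denom


def refine_disparity(first_row, last_row, x, predicted, half_window=2, search=2):
    """Search near a predicted disparity for the shift that maximizes NCC.

    first_row, last_row: scanlines from the first and last frames (assumed 1-D).
    x: pixel position in the first frame; predicted: extrapolated disparity.
    Returns (best_disparity, best_score).
    """
    template = first_row[x - half_window : x + half_window + 1]
    best_d, best_score = predicted, -2.0  # NCC is never below -1
    for d in range(predicted - search, predicted + search + 1):
        lo = x + d - half_window
        hi = x + d + half_window + 1
        if lo < 0 or hi > len(last_row):
            continue  # candidate window falls outside the scanline
        score = ncc(template, last_row[lo:hi])
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score


# Illustrative data: the last frame is the first shifted right by 3 pixels.
first = [10, 10, 10, 20, 80, 30, 10, 10, 10, 10, 10, 10, 10, 10]
last = [10, 10, 10, 10, 10, 10, 20, 80, 30, 10, 10, 10, 10, 10]
best_d, best_score = refine_disparity(first, last, x=4, predicted=2)
```

Because each window is mean-subtracted and scaled by its own energy, the score is unaffected by a uniform gain or offset in brightness between the two frames, which is a common reason for choosing normalized cross-correlation over a plain sum of squared differences.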