The characterization of natural spaces by the precise observation of their material properties is highly demanded in remote sensing and computer vision. The production of novel sensors enables the collection of heterogeneous data to get a comprehensive knowledge of the living and non-living entities in the ecosystem. The high resolution of consumer-grade RGB cameras is frequently used for the geometric reconstruction of many types of environments. Nevertheless, the understanding of natural spaces is still challenging. The automatic segmentation of homogeneous materials in nature is a complex task because there are many overlapping structures and an indirect illumination, so the object recognition is difficult. In this paper, we propose a method based on fusing spatial and multispectral characteristics for the unsupervised classification of natural materials in a point cloud. A high-resolution camera and a multispectral sensor are mounted on a custom camera rig in order to simultaneously capture RGB and multispectral images. Our method is tested in a controlled scenario, where different natural objects coexist. Initially, the input RGB images are processed to generate a point cloud by applying the structure-from-motion (SfM) algorithm. Then, the multispectral images are mapped on the three-dimensional model to characterize the geometry with the reflectance captured from four narrow bands (green, red, red-edge and near-infrared). The reflectance, the visible colour and the spatial component are combined to extract key differences among all existing materials. For this purpose, a hierarchical cluster analysis is applied to pool the point cloud and identify the feature pattern for every material. As a result, the tree trunk, the leaves, different species of low plants, the ground and rocks can be clearly recognized in the scene. These results demonstrate the feasibility to perform a semantic segmentation by considering multispectral and spatial features with an unknown number of clusters to be detected on the point cloud. Moreover, our solution is compared to other method based on supervised learning in order to test the improvement of the proposed approach.