Jointly performing semantic and instance segmentation of 3D point clouds remains a challenging task. In this work, a novel framework for joint 3D semantic-instance segmentation via multi-scale Semantic Association and Salient point clustering Optimization is proposed to tackle this problem. Inspired by the inherent correlation among objects in semantic space, a Multi-scale Semantic Association (MSA) module is designed to explore the constructive effect of context information on semantic segmentation. For instance segmentation, unlike previous works that utilise clustering only in the inference procedure, a Salient Point Clustering Optimization (SPCO) module is put forward to introduce the clustering algorithm into the training phase, which impels the network to focus on points that are difficult to distinguish. Furthermore, the uneven distribution of categories, which stems from the inherent structure of indoor scenes, has rarely been considered in previous work, yet it significantly limits the performance of 3D scene perception. To address this issue, an adaptive Water Filling Sampling (WFS) algorithm is presented to balance the category distribution of the training data. Extensive experiments on a variety of challenging datasets show that the authors' method outperforms the state-of-the-art methods in both semantic segmentation and instance segmentation. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.