This work proposes a novel approach for joint layout, object pose, and mesh reconstruction from scanned point clouds that leverages recent neural network architectures and addresses the challenge of missing point data. The goal is to generate accurate and complete 3D models from point clouds captured in real-world environments. The method combines two state-of-the-art architectures: PointNet++ for feature extraction and a Transformer for joint feature representation learning. Together, these networks effectively encode the complex geometry of 3D scenes and produce high-quality reconstructions. The core idea is to first segment the point cloud into small fragments with a neural network and then reconstruct each fragment as a polygonal mesh. A central problem the paper tackles is restoring missing points in the scanned data: it adapts a robust method based on the L1-median and local point cloud features to fill in the gaps. This approach adjusts to varied geometric structures and corrects topological connection errors, enabling complete and accurate reconstructions even when part of the original data is missing. The proposed method is compared against several state-of-the-art approaches and has the potential to be a valuable tool in applications such as architecture, engineering, digitization of cultural heritage, and augmented and mixed reality systems.
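To make the point-completion component concrete, the sketch below shows the core computation behind the L1-median: the geometric median of a local point neighborhood, approximated with Weiszfeld's fixed-point iteration. The function name, parameters, and stopping criteria are illustrative assumptions, not the paper's implementation; the full method additionally uses local point cloud features when inserting new points.

```python
import numpy as np

def l1_median(points, n_iter=100, eps=1e-8):
    """Approximate the L1-median (geometric median) of a point set
    with Weiszfeld's fixed-point iteration.

    Illustrative sketch only; the paper's adapted method also
    incorporates local point cloud features.
    """
    x = points.mean(axis=0)  # start from the centroid
    for _ in range(n_iter):
        d = np.linalg.norm(points - x, axis=1)
        d = np.maximum(d, eps)  # guard against division by zero
        w = 1.0 / d             # inverse-distance weights
        x_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < eps:  # converged
            break
        x = x_new
    return x

# The L1-median is robust to outliers and missing data: unlike the
# centroid, it is not dragged far away by a single spurious sample,
# which is why it is a natural anchor for filling in missing points.
cluster = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [10.0, 10.0]])
med = l1_median(cluster)   # stays near the dense cluster, not the outlier
```

In a completion pipeline, such a median would be computed over each local neighborhood of the scan to propose stable representative points in sparsely sampled or occluded regions.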