Urban scene point clouds pose significant challenges for registration due to their large data volume, repetitive scenes, and dynamic objects. In this paper, we propose PCRMLP, a model for urban scene point cloud registration that achieves registration performance comparable to prior learning-based methods. In contrast to previous works, which focus on extracting features and estimating correspondences, PCRMLP estimates the transformation implicitly from concrete instances. We introduce an instance-level urban scene representation that extracts instance descriptors via semantic segmentation and DBSCAN, enabling the model to obtain robust instance features, filter out dynamic objects, and estimate the transformation in a more logical manner. A lightweight network consisting of MLPs then regresses the transformation in an encoder-decoder fashion. We validate our approach on the KITTI dataset. Experimental results demonstrate that PCRMLP obtains a satisfactory coarse transformation from instance descriptors in just 0.0028 s. With a subsequent ICP refinement module, the proposed method achieves higher registration accuracy and computational efficiency than prior learning-based works.
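The instance-level representation step can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: it assumes semantic labels are already available per point, uses a toy pure-numpy DBSCAN, and the class IDs, `eps`, and `min_pts` values are placeholder assumptions. Dynamic classes are dropped and each remaining semantic class is clustered into instances, each summarized by a descriptor (class, centroid, extent).

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Toy O(n^2) DBSCAN; returns per-point cluster ids (-1 = noise)."""
    n = len(points)
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        nbrs = np.where(np.linalg.norm(points - points[i], axis=1) <= eps)[0]
        if len(nbrs) < min_pts:
            continue  # not a core point; stays noise unless reached later
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border or chained point joins cluster
            if visited[j]:
                continue
            visited[j] = True
            nb = np.where(np.linalg.norm(points - points[j], axis=1) <= eps)[0]
            if len(nb) >= min_pts:
                queue.extend(nb)  # expand from core points only
        cluster += 1
    return labels

def instance_descriptors(points, sem_labels, dynamic_classes, eps=1.0, min_pts=5):
    """Filter dynamic classes, cluster each static class into instances,
    and emit one descriptor row per instance: [class, centroid_xyz, extent_xyz]."""
    descs = []
    for c in np.unique(sem_labels):
        if c in dynamic_classes:
            continue  # e.g. vehicles/pedestrians: unreliable for registration
        pts = points[sem_labels == c]
        lab = dbscan(pts, eps, min_pts)
        for k in range(lab.max() + 1):
            inst = pts[lab == k]
            descs.append(np.concatenate(([c], inst.mean(0),
                                         inst.max(0) - inst.min(0))))
    return np.asarray(descs)
```

The resulting compact descriptor set (a handful of rows instead of ~100k raw points) is what makes it feasible for a lightweight MLP to regress a coarse transformation in milliseconds, which ICP then refines.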