Federated learning (FL) has emerged as an elegant privacy-preserving distributed machine learning (ML) paradigm. In particular, vertical FL (VFL) has promising application prospects: collaborating organizations that own data of the same set of users but with disjoint features can jointly train models without leaking their private data to each other. As the volume of training data and the model size increase rapidly, each organization may deploy a cluster of many servers to participate in the federation. As such, the intra-party communication cost (i.e., network transfers within each organization's cluster) can significantly impact the entire VFL job's performance. Despite this, existing FL frameworks use the inefficient gRPC for intra-party communication, leading to high latency and high CPU cost. In this paper, we propose a design that transmits data with RDMA for intra-party communication, with no modifications to applications. To improve network efficiency, we further propose an RDMA usage arbiter that dynamically adjusts the RDMA bandwidth used by a non-straggler party, and a query data size optimizer that automatically finds the optimal query data size that each response carries. Our preliminary results show that RDMA-based intra-party communication is 10x faster than gRPC-based communication, reducing the completion time of a VFL job by 9%. Moreover, the RDMA usage arbiter saves over 90% of bandwidth, and the query data size optimizer improves transmission speed by 18%.

CCS CONCEPTS
• Computing methodologies → Distributed artificial intelligence; • Networks → Data center networks.