Privacy protection has attracted increasing attention, and privacy concerns often prevent flexible data utilization. In most industries, data are distributed across multiple organizations and cannot be freely shared because of privacy concerns. Federated learning (FL), which enables cross-organizational machine learning by exchanging statistical information rather than raw data, is a state-of-the-art approach to this problem. However, for gradient boosting decision trees (GBDT) in FL, balancing communication efficiency and security while maintaining sufficient accuracy remains an open problem. In this paper, we propose an FL scheme for GBDT, efficient FL for GBDT (eFL-Boost), which minimizes accuracy loss, communication costs, and information leakage. The proposed scheme focuses on appropriately allocating local computation (performed individually by each organization) and global computation (performed cooperatively by all organizations) when updating a model. Determining tree structures through global computation is known to incur high communication costs, whereas computing leaf weights globally does not incur such costs, and leaf weights are expected to contribute relatively more to accuracy. Thus, in eFL-Boost, the tree structure is determined locally at one of the organizations, and the leaf weights are calculated globally by aggregating the local gradients of all organizations. Specifically, eFL-Boost requires only three communications per update, and only statistical information with low privacy risk is leaked to other organizations. In a performance evaluation on public data sets using ROC AUC, log loss, and F1-score as metrics, eFL-Boost outperformed existing schemes with low communication costs and was comparable to a scheme that offers no privacy protection.
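To illustrate the split between local and global computation described above, the following is a minimal sketch under several assumptions that the abstract does not state. It assumes log loss, depth-1 trees (stumps), and XGBoost-style second-order leaf weights w_k = -G_k / (H_k + λ), where G_k and H_k are the sums of gradients and Hessians in leaf k. All names (`Stump`, `fit_local_stump`, `local_leaf_sums`, `global_leaf_weights`) are hypothetical. The sketch does not model how the structure-building organization is chosen or how the three communications per update are carried out.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Stump:
    """Hypothetical depth-1 tree: the 'tree structure' fixed by one organization."""
    feature: int
    threshold: float

    def leaf(self, x: Sequence[float]) -> int:
        # Leaf 0 if the feature value is <= threshold, otherwise leaf 1.
        return 0 if x[self.feature] <= self.threshold else 1


def logistic_grad_hess(y: Sequence[int], margin: Sequence[float]) -> Tuple[List[float], List[float]]:
    """First and second derivatives of log loss with respect to the raw margin."""
    p = [1.0 / (1.0 + math.exp(-m)) for m in margin]
    grad = [pi - yi for pi, yi in zip(p, y)]
    hess = [pi * (1.0 - pi) for pi in p]
    return grad, hess


def _score(g: float, h: float, lam: float) -> float:
    return g * g / (h + lam)


def fit_local_stump(X, grad, hess, lam: float = 1.0) -> Stump:
    """LOCAL step: one organization chooses the split using only its own data."""
    best, best_gain = None, -math.inf
    G, H = sum(grad), sum(hess)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            gl = sum(g for x, g in zip(X, grad) if x[f] <= t)
            hl = sum(h for x, h in zip(X, hess) if x[f] <= t)
            gain = _score(gl, hl, lam) + _score(G - gl, H - hl, lam) - _score(G, H, lam)
            if gain > best_gain:  # strict '>' keeps the first best split on ties
                best, best_gain = Stump(f, t), gain
    return best


def local_leaf_sums(tree: Stump, X, grad, hess, n_leaves: int = 2):
    """Each organization's contribution: per-leaf sums of gradients and Hessians."""
    G = [0.0] * n_leaves
    H = [0.0] * n_leaves
    for x, g, h in zip(X, grad, hess):
        k = tree.leaf(x)
        G[k] += g
        H[k] += h
    return G, H


def global_leaf_weights(contributions, lam: float = 1.0) -> List[float]:
    """GLOBAL step: aggregate all organizations' sums, then w_k = -G_k / (H_k + lam)."""
    n = len(contributions[0][0])
    G = [sum(c[0][k] for c in contributions) for k in range(n)]
    H = [sum(c[1][k] for c in contributions) for k in range(n)]
    return [-G[k] / (H[k] + lam) for k in range(n)]


# Toy data for two hypothetical organizations holding disjoint samples.
orgs = {
    "A": ([[0.2], [0.8]], [0, 1]),
    "B": ([[0.1], [0.9]], [0, 1]),
}
# Every sample starts from a zero margin.
gh = {name: logistic_grad_hess(y, [0.0] * len(y)) for name, (X, y) in orgs.items()}

# Local: organization A alone fixes the tree structure.
stump = fit_local_stump(orgs["A"][0], *gh["A"])
# Global: every organization reports only per-leaf sums for that structure.
contribs = [local_leaf_sums(stump, X, *gh[name]) for name, (X, _) in orgs.items()]
weights = global_leaf_weights(contribs)
print(stump, weights)
```

In this sketch, the raw samples and per-sample gradients stay inside each organization. Only the chosen split and the per-leaf sums G_k and H_k cross organizational boundaries. As a result, the leaf weights reflect every organization's gradients even though the structure was built from one organization's data.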