Federated Learning (FL) is a distributed learning paradigm that can learn a global or personalized model from decentralized datasets on edge devices. However, in the computer vision domain, model performance in FL is far behind centralized training due to the lack of exploration in diverse tasks with a unified FL framework. FL has rarely been demonstrated effectively in advanced computer vision tasks such as object detection and image segmentation. To bridge the gap and facilitate the development of FL for computer vision tasks, in this work, we propose a federated learning library and benchmarking framework, named FedCV, to evaluate FL on the three most representative computer vision tasks: image classification, image segmentation, and object detection. We provide non-I.I.D. benchmarking datasets, models, and various reference FL algorithms. Our benchmark study suggests that there are multiple challenges that deserve future exploration: centralized training tricks may not be directly applied to FL; the non-I.I.D. dataset actually downgrades the model accuracy to some degree in different tasks; improving the system efficiency of federated training is challenging given the huge number of parameters and the per-client memory cost. We believe that such a library and benchmark, along with comparable evaluation settings, is necessary to make meaningful progress in FL on computer vision tasks. FedCV is publicly available: https://github.com/FedML-AI/FedCV .
Three-dimensional (3D) triangulation based on active binocular vision has increasing amounts of applications in computer vision and robotics. An active binocular vision system with non-fixed cameras needs to calibrate the stereo extrinsic parameters online to perform 3D triangulation. However, the accuracy of stereo extrinsic parameters and disparity have a significant impact on 3D triangulation precision. We propose a novel eye gaze based 3D triangulation method that does not use stereo extrinsic parameters directly in order to reduce the impact. Instead, we drive both cameras to gaze at a 3D spatial point P at the optical center through visual servoing. Subsequently, we can obtain the 3D coordinates of P through the intersection of the two optical axes of both cameras. We have performed experiments to compare with previous disparity based work, named the integrated two-pose calibration (ITPC) method, using our robotic bionic eyes. The experiments show that our method achieves comparable results with ITPC.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.