Privacy concerns are considered one of the main challenges in smart cities, as sharing sensitive data can cause serious problems in people's lives. Federated learning has emerged as an effective technique to avoid privacy infringement while increasing the utilization of the data. However, labeled data are scarce and unlabeled data are abundant in smart cities; hence there is a need for semi-supervised learning. In this paper, we present the primary design aspects for enabling federated learning in edge networks, taking into account the problem of unlabeled data. We propose a semi-supervised federated edge learning method called FedSem that exploits unlabeled data in real time. The FedSem algorithm is divided into two phases. The first phase trains a global model using only the labeled data. In the second phase, FedSem injects unlabeled data into the learning process, using the pseudo-labeling technique and the model developed in the first phase, to improve the learning performance. We carried out several experiments using a traffic signs dataset as a case study. Our results show that FedSem can improve accuracy by up to 8% by utilizing the unlabeled data in the learning process.
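The two phases described above can be illustrated with a minimal sketch. It is not the authors' implementation: a nearest-centroid classifier stands in for the actual model, and the function names (`local_stats`, `aggregate`, `predict`, `fedsem_sketch`) and the client dictionary keys (`x_lab`, `y_lab`, `x_unl`) are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def local_stats(points, labels):
    """Per-class coordinate sums and counts computed on one client."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return sums, counts


def aggregate(client_stats):
    """Server step: merge client statistics into global class centroids."""
    sums, counts = {}, defaultdict(int)
    for c_sums, c_counts in client_stats:
        for y, s in c_sums.items():
            acc = sums.setdefault(y, [0.0] * len(s))
            for i, v in enumerate(s):
                acc[i] += v
            counts[y] += c_counts[y]
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def predict(model, x):
    """Label of the nearest centroid (squared Euclidean distance)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, model[y])))


def fedsem_sketch(clients):
    """Return the phase-1 and phase-2 global models (assumed toy setup)."""
    # Phase 1: build the global model from labeled data only.
    phase1 = aggregate([local_stats(c["x_lab"], c["y_lab"]) for c in clients])
    # Phase 2: each client pseudo-labels its unlabeled data with the
    # phase-1 model, then trains on labeled and pseudo-labeled data together.
    stats = []
    for c in clients:
        pseudo = [predict(phase1, x) for x in c["x_unl"]]
        stats.append(local_stats(c["x_lab"] + c["x_unl"], c["y_lab"] + pseudo))
    return phase1, aggregate(stats)
```

Raw data never leave the clients in this sketch; only per-class sums and counts are sent to the server, which mirrors the privacy motivation of federated learning. A practical variant would likely keep only pseudo-labels above some confidence threshold, which is omitted here for brevity.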
Clustered federated multitask learning has been introduced as an efficient technique when data are unbalanced and distributed among clients in a non-independent and identically distributed (non-i.i.d.) manner. While a similarity metric can provide client groups with specialized models according to their data distribution, this process can be time-consuming because the server must first capture the data distributions of all clients to perform the correct clustering. Due to resource and time constraints at the network edge, only a fraction of devices is selected in every round, which calls for an efficient scheduling technique. Thus, this paper introduces a two-phased client selection and scheduling approach to improve the convergence speed while capturing all data distributions. This approach ensures correct clustering and fairness between clients by leveraging bandwidth reuse for participants that spent a longer time training their models and by exploiting the heterogeneity of the devices to schedule the participants according to their delay. The server then performs the clustering based on predetermined thresholds and stopping criteria. When a given cluster approaches a stopping point, the server employs greedy selection for that cluster by picking the devices with lower delay and better resources. A convergence analysis is provided, showing the relationship between the proposed scheduling approach and the convergence rate of the specialized models, and yielding convergence bounds under non-i.i.d. data distribution. We carry out extensive simulations, and the results demonstrate that the proposed algorithms reduce training time and improve the convergence speed by up to 50% while equipping every user with a customized model tailored to its data distribution.
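The selection logic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's algorithm: the `Device` fields, the `seen` set, and the rule of covering not-yet-seen devices before switching to greedy selection are all hypothetical simplifications, and bandwidth reuse and clustering are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    delay: float      # hypothetical: estimated round delay in seconds
    resources: float  # hypothetical: abstract resource score, higher is better


def greedy_select(devices, k):
    """Pick the k devices with the lowest delay, breaking ties by more resources."""
    return sorted(devices, key=lambda d: (d.delay, -d.resources))[:k]


def select_round(devices, k, seen, near_stop):
    """Assumed policy: cover unseen devices first, then switch to greedy selection.

    seen      -- names of devices that have already participated
    near_stop -- True once the cluster is close to its stopping point
    """
    if near_stop:
        return greedy_select(devices, k)
    # Before the stopping point, favor devices whose data distribution
    # the server has not observed yet, ordered by delay.
    unseen = [d for d in devices if d.name not in seen]
    picked = greedy_select(unseen, k)
    if len(picked) < k:
        rest = [d for d in devices if d.name in seen]
        picked += greedy_select(rest, k - len(picked))
    return picked
```

Sorting on the tuple `(delay, -resources)` keeps the greedy step to a single pass and makes the tie-breaking rule explicit; any real scheduler would also need the delay estimates themselves, which the sketch takes as given.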