In the current era of the Internet of Vehicles (IoV), vehicle-to-vehicle data sharing can provide customized applications for Connected and Autonomous Vehicles (CAVs). The advancement of Deep Learning (DL) methodologies is one of the key driving forces for CAVs, enabling resource-constrained onboard devices to process massive amounts of data. In a traditional centralized DL approach, vehicle data are transmitted to the cloud for model training, which leads to significant communication overhead, high delays, and data privacy concerns. Conversely, Federated Learning (FL) performs the training on local models in a distributed fashion and mitigates data privacy risks by sharing only the model parameters with the server, making FL well suited to resource-constrained devices. In this paper, we propose the design of a scalable communication infrastructure, called KAFKAFED, to support the FL procedure based on Information-Centric Networking (ICN) using Apache Kafka. The ICN-based infrastructure overcomes the shortcomings of current client-server architectures for FL: routing is content-based or name-based to achieve efficient data retrieval for mobile nodes, and data are cached at intermediate nodes to provide efficient and reliable delivery. A proof of concept of the KAFKAFED communication architecture is developed and tested in an emulated environment.
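To make the topic-based exchange concrete, the following is a minimal, self-contained sketch of the publish/subscribe pattern that a Kafka-backed FL infrastructure could rely on: clients publish model updates to a named topic, and the aggregator subscribes to it, while the broker retains messages much as ICN intermediate nodes cache data. All class and topic names here are illustrative assumptions, not the paper's actual KAFKAFED implementation, and a real deployment would use an actual Kafka broker rather than this in-memory stand-in.

```python
from collections import defaultdict

class Broker:
    """Toy in-memory pub/sub broker illustrating topic-based delivery."""
    def __init__(self):
        self.topics = defaultdict(list)       # topic -> retained messages
        self.subscribers = defaultdict(list)  # topic -> subscriber callbacks

    def subscribe(self, topic, callback):
        # Register a consumer (e.g., the FL aggregator) on a topic.
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Retain the message (analogous to ICN in-network caching),
        # then push it to every current subscriber.
        self.topics[topic].append(message)
        for callback in self.subscribers[topic]:
            callback(message)

# Two FL clients publish local model updates; the aggregator receives both.
broker = Broker()
received = []
broker.subscribe("model-updates", received.append)   # aggregator side
broker.publish("model-updates", {"client": 1, "weights": [0.9, 2.1]})
broker.publish("model-updates", {"client": 2, "weights": [1.1, 1.9]})
```

The key design point this sketch illustrates is decoupling: publishers and subscribers never address each other directly, only the topic name, which is what lets a pub/sub architecture scale past the point-to-point connections of a client-server FL setup.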
The performance of the proposed framework, compared to the client-server-based FL architecture FLOWER, showed a boost of almost 40% with just 32 clients, in addition to several other advantages in scalability, reliability, and security.

Index Terms-Federated Learning, Apache Kafka, Connected and autonomous vehicles, Publish/Subscribe model

To guarantee users' privacy and keep the data on the user's device, a decentralized approach for training AI models, called Federated Learning (FL), was proposed by McMahan in 2016 [5]. In FL, interested clients collaboratively train a model on their local data and exchange their model parameters with a central server to generate a global model shared among all the involved entities [6]. However, simulating FL in real-world scenarios is challenging for researchers because, except for the few who work for companies like Google and Facebook, they lack the resources to train their federated models on millions of real-world devices. Nevertheless, since the concept was introduced, a few open-access FL frameworks have been developed that emulate a federated environment, such as TensorFlow Federated (TFF) 1 introduced by Google, LEAF [7], and FedEval [8]. TFF provides a framework for implementing decentralized training, LEAF offers datasets for FL applications, and FedEval provides customized strategy options for communication protocols and aggregation methods for FL. Other frameworks include FedML [9], which supports real-world IoT devices through FedML-Mobile and FedML-IoT, and PySyft [10], which offers the so-called remote workers...
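The FL procedure described above, in which clients train locally and a server combines their parameters into a global model, is commonly realized with Federated Averaging (FedAvg). The following is a minimal sketch under simplifying assumptions (a plain-Python parameter vector, one gradient step per round); the function names and the hard-coded gradients are illustrative only, not any framework's actual API.

```python
def local_update(weights, gradient, lr=0.1):
    """One simplified local training step: gradient descent on a client."""
    return [w - lr * g for w, g in zip(weights, gradient)]

def fedavg(client_weights, client_sizes):
    """Server-side aggregation: average of the client models,
    weighted by each client's local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(cw[i] * n for cw, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# One FL round with two clients sharing a 2-parameter global model.
global_model = [1.0, 2.0]
c1 = local_update(global_model, gradient=[0.5, -0.5])   # client 1's update
c2 = local_update(global_model, gradient=[-0.5, 0.5])   # client 2's update
new_global = fedavg([c1, c2], client_sizes=[100, 300])  # server aggregation
```

Note that only the weight lists cross the client-server boundary, never the clients' raw training data, which is the privacy property the introduction attributes to FL.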