We study federated machine learning at the wireless network edge, where power-limited wireless devices, each with its own dataset, build a joint model with the help of a remote parameter server (PS). We consider a bandwidth-limited fading multiple access channel (MAC) from the wireless devices to the PS, and implement distributed stochastic gradient descent (DSGD) over-the-air. We first propose a digital DSGD (D-DSGD) scheme, in which one device is selected opportunistically for transmission at each iteration based on the channel conditions; the scheduled device quantizes its gradient estimate to a finite number of bits, determined by the channel condition, and transmits these bits to the PS reliably. Next, motivated by the additive nature of the wireless MAC, we propose a novel analog communication scheme, referred to as compressed analog DSGD (CA-DSGD), in which the devices first sparsify their gradient estimates while accumulating the error from previous iterations, and then project the resulting sparse vector onto a low-dimensional vector. We also design a power allocation scheme to align the received gradient vectors at the PS efficiently. Numerical results show that the proposed CA-DSGD algorithm converges much faster than the D-DSGD scheme and other schemes in the literature, while providing significantly higher accuracy.

where M denotes the number of wireless devices, and g_m(θ_t)

In FL, each device participating in the training can also carry out model updates as in (3) locally, and share the overall difference with respect to the previous model parameters with the PS [1]. What distinguishes FL from conventional ML is the large number of devices that participate in the training, and the low-capacity and unreliable links that connect these devices to the PS. Therefore, there have been significant research efforts to reduce the communication requirements in FL [1]-[24].
However, these and follow-up studies assume orthogonal channels from the participating devices to the PS, and ignore the physical-layer aspects of wireless connections, even though FL has been motivated mainly by mobile devices.
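To make the device-side processing in CA-DSGD described above more concrete, the following is a minimal sketch of one compression step: error accumulation, sparsification, and projection onto a lower-dimensional vector. It is an illustration under stated assumptions, not the scheme's actual specification: the function name `ca_dsgd_device_step`, the top-k sparsifier with parameter `k`, the Gaussian projection matrix `proj`, and the dimensions `d` and `s` are all hypothetical choices made for this example. Power allocation and the fading MAC are not modeled.

```python
import numpy as np

def ca_dsgd_device_step(grad, error, proj, k):
    """One illustrative device-side compression step.

    grad  : local gradient estimate, shape (d,)
    error : sparsification error accumulated so far, shape (d,)
    proj  : projection matrix assumed shared with the PS, shape (s, d), s < d
    k     : number of entries kept by the sparsifier (hypothetical parameter)
    """
    # Add back the part of earlier gradients that was dropped.
    corrected = grad + error

    # Assumed sparsifier: keep the k entries with the largest magnitude.
    keep = np.argpartition(np.abs(corrected), -k)[-k:]
    sparse = np.zeros_like(corrected)
    sparse[keep] = corrected[keep]

    # Carry the dropped part forward to the next iteration.
    new_error = corrected - sparse

    # Project the sparse vector onto a low-dimensional vector for transmission.
    compressed = proj @ sparse
    return compressed, new_error, sparse

# Toy usage with hypothetical sizes.
rng = np.random.default_rng(0)
d, s, k = 20, 8, 3
proj = rng.normal(0.0, 1.0 / np.sqrt(s), size=(s, d))  # assumed Gaussian projection
grad = rng.normal(size=d)
error = np.zeros(d)
compressed, error, sparse = ca_dsgd_device_step(grad, error, proj, k)
```

In this sketch the sparse vector and the updated error always sum to the error-corrected gradient, so no gradient information is discarded permanently; it is only delayed to later iterations.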