Even though machine learning (ML) techniques are widely used in communications, the question of how to train communication systems has received surprisingly little attention. In this paper, we show that the commonly used binary cross-entropy (BCE) loss is a sensible choice in uncoded systems, e.g., for training ML-assisted data detectors, but may not be optimal in coded systems. We propose new loss functions targeted at minimizing the block error rate, as well as SNR deweighting, a novel method that trains communication systems for optimal performance over a range of signal-to-noise ratios. The utility of the proposed loss functions and of SNR deweighting is demonstrated through simulations in NVIDIA Sionna.
An extended version of this work, which includes an appendix with all the proofs, some information-theoretic comments, and experiments for another communication scenario, is available on arXiv [1]. All code and simulation scripts are available on GitHub: https://github.com/IIP-Group/BLER_Training. The authors thank Oscar Castañeda for comments and suggestions.

INTRODUCTION

Machine learning (ML) has revolutionized a large number of fields, including communications. The availability of software frameworks such as TensorFlow [2] and, more recently, NVIDIA Sionna [3] has made the implementation and training of ML-assisted communication systems convenient. Existing results in ML-assisted communication systems range from the atomistic improvement of data detectors (e.g., using deep unfolding) [4]–[7] to model-free learning of end-to-end communication systems [8]–[10]. Quite surprisingly, little attention has been devoted to the question of how ML-assisted communication systems should be trained. In particular, the choice of the cost function is seldom discussed (see, e.g., the recent overview papers [11,12]) and, given the similarity between communication and classification, one usually resorts to an empirical cross-entropy (CE) loss [13]–[18]. The question of training a communication system for good performance over a range of signal-to-noise ratios (SNRs) is another issue that has not been seriously investigated. Systems are usually trained on samples from only one SNR [4,9], or on samples drawn uniformly from the targeted SNR range [5,15,17], apparently without questioning how this choice affects performance at different SNRs.

In this paper, we investigate how ML-assisted communication systems should be trained. We first consider the case where the intended goal is to minimize the uncoded bit error rate (BER) and discuss why the empirical binary cross-entropy (BCE) loss is a sensible choice in this case.
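To make the two training setups mentioned above concrete, the following TensorFlow sketch contrasts single-SNR training with training on SNRs drawn uniformly from a target range, using an empirical BCE loss. It is a minimal illustration under stated assumptions, not the setup used in this paper: the differentiable detector model, the BPSK mapping, and the SNR convention (SNR = Es/N0 with noise variance N0/2 per real dimension) are all illustrative choices.

import tensorflow as tf

# Empirical binary cross-entropy (BCE) loss on per-bit probability estimates.
bce = tf.keras.losses.BinaryCrossentropy()

@tf.function
def train_step(detector, optimizer, batch_size, snr_db_min, snr_db_max):
    # Draw one SNR per example uniformly from the targeted range;
    # setting snr_db_min == snr_db_max recovers single-SNR training.
    snr_db = tf.random.uniform([batch_size, 1], snr_db_min, snr_db_max)
    noise_std = tf.sqrt(0.5 * 10.0 ** (-snr_db / 10.0))  # sigma^2 = N0/2 for SNR = Es/N0
    bits = tf.cast(tf.random.uniform([batch_size, 1]) < 0.5, tf.float32)
    x = 1.0 - 2.0 * bits                                  # BPSK mapping: 0 -> +1, 1 -> -1
    y = x + noise_std * tf.random.normal(tf.shape(x))     # AWGN channel
    with tf.GradientTape() as tape:
        p = detector(y)            # hypothetical model returning Pr[bit = 1] estimates
        loss = bce(bits, p)
    grads = tape.gradient(loss, detector.trainable_variables)
    optimizer.apply_gradients(zip(grads, detector.trainable_variables))
    return loss

Any Keras model with a sigmoid output, together with, e.g., tf.keras.optimizers.Adam(), can be passed in directly; the point of the sketch is only to show where the SNR-sampling choice enters the training loop.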