“…Finally, with respect to the experimental setup, most works use simulated data either for training only or for both training and testing [44][45][46][47][48][49][50][51][52][54][55][56][57][58][59], usually generated by convolving clean (anechoic) speech with impulse responses (room, head-related, or DOA-related (azimuth, elevation)). Only some of them are evaluated on real recordings [44,45,53,55,56], which in our opinion is essential for assessing the actual impact of the proposals in real conditions. So, in this paper we describe, for the first time in the literature to the best of our knowledge, a CNN architecture that takes the raw acoustic signal directly as the network input, with the objective of directly estimating the three-dimensional position of an acoustic source in a given environment.…”
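The simulation approach mentioned above, convolving clean speech with impulse responses, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the cited works: the function names `synthetic_rir` and `simulate_microphones` are hypothetical, the impulse response is a toy model (exponentially decaying noise) rather than a measured or properly simulated room response, and white noise stands in for anechoic speech.

```python
import numpy as np

def synthetic_rir(fs, rt60, length_s, seed=0):
    """Toy impulse response: white noise under an exponential decay.

    Hypothetical stand-in for a measured or simulated room impulse response.
    """
    rng = np.random.default_rng(seed)
    n = int(fs * length_s)
    t = np.arange(n) / fs
    # Amplitude envelope that drops by 60 dB after rt60 seconds.
    envelope = 10.0 ** (-3.0 * t / rt60)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # direct-path component
    return rir / np.max(np.abs(rir))

def simulate_microphones(clean, rirs):
    """Convolve one clean signal with one impulse response per microphone."""
    return np.stack([np.convolve(clean, h) for h in rirs])

fs = 16000
# One second of noise as a placeholder for clean (anechoic) speech.
clean = np.random.default_rng(1).standard_normal(fs)
# One toy impulse response per microphone (two microphones here).
rirs = [synthetic_rir(fs, rt60=0.4, length_s=0.25, seed=s) for s in range(2)]
mics = simulate_microphones(clean, rirs)
print(mics.shape)
```

Each row of `mics` is the signal one simulated microphone would receive. Because such data only reflects the impulse-response model used to generate it, results obtained on it may not carry over to real recordings, which is the concern the passage raises.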