“…Deep Neural Networks have consistently pushed the state of the art in speech technologies, for example automatic speech recognition (ASR) [2,3,4,5,6,7], pretrained speech transformers [8,9,10,11], and dialect, language and speaker identification models [12,13,14,15,16,17,18], as well as in other fields of Artificial Intelligence, including Natural Language Processing (NLP) [19] and Computer Vision (CV) [20]. While end-to-end deep architectures are simple and elegant and provide a flexible training mechanism, they are inherently black boxes.…”