“…Transformer-based models have revolutionized the fields of natural language processing [22, 33, 46] and recommender systems [34, 44], improving upon earlier neural embedding models [3, 5, 7, 8, 13-15, 17, 36, 37, 40]. Significant strides have been made in tasks such as machine translation [46], sentiment analysis [49], semantic textual similarity [16, 23, 35], and item similarity [6, 11, 12, 28, 34]. However, transformers employ a complex attention-based architecture comprising hundreds of millions of parameters, which cannot be decomposed into smaller, more interpretable components.…”