“…Digital libraries hold a wide range of data types, such as text (Ajij et al., 2023), video (Dias et al., 2023), images (Shi & Zhu, 2020), audio (Smith et al., 2019), cultural heritage collections (Otegi et al., 2014), and mathematical content (Schubotz et al., 2018). This heterogeneity poses a significant challenge for efficient information retrieval and for effective recommendations to users. To address this challenge and deliver personalized suggestions aligned with diverse user preferences, a comprehensive framework that can effectively handle multimodal data is needed.…”