This paper presents a benchmarking survey of query expansion techniques for social media information retrieval, focusing on a comparison of methods that use semantic web technologies. The study evaluated query expansion techniques, such as generative AI models and semantic matching algorithms, and examined how they are integrated into a semantic framework. The evaluation combined cosine similarity with ranking metrics: Discounted Cumulative Gain (DCG), Ideal Discounted Cumulative Gain (IDCG), normalized Discounted Cumulative Gain (nDCG), and Mean Average Precision (MAP). The paper also discusses semantic web technologies as a component of a pipeline that builds thematic knowledge graphs from retrieved social media data, using ontologies extended for the refugee crisis. The paper begins by introducing the importance of query expansion in information retrieval and the potential benefits of incorporating semantic web technologies. It then presents the methodology and outlines the specific procedure for each query expansion technique. The evaluation results are presented alongside the rest of the semantic framework, and the best-performing technique is identified: the curie-001 generative AI model. Finally, the paper summarizes the main findings and suggests directions for future research.
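The abstract names the evaluation metrics but not their formulas. The sketch below shows one common way to compute them and is not the authors' implementation. It assumes the standard log2-discounted DCG, with IDCG taken as the DCG of the same relevance grades sorted in descending order. It also assumes average precision over binary relevance, normalized by the number of relevant items retrieved. The abstract does not say how cosine similarity scores become relevance judgments, so the relevance lists here are hypothetical inputs, and all function names are illustrative.

```python
import math
from typing import List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """DCG = sum over ranks i >= 1 of rel_i / log2(i + 1), optionally cut off at k."""
    rels = list(relevances)[:k] if k is not None else list(relevances)
    # enumerate starts at 0, so rank i = idx + 1 and the discount is log2(idx + 2)
    return sum(rel / math.log2(idx + 2) for idx, rel in enumerate(rels))


def idcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """Ideal DCG: the DCG of the same grades in the best possible (descending) order."""
    return dcg(sorted(relevances, reverse=True), k)


def ndcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """Normalized DCG = DCG / IDCG, defined as 0.0 when no item is relevant."""
    ideal = idcg(relevances, k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


def average_precision(binary_relevances: Sequence[int]) -> float:
    """Mean of precision@rank taken at each rank that holds a relevant item."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(binary_relevances, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs: List[Sequence[int]]) -> float:
    """MAP: average precision averaged over all queries (one ranked list per query)."""
    if not runs:
        return 0.0
    return sum(average_precision(r) for r in runs) / len(runs)


if __name__ == "__main__":
    graded = [3, 2, 0, 1]  # hypothetical relevance grades for one ranked result list
    print(f"nDCG = {ndcg(graded):.4f}")
    print(f"MAP  = {mean_average_precision([[1, 0, 1], [0, 1]]):.4f}")
```

Dividing by IDCG puts nDCG in the range [0, 1], so rankings from different expanded queries can be compared even when their result lists have different relevance profiles.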