Determining hadith authenticity is vitally important in the Islamic religion because hadiths record the sayings and actions of Prophet Muhammad (PBUH), and they are the second source of Islamic teachings following the Quran. When authenticating a hadith, the reliability of the hadith narrators is a big factor that hadith scholars consider. However, many narrators share similar names, and the narrators’ full names are not usually included in the narration chains of hadiths. Thus, first, ambiguous narrators need to be identified. Then, their reliability level can be determined. There are no available datasets that could help address this problem of identifying narrators. Here, we present a new dataset that contains narration chains (sanads) with identified narrators. The AR-Sanad 280K dataset has around 280K artificial sanads and could be used to identify 18,298 narrators. After creating the AR-Sanad 280K dataset, we address the narrator disambiguation in several experimental setups. The hadith narrator disambiguation is modeled as a multiclass classification problem with 18,298 class labels. We test different representations and models in our experiments. The best results were achieved by finetuning BERT-Based deep learning model (AraBERT). We obtained a 92.9 Micro F1 score and 30.2 sanad error rate (SER) on the validation set of our artificial sanads AR-Sanad 280K dataset. Furthermore, we extracted a real test set from the sanads of the famous six books in Islamic hadith. We evaluated the best model on the real test data, and we achieved 83.5 Micro F1 score and 60.6 sanad error rate.
Historical texts are one of the main pillars for understanding current civilization and are used to reference different aspects. Hadiths are an example of one of the historical texts that should be securely preserved. Due to the expansion of the online resources, fabrications and alterations of fake Hadiths are easily feasible. Therefore, it has become more challenging to authenticate the online available Hadith contents and much harder to keep these authenticated results secure and unmanipulated. In this research, we are using the capabilities of the distributed blockchain technology to securely archive the Hadith and its level of authenticity in a blockchain. We selected a permissioned blockchain customized model in which the main entities approving the level of authenticity of the Hadith are well-established and specialized institutions in the main Islamic countries that can apply their own Hadith validation model. The proposed solution guarantees its integrity using the crowd wisdom represented in the selected nodes in the blockchain, which uses voting algorithms to decide the insertion of any new Hadiths into the database. This technique secures data integrity at any given time. If any organization’s credentials are compromised and used to update the data maliciously, 50% + 1 approval from the whole network nodes will be required. In case of any malicious or misguided information during the state of reaching consensus, the system will self-heal using practical Byzantine Fault Tolerance (pBFT). We evaluated the proposed framework’s read/write performance and found it adequate for the operational requirements.
Religious studies are a rich land for Natural Language Processing (NLP). The reason is that all religions have their instructions as written texts. In this paper, we apply NLP to Islamic Hadiths, which are the written traditions, sayings, actions, approvals, and discussions of the Prophet Muhammad, his companions, or his followers. A Hadith is composed of two parts: the chain of narrators (Sanad) and the content of the Hadith (Matn). A Hadith is transmitted from its author to a Hadith book author using a chain of narrators. The problem we solve focuses on the classification of Hadiths based on their origin of narration. This is important for several reasons. First, it helps determine the authenticity and reliability of the Hadiths. Second, it helps trace the chain of narration and identify the narrators involved in transmitting Hadiths. Finally, it helps understand the historical and cultural contexts in which Hadiths were transmitted, and the different levels of authority attributed to the narrators. To the best of our knowledge, and based on our literature review, this problem is not solved before using machine/deep learning approaches. To solve this classification problem, we created a novel Author-Based Hadith Classification Dataset (ABCD) collected from classical Hadiths’ books. The ABCD size is 29 K Hadiths and it contains unique 18 K narrators, with all their information. We applied machine learning (ML), and deep learning (DL) approaches. ML was applied on Sanad and Matn separately; then, we did the same with DL. The results revealed that ML performs better than DL using the Matn input data, with a 77% F1-score. DL performed better than ML using the Sanad input data, with a 92% F1-score. We used precision and recall alongside the F1-score; details of the results are explained at the end of the paper. We claim that the ABCD and the reported results will motivate the community to work in this new area. Our dataset and results will represent a baseline for further research on the same problem.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.