Biomedical Named Entity Recognition (BNER) is the task of identifying biomedical entities such as chemical compounds, genes, proteins, viruses, disorders, DNA and RNA. The key challenge in BNER lies in the methods used to extract such entities. Most BNER methods have relied on Supervised Machine Learning (SML) techniques, in which features play an essential role in improving the effectiveness of the recognition process. Features are a set of discriminating characteristics that can indicate the occurrence of an entity. Accordingly, features should generalize, that is, discriminate entities correctly even on new and unseen samples. Several studies have examined the role of features in identifying named entities. However, with the surge in biomedical research, there is a pressing need to explore biomedical features. This paper presents a review of the features that can be used for BNER, examining several types: morphological features, dictionary-based features, lexical features and distance-based features.
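The four feature families named in this abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `GENE_LEXICON` set, the `token_features` function and the specific features chosen are hypothetical, and the distance feature is only one possible reading of "distance-based".

```python
# Hypothetical mini-dictionary; a real system would load a curated lexicon.
GENE_LEXICON = {"brca1", "tp53", "egfr"}


def token_features(tokens, i, lexicon=GENE_LEXICON):
    """Return an illustrative feature dict for tokens[i]."""
    tok = tokens[i]
    lower = tok.lower()
    # Positions of all tokens that appear in the lexicon.
    hits = [j for j, t in enumerate(tokens) if t.lower() in lexicon]
    return {
        # Morphological: surface shape of the token.
        "has_digit": any(c.isdigit() for c in tok),
        "has_upper": any(c.isupper() for c in tok),
        "has_hyphen": "-" in tok,
        "suffix3": lower[-3:],
        # Lexical: the word itself and its left neighbour.
        "lower": lower,
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        # Dictionary-based: exact lookup in the lexicon.
        "in_lexicon": lower in lexicon,
        # Distance-based (assumed reading): tokens to the nearest lexicon hit,
        # or -1 when the sentence contains none.
        "dist_to_lexicon": min((abs(i - j) for j in hits), default=-1),
    }


sentence = ["Mutations", "in", "BRCA1", "increase", "risk"]
features = token_features(sentence, 2)
```

Feature dicts of this shape could be fed to a supervised sequence classifier; which features actually help is the question the review addresses.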
Holistic schema matching is the process of taking several schemas as input and producing the correspondences among them. Processing a large number of schemas can take a long time and yield poor-quality matches. Therefore, several clustering approaches have been proposed to reduce the search space by partitioning the data into smaller portions, which facilitates the matching process. However, there is still a need to improve the partitioning mechanism by avoiding the random initial solutions (centroids) produced by the clustering process. Such random solutions have a significant impact on the matching results. This study integrates correlation clustering and agglomerative hierarchical clustering to improve the effectiveness of holistic schema matching. The proposed integrated method avoids both random initial solutions and a predefined number of centroids. Several preprocessing steps were performed using auxiliary information (a domain dictionary). The experiments were carried out on the Airfare, Auto and Book datasets from the UIUC Web Integration Repository. The proposed method was compared with the K-means and K-medoids clustering methods. As a result, the proposed method outperformed K-means and K-medoids, achieving accuracies of 0.9, 0.93 and 0.9 for Airfare, Auto and Book, respectively.
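To make concrete why agglomerative clustering needs neither random centroids nor a preset number of clusters, here is a minimal sketch. It is not the paper's integrated method: it shows only an average-link agglomerative step, and the `jaccard` similarity, the `threshold` value and the sample attribute labels are assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Token-set similarity between two attribute labels."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def agglomerative(labels, threshold=0.3):
    """Average-link agglomerative clustering that stops at a similarity threshold.

    Every label starts in its own cluster, so there is no random seeding,
    and the number of clusters falls out of the threshold.
    """
    clusters = [[x] for x in labels]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [jaccard(a, b) for a in clusters[i] for b in clusters[j]]
                avg = sum(sims) / len(sims)
                if avg > best:
                    best, pair = avg, (i, j)
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical attribute labels drawn from two different domains.
attributes = ["departure city", "from city", "book title", "title of book"]
groups = agglomerative(attributes)
```

Because the procedure is deterministic, repeated runs on the same labels give the same partition, in contrast to K-means or K-medoids with random initialization.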
Chemical Compound Extraction refers to the task of recognizing chemical instances such as oxygen, nitrogen and others. The majority of studies that have addressed chemical compound extraction used machine-learning techniques. The key challenge in using machine-learning techniques lies in employing a robust set of features. The literature shows that numerous types of features are used in chemical compound extraction. The dimensionality of these features is determined by the data representation. Some researchers have used an N-gram representation for biomedical named entity recognition, in which the most significant terms are represented as features. Others have used a detailed-attribute representation, in which the features are generalized. As a result, identifying the combination of features that yields the highest classification accuracy becomes challenging. This paper applies the Wrapper Subset Selection approach using two data representations: N-gram and detailed-attribute. Since each data representation suits a specific classification algorithm, two classifiers were used: Naïve Bayes (for detailed-attribute) and Support Vector Machine (for N-gram). The results show that feature selection using the detailed-attribute representation outperformed the N-gram representation, achieving an F-measure of 0.722. In addition to yielding higher classification accuracy, the features selected using the detailed-attribute representation are more meaningful and can be applied to other datasets.
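The wrapper idea, scoring candidate feature subsets by the performance of the classifier itself, can be sketched as follows. The greedy forward search is an assumed search strategy (the abstract does not state which one was used), and `toy_evaluate` with its weights is a hypothetical stand-in for training Naïve Bayes or an SVM on the subset and returning its F-measure.

```python
def forward_wrapper_selection(features, evaluate):
    """Greedy forward search: repeatedly add the feature that most improves
    evaluate(subset); stop when no addition improves the score."""
    selected, best_score = [], evaluate([])
    remaining = list(features)
    while remaining:
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, feat = max(scored, key=lambda p: p[0])
        if score <= best_score:
            break
        selected.append(feat)
        remaining.remove(feat)
        best_score = score
    return selected, best_score


# Hypothetical per-feature contributions, standing in for a real
# cross-validated classifier score on the chosen subset.
TOY_WEIGHTS = {"has_digit": 0.3, "suffix": 0.2, "length": -0.05}


def toy_evaluate(subset):
    return sum(TOY_WEIGHTS[f] for f in subset)


chosen, score = forward_wrapper_selection(["length", "has_digit", "suffix"], toy_evaluate)
```

In a real experiment, each call to `evaluate` trains and tests a classifier, so the wrapper approach is expensive but tailors the subset to the classifier being used.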
Query processing over the cloud is a research area that has attracted considerable attention. Several studies have presented different types of encryption for encrypting data before it is migrated to the cloud. However, there is an essential need to balance time consumption against encryption security. This paper presents a comparative study of encryption methods for query execution over the cloud. Three common encryption methods were used: Advanced Encryption Standard (AES), Rivest-Shamir-Adleman (RSA) and Elliptic Curve Cryptography (ECC). A benchmark dataset of queries was used in the experiments. The three methods were evaluated on encryption time, decryption time and a secrecy measure. The results showed that RSA has the most competitive performance in terms of encryption and decryption time, while also achieving competitive secrecy measure values. It achieved average encryption times of 0.57, 1.41 and 0.59 and average decryption times of 2.31, 4.24 and 1.79 for Delete, Add and Select queries, respectively. Finally, RSA obtained average secrecy values of 1.10, 1.10 and 1.15 for Delete, Add and Select queries, respectively. This emphasizes the usefulness of RSA for maintaining both the efficiency and the security of encryption.
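A comparison of this kind needs a harness that measures average encryption and decryption time per query. The sketch below shows only that harness: `xor_cipher` is a deliberately insecure placeholder standing in for a real AES, RSA or ECC call, and the sample queries, key and `repeats` value are assumptions. It does not reproduce the paper's secrecy measure, which the abstract does not define.

```python
import time


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Placeholder cipher (NOT secure); swap in a real AES, RSA or ECC call."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def average_time(fn, inputs, repeats=5):
    """Mean wall-clock seconds per call of fn over all inputs."""
    start = time.perf_counter()
    for _ in range(repeats):
        for x in inputs:
            fn(x)
    return (time.perf_counter() - start) / (repeats * len(inputs))


# Hypothetical sample queries; the paper used a benchmark dataset.
queries = [b"SELECT * FROM t WHERE id = 1", b"DELETE FROM t WHERE id = 2"]
key = b"demo-key"

ciphertexts = [xor_cipher(q, key) for q in queries]
enc_time = average_time(lambda q: xor_cipher(q, key), queries)
dec_time = average_time(lambda c: xor_cipher(c, key), ciphertexts)
```

Grouping the inputs by query type (Delete, Add, Select) and calling `average_time` once per group would give per-type averages comparable in form to those reported.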