The automatic extraction of disease named entities is a challenging research problem that has attracted attention from the biomedical text mining community. Methods based on handcrafted features have been applied to this task with limited success, since they are constrained by the scope of the expert's knowledge. More recently, deep learning-based methods have been employed to address this problem. However, most architectures used for this task capture only long-range dependencies. The proposed method is a two-stage deep neural network model. We first discover local dependencies and build high-level features from word embedding inputs using a deep convolutional neural network. We then identify long-range dependencies using a bidirectional recurrent neural network. To address the class imbalance produced by the BMEWO tagging schema and to strengthen sequence modeling, we developed a new POS-based tagging schema that subdivides the dominant class into smaller, more balanced units. The proposed system was trained and tested on the NCBI disease corpus and achieved an F1-score of 85.59, outperforming the current state-of-the-art methods. Our results show the effectiveness of using both long- and short-range dependencies. They also illustrate the benefits of combining different word embedding techniques and incorporating morphological features in this task.
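To make the imbalance problem concrete, here is a minimal sketch, assuming that the dominant class is the outside tag `O` and that the POS-based schema splits it by part-of-speech tag. The abstract does not give the exact label inventory, so the `O-<POS>` labels, the helper names `bmewo_tags` and `pos_split_outside`, and the hand-written Penn Treebank tags below are all hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter


def bmewo_tags(n_tokens, spans):
    """Assign BMEWO tags from entity spans given as (start, end) token indices, end exclusive."""
    tags = ["O"] * n_tokens  # every token starts as Outside
    for start, end in spans:
        if end - start == 1:
            tags[start] = "W"  # single-Word entity
        else:
            tags[start] = "B"  # Begin
            for i in range(start + 1, end - 1):
                tags[i] = "M"  # Middle
            tags[end - 1] = "E"  # End
    return tags


def pos_split_outside(tags, pos_tags):
    """Hypothetical POS-based variant: relabel each 'O' as 'O-<POS>' to break up the dominant class."""
    return [f"O-{pos}" if tag == "O" else tag for tag, pos in zip(tags, pos_tags)]


# Toy sentence; POS tags are supplied by hand here instead of by a real tagger.
tokens = ["Patients", "with", "breast", "cancer", "and", "asthma", "were", "enrolled"]
pos = ["NNS", "IN", "NN", "NN", "CC", "NN", "VBD", "VBN"]
spans = [(2, 4), (5, 6)]  # "breast cancer", "asthma"

tags = bmewo_tags(len(tokens), spans)
split = pos_split_outside(tags, pos)
print(tags)   # ['O', 'O', 'B', 'E', 'O', 'W', 'O', 'O']
print(split)  # ['O-NNS', 'O-IN', 'B', 'E', 'O-CC', 'W', 'O-VBD', 'O-VBN']
print(Counter(tags)["O"], max(Counter(split).values()))  # 5 1
```

In this toy sentence the single `O` class holds 5 of 8 labels under BMEWO, while after the split no label occurs more than once; on a real corpus the split would reduce, not eliminate, the imbalance.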
Tat is an essential gene for increasing the transcription of all HIV genes, and it affects HIV replication, HIV exit from latency, and AIDS progression. The Tat gene frequently mutates in vivo, producing variants with diverse activities that contribute to HIV viral heterogeneity as well as to drug-resistant clones. Thus, identifying the transcriptional activities of Tat variants will help to better understand AIDS pathology and treatment. We recently reported the missense mutation landscape of all single amino acid Tat variants. In these experiments, a fraction of double missense alleles exhibited intragenic epistasis. However, experimentally determining the effects of all double mutant alleles is too time-consuming and costly. Therefore, we propose a combined GigaAssay/deep learning approach. As a first step toward determining activity landscapes for complex variants, we evaluated a deep learning framework, using data from previously reported GigaAssay experiments, to predict how transcription activity is affected by Tat variants with single missense substitutions. Our approach achieved a Pearson correlation coefficient of 0.94 between predicted and experimental activities. This hybrid approach should be extensible to more complex Tat alleles, enabling a better understanding of the genetic control of HIV genome transcription.
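The evaluation metric above can be sketched as follows. This is a plain implementation of the Pearson correlation coefficient, r = cov(x, y) / sqrt(var(x) · var(y)); the function name `pearson_r` is hypothetical, and the activity values are made up for illustration only. They are not GigaAssay measurements or model outputs from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered cross-products and squares; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical transcriptional activities for five variants (illustrative values only).
experimental = [1.00, 0.20, 0.80, 0.05, 0.60]
predicted = [0.90, 0.30, 0.70, 0.10, 0.65]
print(round(pearson_r(experimental, predicted), 3))
```

A value near 1 means the predicted activities rise and fall linearly with the measured ones; note that Pearson r is insensitive to a constant offset or scale error in the predictions, so it is often reported alongside an error measure such as mean squared error.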