Abstract: We present a general biomedical domain-oriented NLP engine called MedScan that efficiently processes sentences from MEDLINE abstracts and produces a set of regularized logical structures representing the meaning of each sentence. The engine utilizes a specially developed context-free grammar and lexicon. Preliminary evaluation of the system's performance, accuracy, and coverage exhibited encouraging results. Further approaches for increasing the coverage and reducing parsing ambiguity of the engine, as well as…
“…For this specific purpose we use the information contained in the ResNet mammalian database from Ariadne Genomics (http://www.ariadnegenomics.com/) (Novichkova et al, 2003;Daraselia et al, 2004). We selected only the interactions included in the category of Promoter Binding and Direct Regulation.…”
With a defined culture protocol, human embryonic stem cells (hESCs) are able to generate cardiomyocytes in vitro, therefore providing a great model for human heart development and holding great potential for cardiac disease therapies. In this study, we successfully generated a highly pure population of human cardiomyocytes (hCMs) (>95% cTnT+) from an hESC line, which enabled us to identify and characterize an hCM-specific signature at both the gene expression and DNA methylation levels. Gene functional association network and gene-disease network analyses of these hCM-enriched genes provide new insights into the mechanisms of hCM transcriptional regulation, and stand as an informative and rich resource for investigating cardiac gene functions and disease mechanisms. Moreover, we show that cardiac-structural genes and cardiac-transcription factors have distinct epigenetic mechanisms to regulate their gene expression, providing a better understanding of how the epigenetic machinery coordinates to regulate gene expression in different cell types.
“…Interestingly, this information then supports secondary studies concerned with the consistency of the information [18], methods to imitate manual curation [19] and the propagation of facts in the literature [20]. Other automated approaches to the curation of pathway information include the MedScan system [21,22].…”
“…It was observed that a F-score of 50.4% was achieved when tested on a general corpus randomly extracted from MEDLINE, which is impossible to those systems based on predefined semantic grammar rules. For example, MedScan [13] can only successfully parse and generate semantic structures for about 34% sentences randomly picked from MEDLINE. The recall rate of MedScan was found to be 21% [13].…”
Section: Results
mentioning
confidence: 99%
“…For example, MedScan [13] can only successfully parse and generate semantic structures for about 34% sentences randomly picked from MEDLINE. The recall rate of MedScan was found to be 21% [13]. This demonstrated the robustness of the HVS model.…”
Abstract. In bioinformatics, a huge amount of knowledge is often locked in textual documents such as scientific publications. Hence there is an increasing focus on extracting information from this vast amount of scientific literature. In this paper, we present an information extraction system which employs a semantic parser using the Hidden Vector State (HVS) model for protein-protein interactions. Unlike other hierarchical parsing models which require fully annotated treebank data for training, the HVS model can be trained using only lightly annotated data whilst simultaneously retaining sufficient ability to capture the hierarchical structure needed to robustly extract task domain semantics. When applied to extracting protein-protein interaction information from the medical literature, we found that it performed better than other established statistical methods and achieved a recall of 47.9% and a precision of 72.8%.
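The citation statements above quote an F-score while this abstract reports recall and precision separately. As a minimal sketch of how the two kinds of figures relate, the snippet below computes the balanced F-measure (the harmonic mean of precision and recall) from the values in the abstract. The function name `f_score` and the choice of beta = 1 are assumptions made for illustration; the source does not say which F-measure variant the authors used, and the 50.4% figure quoted earlier refers to a different test corpus.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta measure; beta = 1 gives the balanced F1 (assumed here)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both inputs are zero: define the score as 0 instead of dividing by zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Precision and recall as reported in the abstract (72.8% and 47.9%).
reported_precision = 0.728
reported_recall = 0.479

f1 = f_score(reported_precision, reported_recall)
print(f"F1 = {f1:.1%}")  # about 57.8% under the beta = 1 assumption
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the result sits closer to the recall than to the precision.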