Data mining, a concept that emerged in the mid-1990s, can help researchers gain novel and deep insights into large biomedical datasets and can facilitate an unprecedented understanding of them. Data mining can uncover new biomedical and healthcare knowledge for clinical and administrative decision making, as well as generate scientific hypotheses from large experimental data, clinical databases, and/or biomedical literature. This review first introduces data mining in general (e.g., its background, definition, and process), discusses the major differences between statistics and data mining, and then addresses what is unique about data mining in the biomedical and healthcare fields. A brief summary of various data mining algorithms used for classification, clustering, and association, along with their respective advantages and drawbacks, is also presented. Suggested guidelines for using data mining algorithms in each of these three areas are offered, along with three examples of how data mining has been used in the healthcare industry. Health-related organizations have successfully applied data mining to predict health insurance fraud and under-diagnosed patients and to identify and classify people at health risk, with the goal of reducing healthcare costs. Building on these successes, we describe how data mining technologies (in each area of classification, clustering, and association) have been used for a multitude of purposes, including research in the biomedical and healthcare fields. 
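Association analysis, one of the three areas named above, is commonly scored with two measures: support (the fraction of records containing an itemset) and confidence (how often the consequent holds when the antecedent does). The brute-force sketch below computes both for single-item rules over hypothetical patient records. It is not Apriori or any specific algorithm from the review; the records, thresholds, and function names are illustrative assumptions.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) = support(A and C) / support(A)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    s_a = support(transactions, a)
    return support(transactions, both) / s_a if s_a else 0.0

def pairwise_rules(transactions, min_support, min_confidence):
    """Enumerate rules A -> C between single items that meet both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in combinations(items, 2):
        if support(transactions, {x, y}) < min_support:
            continue  # prune item pairs that are too rare
        for a, c in ((x, y), (y, x)):
            conf = confidence(transactions, {a}, {c})
            if conf >= min_confidence:
                rules.append((a, c, conf))
    return rules

# Hypothetical records: each set holds the conditions coded for one patient.
records = [
    frozenset({"hypertension", "diabetes", "obesity"}),
    frozenset({"hypertension", "diabetes"}),
    frozenset({"obesity", "diabetes"}),
    frozenset({"hypertension"}),
]
rules = pairwise_rules(records, min_support=0.5, min_confidence=0.8)
# rules == [("obesity", "diabetes", 1.0)]
```

Real association miners such as Apriori avoid this exhaustive enumeration by growing only itemsets whose subsets already meet the support threshold; the scoring itself is the same.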
The review then discusses technologies for predicting healthcare costs (including length of hospital stay), for disease diagnosis and prognosis, and for discovering hidden biomedical and healthcare patterns in related databases, as well as the use of data mining to uncover relationships between health conditions and a disease, among diseases, and among drugs. The article concludes with a discussion of the problems that hamper the clinical use of data mining by health professionals.
The many biomedical relations embedded in medical records demand researchers' attention. Previous theoretical and practical work has focused on traditional machine learning techniques. However, these methods suffer from the "vocabulary gap" and data sparseness, and their feature extraction cannot be fully automated. To address these issues, we propose a multichannel convolutional neural network (MCCNN) for automated biomedical relation extraction. The proposed model makes two contributions: (1) it fuses multiple (e.g., five) versions of word embeddings; (2) it removes the need for manual feature engineering through automated feature learning with a convolutional neural network (CNN). We evaluated our model on two biomedical relation extraction tasks: drug-drug interaction (DDI) extraction and protein-protein interaction (PPI) extraction. On the DDI task, our system achieved an overall F-score of 70.2% on the DDIExtraction 2013 challenge dataset, compared with 67.0% for a standard linear SVM-based system. On the PPI task, evaluated on the AIMed and BioInfer corpora, our system exceeded the state-of-the-art ensemble SVM system by 2.7% and 5.6% in F-score, respectively.
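The abstract describes the architecture only at a high level. One common way to build a multichannel CNN for sentence classification is to treat each embedding version as a separate input channel, let every convolutional filter span all channels, and apply ReLU plus max-over-time pooling before a softmax classifier. The pure-Python sketch below shows that forward pass under stated assumptions: the sizes, random weights, token ids, and function names are hypothetical, and the exact channel fusion used in MCCNN may differ.

```python
import math
import random

def lookup(tables, token_ids):
    """Build one sentence matrix per embedding table; each table is one input channel."""
    return [[table[i] for i in token_ids] for table in tables]

def conv_max_pool(channels, filt, bias):
    """Apply one filter across all channels, then ReLU and max-over-time pooling.

    channels: C matrices, each n_tokens x dim (one per embedding version)
    filt:     C matrices, each window x dim (one weight slice per channel)
    Returns max over t of ReLU(bias + sum_c <filt[c], channels[c][t:t+window]>).
    """
    window = len(filt[0])
    n_tokens = len(channels[0])
    best = 0.0  # starting at 0.0 folds the ReLU into the max
    for t in range(n_tokens - window + 1):
        s = bias
        for c, emb in enumerate(channels):
            for k in range(window):
                s += sum(w * x for w, x in zip(filt[c][k], emb[t + k]))
        best = max(best, s)
    return best

def sentence_features(tables, token_ids, filters, biases):
    """One pooled feature per filter: the fixed-size sentence representation."""
    channels = lookup(tables, token_ids)
    return [conv_max_pool(channels, f, b) for f, b in zip(filters, biases)]

def softmax_scores(features, weights, biases):
    """Linear layer followed by a numerically stable softmax over relation classes."""
    logits = [b + sum(w * f for w, f in zip(row, features))
              for row, b in zip(weights, biases)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes; weights are random here, whereas a trained model would load
# pretrained embeddings and learn the filters and classifier by backpropagation.
rng = random.Random(0)
VOCAB, DIM, CHANNELS, WINDOW, N_FILTERS, N_CLASSES = 20, 8, 5, 3, 4, 5
tables = [random_matrix(VOCAB, DIM, rng) for _ in range(CHANNELS)]
filters = [[random_matrix(WINDOW, DIM, rng) for _ in range(CHANNELS)]
           for _ in range(N_FILTERS)]
sentence = [3, 7, 1, 12, 5, 9]  # hypothetical token ids for one candidate pair
feats = sentence_features(tables, sentence, filters, [0.0] * N_FILTERS)
probs = softmax_scores(feats, random_matrix(N_CLASSES, N_FILTERS, rng),
                       [0.0] * N_CLASSES)
```

Summing filter responses across channels is what lets a single filter draw on all embedding versions at once; concatenating the embeddings along the feature dimension would be an alternative fusion choice.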
The state-of-the-art methods for protein-protein interaction (PPI) extraction are primarily based on kernel methods, and their performance depends strongly on handcrafted features. In this paper, we tackle PPI extraction with convolutional neural networks (CNN) and propose a shortest dependency path based CNN (sdpCNN) model. The proposed method (1) takes only the shortest dependency path (sdp) and word embeddings as input and (2) avoids bias from feature selection by using a CNN. We performed experiments on the standard AIMed and BioInfer datasets, and the results demonstrated that our approach outperformed state-of-the-art kernel-based methods. In particular, by tracing the sdpCNN model, we found that it extracts key features automatically, and we verified that pretrained word embeddings are crucial for the PPI task.
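The sdpCNN input is the shortest dependency path between the two protein mentions. A minimal sketch of how such a path can be extracted is a breadth-first search over the dependency tree treated as an undirected graph. It assumes the parse is given as a list of head indices; the sentence, placeholder entity names, and parse below are hypothetical and hand-written rather than produced by any particular parser.

```python
from collections import deque

def shortest_dependency_path(heads, source, target):
    """BFS over the dependency tree treated as an undirected graph.

    heads[i] is the index of token i's head, or -1 for the root.
    Returns token indices from source to target (inclusive), or [] if unreachable.
    """
    adj = {i: [] for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append(head)  # edges are followed in both directions
            adj[head].append(child)
    prev = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if target not in prev:
        return []
    path = []
    node = target
    while node is not None:  # walk back from target to source
        path.append(node)
        node = prev[node]
    return path[::-1]

# Hypothetical sentence with two protein placeholders and a hand-written parse.
tokens = ["PROT1", "binds", "directly", "to", "PROT2", "in", "vitro"]
heads = [1, -1, 1, 4, 1, 6, 1]
path = shortest_dependency_path(heads, 0, 4)
sdp_tokens = [tokens[i] for i in path]  # ['PROT1', 'binds', 'PROT2']
```

The modifiers "directly" and "in vitro" drop out of the path, which is the point of using the sdp: the CNN sees a short, relation-focused token sequence, whose tokens would then be mapped to pretrained word embeddings.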
Background Respiratory syncytial virus (RSV) is among the most important causes of acute lower respiratory tract infection (ALRI) in young children. We assessed the severity of RSV-ALRI in children less than 5 years old with bronchopulmonary dysplasia (BPD). Methods We searched for studies using EMBASE, Global Health, and MEDLINE. We assessed hospitalization risk, intensive care unit (ICU) admission, need for oxygen supplementation and mechanical ventilation, and in-hospital case fatality (hCFR) among children with BPD compared with those without (non-BPD). We compared the (1) length of hospital stay (LOS) and (2) duration of oxygen supplementation and mechanical ventilation between the groups. Results Twenty-nine studies fulfilled our inclusion criteria. The case definition for BPD varied substantially in the included studies. Risks were higher among children with BPD compared with non-BPD: RSV hospitalization (odds ratio [OR], 2.6; 95% confidence interval [CI], 1.7–4.2; P < .001), ICU admission (OR, 2.9; 95% CI, 2.3–3.5; P < .001), need for oxygen supplementation (OR, 4.2; 95% CI, .5–33.7; P = .175) and mechanical ventilation (OR, 8.2; 95% CI, 7.6–8.9; P < .001), and hCFR (OR, 12.8; 95% CI, 9.4–17.3; P < .001). Median LOS (range) was 7.2 days (4–23) (BPD) compared with 2.5 days (1–30) (non-BPD). Median duration of oxygen supplementation (range) was 5.5 days (0–21) (BPD) compared with 2.0 days (0–26) (non-BPD). Mechanical ventilation was more often prolonged (>6 days) in those with BPD compared with non-BPD (OR, 11.9; 95% CI, 1.4–100; P = .02). Conclusions The risk of severe RSV disease is considerably higher among children with BPD. There is an urgent need to establish standardized BPD case definitions, review the RSV prophylaxis guidelines, and encourage more specific studies on RSV infection in BPD patients, including vaccine development and RSV-specific treatment.
Background: To learn from errors, electronic patient safety event reporting systems (e-reporting systems) have been widely adopted to collect medical incidents from frontline practitioners in US hospitals. However, underreporting and low-quality reports remain pervasive, so the effectiveness of these systems is doubtful.
Leaves comprise multiple cell types, but our knowledge of the patterns of gene expression that underpin their functional specialization is fragmentary. Our understanding of these cells, and our ability to undertake their rational redesign, are therefore limited. We aimed to identify genes associated with the incompletely understood bundle sheath of C3 plants, which represents a key target for engineering traits such as C4 photosynthesis into Oryza sativa (rice). To better understand the veins, bundle sheath and mesophyll cells of rice, we used laser capture microdissection followed by deep sequencing. Gene expression in the mesophyll is conditioned to allow coenzyme metabolism and redox homeostasis, as well as photosynthesis. In contrast, the bundle sheath is specialized in water transport, sulphur assimilation and jasmonic acid biosynthesis. Despite the small chloroplast compartment of bundle sheath cells, substantial photosynthesis gene expression was detected. These patterns of gene expression were not associated with the presence or absence of specific transcription factors in each cell type, but were instead associated with gradients in expression across the leaf. Comparative analysis with C3 Arabidopsis identified a small gene set preferentially expressed in the bundle sheath cells of both species. This gene set included genes encoding transcription factors from 14 orthogroups and proteins allowing water transport, sulphate assimilation and jasmonic acid synthesis. The most parsimonious explanation for our findings is that bundle sheath cells from the last common ancestor of rice and Arabidopsis were specialized in this manner, and that these patterns of gene expression have been maintained as the species diverged.
When exposed to high light, plants produce reactive oxygen species (ROS). In Arabidopsis thaliana, local stress such as excess heat or light initiates a systemic ROS wave in phloem and xylem cells dependent on NADPH oxidase/respiratory burst oxidase homolog (RBOH) proteins. In the case of excess light, although the initial local accumulation of ROS preferentially takes place in bundle-sheath strands, little is known about how this response takes place. Using rice and the ROS probes diaminobenzidine and 2′,7′-dichlorodihydrofluorescein diacetate, we found that, after exposure to high light, ROS were produced more rapidly in bundle-sheath strands than in mesophyll cells. This response was affected neither by CO2 supply nor by photorespiration. Consistent with these findings, deep sequencing of messenger RNA (mRNA) isolated from mesophyll or bundle-sheath strands indicated balanced accumulation of transcripts encoding all major components of the photosynthetic apparatus. However, transcripts encoding several isoforms of the superoxide/H2O2-producing enzyme NADPH oxidase were more abundant in bundle-sheath strands than in mesophyll cells. ROS production in bundle-sheath strands was decreased in mutant alleles of the bundle-sheath strand preferential isoform OsRBOHA and increased when it was overexpressed. Despite the plethora of pathways able to generate ROS in response to excess light, NADPH oxidase–mediated accumulation of ROS in the rice bundle-sheath strand was detected in etiolated leaves lacking chlorophyll. We conclude that photosynthesis is not necessary for the local ROS response to high light, which is instead mediated in part by NADPH oxidase activity.
Summary The engineering of C4 photosynthetic activity into the C3 plant rice has the potential to nearly double rice yields. To engineer a two‐cell photosynthetic system in rice, the rice bundle sheath (BS) must be rewired to enhance photosynthetic capacity. Here, we show that BS chloroplast biogenesis is enhanced when the transcriptional activator Oryza sativa Cytokinin GATA transcription factor 1 (OsCGA1) is driven by a vascular‐specific promoter. Ectopic expression of OsCGA1 increased BS chloroplast planar area and increased the expression of photosynthesis‐associated nuclear genes (PhANG), which are required for the biogenesis of photosynthetically active chloroplasts in BS cells of rice. In a further refinement, a DNase‐dead Cas9 (dCas9) activation module driven by the same cell‐type‐specific promoter enhanced chloroplast development in BS cells when gRNA sequences directed the dCas9 module to the promoter of the endogenous OsCGA1 gene. Expression of a single gRNA was sufficient to mediate the transactivation of both the endogenous gene and a transgenic GUS reporter fused to the OsCGA1 promoter. Our results illustrate the potential of tissue‐specific dCas9 activation and the co‐regulation of genes needed for multistep engineering of C4 rice.