Motivation Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. Gene regulatory code is highly complex due to the existence of polysemy and distant semantic relationships, which previous informatics methods often fail to capture, especially in data-scarce scenarios. Results To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABERT, to capture a global and transferable understanding of genomic DNA sequences based on upstream and downstream nucleotide contexts. We compared DNABERT to the most widely used programs for genome-wide regulatory element prediction and demonstrate its ease of use, accuracy, and efficiency. We show that the single pre-trained transformer model can simultaneously achieve state-of-the-art performance on the prediction of promoters, splice sites, and transcription factor binding sites after easy fine-tuning using small task-specific labelled data. Further, DNABERT enables direct visualization of nucleotide-level importance and semantic relationships within input sequences for better interpretability and accurate identification of conserved sequence motifs and functional genetic variant candidates. Finally, we demonstrate that DNABERT pre-trained on the human genome can be readily applied to other organisms with exceptional performance. We anticipate that the pre-trained DNABERT model can be fine-tuned for many other sequence analysis tasks. Availability The source code and the pretrained and fine-tuned models for DNABERT are available on GitHub at https://github.com/jerryji1993/DNABERT Supplementary information Supplementary data are available at Bioinformatics online.
In this paper, we explore slot tagging with only a few labeled support sentences (a.k.a. few-shot). Few-shot slot tagging faces a unique challenge compared to other few-shot classification problems, as it calls for modeling the dependencies between labels. But it is hard to apply previously learned label dependencies to an unseen domain, due to the discrepancy of label sets. To tackle this, we introduce a collapsed dependency transfer mechanism into the conditional random field (CRF) to transfer abstract label dependency patterns as transition scores. In the few-shot setting, the emission score of the CRF can be calculated as a word's similarity to the representation of each label. To calculate such similarity, we propose a Label-enhanced Task-Adaptive Projection Network (L-TapNet), based on the state-of-the-art few-shot classification model TapNet, that leverages label name semantics in representing labels. Experimental results show that our model significantly outperforms the strongest few-shot learning baseline by 14.64 F1 points in the one-shot setting.
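The scoring idea in the abstract above (emission scores from word-to-label similarity, combined with CRF transition scores) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy vectors, the cosine similarity, the zero transition matrix, and the function names are hypothetical and are not the L-TapNet projection or the collapsed dependency transfer described by the authors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def emission_scores(word_vecs, label_vecs):
    """One row per word, one column per label: similarity of word to label."""
    return [[cosine(w, l) for l in label_vecs] for w in word_vecs]

def viterbi(emissions, transitions):
    """Return the highest-scoring label index sequence for a linear-chain CRF."""
    n_labels = len(transitions)
    score = list(emissions[0])
    back = []
    for em in emissions[1:]:
        prev, step, ptr = score, [], []
        for j in range(n_labels):
            # Best previous label for reaching label j at this position.
            best_i = max(range(n_labels), key=lambda i: prev[i] + transitions[i][j])
            ptr.append(best_i)
            step.append(prev[best_i] + transitions[best_i][j] + em[j])
        score = step
        back.append(ptr)
    path = [max(range(n_labels), key=lambda j: score[j])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Toy data (hypothetical): two labels, two words, neutral transitions.
labels = ["O", "B-city"]
label_vecs = [[1.0, 0.0], [0.0, 1.0]]
word_vecs = [[0.9, 0.1], [0.2, 0.8]]
transitions = [[0.0, 0.0], [0.0, 0.0]]

decoded = [labels[i] for i in viterbi(emission_scores(word_vecs, label_vecs), transitions)]
print(decoded)
```

In a few-shot setting, the label vectors would be derived from the support sentences rather than fixed by hand, and the transition matrix would carry the transferred dependency patterns.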
Background Pancreatitis is a critical public health problem, and the burden of pancreatitis is increasing. We report the rates and trends of the prevalence, incidence, and years lived with disability (YLDs) for pancreatitis at the global, regional, and national levels in 195 countries and territories from 1990 to 2017, stratified by sex, age, and sociodemographic index (SDI). Methods Data on pancreatitis were obtained from the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2017. Numbers and age-standardized prevalence, incidence, and YLD rates per 100,000 population were estimated through a systematic analysis of modeled data from the 2017 GBD study. Acute and chronic pancreatitis are modeled separately in GBD 2017; however, our data show acute and chronic pancreatitis together. Estimates were reported with uncertainty intervals (UIs). Results Globally, in 2017, the age-standardized rates were 76.2 (95% UI 68.9 to 83.4), 20.6 (19.2 to 22.1), and 4.5 (2.3 to 7.6) per 100,000 population for the point prevalence, incidence, and YLDs, respectively. From 1990 to 2017, the age-standardized prevalence and YLD rates increased, whereas the age-standardized incidence rate decreased. The global prevalence increased with age up to 60–64 years and 44–49 years in females and males, respectively, and then decreased, with no significant difference between females and males. The global prevalence rate increased with age, peaking in the 95+ age group, with no difference between sexes. Generally, a positive correlation between age-standardized YLDs and SDI at the regional and national levels was observed. Slovakia (297.7 [273.4 to 325.3]), Belgium (274.3 [242.6 to 306.5]), and Poland (266.7 [248.2 to 284.4]) had the highest age-standardized prevalence rates in 2017. Taiwan (Province of China) (104.2% [94.8 to 115.2%]), Maldives (72.4% [66.5 to 79.2%]), and Iceland (64.8% [57.2 to 72.9%]) had the largest increases in age-standardized prevalence rates from 1990 to 2017. Conclusions Pancreatitis is a major public health issue worldwide. The age-standardized prevalence and YLD rates increased, but the age-standardized incidence rate decreased from 1990 to 2017. Improving the quality of pancreatitis health data in all regions and countries is strongly recommended to better monitor the burden of pancreatitis.
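For readers unfamiliar with the quantities reported above, the following is a minimal sketch of how an age-standardized rate per 100,000 and a percent change between two years are typically computed by direct standardization. The function names, the two-group age structure, and all numbers are hypothetical; GBD uses its own standard population and modeling pipeline, which this sketch does not reproduce.

```python
def age_standardized_rate(cases, population, standard_weights, per=100_000):
    """Directly standardized rate: weight each age-specific rate by a
    standard population share, then scale to `per` people."""
    total_weight = sum(standard_weights)
    weighted = sum(
        w * c / p for c, p, w in zip(cases, population, standard_weights)
    )
    return weighted / total_weight * per

def percent_change(old, new):
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0

# Hypothetical two-age-group example (not GBD data).
rate = age_standardized_rate(
    cases=[10, 40],
    population=[100_000, 100_000],
    standard_weights=[0.6, 0.4],
)
print(round(rate, 1), round(percent_change(20.0, 25.0), 1))
```

Standardizing against a fixed age structure is what allows rates to be compared across countries and years with different age distributions.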
Recently, accumulating evidence has indicated that miRNAs play critical roles in the progression and development of various complex human diseases, which suggests that identifying miRNA-disease associations could help us understand diseases at the miRNA level. Thus, revealing more potential miRNA-disease associations is a vital topic in the biomedical domain. However, examining all possible miRNA-disease pairs experimentally would be extremely expensive and time-consuming. Therefore, more accurate and efficient methods are in high demand for detecting potential miRNA-disease associations. In this study, we developed a computational model of Ensemble Learning and Link Prediction for miRNA-Disease Association prediction (ELLPMDA) to achieve this goal. By integrating miRNA functional similarity, disease semantic similarity, miRNA-disease associations, and Gaussian profile kernel similarity for miRNAs and diseases, we constructed a similarity network and utilized ensemble learning to combine the rank results given by three classic similarity-based algorithms. To evaluate the performance of ELLPMDA, we used global and local Leave-One-Out Cross Validation (LOOCV), 5-fold Cross Validation (CV), and three kinds of case studies. As a result, the AUCs of ELLPMDA are 0.9181, 0.8181, and 0.9193+/-0.0002 in global LOOCV, local LOOCV, and 5-fold CV, respectively, which significantly exceed those of almost all previous methods. Moreover, in three distinct kinds of case studies for Kidney Neoplasms, Lymphoma, Prostate Neoplasms, Colon Neoplasms, and Esophageal Neoplasms, 88%, 92%, 86%, 98%, and 98% of the top 50 predicted miRNAs were confirmed, respectively. In addition, ELLPMDA is based on a global similarity measure and is applicable to new diseases without any known related miRNAs.
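Two ingredients named in the abstract above can be sketched briefly: a Gaussian kernel similarity computed from association profiles, and a rank-based combination of several scorers. This is a sketch under stated assumptions: the kernel follows the commonly used Gaussian interaction profile form, rank averaging stands in for the ensemble step (the abstract does not specify how ranks are combined), and the function names, `gamma_prime`, and the toy data are hypothetical.

```python
import math

def gaussian_profile_similarity(profiles, gamma_prime=1.0):
    """Pairwise similarity exp(-gamma * ||p_i - p_j||^2) between binary
    association profiles, with gamma normalized by the mean squared norm."""
    n = len(profiles)
    mean_sq = sum(sum(x * x for x in p) for p in profiles) / n
    gamma = gamma_prime / mean_sq if mean_sq else 0.0
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]

def average_rank(score_lists):
    """Combine scorers by averaging each candidate's rank (1 = best)."""
    n = len(score_lists[0])
    totals = [0.0] * n
    for scores in score_lists:
        order = sorted(range(n), key=lambda k: -scores[k])
        for rank, k in enumerate(order, start=1):
            totals[k] += rank
    return [t / len(score_lists) for t in totals]

# Toy data (hypothetical): three miRNAs described by three disease associations.
sim = gaussian_profile_similarity([[1, 0, 1], [1, 0, 1], [0, 1, 0]])

# Three hypothetical scorers rating the same three candidate miRNAs.
ranks = average_rank([[0.9, 0.1, 0.5], [0.8, 0.3, 0.2], [0.7, 0.2, 0.6]])
print(ranks)
```

A lower average rank means the candidate was placed near the top by most scorers, which makes the combination robust to differences in each scorer's score scale.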
Glandular secretory trichomes (GSTs) are regarded as biofactories for synthesizing, storing, and secreting artemisinin. Understanding the mechanisms that regulate GST initiation and development is necessary for cultivating high-yielding Artemisia annua. Here, we identified an MYB transcription factor, AaTAR2, through bioinformatics analysis of the A. annua genome database and Arabidopsis trichome development-related genes. AaTAR2 is mainly expressed in young leaves and is localized in the nucleus. Repression and overexpression of AaTAR2 resulted in a decrease and an increase, respectively, in GST number, leaf biomass, and artemisinin content in transgenic plants. Furthermore, the morphological characteristics of trichomes changed markedly, suggesting that AaTAR2 plays a key role in trichome formation. In addition, the expression of flavonoid biosynthesis genes and the total flavonoid content increased dramatically in AaTAR2-overexpressing transgenic plants. Because flavonoids may counteract emerging resistance to artemisinin in Plasmodium species, AaTAR2 is a potential target for improving the effect of artemisinin in clinical therapy. Taken together, AaTAR2 positively regulates trichome development and artemisinin and flavonoid biosynthesis. A better understanding of this multifunctional transcription factor may enable enhanced artemisinin and flavonoid yields. AaTAR2 is a potential breeding target for cultivating high-quality A. annua.
Plants respond to abiotic UV-B stress with enhanced expression of genes for flavonoid production, especially the key enzyme chalcone synthase (CHS). Some flavonoids are antioxidative, antimicrobial, and/or UV-B protective secondary metabolites. However, when plants are challenged with concomitant biotic stress (simulated, e.g., by the bacterial peptide flg22, which induces MAMP-triggered immunity, MTI), the production of flavonoids is strongly suppressed in both Arabidopsis thaliana cell cultures and plants. On the other hand, flg22 induces the production of defense-related compounds, such as the phytoalexin scopoletin, as well as lignin, a structural barrier thought to restrict pathogen spread within the host tissue. Since all these metabolites require the precursor phenylalanine for their production, suppression of flavonoid production appears to allow the plant to focus its secondary metabolism on the production of pathogen defense-related compounds during MTI. Interestingly, several flavonoids have been reported to display antimicrobial activities. For example, the plant flavonoid phloretin targets the Pseudomonas syringae virulence factors flagella and the type 3 secretion system. Thus, suppression of flavonoid synthesis during MTI might also have negative side effects on pathogen defense. To clarify this issue, we used an Arabidopsis flavonoid mutant and obtained genetic evidence that flavonoids indeed contribute to warding off the virulent bacterial pathogen Pseudomonas syringae pv. tomato (Pst) DC3000. Finally, we show that UV-B attenuates expression of the flg22 receptor FLS2, indicating that there is a negative and reciprocal interaction between this abiotic stress and the plant-pathogen defense responses.
Brassica napus is highly susceptible to Verticillium longisporum (Vl43), and no effective genetic resistance is available. It is believed that the fungus reprogrammes plant physiological processes by up-regulating so-called susceptibility factors to establish a compatible interaction. By transcriptome analysis, we identified genes that were activated or up-regulated in rapeseed after Vl43 infection. To test whether one of these genes is functionally involved in the infection process, such that loss of function would lead to decreased susceptibility, we first challenged KO lines of the corresponding Arabidopsis orthologs with Vl43 and compared them with wild-type plants. Here, we report that the KO of AtCRT1a results in drastically reduced susceptibility of plants to Vl43. To test whether crt1a mutation also decreases susceptibility in B. napus, we identified 10 mutations in a TILLING population. Three T3 mutants displayed increased resistance compared to the wild type. To validate the results, we generated CRISPR/Cas-induced BnCRT1a mutants, challenged T2 plants with Vl43, and observed an overall reduced susceptibility in 3 out of 4 independent lines. Genotyping by allele-specific sequencing suggests a major effect of mutations in the CRT1a A-genome copy, while the C-genome copy appears to have no significant impact on plant susceptibility when challenged with Vl43. As revealed by transcript analysis, the loss of function of CRT1a results in activation of the ethylene signalling pathway, which may contribute to reduced susceptibility. Furthermore, this study demonstrates a novel strategy with great potential to improve plant disease resistance.