Discovering new long non-coding RNAs (lncRNAs) has been a fundamental step in lncRNA-related research. Many machine learning-based tools have been developed for lncRNA identification. However, many methods predict lncRNAs using sequence-derived features alone, which tend to show unstable performance across species. Moreover, most tools cannot be re-trained or tailored by users, nor can their features be customized or integrated to meet researchers' requirements. In this study, features extracted from sequence-intrinsic composition, secondary structure and physicochemical properties are comprehensively reviewed and evaluated. An integrated platform named LncFinder is also developed to enhance the performance and promote the research of lncRNA identification. LncFinder includes a novel lncRNA predictor using the heterologous features we designed. Experimental results show that our method outperforms several state-of-the-art tools on multiple species with more robust and satisfactory results. Researchers can additionally employ LncFinder to extract various classic features, build classifiers with numerous machine learning algorithms and evaluate classifier performance effectively and efficiently. LncFinder can reveal the properties of lncRNA and mRNA from various perspectives and further inspire lncRNA-protein interaction prediction and lncRNA evolution analysis. It is anticipated that LncFinder can significantly facilitate lncRNA-related research, especially for poorly explored species. LncFinder is released as an R package (https://CRAN.R-project.org/package=LncFinder). A web server (http://bmbl.sdstate.edu/lncfinder/) is also provided to maximize its availability.
Objective-To determine susceptibility genes for high myopia in Singaporean Chinese. Design-A meta-analysis of two genome wide association (GWA) datasets in Chinese and a follow-up replication cohort in Japanese. Participants and Controls-Two independent datasets of Singaporean Chinese individuals aged 10-12 years (SCORM, Singapore Cohort Study of the Risk factors for Myopia: cases = 65, controls = 238) and aged > 21 years (SP2, Singapore Prospective Study Program: cases = 222, controls = 435) for GWA studies, and a Japanese dataset aged > 20 years (cases = 959, controls = 2128) for replication. Methods-Genomic DNA samples from SCORM and SP2 were genotyped using various Illumina Beadarray platforms (> HumanHap 500). Single-locus association tests were conducted for each dataset with meta-analysis using pooled z-scores. The top-ranked genetic markers were examined for replication in the Japanese dataset. Fisher's P was calculated for the combined analysis of all three cohorts. Main outcome measures-High myopia, defined by spherical equivalent (SE) ≤ −6.00 diopters (D); controls defined by SE between −0.50 D and +1.00 D. Results-Two SNPs (rs12716080 and rs6885224) in the gene CTNND2 on chromosome 5p15 ranked top in the meta-analysis of our Chinese datasets (meta-P = 1.14 × 10^-5 and meta-P = 1.51 × 10^-5, respectively) with strong supporting evidence in each individual dataset analysis (max P = 1.85 × 10^-4 in SCORM; max P = 8.8 × 10^-3 in SP2). Evidence of replication was observed in the Japanese dataset for rs6885224 (P = 0.035, meta-P of three datasets: 7.84 × 10^-6). Conclusion-This study identified strong association of CTNND2 with high myopia in Asian datasets. The CTNND2 gene maps to a known high myopia linkage region on chromosome 5p15. Keywords: myopia; genome wide association; CTNND2; single nucleotide polymorphism; genetics. Myopia is a common eye disorder and a major public health concern in urban East Asian populations, affecting nearly 40% of Chinese persons aged 40 to 79 years [1-3].
High myopia, defined by spherical equivalent (SE) ≤ −5.00 diopter (D) or SE ≤ −6.00 D for at least one eye, is associated with significant ocular morbidity, including retinal detachment and myopic macular degeneration [4, 5]. The genetic etiologic basis of myopia and high myopia is supported by data from familial aggregation, segregation, and twin studies [5-12]. The relative risk of myopia in siblings of a person with myopia (λs) has been estimated to be strongest in high myopia (SE ≤ −6.00 D; λs = 5-20), and moderate for lower degrees of myopia (SE: −1.00 to −3.00 D; λs = 1.5-3) [5, 12]. To date, more than 15 chromosomal regions (or genetic loci, designated as MYP loci) have been mapped for myopia-related phenotypes by genome wide linkage scans, and many candidate genes have been reported by association and sequencing studies [13]. However, no gene implicated in myopia has been consistently replicated. Genome wide association (GWA) studies have become an important and unbiased approach to aid in the search for causal sequence va...
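The Methods of the myopia abstract above mention a meta-analysis using pooled z-scores and a Fisher's P for the combined cohorts. The following is a minimal Python sketch of those two general techniques, not the authors' pipeline: the function names, the optional weights, and the toy input values are assumptions made for illustration, and the abstract does not say how (or whether) the z-scores were weighted.

```python
import math

def pooled_z(z_scores, weights=None):
    """Stouffer-style pooled z: sum(w*z) / sqrt(sum(w^2)).

    Equal weights are used by default; weighting (e.g. by the square
    root of sample size) is a common choice but is an assumption here.
    """
    if weights is None:
        weights = [1.0] * len(z_scores)
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator

def two_sided_p(z):
    """Two-sided normal P-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df.

    For even degrees of freedom the chi-square survival function has the
    closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!, so no SciPy is needed.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total

# Hypothetical per-cohort z-scores for one SNP (not values from the study).
toy_z = [2.1, 3.0]
meta_p = two_sided_p(pooled_z(toy_z))

# Hypothetical per-cohort P-values combined with Fisher's method.
combined_p = fisher_combined_p([0.01, 0.004, 0.035])
```

Pooling z-scores keeps the direction of effect, whereas Fisher's method combines P-values only, so the two can disagree when cohorts point in opposite directions.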
Non-coding RNAs (ncRNAs) play crucial roles in multiple fundamental biological processes, such as post-transcriptional gene regulation, and are implicated in many complex human diseases. Most ncRNAs function by interacting with corresponding RNA-binding proteins. Research on ncRNA–protein interaction is the key to understanding the function of ncRNA. However, the experimental techniques for identifying RNA–protein interactions (RPIs) are still expensive and time-consuming. Due to the complex molecular mechanism of ncRNA–protein interaction and the lack of conservation for ncRNA, especially for long ncRNA (lncRNA), the prediction of ncRNA–protein interaction is still a challenge. Deep learning-based models have become the state of the art in a range of biological sequence analysis problems due to their strong power of feature learning. In this study, we proposed a hierarchical deep learning framework, RPITER, to predict RNA–protein interaction. For sequence coding, we improved the conjoint triad feature (CTF) coding method by complementing more primary sequence information and adding sequence structure information. For model design, RPITER employed two basic neural network architectures: the convolutional neural network (CNN) and the stacked auto-encoder (SAE). Comprehensive experiments were performed on five benchmark datasets from the PDB and NPInter databases to analyze and compare the performances of different sequence coding methods and prediction models. We found that the CNN and SAE deep learning architectures have powerful fitting abilities for the k-mer features of RNA and protein sequences. The improved CTF coding method showed a performance gain compared with the original CTF method. Moreover, our designed RPITER performed well in predicting RNA–protein interaction (RPI) and could outperform most of the previous methods.
On five widely used RPI datasets, RPI369, RPI488, RPI1807, RPI2241 and NPInter, RPITER obtained AUC values of 0.821, 0.911, 0.990, 0.957 and 0.985, respectively. The proposed RPITER could be a complementary method for predicting RPI and constructing RPI networks, which would help push forward the related biological research on ncRNAs and lncRNAs.
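To make the conjoint triad feature (CTF) idea mentioned above concrete, here is a small Python sketch of the original protein-side encoding as it is commonly described: amino acids are reduced to seven classes and the frequencies of the 343 possible class triads are counted. This is an illustration under stated assumptions, not RPITER's improved coding: the seven-class grouping is the one usually cited for the conjoint triad method and is not given in the abstract, the simple frequency normalization is a simplification, and the function name is hypothetical.

```python
from itertools import product

# Seven-class amino-acid grouping commonly cited for the conjoint triad
# method (an assumption here; the abstract does not list the classes).
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: str(i) for i, group in enumerate(GROUPS) for aa in group}

# All 7^3 = 343 class triads in a fixed order, e.g. "000", "001", ...
TRIADS = ["".join(t) for t in product("0123456", repeat=3)]

def conjoint_triad(protein):
    """Return a 343-dimensional triad frequency vector for a protein.

    Residues outside the 20 standard amino acids are dropped before
    counting, so their neighbours become adjacent (a simplification).
    """
    reduced = [AA_TO_CLASS[a] for a in protein.upper() if a in AA_TO_CLASS]
    counts = dict.fromkeys(TRIADS, 0)
    for i in range(len(reduced) - 2):
        counts["".join(reduced[i:i + 3])] += 1
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in TRIADS]

# Toy peptide (hypothetical, not from any benchmark dataset).
features = conjoint_triad("MKVLAAGIDE")
```

The same sliding-window counting applies to RNA k-mers over the four-letter nucleotide alphabet; only the alphabet and window length change.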
Corneal curvature (CC) is a key determinant of major eye diseases, such as keratoconus, myopia and corneal astigmatism. No prior studies have discovered the genes for CC. Here we report the findings from four genome-wide association studies of CC in 10 008 samples from three population groups in Singapore. Our discovery phase surveyed 2867 Chinese and 3072 Malays, allowing us to identify two loci that were associated with CC variation: FRAP1 on chromosome 1p36.2 and PDGFRA on chromosome 4q12. These findings were subsequently replicated in a validation study involving an additional 2953 Asian Indians and a further collection of 1116 Chinese children. The effect sizes of the identified variants were consistent across all four cohorts, with seven single nucleotide polymorphisms (SNPs) in FRAP1 (lead SNP: rs17036350, meta P-value = 4.06 × 10^-13) and six SNPs in PDGFRA (lead SNP: rs2114039, meta P-value = 1.33 × 10^-9) attaining genome-wide significance in the SNP-based meta-analysis of the four studies. This is the first genome-wide survey of CC variation and we have identified two implicated loci in three genetically diverse Asian populations, suggesting the presence of common genetic etiology across multiple populations.
Exploring the intrinsic differences among breast cancer subtypes is important, because these differences are closely related to clinical diagnosis and the design of treatment plans. With the accumulation of biological and medical datasets, many different omics data are available that describe the disease from different aspects, and combining these multiple omics data can improve the accuracy of prediction. Many databases are also available for downloading different types of omics data. In this article, we use estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2) status to define breast cancer subtypes, and classify any two breast cancer subtypes using the SMO-MKL algorithm. We collected mRNA data, methylation data and copy number variation (CNV) data from TCGA to classify breast cancer subtypes. Multiple Kernel Learning (MKL) is employed to use these omics data distinctly. The result of using three omics data with multiple kernels is better than that of using single omics data with multiple kernels. Furthermore, the significant genes and pathways discovered in the feature selection process are also analyzed. In experiments, the proposed method outperforms other state-of-the-art methods and has abundant biological interpretations.
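The core idea behind using several omics layers "distinctly" with multiple kernels can be sketched as follows: compute one kernel matrix per data type and combine them as a weighted sum. The Python sketch below uses fixed, hand-picked weights purely for illustration; it does not implement SMO-MKL, which learns the kernel weights during training. The toy matrices, weights, and function names are all assumptions.

```python
def linear_kernel(X):
    """Gram matrix of dot products between all pairs of samples (rows)."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def normalize_kernel(K):
    """Cosine-normalize so each omics layer contributes on the same scale."""
    n = len(K)
    d = [K[i][i] ** 0.5 for i in range(n)]
    return [[K[i][j] / (d[i] * d[j]) if d[i] and d[j] else 0.0
             for j in range(n)] for i in range(n)]

def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices: K = sum_m w_m * K_m."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]

# Toy data: 3 samples x 2 features per omics layer (hypothetical values).
mrna = [[1.0, 2.0], [2.0, 1.0], [0.5, 0.5]]
methylation = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
cnv = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

kernels = [normalize_kernel(linear_kernel(X)) for X in (mrna, methylation, cnv)]

# Fixed weights chosen by hand; an MKL solver would learn these instead.
weights = [0.5, 0.3, 0.2]
combined = combine_kernels(kernels, weights)
```

The combined matrix can be passed to any kernel classifier that accepts a precomputed kernel; a non-negative weighted sum of valid kernels is itself a valid kernel.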
Long noncoding RNA (lncRNA) is a class of noncoding RNA longer than 200 nucleotides that has attracted considerable interest in recent years. Many studies have confirmed that the human genome contains many thousands of lncRNAs, which exert great influence over some critical regulators of cellular processes. With the advent of high-throughput sequencing technologies, a large quantity of sequences is waiting to be analysed. Thus, many programs have been developed to distinguish coding from long noncoding transcripts. Different programs are generally designed for different circumstances, so it is sensible and practical to select an appropriate method for the situation at hand. In this review, several popular methods and their advantages, disadvantages, and application scopes are summarised to assist researchers in choosing a suitable method and obtaining more reliable results.
Non-coding RNAs with a length of more than 200 nucleotides are long non-coding RNAs (lncRNAs), which have gained tremendous attention in recent decades. Many studies have confirmed that lncRNAs have an important influence on post-transcriptional gene regulation; for example, lncRNAs affect the stability and translation of splicing factor proteins. Mutations and malfunctions of lncRNAs are closely related to human disorders. As lncRNAs interact with a variety of proteins, predicting the interactions between lncRNAs and proteins is an important way to explore the functions of lncRNAs in depth and to enrich their annotations. Experimental approaches for identifying lncRNA–protein interactions are expensive and time-consuming. Computational approaches to predict lncRNA–protein interactions can be grouped into two broad categories. The first category is based on sequence, structural information and physicochemical properties. The second category is based on network methods that fuse heterogeneous data to construct lncRNA-related heterogeneous networks. The network-based methods can capture the implicit feature information in the topological structure of related biological heterogeneous networks containing lncRNAs, which is often ignored by sequence-based methods. In this paper, we summarize and discuss the materials, interaction score calculation algorithms, advantages and disadvantages of state-of-the-art network-based algorithms for lncRNA–protein interaction prediction, to assist researchers in selecting a suitable method for acquiring more dependable results. All the related network data are also collected and processed for the convenience of users, and are available at https://github.com/HAN-Siyu/APINet/.