Functional genomics technologies have been widely adopted in the biological research of both model and non-model species. An efficient functional annotation of DNA or protein sequences is a major requirement for the successful application of these approaches, as functional information on gene products is often the key to the interpretation of experimental results. Therefore, there is an increasing need for bioinformatics resources that can cope with large amounts of sequence data, produce valuable annotation results and are easily accessible to laboratories where functional genomics projects are being undertaken. We present the Blast2GO suite as an integrated and biologist-oriented solution for the high-throughput and automatic functional annotation of DNA or protein sequences based on the Gene Ontology vocabulary. The most outstanding Blast2GO features are: (i) the combination of various annotation strategies and tools controlling type and intensity of annotation, (ii) the numerous graphical features such as the interactive GO-graph visualization for gene-set function profiling or descriptive charts, (iii) the general sequence management features and (iv) high-throughput capabilities. We used the Blast2GO framework to carry out a detailed analysis of annotation behaviour through homology transfer and its impact in functional genomics research. Our aim is to offer biologists useful information to take into account when addressing the task of functionally characterizing their sequence data.
Next-generation sequencing (NGS) technologies are revolutionizing genome research, and in particular, their application to transcriptomics (RNA-seq) is increasingly being used for gene expression profiling as a replacement for microarrays. However, the properties of RNA-seq data have not yet been fully established, and additional research is needed to understand how these data respond to differential expression analysis. In this work, we set out to gain insights into the characteristics of RNA-seq data analysis by studying an important parameter of this technology: the sequencing depth. We have analyzed how sequencing depth affects the detection of transcripts and their identification as differentially expressed, looking at aspects such as transcript biotype, length, expression level, and fold-change. We have evaluated different algorithms available for the analysis of RNA-seq and proposed a novel approach, NOISeq, that differs from existing methods in that it is data-adaptive and nonparametric. Our results reveal that most existing methodologies suffer from a strong dependency on sequencing depth for their differential expression calls and that this results in a considerable number of false positives that increases as the number of reads grows. In contrast, our proposed method models the noise distribution from the actual data, can therefore better adapt to the size of the data set, and is more effective in controlling the rate of false discoveries. This work discusses the true potential of RNA-seq for studying regulation at low expression ranges, the noise within RNA-seq data, and the issue of replication.
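The idea of a data-adaptive, nonparametric call can be illustrated with a minimal sketch. It is not the NOISeq implementation: the abstract only states that the noise distribution is modelled from the actual data, so the choice of statistics (a log2 ratio `M` and an absolute difference `D`), the pooling of within-condition replicate pairs as "noise", the pseudocount, and all function names and toy counts below are assumptions made for illustration.

```python
from itertools import combinations
from math import log2


def m_and_d(x, y, pseudo=0.5):
    """Return (log2 ratio, absolute difference) for two expression values.

    The pseudocount is an assumption used here to avoid division by zero.
    """
    return log2((x + pseudo) / (y + pseudo)), abs(x - y)


def noise_distribution(*conditions):
    """Collect (|M|, D) for every gene across all pairs of replicates
    taken from the SAME condition; these pairs are treated as noise."""
    noise = []
    for replicates in conditions:
        for rep1, rep2 in combinations(replicates, 2):
            for x, y in zip(rep1, rep2):
                m, d = m_and_d(x, y)
                noise.append((abs(m), d))
    return noise


def de_probability(x, y, noise):
    """Fraction of noise points whose |M| and D are both strictly smaller
    than the values observed between the two conditions."""
    m, d = m_and_d(x, y)
    below = sum(1 for nm, nd in noise if nm < abs(m) and nd < d)
    return below / len(noise)


def gene_means(replicates):
    """Per-gene mean across the replicates of one condition."""
    return [sum(values) / len(values) for values in zip(*replicates)]


# Hypothetical toy counts: 2 replicates per condition, 3 genes each.
cond_a = [[10, 100, 50], [12, 95, 55]]
cond_b = [[11, 400, 52], [9, 410, 49]]

noise = noise_distribution(cond_a, cond_b)
probs = [
    de_probability(a, b, noise)
    for a, b in zip(gene_means(cond_a), gene_means(cond_b))
]
print(probs)  # [0.0, 1.0, 0.0]
```

In this toy example only the second gene changes by more than any difference seen between replicates of the same condition, so it is the only one with a high score. Because the reference distribution is built from the data themselves, it widens or narrows with the observed variability instead of relying on a fixed parametric model.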
We present a simple but powerful procedure to extract Gene Ontology (GO) terms that are significantly over- or under-represented in sets of genes within the context of a genome-scale experiment (DNA microarray, proteomics, etc.). Said procedure has been implemented as a web application, FatiGO, allowing for easy and interactive querying. FatiGO, which takes the multiple-testing nature of statistical contrast into account, currently includes GO associations for diverse organisms (human, mouse, fly, worm and yeast) and the TrEMBL/Swissprot GOAnnotations@EBI correspondences from the European Bioinformatics Institute.
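As a rough illustration of this kind of analysis, the sketch below tests each term for over- or under-representation in one gene set relative to another and then adjusts the p-values for multiple testing. The abstract does not name the statistical test or the correction procedure, so the two-sided Fisher exact test and the Benjamini-Hochberg adjustment used here are assumptions, as are the function names, gene names and term labels.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def term_enrichment(study, reference, gene2terms):
    """Return (term, p-value, adjusted p-value) for every annotated term."""
    terms = sorted({t for g in study | reference for t in gene2terms.get(g, ())})
    tables = []
    for term in terms:
        a = sum(term in gene2terms.get(g, ()) for g in study)
        c = sum(term in gene2terms.get(g, ()) for g in reference)
        tables.append((a, len(study) - a, c, len(reference) - c))
    pvals = [fisher_two_sided(*t) for t in tables]
    return list(zip(terms, pvals, benjamini_hochberg(pvals)))


# Hypothetical annotations; the term labels are placeholders, not real GO IDs.
gene2terms = {
    "g1": {"TERM_A"}, "g2": {"TERM_A"}, "g3": {"TERM_A"}, "g4": {"TERM_B"},
    "g5": {"TERM_A"}, "g6": {"TERM_B"}, "g7": {"TERM_B"}, "g8": {"TERM_B"},
}
study = {"g1", "g2", "g3", "g4"}
reference = {"g5", "g6", "g7", "g8"}

results = term_enrichment(study, reference, gene2terms)
for term, p, q in results:
    print(term, round(p, 4), round(q, 4))
```

A two-sided test is used so that both over- and under-represented terms are flagged. With gene sets this small nothing reaches significance; the example only shows how the contingency tables are formed and adjusted.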
Qualimap is freely available from http://www.qualimap.org.
Monozygotic (MZ) twins are partially concordant for most complex diseases, including autoimmune disorders. Whereas phenotypic concordance can be used to study heritability, discordance suggests the role of non-genetic factors. In autoimmune diseases, environmentally driven epigenetic changes are thought to contribute to their etiology. Here we report the first high-throughput and candidate sequence analyses of DNA methylation to investigate discordance for autoimmune disease in twins. We used a cohort of MZ twins discordant for three diseases whose clinical signs often overlap: systemic lupus erythematosus (SLE), rheumatoid arthritis, and dermatomyositis. Only MZ twins discordant for SLE featured widespread changes in the DNA methylation status of a significant number of genes. Gene ontology analysis revealed enrichment in categories associated with immune function. Individual analysis confirmed the existence of DNA methylation and expression changes in genes relevant to SLE pathogenesis. These changes occurred in parallel with a global decrease in the 5-methylcytosine content that was accompanied by changes in DNA methylation and expression levels of ribosomal RNA genes, although no changes in repetitive sequences were found. Our findings not only identify potentially relevant DNA methylation markers for the clinical characterization of SLE patients but also support the notion that epigenetic changes may be critical in the clinical manifestations of autoimmune disease.
[Supplemental material is available online at http://www.genome.org. The sequence data from this study have been submitted to the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) under accession no. GSE19033.]
Human monozygotic (MZ) twins exhibit variable degrees of concordance for complex diseases, such as cancer, cardiovascular diseases, or autoimmune disorders.
Whereas concordance rates close to 100% in identical twins apply to coinheritance of mutant genes that are dominant and highly penetrant, most diseases or traits show a concordance in identical twins in the broad range of 5%-75% (Nance 1978). Most of the twin-based studies have focused on the concordance between siblings, which has led to the identification of trait-specific genes (Hrubec and Robinette 1984), while less attention has been paid to the degree of discordance, which suggests the participation of factors other than pure genetic changes. Recently, interest has shifted toward exploring the molecular mechanisms involved in determining phenotypic differences. The increasing recognition of the influence of epigenetics in phenotypic outcomes continues to open up new lines of research involving twin studies. DNA methylation and histone modifications, the major sources of epigenetic information, regulate gene expression levels and provide an alternative mechanism for modulating gene function to those arising from genetic changes (Esteller 2008). Interestingly, epigenetic changes are
We report for the first time the genomics of a nuclear compartment of the eukaryotic cell. 454 sequencing and microarray analysis revealed the pattern of nucleolus-associated chromatin domains (NADs) in the linear human genome and identified different gene families and certain satellite repeats as the major building blocks of NADs, which constitute about 4% of the genome. Bioinformatic evaluation showed that NAD–localized genes take part in specific biological processes, like the response to other organisms, odor perception, and tissue development. 3D FISH and immunofluorescence experiments illustrated the spatial distribution of NAD–specific chromatin within interphase nuclei and its alteration upon transcriptional changes. Altogether, our findings describe the nature of DNA sequences associated with the human nucleolus and provide insights into the function of the nucleolus in genome organization and establishment of nuclear architecture.
Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators that evaluate methods for global gene expression analysis.
A server running the program can be found at: http://bioinfo.cnio.es/sotarray.