ANISEED (https://www.aniseed.cnrs.fr) is the main model organism database for the worldwide community of scientists working on tunicates, the vertebrate sister-group. Information provided for each species includes functionally-annotated gene and transcript models with orthology relationships within tunicates, and with echinoderms, cephalochordates and vertebrates. Beyond genes, the system describes other genetic elements, including repeated elements and cis-regulatory modules. Gene expression profiles for several thousand genes are formalized in both wild-type and experimentally-manipulated conditions, using formal anatomical ontologies. These data can be explored through three complementary types of browsers, each offering a different viewpoint. A developmental browser summarizes the information in a gene- or territory-centric manner. Advanced genomic browsers integrate the genetic features surrounding genes or gene sets within a species. A Genomicus synteny browser explores the conservation of local gene order across deuterostomes. This new release covers an extended taxonomic range of 14 species, including for the first time a non-ascidian species, the appendicularian Oikopleura dioica. Functional annotations, provided for each species, were enhanced through a combination of manual curation of gene models and the development of an improved orthology detection pipeline. Finally, gene expression profiles and anatomical territories can be explored in 4D online through the newly developed Morphonet morphogenetic browser.
Protein methylation, one of the most important post-translational modifications, typically takes place on arginine or lysine residues. This reversible modification is involved in a series of basic cellular processes. Identifying methylated proteins and their sites will facilitate the understanding of the molecular mechanism of methylation. Alongside experimental methods, computational prediction of methylated sites is desirable for its convenience and speed. Here, we propose a method dedicated to predicting methylated sites of proteins. Feature selection was made on sequence conservation, physicochemical/biochemical properties, and structural disorder by applying maximum relevance minimum redundancy and incremental feature selection methods. The prediction models were built according to the nearest neighbor algorithm and evaluated by jackknife cross-validation. We built 11 and 9 predictors for methylarginine and methyllysine, respectively, and integrated them to predict methylated sites. As a result, the average prediction accuracies are 74.25% and 77.02% for the methylarginine and methyllysine training sets, respectively. Feature analysis suggested that evolutionary information and physicochemical/biochemical properties play important roles in the recognition of methylated sites. These findings may provide valuable information for exploiting the mechanisms of methylation. Our method may serve as a useful tool for biologists to find the potential methylated sites of proteins.
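As a rough illustration of the evaluation scheme named above, the following sketch pairs a nearest neighbor classifier with jackknife (leave-one-out) cross-validation. It is not the authors' implementation: the Euclidean distance, the function names, and the toy feature vectors are all assumptions made for the example, and the abstract does not state which distance measure was used.

```python
import math


def nearest_neighbor_label(query, samples, labels):
    """Return the label of the sample closest to `query`.

    Euclidean distance is an assumption; the abstract does not name the metric.
    """
    best = min(range(len(samples)), key=lambda i: math.dist(query, samples[i]))
    return labels[best]


def jackknife_accuracy(samples, labels):
    """Leave-one-out: hold out each sample in turn and predict it from the rest."""
    correct = 0
    for i in range(len(samples)):
        rest_x = samples[:i] + samples[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        if nearest_neighbor_label(samples[i], rest_x, rest_y) == labels[i]:
            correct += 1
    return correct / len(samples)


# Made-up two-dimensional feature vectors (1 = methylated, 0 = not methylated).
toy_x = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
toy_y = [0, 0, 0, 1, 1, 1]
accuracy = jackknife_accuracy(toy_x, toy_y)
print(f"jackknife accuracy: {accuracy:.2f}")
```

In a real setting each vector would hold the selected conservation, physicochemical, and disorder features for one candidate site, and the feature selection step would come before this evaluation.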
RNA sequencing analysis was carried out on a Japanese population of the appendicularian Oikopleura dioica, a planktonic chordate characterized by rapid development and a short life cycle of 5 days, to characterize its egg and larval transcriptomes. De novo transcriptome assembly matched 16,423 proteins, corresponding to 95.4% of the protein-encoding genes deposited in OikoBase, the genome database of the Norwegian population. Nucleotide and amino acid sequence identities between the Japanese and Norwegian O. dioica were estimated to be around 91.0 and 94.8%, respectively. We discovered 175 novel protein-encoding genes: 144 unigenes were common to both the Japanese and Norwegian populations, whereas 31 unigenes were not found in the OikoBase genome reference. Among the total 12,311 unigenes, approximately 63% were detected in egg-stage RNAs, whereas 99% were detected in larval-stage RNAs; 3772 genes were up-regulated, and 1336 genes were down-regulated, more than four-fold in the larvae. Gene ontology analyses characterized gene activities in these two developmental stages. We found a messenger RNA (mRNA) 5' trans-spliced leader, which was observed in 40.8% of the total unique transcripts. It showed preferential linkage to adenine at the 5' ends of the downstream exons. Trans-splicing was observed more frequently in egg mRNAs than in larva-specific mRNAs.
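The four-fold criterion mentioned above can be sketched as a simple ratio test. This is only an illustration under stated assumptions: the function name, the pseudocount of 1.0 (added to avoid division by zero), and the expression values are hypothetical, and the abstract does not describe the normalization or statistics the authors actually applied.

```python
def classify_fold_change(egg, larva, threshold=4.0, pseudocount=1.0):
    """Label a gene by its larva/egg expression ratio.

    `pseudocount` is an assumed smoothing term, not taken from the source.
    """
    ratio = (larva + pseudocount) / (egg + pseudocount)
    if ratio > threshold:
        return "up"
    if ratio < 1.0 / threshold:
        return "down"
    return "unchanged"


# Made-up expression values (egg, larva) for three hypothetical genes.
examples = {"geneA": (1.0, 100.0), "geneB": (100.0, 1.0), "geneC": (10.0, 12.0)}
calls = {name: classify_fold_change(egg, larva) for name, (egg, larva) in examples.items()}
print(calls)
```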
Larvaceans are chordates with a tadpole-like morphology. In contrast to most chordates, whose early embryonic morphology is bilaterally symmetric and whose left–right (L–R) axis is specified by the Nodal pathway later on, invariant L–R asymmetry emerges in four-cell embryos of larvaceans. The asymmetric cell arrangements persist through development of the tailbud. The tail thus twists 90° in a counterclockwise direction relative to the trunk, and the tail nerve cord localizes on the left side. Here, we demonstrate that larvacean embryos have nonconventional L–R asymmetries: 1) L- and R-cells of the two-cell embryo had remarkably asymmetric cell fates; 2) Ca2+ oscillation occurred through embryogenesis; 3) Nodal, an evolutionarily conserved left-determining gene, was absent in the genome; and 4) bone morphogenetic protein gene (Bmp) homolog Bmp.a showed right-sided expression in the tailbud and larvae. We also showed that Ca2+ oscillation is required for Bmp.a expression, and that BMP signaling suppresses ectopic expression of neural genes. These results indicate that there is a chordate species lacking Nodal that utilizes Ca2+ oscillation and Bmp.a for embryonic L–R patterning. The right-side Bmp.a expression may have arisen via cooption of conventional BMP signaling in order to restrict neural gene expression to the left side.
The larvacean Oikopleura dioica is a planktonic chordate, a tunicate belonging to the closest relatives of vertebrates. Its simple and transparent body, invariant embryonic cell lineages, and short life cycle of 5 days make it a promising model organism for the study of developmental biology. The genome browser OikoBase was established in 2013 using Norwegian O. dioica. However, genome information for other populations is not available, even though many researchers have studied local populations. In the present study, we used Illumina and PacBio RSII technologies to sequence the genome of O. dioica from a southwestern Japanese population that was cultured in our laboratory for 3 years. The genome of Japanese O. dioica was assembled into 576 scaffold sequences with a total length and N50 length of 56.6 and 1.5 Mb, respectively. A total of 18,743 gene models (transcript models) were predicted in the genome assembly, named OSKA2016. In addition, 19,277 non‐redundant transcripts were assembled using RNA‐seq data. OSKA2016 has a global sequence similarity of only 86.5% when compared with OikoBase, highlighting the sequence difference between the two geographically distant O. dioica populations. The genome assembly, transcript assembly, and transcript models were incorporated into ANISEED (https://www.aniseed.cnrs.fr/) for genome browsing and BLAST searches. Mapping of reads obtained from male‐ or female‐specific genome libraries yielded male‐specific scaffolds in OSKA2016 and revealed that over 2.6 Mb of sequence were included in the male‐specific Y‐region. The genome and transcriptome resources from two distinct populations will be useful datasets for developmental biology, evolutionary biology, and molecular ecology using this model organism.
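For readers unfamiliar with the N50 statistic quoted above, the sketch below computes it under the common definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The function name and the scaffold lengths are made up for illustration and are not taken from OSKA2016.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths."""
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length


# Made-up scaffold lengths (arbitrary units).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```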
Background: Hydroxylation is an important post-translational modification and is closely related to various diseases. Besides biotechnology experiments, in silico prediction methods are alternative ways to identify potential hydroxylation sites. Methodology/Principal Findings: In this study, we developed a novel sequence-based method for identifying the two main types of hydroxylation sites: hydroxyproline and hydroxylysine. First, feature selection was made on three kinds of features: amino acid indices (AAindex), which include various physicochemical and biochemical properties of amino acids; Position-Specific Scoring Matrices (PSSM), which represent evolutionary information of amino acids; and structural disorder of amino acids in a sliding window of 13 amino acids. The prediction models were then built using the incremental feature selection method. As a result, the prediction accuracies are 76.0% and 82.1%, evaluated by jackknife cross-validation on the hydroxyproline dataset and hydroxylysine dataset, respectively. Feature analysis suggested that physicochemical and biochemical properties and evolutionary information of amino acids contribute much to the identification of protein hydroxylation sites, while structural disorder had little relation to protein hydroxylation. It was also found that the amino acids adjacent to the hydroxylation site tend to exert more influence on hydroxylation determination than those at other positions. Conclusions/Significance: These findings may provide useful insights for exploiting the mechanisms of hydroxylation.
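The 13-residue sliding window described above might be extracted as in the sketch below. Several details are assumptions rather than facts from the abstract: that the window is centered on each candidate proline or lysine, that sequence ends are padded with a placeholder character "X", and the function name and toy sequence themselves.

```python
def extract_windows(sequence, targets=("P", "K"), window=13, pad="X"):
    """Return (1-based position, residue, window) for each candidate residue.

    Centering on the candidate and padding with `pad` are assumptions.
    """
    half = window // 2
    padded = pad * half + sequence + pad * half
    windows = []
    for pos, residue in enumerate(sequence):
        if residue in targets:
            # In `padded`, the candidate sits at index pos + half, so the
            # slice starting at pos places it in the middle of the window.
            windows.append((pos + 1, residue, padded[pos:pos + window]))
    return windows


# Made-up peptide used only to show the output shape.
toy_windows = extract_windows("MKPA")
for position, residue, fragment in toy_windows:
    print(position, residue, fragment)
```

Each window would then be encoded with AAindex, PSSM, and disorder features before feature selection.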
Background: Genes encoding transcription factors that constitute gene-regulatory networks and maternal factors accumulating in egg cytoplasm are two classes of essential genes that play crucial roles in developmental processes. Transcription factors control the expression of their downstream target genes by interacting with cis-regulatory elements. Maternal factors initiate embryonic developmental programs by regulating the expression of zygotic genes and various other events during early embryogenesis. Results: This article documents the transcription factors of 77 metazoan species as well as human and mouse maternal factors. We improved the previous method using a statistical approach that adds Gene Ontology information to Pfam-based identification of transcription factors. This method detects previously undiscovered transcription factors. The novel features of this database are: (1) It includes both transcription factors and maternal factors, although the number of species for which maternal factors are listed is limited at the moment. (2) Ontological representation at the cell, tissue, organ, and system levels has been specially designed to facilitate developmental studies. This feature is unique to our database and is not available in other transcription factor databases. Conclusions: A user-friendly web interface, REGULATOR (http://www.bioinformatics.org/regulator/), which can help researchers to efficiently identify, validate, and visualize the data analyzed in this study, is provided. Using this web interface, users can browse, search, and download detailed information on species of interest, genes, transcription factor families, or developmental ontology terms. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0552-x) contains supplementary material, which is available to authorized users.