SUMMARY
This review summarizes the recent discovery of the cupin superfamily (from the Latin term “cupa,” a small barrel) of functionally diverse proteins that initially were limited to several higher plant proteins such as seed storage proteins, germin (an oxalate oxidase), germin-like proteins, and auxin-binding protein. Knowledge of the three-dimensional structure of two vicilins, seed proteins with a characteristic β-barrel core, led to the identification of a small number of conserved residues and thence to the discovery of several microbial proteins which share these key amino acids. In particular, there is a highly conserved pattern of two histidine-containing motifs with a varied intermotif spacing. This cupin signature is found as a central component of many microbial proteins including certain types of phosphomannose isomerase, polyketide synthase, epimerase, and dioxygenase. In addition, the signature has been identified within the N-terminal effector domain in a subgroup of bacterial AraC transcription factors. As well as these single-domain cupins, this survey has identified other classes of two-domain bicupins including bacterial gentisate 1,2-dioxygenases and 1-hydroxy-2-naphthoate dioxygenases, fungal oxalate decarboxylases, and legume sucrose-binding proteins. Cupin evolution is discussed from the perspective of the structure-function relationships, using data from the genomes of several prokaryotes, especially Bacillus subtilis. Many of these functions involve aspects of sugar metabolism and cell wall synthesis and are concerned with responses to abiotic stress such as heat, desiccation, or starvation. Particular emphasis is also given to the oxalate-degrading enzymes from microbes, their biological significance, and their value in a range of medical and other applications.
The mixed linear model has been widely used in genome-wide association studies (GWAS), but its application to multi-locus GWAS analysis has not been explored and assessed. Here, we implemented a fast multi-locus random-SNP-effect EMMA (FASTmrEMMA) model for GWAS. The model is built on random single nucleotide polymorphism (SNP) effects and a new algorithm. This algorithm whitens the covariance matrix of the polygenic matrix K and environmental noise, and specifies the number of nonzero eigenvalues as one. The model first chooses all putative quantitative trait nucleotides (QTNs) with ≤ 0.005 P-values and then includes them in a multi-locus model for true QTN detection. Owing to the multi-locus feature, the Bonferroni correction is replaced by a less stringent selection criterion. Results from analyses of both simulated and real data showed that FASTmrEMMA is more powerful in QTN detection and model fit, has less bias in QTN effect estimation and requires a less running time than existing single- and multi-locus methods, such as empirical Bayes, settlement of mixed linear model under progressively exclusive relationship (SUPER), efficient mixed model association (EMMA), compressed MLM (CMLM) and enriched CMLM (ECMLM). FASTmrEMMA provides an alternative for multi-locus GWAS.
The cupin superfamily is a group of functionally diverse proteins that are found in all three kingdoms of life, Archaea, Eubacteria, and Eukaryota. These proteins have a characteristic signature domain comprising two histidine- containing motifs separated by an intermotif region of variable length. This domain consists of six beta strands within a conserved beta barrel structure. Most cupins, such as microbial phosphomannose isomerases (PMIs), AraC- type transcriptional regulators, and cereal oxalate oxidases (OXOs), contain only a single domain, whereas others, such as seed storage proteins and oxalate decarboxylases (OXDCs), are bi-cupins with two pairs of motifs. Although some cupins have known functions and have been characterized at the biochemical level, the majority are known only from gene cloning or sequencing projects. In this study, phylogenetic analyses were conducted on the conserved domain to investigate the evolution and structure/function relationships of cupins, with an emphasis on single- domain plant germin-like proteins (GLPs). An unrooted phylogeny of cupins from a wide spectrum of evolutionary lineages identified three main clusters, microbial PMIs, OXDCs, and plant GLPs. The sister group to the plant GLPs in the global analysis was then used to root a phylogeny of all available plant GLPs. The resulting phylogeny contained three main clades, classifying the GLPs into distinct subfamilies. It is suggested that these subfamilies correlate with functional categories, one of which contains the bifunctional barley germin that has both OXO and superoxide dismutase (SOD) activity. It is proposed that GLPs function primarily as SODs, enzymes that protect plants from the effects of oxidative stress. Closer inspection of the DNA sequence encoding the intermotif region in plant GLPs showed global conservation of thymine in the second codon position, a character associated with hydrophobic residues. Since many of these proteins are multimeric and enzymatically inactive in their monomeric state, this conservation of hydrophobicity is thought to be associated with the need to maintain the various monomer- monomer interactions. The type of structure-based predictive analysis presented in this paper is an important approach for understanding gene function and evolution in an era when genomes from a wide range of organisms are being sequenced at a rapid rate.
Although nonparametric methods in genome-wide association studies (GWAS) are robust in quantitative trait nucleotide (QTN) detection, the absence of polygenic background control in single-marker association in genome-wide scans results in a high false positive rate. To overcome this issue, we proposed an integrated nonparametric method for multi-locus GWAS. First, a new model transformation was used to whiten the covariance matrix of polygenic matrix K and environmental noise. Using the transferred model, Kruskal–Wallis test along with least angle regression was then used to select all the markers that were potentially associated with the trait. Finally, all the selected markers were placed into multi-locus model, these effects were estimated by empirical Bayes, and all the nonzero effects were further identified by a likelihood ratio test for true QTN detection. This method, named pKWmEB, was validated by a series of Monte Carlo simulation studies. As a result, pKWmEB effectively controlled false positive rate, although a less stringent significance criterion was adopted. More importantly, pKWmEB retained the high power of Kruskal–Wallis test, and provided QTN effect estimates. To further validate pKWmEB, we re-analyzed four flowering time related traits in Arabidopsis thaliana, and detected some previously reported genes that were not identified by the other methods.
Summary
Theobroma cacao—The Food of the Gods, provides the raw material for the multibillion dollar chocolate industry and is also the main source of income for about 6 million smallholders around the world. Additionally, cocoa beans have a number of other nonfood uses in the pharmaceutical and cosmetic industries. Specifically, the potential health benefits of cocoa have received increasing attention as it is rich in polyphenols, particularly flavonoids. At present, the demand for cocoa and cocoa‐based products in Asia is growing particularly rapidly and chocolate manufacturers are increasing investment in this region. However, in many Asian countries, cocoa production is hampered due to many reasons including technological, political and socio‐economic issues. This review provides an overview of the present status of global cocoa production and recent advances in biotechnological applications for cacao improvement, with special emphasis on genetics/genomics, in vitro embryogenesis and genetic transformation. In addition, in order to obtain an insight into the latest innovations in the commercial sector, a survey was conducted on granted patents relating to T. cacao biotechnology.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.