Manhong Dai scite author profile

Genome-wide expression profiling is a powerful tool for implicating novel gene ensembles in cellular mechanisms of health and disease. The most popular platform for genome-wide expression profiling is the Affymetrix GeneChip. However, its selection of probes relied on earlier genome and transcriptome annotation which is significantly different from current knowledge. The resultant informatics problems have a profound impact on analysis and interpretation of the data. Here, we address these critical issues and offer a solution. We identified several classes of problems at the individual probe level in the existing annotation, under the assumption that current genome and transcriptome databases are more accurate than those used for GeneChip design. We then reorganized probes on more than a dozen popular GeneChips into gene-, transcript- and exon-specific probe sets in light of up-to-date genome, cDNA/EST clustering and single nucleotide polymorphism information. Comparing analysis results between the original and the redefined probe sets reveals ∼30–50% discrepancy in the genes previously identified as differentially expressed, regardless of analysis method. Our results demonstrate that the original Affymetrix probe set definitions are inaccurate, and many conclusions derived from past GeneChip analyses may be significantly flawed. It will be beneficial to re-analyze existing GeneChip data with updated probe set definitions.


HIV infection reveals widespread expansion of novel centromeric human endogenous retroviruses

Contreras-Galindo

Kaplan

et al. 2013

Genome Res.

107


Human endogenous retroviruses (HERVs) make up 8% of the human genome. The HERV-K (HML-2) family is the most recent group of these viruses to have inserted into the genome, and we have detected the activation of HERV-K (HML-2) proviruses in the blood of patients with HIV-1 infection. We report that HIV-1 infection activates expression of a novel HERV-K (HML-2) provirus, termed K111, present in multiple copies in the centromeres of chromosomes throughout the human genome yet not annotated in the most recent human genome assembly. Infection with HIV-1 or stimulation with the HIV-1 Tat protein leads to the activation of K111 proviruses. K111 is present as a single copy in the genome of the chimpanzee, yet K111 is not found in the genomes of other primates. Remarkably, K111 proviruses appear in the genomes of the extinct Neanderthal and Denisovan, while modern humans have at least 100 K111 proviruses spread across the centromeres of 15 chromosomes. Our studies suggest that the progenitor K111 integrated before the Homo-Pan divergence and expanded in copy number during the evolution of hominins, perhaps by recombination. The expansion of K111 provides sequence evidence suggesting that recombination between the centromeres of various chromosomes took place during the evolution of humans. K111 proviruses show significant sequence variations in each individual centromere, which may serve as markers in future efforts to annotate human centromere sequences. Further, this work is an example of the potential to discover previously unknown genomic sequences through the analysis of nucleic acids found in the blood of patients.


NGSQC: cross-platform quality analysis pipeline for deep sequencing data

et al. 2010


Background: While the accuracy and precision of deep sequencing data are significantly better than those obtained by the earlier generation of hybridization-based high throughput technologies, the digital nature of deep sequencing output often leads to unwarranted confidence in their reliability. Results: The NGSQC (Next Generation Sequencing Quality Control) pipeline provides a set of novel quality control measures for quickly detecting a wide variety of quality issues in deep sequencing data derived from two dimensional surfaces, regardless of the assay technology used. It also enables researchers to determine whether sequencing data related to their most interesting biological discoveries are caused by sequencing quality issues. Conclusions: Next generation sequencing platforms have their own share of quality issues and there can be significant lab-to-lab, batch-to-batch and even within chip/slide variations. NGSQC can help to ensure that biological conclusions, in particular those based on relatively rare sequence alterations, are not caused by low quality sequencing.


G protein-linked signaling pathways in bipolar and major depressive disorders

et al. 2013


The G-protein linked signaling system (GPLS) comprises a large number of G-proteins, G protein-coupled receptors (GPCRs), GPCR ligands, and downstream effector molecules. G-proteins interact with both GPCRs and downstream effectors such as cyclic adenosine monophosphate (cAMP), phosphatidylinositols, and ion channels. The GPLS is implicated in the pathophysiology and pharmacology of both major depressive disorder (MDD) and bipolar disorder (BPD). This study evaluated whether GPLS is altered at the transcript level. Gene expression in the dorsolateral prefrontal cortex (DLPFC) and anterior cingulate cortex (ACC) was compared among MDD, BPD, and control subjects using Affymetrix Gene Chips and real time quantitative PCR. High quality brain tissue was used in the study to control for confounding effects of agonal events, tissue pH, RNA integrity, gender, and age. GPLS signaling transcripts were altered especially in the ACC of BPD and MDD subjects. Transcript levels of molecules which repress cAMP activity were increased in BPD and decreased in MDD. Two orphan GPCRs, GPRC5B and GPR37, showed significantly decreased expression levels in MDD, and significantly increased expression levels in BPD. Our results suggest opposite changes in BPD and MDD in the GPLS, “activated” cAMP signaling activity in BPD and “blunted” cAMP signaling activity in MDD. GPRC5B and GPR37 both appear to have behavioral effects, and are also candidate genes for neurodegenerative disorders. In the context of the opposite changes observed in BPD and MDD, these GPCRs warrant further study of their brain effects.


SNP Function Portal: a web database for exploring the function implication of SNP alleles

Wang

Dai

Xuan

et al. 2006


Expansion of a novel endogenous retrovirus throughout the pericentromeres of modern humans

et al. 2015


Background: Approximately 8% of the human genome consists of sequences of retroviral origin, a result of ancestral infections of the germ line over millions of years of evolution. The most recent of these infections is attributed to members of the human endogenous retrovirus type-K (HERV-K) (HML-2) family. We recently reported that a previously undetected, large group of HERV-K (HML-2) proviruses, which are descendants of the ancestral K111 infection, are spread throughout human centromeres. Results: Studying the genomes of certain cell lines and the DNA of healthy individuals that seemingly lack K111, we discover new HERV-K (HML-2) members hidden in pericentromeres of several human chromosomes. All are related through a common ancestor, termed K222, which is a virus that infected the germ line approximately 25 million years ago. K222 exists as a single copy in the genomes of baboons and high order primates, but not New World monkeys, suggesting that progenitor K222 infected the primate germ line after the split between New and Old World monkeys. K222 exists in modern humans at multiple loci spread across the pericentromeres of nine chromosomes, indicating it was amplified during the evolution of modern humans. Conclusions: Copying of K222 may have occurred through recombination of the pericentromeres of different chromosomes during human evolution. Evidence of recombination between K111 and K222 suggests that these retroviral sequences have been templates for frequent cross-over events during the process of centromere recombination in humans.


Regulation of the Human Endogenous Retrovirus K (HML-2) Transcriptome by the HIV-1 Tat Protein

Gonzalez-Hernandez

Cavalcoli

Sartor

et al. 2014

J Virol


Approximately 8% of the human genome is made up of endogenous retroviral sequences. As the HIV-1 Tat protein activates the overall expression of the human endogenous retrovirus type K (HERV-K) (HML-2), we used next-generation sequencing to determine which of the 91 currently annotated HERV-K (HML-2) proviruses are regulated by Tat. Transcriptome sequencing of total RNA isolated from Tat- and vehicle-treated peripheral blood lymphocytes from a healthy donor showed that Tat significantly activates expression of 26 unique HERV-K (HML-2) proviruses, silences 12, and does not significantly alter the expression of the remaining proviruses. Quantitative reverse transcription-PCR validation of the sequencing data was performed on Tat-treated PBLs of seven donors using provirus-specific primers and corroborated the results with a substantial degree of quantitative similarity. IMPORTANCE: The expression of HERV-K (HML-2) is tightly regulated but becomes markedly increased following infection with HIV-1, in part due to the HIV-1 Tat protein. The findings reported here demonstrate the complexity of the genome-wide regulation of HERV-K (HML-2) expression by Tat. This work also demonstrates that although HERV-K (HML-2) proviruses in the human genome are highly similar in terms of DNA sequence, modulation of the expression of specific proviruses in a given biological situation can be ascertained using next-generation sequencing and bioinformatics analysis.


Massively parallel enzyme kinetics reveals the substrate recognition landscape of the metalloprotease ADAMTS13

Kretz

Dai

Söylemez

et al. 2015

Proc. Natl. Acad. Sci. U.S.A.


Proteases play important roles in many biologic processes and are key mediators of cancer, inflammation, and thrombosis. However, comprehensive and quantitative techniques to define the substrate specificity profile of proteases are lacking. The metalloprotease ADAMTS13 regulates blood coagulation by cleaving von Willebrand factor (VWF), reducing its procoagulant activity. A mutagenized substrate phage display library based on a 73-amino acid fragment of VWF was constructed, and the ADAMTS13-dependent change in library complexity was evaluated over reaction time points, using high-throughput sequencing. Reaction rate constants (kcat/KM) were calculated for nearly every possible single amino acid substitution within this fragment. This massively parallel enzyme kinetics analysis detailed the specificity of ADAMTS13 and demonstrated the critical importance of the P1-P1′ substrate residues while defining exosite binding domains. These data provided empirical evidence for the propensity for epistasis within VWF and showed strong correlation to conservation across orthologs, highlighting evolutionary selective pressures for VWF.

phage display | protease | high-throughput sequencing | ADAMTS13 | von Willebrand factor

Protease specificity is critical for maintaining diversity and compartmentalization of function, and is tightly controlled. For many proteases, a substrate initially docks to an exosite, which captures and orients the substrate scissile bond toward the active site of the enzyme. At the active site, the Px-Px′ (1) substrate amino acid side chains align with the complementary Sx-Sx′ pockets of the enzyme to optimize recognition by the active site residues that execute the proteolytic reaction (2).

Conventional techniques for probing the substrate recognition requirements of a protease are cumbersome and time-consuming and require intimate knowledge of the enzyme/substrate pair. Such methods include engineering deletion mutants (3), use of competitive ligands (4, 5), and site-directed mutagenesis (6, 7). In contrast to these techniques, substrate phage display is a high-throughput, unbiased approach to studying protease substrate specificity (8-10). In this method, a library consisting of 10^6-10^9 independent phage clones, each expressing a unique potential substrate on its surface, is panned for multiple rounds with a protease, and the cleaved or uncleaved phages after each reaction are removed and amplified for subsequent rounds of selection. In this manner, the library complexity is iteratively reduced and becomes populated by peptide sequences that are most informative. This methodology, although useful, is limited by the number of clones selected for individual Sanger sequencing after the last round of selection, and the selection of phages based on competitive growth advantages unrelated to enzyme specificity. The availability of high-throughput DNA sequencing technology (11) has facilitated detailed analysis of the changing complexity within a phage display library (12-16) without requiring multip...



Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data
