Si Yang Liu scite author profile

Sequencing and de novo assembly of 150 genomes from Denmark as a population reference

Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits 1-4. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly 2,5-7. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to that obtained using the more expensive long-read technology 4,8-13. We use the assemblies to identify a rich set of structural variants, including many novel insertions, and demonstrate how this variant catalogue enables further deciphering of known association mapping signals. We leverage the assemblies to provide 100 completely resolved major histocompatibility complex haplotypes and to resolve major parts of the Y chromosome. Our study provides a regional reference genome that we expect will improve the power of future association mapping studies and hence pave the way for precision medicine initiatives, which are now being launched in many countries, including Denmark.

Using a combination of high-depth (average 78×) Illumina paired-end and mate-pair libraries, we applied Allpaths-LG 14 to create high-quality, high-coverage de novo assemblies for each of the 150 individuals, with a median scaffold N50 of ~21 megabases (Mb; maximum ~30 Mb) (Supplementary Table 1). In each of the 140 best assemblies, the 100 largest scaffolds typically covered more than 75% of the genome (median 77%, Extended Data Fig. 1a), with the largest scaffolds exceeding 110 Mb in size (Supplementary Table 1). To evaluate the accuracy of the assemblies, we aligned the scaffolds for each individual to the human reference genome (GRCh38) 15. Figure 1 shows an example individual in which the euchromatic part of each chromosome was almost completely covered by a few large scaffolds, and in several cases scaffolds covered almost entire chromosome arms. Only rarely did large scaffolds break and align to more than one chromosome (Extended Data Fig. 1b), suggesting that even the largest scaffolds are seldom chimaeric. We also compared our de novo assemblies with a published long-read assembly based on BioNano mapping and PacBio sequencing 16. Extended Data Figs 2a and 3 show that this assembly was less complete than our assemblies but had similar scaffold lengths. The long-read assembly had 5.38% missing data compared with our median of 4.25% (Extended Data Fig. 3a), but the missing data in our assemblies were found in smaller gaps (Extended Data Fig. 3b, c), and the median contig length was therefore much smaller th...
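
The assemblies are summarised by two statistics: scaffold N50, and the fraction of the genome covered by the 100 largest scaffolds. As a rough illustration only (not the study's pipeline), both can be computed from a list of scaffold lengths. The function names `n50` and `top_k_coverage` are hypothetical, and `top_k_coverage` assumes scaffolds do not overlap and ignores gap bases inside scaffolds, which is a simplification.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from longest to shortest until half the total is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def top_k_coverage(lengths, k, genome_size):
    """Fraction of genome_size spanned by the k longest sequences.

    Simplifying assumption: sequences do not overlap and gaps are counted.
    """
    return sum(sorted(lengths, reverse=True)[:k]) / genome_size
```

For scaffold lengths 2 to 10 (total 54), the running sum first reaches half the total (27) at the third-longest scaffold, so `n50` returns 8.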

DNA methylation and mRNA and microRNA expression of SLE CD4+ T cells correlate with disease phenotype

Zhao, Liu, Luo et al. 2014

Journal of Autoimmunity

156

Bisulfite Sequencing Reveals That Aspergillus flavus Holds a Hollow in DNA Methylation

Liu, Lin et al. 2012

PLoS ONE

Aspergillus flavus first gained scientific attention for its production of aflatoxin, and the regulation of aflatoxin biosynthesis has served as a theoretical model for the biosynthesis of other microbial secondary metabolites. Nevertheless, whether A. flavus carries DNA methylation, an important epigenomic modification involved in gene regulation, has remained controversial for several decades. Here, we applied bisulfite sequencing in conjunction with a biological replicate strategy to profile DNA methylation across the A. flavus genome. Both the bisulfite sequencing data and methylome comparisons with other fungi confirm that the DNA methylation level of this fungus is negligible. Further investigation of the Aspergillus DNA methyltransferase revealed its close relationship with RID-like enzymes and its divergence from the methyltransferases of species with validated DNA methylation. The scarcity of repeat content in the A. flavus genome and the high RIP index of the few remnant repeats potentially support our speculation that DNA methylation may be absent in A. flavus, or that de novo DNA methylation may occur only transiently during the obscure sexual stage of this fungal species. This work contributes to our understanding of the DNA methylation status of A. flavus and reinforces our view of DNA methylation in fungal species. In addition, our strategy of applying bisulfite sequencing to detect DNA methylation in species with low methylation levels may serve as a reference for future investigations in other hypomethylated species.

Association of the PTPN22/LYP gene with type 1 diabetes

et al. 2006

Systematic assessment of reduced representation bisulfite sequencing to human blood samples: A promising method for large-sample-scale epigenomic studies

Wang, Sun et al. 2012

Journal of Biotechnology

Prevalence and evolution of drug resistance HIV-1 variants in Henan, China

et al. 2005

Cell Res

This study aimed to understand the prevalence and evolution of drug-resistant HIV strains in Henan, China, after the implementation of free antiretroviral therapy for AIDS patients. In the southern part of Henan province, we recruited 45 drug-naïve AIDS patients, 118 AIDS patients who had received three months of antiretroviral therapy and 124 AIDS patients who had received six months of antiretroviral therapy. Information on general condition, antiretroviral medicines, adherence and clinical syndromes was collected by face-to-face interview, and 14 ml of EDTA-anticoagulated blood was drawn. CD4/CD8 T cell counts, viral load and genotypic drug resistance were tested. The rates of clinical improvement were 55.1% and 50.8% three months and six months after the start of antiretroviral therapy, respectively. The mean CD4 cell count after antiretroviral therapy was significantly higher than in drug-naïve patients. The prevalence of drug-resistant HIV strains was 13.9%, 45.4% and 62.7% in drug-naïve patients, three-month treatment patients and six-month treatment patients, respectively. The number of codons carrying resistance mutations and the frequency of mutations increased significantly with continued antiretroviral therapy. Mutations were located primarily at codons 103, 106 and 215 in the three-month treatment group and expanded to 15 mutated codons in the six-month treatment group. From this result, we inferred that the evolution of drug-resistant strains begins with strains highly resistant to NNRTIs and then proceeds to strains with low-level resistance to NRTIs. HIV strains with high-level resistance to NVP and low-level resistance to AZT and DDI were highly prevalent because of the AZT+DDI+NVP combination therapy. These strains were also cross-resistant to DLV, EFV, DDC and D4T. Poor adherence to therapy was believed to be the main reason for the emergence and prevalence of drug-resistant HIV strains.
The prevalence of drug-resistant HIV strains increased with continued antiretroviral therapy in the southern part of Henan province. Measures that promote high adherence, provide new drugs and change ART regimens in failing patients should be implemented as soon as possible.

Large-scale inference of population structure in presence of missingness using PCA

Meisner, Liu, Huang et al. 2021

Motivation: Principal component analysis (PCA) is a commonly used tool in genetics to capture and visualize population structure. Due to technological advances in sequencing, such as the widely used non-invasive prenatal test, massive datasets of ultra-low-coverage sequencing are being generated. These datasets are characterized by a large amount of missing genotype information.

Results: We present EMU, a method for inferring population structure in the presence of rampant non-random missingness. We show through simulations that several commonly used PCA methods cannot handle missing data arising from various sources, which leads to biased results because individuals are projected into the PC space according to their amount of missingness. In terms of accuracy, EMU outperforms an existing method that also accommodates missingness, while being competitively fast. We further tested EMU on around 100K individuals from the Phase 1 dataset of the Chinese Millionome Project, who were shallowly sequenced to around 0.08×. From these data we were able to capture the population structure of the Han Chinese and to reproduce previous analyses in a matter of CPU hours instead of CPU years. EMU's capability to accurately infer population structure in the presence of missingness will be of increasing importance with the rising number of large-scale genetic datasets.

Availability: EMU is written in Python and is freely available at https://github.com/rosemeis/emu.

Supplementary information: Supplementary data are available at Bioinformatics online.
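
The general idea of PCA under missingness, re-estimating missing genotypes from a low-rank reconstruction and recomputing the decomposition until it settles, can be illustrated with a simplified EM-style iterative SVD. This is a minimal sketch, not EMU's implementation (the repository above has that). The function name `em_pca`, the 0/1/2 allele-count coding with `NaN` for missing calls, and the convergence rule are assumptions, and every site is assumed to have at least one observed genotype.

```python
import numpy as np


def em_pca(G, k, n_iter=100, tol=1e-6):
    """Simplified EM-style PCA for genotypes with missing calls (illustrative).

    G: (individuals x sites) array of 0/1/2 allele counts, np.nan = missing.
    k: number of principal components to estimate.
    Returns individual scores (U * S) with shape (individuals, k).
    Assumption: each site has at least one observed genotype.
    """
    G = np.asarray(G, dtype=float)
    missing = np.isnan(G)
    # Allele frequency per site, estimated from observed genotypes only.
    freq = np.nanmean(G, axis=0) / 2.0
    # Centre each site; missing cells start at 0, i.e. the site mean.
    E = G - 2.0 * freq
    E[missing] = 0.0
    for _ in range(n_iter):
        if not missing.any():
            break
        U, S, Vt = np.linalg.svd(E, full_matrices=False)
        # Rank-k reconstruction of the centred matrix.
        recon = (U[:, :k] * S[:k]) @ Vt[:k]
        change = np.sqrt(np.mean((recon[missing] - E[missing]) ** 2))
        # Overwrite only the missing cells; observed cells stay fixed.
        E[missing] = recon[missing]
        if change < tol:
            break
    U, S, _ = np.linalg.svd(E, full_matrices=False)
    return U[:, :k] * S[:k]
```

Initialising missing cells at zero amounts to imputing the site mean after centring; each iteration then refines only those cells, so observed genotypes are never altered. Unlike naive mean imputation alone, this keeps individuals with many missing calls from being pulled toward the origin of the PC space.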

Multiomics analysis reveals a distinct response mechanism in multiple primary lung adenocarcinoma after neoadjuvant immunotherapy

Zhang, Yin, Liu et al. 2021

J Immunother Cancer

Multiple primary lung cancer (MPLC) remains challenging to diagnose and treat. Although neoadjuvant immunotherapy has shown promising results in early-stage non-small cell lung cancer, whether this modality can benefit all primary lesions remains unclear. Herein, we performed an integrated multiomics analysis of one patient with early-stage MPLC who showed remarkable tumor shrinkage in a solid nodule and no response in two subsolid nodules after three cycles of neoadjuvant pembrolizumab. Genomic heterogeneity was observed among responding nodules with high levels of infiltrating CD8+ and CD68+ immune cells. Substantially downregulated human leukocyte antigen (HLA)-related genes and impaired T lymphocyte function were observed in non-responding nodules. A larger proportion of infiltrating tissue-resident memory T cells (Trm) and high T cell receptor repertoire clonality were found in responding nodules, and these features were validated as predictive and prognostic biomarkers in multiple cancer types using external public datasets. These results suggest that neoadjuvant programmed death 1 (PD-1)/programmed death ligand 1 inhibitors alone may not be an optimal therapeutic strategy for MPLC because of disparities in genomic alterations and immune microenvironment among different lesions. Additionally, we postulate that increased infiltration of Trm may be a unique marker of early immune responses to PD-1 blockade.

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Part of the Research Solutions Family.
