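Incorporating genetic networks into case-control association studies with high-dimensional DNA methylation data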

Background: In human genetic association studies with high-dimensional gene expression data, it is well known that statistical selection methods that use prior biological network knowledge, such as genetic pathways and signaling pathways, can outperform methods that ignore genetic network structure in terms of true positive selection. In recent epigenetic research on case-control association studies, a relatively large number of statistical methods have been proposed to identify cancer-related CpG sites and their corresponding genes from high-dimensional DNA methylation array data. However, most existing methods are not designed to use genetic network information, even though methylation levels of genes linked in a genetic network tend to be highly correlated with each other.

Results: We propose a new approach that combines data dimension reduction techniques with network-based regularization to identify outcome-related genes in the analysis of high-dimensional DNA methylation data. In simulation studies, we demonstrated that the proposed approach outperforms other statistical methods that do not use genetic network information in terms of true positive selection. We also applied it to the 450K DNA methylation array data of the four breast invasive carcinoma subtypes from The Cancer Genome Atlas (TCGA) project.

Conclusions: The proposed variable selection approach can use prior biological network information in the analysis of high-dimensional DNA methylation array data. It first captures gene-level signals from multiple CpG sites using a data dimension reduction technique and then performs network-based regularization based on biological network graph information. It can select potentially cancer-related genes and genetic pathways that were missed by existing methods.

Electronic supplementary material: The online version of this article (10.1186/s12859-019-3040-x) contains supplementary material, which is available to authorized users.
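The two stages named in this abstract (a gene-level summary of several CpG sites, then a penalty that uses the network graph) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the gene-level summary is the first principal component found by power iteration, the penalty is the normalized graph Laplacian quadratic form, and the function names `first_pc_scores` and `network_penalty` are hypothetical.

```python
import math

def first_pc_scores(rows, iters=200):
    """Project samples onto the first principal component of one gene's CpG columns.

    rows: list of samples, each a list of methylation values for the gene's CpG sites.
    Assumption: power iteration from a uniform start vector (fine for a sketch,
    but it can stall if that vector is orthogonal to the leading component).
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in rows]  # column-centered data
    # p x p cross-product matrix X^T X
    C = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]

def network_penalty(beta, edges):
    """Normalized-Laplacian quadratic form: sum over edges of
    (beta_u / sqrt(d_u) - beta_v / sqrt(d_v))^2, where d is the node degree.

    beta: dict mapping gene name to coefficient; edges: list of (gene, gene) pairs.
    """
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return sum(
        (beta[u] / math.sqrt(deg[u]) - beta[v] / math.sqrt(deg[v])) ** 2
        for u, v in edges
    )

# Toy usage: three samples, one gene with two CpG sites, and a three-gene chain network.
gene_signal = first_pc_scores([[0.1, 0.2], [0.2, 0.4], [0.3, 0.6]])
penalty = network_penalty({"A": 1.0, "B": 1.0, "C": 0.0}, [("A", "B"), ("B", "C")])
```

The penalty is small when linked genes have similar degree-scaled coefficients, which is one way a fitting procedure can be encouraged to select neighboring genes together.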


Population Structure and Genetic Diversity in Korean Cowpea Germplasm Based on SNP Markers

Seo, Kim, Jun, et al. 2020. Plants


Cowpea is one of the most essential legume crops providing inexpensive dietary protein and nutrients. The aim of this study was to understand the genetic diversity and population structure of global and Korean cowpea germplasms. A total of 384 cowpea accessions from 21 countries were genotyped with the Cowpea iSelect Consortium Array containing 51,128 single-nucleotide polymorphisms (SNPs). After SNP filtering, a genetic diversity study was carried out using 35,116 SNPs within 376 cowpea accessions, including 229 Korean accessions. Based on structure and principal component analysis, a total of 376 global accessions were divided into four major populations. Accessions in group 1 were from Asia and Europe, those in groups 2 and 4 were from Korea, and those in group 3 were from West Africa. In addition, 229 Korean accessions were divided into three major populations (Q1, Jeonra province; Q2, Gangwon province; Q3, a mixture of provinces). Additionally, the neighbor-joining tree indicated similar results. Further genetic diversity analysis within the global and Korean population groups indicated low heterozygosity, a low polymorphism information content, and a high inbreeding coefficient in the Korean cowpea accessions. The population structure analysis will provide useful knowledge to support the genetic potential of the cowpea breeding program, especially in Korea.
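The diversity statistics named in this abstract (heterozygosity, polymorphism information content, inbreeding coefficient) have standard definitions for a biallelic SNP, sketched below. The genotype coding (0, 1, 2 copies of the alternate allele) and the function name `snp_diversity` are assumptions for illustration; the abstract does not say which software or coding the authors used.

```python
def snp_diversity(genotypes):
    """Diversity statistics for one biallelic SNP.

    genotypes: list of alternate-allele counts per accession (0, 1, or 2).
    Returns expected heterozygosity (He), observed heterozygosity (Ho),
    polymorphism information content (PIC), and inbreeding coefficient (F).
    """
    n = len(genotypes)
    p = sum(genotypes) / (2 * n)          # alternate allele frequency
    q = 1 - p                             # reference allele frequency
    he = 1 - p * p - q * q                # expected heterozygosity, equals 2pq
    ho = sum(1 for g in genotypes if g == 1) / n
    pic = he - 2 * p * p * q * q          # Botstein-style PIC for two alleles
    f = 1 - ho / he if he > 0 else 0.0    # F = 1 - Ho/He (undefined for monomorphic SNPs)
    return {"He": he, "Ho": ho, "PIC": pic, "F": f}

# Toy usage with eight accessions.
stats = snp_diversity([0, 0, 1, 2, 2, 2, 1, 0])
```

Under these definitions, a mostly homozygous collection gives a low Ho relative to He and therefore a high F, the pattern the abstract reports for the Korean accessions.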


Genetic Diversity and Genome-Wide Association Study of Seed Aspect Ratio Using a High-Density SNP Array in Peanut (Arachis hypogaea L.)

Zou, Kim, Kim, et al. 2020. Genes


Peanut (Arachis hypogaea L.) is one of the world's important oil crops. In this study, we aimed to evaluate the genetic diversity of 384 peanut germplasms, including 100 Korean germplasms and 284 core collections from the United States Department of Agriculture (USDA), using an Axiom_Arachis array with 58K single-nucleotide polymorphisms (SNPs). We evaluated the evolutionary relationships among the 384 peanut germplasms using a genome-wide association study (GWAS) of seed aspect ratio data processed by ImageJ software. In total, 14,030 filtered polymorphic SNPs were identified from the peanut 58K SNP array. We identified five SNPs significantly associated with seed aspect ratio on chromosomes Aradu.A09, Aradu.A10, Araip.B08, and Araip.B09. AX-177640219 on chromosome Araip.B08 was the most significantly associated marker in both the GAPIT and regularization analyses. Phosphoenolpyruvate carboxylase (PEPC) was found among the eleven genes within the linkage disequilibrium (LD) region of the significant SNPs on Araip.B08 and could have a strong causal effect in determining seed aspect ratio. The results of the present study provide information and methods that are useful for further genetic and genomic studies as well as molecular breeding programs in peanuts.
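As a rough illustration of how LD between a significant SNP and a nearby marker can be quantified, the sketch below computes the squared Pearson correlation of genotype dosages, a commonly used proxy for LD r². This is an assumption for illustration only; the abstract does not state how LD was estimated, and the function name `genotype_r2` is hypothetical.

```python
def genotype_r2(g1, g2):
    """Squared Pearson correlation between two SNPs' genotype dosages (0, 1, 2).

    Returns 0.0 when either SNP is monomorphic, since the correlation is undefined.
    """
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0
    return cov * cov / (v1 * v2)

# Toy usage: two markers that vary independently across four lines.
r2 = genotype_r2([0, 0, 2, 2], [0, 2, 0, 2])
```

Markers whose r² with a lead SNP exceeds a chosen cutoff are often treated as belonging to the same LD region when listing candidate genes; the cutoff itself is a study-specific choice.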


An empirical threshold of selection probability for analysis of high-dimensional correlated data

Kim, Koo, Sun. 2020. Journal of Statistical Computation and Simulation


New variable selection strategy for analysis of high-dimensional DNA methylation data

Choi, Kim, Sun. 2018. J. Bioinform. Comput. Biol.


In genetic association studies, regularization methods are often used because of their computational efficiency in the analysis of high-dimensional genomic data. DNA methylation data generated from the Infinium HumanMethylation450 BeadChip Kit have a group structure in which an individual gene consists of multiple Cytosine-phosphate-Guanine (CpG) sites. Consequently, group-based regularization can precisely detect outcome-related CpG sites. Representative examples are sparse group lasso (SGL) and network-based regularization. The former is powerful when most of the CpG sites within the same gene are associated with a phenotype outcome. In contrast, the latter is preferred when only a few of the CpG sites within the same gene are related to the outcome. In this paper, we propose a new variable selection strategy based on a selection probability that measures how frequently individual variables are selected by both SGL and network-based regularization. In an extensive simulation study, we demonstrated that the proposed strategy shows comparatively strong selection performance across the simulated scenarios, compared with both SGL and network-based regularization. We also applied the proposed strategy to identify differentially methylated CpG sites and their corresponding genes from ovarian cancer data.
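A selection probability of this kind can be sketched as a selection frequency over repeated fits. The sketch below assumes that each method has already been run on several resamples and has returned the set of CpG sites it selected each time. How the two methods' frequencies are combined is not specified in the abstract, so taking the larger of the two is purely an illustrative assumption, and the function names are hypothetical.

```python
from collections import Counter

def selection_frequency(runs, variables):
    """Fraction of runs in which each variable was selected.

    runs: list of collections of selected variable names, one per resample.
    """
    counts = Counter(v for selected in runs for v in set(selected))
    return {v: counts[v] / len(runs) for v in variables}

def combined_selection_probability(sgl_runs, net_runs, variables):
    """Combine per-method frequencies (assumption: take the larger of the two)."""
    sgl = selection_frequency(sgl_runs, variables)
    net = selection_frequency(net_runs, variables)
    return {v: max(sgl[v], net[v]) for v in variables}

# Toy usage with three CpG sites and four resamples per method.
cpgs = ["cg1", "cg2", "cg3"]
sgl_runs = [{"cg1", "cg2"}, {"cg1"}, {"cg1"}, {"cg1", "cg3"}]
net_runs = [{"cg2"}, {"cg2"}, {"cg1", "cg2"}, set()]
probs = combined_selection_probability(sgl_runs, net_runs, cpgs)
```

Ranking variables by such a probability, rather than by a single fit, lets a site favored by either method surface near the top, which matches the motivation the abstract gives for using both regularizers.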


Kipoong Kim

Incorporating genetic networks into case-control association studies with high-dimensional DNA methylation data
