The MixGHD package for R performs model-based clustering, classification, and discriminant analysis using the generalized hyperbolic distribution (GHD). This approach is suitable for data that can be considered a realization of a (multivariate) continuous random variable. The GHD is flexible because of its skewness, concentration, and index parameters; as such, clustering methods that use this distribution can estimate clusters of different shapes. The package provides five different models, all based on the GHD, an efficient routine for discriminant analysis, and a function to measure cluster agreement. This paper is split into three parts: the first is devoted to the formulation of each method and its extension to classification and discriminant analysis; the second focuses on the algorithms; and the third shows the use of the package on real datasets.
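As an illustration of what measuring cluster agreement can look like, here is a minimal sketch. The abstract does not say which measure the package uses, so the adjusted Rand index is assumed here as one common choice. The code is plain Python rather than R, and the function name `adjusted_rand_index` is hypothetical; it is not the package's own routine.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same observations.

    Assumption: this is a generic textbook formulation, not code taken
    from MixGHD.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same observations")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Pairs of observations placed together in both partitions.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / comb(n, 2)  # chance-level agreement
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # both partitions are trivial and identical
    return (together_both - expected) / (maximum - expected)


# Label names do not matter, only which observations are grouped together.
same_grouping = adjusted_rand_index([0, 0, 1, 1], ["b", "b", "a", "a"])
crossed_grouping = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A value of 1 indicates identical groupings, values near 0 indicate chance-level agreement, and negative values indicate less agreement than expected by chance.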
Recently, Gill and Chien introduced a new radial quadrature for multiexponential integrands (MultiExp grid) to deal with the radial part of the numerical integration. In this article, the MultiExp grid is studied and used to integrate the charge density. The MultiExp grid, along with an optimal pruning scheme, performed very well both in terms of accuracy and efficiency compared to other radial mappings commonly used in Density Functional Theory.
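To make the idea of integrating a charge density over a radial grid concrete, here is a minimal sketch. It does not implement the MultiExp grid, whose nodes and weights are not given in this abstract; Gauss-Laguerre quadrature from NumPy is swapped in as a stand-in radial rule. The two-exponential density, its exponents, and the function names are all illustrative assumptions.

```python
import numpy as np

# Hypothetical spherically symmetric model density: a sum of two exponentials,
# each normalized to hold one electron, so the exact electron count is 2.
EXPONENTS = np.array([2.0, 6.0])
ELECTRONS_PER_TERM = np.array([1.0, 1.0])
# Integral of 4*pi*r^2 * c*exp(-a*r) over [0, inf) equals 8*pi*c / a^3.
COEFFS = ELECTRONS_PER_TERM * EXPONENTS**3 / (8.0 * np.pi)


def model_density(r):
    """Evaluate the model density at an array of radii."""
    r = np.asarray(r, dtype=float)
    return np.sum(COEFFS[:, None] * np.exp(-EXPONENTS[:, None] * r[None, :]), axis=0)


def radial_electron_count(density, n_points, scale):
    """Approximate N = integral of 4*pi*r^2*rho(r) dr with Gauss-Laguerre nodes.

    The rule integrates exp(-x)*f(x), so the weight is undone by multiplying
    by exp(x); `scale` maps the nodes x onto radii r = scale * x.
    """
    x, w = np.polynomial.laguerre.laggauss(n_points)
    r = scale * x
    integrand = 4.0 * np.pi * r**2 * density(r)
    return scale * np.sum(w * np.exp(x) * integrand)


# Scale chosen to match the slowest-decaying exponential (an assumption).
scale = 1.0 / EXPONENTS.min()
count_coarse = radial_electron_count(model_density, 4, scale)
count_fine = radial_electron_count(model_density, 32, scale)
print(f"4 points:  {count_coarse:.10f}")
print(f"32 points: {count_fine:.10f}")
```

Comparing the two printed values against the exact count of 2 shows the accuracy-versus-grid-size trade-off that motivates both the choice of radial mapping and the pruning of grid points.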
Motivation
Genotype imputation, though generally accurate, often results in many genotypes being poorly imputed, particularly in studies where the individuals are not well represented by standard reference panels. When individuals in the study share regions of the genome identical by descent (IBD), it is possible to use this information in combination with a study-specific reference panel (SSRP) to improve the imputation results. Kinpute uses IBD information—due to recent, familial relatedness or distant, unknown ancestors—in conjunction with the output from linkage disequilibrium (LD) based imputation methods to compute more accurate genotype probabilities. Kinpute uses a novel method for IBD imputation, which works even in the absence of a pedigree, and results in substantially improved imputation quality.
Results
Given initial estimates of average IBD between subjects in the study sample, Kinpute uses a novel algorithm to select an optimal set of individuals to sequence and use as an SSRP. Kinpute is designed to use as input both this SSRP and the genotype probabilities output from other LD-based imputation software, and uses a new method to combine the LD imputed genotype probabilities with IBD configurations to substantially improve imputation. We tested Kinpute on a human population isolate where 98 individuals have been sequenced. In half of this sample, whose sequence data was masked, we used Impute2 to perform LD-based imputation and Kinpute was used to obtain higher accuracy genotype probabilities. Measures of imputation accuracy improved significantly, particularly for those genotypes that Impute2 imputed with low certainty.
Availability and implementation
Kinpute is an open-source and freely available C++ software package that can be downloaded from.
Supplementary information
Supplementary data are available at Bioinformatics online.
In a previous study, we compared, both in terms of accuracy and efficiency, the performance of some of the well-known grids, which use the Becke partitioning scheme for molecular numerical integration. We concluded, based on the number of electrons only, that the MultiExp grid performed well compared with the grids proposed by Becke, Gill et al., and Treutler and Ahlrichs. In this work, we re-examine the performance of the same set of grids in addition to the SG-0 grid and a benchmark grid. These grids are evaluated by integrating the Hartree–Fock electron density to calculate the number of electrons, dipole moment, potential energy, and Coulomb repulsion energy. Our results show that, except for the large benchmark grid, none of these grids were completely satisfactory.