Il Do HA, Jianxin PAN, Seungyoung OH, and Youngjo LEE
Variable selection methods using a penalized likelihood have been widely studied in various statistical models. However, in semiparametric frailty models, these methods have been studied less because the marginal likelihood function involves analytically intractable integrals, particularly when modeling multicomponent or correlated frailties. In this article, we propose a simple but unified procedure via a penalized h-likelihood (HL) for variable selection of fixed effects in a general class of semiparametric frailty models, in which random effects may be shared, nested, or correlated. We consider three penalty functions (least absolute shrinkage and selection operator [LASSO], smoothly clipped absolute deviation [SCAD], and HL) in our variable selection procedure. We show that the proposed method can be easily implemented via a slight modification to existing HL estimation approaches. Simulation studies also show that the procedure using the SCAD or HL penalty performs well. The usefulness of the new method is also illustrated using three practical datasets. Supplementary materials for the article are available online.
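As a rough illustration of two of the penalty functions named above, the sketch below evaluates the LASSO penalty and the SCAD penalty in its usual piecewise form. The function names are hypothetical, the default `a = 3.7` is the value commonly used for SCAD rather than one stated in the abstract, and the HL penalty is not reproduced because its form is not given here.

```python
import numpy as np

def lasso_penalty(beta, lam):
    """LASSO penalty: lam * |beta|, applied elementwise."""
    return lam * np.abs(beta)

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty in its standard piecewise form (a > 2 is assumed).

    |beta| <= lam           : lam * |beta|                  (same as LASSO)
    lam < |beta| <= a * lam : quadratic transition segment
    |beta| > a * lam        : constant lam^2 * (a + 1) / 2  (no extra shrinkage)
    """
    b = np.abs(beta)
    linear = lam * b
    quadratic = (2 * a * lam * b - b ** 2 - lam ** 2) / (2 * (a - 1))
    constant = lam ** 2 * (a + 1) / 2
    return np.where(b <= lam, linear, np.where(b <= a * lam, quadratic, constant))
```

Unlike LASSO, whose penalty keeps growing with the size of a coefficient, the SCAD penalty levels off, so large coefficients are not penalized further as they grow.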
Given sparse multi-dimensional data (e.g., (user, movie, time; rating) for movie recommendations), how can we discover latent concepts/relations and predict missing values? Tucker factorization has been widely used to solve such problems with multi-dimensional data, which are modeled as tensors. However, most Tucker factorization algorithms regard and estimate missing entries as zeros, which triggers a highly inaccurate decomposition. Moreover, the few methods that focus on accuracy exhibit limited scalability since they require huge memory and heavy computational costs while updating factor matrices. In this paper, we propose P-TUCKER, a scalable Tucker factorization method for sparse tensors. P-TUCKER performs alternating least squares with a row-wise update rule in a fully parallel way, which significantly reduces memory requirements for updating factor matrices. Furthermore, we offer two variants of P-TUCKER: a caching algorithm P-TUCKER-CACHE and an approximation algorithm P-TUCKER-APPROX, both of which accelerate the update process. Experimental results show that P-TUCKER exhibits 1.7-14.1× speed-up and 1.4-4.8× less error compared to the state-of-the-art. In addition, P-TUCKER scales near linearly with the number of observable entries in a tensor and the number of threads. Thanks to P-TUCKER, we successfully discover hidden concepts and relations in a large-scale real-world tensor, while existing methods cannot reveal latent features due to their limited scalability or low accuracy.
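The idea of a row-wise least-squares update that uses only observed entries can be sketched as follows for a 3-way tensor. This is a minimal single-threaded illustration, not the authors' implementation: the function name `update_row`, the ridge parameter `lam`, and the coordinate-list layout (`coords`, `vals`) are assumptions made for the example, and no caching or approximation variant is shown.

```python
import numpy as np

def update_row(i, mode, coords, vals, core, factors, lam=1e-3):
    """Ridge least-squares update of row i of factors[mode] (3-way tensor).

    coords  : (nnz, 3) integer indices of the observed entries
    vals    : (nnz,) observed values
    core    : (J0, J1, J2) core tensor
    factors : list of three factor matrices, factors[k] has shape (I_k, J_k)
    Only entries whose index along `mode` equals i are used; missing
    entries are never treated as zeros.
    """
    mask = coords[:, mode] == i
    if not mask.any():
        return factors[mode][i].copy()  # nothing observed: keep the row
    idx, x = coords[mask], vals[mask]
    # Rows of each factor matrix selected by the observed entries.
    a, b, c = (factors[k][idx[:, k]] for k in range(3))
    # delta[m] is the design vector for observation m: the core contracted
    # with the factor rows of the two modes that are held fixed.
    if mode == 0:
        delta = np.einsum('pqr,mq,mr->mp', core, b, c)
    elif mode == 1:
        delta = np.einsum('pqr,mp,mr->mq', core, a, c)
    else:
        delta = np.einsum('pqr,mp,mq->mr', core, a, b)
    gram = delta.T @ delta + lam * np.eye(delta.shape[1])
    rhs = delta.T @ x
    return np.linalg.solve(gram, rhs)
```

Each row depends only on the entries that share its index and on the other (fixed) factors, so the rows of one factor matrix can be updated independently of each other. Per row, the linear system is only as large as the rank of that mode.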
Currently, boars selected for commercial use as AI sires are evaluated on grow-finish performance and carcass characteristics. If AI sires were also evaluated and selected on semen production, it may be possible to reduce the number of boars required to service sows, thereby improving the productivity and profitability of the boar stud. The objective of this study was to estimate genetic correlations between production and semen traits in the boar: average daily gain (ADG), backfat thickness (BF), and muscle depth (MD) as production traits, and total sperm cells (TSC), total concentration (TC), volume collected (SV), number of extended doses (ND), and acceptance rate of ejaculates (AR) as semen traits. Semen collection records and performance data for 843 boars and two generations of pedigree data were provided by Smithfield Premium Genetics. Backfat thickness and MD were measured by real-time ultrasound. Genetic parameters were estimated from five four-trait and one five-trait animal models using MTDFREML. Average heritability estimates were 0.39 for ADG, 0.32 for BF, and 0.15 for MD; repeatability estimates were 0.38 for SV, 0.37 for TSC, 0.09 for TC, 0.39 for ND, and 0.16 for AR. Semen traits showed a strong negative genetic correlation with MD and a positive genetic correlation with BF. Genetic correlations between semen traits and ADG were low. Therefore, current AI boar selection practices may be having a detrimental effect on semen production.
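For readers less familiar with the quantities reported above, the sketch below shows the textbook definitions of heritability, repeatability, and genetic correlation in terms of variance components. The function and argument names are hypothetical, and the abstract does not state how the reported values were derived from the MTDFREML output.

```python
import math

def heritability(var_a, var_pe, var_e):
    """Additive genetic variance as a share of total phenotypic variance."""
    return var_a / (var_a + var_pe + var_e)

def repeatability(var_a, var_pe, var_e):
    """Share of phenotypic variance due to permanent animal effects
    (additive genetic plus permanent environmental)."""
    return (var_a + var_pe) / (var_a + var_pe + var_e)

def genetic_correlation(cov_a_xy, var_a_x, var_a_y):
    """Additive genetic covariance between traits x and y, scaled by the
    additive genetic standard deviations of the two traits."""
    return cov_a_xy / math.sqrt(var_a_x * var_a_y)
```

A negative genetic correlation between two traits means that selection that raises the breeding value for one tends to lower it for the other.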
The objective of this study was to model the variances and covariances of total sperm cells per ejaculate (TSC) over the reproductive lifetime of AI boars. Data from boars (n = 834) selected for AI were provided by Smithfield Premium Genetics. The total numbers of records and animals were 19,629 and 1,736, respectively. Parameters were estimated for TSC by age of boar classification with a random regression model using the Simplex method and DxMRR procedures. The model included breed, collector, and year-season as fixed effects. Random effects were additive genetic, permanent environmental effect of boar, and residual. Observations were removed when fewer than 10 records were available at a given age of boar classification. Preliminary evaluations showed the best fit with fifth-order polynomials, indicating that the best model would have fifth-order fixed regression and fifth-order random regressions for animal and permanent environmental effects. Random regression models were fitted to evaluate all combinations of first- through seventh-order polynomial covariance functions. Goodness of fit for the models was tested using Akaike's Information Criterion and the Schwarz Criterion. The maximum log likelihood value was observed for sixth-, fifth-, and seventh-order polynomials for fixed, additive genetic, and permanent environmental effects, respectively. However, the best fit as determined by Akaike's Information Criterion and the Schwarz Criterion was obtained by fitting sixth-, fourth-, and seventh-order polynomials and fourth-, second-, and seventh-order polynomials, respectively, for fixed, additive genetic, and permanent environmental effects. Heritability estimates for TSC ranged from 0.27 to 0.48 across age of boar classifications. In addition, heritability for TSC tended to increase with age of boar classification.
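The model with the maximum log likelihood is not necessarily the one preferred by the information criteria, because both criteria charge for additional parameters. A minimal sketch of the two criteria in their standard forms follows. The function names are hypothetical, and how the study counted parameters and observations is not stated in the abstract.

```python
import math

def aic(loglik, n_params):
    """Akaike's Information Criterion: -2 logL + 2k (smaller is better)."""
    return -2.0 * loglik + 2.0 * n_params

def schwarz(loglik, n_params, n_obs):
    """Schwarz Criterion (BIC): -2 logL + k ln(n) (smaller is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

def best_model(candidates, criterion):
    """Return the name of the candidate with the smallest criterion value.

    candidates : dict mapping a model name to (loglik, n_params)
    criterion  : function taking (loglik, n_params) and returning a score
    """
    return min(candidates, key=lambda name: criterion(*candidates[name]))
```

As a usage example with made-up numbers, take `{"low-order": (-1000.0, 10), "high-order": (-995.0, 30)}`. The high-order model has the larger log likelihood, but `best_model(candidates, aic)` selects the low-order one because the gain in fit does not offset the 20 extra parameters. The Schwarz Criterion penalizes each parameter by ln(n) rather than 2, so for all but very small samples it favors lower orders than AIC.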
The proportional subdistribution hazards model (i.e., the Fine-Gray model) has been widely used for analyzing univariate competing risks data. Recently, this model has been extended to clustered competing risks data via frailty. To the best of our knowledge, however, there has been no literature on variable selection methods for such competing risks frailty models. In this paper, we propose a simple but unified procedure via a penalized h-likelihood (HL) for variable selection of fixed effects in a general class of subdistribution hazard frailty models, in which random effects may be shared or correlated. We consider three penalty functions (LASSO, SCAD, and HL) in our variable selection procedure. We show that the proposed method can be easily implemented using a slight modification to existing h-likelihood estimation approaches. Numerical studies demonstrate that the proposed procedure using the HL penalty performs well, providing a higher probability of choosing the true model than the LASSO and SCAD methods without losing prediction accuracy. The usefulness of the new method is illustrated using two actual data sets from multi-center clinical trials.