Advances in technology have made the analysis of large-scale gene expression data feasible, and such analysis has become very popular in the era of machine learning. This paper develops an improved ridge approach for genome regression modeling. When the data set exhibits multicollinearity and contains outliers, we consider a robust ridge estimator, namely the rank ridge regression estimator, for parameter estimation and prediction. The efficiency of the rank ridge regression estimator, however, depends heavily on the ridge parameter, and in general it is difficult to give a satisfactory answer about how to select it. Because generalized cross validation (GCV) has good properties and is simple, we use it to choose the optimum value of the ridge parameter. The GCV function balances the precision of the estimators against the bias caused by ridge estimation. It behaves like an improved estimator of risk and can be used in high-dimensional problems, where the number of explanatory variables is larger than the sample size. Finally, some numerical illustrations are given to support our findings.
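As a rough illustration of choosing the ridge parameter by GCV, the sketch below uses ordinary least-squares ridge regression, not the rank ridge estimator the abstract refers to. The function names `gcv_ridge` and `ridge_coef`, the simulated collinear data, and the candidate grid are all assumptions made for illustration. The criterion is the standard form GCV(k) = (RSS(k)/n) / (1 - tr(H(k))/n)^2, where H(k) is the ridge hat matrix.

```python
import numpy as np

def gcv_ridge(X, y, ks):
    """Return the GCV score for each candidate ridge parameter k in ks."""
    n = X.shape[0]
    # Thin SVD: the ridge hat matrix is U diag(d^2 / (d^2 + k)) U^T.
    U, d, _ = np.linalg.svd(X, full_matrices=False)
    Uty = U.T @ y
    scores = []
    for k in ks:
        shrink = d**2 / (d**2 + k)        # shrinkage factor per singular direction
        fitted = U @ (shrink * Uty)       # H(k) y
        rss = np.sum((y - fitted) ** 2)   # residual sum of squares
        edf = np.sum(shrink)              # tr(H(k)), effective degrees of freedom
        scores.append((rss / n) / (1.0 - edf / n) ** 2)
    return np.array(scores)

def ridge_coef(X, y, k):
    """Ordinary ridge coefficients (X^T X + k I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Simulated data with strongly collinear columns (illustrative only).
rng = np.random.default_rng(0)
n, p = 60, 10
Z = rng.normal(size=(n, 1))
X = Z + 0.05 * rng.normal(size=(n, p))
y = X @ np.ones(p) + rng.normal(size=n)

ks = np.logspace(-3, 3, 50)               # hypothetical candidate grid
scores = gcv_ridge(X, y, ks)
k_best = ks[np.argmin(scores)]
beta_hat = ridge_coef(X, y, k_best)
print("selected k:", k_best)
```

Working through the singular value decomposition keeps each evaluation of the criterion cheap. For any k > 0 the trace of the hat matrix stays below n, so the denominator remains positive even when the number of explanatory variables exceeds the sample size.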
The purpose of this paper is to introduce a useful online interactive dashboard (https://mahdisalehi.shinyapps.io/Covid19Dashboard/) that visualizes and tracks confirmed cases of COVID-19 in real time. The dashboard was made publicly available on 6 April 2020 to illustrate the counts of confirmed cases, deaths, and recoveries of COVID-19 at the level of country or continent. It is intended as a user-friendly tool for researchers as well as the general public to track the COVID-19 pandemic. It is generated from trusted data sources and built in open-source R software (Shiny in particular), ensuring a high degree of transparency and reproducibility. The R Shiny framework serves as a platform for visualization and analysis of the data, as well as an advance to capitalize on existing data curation to support and enable open science. The coded analysis includes logistic and Gompertz growth models, two mathematical tools for predicting the future course of the COVID-19 pandemic, as well as Moran's index, which gives a spatial perspective via heat maps that may assist in identifying latent responses and behavioral patterns. This analysis provides a real-time statistical application that aims to help academic and public consumers make sense of the large amount of data being accumulated due to the COVID-19 pandemic.
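To make the two growth models concrete, here is a minimal sketch of the logistic and Gompertz curves in one common three-parameter form. The abstract does not state the parameterization used by the dashboard, so the forms below, the function names, and the values of `K` (final size), `r` (growth rate), and `t0` (inflection time) are assumptions for illustration only.

```python
import math

def logistic(t, K, r, t0):
    """Logistic curve: symmetric S-shape whose inflection point is at K/2."""
    return K / (1.0 + math.exp(-r * (t - t0)))

def gompertz(t, K, r, t0):
    """Gompertz curve: asymmetric S-shape whose inflection point is at K/e."""
    return K * math.exp(-math.exp(-r * (t - t0)))

# Hypothetical parameter values, not estimates from any data set.
K, r, t0 = 10000.0, 0.15, 40.0

days = range(0, 121, 20)
curve_logistic = [logistic(t, K, r, t0) for t in days]
curve_gompertz = [gompertz(t, K, r, t0) for t in days]

for t, a, b in zip(days, curve_logistic, curve_gompertz):
    print(f"day {t:3d}  logistic {a:9.1f}  gompertz {b:9.1f}")
```

Both curves rise toward the same ceiling `K`. The Gompertz curve reaches its fastest growth earlier in relative terms (at K/e, not K/2) and approaches the ceiling more slowly afterwards, which is why the two models can give different forecasts from the same early data.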
In the modern era, advanced technology gives us access to data with many features, and feature engineering has therefore become a vital task in data analysis. One of the challenges in model estimation is to combat multicollinearity in high-dimensional data problems, where the number of features (p) exceeds the number of samples (n). We propose a novel, yet simple, strategy to estimate the regression parameters in a high-dimensional regime in the presence of multicollinearity. The proposed approach enjoys the good properties of the random forest and the simple structure of a class of linear unified estimators. We give a fast and straightforward algorithm to estimate the regression coefficients when p > n and multicollinearity exists. Numerical investigation reveals the superior performance of the method in test mean squared error. The technique is also applied to melting chemical data, where we perform estimation among 4885 features and discuss the advantages.