Jassim N. Hussain scite author profile

High-dimensional data are increasing rapidly in many areas because new technologies make it possible to collect data with a large number of variables in order to better understand a given phenomenon of interest. Multiple linear regression is a well-known technique for investigating the relationship between one dependent variable and one or more independent variables and for analyzing their effects. Fitting this model requires several assumptions, one of which is a large sample size. High-dimensional data do not satisfy this assumption because the sample size is small compared to the number of explanatory variables (k). Consequently, the results of traditional estimation methods can be misleading. Regularization or shrinkage techniques (e.g., LASSO) have been proposed to estimate the model in this case. A nonparametric method has also been proposed to estimate this model. Average mean square error and root mean square error criteria are used to assess the performance of the nonparametric, LASSO, and OLS methods in a simulation study and in the analysis of a real dataset. The results of both show that the nonparametric regression method outperforms the LASSO and OLS methods in fitting this model to high-dimensional data.
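For orientation, the estimators and error criteria named in this abstract can be written out as follows. This is a sketch in standard textbook notation, not taken from the paper: the penalty weight \lambda, the 1/(2n) scaling of the LASSO loss, and the symbols X, y, \beta, and R (number of simulation replications) are assumptions.

```latex
% OLS: the closed form needs X^T X to be invertible, which fails when k > n.
\hat{\beta}_{\mathrm{OLS}}
  = \arg\min_{\beta} \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^2
  = (X^{\top}X)^{-1} X^{\top} y

% LASSO: the same squared-error loss plus an L1 penalty that shrinks coefficients.
\hat{\beta}_{\mathrm{LASSO}}
  = \arg\min_{\beta} \frac{1}{2n} \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^2
    + \lambda \sum_{j=1}^{k} \lvert \beta_j \rvert

% Error criteria for one fitted model, and the average over R replications.
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}},
\qquad
\overline{\mathrm{MSE}} = \frac{1}{R} \sum_{r=1}^{R} \mathrm{MSE}_r
```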

International Journal of Quality, Statistics, and Reliability

Sensitivity Analysis to Select the Most Influential Risk Factors in a Logistic Regression Model

Hussain

2008

Recommended by Myong K. (MK) Jeong

The traditional variable selection methods for survival data depend on iteration procedures, and control of this process assumes tuning parameters that are problematic and time consuming, especially if the models are complex and have a large number of risk factors. In this paper, we propose a new method based on the global sensitivity analysis (GSA) to select the most influential risk factors. This contributes to simplification of the logistic regression model by excluding the irrelevant risk factors, thus eliminating the need to fit and evaluate a large number of models. Data from medical trials are suggested as a way to test the efficiency and capability of this method and as a way to simplify the model. This leads to construction of an appropriate model. The proposed method ranks the risk factors according to their importance.

Parameters estimation of new mixed Weibull Rayleigh and Exponential distribution

Hussain, Shareef

2021

A new idea of mixing is introduced in this paper. Mixing parameters $p_i$, where $0 \le p_i \le 1$ and $\sum_{i=1}^{n} p_i = 1$, are used to construct a new distribution by mixing several distributions. In this way, many mixed distributions with several parameters can be obtained. Three distributions (Weibull, Rayleigh, and Exponential) are mixed to obtain a new distribution that is more flexible than any of them. The new mixing parameter represents the contribution ratio of each of the mixed distributions. Several values of the mixing parameter were considered, and the properties of the mixed distribution were derived. Two estimation methods (MLE and OLS) are used to estimate the parameters of the new distribution. Simulation studies are used to demonstrate the properties of the new distribution and to apply the estimation methods to its parameters.
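A minimal sketch of a three-component mixture density of this kind is given below, assuming the standard textbook parameterizations of the Weibull, Rayleigh, and Exponential densities. The function names, the parameter names (shape, scale, sigma, rate), and the numeric values are hypothetical and are not taken from the paper.

```python
import math


def weibull_pdf(x, shape, scale):
    """Weibull density with the given shape and scale (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def rayleigh_pdf(x, sigma):
    """Rayleigh density with scale sigma (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    return (x / sigma ** 2) * math.exp(-(x ** 2) / (2 * sigma ** 2))


def exponential_pdf(x, rate):
    """Exponential density with the given rate (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    return rate * math.exp(-rate * x)


def mixture_pdf(x, weights, shape, scale, sigma, rate):
    """Weighted sum of the three densities; weights are the mixing parameters p_i."""
    if any(p < 0 or p > 1 for p in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing weights must lie in [0, 1] and sum to 1")
    p1, p2, p3 = weights
    return (p1 * weibull_pdf(x, shape, scale)
            + p2 * rayleigh_pdf(x, sigma)
            + p3 * exponential_pdf(x, rate))


def integrate_midpoint(f, lower, upper, steps):
    """Midpoint-rule integral, used here only as a sanity check on the density."""
    h = (upper - lower) / steps
    return h * sum(f(lower + (i + 0.5) * h) for i in range(steps))


# Illustrative values only: the mixture should integrate to roughly 1.
weights = (0.5, 0.3, 0.2)
total = integrate_midpoint(
    lambda x: mixture_pdf(x, weights, 1.5, 2.0, 1.0, 0.5), 0.0, 60.0, 60000
)
print(total)
```

Because each component is a proper density and the weights sum to 1, the mixture is itself a proper density, which is what the numerical check illustrates.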

A comparative study to choose the appropriate growth model to forecast COVID-19 cases in Iraq

Hussain

2022

Forecasting COVID-19 infection cases is the process of estimating future values from historical data, and it plays an important role in health decision making in various fields. Daily COVID-19 infection cases can be treated as a time series that represents the growth of the number of infected persons in a population. Consequently, growth models can be used to forecast the growth of any population, such as the population of people infected with the COVID-19 virus. Popular growth models such as the logistic, log-logistic, Gompertz, Weibull, and Richards models are useful for describing the growth of many phenomena, such as an epidemic and the spread of infection. The main objective of this paper is to choose a suitable growth model by comparing these models, making good use of the current data on COVID-19 in Iraq to better understand the spread of the disease and to forecast future daily infection cases. AIC, BIC, and other goodness-of-fit criteria, together with daily infection cases in Iraq for the period from 1 January 2021 to 30 April 2021, were used to compare these models and choose the most suitable one. The results of fitting these models show that the appropriate models are the Weibull type 1 model and the five-parameter log-logistic model, and that the predicted numbers of infected cases are close to the actual numbers.
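The comparison of growth curves by information criteria can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: it uses the common least-squares forms AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln(n) (additive constants dropped), three-parameter logistic and Gompertz curves with hypothetical parameter names K, r, and t0, and synthetic data generated inside the script rather than the Iraqi case counts. No curve fitting is performed; the parameter values are fixed by hand.

```python
import math


def logistic(t, K, r, t0):
    """Three-parameter logistic growth curve with upper limit K."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def gompertz(t, K, r, t0):
    """Three-parameter Gompertz growth curve with upper limit K."""
    return K * math.exp(-math.exp(-r * (t - t0)))


def information_criteria(observed, fitted, n_params):
    """Return (AIC, BIC) from the residual sum of squares, Gaussian-error form."""
    n = len(observed)
    rss = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    fit_term = n * math.log(rss / n)
    return fit_term + 2 * n_params, fit_term + n_params * math.log(n)


# Synthetic series: a Gompertz curve plus a small alternating perturbation.
days = list(range(30))
observed = [gompertz(t, 100.0, 0.2, 10.0) + (0.5 if t % 2 == 0 else -0.5)
            for t in days]

# Candidate curves evaluated at hand-picked (not estimated) parameter values.
candidates = {
    "gompertz": [gompertz(t, 100.0, 0.2, 10.0) for t in days],
    "logistic": [logistic(t, 100.0, 0.2, 10.0) for t in days],
}

for name, fitted in candidates.items():
    aic, bic = information_criteria(observed, fitted, n_params=3)
    print(name, round(aic, 2), round(bic, 2))
```

The model with the lower AIC or BIC is preferred. Since both candidates here have the same number of parameters, the ranking is driven entirely by the residual sum of squares.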

A Comparative Study of Nonparametric Kernel estimators with Gaussian Weight Function

Dakhil, Hussain

2021

Parametric methods have fallen out of favor with researchers because of the restrictions on their use and their lack of flexibility in estimating and analyzing data. Researchers have therefore preferred nonparametric methods, which have proved efficient and capable of analyzing data without predetermined assumptions. Consequently, the data and the information they contain determine the functional shape for the studied population, and the observations take the place of parameters. The objective of estimating the nonparametric regression function is to approximate the true regression function. Meanwhile, the COVID-19 pandemic has spread to all countries, including Iraq. The function describing the spread of infection has been studied in different countries but not in Iraq. Therefore, the aim of our research is to apply three nonparametric kernel estimators with a Gaussian weight function to model and forecast the number of COVID-19 infections in Iraq. R software and data on the daily number of COVID-19 infections for the period 23/2/2020 to 21/6/2020 are used to fit several models and choose the appropriate one. The results of applying the three nonparametric kernel models show that the Priestley-Chao model is the appropriate one for all sample sizes and other conditions.
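A minimal sketch of the Priestley-Chao estimator with a Gaussian kernel is shown below, assuming the usual textbook form in which each response is weighted by the spacing between consecutive sorted design points. The function names and the bandwidth value are hypothetical, the check uses a synthetic constant series rather than the infection data, and the study itself used R rather than Python.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the kernel weight function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def priestley_chao(x_query, xs, ys, bandwidth):
    """Priestley-Chao estimate of the regression function at x_query.

    Computes (1/h) * sum_{i>=2} (x_i - x_{i-1}) * K((x_query - x_i) / h) * y_i
    over the design points sorted by x.
    """
    pairs = sorted(zip(xs, ys))
    total = 0.0
    for (x_prev, _), (x_i, y_i) in zip(pairs, pairs[1:]):
        weight = (x_i - x_prev) * gaussian_kernel((x_query - x_i) / bandwidth)
        total += weight * y_i
    return total / bandwidth


# Sanity check on synthetic data: a constant response should be recovered
# away from the boundary of the design interval.
xs = [i * 0.1 for i in range(101)]
ys = [5.0 for _ in xs]
print(priestley_chao(5.0, xs, ys, bandwidth=0.5))
```

The bandwidth controls the smoothness of the estimate: larger values average over more neighboring observations. Near the ends of the design interval the weights no longer sum to about 1, so the estimate is biased there.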