A robust dimension reduction method in Principal Component Analysis (PCA) was used to rectify the unbalanced clusters in rainfall patterns that arise from the skewed nature of rainfall data. A robust measure in PCA using Tukey’s biweight correlation to downweight observations is introduced, and the optimum breakdown point for extracting the number of components in PCA with this approach is proposed. A simulated data matrix that mimicked the real data set was used to determine an appropriate breakdown point for robust PCA and to compare the performance of both approaches. The simulated data indicated that a breakdown point of 70% cumulative percentage of variance gave a good balance in extracting the number of components. The results showed a significant and substantial improvement with the robust PCA over the PCA based on Pearson correlation in terms of the average number of clusters obtained and the cluster quality.
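The component-selection step described above can be sketched as follows. This is a minimal illustration, assuming an ordinary Pearson correlation matrix as input; the robust variant in the abstract would substitute a Tukey biweight correlation matrix, which is not implemented here. The function name `n_components_for_variance` and the synthetic data are hypothetical.

```python
import numpy as np

def n_components_for_variance(corr, threshold=0.70):
    """Smallest number of leading components whose cumulative
    share of variance reaches `threshold` (0.70 mirrors the 70% rule)."""
    # Eigenvalues of a symmetric matrix, sorted in descending order.
    eigvals = np.linalg.eigvalsh(corr)[::-1]
    # Guard against tiny negative values caused by rounding.
    eigvals = np.clip(eigvals, 0.0, None)
    cum_share = np.cumsum(eigvals) / eigvals.sum()
    # First index at which the cumulative share is >= threshold.
    idx = int(np.searchsorted(cum_share, threshold))
    return min(idx + 1, eigvals.size)

# Illustrative use on synthetic, right-skewed data (not the paper's data).
rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=5.0, size=(200, 8))
pearson_corr = np.corrcoef(data, rowvar=False)
print(n_components_for_variance(pearson_corr, threshold=0.70))
```

Passing a robust correlation matrix in place of `pearson_corr` would leave the selection rule unchanged; only the input matrix differs between the two approaches being compared.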
<p>Identifying the local time scale of torrential rainfall patterns through Singular Spectrum Analysis (SSA) is useful for separating the trend and noise components. However, SSA poses two main issues: torrential rainfall time series have coinciding singular values, and the leading components of the eigenvectors obtained from decomposing the time series matrix are usually assessed by graphical inference, without a specific statistical measure. As a consequence of both issues, the trend extracted by SSA tended to flatten out and did not show any distinct pattern. This problem was approached in two ways. First, an Iterative Oblique SSA (Iterative O-SSA) was presented to adjust the singular values. Second, a measure was introduced to group the decomposed eigenvectors based on Robust Sparse K-means (RSK-Means). As a result, the trend extracted using the modified SSA appeared to fit the original time series and was more flexible than the trend from SSA.</p>
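For orientation, the basic SSA decomposition that the abstract builds on can be sketched as below. This is plain SSA only, assuming a one-dimensional series and a user-chosen window length; it does not implement Iterative O-SSA or the RSK-Means grouping, and the function name `ssa_components` is hypothetical.

```python
import numpy as np

def ssa_components(series, window):
    """Basic SSA: embed, take the SVD, and map each elementary
    matrix back to a series by diagonal averaging."""
    x = np.asarray(series, dtype=float)
    n = x.size
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    comps = np.empty((s.size, n))
    for i in range(s.size):
        elem = s[i] * np.outer(u[:, i], vt[i])
        # Flipping the rows turns anti-diagonals into diagonals,
        # so each mean below averages entries with equal row + column.
        flipped = elem[::-1]
        comps[i] = [flipped.diagonal(d).mean() for d in range(-window + 1, k)]
    return comps, s

# Illustrative use: a trend plus an oscillation (synthetic data).
t = np.arange(60.0)
series = 0.05 * t + np.sin(2 * np.pi * t / 12)
components, singular_values = ssa_components(series, window=12)
trend_guess = components[:2].sum(axis=0)  # grouping chosen by eye here
```

The last line shows the grouping step that the abstract criticises as relying on graphical inference: which components count as trend is decided by hand in this sketch.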
Rainfall data are among the most significant inputs in hydrological and climatological modelling. However, the datasets are prone to missing values for various reasons. This study aims to impute missing rainfall values using several imputation methods: Replacing by Mean (RM), Nearest Neighbor (NN), Random Forest (RF), Nonlinear Iterative Partial Least Squares (NIPALS) and Markov Chain Monte Carlo (MCMC). Monthly rainfall datasets from 24 rainfall stations in Yogyakarta, Indonesia were used in this study. The datasets were then bootstrapped to obtain an estimate of the within-imputation standard errors for each imputed dataset. The performance of the five methods was evaluated using the root mean square error (RMSE). The experimental results showed that the RF-Bootstrap (RF-B) approach gave the most satisfactory fit for missing rainfall data in Yogyakarta, Indonesia.
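The evaluation idea can be sketched with the simplest of the listed methods. This is a minimal illustration, assuming column-wise mean replacement and an RMSE computed on artificially hidden entries; the data are synthetic, the bootstrap step is omitted, and the helper names `mean_impute` and `rmse` are hypothetical.

```python
import numpy as np

def mean_impute(data):
    """Replace NaNs in each column with the mean of that column's observed values."""
    filled = np.array(data, dtype=float)
    col_means = np.nanmean(filled, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled

def rmse(truth, estimate):
    """Root mean square error between two equally shaped arrays."""
    diff = np.asarray(truth, dtype=float) - np.asarray(estimate, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Illustrative use: hide 10% of a synthetic months-by-stations matrix,
# impute, and score only the hidden entries.
rng = np.random.default_rng(1)
complete = rng.gamma(shape=2.0, scale=50.0, size=(120, 24))
mask = rng.random(complete.shape) < 0.10
with_gaps = complete.copy()
with_gaps[mask] = np.nan
imputed = mean_impute(with_gaps)
print(rmse(complete[mask], imputed[mask]))
```

Any of the other imputation methods could be scored the same way by swapping `mean_impute` for a different imputer and keeping the mask fixed.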
This paper presents a modified correlation in principal component analysis (PCA) for selecting the number of clusters when identifying rainfall patterns. Clustering guided by PCA is extensively employed for high-dimensional data, especially in identifying the spatial distribution patterns of daily torrential rainfall. Typically, rainfall patterns for climatological investigation are identified using a T-mode Pearson correlation matrix to extract the relative variance retained. However, rainfall data in Peninsular Malaysia are skewed towards higher values and tend to be purely positive. Therefore, basing PCA on the Pearson correlation for rainfall data has the potential to influence the cluster partitions and to produce exceptionally uneven clusters in a high-dimensional space. In the current research, a robust dimension reduction method in PCA was employed to resolve the unbalanced clusters in rainfall patterns caused by the skewed character of the data. This led to the introduction of a robust measure in PCA with Tukey’s biweight correlation to downweight observations, along with the optimal breakdown point for obtaining the number of components in PCA. The results showed a substantial improvement for the robust PCA over the Pearson correlation-based PCA with respect to the average number of clusters obtained, and indicated a 70% cumulative percentage of variance at a breakdown point of 0.4.
<p><span>Hybridization is one of the popular approaches to modifying the conjugate gradient method. In this paper, a new hybrid conjugate gradient method is suggested and analyzed, in which the conjugate gradient parameter is evaluated as a convex combination of other conjugate gradient parameters while using exact line search. The proposed method is shown to possess both sufficient descent and global convergence properties. Numerical results show that the proposed method is promising and outperforms other hybrid conjugate gradient methods in terms of the number of iterations and central processing unit (CPU) time.</span></p>
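The general shape of such a hybrid scheme can be written out as below. This is a generic sketch, not the paper's formula: the two parameters being combined were lost from the source, so $\beta_k^{A}$ and $\beta_k^{B}$ are placeholders for any two conjugate gradient parameters, $\theta_k$ is an assumed mixing weight, and $g_k = \nabla f(x_k)$. Index conventions also vary between authors.

```latex
\begin{align}
x_{k+1} &= x_k + \alpha_k d_k,
  \qquad \alpha_k = \arg\min_{\alpha \ge 0} f(x_k + \alpha d_k), \\
d_0 &= -g_0,
  \qquad d_{k+1} = -g_{k+1} + \beta_k^{\mathrm{hyb}} d_k, \\
\beta_k^{\mathrm{hyb}} &= (1 - \theta_k)\,\beta_k^{A} + \theta_k\,\beta_k^{B},
  \qquad \theta_k \in [0, 1].
\end{align}
```

Under this notation, the sufficient descent property mentioned in the abstract is the requirement that $g_k^{\top} d_k \le -c\,\lVert g_k \rVert^2$ for some constant $c > 0$ and all $k$.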
Novel coronavirus (COVID-19) was discovered in Wuhan, China in December 2019, and has affected millions of lives worldwide. On 29th April 2020, Malaysia reported more than 5,000 COVID-19 cases, the second highest in the Southeast Asian region after Singapore. Recently, a forecasting model was developed to measure and predict COVID-19 cases in Malaysia on a daily basis for the next 10 days using previously confirmed cases. A Recurrent Forecasting-Singular Spectrum Analysis (RF-SSA) model is proposed, with the L and ET parameters established via several tests. The advantage of this forecasting model is that it separates noise from the trend in a time series and produces significant forecasting results. The RF-SSA model assessment was based on the official COVID-19 data released by the World Health Organization (WHO) to predict daily confirmed cases between 30th April and 31st May, 2020. The results revealed that parameter L = 5 (T/20) for the RF-SSA model was suitable for short time series outbreak data, while the appropriate number of eigentriples was integral as it influenced the forecasting results. The RF-SSA over-forecast the cases by 0.36%, which signifies its competence in predicting the impending number of COVID-19 cases. Nonetheless, an enhanced RF-SSA algorithm should be developed to capture extreme data changes more effectively.
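The recurrent forecasting step can be sketched as follows. This is a minimal textbook-style SSA recurrent forecast, assuming `window` plays the role of L and `n_eigentriples` the role of ET in the abstract; it is not the authors' implementation, the function name is hypothetical, and the example series is synthetic rather than case data.

```python
import numpy as np

def ssa_recurrent_forecast(series, window, n_eigentriples, steps):
    """Forecast `steps` values ahead from the leading eigentriples."""
    x = np.asarray(series, dtype=float)
    n = x.size
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    r = n_eigentriples
    u_r = u[:, :r]
    # Rank-r approximation, then diagonal averaging back to a series.
    approx = (u_r * s[:r]) @ vt[:r]
    flipped = approx[::-1]
    recon = [flipped.diagonal(d).mean() for d in range(-window + 1, k)]
    # Linear recurrence coefficients from the last coordinates of the
    # selected left singular vectors.
    last = u_r[-1, :]
    nu2 = float(last @ last)
    if nu2 >= 1.0:
        raise ValueError("verticality coefficient must be below 1")
    coeffs = (u_r[:-1, :] @ last) / (1.0 - nu2)
    out = list(recon)
    for _ in range(steps):
        # Each new value is a fixed linear combination of the
        # previous window - 1 values, oldest first.
        out.append(float(coeffs @ np.array(out[-(window - 1):])))
    return np.array(out[n:])

# Illustrative use on a synthetic cumulative-style curve.
t = np.arange(40.0)
cases = 100.0 + 12.0 * t
print(ssa_recurrent_forecast(cases, window=5, n_eigentriples=2, steps=10))
```

Changing `n_eigentriples` changes which part of the series is treated as signal, which is one way to see why the abstract reports that the number of eigentriples influences the forecast.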