Jurgita Arnastauskaitė scite author profile

Goodness-of-fit tests are frequently used in modern statistics. However, it is still unclear which approach is the most reliable for checking the assumption that a data set is normally distributed. A particular data set (especially one with a small number of observations) only partly describes the underlying process, which leaves many options for the interpretation of its true distribution. As a consequence, many goodness-of-fit tests have been developed, the power of which depends on particular circumstances (e.g., sample size, outliers, etc.). With the aim of developing a more universal goodness-of-fit test, we propose an approach based on an N-metric with our chosen kernel function. To compare the power of 40 normality tests, the goodness-of-fit hypothesis was tested for 15 data distributions with 6 different sample sizes. Based on exhaustive comparative research results, we recommend the use of our test for samples of size .
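The power comparison described in this abstract can be illustrated with a minimal Monte Carlo sketch. It does not reproduce the N-metric test or any of the 40 tests from the paper; as a stand-in it uses the classical Jarque-Bera statistic with its asymptotic chi-square reference distribution, which is only approximate for small samples. The function names, the exponential alternative, the sample size of 50, and the number of replications are all assumptions chosen for illustration.

```python
import math
import random


def jarque_bera_pvalue(sample):
    """Asymptotic p-value of the Jarque-Bera normality test (stand-in test)."""
    n = len(sample)
    mean = sum(sample) / n
    # Central moments with the 1/n convention used by the JB statistic.
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square variable with 2 degrees of freedom.
    return math.exp(-jb / 2.0)


def empirical_power(sampler, n, n_rep=2000, alpha=0.05, seed=0):
    """Fraction of simulated samples of size n for which normality is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        jarque_bera_pvalue([sampler(rng) for _ in range(n)]) < alpha
        for _ in range(n_rep)
    )
    return rejections / n_rep


# Power against a skewed alternative (hypothetical choice: exponential).
power_exp = empirical_power(lambda rng: rng.expovariate(1.0), n=50)
# Rejection rate under the null hypothesis, i.e. the empirical size.
size_norm = empirical_power(lambda rng: rng.gauss(0.0, 1.0), n=50)

print(f"power vs exponential: {power_exp:.3f}")
print(f"empirical size under normality: {size_norm:.3f}")
```

Repeating the same loop over several tests, alternative distributions, and sample sizes gives a power table of the kind the abstract refers to. A careful study would typically calibrate critical values by simulation under the null rather than rely on the asymptotic approximation used here.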

A New Goodness of Fit Test for Multivariate Normality and Comparative Simulation Study

Arnastauskaitė, Ruzgas, Bražėnas (2021), Mathematics

Testing for multivariate normality remains a significant scientific problem. Although it has been researched extensively, it is still unclear how to choose the best test given the sample size, variance, covariance matrix, and other factors. To contribute to this field, a new goodness-of-fit test for multivariate normality is introduced. The test is based on the mean absolute deviation of the empirical distribution density from the theoretical distribution density. The new test was compared with the most popular tests in terms of empirical power. The power of the tests was estimated for selected alternative distributions by Monte Carlo simulation for the chosen sample sizes and dimensions. Based on the simulation results, it can be concluded that the new test is one of the most powerful tests for checking multivariate normality, especially for smaller samples. In addition, the assumption of normality was checked for two real data sets.

Tax Fraud Reduction Using Analytics in an East European Country

Ruzgas, Kizauskiene, Lukauskas, et al. (2023), Axioms

Tax authorities face the challenge of effectively identifying companies that avoid paying taxes, a challenge that is not unique to European Union countries. Limited resources often constrain tax administrators, who traditionally rely on time-consuming and labour-intensive tax audit tools. As a result of this established practice, governments lose a substantial amount of tax revenue. The main objective of this study is to increase the efficiency of tax evasion detection by applying data mining methods, with attention to affluence-related impacts, in Lithuania, an East European country with a rapidly developing economy. The study develops various models for segmentation, risk assessment, behavioral templates, and tax crime detection. Results show that data mining techniques can effectively detect tax evasion and extract hidden knowledge that can be used to reduce the resulting revenue losses. This study’s methods, software, and findings can assist decision-makers, experts, and scientists in developing countries in predicting and detecting tax fraud.

Accuracy of Nonparametric Density Estimation for Univariate Gaussian Mixture Models: A Comparative Study

Arnastauskaitė, Ruzgas (2020)

Flexible and reliable probability density estimation is fundamental in unsupervised learning and classification. Finite Gaussian mixture models are commonly used for this purpose. However, the parametric form of the distribution is not always known. In this case, nonparametric density estimation methods are used. Usually, these methods become computationally demanding as the number of components increases. In this paper, a comparative study of the accuracy of several nonparametric density estimators is carried out by means of simulation. The following approaches are considered: an adaptive bandwidth kernel estimator, a projection pursuit estimator, a logspline estimator, and a k-nearest neighbor estimator. It was concluded that data clustering as a preprocessing step improves the estimation of mixture densities. However, when the data do not have clearly defined clusters, the preprocessing step offers little advantage. The application of density estimators is illustrated using municipal solid waste data collected in Kaunas (Lithuania). The distribution of the Kaunas data resembles the kurtotic unimodal benchmark density introduced by Marron and Wand (i.e., it has one mode and a certain degree of kurtosis). Based on the homogeneity tests, it can be concluded that the distributions of the municipal solid waste fractions in Kutaisi (Georgia), Saint-Petersburg (Russia), and Boryspil (Ukraine) do not differ statistically from the distribution of waste fractions in Kaunas.
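The kind of accuracy comparison described in this abstract can be sketched as follows. This is not the study's procedure: it uses a plain fixed-bandwidth Gaussian kernel estimator with a Silverman-type rule of thumb instead of the adaptive, projection pursuit, logspline, or k-nearest neighbor estimators named above, and it measures accuracy by the mean absolute error on a grid. The mixture parameters are an assumption, taken from the commonly cited form of the Marron and Wand kurtotic unimodal density; the sample sizes, grid, and function names are likewise illustrative.

```python
import math
import random
import statistics

# Assumed benchmark mixture as (weight, mean, standard deviation) triples:
# 2/3 N(0, 1) + 1/3 N(0, 0.1^2), a kurtotic unimodal density.
MIXTURE = [(2 / 3, 0.0, 1.0), (1 / 3, 0.0, 0.1)]


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x):
    """True density of the assumed Gaussian mixture."""
    return sum(w * normal_pdf(x, mu, s) for w, mu, s in MIXTURE)


def sample_mixture(n, rng):
    """Draw n observations: pick a component by weight, then sample from it."""
    weights = [w for w, _, _ in MIXTURE]
    components = rng.choices(MIXTURE, weights=weights, k=n)
    return [rng.gauss(mu, s) for _, mu, s in components]


def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    spread = min(statistics.stdev(data), (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)


def kde(x, data, h):
    """Fixed-bandwidth Gaussian kernel density estimate at point x."""
    return sum(normal_pdf(x, xi, h) for xi in data) / len(data)


def mean_abs_error(data, grid):
    """Mean absolute deviation between the KDE and the true density on a grid."""
    h = silverman_bandwidth(data)
    return sum(abs(kde(x, data, h) - mixture_pdf(x)) for x in grid) / len(grid)


rng = random.Random(1)
grid = [-3.0 + 6.0 * i / 200 for i in range(201)]
mae_small = mean_abs_error(sample_mixture(100, rng), grid)
mae_large = mean_abs_error(sample_mixture(2000, rng), grid)

print(f"mean absolute error, n=100:  {mae_small:.4f}")
print(f"mean absolute error, n=2000: {mae_large:.4f}")
```

A single global bandwidth tends to oversmooth the narrow central component of such a density, which is one motivation for the adaptive and clustering-based approaches the abstract mentions.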
