This article addresses two problems in outlier detection and variable selection in linear regression models. First, outlier detection is subject to smearing and masking. Smearing means that one outlier makes another, non-outlying observation appear to be an outlier; masking means that one outlier prevents another from being detected. Detecting outliers one at a time may therefore give misleading results. The article presents a genetic algorithm that considers different possible groupings of the data into outlier and non-outlier observations, so that all outliers are detected at the same time. Second, outlier detection and variable selection are known to influence each other, and different results may be obtained depending on the order in which the two tasks are performed. It may therefore be useful to perform them simultaneously, and a genetic algorithm for simultaneous outlier detection and variable selection is suggested. Two real data sets are used to illustrate the algorithms, which are shown to work well. In addition, the scalability of the algorithms is examined in an experiment using generated data.
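The idea of searching over groupings of the data, rather than testing observations one by one, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the article's algorithm: the abstract does not give the fitness function or the genetic operators, so the BIC-type `criterion`, the penalty multiplier `kappa`, the cap `max_out`, and the tournament, crossover and mutation settings are all hypothetical choices. The sketch uses a single regressor so that it needs only the standard library.

```python
import math
import random

def fit_line_rss(x, y, keep):
    """Fit y = a + b*x by OLS on the kept indices; return the residual sum of squares."""
    xs = [x[i] for i in keep]
    ys = [y[i] for i in keep]
    m = len(xs)
    xbar, ybar = sum(xs) / m, sum(ys) / m
    sxx = sum((u - xbar) ** 2 for u in xs)
    sxy = sum((u - xbar) * (v - ybar) for u, v in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    return sum((v - a - b * u) ** 2 for u, v in zip(xs, ys))

def criterion(x, y, flags, kappa=2.0):
    """Assumed BIC-type score (lower is better); each flagged point is penalised."""
    n = len(y)
    keep = [i for i in range(n) if not flags[i]]
    rss = max(fit_line_rss(x, y, keep), 1e-12)  # floor avoids log(0) on a perfect fit
    return n * math.log(rss / n) + (2 + kappa * sum(flags)) * math.log(n)

def repair(flags, max_out, rng):
    """Unflag randomly chosen points until at most max_out remain flagged."""
    on = [i for i, f in enumerate(flags) if f]
    rng.shuffle(on)
    for i in on[max_out:]:
        flags[i] = 0
    return flags

def ga_outliers(x, y, max_out, kappa=2.0, pop_size=40, generations=150, seed=0):
    """Search over outlier/non-outlier groupings; return the flagged indices."""
    rng = random.Random(seed)
    n = len(y)
    # Each chromosome is a 0/1 list: 1 means "treat this observation as an outlier".
    pop = [repair([int(rng.random() < 0.1) for _ in range(n)], max_out, rng)
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted((criterion(x, y, c, kappa), c) for c in pop)
        nxt = [scored[0][1][:]]  # elitism: carry the best grouping forward unchanged
        while len(nxt) < pop_size:
            p1 = min(rng.sample(scored, 3))[1]  # tournament selection of size 3
            p2 = min(rng.sample(scored, 3))[1]
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            child = [1 - g if rng.random() < 1.0 / n else g for g in child]  # bit flips
            nxt.append(repair(child, max_out, rng))
        pop = nxt
    best = min(pop, key=lambda c: criterion(x, y, c, kappa))
    return [i for i, f in enumerate(best) if f]

# Toy data (made up for illustration): a line with small noise and two planted outliers.
x = [float(i) for i in range(20)]
y = [2.0 + 3.0 * xi + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x)]
y[3] += 8.0
y[15] -= 8.0
print(ga_outliers(x, y, max_out=5))
```

Because every candidate is scored as a whole grouping, two outliers that mask each other are evaluated jointly instead of sequentially. Extending the chromosome with one bit per candidate regressor would be one way to approach the simultaneous variable selection mentioned above, but that extension is not shown here.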
Long memory in the form of fractional integration is analysed in stock market returns. Special emphasis is placed on the potential bias caused by neglected outliers in the data. A simulation experiment first shows that outliers bias the estimated fractional integration parameter towards zero. In a monthly data set consisting of stock market indices of 16 OECD countries, statistically significant long memory is found for three countries. In one of these, long memory is found only when outliers are first taken into account.
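A small simulation in the spirit of the experiment described above might look like the following. It is a sketch under assumptions, not the study's design: the abstract does not say which estimator or outlier model was used, so the log-periodogram (Geweke and Porter-Hudak type) regression in `gph_estimate`, the truncated moving-average generator `frac_noise`, and the outlier count and size are illustrative choices, with the outliers made deliberately large.

```python
import cmath
import math
import random

def ma_weights(d, count):
    """First `count` moving-average weights of (1 - L)^(-d)."""
    psi = [1.0]
    for k in range(1, count):
        psi.append(psi[-1] * (k - 1 + d) / k)  # psi_k = psi_{k-1} * (k - 1 + d) / k
    return psi

def frac_noise(n, d, rng, burn=100):
    """Approximate fractionally integrated noise via a truncated MA expansion."""
    total = n + burn
    psi = ma_weights(d, total)
    eps = [rng.gauss(0.0, 1.0) for _ in range(total)]
    return [sum(psi[j] * eps[t - j] for j in range(t + 1)) for t in range(burn, total)]

def gph_estimate(x, power=0.5):
    """Log-periodogram regression estimate of d using the first n**power frequencies."""
    n = len(x)
    m = int(n ** power)
    mean = sum(x) / n
    xs, ys = [], []
    for j in range(1, m + 1):
        lam = 2.0 * math.pi * j / n
        dft = sum((x[t] - mean) * cmath.exp(-1j * lam * t) for t in range(n))
        xs.append(math.log(4.0 * math.sin(lam / 2.0) ** 2))
        ys.append(math.log(abs(dft) ** 2 / (2.0 * math.pi * n)))
    xbar, ybar = sum(xs) / m, sum(ys) / m
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
             / sum((a - xbar) ** 2 for a in xs))
    return -slope  # log I(lam) is roughly c - d * log(4 sin^2(lam / 2))

def add_outliers(x, count, size, rng):
    """Return a copy of x with `count` additive outliers of random sign."""
    y = list(x)
    for t in rng.sample(range(len(y)), count):
        y[t] += size * rng.choice((-1.0, 1.0))
    return y

def outlier_bias_experiment(reps=20, n=300, d=0.4, count=5, size=20.0, seed=1):
    """Average estimate of d over `reps` series, without and with outliers."""
    rng = random.Random(seed)
    clean, dirty = [], []
    for _ in range(reps):
        x = frac_noise(n, d, rng)
        clean.append(gph_estimate(x))
        dirty.append(gph_estimate(add_outliers(x, count, size, rng)))
    return sum(clean) / reps, sum(dirty) / reps

if __name__ == "__main__":
    print(outlier_bias_experiment())
```

Additive outliers contribute a roughly flat component to the periodogram, which flattens the low-frequency slope that the regression relies on; this is one way to see why the estimate would be pulled towards zero.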
Long memory models based on fractional integration are used to estimate the unemployment persistence of different labour force groups. The data come from Finland and comprise youth and total labour force unemployment rates for both males and females. Unemployment is found to be less persistent for females and young people than for males and the whole labour force.