Maciej Beręsewicz scite author profile

Salvati

et al. 2018

Summary: A three‐level M‐quantile model for small area estimation is proposed. The methodology represents an efficient alternative to prediction by using a three‐level linear mixed model in the presence of outliers and it is based on an extension of M‐quantile regression. A modified method of the traditional M‐quantile (two‐level) approach for poverty estimation is also proposed. In addition, an estimator of the mean‐squared prediction error is described, which is based on a bootstrap procedure. The methodology proposed, as well as the three‐level empirical best predictor, are applied to the Polish European Union Survey on Income and Living Conditions and census data to estimate poverty at local administrative unit 1 level in Poland, i.e. the level for which the Central Statistical Office of Poland has not published any official estimates to date.

On Representativeness of Internet Data Sources for Real Estate Market in Poland

2015

AJS

Shifting paradigms in Official Statistics lead to the widespread use of administrative records in an effort to support or create an alternative for censuses and surveys. At the same time the demand for diversified detailed information is increasing. In order to meet this demand Official Statistics needs to seek new data sources. Internet data sources (IDS), or more generally, big data could be one of them. The potential usefulness of these new sources of statistical information should not be neglected. The aim of the paper is to report on a study intended to assess the representativeness of IDS for the real estate market in Poland. These sources could be used for describing the demand and supply on the secondary real estate market in a more detailed way than is possible with the existing methodology. The degree of representativeness is assessed on the basis of information from official surveys and other data sources. Due to the shortage of relevant literature on the subject, the article provides a definition of IDS and draws on insights from a study conducted by the author to enhance information from Official Statistics. The study involved using information on street names from the National Official Register of the Territorial Division of the Country (TERYT) to harmonize street names obtained from IDS. A special program for automated data collection (a web spider) was developed. All the calculations were made with R (R Core Team 2014) statistical software and additional R packages (XML, RCurl, httr and ggplot2).
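
The harmonization step mentioned in this abstract (mapping street names scraped from IDS onto the names in the TERYT register) can be illustrated with a minimal sketch. The study itself used R; the Python code below is not the author's implementation. The register sample, the prefix list, the 0.8 similarity cutoff and the function names `normalize` and `harmonize` are all assumptions made for illustration.

```python
import difflib

# Hypothetical sample of official street names (stand-in for a TERYT extract).
REGISTER = ["Święty Marcin", "Głogowska", "Garbary", "Półwiejska"]


def normalize(name):
    """Lowercase, collapse whitespace and drop a leading street-type prefix."""
    cleaned = " ".join(name.lower().split())
    # Assumed list of common Polish prefixes found in listings.
    for prefix in ("ulica ", "ul. ", "ul.", "aleja ", "al. ", "al."):
        if cleaned.startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    return cleaned.strip()


def harmonize(raw_name, register, cutoff=0.8):
    """Return the closest official name, or None if nothing is similar enough."""
    lookup = {normalize(official): official for official in register}
    matches = difflib.get_close_matches(
        normalize(raw_name), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None
```

For example, `harmonize("ul. Glogowska", REGISTER)` resolves a listing that dropped the diacritic to the official spelling, while a name with no close counterpart returns `None` and would be left for manual review.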

Scanner data in inflation measurement: From raw data to price indices

Białek

2021

SJI

Scanner data offer new opportunities for CPI or HICP calculation. They can be obtained from a wide variety of retailers (supermarkets, home electronics, Internet shops, etc.) and provide information at the level of the barcode. One of the advantages of using scanner data is that they contain complete transaction information, i.e. prices and quantities for every item sold. After cleaning the data and unifying product names, products should be carefully classified (e.g. into COICOP 5 or below), matched, filtered and aggregated. One of the new challenges connected with scanner data is the appropriate choice of the index formula. In this article we present a proposal for the implementation of individual stages of handling scanner data. We also point out potential problems during scanner data processing and their solutions. We compare a large number of price index methods based on real scanner data sets and we verify their sensitivity to the adopted data filtering and aggregating methods. One of the aims is also to compare calculations of multilateral indices in terms of how time-consuming they are. Finally, the paper investigates distances between these indices and the theoretical, expected value of the price share when prices are log-normally distributed. It is a new approach to providing an additional criterion in the price index selection.
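
Because scanner data carry both prices and quantities for each barcode, quantity-weighted index formulas become available. The sketch below is not taken from the article: it computes three standard bilateral indices (Laspeyres, Paasche and Fisher) on products matched across two periods, as a simpler illustration than the multilateral indices the article compares. The function name `bilateral_indices` and the input layout (barcode mapped to a price and quantity pair) are assumptions.

```python
from math import sqrt


def bilateral_indices(base, current):
    """Compute Laspeyres, Paasche and Fisher indices on matched products.

    `base` and `current` map a barcode to a (price, quantity) pair
    (an assumed layout, not the article's data format).
    """
    # Keep only products sold in both periods (the "matching" step).
    matched = sorted(base.keys() & current.keys())
    if not matched:
        raise ValueError("no products are sold in both periods")

    p0q0 = sum(base[k][0] * base[k][1] for k in matched)
    p1q0 = sum(current[k][0] * base[k][1] for k in matched)
    p0q1 = sum(base[k][0] * current[k][1] for k in matched)
    p1q1 = sum(current[k][0] * current[k][1] for k in matched)

    laspeyres = p1q0 / p0q0  # base-period quantities as weights
    paasche = p1q1 / p0q1    # current-period quantities as weights
    fisher = sqrt(laspeyres * paasche)  # geometric mean of the two
    return {"laspeyres": laspeyres, "paasche": paasche, "fisher": fisher}


# Made-up toy data: products "C" and "D" appear in only one period and are dropped.
base_period = {"A": (2.0, 10), "B": (5.0, 4), "C": (1.0, 3)}
current_period = {"A": (3.0, 5), "B": (5.0, 8), "D": (9.0, 1)}
result = bilateral_indices(base_period, current_period)
```

Dropping unmatched products is only one possible filtering choice; the abstract's point about sensitivity to filtering and aggregation concerns exactly this kind of decision.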

Big data in official statistics: hopes, achievements, challenges and threats

Beręsewicz, Szymkowiak

2015

Ekonometria

Abstract: The article describes the current state of the use of so-called big data in official statistics. It presents the experience of selected national statistical offices in the practical use of data from mobile network operators, traffic sensors, social media and transaction data for the purposes of official statistics. It also sets out the opportunities, challenges and threats that statistical offices face when using this type of information in official statistics. Keywords: big data, official statistics, Internet data sources. Summary: The main purpose of the article is to describe the state of the art in using big data in official statistics. The article presents selected examples of how data from mobile operators, sensors, social media or scanners are used by national statistical offices. The authors also identify opportunities, challenges and risks related to the use of big data in the field of official statistics.

A Two‐Step Procedure to Measure Representativeness of Internet Data Sources

2017

Int Statistical Rev

Summary: So far, statistics has mainly relied on information collected from censuses and sample surveys, which are used to produce statistics about selected characteristics of the population. However, because of cost cuts and increasing non‐response in sample surveys, statisticians have started to search for new sources of information, such as registers, Internet data sources (IDSs, i.e. web portals) or big data. Administrative sources are already used for purposes of official statistics, while the suitability of the latter two sources is currently being discussed in the literature. Unfortunately, only a few papers devoted to statistical theory point out methodological problems related to the use of IDSs, particularly in the context of survey methodology. The unknown generation mechanism and the complexity of such data are often neglected in view of their size. Hence, before IDSs can be used for statistical purposes, especially for official statistics, they need to be assessed in terms of such fundamental issues as representativeness, non‐sampling errors or bias. The paper attempts to fill the first gap by proposing a two‐step procedure to measure representativeness of IDSs. The procedure will be exemplified using data about the secondary real estate market in Poland.