Modeling and understanding people's mobility in time and geographic space are essential requirements for developing better strategies for public and private urban transportation systems, as well as for improving business techniques. This work proposes a random-search-based approach to instantiate statistical indicators through an improved mobility scenario that provides specific information about people attending events on one or several days. We then recreate that scenario with virtual humans, producing a synthetic, open dataset that matches the original statistical data. The results show that the proposed approach models people's mobility efficiently and that the generated data have a low error rate compared with the original data.
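The abstract does not say how the random search is set up. As a hedged illustration only, the Python sketch below uses a simple random local search (stochastic hill climbing). It starts from a synthetic population in which each virtual person attends one random day, then repeatedly toggles a random day for a random person and keeps the change only if the mean relative error against target per-day attendance counts does not increase. The function names, the choice of per-day counts as the statistical indicator, and the error measure are all assumptions for illustration, not the authors' method.

```python
import random


def mean_relative_error(counts, target):
    """Mean absolute relative error between synthetic and target per-day counts."""
    return sum(abs(counts[d] - target[d]) / target[d] for d in target) / len(target)


def random_search_population(target, n_people, n_iter=20_000, seed=0):
    """Random local search for a synthetic population matching per-day counts.

    target: hypothetical mapping day -> observed number of attendees (> 0).
    Each synthetic person is a set of attended days (one or several).
    """
    rng = random.Random(seed)
    days = sorted(target)
    # Start: every virtual person attends exactly one random day.
    population = [{rng.choice(days)} for _ in range(n_people)]
    counts = {d: sum(d in p for p in population) for d in days}
    err = mean_relative_error(counts, target)

    for _ in range(n_iter):
        person, day = rng.choice(population), rng.choice(days)
        if day in person and len(person) == 1:
            continue  # nobody may end up attending zero days
        delta = -1 if day in person else 1
        counts[day] += delta
        new_err = mean_relative_error(counts, target)
        if new_err <= err:
            # Accept a move that does not worsen the fit: toggle the day.
            if delta > 0:
                person.add(day)
            else:
                person.discard(day)
            err = new_err
        else:
            counts[day] -= delta  # reject the move: undo the count change
    return population, err


target = {"day1": 300, "day2": 250, "day3": 400}  # hypothetical indicator values
population, err = random_search_population(target, n_people=600)
print(len(population), round(err, 4))
```

Because each accepted toggle changes a single day's count by one, the search moves every count monotonically toward its target; a realistic scenario would add further indicators (for example, arrival time or location) to the error term, at which point the search space is no longer this easy.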
Longitudinal studies of human mobility could allow an understanding of human behavior on a vast scale. Mobile phone call detail records (CDRs) have emerged as a promising data source for this task. Nevertheless, collecting this type of data carries significant privacy risks, as human mobility traces have proven to be highly unique. Because CDRs are produced when mobile phones connect to mobile network operators' (MNOs) antennas, users cannot sanitize their data themselves. When MNOs intend to use such a data source for human mobility analysis, data protection authorities such as the CNIL (in France) recommend that data be sanitized on the fly rather than collecting raw data and publishing privacy-preserving output only at the end of the analysis. Local differential privacy (LDP) mechanisms are applied during data collection to preserve users' privacy. In this paper, we propose an efficient privacy-preserving LDP-based methodology to collect and analyze multi-dimensional data longitudinally through mobile connections. Rather than treating users as unique IDs, we propose a generic scenario in which users' sensitive data are collected directly with LDP. The intuition is to collect generic values that can be shared by many users (e.g., gender), which allows a longitudinal study. As our results show, the methodology suits this scenario well, achieving accurate frequency estimation in a multi-dimensional setting while respecting major recommendations of the GDPR and of data protection authorities such as the CNIL. This work was supported by the Region of Bourgogne Franche-Comté CADRAN Project and by the EIPHI-BFC Graduate School (contract "ANR-17-EURE-0002"). The authors would also like to thank the Orange Application for Business team for their useful feedback and comments.
Computations have been performed on the supercomputer facilities of "Mésocentre de Calcul de Franche-Comté".
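The LDP abstract above does not name the protocol used. As a sketch under that caveat, the Python code below uses generalized randomized response (GRR), a standard LDP frequency oracle: for a domain of size k, a user keeps the true value with probability p = e^ε / (e^ε + k − 1) and otherwise reports one of the other k − 1 values uniformly, so any specific wrong value is reported with probability q = 1 / (e^ε + k − 1). The server's unbiased estimate is f̂(v) = (N_v / n − q) / (p − q), where N_v is the number of reports equal to v. Multiple attributes are handled here by having each user sample one attribute and report only that one with the full budget ε. The function names and this attribute-sampling strategy are illustrative assumptions, and the longitudinal aspect (repeated reports over time) is not shown.

```python
import math
import random
from collections import Counter


def grr_params(k, epsilon):
    """Keep-probability p and per-other-value probability q for GRR."""
    denom = math.exp(epsilon) + k - 1
    return math.exp(epsilon) / denom, 1.0 / denom


def grr_perturb(value, domain, epsilon, rng):
    """Keep the true value with probability p, else report another value uniformly."""
    p, _ = grr_params(len(domain), epsilon)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])


def grr_estimate(reports, domain, epsilon):
    """Unbiased frequency estimates from GRR reports: (N_v/n - q) / (p - q)."""
    p, q = grr_params(len(domain), epsilon)
    n, counts = len(reports), Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


def collect_multidimensional(users, domains, epsilon, seed=0):
    """Each user samples ONE attribute uniformly and reports it with GRR under
    the full budget epsilon; each attribute is estimated from its reporters."""
    rng = random.Random(seed)
    attrs = sorted(domains)
    reports = {a: [] for a in attrs}
    for user in users:
        a = rng.choice(attrs)
        reports[a].append(grr_perturb(user[a], domains[a], epsilon, rng))
    return {a: grr_estimate(reports[a], domains[a], epsilon) for a in attrs}


# Hypothetical population with two generic attributes.
rng = random.Random(1)
domains = {"gender": ["F", "M"], "age": ["<30", "30-60", ">60"]}
users = [{"gender": rng.choice(domains["gender"]),
          "age": rng.choices(domains["age"], weights=[0.3, 0.5, 0.2])[0]}
         for _ in range(30000)]
print(collect_multidimensional(users, domains, epsilon=2.0))
```

Sampling one attribute per user avoids splitting ε across attributes, at the cost of each attribute being estimated from only a fraction of the users; the estimates for each attribute always sum to 1, since 1 − kq = p − q.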