Abstract: Human mobility trajectories are increasingly collected by ISPs to assist academic research and commercial applications. Meanwhile, there is growing concern that individual trajectories can be de-anonymized when the data is shared, using information from external sources (e.g., online social networks). To understand this risk, prior works either estimate the theoretical privacy bound or simulate de-anonymization attacks on small, synthetically created datasets. However, it is not clear how well the theoretical estimates hold in practice. In this paper, we collected a large-scale ground-truth trajectory dataset from 2,161,500 users of a cellular network, and two matched external trajectory datasets from a large social network (56,683 users) and a check-in/review service (45,790 users) on the same user population. The two sets of large ground-truth data provide a rare opportunity to extensively evaluate a variety of de-anonymization algorithms (7 in total). We find that their performance on the real-world dataset is far from the theoretical bound. Further analysis shows that most algorithms have underestimated the impact of spatio-temporal mismatches between the data from different sources, and that the high sparsity of user-generated data also contributes to the underperformance. Based on these insights, we propose 4 new algorithms that are specially designed to tolerate spatial or temporal mismatches (or both) and to model user behavior. Extensive evaluations show that our algorithms achieve more than 17% performance gain over the best existing algorithms, confirming our insights.
Sentence generation is a key task in many natural language processing systems. Models based on a variational autoencoder (VAE) can generate plausible sentences from a continuous latent space. However, the VAE forces the latent distribution of each input sentence to match the same prior, which results in a large overlap among the latent subspaces of different sentences and a limited informative latent space. Therefore, the sentences generated by sampling from a subspace may have little correlation with the corresponding input, and the latent space cannot capture rich useful information from the input sentences, which leads to the failure of the model to generate diverse sentences from the latent space. Additionally, the Kullback-Leibler (KL) divergence collapse problem makes the VAE notoriously difficult to train. In this paper, a latent space expanded VAE (LSE-VAE) model is presented for sentence generation. The model maps each sentence to a continuous latent subspace under the constraint of its own prior distribution, and constrains nearby sentences to map to nearby subspaces. Sentences are dispersed to a large continuous latent space according to sentence similarity, where the latent subspaces of different sentences may be relatively far away from each other and arranged in an orderly manner. The experimental results show that the LSE-VAE improves the reconstruction ability of the VAE, generates plausible and more diverse sentences, and learns a larger informative latent space than the VAE with the properties of continuity and smoothness. The LSE-VAE does not suffer from the KL collapse problem, and it is robust to hyperparameters and much easier to train.
Surface water pollution has become a pressing issue in recent years because the deterioration of surface water quality has hampered the sustainable development of China's economy. Previous studies have analyzed regional changes in water pollutants, but very few have done so at a national scale. By analyzing 9 water quality parameters recorded at 422 sampling stations nationwide, this study summarizes the spatial and temporal variations of surface water quality in China during the "11th Five-Year Plan" period. The results show that China's surface water quality is improving, but further deterioration in several areas cannot be ignored. Human activities, including over-urbanization and farming, exerted a negative impact on surface water quality. Although water quality in the upstream reaches of major rivers in northwest China was relatively better than in other areas, deterioration of surface water quality has begun to emerge there. Additionally, surface water quality in southern China was better than in northern China. However, some studies indicated that surface water quality was likely to worsen rapidly. It was also found that different water quality parameters exhibit distinct spatial and temporal variations. The findings indicate that the government should pay more attention to areas where water quality parameters significantly exceeded the national standards. This study provides a theoretical basis for the decision-making and implementation of macro-scale water quality control policies.