The text of this paper has passed through many Internet routers on its way to the reader, but some routers will not pass it along unfettered because of censored words it contains. We present two sets of results: 1) Internet measurements of keyword filtering by the Great "Firewall" of China (GFC); and 2) initial results of using latent semantic analysis (LSA) as an efficient way to reproduce a blacklist of censored words via probing.

Our Internet measurements suggest that the GFC's keyword filtering is more a panopticon than a firewall, i.e., it need not block every illicit word, but only enough to promote self-censorship. China's largest ISP, ChinaNET, performed 83.3% of all filtering of our probes, and 99.1% of all filtering that occurred at the first hop past the Chinese border. Filtering occurred beyond the third hop for 11.8% of our probes, and there were sometimes as many as 13 hops past the border to a filtering router. Approximately 28.3% of the Chinese hosts we sent probes to were reachable along paths that were not filtered at all. While more tests are needed to provide a definitive picture of the GFC's implementation, our results disprove the notion that GFC keyword filtering is a firewall strictly at the border of China's Internet.

Whereas evading a firewall a single time defeats its purpose, a panopticon must be evaded nearly every time. Thus, in lieu of evasion, we propose ConceptDoppler, an architecture for maintaining a censorship "weather report" about which keywords are filtered over time. Probing with potentially filtered keywords is arduous due to the GFC's complexity and can be invasive if not done efficiently. Just as an understanding of the mixing of gases preceded effective weather reporting, an understanding of the relationship between keywords and concepts is essential for tracking Internet censorship.
We show that LSA can effectively pare down a corpus of text and cluster filtered keywords for efficient probing, present 122 keywords we discovered by probing, and underscore the need for tracking and studying censorship blacklists by discovering several surprising blacklisted keywords, such as the Chinese terms for "conversion rate," "Mein Kampf," and "International Geological Scientific Federation (Beijing)."
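As a rough illustration of how LSA can pare down a corpus and prioritize candidate keywords for probing (a toy sketch with a made-up term-document matrix, not the paper's corpus or implementation): terms are embedded via a truncated SVD, and candidates are ranked by cosine similarity to a seed term already known to be filtered.

```python
# Minimal LSA sketch with hypothetical data. The terms and counts below are
# illustrative only; in practice the matrix would come from a large corpus.
import numpy as np

terms = ["protest", "rally", "dissident", "weather", "recipe", "cooking"]
# Rows = terms, columns = documents; entries are (toy) term counts.
A = np.array([
    [3, 2, 0, 0],   # protest
    [2, 3, 0, 0],   # rally
    [1, 2, 0, 0],   # dissident
    [0, 0, 3, 1],   # weather
    [0, 0, 1, 3],   # recipe
    [0, 0, 0, 2],   # cooking
], dtype=float)

# Truncated SVD: keep k latent dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]          # each row embeds one term

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Rank all terms by similarity to a seed term known to be filtered;
# the highest-ranked terms would be probed first.
seed = term_vecs[terms.index("protest")]
ranking = sorted(terms, key=lambda t: -cosine(term_vecs[terms.index(t)], seed))
print(ranking[:3])
```

With this block-structured toy matrix, the top-ranked terms are the ones that co-occur with the seed ("protest", "rally", "dissident"), so probing effort concentrates on the most concept-related candidates rather than the whole vocabulary.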
Summary
Parameter estimation in linear errors-in-variables models typically requires that the measurement error distribution be known or estimable from replicate data. A generalized method of moments approach can be used to estimate model parameters in the absence of knowledge of the error distributions, but it requires the existence of a large number of model moments. In this paper, parameter estimation based on the phase function, a normalized version of the characteristic function, is considered. This approach requires that the model covariates have asymmetric distributions while the error distributions are symmetric. Parameters are estimated by minimizing a distance function between the empirical phase functions of the noisy covariates and the outcome variable. No knowledge of the measurement error distribution is needed to calculate this estimator. Both asymptotic and finite-sample properties of the estimator are studied. The connection between the phase function approach and the method of moments is also discussed. The estimation of standard errors is considered, and a modified bootstrap algorithm for fast computation is proposed. The newly proposed estimator is competitive with the generalized method of moments, despite making fewer assumptions about the moment structure of the measurement error. Finally, the proposed method is applied to a real dataset containing measurements of air pollution levels.
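The idea can be sketched on simulated data (a toy illustration under assumed distributions, not the paper's exact estimator or tuning): for Y = beta*X + eps and W = X + U, with X asymmetric and eps, U symmetric, the symmetric errors have real-valued characteristic functions and so drop out of the phase, leaving the phase of Y at t equal to the phase of W at beta*t. A grid search minimizing the distance between the two empirical phase functions then recovers beta without knowing the error distributions.

```python
# Toy phase-function estimation sketch. Model: Y = beta*X + eps, W = X + U,
# X ~ gamma (asymmetric), eps and U ~ normal (symmetric). All parameter
# choices here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, beta = 5000, 2.0
X = rng.gamma(shape=2.0, scale=1.0, size=n)    # asymmetric covariate
W = X + rng.normal(0.0, 0.5, size=n)           # noisy measurement of X
Y = beta * X + rng.normal(0.0, 0.5, size=n)    # outcome

def phase(data, t):
    # Empirical phase function: empirical characteristic function
    # normalized to unit modulus at each t.
    ecf = np.mean(np.exp(1j * np.outer(t, data)), axis=1)
    return ecf / np.abs(ecf)

t = np.linspace(0.05, 0.5, 20)   # grid where the ecf is well separated from 0
pY = phase(Y, t)                 # phase of the outcome, computed once

def loss(b):
    # Squared distance between phase of Y at t and phase of W at b*t.
    return np.mean(np.abs(pY - phase(W, b * t)) ** 2)

grid = np.linspace(0.5, 4.0, 351)
beta_hat = grid[np.argmin([loss(b) for b in grid])]
print(beta_hat)   # close to the true beta = 2.0
```

Note that no feature of the normal error distributions enters the estimator; only the symmetry assumption is used, which is what lets the approach dispense with knowledge of the measurement error distribution.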