Nationality identification unlocks important demographic information, with many applications in biomedical and sociological research. Existing name-based nationality classifiers use name substrings as features and are trained on small, unrepresentative sets of labeled names, typically extracted from Wikipedia. As a result, these methods achieve limited performance and cannot support fine-grained classification. We exploit the phenomenon of homophily in communication patterns to learn name embeddings, a new representation that encodes gender, ethnicity, and nationality and is readily applicable to building classifiers and other systems. Through our analysis of 57M contact lists from a major Internet company, we are able to design a fine-grained nationality classifier covering 39 groups representing over 90% of the world population. In an evaluation against other published systems over 13 common classes, our F1 score (0.795) is substantially better than that of our closest competitor, Ethnea (0.580). To the best of our knowledge, this is the most accurate, fine-grained nationality classifier available. As a social media application, we apply our classifiers to the followers of major Twitter celebrities over six different domains. We demonstrate stark differences in the ethnicities of the followers of Trump and Obama, and in the sports and entertainments favored by different groups. Finally, we identify an anomalous political figure whose presumably inflated following appears largely incapable of reading the language he posts in.
Abstract. Semifragile watermarking techniques aim to prevent tampering and fraudulent use of modified images. A semifragile watermark monitors the integrity of the content of the image but not its numerical representation. Therefore, the watermark is designed so that the integrity is proven if the content of the image has not been tampered with, despite some mild processing on the image. However, if parts of the image are replaced with the wrong key or are heavily processed, the watermark information should indicate evidence of forgery. We compare the performance of eight semifragile watermarking algorithms in terms of their miss probability under forgery attack, and in terms of false alarm probability under nonmalicious signal processing operations that preserve the content and quality of the image. We propose desiderata for semifragile watermarking algorithms and indicate the promising algorithms among existing ones.
Perceptual hash functions have recently been proposed as cryptographic primitives for multimedia security applications. However, many of these hash functions have been designed with signal processing robustness in mind and have not addressed the key issues of confusion and diffusion that are central to the security of conventional hash functions. In this paper we give a definition of confusion and diffusion for perceptual hash functions and show that many common perceptual hash functions do not display desirable confusion/diffusion properties.
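As a rough illustration of the diffusion issue, the sketch below compares how many output bits change when a single input bit is flipped, for a toy perceptual hash and for SHA-256. It is not the definition given in the paper: the avalanche-style measure, the toy `average_hash`, and all function names are assumptions made for this example only.

```python
import hashlib
import random


def average_hash(pixels):
    """Toy perceptual hash (assumed for illustration): one bit per pixel,
    set when the pixel value exceeds the mean of all pixels."""
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def sha256_bits(pixels):
    """Conventional hash: the 256 output bits of SHA-256 over the pixel bytes."""
    digest = hashlib.sha256(bytes(pixels)).digest()
    return [(byte >> i) & 1 for byte in digest for i in range(8)]


def hamming_fraction(a, b):
    """Fraction of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_diffusion(hash_fn, pixels, trials=200, seed=0):
    """Average fraction of output bits that change when the least
    significant bit of one randomly chosen pixel is flipped."""
    rng = random.Random(seed)
    base = hash_fn(pixels)
    total = 0.0
    for _ in range(trials):
        perturbed = list(pixels)
        idx = rng.randrange(len(perturbed))
        perturbed[idx] ^= 1  # flip the least significant bit; value stays in 0..255
        total += hamming_fraction(base, hash_fn(perturbed))
    return total / trials


# Hypothetical 8x8 grayscale image, flattened to 64 pixel values.
rng = random.Random(42)
image = [rng.randrange(256) for _ in range(64)]

perceptual_score = mean_diffusion(average_hash, image)
crypto_score = mean_diffusion(sha256_bits, image)
print(f"average_hash diffusion: {perceptual_score:.3f}")
print(f"sha256 diffusion:       {crypto_score:.3f}")
```

Under this measure a conventional hash is expected to change about half of its output bits per flipped input bit, while the toy perceptual hash changes almost none. That stability is the robustness perceptual hashes are built for, and it is also why their diffusion is weak.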