As architecture, systems, and data management communities pay greater attention to innovative big data systems and architecture, the pressure of benchmarking and evaluating these systems rises. However, the complexity, diversity, frequently changed workloads, and rapid evolution of parison with the traditional benchmarks: including PAR-SEC, HPCC, and SPECCPU, big data applications have very low operation intensity, which measures the ratio of the total number of instructions divided by the total byte number of memory accesses; Second, the volume of data input has non-negligible impact on micro-architecture characteristics, which may impose challenges for simulation-based big data architecture research; Last but not least, corroborating the observations in CloudSuite and DCBench (which use smaller data inputs), we find that the numbers of L1 instruction cache (L1I) misses per 1000 instructions (in short, MPKI) of the big data applications are higher than in the traditional benchmarks; also, we find that L3 caches are effective for the big data applications, corroborating the observation in DCBench.
Classification algorithm selection is an important issue in many disciplines. Since it normally involves more than one criterion, the task of algorithm selection can be modeled as multiple criteria decision making (MCDM) problems. Different MCDM methods evaluate classifiers from different aspects and thus they may produce divergent rankings of classifiers. The goal of this paper is to propose an approach to resolve disagreements among MCDM methods based on Spearman's rank correlation coefficient. Five MCDM methods are examined using 17 classification algorithms and 10 performance criteria over 11 public-domain binary classification datasets in the experimental study. The rankings of classifiers are quite different at first. After applying the proposed approach, the differences among MCDM rankings are largely reduced. The experimental results prove that the proposed approach can resolve conflicting MCDM rankings and reach an agreement among different MCDM methods.
Embedding methods have achieved success in face recognition by comparing facial features in a latent semantic space. However, in a fully unconstrained face setting, the facial features learned by the embedding model could be ambiguous or may not even be present in the input face, leading to noisy representations. We propose Probabilistic Face Embeddings (PFEs), which represent each face image as a Gaussian distribution in the latent space. The mean of the distribution estimates the most likely feature values while the variance shows the uncertainty in the feature values. Probabilistic solutions can then be naturally derived for matching and fusing PFEs using the uncertainty information. Empirical evaluation on different baseline models, training datasets and benchmarks show that the proposed method can improve the face recognition performance of deterministic embeddings by converting them into PFEs. The uncertainties estimated by PFEs also serve as good indicators of the potential matching accuracy, which are important for a risk-controlled recognition system.
We propose, WarpGAN, a fully automatic network that can generate caricatures given an input face photo. Besides transferring rich texture styles, WarpGAN learns to automatically predict a set of control points that can warp the photo into a caricature, while preserving identity. We introduce an identity-preserving adversarial loss that aids the discriminator to distinguish between different subjects. Moreover, WarpGAN allows customization of the generated caricatures by controlling the exaggeration extent and the visual styles. Experimental results on a public domain dataset, WebCaricature, show that WarpGAN is capable of generating caricatures that not only preserve the identities but also outputs a diverse set of caricatures for each input photo. Five caricature experts suggest that caricatures generated by WarpGAN are visually similar to hand-drawn ones and only prominent facial features are exaggerated. * indicates equal contribution
Despite the rapid development, the field of data mining and knowledge discovery (DMKD) is still vaguely defined and lack of integrated descriptions. This situation causes difficulties in teaching, learning, research, and application. This paper surveys a large collection of DMKD literature to provide a comprehensive picture of current DMKD research and classify these research activities into high-level categories using grounded theory approach; it also evaluates the longitudinal changes of DMKD research activities during the last decade.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.