Yong Shi scite author profile

As the architecture, systems, and data management communities pay greater attention to innovative big data systems and architectures, the pressure to benchmark and evaluate these systems rises. However, the complexity, diversity, frequently changing workloads, and rapid evolution of [...] comparison with the traditional benchmarks, including PARSEC, HPCC, and SPECCPU, big data applications have very low operation intensity, defined as the total number of instructions divided by the total number of bytes of memory accesses. Second, the volume of input data has a non-negligible impact on micro-architecture characteristics, which may pose challenges for simulation-based big data architecture research. Last but not least, corroborating the observations in CloudSuite and DCBench (which use smaller data inputs), we find that the number of L1 instruction cache (L1I) misses per 1000 instructions (MPKI) of the big data applications is higher than in the traditional benchmarks; we also find that L3 caches are effective for the big data applications, corroborating the observation in DCBench.
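The two metrics named in this abstract are simple ratios, and a short sketch may make the definitions concrete. It assumes the raw counter values (instructions retired, bytes of memory accesses, cache misses) have already been collected; the function names and the sample numbers are hypothetical and are not taken from BigDataBench.

```python
def operation_intensity(instructions: int, memory_bytes: int) -> float:
    """Instructions executed per byte of memory access."""
    if memory_bytes <= 0:
        raise ValueError("memory_bytes must be positive")
    return instructions / memory_bytes


def mpki(misses: int, instructions: int) -> float:
    """Cache misses per 1000 instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return misses * 1000 / instructions


if __name__ == "__main__":
    # Hypothetical counter values, for illustration only.
    print(operation_intensity(2_000_000, 4_000_000))
    print(mpki(23_000, 1_000_000))
```

A low operation intensity means a workload touches many bytes for each instruction it executes, and a high L1I MPKI means the instruction cache misses often relative to the work done.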

The Role of Text Pre-processing in Sentiment Analysis

Haddi, Liu, Shi (2013). Procedia Computer Science

473

207

Evaluation of Classification Algorithms Using MCDM and Rank Correlation

Kou, Peng, et al. (2012). Int. J. Info. Tech. Dec. Mak.

516

168

Classification algorithm selection is an important issue in many disciplines. Since it normally involves more than one criterion, the task of algorithm selection can be modeled as multiple criteria decision making (MCDM) problems. Different MCDM methods evaluate classifiers from different aspects and thus they may produce divergent rankings of classifiers. The goal of this paper is to propose an approach to resolve disagreements among MCDM methods based on Spearman's rank correlation coefficient. Five MCDM methods are examined using 17 classification algorithms and 10 performance criteria over 11 public-domain binary classification datasets in the experimental study. The rankings of classifiers are quite different at first. After applying the proposed approach, the differences among MCDM rankings are largely reduced. The experimental results prove that the proposed approach can resolve conflicting MCDM rankings and reach an agreement among different MCDM methods.
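The abstract relies on Spearman's rank correlation coefficient to measure how closely two rankings agree. The sketch below shows only that coefficient, using the standard formula for rankings without ties; it does not reproduce the paper's procedure for resolving disagreements, and the two rankings are hypothetical.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation for two rankings without ties.

    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)),
    where d_i is the difference between the two ranks of item i.
    """
    n = len(rank_a)
    if n != len(rank_b):
        raise ValueError("rankings must have the same length")
    if n < 2:
        raise ValueError("need at least two ranked items")
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    # Hypothetical rankings of five classifiers by two MCDM methods.
    method_a = [1, 2, 3, 4, 5]
    method_b = [2, 1, 3, 5, 4]
    print(spearman_rho(method_a, method_b))
```

A value near 1 indicates that two methods rank the classifiers almost identically, and a value near -1 indicates nearly reversed rankings.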

Probabilistic Face Embeddings

2019

Embedding methods have achieved success in face recognition by comparing facial features in a latent semantic space. However, in a fully unconstrained face setting, the facial features learned by the embedding model could be ambiguous or may not even be present in the input face, leading to noisy representations. We propose Probabilistic Face Embeddings (PFEs), which represent each face image as a Gaussian distribution in the latent space. The mean of the distribution estimates the most likely feature values while the variance shows the uncertainty in the feature values. Probabilistic solutions can then be naturally derived for matching and fusing PFEs using the uncertainty information. Empirical evaluation on different baseline models, training datasets, and benchmarks shows that the proposed method can improve the face recognition performance of deterministic embeddings by converting them into PFEs. The uncertainties estimated by PFEs also serve as good indicators of the potential matching accuracy, which is important for a risk-controlled recognition system.
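To illustrate how uncertainty can inform fusion, the sketch below combines several Gaussian embeddings of the same face by precision weighting, so that images with smaller variance contribute more to the fused mean. This is one standard way to combine independent Gaussian estimates with diagonal covariance; the abstract does not spell out the fusion rule, so treat the rule, the function name, and the sample values as assumptions.

```python
def fuse_gaussian_embeddings(means, variances):
    """Precision-weighted fusion of diagonal Gaussian embeddings.

    means, variances: one list per image, each holding one value per
    latent dimension. Returns (fused_mean, fused_variance).
    """
    if not means or len(means) != len(variances):
        raise ValueError("need matching, non-empty means and variances")
    dims = len(means[0])
    fused_mean, fused_var = [], []
    for d in range(dims):
        # Precision is the inverse variance; precisions add up.
        precision = sum(1.0 / v[d] for v in variances)
        weighted = sum(m[d] / v[d] for m, v in zip(means, variances))
        fused_var.append(1.0 / precision)
        fused_mean.append(weighted / precision)
    return fused_mean, fused_var


if __name__ == "__main__":
    # Hypothetical one-dimensional embeddings of two images of one face.
    mean, var = fuse_gaussian_embeddings([[0.0], [4.0]], [[1.0], [3.0]])
    print(mean, var)
```

In this example the first image has the smaller variance, so the fused mean lies closer to its value, and the fused variance is smaller than either input variance.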

A simple method to improve the consistency ratio of the pair-wise comparison matrix in ANP

Ergu, Kou, Peng, et al. (2011). European Journal of Operational Research

216

114

WarpGAN: Automatic Caricature Generation

2019

We propose WarpGAN, a fully automatic network that can generate caricatures given an input face photo. Besides transferring rich texture styles, WarpGAN learns to automatically predict a set of control points that can warp the photo into a caricature while preserving identity. We introduce an identity-preserving adversarial loss that helps the discriminator distinguish between different subjects. Moreover, WarpGAN allows customization of the generated caricatures by controlling the extent of exaggeration and the visual styles. Experimental results on a public domain dataset, WebCaricature, show that WarpGAN not only generates caricatures that preserve identity but also outputs a diverse set of caricatures for each input photo. Five caricature experts suggest that caricatures generated by WarpGAN are visually similar to hand-drawn ones and that only prominent facial features are exaggerated.

A Descriptive Framework for the Field of Data Mining and Knowledge Discovery

Peng, Kou, Shi, et al. (2008). Int. J. Info. Tech. Dec. Mak.

306

104

Despite its rapid development, the field of data mining and knowledge discovery (DMKD) is still vaguely defined and lacks integrated descriptions. This situation causes difficulties in teaching, learning, research, and application. This paper surveys a large collection of DMKD literature to provide a comprehensive picture of current DMKD research and classifies these research activities into high-level categories using a grounded theory approach; it also evaluates the longitudinal changes in DMKD research activities during the last decade.

Automatic Road Crack Detection Using Random Structured Forests

BigDataBench: A big data benchmark suite from internet services
