Summary. For high-dimensional classification, it is well known that naive application of the Fisher discriminant rule performs poorly because of diverging spectra and the accumulation of noise. Researchers have therefore proposed independence rules to circumvent the diverging spectra, and sparse independence rules to mitigate the accumulation of noise. However, in biological applications a group of correlated genes is often responsible for clinical outcomes, and exploiting the covariance information can significantly reduce misclassification rates. In theory, the extent of such error-rate reductions is revealed by comparing the misclassification rates of the Fisher discriminant rule and the independence rule. To realize this gain with finite samples, a regularized optimal affine discriminant (ROAD) is proposed. The ROAD selects an increasing number of features as the regularization relaxes, and further benefits can be achieved when a screening method is employed to narrow the feature pool before the ROAD method is applied. An efficient constrained coordinate descent algorithm is developed to solve the associated optimization problems, and sampling properties of oracle type are established. Simulation studies and real data analysis support the theoretical results and demonstrate the advantages of the new classification procedure under a variety of correlation structures. A delicate result on continuous piecewise-linear solution paths for the ROAD optimization problem at the population level justifies the linear interpolation of the constrained coordinate descent algorithm.
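For concreteness, the optimization problem this abstract describes can be written at the population level as follows. This is a sketch in the paper's spirit; the symbols μ_1, μ_2, Σ and the l1 budget c are the usual notation for the two class means, the common covariance matrix and the regularization level, assumed here rather than quoted from the abstract.

\[
\mathbf{w}_c \;=\; \operatorname*{arg\,min}_{\mathbf{w}^{\top}\boldsymbol{\mu}_d \,=\, 1,\;\; \|\mathbf{w}\|_1 \,\le\, c}\; \mathbf{w}^{\top}\boldsymbol{\Sigma}\,\mathbf{w},
\qquad
\boldsymbol{\mu}_d \;=\; \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 .
\]

The resulting affine rule assigns a new observation x to class 1 when \(\mathbf{w}_c^{\top}\{\mathbf{x} - (\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)/2\} \ge 0\). Relaxing the budget c allows more coordinates of \(\mathbf{w}_c\) to become non-zero, which is the sense in which the ROAD selects an increasing number of features as the regularization relaxes.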
An umbrella algorithm and a graphical tool for asymmetric error control in binary classification.
In statistics and machine learning, classification studies how to automatically learn to make good qualitative predictions (i.e., assign class labels) based on past observations. Examples of classification problems include email spam filtering, fraud detection, and market segmentation. Binary classification, in which the class label can take only two values, arguably has the most widely used machine learning applications. Most existing binary classification methods target the minimization of the overall classification risk and may therefore fail to serve real-world applications such as cancer diagnosis, where users are more concerned with the risk of misclassifying one specific class than the other. The Neyman-Pearson (NP) paradigm was introduced in this context as a novel statistical framework for handling asymmetric type I/II error priorities: it seeks classifiers with minimal type II error subject to a type I error constraint at a user-specified level. Though NP classification has the potential to be an important subfield of the classification literature, it has not received much attention in the statistics and machine learning communities. This article surveys the current status of the NP classification literature. To stimulate readers' research interests, the authors also envision a few possible directions for future research in the NP paradigm and its applications.
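To make the asymmetric error control concrete, below is a minimal, self-contained Python sketch of the order-statistic thresholding idea used in NP umbrella-style procedures: a scoring classifier is trained, and the decision threshold is chosen from held-out class-0 scores so that the type I error exceeds the target level alpha with probability at most delta. The logistic-regression scorer, the split sizes, and the helper names are illustrative choices, not a prescription from the surveyed literature.

import numpy as np
from scipy.stats import binom
from sklearn.linear_model import LogisticRegression

def np_rank(n, alpha, delta):
    """Smallest rank k such that thresholding at the k-th smallest of n
    held-out class-0 scores gives P(type I error > alpha) <= delta."""
    for k in range(1, n + 1):
        # P(Binomial(n, 1 - alpha) >= k): probability that the k-th order
        # statistic still lets more than an alpha fraction of class 0 through
        if binom.sf(k - 1, n, 1 - alpha) <= delta:
            return k
    raise ValueError("too few class-0 points to certify this (alpha, delta)")

rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(400, 5))    # class 0: the error we must control
X1 = rng.normal(1.0, 1.0, size=(400, 5))    # class 1

X0_tr, X0_cal = X0[:200], X0[200:]          # hold out class-0 data for calibration
clf = LogisticRegression().fit(
    np.vstack([X0_tr, X1]), np.r_[np.zeros(200), np.ones(400)]
)                                           # any scoring classifier would do

cal_scores = np.sort(clf.predict_proba(X0_cal)[:, 1])
k = np_rank(len(cal_scores), alpha=0.05, delta=0.05)
threshold = cal_scores[k - 1]               # k-th smallest calibration score

# predict class 1 only when the score clears the calibrated threshold
predict = lambda X_new: (clf.predict_proba(X_new)[:, 1] > threshold).astype(int)

Unlike thresholding at a plain empirical quantile, the binomial tail calculation yields a high-probability guarantee on the population type I error, which is the defining feature of the NP paradigm.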
This work demonstrates that a set of commercial and scale-out applications make significant use of superpages and thus suffer from the fixed and small superpage TLB structures of some modern core designs. Other processors cope better with superpages, but at the expense of power-hungry and slow fully-associative TLBs. We consider alternative designs that allow all page sizes to freely share a single, power-efficient and fast set-associative TLB. We propose a prediction-guided multi-grain TLB design that uses a superpage prediction mechanism to avoid multiple lookups in the common case. In addition, we evaluate the previously proposed skewed TLB [1], which builds on principles similar to those used in skewed-associative caches [2], and we enhance the original skewed TLB design by using page-size prediction to increase its effective associativity. Our prediction-based multi-grain TLB design delivers more hits and is more power-efficient than existing alternatives. The predictor uses a 32-byte prediction table indexed by base-register values.

I. INTRODUCTION

Over the last 50 years, virtual memory has been an intrinsic facility of computer systems, providing each process with the illusion of an equally large and contiguous address space while enforcing isolation and access control. Page tables act as gatekeepers: they maintain the mappings of virtual pages to physical frames (i.e., translations) along with additional information (e.g., access privileges). Except for a reserved part of memory, any code or data structure that currently resides in the computer's physical memory has such a translation. Page tables are usually organized as multi-level, hierarchical tables, with four levels being common for 64-bit systems; multiple sequential memory references are therefore necessary to retrieve the translation of the smallest supported page size. Hardware translation lookaside buffers (TLBs) cache translations that result from accessing the page table (a page walk). A TLB access is on the critical path of every instruction fetch and memory reference, because the translation is needed to complete the tag comparison in physically tagged L1 caches; a short TLB latency is therefore crucial. Several technology trends compound to make TLB performance and energy critical in today's systems. Physical memory sizes and application footprints have been increasing without a commensurate increase in TLB size, and thus coverage. As a result, while TLBs still reap the benefits of spatial and temporal locality thanks to the coarse tracking granularity of their entries, they now fall short of growing workload footprints. The use of superpages (i.e., large contiguous virtual memory regions that map to contiguous physical frames) can extend TLB coverage.
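To illustrate the lookup flow the abstract describes, the toy Python model below probes a single set-associative TLB shared by two page sizes, using a small predictor indexed by bits of the base-register value to pick which size to try first; a correct prediction costs one probe, a misprediction two. All concrete choices here (page sizes, table geometry, the hash, and the replacement policy) are assumptions for illustration, not the paper's exact design.

PAGE_SIZES = {"base": 4 * 1024, "super": 2 * 1024 * 1024}  # 4 KiB / 2 MiB (assumed)
PRED_ENTRIES = 32            # the paper's predictor is 32 bytes; entry layout assumed
TLB_SETS, TLB_WAYS = 64, 4   # assumed geometry

predictor = ["base"] * PRED_ENTRIES        # one predicted size per entry
tlb = [[] for _ in range(TLB_SETS)]        # each set holds (vpn, size) tags

def predict_size(base_reg):
    # index the predictor with low-order bits of the base-register value
    return predictor[(base_reg >> 12) % PRED_ENTRIES]

def lookup(vaddr, base_reg):
    """Probe with the predicted page size first; retry with the other size
    on a miss (the second probe models the misprediction penalty)."""
    first = predict_size(base_reg)
    order = [first] + [s for s in PAGE_SIZES if s != first]
    for probes, size in enumerate(order, start=1):
        vpn = vaddr // PAGE_SIZES[size]
        if (vpn, size) in tlb[vpn % TLB_SETS]:
            return "hit", size, probes
    return "miss", None, len(order)

def fill(vaddr, size, base_reg):
    vpn = vaddr // PAGE_SIZES[size]
    s = vpn % TLB_SETS
    tlb[s] = ([(vpn, size)] + [e for e in tlb[s] if e != (vpn, size)])[:TLB_WAYS]
    predictor[(base_reg >> 12) % PRED_ENTRIES] = size   # train on the true size

fill(0x7F3200001234, "super", base_reg=0x7F3200000000)
print(lookup(0x7F3200005678, base_reg=0x7F3200000000))  # ('hit', 'super', 1)

The predictor matters because the set index depends on the page size (different offset bits are stripped): without a prediction, a shared set-associative TLB would need one probe per supported page size, whereas predicting the size restores the single-probe common case while keeping the power-efficient set-associative structure.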
Abstract. Though more and more researchers have realized the importance of creativity in software development, few empirical studies on this topic have been reported. In this paper we present exploratory empirical research that studies several questions about creativity in software development: which development phases are perceived to involve more creative work; whether UML-based documentation makes developers perceive that more time is devoted to creative work; whether more creative work accelerates software development; and whether developers prefer to do creative work. Based on the analysis of the results, we propose four hypotheses to direct future research in this field and discuss the challenge: 'since developers do not like to participate in quality-improving (quality assurance) activities, how can we maintain and improve software quality effectively and efficiently?'