Xiaoyang Sun scite author profile

This study aims to develop a computer-aided diagnosis (CADx) scheme to classify lung nodules as malignant or benign, and to assess whether CADx performance changes when detecting nodules associated with early- and advanced-stage lung cancer. The study involves 243 biopsy-confirmed pulmonary nodules: 76 benign, 81 stage I malignant and 86 stage III malignant. The cases are separated into three data sets: (1) all nodules, (2) benign and stage I malignant nodules, and (3) benign and stage III malignant nodules. A CADx scheme is applied to segment lung nodules depicted on computed tomography images, and 66 3D image features are initially computed. Then, three machine learning models, namely a support vector machine, a naïve Bayes classifier and linear discriminant analysis, are separately trained and tested on the three data sets using a leave-one-case-out cross-validation method embedded with a Relief-F feature selection algorithm. When the three data sets are used separately to train and test the three classifiers, the average areas under the receiver operating characteristic curves (AUC) are 0.94, 0.90 and 0.99, respectively. When the classifiers are trained on the data set with all nodules, the average AUC values are 0.88 and 0.99 for detecting early- and advanced-stage nodules, respectively. AUC values computed from the three classifiers trained on the same data set are consistent, with no statistically significant difference (p > 0.05). This study demonstrates (1) the feasibility of applying a CADx scheme to accurately distinguish between benign and malignant lung nodules, and (2) a positive trend between CADx performance and cancer progression stage. Thus, to increase CADx performance in detecting subtle and early cancer, training data sets should include more diverse early-stage cancer cases.
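The evaluation protocol described above (leave-one-case-out cross-validation with Relief-F feature selection repeated inside every fold, scored by AUC) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: `relief_weights` is a simplified two-class Relief-F, a Gaussian naïve Bayes stands in for all three classifiers, the number of kept features (`n_features=2`) is arbitrary, and every function name is hypothetical.

```python
import math


def relief_weights(X, y, n_neighbors=3):
    """Simplified two-class Relief-F: a feature gains weight when it differs
    from the nearest misses (other class) and loses weight when it differs
    from the nearest hits (same class)."""
    n, d = len(X), len(X[0])
    # Normalise each feature's differences by its range so features are comparable.
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]
    diff = lambda a, b, j: abs(X[a][j] - X[b][j]) / span[j]
    w = [0.0] * d
    for i in range(n):
        others = sorted((k for k in range(n) if k != i),
                        key=lambda k: sum(diff(i, k, j) for j in range(d)))
        hits = [k for k in others if y[k] == y[i]][:n_neighbors]
        misses = [k for k in others if y[k] != y[i]][:n_neighbors]
        for j in range(d):
            for k in hits:
                w[j] -= diff(i, k, j) / (n * len(hits))
            for k in misses:
                w[j] += diff(i, k, j) / (n * len(misses))
    return w


def fit_gaussian_nb(X, y):
    """Per-class log prior, feature means and variances (0 = benign, 1 = malignant)."""
    model = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                     for col, m in zip(cols, means)]
        model[c] = (math.log(len(rows) / len(X)), means, variances)
    return model


def malignancy_score(model, x):
    """Log-odds of class 1 versus class 0 under the naive Bayes model."""
    def log_joint(c):
        log_prior, means, variances = model[c]
        return log_prior + sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                               for xi, m, v in zip(x, means, variances))
    return log_joint(1) - log_joint(0)


def auc(scores, labels):
    """AUC as the Mann-Whitney probability that a positive case outscores a negative one."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def loocv_auc(X, y, n_features=2):
    """Leave-one-case-out AUC with feature selection embedded in each fold."""
    scores = []
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        # Feature selection sees only the training fold, never the held-out case.
        w = relief_weights(train_X, train_y)
        keep = sorted(range(len(w)), key=lambda j: w[j], reverse=True)[:n_features]
        model = fit_gaussian_nb([[r[j] for j in keep] for r in train_X], train_y)
        scores.append(malignancy_score(model, [X[i][j] for j in keep]))
    return auc(scores, y)
```

Running Relief-F inside each fold, rather than once on all cases, is what keeps the held-out case from influencing which features are selected; selecting features on the full data set first would make the cross-validated AUC optimistic.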

Performance-Aware Speculative Resource Oversubscription for Large-Scale Clusters

Yang, Sun, et al. 2020

IEEE Trans. Parallel Distrib. Syst.

Achieving a high degree of resource utilization in cluster scheduling is a long-standing challenge. Resource oversubscription has become a common practice for improving resource utilization and reducing cost. However, current centralized approaches to oversubscription suffer from resource mismatch and fail to take other performance requirements, such as tail latency, into account. In this paper we present ROSE, a new resource management platform capable of performance-aware resource oversubscription. ROSE allows latency-sensitive long-running applications (LRAs) to co-exist with computation-intensive batch jobs. Instead of waiting for the centralized scheduler to confirm resource allocation, job managers in ROSE can independently request to launch speculative tasks on specific machines according to their suitability for oversubscription. The node agents on those machines, however, can prevent excessive oversubscription through an admission control mechanism that combines multi-resource threshold control with performance-aware resource throttling. Experiments show that with mixed co-location of batch jobs and latency-sensitive LRAs, CPU utilization and disk utilization can reach 56.34% and 43.49%, respectively, while the 95th percentile read latency in YCSB workloads increases by only 5.4% compared with executing the LRAs alone.
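The node-side admission control described above can be sketched as a gate plus a throttle. This is a hypothetical illustration, not ROSE's implementation: the class and function names, the per-resource threshold dictionary, the use of a p95 latency target as the performance signal, and the additive-increase/multiplicative-decrease throttle are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NodeState:
    """Hypothetical view a node agent keeps of its own machine."""
    capacity: dict               # resource name -> total amount
    used: dict                   # resource name -> amount actually in use
    p95_latency_ms: float = 0.0  # observed tail latency of co-located LRAs


@dataclass
class SpeculativeTask:
    demand: dict = field(default_factory=dict)  # resource name -> requested amount


def admit(node, task, thresholds, latency_slo_ms):
    """Admit a speculative task only if the LRAs meet their latency target and
    every resource stays at or below its utilization threshold after placement."""
    if node.p95_latency_ms > latency_slo_ms:
        return False  # performance-aware gate: protect latency-sensitive LRAs first
    for resource, cap in node.capacity.items():
        projected = (node.used.get(resource, 0.0) + task.demand.get(resource, 0.0)) / cap
        if projected > thresholds.get(resource, 1.0):
            return False  # multi-resource threshold control
    return True


def throttle_batch_cpu(cpu_cap, p95_latency_ms, latency_slo_ms,
                       step=1.0, backoff=0.5, floor=1.0, ceiling=32.0):
    """AIMD-style throttle for the CPU share given to batch tasks (assumed policy):
    halve it when LRA tail latency breaks the target, otherwise grow it by one step."""
    if p95_latency_ms > latency_slo_ms:
        return max(floor, cpu_cap * backoff)
    return min(ceiling, cpu_cap + step)
```

For example, on a 32-core node already using 16 cores with a 0.8 CPU threshold, a 4-core speculative task is admitted (projected utilization 0.625), a 12-core task is rejected (0.875), and any task is rejected once the LRAs' p95 latency exceeds the target.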

ROSE: Cluster Resource Scheduling via Speculative Over-Subscription

Sun, Yang, et al. 2018

A long-standing challenge in cluster scheduling is to achieve a high degree of utilization of heterogeneous resources in a cluster. In practice there is a substantial disparity between perceived and actual resource utilization. A scheduler might regard a cluster as fully utilized if a large resource request queue is present, while the cluster's actual resource utilization is in fact very low. This disparity leaves resources idle, leading to inefficient resource usage, high operational costs and an inability to provision services. In this paper we present a new cluster scheduling system, ROSE, based on a multi-layered scheduling architecture that can over-subscribe idle resources to accommodate unfulfilled resource requests. ROSE books idle resources speculatively: instead of waiting for the centralized scheduler to confirm resource allocation, it intelligently requests to launch tasks on machines according to their suitability for oversubscription. Threshold control with timely task rescheduling keeps cluster resources fully utilized while avoiding potential task stragglers. Experimental results show that ROSE can almost double the average CPU utilization, from 36.37% to 65.10%, compared with a centralized scheduling scheme, and reduce the workload makespan by 30.11%, with an 8.23% disk utilization improvement over other scheduling strategies.

Index Terms: cluster scheduling, resource management, oversubscription
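The gap between perceived and actual utilization, and the speculative placement that exploits it, can be sketched in a few functions. This is a hypothetical sketch under simplifying assumptions, not ROSE's scheduler: it tracks CPU only, `headroom` is an assumed stand-in for a machine's suitability for oversubscription, the 0.9 threshold is arbitrary, and the timeout-based `stale_tasks` check is one possible reading of "timely task rescheduling".

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    cpu_capacity: float
    cpu_reserved: float  # what the centralized scheduler believes is allocated
    cpu_used: float      # what tasks are actually consuming


def perceived_utilization(m):
    return m.cpu_reserved / m.cpu_capacity


def actual_utilization(m):
    return m.cpu_used / m.cpu_capacity


def headroom(m, threshold):
    """Idle CPU left below the oversubscription threshold, based on actual usage."""
    return threshold * m.cpu_capacity - m.cpu_used


def pick_machine(machines, demand, threshold=0.9):
    """Speculatively target the machine with the most real headroom, ignoring
    reservations; return None when no machine fits the task under the threshold."""
    fits = [m for m in machines if headroom(m, threshold) >= demand]
    return max(fits, key=lambda m: headroom(m, threshold), default=None)


def stale_tasks(queued, now_s, timeout_s):
    """IDs of (task_id, enqueued_s) entries that have waited longer than timeout_s
    and should be rescheduled elsewhere rather than left to become stragglers."""
    return [task_id for task_id, enqueued_s in queued if now_s - enqueued_s > timeout_s]
```

For example, a 16-core machine with all 16 cores reserved but only 4 in use has perceived utilization 1.0 and actual utilization 0.25, so `pick_machine` still treats it as a good target even though a reservation-based scheduler would consider it full.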

Robust Face Recognition With Kernelized Locality-Sensitive Group Sparsity Representation

Tan, Sun, Chan, et al. 2017

IEEE Trans. on Image Process.

In this paper, a novel joint sparse representation method is proposed for robust face recognition. We embed both group sparsity and kernelized locality-sensitive constraints into the framework of sparse representation. The group sparsity constraint is designed to exploit the grouped structure information in the training data. The local similarity between test and training data is measured in the kernel space instead of the Euclidean space. As a result, the embedded nonlinear information can be effectively captured, leading to a more discriminative representation. We show that by integrating the kernelized locality-sensitive constraint and the group sparsity constraint, the embedded structure information can be better explored and significant performance improvement can be achieved. On the one hand, experiments on the ORL, AR, extended Yale B, and LFW data sets verify the superiority of our method. On the other hand, experiments on two unconstrained data sets, LFW and IJB-A, show that exploiting sparsity can improve recognition performance, especially on data sets with large pose variation.
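Two ingredients named above can be sketched concretely: measuring locality in kernel space rather than Euclidean space, and enforcing group sparsity. This is not the paper's solver. Assumptions: an RBF kernel with an arbitrary `gamma`, locality penalties of the form exp(d/sigma) (a common choice in locality-constrained coding, assumed here), and the standard group-lasso proximal operator as a building block that an iterative solver could call; all names are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_distance(x, y, kernel=rbf_kernel):
    """Distance between the feature-space images of x and y:
    ||phi(x) - phi(y)||^2 = k(x, x) - 2 k(x, y) + k(y, y)."""
    sq = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors


def locality_weights(test, train, sigma=1.0, kernel=rbf_kernel):
    """Per-atom penalty weights: training samples far from the test sample in
    kernel space get larger weights, so the representation favours close ones."""
    return [math.exp(kernel_distance(test, t, kernel) / sigma) for t in train]


def group_soft_threshold(coeffs, groups, lam):
    """Proximal operator of the group-lasso penalty lam * sum_g ||c_g||_2:
    shrink each group's coefficient vector towards zero as a block."""
    out = list(coeffs)
    for idx in groups:
        norm = math.sqrt(sum(coeffs[i] ** 2 for i in idx))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = coeffs[i] * scale
    return out
```

Because the RBF kernel satisfies k(x, x) = 1, the kernel distance is bounded by sqrt(2) and grows monotonically with Euclidean distance; the group operator zeroes whole groups (for example, all training images of one subject) whose combined norm falls below `lam`, which is how group sparsity differs from element-wise sparsity.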

Xiaoyang Sun

Automatic detection of pulmonary nodules in CT images by incorporating 3D tensor filtering with local image feature analysis

Computer-aided diagnosis of lung cancer: the effect of training data sets on classification accuracy of lung nodules

Performance-Aware Speculative Resource Oversubscription for Large-Scale Clusters

ROSE: Cluster Resource Scheduling via Speculative Over-Subscription

Robust Face Recognition With Kernelized Locality-Sensitive Group Sparsity Representation
