Win Tsung Lo scite author profile

Chung et al. 2021

Diagnostics

Background: Antinuclear antibody pattern recognition is vital for autoimmune disease diagnosis but labor-intensive when interpreted manually. To develop an automated pattern recognition system, we established machine learning models based on the International Consensus on Antinuclear Antibody Patterns (ICAP) competent-level classification, including mixed-pattern recognition, and evaluated their consistency with human reading. Methods: A total of 51,694 human epithelial cell (HEp-2) images, with patterns assigned by experienced medical technologists at a medical center, were used to train six machine learning algorithms, whose performance was then compared. Next, we chose the best-performing model to test its consistency with five experienced readers and two beginners. Results: The mean F1 score across classifications for the best-performing model was 0.86 on Testing Data 1. For the inter-observer agreement test on Testing Data 2, the average agreement was 0.849 (κ) among five experienced readers, 0.844 between the best-performing model and experienced readers, and 0.528 between experienced readers and beginners. The results indicate that the proposed model outperformed beginners and achieved excellent agreement with experienced readers. Conclusions: This study demonstrated that the developed model, built with machine learning methods, could reach excellent agreement with experienced human readers.

CUDT: A CUDA Based Decision Tree Algorithm

Chang et al. 2014

The Scientific World Journal

The decision tree is one of the best-known classification methods in data mining. Many studies have focused on improving decision tree performance. However, those algorithms were developed for and run on traditional distributed systems. Without the help of new technology, latency cannot be improved when processing the huge volumes of data generated by ubiquitous sensing nodes. To reduce data processing latency in large-scale data mining, in this paper we design and implement a new parallelized decision tree algorithm on CUDA (Compute Unified Device Architecture), a GPGPU solution provided by NVIDIA. In the proposed system, the CPU is responsible for flow control while the GPU is responsible for computation. We conducted many experiments to evaluate the performance of CUDT and compared it with traditional CPU versions. The results show that CUDT is 5∼55 times faster than Weka-j48 and 18 times faster than SPRINT on large data sets.

Design and Implementation of File Deduplication Framework on HDFS

Yuan et al. 2014

International Journal of Distributed Sensor Networks

File systems are designed to control how files are stored and retrieved. Without knowing the context and semantics of file contents, file systems often store duplicate copies, resulting in redundant consumption of storage space and network bandwidth. Finding deduplication technologies that reduce cost and increase storage efficiency has been a complex and challenging issue for enterprises. To solve this problem, researchers have proposed in-line or offline solutions for primary storage or backup systems at the subfile or whole-file level. Some of these technologies are used for file servers and database systems. Fewer studies focus on cloud file system deduplication technologies at the application level, especially for the Hadoop distributed file system. The goal of this paper is to design a file deduplication framework on the Hadoop distributed file system for cloud application developers. The architecture, interface, and implementation experience are also shared in this paper.

Automatic cloud service testing and bottleneck detection system with scaling recommendation

Liu et al. 2019

Concurrency and Computation

Performance problems in a cloud service are difficult to diagnose because they may be caused by various system components. This study proposes an automatic cloud service testing and bottleneck detection system that is applicable to different types of services. With the proposed test module, the user can customize test scenarios to automatically test the target service and collect the corresponding metrics. Afterward, the proposed bottleneck detection algorithm analyzes the collected metrics and determines whether a bottleneck is present in the target system. The bottleneck detection module also provides a scaling recommendation to help the service provider reconfigure the service system. The experimental results reveal that the proposed system can accurately detect a potential bottleneck in a service system. When the scaling recommendation is followed, the performance of the target cloud service can be improved efficiently after reconfiguration. Therefore, using the proposed system can ensure a high quality of service and help fulfill the service level objective.