George Gatuha scite author profile

Association rule mining is an important technique for finding relationships in large datasets. Several frequent itemset mining techniques have been proposed that use the FP-tree, a prefix-tree structure that provides a compressed representation of the database. The DIFFset data structure has also been shown to significantly reduce the run time and memory utilization of some data mining algorithms. Experimental results have demonstrated the efficiency of both data structures in frequent itemset mining. This work proposes FDM, a new algorithm based on the FP-tree and DIFFset data structures for efficiently discovering frequent patterns in data. FDM can adapt its characteristics to efficiently mine long and short patterns from both dense and sparse datasets. Several optimization techniques are also outlined to increase the efficiency of FDM. FDM was evaluated against three frequent itemset mining algorithms, dEclat, FP-growth, and FDM* (FDM without optimization), using datasets having both long and short frequent patterns. The experimental results show significant improvement in performance compared to the FP-growth, dEclat, and FDM* algorithms.
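As a rough illustration of why a diffset can be cheaper than a full transaction-ID list, the sketch below computes itemset support from diffsets in the style used by dEclat. It is not the FDM algorithm from the abstract; the function names and the toy database are assumptions made for illustration only.

```python
def build_tidsets(transactions):
    """Map each item to the set of transaction IDs (tidset) that contain it."""
    tidsets = {}
    for tid, items in transactions.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def pair_diffset(tidsets, x, y):
    """Diffset of {x, y} relative to x: transactions with x but without y."""
    return tidsets[x] - tidsets[y]


def extend_diffset(d_px, d_py):
    """Diffset of PXY from the diffsets of PX and PY: d(PXY) = d(PY) - d(PX)."""
    return d_py - d_px


# Made-up toy database (hypothetical): transaction ID -> items.
transactions = {
    1: {"a", "b", "c"},
    2: {"a", "c"},
    3: {"a", "b"},
    4: {"b", "c"},
    5: {"a", "b", "c"},
}
tidsets = build_tidsets(transactions)

# support(ab) = support(a) - |d(ab)|
d_ab = pair_diffset(tidsets, "a", "b")
d_ac = pair_diffset(tidsets, "a", "c")
support_ab = len(tidsets["a"]) - len(d_ab)

# support(abc) = support(ab) - |d(abc)|
d_abc = extend_diffset(d_ab, d_ac)
support_abc = support_ab - len(d_abc)
```

On dense data the diffsets tend to stay small because most transactions containing a prefix also contain its extensions, which is the intuition behind the memory savings mentioned above.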

Android Based Naive Bayes Probabilistic Detection Model for Breast Cancer and Mobile Cloud Computing: Design and Implementation

Gatuha

2015

JERA

Mobile phone technology initiatives are revolutionizing healthcare delivery in Africa and other developing countries. M-health services have transformed maternal health, the management of communicable diseases such as Ebola, and the prevention of chronic diseases. Technological innovations in m-health have improved healthcare efficiency and effectiveness and extended health services to remote locations in rural African communities. This paper describes a ubiquitous m-health system that is based on the user-centric paradigm of Mobile Cloud Computing (MCC) and Android medical-data mining techniques. The development of ultra-fast 4G mobile networks and sophisticated smartphones and tablets has brought the cloud computing paradigm to the mobile domain. The system comprises an Android client for breast bio-data collection, a data mining component based on the Naïve Bayes probabilistic classifier (NBC) algorithm for predicting malignancy in breast tissue, and server-side MCC data storage. Experimental results indicate that the Android Naïve Bayes classifier achieves 96.4% accuracy on Wisconsin Breast Cancer (WBC) data from the UCI Machine Learning Repository.
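The Naïve Bayes classification step can be sketched in a few lines. This is a minimal categorical classifier with Laplace smoothing, assuming discretized features; the class name, feature encoding, and the six training rows are hypothetical and are not the paper's Android implementation or the WBC data.

```python
import math
from collections import Counter


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        n_features = len(rows[0])
        # counts[c][j][v] = number of class-c rows whose feature j equals v
        self.counts = {
            c: [Counter() for _ in range(n_features)] for c in self.class_counts
        }
        # values[j] = distinct values seen for feature j during training
        self.values = [set() for _ in range(n_features)]
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def log_posteriors(self, row):
        """Unnormalized log P(class | row) for every class."""
        total = sum(self.class_counts.values())
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / total)  # log prior
            for j, v in enumerate(row):
                num = self.counts[c][j][v] + self.alpha
                den = n_c + self.alpha * len(self.values[j])
                score += math.log(num / den)  # log likelihood of feature j
            scores[c] = score
        return scores

    def predict(self, row):
        scores = self.log_posteriors(row)
        return max(scores, key=scores.get)


# Made-up toy rows (hypothetical): two discretized features per sample.
rows = [
    ("low", "low"), ("low", "low"), ("low", "high"),
    ("high", "high"), ("high", "high"), ("high", "low"),
]
labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
model = CategoricalNaiveBayes().fit(rows, labels)
```

Working in log space avoids numerical underflow when many feature likelihoods are multiplied, and the smoothing term keeps an unseen feature value from forcing a class probability to zero.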

KenVACS: Improving Vaccination of Children through Cellular Network Technology in Developing Countries

Gatuha

2015

IJIKM

Health data collection is one of the major components of public health systems. Decision makers, policy makers, and medical service providers need accurate and timely data in order to improve the quality of health services. The rapid growth and use of mobile technologies has increased the demand for mobile-based data collection solutions to bridge the information gaps in the health sector. We propose a prototype built on open source data collection frameworks to test the feasibility of improving vaccination data collection in Kenya. KenVACS, the proposed prototype, collects vaccination data through mobile phones and visualizes the collected data in a web application; the system also sends short message service (SMS) reminders to notify parents of the date of the next vaccination. Early evaluation demonstrates the benefits of such a system in supporting and improving vaccination of children. Finally, we conducted a qualitative study to assess challenges in remote health data collection and evaluated the usability and functionality of KenVACS.

Evaluating Diagnostic Performance of Machine Learning Algorithms on Breast Cancer

Gatuha

2015

Novel Frequent Pattern Mining Algorithm Based on Parallelization Scheme

Gatuha

2016

JERA

Frequent pattern mining (FPM) is a very important technique in data mining and has a wide range of practical applications. Equivalence Class Clustering (Eclat) has been identified as one of the most efficient FPM algorithms. We present P-Eclat, a novel parallel FPM algorithm that improves on the Eclat algorithm by employing a partial breadth-first search to achieve maximum parallelism. Our approach uses a TIDset representation of the vertical transaction lists across multiple threads on a CPU. Current parallelization techniques for mining frequent patterns do not fully utilize the benefits of multicore shared-memory machines. Our parallel mining approach reduces synchronization requirements, maximizes data independence, and enhances scalability. We also introduce several optimization techniques to improve the algorithm's performance. Experimental results show that the P-Eclat algorithm outperforms both the Eclat and dEclat algorithms.
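To make the TIDset idea concrete, the sketch below mines frequent itemsets by intersecting vertical transaction-ID sets and hands each top-level equivalence class to a separate worker. It is an assumption-laden illustration, not the P-Eclat algorithm: the function names are hypothetical, the partial breadth-first search is not reproduced, and Python threads generally do not give CPU parallelism for this kind of work, so only the task decomposition is shown.

```python
from concurrent.futures import ThreadPoolExecutor


def eclat_class(prefix, prefix_tids, candidates, min_support, out):
    """Depth-first Eclat over the itemsets that extend `prefix`."""
    out[prefix] = len(prefix_tids)
    extensions = []
    for item, tids in candidates:
        joined = prefix_tids & tids  # tidset of prefix + item
        if len(joined) >= min_support:
            extensions.append((item, joined))
    for i, (item, tids) in enumerate(extensions):
        eclat_class(prefix + (item,), tids, extensions[i + 1:], min_support, out)


def mine(transactions, min_support, workers=4):
    """Return {itemset tuple: support} for all frequent itemsets."""
    # Vertical layout: item -> set of transaction IDs containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    frequent = sorted(
        ((i, t) for i, t in tidsets.items() if len(t) >= min_support),
        key=lambda pair: pair[0],
    )

    def run(i):
        # Each task owns one equivalence class and writes to its own dict,
        # so no locking is needed while mining.
        item, tids = frequent[i]
        local = {}
        eclat_class((item,), tids, frequent[i + 1:], min_support, local)
        return local

    result = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for local in pool.map(run, range(len(frequent))):
            result.update(local)  # merge happens on the calling thread only
    return result


# Made-up toy database (hypothetical).
transactions = [
    {"a", "b", "c"}, {"a", "c"}, {"a", "b"}, {"b", "c"}, {"a", "b", "c"},
]
patterns = mine(transactions, min_support=2)
```

Because each equivalence class reads shared tidsets but writes only to its own result dictionary, the workers need no synchronization until the final merge, which mirrors the data-independence goal described in the abstract.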