Abdullahi Abubakar Imam scite author profile

Feature selection (FS) is a feasible solution for mitigating the high dimensionality problem, and many FS methods have been proposed in the context of software defect prediction (SDP). However, empirical studies on the impact and effectiveness of FS methods on SDP models often lead to contradictory experimental results and inconsistent findings. These contradictions can be attributed to study limitations such as small datasets, limited FS search methods, and unsuitable prediction models in the respective scope of studies. It is hence critical to conduct an extensive empirical study to address these contradictions, to guide researchers, and to buttress the scientific tenacity of experimental conclusions. In this study, we investigated the impact of 46 FS methods using Naïve Bayes and Decision Tree classifiers over 25 software defect datasets from 4 software repositories (NASA, PROMISE, ReLink, and AEEEM). The ensuing prediction models were evaluated based on accuracy and AUC values. Scott–KnottESD and the novel Double Scott–KnottESD rank statistical methods were used for statistical ranking of the studied FS methods. The experimental results showed that there is no single best FS method, as their respective performance depends on the choice of classifier, performance evaluation metric, and dataset. However, we recommend the use of statistical-based, probability-based, and classifier-based filter feature ranking (FFR) methods, respectively, in SDP. For filter subset selection (FSS) methods, correlation-based feature selection (CFS) with metaheuristic search methods is recommended. For wrapper feature selection (WFS) methods, the IWSS-based WFS method is recommended as it outperforms the conventional SFS and LHS-based WFS methods.

New cardinality notations and styles for modeling NoSQL document-store databases

Imam, Basri, Ahmad et al. 2017

Nowadays, data with characteristics such as volume and variety (i.e., big data) are generated daily, and their complexity cannot be overemphasized. At the same time, schema-free NoSQL databases keep emerging at almost the same pace to accommodate such data, which cannot be efficiently managed by relational databases. However, this advancement brings the challenge of modeling such flexible databases and capably managing big data despite its complexity. In doing so, developers tend to apply their relational modeling skills; nonetheless, such skills may not be directly compatible with NoSQL databases due to their schema flexibility and linear scalability, among other properties. To alleviate this difficulty, we propose a standard for modeling NoSQL databases, document-stores in particular. The standard can be classified into i) cardinality notations, and ii) relationship modeling styles. With such a standard, NoSQL document-store databases can be properly designed, automated database testing can be applied, and database performance and stability can be considerably improved. To achieve this, an experimental method was applied. An exploratory approach was also used to explore the available literature as well as expert consultations. All possible entity relationships were extracted, aggregated, and compiled from a heuristic evaluation of four existing document-store databases. An experiment was conducted to assess the effect of the proposed standards; the results indicate profound improvements in various aspects of document modeling when the proposed standards are adopted, especially in large-scale databases.

Data Modeling Guidelines for NoSQL Document-Store Databases

Imam, Basri, Ahmad et al. 2018

IJACSA

Good database design is key to high data availability and consistency in traditional databases, and numerous techniques exist to assist designers in modeling schemas appropriately. These schemas are strictly enforced by traditional database engines. However, with the emergence of schema-free databases (NoSQL) coupled with voluminous and highly diversified datasets (big data), such aid becomes even more important, as schemas in NoSQL are enforced by application developers, which requires a high level of competence. Specifically, existing modeling techniques and guides used in traditional databases are insufficient for big data storage settings. As a synthesis, new modeling guidelines for NoSQL document-store databases are proposed. These guidelines cut across both the logical and physical stages of database design. Each is developed based on solid empirical insights, yet they are prepared to be intuitive to developers and practitioners. To realize this goal, we employ an exploratory approach to the investigation of techniques, empirical methods, and expert consultations. We analyze how industry experts prioritize requirements and analyze the relationships between datasets on the one hand and error prospects and awareness on the other hand. A few proprietary guidelines were extracted from a heuristic evaluation of 5 NoSQL databases. In this regard, the proposed guidelines have great potential to function as an imperative instrument of knowledge transfer from academia to NoSQL database modeling practices.

Rank Aggregation Based Multi-filter Feature Selection Method for Software Defect Prediction

Balogun, Basri, Abdulkadir et al. 2021

Empirical Analysis of Rank Aggregation-Based Multi-Filter Feature Selection Methods in Software Defect Prediction

et al. 2021

Selecting the most suitable filter method that will produce a subset of features with the best performance remains an open problem known as the filter rank selection problem. A viable solution to this problem is to independently apply a mixture of filter methods and evaluate the results. This study proposes novel rank aggregation-based multi-filter feature selection (FS) methods to address the high dimensionality and filter rank selection problems in software defect prediction (SDP). The proposed methods combine rank lists generated by individual filter methods into a single aggregated rank list using rank aggregation mechanisms. The proposed methods aim to resolve the filter selection problem by using multiple filter methods of diverse computational characteristics to produce a disjoint and complete feature rank list superior to individual filter rank methods. The effectiveness of the proposed methods was evaluated with Decision Tree (DT) and Naïve Bayes (NB) models on defect datasets from the NASA repository. In the experimental results, the proposed methods had a greater positive impact on the prediction performance of NB and DT models than the other FS methods tested. This makes the combination of filter rank methods a viable solution to the filter rank selection problem and to the enhancement of prediction models in SDP.
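
The idea of merging several filter rank lists into one aggregated list can be illustrated with a minimal sketch. It assumes mean-rank aggregation, which is only one possible aggregation mechanism and is not necessarily the one used in the study; the function name `aggregate_ranks`, the feature names, and the rank values are all hypothetical.

```python
from statistics import mean


def aggregate_ranks(rank_lists):
    """Combine rank lists (feature -> rank, 1 = best) by mean rank.

    Mean-rank aggregation is an assumption made for illustration only.
    """
    features = set(rank_lists[0])
    for ranks in rank_lists:
        if set(ranks) != features:
            raise ValueError("every rank list must cover the same features")
    mean_rank = {f: mean(r[f] for r in rank_lists) for f in features}
    # Sort by mean rank; break ties by feature name for a deterministic order.
    return sorted(features, key=lambda f: (mean_rank[f], f))


# Hypothetical rank lists from three filter methods over four made-up features.
filter_a = {"loc": 1, "cyclomatic": 2, "fan_in": 3, "comments": 4}
filter_b = {"loc": 2, "cyclomatic": 1, "fan_in": 4, "comments": 3}
filter_c = {"loc": 1, "cyclomatic": 3, "fan_in": 2, "comments": 4}

aggregated = aggregate_ranks([filter_a, filter_b, filter_c])
print(aggregated)
```

A top-k slice of the aggregated list would then serve as the selected feature subset passed to a classifier.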

HABCSm: A Hamming Based t-way Strategy based on Hybrid Artificial Bee Colony for Variable Strength Test Sets Generation

Alazzawi, Rais, Basri et al. 2021

Int. J. Comput. Commun. Control

Search-based software engineering, which involves the deployment of meta-heuristics in applicable software processes, has been gaining wide attention. Recently, researchers have been advocating the adoption of meta-heuristic algorithms for t-way testing strategies (where t denotes the interaction strength among parameters). Although helpful, no single meta-heuristic-based t-way strategy can claim dominance over its counterparts. For this reason, the hybridization of meta-heuristic algorithms can help to ascertain the search capabilities of each by compensating for the limitations of one algorithm with the strengths of others. Consequently, this paper proposes a new meta-heuristic-based t-way strategy called the Hybrid Artificial Bee Colony (HABCSm) strategy, which merges the advantages of the Artificial Bee Colony (ABC) algorithm with those of the Particle Swarm Optimization (PSO) algorithm. HABCSm is the first t-way strategy to adopt the Hybrid Artificial Bee Colony (HABC) algorithm with Hamming distance as its core method for generating a final test set and the first to adopt the Hamming distance as the final selection criterion for enhancing the exploration of new solutions. The experimental results demonstrate that HABCSm provides superior competitive performance over its counterparts. Therefore, this finding contributes to the field of software testing by minimizing the number of test cases required for test execution.
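
To make the role of Hamming distance as a selection criterion concrete, here is a minimal sketch. It is not the HABCSm algorithm: it only assumes that, among candidate test cases, the one farthest (in Hamming distance) from the tests already chosen is preferred. The helper names `hamming` and `pick_most_distant` and the sample data are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length test cases differ."""
    if len(a) != len(b):
        raise ValueError("test cases must have the same number of parameters")
    return sum(x != y for x, y in zip(a, b))


def pick_most_distant(candidates, test_set):
    """Return the candidate whose smallest distance to the chosen tests is largest.

    This max-min rule is an illustrative assumption, not the paper's procedure.
    """
    return max(candidates, key=lambda c: min(hamming(c, t) for t in test_set))


# Hypothetical test cases over three binary parameters.
chosen_tests = [(0, 0, 0), (1, 1, 0)]
candidates = [(0, 0, 1), (1, 0, 1), (0, 1, 0)]

best = pick_most_distant(candidates, chosen_tests)
print(best)
```

Favoring distant candidates pushes the search toward parameter-value combinations that the current test set has not yet covered.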

Automatic schema suggestion model for NoSQL document-stores databases

et al. 2018

New-generation databases, also called NoSQL (Not only SQL) databases, are highly scalable, flexible, and low-latency. These types of databases emerged as a result of the rigidity shown by traditional databases in handling today's data, which is voluminous, highly diversified, and generated at a very high rate. With NoSQL, problems such as database expansion difficulties, low query performance, and low storage capacity are addressed. However, the inherent complexity of contemporary datasets, coupled with programmers' low NoSQL modeling competence, is increasingly making database modeling and design vastly challenging, especially when parameters like consistency, availability, and scalability are to be balanced in accordance with system requirements. As such, a schema suggestion model for NoSQL databases is proposed to address this balancing issue. The proposed model aims to abstractly suggest schemas at the initial stage of system development based on user-defined system requirements and CRUD (Create, Read, Update and Delete) operations, among others. This is achieved through the adoption of exploratory and experimental research approaches. A few mathematical formulas are also introduced to calculate cluster availability during entity mappings. A comparison was conducted between the schema produced using the proposed model and one produced without it. The results obtained show substantial improvement in the areas of security and read-write query performance.

The Organisational Factors of Software Process Improvement in Small Software Industry: Comparative Study

Basri, Almomani, Imam et al. 2019
