The coronavirus pandemic has affected the whole world and is a heavily researched topic in the biological sciences. Because the pandemic has affected countries socially and economically, the purpose of this bibliometric analysis is to provide a holistic review of coronavirus research in the social sciences. The study aims to highlight significant, influential aspects, research streams, and themes. We reviewed 395 journal articles related to coronavirus in the social sciences, published from 2003 to 2020. We used 'biblioshiny', the web interface of the 'bibliometrix 3.0' R package, to conduct the bibliometric analysis and visualization. We report the influential aspects of the coronavirus literature in the social sciences. The 'Morbidity and Mortality Weekly Report' is the top journal. The core article of the coronavirus literature is 'Guidelines for preventing health-care-associated pneumonia'. The most frequently used word in titles, abstracts, author keywords, and Keywords Plus is 'SARS'. The top affiliation is 'The University of Hong Kong'. Hong Kong is the leading country by citations, and the USA leads by total publications. We used a conceptual framework to identify potential research streams and themes in the coronavirus literature. A co-occurrence network reveals four research streams: 'Social and economic effects of epidemic disease', 'Infectious disease calamities and control', 'Outbreak of COVID 19', and 'Infectious diseases and the role of international organizations'. Finally, a thematic map provides a holistic understanding by dividing significant themes into basic or transversal, emerging or declining, motor, and highly developed but isolated themes. These themes and subthemes point to future directions and critical areas of research.
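The two mapping steps in this analysis, a keyword co-occurrence network and a thematic map, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the bibliometrix implementation: keyword pairs are counted per document, and each theme is then placed in a strategic-diagram quadrant from its centrality (links to other themes) and density (internal cohesion). Here the centrality and density values are supplied directly, and splitting the axes at their medians is an assumption made for the sketch; the function names and toy keywords are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import median


def cooccurrence(docs):
    """Count how often each unordered keyword pair appears in the same document."""
    counts = Counter()
    for keywords in docs:
        # Deduplicate and sort so ("a", "b") and ("b", "a") are counted as one pair.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


def quadrant(centrality, density, c_split, d_split):
    """Place one theme in a quadrant of the strategic diagram."""
    if centrality >= c_split and density >= d_split:
        return "motor"
    if centrality >= c_split:
        return "basic or transversal"
    if density >= d_split:
        return "highly developed but isolated"
    return "emerging or declining"


def thematic_map(themes):
    """themes maps a theme name to (centrality, density); medians act as axis splits (assumption)."""
    c_split = median(c for c, _ in themes.values())
    d_split = median(d for _, d in themes.values())
    return {name: quadrant(c, d, c_split, d_split) for name, (c, d) in themes.items()}


# Hypothetical keyword lists, one per article.
docs = [["sars", "outbreak", "hong kong"], ["sars", "outbreak"], ["covid-19", "sars"]]
print(cooccurrence(docs)[("outbreak", "sars")])  # 2
```

A theme that is both central and dense is a motor theme, while a central but loosely knit one is basic or transversal; the other two quadrants follow the same logic.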
Medical datasets are usually imbalanced: negative cases severely outnumber positive cases. It is therefore essential to address this data skew when training machine learning algorithms. This study uses two representative lung cancer datasets, PLCO and NLST, with imbalance ratios (the number of samples in the majority class divided by the number in the minority class) of 24.7 and 25.0, respectively, to predict lung cancer incidence. It evaluates the performance of 23 class imbalance methods (resampling and hybrid systems) with three classical classifiers (logistic regression, random forest, and LinearSVC) to identify the imbalance techniques best suited to medical datasets. Resampling includes ten under-sampling methods (RUS, etc.), seven over-sampling methods (SMOTE, etc.), and two integrated sampling methods (SMOTEENN, SMOTE-Tomek). Hybrid systems include Balanced Bagging, among others. The results show that class imbalance learning can improve the classification ability of the model. Compared with the other imbalance techniques, under-sampling techniques have the highest standard deviation (SD), and over-sampling techniques have the lowest. Over-sampling is a stable method, and the AUC of models using it is generally higher than with other approaches. With ROS, the random forest achieves the best predictive ability and is the most suitable for the lung cancer datasets used in this study.
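The imbalance ratio and random over-sampling (ROS) described above are simple enough to sketch with the standard library alone. This is an illustrative sketch, not the study's code: the function names and toy data are hypothetical, and ROS is implemented here as drawing minority samples with replacement until every class matches the largest one. In practice the imbalanced-learn library offers `RandomOverSampler` for the same purpose.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Number of majority-class samples divided by number of minority-class samples."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(X, y, seed=0):
    """Random over-sampling (ROS): duplicate randomly chosen samples of each smaller
    class until every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(members))
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 247 negatives and 10 positives.
y = [0] * 247 + [1] * 10
X = [[float(i)] for i in range(len(y))]
print(imbalance_ratio(y))  # 24.7
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Resampling of this kind is normally applied to the training split only, so that duplicated minority samples do not leak into the evaluation data.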
In our work, we present two widely used types of recommendation system: a context-aware recommender system that filters the items associated with the user's interests, coupled with a context-based recommender system that recommends those items. In this study, the context-aware recommender system perceives the user's location, time, and company. The context-based recommender system retrieves patterns from the World Wide Web based on the user's past interactions and provides future news recommendations. We present different techniques to support media recommendations on smartphones, to create a context-aware framework, to filter e-learning content, and to deliver news to the user conveniently. To achieve this goal, we used content-based filtering, collaborative filtering, and a hybrid recommender system, and implemented the Web Ontology Language (OWL). We also used the Resource Description Framework (RDF), Java, machine learning, semantic mapping rules, and natural ontology languages to suggest items related to the user's search. In our work, we used an e-paper to provide users with the news they need. After applying the semantic reasoning approach, we concluded that it works in a way similar to a content-based recommender system: by exploiting semantics, we can also recommend items according to the user's interests. In a content-based recommender system, the system provides additional options or results based on the user's ratings, appraisals, and interests.
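The combination of context awareness (location, time, company) and content-based recommendation can be illustrated with a small sketch. This is not the authors' OWL/RDF semantic-reasoning system: it swaps in a common, simpler pattern, contextual pre-filtering followed by cosine-similarity ranking of item feature vectors against a user profile. The item catalogue, feature axes, and context keys are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matches_context(item, context):
    """An item passes if, for every context dimension it restricts, the current value is allowed."""
    allowed = item["contexts"]
    return all(key not in allowed or value in allowed[key] for key, value in context.items())


def recommend(items, profile, context, k=2):
    """Contextual pre-filtering followed by content-based ranking against the user profile."""
    candidates = [it for it in items if matches_context(it, context)]
    ranked = sorted(candidates, key=lambda it: cosine(it["features"], profile), reverse=True)
    return [it["id"] for it in ranked[:k]]


# Hypothetical catalogue: feature axes are (sports, technology, politics).
items = [
    {"id": "sports-news", "features": [1, 0, 0], "contexts": {"time": {"morning"}}},
    {"id": "tech-news", "features": [0, 1, 0], "contexts": {}},
    {"id": "ai-lecture", "features": [0.2, 0.9, 0.1], "contexts": {"location": {"campus"}}},
]
profile = [0, 1, 0]  # the user has mostly read technology items
context = {"location": "home", "time": "morning", "company": "alone"}
print(recommend(items, profile, context))  # ['tech-news', 'sports-news']
```

Filtering before ranking keeps the content-based scorer unchanged: the context only decides which items are eligible, and the user profile decides their order.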
As the financial industry has grown dramatically, financial threats have increasingly centered on the credit risk of commercial banks. One of the biggest threats faced by commercial banks is therefore predicting the credit risk of their clients. Recent studies mostly focus on improving classifier performance for credit card default prediction rather than on building an interpretable model. In classification problems, handling an imbalanced dataset is also crucial to model performance, because most cases lie in one class and only a few examples belong to the other classes. Traditional statistical approaches are not suited to imbalanced data. In this study, a model is developed for credit default prediction using several credit-related datasets. Because the minimum and maximum values often differ considerably across features, Min-Max normalization is used to scale all features to the same range. Data-level resampling techniques, including various undersampling and oversampling methods, are employed to resolve the class imbalance. Different machine learning models are also employed to obtain efficient results. We tested two hypotheses: whether the models developed with different machine learning techniques perform significantly the same or differently, and whether resampling techniques significantly improve the performance of the proposed models. One-way analysis of variance (ANOVA), a hypothesis-testing technique, is used to test the significance of the results. The results are validated with a split method in which the data are divided into training and test sets. On the imbalanced datasets, the accuracy is 66.9% on the Taiwan clients credit dataset, 70.7% on the South German clients credit dataset, and 65% on the Belgium clients credit dataset.
Conversely, our proposed methods significantly improve accuracy, to 89% on the Taiwan clients credit dataset, 84.6% on the South German clients credit dataset, and 87.1% on the Belgium clients credit dataset. The classifiers perform better on the balanced datasets than on the imbalanced datasets, and the data oversampling techniques perform better than the undersampling techniques. Overall, the Gradient Boosted Decision Tree method outperforms the other traditional machine learning classifiers, and it gives the best results when combined with the K-means SMOTE oversampling method. Using one-way ANOVA, the null hypothesis was rejected with a p-value < 0.001, confirming that the improvement in the proposed model's performance is statistically significant. The interpretable model is also deployed on the web for the convenience of different stakeholders. This model will help commercial banks, financial organizations, loan institutes, and other decision-makers identify loan defaulters earlier.
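Two quantitative steps in this credit-default study, Min-Max normalization and the one-way ANOVA significance test, can be sketched with the Python standard library. Min-Max scaling maps each feature to x' = (x - min) / (max - min), with min and max learned from the training split; the one-way ANOVA statistic for k groups and N observations is F = [SS_between / (k - 1)] / [SS_within / (N - k)]. The function names, features, and accuracy values below are hypothetical illustrations, not the study's data. Turning F into a p-value requires the F distribution; if SciPy is available, `scipy.stats.f_oneway` returns both the statistic and the p-value.

```python
from statistics import mean


def min_max_fit(rows):
    """Learn the per-feature minimum and maximum from the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_transform(rows, mins, maxs):
    """Apply x' = (x - min) / (max - min) feature by feature.

    Constant features (max == min) are mapped to 0.0 to avoid division by zero.
    Rows outside the training range can fall outside [0, 1].
    """
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(values)
    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of each value around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical features: (age, credit limit); scaling is fitted on the training split only.
train = [[20, 1000], [40, 5000], [60, 3000]]
test = [[30, 2000]]
mins, maxs = min_max_fit(train)
print(min_max_transform(train, mins, maxs))  # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
print(min_max_transform(test, mins, maxs))   # [[0.25, 0.25]]

# Hypothetical per-run accuracies without and with resampling.
print(one_way_anova_f([0.66, 0.67, 0.68], [0.88, 0.89, 0.90]))
```

Fitting the scaler on the training split and reusing its minimum and maximum for the test split keeps test information out of preprocessing.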
Cervical cancer remains a major cause of death worldwide because effective access to cervical screening methods is a big challenge. Data mining techniques, including decision tree algorithms, are used in biomedical research for predictive analysis. The imbalanced dataset was obtained from the dataset archive of the University of California, Irvine. The Synthetic Minority Oversampling Technique (SMOTE) was used to balance the dataset by increasing the number of minority-class instances. The dataset consists of patient age, number of pregnancies, contraceptive usage, smoking patterns, and chronological records of sexually transmitted diseases (STDs). The Microsoft Azure Machine Learning tool was used to simulate the results. This paper focuses mainly on cervical cancer prediction for different screening methods using data mining techniques such as the boosted decision tree, decision forest, and decision jungle algorithms, and performance is evaluated on the basis of AUROC (area under the receiver operating characteristic curve), accuracy, specificity, and sensitivity. The 10-fold cross-validation method was used to validate the results, and the boosted decision tree gave the best results, reaching an AUROC of 0.978 when the Hinselmann screening method was used. The results obtained by the other classifiers were significantly worse than those of the boosted decision tree.
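The SMOTE balancing step mentioned above creates new minority samples by interpolation rather than duplication. The sketch below is a simplified variant written for illustration, not the Azure module the study used: it repeatedly picks a random minority sample, chooses one of its k nearest minority neighbours, and places a synthetic point at a random position on the segment between them. The function name, the value of k, and the toy feature values are assumptions; the imbalanced-learn library provides a full `SMOTE` implementation.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Simplified SMOTE: create n_new synthetic minority samples, each on the line
    segment between a random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest neighbours of the base sample within the minority class.
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority samples (e.g. age, number of pregnancies).
minority = [[25.0, 1.0], [30.0, 2.0], [28.0, 3.0], [35.0, 2.0]]
for point in smote(minority, n_new=3):
    print(point)
```

Because every synthetic point lies between two real minority samples, SMOTE widens the minority region instead of repeating exact copies, which is the main difference from random over-sampling.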
Bilharzia, or schistosomiasis, is one of the most fatal infectious diseases; it is contracted through contaminated water and has become a significant cause of death worldwide. Predicting the disease and identifying its causal factors at an early stage may lead to treatment before it becomes critical. Data mining techniques are used to assist medical professionals effectively in disease classification. This research investigates the recovery and death factors of schistosomiasis using a preprocessed dataset collected from Hubei, China. A data mining method, association rule mining (Apriori), is used to identify these factors. Different tools were used for analysis and model evaluation, with minimum support and minimum confidence thresholds set above 90% to generate rules. In addition, attributes indicating the recovery or death of individuals were identified. Strong associations among disease factors, such as BMI, viability, nourishment, and extent of ascites, were determined and classified with the Apriori algorithm. The results generated by association rule mining may help professionals make treatment decisions with better precision.
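The Apriori rule generation described above can be sketched as a level-wise search for frequent itemsets followed by rule extraction. In this minimal sketch, support is the fraction of records containing an itemset and confidence(A -> B) = support(A and B) / support(A). The patient records encoded as attribute=value items are hypothetical, and the toy thresholds (support 0.7, confidence 0.9) are chosen so that the four-record example produces rules; they are not the study's settings.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori search: return {frequent itemset (frozenset): support}."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    level = {frozenset([item]) for t in transactions for item in t}
    frequent, size = {}, 1
    while level:
        current = {s: support(s) for s in level}
        current = {s: v for s, v in current.items() if v >= min_support}
        frequent.update(current)
        size += 1
        # Join frequent itemsets into candidates one item larger, then prune any
        # candidate with an infrequent subset (the Apriori property).
        keys = list(current)
        joined = {a | b for a in keys for b in keys if len(a | b) == size}
        level = {c for c in joined
                 if all(frozenset(s) in current for s in combinations(c, size - 1))}
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for every rule meeting min_confidence."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = supp / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical patient records encoded as attribute=value items.
records = [
    {"ascites=severe", "nourishment=poor", "outcome=death"},
    {"ascites=severe", "nourishment=poor", "outcome=death"},
    {"ascites=none", "nourishment=good", "outcome=recovery"},
    {"ascites=severe", "nourishment=poor", "outcome=death"},
]
frequent = apriori(records, min_support=0.7)
for ante, cons, conf in association_rules(frequent, min_confidence=0.9):
    print(sorted(ante), "->", sorted(cons), round(conf, 2))
```

Pruning by the Apriori property (every subset of a frequent itemset must itself be frequent) is what keeps the candidate sets small as itemsets grow.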