Abstract—Rapid progress in digital data acquisition techniques has led to huge volumes of data; more than 80 percent of today's data is unstructured or semi-structured. Discovering meaningful patterns and trends in text documents drawn from such massive volumes is a major challenge. Text mining is the process of extracting interesting and nontrivial patterns from large collections of text documents. Different techniques and tools exist to mine text and discover valuable information for prediction and decision making. Selecting the right text mining technique increases processing speed and decreases the time and effort required to extract valuable information. This paper briefly discusses and analyzes text mining techniques and their applications in diverse fields. Moreover, it identifies issues in text mining that affect the accuracy and relevance of results.
Blockchain is an emerging field built on the concepts of a digitally distributed ledger and consensus algorithms, removing the threats posed by intermediaries. Its early applications were in the finance sector, but the concept has since been extended to almost all major areas of research, including education, IoT, banking, supply chain, defense, governance, and healthcare. In healthcare, stakeholders (providers, patients, payers, research organizations, and supply chain bearers) demand interoperability, security, authenticity, transparency, and streamlined transactions. Blockchain technology, built over the internet, has the potential to manage current healthcare data in a peer-to-peer, interoperable manner through a patient-centric approach that eliminates third parties. Using this technology, applications can be built to manage and share secure, transparent, and immutable audit trails with reduced systematic fraud. This study reviews the existing literature to identify the major issues of various healthcare stakeholders and to explore the features of blockchain technology that could resolve them. However, the technology has challenges and limitations that future research needs to address.
This research reveals that, of the 37 epidemiological weeks studied, the maximum impact was observed between weeks 22 and 27. The geographical flow and hotspots associated with dengue are shown through thematic maps. A positive correlation between dengue risk and age was observed. These findings can help health officials and decision-makers alert the public about future outbreaks and take preventive measures that considerably reduce the mortality and morbidity associated with the disease.
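The reported positive correlation between dengue risk and age can be quantified with a Pearson correlation coefficient. The sketch below is our own illustration, not the study's analysis; the age-group midpoints and case rates are invented for demonstration:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical illustration: age-group midpoints vs. observed case rates.
ages = [5, 15, 25, 35, 45, 55]
rates = [2.1, 2.8, 3.5, 4.0, 4.9, 5.6]
r = pearson(ages, rates)  # close to +1 for this near-linear toy data
```

A value of r near +1 would indicate the kind of positive age-risk association the study reports.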
In natural language processing, text summarization is an important application used to extract desired information by condensing large texts. Existing studies use keyword-based algorithms for grouping text, which do not capture a document's actual theme. Our proposed dynamic corpus creation mechanism combines metadata with summarized extracted text. The proposed approach analyzes a mesh of multiple unstructured documents and generates a linked set of weighted nodes by applying multistage clustering. We generate adjacency graphs to link the clusters of various collections of documents. The approach comprises ten steps: pre-processing, building multiple corpora, first-stage clustering, creating sub-corpora, interlinking sub-corpora, creating a PageRank keyword dictionary for each sub-corpus, second-stage clustering, path creation among the clusters of sub-corpora, and text processing by forward and backward propagation for results generation. The outcome is a set of sub-corpora interlinked through clusters. We applied our approach to a news dataset; this interlinked corpus processing follows step-by-step clustering to find the most relevant parts of the corpus at lower cost and time while improving content detection. During experimentation, we applied six different metadata processing combinations over multiple text queries to compare results. The comparison of text satisfaction shows that PageRank keywords give 38% related text, single-stage clustering gives 46%, two-stage clustering gives 54%, and the proposed technique gives 67% associated text. Furthermore, this approach searches the relevant data in a range from most to least relevant content. It provides a systematic query-relevant corpus processing mechanism that automatically selects the most relevant sub-corpus through dynamic path selection.
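One of the steps above, building a PageRank keyword dictionary for a sub-corpus, can be illustrated with a minimal sketch. The code below is our own illustration, not the authors' implementation: it links words that co-occur in a sentence and ranks them by power iteration over that graph; the toy sentences and function names are assumptions.

```python
from collections import defaultdict
from itertools import combinations

def pagerank_keywords(sentences, damping=0.85, iters=50):
    """Rank words by PageRank over a word co-occurrence graph.

    Words co-occurring in the same sentence are linked; the stationary
    score of the random walk serves as the keyword weight.
    """
    graph = defaultdict(set)
    for sent in sentences:
        words = set(sent.lower().split())
        for a, b in combinations(words, 2):
            graph[a].add(b)
            graph[b].add(a)
    score = {w: 1.0 / len(graph) for w in graph}
    for _ in range(iters):
        new = {}
        for w in graph:
            incoming = sum(score[v] / len(graph[v]) for v in graph[w])
            new[w] = (1 - damping) / len(graph) + damping * incoming
        score = new
    # Sort into a dictionary ordered from highest- to lowest-weighted word.
    return dict(sorted(score.items(), key=lambda kv: -kv[1]))

# Toy sub-corpus (hypothetical sentences standing in for news documents).
docs = [
    "election results announced today",
    "election turnout results surprise analysts",
    "analysts discuss turnout trends",
]
keywords = pagerank_keywords(docs)
```

The resulting dictionary maps each word to a weight; taking the top entries per sub-corpus yields a keyword dictionary of the kind the pipeline's sixth step describes.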
We used the SHAP model to evaluate the proposed technique, and our evaluation results show that the proposed mechanism improves text processing. Moreover, combining text summarization features showed satisfactory results compared to the summaries generated by general abstractive and extractive summarization models.
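For context, a frequency-based extractive summarizer of the kind general extractive models build on can be sketched in a few lines. This is an illustrative baseline, not the authors' system; the sample text and names are invented:

```python
from collections import Counter

def extractive_summary(text, k=2):
    """Pick the k sentences whose words are most frequent overall."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    freq = Counter(w for s in sentences for w in s.lower().split())
    def score(s):
        words = s.lower().split()
        return sum(freq[w] for w in words) / len(words)
    ranked = sorted(sentences, key=score, reverse=True)
    keep = ranked[:k]
    # Preserve the original ordering of the selected sentences.
    return ". ".join(s for s in sentences if s in keep) + "."

text = ("Markets fell sharply on Monday. Analysts blamed markets uncertainty. "
        "The stadium was full. Traders expect markets to recover.")
summary = extractive_summary(text, k=2)
```

A baseline like this keeps the two sentences most representative of the overall word frequencies and drops off-topic ones, which is the behavior the comparison in the study measures against.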
The success of rules learned through data mining depends heavily on their actionability: how useful they are for taking suitable actions in a real business environment. To improve rule actionability, researchers initially presented various Data Mining (DM) frameworks focusing only on factors from the business domain dataset. Afterward, Domain-Driven Data Mining (D3M) frameworks were introduced, focusing on domain knowledge factors from the context of the overall business environment. Despite considering these dataset and domain knowledge factors in different phases of their frameworks, the learned rules still lacked actionability. The objective of our research is to improve the actionability of learned rules. For this purpose, we analyzed: (1) what overall actions or tasks are performed in the overall business process, (2) in which sequence different tasks are performed, (3) under what conditions these tasks are performed, (4) by whom the tasks are performed, and (5) what data is provided and produced in performing these tasks. We observed that including rule learning factors only from the dataset or from domain knowledge is not sufficient. Our Process-based Domain-Driven Data Mining-Actionable Knowledge Discovery (PD3M-AKD) framework describes phases that consider and include additional factors from these five perspectives of the business process. The PD3M-AKD framework also aligns with the existing phases of current DM and D3M frameworks for incorporating dataset and domain knowledge. Finally, we evaluated and validated our case study results on real-life scenarios from the education, engineering, and business process domains.

INDEX TERMS Actionable knowledge, business process, data mining, data mining framework, domain-driven data mining framework, data privacy.
The Darknet is commonly known as the epicenter of illegal online activity. Analyzing darknet traffic is essential to monitor real-time applications and activities running over the Darknet. Recognizing network traffic bound for unused Internet addresses has become significant for identifying and examining malicious activity on the internet. Since there are no authentic hosts or devices in an unused address block, any observed traffic must be the result of misconfiguration, spoofed source addresses, or other systems probing unused address space. Recent advances in artificial intelligence allow digital systems to detect and identify darknet traffic autonomously. In this paper, we propose a generalized approach for darknet traffic detection and categorization using deep learning. We examine a state-of-the-art complex dataset, which provides extensive information about darknet traffic, and perform data preprocessing. Next, we analyze diverse feature selection techniques to select optimal features for darknet traffic detection and categorization. We apply fine-tuned machine learning (ML) algorithms, including Decision Tree (DT), Gradient Boosting (GB), Random Forest Regressor (RFR), and Extreme Gradient Boosting (XGB), to the selected features and compare their performance. We then apply modified Convolutional Long Short-Term Memory (CNN-LSTM) and Convolutional Gated Recurrent Unit (CNN-GRU) deep learning techniques to recognize the network traffic more accurately. The results demonstrate that the proposed approach outperforms existing approaches, yielding a maximum accuracy of 96% for darknet traffic detection and 89% for darknet traffic categorization, using XGB for feature selection and CNN-LSTM as the recognition model.
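The feature-selection-then-classification pipeline described above can be sketched in miniature. The code below is an illustrative stand-in, not the authors' XGB/CNN-LSTM pipeline: it ranks synthetic flow features by variance and classifies with a simple nearest-centroid rule; all feature names and values are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+

def top_variance_features(rows, k):
    """Indices of the k features with the highest variance across rows."""
    n = len(rows)
    variances = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        variances.append(sum((v - mean) ** 2 for v in col) / n)
    return sorted(range(len(variances)), key=lambda j: -variances[j])[:k]

def nearest_centroid(train, labels, sample, feats):
    """Classify a sample by the closest per-class centroid on selected features."""
    centroids = {}
    for lab in set(labels):
        members = [[r[j] for j in feats] for r, l in zip(train, labels) if l == lab]
        centroids[lab] = [sum(c) / len(c) for c in zip(*members)]
    probe = [sample[j] for j in feats]
    return min(centroids, key=lambda lab: dist(probe, centroids[lab]))

# Synthetic flow records: [duration, bytes_per_s, packet_count, constant_flag]
flows = [
    [1.0, 900.0, 10, 1],   # benign-like
    [1.2, 950.0, 12, 1],   # benign-like
    [9.5, 50.0, 200, 1],   # darknet-like
    [9.0, 60.0, 180, 1],   # darknet-like
]
labels = ["benign", "benign", "darknet", "darknet"]
feats = top_variance_features(flows, k=2)      # drops the constant column
pred = nearest_centroid(flows, labels, [9.2, 55.0, 190, 1], feats)
```

The same two-stage shape, prune uninformative features first, then fit a classifier on what remains, is what the paper scales up with XGB-based selection and a CNN-LSTM recognizer.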