Increasing progress in numerous research fields and information technologies has led to a rise in the number of published research papers. Consequently, researchers spend considerable time finding papers relevant to their field of specialization. In this paper, we propose a document classification approach that clusters the text documents of research papers into meaningful categories, each covering a similar scientific field. The presented approach is based on the essential focus and scope of the target categories, where each category includes many topics. Accordingly, we extract word tokens from the topics related to each specific category separately. The frequency of word tokens in a document affects the document's weight, which is calculated using the term frequency-inverse document frequency (TF-IDF) numerical statistic. The proposed approach uses the title, abstract, and keywords of each paper, in addition to the categories' topics, to perform the classification. Documents are then classified and clustered into the primary categories based on the highest cosine similarity between the category weight vector and the document weight vectors.
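The weight-and-match pipeline described in this abstract can be sketched briefly. This is a minimal illustration, not the paper's implementation: the tokenization, IDF smoothing, and the toy category/document data are all assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute sparse TF-IDF weight vectors for a list of tokenized documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    # Add 1 to the IDF so terms appearing in every document keep nonzero weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return [{t: (tf / len(doc)) * idf[t] for t, tf in Counter(doc).items()}
            for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def classify(doc_tokens, category_tokens):
    """Assign each document to the category with the highest cosine similarity."""
    vecs = tfidf_vectors(category_tokens + doc_tokens)
    cat_vecs, doc_vecs = vecs[:len(category_tokens)], vecs[len(category_tokens):]
    return [max(range(len(cat_vecs)), key=lambda c: cosine(d, cat_vecs[c]))
            for d in doc_vecs]
```

In practice the category vectors would be built from the curated topic lists and the document vectors from each paper's title, abstract, and keywords.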
Sensitive data may be stored in different forms, and not only legitimate owners but also malicious actors are interested in obtaining it. Exposing valuable data to others has severe consequences: customers, organizations, and companies lose money and reputation due to data breaches. There are many causes of data leakage; internal threats such as human error and external threats such as DDoS attacks are two main causes of data loss. In general, data can be categorized into three kinds: data in use, data at rest, and data in motion. Data Loss Prevention (DLP) tools are effective at identifying important data. DLP can analyze data content and send feedback to administrators, who can then decide to filter, delete, or encrypt the data. DLP tools are not a final solution to data breaches, but they are considered good security tools for mitigating malicious activity and protecting sensitive information. There are many kinds of DLP techniques, and approximate matching is one of them. Mrsh-v2 is one type of approximate matching; it is implemented and evaluated using the TS dataset and a confusion matrix. Finally, mrsh-v2 achieves a high true-positive rate and sensitivity, and a low false-negative rate.
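Approximate-matching tools such as mrsh-v2 emit a similarity score per file pair; an evaluation like the one described thresholds those scores and tallies a confusion matrix, from which sensitivity and the false-negative rate follow. A minimal sketch, where the threshold and the example counts are illustrative assumptions, not the paper's figures:

```python
def evaluate(scores, labels, threshold=50):
    """Threshold similarity scores (0-100) against ground-truth leak labels
    and count confusion-matrix outcomes."""
    tp = fp = tn = fn = 0
    for score, leaked in zip(scores, labels):
        flagged = score >= threshold
        if flagged and leaked:
            tp += 1
        elif flagged and not leaked:
            fp += 1
        elif not flagged and leaked:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def confusion_metrics(tp, fp, tn, fn):
    """Derive the standard rates from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),          # true-positive rate (recall)
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "false_negative_rate": fn / (fn + tp),
    }
```

A DLP deployment would feed real matcher scores into `evaluate` and tune the threshold to trade sensitivity against false positives.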
The demand for e-learning services increased during the rapid spread of the COVID-19 virus and the World Health Organization (WHO) recommendation that social distancing be required. The rapid transition to the e-learning environment led to the neglect of some security aspects, which in turn increased cyber attacks targeting cloud computing accounts, one of the most important pillars of e-learning. In this paper, the attacks that target the cloud computing services most important to e-learning are studied and classified according to the victim, using an inductive methodology based on global statistics on cyber attacks and recent research. Appropriate solutions are then suggested to prevent such attacks in the near future and to raise the level of protection of these computing clouds.
Breast cancer is becoming a global epidemic, affecting predominantly women, and the number of people diagnosed with it increases every day. It is therefore critical to have early detection methods in place that can help patients recognize the condition at an early stage, so that they can begin treatment before the disease becomes fatal. Various prediction approaches for the early diagnosis of such diseases have been developed in machine learning. These algorithms employ a variety of computational classifiers and report satisfactory results in certain areas. However, no study has established which computational approach is most effective at detecting breast cancer, so the most effective strategy must be selected from the available options. This paper contributes a performance evaluation of 12 alternative classification strategies on breast cancer datasets and investigates the reasons behind the dominant classifiers' performance.
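Comparisons like the one described rest on measuring each classifier's accuracy on held-out data. As a minimal, library-free sketch of such a harness (the two simple classifiers and the toy points below are illustrative stand-ins, not the paper's twelve strategies or its datasets):

```python
import math

def nearest_centroid(train, labels, x):
    """Predict the class whose training mean is closest to x."""
    centroids = {}
    for c in set(labels):
        pts = [p for p, l in zip(train, labels) if l == c]
        centroids[c] = tuple(sum(col) / len(pts) for col in zip(*pts))
    return min(centroids, key=lambda c: math.dist(centroids[c], x))

def one_nn(train, labels, x):
    """Predict the label of the single closest training point."""
    i = min(range(len(train)), key=lambda j: math.dist(train[j], x))
    return labels[i]

def accuracy(predict, train, labels, test_pts, test_labels):
    """Fraction of held-out points a classifier predicts correctly."""
    hits = sum(predict(train, labels, x) == y
               for x, y in zip(test_pts, test_labels))
    return hits / len(test_pts)
```

Running `accuracy` for each candidate classifier over the same train/test split yields the side-by-side comparison the evaluation calls for.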
The technology world has evolved greatly over the past decades, leading to inflated data volumes. This digital progress has generated scattered texts across millions of web pages, and such unstructured texts contain a vast amount of textual data. Discovering useful and interesting relations in unstructured text requires further processing by computers; therefore, text mining and information extraction have become an exciting research field for obtaining structured, valuable information. This paper focuses on text pre-processing in the automotive advertisement domain to build a structured database. The database is created by extracting information from unstructured automotive advertisements, an area of natural language processing. Information extraction here deals with finding factual information in text using regular expressions. We manually craft rule-based, domain-specific approaches to extract structured information from unstructured web pages. The structured information is then served by a user-friendly search engine designed for topic-specific knowledge. Consequently, the information extracted from these advertisements is used to perform structured searches over attributes of interest. The resulting tuples are assigned a probability and indexed to support efficient extraction and exploration via user queries.
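Rule-based extraction of this kind pairs each target attribute with a hand-crafted regular expression and turns every advertisement into one structured record. A minimal sketch, where the fields and patterns (year, price, mileage) are illustrative assumptions rather than the paper's actual rule set:

```python
import re

# Hand-crafted patterns for a few illustrative ad attributes; real
# domain rules would be more extensive and more defensive.
PATTERNS = {
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
    "price": re.compile(r"\$\s?([\d,]+)"),
    "mileage": re.compile(r"([\d,]+)\s*(?:miles|km)\b", re.IGNORECASE),
}

def extract(ad_text):
    """Turn one unstructured advertisement into a structured record."""
    record = {}
    for field, pattern in PATTERNS.items():
        m = pattern.search(ad_text)
        if m:
            value = m.group(1) if m.groups() else m.group(0)
            record[field] = value.replace(",", "")
    return record
```

The extracted records would then be scored, indexed, and queried through the topic-specific search engine the paper describes.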
Increased advancement in a variety of study subjects and information technologies has increased the number of published research articles. As a result, researchers face difficulties and devote a significant amount of time to locating scientific publications relevant to their domain of expertise. In this article, a document classification approach is presented that clusters the text documents of research articles into expressive groups covering a similar scientific field. The main focus and scope of the target groups were adopted in designing the proposed method; each group includes several topics. Word tokens were extracted separately from the topics related to each group. The repeated appearance of word tokens in a document affects the document's weight, which is computed using the term frequency-inverse document frequency (TF-IDF) numerical statistic. To perform the categorization, the proposed approach employs each paper's title, abstract, and keywords, as well as the categories' topics. We exploit the K-means clustering algorithm to classify and cluster the documents into primary categories, using the category weights to initialize the cluster centers (centroids). Experimental results show that the suggested technique outperforms the k-nearest neighbors algorithm in information retrieval accuracy.
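The distinctive step here is seeding K-means with the category weight vectors, so each resulting cluster corresponds to a known category from the start. A minimal sketch of Lloyd's algorithm with that initialization, assuming documents and categories have already been converted to dense TF-IDF vectors of equal length (the data below is illustrative):

```python
def kmeans(docs, init_centroids, iters=10):
    """Lloyd's k-means with centroids seeded from category weight vectors.
    Returns the cluster (category) index assigned to each document."""
    centroids = [list(c) for c in init_centroids]
    assign = [0] * len(docs)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, d in enumerate(docs):
            assign[i] = min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(d, centroids[k])),
            )
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [docs[i] for i in range(len(docs)) if assign[i] == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign
```

Because the centroids start at the category weights rather than random points, the cluster labels are directly interpretable as the primary categories.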