Blockchain technology has allowed many abnormal schemes to hide behind smart contracts. This causes serious financial losses, which adversely affects the blockchain. Machine learning technology has mainly been utilized to enable automatic detection of abnormal contract accounts in recent years. In spite of this, previous machine learning methods have suffered from a number of disadvantages: first, it is extremely difficult to identify features that enable accurate detection of abnormal contracts, and based on these features, statistical analysis is also ineffective. Second, they ignore the imbalances and repeatability of smart contract accounts, which often results in overfitting of the model. In this paper, we propose a data-driven robust method for detecting abnormal contract accounts over the Ethereum Blockchain. This method comprises hybrid features set by integrating opcode n-grams, transaction features, and term frequency-inverse document frequency source code features to train an ensemble classifier. The extra-trees and gradient boosting algorithms based on weighted soft voting are used to create an ensemble classifier that balances the weaknesses of individual classifiers in a given dataset. The abnormal and normal contract data are collected by analyzing the open source etherscan.io, and the problem of the imbalanced dataset is solved by performing the adaptive synthetic sampling. The empirical results demonstrate that the proposed individual feature sets are useful for detecting abnormal contract accounts. Meanwhile, combining all the features enhances the detection of abnormal contracts with significant accuracy. The experimental and comparative results show that the proposed method can distinguish abnormal contract accounts for the data-driven security of blockchain Ethereum with satisfactory performance metrics.
Breast cancer death rates are higher than any other cancer in American women. Machine learning-based predictive models promise earlier detection techniques for breast cancer diagnosis. However, making an evaluation for models that efficiently diagnose cancer is still challenging. In this work, we proposed data exploratory techniques (DET) and developed four different predictive models to improve breast cancer diagnostic accuracy. Prior to models, four-layered essential DET, e.g., feature distribution, correlation, elimination, and hyperparameter optimization, were deep-dived to identify the robust feature classification into malignant and benign classes. These proposed techniques and classifiers were implemented on the Wisconsin Diagnostic Breast Cancer (WDBC) and Breast Cancer Coimbra Dataset (BCCD) datasets. Standard performance metrics, including confusion matrices and K-fold cross-validation techniques, were applied to assess each classifier’s efficiency and training time. The models’ diagnostic capability improved with our DET, i.e., polynomial SVM gained 99.3%, LR with 98.06%, KNN acquired 97.35%, and EC achieved 97.61% accuracy with the WDBC dataset. We also compared our significant results with previous studies in terms of accuracy. The implementation procedure and findings can guide physicians to adopt an effective model for a practical understanding and prognosis of breast cancer tumors.
Today's growing phishing websites pose significant threats due to their extremely undetectable risk. They anticipate internet users to mistake them as genuine ones in order to reveal user information and privacy, such as login ids, pass-words, credit card numbers, etc. without notice. This paper proposes a new approach to solve the anti-phishing problem. The new features of this approach can be represented by URL character sequence without phishing prior knowledge, various hyperlink information, and textual content of the webpage, which are combined and fed to train the XGBoost classifier. One of the major contributions of this paper is the selection of different new features, which are capable enough to detect 0-h attacks, and these features do not depend on any third-party services. In particular, we extract character level Term Frequency-Inverse Document Frequency (TF-IDF) features from noisy parts of HTML and plaintext of the given webpage. Moreover, our proposed hyperlink features determine the relationship between the content and the URL of a webpage. Due to the absence of publicly available large phishing data sets, we needed to create our own data set with 60,252 webpages to validate the proposed solution. This data contains 32,972 benign webpages and 27,280 phishing webpages. For evaluations, the performance of each category of the proposed feature set is evaluated, and various classification algorithms are employed. From the empirical results, it was observed that the proposed individual features are valuable for phishing detection. However, the integration of all the features improves the detection of phishing sites with significant accuracy. The proposed approach achieved an accuracy of 96.76% with only 1.39% false-positive rate on our dataset, and an accuracy of 98.48% with 2.09% false-positive rate on benchmark dataset, which outperforms the existing baseline approaches.
DNA has evolved as a cutting-edge medium for digital information storage due to its extremely high density and durable preservation to accommodate the data explosion. However, the strings of DNA are prone to errors during the hybridization process. In addition, DNA synthesis and sequences come with a cost that depends on the number of nucleotides present. An efficient model to store a large amount of data in a small number of nucleotides is essential, and it must control the hybridization errors among the base pairs. In this paper, a novel computational model is presented to design large DNA libraries of oligonucleotides. It is established by integrating a neural network (NN) with combinatorial biological constraints, including constant GC-content and satisfying Hamming distance and reverse-complement constraints. We develop a simple and efficient implementation of NNs to produce the optimal DNA codes, which opens the door to applying neural networks for DNA-based data storage. Further, the combinatorial bio-constraints are introduced to improve the lower bounds and to avoid the occurrence of errors in the DNA codes. Our goal is to compute large DNA codes in shorter sequences, which should avoid non-specific hybridization errors by satisfying the bio-constrained coding. The proposed model yields a significant improvement in the DNA library by explicitly constructing larger codes than the prior published codes.
Sentiment analysis or opinion mining is the key to natural language processing for the extraction of useful information from the text documents of numerous sources. Several different techniques, i.e., simple rule-based to lexicon-based and more sophisticated machine learning algorithms, have been widely used with different classifiers to get the factual analysis of sentiment. However, lexicon-based sentiment classification is still suffering from low accuracies, mainly due to the deficiency of domain-oriented competitive dictionaries. Similarly, machine learning-based sentiment is also tackling the accuracy constraints because of feature ambiguity from social data. One of the best ways to deal with the accuracy issue is to select the best feature-set and reduce the volume of the feature. This paper proposes a method (namely, GAWA) for feature selection by utilizing the Wrapper Approaches (WA) to select the premier features and the Genetic Algorithm (GA) to reduce the size of the premier features. The novelty of this work is the modified fitness function of heuristic GA to compute the optimal features by reducing the redundancy for better accuracy. This work aims to present a comprehensive model of hybrid sentiment by using the proposed method, GAWA. It will be valued in developing a new approach for the selection of feature-set with a better accuracy level. The experiments revealed that these techniques could reduce the feature-set upto 61.95% without negotiating the accuracy level. The new optimal feature sets enhanced the efficiency of the Naïve Bayes algorithm up to 92%. This work is compared with the conventional method of feature selection and concluded the 11% better accuracy than PCA and 8% better than PSO. Furthermore, the results are compared with the literature work and found that the proposed method outperformed the previous research.
Supply chain management (SCM) is essential for a company’s faster, efficient, and effective product life cycle. However, the current SCM systems are insufficient to provide product legitimacy, transaction privacy, and security. Therefore, this research proposes a secure SCM system for the authenticity of the products based on the Internet of Things (IoT) and blockchain technology. The IoT-enabled Quick Response (QR) scanner and the blockchain-integrated distributed system will allow all the SCM stakeholders to begin secure and private transactions for their products or services. Resulting, the consumer will receive an authentic and genuine product from the original producer. A lightweight asymmetric key encryption technique, i.e., elliptic curve cryptography (ECC) and Hyperledger Fabric-based blockchain technology with on-chain smart contracts are applied for distributed IoT devices to make the authentication process faster and lighter. Each SCM stakeholder is registered by the service provider and receives corresponding public and private keys, which will be used for the authentication process of the participants and IoT devices. The authenticated QR scanner records all transactions on the blockchain. Consequently, there will be no human intervention for the SCM transactions. The security and scalability analysis demonstrates that the proposed system is more secure and robust than other state-of-the-art techniques.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.