BackgroundVarious kinds of data mining algorithms are continuously raised with the development of related disciplines. The applicable scopes and their performances of these algorithms are different. Hence, finding a suitable algorithm for a dataset is becoming an important emphasis for biomedical researchers to solve practical problems promptly.MethodsIn this paper, seven kinds of sophisticated active algorithms, namely, C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to the 12 top-click UCI public datasets with the task of classification, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, and the sample size of each class, correlation coefficients between variables, class entropy of task variable, and the ratio of the sample size of the largest class to the least class were calculated to character the 12 research datasets.ResultsThe two ensemble algorithms reach high accuracy of classification on most datasets. Moreover, random forest performs better than AdaBoost on the unbalanced dataset of the multi-class task. Simple algorithms, such as the naïve Bayes and logistic regression model are suitable for a small dataset with high correlation between the task and other non-task attribute variables. K-nearest neighbor and C4.5 decision tree algorithms perform well on binary- and multi-class task datasets. Support vector machine is more adept on the balanced small dataset of the binary-class task.ConclusionsNo algorithm can maintain the best performance in all datasets. The applicability of the seven data mining algorithms on the datasets with different characteristics was summarized to provide a reference for biomedical researchers or beginners in different fields.
An SQL Injection Attack (SQLIA) is a major cyber security threat to Web services, and its different stages can cause different levels of damage to an information system. Attackers can construct complex and diverse SQLIA statements, which often cause most existing inbound-based detection methods to have a high false-negative rate when facing deformed or unknown SQLIA statements. Although some existing works have analyzed different features for the stages of SQLIA from the perspectives of attackers, they primarily focus on stage analysis rather than different stages’ identification. To detect SQLIA and identify its stages, we analyze the outbound traffic from the Web server and find that it can differentiate between SQLIA traffic and normal traffic, and the outbound traffic generated during the two stages of SQLIA exhibits distinct characteristics. By employing 13 features extracted from outbound traffic, we propose an SQLIA detection and stage identification method based on outbound traffic (SDSIOT), which is a two-phase method that detects SQLIAs in Phase I and identifies their stages in Phase II. Importantly, it does not need to analyze the complex and diverse malicious statements made by attackers. The experimental results show that SDSIOT achieves an accuracy of 98.57% for SQLIA detection and 94.01% for SQLIA stage identification. Notably, the accuracy of SDSIOT’s SQLIA detection is 8.22 percentage points higher than that of ModSecurity.
The camera calibration is a key step for converting a projective reconstruction into a metric one, which is equivalent to recovering the unknown intrinsic parameters with each image. A circle is a common geometric primitive for the camera self-calibration. To avoid the limit of circle center to the camera self-calibration in a planar template, a method how to solve out the vanishing line is proposed. Then using the property of vanishing line, the camera intrinsic parameters are figured out and the camera self-calibration is achieved. The template contains a quartered circle in the study. Firstly, the camera is used to take photographs of the template from three or more direction. Secondly, using the ellipse and two mutually perpendicular diameters which are extracted from the image, according to the polarity principle and making use of the invariant property of cross-ratio of four lines which are intersected in one same point, the vanishing line can be solved out. In the end, the circular points can be figured out by the intersection between vanishing line and the circle. And using the property of the circular points, the selfcalibration is realized.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.