Recommender systems can filter unseen information to predict whether a particular user would prefer a given item when making a choice. Over the years, this process has depended on robust applications of data mining and machine learning techniques, which are known to have scalability issues when applied to recommender systems. In this paper, we propose a k-means clustering-based recommendation algorithm that addresses the scalability issues associated with traditional recommender systems. An issue with traditional k-means clustering algorithms is that they choose the initial k centroids randomly, which leads to inaccurate recommendations and increases the cost of offline cluster training. The work in this paper highlights how centroid selection in k-means based recommender systems can improve performance as well as save cost. The proposed centroid selection method exploits underlying data correlation structures and has been shown to achieve better accuracy and performance than traditional strategies that choose centroids randomly. The proposed approach has been validated with an extensive set of experiments on five different datasets (from the movie, book, and music domains). These experiments show that the proposed approach produces better-quality clusters and converges faster than existing approaches, which in turn improves the accuracy of the recommendations provided.
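The overall shape of a k-means based recommender can be sketched as follows, assuming users are represented as dense rating vectors (0 meaning "unrated") and a prediction is the mean rating that the user's cluster gives an item. The abstract does not specify the correlation-based centroid selection, so this sketch swaps in a different, well-known deterministic heuristic (farthest-point initialization) purely to show where a non-random initializer plugs into Lloyd's algorithm. The names `farthest_point_init`, `kmeans` and `predict`, and the toy ratings, are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two rating vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_point_init(data: List[Vector], k: int) -> List[Vector]:
    """Deterministic initializer (a stand-in, NOT the paper's method):
    start from the first user, then repeatedly add the user that is
    farthest from every centroid chosen so far."""
    centroids = [list(data[0])]
    while len(centroids) < k:
        nxt = max(data, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(data: List[Vector], k: int, max_iter: int = 100):
    """Lloyd's algorithm; returns (centroids, labels, iterations used)."""
    centroids = farthest_point_init(data, k)
    labels = [-1] * len(data)
    for it in range(1, max_iter + 1):
        # Assignment step: each user joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in data]
        if new_labels == labels:
            return centroids, labels, it
        labels = new_labels
        # Update step: each centroid becomes the mean of its members
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels, max_iter


def predict(data: List[Vector], labels: List[int], user: int, item: int) -> float:
    """Mean of the non-zero ratings for `item` among the members of
    `user`'s cluster (0.0 if no cluster member rated it)."""
    ratings = [data[u][item] for u, lab in enumerate(labels)
               if lab == labels[user] and data[u][item] > 0]
    return sum(ratings) / len(ratings) if ratings else 0.0


# Toy usage: two taste groups over four items.
ratings = [
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
]
_, labels, iters = kmeans(ratings, k=2)
print(labels)                          # cluster of each user
print(predict(ratings, labels, 0, 2))  # user 0's predicted rating of item 2
```

Because the initializer is deterministic, repeated offline training runs on the same data produce the same clusters, which is the property the random-initialization critique above is concerned with.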
Several methods for ultra high-throughput DNA sequencing are currently under investigation. Many of these methods yield very short blocks of sequence information (reads). Here we report an analysis of the level of genome sequencing possible as a function of read length. It is shown that re-sequencing and de novo sequencing of the majority of a bacterial genome are possible with read lengths of 20–30 nt, and that reads of 50 nt can provide reconstructed contigs (contiguous fragments of sequence data) of 1000 nt and greater that cover 80% of human chromosome 1.
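One simple way to see why read length matters is to measure what fraction of positions in a reference are covered by a length-L substring that occurs exactly once, since only such reads can be placed unambiguously. The sketch below is an illustration under simplifying assumptions (error-free reads, forward strand only, exact matching, a tiny made-up sequence); it is not the analysis reported in the paper, and `unique_fraction` is a hypothetical helper.

```python
from collections import Counter


def unique_fraction(genome: str, read_len: int) -> float:
    """Fraction of start positions whose length-`read_len` substring
    occurs exactly once in `genome` (forward strand only, no errors)."""
    n = len(genome) - read_len + 1
    if n <= 0:
        return 0.0
    counts = Counter(genome[i:i + read_len] for i in range(n))
    unique = sum(1 for i in range(n) if counts[genome[i:i + read_len]] == 1)
    return unique / n


# A toy sequence containing a repeated segment ("GATTACA" twice).
# Short reads that fall inside the repeat cannot be placed uniquely;
# reads longer than the repeat always extend into unique flanking sequence.
genome = "ACGTTGCA" + "GATTACA" + "CCGGA" + "GATTACA" + "TTAG"
for L in (3, 5, 8, 12):
    print(L, round(unique_fraction(genome, L), 2))
```

Real genomes contain repeats of many lengths, so the same measurement on a bacterial or human sequence gives a curve rather than a single jump.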
Abstract. This paper describes the advantages of using the anomaly detection approach over the misuse detection technique in detecting unknown network intrusions or attacks. It also investigates the performance of various clustering algorithms when applied to anomaly detection. Five different algorithms are used: k-Means, improved k-Means, k-Medoids, EM clustering and distance-based outlier detection. Our experiment shows that misuse detection techniques, implemented with four different classifiers (naïve Bayes, rule induction, decision tree and nearest neighbour), failed to detect attacks in network traffic containing a large number of unknown intrusions: the highest accuracy was only 63.97% and the lowest false positive rate was 17.90%. On the other hand, the anomaly detection module showed promising results, with the distance-based outlier detection algorithm outperforming the other algorithms at an accuracy of 80.15%. The accuracy was 78.06% for EM clustering, 76.71% for k-Medoids, 65.40% for improved k-Means and 57.81% for k-Means. Unfortunately, our anomaly detection module produces a high false positive rate (more than 20%) for all four clustering algorithms. Therefore, our future work will focus on reducing the false positive rate and improving the accuracy using more advanced machine learning techniques.
Keywords: k-Means, EM clustering, k-medoids, intrusion detection system, anomaly detection, outlier detection
Introduction
Intrusion detection is the process of gathering intrusion-related knowledge by monitoring events and analyzing them for signs of intrusion [1] [5]. There are two basic intrusion detection system (IDS) approaches: misuse detection (signature-based) and anomaly detection. A misuse detection system uses patterns of well-known attacks to match and identify known intrusions. It performs pattern matching between the captured network traffic and attack signatures.
If a match is detected, the system generates an alarm. The main advantage of the signature detection paradigm is that it can accurately detect instances of known attacks. The main disadvantage is that it lacks the ability to detect new intrusions or zero-day attacks [2][3].
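The contrast between the two approaches can be sketched in a few lines. Misuse detection below is plain substring matching against a signature set; anomaly detection uses the distance from a record to its k-th nearest neighbour among known-normal records, which is one common form of distance-based outlier detection (the abstract does not say which variant the authors used). The signatures, the two numeric features, the value of `k` and the threshold are all hypothetical, and real traffic features would need scaling before distances are meaningful.

```python
import math
from typing import List, Sequence, Set

Record = Sequence[float]


def signature_match(payload: str, signatures: Set[str]) -> bool:
    """Misuse detection: raise an alarm only if a known attack
    signature appears as a substring of the captured payload."""
    return any(sig in payload for sig in signatures)


def knn_outlier_score(x: Record, normal: List[Record], k: int) -> float:
    """Distance-based anomaly score: Euclidean distance from `x` to its
    k-th nearest neighbour among known-normal records (k <= len(normal))."""
    dists = sorted(math.dist(x, r) for r in normal)
    return dists[k - 1]


def is_anomalous(x: Record, normal: List[Record], k: int, threshold: float) -> bool:
    """Flag `x` when it lies farther than `threshold` from its k-th neighbour."""
    return knn_outlier_score(x, normal, k) > threshold


# Hypothetical signatures and hypothetical (kilobytes, connections/s) features.
signatures = {"/etc/passwd", "' OR 1=1"}
normal = [(10.0, 1.0), (12.0, 1.0), (11.0, 2.0), (9.0, 1.0), (10.0, 2.0)]

novel_payload = "GET /cgi-bin/new_exploit"  # an attack with no known signature
novel_features = (95.0, 40.0)

print(signature_match(novel_payload, signatures))        # missed by misuse detection
print(is_anomalous(novel_features, normal, 2, 5.0))      # flagged as an outlier
print(is_anomalous((10.5, 1.5), normal, 2, 5.0))         # ordinary traffic passes
```

The threshold trades detection rate against false positives, which is the tension the abstract reports for the anomaly detection module.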
Abstract. This paper identifies five distinct mechanisms by which a population-based algorithm might have an advantage over a solo-search algorithm in classical optimisation. These mechanisms are illustrated through a number of toy problems, and simulations are presented comparing different search algorithms on these problems. The plausibility of these mechanisms occurring in classical optimisation problems is discussed.
The first mechanism we consider relies on putting together building blocks from different solutions; this is extended to include problems containing critical variables. The second mechanism is the focusing of the search caused by crossover; also discussed in this context is the strong focusing produced by averaging many solutions. The third mechanism is the ability of a population to act as a low-pass filter of the landscape, ignoring local distractions. The fourth mechanism is a population's ability to search different parts of the fitness landscape, thus hedging against bad luck in its initial position or in the decisions it makes. The final mechanism is the opportunity to learn useful parameter values that balance exploration against exploitation.
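The first mechanism, combining building blocks from different solutions, can be illustrated with a Royal Road-style fitness function, where a block of bits only counts once every bit in it is set. This toy problem and its sizes are chosen here for illustration and are not one of the paper's own toy problems. It shows that a solo hill-climber making single bit flips sees no improvement from the first parent, while one-point crossover of two parents that each hold different complete blocks reaches the optimum in one step.

```python
from typing import List

BLOCK = 4      # bits per building block (toy choice)
N_BLOCKS = 4   # genome length = BLOCK * N_BLOCKS = 16


def royal_road(bits: List[int]) -> int:
    """Royal Road-style fitness: the number of blocks whose bits are ALL 1.
    A partially completed block contributes nothing."""
    return sum(all(bits[b * BLOCK:(b + 1) * BLOCK]) for b in range(N_BLOCKS))


def one_point_crossover(a: List[int], b: List[int], cut: int) -> List[int]:
    """Child takes a's genes before `cut` and b's genes from `cut` onwards."""
    return a[:cut] + b[cut:]


def best_single_flip(bits: List[int]) -> int:
    """Best fitness a solo hill-climber can reach with one bit-flip move."""
    best = royal_road(bits)
    for i in range(len(bits)):
        neighbour = bits.copy()
        neighbour[i] ^= 1
        best = max(best, royal_road(neighbour))
    return best


parent_a = [1] * 8 + [0] * 8   # blocks 0 and 1 complete
parent_b = [0] * 8 + [1] * 8   # blocks 2 and 3 complete
child = one_point_crossover(parent_a, parent_b, cut=8)

print(royal_road(parent_a), royal_road(parent_b))  # 2 2
print(best_single_flip(parent_a))                  # 2: no single flip helps
print(royal_road(child))                           # 4: both parents' blocks combined
```

The single-flip neighbourhood is a plateau because a block gains value only when all four of its bits are set, so the solo searcher must drift blindly, whereas crossover assembles complete blocks that already exist in the population.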
A novel technique for analysing moving shapes is presented in an example application to automatic gait recognition. The technique uses masking functions to measure area as a time-varying signal from a sequence of silhouettes of a walking subject. Essentially, this combines the simplicity of a baseline area measure with the specificity of the selected (masked) area. The dynamic temporal signal is used as a signature for automatic gait recognition. The approach is tested on the largest extant gait database, consisting of 114 subjects (filmed under laboratory conditions). Though individual masks have limited discriminatory ability, a correct classification rate of over 75% was achieved by combining information from different area masks. Knowledge of the leg with which the subject starts a gait cycle is shown to improve the recognition rate from individual masks, but has little influence on the recognition rate achieved by combining masks. Finally, the technique is used in an attempt to discriminate between male and female subjects. The technique is presented in its basic form; future work could improve implementation factors, such as better data fusion and classifiers, with the potential to increase discriminatory capability.
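The area-masking idea can be sketched as counting, for each silhouette frame, the subject pixels that fall inside a mask, giving one area value per frame. The sketch below assumes binary silhouettes stored as nested lists and uses two hypothetical masks: the whole frame (the baseline total-area measure) and the lower half of the frame (roughly the legs). Concatenating the per-mask signals is only a simple stand-in for the paper's combination step, whose details the abstract does not give; all names and the toy frames are illustrative.

```python
from typing import Callable, List

Frame = List[List[int]]             # binary silhouette: 1 = subject pixel
Mask = Callable[[int, int], bool]   # (row, col) -> is the pixel inside the mask?


def masked_area_signal(frames: List[Frame], mask: Mask) -> List[int]:
    """Silhouette area (pixel count) inside `mask`, one value per frame."""
    signal = []
    for frame in frames:
        area = sum(
            1
            for r, row in enumerate(frame)
            for c, pix in enumerate(row)
            if pix and mask(r, c)
        )
        signal.append(area)
    return signal


def full_frame(r: int, c: int) -> bool:
    """Baseline mask: the whole frame, i.e. total silhouette area."""
    return True


def lower_band(height: int) -> Mask:
    """Hypothetical mask selecting the bottom half of the frame."""
    return lambda r, c: r >= height // 2


def combined_signature(frames: List[Frame], masks: List[Mask]) -> List[int]:
    """Concatenate the per-mask signals into one feature vector
    (a simple stand-in for the paper's fusion of mask information)."""
    return [v for m in masks for v in masked_area_signal(frames, m)]


# Three 4x4 toy frames: legs together, legs apart, legs together.
together = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
apart = [[0, 1, 1, 0], [0, 1, 1, 0], [1, 1, 1, 1], [1, 0, 0, 1]]
frames = [together, apart, together]

print(masked_area_signal(frames, full_frame))
print(masked_area_signal(frames, lower_band(4)))
print(combined_signature(frames, [full_frame, lower_band(4)]))
```

Signatures built this way could then be compared between subjects with any standard classifier; the choice of masks determines which body region's motion dominates the signal.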