In this article we focus on detecting possible outliers with the widely used boxplot procedure. The outliers in a dataset are defined as a subset of observations that appear to be inconsistent with the remaining observations. We identify outliers by constructing a boxplot whose lower fence (LF) and upper fence (UF) are either (a) chosen so that, if the given sample is outlier-free, the probability that one or more of the sample observations fall outside the region (LF, UF) equals a prescribed small value α, or (b) taken to be tolerance limits, derived from an outlier-free random sample, within which a specified large proportion β of the sampled population is asserted to fall with a given large probability γ. For both procedures we obtain exact expressions that can be routinely used to evaluate the constants needed to construct the boxplot's outlier region for samples from the family of location-scale distributions. This article shows that the commonly constructed boxplot is in general inappropriate for detecting outliers in normal and especially exponential samples. We recommend that the graphical boxplot be constructed from knowledge of the underlying distribution of the dataset, and that it control the risk of labeling regular observations as outliers.
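The abstract refers to exact expressions for the fence constants but does not reproduce them. The sketch below is therefore only a rough illustration of procedure (a), and it swaps in Monte Carlo simulation for the article's exact method. It assumes Tukey-style fences Q1 - k*IQR and Q3 + k*IQR, with quartiles taken from Python's `statistics.quantiles`. For each simulated outlier-free sample it records the smallest k that keeps every point inside the fences, and then picks k so that the share of clean samples with at least one point outside is at most α. All function names are hypothetical.

```python
import math
import random
import statistics


def required_k(sample):
    """Smallest k for which every point lies inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(sample, n=4)
    iqr = q3 - q1
    return max((q1 - min(sample)) / iqr, (max(sample) - q3) / iqr)


def simulate_required_k(draw, n, trials=2000, seed=0):
    """required_k for `trials` outlier-free samples of size n, each value drawn by draw(rng)."""
    rng = random.Random(seed)
    return [required_k([draw(rng) for _ in range(n)]) for _ in range(trials)]


def some_outside_rate(ks, k):
    """Share of simulated samples with at least one point strictly beyond the fences for this k."""
    return sum(kk > k for kk in ks) / len(ks)


def calibrate_k(ks, alpha):
    """Empirical (1 - alpha) quantile of required_k, so that some_outside_rate(ks, k) <= alpha."""
    ordered = sorted(ks)
    idx = max(0, math.ceil((1 - alpha) * len(ordered)) - 1)
    return ordered[idx]


def standard_normal(rng):
    return rng.gauss(0.0, 1.0)


def standard_exponential(rng):
    return rng.expovariate(1.0)


if __name__ == "__main__":
    for name, draw in [("normal", standard_normal), ("exponential", standard_exponential)]:
        ks = simulate_required_k(draw, n=20)
        print(name, "rate at k=1.5:", some_outside_rate(ks, 1.5),
              "k for alpha=0.05:", round(calibrate_k(ks, 0.05), 2))
```

`required_k` is unchanged by shifting or rescaling the sample, so simulating from the standard member of a location-scale family is enough. With n = 20, the conventional k = 1.5 flags clean exponential samples far more often than clean normal ones in this simulation, which is consistent with the article's warning about the standard boxplot.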
Various types of derivative information generated through mobile devices and social networking sites (SNSs) have been increasing exponentially, and the information technologies that use this information have also been developing rapidly. Technologies that classify and analyze such information are as important as data generation itself. This study concentrates on data clustering through principal component analysis (PCA) and the K-means algorithm to analyze and classify user data efficiently. We propose a technique that replaces the cluster choice made before clustering in existing K-means practice with a variable cluster choice derived through PCA, thereby expanding the scope of data clustering. The technique also applies an artificial neural network learning model to the clustered data for user recommendation and prediction. On predicted data, the proposed processing model improved on the existing artificial neural network-based data clustering and learning model by approximately 9.25%.
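The abstract does not say exactly how PCA determines the number of clusters. The following is therefore only a minimal, dependency-free sketch of the PCA-then-K-means pipeline it describes. Data are first projected onto their leading principal components, which are computed here by power iteration with deflation, and Lloyd's K-means then runs in that reduced space. In this sketch the cluster count `k` is passed in explicitly, and farthest-first initialization stands in for k-means++. All names are hypothetical, and the neural network stage is omitted.

```python
import math
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pca(rows, m, iters=200, seed=0):
    """Project rows onto their top-m principal components; returns (scores, eigenvalues).

    Eigenvectors of the sample covariance matrix are found by power iteration with deflation.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    x = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    eigvals, comps = [], []
    for _ in range(m):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(t * t for t in w)) or 1.0
            v = [t / norm for t in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        eigvals.append(lam)
        comps.append(v)
        # Remove the found component so the next pass converges to the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    scores = [[sum(r[j] * c[j] for j in range(d)) for c in comps] for r in x]
    return scores, eigvals


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with farthest-first initialization; returns (labels, centroids)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Add the point whose nearest existing centroid is farthest away.
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sqdist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


if __name__ == "__main__":
    rng = random.Random(1)
    centers = [(0.0, 0.0, 0.0), (12.0, 0.0, 0.0), (0.0, 8.0, 4.0)]
    rows = [[c + rng.gauss(0.0, 0.5) for c in ctr] for ctr in centers for _ in range(20)]
    scores, eigvals = pca(rows, m=2)
    labels, _ = kmeans(scores, k=3)
    print(eigvals, labels)
```

Clustering in the reduced space keeps the distance computations cheap. A variable choice of `k`, as the abstract proposes, would replace the fixed argument with a rule derived from the PCA output.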
Many research projects are being conducted on integrating artificial intelligence technologies into the 6th Industry, a recent topic of ICT convergence technology in the agriculture and life industry. Among artificial intelligence technologies, machine learning in particular requires Big Data analysis techniques in the agriculture and life industry, which involves various types of data and uses many different methodologies. This study proposes a Big Data-based integrated system to manage and analyze the mushroom growth environment for the efficient management of mushroom plantations.
This paper revises and improves existing autonomous driving models using reinforcement learning, and proposes a reinforced autonomous driving prediction model. The reinforcement learning model was trained with DQN (Deep Q-Network), a reinforcement learning algorithm. The main aim of this paper was to reduce training time and improve self-driving performance. Rewards for the reinforcement learning agent were designed to mimic human driving behavior as closely as possible: high rewards were given for greater distance travelled within lanes and for higher speed, and negative rewards were given when the vehicle crossed into other lanes or had a collision. Performance was evaluated in urban environments without pedestrians. The test results show that, within the same training time, the model with the collision prevention model applied improved its performance faster than the model without it. However, vulnerabilities to factors such as pedestrians and vehicles approaching from the side were not addressed, and the results revealed a lack of stability in the definition of the reward functions as well as excessive memory use.
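The reward design described above can be sketched as a per-step function paired with the standard one-step DQN target y = r + γ max_a' Q(s', a'). The `StepInfo` fields, the weights, and the choice to treat a collision as terminal are all hypothetical, because the abstract does not report the signals or values used.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StepInfo:
    """Hypothetical per-step signals from the driving simulator."""
    distance_in_lane: float  # metres travelled this step while staying in the lane
    speed: float             # current speed in m/s
    lane_invasion: bool      # vehicle crossed into another lane this step
    collision: bool          # vehicle collided this step


# Hypothetical weights: the abstract does not report the values used.
W_DISTANCE = 1.0
W_SPEED = 0.1
LANE_PENALTY = -1.0
COLLISION_PENALTY = -10.0


def reward(info: StepInfo) -> float:
    """Reward distance kept in lane and speed; penalize lane crossings and collisions."""
    if info.collision:
        return COLLISION_PENALTY  # treated here as terminal, so no positive terms are added
    r = W_DISTANCE * info.distance_in_lane + W_SPEED * info.speed
    if info.lane_invasion:
        r += LANE_PENALTY
    return r


def dqn_target(r: float, next_q_values: Sequence[float], done: bool, gamma: float = 0.99) -> float:
    """One-step DQN target: r + gamma * max over a' of Q_target(s', a'), or just r at episode end."""
    return r if done else r + gamma * max(next_q_values)
```

During training, this target would be regressed against the online network's Q(s, a) for the action taken. The relative size of the collision penalty compared with the per-step positive terms largely determines how cautiously the agent drives.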