Abstract: This paper introduces an optimized version of the standard K-Means algorithm. The optimization targets running time. It is based on the observation that, after a certain number of iterations, only a small fraction of the data elements change their cluster, so there is no need to redistribute all of them. The proposed implementation therefore separates the data elements that will not change their cluster during the next iteration from those that might. This significantly reduces the workload for very large data sets. The prototype implementation showed a reduction of up to 70% in running time.
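The abstract does not say how the boundary between stable and possibly-changing elements is computed. One known way to draw such a boundary exactly is with triangle-inequality bounds in the style of Hamerly's accelerated k-means. This is a swapped-in technique and not necessarily the paper's method. In the sketch below, each point keeps an upper bound on the distance to its assigned centroid and a lower bound on the distance to every other centroid. A point whose upper bound does not exceed its lower bound provably keeps its cluster and is skipped. The function name `kmeans_with_bounds`, the first-k-points initialization and the `full_checks` counter are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(members: List[Point], dim: int) -> Point:
    return tuple(sum(p[d] for p in members) / len(members) for d in range(dim))


def kmeans_with_bounds(points: List[Point], k: int, max_iter: int = 100):
    """K-means that skips points proven unable to change cluster.

    Returns (centroids, labels, full_checks), where full_checks counts how many
    times a point's distances to all k centroids had to be recomputed.
    """
    n, dim = len(points), len(points[0])
    centroids = [tuple(p) for p in points[:k]]  # hypothetical init: first k points
    labels = [0] * n
    upper = [0.0] * n  # upper bound on distance to the assigned centroid
    lower = [0.0] * n  # lower bound on distance to every other centroid
    full_checks = 0

    def assign(i: int) -> None:
        # Exact step: distances to all centroids, keep closest and second closest.
        ds = sorted((dist(points[i], c), j) for j, c in enumerate(centroids))
        labels[i], upper[i] = ds[0][1], ds[0][0]
        lower[i] = ds[1][0] if k > 1 else math.inf

    for i in range(n):
        assign(i)
    full_checks += n

    for _ in range(max_iter):
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [points[i] for i in range(n) if labels[i] == j]
            new_centroids.append(mean(members, dim) if members else centroids[j])
        shifts = [dist(c, nc) for c, nc in zip(centroids, new_centroids)]
        centroids = new_centroids
        max_shift = max(shifts)
        if max_shift == 0.0:
            break
        changed = False
        for i in range(n):
            # Triangle inequality: loosen both bounds by how far centroids moved.
            upper[i] += shifts[labels[i]]
            lower[i] -= max_shift
            if upper[i] <= lower[i]:
                continue  # provably stays in its cluster: no distance work
            upper[i] = dist(points[i], centroids[labels[i]])  # tighten, retry
            if upper[i] <= lower[i]:
                continue
            old = labels[i]
            assign(i)
            full_checks += 1
            changed = changed or labels[i] != old
        if not changed:
            break
    return centroids, labels, full_checks
```

Because the bounds only skip points that cannot change cluster, the final assignment should match plain Lloyd iterations, apart from how ties are broken. The savings show up as `full_checks` staying well below n times the number of iterations once most points have settled.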
The availability of a dataset is a critical component of educational data mining (EDM) pipelines. Once a dataset is at hand, the next steps of the research methodology are formulating the research issue, designing and implementing the data analysis pipeline and, finally, presenting the validation results. The EDM research area keeps growing as the number of available tools and technologies increases. One critical bottleneck is the lack of a properly documented review of publicly available datasets. This paper presents a succinct yet informative description of the most used publicly available data sources. It covers their associated EDM tasks, the algorithms used, experimental results and main findings. We found three types of data sources: well-known data sources, datasets used in EDM competitions and standalone EDM datasets. We conclude that the future success of EDM data sources will depend on their ability to manage proposed approaches and their experimental results as a dashboard of benchmarked runs. Under these circumstances, reproducible data analysis pipelines and benchmarking of proposed algorithms become attainable for the research community, so that progress in the EDM domain can be achieved much more easily. The most important outcome is the possibility of continuously improving existing data analysis pipelines. This can be done by tackling EDM tasks that rely on publicly available datasets and by benchmarking data analysis pipelines that use open-source implementations.
This article is categorized under:
Application Areas > Education and Learning
Fundamental Concepts of Data and Knowledge > Big Data Mining
Abstract: Wireless technologies have evolved rapidly and are becoming ubiquitous. An increasing number of users connect to the Internet through these technologies. The performance of these wireless access links is therefore key to the performance of the Internet as a whole. In this paper we present a measurement-based analysis of the performance of an IEEE 802.16 (WiMAX) client and a UMTS client. The measurements were carried out in a controlled laboratory. The wireless access links were loaded with traffic from a multi-point videoconferencing application. We measured three layer-3 metrics: One-Way-Delay, IP-Delay-Variation and Packet Loss Ratio. Additionally, we estimated the performance of a WiFi client and an Ethernet client as a reference. Our results show that Ethernet and WiFi perform comparably. Both the WiMAX and the UMTS links exhibited asymmetric behavior, with the uplink performing worse. We also assessed the causes of the discretization that appears in the jitter distributions of these links.
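As a rough illustration of the three layer-3 metrics, the sketch below computes them from per-packet send and receive timestamps keyed by sequence number. It rests on several assumptions. Sender and receiver clocks are taken to be synchronized, which one-way delay requires. IP-Delay-Variation is taken as the delay difference between consecutive sequence numbers that both arrived, which is one possible packet-selection choice. Any packet absent from the receive log counts as lost. The names `layer3_metrics`, `sent` and `received` are hypothetical, and this is not the measurement tool used in the paper.

```python
from typing import Dict, List, Tuple


def layer3_metrics(
    sent: Dict[int, float], received: Dict[int, float]
) -> Tuple[Dict[int, float], List[float], float]:
    """Return (one-way delays, IPDV samples, packet loss ratio).

    sent / received map sequence number -> timestamp in seconds; packets
    missing from `received` are counted as lost.
    """
    seqs = sorted(sent)
    # One-Way-Delay: receive time minus send time (needs synchronized clocks).
    owd = {s: received[s] - sent[s] for s in seqs if s in received}
    # IP-Delay-Variation: OWD difference of consecutive packets that both arrived.
    ipdv = [owd[b] - owd[a] for a, b in zip(seqs, seqs[1:]) if a in owd and b in owd]
    # Packet Loss Ratio: fraction of sent packets never received.
    plr = 1.0 - len(owd) / len(sent)
    return owd, ipdv, plr
```

For example, suppose four packets are sent 20 ms apart and packet 3 never arrives. The result is three delay samples, a single IPDV sample (between packets 1 and 2) and a loss ratio of 0.25. Pairs that straddle the lost packet are simply dropped from the IPDV computation.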