A MapReduce‐based parallel K‐means clustering for large‐scale CIM data verification

Deng, Chao; Liu, Yang; Xu, Lixiong; Yang, Jie; Liu, Junyong; Li, Siguang; Li, Maozhen

doi:10.1002/cpe.3580

Cited by 10 publications (14 citation statements)

References 22 publications (29 reference statements)


“…K‐means clustering is a well‐known technique for performing non‐hierarchical clustering. In K‐means methods, clusters are groups of data characterized by a small distance to the cluster center. An objective function, typically the sum of the distances to a set of putative cluster centers, is optimized until the best cluster center candidates are found.…”

Section: Related Work (mentioning), confidence: 99%
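The objective described in this statement can be made concrete with a minimal sketch of Lloyd's algorithm, the standard K‐means iteration. It assumes Euclidean distance and minimizes the sum of squared distances (the usual K‐means objective, rather than the plain sum of distances mentioned in the quote); the names `kmeans`, `nearest`, and `objective` are illustrative and not taken from the cited work.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Index of the center with the smallest Euclidean distance to `point`."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def objective(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum of squared distances from each point to its nearest center."""
    return sum(math.dist(p, centers[nearest(p, centers)]) ** 2 for p in points)


def kmeans(points: Sequence[Point], k: int, iterations: int = 100, seed: int = 0) -> List[Point]:
    """Lloyd's algorithm: alternate assignment and mean-update steps until stable."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(list(points), k)]  # random initial centers
    for _ in range(iterations):
        # Assignment step: put every point into the cluster of its nearest center.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Update step: move each center to the mean of its members;
        # a cluster that received no points keeps its previous center.
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers = kmeans(points, 2)
```

For this toy data set, every choice of initial centers converges to the two group means, (0.0, 0.5) and (10.0, 10.5), giving an objective value of 1.0.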

A grouping approach based on non‐uniform binary grid partitioning for crowd evacuation simulation

Liu et al. (2018), Concurrency and Computation


Summary Small social groups based on kinship or friendship are ubiquitous in human crowds. Therefore, the effect of social groups on crowd evacuations, and that of crowd evacuations on social groups, must be investigated. To simulate group behavior when an emergency occurs, we propose an improved social force model that takes into account the social group relationships among the population; based on this model, we put forward a novel grouping algorithm predicated on non‐uniform binary grid partitioning. The approach first maps the individuals into the plane and then iteratively applies top‐down binary grid partitioning until each divided grid cell contains only related individuals; the relation and density values of the non‐empty grid cells are then calculated, and the cells are sorted by these values. After sorting, selecting, and merging to form the core grids, the remaining grids are merged into the core grids. We compared the algorithm with the hierarchical classification algorithm and the grid‐based algorithm; the results show advantages in accuracy, speed, and scalability. We also established a simulation platform to illustrate the proposed grouping algorithm and the improved social force model for crowd evacuation simulation.
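The partitioning step in this abstract can be illustrated with a minimal sketch. It assumes that "contains only individuals with relations" means every individual in a cell carries the same group id, that each split halves the cell's longer side, and that empty cells are discarded. Only the density value is computed; the abstract's relation value and the core-grid merging steps are not modeled. `Cell`, `partition`, and `min_side` are hypothetical names, and this is not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Cell:
    """Axis-aligned grid cell [x0, x1) x [y0, y1) and the individuals inside it."""
    x0: float
    y0: float
    x1: float
    y1: float
    members: List[int] = field(default_factory=list)  # indices into `positions`

    def density(self) -> float:
        return len(self.members) / ((self.x1 - self.x0) * (self.y1 - self.y0))


def partition(positions: Sequence[Tuple[float, float]], group_of: Sequence[int],
              cell: Cell, min_side: float = 1e-6) -> List[Cell]:
    """Top-down binary split until each non-empty cell holds a single group."""
    if not cell.members:
        return []  # empty cells are discarded
    w, h = cell.x1 - cell.x0, cell.y1 - cell.y0
    single_group = len({group_of[i] for i in cell.members}) == 1
    if single_group or (w < min_side and h < min_side):  # stop: homogeneous or too small
        return [cell]
    # Halve the longer side; depth varies with the data, so the grid is non-uniform.
    if w >= h:
        mid = (cell.x0 + cell.x1) / 2
        lo = Cell(cell.x0, cell.y0, mid, cell.y1, [i for i in cell.members if positions[i][0] < mid])
        hi = Cell(mid, cell.y0, cell.x1, cell.y1, [i for i in cell.members if positions[i][0] >= mid])
    else:
        mid = (cell.y0 + cell.y1) / 2
        lo = Cell(cell.x0, cell.y0, cell.x1, mid, [i for i in cell.members if positions[i][1] < mid])
        hi = Cell(cell.x0, mid, cell.x1, cell.y1, [i for i in cell.members if positions[i][1] >= mid])
    return partition(positions, group_of, lo, min_side) + partition(positions, group_of, hi, min_side)


positions = [(1.0, 1.0), (1.5, 1.2), (8.0, 8.0), (8.5, 8.2)]
groups = [0, 0, 1, 1]
root = Cell(0.0, 0.0, 10.0, 10.0, list(range(len(positions))))
cells = sorted(partition(positions, groups, root), key=Cell.density, reverse=True)
```

Sorting the resulting cells by density mirrors the abstract's ordering step; a fuller version would combine density with a relation score before selecting core grids.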


Section: Related Work (mentioning), confidence: 99%


“…To measure the fitness of a scheduler (chromosome), the fitness function is defined using mean square error (MSE):

$$f(\cdot, T) = \sqrt{\sum_{i=1}^{k} \left(\bar{T} - T_{i}\right)^{2}}, \qquad \bar{T} = \frac{\sum_{i=1}^{k} T_{i}}{k}$$

where $T_i$ represents the processing time of the $i$th mapper and $\bar{T}$ represents the average processing time of the $k$ mappers. In our design, a single‐point crossover is employed.…”

Section: Algorithm Design (mentioning), confidence: 99%
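The fitness function and crossover in this statement can be sketched as follows. The fitness follows the equation exactly as quoted (square root of the summed squared deviations from $\bar{T}$, with no division by $k$ under the root), so a lower value means a more balanced set of mapper times. The chromosome encoding (a list of integers, e.g. a data-chunk-to-mapper assignment) and the explicit crossover `point` argument are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def fitness(times: Sequence[float]) -> float:
    """f = sqrt(sum_i (T_bar - T_i)^2) over the k mapper processing times T_i."""
    k = len(times)
    t_bar = sum(times) / k  # average processing time of the k mappers
    return math.sqrt(sum((t_bar - t) ** 2 for t in times))


def single_point_crossover(a: Sequence[int], b: Sequence[int],
                           point: int) -> Tuple[List[int], List[int]]:
    """Swap the tails of two equal-length chromosomes from index `point` onward."""
    if len(a) != len(b) or not 0 < point < len(a):
        raise ValueError("parents must have equal length and 0 < point < length")
    child1 = list(a[:point]) + list(b[point:])
    child2 = list(b[:point]) + list(a[point:])
    return child1, child2
```

For example, `fitness([4, 4, 4])` is 0.0 (perfectly balanced mappers), `fitness([1, 3])` is the square root of 2, and `single_point_crossover([0, 0, 0, 0], [1, 1, 1, 1], 2)` yields `([0, 0, 1, 1], [1, 1, 0, 0])`. In a full genetic algorithm the cut point would normally be drawn at random.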

“…Based on Eqs. (11)–(30), the relationship between the data chunks $D_m$ and the overall processing time $T$ is established. Therefore, the time $T_a$ the cluster needs to process data in one processing wave is the maximal one in Eq.…”

Section: Modeling of Data Processing in Hadoop (mentioning), confidence: 99%
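The idea that one processing wave lasts as long as its slowest node can be shown with a small sketch. It assumes a hypothetical linear model in which node $m$ needs $D_m / v_m$ seconds for its chunk, where $v_m$ is an assumed per-node throughput; this stands in for, and is not, the paper's Eqs. (11)–(30).

```python
from typing import Sequence


def wave_time(chunk_sizes: Sequence[float], throughputs: Sequence[float]) -> float:
    """T_a for one wave: the maximum per-node time D_m / v_m (hypothetical linear model)."""
    if len(chunk_sizes) != len(throughputs):
        raise ValueError("need one throughput per data chunk")
    return max(d / v for d, v in zip(chunk_sizes, throughputs))


# Equal chunks on unequal nodes: the slow nodes dominate the wave.
unbalanced = wave_time([128, 128, 64], [64, 32, 32])
# Chunks sized in proportion to throughput: every node finishes together.
balanced = wave_time([128, 64, 64], [64, 32, 32])
```

Here `unbalanced` is 4.0 while `balanced` is 2.0, which is why sizing chunks to node capacity shortens each wave in a heterogeneous cluster.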


A sliding window‐based dynamic load balancing for heterogeneous Hadoop clusters

Liu, Jing, Liu, et al. (2016), Concurrency and Computation (self-citation)


Summary At present, the Hadoop framework, based on the MapReduce computing model, has become the best-known distributed computing framework owing to features such as scalability, fault tolerance, data security, and strong I/O capability. However, Hadoop supports only limited load balancing policies, which can degrade performance in heterogeneous clusters. Additionally, Hadoop lacks an advanced dynamic load balancing mechanism for achieving optimal performance in dynamic environments. This paper presents a sliding window‐based dynamic load balancing algorithm aimed specifically at balancing the load among heterogeneous nodes during Hadoop job processing. The algorithm is evaluated in both simulated and physical environments, and the experimental results show that the efficiency of the Hadoop cluster can be significantly improved. Copyright © 2016 John Wiley & Sons, Ltd.
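The abstract does not describe its algorithm in detail, so the following is only a generic illustration of the sliding-window idea: estimate each node's throughput from its most recent task completions and split new work in proportion to those estimates. The class `SlidingWindowBalancer`, its `window` and `prior` parameters, and the proportional-split rule are all hypothetical assumptions, not the method of Liu et al.

```python
from collections import deque
from typing import Deque, Dict, Iterable


class SlidingWindowBalancer:
    """Hypothetical sketch: per-node throughput averaged over the last `window` tasks."""

    def __init__(self, nodes: Iterable[str], window: int = 5, prior: float = 1.0) -> None:
        # deque(maxlen=window) silently drops the oldest sample once full.
        self.samples: Dict[str, Deque[float]] = {n: deque(maxlen=window) for n in nodes}
        self.prior = prior  # assumed throughput for nodes with no history yet

    def record(self, node: str, data_size: float, seconds: float) -> None:
        """Store the observed throughput of one completed task."""
        self.samples[node].append(data_size / seconds)

    def throughput(self, node: str) -> float:
        s = self.samples[node]
        return sum(s) / len(s) if s else self.prior

    def split(self, total: float) -> Dict[str, float]:
        """Divide `total` units of work in proportion to each node's recent throughput."""
        rates = {n: self.throughput(n) for n in self.samples}
        total_rate = sum(rates.values())
        return {n: total * r / total_rate for n, r in rates.items()}


balancer = SlidingWindowBalancer(["a", "b"], window=2)
balancer.record("a", 100, 1)  # 100 units/s
balancer.record("a", 100, 2)  # 50 units/s
balancer.record("a", 100, 1)  # 100 units/s; the first sample falls out of the window
balancer.record("b", 25, 1)   # 25 units/s
shares = balancer.split(100)
```

Because only the last two samples for node `a` are kept, its estimate is 75 units/s, so `shares` assigns 75.0 units to `a` and 25.0 to `b`. The window length trades responsiveness to load changes against noise in the estimate.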


“…Ensuring the integrity of transmitted data is of paramount importance for data analysis; the paper ‘A MapReduce based Parallel K‐Means Clustering for Large Scale CIM Data Verification’ discusses this topic and presents a parallel K‐means clustering algorithm for large‐scale Common Information Model (CIM) data verification. The paper concludes that parallel K‐means saves time while achieving a high level of precision in data verification.…”

Section: Introduction (mentioning), confidence: 99%
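A common way to express one K‐means iteration in MapReduce terms is sketched below, simulated in a single Python process. The mapper assigns each point of its input split to the nearest center and emits a (center id, (point, 1)) pair; the shuffle groups pairs by center id; the reducer sums the partial vectors and divides by the count to obtain the new center. This is an assumed, generic formulation for illustration, not necessarily the design Deng et al. use for CIM data, and all function names are hypothetical.

```python
import math
from collections import defaultdict
from functools import reduce
from typing import Dict, Iterable, List, Sequence, Tuple

Vec = Tuple[float, ...]
Partial = Tuple[Vec, int]  # (coordinate-wise sum, number of points)


def map_phase(split: Iterable[Vec], centers: Sequence[Vec]) -> List[Tuple[int, Partial]]:
    """Mapper: emit (nearest-center id, (point, 1)) for every point in one input split."""
    out = []
    for p in split:
        j = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        out.append((j, (tuple(p), 1)))
    return out


def combine(a: Partial, b: Partial) -> Partial:
    """Add two partial sums; associative, so it could also run as a combiner."""
    return tuple(x + y for x, y in zip(a[0], b[0])), a[1] + b[1]


def reduce_phase(pairs: Iterable[Tuple[int, Partial]]) -> Dict[int, Vec]:
    """Shuffle (group by center id), then reduce each group to its mean point."""
    groups: Dict[int, List[Partial]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    new_centers = {}
    for key, values in groups.items():
        total, count = reduce(combine, values)
        new_centers[key] = tuple(x / count for x in total)
    return new_centers


def iterate(splits: Sequence[Sequence[Vec]], centers: Sequence[Vec]) -> List[Vec]:
    """One MapReduce round; clusters that received no points keep their old center."""
    pairs = [kv for split in splits for kv in map_phase(split, centers)]
    updated = reduce_phase(pairs)
    return [updated.get(j, c) for j, c in enumerate(centers)]


splits = [[(0, 0), (10, 10)], [(0, 1), (10, 11)]]  # two input splits, as two mappers would see them
centers = iterate(splits, [(0, 0), (10, 10)])
```

After one round, `centers` is `[(0.0, 0.5), (10.0, 10.5)]`. A driver program would repeat `iterate` until the centers stop moving; in a real Hadoop job each split would be processed by a separate map task, which is where the time savings reported in the quoted paper would come from.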