Research on HCKM Algorithm Based on Parallel Clustering

Min ZHANG, Zhao-jie ZANG

Abstract


With the explosive growth of data volumes, extracting information through serial processing can no longer meet practical requirements, and the pressing problem has become how to find useful information in massive datasets quickly. The traditional K-medoids algorithm is sensitive to the choice of initial cluster centers and still has many limitations when handling large datasets. This paper proposes a parallel Canopy-Kmedoids algorithm on the Hadoop platform, aiming to reduce running time to a certain extent. Experimental results on running time and speedup demonstrate the feasibility of the algorithm.
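The abstract gives no implementation details, so the following is only a minimal Python sketch of the general Canopy + K-medoids idea, not the authors' code. It assumes Euclidean distance, hypothetical canopy thresholds `t1 > t2`, and that the canopy centers serve as the initial medoids (so k equals the number of canopies). The functions `map_phase` and `reduce_phase` are hypothetical names that imitate one MapReduce iteration serially; they do not use any Hadoop API.

```python
import math
from collections import defaultdict


def dist(a, b):
    # Euclidean distance (assumption: the paper's metric is not stated here)
    return math.dist(a, b)


def canopy(points, t1, t2):
    """Canopy pre-clustering with loose threshold t1 and tight threshold t2 (t1 > t2)."""
    if t1 <= t2:
        raise ValueError("require t1 > t2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)
        # Points within the loose threshold t1 join this (possibly overlapping) canopy
        members = [center] + [p for p in remaining if dist(p, center) < t1]
        canopies.append((center, members))
        # Points within the tight threshold t2 can no longer become canopy centers
        remaining = [p for p in remaining if dist(p, center) >= t2]
    return canopies


def map_phase(points, medoids):
    # "Map": emit (index of nearest medoid, point) for every point
    for p in points:
        nearest = min(range(len(medoids)), key=lambda i: dist(p, medoids[i]))
        yield nearest, p


def reduce_phase(pairs):
    # "Reduce": group points by medoid index, then pick the member with the
    # smallest total distance to the other members as the new medoid
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    return {
        idx: min(members, key=lambda c: sum(dist(c, q) for q in members))
        for idx, members in groups.items()
    }


def canopy_kmedoids(points, t1, t2, max_iter=20):
    # Canopy centers serve as the initial medoids, so k = number of canopies
    medoids = [center for center, _ in canopy(points, t1, t2)]
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, medoids))
        # Keep the old medoid if its cluster received no points
        new_medoids = [updated.get(i, m) for i, m in enumerate(medoids)]
        if new_medoids == medoids:
            break  # converged: no medoid changed
        medoids = new_medoids
    return medoids


def speedup(t_serial, t_parallel):
    # Conventional speedup: serial running time divided by parallel running time
    return t_serial / t_parallel


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(canopy_kmedoids(points, t1=3.0, t2=2.0))  # prints [(0.0, 0.0), (10.0, 10.0)]
```

In a Hadoop setting, each iteration of the loop would typically run as one job: mappers assign points to the nearest current medoid and reducers recompute each cluster's medoid. The `speedup` helper uses the conventional definition of serial time divided by parallel time; how the paper measures it is not stated in the abstract.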

Keywords


Parallel, Clustering analysis, Canopy algorithm, Hadoop platform


DOI
10.12783/dtcse/aics2016/8193
