Research on HCKM Algorithm Based on Parallel Clustering
Abstract
With the explosive growth in data volume, serial data processing can no longer extract information fast enough to meet practical requirements. The pressing problem has become how to find useful information in massive datasets quickly. The traditional K-medoids algorithm is sensitive to the choice of initial cluster centers and has many limitations when handling large datasets. This paper proposes a parallel Canopy-Kmedoids algorithm built on the Hadoop platform, with the aim of reducing running time to a certain extent. Experimental results on running time and speedup demonstrate the feasibility of the algorithm.
Keywords
Parallel, Clustering analysis, Canopy algorithm, Hadoop platform
DOI
10.12783/dtcse/aics2016/8193