Text Document Clustering Based on Density K-means

Di WU; Yan ZENG; Yin-chuan QU

doi:10.12783/dtcse/cmee2016/5349

Abstract

K-means is one of the most fundamental clustering techniques. It has been applied in many fields, such as image processing and natural language processing, and it performs well in many cases, especially on large data sets. However, choosing the initial cluster centers is a hard problem: different choices can make the K-means clustering results unstable or leave the algorithm stuck in a local optimum. Many methods have been proposed to solve this problem, but they apply only to certain fields and perform disappointingly on text document clustering. In this paper, we design a novel density K-means algorithm and apply it to text document clustering. The experimental results show that it performs better than most existing methods on a Chinese corpus. Furthermore, compared with other algorithms, our algorithm effectively reduces the number of iterations.

Keywords

K-means, Density, Text document, Clustering
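
The abstract does not describe the density K-means algorithm in detail. As a rough illustration of the general idea only, the Python sketch below seeds K-means with points from dense regions instead of random picks, then runs the standard Lloyd iterations. It is not the authors' method: the function names `density_seeds` and `kmeans`, the neighbor-count definition of density, the `radius` parameter, and the toy 2-D points are all assumptions made for this example. Real text documents would first be turned into vectors (for instance TF-IDF weights), which is outside the scope of the sketch.

```python
import math

def density_seeds(points, k, radius):
    """Pick up to k initial centers from dense, mutually distant points.

    Assumption: density of a point = number of other points within `radius`.
    """
    density = [
        sum(1 for q in points if math.dist(p, q) <= radius) - 1
        for p in points
    ]
    # Visit points from densest to sparsest (ties keep input order).
    order = sorted(range(len(points)), key=lambda i: -density[i])
    seeds = []
    for i in order:
        # Skip candidates that lie within `radius` of an already chosen seed.
        if all(math.dist(points[i], s) > radius for s in seeds):
            seeds.append(points[i])
        if len(seeds) == k:
            break
    return seeds  # may hold fewer than k seeds if radius is too large

def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations; returns (centers, labels, iterations used)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for it in range(1, max_iter + 1):
        # Assignment step: nearest center for every point.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: mean of each cluster (empty clusters keep their center).
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centers.append(centers[j])
        if new_centers == centers:  # converged: centers stopped moving
            return centers, labels, it
        centers = new_centers
    return centers, labels, max_iter

# Toy data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
seeds = density_seeds(points, k=2, radius=2.0)
centers, labels, iters = kmeans(points, seeds)
print(seeds, centers, labels, iters)
```

Because the seeds already fall one per group, the loop on this toy data only needs one update pass plus one pass to confirm convergence. Whether such seeding reduces iterations on real corpora depends on the data and on how density is defined.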

COMPUTER SCIENCE and ENGINEERING
