Text Document Clustering Based on Density K-means
Abstract
K-means is one of the most fundamental clustering techniques. It has been applied in many fields, such as image processing and natural language processing, and it performs well in many cases, especially on large data sets. However, choosing the initial cluster centers is a hard problem: different choices can make the K-means result unstable or cause it to converge to a local optimum. Many methods have been proposed to solve this problem, but they apply only to certain fields and perform poorly when used for text document clustering. In this paper, we design a novel density K-means algorithm and apply it to text document clustering. The experimental results show that it performs better than most existing methods on a Chinese corpus. Furthermore, compared with other algorithms, our algorithm effectively reduces the number of iterations.
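The abstract does not spell out how the density-based initialization works, so the following is only a generic sketch of the idea, not the authors' algorithm. It assumes documents are already represented as dense numeric vectors, uses Euclidean distance, and picks seeds that are both in dense regions and far from the seeds already chosen. The names `density_seeds`, `kmeans`, and the `radius` parameter are hypothetical and chosen for illustration.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def density_seeds(points, k, radius):
    """Pick k initial centers from dense regions (illustrative heuristic)."""
    # Local density: number of points within `radius`, counting the point itself.
    density = [sum(1 for q in points if dist(p, q) <= radius) for p in points]
    # First seed: the densest point.
    seeds = [max(range(len(points)), key=lambda i: density[i])]
    while len(seeds) < k:
        # Next seed: high density and far from every seed chosen so far.
        def score(i):
            return density[i] * min(dist(points[i], points[s]) for s in seeds)
        seeds.append(max(range(len(points)), key=score))
    return [points[i] for i in seeds]

def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations; returns final centers and iterations used."""
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            clusters[j].append(p)
        # Update step: move each center to its cluster mean
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:  # no center moved: converged
            return centers, iteration
        centers = new_centers
    return centers, max_iter

# Toy data: two well-separated groups standing in for document vectors.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
seeds = density_seeds(points, k=2, radius=1.5)
centers, iterations = kmeans(points, seeds)
```

Because the seeds start in different dense regions, the Lloyd iterations have little left to correct, which is the intuition behind the claim that a density-based start can reduce the iteration count. For real text data, a sparse TF-IDF representation with cosine similarity would be a more natural choice than the dense Euclidean setup assumed here.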
Keywords
K-means, Density, Text document, Clustering
DOI
10.12783/dtcse/cmee2016/5349