An improved ant algorithm with LDA-based representation for text document clustering

Küçük Resim Yok

Tarih

2017

Dergi Başlığı

Dergi ISSN

Cilt Başlığı

Yayıncı

Sage Publications Ltd

Erişim Hakkı

info:eu-repo/semantics/closedAccess

Özet

Document clustering can be applied in document organisation and browsing, document summarisation and classification. The identification of an appropriate representation for textual documents is extremely important for the performance of clustering or classification algorithms. Textual documents suffer from the high dimensionality and irrelevancy of text features. Besides, conventional clustering algorithms suffer from several shortcomings, such as slow convergence and sensitivity to the initial value. To tackle the problems of conventional clustering algorithms, metaheuristic algorithms are frequently applied to clustering. In this paper, an improved ant clustering algorithm is presented, where two novel heuristic methods are proposed to enhance the clustering quality of ant-based clustering. In addition, the latent Dirichlet allocation (LDA) is used to represent textual documents in a compact and efficient way. The clustering quality of the proposed ant clustering algorithm is compared to the conventional clustering algorithms using 25 text benchmarks in terms of F-measure values. The experimental results indicate that the proposed clustering scheme outperforms the compared conventional and metaheuristic clustering methods for textual documents.

Açıklama

Anahtar Kelimeler

Latent Dirichlet allocation, text clustering, text mining

Kaynak

Journal of Information Science

WoS Q Değeri

Q2

Scopus Q Değeri

Q1

Cilt

43

Sayı

2

Künye