Performance of LDA and DCT models

Journal of Information Science

Published online on

Abstract

The Doubly Correlated Topic Model is a generative probabilistic topic model for automatically identifying topics in a corpus of text documents. It is a mixed-membership model, based on the observation that a document exhibits multiple topics. We use word co-occurrence statistics to identify an initial set of topics, which serves as posterior information for the model. Exact posterior inference in existing models is intractable, so the inference methods they use provide only approximate solutions. Treating co-occurring words as initial topics provides a tighter bound on topic coherence. The proposed model is motivated by the Latent Dirichlet Allocation Model. The Doubly Correlated Topic Model differs from the Latent Dirichlet Allocation Model in its posterior inference: it uses the highest-ranked co-occurring words as initial topics rather than obtaining them from Dirichlet priors. The results suggest improved performance on entropy and topical coherence across different datasets.
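
To make the idea of ranking co-occurring words concrete, the following is a minimal sketch, not the authors' implementation. It assumes that co-occurrence means two words appearing in the same document and that ranking is by the number of documents in which a pair appears; the abstract specifies neither. The function name `top_cooccurring_pairs`, the whitespace tokenization, and the toy corpus are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def top_cooccurring_pairs(docs, k):
    """Return the k word pairs that co-occur in the most documents.

    Assumption: document-level co-occurrence with naive whitespace
    tokenization. The model described above may use a different statistic.
    """
    counts = Counter()
    for doc in docs:
        # Count each unordered word pair at most once per document.
        vocab = sorted(set(doc.lower().split()))
        counts.update(combinations(vocab, 2))
    # Rank by descending count, breaking ties alphabetically so the
    # result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [pair for pair, _ in ranked[:k]]


# Hypothetical toy corpus, for illustration only.
docs = [
    "topic model corpus",
    "topic model inference",
    "corpus inference topic",
]
seeds = top_cooccurring_pairs(docs, 2)
print(seeds)
```

In a full model, pairs like these would act as candidate seeds for the initial topics; how the seeds enter posterior inference is not described in the abstract and is left out of this sketch.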