A weakly supervised approach to Chinese sentiment classification using partitioned self-training
Journal of Information Science
Published online on April 09, 2013
Abstract
With the rapid growth of opinionated documents on the World Wide Web, there is increasing demand for sentiment analysis techniques that adapt easily to new domains with minimal supervision. This article introduces a novel weakly supervised approach to Chinese sentiment classification. The approach first splits the test dataset into two partitions and applies a variant of the self-training algorithm to each. It then combines the classification results of the two partitions into a pseudo-labelled training set and an unlabelled test set. Finally, it trains an initial classifier on the pseudo-labelled training set and runs a standard self-training cycle to obtain the overall classification results. Experiments on four datasets from two domains show that our approach compares favourably with the baseline approaches; on some datasets it even outperforms the supervised approach, despite using no labelled documents.
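The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how each partition obtains its first labels, which classifier is used, or how confidence is measured. The sketch therefore assumes a tiny hypothetical seed lexicon (`POS_SEEDS`, `NEG_SEEDS`), a small Naive Bayes classifier, an even/odd split of the test set, and a log-score margin threshold. All function names and parameter values are illustrative, and documents are assumed to be pre-segmented into tokens.

```python
import math
from collections import Counter

# Hypothetical seed lexicon (an assumption; the abstract names no seed source).
POS_SEEDS = {"好", "喜欢"}
NEG_SEEDS = {"差", "失望"}

def seed_label(tokens):
    """Label a document from seed-word counts; return None on a tie."""
    score = sum(t in POS_SEEDS for t in tokens) - sum(t in NEG_SEEDS for t in tokens)
    if score == 0:
        return None
    return "pos" if score > 0 else "neg"

def train_nb(pairs):
    """Fit a small multinomial Naive Bayes model on (tokens, label) pairs."""
    words = {"pos": Counter(), "neg": Counter()}
    n_docs = Counter()
    for tokens, label in pairs:
        words[label].update(tokens)
        n_docs[label] += 1
    vocab_size = len(set(words["pos"]) | set(words["neg"]))
    return words, n_docs, vocab_size

def predict_nb(model, tokens):
    """Return (label, margin), where margin is the absolute log-score gap."""
    words, n_docs, vocab_size = model
    scores = {}
    for label in ("pos", "neg"):
        total = sum(words[label].values())
        score = math.log((n_docs[label] + 1) / (sum(n_docs.values()) + 2))
        for t in tokens:
            # Laplace smoothing, with one extra slot for unseen words.
            score += math.log((words[label][t] + 1) / (total + vocab_size + 1))
        scores[label] = score
    label = "pos" if scores["pos"] >= scores["neg"] else "neg"
    return label, abs(scores["pos"] - scores["neg"])

def self_train(docs, labels, pool, margin=0.5, max_iter=10):
    """Standard self-training: absorb confident predictions until none remain.

    labels maps document index -> label; pool lists unlabelled indices.
    """
    labels, pool = dict(labels), list(pool)
    for _ in range(max_iter):
        if not labels or not pool:
            break
        model = train_nb([(docs[i], y) for i, y in labels.items()])
        preds = {i: predict_nb(model, docs[i]) for i in pool}
        confident = {i: y for i, (y, m) in preds.items() if m >= margin}
        if not confident:
            break
        labels.update(confident)
        pool = [i for i in pool if i not in confident]
    return labels, pool

def partitioned_self_training(docs):
    """Return one label per document, using no hand-labelled documents."""
    ids = range(len(docs))
    pseudo, rest = {}, []
    for part in (ids[0::2], ids[1::2]):  # assumed even/odd split into two partitions
        seeded = {i: seed_label(docs[i]) for i in part}
        labels = {i: y for i, y in seeded.items() if y is not None}
        pool = [i for i, y in seeded.items() if y is None]
        labels, pool = self_train(docs, labels, pool)
        pseudo.update(labels)   # confident results form the pseudo-labelled training set
        rest += pool            # the remainder forms the unlabelled test set
    final, leftover = self_train(docs, pseudo, rest)
    if final and leftover:
        # Label whatever is still unlabelled with the last classifier.
        model = train_nb([(docs[i], y) for i, y in final.items()])
        for i in leftover:
            final[i] = predict_nb(model, docs[i])[0]
    return [final.get(i) for i in ids]  # None only if no seed word matched anywhere

docs = [
    ("这", "手机", "好", "屏幕", "清晰"),
    ("酒店", "差", "服务", "慢"),
    ("屏幕", "清晰"),
    ("服务", "慢"),
    ("电池", "差", "服务", "慢"),
    ("喜欢", "屏幕", "清晰"),
]
print(partitioned_self_training(docs))
# → ['pos', 'neg', 'pos', 'neg', 'neg', 'pos']
```

In this toy run, the third and fourth documents contain no seed word and are labelled only through the words they share with seed-labelled documents in the same partition, which is the effect self-training is meant to provide.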