An effective ensemble classification framework using random forests and a correlation based feature selection technique

Transactions in GIS

Accurate classification of heterogeneous land surfaces with homogeneous land cover classes is a challenging task, as satellite images are characterized by a large number of features in the spectral and spatial domains. Identifying the relevance of a feature or feature set is an important step in designing an effective classification scheme. Here, an ensemble of random forest (RF) classifiers is built on the basis of feature relevance. Correlation-based Feature Selection (CFS) was used to assess the relevance of a subset of features by considering the individual predictive ability of each feature along with the degree of redundancy between them. The predictive ability of RF was greatly improved by randomly selecting from the relevant features at each split. The approach was investigated on different types of images from the Landsat Enhanced Thematic Mapper Plus (Landsat ETM+) and QuickBird sensors. With the optimal set of relevant features, the RF classifier performed significantly better than several advanced supervised classifiers, including the maximum likelihood classifier (MLC), Naive Bayes, multilayer perceptron (MLP), support vector machine (SVM), and bagging.
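
The CFS idea described above, rewarding features that predict the class while penalizing redundancy among them, can be illustrated with the standard CFS merit heuristic, merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. The following is a minimal sketch, not the authors' implementation. It assumes absolute Pearson correlation as the correlation measure (the abstract does not state which measure was used), numeric class labels, and no constant feature columns; the function names `cfs_merit` and `cfs_forward_select` are hypothetical.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff).

    Assumption: absolute Pearson correlation stands in for the
    correlation measure, which the abstract does not specify.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    # Relevance: mean absolute correlation between each feature and the class.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    # Redundancy: mean absolute correlation over distinct feature pairs.
    if k == 1:
        r_ff = 0.0
    else:
        corr = np.abs(np.corrcoef(X[:, subset], rowvar=False))
        r_ff = (corr.sum() - k) / (k * (k - 1))  # drop the diagonal of ones
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward_select(X, y):
    """Greedy forward search: keep adding the feature that raises the merit most."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        score, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if score <= best:
            break  # no remaining feature improves the merit
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best
```

Under these assumptions, adding an exact copy of an already selected feature leaves the merit unchanged, because the redundancy term cancels the extra relevance. The selected column indices could then be used to restrict the feature pool from which a random forest draws its candidates at each split.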