Sensitivity of Support Vector Machines to Random Feature Selection in Classification of Hyperspectral Data

Abstract

The accuracy of supervised land cover classifications depends on factors such as the chosen classification algorithm, adequate training data, the input data characteristics, and the selection of features. Hyperspectral imaging provides more detailed spectral and spatial information on the land cover than other remote sensing resources. Over the past ten years, traditional and formerly widely accepted statistical classification methods have been superseded by more recent machine learning algorithms, e.g., support vector machines (SVMs), or by multiple classifier systems (MCSs). This can be explained by the limitations of statistical approaches with regard to high-dimensional data, multimodal classes, and the often limited availability of training data. In this study, MCSs based on SVM and random feature selection (RFS) are applied to explore the potential of a synergetic use of the two concepts. We investigated how the number of selected features and the size of the MCS influence classification accuracy using two hyperspectral data sets from different environmental settings. In addition, experiments were conducted with a varying number of training samples. Accuracies are compared with those of a regular SVM and random forests. Experimental results clearly demonstrate that the generation of an SVM-based classifier system with RFS significantly improves overall classification accuracy as well as producer’s and user’s accuracies. In addition, the ensemble strategy results in smoother, i.e., more realistic, classification maps than those from a stand-alone SVM. Findings from the experiments were successfully transferred to an additional hyperspectral data set.
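
The idea of a classifier system built on random feature selection can be illustrated with a minimal sketch. It is not the authors' implementation: the base learner here is a nearest-centroid placeholder standing in for the SVM, majority voting is assumed as the combination rule (the abstract does not name one), and all function names, parameters, and toy values are hypothetical. The two parameters `n_members` and `n_features` correspond to the size of the MCS and the number of selected features that the study varies.

```python
import random
from collections import Counter


def train_centroid(X, y, feats):
    """Placeholder base learner (the study uses SVMs):
    per-class mean vector over the chosen feature subset."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += row[f]
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict_centroid(model, feats, row):
    """Return the class whose centroid is nearest within the feature subset."""
    def dist(c):
        return sum((row[f] - m) ** 2 for f, m in zip(feats, model[c]))
    return min(sorted(model), key=dist)


def train_rfs_ensemble(X, y, n_members, n_features, seed=0):
    """Train n_members base learners, each on its own random
    subset of n_features bands drawn without replacement."""
    rng = random.Random(seed)
    n_bands = len(X[0])
    ensemble = []
    for _ in range(n_members):
        feats = sorted(rng.sample(range(n_bands), n_features))
        ensemble.append((feats, train_centroid(X, y, feats)))
    return ensemble


def predict_rfs_ensemble(ensemble, row):
    """Combine the member predictions by majority vote (assumed rule)."""
    votes = Counter(predict_centroid(m, feats, row) for feats, m in ensemble)
    return votes.most_common(1)[0][0]


# Synthetic toy values, not taken from the study: 6 "bands", two classes.
X = [[0.1] * 6, [0.2] * 6, [0.9] * 6, [0.8] * 6]
y = ["water", "water", "urban", "urban"]
ensemble = train_rfs_ensemble(X, y, n_members=5, n_features=3)
label = predict_rfs_ensemble(ensemble, [0.15] * 6)
```

Each member sees only part of the spectrum, so the members disagree on difficult pixels and the vote averages out their individual errors; swapping the placeholder for an SVM leaves the surrounding structure unchanged.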

Publication
IEEE Transactions on Geoscience and Remote Sensing