Effects of geographical data sampling bias on habitat models of species distributions: a case study with steppe birds in southern Portugal

Abstract

Habitat models of species distributions provide useful information about the spatial patterns of species and biodiversity, which form the basis of many ecological applications and management decisions, such as the definition of conservation priorities and reserve selection. These models, however, are frequently based on existing datasets that were collected in an unbalanced (biased) manner. In this study we investigated the effects of data sampling bias on model performance, interpretation and, in particular, spatial predictions. We collected a large steppe bird dataset in southern Portugal following a carefully designed sampling scheme, and then sub-sampled this dataset, discarding roughly 80% to 90% of the observations, using either random sampling or varying degrees of geographical bias. We characterised the data subsets in terms of data reduction and environmental bias. Multivariate adaptive regression splines (MARS) models were run on all datasets, and each subset model was compared with the baseline to assess the effect of the respective bias. We found that environmental bias in the datasets strongly influenced the predicted spatial patterns of species occurrences. Special attention should therefore be paid to the quality of existing datasets used in habitat modelling, as well as to the sampling design for the collection of new data. In addition, when modelling with biased datasets, the ecological interpretation of the models should be made with caution and with explicit awareness of the existing bias.
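
The sub-sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the synthetic sites, the 15% retention rate (chosen only to fall within the stated 80% to 90% reduction), the distance-decay weighting around a hypothetical focus point, and the single covariate `env` are all assumptions made for illustration. The sketch draws one random and one geographically biased subset and reports how far the mean of the covariate shifts, as a crude stand-in for environmental bias.

```python
import math
import random

def random_subsample(sites, keep_fraction, rng):
    """Keep a uniformly random fraction of the sites."""
    k = round(len(sites) * keep_fraction)
    return rng.sample(sites, k)

def biased_subsample(sites, keep_fraction, rng, focus=(0.0, 0.0), scale=0.2):
    """Keep a fraction of the sites, favouring those near `focus`.

    Weights decay exponentially with distance (an assumed form of
    geographical bias). Weighted sampling without replacement uses
    the key u ** (1 / w) and keeps the k largest keys.
    """
    k = round(len(sites) * keep_fraction)

    def weight(site):
        d = math.hypot(site["x"] - focus[0], site["y"] - focus[1])
        return math.exp(-d / scale)

    keyed = [(rng.random() ** (1.0 / weight(s)), s) for s in sites]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in keyed[:k]]

def mean_env(sites):
    """Mean of the (hypothetical) environmental covariate."""
    return sum(s["env"] for s in sites) / len(sites)

def env_shift(full, subset):
    """Difference in mean covariate between a subset and the full data."""
    return mean_env(subset) - mean_env(full)

# Synthetic survey: 1000 sites in a unit square, with a covariate
# that increases from west to east (assumption for illustration only).
rng = random.Random(42)
sites = []
for _ in range(1000):
    x, y = rng.random(), rng.random()
    sites.append({"x": x, "y": y, "env": x})

random_subset = random_subsample(sites, 0.15, rng)
biased_subset = biased_subsample(sites, 0.15, rng)

print("random subset shift:", round(env_shift(sites, random_subset), 3))
print("biased subset shift:", round(env_shift(sites, biased_subset), 3))
```

In this toy setting both subsets discard the same share of observations, but only the geographically biased one is expected to under-represent the eastern end of the covariate gradient, which is the kind of environmental bias the study relates to changes in predicted spatial patterns.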

Publication
International Journal of Geographical Information Science