
Table 2 Comparison of the mathematical approach of each modeling algorithm used to project species distributions in this study. Data requirements and advantages are also listed

From: Modeling the influence of livestock grazing pressure on grassland bird distributions


Model | Data type | Mathematical approach | Advantages



BioClim (BC)

Presence only

This method uses a parallelepiped classifier to define a species' potential presence as the multidimensional environmental space bounded by the minimum and maximum values of each variable across all occurrences, and gives a binary classification of suitable and unsuitable environment (Busby 1986, 1991).

Interpretations are straightforward, and the model is relatively simple to execute. More recently, this approach has proven useful in predicting biological invasions and the distribution of widespread diseases (Robertson et al. 2004; Zhao et al. 2006).
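The envelope rule can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study. The function names and the two climate variables are hypothetical. The sketch uses the strict minimum and maximum described above, whereas some BIOCLIM implementations use percentile limits instead.

```python
from typing import List, Sequence, Tuple


def fit_envelope(occurrences: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Return the (min, max) of each environmental variable across all occurrences."""
    n_vars = len(occurrences[0])
    return [
        (min(site[j] for site in occurrences), max(site[j] for site in occurrences))
        for j in range(n_vars)
    ]


def is_suitable(site: Sequence[float], envelope: Sequence[Tuple[float, float]]) -> bool:
    """Binary classification: suitable only if every variable lies inside its [min, max] box."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(site, envelope))


# Hypothetical occurrences: (mean annual temperature in °C, annual precipitation in mm)
presences = [(12.0, 450.0), (14.5, 520.0), (13.2, 610.0), (11.8, 480.0)]
envelope = fit_envelope(presences)
print(envelope)                              # [(11.8, 14.5), (450.0, 610.0)]
print(is_suitable((13.0, 500.0), envelope))  # True
print(is_suitable((16.0, 500.0), envelope))  # False: warmer than any occurrence
```

A site just outside the box on any single variable is classed as unsuitable. This is why the method is easy to interpret but sensitive to outlying occurrence records.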

Generalized linear (GLM)


This is a generalization of the multiple regression model. It uses a “link” function to relate the expected value of the response to a linear combination of the predictors, which accommodates non-linear relationships between the predictor and response variables. Different link functions and error distributions can be chosen (e.g., a logit link with binomial errors, Poisson, and Gaussian), and interactions between predictors can also be specified.

This approach is often well suited to occupancy modeling, which almost always involves multiple predictors, non-linear response functions, and binary response variables (Austin and Cunningham 198; Margules et al. 1987; Franklin 2010).
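As a minimal sketch of a binomial GLM with a logit link (one of the link/distribution combinations mentioned above), the following Python fits a presence/absence model by plain gradient ascent on the log-likelihood. The data, function names, learning rate and iteration count are all hypothetical. Statistical packages usually fit GLMs by iteratively reweighted least squares instead, which this sketch does not do. The squared term shows how a non-linear (hump-shaped) response can enter a model that is still linear in its coefficients.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Inverse of the logit link: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, n_iter: int = 5000) -> List[float]:
    """Binomial GLM with a logit link, fitted by gradient ascent on the log-likelihood.

    Rows of X hold the predictors (no intercept column); returns [b0, b1, ..., bk].
    """
    beta = [0.0] * (len(X[0]) + 1)
    n = len(y)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for row, obs in zip(X, y):
            x = [1.0, *row]                               # prepend the intercept term
            p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
            for j, xi in enumerate(x):
                grad[j] += (obs - p) * xi                 # score of the Bernoulli log-likelihood
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def predict(beta: Sequence[float], row: Sequence[float]) -> float:
    """Predicted probability of presence for one site."""
    return sigmoid(beta[0] + sum(b * xi for b, xi in zip(beta[1:], row)))


# Hypothetical presence (1) / absence (0) records along one standardized gradient x.
# Adding x**2 as a second predictor lets the linear predictor describe a hump-shaped response.
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 1, 0, 1, 1, 1, 0, 0]
beta = fit_logistic([[x, x * x] for x in xs], ys)
print(beta)                                               # coefficient on x**2 comes out negative
print(predict(beta, [0.0, 0.0]), predict(beta, [2.0, 4.0]))
```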

Random forest (RF)


An ensemble machine-learning method in which a large number (500–2000) of decision trees are grown. Each tree is grown on a bootstrap sample of the data (e.g., species occurrences), and each split is chosen from a random subset of the candidate predictor variables (Breiman 2001). Each tree votes for a binary outcome, and the votes are averaged across trees to give the final prediction.

This method makes no assumptions about the distribution of the data. Instead, it uses bootstrap aggregation (bagging) to resample the data. This approach has been shown to have higher prediction accuracy than single decision trees in species distribution modeling (SDM) and other applications (Prasad et al. 2006; Gislason et al. 2006).
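The bagging-plus-random-subset idea can be illustrated with a small from-scratch forest in Python. This is a sketch under simplifying assumptions:

- trees are limited to depth 3;
- splits use Gini impurity;
- one randomly chosen predictor (`max_features=1`) is considered at each split.

These settings, the function names and the survey data are all hypothetical and are not the settings used in the study. Library implementations typically grow much deeper trees.

```python
import random
from collections import Counter
from typing import List, Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(X, y, features):
    """Best (feature, threshold) among the candidate features, by weighted Gini impurity."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def grow_tree(X, y, max_features, depth, rng):
    """Grow a shallow classification tree; leaves hold the majority class (0 or 1)."""
    if depth == 0 or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]
    candidates = rng.sample(range(len(X[0])), max_features)  # random predictor subset per split
    split = best_split(X, y, candidates)
    if split is None:
        return Counter(y).most_common(1)[0][0]
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], max_features, depth - 1, rng),
            grow_tree([X[i] for i in right], [y[i] for i in right], max_features, depth - 1, rng))


def predict_tree(node, row) -> int:
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def random_forest(X, y, n_trees=500, max_features=1, depth=3, seed=0) -> List:
    rng = random.Random(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, drawn with replacement
        trees.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                               max_features, depth, rng))
    return trees


def predict_forest(trees, row) -> float:
    """Average of the trees' 0/1 votes, i.e. the fraction of trees voting 'present'."""
    return sum(predict_tree(t, row) for t in trees) / len(trees)


# Hypothetical survey sites: (grass cover %, grazing intensity 0-1); 1 = species detected.
X = [(80, 0.2), (75, 0.3), (85, 0.1), (70, 0.25), (90, 0.15), (78, 0.2),
     (30, 0.8), (25, 0.9), (40, 0.7), (35, 0.85), (20, 0.95), (45, 0.75)]
y = [1] * 6 + [0] * 6
forest = random_forest(X, y)
print(predict_forest(forest, (82, 0.2)), predict_forest(forest, (30, 0.85)))
```

Because every tree sees a different bootstrap sample and a different predictor at each split, individual trees disagree. The averaged vote is smoother than any single tree's prediction.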


Maximum entropy (MaxEnt)

Presence only

A machine-learning algorithm based on a principle from statistical mechanics and information theory. The principle holds that the probability distribution with maximum entropy, subject to the constraints imposed by the available information, is the best approximation of an unknown distribution (Phillips et al. 2006).

Recent investigations have shown the MaxEnt algorithm to be mathematically equivalent to a Poisson point process model, which can be fitted as a GLM with a Poisson distribution (Renner and Warton 2013). Its ability to incorporate environmental gradients into the prediction process makes it well suited to ecological niche modeling (Saatchi et al. 2008; Evangelista et al. 2009).
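The maximum-entropy principle can be sketched on a toy landscape. The fitted distribution has the Gibbs form p(cell) ∝ exp(λ · f(cell)), with λ chosen so that the model's feature means match those at the presence cells. This Python sketch assumes a single linear feature and no regularization. The MaxEnt software itself adds several feature classes and regularization, which are omitted here. The landscape, records and function names are hypothetical.

```python
import math
from typing import List, Sequence


def gibbs(cells: Sequence[Sequence[float]], lam: Sequence[float]) -> List[float]:
    """p(cell) proportional to exp(lam . f(cell)), normalized over every cell in the landscape."""
    scores = [sum(lj * x for lj, x in zip(lam, c)) for c in cells]
    m = max(scores)                              # subtract the max for numerical stability
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [wi / z for wi in w]


def maxent_fit(cells: Sequence[Sequence[float]], presence_idx: Sequence[int],
               lr: float = 0.5, n_iter: int = 2000) -> List[float]:
    """Find lam so the model's feature means match the feature means at presence cells.

    Among all distributions meeting that constraint, the Gibbs form above has maximum
    entropy; gradient ascent maximizes the (unregularized) log-likelihood of the presences.
    """
    k = len(cells[0])
    target = [sum(cells[i][j] for i in presence_idx) / len(presence_idx) for j in range(k)]
    lam = [0.0] * k
    for _ in range(n_iter):
        p = gibbs(cells, lam)
        expected = [sum(pi * c[j] for pi, c in zip(p, cells)) for j in range(k)]
        # Gradient = empirical feature mean minus model expectation.
        lam = [lj + lr * (t - e) for lj, t, e in zip(lam, target, expected)]
    return lam


# Hypothetical landscape: 10 cells with one covariate (scaled grass cover, 0.0 to 0.9);
# the species was recorded in cells 6, 7 and 8.
cells = [[i / 10] for i in range(10)]
lam = maxent_fit(cells, [6, 7, 8])
p = gibbs(cells, lam)
print(lam)                                           # positive: suitability rises with cover
print(sum(pi * c[0] for pi, c in zip(p, cells)))     # close to 0.7, the mean at presence cells
```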

Boosted regression tree (BRT)


An ensemble, regression-based method that combines the strengths of two commonly used algorithms:

- regression trees (models that relate the response to the predictors through recursive binary splits);
- boosting (a method for combining many simple models to improve predictive performance).

An initial regression tree is fitted, and the model is then improved iteratively in a forward stage-wise manner (boosting). Each new tree is fitted to minimize the variation in the response not explained by the current model.

This approach can easily accommodate different types of predictor variables, missing data, and outliers. It also fits complex nonlinear relationships while automatically handling collinearity between predictor variables. BRT results can be readily summarized (e.g., through the relative influence of each predictor and partial dependence plots) to provide ecological insight (Franklin 2010).
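The stage-wise fitting described above can be sketched as least-squares boosting of one-split regression trees (stumps) in Python. Each stump is fitted to the current residuals and added with a shrinkage factor (`learning_rate`). This is a simplified illustration. BRT applications to binary occurrence data usually use a Bernoulli (logistic) loss and deeper trees, which this sketch replaces with squared error and stumps. The names and data are hypothetical.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature, threshold, left value, right value)


def fit_stump(X: Sequence[Sequence[float]], r: Sequence[float]) -> Stump:
    """One-split regression tree fitted to the residuals r by least squares."""
    best, best_sse = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X})[:-1]:  # skip the max so both sides are non-empty
            left = [ri for row, ri in zip(X, r) if row[f] <= t]
            right = [ri for row, ri in zip(X, r) if row[f] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best_sse:
                best, best_sse = (f, t, lm, rm), sse
    return best


def stump_predict(s: Stump, row: Sequence[float]) -> float:
    f, t, left_value, right_value = s
    return left_value if row[f] <= t else right_value


def boost(X, y, n_trees: int = 200, learning_rate: float = 0.1):
    """Least-squares boosting: each new stump is fitted to what the model has not yet explained."""
    f0 = sum(y) / len(y)                                  # initial model: the mean response
    pred = [f0] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]      # variation not explained so far
        s = fit_stump(X, resid)
        stumps.append(s)
        # Forward stage-wise step: add a shrunken copy of the new tree; earlier trees stay fixed.
        pred = [p + learning_rate * stump_predict(s, row) for p, row in zip(pred, X)]
    return f0, learning_rate, stumps


def boost_predict(model, row: Sequence[float]) -> float:
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(stump_predict(s, row) for s in stumps)


# Hypothetical data: the response steps up once predictor 0 exceeds 4; predictor 1 is noise.
X = [[i, (i * 7) % 10] for i in range(10)]
y = [0.0] * 5 + [1.0] * 5
model = boost(X, y)
print(boost_predict(model, [2, 4]), boost_predict(model, [7, 9]))  # close to 0.0 and 1.0
```

Every stump here splits on predictor 0. Counting how often, and how profitably, each predictor is used across trees is the idea behind the relative-influence summaries mentioned above.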