
Species Distribution Modelling performance and its implication for Sentinel-2-based prediction of invasive Prosopis juliflora in lower Awash River basin, Ethiopia

Abstract

Background

Species Distribution Modelling (SDM) coupled with freely available multispectral imagery from the Sentinel-2 (S2) satellite contributes substantially to monitoring invasive species. However, attempts to evaluate the performance of SDMs using S2 spectral bands, S2 Radiometric Indices (S2-RIs), and biophysical variables in particular have been limited. Hence, this study aimed to evaluate the performance of six commonly used SDMs and one ensemble model for S2-based variables in modelling the current distribution of Prosopis juliflora in the lower Awash River basin, Ethiopia. Thirty-five variables were computed from Sentinel-2B level-2A, and out of these, twelve significant variables were selected using the Variance Inflation Factor (VIF). A total of 680 presence and absence points were collected to train and validate the models using the tenfold bootstrap replication approach in the R software “sdm” package. The performance of the models was evaluated using sensitivity, specificity, True Skill Statistics (TSS), kappa coefficient, area under the curve (AUC), and correlation.

Results

Our findings demonstrated that, except for bioclim, all machine learning and regression models provided successful predictions. Among the tested models, Random Forest (RF) performed best with 93% TSS and 99% AUC, followed by Boosted Regression Trees (BRT), ensemble, Generalized Additive Model (GAM), Support Vector Machine (SVM), and Generalized Linear Model (GLM) in decreasing order. The relative influence of vegetation indices was the highest, followed by soil indices, biophysical variables, and water indices in decreasing order. According to the RF prediction, 16.14% (1553.5 km2) of the study area was invaded by the alien species.

Conclusions

Our results highlighted that S2-RIs and biophysical variables combined with machine learning and regression models have a higher capacity to model invasive species distribution. Besides, the use of machine learning algorithms such as RF algorithm is highly essential for remote sensing-based invasive SDM.

Introduction

Globally, invasive alien plant species have become a great risk as they adversely affect ecological services and socio-economic systems (Rajah et al. 2019; Paz-Kagan et al. 2019). Their distribution and the subsequent socio-economic loss in East Africa are also increasing at an alarming rate (Landmann et al. 2020). Beyond their current distribution and negative consequences, studies also warn of an expected large increase in invasion that will adversely affect currently uninvaded areas (Howard 2019). Hence, monitoring invasive alien plant species using contemporary technologies before they disperse further is of paramount importance (Rajah et al. 2019; West et al. 2016). This is particularly vital for developing countries with little or no financial and technical capability to avert the invasion, where any delay further aggravates the problem (Pyšek et al. 2012; Vilà et al. 2011).

Prosopis juliflora (hereafter Prosopis) is one of the ten worst invasive species that adversely affect millions of hectares of land in many arid and semi-arid regions (Ilukor et al. 2016; Rembold et al. 2015; Shackleton et al. 2014). In Ethiopia, about 1.17 million hectares of land are currently invaded by Prosopis in the Afar region alone, resulting in an approximately 602 million US dollar loss of ecosystem services (mainly due to Prosopis expansion) (Shiferaw et al. 2019b). Its adverse social, economic, and ecological impact in the area is also expected to increase as the species is expanding aggressively at a rate of 8.3% annually (Shiferaw et al. 2019b). Hence, the recommended approach is to control its invasion early by anticipating suitable habitats for its expansion using timely and cost-effective tools (Reaser et al. 2020; West et al. 2017).

Mapping the current invasion and modelling the suitable habitat of invasive species greatly helps ecologists and policymakers to control the expansion and its adverse threat (Ayanu et al. 2014; Evangelista et al. 2008; Feilhauer et al. 2012). However, controlling its spread and managing its consequences needs a robust, cost-effective, efficient, and precise monitoring system (Lopatin et al. 2016). In addition, ecologists also need timely and cost-effective methods to model and predict in advance the distribution of invasive species (West et al. 2017). In this regard, Species Distribution Modelling (SDM) and Geographic Information Systems (GIS) are among the most widely used prediction tools (Bradley 2014). SDMs have been used by ecologists for a long time to predict species distribution (Allouche et al. 2006; Jiménez-Valverde 2014; Lemke and Brown 2012; Wisz et al. 2008). However, models have shown varied performance and no single best model has been identified by studies for different species and environments (Reside et al. 2011).

The selection and implementation of models require great care as inappropriate use of models can affect the accuracy of species prediction (Elith and Leathwick 2009; Elith et al. 2006). Consequently, many studies used more than one model in comparison (Früh et al. 2018; Ng et al. 2018; Stohlgren et al. 2010). Besides, some studies recommend the use of presence-absence models, while others appreciate the use of presence-only models (Elith and Leathwick 2009). Owing to this, many studies compared machine learning algorithms with regression-based models (Shiferaw et al. 2019c) while others made a comparison among machine learning algorithms (Früh et al. 2018) and others also developed ensemble models from different SDMs rather than relying on a single model (Ng et al. 2018). However, the performances of SDMs are dependent upon the type of application and problem (Bhattacharya 2013), the spatial resolution of environmental variables (Reside et al. 2011), and the selection of environmental predictors (Elith and Leathwick 2009). Owing to this, the use of several models by applying similar environmental variables and methodologies gives confidence for ecologists to judge their results (Fischer et al. 2013; Lemke and Brown 2012).

The use of machine learning algorithms for remote sensing-based prediction has recently gained attention and significantly improves the prediction of invasive species (Benito et al. 2013; Früh et al. 2018). A timely, advanced, and cost-effective approach that integrates remote sensing technology to monitor invasive species risk is highly needed (Rajah et al. 2019; West et al. 2017). This is particularly important in arid and semi-arid regions of developing nations, where the cost of survey data and high-resolution commercial data is difficult to justify. The practical option in such areas is the use of freely available multispectral data (Jensen et al. 2020). In this regard, S2 data, due to its high spatial, spectral, and temporal resolutions, contributes greatly to monitoring the distribution and spread of invasive species (Martinez et al. 2020; Meroni et al. 2017; Ng et al. 2016, 2017; Truong et al. 2017). S2’s high spectral resolution allows ecologists to derive numerous indices (Rajah et al. 2019). These indices are preferable to raw bands as they can reduce the effect of atmospheric conditions and soil background on canopy reflectance (Liu et al. 2005). Among the available indices, most studies have employed S2-derived Vegetation Indices (S2VIs). Though S2VIs have higher importance, the contribution of other radiometric indices is potentially also very high. For example, soil radiometric indices (Nouri et al. 2018) and biophysical variables have a high capacity for monitoring and managing vegetation changes (Atzberger et al. 2015; Mudereri et al. 2019). Hence, evaluating the contribution of S2-derived RIs is of utmost importance as they incorporate different variables describing the soil, water, vegetation, and biophysical characteristics of the area.

So far, a number of studies have evaluated the performance of SDMs for mapping and modelling different invasive species. Stohlgren et al. (2010) compared five individual models for the prediction of invasive species and suggested that the use of an ensemble model significantly improves prediction compared to a single model. Früh et al. (2018) compared the performance of four machine learning models in the prediction of invasive species and recommended the use of an ensemble built from the best-performing models rather than a single model or an ensemble of all models. Likewise, Ng et al. (2018) compared machine learning models and argued that Random Forest (RF) and ensemble models performed better than the other models used in their study. Abdi (2020) compared four machine learning algorithms using S2-derived variables for land cover classification and concluded that the Support Vector Machine (SVM) performed better than other models. These studies either integrated remote sensing and non-remote sensing datasets (e.g., Ng et al. 2018) or used coarse resolution remote sensing data (Stohlgren et al. 2010). Other studies used S2 data for land cover classification (Abdi 2020) and for mapping and detecting invasive species distribution (Rajah et al. 2019). In Ethiopia, few studies have been conducted on mapping and prediction of Prosopis. Wakie et al. (2014) assessed the distribution of invasive Prosopis using Moderate Resolution Imaging Spectroradiometer (MODIS) data and the Maxent model. Besides, Shiferaw et al. (2019c) evaluated the performance of different SDMs using Landsat 8 Operational Land Imager (OLI), climate, and infrastructural data. However, comparative studies of SDMs using high-resolution data in general, and S2-RIs and biophysical variables in particular, for monitoring Prosopis remain scarce.

This study, therefore, aims at addressing the following research gaps and needs: (1) identifying a robust method for modelling remote sensing-based invasive species distribution is highly required; (2) assessing the potential of S2-derived vegetation, soil and water indices, and biophysical variables for modelling invasive species distribution in arid and semi-arid regions of developing countries is highly essential.

Materials and methods

Study area and species

The study was carried out in the lower Awash River basin, Ethiopia. It is located between 40.74° and 41.82° longitude and 10.99° and 12.36° latitude (Fig. 1). It covers an area of 9471.5 km2 with elevations ranging from 240 to 1341 m above mean sea level. In addition, 75% (7103.6 km2) of the study area is found in the desert and 25% (2367.9 km2) is found in arid to semi-arid agro-ecological zones. It is also part of the Great East African Rift Valley system. According to the National Meteorological Agency (NMA) (2020), the mean annual temperature, mean maximum, and mean minimum annual temperature at Dubti station for the years between 2002 and 2017 were 28.1 °C, 33.5 °C, and 22.6 °C, respectively. December and July are the coldest and warmest months, respectively. Furthermore, frequent drought and subsequent famine are the two major characteristics of the area (Mulugeta et al. 2019).

Fig. 1

Location of the study area using Sentinel-2 true-color image composite (left), including in situ collected occurrence points of Prosopis; map of Africa (upper right) including the location of Ethiopia; and map of Ethiopia (lower right) including the location of Afar regional state

Though pastoralism is the dominant way of life, agro-pastoralism is also practiced in the area. The state-owned Tendaho irrigation project, which covers around 62,500 ha along the lower Awash River basin, supports irrigation-based agriculture (Tadese et al. 2019). Sugarcane, wheat, cotton, maize, and other vegetables are also cultivated in small-scale agriculture. In the Afar region, Prosopis was introduced in the early 1980s for soil and water conservation (Tilahun et al. 2017). This was part of the then government's afforestation initiative to combat drought and desertification (Wakie et al. 2014). Before the invasion of Prosopis, native grasses, forbs, shrubs, and woody plants dominantly covered the area and were an important source of fodder for the locality (Ayanu et al. 2014; Wakie et al. 2014). After the invasion of the species, however, conflict among pastoralists has increased due to resource competition (Ilukor et al. 2016; Mehari 2015; Wakie et al. 2012). The expansion of Prosopis has also affected livestock production, with an expected loss of about 26 million dollars per year in the region (Ilukor et al. 2014).

Method overview

In this study, we evaluated the performance of six commonly used SDMs using S2-RIs. We computed thirty-five variables from vegetation, soil, and water radiometric indices and biophysical variables. Out of these, twelve significant variables were selected using the VIF. TSS, AUC, kappa, correlation, sensitivity, and specificity were used to evaluate model performance. Reference data on the presence and absence of the species were collected using a handheld Global Positioning System (GPS). An ensemble model was also developed from the best-performing models (that is, all models except the worst-performing bioclim model). Finally, prediction maps for all individual and ensemble models were produced. A graphical overview of the methods used for this study is presented in Fig. 2.

Fig. 2

Methodological flowchart showing the prediction of the current distribution of invasive Prosopis using six individual models and one ensemble model from selected predictor variables of the Sentinel-2B level-2A dataset

Presence and absence data

Georeferenced in situ data were collected with the help of a handheld GPS from both presence (invaded) and absence (uninvaded) points in the dry season between January and February 2020. In this period, Prosopis is highly discriminated from the other tree species as most tree types shed their leaves due to water scarcity (Godoy et al. 2011; Meroni et al. 2017; Xu 2014). A total of 680 points were collected using stratified random sampling in a 10 m by 10 m plot similar to the spatial resolution of S2 data (Arogoundade et al. 2020). Of all reference points, 30% (205 points) were presence while 70% (475 points) were absence data. This share was chosen considering the previous distribution of Prosopis in the area (Linders et al. 2020). Field data were collected throughout the study area and 200 m was the minimum distance among the points.

Moran’s I was used to evaluate the spatial autocorrelation among observations. The resulting Moran’s I was 0.28 with a z-score of 2.45, indicating no apparent spatial clustering among the points (Abdulhafedh 2017). To further reduce spatial autocorrelation among points, we used the “Spatially Rarefy Occurrence Data for SDMs (reduce spatial autocorrelation)” function in the ArcGIS SDM Toolbox. To obtain independent validation statistics, 70% of the collected data were used to train models while 30% were used to validate models (Engler et al. 2013).
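As an illustration of the statistic itself, the following minimal sketch computes global Moran's I for a set of georeferenced observations. It assumes binary distance-band weights (two points are neighbours when they lie within `band` of each other); the weighting scheme actually used for the study is not stated in the text, and the function name and sample data are hypothetical.

```python
import math


def morans_i(coords, values, band):
    """Global Moran's I with binary distance-band weights.

    Assumption: w_ij = 1 when 0 < distance(i, j) <= band, else 0.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    cross = 0.0   # sum over i != j of w_ij * dev_i * dev_j
    w_sum = 0.0   # sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= band:
                cross += dev[i] * dev[j]
                w_sum += 1.0

    variance_term = sum(d * d for d in dev)
    if w_sum == 0 or variance_term == 0:
        raise ValueError("no neighbours within band, or constant values")
    return (n / w_sum) * (cross / variance_term)


# Hypothetical presence (1) / absence (0) points along a transect.
points = [(0, 0), (1, 0), (2, 0), (3, 0)]
clustered = morans_i(points, [1, 1, 0, 0], band=1.0)    # positive value
alternating = morans_i(points, [1, 0, 1, 0], band=1.0)  # negative value
```

Values near zero suggest little spatial structure, positive values suggest clustering, and negative values suggest dispersion; judging significance additionally requires the z-score, which this sketch does not compute.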

Satellite image processing and selection of variables

In this study, Sentinel-2B level-2A data were used to produce various radiometric indices and biophysical variables. The Sentinel-2B level-2A product provides geometrically and radiometrically corrected images. The data are delivered as Bottom-Of-Atmosphere (BOA) reflectance images converted from the level-1C Top-Of-Atmosphere (TOA) reflectance (Szantoi and Strobl 2019). Its application was tested by Vuolo et al. (2016) and it has been used by different studies of invasive species distribution (e.g., Arogoundade et al. 2020; Ng et al. 2018). It images the Earth’s surface at 10-m, 20-m, and 60-m spatial resolutions. Sentinel-2B level-2A products acquired between 19 January 2020 and 28 February 2020, concurrent with the field data collection campaign, were downloaded from the European Space Agency (ESA) data portal (https://scihub.copernicus.eu/dhus/#/home).
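As an illustration of how radiometric indices are derived from BOA reflectance, the following minimal sketch computes three indices named in this study (NDVI, TNDVI, and MNDWI) for a single pixel. It assumes the standard Sentinel-2 band layout (B3 green, B4 red, B8 near-infrared, B11 shortwave infrared) and the commonly cited formulations; the function names and sample reflectances are hypothetical, and the formulations actually used for the study are those given in Table S1.

```python
import math


def normalized_difference(a, b):
    """Generic normalized difference (a - b) / (a + b); NaN if a + b == 0."""
    total = a + b
    return (a - b) / total if total != 0 else float("nan")


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from B8 (NIR) and B4 (red)."""
    return normalized_difference(nir, red)


def tndvi(nir, red):
    """Transformed NDVI, sqrt(NDVI + 0.5); NaN when NDVI + 0.5 is negative."""
    shifted = ndvi(nir, red) + 0.5
    return math.sqrt(shifted) if shifted >= 0 else float("nan")


def mndwi(green, swir):
    """Modified Normalized Difference Water Index from B3 (green) and B11 (SWIR)."""
    return normalized_difference(green, swir)


# Hypothetical reflectances for one vegetated pixel (scaled 0 to 1).
pixel = {"B3": 0.05, "B4": 0.10, "B8": 0.40, "B11": 0.20}
print(ndvi(pixel["B8"], pixel["B4"]))
print(tndvi(pixel["B8"], pixel["B4"]))
print(mndwi(pixel["B3"], pixel["B11"]))
```

In practice the same arithmetic is applied per pixel to whole raster bands after resampling them to a common grid.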

A total of four scenes were required to cover the study area. Pre-processing, including image mosaicking, resampling to a common grid of 10-m resolution, and sub-setting, was carried out using the freely available Sentinel Application Platform (SNAP) 7.0 software. Maps were produced using ArcGIS software. A total of thirty-five variables were considered (Table S1): seventeen vegetation radiometric indices, eight soil radiometric indices, five water radiometric indices, and five biophysical variables. To select important variables, we used the VIF correlation approach to reduce multicollinearity problems in the R 4.0 software “sdm” package (Naimi and Araújo 2016). VIF has been used by several studies of invasive species distribution as a tool to select important variables (Ng et al. 2018; Zimmermann et al. 2007). Accordingly, for this study, out of thirty-five variables, a total of twelve important variables (Table 1) with threshold values less than 0.7 were selected (Engler et al. 2013; Zimmermann et al. 2007). Furthermore, the relative importance of variables for all models was computed using the “getVarImp” function in the R software “sdm” package (Naimi and Araújo 2016). Then, the weighted mean value of each variable for each model run was calculated and categorized under vegetation indices, soil radiometric indices, and biophysical variables.
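To make the idea behind VIF-based screening concrete, the following minimal sketch computes the conventional VIF of each predictor, VIF_j = 1 / (1 - R_j²), as the j-th diagonal element of the inverse correlation matrix, and then repeatedly drops the predictor with the largest VIF. This is a generic stepwise procedure written for illustration, not the routine used for the study: the function names, the `max_vif=10.0` cutoff, and the sample data are assumptions, and the 0.7 threshold reported above is not reproduced here because its exact definition is not restated in the text.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long numeric columns."""
    n = len(columns[0])
    devs = []
    for col in columns:
        mean = sum(col) / n
        devs.append([x - mean for x in col])
    norms = [math.sqrt(sum(d * d for d in dv)) for dv in devs]
    k = len(columns)
    return [[sum(a * b for a, b in zip(devs[i], devs[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


def vif_stepwise(named_columns, max_vif=10.0):
    """Drop the highest-VIF predictor until all VIFs are <= max_vif."""
    kept = dict(named_columns)
    while len(kept) > 1:
        names = list(kept)
        scores = vif([kept[name] for name in names])
        worst = max(range(len(names)), key=lambda i: scores[i])
        if scores[worst] <= max_vif:
            break
        del kept[names[worst]]
    return list(kept)


# Hypothetical predictors: "a" and "c" are nearly identical, "b" is not.
predictors = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [3, 1, 4, 1, 5, 9],
    "c": [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],
}
selected = vif_stepwise(predictors)
```

With these sample data one of the two near-duplicate predictors is removed and the weakly related predictor is retained.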

Table 1 Description of the predictive variables used in modelling and mapping the current distribution of invasive Prosopis. The variables were categorized under vegetation indices, soil indices, water radiometric indices, and biophysical variables

Selection of Species Distribution Modelling

Today a large number of modelling methods are available and can be classified as “profile,” “regression,” and “machine learning” (Hijmans and Elith 2019). This study evaluates the performance of six commonly used models in the areas of invasive SDM. The models were selected from machine learning algorithms, regression models, and profile methods for comparison reasons (Table 2). Boosted Regression Trees (BRT), RF, and SVM from machine learning models; Generalized Additive Model (GAM) and Generalized Linear Model (GLM) from regression models; and bioclim from profile methods were considered. One ensemble model from all models, except the least performing bioclim model, was also developed.

Table 2 Predictive models from machine learning (RF, SVM, and BRT), regression (GAM and GLM), and profile (bioclim) methods and their short description and common pieces of literature that have used these models in modelling invasive species distribution

Model validation and mapping

To validate the above-listed models, we used the bootstrap replication approach in the R 4.0 software “sdm” package developed by Naimi and Araújo (2016). Out of 680 collected points, 205 (30%) randomly selected points were used as test data to validate models and the remaining 70% were used to train the models. This step was replicated ten times and the mean values of sensitivity, specificity, TSS, kappa, AUC, and correlation were used to assess the accuracy of the models. The bootstrap replication method has the potential to offer unbiased predictive accuracy with fairly low variance (Harrell et al. 1996; Lima et al. 2019). In addition, the sensitivity-specificity sum maximization approach was used to select the best threshold. This threshold has been recommended as the best approach for the prediction of species distribution (Liu et al. 2005).
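The threshold-dependent statistics named above can be sketched as follows. This is a minimal illustration, assuming predicted probabilities and observed presence (1) and absence (0) labels for a single test set with both classes present; the function names and sample data are hypothetical, and candidate thresholds are simply the observed scores.

```python
def confusion(y_true, scores, threshold):
    """Return (tp, fp, tn, fn), predicting presence when score >= threshold."""
    tp = fp = tn = fn = 0
    for observed, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and observed == 1:
            tp += 1
        elif predicted == 1 and observed == 0:
            fp += 1
        elif predicted == 0 and observed == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, TSS, and Cohen's kappa from a confusion matrix."""
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed_agreement = (tp + tn) / n
    chance_agreement = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (observed_agreement - chance_agreement) / (1 - chance_agreement)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "tss": sensitivity + specificity - 1,
        "kappa": kappa,
    }


def best_threshold(y_true, scores):
    """Threshold maximizing sensitivity + specificity (equivalently, TSS)."""
    candidates = sorted(set(scores))
    return max(candidates,
               key=lambda t: metrics(*confusion(y_true, scores, t))["tss"])


# Hypothetical test set: three presences followed by four absences.
y = [1, 1, 1, 0, 0, 0, 0]
p = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]
threshold = best_threshold(y, p)
result = metrics(*confusion(y, p, threshold))
```

Because TSS equals sensitivity plus specificity minus one, maximizing the sensitivity-specificity sum and maximizing TSS select the same threshold. In a replicated design, these statistics would be computed per replicate and then averaged.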

Binary maps were developed for all models such that pixels greater than the threshold represented the presence of Prosopis at different levels of invasion and pixels lower than the threshold indicated the absence of Prosopis in the area. In addition, over-prediction was corrected by clipping the model outputs to a buffered Minimum Convex Polygon (MCP) in the ArcGIS SDM Toolbox. The buffered MCP, as an a posteriori method, enables the reduction of over-prediction (Mendes et al. 2020). The ensemble model was computed as a weighted mean of all models except the worst-performing bioclim model. Maps were further classified into “uninvaded,” “low-invasion,” “moderate-invasion,” and “high-invasion” for the ensemble model.
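The weighted-mean ensemble and the subsequent classification can be sketched per pixel as follows. This is a minimal illustration under stated assumptions: the weights are hypothetical per-model accuracy scores (the statistic used for weighting is not specified in the text), and the three invasion classes are assumed to split the range between the threshold and 1 into equal thirds, which is a placeholder rather than the class breaks used for the study.

```python
def weighted_mean(predictions, weights):
    """Weighted mean of per-model probabilities for one pixel."""
    total_weight = sum(weights[model] for model in predictions)
    return sum(predictions[model] * weights[model]
               for model in predictions) / total_weight


def invasion_class(probability, threshold):
    """Classify a pixel; equal-width classes above the threshold (assumed)."""
    if probability < threshold:
        return "uninvaded"
    step = (1.0 - threshold) / 3.0
    if probability < threshold + step:
        return "low"
    if probability < threshold + 2 * step:
        return "medium"
    return "high"


# Hypothetical probabilities and weights for one pixel and two models.
pixel_predictions = {"RF": 0.9, "GLM": 0.5}
model_weights = {"RF": 0.9, "GLM": 0.6}
ensemble_probability = weighted_mean(pixel_predictions, model_weights)
label = invasion_class(ensemble_probability, threshold=0.47)
```

Weighting by accuracy lets better-performing models dominate the ensemble without discarding the others.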

Results

Performances of Species Distribution Modelling

The performances of SDMs using different evaluation techniques are presented in Table 3. Except for the bioclim model, the accuracy of the models was very high. Machine learning algorithms (RF and BRT) performed better than regression models (GLM and GAM) and the profile (bioclim) method for all evaluation techniques. After RF, the models ranked BRT, ensemble, GAM, SVM, GLM, and bioclim in decreasing order of performance. In addition, after RF, BRT performed well in AUC and sensitivity, and GAM performed well in kappa and correlation.

Table 3 Performance evaluation of SDMs using different statistical parameters. Sensitivity and specificity describe the rate of true positive and negative respectively

Model accuracy can also be evaluated using the receiver operating characteristic (ROC) curve, which plots the true presence rate (sensitivity) against the false presence rate (1 - specificity) across thresholds. The ROC curve for all models is presented in Fig. 3. Except for bioclim, sensitivity and specificity scores were very high for all models, indicating that both invaded and uninvaded areas were well identified and that the proportion of correctly classified samples was high (Fig. 3).
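The area under the ROC curve can be computed without drawing the curve, using its rank interpretation: AUC is the probability that a randomly chosen presence point receives a higher score than a randomly chosen absence point, with ties counted as one half. The following minimal sketch illustrates this; the function name and sample data are hypothetical.

```python
def auc(y_true, scores):
    """AUC as the share of presence/absence pairs ranked correctly (ties = 0.5)."""
    presences = [s for y, s in zip(y_true, scores) if y == 1]
    absences = [s for y, s in zip(y_true, scores) if y == 0]
    if not presences or not absences:
        raise ValueError("both presence and absence points are required")
    wins = 0.0
    for p in presences:
        for a in absences:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presences) * len(absences))


# Hypothetical test set: three presences followed by four absences.
observed = [1, 1, 1, 0, 0, 0, 0]
predicted = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]
area = auc(observed, predicted)
```

Here 11 of the 12 presence/absence pairs are ordered correctly, so the AUC is 11/12; a value of 0.5 corresponds to random ranking.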

Fig. 3

Receiver operating characteristic (ROC) curves using the bootstrap replication method for different SDMs. Sensitivity (true positive rate) on the vertical axis and 1-specificity (false positive rate) on the horizontal axis describe the proportions of correctly and incorrectly classified samples, respectively. The red and blue curves represent the mean AUC using training and test data, respectively

Prosopis distribution

The share of the study area predicted as invaded by Prosopis by the RF, SVM, GLM, BRT, GAM, Ensemble, and bioclim models was 14.3%, 12.9%, 12.8%, 13.9%, 14.9%, 16.11%, and 3.9%, respectively (Fig. 4). According to the best-performing RF model, 1354.6 km2 of the study area was invaded by Prosopis, with the invasion most intense around the central part (Fig. 4).

Fig. 4

Prosopis distribution maps using different SDMs: a Random Forest, b Boosted Regression Trees, c Support Vector Machine, d Generalized Additive Model, e Generalized Linear Model, and f bioclim

Furthermore, the ensemble model was used to produce maps showing the invasion of Prosopis at varying levels. The threshold for the ensemble model was 0.47 (Table 3); pixels below the threshold were considered “uninvaded” by Prosopis and pixel values above the threshold were further divided into three classes of “low,” “medium,” and “high” invasion (Ng et al. 2018). Accordingly, 86.8% of the study area was not invaded by Prosopis. The rest of the study area (13.2%) was invaded by Prosopis at different levels: low (2.5%), medium (3.2%), and high (7.5%) (Fig. 5).

Fig. 5

Ensemble model produced from RF, SVM, BRT, GLM, and GAM for modelling and mapping of Prosopis habitat suitability distribution. The gray, black, pink, and red colors describe “uninvaded,” “low,” “medium,” and “high” invasion by Prosopis, respectively

Relative contribution of predictor variables

The relative influence of predictors is shown in Fig. 6 and the supporting information (Table S2). The relative influence of a few variables was very high while other variables were found to be insignificant. The relative influence of vegetation radiometric indices (TNDVI, MCARI, MTCI, and S2REP) for RF, SVM, BRT, GLM, GAM, and bioclim was 83%, 65.75%, 74.35%, 75.5%, 54.85%, and 51.95%, respectively. In contrast, the relative importance of the water radiometric index (MNDWI) was the lowest in all models except the bioclim model (Fig. 6 and Table S2).

Fig. 6

The relative influence of predictive variables (vegetation indices, soil indices, and biophysical variables) for different models. The blue, red, and gray colors describe the weighted mean influence of vegetation indices, soil indices, and biophysical variables, respectively

Discussion

Implications of SDMs performance for remote sensing-based prediction

Our study highlights the relative performance of SDMs for remote sensing-based prediction of invasive species distribution. In the present study, machine learning algorithms (RF, BRT, and SVM) and regression models (GLM and GAM) showed high performance. Among all models, the bioclim model performed worst. The predictions in our study varied from 19.75% (GAM) to 5% (bioclim). This large difference between the models’ predictions can affect the monitoring of invasive species. According to González-Ferreras et al. (2016), models with AUC values below 0.7 and above 0.9 are considered “very poor” and “highly accurate,” respectively. Besides, models with TSS and kappa values below 0.4 and above 0.8 are considered “poor” and “excellent,” respectively. Based on these criteria, the performance of all models except bioclim fell in the “excellent” category, whereas the performance of the bioclim model fell in the “very poor” and “poor” categories. The predictions obtained from these models, except for bioclim, also agreed with previous studies conducted in the area, indicating that the performance of SDMs has great implications for the reliability of predictions. Moreover, numerous studies highlight the higher performance of RF for remote sensing-based prediction of invasive species distribution after evaluating the performance of several SDMs (Jensen et al. 2020; West et al. 2016, 2017).
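The rating criteria quoted above can be expressed as a small lookup. This is a minimal sketch: only the cutoffs stated in the text are encoded, values between the cutoffs are returned as "intermediate" because the text does not name those classes, and the function names are hypothetical.

```python
def rate_auc(value):
    """Rate an AUC value using the cutoffs quoted in the text."""
    if value < 0.7:
        return "very poor"
    if value > 0.9:
        return "highly accurate"
    return "intermediate"  # not named in the text


def rate_tss_or_kappa(value):
    """Rate a TSS or kappa value using the cutoffs quoted in the text."""
    if value < 0.4:
        return "poor"
    if value > 0.8:
        return "excellent"
    return "intermediate"  # not named in the text


# RF scores reported in the abstract: 99% AUC and 93% TSS.
print(rate_auc(0.99), rate_tss_or_kappa(0.93))
```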

Jensen et al. (2020) compared machine learning algorithms for the prediction of invasive Kudzu vine in the USA using S2 and Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) data. They found higher performance for the RF, neural network, and SVM algorithms. Besides, West et al. (2016) investigated five SDMs for the prediction of invasive Tamarisk using remote sensing data and found RF to be the best-performing model, followed by Maxent, BRT, Multivariate Adaptive Regression Spline (MARS), and GLM. Also, West et al. (2017) compared four SDMs using remote sensing data for the prediction of invasive cheatgrass and concluded that RF performed better than GLM and BRT. The higher performance of RF is attributed to its ability to avoid overfitting, as it combines several individual trees and votes for the most popular class (Breiman 2001). Due to its higher performance, it is a flexible and widely applied machine learning algorithm in various fields of study such as land cover classification (Abdi 2020), forest monitoring (Ma et al. 2020), species richness and density (Kosicki 2020), and invasive species modelling (Ng et al. 2018). In addition, it is well suited to remote sensing-based studies as it requires minimal time for satellite image classification (Sabat-Tomala et al. 2020).

Furthermore, the bioclim model was the worst-performing model in our study, which is consistent with the findings of Hernandez et al. (2006) and Guisan et al. (2007). Except for bioclim, all models worked well for S2-based prediction of Prosopis distribution and can be used in similar environments.

Potential of Sentinel-2 for invasive species prediction

Our study indicated that mapping and modelling the distribution of invasive Prosopis using remote sensing-derived variables contributes greatly to the management of the invasive species. Our findings agreed with the government report and previous studies in the area. Shiferaw et al. (2019b) mapped Prosopis distribution using Landsat-8 OLI and in situ survey data with an RF classifier. They found that 15.4% of the study area was invaded by Prosopis. Their result is comparable with our finding of 14.3%.

In addition, our study is also consistent with the findings of Shiferaw et al. (2019a, b, c) and the report of the Ministry of Livestock and Fisheries (MoLF 2017). Shiferaw et al. (2019a, b, c) found a 12.33% distribution of Prosopis using remote sensing, climate, and infrastructural variables collected in 2017. Applying an annual rate of increase of 8.3% (Shiferaw et al. 2019b), its distribution would increase to 15.4% of the region in 2020. This result is also comparable to our finding of 14.3%. Besides, a report by MoLF (2017) described that Prosopis covers 12.6% of the Afar region. Considering the same rate of increase (8.3%), its distribution would increase to 15.75% in 2020, which is also comparable to our findings. On the contrary, Wakie et al. (2014), using remote sensing-derived and topo-climatic variables in Afar, predicted that the distribution of the species was 3.8%. This low prediction might be due to the low spatial resolution of the MODIS vegetation indices and bioclimatic variables used in their study. Besides, the quality of WorldClim-based bioclimatic variables is uncertain in areas where weather stations are scarce (Deblauwe et al. 2016).
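The projected figures above can be checked with a short calculation. The sketch below assumes a 2017 baseline and three years of growth to 2020, and it compares simple (non-compounded) growth with compound growth; that the reported 15.4% and 15.75% follow from simple growth is an inference from the numbers, not a method stated in the text. The function names are hypothetical.

```python
def project_simple(share_percent, annual_rate, years):
    """Projected share with simple (non-compounded) annual growth."""
    return share_percent * (1 + annual_rate * years)


def project_compound(share_percent, annual_rate, years):
    """Projected share with compound annual growth."""
    return share_percent * (1 + annual_rate) ** years


RATE = 0.083   # 8.3% annual increase (Shiferaw et al. 2019b)
YEARS = 3      # assumed interval: 2017 to 2020

for source, baseline in (("Shiferaw et al.", 12.33), ("MoLF", 12.6)):
    simple = project_simple(baseline, RATE, YEARS)
    compound = project_compound(baseline, RATE, YEARS)
    print(f"{source}: simple {simple:.2f}%, compound {compound:.2f}%")
```

Under simple growth the two baselines give roughly 15.40% and 15.74%, close to the reported values; compound growth gives somewhat higher figures.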

Martinez et al. (2020) and Truong et al. (2017) described the importance of contemporary remotely sensed variables for the prediction of invasive species. Sentinel-1 and S2 data have a huge contribution in detecting and mapping invasive species (Rajah et al. 2019). The freely available S2 data with its high spatial, spectral, and temporal resolutions provides important information for species-level monitoring and management (Immitzer et al. 2016; Rapinel et al. 2019). Several studies also used S2-derived variables for mapping and modelling of invasive species distribution (Arogoundade et al. 2020; Dube et al. 2020; Ng et al. 2017; Rajah et al. 2019). Dube et al. (2020) compared S2 with Landsat-8 OLI for mapping the distribution of invasive Lantana camara (Verbenaceae) and conclude that the higher performance of S2-derived variables is due to its higher spectral and spatial resolution. Moreover, a study by Arogoundade et al. (2020) applauds the use of S2 spectral bands, S2-derived vegetation indices, and their combination for modelling invasive species. In addition, they also pointed out that red edge S2 bands have a huge contribution to mapping and modelling invasive species distribution.

Among S2-derived variables, our study identified the higher influence of vegetation radiometric indices over soil indices and biophysical variables (Fig. 6). Vegetation indices derived from satellite remote sensing data are recognized as a reliable source of information for monitoring vegetation changes (Feilhauer et al. 2013; He et al. 2015; Maschler et al. 2018; Nouri et al. 2018; Teillet et al. 1997). Several studies also confirmed the higher importance of vegetation indices over other variables such as infrastructure, bio (climate), and remote sensing spectral bands. Shiferaw et al. (2019c) illustrated that the Normalized Difference Vegetation Index (NDVI) had higher importance than other variables for invasive species prediction. Similarly, Wakie et al. (2014) employed remote sensing and topo-climatic variables to map and model invasive Prosopis distribution in the Afar region, Ethiopia, and reported that the relative influence of the Enhanced Vegetation Index (EVI) and NDVI was higher than that of other variables. Immitzer et al. (2019) also demonstrated that S2-derived vegetation indices enhanced model performance.

Furthermore, Rajah et al. (2019) evaluated S2-based Vegetation Indices (S2-VIs) for detecting and modelling invasive American Bramble (Rubus cuneifolius) and reported that models with S2-VIs performed far better than models with S2-VIs fused with Sentinel-1 Synthetic Aperture Radar (SAR) and S2 optical imagery. Besides, Arogoundade et al. (2020) evaluated the performance of S2-derived vegetation indices for modelling invasive Parthenium hysterophorus in South Africa and confirmed their potential for mapping and modelling invasive species. Moreover, TNDVI and NDVI derived from Landsat-8 OLI showed higher performance for the prediction and description of forest parameters such as density, canopy cover, and basal area (Nouri et al. 2018). Similarly, Musande et al. (2012) evaluated the performance of vegetation indices for discriminating specific crop types and found that TNDVI performed better than the other vegetation indices used in their study. However, TNDVI has limited ability to identify water and may treat it as vegetation cover, which can significantly decrease model accuracy, particularly in study areas with large water bodies (Shetty and Somashekar 2013).
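The relationship between NDVI and TNDVI can be illustrated with a minimal Python sketch. It assumes the commonly cited definition TNDVI = sqrt(NDVI + 0.5); the reflectance values are hypothetical and are not taken from this study's data. Under these assumptions, open water has a negative NDVI but still receives a positive TNDVI, which may be one reason why water can be confused with vegetation cover.

```python
import math


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def tndvi(nir: float, red: float) -> float:
    """Transformed NDVI, assuming the common form sqrt(NDVI + 0.5).

    The argument of the square root is clipped at zero so that very
    negative NDVI values do not raise a math domain error.
    """
    return math.sqrt(max(ndvi(nir, red) + 0.5, 0.0))


# Hypothetical surface reflectances (NIR, red), for illustration only.
samples = {
    "dense canopy": (0.45, 0.05),
    "bare soil": (0.30, 0.25),
    "open water": (0.02, 0.04),
}

for name, (nir, red) in samples.items():
    print(f"{name:12s} NDVI={ndvi(nir, red):+.3f}  TNDVI={tndvi(nir, red):.3f}")
```

In this sketch, the sign that separates water from land in NDVI is lost after the transformation, so a water mask or a water index would have to carry that information.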

However, several studies employed commonly known vegetation indices, such as NDVI, without evaluating the performance of other vegetation, soil, and water radiometric indices or of biophysical variables (e.g., Arogoundade et al. 2020). Given the different characteristics (benefits and limitations) of radiometric indices and biophysical variables, using several indices yields more reliable predictions because it incorporates different features such as soil, water, and vegetation. Moreover, the performance of radiometric indices has varied among studies. For example, Bannari et al. (2002) reported higher performance of the Transformed Difference Vegetation Index (TDVI) than of the widely used SAVI and NDVI. They pointed out that TDVI has higher sensitivity to bare soil below the vegetation cover, which helps to provide necessary information about specific vegetation parameters. Besides, NDVI carries some uncertainty because it is affected by soil reflectance; hence, the use of other vegetation indices can reduce this problem (Koller and Upadhyaya 2005).
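For readers unfamiliar with these indices, the following Python sketch computes NDVI, SAVI, and TDVI for a single pixel. It assumes the commonly cited formulations SAVI = (1 + L)(NIR - Red)/(NIR + Red + L) with L = 0.5, and TDVI = 1.5 (NIR - Red)/sqrt(NIR^2 + Red + 0.5); the reflectance values are hypothetical, and the exact formulas used by a given processing tool should be checked against its documentation.

```python
import math


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def savi(nir: float, red: float, soil_factor: float = 0.5) -> float:
    """Soil Adjusted Vegetation Index; soil_factor is the adjustment term L.

    With soil_factor = 0 the expression reduces to NDVI.
    """
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def tdvi(nir: float, red: float) -> float:
    """Transformed Difference Vegetation Index (commonly cited form)."""
    return 1.5 * (nir - red) / math.sqrt(nir ** 2 + red + 0.5)


# Hypothetical reflectances for one sparsely vegetated pixel.
nir, red = 0.40, 0.08
for label, func in (("NDVI", ndvi), ("SAVI", savi), ("TDVI", tdvi)):
    print(f"{label}: {func(nir, red):.3f}")
```

Because the three indices weight the soil background differently, they can rank the same pixels differently, which is consistent with evaluating several indices rather than relying on NDVI alone.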

Besides vegetation indices, biophysical variables also contributed considerably to monitoring vegetation changes in our study. Mudereri et al. (2019) evaluated the performance of the biophysical variables LAI, FAPAR, CWC, Fraction of Vegetation Cover (FVC), and chlorophyll content (Cab), together with S2 bands, for characterizing land cover in semi-arid regions and concluded that both biophysical variables and S2 wavebands have great capability for land cover classification. They also concluded that FAPAR was the best-performing variable, outperforming the other variables used in their study. Also, biophysical variables have a higher capacity for monitoring and managing vegetation changes (Atzberger et al. 2015). Moreover, biophysical variables derived from satellite remote sensing data contribute substantially to describing forest variables (Schlerf et al. 2005). In particular, S2 data provides an unprecedented option for retrieving biophysical parameters (Brown et al. 2019).

Furthermore, in our study, the relative influence of soil radiometric indices (BI, CI, TSAVI, and RI) was lower than that of vegetation radiometric indices. According to Nouri et al. (2018), lower performance of soil radiometric indices was observed in areas where both low and high vegetation densities occur. Indeed, low species richness and low diversity of woody vegetation are the major characteristics of our study area (Ilukor et al. 2016). Therefore, evaluating the performance of several indices beyond the commonly used ones is necessary for mapping and modelling invasive species distribution (West et al. 2017). However, in our study, the relative influence of TNDVI was far greater than that of the other vegetation and soil radiometric indices. Hence, our results would have benefited from the inclusion of bioclimatic variables (Ahmed et al. 2020). However, acquiring these variables at high resolution was difficult in the study area.

Conclusion

Our study describes the distribution of invasive Prosopis in the lower Awash River basin, Ethiopia, using machine learning (RF, BRT, and SVM), regression (GAM and GLM), and profile (bioclim) methods. We used S2-RIs and biophysical variables as predictors to evaluate the performance of the models. The performance of the machine learning algorithms (RF and BRT) was very high. The regression models (GAM and GLM) also performed very well, ranking next to the RF, BRT, and ensemble models. In contrast, the performance of the bioclim model was insufficient. Hence, we encourage researchers not to rely heavily on the bioclim model with S2-RIs and biophysical variables for predicting Prosopis distribution in dryland ecosystems. We also encourage researchers to evaluate the performance of models, or to use models previously evaluated in related ecosystems and datasets, before directly employing specific models, as differences in model performance can be significant. Therefore, the use of several models can provide reliable information and increase the confidence of ecologists in their findings.
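As a reminder of how the threshold-based measures used for such evaluations relate to one another, the sketch below derives sensitivity, specificity, TSS, and Cohen's kappa from a presence/absence confusion matrix. The function name and the counts are hypothetical and do not reproduce the results of this study; the models here were fitted and evaluated in the R "sdm" package.

```python
def evaluate_confusion(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute threshold-based accuracy measures from a 2x2 confusion matrix.

    tp: presences predicted as presence   fn: presences predicted as absence
    fp: absences predicted as presence    tn: absences predicted as absence
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)           # true positive rate
    specificity = tn / (tn + fp)           # true negative rate
    tss = sensitivity + specificity - 1.0  # True Skill Statistic
    observed = (tp + tn) / n               # observed agreement
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "TSS": tss,
        "kappa": kappa,
    }


# Hypothetical validation counts, for illustration only.
for measure, value in evaluate_confusion(tp=90, fp=5, fn=10, tn=95).items():
    print(f"{measure}: {value:.3f}")
```

Reporting several such measures side by side, rather than a single score, makes it easier to see whether a model errs mainly through missed presences or through false alarms.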

The best-performing RF model predicted that 1553.5 km2 (16.14%) of the study area was invaded by the species, indicating that more effort is required to reduce its distribution. Our study also demonstrated that freely available S2 data contributes greatly to detecting, mapping, and modelling the spatial distribution of invasive species with a high level of precision. In particular, S2-RIs and biophysical variables can provide basic information about vegetation, soil, and water for better spatial modelling of invasive species. Also, the higher performance of S2-derived variables for mapping and modelling invasive Prosopis distribution indicates that such datasets are adequate for this type of study. Moreover, the relative influence of vegetation radiometric indices was very high, followed by soil radiometric indices, biophysical variables, and water radiometric indices. We recommend that researchers integrate vegetation indices, soil indices, and biophysical variables when modelling invasive species rather than relying only on commonly known vegetation radiometric indices.

Availability of data and materials

The datasets used and analyzed during the current study are available from the corresponding author.

Abbreviations

SDM: Species Distribution Modelling

S2-RIs: Sentinel-2 Radiometric Indices

S2: Sentinel-2

TSS: True Skill Statistics

AUC: Area under the curve

RF: Random Forest

GIS: Geographic Information System

NMA: National Meteorological Agency

VIF: Variance Inflation Factor

GPS: Global Positioning System

BOA: Bottom Of the Atmosphere

ESA: European Space Agency

SNAP: Sentinel Application Platform

BI: Brightness Index

CI: Colour Index

CWC: Canopy Water Content

FAPAR: Fraction of Absorbed Photosynthetically Active Radiation

LAI: Leaf Area Index

MCARI: Modified Chlorophyll Absorption Ratio Index

MNDWI: Modified Normalized Difference Water Index

MTCI: MERIS Terrestrial Chlorophyll Index

RI: Redness Index

S2REP: Sentinel-2 Red-Edge Position Index

TNDVI: Transformed Normalized Difference Vegetation Index

TSAVI: Transformed Soil Adjusted Vegetation Index

BRT: Boosted Regression Trees

SVM: Support Vector Machine

GAM: Generalized Additive Model

GLM: Generalized Linear Model

MCP: Minimum Convex Polygon

ROC: Receiver operating characteristic

GBM: Generalized Boosted Regression Models

MARS: Multivariate Adaptive Regression Spline

AVIRIS: Airborne Visible/Infrared Imaging Spectrometer

MODIS: Moderate Resolution Imaging Spectroradiometer

OLI: Operational Land Imager

NDVI: Normalized Difference Vegetation Index

EVI: Enhanced Vegetation Index

SAR: Synthetic Aperture Radar

TDVI: Transformed Difference Vegetation Index

FVC: Fraction of Vegetation Cover

Cab: Chlorophyll content

Acknowledgements

This paper is part of the doctoral study entitled “Role of remote sensing in invasive species distribution modelling, the case of Prosopis in the lower Awash River basin, Ethiopia.” We would like to thank Wollo University and the Ethiopian Space Science and Technology Institute (ESSTI) for allowing this doctoral study.

Funding

None.

Author information

Contributions

All authors made a valuable contribution. NA designed and wrote the methodology, collected field data and literature, carried out the data analysis, and wrote the draft manuscript; CA refined the methodology, supported the collection of additional literature, and reviewed, edited, and rewrote the manuscript; and WZ supported the refinement of the methodology and the collection of additional literature, coordinated fieldwork and field expenses, and refined the manuscript. All authors agreed on the final draft.

Corresponding author

Correspondence to Nurhussen Ahmed.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table S1. Sentinel-based variables used before multicollinearity analysis: vegetation radiometric indices, soil radiometric indices, water radiometric indices, and biophysical processors.

Additional file 2: Table S2. Relative influence of variables (%) for models: Random Forest (RF), Support Vector Machine (SVM), Boosted Regression Trees (BRT), Generalized Additive Model (GAM), Generalized Linear Model (GLM), and bioclim.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Ahmed, N., Atzberger, C. & Zewdie, W. Species Distribution Modelling performance and its implication for Sentinel-2-based prediction of invasive Prosopis juliflora in lower Awash River basin, Ethiopia. Ecol Process 10, 18 (2021). https://doi.org/10.1186/s13717-021-00285-6

  • DOI: https://doi.org/10.1186/s13717-021-00285-6

Keywords